Job Description:
- Maintain, optimize, and expand ETL data pipeline infrastructure (Python and Scala based) for processing terabyte-scale datasets using big data processing technologies such as Apache Spark.
- Provision dynamic server infrastructure in the AWS cloud using technologies including Kubernetes, Docker, AWS EC2, AWS Batch, and AWS Lambda.
- Develop and maintain data lake solutions for the data warehouse and support rich visualization using D3 and custom JavaScript charts for the Analytics platform.
- Architect and support a scalable, robust, and secure web application backend built using Python and NodeJS.
- Create and optimize a secure client-facing REST API with an enterprise-grade SLA.
- Implement and deploy code using CI/CD pipelines with Terraform, Ansible, Docker, Git, and shell scripting.
- Collaborate with the Data Science teams to implement data products and automate training of AI/ML models using scikit-learn, pandas, NumPy, and PyTorch.
- Build analytics tools that use the data pipeline to provide actionable insights into customer analytics, operational efficiency, and other key business performance metrics.
- Implement and adhere to industry security processes, standards, and best practices to ensure SOC 2 compliance.
- Possess real-world experience and working knowledge of various data stores including Postgres, DynamoDB, Redis, and Elasticsearch.
May Telecommute.