What You Will Do
Define and extract data from multiple sources, integrate disparate data into a common data model, and integrate data into a target database, application, or file using efficient programming processes
Document and test complex data systems that bring together data from disparate sources, making it available to data scientists and other users, using scripting and/or programming languages
Improve, deploy, and maintain models prepared by Data Scientists into production-grade cloud systems using scalable, efficient machine learning operations pipelines
Write and refine code to ensure performance and reliability of data extraction and processing
Lead requirements-gathering sessions with business and technical staff to distill technical requirements from business requests
Develop advanced SQL queries to extract data for analysis and model construction
Own delivery of large, complex data and ML engineering projects
Design and develop scalable, efficient data pipeline processes to handle data ingestion, cleansing, transformation, integration, and validation required to provide access to prepared data sets to analysts and data scientists
Ensure performance and reliability of data processes
Document and test data processes, including performing thorough data validation and verification
Collaborate with cross-functional teams to resolve data quality and operational issues and ensure timely delivery of products
Develop and implement scripts for database and data process maintenance, monitoring, and performance tuning
Analyze and evaluate databases to identify and recommend improvements and optimizations
Design advanced, eye-catching visualizations to convey information to users
What You Bring
Bachelor's degree and 5 years of experience with Oracle, data warehouses and data lakes, big data platforms, and programming in Python, R, or another related language.
Understanding of ML algorithms, and experience creating and executing efficient MLOps pipelines and tuning ML models
In lieu of a degree, 7 years of the experience stated above.
Hiring Preferences
Experience with Python analytics packages such as scikit-learn, NumPy, and pandas
Exposure to clinical, healthcare, or insurance data
Experience working in AWS and/or using Linux-based systems
Experience working with Amazon SageMaker
Expertise in deploying ML models on big data platforms
Demonstrated experience performing data validation on prepared data sets
Experience in software engineering practices and software development methodologies
Ability to communicate clearly and effectively both orally and in writing
Ability to work independently
Reliable task estimation skills
Excellent quantitative, problem-solving, and analytical skills
Ability to document data discovery findings, feature selection process, model determination process, and interpretation of insights
Ability to collaborate effectively with business stakeholders, performance consultants, data scientists, and other data engineers
Ability to quickly become an expert in operational processes and data for various lines of business
Ability to troubleshoot and document findings and recommendations
Comfortable briefing internal stakeholders on findings
Proactively communicate risks and problems to leadership
Ability to keep up with a rapidly evolving technology space
Salary Range
$100,500.00 - $182,700.00