CloudIngest · 1 day ago
Data Engineer – GMP
CloudIngest is seeking a Data Engineer to design, build, and maintain data pipelines for its analytics layer. The role requires collaboration with a Power BI Developer to ensure data is accessible and properly structured for reporting. GMP experience is essential for this position.
Responsibilities
Following the EDB standard, design and establish the AWS S3 bucket structure for the data lake (Bronze/Silver/Gold zones) with Red CCI security controls
Build new data pipelines to extract data from the selected SaaS-based HSE systems
Enhance/extend data pipelines using the dataset from AWS EDB Asset related data products and digital solutions
Build a data ingestion pipeline for the selected MES via REST API (Lambda-based extraction). MES selection is to be finalized by the end of January 2027; adapt the approach if a different MES is chosen
Build data pipelines from SaaS-based HSE tools to integrate their data into the data warehouse
Implement a CDC pipeline for LabVantage LIMS from its Oracle database using AWS DMS (or Azure Data Factory)
Develop Bronze-to-Silver transformations using AWS Glue or Azure Data Factory depending on data domains
Configure AWS Glue Data Catalog with appropriate metadata and Red CCI classification tags
Connect to EDB marketplace for enterprise reference data
Build Silver-to-Gold transformations creating batch-centric data products
Implement data quality checks and monitoring dashboards
Develop orchestration workflows using AWS Step Functions
Support Power BI Developer in validating OneLake shortcut connectivity to Gold zone
Document data lineage, schema definitions, and pipeline architecture
Monitor pipeline health, troubleshoot failures, and optimize performance
Collaborate with Power BI Developer on data model requirements and data quality issues
Support data governance and security compliance reviews
Respond to ad-hoc data requests from engineers
Coordinate with enterprise EDB team on data sharing agreements and standards
Qualifications
Required
GMP experience is a must