Enterprise Data Platform Engineer
The Evolvers Group is seeking an Enterprise Data Platform Engineer to design, build, and operate data pipelines on the Enterprise Data Platform using Databricks and Apache Spark. This role involves collaborating with various teams to deliver secure and performant data pipelines, while ensuring data quality and operational efficiency.
Consulting · Health Care · Information Technology · Project Management · Software
Responsibilities
Build and maintain end-to-end pipelines in Databricks using Spark (PySpark) for ingestion, transformation, and publication of curated datasets
Implement streaming/near-real-time patterns using Spark Structured Streaming (or equivalent), including state management, checkpointing, and recovery (see the streaming sketch after this list)
Design incremental processing, partitioning strategies, and data layout/file sizing approaches to optimize performance and cost
Develop reusable pipeline components (common libraries, parameterized jobs, standardized patterns) to accelerate delivery across domains
Develop and operationalize workflows in Python and R for data preparation, analysis support, and research-ready extracts
Package code for repeatable execution (dependency management, environment reproducibility, job configuration)
Implement data quality controls for batch and streaming (schema enforcement, completeness/validity checks, late/duplicate event handling, reconciliation)
Build pipeline observability: logging, metrics, alerting, and dashboards; support on-call/incident response and root-cause analysis
Create runbooks and operational procedures for critical pipelines and streaming services
Ensure secure handling of sensitive data and apply least-privilege principles in pipeline design and execution
Contribute lineage notes, dataset definitions, and operational documentation to support reuse and auditability
Use version control and CI/CD practices for notebooks/code (code reviews, automated testing where feasible, deployment/promotion across environments)
Collaborate with stakeholders to refine requirements, define SLAs, and deliver incrementally with measurable outcomes
Implement Lakeflow/Delta Live Tables (DLT) pipelines with data quality expectations, materialized views, and streaming tables; design pipeline DAGs and maintain declarative ETL workflows (see the DLT sketch after this list)
Design and implement medallion architecture patterns (Bronze/Silver/Gold) with appropriate data quality gates, schema evolution strategies, and layer-specific optimization techniques (OPTIMIZE, VACUUM, Z-ordering/liquid clustering)
Develop and maintain comprehensive testing strategies including unit tests for transformation logic, integration tests for end-to-end pipelines, and data quality validation using frameworks like Great Expectations or Deequ
Perform data modeling and schema design for dimensional models, slowly changing dimensions (SCD), and analytical structures; collaborate on entity definitions and grain decisions
Contribute to Unity Catalog governance by registering datasets with metadata/descriptions/tags, implementing row/column-level security where required, and maintaining accurate lineage information (see the Unity Catalog sketch after this list)
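A minimal PySpark sketch of the streaming pattern described above, assuming a hypothetical Kafka topic, event schema, and Delta paths; every name below is an illustrative placeholder, not this platform's actual configuration:

```python
# A minimal sketch, not this platform's actual pipeline: broker address,
# topic, schema, and paths below are illustrative placeholders.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

spark = SparkSession.builder.appName("events-ingest").getOrCreate()

event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("event_time", TimestampType()),
    StructField("payload", StringType()),
])

raw = (spark.readStream
       .format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")   # placeholder
       .option("subscribe", "events")                      # placeholder topic
       .load())

parsed = (raw.selectExpr("CAST(value AS STRING) AS json")
          .select(from_json(col("json"), event_schema).alias("e"))
          .select("e.*")
          # The watermark bounds streaming state for late events; including
          # the event-time column in the dedup keys lets Spark expire state.
          .withWatermark("event_time", "15 minutes")
          .dropDuplicates(["event_id", "event_time"]))

query = (parsed.writeStream
         .format("delta")
         .outputMode("append")
         # The checkpoint location is what enables restart and recovery.
         .option("checkpointLocation", "/chk/events_bronze")  # placeholder
         .start("/tables/events_bronze"))                     # placeholder
```

The checkpoint directory is the recovery point for the job; the watermark is the knob that trades late-event tolerance against state size.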
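A minimal Lakeflow/Delta Live Tables sketch of a Bronze-to-Silver flow with expectations, assuming a hypothetical Auto Loader landing path; table names and rules are illustrative, and the `spark` handle is the one the DLT runtime provides:

```python
# A minimal sketch; the landing path, table names, and expectation rules
# are illustrative placeholders. `spark` is provided by the DLT runtime.
import dlt

@dlt.table(comment="Raw ingested events (Bronze)")
def events_bronze():
    return (spark.readStream
            .format("cloudFiles")                    # Auto Loader
            .option("cloudFiles.format", "json")
            .load("/landing/events/"))               # placeholder path

@dlt.table(comment="Validated events (Silver)")
@dlt.expect_or_drop("valid_event_id", "event_id IS NOT NULL")  # drop bad rows
@dlt.expect("plausible_time", "event_time >= '2020-01-01'")    # log-only rule
def events_silver():
    return (dlt.read_stream("events_bronze")
            .select("event_id", "event_time", "payload"))
```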
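A minimal sketch of the Unity Catalog registration work described above, assuming a Databricks notebook/job context where `spark` is predefined; the three-level table name, tag keys, and group are hypothetical placeholders:

```python
# A minimal sketch; the table name, tags, and group below are illustrative
# placeholders for Unity Catalog dataset registration and access control.
spark.sql("""
    COMMENT ON TABLE main.silver.customers IS
    'Curated customer dimension; sourced from CRM extracts'
""")
spark.sql("""
    ALTER TABLE main.silver.customers
    SET TAGS ('domain' = 'customer', 'contains_pii' = 'true')
""")
spark.sql("GRANT SELECT ON TABLE main.silver.customers TO `analysts`")
```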
Qualifications
Required
10+ years of data engineering experience, including production Spark-based batch pipelines and streaming implementations
Strong proficiency in Python and R for data engineering and analytical workflows
Hands-on experience with Databricks and Apache Spark, including Structured Streaming (watermarking, stateful processing concepts, checkpointing, exactly-once/at-least-once tradeoffs)
Strong SQL skills for transformation and validation
Experience building production-grade pipelines: idempotency, incremental loads, backfills, schema evolution, and error handling
Experience implementing data quality checks and validation for both batch and event streams (late arrivals, deduplication, event-time vs processing-time)
Observability skills: logging/metrics/alerting, troubleshooting, and performance tuning (partitions, joins/shuffles, caching, file sizing)
Proficiency with Git and CI/CD concepts for data pipelines, including Databricks Asset Bundles, Databricks application deployments, and the Databricks CLI
Experience with lakehouse table formats and patterns (e.g., Delta tables) including compaction/optimization and lifecycle management
Familiarity with orchestration patterns (Databricks Workflows/Jobs) and dependency management
Experience with governance controls (catalog permissions, secure data access patterns, metadata/lineage expectations)
Knowledge of message/event platforms and streaming ingestion patterns (e.g., Kafka/Kinesis equivalents) and sink patterns for serving layers
Experience collaborating with research/analytics stakeholders and translating analytical needs into engineered data products
Strong problem-solving and debugging skills across ingestion, transformation, and serving
Clear technical communication and documentation discipline
Ability to work across product/architecture/governance teams in a regulated environment
Deep Delta Lake expertise including time travel, Change Data Feed (CDF), MERGE operations, CLONE, table constraints, and optimization techniques; understanding of liquid clustering and table maintenance best practices (see the MERGE and maintenance sketch after this list)
Experience with Lakeflow/Delta Live Tables (DLT) including expectations framework, materialized vs. streaming table patterns, and declarative pipeline design
Proficiency with testing frameworks (pytest, Great Expectations, Deequ) and test-driven development practices for production data pipelines (see the pytest sketch after this list)
Data modeling skills including dimensional modeling (star/snowflake schemas), medallion architecture implementation, and slowly changing dimension (SCD) pattern implementation
AWS data services experience including S3 optimization, IAM role configuration for data access, and CloudWatch integration; understanding of cost optimization patterns
Bachelor's degree in a related field or equivalent experience
Databricks Certified Associate Developer for Apache Spark
Databricks Certified Data Engineer Associate or Professional
AWS Certified Developer Associate
AWS Certified Data Engineer Associate
AWS Certified Solutions Architect Associate
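A minimal sketch of an idempotent Delta MERGE upsert plus routine table maintenance, assuming hypothetical target and source tables; names and the retention window are illustrative, not a prescribed implementation:

```python
# A minimal sketch; target/source table names and retention below are
# illustrative placeholders, not a prescribed implementation.
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

target = DeltaTable.forName(spark, "main.silver.customers")   # placeholder
updates = spark.table("main.bronze.customer_updates")         # placeholder

(target.alias("t")
 .merge(updates.alias("s"), "t.customer_id = s.customer_id")
 .whenMatchedUpdateAll()       # overwrite changed attributes (SCD Type 1)
 .whenNotMatchedInsertAll()    # insert previously unseen keys
 .execute())

# Routine maintenance: compact small files, then remove stale files that
# fall outside the retention window.
spark.sql("OPTIMIZE main.silver.customers")
spark.sql("VACUUM main.silver.customers RETAIN 168 HOURS")
```

Because MERGE keys the upsert on the business identifier, re-running the same batch is a no-op, which is what makes the load idempotent and safe to backfill.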
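A minimal pytest sketch for unit-testing transformation logic on a local SparkSession; `normalize_events` is a hypothetical stand-in for the code under test, and all data and names are illustrative:

```python
# A minimal sketch; normalize_events is a hypothetical stand-in for the
# transformation under test, and all data/names are illustrative.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, trim

@pytest.fixture(scope="session")
def spark():
    return (SparkSession.builder
            .master("local[2]")
            .appName("pipeline-unit-tests")
            .getOrCreate())

def normalize_events(df):
    # Trim identifier whitespace and drop rows with no identifier.
    return (df.withColumn("event_id", trim(col("event_id")))
              .dropna(subset=["event_id"]))

def test_normalize_trims_and_drops_null_ids(spark):
    df = spark.createDataFrame(
        [(" e1 ", "a"), (None, "b")], ["event_id", "payload"])
    out = normalize_events(df).collect()
    assert [r.event_id for r in out] == ["e1"]
```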
Company
The Evolvers Group
The Evolvers Group is a management and technology consulting firm that believes in the power of the idea.
H1B Sponsorship
The Evolvers Group has a track record of offering H1B sponsorships. Please note that this does not guarantee sponsorship for this specific role; the figures below are provided for reference (data powered by the US Department of Labor).
Trends of Total Sponsorships: 2025: 19 · 2024: 20 · 2023: 39 · 2022: 21 · 2021: 23 · 2020: 15
Funding
Current Stage: Growth Stage (company data provided by Crunchbase)