Data Engineer

Artech - Henrico, VA

Job Description

Job Title: Data Engineer - Spark & Real-Time Data Processing
Location: Richmond, VA
Duration: 6 Months

Role Overview

We are seeking an experienced Data Engineer with strong expertise in Apache Spark-based ETL pipelines and real-time data processing. The ideal candidate will design, build, and optimize scalable data platforms that support batch and streaming workloads, enabling analytics, reporting, and data-driven decision-making across the organization.

Key Responsibilities

Data Engineering & ETL Development:
  • Design, develop, and maintain Spark-based ETL pipelines for large-scale batch data processing.
  • Build reusable, fault-tolerant data frameworks for ingesting, transforming, and loading structured and semi-structured data.
  • Optimize Spark jobs for performance, scalability, and cost efficiency.

Real-Time & Streaming Data Processing:
  • Develop real-time and near real-time data pipelines using technologies such as Spark Structured Streaming, Kafka, or Kinesis.
  • Process high-volume event streams with low latency and high reliability.
  • Implement windowing, watermarking, and stateful stream processing patterns (see the sketch after this posting).

Data Platforms & Storage:
  • Integrate data from multiple sources including APIs, databases, logs, and message queues.
  • Design and manage data storage solutions using data lake and lakehouse architectures (S3, ADLS, HDFS, Delta Lake, Iceberg).
  • Ensure data quality, consistency, and schema evolution across pipelines.

Cloud & Infrastructure:
  • Deploy and manage data pipelines on cloud platforms (AWS, Azure, or GCP).
  • Work with managed Spark platforms such as Databricks, EMR, or Synapse.
  • Implement CI/CD pipelines for data workflows using Git, Jenkins, or GitHub Actions.

Monitoring, Reliability & Security:
  • Implement logging, monitoring, and alerting for batch and streaming pipelines.
  • Troubleshoot data failures, performance bottlenecks, and production incidents.
  • Ensure data security, access control, and compliance with enterprise standards.

Collaboration & Documentation:
  • Collaborate with data scientists, analysts, and product teams to understand data requirements.
  • Document data models, pipelines, and operational procedures.
  • Participate in code reviews and contribute to data engineering best practices.

Required Skills & Qualifications

Core Technical Skills:
  • Strong experience with Apache Spark (Spark SQL, DataFrames, Structured Streaming)
  • Proficiency in Python (PySpark) and/or Scala
  • Experience building batch and streaming ETL pipelines
  • Strong SQL skills for data transformation and analysis

Streaming & Messaging:
  • Hands-on experience with Kafka, Kinesis, Pub/Sub, or similar streaming platforms
  • Understanding of event-driven architectures and stream processing concepts

Data Storage & Formats:
  • Experience with data lake / lakehouse architectures
  • Familiarity with Parquet, Avro, ORC, Delta Lake, Iceberg

Cloud & DevOps:
  • Experience with AWS, Azure, or GCP
  • Knowledge of containerization and orchestration (Docker, Kubernetes - nice to have)
  • Experience with workflow orchestration tools (Airflow, Dagster, or Prefect)

Nice-to-Have Skills
  • Experience with real-time analytics and low-latency systems
  • Knowledge of CDC (Change Data Capture) tools such as Debezium
  • Exposure to ML data pipelines and feature stores
  • Experience working in high-volume, regulated environments
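For illustration only, below is a minimal PySpark sketch of the streaming patterns the responsibilities name (Kafka ingest, watermarking, and a windowed stateful aggregation written to Delta Lake). It is not part of the posting: the broker address, topic name, event schema, and file paths are hypothetical placeholders, and the Kafka source and Delta sink assume the spark-sql-kafka and delta-spark packages are on the classpath.

# Hypothetical sketch of a Spark Structured Streaming pipeline with
# watermarking and a tumbling-window aggregation. All names/paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

spark = SparkSession.builder.appName("event-stream-sketch").getOrCreate()

# Schema of the hypothetical JSON events arriving on the Kafka topic.
event_schema = StructType([
    StructField("event_id", StringType()),
    StructField("amount", DoubleType()),
    StructField("event_time", TimestampType()),
])

# Read a high-volume event stream from Kafka (broker and topic are placeholders;
# requires the spark-sql-kafka connector).
raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "events")
    .load()
)

# Parse the JSON payload and apply a watermark so events arriving more than
# 10 minutes late are dropped from the stateful aggregation.
events = (
    raw.select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*")
    .withWatermark("event_time", "10 minutes")
)

# Tumbling 5-minute window aggregation over the event-time column.
agg = (
    events.groupBy(F.window("event_time", "5 minutes"))
    .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("event_count"))
)

# Append finalized windows to a Delta table (requires the delta-spark package).
query = (
    agg.writeStream
    .format("delta")
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/events_agg")
    .start("/tmp/tables/events_agg")
)
query.awaitTermination()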

Created: 2026-03-04
