Data and Software Engineer
McLean, VA
$200K to $250K
TS/SCI and FSP
The Data & Software Engineer works with a small team to build complex data flows for a custom application. The successful candidate will have advanced Python programming skills; familiarity with Java; an understanding of data security, privacy, governance, and compliance principles; and a demonstrated history of building production data pipelines and ETL workflows at scale. The candidate must have experience:
· Building end-to-end data pipelines leveraging Python
· Using orchestration tools to deploy data pipelines, including configuring and updating Spark jobs
· Containerizing and deploying applications in cloud environments like AWS.
· Working with MySQL and PostgreSQL, including performance tuning, schema design, and query optimization for complex analytical workloads.
· Using industry-standard tools for version control and infrastructure as code (Git, IaC tooling, etc.)
· Working with data catalogs, tracking data lineage, and handling a variety of data formats, including geospatial.
· Using Bash scripting for automation and data processing tasks
· Integrating AI/ML services and models
Responsibilities:
· Work with stakeholders to understand data requirements, assess feasibility, and design appropriate solutions with minimal oversight
· Apply strong problem-solving and debugging skills to resolve data quality issues, pipeline failures, and performance bottlenecks
· Draw on experience with large-scale data migration or platform modernization efforts
· Contribute to data engineering documentation, best practices, and design patterns.
Required Skills:
Minimum of 5 years' experience with:
· Apache Spark & PySpark
· Advanced Python skills (including Pandas & NumPy)
· Docker, Podman
· AWS S3, Lambda & Step Functions
· Apache Iceberg, Airflow, etc.
· SQL (with Trino)
· NoSQL, DynamoDB
· Unity Catalog OSS, Apache Polaris
· Apache Superset
· Terraform or CloudFormation
· OpenLineage
· H3, PostGIS