AI Infrastructure Engineer

Chantilly, Virginia

Top Secret or TS/SCI clearance is required to start
$200K to $250K

Deploy and optimize self-hosted LLM inference servers (vLLM, Ollama, and similar).
Containerize AI workloads using Docker and orchestrate production environments with Kubernetes, including GPU scheduling.
Build and maintain AI serving infrastructure, including gateways, load balancing, authentication, TLS, and rate limiting.
Optimize GPU utilization, memory management, quantization, batching, and capacity planning to balance performance and cost.
Develop and maintain CI/CD pipelines, observability, monitoring, and incident response processes.

Hands-on experience deploying and serving Large Language Models (LLMs) in production.
Strong experience with Docker and production Kubernetes environments, including GPU scheduling.
Deep understanding of self-hosted AI infrastructure, including model formats, quantization, GPU memory management, batching, and inference optimization.
Experience supporting production applications with networking, reverse proxies, load balancing, authentication, and TLS.
Proficiency with Linux administration and Python and/or Bash scripting.
Ownership mindset with the ability to operate and improve production AI infrastructure.

Experience with CUDA, NVIDIA drivers, the NVIDIA GPU Operator, or other GPU infrastructure technologies.
Experience with Infrastructure as Code (Terraform, Helm).
Familiarity with observability and monitoring tools such as Prometheus and Grafana.
Experience building Retrieval-Augmented Generation (RAG) pipelines and working with vector databases (pgvector, Qdrant, Weaviate).
Experience with LLM gateway tools such as LiteLLM.