AI Infrastructure Engineer
Top Secret or TS/SCI clearance is required to start
$200K to $250K
Chantilly, VA
What You'll Do
- Deploy and optimize self-hosted LLM inference servers (vLLM, Ollama, and similar).
- Containerize AI workloads using Docker and orchestrate production environments with Kubernetes, including GPU scheduling.
- Build and maintain AI serving infrastructure, including gateways, load balancing, authentication, TLS, and rate limiting.
- Optimize GPU utilization, memory management, quantization, batching, and capacity planning to balance performance and cost.
- Develop and maintain CI/CD pipelines, observability, monitoring, and incident response processes.
What You'll Bring (Required)
- Hands-on experience deploying and serving Large Language Models (LLMs) in production.
- Strong experience with Docker and production Kubernetes environments, including GPU scheduling.
- Deep understanding of self-hosted AI infrastructure, including model formats, quantization, GPU memory management, batching, and inference optimization.
- Experience supporting production applications with networking, reverse proxies, load balancing, authentication, and TLS.
- Proficiency with Linux administration and Python and/or Bash scripting.
- Ownership mindset with the ability to operate and improve production AI infrastructure.
Nice to Have
- Experience with CUDA, NVIDIA drivers, the NVIDIA GPU Operator, or other GPU infrastructure technologies.
- Experience with Infrastructure as Code (Terraform, Helm).
- Familiarity with observability and monitoring tools such as Prometheus and Grafana.
- Experience building Retrieval-Augmented Generation (RAG) pipelines and working with vector databases (pgvector, Qdrant, Weaviate).
- Experience with LLM gateway tools such as LiteLLM.