AI Infrastructure Engineer. vLLM contributor.
I build and debug LLM inference systems on GPU-accelerated Kubernetes. I contributed a fix to vLLM's speculative decoding engine for DeepSeek-R1. Before this, I spent 12 years as a solutions architect at IBM, HP, Iron Mountain, and Kronos. I took time off to raise my family and run a real estate business. Now I'm back building infrastructure.
Work
vLLM — Speculative decoding fix
open source
Found a logic failure in structured-output detection for DeepSeek-R1. Collaborated with core maintainers on the fix.
"Great analysis and work on a fix! This is a severe issue." — vLLM Maintainer
C++ · Python · CUDA · Speculative Decoding
LLM inference platform
infrastructure
Multi-model serving on AWS EKS. vLLM, Triton, FastAPI request router. GPU node affinity, custom scheduling, cost-per-token observability.
AWS EKS · vLLM · Triton · FastAPI · Prometheus · Grafana · NVIDIA DCGM
Agentic fraud detection system
distributed systems
Event-driven runtime using Ray, Kafka, Go, Python. Supervisor agent pattern. Persistent state with zero-loss processing during node failures.
Ray · Kafka · Go · Python
Writing
Why LLM inference needs two different GPUs · Mar 2026
First-principles security at the edge · Jan 2026
Evidence-first technical due diligence agent · Feb 2026
Background
12 years as a solutions architect at IBM, HP, Iron Mountain, and Kronos. Distributed systems, 99.99% uptime, multi-terabyte data migrations, cross-stack debugging from hardware to application layer. Took time away to raise a family and run a real estate consulting business. Came back to engineering focused on GPU infrastructure and LLM inference.
Contact
nfrounjian@gmail.com
Boston, MA