Nancy Bethala-Frounjian

AI Infrastructure Engineer. vLLM contributor.

I build and debug LLM inference systems on GPU-accelerated Kubernetes. I contributed a fix to vLLM's speculative decoding engine for DeepSeek-R1. Before that, I spent 12 years as a solutions architect at IBM, HP, Iron Mountain, and Kronos, then took time off to raise my family and run a real estate consulting business. Now I'm back building infrastructure.

Work

vLLM — Speculative decoding fix

open source

Found a logic failure in structured-output detection for DeepSeek-R1. Collaborated with core maintainers on the fix.

PR #34978 ↗ · Issue #34650 ↗

"Great analysis and work on a fix! This is a severe issue." — vLLM Maintainer

C++ · Python · CUDA · Speculative Decoding

LLM inference platform

infrastructure

Multi-model serving on AWS EKS with vLLM, Triton, and a FastAPI request router. GPU node affinity, custom scheduling, cost-per-token observability.

github ↗

AWS EKS · vLLM · Triton · FastAPI · Prometheus · Grafana · NVIDIA DCGM

Agentic fraud detection system

distributed systems

Event-driven runtime using Ray, Kafka, Go, Python. Supervisor agent pattern. Persistent state with zero-loss processing during node failures.

github ↗

Ray · Kafka · Go · Python

Writing

Why LLM inference needs two different GPUs · Mar 2026 ↗
First-principles security at the edge · Jan 2026 ↗
Evidence-first technical due diligence agent · Feb 2026 ↗
all posts →

Background

12 years as a solutions architect at IBM, HP, Iron Mountain, and Kronos. Distributed systems, 99.99% uptime, multi-terabyte data migrations, cross-stack debugging from hardware to application layer. Took time away to raise a family and run a real estate consulting business. Came back to engineering focused on GPU infrastructure and LLM inference.

Contact

nfrounjian@gmail.com
Boston, MA