Building scalable inference systems for large models (hybrid sharding, NCCL pipelines, encrypted inference) and mechanistic models for single‑cell biology (scFATE).
COLM 2025 (main conference)
NeurIPS 2024 Workshop — RBFM
KDD 2024 Workshop — SDBD
Preprint
AI Lead and Core Systems Contributor · New York, USA · 2023–present
Machine Learning Engineer (Independent Consulting) · 2022–2023
Machine Learning Engineer — Lead · 2021–2022
Research Engineer · 2019–2021
Hybrid sharding on heterogeneous GPU miners for decentralized inference.
DiT and U‑Net backbones partitioned into 1F1B pipeline stages with activation checkpointing, weight sharding, fused attention, and frame‑chunked text‑to‑video generation; scaled out with NCCL collectives.
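A minimal sketch of the 1F1B idea (my illustration in plain Python, not the production scheduler): each stage runs a stage‑dependent number of warmup forwards, then alternates one forward with one backward per microbatch, then drains the remaining backwards.

```python
# Hypothetical helper sketching a 1F1B pipeline schedule: it emits the order
# of forward ("F") and backward ("B") steps over microbatches for one stage.

def one_f1b_schedule(stage: int, num_stages: int, num_microbatches: int):
    # Warmup: earlier stages run more forwards before their first backward.
    warmup = min(num_stages - stage - 1, num_microbatches)
    fwd = bwd = 0
    for _ in range(warmup):                      # warmup forwards
        yield ("F", fwd); fwd += 1
    for _ in range(num_microbatches - warmup):   # steady state: 1F then 1B
        yield ("F", fwd); fwd += 1
        yield ("B", bwd); bwd += 1
    for _ in range(warmup):                      # cooldown backwards
        yield ("B", bwd); bwd += 1

# Example: with 4 stages and 8 microbatches, stage 0 warms up with 3 forwards.
print(list(one_f1b_schedule(stage=0, num_stages=4, num_microbatches=8)))
```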
Block‑wise sharding across miners with Ulysses‑style sequence parallelism, paged KV caching, and speculative decoding; sustains over 20,000 tokens per second aggregate on large models.
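Paged KV caching can be pictured as a per‑sequence block table over a shared pool of fixed‑size blocks; a hedged sketch (names like BLOCK_SIZE and BlockTable are my own, not the real system's):

```python
# Illustrative paged KV caching: logical token positions map to fixed-size
# physical blocks through a per-sequence block table, so sequences of varying
# length share one memory pool without fragmentation.

BLOCK_SIZE = 16  # tokens per physical KV block (assumed value)

class BlockTable:
    def __init__(self, free_blocks: list):
        self.free = free_blocks      # shared pool of physical block ids
        self.table = []              # logical block index -> physical block id

    def slot(self, pos: int):
        """Return (physical_block, offset) for token position `pos`,
        allocating blocks from the shared pool on demand."""
        logical = pos // BLOCK_SIZE
        while len(self.table) <= logical:
            self.table.append(self.free.pop())   # lazy allocation
        return self.table[logical], pos % BLOCK_SIZE

pool = list(range(1024))          # physical blocks shared by all sequences
seq = BlockTable(pool)
print(seq.slot(0), seq.slot(17))  # (1023, 0) (1022, 1)
```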
Indexed 150,000 regulatory documents with embeddings and vector search; a large language model then produces answers with citations to the retrieved sources.
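The retrieval step, reduced to a brute‑force sketch (the placeholder `embed` stands in for the real embedding model, and a production vector index replaces the dot‑product scan):

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: hash words into a normalized bag-of-words vector.
    v = np.zeros(256)
    for w in text.lower().split():
        v[hash(w) % 256] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

docs = ["Rule A covers record retention.", "Rule B governs broker conduct."]
index = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 1):
    scores = index @ embed(query)            # cosine similarity (unit vectors)
    top = np.argsort(-scores)[:k]
    return [(int(i), docs[i]) for i in top]  # (doc id, text) pairs for citing

# Retrieved (id, text) pairs go into the LLM prompt so answers can cite ids.
print(retrieve("record retention rules"))
```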
Implemented QANet with causal and synthesized attention, reaching 70 F1 and 70 exact match.
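The causal part, as a standalone sketch (not the QANet implementation itself): attention scores are masked so each position sees only itself and earlier positions.

```python
import torch

def causal_attention(q, k, v):
    # q, k, v: (seq, dim); the upper-triangular mask hides future positions.
    scores = q @ k.T / k.shape[-1] ** 0.5
    mask = torch.triu(torch.ones_like(scores, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

x = torch.randn(5, 8)
print(causal_attention(x, x, x).shape)  # torch.Size([5, 8])
```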
Ensemble of NASNet, VGG, and ResNet with heatmaps for interpretability; 85% accuracy, served through a simple web tool.
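Ensembling here means averaging member probabilities; a toy sketch with linear stand‑ins for the CNNs:

```python
import torch

def ensemble_predict(models, x):
    # Average softmax probabilities across ensemble members, then vote.
    probs = [torch.softmax(m(x), dim=-1) for m in models]
    return torch.stack(probs).mean(dim=0)

models = [torch.nn.Linear(16, 3) for _ in range(3)]  # stand-ins for the CNNs
x = torch.randn(2, 16)
print(ensemble_predict(models, x).argmax(dim=-1))    # predicted class per row
```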
Thesis: Contrastive framework for spatial transcriptomics with structure‑aware transformers and graph corruption to reduce over‑smoothing.
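A simplified view of the contrastive objective (my own InfoNCE sketch under assumed pairing, not the thesis code): embeddings of the same cell under the original and corrupted graph form positive pairs; everything else in the batch is a negative.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature: float = 0.1):
    # anchor/positive: (n, d) embeddings of the same cells under two views
    # (e.g. original vs. corrupted graph); matching rows are positive pairs.
    a = F.normalize(anchor, dim=-1)
    p = F.normalize(positive, dim=-1)
    logits = a @ p.T / temperature        # (n, n) cosine similarities
    labels = torch.arange(a.shape[0])     # diagonal entries are the positives
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(32, 64), torch.randn(32, 64)
print(info_nce(z1, z2).item())
```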
Selected coursework with grades: NLP (A), Parallel & Distributed Systems (A), AI (A‑), Advanced Programming (A), Computer Architecture (A), Algorithms (B+), Deep Learning for Perception (B), Data Mining (B).
Python, C++, Go, Java, Bash
PyTorch, vLLM, TensorFlow, Hugging Face, scikit‑learn, SpaCy, OpenCV, Optuna, LangChain
CUDA, NCCL, Triton, NATS, Docker, Kubernetes, Kafka, Flink, MLflow, AWS EC2 and S3, Linux
AI systems engineer and researcher. Lead contributor to the Nesa Orchestrator with over 12 million pulls. Work focuses on hybrid sharding across heterogeneous GPUs, encrypted inference with equivariant transforms, and high‑throughput diffusion and LLM serving. In computational biology, co‑authored work on mechanistic generative capsule attention for predicting single‑cell responses, under review at Nature Methods.
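The equivariance trick behind that encrypted‑inference line, in miniature (my illustration, not Nesa's actual scheme): if a layer commutes with a permutation of its inputs, a client can permute before sending and un‑permute the result, so the server never sees the data in its original order.

```python
import torch

torch.manual_seed(0)
W = torch.randn(4, 4)

def f(x):
    # A token-wise layer, hence equivariant to permutations of the rows.
    return torch.relu(x @ W)

x = torch.randn(6, 4)
perm = torch.randperm(x.shape[0])
inv = torch.argsort(perm)

y_direct = f(x)
y_masked = f(x[perm])[inv]                 # permute, run remotely, un-permute
print(torch.allclose(y_direct, y_masked))  # True
```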