AI Systems & Computational Biology

Building scalable inference systems for large models (hybrid sharding, NCCL pipelines, encrypted inference) and mechanistic models for single‑cell biology (scFATE).

Publications

Mechanistic generative capsule attention networks predict single-cell responses to novel perturbations

Ahmad Farhan, Suraj Verma, Le Minh Thao Doan, Mohammed Moustapha Anwar, Aida Rodriguez‑Jimenez, Maria Angeles Juanes, Annalisa Occhipinti, Claudio Angione

Under review at Nature Methods

Meta‑Learning for Speeding Up Large Model Inference in Decentralized Environments

Yipeng Du, Zihao Wang, Ahmad Farhan, Claudio Angione, Harry Yang, Fielding Johnston, James P. Buban, Patrick Colangelo, Yue Zhao, Yuzhe Yang

COLM 2025 (main conference)

Towards Secure and Private AI: A Framework for Decentralized Inference

Hongyang Zhang, Yue Zhao, Chao Yang, Ahmad Farhan, Fielding Johnston

NeurIPS 2024 Workshop — RBFM

Model Agnostic Hybrid Sharding for Heterogeneous Distributed Inference

Claudio Angione, Yue Zhao, Harry Yang, Ahmad Farhan, Fielding Johnston, James Buban, Patrick Colangelo

KDD 2024 Workshop — SDBD

Encrypted Large Model Inference: The Equivariant Encryption Paradigm

James Buban, Hongyang Zhang, Claudio Angione, Harry Yang, Ahmad Farhan, Seyfal Sultanov, Michael Du, Xuran Ma, Zihao Wang, Yue Zhao, Arria Owlia, Fielding Johnston, Patrick Colangelo

Preprint

Experience

Nesa Research

AI Lead and Core Systems Contributor · New York, USA · 2023–present

  • Lead contributor to Orchestrator, a hybrid‑sharding framework for decentralized model serving with more than 12 million pulls.
  • Built distributed diffusion pipelines with NCCL collectives and 1F1B stages, using activation checkpointing, weight sharding, fused attention, and frame‑chunked text‑to‑video for higher throughput and lower memory.
  • Built LLM serving with block‑wise sharding, Ulysses‑style sequence parallelism, paged KV caching, and communication–compute overlap; sustained over 20,000 tokens per second on large models.
  • Designed a routing and hot‑swap system with reputation signals and constrained reinforcement learning for automatic model placement and rollback safety.
  • Developed equivariant encryption for inference via learned orthogonal transforms for transformers and diffusion; preprint and demo available.
  • Helped raise multiple millions in seed rounds; selected for Binance Labs and BNB Chain MVB Season 7.
  • Advised by Dr. James P. Buban and Patrick Colangelo.

Upwork

Machine Learning Engineer (Independent Consulting) · 2022–2023

  • Top Rated with a 100% client success rate and more than $20,000 earned in four months.
  • Fine‑tuned LLaMA and Mistral with parameter‑efficient methods for domain adaptation; delivered agents with retrieval augmentation and planning.

Neurog.ai

Machine Learning Engineer — Lead · 2021–2022

  • Built in‑bed pose estimation using depth, infrared, and RGB; created labeling tools and a multimodal dataset from more than 70 participants.
  • Trained Stacked Hourglass and Pyramid Residual networks on long‑wave infrared and depth; reached 81% PCKh@0.5 on the internal test set.

Knowledge Platform

Research Engineer · 2019–2021

  • Built a DLRM‑based recommender serving more than 50,000 students; improved measured learning outcomes by about 8% across four releases.
  • Owned data and model pipelines from experimentation to deployment, with monitoring and feedback loops.

Selected Work

Nesa Orchestrator

Hybrid sharding on heterogeneous GPU miners for decentralized inference.
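
A minimal sketch of the core idea, capacity‑proportional assignment of contiguous layer shards across heterogeneous miners; the `Miner` class and `assign_layers` helper are illustrative placeholders, not the actual Orchestrator API:

```python
# Sketch: split a model's layers into contiguous shards sized in
# proportion to each miner's free GPU memory. Illustrative only.
from dataclasses import dataclass

@dataclass
class Miner:
    name: str
    free_mem_gb: float  # reported free GPU memory

def assign_layers(num_layers: int, miners: list[Miner]) -> dict[str, range]:
    total = sum(m.free_mem_gb for m in miners)
    shards, start = {}, 0
    for i, m in enumerate(miners):
        # The last miner takes the remainder so every layer is covered.
        count = (num_layers - start if i == len(miners) - 1
                 else round(num_layers * m.free_mem_gb / total))
        shards[m.name] = range(start, start + count)
        start += count
    return shards

if __name__ == "__main__":
    miners = [Miner("a100", 80.0), Miner("4090", 24.0), Miner("3090", 24.0)]
    print(assign_layers(32, miners))  # e.g. {'a100': range(0, 20), ...}
```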

Distributed Diffusion Pipelines

DiT and U‑Net partitioned into 1F1B pipeline stages with activation checkpointing, weight sharding, fused attention, and frame‑chunked text‑to‑video. NCCL collectives for scale.
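
For illustration, a toy enumeration of the 1F1B (one‑forward‑one‑backward) schedule for a single pipeline stage: a warm‑up of forwards, a steady state alternating forward and backward microbatches, then a cool‑down draining backwards. Real pipelines overlap these steps with NCCL send/recv between stages; the function below is a simplified sketch, not the production scheduler:

```python
# Sketch: the order of forward (F) and backward (B) microbatch steps
# executed by one stage under a 1F1B pipeline schedule.
def one_f_one_b(stage: int, num_stages: int, num_microbatches: int):
    warmup = min(num_stages - stage - 1, num_microbatches)
    steps, fwd, bwd = [], 0, 0
    for _ in range(warmup):                 # warm-up: forwards only
        steps.append(("F", fwd)); fwd += 1
    while fwd < num_microbatches:           # steady state: alternate F and B
        steps.append(("F", fwd)); fwd += 1
        steps.append(("B", bwd)); bwd += 1
    while bwd < num_microbatches:           # cool-down: drain backwards
        steps.append(("B", bwd)); bwd += 1
    return steps

print(one_f_one_b(stage=0, num_stages=4, num_microbatches=8))
# [('F', 0), ('F', 1), ('F', 2), ('F', 3), ('B', 0), ('F', 4), ('B', 1), ...]
```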

LLM Sharding & Serving

Block‑wise sharding across miners with Ulysses‑style sequence parallelism, paged KV caching, and speculative decoding; over 20,000 tokens per second aggregate on large models.
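
As a sketch of the paged KV caching idea: keys and values live in fixed‑size blocks allocated on demand, so cache memory tracks actual sequence length instead of a padded maximum. The block size and free‑list allocator below are illustrative, not vLLM's internals:

```python
# Sketch: a paged KV cache with a simple free-list block allocator.
import torch

BLOCK = 16  # tokens per block (illustrative)

class PagedKVCache:
    def __init__(self, num_blocks: int, heads: int, dim: int):
        # kv[block, 0] holds keys, kv[block, 1] holds values.
        self.kv = torch.zeros(num_blocks, 2, BLOCK, heads, dim)
        self.free = list(range(num_blocks))
        self.tables: dict[int, list[int]] = {}  # sequence id -> block ids

    def append(self, seq: int, k: torch.Tensor, v: torch.Tensor, pos: int):
        table = self.tables.setdefault(seq, [])
        if pos % BLOCK == 0:            # current block is full: grab a new one
            table.append(self.free.pop())
        blk = table[pos // BLOCK]
        self.kv[blk, 0, pos % BLOCK] = k
        self.kv[blk, 1, pos % BLOCK] = v
```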

Equivariant Encryption

End‑to‑end inference encryption via learned orthogonal transforms that preserve model symmetries for transformers and diffusion.
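
The identity behind this can be shown in a few lines: for an orthogonal key Q, re‑keying a linear layer as Q·W·Qᵀ lets the server compute on rotated activations and return a rotated output that only the key holder can undo. The snippet below is a toy demonstration of that equivariance, not the paper's full construction (which also has to handle nonlinearities):

```python
# Toy demo: orthogonal re-keying of a linear layer preserves the
# computation while the server never sees the plaintext activation.
import torch

d = 64
x = torch.randn(d)                           # private activation
W = torch.randn(d, d)                        # server-side weight
Q, _ = torch.linalg.qr(torch.randn(d, d))    # random orthogonal "key"

x_enc = Q @ x                                # client rotates its input
W_enc = Q @ W @ Q.T                          # server holds re-keyed weights
y_enc = W_enc @ x_enc                        # equals Q @ (W @ x): still rotated
assert torch.allclose(Q.T @ y_enc, W @ x, atol=1e-3)  # client decrypts
```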

Selected Projects

Regulations QA with RAG

Indexed 150,000 regulatory documents with embeddings and vector search; produced cited answers from a large language model.
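
A minimal sketch of the retrieve‑then‑cite flow, assuming a generic sentence‑embedding model; the model name, toy corpus, and prompt format are placeholders, not the project's actual stack:

```python
# Sketch: embed a corpus, retrieve by cosine similarity, and build a
# prompt that asks the LLM for bracket-cited answers.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder model
docs = ["Reg 42: Records must be retained for five years.",
        "Reg 7: Consent is required before processing."]
index = model.encode(docs, normalize_embeddings=True)  # (n_docs, dim)

def retrieve(query: str, k: int = 1) -> list[tuple[int, str]]:
    """Return the top-k (doc id, text) pairs by cosine similarity."""
    q = model.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(-(index @ q))[:k]
    return [(int(i), docs[i]) for i in top]

hits = retrieve("How long must records be kept?")
context = "\n".join(f"[{i}] {text}" for i, text in hits)
prompt = (f"Answer using only the sources below and cite them like [0].\n"
          f"{context}\nQ: How long must records be kept?")
# `prompt` is then sent to the LLM, which answers with bracketed citations.
```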

SQuAD 2.0 without PLMs

Implemented QANet with causal and synthesized attention; reached 70 F1 and 70 exact match.

Chest X‑ray Diagnosis

NASNet, VGG, and ResNet ensemble with heatmaps for interpretation; 85% accuracy, served through a simple web tool.

Education

Teesside University — MSc Artificial Intelligence

Distinction (4.0/4.0), 2023–2025 · Middlesbrough, UK

Thesis: Contrastive framework for spatial transcriptomics with structure‑aware transformers and graph corruption to reduce over‑smoothing.
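
As a sketch of the contrastive objective, an InfoNCE‑style loss between embeddings of an original view and a corrupted view, where the matching spot is the positive; the encoder and the additive‑noise stand‑in for graph corruption are placeholders, not the thesis model:

```python
# Sketch: InfoNCE-style contrastive loss between two views of the
# same spots; matching rows are positives, all others negatives.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.1) -> torch.Tensor:
    """z1, z2: (n_spots, dim) embeddings of two views of the same spots."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau               # pairwise cosine similarities
    labels = torch.arange(z1.size(0))      # the matching spot is the positive
    return F.cross_entropy(logits, labels)

z = torch.randn(128, 64)                   # embeddings of the clean graph
z_corr = z + 0.1 * torch.randn_like(z)     # stand-in for the corrupted view
print(info_nce(z, z_corr).item())
```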

FAST — BS Computer Science

GPA 3.01/4.0 (final two years 3.51/4.0), 2015–2019 · Islamabad, Pakistan

Selected coursework with grades: NLP (A), Parallel & Distributed Systems (A), AI (A‑), Advanced Programming (A), Computer Architecture (A), Algorithms (B+), Deep Learning for Perception (B), Data Mining (B).

Technical Skills

Languages

Python, C++, Go, Java, Bash

Machine Learning

PyTorch, vLLM, TensorFlow, Hugging Face, scikit‑learn, SpaCy, OpenCV, Optuna, LangChain

Systems

CUDA, NCCL, Triton, NATS, Docker, Kubernetes, Kafka, Flink, MLflow, AWS EC2 and S3, Linux

About

AI systems engineer and researcher. Lead contributor to the Nesa Orchestrator with over 12 million pulls. Work focuses on hybrid sharding across heterogeneous GPUs, encrypted inference with equivariant transforms, and high‑throughput diffusion and LLM serving. In computational biology, co‑authored work on mechanistic generative capsule attention for predicting single‑cell responses, under review at Nature Methods.

Contact

Email: afarhan.nu@gmail.com

GitHub: iafarhan · LinkedIn: imafarhan

Last updated 2025-09-25.