Jinwei Yao
My English name is Kivi.
Master of Science in Computer Science • Siebel Center for Computer Science • UIUC
201 N Goodwin Ave, Urbana, IL 61801
Background
I am currently in my second year at UIUC, pursuing a research-based Master of Science in Computer Science (MSCS). My advisor is Prof. Jiaxuan You. I have also gained invaluable insights into modeling from Prof. Ge Liu (UIUC) and into systems from Prof. Fan Lai (UIUC), which together inspired my current research interest in system–modeling co-design.
Before my journey at UIUC, I spent one year at EPFL as a fellowship PhD student in distributed systems, where I laid my academic foundations. Leaving peaceful and beautiful Switzerland was a hard decision: after a year of thinking and discussion with my career mentor Prof. Katerina Argyraki, I followed my heart to explore ML systems research. At Zhejiang University, I obtained my Bachelor's degree in Electronic Science and Technology with the Outstanding Thesis Award for designing an FPGA subsystem for GNN acceleration, which was the start of my MLSys research. Along this ML systems research journey, I was lucky to work with wonderful advisors: Prof. Zeke Wang (Zhejiang University), Prof. Tao Lin (Westlake University), and Prof. Binhang Yuan (HKUST).
At UIUC, my research focuses on system-algorithm-modeling co-design for large models. Systems are my starting point, but I work on algorithms and modeling as well. You can find my research interests below.
Research Interests
Research Goal: To advance modeling–algorithm–system co-design for large-scale machine learning systems by bridging generative model design, system implementation, and hardware constraints.
Research Interests: Machine learning systems (MLSys), with a focus on the interaction between sequential and parallel generation in language modeling and inference systems.
Core Research Question: How can we systematically trade off the efficiency and effectiveness of large models through joint modeling, algorithmic, and system-level design?
More concretely, I categorize my research interests into three interdependent dimensions:
- System: Efficiency & Robustness in LLM Infrastructure
- LLM Inference Efficiency: How to provide cheap and fast LLM inference services?
- LLM Training Efficiency: How to train LLMs with limited resources while ensuring robustness?
- ML-System SLO Trade-off: How to balance ML performance metrics (e.g., accuracy, perplexity) with system metrics (e.g., latency, throughput)?
- Modeling: Beyond Auto-Regressive Patterns
- How can we rethink or extend generative modeling paradigms beyond auto-regressive (AR) models?
- How to design architectures that are more expressive and efficient than AR models?
- How to unify multimodal inputs (e.g., text, vision, code) into a shared and coherent representation space?
- Algorithm: Hardware-Aware Algorithm Design
- How to co-design algorithms with low-level primitives to maximize hardware utilization?
Research Philosophy
I. The Principle of System and Algorithm Co-design
- [Algorithm → System] Pure systems researchers learn of promising ML algorithms too late. Effective system design requires anticipating, not reacting to, emerging ML algorithms.
- [System → Algorithm] More is Different. Algorithms must be evaluated by their behavior at scale under real system constraints.
- [Quality-Efficiency Tradeoffs] Quality takes priority over efficiency at the beginning, but efficiency determines the end.
- [Hardware Lottery] Eventually, algorithms don't survive just by being smart, but by being efficient on current hardware.
II. My definition of Good Research in Machine Learning Systems
- [Two Ends] Both ends of the spectrum are fine: fast delivery on practical projects (e.g., systems), or slow, principled science (e.g., theory).
- [Open-source] Great open-source (like SGLang, FlashInfer, etc.) is impactful.
- [Identify the True Bottleneck] Don't optimize for the sake of optimization; solve the bottleneck that the next generation of models will scream for.
Open Source Contributions
- sglang — Leading the SGLang diffusion LLM team. Contributor and learner in this wonderful community.
- Initiated block diffusion serving with a flexible decoding algorithm interface (blog post); see the conceptual sketch after this list.
- lm-evaluation-harness — led the integration of SGLang as a backend in lm-eval-harness; a usage sketch follows below.
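To give a flavor of what block diffusion decoding does, here is a minimal conceptual sketch in Python. Everything in it (`toy_denoiser`, `block_diffusion_decode`, the placeholder tokens) is a hypothetical illustration of the general technique, not SGLang's actual interface: blocks are produced sequentially as in an AR model, while positions inside each block are denoised in parallel over a few refinement steps.

```python
# Conceptual sketch of block diffusion decoding (hypothetical names, not
# SGLang's real interface): sequential across blocks, parallel within a block.
import random

MASK = "<mask>"

def toy_denoiser(prefix, block, step, num_steps):
    """Stand-in for a learned denoiser: each step commits a few more masked
    positions to concrete tokens (random placeholders here)."""
    masked = [i for i, t in enumerate(block) if t == MASK]
    remaining_steps = num_steps - step
    k = max(1, len(masked) // remaining_steps)  # unmask a fraction per step
    for i in random.sample(masked, min(k, len(masked))):
        block[i] = f"tok{len(prefix) + i}"
    return block

def block_diffusion_decode(prompt, num_blocks=3, block_size=4, num_steps=4):
    out = list(prompt)
    for _ in range(num_blocks):            # sequential across blocks (AR-like)
        block = [MASK] * block_size
        for step in range(num_steps):      # parallel refinement within a block
            block = toy_denoiser(out, block, step, num_steps)
        out.extend(block)                  # commit block; next block conditions on it
    return out

print(" ".join(block_diffusion_decode(["<bos>"])))
```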
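And here is a rough usage sketch for the lm-eval-harness integration, using lm-eval's Python entry point `simple_evaluate`. The backend name `"sglang"` and the exact `model_args` keys are assumptions on my part (mirroring how other backends such as vLLM are invoked); check the upstream documentation for the definitive arguments.

```python
# Hypothetical usage sketch of lm-evaluation-harness with an SGLang backend.
# The backend name "sglang" and the model_args keys are assumptions; consult
# the lm-eval docs for the exact registered names and options.
import lm_eval

results = lm_eval.simple_evaluate(
    model="sglang",                                   # assumed backend name
    model_args="pretrained=meta-llama/Llama-3.1-8B-Instruct",
    tasks=["gsm8k"],
)
print(results["results"])
```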
Miscellaneous
I actively share paper readings on my other GitHub blog and on Zhihu (知乎). I like 🏀, 💪, 📚, 🐱, and 🎬.
News
🎉 I was honored to receive a nomination for the Siebel School Outstanding Teaching Assistant Award for my teaching service in the Spring 2025 semester.
🎉 ResearchTown (LLM agents for automatic research) was accepted to ICML’25. Code is released here.
🎉 DeFT (a tree-attention algorithm for efficient LLM inference, including reasoning) was accepted to ICLR’25 as a Spotlight (top 5%)! Code is released here.
😄 Enrolled in the MSCS program and began a new semester at UIUC.
🎉 DeFT was accepted to the ICLR’24 AGI Workshop as an oral presentation!
💻 After a year of consideration, and at the suggestion of my career mentor, I decided to leave EPFL for ML systems research, as no professors at EPFL were working on ML systems.
💻 Enrolled at EPFL as a PhD student with a fellowship from the CS department.
💪 Graduated from Zhejiang University and received a B.Eng. in Electronic Science and Technology (with the Outstanding Graduation Award and Outstanding Thesis Award).