Jinwei Yao
My English name is Kivi.
Master of Science in Computer Science • Siebel Center for Computer Science • UIUC
201 N Goodwin Ave, Urbana, IL 61801
Background
I am currently in my second year at UIUC, pursuing a research-based Master of Science in Computer Science (MSCS). My advisor is Prof. Jiaxuan You. I have also gained invaluable insights into modeling from Prof. Ge Liu (UIUC) and into systems from Prof. Fan Lai (UIUC), which together inspired my current research interest in system–modeling co-design.
Before my journey at UIUC, I spent one year at EPFL as a fellowship PhD student in distributed systems, where I laid my academic foundations. Leaving peaceful and beautiful Switzerland was a hard decision: after a year of thinking and discussion with my career mentor Prof. Katerina Argyraki, I followed my heart to explore ML systems research. At Zhejiang University, I obtained my Bachelor's degree in Electronic Science and Technology with the Outstanding Thesis Award for designing an FPGA subsystem for GNN acceleration, which was the start of my MLSys research. Along this ML systems research journey, I was lucky to work with wonderful advisors: Prof. Zeke Wang (Zhejiang University), Prof. Tao Lin (Westlake University), and Prof. Binhang Yuan (HKUST).
At UIUC, my research focuses on system-algorithm-modeling co-design for large models. Systems are my starting point, but I work on algorithms and modeling as well. You can find my research interests below.
Research Interests
Research Goal: To advance modeling–algorithm–system co-design for large-scale machine learning systems by bridging generative model design, system implementation, and hardware constraints.
Research Interests: Machine learning systems (MLSys), with a focus on the interaction between sequential and parallel generation in language modeling and inference systems.
Core Research Question: How can we systematically trade off the efficiency and effectiveness of large models through joint modeling, algorithmic, and system-level design?
More concretely, I categorize my research interests into three interdependent dimensions:
- System: Efficiency & Robustness in LLM Infrastructure
- LLM Inference Efficiency: How to provide cheap and fast LLM inference services?
- LLM Training Efficiency: How to train LLMs with limited resources while ensuring robustness?
- ML-System SLO Trade-off: How to balance ML performance metrics (e.g., accuracy, perplexity) with system metrics (e.g., latency, throughput)?
- Modeling: Beyond Auto-Regressive Patterns
- How can we rethink or extend generative modeling paradigms beyond auto-regressive (AR) models?
- How to design architectures that are more expressive and efficient than AR models?
- How to unify multimodal inputs (e.g., text, vision, code) into a shared and coherent representation space?
- Algorithm: Hardware-Aware Algorithm Design
- How to co-design algorithms with low-level primitives to maximize hardware utilization?
Research Philosophy
I. The Principle of System and Algorithm Co-design
- [Algorithm → System] Pure systems researchers learn of promising ML algorithms too late. Effective system design requires anticipating, not reacting to, emerging ML algorithms.
- [System → Algorithm] More is Different. Algorithms must be evaluated by their behavior at scale under real system constraints.
- [Quality-Efficiency Tradeoffs] Quality takes priority over efficiency at the beginning, but efficiency determines the end.
- [Hardware Lottery] Eventually, algorithms don't survive just by being smart, but by being efficient on current hardware.
II. My definition of Good Research in Machine Learning Systems
- [Two Ends] Both ends of the spectrum are fine: fast delivery on practical projects (e.g., systems), or slow, principled science (e.g., theory).
- [Open-source] Great open-source (like SGLang, FlashInfer, etc.) is impactful.
- [Identify the True Bottleneck] Don't optimize for the sake of optimization; solve the bottleneck that the next generation of models will scream for.
Open Source Contributions
- sglang — Leading the SGLang diffusion LLM team. Contributor and learner in this wonderful community.
- Initiated block diffusion serving with a flexible decoding algorithm interface (blog post); see the conceptual sketch after this list.
- lm-evaluation-harness — led the integration of SGLang as a backend in lm-eval-harness; a usage sketch follows below.
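To give a flavor of what block diffusion decoding does, here is a minimal conceptual sketch in Python. Everything in it (`toy_denoiser`, `block_diffusion_decode`, the placeholder tokens) is a hypothetical illustration of the general technique, not SGLang's actual interface: blocks are produced sequentially as in an AR model, while positions inside each block are denoised in parallel over a few refinement steps.

```python
# Conceptual sketch of block diffusion decoding (hypothetical names, not
# SGLang's real interface): sequential across blocks, parallel within a block.
import random

MASK = "<mask>"

def toy_denoiser(prefix, block, step, num_steps):
    """Stand-in for a learned denoiser: each step commits a few more masked
    positions to concrete tokens (random placeholders here)."""
    masked = [i for i, t in enumerate(block) if t == MASK]
    remaining_steps = num_steps - step
    k = max(1, len(masked) // remaining_steps)  # unmask a fraction per step
    for i in random.sample(masked, min(k, len(masked))):
        block[i] = f"tok{len(prefix) + i}"
    return block

def block_diffusion_decode(prompt, num_blocks=3, block_size=4, num_steps=4):
    out = list(prompt)
    for _ in range(num_blocks):            # sequential across blocks (AR-like)
        block = [MASK] * block_size
        for step in range(num_steps):      # parallel refinement within a block
            block = toy_denoiser(out, block, step, num_steps)
        out.extend(block)                  # commit block; next block conditions on it
    return out

print(" ".join(block_diffusion_decode(["<bos>"])))
```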
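And here is a rough usage sketch for the lm-eval-harness integration, using lm-eval's Python entry point `simple_evaluate`. The backend name `"sglang"` and the exact `model_args` keys are assumptions on my part (mirroring how other backends such as vLLM are invoked); check the upstream documentation for the definitive arguments.

```python
# Hypothetical usage sketch of lm-evaluation-harness with an SGLang backend.
# The backend name "sglang" and the model_args keys are assumptions; consult
# the lm-eval docs for the exact registered names and options.
import lm_eval

results = lm_eval.simple_evaluate(
    model="sglang",                                   # assumed backend name
    model_args="pretrained=meta-llama/Llama-3.1-8B-Instruct",
    tasks=["gsm8k"],
)
print(results["results"])
```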
Miscellaneous
I actively share paper readings on my other GitHub blog and on Zhihu (知乎). I like 🏀, 💪, 📚, 🐱, and 🎬.
News
🎉 I was honored to receive a nomination for the Siebel School Outstanding Teaching Assistant Award for my teaching service in the Spring 2025 semester.
🎉 ResearchTown (LLM agents for automatic research) was accepted to ICML’25. Code is released here.
🎉 DeFT (a tree-attention algorithm for efficient LLM inference, including reasoning) was accepted to ICLR’25 as a Spotlight (top 5%)! Code is released here.
😄 Enrolled in the MSCS program and began a new semester at UIUC.
🎉 DeFT was accepted to the ICLR’24 AGI Workshop as an oral presentation!
💻 After a year of consideration, and at the suggestion of my career mentor, I decided to leave EPFL for ML systems research, as no professors at EPFL were working on ML systems.
💻 Enrolled at EPFL as a PhD student with a fellowship from the CS department.
💪 Graduated from Zhejiang University and received a B.Eng. in Electronic Science and Technology (with the Outstanding Graduation Award and Outstanding Thesis Award).