About Me

Hi, my name is Yin Lin. I am a senior algorithm engineer at Tongyi Lab, Alibaba Group, where I explore AI-driven solutions for enhancing data analytics and management. My current research focuses on the intersection of AI agents and data management, developing systems that leverage intelligent agents to better process, analyze, and reason over data.

Before joining Alibaba, I earned my Ph.D. from the University of Michigan, Ann Arbor, where I was advised by Dr. H. V. Jagadish. My thesis research focused on data equity systems.

If you are looking for a research intern position or are interested in collaborating on AI agents, data management, or related areas, feel free to reach out!

What's New
  • SIGMOD 2026: Our demo paper, AmbiSQL: Interactive Ambiguity Detection and Resolution for Text-to-SQL, has been accepted!
  • New: Semantic operators have been integrated into Data-Juicer for large-scale training data processing.
  • New: DojoZero, an agent arena for real-time data stream prediction in sports events. Watch AI agents compete in real time at dojozero.live.

Education & Experience

Education

Sept. 2019 – Dec. 2024

University of Michigan

Ph.D., Computer Science and Engineering

Sept. 2015 – June 2019

Shanghai Jiao Tong University

B.S., Computer Science

Experience

May 2023 – Aug. 2023

Alibaba Group

Research Intern, Data Analytics and Intelligence Lab (DAIL), Damo Academy

June 2022 – Aug. 2022

Microsoft Research

Research Intern, Data Management, Exploration and Mining (DMX)

May 2018 – Jul. 2018

University of Waterloo

Summer Intern, Software Architecture Group

Research Interests

I am currently exploring how AI agents can better manage, reason over, and interact with data. Here are some of my ongoing projects:

Ambiguity Resolution in Analytical Queries

SIGMOD 2026 Demo

Natural language queries to databases are often ambiguous. AmbiSQL is an interactive system that detects and resolves ambiguities in text-to-SQL translation, enabling users to clarify their intent and receive accurate query results.

Semantic Operators for Data Processing

Semantic operators bring AI-powered data transformations to large-scale data processing pipelines. These operators are now part of Data-Juicer, enabling intelligent filtering, enrichment, and transformation of training data for foundation models.

DojoZero: Agent Arena

GitHub Open-Source Project

DojoZero is an agent arena where AI agents react to real-time data streams to make predictions for sports events. It serves as a testbed for evaluating agents' capabilities in dynamic, time-sensitive decision-making scenarios.

AI-Driven Data Analytics

I am exploring novel agent applications for data analytics, including the use of LLMs for feature engineering and data engineering tasks.

Publications

1

AmbiSQL: Interactive Ambiguity Detection and Resolution for Text-to-SQL [pdf]

Zhongjun Ding, Yin Lin*, Tianjing Zeng, Rong Zhu, Bolin Ding, Jingren Zhou (* corresponding author)

SIGMOD 2026 (Demo)

2

Large Language Models as Pretrained Data Engineers: Techniques and Opportunities [pdf]

Yin Lin, Bolin Ding, Jingren Zhou

IEEE Data Engineering Bulletin 2025

3

Efficient Row-Level Lineage Leveraging Predicate Pushdown [pdf]

Yin Lin, Cong Yan

CoRR, 2024, arXiv:2412.16864

4

Mitigating Subgroup Unfairness in Machine Learning Classifiers: A Data-Driven Approach [pdf]

Yin Lin, Samika Gupta, H. V. Jagadish

ICDE 2024

5

SMARTFEAT: Efficient Feature Construction through Feature-Level Foundation Model Interactions [pdf]

Yin Lin, Bolin Ding, H. V. Jagadish, Jingren Zhou

CIDR 2024

6

Predicate Pushdown for Data Science Pipelines [pdf]

Cong Yan, Yin Lin, Yeye He

SIGMOD 2023 (Best Paper Award)

7

Representation Bias in Data: A Survey on Identification and Resolution Techniques [pdf]

Nima Shahbazi, Yin Lin, Abolfazl Asudeh, H. V. Jagadish

ACM Computing Surveys

Highly cited survey in the field of AI fairness

8

OREO: Detection of Cherry-picked Generalizations [pdf]

Yin Lin, Brit Youngmann, Yuval Moskovitch, H. V. Jagadish, Tova Milo

VLDB 2022 (Demo)

9

On Detecting Cherry-picked Generalizations [pdf]

Yin Lin, Brit Youngmann, Yuval Moskovitch, H. V. Jagadish, Tova Milo

VLDB 2022

10

Identifying Insufficient Data Coverage in Databases with Multiple Relations [pdf]

Yin Lin, Yifan Guan, Abolfazl Asudeh, H. V. Jagadish

VLDB 2020

11

MithraDetective: A System for Cherry-picked Trendlines Detection [pdf]

Yoko Nagafuchi, Yin Lin, Kaushal Mamgain, Abolfazl Asudeh, H. V. Jagadish, You (Will) Wu, Cong Yu

CoRR, 2020, arXiv:2010.08807

12

On Structural vs. Proximity-based Temporal Node Embeddings [pdf]

Puja Trivedi, Alican Büyükçakır, Yin Lin, Yinlong Qian, Di Jin, Danai Koutra

MLG@KDD 2020

13

R2-Tree: An Efficient Indexing Scheme for Server-Centric Data Center Networks [pdf]

Yin Lin, Xinyi Chen, Xiaofeng, Guihai Chen

DEXA 2018

Scholarships and Awards

Best Paper Award at SIGMOD 2023
NSF Travel Award for ICDE 2024
Rackham Dean's and Named PhD Fellowship, University of Michigan
Outstanding Undergraduate, Shanghai Jiao Tong University
Chun Tsung Scholar, Shanghai Jiao Tong University

Professional Service

Program Committees / Reviewers: NeurIPS, TKDE, CIKM, IEEE BigData, AIBSD (AAAI Workshop on AI with Biased or Scarce Data), ReLM (AAAI Workshop on Responsible Language Models)