About Me

I am currently a Postdoctoral Fellow at the Centre for Perceptual and Interactive Intelligence (CPII), Hong Kong. Before that, I spent three wonderful years working with amazing students as a Research Assistant Professor at the Southern University of Science and Technology (SUSTech). I obtained my Ph.D. in Computer Science and Engineering from the Chinese University of Hong Kong (CUHK) in 2020, under the kind and inspiring supervision of Professor James Cheng.

Research Interests

My research focuses on systems that make large-scale data processing efficient, including systems for training and serving machine learning models and systems for managing and processing database queries. My recent work covers systems for graph learning and graph processing, systems for serving machine learning models, and systems and algorithms for vector data search and management. I follow a few mottos in research:

  • Good research should be in-depth, insightful, and interesting: aim high.
  • The problem matters more than the solution: look for fundamental problems, not fancy solutions.
  • Discussions clarify thoughts and spark inspiration: ask foolish questions.

Publications

# denotes the corresponding author.

QEVIS: Multi-grained Visualization of Distributed Query Execution. Qiaomu Shen, Zhengxin You, Xiao Yan, Chaozu Zhang, Ke Xu, Dan Zeng, Jianbin Qin, Bo Tang. IEEE TVCG 2023.

gSampler: General and Efficient GPU-based Graph Sampling for Graph Learning. Ping Gong, Renjie Liu, Zunyao Mao, Zhenkun Cai, Xiao Yan#, Cheng Li#, Minjie Wang, Zhuozhao Li. ACM SOSP 2023.

Multi-domain Recommendation with Embedding Disentangling and Domain Alignment. Wentao Ning, Xiao Yan, Weiwen Liu, Reynold Cheng, Rui Zhang, Bo Tang. CIKM 2023.

Analyzing and combating attribute bias for face restoration. Zelin Li, Dan Zeng, Xiao Yan, Qiaomu Shen, Bo Tang. IJCAI 2023.

DGI: An easy and efficient framework for GNN model evaluation. Peiqi Yin, Xiao Yan#, Jinjing Zhou, Qiang Fu, Zhenkun Cai, James Cheng, Bo Tang, Minjie Wang. ACM SIGKDD 2023.

Extracting Top-k Frequent and Diversified Patterns in Knowledge Graphs. Jian Zeng, Xiao Yan, Yan Li, Mingji Han, Bo Tang. TKDE 2023.

FEC: Efficient Deep Recommendation Model Training with Flexible Embedding Communication. Kaihao Ma, Xiao Yan#, Zhenkun Cai, Yuzhen Huang, Yidi Wu, James Cheng. SIGMOD 2023.

Effective and efficient PageRank-based positioning for graph visualization. Shiqi Zhang, Renchi Yang, Xiaokui Xiao, Xiao Yan, Bo Tang. SIGMOD 2023.

Speeding Up End-to-end Query Execution via Learning-based Progressive Cardinality Estimation. Fang Wang, Xiao Yan#, Man Lung Yiu, Shuai Li, Zunyao Mao, Bo Tang#. SIGMOD 2023.

DSP: Efficient GNN training with multiple GPUs. Zhenkun Cai, Qihui Zhou, Xiao Yan#, Da Zheng, Xiang Song, Chenguang Zheng, James Cheng, George Karypis. PPoPP 2023.

GHive: accelerating analytical query processing in Apache Hive via CPU-GPU heterogeneous computing. Haotian Liu, Bo Tang#, Jiashu Zhang, Yangshen Deng, Xiao Yan#, Xinying Zheng, Qiaomu Shen, Dan Zeng, Zunyao Mao, Chaozu Zhang, Zhengxin You, Zhihao Wang, Runzhe Jiang, Fang Wang, Man Lung Yiu, Huan Li, Mingji Han, Qian Li, Zhenghai Luo. SoCC 2022.

Automatic meta-path discovery for effective graph-based recommendation. Wentao Ning, Reynold Cheng, Jiajun Shen, Nur Al Hasan Haldar, Ben Kao, Xiao Yan, Nan Huo, Wai Kit Lam, Tian Li, Bo Tang. CIKM 2022.

Manu: a cloud native vector database management system. Rentong Guo, Xiaofan Luan, Long Xiang, Xiao Yan, Xiaomeng Yi, Jigao Luo, Qianya Cheng, Weizhi Xu, Jiarui Luo, Frank Liu, Zhenshan Cao, Yanliang Qiao, Ting Wang, Bo Tang, Charles Xie. VLDB 2022.

T-LevelIndex: Towards Efficient Query Processing in Continuous Preference Space. Jiahao Zhang, Bo Tang, Man Lung Yiu, Xiao Yan, Keming Li. SIGMOD 2022.

CheetahKG: A Demonstration for Core-based Top-k Frequent Pattern Discovery on Knowledge Graphs. Bo Tang, Jian Zeng, Qiandong Tang, Chuan Yang, Qiaomu Shen, Xiao Yan, Dan Zeng. ICDE 2022.

Face2Exp: Combating data biases for facial expression recognition. Dan Zeng, Zhiyuan Lin, Xiao Yan, Yuting Liu, Fei Wang, Bo Tang. CVPR 2022.

TensorOpt: Exploring the tradeoffs in distributed DNN training with auto-parallelism. Zhenkun Cai, Xiao Yan#, Kaihao Ma, Yidi Wu, Yuzhen Huang, James Cheng, Teng Su, Fan Yu. TPDS 2021.

GAIPS: Accelerating Maximum Inner Product Search with GPU. Long Xiang, Xiao Yan#, Lan Lu, Bo Tang. SIGIR 2021.

Vertex-centric visual programming for graph neural networks. Yidi Wu, Yuntao Gui, Tatiana Jin, James Cheng, Xiao Yan, Peiqi Yin, Yufei Cai, Bo Tang, Fan Yu. SIGMOD 2021.

DGCL: an efficient communication library for distributed GNN training. Zhenkun Cai, Xiao Yan#, Yidi Wu, Kaihao Ma, James Cheng, Fan Yu. EuroSys 2021.

Towards Efficient MaxBRNN Computation for Streaming Updates. Wentao Ning, Xiao Yan, Bo Tang. ICDE 2021.

Fast core-based top-k frequent pattern discovery in knowledge graphs. Jian Zeng, Xiao Yan, Mingji Han, Bo Tang. ICDE 2021.

Timestamped state sharing for stream analytics. Yunjian Zhao, Zhi Liu, Yidi Wu, Guanxian Jiang, James Cheng, Kunlong Liu, Xiao Yan. TPDS 2021.

Elastic deep learning in multi-tenant GPU clusters. Yidi Wu, Kaihao Ma, Xiao Yan#, Zhi Liu, Zhenkun Cai, Yuzhen Huang, James Cheng, Han Yuan, Fan Yu. TPDS 2021.

Convolutional embedding for edit distance. Xinyan Dai, Xiao Yan#, Kaiwen Zhou, Yuxuan Wang, Han Yang, James Cheng. SIGIR 2020.

Norm-explicit quantization: Improving vector quantization for maximum inner product search. Xinyan Dai, Xiao Yan, Kelvin KW Ng, Jie Liu, James Cheng. AAAI 2020.

Understanding and improving proximity graph based maximum inner product search. Jie Liu, Xiao Yan, Xinyan Dai, Zhirong Li, James Cheng, Ming-Chang Yang. AAAI 2020.

Pyramid: A general framework for distributed similarity search on large-scale datasets. Shiyuan Deng, Xiao Yan, Kelvin KW Ng, Chenyu Jiang, James Cheng. Big Data 2019.

Grasper: A high performance distributed system for OLAP on property graphs. Hongzhi Chen, Changji Li, Juncheng Fang, Chenghuan Huang, James Cheng, Jian Zhang, Yifan Hou, Xiao Yan. SoCC 2019.

Tangram: bridging immutable and mutable abstractions for distributed data analytics. Yuzhen Huang, Xiao Yan, Guanxian Jiang, Tatiana Jin, James Cheng, An Xu, Zhanhao Liu, Shuo Tu. USENIX ATC 2019.

A general and efficient querying method for learning to hash. Jinfeng Li, Xiao Yan, Jian Zhang, An Xu, James Cheng, Jie Liu, Kelvin KW Ng, Ti-chung Cheng. SIGMOD 2018.

G-Miner: an efficient task-oriented graph mining system. Hongzhi Chen, Miao Liu, Yunjian Zhao, Xiao Yan, Da Yan, James Cheng. EuroSys 2018.

Norm-ranging LSH for maximum inner product search. Xiao Yan, Jinfeng Li, Xinyan Dai, Hongzhi Chen, James Cheng. NeurIPS 2018.

FlexPS: Flexible parallelism control in parameter server architecture. Yuzhen Huang, Tatiana Jin, Yidi Wu, Zhenkun Cai, Xiao Yan, Fan Yang, Jinfeng Li, Yuying Guo, James Cheng. VLDB 2018.

LoSHa: A general framework for scalable locality sensitive hashing. Jinfeng Li, James Cheng, Fan Yang, Yuzhen Huang, Yunjian Zhao, Xiao Yan, Ruihao Zhao. SIGIR 2017.

Services

Conference Reviewers

  • International Conference on Learning Representations (ICLR), PC, 2022, 2023
  • International Conference on Machine Learning (ICML), PC, 2020, 2021, 2022, 2023
  • Advances in Neural Information Processing Systems (NeurIPS), PC, 2020, 2021, 2022, 2023
  • AAAI Conference on Artificial Intelligence (AAAI), SPC, 2021, 2022
  • International Joint Conference on Artificial Intelligence (IJCAI), SPC, 2021, 2022
  • IEEE Conference on Computer Vision and Pattern Recognition (CVPR), PC, 2021, 2022

Journal Reviewers

  • IEEE Transactions on Knowledge and Data Engineering (TKDE)
  • IEEE Transactions on Neural Networks and Learning Systems (TNNLS)
  • The VLDB Journal (VLDBJ)
  • Data Mining and Knowledge Discovery (DAMI)
  • Pattern Recognition

Contact

  • Email
    yanxiaosunny@gmail.com