Cao, Zhenxiao
- Location: Xi’an, Shaanxi, P.R. China
- Email: realalanc@qq.com / realalanc029@gmail.com / alancao@stu.xjtu.edu.cn
News about me!
- I am going to start my PhD at HKUST this fall!
- “MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction” has been accepted to ICLR 2025!
Education
Bachelor of Science in Computer Science and Technology
Xi’an Jiaotong University, 2025
Studying Computer Science at XJTU, with a focus on AI and bioinformatics
PhD in Computer Science and Engineering
The Hong Kong University of Science and Technology, expected to graduate in 2029 (I HOPE SO)
Under the guidance of Hao Chen and Bonnie Danqing Zhu
Publications
Conference Papers
- “MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction” has been accepted to ICLR 2025
Preprints
- “AlphaFold Database Debiasing for Robust Inverse Folding” is under review; a preprint is available on arXiv
Research Interests
AI for Biology
- LLMs for biological molecules
- Prediction of the features and structures of biological molecules
- Explainable AI in biology
- AIGC in biology (e.g., protein design, small-molecule design)
- Biological image analysis and CV for biology
Machine Learning
- DL network structure design
- Explainable AI
- Causal inference
Feel free to contact me to discuss any of these research interests
Research Experience
Internship at SenseTime
Year: 2023
Status: Finished
- Developed a smart sensor for monitoring Alzheimer's patients using a CNN on an embedded device, including quantization of a YOLOv5-based inference network. Tasks included face identification, landmark detection, and signal processing for heart rate.
First-author-level works
- MeToken: Uniform Micro-environment Token Boosts Post-Translational Modification Prediction
With Cheng Tan and Stan Z. Li
Year: 2024
Status: Accepted to ICLR 2025
- Developed a deep learning framework for PTM prediction, aiming to achieve state-of-the-art results in multi-class prediction using sequence and structural data.
- Our model is based on a VQ-VAE and constructs graphs that capture both sequence neighbors and structural neighbors. The work is divided into two stages: pre-training the VQ-VAE, then extracting intermediate embeddings for downstream tasks.
- AlphaFold Database Debiasing for Robust Inverse Folding
With Cheng Tan and Stan Z. Li
Year: 2024
Status: Under Review
- Predicted protein data is often biased relative to the ground truth. Our task is to build an algorithm that refines and enhances the dataset pair by pair.
- How Effective is In-Context Learning with Large Language Models for Rare Cell Identification in Single-Cell Expression Data?
Year: 2024
Status: Under Review
- Used an LLM as an extractor to identify rare cells from data, an approach that can be applied across different biological domains. We found that cross-query-based in-context learning provides stable performance, independent of data size, and achieves SOTA on some datasets.
- A Chain-of-thought Reasoning Breast Ultrasound Dataset Covering All Histopathology Categories
Year: 2024
Status: Under Review
- Developing a medical image dataset that includes different types of medical images across different types of breast cancer. With a focus on rare cases, the dataset aims to provide a competitive resource and benchmarks for future downstream tasks.
- Using chain-of-thought (CoT) reasoning to enhance the explainability and trustworthiness of AI diagnosis while also reaching SOTA performance.
Other works
- Monomer inference in the fruit fly centromere
Year: 2023
Status: Under Review
- Analyzed the structure of the centromere in Drosophila melanogaster, focusing on monomer analysis from long-read gap-free sequence data.
Awards
- China National Biology Olympiad (CNBO) Silver
Year: 2020
- Awarded a Silver Medal in CNBO for proficiency in biology.
- International Genetically Engineered Machine (iGEM) Competition Gold
Year: 2022-2023
- Awarded Gold as a member of the iGEM team for innovative work on cellular automata modeling.
- The project focused on building a light-induced autolysis system in engineered bacteria to release genocides,
while the cell lysis cycle controls the amount of genocides.
Other Contributions
- Building py-Cicero as a free open-source developer
- Developing a Python version of Cicero, a software package for calculating single-cell chromatin co-accessibility.
- PRs and bug fixes in PDBminer
- Contributed bug fixes and enhancements to PDBminer, an open-source software for retrieving structural data via UniProt IDs.
Professional Skills
Language
- Chinese (native)
- English (TOEFL 103)
Programming
- Familiar with basic Linux commands
- Proficient in Python; competent in R, C, C++, and Java
Biology
- Proficient in biochemistry, molecular biology, cellular biology, genetics, and other related fields
- Experienced in wet lab techniques such as PCR and electrophoresis.
Other Interesting Things About Me
Model UN
- I participated in Model UN throughout middle school and high school and won some prizes.
Language Learning
- I am a fan of language learning: I am currently learning Russian and plan to learn Japanese next.
Leadership Training
- I once attended a leadership training held by ITCILO, where I learned some skills and facts about life planning.
- I am interested in and willing to join or develop an AI4Bio community.