Tianang Leng

PhD Candidate @ UPenn

I am a PhD student in Bioengineering at the University of Pennsylvania, advised by Prof. Cesar de la Fuente. Previously, I received my bachelor's degree in Artificial Intelligence from Huazhong University of Science and Technology. I have been fortunate to work under the supervision of Prof. Jianyang Zeng at Westlake University, Prof. Xiaohui Xie at the University of California, Irvine, and Prof. Dongrui Wu at Huazhong University of Science and Technology.

In the era of large language models, my research objective is to build models that can interact with and truly understand the real world—both the macroscopic world of physical objects and the microscopic 3D molecular world—so as to develop a knowledgeable, reliable AI biochemical expert capable of autonomous scientific discovery. To this end, my previous work has centered on:

  • Language model reasoning: how task or problem difficulty systematically affects the required reasoning depth/trajectory of LLMs and how this in turn influences accuracy.
  • Molecular understanding and generation: the advantages of discrete diffusion–style language models for molecular representation and design, with applications to antimicrobial peptide design.
  • Medical image analysis: few-shot, high-precision segmentation of pathological organs. Going forward, I plan to shift this focus toward vision-language models for cell-level perturbation–response understanding, coupled with automated, grounded natural-language reasoning outputs to close the loop from observation to scientific hypothesis.

If you would like to discuss my research or potential collaborations, feel free to contact me. I am always open to connecting and collaborating.

Selected Publications

When Reasoning Meets Its Laws

NeurIPS 2025 Workshop on Efficient Reasoning (Oral)

Junyu Zhang, Yifan Sun, Tianang Leng, Jingyan Shen, Liu Ziyin, Paul Pu Liang, Huan Zhang

The paper introduces LORE, a framework that defines how large reasoning models should scale their thinking and accuracy with problem complexity. It builds a benchmark (LORE-BENCH) to test two key properties—monotonicity and compositionality—and finds current models are mostly monotonic but not compositional; fine-tuning to enforce the compute rule improves overall reasoning performance.
[Paper]

Predicting and generating antibiotics against future pathogens with ApexOracle

NeurIPS 2025 2nd Workshop on Multi-modal Foundation Models and Large Language Models for Life Sciences

Tianang Leng, Fangping Wan, Marcelo Der Torossian Torres, Cesar de la Fuente-Nunez

ApexOracle centers on a discrete diffusion language model (DLM) that does two jobs at once: it learns chemistry-aware features of molecules, and it can run the diffusion process backward to generate new antimicrobial candidates. Because the DLM is trained both to reconstruct SELFIES molecules and to predict a large panel of molecular descriptors, the features it learns capture molecule-level properties rather than only token-level patterns. This DLM backbone is then plugged into a multimodal architecture that also ingests pathogen genomics (Evo2) and pathogen text knowledge (Me-LLaMA) and cross-attends these with the molecule features, so the model reasons over molecule, genome, and text jointly. This multimodal fusion is what lets ApexOracle stay effective on unseen or future strains: given a new genome or description, it can both predict activity and guide the DLM to sample a molecule expected to be active against that specific pathogen.
[Paper]

AI in biomaterials discovery: generating self-assembling peptides with resource-efficient deep learning

Nature Machine Intelligence

Tianang Leng, Cesar de la Fuente-Nunez

Here we argue that resource-efficient hybrid RNN deep-learning models can generatively design self-assembling peptides with strong self-organization, underscoring AI’s expanding role in biomaterials discovery.
[Paper]

Undergrad Publication

Self-sampling meta SAM: enhancing few-shot medical image segmentation with meta-learning

WACV 2024

Tianang Leng, Yiming Zhang, Kun Han, Xiaohui Xie

We introduce three novel modules that adapt foundational vision models for efficient few-shot medical image segmentation. Experiments on popular abdominal CT and MRI datasets showed average DSC improvements of 10.21% and 1.80%, respectively.
[PDF] [Code]

Awards and Scholarships

  • Outstanding Undergraduates in Term of Academic Performance (top 1%) (2024)
  • National Scholarship (top 1.8%) (2022)
  • School merit scholarship (top 6%) (2022)
  • National Scholarship (top 1.8%) (2021)
  • School merit scholarship (top 6%) (2021)
  • Freshmen study for merit scholarships (2021)

Ancient Interesting Projects

Kaggle: Predict Student Performance from Game Play

Drawing on ideas from BERT and MAE, we developed a sequence classification model that predicts students' performance from a series of gameplay actions, earning a bronze medal (top 5.6%).
[Contest Link]

2022 World Robot Contest - BCI Controlled Robot Contest

Using Euclidean Alignment and Independent Component Analysis (ICA), we generated a substantial volume of high-quality data, mitigating the overfitting issue prevalent in EEG-based Brain-Computer Interfaces. We placed 20th among 283 global teams, most of which were composed of master's and Ph.D. students.
[Contest Link] [Code]

2022 HUST-Cambridge Foundations of Data Science - Masked Face Recognition

By incorporating a lightweight Convolutional Block Attention Module into ArcFace and optimizing the training with challenging images isolated using MTCNN, I sharpened the network's focus on distinguishing features such as the eyes and hairline, improving performance by 4%.
[PDF]

Undergraduate Innovation and Entrepreneurship Project - Intelligent football game analysis

By combining a transformer-based tracker with YOLOv5, I developed a system that mitigates player and ball occlusion, enabling automatic and precise tracking of the small soccer ball at real-time frame rates throughout a whole game.