I am a Ph.D. Candidate in Mathematics at The University of Texas at Dallas working in graph representation learning under the supervision of Professor Baris Coskunuzer. My research develops mathematically grounded frameworks that integrate Topological Data Analysis and spectral geometry with deep learning models such as graph neural networks and graph transformers to encode multiscale structural information in graph-structured data.
A central goal of my work is to translate topological descriptors into stable and interpretable representations that improve learning under limited supervision and distribution shift, particularly in biomedical domains such as functional brain networks and molecular graphs for ligand-based virtual screening. I am experienced in CUDA-enabled GPU programming and scalable model development on high-performance computing (HPC) clusters.
- Graph Representation Learning
- Topological Data Analysis
- Spectral Geometry for Graphs
- Graph Neural Networks and Graph Transformers
- Reliable and Leakage-Free Evaluation Protocols
Ph.D. in Mathematics
M.S. in Mathematics (Data Science Specialization)
M.Sc. in Mathematics
Bachelor of Science (Majors: Mathematics, Computer Science, Physics)
Projects
A realistic multi-target ligand-based virtual screening benchmark built on the TopU95 collection derived from ChEMBL 35. My contribution focuses on designing a rigorous, domain-shift-aware benchmarking protocol that mitigates the inflated performance caused by unrealistic decoys and biased train–test splits. The unified framework reduces artificial class separation and enables fair comparison across classical ML, GNNs, and foundation models, providing a faithful assessment of robustness and early enrichment in practical molecular discovery settings.
Same-Graph Cross-Task Transfer in GNNs: Protocols and Predictors formalizes transfer between node classification (NC) and link prediction (LP) on the same graph under a leakage-free evaluation framework. The protocol fixes node and edge splits, excludes evaluated edges from message passing, and controls negative sampling to prevent artificial signal reuse. Results show that transfer is directional and predictable: NC-to-LP transfer helps on homophilic graphs, while LP-to-NC transfer can induce negative transfer under naive encoder reuse. The work also introduces the CoTask Score to measure shared-encoder utility across tasks.
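The core of the protocol above can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation: evaluated edges are removed from the message-passing edge set, and negative pairs are drawn only from node pairs absent from the full edge list.

```python
import random

def leakage_free_split(edges, test_frac=0.2, seed=0):
    """Split edges so that evaluated (test) edges never feed message passing."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_test = int(len(edges) * test_frac)
    test_edges = edges[:n_test]
    train_edges = edges[n_test:]
    # Message passing sees only training edges, never the evaluated ones.
    message_passing_edges = set(train_edges)
    return train_edges, test_edges, message_passing_edges

def sample_negatives(nodes, all_edges, k, seed=0):
    """Draw k negative pairs disjoint from every true edge (either direction)."""
    rng = random.Random(seed)
    positives = set(all_edges) | {(v, u) for u, v in all_edges}
    nodes = list(nodes)
    negatives = set()
    while len(negatives) < k:
        u, v = rng.choice(nodes), rng.choice(nodes)
        if u != v and (u, v) not in positives:
            negatives.add((u, v))
    return list(negatives)
```

Controlling the negative-sample pool against the full edge set (not just the training edges) is what prevents test positives from leaking in as training negatives.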
This work studies graph classification under extreme label scarcity using explicit structural descriptors. Betti vectors derived from persistent homology and spectral density-of-states embeddings capture multiscale topology and diffusion geometry. A prototype-based framework with STAMP conditioning improves performance in label-starved regimes and demonstrates that principled structural priors enhance stability and generalization.
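A minimal sketch of the simplest such descriptor, assuming a weight-based filtration (this is a toy illustration, not the paper's pipeline): sweep a threshold over edge weights and record the Betti-0 number, i.e. the count of connected components of the subgraph of edges with weight at most the threshold, tracked with union-find.

```python
def betti0_vector(n_nodes, weighted_edges, thresholds):
    """Betti-0 curve of a graph filtration: components as edges enter by weight."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(weighted_edges, key=lambda e: e[2])
    vec, i, components = [], 0, n_nodes
    for t in thresholds:
        # Merge all edges whose weight has crossed the current threshold.
        while i < len(edges) and edges[i][2] <= t:
            u, v, _ = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
            i += 1
        vec.append(components)
    return vec
```

Evaluated on a grid of thresholds, the resulting vector is a fixed-length, permutation-invariant summary that can be concatenated with spectral embeddings and fed to any downstream classifier.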
This project investigates how atlas choice affects downstream performance in functional brain network models. By introducing edge-based quadratic features that enhance structural expressiveness beyond node-level aggregation, the framework achieves consistent gains across multi-atlas baselines and highlights the importance of principled structural design in neuroscience-driven graph learning.
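One plausible form of the edge-based quadratic features mentioned above (an assumption for illustration, not the project's exact construction): for each edge (u, v), take the flattened outer product of the two node feature vectors, so the edge representation captures pairwise feature interactions rather than node-level sums alone.

```python
def edge_quadratic_features(X, edges):
    """Quadratic edge features: flattened outer product x_u (x) x_v per edge.

    X: list of node feature vectors (equal length); edges: (u, v) index pairs.
    """
    feats = []
    for u, v in edges:
        # Every pairwise product a_i * b_j, i.e. the outer product, flattened.
        feats.append([a * b for a in X[u] for b in X[v]])
    return feats
```

Because each entry is a product of one coordinate from each endpoint, these features are strictly more expressive than sum- or mean-based node aggregation, which cannot represent such multiplicative interactions.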
This line of work explores alternative representations and problem formulations for structure-aware graph learning. GraphMind proposes a reproducible topology-to-language interface that translates structural evidence into concise node descriptions embedded by frozen language models. DuoLink reformulates link prediction as node classification on the line graph, aligning message passing with edge neighborhoods and capturing edge motifs directly.
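The line-graph reformulation behind DuoLink can be sketched directly (a standard construction; the framework's own implementation may differ): each edge of G becomes a node of L(G), and two such nodes are adjacent when the original edges share an endpoint, so predicting a link in G becomes classifying a node in L(G).

```python
def line_graph(edges):
    """Build L(G): nodes are edges of G, adjacent when they share an endpoint."""
    edges = [tuple(sorted(e)) for e in edges]  # canonical undirected form
    lg_nodes = edges
    lg_edges = []
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            if set(edges[i]) & set(edges[j]):  # shared endpoint in G
                lg_edges.append((edges[i], edges[j]))
    return lg_nodes, lg_edges
```

Message passing on L(G) then aggregates over an edge's incident edges, which is exactly the edge-neighborhood alignment and edge-motif sensitivity described above.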
LinkedIn: surbhi-kumar-662492263
Google Scholar: Profile