I am a Ph.D. Candidate in Mathematics at The University of Texas at Dallas working in graph representation learning under the supervision of Professor Baris Coskunuzer. My research develops mathematically grounded frameworks that integrate Topological Data Analysis and spectral geometry with deep learning models such as graph neural networks and graph transformers to encode multiscale structural information in graph-structured data.
A central goal of my work is to translate topological descriptors into stable and interpretable representations that improve learning under limited supervision and distribution shift, particularly in biomedical domains such as functional brain networks and molecular graphs for ligand-based virtual screening. I am experienced in CUDA-enabled GPU programming and scalable model development on high-performance computing (HPC) clusters.
- Graph Representation Learning
- Topological Data Analysis
- Spectral Geometry for Graphs
- Graph Neural Networks and Graph Transformers
- Reliable and Leakage-Free Evaluation Protocols
Ph.D. in Mathematics
M.S. in Mathematics (Data Science Specialization)
M.Sc. in Mathematics
Bachelor of Science (Majors: Mathematics, Computer Science, Physics)
Projects
A realistic multi-target ligand-based virtual screening benchmark built on the TopU95 collection derived from ChEMBL 35. My contribution focuses on designing a rigorous, domain-shift-aware benchmarking protocol that mitigates the inflated performance caused by unrealistic decoys and biased train–test splits. The unified framework reduces artificial class separation and enables fair comparison across classical ML, GNNs, and foundation models, providing a faithful assessment of robustness and early enrichment in practical molecular discovery settings.
Same-Graph Cross-Task Transfer in GNNs: Protocols and Predictors formalizes transfer between node classification (NC) and link prediction (LP) on the same graph under a leakage-free evaluation framework. The protocol fixes node and edge splits, excludes evaluated edges from message passing, and controls negative sampling to prevent artificial signal reuse. Results show that transfer is directional and predictable: NC-to-LP transfer helps on homophilic graphs, while LP-to-NC transfer can induce negative transfer under naive encoder reuse. The work also introduces the CoTask Score to measure shared-encoder utility across tasks.
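The core of the protocol above can be sketched in a few lines. This is an illustrative simplification, not the paper's implementation: evaluated edges are removed from the message-passing edge set, and negative pairs are drawn only from node pairs absent from the full edge list.

```python
import random

def leakage_free_split(edges, test_frac=0.2, seed=0):
    """Split edges so that evaluated (test) edges never feed message passing."""
    rng = random.Random(seed)
    edges = list(edges)
    rng.shuffle(edges)
    n_test = int(len(edges) * test_frac)
    test_edges = edges[:n_test]
    train_edges = edges[n_test:]
    # Message passing sees only training edges, never the evaluated ones.
    message_passing_edges = set(train_edges)
    return train_edges, test_edges, message_passing_edges

def sample_negatives(nodes, all_edges, k, seed=0):
    """Draw k negative pairs disjoint from every true edge (either direction)."""
    rng = random.Random(seed)
    positives = set(all_edges) | {(v, u) for u, v in all_edges}
    nodes = list(nodes)
    negatives = set()
    while len(negatives) < k:
        u, v = rng.choice(nodes), rng.choice(nodes)
        if u != v and (u, v) not in positives:
            negatives.add((u, v))
    return list(negatives)
```

Controlling the negative-sample pool against the full edge set (not just the training edges) is what prevents test positives from leaking in as training negatives.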
This work studies graph classification under extreme label scarcity using explicit structural descriptors. Betti vectors derived from persistent homology and spectral density-of-states embeddings capture multiscale topology and diffusion geometry. A prototype-based framework with STAMP conditioning improves performance in label-starved regimes and demonstrates that principled structural priors enhance stability and generalization.
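A minimal sketch of the simplest such descriptor, assuming a weight-based filtration (this is a toy illustration, not the paper's pipeline): sweep a threshold over edge weights and record the Betti-0 number, i.e. the count of connected components of the subgraph of edges with weight at most the threshold, tracked with union-find.

```python
def betti0_vector(n_nodes, weighted_edges, thresholds):
    """Betti-0 curve of a graph filtration: components as edges enter by weight."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted(weighted_edges, key=lambda e: e[2])
    vec, i, components = [], 0, n_nodes
    for t in thresholds:
        # Merge all edges whose weight has crossed the current threshold.
        while i < len(edges) and edges[i][2] <= t:
            u, v, _ = edges[i]
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
            i += 1
        vec.append(components)
    return vec
```

Evaluated on a grid of thresholds, the resulting vector is a fixed-length, permutation-invariant summary that can be concatenated with spectral embeddings and fed to any downstream classifier.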
This project investigates how atlas choice affects downstream performance in functional brain network models. By introducing edge-based quadratic features that enhance structural expressiveness beyond node-level aggregation, the framework achieves consistent gains across multi-atlas baselines and highlights the importance of principled structural design in neuroscience-driven graph learning.
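One plausible form of the edge-based quadratic features mentioned above (an assumption for illustration, not the project's exact construction): for each edge (u, v), take the flattened outer product of the two node feature vectors, so the edge representation captures pairwise feature interactions rather than node-level sums alone.

```python
def edge_quadratic_features(X, edges):
    """Quadratic edge features: flattened outer product x_u (x) x_v per edge.

    X: list of node feature vectors (equal length); edges: (u, v) index pairs.
    """
    feats = []
    for u, v in edges:
        # Every pairwise product a_i * b_j, i.e. the outer product, flattened.
        feats.append([a * b for a in X[u] for b in X[v]])
    return feats
```

Because each entry is a product of one coordinate from each endpoint, these features are strictly more expressive than sum- or mean-based node aggregation, which cannot represent such multiplicative interactions.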
This line of work explores alternative representations and problem formulations for structure-aware graph learning. GraphMind proposes a reproducible topology-to-language interface that translates structural evidence into concise node descriptions embedded by frozen language models. DuoLink reformulates link prediction as node classification on the line graph, aligning message passing with edge neighborhoods and capturing edge motifs directly.
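The line-graph reformulation behind DuoLink can be sketched directly (a standard construction; the framework's own implementation may differ): each edge of G becomes a node of L(G), and two such nodes are adjacent when the original edges share an endpoint, so predicting a link in G becomes classifying a node in L(G).

```python
def line_graph(edges):
    """Build L(G): nodes are edges of G, adjacent when they share an endpoint."""
    edges = [tuple(sorted(e)) for e in edges]  # canonical undirected form
    lg_nodes = edges
    lg_edges = []
    for i in range(len(edges)):
        for j in range(i + 1, len(edges)):
            if set(edges[i]) & set(edges[j]):  # shared endpoint in G
                lg_edges.append((edges[i], edges[j]))
    return lg_nodes, lg_edges
```

Message passing on L(G) then aggregates over an edge's incident edges, which is exactly the edge-neighborhood alignment and edge-motif sensitivity described above.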
LinkedIn: surbhi-kumar-662492263
Google Scholar: Profile