Songchen Tan
[email protected] +1 (857) 298-9702 tansongchen songchentan

Education

Doctor of Science in Mathematics and Computational Science

Cambridge, MA
Massachusetts Institute of Technology, 09/2023 – Present

GPA: 4.75 / 5.00

Relevant Coursework: Stochastic Processes, Eigenvalues of Random Matrices, Nonlinear Dynamics and Chaos, Fast Methods for Partial Differential Equations


Master of Science in Computational Science and Engineering

Cambridge, MA
Massachusetts Institute of Technology, 09/2021 – 06/2023

GPA: 5.00 / 5.00

Relevant Coursework: Parallel Computing & Scientific Machine Learning, Optimization Methods, Numerical Methods for Partial Differential Equations, Introduction to Numerical Methods


Bachelor of Science in Chemistry & Bachelor of Science in Physics

Beijing
Peking University, 09/2017 – 07/2021

GPA: 3.89 / 4.00, rank 1 / 137, honored as Weiming Bachelor (top 1%)

Relevant Coursework: Introduction to Computation, Data Structures and Algorithms, Computational Physics, Ordinary Differential Equations, Mathematical Methods in Physics, Advanced Mathematics I & II, Advanced Algebra I & II


Exchange Student

Los Angeles, CA
University of California, Los Angeles, 09/2019 – 12/2019

GPA: 4.00 / 4.00

Relevant Coursework: Introduction to Probability, Applied Numerical Methods

Publications

S. Tan, K. Miao, A. Edelman, C. Rackauckas. "Scalable Higher-order Nonlinear Solvers via Higher-order Automatic Differentiation." Proceedings of the 16th International Modelica and FMI Conference, 2025.
S. Tan, J. Zhu, A. Edelman, C. Rackauckas. "TaylorDiff.jl: Efficient and Versatile Higher-Order Derivatives in Julia." ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming 2025, Differentiable Programming Workshop, 2025.
S. Tan, A. Edelman, C. Rackauckas. "Fast Higher-order Automatic Differentiation for Physical Models." Proceedings of the JuliaCon Conferences, 2023.
S. Tan. "Higher-Order Automatic Differentiation and Its Applications." Master's thesis, MIT, 2023.
S. Tan. "Data-Driven Density Functional Models." Bachelor's thesis, Peking University, 2021.
I. Leven, H. Hao, S. Tan, ..., T. Head-Gordon. "Recent advances for improving the accuracy, transferability, and efficiency of reactive force fields." J. Chem. Theory Comput., 2021.
S. Tan, I. Leven, ..., T. Head-Gordon. "Stochastic constrained extended system dynamics for solving charge equilibration models." J. Chem. Theory Comput., 2020.

Professional Experience

Optimization Engineering Intern

Cupertino, CA
Apple, 05/2025 – 08/2025

Advisor: Qilin He, Jiaqi Jiang (Platform Architecture)

  • Fine-tuned pre-trained machine learning surrogate models for physical simulations, achieving 18× speedup over traditional methods while maintaining high accuracy
  • Developed algorithms for differentiable physical solvers, enabling gradient-based optimization that converges far faster than zeroth-order methods

Deep Learning Compiler Engineering Intern

Santa Clara, CA
NVIDIA, 05/2022 – 08/2022

Advisor: Yuan Lin (Deep Learning Compiler Team)

  • Improved the heuristics of layer fusion (matrix multiplication or convolution combined with element-wise functions) so that transformer-based large language models (LLMs) run efficiently on GPUs with TensorRT
  • Designed a data-driven heuristic framework for fusion auto-tuning, improving optimal-tactic coverage by 1.8×
  • Reduced compilation time by 40% with caching, multi-threading, and better auto-tuning strategies, based on profiling-driven analysis of the layer fusion pipeline in TensorRT
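
To illustrate what layer fusion means, here is a plain-Python sketch (an illustration only, not TensorRT's GPU kernels): a GEMM with a fused bias-add and ReLU epilogue produces each output element in a single pass instead of re-reading the output matrix for each element-wise step.

```python
def gemm(A, B):
    """Plain matrix multiply over nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def fused_gemm_bias_relu(A, B, bias):
    """Fused epilogue: bias add and ReLU are applied while each output
    element is produced, avoiding a second pass over the output."""
    return [[max(0.0, sum(a * b for a, b in zip(row, col)) + bias[j])
             for j, col in enumerate(zip(*B))] for row in A]

A = [[1.0, -2.0], [3.0, 4.0]]
B = [[2.0, 0.0], [1.0, -1.0]]
bias = [0.5, -0.5]

# Unfused reference: GEMM, then a separate bias + ReLU pass
unfused = [[max(0.0, v + bias[j]) for j, v in enumerate(row)]
           for row in gemm(A, B)]
fused = fused_gemm_bias_relu(A, B, bias)
```

The fused and unfused paths compute identical results; the benefit on a GPU is avoiding an extra kernel launch and a round trip of the output matrix through memory.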

Research Experience

Advanced Nonlinear and Differential Equation Solvers via Higher-order Automatic Differentiation

Cambridge, MA
Massachusetts Institute of Technology, 02/2024 – Present

Advisor: Christopher Rackauckas & Alan Edelman

  • Developing advanced solvers for nonlinear equations and ordinary differential equations (ODEs) where higher-order derivatives can be used to enhance convergence and efficiency
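
A toy sketch of why higher-order derivatives help (illustrative only, not the project's solver code): Halley's method augments Newton's iteration with the second derivative f'' to achieve cubic rather than quadratic convergence.

```python
def halley(f, df, d2f, x, tol=1e-12, max_iter=50):
    """Halley's method: cubic convergence using f, f', and f''."""
    for _ in range(max_iter):
        fx, dfx, d2fx = f(x), df(x), d2f(x)
        step = 2 * fx * dfx / (2 * dfx**2 - fx * d2fx)
        x -= step
        if abs(step) < tol:
            break
    return x

# Solve x^2 - 2 = 0 starting from x = 1
root = halley(lambda x: x * x - 2, lambda x: 2 * x, lambda x: 2.0, 1.0)
```

Cheap higher-order derivatives from automatic differentiation are exactly what makes such higher-order iterations practical beyond hand-derived examples like this one.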

Efficient Higher-order Automatic Differentiation for Physics-Informed Neural Networks

Cambridge, MA
Massachusetts Institute of Technology, 09/2022 – 02/2024

Advisor: Christopher Rackauckas & Alan Edelman

  • Developed higher-order forward-mode automatic differentiation algorithms that scale linearly with the order, suitable for physics-informed neural networks (PINNs), which require derivatives of multiple orders
  • Generated efficient higher-order differentiation rules (i.e., primitives) automatically from first-order chain rules with symbolic computation and metaprogramming in Julia
  • Composed the algorithm with existing first-order reverse-mode automatic differentiation libraries like Zygote.jl to establish a mixed-mode strategy for efficient PINN training
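
The linear-in-order scaling comes from propagating truncated Taylor polynomials through each primitive. A minimal pure-Python sketch of the idea (TaylorDiff.jl implements it in Julia with generated primitives):

```python
import math

def taylor_mul(a, b):
    """Cauchy product of two truncated Taylor-coefficient lists."""
    n = len(a)
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]

def taylor_exp(a):
    """exp of a Taylor series via b_k = (1/k) * sum_{j=1..k} j*a_j*b_{k-j}."""
    n = len(a)
    b = [math.exp(a[0])] + [0.0] * (n - 1)
    for k in range(1, n):
        b[k] = sum(j * a[j] * b[k - j] for j in range(1, k + 1)) / k
    return b

# Third derivative of f(t) = exp(t^2) at t = 1
t = [1.0, 1.0, 0.0, 0.0]          # Taylor coefficients of t about t0 = 1
f = taylor_exp(taylor_mul(t, t))  # coefficients of exp(t^2)
d3 = math.factorial(3) * f[3]     # k-th derivative = k! * k-th coefficient
```

Each primitive touches every coefficient a constant number of times per order, so the cost of order-k derivatives grows linearly-to-quadratically in k rather than exponentially as with nested first-order differentiation.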

Low-level Automatic Differentiation Algorithms for Linear Algebra Routines

Cambridge, MA
Massachusetts Institute of Technology, 09/2021 – 06/2022

Advisor: Christopher Rackauckas & Alan Edelman

  • Worked as part of the Enzyme project, an automatic differentiation framework based on source code transformation at the LLVM intermediate representation (IR) level, which can differentiate through any language with an LLVM backend (e.g. Julia, C++, Fortran)
  • Implemented algorithms that synthesize derivatives of BLAS/LAPACK kernels commonly used in scientific computing, and performed extensive optimizations based on linear algebra identities
  • Outperformed other high-level AD frameworks in Julia, achieving a 1.3× speedup on a linear algebra benchmark set
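
The flavor of such a synthesized rule, sketched in plain Python (an illustration, not Enzyme's LLVM-level code): for C = AB, reverse mode gives the adjoints Abar = Cbar Bᵀ and Bbar = Aᵀ Cbar, each itself a GEMM, which can be verified against finite differences.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
Cbar = [[1.0, 0.5], [0.25, 2.0]]   # upstream adjoint of C = A @ B

# Reverse-mode rule: Abar = Cbar @ B^T
Abar = matmul(Cbar, transpose(B))

# Finite-difference check of dL/dA for L = sum(Cbar * (A @ B))
def loss(A):
    C = matmul(A, B)
    return sum(Cbar[i][j] * C[i][j] for i in range(2) for j in range(2))

eps = 1e-6
Abar_fd = [[0.0, 0.0], [0.0, 0.0]]
for i in range(2):
    for j in range(2):
        Ap = [row[:] for row in A]; Ap[i][j] += eps
        Am = [row[:] for row in A]; Am[i][j] -= eps
        Abar_fd[i][j] = (loss(Ap) - loss(Am)) / (2 * eps)
```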

Optimization Methods for Density Functional Models

Beijing
Peking University, 12/2020 – 06/2021

Advisor: Weinan E & Linfeng Zhang

  • Modeled the exchange-correlation density functional in generalized Kohn-Sham theory with deep neural networks and descriptors from density matrices
  • Established a method for using physical quantity labels that depend on the functional minimization result to train the functional model, in other words, addressed the "differentiate through argmin" problem
  • Implemented the training process with multiple types of physical quantity labels, such as molecular orbital energy levels and dipole moments
  • Improved the model's accuracy and generalization, obtaining an average energy error of 0.06 kcal/mol (48% lower than previous methods) on a test set of 1200 water molecule configurations labeled with the SCAN0 functional
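
The "differentiate through argmin" step can be sketched in one dimension with the implicit function theorem (a toy Python illustration with a made-up objective, not the functional-training code): for x*(θ) = argmin_x g(x, θ), stationarity ∂g/∂x = 0 gives dx*/dθ = −(∂²g/∂x²)⁻¹ ∂²g/∂x∂θ, so no unrolling of the inner minimization is needed.

```python
# Toy objective g(x, theta) = x^4/4 - theta*x, whose minimizer is x* = theta^(1/3)
def argmin_g(theta):
    return theta ** (1.0 / 3.0)

def dxstar_dtheta(theta):
    """Implicit function theorem: dx*/dtheta = -g_xt / g_xx at the minimizer."""
    x = argmin_g(theta)
    g_xx = 3 * x * x   # d^2 g / dx^2 = 3 x^2
    g_xt = -1.0        # d^2 g / dx dtheta
    return -g_xt / g_xx

theta = 2.0
analytic = dxstar_dtheta(theta)
eps = 1e-6
fd = (argmin_g(theta + eps) - argmin_g(theta - eps)) / (2 * eps)
```

The same identity, applied with the Kohn-Sham self-consistency condition in place of this scalar stationarity condition, lets gradients of the labels flow back into the functional's parameters.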

Noise Contrastive Learning for High-Dimensional Statistical Modeling

Cambridge, MA (Remote)
Massachusetts Institute of Technology, 05/2020 – 08/2020

Advisor: Bin Zhang

  • Applied noise contrastive learning for unsupervised training of the potential energy surface of coarse-grained molecular systems, which is a high-dimensional and unnormalized probability distribution (such that it cannot be learned with maximum likelihood estimation)
  • Utilized normalizing flow-based neural network architecture to construct the "noise", enabling computationally efficient sampling, density evaluation, and gradient computation
  • Obtained an efficient and systematic coarse-graining workflow, outperforming the traditional force-mapping scheme, which is inefficient to train and lacks transferability
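
The core idea of noise contrastive estimation, in a minimal dependency-free sketch with synthetic 1-D data (not the research code): fit an unnormalized model by logistic classification of data samples against samples from a tractable noise distribution; the learnable offset c absorbs the unknown normalizing constant.

```python
import math, random

random.seed(0)
N = 2000
data  = [random.gauss(1.0, 1.0) for _ in range(N)]  # samples from the "unknown" target
noise = [random.gauss(0.0, 1.0) for _ in range(N)]  # samples from tractable noise

def log_noise(x):                       # exact noise log-density, N(0, 1)
    return -0.5 * x * x - 0.5 * math.log(2 * math.pi)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Unnormalized model: log p(x) = -(x - mu)^2 / 2 + c, c learns the normalizer
mu, c = 0.0, 0.0
lr = 0.1
for _ in range(400):
    g_mu = g_c = 0.0
    for x in data:                      # push data toward the "real" label
        s = sigmoid((-0.5 * (x - mu) ** 2 + c) - log_noise(x))
        g_mu += (1 - s) * (x - mu)
        g_c  += 1 - s
    for y in noise:                     # push noise toward the "fake" label
        s = sigmoid((-0.5 * (y - mu) ** 2 + c) - log_noise(y))
        g_mu -= s * (y - mu)
        g_c  -= s
    mu += lr * g_mu / N                 # ascend the classifier log-likelihood
    c  += lr * g_c / N
# mu recovers the data mean; c recovers the log-normalizer -0.5*log(2*pi)
```

Because the classification objective never needs the partition function, the same recipe scales to high-dimensional energy surfaces, with a normalizing flow supplying an expressive noise distribution.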

Awards

MathWorks Prize for Outstanding Masters Research, MIT Center for Computational Science and Engineering
03/2023
Weiming Bachelor, Peking University (top 1%)
07/2021
Academic Award, College of Chemistry and Molecular Engineering, Peking University (top 2%)
07/2021
2020 Wusi Scholarship & Merit Student, Peking University (top 1%)
11/2020
National Second Prize in the Contemporary Undergraduate Mathematical Contest in Modeling, China Society for Industrial and Applied Mathematics
12/2019
2019 National Scholarship & Merit Student, Peking University (top 1%)
11/2019
Education Abroad Program Scholarship, Peking University
05/2019
2018 National Scholarship & Merit Student, Peking University (top 1%)
11/2018
Gold Medal and National Training Team in the 29th Chinese Chemistry Olympiad (CChO), Chinese Chemical Society
12/2015

Skills

Programming Languages: C/C++, Python, Julia, Rust, Fortran, JavaScript/TypeScript
High-Performance Computing and Artificial Intelligence Infrastructure: CUDA, MPI, LLVM, XLA, TensorRT
Machine Learning Frameworks: PyTorch, Lux.jl, JAX