I specialize in numerical methods and performance optimization for scientific computing, with a growing focus on GPU kernel design and solver acceleration. Drawing on my PhD and experience with PETSc and Nek5000, I work at the intersection of algorithm design, memory hierarchy tuning, and scalable HPC software. My goal is to devise mathematically sound solutions that clearly navigate the trade-offs between accuracy and performance.
Education
- 2003 - University of Bucharest (Romania) - BSc in Mathematics
- 2007 - Royal Institute of Technology (Sweden) - MSc in Numerical Analysis
- 2012 - Royal Institute of Technology (Sweden) - PhD in Numerical Analysis
Professional
- 2013 - 2016 Argonne National Laboratory - Nek5000 group
- 2017 - 2022 Argonne National Laboratory - PETSc group
- 2022 - 2024 Idaho National Laboratory - MOOSE group
- 2024 - present PeraCompute Technologies - Computational Scientist
- 2024 - present AiGIA - Chief Scientific Officer
I have been affiliated with the following software projects
- Computational Fluid Dynamics high-order software - Nek5000/NekRS (Fortran77 programming language)
- Linear Algebra library - PETSc (C programming language)
- Parallel finite element framework - MOOSE (C++ programming language)
Professional service and outreach
- Editorial board of SIAM News
- Course design - PETSc for PDEs (NASA 2019)
- Popular science contributions
- Panelist - National Science Foundation
- Mentor - SuperComputing
- Minisymposium organizer: SIAM CSE, SIAM UQ, JMM, etc.
Areas of research I have been active in, and selected publications
- CUDA kernel development for matrix reordering and memory access optimization
- GEMM strategies: tuning and performance modeling
- LLVM and IR transformations for kernel performance
- Lower-precision control and rounding strategies to accelerate GPU kernels while maintaining numerical robustness
- The PetscSF Scalable Communication Layer - enabling asynchronous CUDA/NVSHMEM data movement on Summit’s V100 GPUs
- On the strong scaling of the spectral element solver Nek5000 on petascale systems
- Large-scale lossy data compression based on an a priori error estimator in a spectral element code - extensible to GPU offloading for in-place error control and I/O-bound reduction
- Attention head sparsification - DCT compression strategies on a GPT Transformer
- Fourier neural networks as function approximators and differential equation solvers
- Fitting Matérn smoothness parameters using automatic differentiation
- Informed knot placement schemes for B-spline approximation
- Corrected trapezoidal rules for a class of singular functions
- A highly accurate boundary treatment for confined Stokes flow
- A scalable matrix-free spectral element approach for unsteady PDE constrained optimization using PETSc/TAO - a dense algebra-derived strategy to allow GPU-tensorization on backward solves, showcased on CPUs
- Asynchronous two-level checkpointing scheme for large-scale adjoints in the spectral-element solver Nek5000
- Computing Derivatives for PETSc Adjoint Solvers using Algorithmic Differentiation
- Characterization of the secondary flow in hexagonal ducts
- The three-dimensional structure of swirl-switching in bent pipe flow
- Adaptive mesh refinement for steady flows in Nek5000
- Non-conforming elements in Nek5000: pressure preconditioning and parallel performance
- Direct numerical simulation of fluid flow in a 5x5 square rod bundle
Emails
- Work: oanam@peracompute.org
- Personal: oana_marin@outlook.com