What can ML tools and simulations do for (radio) cosmology?

(Or what keeps me busy in Switzerland)









Nicolas Cerardi











  • Physics-Informed Neural Networks for continuous CDM simulations
  • Constraining the reionization history with higher-order statistics and simulation-based inference
  • Satellite mega-constellations and the Square Kilometre Array Observatory











  • Physics-Informed Neural Networks for continuous CDM simulations


With Emma Tolley & Ashutosh Mishra

The dynamics of CDM
  • Cold Dark Matter obeys the cosmological Vlasov-Poisson equations:
  • $$ \frac{\partial f}{\partial \tau} + \vec{v} \cdot \vec{\nabla}_x f - \vec{\nabla}_x \Phi \cdot \frac{\partial f}{\partial \vec{v}} = 0 $$ $$ \nabla^2_x \Phi = 4\pi G a^2 \bar{\rho}(\tau) \delta(\vec{x}, \tau) $$

1D CDM simulation

Phase-space distribution

  • Cold DM occupies a 1D submanifold in the 1+1D phase-space
  • Multi-stream flow after shell-crossing
Density

  • Singularities arise at shell-crossing locations
  • They affect the CDM dynamics (Poisson equation)
Routes to CDM simulations
  • N-body codes employ particles instead of a fluid
  • This leads to discreteness errors (2-body interactions)


Can we find an alternative to N-body simulations?

Since CDM occupies a low-dimensional submanifold, we can completely describe the dynamics using the displacement field $\zeta(q,\tau)$: $$ x(q,\tau) = q + \zeta(q,\tau) $$
With the equation of motion: $$ \frac{\partial^2 \zeta}{\partial \tau^2} + \frac{3}{2\tau} \frac{\partial \zeta}{\partial \tau} = - \frac{3}{2\tau} \frac{\partial \phi}{\partial x}$$
Can a neural network learn the solution?

Physics Informed Neural Networks (PINNs)


PINNs: simulations seen as an optimisation problem
  • Use a neural network to learn the solution of a PDE
  • Network optimisation is guided by the physical constraints (PDE + boundary)
  • Fully unsupervised $\Longrightarrow$ independent of N-body simulations
  • Provides a continuous representation of the solution, not just particles!
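To make the optimisation concrete, here is a minimal PyTorch sketch of such a PINN objective: PDE residual plus an initial-condition penalty. The network `zeta_net`, the force callable `grad_phi` and all signatures are illustrative, not the paper's implementation.

```python
import torch

def pinn_loss(zeta_net, q, tau, q_ic, tau_ic, zeta_za, dzeta_za, grad_phi):
    """Sketch of the PINN objective. zeta_net(q, tau) -> displacement;
    grad_phi(x, tau) returns the force term as a torch tensor (in practice
    obtained by numerical integration, see the sketch two slides down)."""
    tau = tau.clone().requires_grad_(True)
    zeta = zeta_net(q, tau)

    # velocity and acceleration of the displacement via auto-diff
    vel = torch.autograd.grad(zeta.sum(), tau, create_graph=True)[0]
    acc = torch.autograd.grad(vel.sum(), tau, create_graph=True)[0]

    # residual of: zeta'' + (3/2tau) zeta' = -(3/2tau) dphi/dx
    res = acc + 1.5 / tau * vel + 1.5 / tau * grad_phi(q + zeta, tau)
    loss_pde = (res ** 2).mean()

    # boundary: match the Zel'dovich displacement and velocity at tau_ic
    tau_ic = tau_ic.clone().requires_grad_(True)
    zeta_i = zeta_net(q_ic, tau_ic)
    vel_i = torch.autograd.grad(zeta_i.sum(), tau_ic, create_graph=True)[0]
    loss_ic = ((zeta_i - zeta_za) ** 2).mean() + ((vel_i - dzeta_za) ** 2).mean()

    return loss_pde + loss_ic
```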

Physics Informed Neural Networks (PINNs)

PDE
  • Numerical integration lets us compute $\nabla_x \phi$ (sketched after this slide)
  • Auto-diff provides the remaining terms of the PDE (velocity and acceleration)

Boundary
  • Initial conditions given by the Zel'dovich approximation
  • Boundary conditions set to 0
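In 1D the force term needs no full Poisson solve: integrating the 1D Poisson equation once gives the force from the enclosed overdensity. A numpy sketch; the normalisation and units here are schematic, not the paper's exact convention.

```python
import numpy as np

def force_1d(x_stream, x_eval, box=1.0, n_grid=512):
    """Illustrative 1D gravity: dphi/dx at x is proportional to the
    overdensity enclosed to the left (one integration of Poisson)."""
    edges = np.linspace(0.0, box, n_grid + 1)
    counts, _ = np.histogram(x_stream % box, bins=edges)
    w = np.diff(edges)
    rho = counts / w / x_stream.size * box     # density normalised to mean 1
    delta = rho - 1.0
    centers = 0.5 * (edges[1:] + edges[:-1])

    # cumulative integral of delta -> enclosed overdensity -> dphi/dx
    grad_phi = np.cumsum(delta) * w[0]
    grad_phi -= grad_phi.mean()                # zero mean force (periodic box)
    return np.interp(x_eval % box, centers, grad_phi)
```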

Which neural architecture can we use?


  • Sharp features in acceleration due to the shell-crossings.
  • Our first attempts with MLPs failed to capture these.
  • We also tested more complex activation functions.
  • We need an architecture that provides high flexibility down to the 2nd derivative of the network output

Kolmogorov-Arnold Networks (KANs)


  • The Kolmogorov-Arnold representation theorem: any continuous multivariate function can be written as a composition of sums of univariate functions; in practice, these are approximated with (many) flexible learned 1D functions
  • Allows an efficient parameterisation of the target function

[Figure: a regular MLP with fixed node activations vs. a Kolmogorov-Arnold Network, whose edges are combinations of 3rd-order B-splines; adapted from Liu+25]
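To illustrate what "edges are splines" means, here is a from-scratch toy of a single KAN edge: a univariate function parameterised by learnable cubic B-spline coefficients. The actual implementation follows [Liu+25]; knot handling and initialisation here are simplified.

```python
import torch

def bspline_basis(x, t, k=3):
    """Cox-de Boor recursion. x: (N,) inputs, t: (G,) uniform knot vector.
    Returns (N, G-k-1) B-spline basis values of order k."""
    B = ((x[:, None] >= t[None, :-1]) & (x[:, None] < t[None, 1:])).float()
    for d in range(1, k + 1):
        left = (x[:, None] - t[None, :-d-1]) / (t[d:-1] - t[:-d-1]) * B[:, :-1]
        right = (t[None, d+1:] - x[:, None]) / (t[d+1:] - t[1:-d]) * B[:, 1:]
        B = left + right
    return B

class KANEdge(torch.nn.Module):
    """One KAN edge: a learnable univariate function whose trainable
    parameters are the coefficients of cubic B-splines."""
    def __init__(self, n_knots=12, k=3, lo=-1.0, hi=1.0):
        super().__init__()
        self.k = k
        self.register_buffer("t", torch.linspace(lo, hi, n_knots))
        self.coef = torch.nn.Parameter(0.1 * torch.randn(n_knots - k - 1))

    def forward(self, x):
        # weighted sum of basis functions: (N, n_basis) @ (n_basis,) -> (N,)
        return bspline_basis(x, self.t, self.k) @ self.coef
```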

$\Rightarrow$ Physics-Informed KANs (PIKANs!)

PIKAN implementation
  • We use a [8, 12, 8, 1] KAN architecture (~4500 parameters)
  • We employ independent KANs on consecutive time chunks
$$ x(q,\tau) = q + \textcolor{cyan}{\zeta_0} + \sum_{i=1}^{k-1} \textcolor{orange}{\zeta_i}(q, \tau_{end,i}) + \textcolor{green}{\zeta_k}(q,\tau)$$
  • $ \zeta_0 $ is the Zel'dovich approximation
  • $ \zeta_i $ are the KANs trained on previous chunks
  • $ \zeta_k $ is the KAN operating at $\tau$
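Schematically, evaluating the chunked solution amounts to freezing the KANs of earlier chunks at their end times and adding the live one; function signatures below are illustrative.

```python
import torch

def displacement(q, tau, zeta_0, frozen_kans, live_kan, tau_ends):
    """x(q, tau) = q + zeta_0(q) + sum_i zeta_i(q, tau_end_i) + zeta_k(q, tau).
    frozen_kans: KANs trained on earlier chunks, each evaluated at its
    chunk's end time; live_kan: the KAN being trained on the current chunk."""
    x = q + zeta_0(q)                      # Zel'dovich term
    for kan, t_end in zip(frozen_kans, tau_ends):
        with torch.no_grad():              # earlier chunks stay fixed
            x = x + kan(q, torch.full_like(q, t_end))
    return x + live_kan(q, tau)            # trainable contribution at tau
```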

Results

Results: Phase-space


A very good visual agreement, up to 6 shell-crossings!

Results: displacement, velocity and acceleration


Errors w.r.t. a high-resolution N-body simulation
The error on the displacement is below 2%

Results: density profiles

  • Density profiles are in good agreement
  • In the core, N-body profiles are noisy
  • The PIKAN provides a smoother solution (perhaps too smooth?)

Caveats
  • Training time is quite long (several hours per time chunk on a single GPU)
  • While the learnt representation is defined continuously, we still need to sample points for the numerical integration of $ \nabla_x\phi $
  • (for now a 1D single halo collapse)

What else can we do with such a tool?


  • Unlike an N-body simulation that advances timestep by timestep, the PINN optimisation is performed over all times simultaneously.
  • $\Rightarrow$ We can set the "initial conditions" at any $\tau$

Results: Backward simulation

  • Test case: reverse time through 2 shell crossings.
  • Start from high-resolution N-body data at $\tau_{end}$, optimise to retrieve the underlying initial conditions.

  • Input: $(\zeta, \dot{\zeta}, \ddot{\zeta})$
  • Input: $(\zeta, \dot{\zeta}, \delta(x))$

CDM PIKAN - Takeaways
  • We presented a new method to simulate CDM, independent from N-body simulations.
  • PIKANs allow a continuous and differentiable modelling of the dynamics.
  • It is possible to run the simulation backward to retrieve the initial conditions.
  • Future work: higher dimensions, complex ICs...
[Cerardi, Tolley and Mishra, MNRAS, 2025]












Implicit inference of the reionization history with higher-order statistics of the 21-cm signal

With the SEarCH team: Sambit Giri, Michele Bianco, Davide Piras, Emmanuel de Salis, Massimo de Santis, Merve Selcuk-Simsek, Philipp Denzel, Kelley Hess, Carmen Toribio, Franz Kirsten & Hatem Ghorbel
If you missed the last semester of SEarCH...

  • Participated in the SKA Data Challenge 3b: Inference
  • Goal: infer the neutral fraction $ \bar{x}_{\rm HI} $ in SKA-Low mocks at three different $ z $ to constrain the Epoch of Reionisation
How did we tackle this problem?

Our strategy
  1. Created a large dataset of ~16000 SKA-Low mocks
    • Included instrumental noise but no foregrounds

  2. Trained 2 independent DL approaches
    • Regression task on $ \bar{x}_{\rm HI} $ with deep CNNs
      [De Salis+25]
    • Bayesian inference with Simulation-based Inference (SBI)
"I have a complex and noisy model which doesn't provide an explicit likelihood"
$ \Rightarrow $ learn the posterior directly from simulations!
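For orientation, this is what single-round neural posterior estimation looks like with the off-the-shelf sbi package. A uniform proxy prior is used purely for illustration; in our problem the prior on $ \bar{x}_{\rm HI} $ is implicit in the simulator (see below), and this is not necessarily the exact pipeline we ran.

```python
import torch
from sbi.inference import SNPE
from sbi.utils import BoxUniform

# theta: (N, 1) neutral fractions from the simulations; x: (N, D) summaries.
prior = BoxUniform(low=torch.zeros(1), high=torch.ones(1))  # proxy prior
inference = SNPE(prior=prior)
inference.append_simulations(theta, x)
posterior = inference.build_posterior(inference.train())
samples = posterior.sample((10_000,), x=x_obs)  # posterior draws for one mock
```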
Our results at SDC3b
  • Obtained great results on the PS2 data 🎉
Our results at SDC3b
  • But not ranked on the PS1 data
Going beyond the data challenge
  • Computing higher-order statistics from our mock simulations
  • Combining them within our SBI framework
  • Provide statistically robust results
Extending our modelling
  • Rejecting extreme reionisation cases
  • Dataset expanded x2, with additional noise realisations applied to the mocks
  • Test the noise-level dependence: 100h and 1000h SKA-Low noise considered
  • More statistics computed from the mocks
More statistics!

We compute several kinds of statistics $ t $ from the 21-cm cubes (the 1D power spectrum is sketched below)

2-pt statistics
  • 1D power spectrum
  • 2D power spectrum

Higher-order statistics
  • Bispectrum (equilateral & squeezed)
  • Betti numbers (topological invariants)

+ combinations of these metrics
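As an example of the simplest of these summaries, a spherically averaged power spectrum of a brightness cube can be computed as follows; binning and normalisation conventions here are schematic.

```python
import numpy as np

def ps_1d(cube, box_size):
    """Spherically averaged power spectrum of an (n, n, n) 21-cm cube."""
    n = cube.shape[0]
    dk = 2 * np.pi / box_size                        # fundamental mode
    power = np.abs(np.fft.fftn(cube)) ** 2 * box_size**3 / n**6
    k = 2 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    bins = np.arange(dk, np.abs(k).max(), dk)        # up to the Nyquist mode
    idx = np.digitize(kmag, bins)
    pk = np.array([power.ravel()[idx == i].mean() for i in range(1, len(bins))])
    return 0.5 * (bins[:-1] + bins[1:]), pk          # bin centres, P(k)
```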
The true SBI situation here
  • $ \bar{x}_{\rm HI} $ is an output of the simulation: we do not directly sample it
  • Prior on $ \bar{x}_{\rm HI} $ is implicit and non-trivial
  • Direct neural posterior estimation is the way to go!
\[\begin{aligned} p(\bar{x}_{\rm HI} | t) & \propto p(\bar{x}_{\rm HI})\, p(t | \bar{x}_{\rm HI}) \\ & \approx q_\phi (\bar{x}_{\rm HI} | t) \end{aligned} \]
  • We are model-dependent on both the prior and the likelihood
SBI methodology

We apply Variational Mutual Information Maximization (VMIM, Jeffrey+20) to learn $ p(\bar{x}_{\rm HI} | t) $ by:
  • Compressing the stacked statistics $ [t_1, t_2, t_3] $ into a 3D vector $ y $
  • Maximizing the mutual information between $ y $ and $ \bar{x}_{\rm HI} $
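A schematic PyTorch version of the VMIM objective: a compressor network and a small mixture-density head trained jointly by minimising $ -\log q_\phi(\bar{x}_{\rm HI} | y) $, which maximises a variational lower bound on the mutual information. Layer sizes and the mixture head are illustrative choices, not those of our pipeline.

```python
import torch
from torch import nn

class VMIM(nn.Module):
    """Toy VMIM (Jeffrey+20): jointly learn y = F([t1, t2, t3]) and a
    Gaussian-mixture posterior q_phi(x_HI | y)."""
    def __init__(self, dim_t, dim_y=3, n_comp=4):
        super().__init__()
        self.compress = nn.Sequential(
            nn.Linear(dim_t, 128), nn.ReLU(), nn.Linear(128, dim_y))
        # per mixture component: a weight logit, a mean, a log-sigma
        self.head = nn.Sequential(
            nn.Linear(dim_y, 64), nn.ReLU(), nn.Linear(64, 3 * n_comp))

    def loss(self, t, x_hi):
        """t: (N, dim_t) stacked statistics; x_hi: (N, 1) neutral fractions."""
        y = self.compress(t)
        logits, mu, log_sig = self.head(y).chunk(3, dim=-1)
        comp = torch.distributions.Normal(mu, log_sig.exp())
        log_q = torch.logsumexp(
            torch.log_softmax(logits, dim=-1) + comp.log_prob(x_hi), dim=-1)
        return -log_q.mean()   # negative variational bound on I(y; x_HI)
```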

Can you trust me?
  • For each case we train 20+ models to check for the stability of our approach
  • We systematically perform coverage tests and select the best model for each case
Tests of Accuracy with Random Points (TARP), run on the validation set for the different cases
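The TARP test (after Lemos+23) only needs posterior samples and the true parameters; here is a from-scratch illustration of the expected-coverage computation.

```python
import numpy as np

def tarp_ecp(samples, theta_true, n_alpha=20, seed=None):
    """Minimal TARP-style coverage test. samples: (n_post, n_sims, d)
    posterior draws; theta_true: (n_sims, d). Returns (alpha, ECP)."""
    rng = np.random.default_rng(seed)
    n_post, n_sims, d = samples.shape
    # random reference points drawn over the samples' bounding box
    lo, hi = samples.min(axis=(0, 1)), samples.max(axis=(0, 1))
    refs = rng.uniform(lo, hi, size=(n_sims, d))
    # credibility of the truth: fraction of samples closer to the reference
    d_samp = np.linalg.norm(samples - refs, axis=-1)      # (n_post, n_sims)
    d_true = np.linalg.norm(theta_true - refs, axis=-1)   # (n_sims,)
    f = (d_samp < d_true).mean(axis=0)                    # (n_sims,)
    alpha = np.linspace(0.0, 1.0, n_alpha)
    ecp = np.array([(f < a).mean() for a in alpha])
    return alpha, ecp   # a well-calibrated posterior gives ecp ~ alpha
```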
Results: Individual test scenarios
Good, but these are individual posteriors. What about the big picture?
Results: global trends
  • $ \sigma(\bar{x}_{\rm HI}) $ decreases by ~40% from 100h to 1000h (but this is strongly statistic-dependent)
  • PS2D + higher order stats: $ \sigma(\bar{x}_{\rm HI}) $ decreases by ~60% at 100h and ~40% at 1000h
Results: the relative strength of each statistic

  • Betti numbers are more informative towards the end of EoR
  • Power spectrum is more constraining in the early EoR
  • Bispectrum mildly contributes to early EoR constraints

SBI for EoR - Takeaways
  • Based on our work for SDC3b, we investigated the reionisation history through several summary statistics
  • We employed 2-pt statistics (1D and 2D power spectra) together with higher-order statistics (bispectrum and Betti numbers)
  • We applied an SBI framework to infer parameters from these statistics
  • Betti numbers (alone and combined with others) are particularly promising
  • We also identify different regimes of relevance for each statistic

[Cerardi et al., submitted to MNRAS, 2025]












Satellite mega-constellations and the Square Kilometre Array Observatory

With Emma Tolley, Federico di Vruno and Chris Finlay
We are not alone... on the radio spectrum!
The night sky now... (image credit: Alan Dyer / amazingsky.com)
Radio signals from satellites
  • Intended emissions (in ITU-R allocated bands)
  • Reflection from sunlight and terrestrial sources
  • Unintended Electromagnetic Radiation (UEMR)
Starlink satellites show both spectral lines and broad-band features at low frequencies [Di Vruno+23]
Can we forecast the future observing conditions of the SKAO?

Today
  • Starlink: 9000+
  • OneWeb: ~700
In 5-10 years
  • Starlink: ~40000
  • OneWeb: 720
  • GuoWang: ~12000
  • QianFan: ~1300
  • Leo: ~3000

  • Propagating orbits for 50,000 satellites is computationally expensive...
  • Instead, we can model the density of satellites in the sky with the analytical model of [Bassa+2021] (sketched below)
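The geometric core of such analytic models is simple: for a circular shell at inclination $i$, the sub-satellite latitude distribution follows from uniform motion along the orbit, and the density diverges at latitude $\pm i$, producing the sharp shell edges seen below. A hedged sketch; the exact conventions in [Bassa+2021] may differ.

```python
import numpy as np

def shell_density(lat_deg, incl_deg, n_sat):
    """Surface density (per steradian, on the unit sphere) of satellites in
    one orbital shell as a function of geographic latitude. From
    sin(lat) = sin(incl) * sin(u) with u uniform along the orbit."""
    lat, incl = np.radians(lat_deg), np.radians(incl_deg)
    dens = np.zeros_like(lat)
    inside = np.abs(lat) < incl          # a shell only covers |lat| < incl
    dens[inside] = n_sat / (
        2 * np.pi**2 * np.sqrt(np.sin(incl)**2 - np.sin(lat[inside])**2))
    return dens                          # diverges at lat = +/- incl (edges)

# e.g. a hypothetical 53-degree shell of 1584 satellites, over all latitudes
lat = np.linspace(-90.0, 90.0, 1801)
rho = shell_density(lat, incl_deg=53.0, n_sat=1584)
```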
Satellite density maps

  • Sharp features correspond to the edges of satellite shells
  • The SKAO will be covered by almost all shells
The exposure of SKA-Low to mega-constellations

  • Significantly exposed to satellites at all frequencies, up to 100% below 100 MHz
  • Flagging will not be an option $\Rightarrow$ RFI subtraction
Trajectory-based Subtraction and Calibration

TABASCAL (Chris Finlay): forward model of telescope + RFI + astronomical sources. Joint fit at the visibility level.
  • Improves gain calibration
  • Enables RFI modelling and subtraction

RFI subtraction vs. flagging on simulated MeerKAT observations, from [Finlay+2025].
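The idea behind trajectory-based subtraction in one toy equation: a known satellite path fixes the rapidly varying fringe phase on each baseline, so the RFI contribution is predictable up to an amplitude and can be fit jointly with the gains. A heavily simplified numpy sketch (plane-wave far field, no gains or primary beam; tabascal itself models the near-field geometry and the full measurement equation).

```python
import numpy as np

def rfi_fringe(t, ant_p, ant_q, sat_dir, sat_amp, freq):
    """Toy RFI visibility of one satellite on baseline (p, q).
    ant_p, ant_q: (3,) antenna positions [m]; sat_dir(t): (n_t, 3) unit
    vectors towards the satellite along its known trajectory;
    sat_amp(t): (n_t,) complex transmitter amplitude at the array."""
    c = 299_792_458.0
    # geometric delay between the two antennas towards the moving satellite
    delay = (ant_p - ant_q) @ sat_dir(t).T / c        # (n_t,), seconds
    return sat_amp(t) * np.exp(-2j * np.pi * freq * delay)
```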
Satellite constellations - Takeaways
  • Satellites will be ubiquitous in SKAO observations
  • A serious threat for low frequency science
  • Future work: forecast the RFI contamination from these constellations


Trajectory-based RFI subtraction
  • Requires priors on satellite orbits, plus HPC resources
  • Currently being tested on real observations
Thank you for your attention!



(What also keeps me busy in Switzerland)