Implicit inference of the reionization history with higher-order statistics of the 21-cm signal

SKACH Winter Days, CSCS, Lugano, January 2026





Nicolas Cerardi

On behalf of the SEarCH team: Sambit Giri, Michele Bianco, Davide Piras, Emmanuel de Salis, Massimo de Santis, Merve Selcuk-Simsek, Philipp Denzel, Kelley Hess, Carmen Toribio, Franz Kirsten & Hatem Ghorbel
If you missed the last semester of SEarCH...

  • Participated in the SKA Data Challenge 3b: Inference
  • Goal: infer the neutral fraction $ \bar{x}_{\rm HI} $ in SKA-Low mocks at three different $ z $ to constrain the Epoch of Reionisation
How did we tackle this problem?

Our strategy
  1. Created a large dataset of ~16000 SKA-Low mocks
    • Included instrumental noise but no foregrounds

  2. Trained 2 independent DL approaches
    • Regression task on $ \bar{x}_{\rm HI} $ with deep CNNs
      [De Salis+25]
    • Bayesian inference with Simulation-based Inference (SBI)
"I have a complex and noisy model which doesn't provide an explicit likelihood"
$ \Rightarrow $ learn the posterior directly from simulations!
Our results at SDC3b
  • Obtained great results on the PS2 data 🎉
  • But not ranked on the PS1 data
Goals of today's talk
  • Go beyond the data challenge with higher-order statistics
  • Give details on our SBI strategy
  • Provide statistically robust results
Extending our modelling
  • Rejecting extreme reionisation cases
  • Dataset doubled by applying additional noise realisations to the mocks
  • Test the noise-level dependence: 100h and 1000h SKA-Low noise considered
  • More statistics computed from the mocks
More statistics!

We compute several kinds of statistics $ t $ from the 21cm cubes
2pt statistics
1D Power Spectrum
2D Power Spectrum

Higher-order statistics
Bispectrum (equilateral & squeezed)
Betti numbers (topological invariant)

+ combinations of these metrics
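
As an illustration of the topological statistic listed above, the zeroth Betti number $ \beta_0 $ counts the connected components of a thresholded field. The sketch below is a minimal pure-Python version on a toy cube. The function name `betti0`, the threshold, the 6-connectivity and the non-periodic boundaries are all assumptions made for this example, not the pipeline used for the mocks.

```python
from collections import deque

# Face-sharing (6-connected) neighbour offsets; an assumption for this sketch.
NEIGHBOURS = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))


def betti0(field, threshold):
    """Count connected components of cells with value >= threshold.

    `field` is a nested list indexed as field[i][j][k]; boundaries are
    treated as non-periodic (an assumption made for simplicity).
    """
    nx, ny, nz = len(field), len(field[0]), len(field[0][0])
    seen = set()
    count = 0
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                if field[i][j][k] < threshold or (i, j, k) in seen:
                    continue
                # New component found: flood-fill it with a breadth-first search.
                count += 1
                seen.add((i, j, k))
                queue = deque([(i, j, k)])
                while queue:
                    a, b, c = queue.popleft()
                    for da, db, dc in NEIGHBOURS:
                        p, q, r = a + da, b + db, c + dc
                        if (0 <= p < nx and 0 <= q < ny and 0 <= r < nz
                                and (p, q, r) not in seen
                                and field[p][q][r] >= threshold):
                            seen.add((p, q, r))
                            queue.append((p, q, r))
    return count


# Toy 4x4x4 cube with two separate regions above the threshold.
cube = [[[0.0 for _ in range(4)] for _ in range(4)] for _ in range(4)]
cube[0][0][0] = 1.0
cube[0][0][1] = 1.0  # touches the previous cell: same component
cube[3][3][3] = 1.0  # isolated cell: second component

n_components = betti0(cube, 0.5)
```

Higher Betti numbers (tunnels and cavities) need a proper homology computation and are not covered by this sketch.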
The true SBI situation here
  • $ \bar{x}_{\rm HI} $ is an output of the simulation: we do not directly sample it
  • Prior on $ \bar{x}_{\rm HI} $ is implicit and non-trivial
  • Direct neural posterior estimation is the way to go!
\[\begin{aligned} p(\bar{x}_{\rm HI} | t) & \propto p(\bar{x}_{\rm HI}) p(t | \bar{x}_{\rm HI}) \\ & \approx q_\phi (\bar{x}_{\rm HI} | t) \end{aligned} \]
  • The inference is model-dependent through both the prior and the likelihood
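
The points above can be illustrated with a toy sketch, assuming a hypothetical one-parameter simulator in which $ \bar{x}_{\rm HI} $ is an output rather than a sampled input. A linear-Gaussian conditional density stands in for the neural estimator $ q_\phi $ (a deliberate simplification, not the network used in this work). It is fitted by maximising the mean log-density over simulated pairs, which for this family reduces to least squares. The simulator, noise level and all variable names are invented for the example.

```python
import math
import random

rng = random.Random(1)

# Hypothetical simulator: theta is sampled, x_hi is a derived output,
# so the prior on x_hi is implicit (here non-uniform on [0, 1]).
pairs = []
for _ in range(5000):
    theta = rng.random()             # sampled simulation parameter
    x_hi = theta ** 2                # simulation output, never sampled directly
    t = x_hi + rng.gauss(0.0, 0.1)   # noisy summary statistic
    pairs.append((t, x_hi))

# Fit q(x_hi | t) = Normal(slope * t + intercept, sigma^2) by maximum likelihood.
n = len(pairs)
mean_t = sum(t for t, _ in pairs) / n
mean_x = sum(x for _, x in pairs) / n
cov_tx = sum((t - mean_t) * (x - mean_x) for t, x in pairs) / n
var_t = sum((t - mean_t) ** 2 for t, _ in pairs) / n
slope = cov_tx / var_t
intercept = mean_x - slope * mean_t
sigma = math.sqrt(sum((x - (slope * t + intercept)) ** 2 for t, x in pairs) / n)


def log_q(x_hi, t):
    """Log-density of the fitted approximate posterior q(x_hi | t)."""
    mu = slope * t + intercept
    return -0.5 * ((x_hi - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))
```

The fit only ever sees pairs of summary and $ \bar{x}_{\rm HI} $ drawn from the simulator, so the implicit prior and the likelihood are both absorbed into the learnt density without being written down.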
SBI methodology

We apply Variational Mutual Information Maximization (VMIM, Jeffrey+20) to learn $ p(\bar{x}_{\rm HI} | t) $ by:
  • Compressing the stacked statistics $ [t_1, t_2, t_3] $ into a 3D vector $ y $
  • Maximizing the mutual information between $ y $ and $ \bar{x}_{\rm HI} $
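
One way to read the two steps above is through the standard variational lower bound on the mutual information. The notation here is introduced for illustration only: $ y = f_\psi(t) $ is the compressor acting on the stacked statistics and $ q_\phi $ is a conditional density estimator.

\[\begin{aligned} I(y; \bar{x}_{\rm HI}) & = H(\bar{x}_{\rm HI}) - H(\bar{x}_{\rm HI} | y) \\ & \geq H(\bar{x}_{\rm HI}) + \mathbb{E}_{p(\bar{x}_{\rm HI}, t)} \left[ \ln q_\phi (\bar{x}_{\rm HI} | f_\psi(t)) \right] \end{aligned} \]

The entropy $ H(\bar{x}_{\rm HI}) $ does not depend on $ \phi $ or $ \psi $, so maximizing the bound amounts to minimizing the loss $ -\mathbb{E} \left[ \ln q_\phi (\bar{x}_{\rm HI} | f_\psi(t)) \right] $ over simulated pairs. The compressor and the posterior estimator are trained jointly.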

Can you trust me?
SBI studies should always provide robust statistical validation of the learnt posteriors
  • For each case we train 20+ models to check the stability of our approach
  • We systematically perform coverage tests and select the best model for each case
Tests of Accuracy with Random Points (TARP) run on the validation set for different cases
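
The idea behind such a coverage test can be sketched in a few lines for a one-dimensional parameter. This is a minimal illustration under stated assumptions, not the code used for the curves: the function name `tarp_coverage`, the uniform reference points and the Gaussian toy problem (where the exact posterior is known) are all choices made for this example.

```python
import random


def tarp_coverage(posterior_samples, truths, alphas, rng):
    """Expected coverage probability at each credibility level in `alphas`.

    For every simulation, a random reference point is drawn and we measure the
    fraction of posterior samples lying closer to it than the true value does.
    For a calibrated posterior these fractions are uniform on [0, 1].
    """
    lo, hi = min(truths), max(truths)
    fractions = []
    for samples, truth in zip(posterior_samples, truths):
        ref = rng.uniform(lo, hi)          # reference point, independent of the truth
        radius = abs(truth - ref)
        inside = sum(abs(s - ref) < radius for s in samples)
        fractions.append(inside / len(samples))
    n = len(fractions)
    return [sum(f < a for f in fractions) / n for a in alphas]


# Toy problem: x ~ N(0, 1), t = x + N(0, noise_var), so the exact posterior is
# N(t / (1 + noise_var), noise_var / (1 + noise_var)).
rng = random.Random(0)
noise_var = 0.25
post_std = (noise_var / (1 + noise_var)) ** 0.5
truths, posteriors = [], []
for _ in range(2000):
    x = rng.gauss(0.0, 1.0)
    t = x + rng.gauss(0.0, noise_var ** 0.5)
    post_mean = t / (1 + noise_var)
    posteriors.append([rng.gauss(post_mean, post_std) for _ in range(200)])
    truths.append(x)

alphas = [0.1 * k for k in range(1, 10)]
ecp = tarp_coverage(posteriors, truths, alphas, rng)
```

For a well-calibrated posterior, `ecp` follows the diagonal (coverage close to the credibility level). Systematic departures from the diagonal signal biased, over-confident or under-confident posteriors.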
Results: Individual test scenarios
Good, but these are individual posteriors. What about the big picture?
Results: global trends
  • $ \sigma(\bar{x}_{\rm HI}) $ decreases by ~40% from 100h to 1000h (but this depends strongly on the statistic)
  • PS2D + higher-order stats: $ \sigma(\bar{x}_{\rm HI}) $ decreases by ~60% at 100h and ~40% at 1000h
Results: the relative strength of each statistic

  • Betti numbers are more informative towards the end of EoR
  • Power spectrum is more constraining in the early EoR
  • Bispectrum mildly contributes to early EoR constraints
Thanks for your attention!

Conclusions
  • Based on our work for SDC3b, we investigated the reionisation history through several summary statistics
  • We employed 2-pt statistics (1D and 2D power spectra) together with higher-order statistics (bispectrum and Betti numbers)
  • We applied an SBI framework to infer $ \bar{x}_{\rm HI} $ from these statistics
  • Betti numbers (alone and combined with others) are particularly promising
  • We also identify different regimes of relevance for each statistic

[Cerardi et al., submitted to MNRAS, 2025]