SEarCH SBI 21cm

Implicit inference of the reionization history with higher-order statistics of the 21-cm signal

SKACH Winter Days, CSCS, Lugano, January 2026

Nicolas Cerardi

On behalf of the SEarCH team: Sambit Giri, Michele Bianco, Davide Piras, Emmanuel de Salis, Massimo de Santis, Merve Selcuk-Simsek, Philipp Denzel, Kelley Hess, Carmen Toribio, Franz Kirsten & Hatem Ghorbel

If you missed the last semester of SEarCH...

Participated in the SKA Data Challenge 3b (SDC3b): Inference
Goal: infer the neutral fraction $ \bar{x}_{\rm HI} $ from SKA-Low mocks at three different redshifts $ z $ to constrain the Epoch of Reionisation

How did we tackle this problem?

Our strategy

Created a large dataset of ~16000 SKA-Low mocks
- Included instrumental noise but no foregrounds

Trained two independent deep-learning (DL) approaches
- Regression of $ \bar{x}_{\rm HI} $ with deep CNNs [De Salis+25]
- Bayesian posterior estimation with simulation-based inference (SBI)

"I have a complex and noisy model which doesn't provide an explicit likelihood"
$ \Rightarrow $ learn the posterior directly from simulations!

Our results at SDC3b

Obtained great results on the PS2 data 🎉

But were not ranked on the PS1 data

Goals of today's talk

Go beyond the data challenge with higher-order statistics
Give details on our SBI strategy
Provide statistically robust results

Extending our modelling

Rejected extreme reionisation scenarios
Doubled the dataset by applying additional noise realisations to the mocks
Tested the noise-level dependence with SKA-Low noise for 100h and 1000h of observation
Computed more statistics from the mocks
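As a rough illustration of the noise augmentation and of the 100h vs 1000h comparison, here is a minimal sketch under a strong simplification: white Gaussian noise added in image space, with only the radiometer scaling $ \sigma \propto t_{\rm obs}^{-1/2} $ kept. Real SKA-Low noise is generated in the uv plane and is not white; the helper `add_thermal_noise` and the reference rms `sigma_ref` are hypothetical.

```python
import numpy as np

def add_thermal_noise(cube, sigma_ref, t_ref_h, t_obs_h, rng):
    """Add one white-noise realisation to a brightness-temperature cube.

    Hypothetical helper: keeps only the radiometer scaling
    sigma ∝ t_obs**-0.5, not the uv-plane structure of real SKA-Low noise.
    """
    sigma = sigma_ref * np.sqrt(t_ref_h / t_obs_h)
    return cube + rng.normal(0.0, sigma, size=cube.shape)

rng = np.random.default_rng(42)
clean = np.zeros((64, 64, 64))  # stand-in for one noiseless 21-cm mock

# Several independent noise realisations per mock enlarge the training set
noisy_100h = [add_thermal_noise(clean, 1.0, 100, 100, rng) for _ in range(2)]
noisy_1000h = [add_thermal_noise(clean, 1.0, 100, 1000, rng) for _ in range(2)]

# 10x longer integration lowers the noise rms by sqrt(10), i.e. to ~0.316
ratio = noisy_1000h[0].std() / noisy_100h[0].std()
```

Under this simplification, going from 100h to 1000h reduces the noise rms by a factor of about 3.2.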

More statistics!

We compute several kinds of statistics $ t $ from the 21-cm cubes

2pt statistics

1D Power Spectrum

2D Power Spectrum
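For readers less familiar with the two 2-pt statistics, here is a minimal numpy sketch of a spherically averaged (1D) and a cylindrically averaged (2D) power spectrum. It assumes a periodic cubic box of side `box_len`, the last axis along the line of sight, and the convention $ P(k) = V |\tilde\delta_k|^2 / N_{\rm cells}^2 $; the binning choices are illustrative and not the pipeline used in this work.

```python
import numpy as np

def power_spectra(cube, box_len, n_bins=8):
    """Return (k_centres, P_1D, counts_1D, P_2D) for a periodic cubic box.

    Assumptions: side box_len (e.g. cMpc), last axis = line of sight,
    P(k) = V |FFT(delta)|^2 / N_cells^2.
    """
    n = cube.shape[0]
    delta = cube - cube.mean()
    power = np.abs(np.fft.fftn(delta)) ** 2 * box_len**3 / cube.size**2
    k_axis = 2 * np.pi * np.fft.fftfreq(n, d=box_len / n)
    kx, ky, kz = np.meshgrid(k_axis, k_axis, k_axis, indexing="ij")
    k_perp = np.hypot(kx, ky)   # transverse (on-sky) wavenumber
    k_par = np.abs(kz)          # line-of-sight wavenumber
    k_mag = np.hypot(k_perp, k_par)

    # 1D: average over spherical shells of |k|; k = 0 lies below the first edge
    edges = np.linspace(2 * np.pi / box_len, k_mag.max(), n_bins + 1)
    counts, _ = np.histogram(k_mag, bins=edges)
    sums, _ = np.histogram(k_mag, bins=edges, weights=power)
    p1d = sums / np.maximum(counts, 1)

    # 2D: average over (k_perp, k_par) cells, dropping the k = 0 (mean) mode
    kp, kl, pw = k_perp.ravel(), k_par.ravel(), power.ravel()
    sel = k_mag.ravel() > 0
    bins2d = [np.linspace(0, kp.max(), n_bins + 1), np.linspace(0, kl.max(), n_bins + 1)]
    c2d, _, _ = np.histogram2d(kp[sel], kl[sel], bins=bins2d)
    s2d, _, _ = np.histogram2d(kp[sel], kl[sel], bins=bins2d, weights=pw[sel])
    p2d = s2d / np.maximum(c2d, 1)
    return 0.5 * (edges[1:] + edges[:-1]), p1d, counts, p2d

# Sanity check: unit-variance white noise has a flat spectrum equal to the cell volume
rng = np.random.default_rng(0)
cube = rng.normal(size=(32, 32, 32))
k, p1d, counts, p2d = power_spectra(cube, box_len=64.0)  # cell volume = 2**3 = 8
```

The 2D version keeps the $ (k_\perp, k_\parallel) $ split, which is where line-of-sight effects and noise anisotropies show up; the 1D version averages them out.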

Higher-order statistics

Bispectrum (equilateral & squeezed)

Betti numbers (topological invariants)

+ combinations of these metrics
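To make the Betti numbers concrete, here is a minimal sketch restricted to a 2D slice and a single threshold (a simplification: the statistic in this work is computed from 3D cubes). $ \beta_0 $ counts the connected regions of the excursion set `field > threshold`, and $ \beta_1 $ counts the holes it encloses. The helper names are hypothetical, and the slice is treated as non-periodic, so background regions touching the edge are not counted as holes.

```python
import numpy as np
from collections import deque

N4 = [(1, 0), (-1, 0), (0, 1), (0, -1)]
N8 = N4 + [(1, 1), (1, -1), (-1, 1), (-1, -1)]

def count_components(mask, neighbours):
    """Return (number of components, number not touching the border) of True pixels."""
    seen = np.zeros_like(mask, dtype=bool)
    ny, nx = mask.shape
    n_comp, interior = 0, 0
    for i in range(ny):
        for j in range(nx):
            if mask[i, j] and not seen[i, j]:
                n_comp += 1
                border = False
                queue = deque([(i, j)])
                seen[i, j] = True
                while queue:  # breadth-first flood fill of one component
                    a, b = queue.popleft()
                    if a in (0, ny - 1) or b in (0, nx - 1):
                        border = True
                    for da, db in neighbours:
                        c, d = a + da, b + db
                        if 0 <= c < ny and 0 <= d < nx and mask[c, d] and not seen[c, d]:
                            seen[c, d] = True
                            queue.append((c, d))
                if not border:
                    interior += 1
    return n_comp, interior

def betti_2d(field, threshold):
    """(beta_0, beta_1) of the excursion set field > threshold in a 2D slice."""
    excursion = field > threshold
    b0, _ = count_components(excursion, N8)   # 8-connected islands
    _, b1 = count_components(~excursion, N4)  # 4-connected enclosed holes (dual connectivity)
    return b0, b1

y, x = np.mgrid[-10:11, -10:11]
r = np.hypot(x, y)
ring = ((r > 4) & (r < 7)).astype(float)  # one island with one hole
discs = ((np.hypot(x - 5, y) < 3) | (np.hypot(x + 5, y) < 3)).astype(float)  # two islands
```

Sweeping `threshold` over the brightness-temperature range turns these counts into Betti curves, which is the kind of curve a summary statistic $ t $ can be built from.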

The actual SBI setting here

$ \bar{x}_{\rm HI} $ is an output of the simulation: we do not directly sample it
Prior on $ \bar{x}_{\rm HI} $ is implicit and non-trivial
Direct neural posterior estimation is the way to go!

\[\begin{aligned} p(\bar{x}_{\rm HI} | t) & \propto p(\bar{x}_{\rm HI}) \, p(t | \bar{x}_{\rm HI}) \\ & \approx q_\phi (\bar{x}_{\rm HI} | t) \end{aligned} \]

Our results depend on the model through both the prior and the likelihood
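Why the prior on $ \bar{x}_{\rm HI} $ is implicit can be seen with a toy example, which is purely hypothetical and not the simulator used here. A uniform prior on one input parameter, a reionisation midpoint `z_mid`, is pushed through a tanh-like ionisation history evaluated at a fixed redshift `z_obs` with a fixed width `delta_z`. The induced distribution of $ \bar{x}_{\rm HI} $ is far from uniform, and a posterior estimator trained on simulated pairs inherits this prior automatically.

```python
import numpy as np

rng = np.random.default_rng(0)
z_obs = 8.0    # hypothetical observed redshift
delta_z = 0.5  # hypothetical, fixed width of the transition

# Uniform prior on the *input* parameter only
z_mid = rng.uniform(6.0, 10.0, size=100_000)

# Toy tanh-like history: x_HI -> 1 when reionisation happens later (z_mid < z_obs)
x_hi = 1.0 / (1.0 + np.exp(-(z_obs - z_mid) / delta_z))

hist, _ = np.histogram(x_hi, bins=10, range=(0, 1))
# A uniform prior on x_HI would put 20% of the mass in the two outer bins
tail_fraction = np.mean((x_hi < 0.1) | (x_hi > 0.9))
```

In this toy model about 45% of the samples land in the two outer bins, so the implied prior on $ \bar{x}_{\rm HI} $ is strongly U-shaped.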

SBI methodology

We apply Variational Mutual Information Maximization (VMIM, Jeffrey+20) to learn $ p(\bar{x}_{\rm HI} | t) $ by:

Compressing the stacked statistics $ [t_1, t_2, t_3] $ into a 3D vector $ y $
Maximizing the mutual information between $ y $ and $ \bar{x}_{\rm HI} $
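The VMIM objective rests on the variational bound $ I(\bar{x}_{\rm HI}; y) \geq H(\bar{x}_{\rm HI}) + \mathbb{E}[\log q(\bar{x}_{\rm HI} | y)] $. Maximising the average log-density over the compressor and the density estimator therefore maximises a lower bound on the mutual information. In practice (Jeffrey+20) the compressor is a neural network and $ q $ a flexible conditional density estimator. The sketch below deliberately swaps both for a linear compressor and a single Gaussian, for which the optimum is closed-form: least squares for the compressor, residual covariance for $ q $. The toy data and helper names are hypothetical.

```python
import numpy as np

def fit_vmim_linear_gaussian(t, theta):
    """Maximise mean log q(theta | y) for y = [t, 1] @ W and q = N(theta; y, Sigma).

    Simplified stand-in for VMIM: for this linear-Gaussian family the optimum
    is least squares for W and the residual covariance for Sigma.
    """
    t1 = np.hstack([t, np.ones((t.shape[0], 1))])  # bias column
    W, *_ = np.linalg.lstsq(t1, theta, rcond=None)
    resid = theta - t1 @ W
    sigma = resid.T @ resid / len(theta)
    return W, sigma

def mean_log_q(t, theta, W, sigma):
    """Monte Carlo estimate of E[log q(theta | y)] over (t, theta) pairs."""
    t1 = np.hstack([t, np.ones((t.shape[0], 1))])
    resid = theta - t1 @ W
    _, logdet = np.linalg.slogdet(sigma)
    maha = np.einsum("ij,ij->i", resid, np.linalg.solve(sigma, resid.T).T)
    p = theta.shape[1]
    return np.mean(-0.5 * (maha + logdet + p * np.log(2 * np.pi)))

# Toy data: 3 parameters (e.g. one x_HI per redshift) seen through 20 noisy summaries
rng = np.random.default_rng(0)
theta = rng.uniform(0, 1, size=(5000, 3))
A = rng.normal(size=(3, 20))
t = theta @ A + 0.05 * rng.normal(size=(5000, 20))

W, sigma = fit_vmim_linear_gaussian(t, theta)
# H(theta) = 0 nats for a uniform prior on [0, 1]^3, so the bound is just E[log q]
mi_lower_bound = 0.0 + mean_log_q(t, theta, W, sigma)
```

Here $ y $ is 3D, matching the number of parameters, and the bound is evaluated on the training pairs for brevity. A real setup would evaluate it on held-out simulations.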

Can you trust me?

SBI studies should always provide robust statistical validation of the learnt posteriors

For each case, we train 20+ models to check the stability of our approach
We systematically perform coverage tests and select the best model for each case

Tests of Accuracy with Random Points (TARP) on the validation set for different cases
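The idea behind a TARP coverage test (Lemos et al. 2023) fits in a few lines. For each validation simulation, draw a random reference point, then compute the fraction $ f $ of posterior samples that lie closer to it than the true parameter does. For a calibrated posterior $ f $ is uniform, so the expected coverage probability $ {\rm ECP}(\alpha) = \mathbb{E}[f < \alpha] $ follows the diagonal. The sketch below is a minimal numpy version checked on a toy Gaussian problem with a known posterior; it is not the implementation used for the figures.

```python
import numpy as np

def tarp_ecp(post_samples, theta_true, ref_points, alphas):
    """Expected coverage probability (ECP) at each credibility level alpha.

    post_samples: (n_sims, n_samples, p) draws from q(theta | t_i)
    theta_true:   (n_sims, p) parameters that generated each t_i
    ref_points:   (n_sims, p) reference points drawn independently of theta_true
    """
    d_samples = np.linalg.norm(post_samples - ref_points[:, None, :], axis=-1)
    d_true = np.linalg.norm(theta_true - ref_points, axis=-1)
    # f_i: fraction of posterior samples inside the ball around the reference
    # point whose radius reaches theta_true; uniform if the posterior is calibrated
    f = np.mean(d_samples < d_true[:, None], axis=1)
    return np.array([np.mean(f < a) for a in alphas])

# Toy problem with a known posterior: theta ~ N(0, 1), t = theta + N(0, 0.5^2)
rng = np.random.default_rng(1)
n_sims, n_samples, noise = 2000, 500, 0.5
theta = rng.normal(size=(n_sims, 1))
t = theta + noise * rng.normal(size=(n_sims, 1))
post_mean = t / (1 + noise**2)
post_std = noise / np.sqrt(1 + noise**2)
ref = rng.uniform(-4, 4, size=(n_sims, 1))
alphas = np.linspace(0.05, 0.95, 19)

def draw(std):
    """Posterior samples with a given width (std = post_std is the exact posterior)."""
    return post_mean[:, None, :] + std * rng.normal(size=(n_sims, n_samples, 1))

ecp_calibrated = tarp_ecp(draw(post_std), theta, ref, alphas)
ecp_overconfident = tarp_ecp(draw(0.5 * post_std), theta, ref, alphas)
```

The exact posterior gives an ECP curve that tracks $ \alpha $. A posterior that is too narrow by a factor of two bends visibly away from the diagonal, which is the kind of failure such coverage tests are meant to catch.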

Results: Individual test scenarios

Good, but these are individual posteriors. What about the big picture?

Results: global trends

$ \sigma(\bar{x}_{\rm HI}) $ decreases by ~40% from 100h to 1000h (but this depends strongly on the statistic)
PS2D + higher-order statistics: $ \sigma(\bar{x}_{\rm HI}) $ decreases by ~60% at 100h and ~40% at 1000h

Results: the relative strength of each statistic

Betti numbers are more informative towards the end of the EoR
The power spectrum is more constraining in the early EoR
The bispectrum contributes mildly to early-EoR constraints

Thanks for your attention!

Conclusions

Based on our work for SDC3b, we investigated the reionisation history through several summary statistics
We combined 2-pt statistics (1D and 2D power spectra) with higher-order statistics (bispectrum and Betti numbers)
We applied an SBI framework to infer the neutral fraction from these statistics
Betti numbers (alone and combined with others) are particularly promising
We also identified the regimes in which each statistic is most relevant

[Cerardi et al., submitted to MNRAS, 2025]

nicolas-cerardi.github.io/talks/SKACHWinter2026_SEarCH