I have used machine learning and deep learning methods, and AI approaches more generally, in a number of different projects. These include direct applications to observational data to increase the accuracy and speed of predictions.
Most of my applications of these methods have, however, aimed to increase the efficiency and applicability of numerical simulations. As an example, in (Lovell et al., 2022) we used simple tree-based regression methods to learn the galaxy-halo relationship in a series of zoom simulations, and then applied this back to a large dark-matter-only parent volume. This allowed us to predict clustering statistics over large volumes and dynamic ranges while still drawing on the results of high-fidelity hydrodynamic simulations.
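The idea of learning a mapping from halo properties to galaxy properties and then applying it to haloes from a dark-matter-only volume can be illustrated with a minimal sketch. It assumes a single synthetic feature (log halo mass) and a single synthetic target (log stellar mass), and uses one hand-written CART-style regression tree. The published work used tree ensembles (extremely randomised trees) on many halo properties, so the names and toy data here are hypothetical.

```python
# Minimal sketch: learn a halo -> galaxy mapping with a single regression tree.
# Assumptions (illustrative only): each halo is a tuple of features such as
# log halo mass, the target is one galaxy property such as log stellar mass,
# and the toy data below are synthetic. The published work used tree ensembles
# (extremely randomised trees), not this hand-written CART-style tree.
from statistics import mean


def sse(ys):
    """Sum of squared deviations from the mean (regression-tree impurity)."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(X, y, depth=0, max_depth=4, min_leaf=2):
    """Recursively choose the (feature, threshold) split that minimises total SSE."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)  # leaf node: predict the mean target value
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, j, t)
    if best is None:
        return mean(y)
    _, j, t = best
    left = [(row, yi) for row, yi in zip(X, y) if row[j] <= t]
    right = [(row, yi) for row, yi in zip(X, y) if row[j] > t]
    children = [
        build_tree([r for r, _ in side], [v for _, v in side], depth + 1, max_depth, min_leaf)
        for side in (left, right)
    ]
    return (j, t, children[0], children[1])


def predict(node, x):
    """Descend internal nodes (tuples) until reaching a leaf value (a float)."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


# Hypothetical "zoom" training set: log halo mass -> log stellar mass.
train_X = [(10.0 + 0.1 * i,) for i in range(40)]
train_y = [0.9 * x[0] - 1.5 for x in train_X]
tree = build_tree(train_X, train_y)

# Apply the trained mapping to haloes from a hypothetical dark-matter-only volume.
dmo_halos = [(10.55,), (12.3,), (13.1,)]
preds = [predict(tree, h) for h in dmo_halos]
```

A tree cannot extrapolate. Any halo outside the training range receives the value of the nearest edge leaf. This is one reason why adding zooms of rare, overdense environments to the training set extends the range of haloes the model can be applied to.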
Figure showing the framework used in (Lovell et al., 2022) to learn the galaxy-halo relationship from a series of zooms, and apply it to a large dark-matter-only parent simulation.
In (Lovell et al., 2023) we extended this framework by combining the CAMELS simulations with normalising flows, for probabilistic modelling of the galaxy-halo relationship. Finally, Maxwell Maltz, a student at the University of Sussex, has explored adding stochastic scatter to tree-based predictions to better recover the covariances in common distribution functions (Maltz et al., 2024).
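A deterministic regressor learns only the mean relation, so its predictions are too narrowly distributed. One simple way to restore the spread is to add sampled noise afterwards. The minimal sketch below assumes the noise is taken from empirical residuals (truth minus prediction) and resampled with replacement. This is a swapped-in illustration: the FLARES work sampled the scatter measured between simulation re-runs with different subgrid random seeds. The toy relation and its 0.3 dex scatter are hypothetical.

```python
import random
from statistics import pstdev

# Minimal sketch of adding scatter post hoc to deterministic predictions.
# Assumption: the noise model is the empirical residual distribution,
# resampled with replacement (not the seed-to-seed scatter used in FLARES).


def residuals(truth, pred):
    """Differences between true values and deterministic predictions."""
    return [t - p for t, p in zip(truth, pred)]


def add_scatter(pred, resid, rng):
    """Add one randomly drawn residual to each deterministic prediction."""
    return [p + rng.choice(resid) for p in pred]


rng = random.Random(42)
# Hypothetical toy data: true property = smooth trend + intrinsic 0.3 dex scatter.
x = [rng.uniform(10.0, 13.0) for _ in range(2000)]
truth = [0.9 * xi - 1.5 + rng.gauss(0.0, 0.3) for xi in x]
# A deterministic model that has learnt only the mean trend.
pred = [0.9 * xi - 1.5 for xi in x]

resid = residuals(truth, pred)
scattered = add_scatter(pred, resid, rng)

# The deterministic predictions are narrower than the truth; the scattered ones are not.
print(round(pstdev(truth), 3), round(pstdev(pred), 3), round(pstdev(scattered), 3))
```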
I have also applied convolutional neural networks to galaxy spectra to recover star formation histories (Lovell et al., 2019), and used dimensionality reduction techniques to represent stellar population synthesis (SPS) models (Lovell, 2021). With collaborators, I have worked on tree models applied to synthetic absorption spectra (Appleby et al., 2023), graph neural networks and symbolic regression for cosmological inference (Shao et al., 2023), and methods for estimating generalization error (Acquaviva et al., 2020).
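For the dimensionality reduction of SPS spectra, the Sengi work (Lovell, 2021) uses non-negative matrix factorisation (NMF), writing a grid of spectra V as a product W H of non-negative coefficients and basis spectra. The minimal sketch below assumes the standard Lee and Seung multiplicative updates for the squared (Frobenius) loss. This is not necessarily the solver Sengi uses, and the two-component toy "spectra" are hypothetical.

```python
import random

# Minimal sketch: NMF via Lee and Seung multiplicative updates, V ~= W @ H.
# Assumptions: each row of V is a toy spectrum on a shared wavelength grid;
# W holds per-spectrum coefficients and H holds k non-negative basis spectra.


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frob_error(V, W, H):
    """Squared Frobenius norm of the reconstruction residual V - W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for vr, wr in zip(V, WH) for v, r in zip(vr, wr))


def nmf(V, k, n_iter=500, seed=0, eps=1e-12):
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]
    H = [[rng.random() for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H); ratios of non-negatives keep H >= 0
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[h * nu / (de + eps) for h, nu, de in zip(hr, nr, dr)]
             for hr, nr, dr in zip(H, num, den)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[w * nu / (de + eps) for w, nu, de in zip(wr, nr, dr)]
             for wr, nr, dr in zip(W, num, den)]
    return W, H


# Hypothetical toy grid: five spectra mixed from two non-negative shapes.
basis = [[1.0, 0.8, 0.5, 0.2, 0.1, 0.05],
         [0.1, 0.2, 0.4, 0.7, 0.9, 1.0]]
weights = [[0.9, 0.1], [0.7, 0.3], [0.4, 0.6], [0.1, 0.9], [0.5, 0.5]]
V = matmul(weights, basis)
W, H = nmf(V, k=2)
```

Once the coefficients W are fitted, new spectra can be estimated by interpolating the coefficients across the parameter grid and multiplying by H. This is why storing W and H can be much cheaper than storing the full grid.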
Simulation-based inference (SBI) approaches can be considered a branch of AI and deep learning, and I have done considerable work applying these methods to numerical simulations (Lovell et al., 2024).
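The core loop of simulation-based (likelihood-free) inference is to draw parameters from a prior, run the simulator, compare simulated and observed summaries, and keep the parameters that match. The minimal sketch below uses rejection ABC, the simplest such algorithm, as a swapped-in illustration. Modern SBI work, including inference from luminosity functions and colours, typically trains neural density estimators instead. The Gaussian toy simulator, prior range and tolerance are all hypothetical.

```python
import random
from statistics import mean

# Minimal sketch of likelihood-free inference via rejection ABC.
# Assumptions: a toy simulator producing Gaussian data with unknown mean theta,
# a uniform prior, and the sample mean as the summary statistic.


def simulator(theta, n, rng):
    """Hypothetical forward model: n draws from N(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Compress a dataset to a single summary statistic."""
    return mean(data)


def rejection_abc(observed, prior_low, prior_high, n_sims, tol, rng):
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_sims):
        theta = rng.uniform(prior_low, prior_high)  # draw from the prior
        s_sim = summary(simulator(theta, len(observed), rng))
        if abs(s_sim - s_obs) < tol:  # keep parameters whose simulations match
            accepted.append(theta)
    return accepted


rng = random.Random(1)
observed = simulator(0.7, 50, rng)  # pretend data generated with theta = 0.7
posterior = rejection_abc(observed, -3.0, 3.0, 10000, 0.05, rng)
print(len(posterior), round(mean(posterior), 3))
```

The accepted samples approximate the posterior without ever evaluating a likelihood. Shrinking the tolerance sharpens the approximation but lowers the acceptance rate, which is the main motivation for the neural estimators used in practice.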
References
2024
-
First Light and Reionisation Epoch Simulations (FLARES) XVII: Learning the galaxy-halo connection at high redshifts
Maxwell G. A. Maltz, Peter A. Thomas, Christopher C. Lovell, and 6 more authors
Oct 2024
arXiv:2410.24082
Understanding the galaxy-halo relationship is not only key for elucidating the interplay between baryonic and dark matter, it is essential for creating large mock galaxy catalogues from N-body simulations. High-resolution hydrodynamical simulations are limited to small volumes by their large computational demands, hindering their use for comparisons with wide-field observational surveys. We overcome this limitation by using the First Light and Reionisation Epoch Simulations (FLARES), a suite of high-resolution (M_gas = 1.8 x 10^6 M_Sun) zoom simulations drawn from a large, (3.2 cGpc)^3 box. We use an extremely randomised trees machine learning approach to model the relationship between galaxies and their subhaloes in a wide range of environments. This allows us to build mock catalogues with dynamic ranges that surpass those obtainable through periodic simulations. The low cost of the zoom simulations facilitates multiple runs of the same regions, differing only in the random number seed of the subgrid models; changing this seed introduces a butterfly effect, leading to random differences in the properties of matching galaxies. This randomness cannot be learnt by a deterministic machine learning model, but by sampling the noise and adding it post-facto to our predictions, we are able to recover the distributions of the galaxy properties we predict (stellar mass, star formation rate, metallicity, and size) remarkably well. We also explore the resolution-dependence of our models’ performances and find minimal depreciation down to particle resolutions of order M_DM ~ 10^8 M_Sun, enabling the future application of our models to large dark matter-only boxes.
-
Learning the Universe: Cosmological and Astrophysical Parameter Inference with Galaxy Luminosity Functions and Colours
Christopher C. Lovell, Tjitske Starkenburg, Matthew Ho, and 9 more authors
Nov 2024
arXiv:2411.13960
We perform the first direct cosmological and astrophysical parameter inference from the combination of galaxy luminosity functions and colours using a simulation based inference approach. Using the Synthesizer code we simulate the dust attenuated ultraviolet–near infrared stellar emission from galaxies in thousands of cosmological hydrodynamic simulations from the CAMELS suite, including the Swift-EAGLE, Illustris-TNG, Simba & Astrid galaxy formation models. For each galaxy we calculate the rest-frame luminosity in a number of photometric bands, including the SDSS ugriz and GALEX FUV & NUV filters; this dataset represents the largest catalogue of synthetic photometry based on hydrodynamic galaxy formation simulations produced to date, totalling >200 million sources. From these we compile luminosity functions and colour distributions, and find clear dependencies on both cosmology and feedback. We then perform simulation based (likelihood-free) inference using these distributions, and obtain constraints on both cosmological and astrophysical parameters. Both colour distributions and luminosity functions provide complementary information on certain parameters when performing inference. Most interestingly we achieve constraints on σ_8 describing the clustering of matter. This is attributable to the fact that the photometry encodes the star formation–metal enrichment history of each galaxy; galaxies in a universe with a higher σ_8 tend to form earlier and have higher metallicities, which leads to redder colours. We find that a model trained on one galaxy formation simulation generalises poorly when applied to another, and attribute this to differences in the subgrid prescriptions, and lack of flexibility in our emission modelling. The photometric catalogues are publicly available at: https://camels.readthedocs.io/ .
2023
-
A Hierarchy of Normalizing Flows for Modelling the Galaxy-Halo Relationship
Christopher C. Lovell, Sultan Hassan, Daniel Anglés-Alcázar, and 8 more authors
ICML, Jul 2023
arXiv e-prints; ADS Bibcode: 2023arXiv230706967L
Using a large sample of galaxies taken from the Cosmology and Astrophysics with MachinE Learning Simulations (CAMELS) project, a suite of hydrodynamic simulations varying both cosmological and astrophysical parameters, we train a normalizing flow (NF) to map the probability of various galaxy and halo properties conditioned on astrophysical and cosmological parameters. By leveraging the learnt conditional relationships we can explore a wide range of interesting questions, whilst enabling simple marginalisation over nuisance parameters. We demonstrate how the model can be used as a generative model for arbitrary values of our conditional parameters; we generate halo masses and matched galaxy properties, and produce realisations of the halo mass function as well as a number of galaxy scaling relations and distribution functions. The model represents a unique and flexible approach to modelling the galaxy-halo relationship.
-
Mapping circumgalactic medium observations to theory using machine learning
Sarah Appleby, Romeel Davé, Daniele Sorini, and 2 more authors
MNRAS, Oct 2023
Publisher: OUP; ADS Bibcode: 2023MNRAS.525.1167A
We present a random forest (RF) framework for predicting circumgalactic medium (CGM) physical conditions from quasar absorption line observables, trained on a sample of Voigt profile-fit synthetic absorbers from the SIMBA cosmological simulation. Traditionally, extracting physical conditions from CGM absorber observations involves simplifying assumptions such as uniform single-phase clouds, but by using a cosmological simulation we bypass such assumptions to better capture the complex relationship between CGM observables and underlying gas conditions. We train RF models on synthetic spectra for H I and selected metal lines around galaxies across a range of star formation rates, stellar masses, and impact parameters, to predict absorber overdensities, temperatures, and metallicities. The models reproduce the true values from SIMBA well, with normalized transverse standard deviations of 0.50-0.54 dex in overdensity, 0.32-0.54 dex in temperature, and 0.49-0.53 dex in metallicity predicted from metal lines (not H I), across all ions. Examining the feature importance, the RF indicates that the overdensity is most informed by the absorber column density, the temperature is driven by the line width, and the metallicity is most sensitive to the specific star formation rate. Alternatively examining feature importance by removing one observable at a time, the overdensity and metallicity appear to be more driven by the impact parameter. We introduce a normalizing flow approach in order to ensure the scatter in the true physical conditions is accurately spanned by the network. The trained models are available online.
-
A Universal Equation to Predict Ωm from Halo and Galaxy Catalogs
Helen Shao, Natalí S. M. Santi, Francisco Villaescusa-Navarro, and 14 more authors
ApJ, Oct 2023
Publisher: IOP; ADS Bibcode: 2023ApJ...956..149S
We discover analytic equations that can infer the value of Ωm from the positions and velocity moduli of halo and galaxy catalogs. The equations are derived by combining a tailored graph neural network (GNN) architecture with symbolic regression. We first train the GNN on dark matter halos from Gadget N-body simulations to perform field-level likelihood-free inference, and show that our model can infer Ωm with ~6% accuracy from halo catalogs of thousands of N-body simulations run with six different codes: Abacus, CUBEP3M, Gadget, Enzo, PKDGrav3, and Ramses. By applying symbolic regression to the different parts comprising the GNN, we derive equations that can predict Ωm from halo catalogs of simulations run with all of the above codes with accuracies similar to those of the GNN. We show that, by tuning a single free parameter, our equations can also infer the value of Ωm from galaxy catalogs of thousands of state-of-the-art hydrodynamic simulations of the CAMELS project, each with a different astrophysics model, run with five distinct codes that employ different subgrid physics: IllustrisTNG, SIMBA, Astrid, Magneticum, SWIFT-EAGLE. Furthermore, the equations also perform well when tested on galaxy catalogs from simulations covering a vast region in parameter space that samples variations in 5 cosmological and 23 astrophysical parameters. We speculate that the equations may reflect the existence of a fundamental physics relation between the phase-space distribution of generic tracers and Ωm, one that is not affected by galaxy formation physics down to scales as small as 10 h^-1 kpc.
2022
-
A machine learning approach to mapping baryons on to dark matter haloes using the EAGLE and C-EAGLE simulations
Christopher C. Lovell, Stephen M. Wilkins, Peter A. Thomas, and 4 more authors
MNRAS, Feb 2022
ADS Bibcode: 2022MNRAS.509.5046L
High-resolution cosmological hydrodynamic simulations are currently limited to relatively small volumes due to their computational expense. However, much larger volumes are required to probe rare, overdense environments, and measure clustering statistics of the large-scale structure. Typically, zoom simulations of individual regions are used to study rare environments, and semi-analytic models and halo occupation models applied to dark-matter-only (DMO) simulations are used to study the Universe in the large-volume regime. We propose a new approach, using a machine learning framework, to explore the halo-galaxy relationship in the periodic EAGLE simulations, and zoom C-EAGLE simulations of galaxy clusters. We train a tree-based machine learning method to predict the baryonic properties of galaxies based on their host dark matter halo properties. The trained model successfully reproduces a number of key distribution functions for an infinitesimal fraction of the computational cost of a full hydrodynamic simulation. By training on both periodic simulations and zooms of overdense environments, we learn the bias of galaxy evolution in differing environments. This allows us to apply the trained model to a larger DMO volume than would be possible if we only trained on a periodic simulation. We demonstrate this application using the (800 Mpc)^3 P-Millennium simulation, and present predictions for key baryonic distribution functions and clustering statistics from the EAGLE model in this large volume.
2021
-
Sengi: A small, fast, interactive viewer for spectral outputs from stellar population synthesis models
C. C. Lovell
A&C, Jan 2021
We present Sengi, (https://christopherlovell.github.io/sengi), an online tool for viewing the spectral outputs of stellar population synthesis (SPS) codes. Typical SPS codes require significant disk space or computing resources to produce spectra for simple stellar populations with arbitrary parameters. This makes it difficult to present their results in an interactive, web-friendly format. Sengi uses Non-negative Matrix Factorisation (NMF) and bilinear interpolation to estimate output spectra for arbitrary values of stellar age and metallicity. The reduced disk requirements and computational expense allows the result to be served as a client-based Javascript application. In this paper we present the method for generating grids of spectra, fitting those grids with NMF, bilinear interpolation across the fitted coefficients, and finally provide estimates of the prediction and interpolation errors.
2020
-
Debunking Generalization Error or: How I Learned to Stop Worrying and Love My Training Set
Viviana Acquaviva, Christopher Lovell, and Emille Ishida
NeurIPS, Nov 2020
We aim to determine some physical properties of distant galaxies (for example, stellar mass, star formation history, or chemical enrichment history) from their observed spectra, using supervised machine learning methods. We know that different astrophysical processes leave their imprint in various regions of the spectra with characteristic signatures. Unfortunately, identifying a training set for this problem is very hard, because labels are not readily available - we have no way of knowing the true history of how galaxies have formed. One possible approach to this problem is to train machine learning models on state-of-the-art cosmological simulations. However, when algorithms are trained on the simulations, it is unclear how well they will perform once applied to real data. In this paper, we attempt to model the generalization error as a function of an appropriate measure of distance between the source domain and the application domain. Our goal is to obtain a reliable estimate of how a model trained on simulations might behave on data.
2019
-
Learning the relationship between galaxies spectra and their star formation histories using convolutional neural networks and cosmological simulations
Christopher C. Lovell, Viviana Acquaviva, Peter A. Thomas, and 3 more authors
MNRAS, Dec 2019
We present a new method for inferring galaxy star formation histories (SFH) using machine learning methods coupled with two cosmological hydrodynamic simulations. We train convolutional neural networks to learn the relationship between synthetic galaxy spectra and high-resolution SFHs from the EAGLE and Illustris models. To evaluate our SFH reconstruction we use Symmetric Mean Absolute Percentage Error (SMAPE), which acts as a true percentage error in the low error regime. On dust-attenuated spectra we achieve high test accuracy (median SMAPE = 10.5 per cent). Including the effects of simulated observational noise increases the error (12.5 per cent), however this is alleviated by including multiple realizations of the noise, which increases the training set size and reduces overfitting (10.9 per cent). We also make estimates for the observational and modelling errors. To further evaluate the generalization properties we apply models trained on one simulation to spectra from the other, which leads to only a small increase in the error (median SMAPE ~ 15 per cent). We apply each trained model to SDSS DR7 spectra, and find smoother histories than in the vespa catalogue. This new approach complements the results of existing spectral energy distribution fitting techniques, providing SFHs directly motivated by the results of the latest cosmological simulations.