Comparing Regression Data

Google Summer of Code 2025 Project Report

Introduction

TARDIS is an open-source Monte Carlo radiative-transfer spectral synthesis code for 1D models of supernova ejecta. It is designed for rapid spectral modelling of supernovae.

TARDIS relies on a regression data framework in its test suite to maintain scientific accuracy. The framework saves large TARDIS objects, such as simulation data, into HDF files, which are tracked in an external repository using Git LFS. During testing, these HDF files are retrieved and the data produced at runtime is compared against the saved dataset to verify the accuracy of the code.

The problem: Regression data is modified occasionally when new TARDIS features are added. HDF files, given their recursive internal structure, are hard to compare with one another. Moreover, when several TARDIS commits each add important features, there was no way to measure the impact of each commit on the regression data or to compare those impacts side by side.

Project Summary

Goal: Build a comprehensive regression data comparison notebook to track and visualize how TARDIS regression files change across different commits.

1. Automated Commit Testing

The process begins by fetching TARDIS commit hashes from the TARDIS repository. Commits can be selected in two ways: by fetching the last n TARDIS commits or by providing a custom list of commits. Either way, the resulting list of commits is passed on to the next step.
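A minimal sketch of the two selection modes, assuming the hashes are read with `git log`. The function name `resolve_commits` and its `runner` parameter are hypothetical and are not taken from tardisbase.

```python
import subprocess


def resolve_commits(repo_path, n=None, custom_commits=None, runner=subprocess.run):
    """Return commit hashes: the custom list if given, else the last n from git.

    Hypothetical helper for illustration; `runner` exists only so the git call
    can be swapped out (for example in a test).
    """
    if custom_commits:
        # Custom list: use it as provided, dropping blanks and stray whitespace.
        return [c.strip() for c in custom_commits if c.strip()]
    if n is None or n < 1:
        raise ValueError("provide custom_commits or a positive n")
    # `git log -n <n> --format=%H` prints one full hash per line, newest first.
    result = runner(
        ["git", "-C", str(repo_path), "log", "-n", str(n), "--format=%H"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()
```

Calling `resolve_commits("path/to/tardis", n=3)` would return the three most recent hashes, while `resolve_commits("path/to/tardis", custom_commits=[...])` skips git entirely.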

2. Environment Management

The system follows a structured workflow that creates a TARDIS environment for each commit to ensure the regression data is reproducible. Several options control this workflow, and it emits detailed, configurable log statements at each step.

By default, the workflow fetches the environment file for each commit and creates a fresh environment in which to process the regression data. TARDIS uses conda lockfiles, which pin precise dependency versions, to recreate the environment.

Once the environment is created, the workflow also fetches the optional dependencies from pyproject.toml automatically (TARDIS declares some pip-based dependencies as optional dependencies for its visualisation modules). After installing them, the workflow proceeds to run the tests for that commit.

The workflow can also reuse environments that were already created in previous runs.

If any step fails, e.g., installing optional dependencies, the workflow falls back to the default environment and continues processing the remaining commits. The system warns the user that such a failure has occurred, since regression data produced after a failed installation is less reliable from a scientific perspective.
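The fallback behaviour can be sketched as follows. This is an illustration under assumptions, not the tardisbase implementation: `choose_environment`, the `setup_steps` structure, and the default environment name `"tardis"` are all hypothetical. The `tardis-test-<first 8 hash characters>` naming follows the log output in this report.

```python
import logging

logger = logging.getLogger(__name__)


def choose_environment(commit, setup_steps, default_env="tardis"):
    """Run the setup steps in order and fall back to default_env on failure.

    setup_steps is a list of (description, callable) pairs; each callable
    receives the environment name and raises an exception if it fails.
    Returns (environment_name, reliable).
    """
    env_name = f"tardis-test-{commit[:8]}"
    for description, step in setup_steps:
        try:
            step(env_name)
        except Exception as exc:
            # Warn loudly: data produced after a failed setup is less trustworthy.
            logger.warning(
                "Step %r failed for commit %s (%s); falling back to %r",
                description, commit, exc, default_env,
            )
            return default_env, False
    return env_name, True
```

Returning a `reliable` flag alongside the environment name lets a caller keep processing commits while still marking which results came from a fallback environment.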

INFO:tardisbase.testing.regression_comparison.run_tests:Original HEAD of regression data repo: 7a61fc3a19eba4d08008c599ce39bdf24ad678ca

INFO:tardisbase.testing.regression_comparison.run_tests:Processing commit 1/3: 701bb18916886ecf2797b0dda4843750f69592da

INFO:tardisbase.testing.regression_comparison.run_tests:Creating conda environment: tardis-test-701bb189

INFO:tardisbase.testing.regression_comparison.run_tests:Checking if environment tardis-test-701bb189 exists...

INFO:tardisbase.testing.regression_comparison.run_tests:Executing command: conda env list

INFO:tardisbase.testing.regression_comparison.run_tests:Command completed successfully.

INFO:tardisbase.testing.regression_comparison.run_tests:Environment tardis-test-701bb189 exists, removing it for recreation...

INFO:tardisbase.testing.regression_comparison.run_tests:Executing command: conda env remove --name tardis-test-701bb189 -y

INFO:tardisbase.testing.regression_comparison.run_tests:Command completed successfully.

INFO:tardisbase.testing.regression_comparison.run_tests:Creating conda environment

INFO:tardisbase.testing.regression_comparison.run_tests:Executing command: conda create --name tardis-test-701bb189 --file /tmp/tmphaib48h8.lock -y

INFO:tardisbase.testing.regression_comparison.run_tests:Command completed successfully.

INFO:tardisbase.testing.regression_comparison.run_tests:Installing TARDIS with all extras ['viz', 'tardisbase']

INFO:tardisbase.testing.regression_comparison.run_tests:Executing command: conda run -n tardis-test-701bb189 pip install -e /home/riddhigangbhoj/tardis-work/tardis[viz,tardisbase]

INFO:tardisbase.testing.regression_comparison.run_tests:Command completed successfully.
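The commands in the log above can be assembled roughly as in the sketch below. The function name `environment_commands` and its parameters are hypothetical. Skipping every step when an existing environment is reused is an assumption of this sketch, since the report does not say what a reused environment still runs.

```python
def environment_commands(commit, lockfile, tardis_path, extras,
                         env_exists=False, reuse=False):
    """Build the conda/pip command lists for one commit (illustrative only)."""
    env = f"tardis-test-{commit[:8]}"  # naming taken from the log output
    if env_exists and reuse:
        # Assumption: a reused environment needs no further setup.
        return env, []
    commands = []
    if env_exists:
        # Remove the stale environment so it can be recreated from the lockfile.
        commands.append(["conda", "env", "remove", "--name", env, "-y"])
    commands.append(
        ["conda", "create", "--name", env, "--file", str(lockfile), "-y"]
    )
    # Editable install of TARDIS with its extras, e.g. path[viz,tardisbase].
    target = f"{tardis_path}[{','.join(extras)}]" if extras else str(tardis_path)
    commands.append(["conda", "run", "-n", env, "pip", "install", "-e", target])
    return env, commands
```

Each command list could be passed to `subprocess.run(command, check=True)`; keeping the commands as data makes them easy to log before execution, as the workflow does.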

3. Testing

For each commit, the system executes pytest to run both continuum and non-continuum test suites. Once testing is complete, the generated regression data from these test runs is committed to the repository, creating a new commit that captures the test results.

INFO:tardisbase.testing.regression_comparison.run_tests:

Processed Tardis Commits:

INFO:tardisbase.testing.regression_comparison.run_tests:701bb18916886ecf2797b0dda4843750f69592da

INFO:tardisbase.testing.regression_comparison.run_tests:ce43cec0fa5d9255108c90c84659c71d34fb1c26

INFO:tardisbase.testing.regression_comparison.run_tests:fa4c4ea98055ea3bef24d69feba26fb5f74c2ddf

INFO:tardisbase.testing.regression_comparison.run_tests:

Regression Data Commits:

INFO:tardisbase.testing.regression_comparison.run_tests:effdb1b68bf069630446c837d5f79ef602dd29c7

INFO:tardisbase.testing.regression_comparison.run_tests:6de6fb12155b6fc55d0fb808b1795175d4ee7015

INFO:tardisbase.testing.regression_comparison.run_tests:6a33cdd4c32a2715ee6c8e7e29c999f580eaadcf
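A rough sketch of the per-commit test and commit step is given below. Several details are assumptions rather than facts from this report: that the pytest options are `--tardis-regression-data` and `--generate-reference`, that the two suites are selected with the `continuum` marker, and that the data is staged with `git add -A`. The helper names are hypothetical. The commit message format mirrors the descriptions shown in the preview table of this report.

```python
def build_test_commands(env, tardis_path, regression_path):
    """Return pytest invocations for the non-continuum and continuum suites."""
    base = [
        "conda", "run", "-n", env, "pytest", str(tardis_path),
        # Assumed option names for pointing at and regenerating regression data.
        f"--tardis-regression-data={regression_path}",
        "--generate-reference",
    ]
    # Assumed marker name used to separate the two suites.
    return [base + ["-m", "not continuum"], base + ["-m", "continuum"]]


def build_commit_commands(regression_path, tardis_subject):
    """Return git commands that record the regenerated regression data."""
    git = ["git", "-C", str(regression_path)]
    message = f"Regression data for --{tardis_subject}"
    return [git + ["add", "-A"], git + ["commit", "-m", message]]
```

Running the test commands first and the commit commands second yields one regression data commit per processed TARDIS commit, which matches the two lists in the log above.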

4. Getting Regression Data Commits

There are two ways to proceed after this step. The first option is to use the regression data commits that were generated programmatically from the TARDIS commits in the previous step. The second option is to fetch regression data commits directly from git, either by retrieving the last n commits or by providing custom commit hashes.

5. Comparison of Regression Data Commits

Once the regression data commits have been finalised, it's time to move to the comparison step. If you wish, you can also preview the commits with details about how they were generated before moving forward.

#	Hash	Description	Date
1	effdb1	Regression data for --Relativity BugFix [2] (#3176)	2025-08-28 03:26
2	6de6fb	Regression data for --add from workflow method to sdec and liv plot (#3198)	2025-08-28 03:50
3	6a33cd	Regression data for --Post-release 2025.07.20 (#3201)	2025-08-28 04:13

The workflow compares two commits at a time, running git diff on each pair. For a more comprehensive comparison, the workflow also copies the contents of each commit into temporary directories. The current code supports only git-based and command-line-based diffs, but it is modular so that new comparison methods can be added easily.

There are also filtering options for .h5 and .npy files, in case you want to view only certain file types in the comparison.

The comparison matrix is a table in which each column compares two consecutive commits, and each cell in a column is one of modified (M), unchanged (•), not present (-), deleted (D), or added (A).
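One way to obtain per-file statuses for a pair of commits is `git diff --name-status`, which prints a status code and a path on each line. The sketch below builds that command and parses its output; the helper names are hypothetical, and the report does not state which git diff options the workflow actually uses.

```python
def diff_command(repo_path, older, newer):
    """Build the git command that lists changed files between two commits."""
    return ["git", "-C", str(repo_path), "diff", "--name-status", older, newer]


def parse_name_status(diff_output):
    """Map each changed path to its one-letter status (M, A, D, R, ...)."""
    statuses = {}
    for line in diff_output.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        code = fields[0][0]  # a rename score such as "R100" is reduced to "R"
        path = fields[-1]    # renames and copies list the old path, then the new
        statuses[path] = code
    return statuses
```

Files that do not appear in the parsed output are unchanged between the two commits, which is what makes this output a convenient input for a comparison matrix.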
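Such a filter can be as small as the following sketch. The function name and its default extensions are hypothetical, and matching suffixes case-insensitively is a choice made here rather than documented behaviour.

```python
from pathlib import PurePosixPath


def filter_by_extension(paths, extensions=(".h5", ".npy")):
    """Keep only the paths whose suffix is in extensions (case-insensitive)."""
    wanted = {ext.lower() for ext in extensions}
    return [p for p in paths if PurePosixPath(p).suffix.lower() in wanted]
```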

Sample Comparison Results (Index 160-170):

Index	File Path	32c5e0-e1656c	8ee899-32c5e0
160	tardis/tests/test_tardis_full_formal_integral/test_transport_simple_formal_integral/test_spectrum_integrated__30-downbranch__.npy	•	•
161	tardis/tests/test_tardis_full_formal_integral/test_transport_simple_formal_integral/test_spectrum_integrated__30-macroatom__.npy	•	•
162	tardis/transport/montecarlo/tests/test_continuum/test_montecarlo_continuum.h5	M	M
163	tardis/transport/montecarlo/tests/test_montecarlo_main_loop/test_montecarlo_main_loop.h5	M	M
164	tardis/transport/montecarlo/tests/test_montecarlo_main_loop/test_montecarlo_main_loop_vpacket_log.h5	M	M
165	tardis/transport/montecarlo/tests/test_packet_source/test_black_body_simple_source/test_bb_attributes.h5	M	M
166	tardis/transport/montecarlo/tests/test_packet_source/test_black_body_simple_source/test_bb_energies.npy	•	•
167	tardis/transport/montecarlo/tests/test_packet_source/test_black_body_simple_source/test_bb_mus.npy	•	•
168	tardis/transport/montecarlo/tests/test_packet_source/test_black_body_simple_source/test_bb_nus.npy	•	•
169	tardis/transport/montecarlo/tests/test_packet_source/test_black_body_simple_source_rel/test_bb_attributes.h5	M	M
170	tardis/transport/montecarlo/tests/test_packet_source/test_black_body_simple_source_rel/test_bb_energies.npy	•	•

Legend: • = No change, M = Modified
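A minimal sketch of how such a matrix could be assembled from per-pair diff results is shown below. The function and its input structures are hypothetical, and the column label format (two six-character hashes joined by a hyphen) only imitates the table above; the order of the two hashes in the real labels is not specified in this report.

```python
def comparison_matrix(commits, files_by_commit, status_by_pair):
    """Build (headers, rows) for consecutive commit pairs.

    commits:         ordered list of commit hashes
    files_by_commit: {hash: set of file paths present in that commit}
    status_by_pair:  {(older, newer): {path: "M" | "A" | "D"}}
    """
    pairs = list(zip(commits, commits[1:]))
    headers = [f"{older[:6]}-{newer[:6]}" for older, newer in pairs]
    all_files = sorted(set().union(*files_by_commit.values()))
    rows = []
    for path in all_files:
        cells = []
        for older, newer in pairs:
            status = status_by_pair.get((older, newer), {}).get(path)
            if status in ("M", "A", "D"):
                cells.append(status)   # reported as changed by the diff
            elif path in files_by_commit[older] and path in files_by_commit[newer]:
                cells.append("•")      # present in both commits and unchanged
            else:
                cells.append("-")      # not present in either commit of the pair
        rows.append((path, cells))
    return headers, rows
```

Each row holds a file path followed by one cell per consecutive pair, so it can be rendered directly as a table like the sample above.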

View full notebook

Acknowledgement

I am deeply grateful to my mentors Wolfgang Kerzendorf, Andrew Fullard, Atharva Arya, and Abhinav Ohri for their invaluable guidance throughout this project. Their consistent feedback and direction have been really important for my development.

The environment in TARDIS is really great for learning and growing and the team has been incredibly supportive. I'm grateful for the opportunity to be part of this team.