Calibration

STIsim provides a streamlined calibration framework designed for the practical realities of fitting HIV and STI models to data. It’s built on Starsim’s calibration tools and Optuna, but makes a number of opinionated choices so you can go from data to calibrated model with minimal boilerplate.

How it works (and what choices we’ve made)

STIsim’s calibration uses approximate Bayesian computation (ABC): the idea is to find parameters that minimize the difference between model output and observed data. For each trial, Optuna proposes a set of parameters, the model runs with them, and we compute a goodness-of-fit (GOF) score, where lower means a closer fit. After many trials, the best-scoring parameter sets are taken as an approximation of the posterior distribution.
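The propose, run, score loop can be sketched with a toy model. This is an illustration only: `toy_model`, `mismatch`, and `random_search` are hypothetical names, and uniform random draws stand in for Optuna's sampler, which is usually smarter about where it proposes the next trial.

```python
import random

def toy_model(beta):
    """Stand-in for a simulation: returns a fake 'prevalence' for a given beta."""
    return 0.5 * beta

def mismatch(model_value, data_value):
    """Absolute error normalized by the observed value."""
    return abs(model_value - data_value) / abs(data_value)

def random_search(n_trials, low, high, data_value, seed=0):
    """Propose parameters at random, score each trial, return trials sorted by score."""
    rng = random.Random(seed)
    trials = []
    for i in range(n_trials):
        beta = rng.uniform(low, high)                  # Propose a parameter value
        score = mismatch(toy_model(beta), data_value)  # Run the "model" and score it
        trials.append(dict(trial=i, beta=beta, mismatch=score))
    return sorted(trials, key=lambda t: t['mismatch'])  # Lowest mismatch first

target = toy_model(0.08)  # Synthetic "data" generated with a known beta
trials = random_search(n_trials=200, low=0.03, high=0.15, data_value=target)
best = trials[0]
print(f"Best beta: {best['beta']:.4f} (mismatch {best['mismatch']:.4f})")
```

Keeping every trial, not just the best one, is what later lets you take the top N parameter sets rather than a single point estimate.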

Specifically, sti.Calibration makes these choices for you:

GOF metric: Normalized absolute error between model and data, summed across all targets. You can weight targets differently (e.g., weight syphilis prevalence 10x higher than HIV incidence).
Parameter routing: Dot notation ('hiv.beta_m2f') automatically finds and sets the right module parameter – no custom build_fn needed.
Data format: A simple CSV/DataFrame with time (year) and columns matching sim result names.
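As a rough sketch of what a weighted, normalized absolute error looks like, the snippet below scores two targets. The normalization used here (dividing by the largest observed value) is an assumption made for illustration, and `normalized_abs_error` and `total_gof` are hypothetical helpers, not STIsim functions. The data values are rounded from the synthetic data used later in this tutorial; the model values are invented.

```python
def normalized_abs_error(model, data):
    """Sum of |model - data| divided by the largest observed value (assumed normalization)."""
    scale = max(abs(d) for d in data) or 1.0  # Fall back to 1.0 if the data are all zero
    return sum(abs(m - d) for m, d in zip(model, data)) / scale

def total_gof(model_results, data, weights=None):
    """Weighted sum of per-target errors; targets without a weight count once."""
    weights = weights or {}
    total = 0.0
    for name, observed in data.items():
        total += weights.get(name, 1.0) * normalized_abs_error(model_results[name], observed)
    return total

data = {
    'ng.prevalence': [0.028, 0.029, 0.026],
    'ng.new_infections': [189.0, 143.0, 148.0],
}
model = {  # Invented model output, for illustration only
    'ng.prevalence': [0.030, 0.027, 0.026],
    'ng.new_infections': [180.0, 150.0, 148.0],
}

unweighted = total_gof(model, data)
weighted = total_gof(model, data, weights={'ng.prevalence': 10})
print(f'Unweighted: {unweighted:.3f}, prevalence weighted 10x: {weighted:.3f}')
```

Normalizing each target before summing is what lets a prevalence near 0.03 and a count near 150 contribute on a comparable scale.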

This approach has been used extensively in HPVsim and Covasim and tends to perform well for STI/HIV fitting. But it’s not the only approach:

Want a different GOF metric? Pass a custom eval_fn to sti.Calibration (e.g., mean squared error, log-likelihood).
Want statistically rigorous components? Use Starsim’s CalibComponent system directly (Normal, BetaBinomial, etc.) – see the Starsim calibration tutorial.
Want full Bayesian inference? Starsim supports sampling-importance resampling – see the Starsim calibration tutorial.

The rest of this tutorial covers the streamlined STIsim workflow.

Setup: generate synthetic data

To demonstrate calibration without external data files, we’ll generate synthetic targets from a simulation with known “true” parameters. In practice, you’d load survey data (e.g., DHS, PHIA) instead.

import numpy as np
import pandas as pd
import sciris as sc
import starsim as ss
import stisim as sti

# "True" parameters we'll try to recover
true_beta = 0.08
true_condom = 0.6

# Create and run a sim with the true parameters
true_sim = sti.Sim(
    diseases=sti.Gonorrhea(beta_m2f=true_beta, eff_condom=true_condom),
    n_agents=2000, start=2010, stop=2030,
)
true_sim.run(verbose=0)

# Extract yearly prevalence and incidence as our "data"
df = true_sim.to_df(resample='year', use_years=True, sep='.')
data = df[['timevec', 'ng.prevalence', 'ng.new_infections']].dropna()
data = data.rename(columns={'timevec': 'time'})
data['time'] = data['time'].astype(int)
print(f'Generated {len(data)} data points')
data.head()

Initializing sim with 2000 agents
Generated 21 data points

	time	ng.prevalence	ng.new_infections
0	2010	0.028121	189.0
1	2011	0.028945	143.0
2	2012	0.025657	148.0
3	2013	0.021480	116.0
4	2014	0.019359	129.0

Defining calibration parameters

STIsim uses dot notation to route parameters to the right module: 'module_name.par_name'. The module name is whatever sim.get_module() would find – typically the disease name (ng, hiv, syph), network name (structuredsexual), connector name (hiv_syph), or intervention name (art, fsw_testing).
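The routing idea can be sketched with plain dictionaries standing in for modules. `set_par` is a hypothetical helper written for this illustration; STIsim's own routing works on real module objects and may differ in detail.

```python
def set_par(modules, key, value):
    """Split 'module_name.par_name' and set the value on the matching module."""
    module_name, par_name = key.split('.', 1)  # Split on the first dot only
    if module_name not in modules:
        raise KeyError(f'No module named "{module_name}"')
    if par_name not in modules[module_name]:
        raise KeyError(f'Module "{module_name}" has no parameter "{par_name}"')
    modules[module_name][par_name] = value

# Dictionaries standing in for a disease module and a network module
modules = {
    'ng': {'beta_m2f': 0.06, 'eff_condom': 0.5},
    'structuredsexual': {'prop_f0': 0.7},
}

set_par(modules, 'ng.beta_m2f', 0.08)
set_par(modules, 'structuredsexual.prop_f0', 0.65)
print(modules)
```

Failing loudly on an unknown module or parameter name is a deliberate choice in this sketch: a typo in a key would otherwise leave the parameter unchanged without any warning.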

You can write parameters in two equivalent formats:

# Nested format -- group parameters by module (Pythonic, uses keyword syntax)
calib_pars = dict(
    ng=dict(
        beta_m2f=  dict(low=0.03, high=0.15, guess=0.06),
        eff_condom=dict(low=0.3,  high=0.9,  guess=0.5),
    ),
)

# Flat format -- equivalent, uses string keys
calib_pars_flat = {
    'ng.beta_m2f':   dict(low=0.03, high=0.15, guess=0.06),
    'ng.eff_condom':  dict(low=0.3,  high=0.9,  guess=0.5),
}

# Both produce the same result after flattening
print(sti.flatten_calib_pars(calib_pars))
print(calib_pars_flat)

{'ng.beta_m2f': {'low': 0.03, 'high': 0.15, 'guess': 0.06}, 'ng.eff_condom': {'low': 0.3, 'high': 0.9, 'guess': 0.5}}
{'ng.beta_m2f': {'low': 0.03, 'high': 0.15, 'guess': 0.06}, 'ng.eff_condom': {'low': 0.3, 'high': 0.9, 'guess': 0.5}}

Each parameter spec requires low and high bounds. The optional guess is used as the starting point for the “before” comparison in check_fit().
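To make the equivalence of the two formats and the bounds requirement concrete, here is a minimal re-implementation in plain Python. `flatten_nested` and `validate` are hypothetical stand-ins, not the code behind sti.flatten_calib_pars, and the rule that guess must lie within the bounds is an assumption of this sketch.

```python
def flatten_nested(pars):
    """Turn {'ng': {'beta_m2f': spec}} into {'ng.beta_m2f': spec}; flat input passes through."""
    flat = {}
    for key, value in pars.items():
        if 'low' in value and 'high' in value:  # Already a parameter spec (flat format)
            flat[key] = value
        else:                                   # A module grouping several specs
            for par_name, spec in value.items():
                flat[f'{key}.{par_name}'] = spec
    return flat

def validate(flat):
    """Check that every spec has low < high and, if given, a guess inside the bounds."""
    for name, spec in flat.items():
        if 'low' not in spec or 'high' not in spec:
            raise ValueError(f'{name}: both low and high are required')
        if not spec['low'] < spec['high']:
            raise ValueError(f'{name}: low must be smaller than high')
        guess = spec.get('guess')
        if guess is not None and not spec['low'] <= guess <= spec['high']:
            raise ValueError(f'{name}: guess lies outside [low, high]')

nested = dict(
    ng=dict(
        beta_m2f=dict(low=0.03, high=0.15, guess=0.06),
        eff_condom=dict(low=0.3, high=0.9, guess=0.5),
    ),
)
flat = {
    'ng.beta_m2f': dict(low=0.03, high=0.15, guess=0.06),
    'ng.eff_condom': dict(low=0.3, high=0.9, guess=0.5),
}

validate(flatten_nested(nested))
print(flatten_nested(nested) == flat)
```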

The nested format is particularly convenient when calibrating multiple modules – parameters are visually grouped:

# Multi-module example (not run here -- just to show the pattern)
multi_pars = dict(
    hiv=dict(
        beta_m2f=dict(low=0.002, high=0.014, guess=0.006),
        eff_condom=dict(low=0.5, high=0.9, guess=0.75),
    ),
    structuredsexual=dict(
        prop_f0=dict(low=0.55, high=0.9, guess=0.7),
        m1_conc=dict(low=0.05, high=0.3, guess=0.15),
    ),
    syph=dict(
        beta_m2f=dict(low=0.15, high=0.35, guess=0.2),
    ),
)
print(f'{len(sti.flatten_calib_pars(multi_pars))} parameters across 3 modules')

5 parameters across 3 modules

Running a calibration

Create a Calibration with the sim, parameters, and data. No custom build_fn is needed – STIsim’s default automatically routes parameters using dot notation.

sim = sti.Sim(diseases=sti.Gonorrhea(), n_agents=500, start=2010, stop=2030, verbose=-1)

calib = sti.Calibration(
    sim=sim,
    calib_pars=calib_pars,
    data=data,
    total_trials=10,
    n_workers=1,
)
calib.calibrate()
print(f'Best parameters: {calib.best_pars}')

Removed existing calibration file /tmp/tmp0zlmeno_/starsim_calibration.db
sqlite:////tmp/tmp0zlmeno_/starsim_calibration.db

[I 2026-07-24 17:21:22,235] A new study created in RDB with name: starsim_calibration
Initializing sim with 500 agents
[I 2026-07-24 17:21:24,921] Trial 0 finished with value: 24.455711120224066 and parameters: {'ng.beta_m2f': 0.07745266538397222, 'ng.eff_condom': 0.5304157756897214, 'rand_seed': 720709}. Best is trial 0 with value: 24.455711120224066.
[I 2026-07-24 17:21:27,633] Trial 1 finished with value: 24.156431535312798 and parameters: {'ng.beta_m2f': 0.12115296092786983, 'ng.eff_condom': 0.31983891279477683, 'rand_seed': 44458}. Best is trial 1 with value: 24.156431535312798.
[I 2026-07-24 17:21:30,296] Trial 2 finished with value: 24.551971648779443 and parameters: {'ng.beta_m2f': 0.12456014985983911, 'ng.eff_condom': 0.3291057834702714, 'rand_seed': 83332}. Best is trial 1 with value: 24.156431535312798.
[I 2026-07-24 17:21:33,131] Trial 3 finished with value: 24.643679904452284 and parameters: {'ng.beta_m2f': 0.08010694600261908, 'ng.eff_condom': 0.36020114939755915, 'rand_seed': 414730}. Best is trial 1 with value: 24.156431535312798.
[I 2026-07-24 17:21:35,824] Trial 4 finished with value: 24.629952793301022 and parameters: {'ng.beta_m2f': 0.13804460806882837, 'ng.eff_condom': 0.8584723078876069, 'rand_seed': 317440}. Best is trial 1 with value: 24.156431535312798.
[I 2026-07-24 17:21:38,491] Trial 5 finished with value: 24.759538196029723 and parameters: {'ng.beta_m2f': 0.030562999167146745, 'ng.eff_condom': 0.6074309953405739, 'rand_seed': 784146}. Best is trial 1 with value: 24.156431535312798.
[I 2026-07-24 17:21:41,239] Trial 6 finished with value: 24.468283063948398 and parameters: {'ng.beta_m2f': 0.10884558713306167, 'ng.eff_condom': 0.8813811160862943, 'rand_seed': 780719}. Best is trial 1 with value: 24.156431535312798.
[I 2026-07-24 17:21:43,967] Trial 7 finished with value: 23.88912606515334 and parameters: {'ng.beta_m2f': 0.067186171345261, 'ng.eff_condom': 0.4702691137742058, 'rand_seed': 994166}. Best is trial 7 with value: 23.88912606515334.
[I 2026-07-24 17:21:46,666] Trial 8 finished with value: 23.804471847589543 and parameters: {'ng.beta_m2f': 0.07824890896870898, 'ng.eff_condom': 0.4616509638628863, 'rand_seed': 827304}. Best is trial 8 with value: 23.804471847589543.
[I 2026-07-24 17:21:49,418] Trial 9 finished with value: 21.628710669602693 and parameters: {'ng.beta_m2f': 0.13565155764315975, 'ng.eff_condom': 0.8661072789352529, 'rand_seed': 77628}. Best is trial 9 with value: 21.628710669602693.

Making results structure...
Processed 10 trials; 0 failed
Best pars: {'ng.beta_m2f': 0.13565155764315975, 'ng.eff_condom': 0.8661072789352529, 'rand_seed': 77628}
Removed existing calibration file /tmp/tmp0zlmeno_/starsim_calibration.db
Best parameters: {'ng.beta_m2f': 0.13565155764315975, 'ng.eff_condom': 0.8661072789352529, 'rand_seed': 77628}

The calibration returns the parameter set with the lowest mismatch among the trials it ran. Let’s see how it compares to the true values:

true_pars = {'ng.beta_m2f': true_beta, 'ng.eff_condom': true_condom}
for par, true_val in true_pars.items():
    best_val = calib.best_pars[par]
    print(f'{par}: true={true_val:.3f}, calibrated={best_val:.3f}')

ng.beta_m2f: true=0.080, calibrated=0.136
ng.eff_condom: true=0.600, calibrated=0.866
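Relative error makes the gap easier to judge than the raw values. The snippet below is plain arithmetic on the numbers printed above; nothing in it depends on STIsim.

```python
true_pars = {'ng.beta_m2f': 0.08, 'ng.eff_condom': 0.6}
best_pars = {'ng.beta_m2f': 0.13565155764315975, 'ng.eff_condom': 0.8661072789352529}

# Signed relative error: positive means the calibrated value is too high
rel_err = {par: (best_pars[par] - true_val) / true_val for par, true_val in true_pars.items()}

for par, err in rel_err.items():
    print(f'{par}: relative error {err:+.1%}')
```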

Visualizing the calibration

STIsim inherits several diagnostic tools from Starsim for inspecting calibration results:

check_fit() runs sims with the guess parameters and the best-fit parameters side by side, comparing GOF scores
plot_final() runs the best-fit parameters and plots the resulting sim
plot_optuna() shows Optuna diagnostic plots (optimization history, parameter importance, etc.)

Note: plot_param_importances requires scikit-learn (pip install scikit-learn). If it’s not installed, the plot will be skipped with a warning.

# Compare GOF scores: guess parameters vs best-fit parameters
calib.check_fit(do_plot=False)



Checking fit...

Fit with original pars: 24.590598855659472

Fit with best-fit pars: 24.0106328240419

✓ Calibration improved fit 24.590598855659472 --> 24.0106328240419

True

# Plot the sim with best-fit parameters
calib.plot_final()


# Optuna diagnostics: optimization history and parameter importance
calib.plot_optuna(['plot_optimization_history', 'plot_param_importances'])

Could not run plot_param_importances: Tried to import 'sklearn' but failed. Please make sure that the package is installed correctly to use this feature. Actual error: No module named 'sklearn'.


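With only 10 trials and 500 agents, the true parameters are not recovered well: the best fit lands at 0.136 and 0.866 against true values of 0.08 and 0.6. In practice, you’d use 1,000 to 2,000 trials with 5,000 to 10,000 agents.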

Extracting calibrated parameters

After calibration, use get_pars() to extract the top parameter sets as flat dicts ready to feed back into a sim:

# Get top 3 parameter sets (sorted by mismatch)
par_sets = calib.get_pars(n=3)
for i, pars in enumerate(par_sets):
    print(f'Set {i}: {pars}')

Set 0: {'ng.beta_m2f': 0.13565155764315975, 'ng.eff_condom': 0.8661072789352529}
Set 1: {'ng.beta_m2f': 0.07824890896870898, 'ng.eff_condom': 0.4616509638628863}
Set 2: {'ng.beta_m2f': 0.067186171345261, 'ng.eff_condom': 0.4702691137742058}

You can also access the full results DataFrame via calib.df, which includes all parameters plus the mismatch value for every trial.
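Selecting the top sets amounts to sorting trials by mismatch and keeping the first n. The sketch below does this in plain Python on a subset of the trials logged above (values rounded to four decimals); `top_pars` is a hypothetical helper, not the implementation of get_pars().

```python
def top_pars(trials, n):
    """Return the parameter dicts of the n trials with the lowest mismatch."""
    ranked = sorted(trials, key=lambda t: t['mismatch'])
    return [dict(t['pars']) for t in ranked[:n]]  # Copy so callers can modify safely

# Trials 0, 1, 7, 8 and 9 from the calibration log, rounded
trials = [
    dict(mismatch=24.4557, pars={'ng.beta_m2f': 0.0775, 'ng.eff_condom': 0.5304}),
    dict(mismatch=24.1564, pars={'ng.beta_m2f': 0.1212, 'ng.eff_condom': 0.3198}),
    dict(mismatch=23.8891, pars={'ng.beta_m2f': 0.0672, 'ng.eff_condom': 0.4703}),
    dict(mismatch=23.8045, pars={'ng.beta_m2f': 0.0782, 'ng.eff_condom': 0.4617}),
    dict(mismatch=21.6287, pars={'ng.beta_m2f': 0.1357, 'ng.eff_condom': 0.8661}),
]

best_three = top_pars(trials, n=3)
for i, pars in enumerate(best_three):
    print(f'Set {i}: {pars}')
```

The order matches the three sets printed by get_pars(n=3): trial 9 first, then trials 8 and 7.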

Running calibrated sims with `make_calib_sims`

The most common post-calibration task is running the top N parameter sets to generate results with uncertainty. make_calib_sims() handles this in one call:

# Run top 3 parameter sets
msim = sti.make_calib_sims(calib=calib, n_parsets=3, verbose=-1)
print(f'Ran {len(msim.sims)} simulations')
print(f'Each sim has par_idx: {[s.par_idx for s in msim.sims]}')

Initializing sim "Sim 1" with 500 agents
Initializing sim "Sim 0" with 500 agents
Initializing sim "Sim 2" with 500 agents
Ran 3 simulations
Each sim has par_idx: [0, 1, 2]
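One way to turn several runs into results with uncertainty is to summarize each timepoint across sims. The sketch below uses the standard library only; `summarize` is a hypothetical helper and the prevalence series are placeholder numbers, not output from the sims above.

```python
import statistics

def summarize(series_by_sim):
    """Per-timepoint median, minimum and maximum across equally long result series."""
    summary = []
    for values in zip(*series_by_sim):  # One tuple of values per timepoint
        summary.append(dict(
            median=statistics.median(values),
            low=min(values),
            high=max(values),
        ))
    return summary

# Placeholder prevalence series, one list per sim
prevalence = [
    [0.030, 0.028, 0.025],
    [0.026, 0.027, 0.024],
    [0.033, 0.029, 0.027],
]

for t, row in enumerate(summarize(prevalence)):
    print(f"t={t}: median={row['median']:.3f}, range=({row['low']:.3f}, {row['high']:.3f})")
```

With only a handful of parameter sets, the minimum and maximum are a more honest summary than percentiles.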

You can also pass calibration parameters directly – useful when loading saved results:

# From a list of parameter dicts
msim = sti.make_calib_sims(
    calib_pars=par_sets,
    sim=sti.Sim(diseases=sti.Gonorrhea(), n_agents=500, start=2010, stop=2030, verbose=-1),
    verbose=-1,
)

# From a DataFrame (like calib.df)
msim = sti.make_calib_sims(
    calib_pars=calib.df,
    sim=sti.Sim(diseases=sti.Gonorrhea(), n_agents=500, start=2010, stop=2030, verbose=-1),
    n_parsets=3, verbose=-1,
)

Initializing sim "Sim 0" with 500 agents
Initializing sim "Sim 1" with 500 agents
Initializing sim "Sim 2" with 500 agents
Initializing sim "Sim 0" with 500 agents
Initializing sim "Sim 1" with 500 agents
Initializing sim "Sim 2" with 500 agents

Filtering with `check_fn`

Some parameter combinations may produce epidemiologically implausible results (e.g., disease die-out). Pass a check_fn to filter them out; it receives a completed sim and returns True if the sim should be kept:

def check_ng_alive(sim):
    """Reject sims where gonorrhea died out."""
    return float(sim.results.ng.new_infections[-12:].sum()) > 0

msim = sti.make_calib_sims(
    calib=calib, n_parsets=3,
    check_fn=check_ng_alive, verbose=-1,
)
print(f'Kept {len(msim.sims)} sims after filtering')

Initializing sim "Sim 0" with 500 agents
Initializing sim "Sim 1" with 500 agents
Initializing sim "Sim 2" with 500 agents
Dropped 3/3 sims via check_fn
Kept 0 sims after filtering
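The filter itself is a simple predicate on the tail of a result series. The sketch below applies the same test to plain lists; `check_alive` and the two series are made up for illustration, and the 12-step window assumes monthly timesteps so that it covers the final year.

```python
def check_alive(new_infections, window=12):
    """Keep a run only if there was at least one new infection in the last `window` steps."""
    return sum(new_infections[-window:]) > 0

# Invented monthly new-infection counts over two years
runs = {
    'persisting': [5, 4, 6] * 8,       # Still transmitting at the end
    'died_out': [5, 3, 1] + [0] * 21,  # No infections after the third month
}

kept = {name: series for name, series in runs.items() if check_alive(series)}
print(f'Kept {len(kept)}/{len(runs)}: {list(kept)}')
```

In the run above all three sims were dropped, which can happen when a strict check meets a small population; using more agents or several seeds per parameter set makes an empty result less likely.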

Multiple seeds per parameter set

For stochastic robustness, run each parameter set with multiple random seeds. When combined with check_fn, only the first surviving seed per parameter set is kept:

msim = sti.make_calib_sims(
    calib=calib, n_parsets=3,
    seeds_per_par=2,
    check_fn=check_ng_alive, verbose=-1,
)
print(f'Kept {len(msim.sims)} sims (at most 1 per par set, first surviving seed)')

Initializing sim "Sim 0" with 500 agents
Initializing sim "Sim 1" with 500 agents
Initializing sim "Sim 2" with 500 agents
Initializing sim "Sim 3" with 500 agents
Initializing sim "Sim 4" with 500 agents
Initializing sim "Sim 5" with 500 agents
Dropped 5/6 sims via check_fn
Kept 1 sims (at most 1 per par set, first surviving seed)
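The selection rule, keeping the first surviving seed per parameter set, can be sketched as follows. `first_survivors` and `is_alive` are hypothetical, the series are invented, and the order in which STIsim evaluates seeds is an assumption of this sketch.

```python
def is_alive(series, window=2):
    """Toy check: at least one new infection in the last `window` steps."""
    return sum(series[-window:]) > 0

def first_survivors(runs, check):
    """Map each par_idx to the first seed (in run order) whose series passes the check."""
    kept = {}
    for par_idx, seed, series in runs:
        if par_idx not in kept and check(series):
            kept[par_idx] = seed
    return kept

runs = [
    (0, 1, [4, 0, 0]),  # Par set 0, seed 1: died out
    (0, 2, [4, 3, 2]),  # Par set 0, seed 2: survives, kept
    (1, 1, [5, 2, 1]),  # Par set 1, seed 1: survives, kept
    (1, 2, [5, 4, 3]),  # Par set 1, seed 2: also survives, but is not the first
    (2, 1, [2, 0, 0]),  # Par set 2: both seeds died out, so nothing is kept
    (2, 2, [1, 0, 0]),
]

kept = first_survivors(runs, is_alive)
print(kept)
```

A parameter set whose seeds all fail contributes nothing, so the number of kept sims can be smaller than the number of parameter sets, as in the output above.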

Saving and loading

After calibration, use calib.save() to save the calibration object and parameter DataFrame. This handles shrinking (keeping only the top results to reduce file size) automatically:

# Save with shrinking (keeps top 10% by default)
calib.save('tutorial_output/my_calib.obj')

# Load back and use
loaded = sc.load('tutorial_output/my_calib.obj')
print(f'Loaded calibration with {len(loaded.df)} parameter sets')

Saved calibration to tutorial_output/my_calib.obj
Saved parameters to tutorial_output/my_calib_pars.df
Loaded calibration with 10 parameter sets
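
To make the idea of shrinking concrete, here is a rough sketch of keeping the best-fitting fraction of trials. It assumes each trial carries a goodness-of-fit score where lower is better. The key name 'mismatch', the function shrink, and the synthetic scores are assumptions for illustration, not the fields calib.save() actually uses.

```python
def shrink(trials, keep_frac=0.1, key='mismatch'):
    """Return the best `keep_frac` of trials, ranked by ascending score."""
    ranked = sorted(trials, key=lambda t: t[key])
    n_keep = max(1, round(keep_frac * len(ranked)))  # Always keep at least one
    return ranked[:n_keep]

# 100 synthetic trials with distinct, shuffled scores from 0.0 to 9.9
trials = [{'index': i, 'mismatch': ((i * 37) % 100) / 10} for i in range(100)]
top = shrink(trials)
print(len(top), top[0])
```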

Fitting to any target: if you can measure it, you can fit to it

The key idea behind STIsim’s calibration is simple: any result that appears in sim.to_df() can be a calibration target. You just need to include a column with the same name in your data file.

Built-in disease results (prevalence, incidence, etc.) are always available. But what if you need something custom – say, HIV-syphilis coinfection prevalence among adolescent girls? Any module (analyzers, interventions, connectors, etc.) can define custom results using define_results – see Custom results for the full explanation. Add the result to your sim, add a matching column to your data, and the calibration fits to it automatically.

Here’s an example using an analyzer:

class CoinfectionPrev(ss.Analyzer):
    """Measure HIV-syphilis coinfection prevalence among 15-24 year old females."""

    def init_pre(self, sim):
        super().init_pre(sim)
        self.define_results(
            ss.Result('coinf_prev_f_15_24', dtype=float, scale=False, label='Coinfection prev (F 15-24)'),
        )

    def step(self):
        ppl = self.sim.people                                                    # Get the people object
        target = ppl.female & (ppl.age >= 15) & (ppl.age < 25)                  # Boolean mask: alive females aged 15-24
        n_target = target.count()                                                # Count how many match
        if n_target > 0:
            hiv = self.sim.diseases.hiv.infected                                 # Boolean: who has HIV
            syph = self.sim.diseases.syph.infected                               # Boolean: who has syphilis
            has_both = hiv & syph & target                                       # Intersection: coinfected in target group
            self.results['coinf_prev_f_15_24'][self.ti] = has_both.count() / n_target  # Store as prevalence

Now include this analyzer when creating your sim, and add the corresponding column to your data file:

# Add the analyzer to your sim
sim = sti.Sim(
    diseases=[sti.HIV(), sti.Syphilis()],
    n_agents=500, start=2010, stop=2030, verbose=-1,
    analyzers=[CoinfectionPrev()],
)

# Your data CSV just needs a matching column:
#
# time, hiv.prevalence, syph.symptomatic_prevalence, coinfectionprev.coinf_prev_f_15_24
# 2016, 0.12,           0.03,                   0.005
# 2020, 0.11,           0.025,                  0.004

The column name follows the same dot notation: analyzer_name.result_name. Here the analyzer name is the lowercased class name (coinfectionprev), as in the CSV header above. When the calibration runs, it will automatically compare the model’s coinfection prevalence against your data, alongside the other targets. No changes to the calibration code are needed – just add the column to the data and the analyzer to the sim.

This pattern works for any custom quantity: age-specific prevalence, risk-group-stratified incidence, treatment coverage by pathway, etc. If you can compute it in an analyzer, you can fit to it.

Data format and weights

The calibration expects a DataFrame (or CSV) with a time column (integer years) and columns matching simulation result names:

time	hiv.prevalence	syph.symptomatic_prevalence	hiv.n_on_art
2010	0.12	0.03	50000
2015	0.11	0.025	120000
2020	0.098	0.02	180000

Missing years are fine – the calibration only compares at timepoints where data exists. To see what result names are available, run sim.to_df().columns.
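
Because targets are matched by column name, a typo in the data file is easy to miss. A small pre-flight comparison such as the sketch below can catch one before a long calibration run. The helper check_targets is hypothetical, and result_columns is hard-coded here as a stand-in for list(sim.to_df().columns).

```python
def check_targets(data_columns, result_columns, time_col='time'):
    """Split data columns into those that match a sim result and those that do not."""
    targets = [c.strip() for c in data_columns if c.strip() != time_col]
    available = set(result_columns)
    matched = [c for c in targets if c in available]
    missing = [c for c in targets if c not in available]
    return matched, missing

# Stand-in for list(sim.to_df().columns)
result_columns = ['hiv.prevalence', 'hiv.n_on_art', 'syph.symptomatic_prevalence']

# Data header with a deliberate typo ('prevalance')
data_columns = ['time', 'hiv.prevalence', 'syph.symptomatic_prevalance', 'hiv.n_on_art']

matched, missing = check_targets(data_columns, result_columns)
print('Matched:', matched)
print('Missing:', missing)
```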

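Not all targets are equally informative. Use weights to tell the calibration which targets matter most. Weights are set per result column, and the comments in this example mark unlisted columns' weight of 1.0 as the default – for example, if you have high-quality survey data for syphilis but only routine program data for HIV: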

weights = {
    'hiv.prevalence': 2.0,              # Routine data -- moderate weight
    'syph.symptomatic_prevalence': 10.0,     # PHIA survey -- high weight
    'hiv.n_on_art': 1.0,                # Program data -- default weight
}
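
To show how weights enter the score, here is a minimal sketch of a weighted, normalized absolute error. It assumes each target's absolute errors are summed over timepoints and divided by the largest observed value for that target. That normalization and the function weighted_gof are assumptions for illustration, not necessarily what sti.Calibration does internally. The observed values come from the example table above, and the model values are invented.

```python
def weighted_gof(data, model, weights=None):
    """Sum over targets of weight * sum(|model - data|) / max(|data|)."""
    weights = weights or {}
    total = 0.0
    for target, observed in data.items():
        predicted = model[target]
        scale = max(abs(v) for v in observed) or 1.0   # Avoid dividing by zero
        error = sum(abs(p - o) for p, o in zip(predicted, observed)) / scale
        total += weights.get(target, 1.0) * error      # Unlisted targets get weight 1
    return total

data = {
    'hiv.prevalence': [0.12, 0.11, 0.098],
    'syph.symptomatic_prevalence': [0.03, 0.025, 0.02],
    'hiv.n_on_art': [50000, 120000, 180000],
}
model = {  # Invented model output at the same three years
    'hiv.prevalence': [0.13, 0.11, 0.10],
    'syph.symptomatic_prevalence': [0.02, 0.02, 0.02],
    'hiv.n_on_art': [45000, 110000, 180000],
}
weights = {'hiv.prevalence': 2.0, 'syph.symptomatic_prevalence': 10.0, 'hiv.n_on_art': 1.0}

score_unweighted = weighted_gof(data, model)
score_weighted = weighted_gof(data, model, weights)
print(score_unweighted, score_weighted)
```

Normalizing each target first keeps a count such as hiv.n_on_art from swamping a prevalence purely because of its units. The weights then express how much each target should matter.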

Typical production workflow

A complete calibration analysis typically has three scripts:

# 1. run_calibrations.py -- find best parameters
import pandas as pd
import stisim as sti

# The sim must contain every module named in calib_pars and weights (here, HIV)
sim = sti.Sim(diseases=sti.HIV(), n_agents=500, start=2010, stop=2030, verbose=-1)
data = pd.read_csv('data/calibration_targets.csv')

calib = sti.Calibration(
    sim=sim,
    calib_pars=dict(
        hiv=dict(beta_m2f=dict(low=0.002, high=0.014, guess=0.006)),
        structuredsexual=dict(prop_f0=dict(low=0.55, high=0.9, guess=0.7)),
    ),
    data=data,
    weights={'hiv.prevalence': 5.0},
    total_trials=2000,
)
calib.calibrate()
calib.save('results/calib.obj')  # Also writes results/calib_pars.df

# 2. run_msim.py -- run top parameter sets with full results
import sciris as sc
import stisim as sti

pars_df = sc.load('results/calib_pars.df')
sim = sti.Sim(diseases=sti.HIV(), n_agents=500, start=2010, stop=2030, verbose=-1)
msim = sti.make_calib_sims(
    calib_pars=pars_df, sim=sim, n_parsets=3,
)

# 3. run_scenarios.py -- compare interventions
import sciris as sc
import stisim as sti

pars_df = sc.load('results/calib_pars.df')
msims = {}  # Keep one result per scenario so they can be compared afterwards
for scenario in ['baseline', 'intervention']:
    interventions = [sti.ART(coverage=0.9)] if scenario == 'intervention' else []
    sim = sti.Sim(diseases=sti.HIV(), n_agents=500, start=2010, stop=2030,
                  verbose=-1, interventions=interventions)
    msims[scenario] = sti.make_calib_sims(
        calib_pars=pars_df, sim=sim,
        n_parsets=3, seeds_per_par=2,
    )