Folder structure

This workflow assumes the following folder structure. Logger files must be named record_NN.h5 or record_NNN.h5.

Note that, because the parameters used for each run are communicated as a Pickle file, the filenames MUST be indexed from 0.

SIMULATION_NAME/
    parameter_grid.json # Describes each run
    record_00.h5
    record_01.h5
    ...
    record_NN.h5 # Each logger run
    sites.geojson # Polygons of the regions
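
The naming rules above can be checked before any long-running work starts. The following is a minimal sketch, not part of the original module: the helper names find_record_files and check_zero_indexed are hypothetical, and it assumes the run index is the number embedded in the filename.

```python
import re
from pathlib import Path

# Matches record_NN.h5 or record_NNN.h5 and captures the run index.
RECORD_PATTERN = re.compile(r"^record_(\d{2,3})\.h5$")


def find_record_files(simulation_dir):
    """Return {run_index: path} for every record file, sorted by index."""
    records = {}
    for path in Path(simulation_dir).iterdir():
        match = RECORD_PATTERN.match(path.name)
        if match:
            records[int(match.group(1))] = path
    return dict(sorted(records.items()))


def check_zero_indexed(records):
    """Raise ValueError unless the run indices are exactly 0, 1, ..., n-1."""
    found = sorted(records)
    expected = list(range(len(found)))
    if found != expected:
        raise ValueError(f"Expected run indices {expected}, found {found}")
```

Calling check_zero_indexed(find_record_files("SIMULATION_NAME")) fails early if a record is missing or the numbering starts at 1.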

From this information, you can specify an output_directory where summaries will automatically be available for frontend analysis.

public/demo/projects/SIMULATION_NAME/
    metadata.json
    summary_000.csv
    summary_001.csv
    ...
    summary_NNN.csv

We are provided a parameter_grid.json file that looks like the following:

{
    "pub": [
        0.0953169,
        0.521456,
        0.40569099999999997,
        0.484659,
        0.138482
    ],
    "grocery": [
        0.387384,
        0.452953,
        0.548852,
        0.042028699999999995,
        0.21261799999999997
    ], ...
}

(In this case there are 5 runs, and run i takes the i-th value from each parameter's list. This makes a grid search in the interface tricky, since most values will be distinct.)

Check available projects

Because extracting from the records can take a while, we don't want to overwrite an existing project unless explicitly asked to.

init_available_projects(project_name:str)
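
The body of init_available_projects is not shown here, so the following is only a sketch of the overwrite guard described above, under stated assumptions: the helper names project_exists and should_write_project are hypothetical, and a project is treated as existing once its metadata.json is present under public/demo/projects.

```python
from pathlib import Path


def project_exists(project_name, projects_root="public/demo/projects"):
    """Return True if a metadata.json already exists for project_name."""
    return (Path(projects_root) / project_name / "metadata.json").is_file()


def should_write_project(project_name, force=False,
                         projects_root="public/demo/projects"):
    """Write when the project is new, or when the caller forces an overwrite."""
    return force or not project_exists(project_name, projects_root)
```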

Create the Summary CSVs

Take the record_**.h5 files and convert them to CSVs that the frontend can parse.

Each record file can be on the order of 8 GB, and summarizing one can take about 45 minutes. The implementation works, but it is neither especially efficient nor parallelized.

summarize_h5(record_f, outdir)

Depends on the context variable output_dir. The summarized output is much smaller than the record file itself.
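
The internals of summarize_h5 depend on the record layout, which is not described here, so this sketch covers only the file naming. It assumes that record_NN.h5 maps to summary_NNN.csv with the same run index, zero-padded to three digits as in the folder listing above. The helper name summary_path_for is hypothetical.

```python
import re
from pathlib import Path


def summary_path_for(record_f, outdir):
    """Map .../record_07.h5 to <outdir>/summary_007.csv (assumed convention)."""
    match = re.fullmatch(r"record_(\d+)\.h5", Path(record_f).name)
    if match is None:
        raise ValueError(f"Not a record file: {record_f}")
    run_index = int(match.group(1))
    return Path(outdir) / f"summary_{run_index:03d}.csv"
```

Since each summary can take a long time to produce, checking summary_path_for(record_f, outdir).exists() before calling summarize_h5 is one way to skip records that were already processed.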

Creating the metadata.json

We want to convert the provided parameter_grid.json file into a metadata.json file that also includes some basic summary statistics from the project. The metadata.json file has the following format:

{
    "description": "Learning center comparison",
    "parameters_varied": [
        "indoor_beta",
        "outdoor_beta",
        "household_beta",
        "learning_centers"
    ],
    "run_parameters": {
        "1": {
            "learning_centers": false,
            "household_beta": 0.2,
            "indoor_beta": 0.45,
            "outdoor_beta": 0.05
        },
        "2": {
            "learning_centers": false,
            "household_beta": 0.2,
            "indoor_beta": 0.55,
            "outdoor_beta": 0.05
        }, ...
    },
    "all_regions": [
        "CXB-201",
        "CXB-202", ...
    ], 
    "all_timestamps": [
        "2020-05-01",
        "2020-05-02", ...
    ], 
    "all_fields": [
        "currently_dead",
        "currently_in_hospital_0_12", ...
    ],
    "field_statistics": {
        "n_infections_in_communal": {
            "max": 132.0,
            "min": 0.0
        },
        "recovered": {
            "max": 1937.0,
            "min": 0.0
        }, ...
    }
}

This involves restructuring the provided parameter grid and parsing the new summary_**.csv files for the extent (minimum and maximum) of each field.

pgrid_to_run_parameters(parameter_grid:dict)

Convert the parameter_grid dictionary to the desired metadata dictionary.
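
As an illustration of the restructuring, here is a minimal sketch that turns the column-oriented grid (one list per parameter) into one dictionary per run. It is not the actual pgrid_to_run_parameters: the name grid_to_run_parameters is hypothetical, and keying runs by their 0-based index is an assumption (the example metadata above shows keys "1" and "2", so the real function may number runs differently).

```python
def grid_to_run_parameters(parameter_grid):
    """Transpose {"param": [v0, v1, ...]} into per-run parameter dicts."""
    names = list(parameter_grid)
    lengths = {len(values) for values in parameter_grid.values()}
    if len(lengths) > 1:
        raise ValueError("All parameter lists must have the same length")
    n_runs = lengths.pop() if lengths else 0
    return {
        "parameters_varied": names,
        "run_parameters": {
            # Assumption: run keys are the 0-based run index as a string.
            str(i): {name: parameter_grid[name][i] for name in names}
            for i in range(n_runs)
        },
    }
```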

collect_statistics(project:Union[str, Path])
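
The field_statistics block of metadata.json can be derived by scanning every summary CSV once. The sketch below is one way to do that with the standard library. It is not the actual collect_statistics: the name field_extents is hypothetical, and it assumes that any column whose values parse as numbers is a field, while columns such as region names or timestamps do not parse and are skipped.

```python
import csv
from pathlib import Path


def field_extents(project_dir):
    """Return {field: {"min": ..., "max": ...}} across all summary_*.csv files."""
    stats = {}
    for csv_path in sorted(Path(project_dir).glob("summary_*.csv")):
        with open(csv_path, newline="") as handle:
            for row in csv.DictReader(handle):
                for field, raw in row.items():
                    try:
                        value = float(raw)
                    except (TypeError, ValueError):
                        continue  # non-numeric or missing cell
                    entry = stats.setdefault(field, {"min": value, "max": value})
                    entry["min"] = min(entry["min"], value)
                    entry["max"] = max(entry["max"], value)
    return stats
```

Reading row by row keeps memory use flat regardless of how many runs the project contains.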

Copying the sites.geojson

This part is a bit simpler. We need to copy the sites.geojson file from the provided records to the output directory.

Note: some geojson files may be very large. This is the place to reduce the size to something more reasonable yet still functional.

Also, some geojson files for this project have been annotated with SSID as the 'property' that describes each region, while others are annotated with the region key. We need to unify this interface.

Fixing the sites.geojson

We need to clean up the geojson file a bit. The files are very large because of their high resolution, which makes them slow to load in the frontend, and the multipolygons render incorrectly.

fix_geojson(gjson_file)
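
The body of fix_geojson is not shown, so the following is only a rough sketch of two of the clean-up steps mentioned above: shrinking the file and unifying the region property. It rounds coordinates to fewer decimal places, which reduces file size but does not remove vertices the way a true geometry simplification would, and it does not address the multipolygon rendering problem. The name slim_geojson and the target property name "region" are assumptions.

```python
def round_coordinates(coords, ndigits=5):
    """Recursively round a (possibly nested) GeoJSON coordinate array."""
    if isinstance(coords, (int, float)):
        return round(coords, ndigits)
    return [round_coordinates(c, ndigits) for c in coords]


def slim_geojson(geojson, ndigits=5, source_key="SSID", target_key="region"):
    """Return a copy with rounded coordinates and a unified region property."""
    features = []
    for feature in geojson["features"]:
        properties = dict(feature.get("properties") or {})
        # Assumption: files annotated with SSID lack the target key.
        if target_key not in properties and source_key in properties:
            properties[target_key] = properties[source_key]
        geometry = dict(feature["geometry"])
        geometry["coordinates"] = round_coordinates(geometry["coordinates"], ndigits)
        features.append({**feature, "properties": properties, "geometry": geometry})
    return {**geojson, "features": features}
```

The input dictionary is left unmodified, so the original and slimmed versions can be compared before anything is written to the output directory.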

Bundle as Script

main(record_path:"Path to JUNE simulation records and parameter grid", force_add_project:"Overwrite project if it already exists"=False, test_only:"Test behavior without changing files"=False, project_name:"Name the project. If not provided, use folder name of record_path"=None, description:"Description of project"='NA')

Create a project that can be visualized from the record files.
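
For readers who want to see the command-line surface spelled out, the sketch below mirrors the parameters of main using the standard library's argparse. This is a swapped-in technique for illustration only; the original signature uses annotation-style argument descriptions, and the function name build_parser is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser whose options mirror the parameters of main."""
    parser = argparse.ArgumentParser(
        description="Create a project that can be visualized from the record files")
    parser.add_argument(
        "record_path",
        help="Path to JUNE simulation records and parameter grid")
    parser.add_argument(
        "--force_add_project", action="store_true",
        help="Overwrite project if it already exists")
    parser.add_argument(
        "--test_only", action="store_true",
        help="Test behavior without changing files")
    parser.add_argument(
        "--project_name", default=None,
        help="Name the project. If not provided, use folder name of record_path")
    parser.add_argument(
        "--description", default="NA",
        help="Description of project")
    return parser
```

For example, build_parser().parse_args(["SIMULATION_NAME", "--test_only"]) yields a namespace with test_only set to True and the other options at their defaults.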