Python Analysis

Analyzing reproducible experiments often requires locating specific output files buried within hashed directory structures. repx-py abstracts this complexity, allowing users to query jobs by name, parameters, or dependency relationships and retrieve their outputs as standard Python objects or pandas DataFrames.

Installation

repx-py is available as a flake package. Include it in your project's development shell:

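# flake.nix
{
  # Declare nixpkgs explicitly so it is pinned in flake.lock
  inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
  inputs.repx.url = "github:repx-org/repx";

  outputs = { self, nixpkgs, repx }:
    let
      system = "x86_64-linux";
      pkgs = nixpkgs.legacyPackages.${system};
    in {
      devShells.${system}.default = pkgs.mkShell {
        packages = [
          repx.packages.${system}.repx
        ];
      };
    };
}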

Or build it directly:

nix build github:repx-org/repx#repx

Loading an Experiment

The Experiment class is the entry point. It loads the Lab metadata and allows you to query runs and jobs.

from repx_py import Experiment

# Initialize from the built lab directory (./result is the symlink `nix build` creates by default)
exp = Experiment(lab_path="./result")

print(f"Loaded experiment with {len(exp.jobs())} total jobs")

Querying Jobs

The JobCollection interface allows you to filter jobs based on their metadata and parameters.

1. Basic Filtering

# Get all jobs whose name starts with "simulation" (the jobs of the 'simulation' run)
jobs = exp.jobs().filter(name__startswith="simulation")

# Filter by exact parameter match
jobs = exp.jobs().filter(param_model="resnet50")

2. Advanced Filtering

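Each keyword argument is a field name, optionally followed by a lookup suffix; without a suffix, the field must match exactly. Multiple keyword arguments are combined with AND. Supported suffixes:

__startswith: the value starts with the given string
__endswith: the value ends with the given string
__contains: the value contains the given string

Fields named param_<NAME> are matched against the job's effective parameter NAME instead of its metadata, and accept the same suffixes (for example, param_model__contains).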

# Find jobs where learning rate is 0.01 AND model name contains 'net'
target_jobs = exp.jobs().filter(
    param_learning_rate=0.01,
    param_model__contains="net"
)
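The lookup rules above can be sketched in plain Python to make their semantics concrete. This is a hypothetical illustration, not repx-py's implementation: jobs are plain dicts with a `name` field and a `params` dict standing in for effective parameters, and the helpers `matches` and `filter_jobs` are invented for this example.

```python
from typing import Any, Callable, Dict, List

# Lookup suffix -> predicate(field_value, query_value); only string values can match.
LOOKUPS: Dict[str, Callable[[Any, Any], bool]] = {
    "startswith": lambda value, query: isinstance(value, str) and value.startswith(query),
    "endswith": lambda value, query: isinstance(value, str) and value.endswith(query),
    "contains": lambda value, query: isinstance(value, str) and query in value,
}

_MISSING = object()  # sentinel: the field is absent from this job


def _resolve(job: Dict[str, Any], field: str) -> Any:
    """param_<NAME> reads the job's params; any other field reads the job itself."""
    if field.startswith("param_"):
        return job["params"].get(field[len("param_"):], _MISSING)
    return job.get(field, _MISSING)


def matches(job: Dict[str, Any], **criteria: Any) -> bool:
    """True if the job satisfies every criterion; criteria are combined with AND."""
    for key, query in criteria.items():
        field, _, suffix = key.partition("__")
        if suffix and suffix not in LOOKUPS:
            raise ValueError(f"unsupported lookup suffix: __{suffix}")
        value = _resolve(job, field)
        if value is _MISSING:
            return False
        # No suffix means exact match; otherwise apply the suffix's predicate.
        ok = LOOKUPS[suffix](value, query) if suffix else value == query
        if not ok:
            return False
    return True


def filter_jobs(jobs: List[Dict[str, Any]], **criteria: Any) -> List[Dict[str, Any]]:
    return [job for job in jobs if matches(job, **criteria)]


# Hypothetical job records for illustration only.
jobs = [
    {"name": "simulation.train-model-batch-1", "params": {"learning_rate": 0.01, "model": "resnet"}},
    {"name": "simulation.train-model-batch-2", "params": {"learning_rate": 0.1, "model": "resnet"}},
    {"name": "report.plot", "params": {}},
]

selected = filter_jobs(jobs, param_learning_rate=0.01, param_model__contains="net")
print([j["name"] for j in selected])  # ['simulation.train-model-batch-1']
```

Splitting on the first `__` keeps the sketch simple, but it means a parameter name that itself contains a double underscore would be misparsed.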

3. Converting to DataFrame

You can convert a collection of jobs into a pandas DataFrame to inspect their metadata and parameters in tabular form.

df = target_jobs.to_dataframe()
print(df)
# Output:
#                                   name  param_learning_rate param_model
# job_id                                                                 
# 8f2a... simulation.train-model-batch-1                 0.01      resnet

Accessing Results

Once you have a JobView (by iterating over a collection or by looking up a job by its ID), you can access its outputs.

for job in target_jobs:
    print(f"Analyzing Job: {job.id}")

    # 1. Get absolute path to an output file
    log_path = job.get_output_path("run.log")

    # 2. Load CSV data directly
    #    (Assumes the stage defined an output named "metrics.csv")
    metrics_df = job.load_csv("metrics.csv")
    print(metrics_df.describe())

    # 3. Load JSON data
    config = job.load_json("config.json")
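A common next step is to gather one output file from every job into a single long-format table tagged with each job's parameters. The sketch below assumes, as in the examples above, that a job exposes `.id`, `.effective_params`, and `.get_output_path(name)`, and additionally assumes `effective_params` is a flat dict. `collect_rows` and `FakeJob` are hypothetical names used only for illustration.

```python
import csv
import os
import tempfile
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


def collect_rows(jobs: Iterable[Any], filename: str = "metrics.csv") -> List[Dict[str, Any]]:
    """Read `filename` from every job and tag each CSV row with the job's parameters."""
    rows: List[Dict[str, Any]] = []
    for job in jobs:
        # Prefix parameter names to mirror the param_<NAME> columns of to_dataframe().
        params = {f"param_{k}": v for k, v in job.effective_params.items()}
        with open(job.get_output_path(filename), newline="") as fh:
            for record in csv.DictReader(fh):
                # csv yields strings; CSV columns override params on a name clash.
                rows.append({"job_id": job.id, **params, **record})
    return rows


# Hypothetical stand-in for a JobView, used only to exercise collect_rows.
@dataclass
class FakeJob:
    id: str
    out_dir: str
    effective_params: Dict[str, Any] = field(default_factory=dict)

    def get_output_path(self, name: str) -> str:
        return os.path.join(self.out_dir, name)


with tempfile.TemporaryDirectory() as tmp:
    job = FakeJob(id="job-a", out_dir=tmp, effective_params={"learning_rate": 0.01})
    with open(job.get_output_path("metrics.csv"), "w", newline="") as fh:
        fh.write("epoch,loss\n1,0.9\n2,0.5\n")
    rows = collect_rows([job])

print(rows[0])  # {'job_id': 'job-a', 'param_learning_rate': 0.01, 'epoch': '1', 'loss': '0.9'}
```

Because every row is a plain dict, `pandas.DataFrame(rows)` turns the result into a table with one row per CSV record.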

Effective Parameters

RepX resolves the "effective parameters" for every job by tracing values inherited from upstream dependencies. This means you always know exactly what configuration produced a result, even if parameters were defined in a producer stage.

# Access the resolved parameters of a JobView (e.g. `job` from the loop above)
print(job.effective_params)
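To illustrate the idea of tracing values through upstream dependencies, here is a minimal sketch over a hypothetical three-stage pipeline. It is not RepX's resolution algorithm: it assumes an acyclic dependency graph and assumes a stage's own parameters override inherited ones, with later dependencies winning on conflicts. `PIPELINE` and `effective_params` are invented for this example.

```python
from typing import Any, Dict

# Hypothetical pipeline: stage name -> its own parameters and upstream dependencies.
PIPELINE: Dict[str, Dict[str, Any]] = {
    "generate-data": {"params": {"seed": 42, "n_samples": 1000}, "deps": []},
    "train-model": {"params": {"learning_rate": 0.01}, "deps": ["generate-data"]},
    "evaluate": {"params": {"n_samples": 200}, "deps": ["train-model"]},
}


def effective_params(stage: str, pipeline: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Merge parameters inherited from upstream stages; the stage's own values win."""
    resolved: Dict[str, Any] = {}
    for dep in pipeline[stage]["deps"]:
        # Recurse so values defined further upstream (e.g. in a producer) are included.
        resolved.update(effective_params(dep, pipeline))
    resolved.update(pipeline[stage]["params"])
    return resolved


print(effective_params("evaluate", PIPELINE))
# {'seed': 42, 'n_samples': 200, 'learning_rate': 0.01}
```

Here `evaluate` never defines `seed` or `learning_rate`, yet both appear in its effective parameters because they are inherited from its producers.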

Analysis within a Pipeline

When an analysis stage runs inside a RepX pipeline, the full Lab directory is not available, because it is still being built. Instead, use the from_run_metadata factory method, which creates an Experiment from an upstream run's metadata file and the artifact store.

Inside analysis.py:

import argparse
from repx_py import Experiment

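parser = argparse.ArgumentParser()
# Both paths are mandatory; without them from_run_metadata would receive None
parser.add_argument("--meta", required=True, help="Path to input run metadata")
parser.add_argument("--store", required=True, help="Path to artifact store base")
args = parser.parse_args()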

# Load context from the specific upstream run
exp = Experiment.from_run_metadata(args.meta, args.store)

# Now you can query the upstream jobs as usual
jobs = exp.jobs()
