
Core Concepts

RepX provides reproducible experiment execution on high-performance computing (HPC) infrastructure. The framework separates experiment definition from execution, which keeps workflows portable across heterogeneous compute environments.

Architecture Overview

RepX comprises three distinct layers:

Layer        Component        Function
Definition   Nix library      Experiment specification, dependency resolution
Execution    Rust CLI         Orchestration, synchronization, job management
Analysis     Python library   Result querying, metadata access

Terminology

Stage

A discrete computational unit with explicit interfaces:

  • Inputs: Data dependencies from upstream stages
  • Outputs: Produced artifacts consumed by downstream stages
  • Parameters: Configuration values affecting execution
  • Environment: Software dependencies captured as Nix closures
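
The four parts of a stage interface can be pictured as a small record type. The sketch below is purely illustrative: the `Stage` class, its field names, and the example values are assumptions made for this page, not RepX's actual data model (stages are defined through the Nix library).

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Illustrative model of a stage interface (hypothetical, not RepX's API)."""

    name: str
    # Artifact names this stage reads from upstream stages
    inputs: list[str] = field(default_factory=list)
    # Artifact names this stage produces for downstream stages
    outputs: list[str] = field(default_factory=list)
    # Configuration values that affect execution
    parameters: dict[str, object] = field(default_factory=dict)
    # Placeholder label standing in for a Nix closure
    environment: str = ""


# A hypothetical stage with no upstream inputs
simulate = Stage(
    name="simulate",
    outputs=["trace.csv"],
    parameters={"seed": 42},
    environment="simulate-env",
)

# A hypothetical downstream stage consuming the artifact above
analyze = Stage(name="analyze", inputs=["trace.csv"], outputs=["summary.json"])

print(simulate)
print(analyze)
```

Keeping all four parts explicit is what lets a stage be wired into a larger graph without inspecting its implementation.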

Pipeline

A directed acyclic graph (DAG) of stages connected by data flow dependencies. Stage outputs map to downstream stage inputs.
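
Because a pipeline is a DAG, a valid execution order can always be derived by topological sorting. The following is a minimal sketch using Python's standard `graphlib` module; the stage names and dependency edges are hypothetical, and this is not how RepX itself computes its schedule.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each stage maps to the set of upstream stages
# whose outputs it consumes.
pipeline = {
    "preprocess": set(),
    "simulate": {"preprocess"},
    "analyze": {"simulate"},
    "plot": {"simulate", "analyze"},
}

# static_order() yields every stage after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
print(order)

# A dependency cycle is not a valid pipeline and is rejected.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    cycle_detected = False
except CycleError:
    cycle_detected = True
print("cycle detected:", cycle_detected)
```

The acyclicity requirement is what guarantees that every stage's inputs exist before the stage starts.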

Run

A parameterized pipeline instantiation. Multiple runs may share pipeline structure with varying parameter configurations.
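
One common way to obtain several runs from a single pipeline structure is to expand a parameter grid. The sketch below illustrates the idea in plain Python; the `expand_runs` helper, the run naming scheme, and the parameter names are assumptions for illustration only, since RepX runs are declared in Nix.

```python
from itertools import product


def expand_runs(base_name, grid):
    """Return one parameter set per combination of values in `grid`.

    Hypothetical helper: maps a generated run name to its parameters.
    """
    keys = sorted(grid)  # fixed key order gives stable run names
    runs = {}
    for values in product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        suffix = "-".join(f"{key}{value}" for key, value in params.items())
        runs[f"{base_name}-{suffix}"] = params
    return runs


runs = expand_runs("sweep", {"seed": [1, 2], "threads": [4, 8]})
for name, params in runs.items():
    print(name, params)
```

Every generated run shares the same pipeline structure; only the parameter values differ.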

Lab

The build artifact produced by nix build. A Lab encapsulates:

  • Experiment DAG structure (metadata JSON)
  • Job executables and dependency closures
  • Container images (when applicable)
  • Host tools for target bootstrapping

Workflow

1. Define

Specify experiment structure using the repx Nix library:

{
  outputs = { self, repx, ... }: {
    packages.x86_64-linux.default = repx.lib.runs2Lab [
      # Run definitions
    ];
  };
}

2. Build

Generate the Lab artifact:

nix build

3. Visualize

Inspect experiment topology:

repx viz --format svg
[Figure: Experiment Topology]

4. Execute

Submit to an execution target:

repx run <run_name> --lab ./result [--target <target>]

5. Analyze

Query results programmatically:

from repx_py import Lab

lab = Lab.from_path("./result")
for job in lab.jobs():
    print(job.outputs)

Execution Model

Scheduling

RepX supports two scheduler backends:

Scheduler   Use Case
local       Direct execution with concurrency control
slurm       HPC cluster submission via SLURM
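
"Concurrency control" for local execution generally means capping how many jobs run at once. The sketch below shows that idea with a bounded worker pool from Python's standard library. It is a conceptual illustration only: the `MAX_CONCURRENT` limit, the job names, and the `run_job` function are hypothetical, and RepX's scheduler is implemented in the Rust CLI, not in Python.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 2  # hypothetical concurrency limit

lock = threading.Lock()
stats = {"active": 0, "peak": 0}  # tracks how many jobs overlap


def run_job(name):
    """Stand-in for launching one job; records peak concurrency."""
    with lock:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
    time.sleep(0.05)  # placeholder for real work
    with lock:
        stats["active"] -= 1
    return name


jobs = [f"job-{i}" for i in range(6)]

# The pool never runs more than MAX_CONCURRENT jobs at the same time.
with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
    finished = list(pool.map(run_job, jobs))

print("finished:", finished)
print("peak concurrency:", stats["peak"])
```

On a cluster, the same limit is typically enforced by the batch system's own queueing rather than by the submitting process.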

Runtime Environments

Jobs execute in one of four runtime environments, which differ in their degree of isolation:

Runtime   Description
Native    Direct process on host system
Bwrap     Bubblewrap namespace isolation
Docker    Docker container
Podman    Podman container

Incremental Execution

RepX tracks job completion state. Interrupted executions resume from the last completed job. Container images are synchronized incrementally to minimize transfer overhead during iterative development.
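
The resume behaviour described above can be sketched as a loop that records each completed job and skips recorded jobs on the next invocation. This is a minimal illustration under stated assumptions: the JSON state file, the `run_all` and `make_executor` helpers, and the job names are all hypothetical, and the source does not describe how RepX actually stores completion state.

```python
import json
import tempfile
from pathlib import Path


def run_all(jobs, state_file, execute):
    """Run jobs in order, skipping any already recorded as complete."""
    done = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    executed = []
    for job in jobs:
        if job in done:
            continue  # completed in an earlier invocation
        execute(job)  # may raise, leaving the state file untouched
        done.add(job)
        # Persist after every job so an interruption loses at most one job
        state_file.write_text(json.dumps(sorted(done)))
        executed.append(job)
    return executed


def make_executor(fail_on=None):
    """Build a fake job runner that simulates an interruption on one job."""
    def execute(job):
        if job == fail_on:
            raise RuntimeError(f"interrupted during {job}")
    return execute


jobs = ["a", "b", "c", "d"]
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "state.json"
    try:
        run_all(jobs, state, make_executor(fail_on="c"))
    except RuntimeError as err:
        print(err)
    completed_before_resume = json.loads(state.read_text())
    resumed = run_all(jobs, state, make_executor())

print("completed before resume:", completed_before_resume)
print("executed on resume:", resumed)
```

Writing the state after each job, not once at the end, is the design choice that makes interruption cheap: only the job in flight is repeated.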