Nix Functions Reference

This reference documents the public API exposed by repx.lib (typically accessed as repx-lib in your flake).

Lab Definition

repx-lib.mkLab

Creates the top-level Lab derivation. This is the entry point for your experiment definition.

repx-lib.mkLab {
  inherit pkgs repx-lib;
  gitHash = self.rev or self.dirtyRev or "unknown";
  lab_version = "1.0.0";
  runs = { ... };
  groups = { ... }; # optional
}

Arguments:

Parameter | Type | Required | Description
pkgs | Attribute Set | Yes | The Nixpkgs package set.
repx-lib | Attribute Set | Yes | The RepX library instance.
gitHash | String | Yes | Git commit hash for provenance tracking. Baked into all metadata. Typically self.rev or self.dirtyRev or "unknown".
lab_version | String | Yes | User-defined version string for this lab. Written into the lab manifest.
runs | Attribute Set | Yes | Dictionary where keys are run attribute names and values are run placeholders (created by callRun).
groups | Attribute Set | No (default: {}) | Named groupings of runs. See Run Groups.
containerMode | String | No (default: "unified") | Controls container image generation for the full lab. "unified" builds one shared image for all runs, "per-run" builds a separate image per run, "none" skips image generation entirely.
runContainerMode | String | No (default: "per-run") | Controls container image generation for per-run lab slices (accessed via .runs.<name>). Same values as containerMode.

Returns: An attribute set containing:

  • lab -- The Lab derivation containing the complete experiment graph.
  • runs -- An attribute set of per-run Lab derivations. Each entry (e.g., .runs.simulation) is a standalone Lab containing only that single run, built with runContainerMode.
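
As a rough illustration of how the two return values might be consumed, the sketch below exposes the full lab and one single-run slice as flake packages. The file path ./nix/lab.nix, the system string, the package names, and the assumption that pkgs and repx-lib are already bound in scope are all hypothetical; only .lab and .runs.<name> come from the description above.

```nix
# Fragment of a flake's outputs. Assumes pkgs and repx-lib are already in scope
# and that ./nix/lab.nix ends in a repx-lib.mkLab call (both are assumptions).
let
  labOutputs = import ./nix/lab.nix { inherit pkgs repx-lib; };
in
{
  packages.x86_64-linux = {
    # The complete experiment graph, built with containerMode.
    default = labOutputs.lab;
    # A standalone lab containing only the "simulation" run,
    # built with runContainerMode.
    simulation-only = labOutputs.runs.simulation;
  };
}
```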

Validation rules:

  • All run names must be unique after evaluation.
  • Group names must not collide with any run name.
  • Circular dependencies between runs cause a build error.

repx-lib.callRun

Creates a run placeholder for use in mkLab's runs attribute set. Run placeholders are resolved lazily during lab evaluation.

repx-lib.callRun <runPath> <dependencies>

Arguments:

Parameter | Type | Description
runPath | Path or Function | Path to a run definition file (.nix), or a function that returns a run definition.
dependencies | List | A list of dependency specifications. Can be empty ([]) for runs with no dependencies.

Each element in the dependencies list can be:

  • A bare run placeholder -- implies a "hard" dependency:
    repx-lib.callRun ./runs/analysis.nix [ simulationRun ]
  • A list [ runPlaceholder "type" ] -- explicit dependency type ("hard" or "soft"):
    repx-lib.callRun ./runs/analysis.nix [ [ simulationRun "hard" ] [ validationRun "soft" ] ]

Hard vs Soft dependencies:

  • "hard": The dependent run receives all jobs from the dependency as inputs. The dependency must complete before the dependent run starts.
  • "soft": The dependent run is aware of the dependency's jobs but does not receive them as direct inputs.

Returns: An attribute set with _repx_type = "run_placeholder".
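
Because runPath may also be a function, a small run could be declared inline rather than in its own file. The fragment below is a sketch only: it assumes the function receives the same arguments as a run definition file ({ pkgs, repx-lib, ... }) and returns the same attribute set, and the run name and pipeline path are placeholders.

```nix
# Binding inside a let block of a lab file (fragment, not a full file).
# Assumption: an inline function takes the same arguments as a file-based run.
smokeTest = repx-lib.callRun
  ({ pkgs, repx-lib, ... }: {
    name = "smoke-test";                   # hypothetical run name
    pipelines = [ ./pipelines/main.nix ];  # placeholder path
    params = { seed = [ 1 ]; };
  })
  [ ];                                     # no dependencies
```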

Complete example:

# nix/lab.nix
{ pkgs, repx-lib, ... }:
let
  simulation = repx-lib.callRun ./runs/simulation.nix [];
  analysis = repx-lib.callRun ./runs/analysis.nix [ simulation ];
  report = repx-lib.callRun ./runs/report.nix [ [ analysis "hard" ] [ simulation "soft" ] ];
in
repx-lib.mkLab {
  inherit pkgs repx-lib;
  gitHash = "abc123";
  lab_version = "1.0.0";
  runs = {
    inherit simulation analysis report;
  };
}

Run Groups

Groups allow you to tag collections of runs with a name for organizational purposes. Groups can be listed with repx list groups.

repx-lib.mkLab {
  # ...
  runs = {
    inherit training validation testing;
  };
  groups = {
    ml-pipeline = [ training validation ];
    evaluation = [ validation testing ];
  };
}

Rules:

  • Each group value must be a list of run placeholders (created by callRun).
  • Group names must not collide with any run name.

Run Definition

repx-lib.mkRun (internal)

Defines a parameterized run. You typically don't call mkRun directly -- instead, you write a run definition file that returns the arguments, and callRun + mkLab handle the rest.

A run definition file (e.g., runs/simulation.nix) returns an attribute set:

# nix/runs/simulation.nix
{ pkgs, repx-lib, ... }:
{
  name = "simulation";
  pipelines = [ ./pipelines/main.nix ];
  params = {
    seed = [ 1 2 3 ];
    model = [ "A" "B" ];
  };
}

Attributes:

Attribute | Type | Required | Default | Description
name | String | Yes | - | Unique name for this run.
pipelines | List of Paths | Yes | - | Paths to pipeline definition files.
params | Attribute Set | Yes | - | Parameter lists for sweeping. RepX generates the Cartesian product. Use utils.zip to pair parameters in lockstep instead.
containerized | - | - | - | Removed. Container image generation is now controlled at the lab level via containerMode on mkLab.
paramsDependencies | List | No | [] | Additional Nix derivations that parameter values depend on (beyond auto-detection).
hashMode | String | No | "pure" | Controls how job IDs are computed. "pure" (default) includes the full Nix store path of the stage script derivation, so any change to packages (even transitive dependencies like glibc) invalidates the job. "params-only" hashes only the stage identity (pname + version), resolved parameters, and pipeline wiring -- package/dependency changes are ignored. See Hash Modes below.

Parameter format:

Parameter values are lists. RepX computes the Cartesian product of all parameter lists:

params = {
  seed = [ 1 2 3 ];    # 3 values
  model = [ "A" "B" ]; # 2 values
};
# Produces 3 x 2 = 6 parameter combinations
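
To make the expansion concrete, the standalone expression below computes a Cartesian product over the same two lists using only Nix builtins. It is a model of the behaviour described here, not RepX's own implementation; the helper name cartesian is made up for the illustration.

```nix
let
  params = {
    seed = [ 1 2 3 ];
    model = [ "A" "B" ];
  };

  # Illustrative helper (not part of RepX): extend every partial combination
  # with each value of the next parameter, one parameter at a time.
  cartesian = attrs:
    builtins.foldl'
      (combos: name:
        builtins.concatMap
          (combo: map (value: combo // { ${name} = value; }) attrs.${name})
          combos)
      [ { } ]
      (builtins.attrNames attrs);

  combos = cartesian params;
in
{
  count = builtins.length combos;  # 6
  first = builtins.head combos;    # { model = "A"; seed = 1; }
}
```

Each resulting attribute set corresponds to one parameter combination, so adding a third list with four values would multiply the count to 24.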

To pair parameters in lockstep instead of crossing them, use utils.zip:

params = {
  seed = [ 1 2 3 ]; # cartesian
  config = utils.zip { model = [ "A" "B" ]; lr = [ 0.1 0.01 ]; }; # zipped
};
# Produces 3 seeds x 2 zipped configs = 6 combinations
# model="A" is always paired with lr=0.1, never with lr=0.01

The run definition file receives { pkgs, repx-lib, ... } as arguments (via callPackage). You can access repx-lib.utils for parameter helpers -- see mkUtils.

Hash Modes

The hashMode attribute controls how job IDs (the hashes in directory names like <hash>-stage-name-1.1) are computed. This affects when jobs are considered "changed" and need to be re-executed.

Mode | Description
"pure" (default) | The job ID includes the full Nix store path of the stage script derivation. Any change to the script, its runDependencies, or any transitive dependency (e.g., a glibc update, a nixpkgs bump) produces a new job ID and forces re-execution. This is the strictest mode and guarantees full reproducibility.
"params-only" | The job ID is computed from the stage's pname, version, resolved parameter values, and the upstream dependency graph structure. Changes to packages, runDependencies, or script contents do not affect the job ID. Only parameter changes, stage renames, version bumps, or pipeline rewiring trigger re-execution.

Example:

# nix/runs/simulation.nix
{ repx-lib, ... }:
{
  name = "simulation";
  hashMode = "params-only"; # ignore package updates
  pipelines = [ ./pipelines/main.nix ];
  params = {
    seed = [ 1 2 3 ];
  };
}

When to use "params-only":

  • Large parameter sweeps where you want to iterate on script logic without invalidating thousands of completed jobs.
  • Environments where nixpkgs updates are frequent but irrelevant to your experiment's correctness.
  • Workloads where the experiment parameters are the only meaningful source of variation.

Caveats:

  • Changing a stage's script body will not produce a new job ID. If you fix a bug in the script, you must either bump the version, change a parameter, or manually clean the old results.
  • runDependencies changes (e.g., adding a new tool) are also invisible to the hash. Ensure your runtime environment is correct before relying on cached results.
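
The difference between the two modes can be modelled with a toy hash. The expression below is only an analogy under stated assumptions: RepX's real job ID derivation is not shown in this reference, and the field names (script, idPure, idParamsOnly) are invented for the illustration.

```nix
let
  params = { seed = 1; };
  stage = { pname = "train"; version = "1.1"; script = "echo v1"; };
  edited = stage // { script = "echo v2 (bug fix)"; };

  # "pure"-like: everything about the stage, including the script, feeds the ID.
  idPure = s:
    builtins.hashString "sha256" (builtins.toJSON { inherit params; stage = s; });

  # "params-only"-like: only pname, version, and parameters feed the ID.
  idParamsOnly = s:
    builtins.hashString "sha256"
      (builtins.toJSON { inherit params; inherit (s) pname version; });
in
{
  pureChanges = idPure stage != idPure edited;                    # true
  paramsOnlyChanges = idParamsOnly stage != idParamsOnly edited;  # false
  afterVersionBump =
    idParamsOnly stage != idParamsOnly (edited // { version = "1.2"; });  # true
}
```

The last attribute mirrors the first caveat: under a params-only scheme, a script fix becomes visible only once something in the hashed identity, such as the version, changes too.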

Pipeline Construction

repx.mkPipe

Constructs a pipeline from a set of stages. Used inside a pipeline definition file. mkPipe is essentially an identity function on the stages attribute set -- its purpose is to mark the set as a pipeline and provide future extensibility.

# nix/pipelines/main.nix
{ repx, pkgs, ... }:
repx.mkPipe rec {
  generate = repx.callStage ./stages/generate.nix [];
  train = repx.callStage ./stages/train.nix [ generate ];
  analyze = repx.callStage ./stages/analyze.nix [ train ];
}

Arguments:

Parameter | Type | Description
stages | Attribute Set | A (typically rec) attribute set where each key is a stage name and value is a Stage derivation (returned by callStage).

Returns: The stages attribute set (a Pipeline definition).

Note: Pipeline files receive { repx, pkgs, ... } as arguments. The repx object contains mkPipe and callStage. This is distinct from repx-lib, which is available in run definition files and lab files.

repx.callStage

Instantiates a stage from a file, resolving dependencies and parameters.

repx.callStage <path> <dependencies>

Arguments:

Parameter | Type | Description
path | Path | Path to the stage definition file (.nix).
dependencies | List | A list of stage dependencies.

Each element in the dependencies list can be:

  • A Stage derivation -- implicit mapping. Output names of the upstream stage are matched to input names of the current stage:
    train = repx.callStage ./stages/train.nix [ generate ];
  • A list [ stage "source" "target" ] -- explicit mapping. Maps the source output of stage to the target input of the current stage:
    analyze = repx.callStage ./stages/analyze.nix [
      [ train "model_weights" "weights_file" ]
      [ generate "data_csv" "input_data" ]
    ];

Returns: A Stage derivation with passthru metadata.
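
The implicit mapping can be pictured as a name intersection. The standalone expression below is a simplified model, not RepX's wiring code: it assumes that matching is purely by attribute name, and the output and input names are made up. How RepX treats names that appear on only one side is not covered here.

```nix
let
  # Outputs declared by a hypothetical upstream stage.
  upstreamOutputs = {
    data_csv = "$out/data.csv";
    stats = "$out/stats.json";
  };

  # Inputs declared by a hypothetical downstream stage.
  stageInputs = {
    data_csv = "";
    labels = "";
  };

  # builtins.intersectAttrs keeps the attributes of its second argument
  # whose names also occur in the first.
  wired = builtins.attrNames (builtins.intersectAttrs upstreamOutputs stageInputs);
in
wired  # [ "data_csv" ]
```

When the names differ, as with "model_weights" and "weights_file" in the explicit form, no name is shared and the three-element mapping is the way to connect them.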


Stage Schema

Stages are defined as Nix functions that accept { pkgs } and return an attribute set. There are two stage types: simple and scatter-gather.

Common Attributes

These attributes are valid for both stage types:

Attribute | Type | Required | Default | Description
pname | String or Function | Yes | - | Stage name. Can be a function { params }: ... for dynamic names.
version | String | No | "1.1" | Stage version string.
params | Attribute Set | No | {} | Default parameter values. Overridden by run-level parameters of the same name.
runDependencies | List | No | [] | Nix packages to include in $PATH at runtime.
resources | Attribute Set or Function | No | null | Resource hints for SLURM scheduling. See Resource Hints.
passthru | Attribute Set | No | {} | Arbitrary attributes passed through to the derivation's passthru.

Simple Stage Attributes

In addition to common attributes:

Attribute | Type | Required | Description
inputs | Attribute Set or Function | No | Map of input identifiers to default values. Available as $inputs associative array in the script. Can be a function { params }: ....
outputs | Attribute Set or Function | No | Map of output identifiers to file path templates using $out. Can be a function { params }: ....
run | Function | Yes | The execution script. Receives { inputs, outputs, params, pkgs, ... } and returns a Bash string.
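
A minimal simple stage built from these attributes might look like the sketch below. The file name, stage name, parameter, and output name are placeholders chosen for the illustration; the script relies on ${outputs.<name>} expanding to the output path, as in the dynamic-attribute example in this reference.

```nix
# nix/stages/generate.nix (hypothetical file)
{ pkgs }:
{
  pname = "generate";
  version = "1.0";

  # Default parameter value; a run-level parameter named seed overrides it.
  params = { seed = 1; };

  # No upstream inputs: this stage only produces data.
  outputs = {
    "data_csv" = "$out/data.csv";
  };

  runDependencies = [ pkgs.coreutils ];

  # The function returns a Bash string; Nix interpolates ${...} before Bash runs.
  run = { outputs, params, ... }: ''
    printf 'seed,value\n%s,42\n' "${toString params.seed}" > "${outputs.data_csv}"
  '';
}
```

A downstream stage that declares an input named data_csv would then pick this output up through the implicit mapping of callStage.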

Scatter-Gather Stage Attributes

In addition to common attributes:

Attribute | Type | Required | Description
scatter | Attribute Set | Yes | The scatter phase definition (has inputs, outputs, run, and optionally resources).
steps | Attribute Set | Yes | A set of step definitions forming a mini-DAG per branch. Each step has pname, inputs, outputs, run, deps, and optionally resources and runDependencies. See Step Dependencies.
gather | Attribute Set | Yes | The gather phase definition (has inputs, outputs, run, and optionally resources).
inputs | Attribute Set | No | Shared inputs for the scatter phase.

Step Dependencies

Each step in the steps attrset is an attribute set with these fields:

Attribute | Type | Required | Default | Description
pname | String | Yes | - | Step name (must match the attrset key).
inputs | Attribute Set | Yes | - | Map of input identifiers to default values. Root steps should declare worker__item to receive the scatter work item.
outputs | Attribute Set | Yes | - | Map of output identifiers to $out/... path templates.
deps | List | Yes | - | List of step references this step depends on. Empty list ([]) = root step.
run | Function | Yes | - | Execution script, same contract as simple stages.
resources | Attribute Set | No | null | Per-step resource hints for SLURM scheduling.
runDependencies | List | No | [] | Additional Nix packages for this step's $PATH.

Dependency wiring uses the same syntax as repx.callStage dependencies:

  • Bare reference ([ other_step ]): Implicit name mapping -- output names of the dependency are matched to input names of this step.
  • Explicit mapping ([ other_step "source_output" "target_input" ]): Maps a specific output of the dependency to a specific input of this step.

Constraints:

  • There must be exactly one sink step (a step no other step depends on). The sink step's outputs become the gather phase's inputs.
  • At least one root step (deps = []) must declare a worker__item input.
  • The step DAG must be acyclic.
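
The constraints above can be seen in a two-step sketch. This fragment shows only the steps attribute of a scatter-gather stage; the step names, input and output names, empty-string defaults, and the assumption that an input resolved from a dependency holds a file path are all illustrative guesses rather than documented behaviour. What the work item contains depends on the scatter phase.

```nix
# Fragment: the steps attribute of a scatter-gather stage (names are hypothetical).
steps = rec {
  # Root step: deps = [ ] and a worker__item input for the scatter work item.
  preprocess = {
    pname = "preprocess";
    inputs = { "worker__item" = ""; };  # empty default is an assumption
    outputs = { "clean" = "$out/clean.txt"; };
    deps = [ ];
    # ''${...} is a Bash expansion escaped for Nix; ${...} is Nix interpolation.
    run = { outputs, ... }: ''
      printf '%s\n' "''${inputs[worker__item]}" > "${outputs.clean}"
    '';
  };

  # Sink step: no other step depends on it, so its outputs feed the gather phase.
  score = {
    pname = "score";
    inputs = { "clean" = ""; };
    outputs = { "score" = "$out/score.txt"; };
    deps = [ preprocess ];  # bare reference: output "clean" -> input "clean"
    run = { outputs, ... }: ''
      wc -l < "''${inputs[clean]}" > "${outputs.score}"
    '';
  };
};
```

The rec keyword lets score refer to preprocess by name, the same way pipeline files refer to earlier stages.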

Dynamic Attribute Resolution

The pname, inputs, outputs, and resources attributes can be functions that accept { params } and return the resolved value. This allows stage definitions to adapt based on parameters:

{ pkgs }:
{
  pname = { params }: "train-${params.model}";

  inputs = { params }: {
    "data" = "input.csv";
  } // (if params.use_pretrained then {
    "pretrained_weights" = "weights.bin";
  } else {});

  outputs = { params }: {
    "model" = "$out/model-${params.model}.bin";
    "metrics" = "$out/metrics.csv";
  };

  resources = { params }: {
    mem = if params.dataset_size > 10000 then "64G" else "8G";
    cpus = if params.model == "large" then 8 else 4;
  };

  params = {
    model = "small";
    dataset_size = 1000;
    use_pretrained = false;
  };

  runDependencies = [ pkgs.python3 ];

  run = { inputs, outputs, params, ... }: ''
    python3 train.py \
      --model ${params.model} \
      --output "${outputs.model}"
  '';
}

Resource Hints

Resource hints guide SLURM job submission. They can be specified at the stage level and are automatically merged from upstream dependencies.

resources = {
  mem = "16G";                      # Memory (supports K, M, G, T suffixes)
  cpus = 4;                         # CPU count
  time = "02:00:00";                # Wall time (HH:MM:SS, MM:SS, or raw seconds)
  partition = "gpu";                # SLURM partition
  sbatch_opts = [ "--gres=gpu:1" ]; # Extra sbatch options
};

Field | Type | Description
mem | String | Memory limit. Suffixes: K (KiB), M (MiB), G (GiB), T (TiB).
cpus | Integer | Number of CPUs.
time | String | Wall time limit. Formats: HH:MM:SS, MM:SS, or raw seconds.
partition | String | SLURM partition name.
sbatch_opts | List of Strings | Additional sbatch flags.

Merge semantics: When a stage depends on upstream stages, resource hints are automatically merged:

  • mem, cpus, time: The maximum across all upstream dependencies and the stage's own declaration is used.
  • partition, sbatch_opts: The stage's own value takes precedence (last-writer-wins). If unset, the first dependency's value is used.
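
The two merge rules can be modelled in a few lines of plain Nix. The expression below is a simplified stand-in rather than RepX's merge code: it covers only cpus and partition (mem and time would first need unit parsing), and the helper names are invented for the illustration.

```nix
let
  # Resource hints of two hypothetical upstream stages, in dependency order.
  upstream = [
    { cpus = 2; partition = "cpu"; }
    { cpus = 8; }
  ];

  merge = own:
    let
      all = upstream ++ [ own ];
      # Maximum of a numeric field over every declaration that sets it.
      maxOf = field:
        builtins.foldl'
          (acc: r: if r ? ${field} && r.${field} > acc then r.${field} else acc)
          0
          all;
      # First upstream value of a field, or null when nobody sets it.
      firstUpstream = field:
        let having = builtins.filter (r: r ? ${field}) upstream;
        in if having == [ ] then null else (builtins.head having).${field};
    in
    {
      cpus = maxOf "cpus";
      # The stage's own value wins; otherwise fall back to the first dependency.
      partition = own.partition or (firstUpstream "partition");
    };
in
{
  explicit = merge { cpus = 4; partition = "gpu"; };  # { cpus = 8; partition = "gpu"; }
  inherited = merge { cpus = 4; };                     # { cpus = 8; partition = "cpu"; }
}
```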

For scatter-gather stages, each sub-stage (scatter, gather) and each individual step can have its own resources attribute.


Utility Functions

repx-lib.mkUtils

A factory that creates a set of parameter utility functions. Available automatically inside run definition files as repx-lib.utils (injected by mkLab).

utils = repx-lib.mkUtils { inherit pkgs; };

utils.list

Wraps a plain list into a RepX parameter object. Use this when you have a dynamically constructed list that should be treated as a parameter sweep dimension.

utils.list [ 1 2 3 ]
# Equivalent to setting params = { x = [ 1 2 3 ]; } directly

utils.range

Generates a list of integers from start to end (inclusive). Wrapper around pkgs.lib.range.

utils.range 1 10
# [ 1 2 3 4 5 6 7 8 9 10 ]

utils.scan

Scans a directory for entries matching criteria. Works at Nix evaluation time.

utils.scan {
  src = ./data;       # Path or derivation to scan
  type = "file";      # "any" (default), "file", or "directory"
  match = ".*\\.csv"; # Optional regex pattern to filter by name
}

Arguments:

Parameter | Type | Default | Description
src | Path or Derivation | (required) | The directory to scan.
type | String | "any" | Entry type filter: "any", "file", or "directory".
match | String or null | null | Regex pattern to filter entry names.

Returns: A RepX parameter object ({ _repx_param = true; values = [...]; context = [...]; }). The values are absolute paths to matching entries. The context tracks derivation dependencies for Nix garbage collection safety.

Behavior for store paths vs local paths:

  • Local paths: Uses builtins.readDir for fast evaluation.
  • Store paths / derivations: Uses find in a build step (since readDir doesn't work on store paths).
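
For the local-path case, the filtering can be approximated with builtins alone. The expression below is a sketch of the idea, not the library's code: it assumes a ./data directory exists next to the file, treats type = "file" as the "regular" entry type reported by builtins.readDir, and returns a plain list of paths rather than a RepX parameter object.

```nix
let
  dir = ./data;                    # placeholder directory
  entries = builtins.readDir dir;  # e.g. { "a.csv" = "regular"; "raw" = "directory"; }

  keep = name:
    entries.${name} == "regular"                # roughly type = "file"
    && builtins.match ".*\\.csv" name != null;  # match = ".*\\.csv"
in
map (name: dir + "/${name}") (builtins.filter keep (builtins.attrNames entries))
```

builtins.match succeeds only when the whole name matches the pattern, which is why the regex starts with .* instead of relying on a partial match.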

utils.dirs

Scans a source for subdirectories. Shorthand for scan { type = "directory"; ... } with an important optimization: for non-store local paths, each directory is wrapped in its own individual derivation for fine-grained Nix caching.

# Sweep over directories in ./submissions/
{ pkgs, repx-lib, ... }:
{
  name = "grading";
  pipelines = [ ./pipelines/grade.nix ];
  params = {
    submission = repx-lib.utils.dirs ./submissions;
  };
}

utils.files

Scans a source for files only. Shorthand for scan { type = "file"; ... }.

params = {
  config = repx-lib.utils.files ./configs;
};

utils.zip

Groups multiple parameter lists so they sweep in lockstep (element-wise) instead of as a Cartesian product. All lists must have the same length.

params = {
  workload = [ "a" "b" "c" ]; # normal cartesian dimension

  # mode and multiplier are paired: fast↔2, slow↔3.
  # Without zip: 2×2 = 4 combos (fast×2, fast×3, slow×2, slow×3).
  # With zip: exactly 2 combos (fast×2, slow×3).
  config = utils.zip {
    mode = [ "fast" "slow" ];
    multiplier = [ 2 3 ];
  };
};
# Total combinations: 3 workloads × 2 zipped configs = 6

Arguments:

Parameter | Type | Description
(positional) | Attrset of lists | Each attribute is a parameter name and its list of values. All lists must have the same length.

Returns: A RepX zip marker ({ _repx_zip = true; groups = {...}; length = N; }). The individual parameter names inside the zip group become normal parameters in each job combination -- stages declare and access them the same way as any other parameter.

Error handling: If lists have different lengths, evaluation fails with a message showing each parameter's length:

error: utils.zip: all parameter lists must have the same length,
but 'config_label' has 2 items, 'mode' has 3 items, 'vf_enable' has 3 items
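
The lockstep pairing and the length check can be modelled with builtins. The standalone expression below is an illustration of the semantics only: zipModel is a made-up name, the error text is abbreviated, and the real utils.zip returns a zip marker rather than a list of rows.

```nix
let
  # Standalone model of lockstep pairing; not RepX's implementation.
  zipModel = lists:
    let
      names = builtins.attrNames lists;
      lengths = map (n: builtins.length lists.${n}) names;
      len = builtins.head lengths;
      sameLength = builtins.all (l: l == len) lengths;
      # Row i takes the i-th element of every list.
      rowAt = i:
        builtins.listToAttrs
          (map (n: { name = n; value = builtins.elemAt lists.${n} i; }) names);
    in
    if sameLength
    then builtins.genList rowAt len
    else throw "all parameter lists must have the same length";
in
zipModel {
  mode = [ "fast" "slow" ];
  multiplier = [ 2 3 ];
}
# Evaluates to:
# [ { mode = "fast"; multiplier = 2; } { mode = "slow"; multiplier = 3; } ]
```

Passing lists of unequal length makes the throw branch fire during evaluation, which corresponds to the failure shown above.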

Script Execution Contract

When RepX executes a stage script, the following contract applies:

  1. Shell settings: set -euxo pipefail -- scripts fail on any error, undefined variable, or pipe failure, and each command is traced to stderr as it runs (-x).
  2. Arguments: The script receives $1 as the output directory ($out) and $2 as the inputs JSON manifest.
  3. Input readiness: RepX polls for all input files to become readable with a 30-second timeout (2-second intervals). This handles async filesystem syncs on networked storage.
  4. Output cleanup: The output directory ($out) is cleared before each run (preserving slurm-*.out files).
  5. Working directory: The script's working directory is set to $out.
  6. $PATH: Contains only packages from runDependencies plus core utilities (bash, coreutils, findutils, sed, grep, jq).
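
Two of these points affect how a run script is written in practice: undefined variables abort the script, and relative paths resolve inside $out. The stage below is a hedged sketch that leans on both; the stage name, output name, and the use of OMP_NUM_THREADS are placeholders, not part of RepX.

```nix
# Hypothetical stage file illustrating the contract; names are placeholders.
{ pkgs }:
{
  pname = "report";
  outputs = { "summary" = "$out/summary.txt"; };

  # Nix-level values use ${...}. Bash-level expansions inside the script are
  # written ''${...} so that Nix leaves them for Bash to expand.
  run = { outputs, ... }: ''
    # set -u is active, so give optional environment variables a default.
    threads="''${OMP_NUM_THREADS:-1}"

    # The working directory is the output directory, so this relative path lands there.
    echo "threads=$threads" > notes.txt
    cp notes.txt "${outputs.summary}"
  '';
}
```

Both echo (a Bash builtin) and cp (from coreutils) are available without extra runDependencies, per the $PATH rule above.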

Build-Time Script Validation

Every stage script undergoes automatic validation at build time:

  1. ShellCheck: The script is linted with shellcheck to catch common Bash issues.
  2. OSH parsing: The script is parsed into an AST using Oils for Unix (OSH).
  3. Dependency analysis: A Python analyzer walks the AST to extract all external command invocations and verifies each command exists in $PATH (populated by runDependencies).

If any command referenced in your script is not provided by runDependencies, the Nix build fails with an error listing the missing commands. This catches dependency issues at build time rather than at runtime.