Nix Functions Reference
This reference documents the public API exposed by repx.lib (typically accessed as repx-lib in your flake).
Lab Definition
repx-lib.mkLab
Creates the top-level Lab derivation. This is the entry point for your experiment definition.
repx-lib.mkLab {
inherit pkgs repx-lib;
gitHash = self.rev or self.dirtyRev or "unknown";
lab_version = "1.0.0";
runs = { ... };
groups = { ... }; # optional
}
Arguments:
| Parameter | Type | Required | Description |
|---|---|---|---|
pkgs | Attribute Set | Yes | The Nixpkgs package set. |
repx-lib | Attribute Set | Yes | The RepX library instance. |
gitHash | String | Yes | Git commit hash for provenance tracking. Baked into all metadata. Typically self.rev or self.dirtyRev or "unknown". |
lab_version | String | Yes | User-defined version string for this lab. Written into the lab manifest. |
runs | Attribute Set | Yes | Dictionary where keys are run attribute names and values are run placeholders (created by callRun). |
groups | Attribute Set | No (default: {}) | Named groupings of runs. See Run Groups. |
containerMode | String | No (default: "unified") | Controls container image generation for the full lab. "unified" builds one shared image for all runs, "per-run" builds a separate image per run, "none" skips image generation entirely. |
runContainerMode | String | No (default: "per-run") | Controls container image generation for per-run lab slices (accessed via .runs.<name>). Same values as containerMode. |
Returns: An attribute set containing:
- lab -- The Lab derivation containing the complete experiment graph.
- runs -- An attribute set of per-run Lab derivations. Each entry (e.g., .runs.simulation) is a standalone Lab containing only that single run, built with runContainerMode.
Validation rules:
- All run names must be unique after evaluation.
- Group names must not collide with any run name.
- Circular dependencies between runs cause a build error.
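As a rough sketch of how the arguments and return value described above might be used together, the fragment below wires the full lab and one per-run slice into flake outputs. The packages.x86_64-linux attribute names, the ./runs/simulation.nix path, and the labOutputs binding are illustrative assumptions, not part of the RepX API.

```nix
# Fragment of a flake's `outputs` body. Assumes `pkgs`, `repx-lib`, and `self`
# are already in scope and that ./runs/simulation.nix exists (hypothetical).
let
  simulation = repx-lib.callRun ./runs/simulation.nix [ ];

  labOutputs = repx-lib.mkLab {
    inherit pkgs repx-lib;
    gitHash = self.rev or self.dirtyRev or "unknown";
    lab_version = "1.0.0";
    runs = { inherit simulation; };
    containerMode = "none";        # skip image generation for the full lab
    runContainerMode = "per-run";  # the documented default, spelled out for clarity
  };
in
{
  packages.x86_64-linux = {
    default = labOutputs.lab;                 # complete experiment graph
    simulation = labOutputs.runs.simulation;  # standalone Lab for a single run
  };
}
```

Exposing a per-run slice as its own output lets a single run be built without evaluating container images for the whole lab.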
repx-lib.callRun
Creates a run placeholder for use in mkLab's runs attribute set. Run placeholders are resolved lazily during lab evaluation.
repx-lib.callRun <runPath> <dependencies>
Arguments:
| Parameter | Type | Description |
|---|---|---|
runPath | Path or Function | Path to a run definition file (.nix), or a function that returns a run definition. |
dependencies | List | A list of dependency specifications. Can be empty ([]) for runs with no dependencies. |
Each element in the dependencies list can be:
- A bare run placeholder -- implies a "hard" dependency:
repx-lib.callRun ./runs/analysis.nix [ simulationRun ]
- A list [ runPlaceholder "type" ] -- explicit dependency type ("hard" or "soft"):
repx-lib.callRun ./runs/analysis.nix [ [ simulationRun "hard" ] [ validationRun "soft" ] ]
Hard vs Soft dependencies:
- "hard": The dependent run receives all jobs from the dependency as inputs. The dependency must complete before the dependent run starts.
- "soft": The dependent run is aware of the dependency's jobs but does not receive them as direct inputs.
Returns: An attribute set with _repx_type = "run_placeholder".
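The runPath argument may also be a function rather than a file path. The sketch below assumes that an inline function takes the same arguments as a run definition file ({ pkgs, repx-lib, ... }); the run name smoke-test and the pipeline path are hypothetical.

```nix
# Assumption: an inline function is called with the same arguments as a
# run definition file, i.e. { pkgs, repx-lib, ... }.
let
  smokeTest = repx-lib.callRun
    ({ pkgs, repx-lib, ... }: {
      name = "smoke-test";                   # hypothetical run name
      pipelines = [ ./pipelines/main.nix ];  # hypothetical pipeline file
      params = { seed = [ 1 ]; };
    })
    [ ];                                     # no dependencies
in
smokeTest
```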
Complete example:
# nix/lab.nix
{ pkgs, repx-lib, ... }:
let
simulation = repx-lib.callRun ./runs/simulation.nix [];
analysis = repx-lib.callRun ./runs/analysis.nix [ simulation ];
report = repx-lib.callRun ./runs/report.nix [ [ analysis "hard" ] [ simulation "soft" ] ];
in
repx-lib.mkLab {
inherit pkgs repx-lib;
gitHash = "abc123";
lab_version = "1.0.0";
runs = {
inherit simulation analysis report;
};
}
Run Groups
Groups allow you to tag collections of runs with a name for organizational purposes. Groups can be listed with repx list groups.
repx-lib.mkLab {
# ...
runs = {
inherit training validation testing;
};
groups = {
ml-pipeline = [ training validation ];
evaluation = [ validation testing ];
};
}
Rules:
- Each group value must be a list of run placeholders (created by callRun).
- Group names must not collide with any run name.
Run Definition
repx-lib.mkRun (internal)
Defines a parameterized run. You typically don't call mkRun directly -- instead, you write a run definition file that returns the arguments, and callRun + mkLab handle the rest.
A run definition file (e.g., runs/simulation.nix) returns an attribute set:
# nix/runs/simulation.nix
{ pkgs, repx-lib, ... }:
{
name = "simulation";
pipelines = [ ./pipelines/main.nix ];
params = {
seed = [ 1 2 3 ];
model = [ "A" "B" ];
};
}
Attributes:
| Attribute | Type | Required | Default | Description |
|---|---|---|---|---|
| name | String | Yes | | Unique name for this run. |
| pipelines | List of Paths | Yes | | Paths to pipeline definition files. |
| params | Attribute Set | Yes | | Parameter lists for sweeping. RepX generates the Cartesian product. Use utils.zip to pair parameters in lockstep instead. |
| containerized | | | | Removed. Container image generation is now controlled at the lab level via containerMode on mkLab. |
| paramsDependencies | List | No | [] | Additional Nix derivations that parameter values depend on (beyond auto-detection). |
| hashMode | String | No | "pure" | Controls how job IDs are computed. "pure" (default) includes the full Nix store path of the stage script derivation, so any change to packages (even transitive dependencies like glibc) invalidates the job. "params-only" hashes only the stage identity (pname + version), resolved parameters, and pipeline wiring -- package/dependency changes are ignored. See Hash Modes below. |
Parameter format:
Parameter values are lists. RepX computes the Cartesian product of all parameter lists:
params = {
seed = [ 1 2 3 ]; # 3 values
model = [ "A" "B" ]; # 2 values
};
# Produces 3 x 2 = 6 parameter combinations
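The expansion can be reproduced with plain Nixpkgs functions. This is only an illustration of the Cartesian-product concept, not RepX's internal code; it assumes a nixpkgs recent enough to provide lib.cartesianProduct (older releases name it lib.cartesianProductOfSets).

```nix
# Illustration only: counts the combinations for the params shown above.
let
  pkgs = import <nixpkgs> { };
  combos = pkgs.lib.cartesianProduct {
    seed = [ 1 2 3 ];
    model = [ "A" "B" ];
  };
in
{
  count = builtins.length combos;  # 6
  first = builtins.head combos;    # one combination: an attrset with one seed and one model
}
```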
To pair parameters in lockstep instead of crossing them, use utils.zip:
params = {
seed = [ 1 2 3 ]; # cartesian
config = utils.zip { model = [ "A" "B" ]; lr = [ 0.1 0.01 ]; }; # zipped
};
# Produces 3 seeds x 2 zipped configs = 6 combinations
# model="A" is always paired with lr=0.1, never with lr=0.01
The run definition file receives { pkgs, repx-lib, ... } as arguments (via callPackage). You can access repx-lib.utils for parameter helpers -- see mkUtils.
Hash Modes
The hashMode attribute controls how job IDs (the hashes in directory names like <hash>-stage-name-1.1) are computed. This affects when jobs are considered "changed" and need to be re-executed.
| Mode | Description |
|---|---|
"pure" (default) | The job ID includes the full Nix store path of the stage script derivation. Any change to the script, its runDependencies, or any transitive dependency (e.g., a glibc update, a nixpkgs bump) produces a new job ID and forces re-execution. This is the strictest mode and guarantees full reproducibility. |
"params-only" | The job ID is computed from the stage's pname, version, resolved parameter values, and the upstream dependency graph structure. Changes to packages, runDependencies, or script contents do not affect the job ID. Only parameter changes, stage renames, version bumps, or pipeline rewiring trigger re-execution. |
Example:
# nix/runs/simulation.nix
{ repx-lib, ... }:
{
name = "simulation";
hashMode = "params-only"; # ignore package updates
pipelines = [ ./pipelines/main.nix ];
params = {
seed = [ 1 2 3 ];
};
}
When to use "params-only":
- Large parameter sweeps where you want to iterate on script logic without invalidating thousands of completed jobs.
- Environments where nixpkgs updates are frequent but irrelevant to your experiment's correctness.
- Workloads where the experiment parameters are the only meaningful source of variation.
Caveats:
- Changing a stage's script body will not produce a new job ID. If you fix a bug in the script, you must either bump the version, change a parameter, or manually clean the old results.
- runDependencies changes (e.g., adding a new tool) are also invisible to the hash. Ensure your runtime environment is correct before relying on cached results.
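Under "params-only", bumping the stage version is one of the ways listed above to force re-execution after a script fix. A minimal sketch, assuming a hypothetical stage file whose previous version was the default "1.1":

```nix
# nix/stages/simulate.nix (hypothetical file and stage name)
{ pkgs }:
{
  pname = "simulate";
  version = "1.2";  # bumped from "1.1" after fixing the script, so the job ID changes
  params = { seed = 1; };
  outputs = { "result" = "$out/result.txt"; };
  run = { outputs, params, ... }: ''
    echo "seed=${toString params.seed}" > "${outputs.result}"
  '';
}
```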
Pipeline Construction
repx.mkPipe
Constructs a pipeline from a set of stages. Used inside a pipeline definition file. mkPipe is essentially an identity function on the stages attribute set -- its purpose is to mark the set as a pipeline and provide future extensibility.
# nix/pipelines/main.nix
{ repx, pkgs, ... }:
repx.mkPipe rec {
generate = repx.callStage ./stages/generate.nix [];
train = repx.callStage ./stages/train.nix [ generate ];
analyze = repx.callStage ./stages/analyze.nix [ train ];
}
Arguments:
| Parameter | Type | Description |
|---|---|---|
stages | Attribute Set | A (typically rec) attribute set where each key is a stage name and value is a Stage derivation (returned by callStage). |
Returns: The stages attribute set (a Pipeline definition).
Pipeline files receive { repx, pkgs, ... } as arguments. The repx object contains mkPipe and callStage. This is distinct from repx-lib which is available in run definition files and lab files.
repx.callStage
Instantiates a stage from a file, resolving dependencies and parameters.
repx.callStage <path> <dependencies>
Arguments:
| Parameter | Type | Description |
|---|---|---|
path | Path | Path to the stage definition file (.nix). |
dependencies | List | A list of stage dependencies. |
Each element in the dependencies list can be:
- A Stage derivation -- implicit mapping. Output names of the upstream stage are matched to input names of the current stage:
train = repx.callStage ./stages/train.nix [ generate ];
- A list [ stage "source" "target" ] -- explicit mapping. Maps the source output of stage to the target input of the current stage:
analyze = repx.callStage ./stages/analyze.nix [
[ train "model_weights" "weights_file" ]
[ generate "data_csv" "input_data" ]
];
Returns: A Stage derivation with passthru metadata.
Stage Schema
Stages are defined as Nix functions that accept { pkgs } and return an attribute set. There are two stage types: simple and scatter-gather.
Common Attributes
These attributes are valid for both stage types:
| Attribute | Type | Required | Default | Description |
|---|---|---|---|---|
| pname | String or Function | Yes | | Stage name. Can be a function { params }: ... for dynamic names. |
| version | String | No | "1.1" | Stage version string. |
| params | Attribute Set | No | {} | Default parameter values. Overridden by run-level parameters of the same name. |
| runDependencies | List | No | [] | Nix packages to include in $PATH at runtime. |
| resources | Attribute Set or Function | No | null | Resource hints for SLURM scheduling. See Resource Hints. |
| passthru | Attribute Set | No | {} | Arbitrary attributes passed through to the derivation's passthru. |
Simple Stage Attributes
In addition to common attributes:
| Attribute | Type | Required | Description |
|---|---|---|---|
inputs | Attribute Set or Function | No | Map of input identifiers to default values. Available as $inputs associative array in the script. Can be a function { params }: .... |
outputs | Attribute Set or Function | No | Map of output identifiers to file path templates using $out. Can be a function { params }: .... |
run | Function | Yes | The execution script. Receives { inputs, outputs, params, pkgs, ... } and returns a Bash string. |
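Putting the common and simple-stage attributes together, a minimal stage might look like the sketch below. The file name, stage name, and identifiers (data_csv, row_count, skip_header) are hypothetical, and referring to an input as ${inputs.data_csv} assumes that inputs interpolate the same way outputs do.

```nix
# nix/stages/count.nix (hypothetical file name)
{ pkgs }:
{
  pname = "count-rows";
  params = { skip_header = 1; };  # default; a run-level param of the same name overrides it

  inputs = { "data_csv" = "input.csv"; };
  outputs = { "row_count" = "$out/row_count.txt"; };

  # tail and wc come from coreutils, so no runDependencies are declared here.
  run = { inputs, outputs, params, ... }: ''
    tail -n +$(( ${toString params.skip_header} + 1 )) "${inputs.data_csv}" \
      | wc -l > "${outputs.row_count}"
  '';
}
```

toString is used because Nix does not interpolate integers into strings implicitly.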
Scatter-Gather Stage Attributes
In addition to common attributes:
| Attribute | Type | Required | Description |
|---|---|---|---|
scatter | Attribute Set | Yes | The scatter phase definition (has inputs, outputs, run, and optionally resources). |
steps | Attribute Set | Yes | A set of step definitions forming a mini-DAG per branch. Each step has pname, inputs, outputs, run, deps, and optionally resources and runDependencies. See Step Dependencies. |
gather | Attribute Set | Yes | The gather phase definition (has inputs, outputs, run, and optionally resources). |
inputs | Attribute Set | No | Shared inputs for the scatter phase. |
Step Dependencies
Each step in the steps attrset is an attribute set with these fields:
| Attribute | Type | Required | Default | Description |
|---|---|---|---|---|
| pname | String | Yes | | Step name (must match the attrset key). |
| inputs | Attribute Set | Yes | | Map of input identifiers to default values. Root steps should declare worker__item to receive the scatter work item. |
| outputs | Attribute Set | Yes | | Map of output identifiers to $out/... path templates. |
| deps | List | Yes | | List of step references this step depends on. Empty list ([]) = root step. |
| run | Function | Yes | | Execution script, same contract as simple stages. |
| resources | Attribute Set | No | null | Per-step resource hints for SLURM scheduling. |
| runDependencies | List | No | [] | Additional Nix packages for this step's $PATH. |
Dependency wiring uses the same syntax as repx.callStage dependencies:
- Bare reference ([ other_step ]): Implicit name mapping -- output names of the dependency are matched to input names of this step.
- Explicit mapping ([ other_step "source_output" "target_input" ]): Maps a specific output of the dependency to a specific input of this step.
Constraints:
- There must be exactly one sink step (a step no other step depends on). The sink step's outputs become the gather phase's inputs.
- At least one root step (deps = []) must declare a worker__item input.
- The step DAG must be acyclic.
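The wiring rules and constraints above can be sketched as a two-step DAG. Only the steps attribute of a scatter-gather stage is shown. Several details are assumptions: rec is used so that one step can reference another, the empty-string input defaults are placeholders, and the work item is treated as a file path. The step names are hypothetical.

```nix
# Fragment of a scatter-gather stage definition; scatter and gather are omitted.
steps = rec {
  # Root step: deps is empty and worker__item is declared.
  preprocess = {
    pname = "preprocess";
    inputs = { "worker__item" = ""; };            # placeholder default (assumption)
    outputs = { "clean" = "$out/clean.txt"; };
    deps = [ ];
    run = { inputs, outputs, ... }: ''
      sort "${inputs.worker__item}" > "${outputs.clean}"
    '';
  };

  # Sink step: nothing depends on it, so its outputs feed the gather phase.
  summarize = {
    pname = "summarize";
    inputs = { "text" = ""; };                    # placeholder default (assumption)
    outputs = { "summary" = "$out/summary.txt"; };
    deps = [ [ preprocess "clean" "text" ] ];     # explicit output -> input mapping
    run = { inputs, outputs, ... }: ''
      wc -l < "${inputs.text}" > "${outputs.summary}"
    '';
  };
};
```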
Dynamic Attribute Resolution
The pname, inputs, outputs, and resources attributes can be functions that accept { params } and return the resolved value. This allows stage definitions to adapt based on parameters:
{ pkgs }:
{
pname = { params }: "train-${params.model}";
inputs = { params }: {
"data" = "input.csv";
} // (if params.use_pretrained then {
"pretrained_weights" = "weights.bin";
} else {});
outputs = { params }: {
"model" = "$out/model-${params.model}.bin";
"metrics" = "$out/metrics.csv";
};
resources = { params }: {
mem = if params.dataset_size > 10000 then "64G" else "8G";
cpus = if params.model == "large" then 8 else 4;
};
params = {
model = "small";
dataset_size = 1000;
use_pretrained = false;
};
runDependencies = [ pkgs.python3 ];
run = { inputs, outputs, params, ... }: ''
python3 train.py \
--model ${params.model} \
--output "${outputs.model}"
'';
}
Resource Hints
Resource hints guide SLURM job submission. They can be specified at the stage level and are automatically merged from upstream dependencies.
resources = {
mem = "16G"; # Memory (supports K, M, G, T suffixes)
cpus = 4; # CPU count
time = "02:00:00"; # Wall time (HH:MM:SS, MM:SS, or raw seconds)
partition = "gpu"; # SLURM partition
sbatch_opts = [ "--gres=gpu:1" ]; # Extra sbatch options
};
| Field | Type | Description |
|---|---|---|
mem | String | Memory limit. Suffixes: K (KiB), M (MiB), G (GiB), T (TiB). |
cpus | Integer | Number of CPUs. |
time | String | Wall time limit. Formats: HH:MM:SS, MM:SS, or raw seconds. |
partition | String | SLURM partition name. |
sbatch_opts | List of Strings | Additional sbatch flags. |
Merge semantics: When a stage depends on upstream stages, resource hints are automatically merged:
- mem, cpus, time: The maximum across all inputs and the stage's own declaration is used.
- partition, sbatch_opts: The stage's own value takes precedence (last-writer-wins). If unset, the first dependency's value is used.
For scatter-gather stages, each sub-stage (scatter, gather) and each individual step can have its own resources attribute.
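A worked example of the merge rules, written as plain attribute sets. The names upstream, own, and effective are illustrative only; RepX performs this merge internally, and the effective value here is derived by hand from the stated rules.

```nix
let
  # Hints declared by the single upstream stage.
  upstream = { mem = "16G"; cpus = 4; time = "02:00:00"; partition = "gpu"; };

  # Hints declared by the stage itself (no partition set).
  own = { mem = "8G"; cpus = 8; time = "00:30:00"; };

  # Result expected from the rules above:
  #   mem, cpus, time -> maximum of upstream and own
  #   partition       -> own is unset, so the first dependency's value is used
  effective = { mem = "16G"; cpus = 8; time = "02:00:00"; partition = "gpu"; };
in
effective
```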
Utility Functions
repx-lib.mkUtils
A factory that creates a set of parameter utility functions. Available automatically inside run definition files as repx-lib.utils (injected by mkLab).
utils = repx-lib.mkUtils { inherit pkgs; };
utils.list
Wraps a plain list into a RepX parameter object. Use this when you have a dynamically constructed list that should be treated as a parameter sweep dimension.
utils.list [ 1 2 3 ]
# Equivalent to setting params = { x = [ 1 2 3 ]; } directly
utils.range
Generates a list of integers from start to end (inclusive). Wrapper around pkgs.lib.range.
utils.range 1 10
# [ 1 2 3 4 5 6 7 8 9 10 ]
utils.scan
Scans a directory for entries matching criteria. Works at Nix evaluation time.
utils.scan {
src = ./data; # Path or derivation to scan
type = "file"; # "any" (default), "file", or "directory"
match = ".*\\.csv"; # Optional regex pattern to filter by name
}
Arguments:
| Parameter | Type | Default | Description |
|---|---|---|---|
src | Path or Derivation | (required) | The directory to scan. |
type | String | "any" | Entry type filter: "any", "file", or "directory". |
match | String or null | null | Regex pattern to filter entry names. |
Returns: A RepX parameter object ({ _repx_param = true; values = [...]; context = [...]; }). The values are absolute paths to matching entries. The context tracks derivation dependencies for Nix garbage collection safety.
Behavior for store paths vs local paths:
- Local paths: Uses builtins.readDir for fast evaluation.
- Store paths / derivations: Uses find in a build step (since readDir doesn't work on store paths).
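For local paths, the behavior is roughly what the following plain-Nix expression does. It is an approximation built from builtins.readDir, builtins.match, and lib.filterAttrs, not the actual implementation of utils.scan, and it returns a bare list of paths rather than a RepX parameter object. The ./data directory is hypothetical.

```nix
let
  pkgs = import <nixpkgs> { };
  lib = pkgs.lib;

  src = ./data;                    # hypothetical directory
  entries = builtins.readDir src;  # attrset of name -> "regular" | "directory" | "symlink" | "unknown"

  # Keep regular files whose whole name matches the regex.
  wanted = lib.filterAttrs
    (name: type: type == "regular" && builtins.match ".*\\.csv" name != null)
    entries;
in
map (name: src + "/${name}") (builtins.attrNames wanted)
```

builtins.match succeeds only when the regex matches the entire name, which is why the pattern starts with .* instead of relying on a substring match.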
utils.dirs
Scans a source for subdirectories. Shorthand for scan { type = "directory"; ... } with an important optimization: for non-store local paths, each directory is wrapped in its own individual derivation for fine-grained Nix caching.
# Sweep over directories in ./submissions/
{ pkgs, repx-lib, ... }:
{
name = "grading";
pipelines = [ ./pipelines/grade.nix ];
params = {
submission = repx-lib.utils.dirs ./submissions;
};
}
utils.files
Scans a source for files only. Shorthand for scan { type = "file"; ... }.
params = {
config = repx-lib.utils.files ./configs;
};
utils.zip
Groups multiple parameter lists so they sweep in lockstep (element-wise) instead of as a Cartesian product. All lists must have the same length.
params = {
workload = [ "a" "b" "c" ]; # normal cartesian dimension
# mode and multiplier are paired: fast↔2, slow↔3.
# Without zip: 2×2 = 4 combos (fast×2, fast×3, slow×2, slow×3).
# With zip: exactly 2 combos (fast×2, slow×3).
config = utils.zip {
mode = [ "fast" "slow" ];
multiplier = [ 2 3 ];
};
};
# Total combinations: 3 workloads × 2 zipped configs = 6
Arguments:
| Parameter | Type | Description |
|---|---|---|
| (positional) | Attrset of lists | Each attribute is a parameter name and its list of values. All lists must have the same length. |
Returns: A RepX zip marker ({ _repx_zip = true; groups = {...}; length = N; }). The individual parameter names inside the zip group become normal parameters in each job combination -- stages declare and access them the same way as any other parameter.
Error handling: If lists have different lengths, evaluation fails with a message showing each parameter's length:
error: utils.zip: all parameter lists must have the same length,
but 'config_label' has 2 items, 'mode' has 3 items, 'vf_enable' has 3 items
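The lockstep semantics can be illustrated with plain Nixpkgs functions. This sketch uses lib.zipListsWith and lib.concatMap to mimic the combination count from the example above; it is not how utils.zip is implemented.

```nix
let
  pkgs = import <nixpkgs> { };
  lib = pkgs.lib;

  # Element-wise pairing: fast with 2, slow with 3.
  zipped = lib.zipListsWith
    (mode: multiplier: { inherit mode multiplier; })
    [ "fast" "slow" ]
    [ 2 3 ];
  # zipped == [ { mode = "fast"; multiplier = 2; } { mode = "slow"; multiplier = 3; } ]

  # Cross the zipped group with the ordinary Cartesian dimension.
  combos = lib.concatMap
    (workload: map (cfg: cfg // { inherit workload; }) zipped)
    [ "a" "b" "c" ];
in
builtins.length combos  # 6
```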
Script Execution Contract
When RepX executes a stage script, the following contract applies:
- Shell settings: set -euxo pipefail -- scripts fail on any error, undefined variable, or pipe failure, and each command is traced as it runs (-x).
- Arguments: The script receives $1 as the output directory ($out) and $2 as the inputs JSON manifest.
- Input readiness: RepX polls for all input files to become readable with a 30-second timeout (2-second intervals). This handles async filesystem syncs on networked storage.
- Output cleanup: The output directory ($out) is cleared before each run (preserving slurm-*.out files).
- Working directory: The script's working directory is set to $out.
- $PATH: Contains only packages from runDependencies plus core utilities (bash, coreutils, findutils, sed, grep, jq).
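Because undefined variables are fatal under these shell settings, optional environment variables need a Bash default, and the Bash ${...} syntax has to be escaped inside a Nix indented string. The sketch below shows one way to do that. The stage name, the report output, and the use of OMP_NUM_THREADS are hypothetical.

```nix
{ pkgs }:
{
  pname = "report-threads";
  outputs = { "report" = "$out/report.txt"; };

  # Inside a Nix indented string, two single quotes followed by ${ produce a
  # literal ${ for Bash, while a bare ${...} is interpolated by Nix at
  # evaluation time. The :-1 default keeps the script from failing when the
  # variable is unset.
  run = { outputs, ... }: ''
    threads="''${OMP_NUM_THREADS:-1}"
    echo "threads=$threads" > "${outputs.report}"
  '';
}
```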
Build-Time Script Validation
Every stage script undergoes automatic validation at build time:
- ShellCheck: The script is linted with shellcheck to catch common Bash issues.
- OSH parsing: The script is parsed into an AST using Oils for Unix (OSH).
- Dependency analysis: A Python analyzer walks the AST to extract all external command invocations and verifies each command exists in $PATH (populated by runDependencies).
If any command referenced in your script is not provided by runDependencies, the Nix build fails with an error listing the missing commands. This catches dependency issues at build time rather than at runtime.
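For example, a script that calls gzip would be expected to fail this check unless gzip is declared, since it is not among the core utilities. A minimal sketch with hypothetical stage and identifier names:

```nix
{ pkgs }:
{
  pname = "compress-log";
  inputs = { "log" = "run.log"; };
  outputs = { "archive" = "$out/run.log.gz"; };

  # gzip is an external command outside the core utilities, so it is declared here.
  runDependencies = [ pkgs.gzip ];

  run = { inputs, outputs, ... }: ''
    gzip -c "${inputs.log}" > "${outputs.archive}"
  '';
}
```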