Start Here
RefTrace is a tool to lint Nextflow pipelines. It works by parsing the Nextflow DSL and serving it up to you in Python. No more regex!
RefTrace focuses on modules, processes, and directives. It parses each module in a pipeline and returns a list of processes. It doesn't care about channels or workflows at the moment.
Example: Labeling Processes
It may be organizational policy to have a standard set of labels for each process:
CORRECT_PROCESS_LABELS = [
    "process_single",
    "process_low",
    "process_medium",
    "process_high",
    "process_long",
    "process_high_memory",
]
These labels define the resource requirements of the process. Here's a label applied to a process:
process CUSTOM_GETCHROMSIZES {
    tag "$fasta"
    label 'process_single'
    ...
But what if someone forgets the label 'process_single' part? RefTrace can detect if a process is missing a label:
@rule
def no_standard_label(module: Module, results: LintResults):
    for process in module.processes:
        labels = process.labels
        good_labels = [label for label in labels
                       if label.label in CORRECT_PROCESS_LABELS]
        if len(good_labels) == 0:
            results.warnings.append(
                LintWarning(
                    line=process.line,
                    warning=f"process '{process.name}' has no standard label"
                )
            )
This rule is not built in to RefTrace. You write it yourself!
RefTrace takes care of the parsing and lets you focus on the logic.
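Because the rule's logic is ordinary Python, you can try it out without a pipeline at hand. The sketch below is only an illustration: FakeLabel and FakeProcess are hypothetical stand-ins that copy the three attributes the rule reads (name, line, and labels, each label having a .label string). They are not RefTrace's own classes, and the helper function name is made up for this example.

```python
from dataclasses import dataclass, field

CORRECT_PROCESS_LABELS = [
    "process_single",
    "process_low",
    "process_medium",
    "process_high",
    "process_long",
    "process_high_memory",
]


# Hypothetical stand-ins (NOT RefTrace classes). They carry only the
# attributes that the rule above accesses.
@dataclass
class FakeLabel:
    label: str


@dataclass
class FakeProcess:
    name: str
    line: int
    labels: list = field(default_factory=list)


def processes_missing_standard_label(processes):
    """Return the names of processes that carry none of the standard labels."""
    missing = []
    for process in processes:
        has_standard = any(
            lbl.label in CORRECT_PROCESS_LABELS for lbl in process.labels
        )
        if not has_standard:
            missing.append(process.name)
    return missing


processes = [
    FakeProcess("CUSTOM_GETCHROMSIZES", 1, [FakeLabel("process_single")]),
    FakeProcess("UNLABELED", 10),                       # no labels at all
    FakeProcess("ODD_LABEL", 20, [FakeLabel("big_job")]),  # non-standard label
]

print(processes_missing_standard_label(processes))
# prints ['UNLABELED', 'ODD_LABEL']
```

Note that a process with a label outside the standard set is flagged just like a process with no label, which matches the list-comprehension filter in the rule.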
The command line tool takes @rule-decorated functions and runs them over each module in the pipeline.
You would put the above rule in a rules.py file in the root of your pipeline. You would then run reftrace lint.
Running reftrace generate will create a starter rules.py file for you.