Announcing RefTrace Linting (it's open source)

Pipeline authors can now write custom linting rules without parsing the Nextflow DSL.

Linting rules help catch bugs before runtime, but they have traditionally been hard to write.
To write a meaningful rule, a bioinformatician first has to parse the Nextflow DSL.
RefTrace Linting does that parsing for them and exposes the result through a Python-like API.

Separating Business Logic from Syntax

An example linting rule is:
No process in the pipeline uses more than 2 GB of memory.

The Nextflow domain-specific language (DSL), which is based on Groovy, lets pipeline authors declare how much memory a process uses like this:

process foo {
memory '2 GB'
}

or like this:

process foo {
memory 2.GB
}

It’s great that Groovy offers this flexibility. But it means the bioinformatician needs to account for both '2 GB' and 2.GB in their linting rule.
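To make that cost concrete, here is a rough sketch of what handling both spellings by hand might look like in plain Python. It is not RefTrace code: `memory_to_bytes` and `exceeds_limit` are hypothetical helpers, the 1024-based unit table is an assumption, and only whole-number literals are covered (a dynamic value such as a closure would need real parsing).

```python
import re

# Hypothetical helpers, not part of RefTrace. Assumes 1024-based units.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}

# Matches the string form ("2 GB", "2GB") and the Groovy property form ("2.GB").
_MEMORY_RE = re.compile(r"^(\d+)\s*\.?\s*([KMGT]?B)$", re.IGNORECASE)


def memory_to_bytes(raw):
    """Convert a raw directive argument such as "'2 GB'" or "2.GB" to bytes."""
    # Drop surrounding whitespace and any quotes left over from the string form.
    text = raw.strip().strip("'\"").strip()
    match = _MEMORY_RE.match(text)
    if match is None:
        raise ValueError("unrecognized memory value: %r" % raw)
    amount, unit = match.groups()
    return int(amount) * _UNITS[unit.upper()]


def exceeds_limit(raw, limit="2 GB"):
    """Return True when the directive asks for more memory than the limit."""
    return memory_to_bytes(raw) > memory_to_bytes(limit)
```

Even this small helper has to know about quoting, optional whitespace, and the dot in `2.GB` before the actual rule (a single comparison) can be written.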

RefTrace abstracts away these syntax differences. It takes care of parsing the DSL and lets the bioinformatician focus on what their rule does.

A Python* API

Other approaches to linting have been tried. AWS HealthOmics took a cool approach to linting based on CodeNarc, a static analysis tool for Groovy.

That approach can offer a lot of flexibility, but RefTrace aims to help bioinformaticians who are still new to Groovy write linting rules. It assumes only that the bioinformatician knows Python.

Here’s a RefTrace linting rule that says, “all processes must use between 2 and 96 CPUs”:

# This file should exist in the root of your pipeline directory
def rule_check_cpu_directive(module):
    for process in module.processes:
        cpu_directives = process.directives.cpus
        if not cpu_directives:
            # No CPU directive, so we skip this process and move on to the next one
            continue
        for cpu_directive in cpu_directives:
            cpu_value = cpu_directive.num
            if cpu_value < 2 or cpu_value > 96:
                fatal("Process %s has an invalid CPU value. It should be >= 2 and <= 96, but it is %d" % (process.name, cpu_value))

A bioinformatician runs the above with reft lint.

The RefTrace Linting API exposes Python-like objects that can be composed.
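The attribute chain the rule relies on (`module.processes`, `process.directives.cpus`, `cpu_directive.num`) can be tried out in plain Python with stand-in objects. This is only a sketch: the `SimpleNamespace` objects, the `make_process` helper, the process names, and the recording version of `fatal` are all assumptions made for illustration, not RefTrace's implementation. The copy of the rule below uses `continue`, so a process without a cpus directive does not end the loop early.

```python
from types import SimpleNamespace

failures = []


def fatal(message):
    # Stand-in for the `fatal` call used in the rule; here it only records the message.
    failures.append(message)


def rule_check_cpu_directive(module):
    for process in module.processes:
        cpu_directives = process.directives.cpus
        if not cpu_directives:
            # No CPU directive: skip this process, keep checking the rest.
            continue
        for cpu_directive in cpu_directives:
            cpu_value = cpu_directive.num
            if cpu_value < 2 or cpu_value > 96:
                fatal("Process %s has an invalid CPU value. It should be >= 2 and <= 96, but it is %d" % (process.name, cpu_value))


def make_process(name, *cpu_values):
    # Hypothetical helper: builds an object shaped like the ones the rule reads.
    cpus = [SimpleNamespace(num=value) for value in cpu_values]
    return SimpleNamespace(name=name, directives=SimpleNamespace(cpus=cpus))


# Three made-up processes: one valid, one with no cpus directive, one out of range.
module = SimpleNamespace(processes=[
    make_process("ALIGN", 8),
    make_process("INDEX"),
    make_process("SORT", 128),
])

rule_check_cpu_directive(module)
for message in failures:
    print(message)
```

With these stand-ins, only the made-up `SORT` process is reported. The process with no cpus directive is skipped without hiding the one after it.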

Without a Python-like API, and without DSL parsing handled for them, the bioinformatician would need to write something more like this:

import org.codehaus.groovy.ast.expr.ConstantExpression
import org.codehaus.groovy.ast.expr.EmptyExpression
import org.codehaus.groovy.ast.expr.Expression
import org.codehaus.groovy.ast.expr.MethodCallExpression
import org.codenarc.rule.AbstractAstVisitor
import org.codenarc.util.AstUtil

class CpuAstVisitor extends AbstractAstVisitor {
    def MIN_CPU = 2
    def MAX_CPU = 96

    // Called for every method call in the script; `cpus 4` parses as a call to `cpus`
    @Override
    void visitMethodCallExpression(MethodCallExpression expression) {
        if (expression.getMethodAsString() == 'cpus') {
            checkOneArgument(expression)
        }
        super.visitMethodCallExpression(expression)
    }

    private checkOneArgument(final MethodCallExpression expression) {
        def methodArguments = AstUtil.getMethodArguments(expression)
        if (methodArguments.size() == 0) {
            addViolation(expression, 'the cpus directive must have one argument')
            return new EmptyExpression()
        } else if (methodArguments.size() > 1) {
            addViolation(expression, 'the cpus directive must have only one argument')
        }
        // Only literal values can be checked statically
        if (methodArguments.first() instanceof ConstantExpression) {
            checkNumeric((ConstantExpression) methodArguments.first())
        }
    }

    private checkNumeric(ConstantExpression expression) {
        try {
            def val = Integer.parseInt(expression.value.toString())
            checkMinMax(expression, val)
        } catch (NumberFormatException ignored) {
            addViolation(expression,
                "'${expression.value}' is not a valid number.")
        }
    }

    private void checkMinMax(Expression exp, final int val) {
        if (val < MIN_CPU) {
            addViolation(exp,
                "The minimum CPU count is '$MIN_CPU'.")
        } else if (val > MAX_CPU) {
            addViolation(exp,
                "The maximum CPU count is '$MAX_CPU'.")
        }
    }
}

Try it Out

RefTrace is open-source and distributed as a single, static binary.

You can find the source at: github.com/reftrace/reftrace
Or download a prebuilt binary: https://github.com/reftrace/reftrace/releases/latest

A quickstart is available at https://reftrace.com.

RefTrace Linting is still early. You may find a bug. Feedback is much appreciated on GitHub or at [email protected]. Thank you.

*Technically Starlark