Nextflow Linting Examples

RefTrace lets you express your linting business logic in Python.
It takes care of parsing the Nextflow DSL.

A Nextflow module is a file. It can contain:

  • process definitions
  • workflow definitions
  • function definitions

Currently, RefTrace focuses on processes. They are the core unit of execution in a Nextflow pipeline.

Here’s how you can print the names of all the processes in a module:

def rule_process_names(module):
    for process in module.processes:
        print(process.name)

A Nextflow process may have directives that inform its execution. For example, the cpus directive specifies how many CPUs the process needs. When the pipeline runs through a cluster resource manager, that number is used to reserve the CPUs at runtime.

Here’s how to print all the directives of a process:

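def rule_print_directives(module):
    for process in module.processes:
        for attr in dir(process.directives):
            # dir() may also list private and dunder attributes; skip those.
            if attr.startswith('_'):
                continue
            directive = getattr(process.directives, attr)
            # Directives are treated as list-like here; empty means "not set".
            if len(directive) > 0:
                print(process.name, directive)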

The rnaseq pipeline mixes resource-intensive processes with lightweight ones. Here’s how we can find the resource-intensive ones by looking for the process_high label:

def rule_resource_intensive_processes(module):
    for process in module.processes:
        for process_label in process.directives.label:
            if process_label.label == 'process_high':
                print(process.name)
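The same loop generalizes to any label. The sketch below returns the matching names instead of printing them, which makes the logic easy to check outside RefTrace. The helper processes_with_label, the process names, and the SimpleNamespace stand-ins are hypothetical; they only mimic the processes, directives.label, and label attributes used above and are not part of the RefTrace API.

```python
from types import SimpleNamespace


def processes_with_label(module, wanted):
    """Return the names of processes that carry the given label."""
    return [
        process.name
        for process in module.processes
        if any(entry.label == wanted for entry in process.directives.label)
    ]


def fake_process(name, labels):
    """Build a stand-in process (hypothetical, not a RefTrace object)."""
    directives = SimpleNamespace(
        label=[SimpleNamespace(label=label) for label in labels]
    )
    return SimpleNamespace(name=name, directives=directives)


# Illustrative module with one heavy and one lighter process.
fake_module = SimpleNamespace(processes=[
    fake_process("STAR_ALIGN", ["process_high"]),
    fake_process("FASTQC", ["process_medium"]),
])

print(processes_with_label(fake_module, "process_high"))
```

Returning a list rather than printing keeps the lookup reusable: a rule can print the names, count them, or feed them into another check.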

Failing a Lint Rule Immediately

Maybe we have a business policy that processes must explicitly declare their resource requirements. None of this process_high label stuff. Let’s find all processes that don’t explicitly declare their cpus, memory, and time requirements:

def has_cpus(directives):
    return len(directives.cpus) > 0

def has_memory(directives):
    return len(directives.memory) > 0

def has_time(directives):
    return len(directives.time) > 0

def is_explicit(process):
    return has_cpus(process.directives) and has_memory(process.directives) and has_time(process.directives)

def rule_explicit_resource_requirements(module):
    for process in module.processes:
        if not is_explicit(process):
            fatal(process.name)

We used the fatal function to bail out immediately. RefTrace returns a non-zero exit code when linting fails.
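To see the bail-out behavior without a pipeline at hand, the rule can be exercised against stand-in objects. Everything in this sketch besides the rule logic is an assumption: LintFailure, the local fatal, the process names, and the SimpleNamespace objects are hypothetical substitutes, and RefTrace's real fatal may work differently internally.

```python
from types import SimpleNamespace


class LintFailure(Exception):
    """Hypothetical stand-in for however RefTrace aborts a lint run."""


def fatal(*args):
    # Stand-in for RefTrace's fatal(): stop at the first offending process.
    raise LintFailure(" ".join(str(arg) for arg in args))


def is_explicit(process):
    d = process.directives
    return len(d.cpus) > 0 and len(d.memory) > 0 and len(d.time) > 0


def rule_explicit_resource_requirements(module):
    for process in module.processes:
        if not is_explicit(process):
            fatal(process.name)


def fake_process(name, cpus=(), memory=(), time=()):
    # Directives are modelled as lists; an empty list means "not declared".
    directives = SimpleNamespace(
        cpus=list(cpus), memory=list(memory), time=list(time)
    )
    return SimpleNamespace(name=name, directives=directives)


fake_module = SimpleNamespace(processes=[
    fake_process("ALIGN", cpus=[8], memory=["32 GB"], time=["4h"]),
    fake_process("TRIM", cpus=[2]),  # memory and time are missing
    fake_process("REPORT"),          # also incomplete, but never reached
])

try:
    rule_explicit_resource_requirements(fake_module)
    failed_on = None
except LintFailure as failure:
    failed_on = str(failure)

print(failed_on)
```

Only the first offending process is reported, because the stand-in fatal stops the loop as soon as it is called.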

A Gentler Approach

A downside of the above approach is that we don’t know which directive is missing. We can use the error function to fail the lint rule, but continue execution:

def rule_explicit_resource_requirements_gentle(module):
    for process in module.processes:
        if not has_cpus(process.directives):
            error(process.name, "missing CPUs directive")
        if not has_memory(process.directives):
            error(process.name, "missing memory directive")
        if not has_time(process.directives):
            error(process.name, "missing time directive")
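The three checks share one shape, so they could also be driven from a list of directive names. This is a sketch under assumptions: REQUIRED, missing_directives, and the stand-in object are hypothetical, and the code relies only on directives being list-like, as in the helpers above.

```python
from types import SimpleNamespace

# Directive names the policy requires (illustrative constant).
REQUIRED = ("cpus", "memory", "time")


def missing_directives(directives):
    """Return the required directive names that are not declared."""
    return [name for name in REQUIRED if len(getattr(directives, name)) == 0]


# Stand-in directives object: cpus is declared, memory and time are not.
fake_directives = SimpleNamespace(cpus=[4], memory=[], time=[])

print(missing_directives(fake_directives))
```

Inside a rule, each returned name could be passed to error together with process.name, so adding a new required directive means extending one tuple rather than writing another if block.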