Nextflow Linting Examples
RefTrace lets you express your linting business logic in Python.
It takes care of parsing the Nextflow DSL.
A Nextflow module is a file. It can contain:
- process definitions
- workflow definitions
- function definitions
Currently, RefTrace focuses on processes. They are the core unit of execution in a Nextflow pipeline.
Here’s how you can print the names of all the processes in a module:
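As a minimal sketch of the idea, the snippet below models a parsed module with two small stand-in classes and prints each process name. Module, Process, and process_names are hypothetical names chosen for illustration; they are not taken from RefTrace's API, which may expose parsed modules differently.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for a parsed Nextflow module.
# RefTrace's real classes and attribute names may differ.
@dataclass
class Process:
    name: str


@dataclass
class Module:
    path: str
    processes: list[Process] = field(default_factory=list)


def process_names(module: Module) -> list[str]:
    """Return the name of every process defined in the module."""
    return [process.name for process in module.processes]


# Sample data standing in for a module parsed from disk.
module = Module("modules/example/main.nf", [Process("FASTQC"), Process("MULTIQC")])

for name in process_names(module):
    print(name)
```

In a real rule the module would come from the parser rather than being built by hand; only the loop over its processes carries over.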
A Nextflow process may have directives that inform its execution. For example, the cpus directive specifies the number of CPUs required for the process to run. It reserves CPUs at runtime through a cluster resource manager.
Here’s how to print all the directives of a process:
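A sketch under stated assumptions: the process is modelled here as a stand-in class whose directives are a plain mapping from directive name to declared value. The Process class, its directives attribute, and the format_directives helper are hypothetical and are not RefTrace's API.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a parsed process; not RefTrace's actual class.
@dataclass
class Process:
    name: str
    # Maps a directive name to its declared value, e.g. {"cpus": 12}.
    directives: dict[str, object] = field(default_factory=dict)


def format_directives(process: Process) -> list[str]:
    """Render each directive of the process as 'NAME: directive = value'."""
    return [
        f"{process.name}: {directive} = {value}"
        for directive, value in process.directives.items()
    ]


# Sample data: a process that declares three directives.
process = Process("STAR_ALIGN", {"cpus": 12, "memory": "72.GB", "time": "16.h"})

for line in format_directives(process):
    print(line)
```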
The rnaseq pipeline has both processes that consume many resources and processes that consume few. Here’s how we can find the processes that consume many resources:
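One way to sketch this, assuming that resource-hungry processes are marked with a process_high label: filter the processes by that label. The Process class, the labels attribute, and the high_resource_processes helper are hypothetical, and the process names and labels below are sample data, not read from the actual pipeline.

```python
from dataclasses import dataclass, field


# Hypothetical stand-in for a parsed process; not RefTrace's actual class.
@dataclass
class Process:
    name: str
    labels: list[str] = field(default_factory=list)


# Assumption: heavy processes carry this label.
HIGH_RESOURCE_LABEL = "process_high"


def high_resource_processes(processes: list[Process]) -> list[str]:
    """Return the names of processes tagged with the high-resource label."""
    return [p.name for p in processes if HIGH_RESOURCE_LABEL in p.labels]


# Sample data for illustration only.
processes = [
    Process("FASTQC", ["process_medium"]),
    Process("STAR_ALIGN", ["process_high"]),
    Process("SALMON_INDEX", ["process_medium"]),
    Process("RSEM_CALCULATEEXPRESSION", ["process_high"]),
]

for name in high_resource_processes(processes):
    print(name)
```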
Failing a Lint Rule Immediately
Maybe we have a business policy that processes must explicitly declare their resource requirements. None of this process_high label stuff. Let’s find all processes that don’t explicitly declare their cpus, memory, and time requirements:
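A minimal sketch of such a rule, with a stand-in fatal helper that stops at the first offending process. Here fatal is modelled as raising a LintFatal exception, which the caller turns into a non-zero exit code; LintFatal, Process, and rule_explicit_resources are hypothetical names, and RefTrace's own fatal function may be implemented differently.

```python
from dataclasses import dataclass, field


class LintFatal(Exception):
    """Hypothetical exception used to stop linting at the first failure."""


def fatal(message: str) -> None:
    # Stand-in for a fatal helper: abort the rule immediately.
    raise LintFatal(message)


# Hypothetical stand-in for a parsed process; not RefTrace's actual class.
@dataclass
class Process:
    name: str
    directives: dict[str, object] = field(default_factory=dict)


REQUIRED = ("cpus", "memory", "time")


def rule_explicit_resources(processes: list[Process]) -> None:
    """Fail on the first process that lacks any required directive."""
    for process in processes:
        if not all(name in process.directives for name in REQUIRED):
            fatal(f"process {process.name} must declare cpus, memory, and time")


# Sample data: the second and third processes violate the policy.
processes = [
    Process("FASTQC", {"cpus": 2, "memory": "4.GB", "time": "1.h"}),
    Process("STAR_ALIGN", {"label": "process_high"}),
    Process("SALMON_QUANT", {"cpus": 6}),
]

try:
    rule_explicit_resources(processes)
    message, exit_code = "", 0
except LintFatal as failure:
    # Only the first violation is reported; later processes are never checked.
    message, exit_code = str(failure), 1
    print(f"fatal: {message}")
```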
We used the fatal function to bail out immediately. RefTrace returns a non-zero exit code when linting fails.
A Gentler Approach
A downside of the above approach is that we don’t know which directive is missing. We can use the error function to fail the lint rule, but continue execution:
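A sketch of the gentler variant, assuming an error helper that records a failure and lets the rule keep running, so every missing directive is reported by name. The errors list, the Process class, and rule_explicit_resources are hypothetical stand-ins; RefTrace's own error function may collect and report failures differently.

```python
from dataclasses import dataclass, field

# Collected failures; a stand-in for whatever the linter keeps internally.
errors: list[str] = []


def error(message: str) -> None:
    # Stand-in for a non-fatal helper: record the failure and keep going.
    errors.append(message)


# Hypothetical stand-in for a parsed process; not RefTrace's actual class.
@dataclass
class Process:
    name: str
    directives: dict[str, object] = field(default_factory=dict)


REQUIRED = ("cpus", "memory", "time")


def rule_explicit_resources(processes: list[Process]) -> None:
    """Report each missing required directive, one error per directive."""
    for process in processes:
        for name in REQUIRED:
            if name not in process.directives:
                error(f"process {process.name} is missing the {name} directive")


# Sample data: the second and third processes violate the policy.
processes = [
    Process("FASTQC", {"cpus": 2, "memory": "4.GB", "time": "1.h"}),
    Process("STAR_ALIGN", {"label": "process_high"}),
    Process("SALMON_QUANT", {"cpus": 6}),
]

rule_explicit_resources(processes)
for message in errors:
    print(f"error: {message}")

# The lint still fails overall, but only after every process has been checked.
exit_code = 1 if errors else 0
```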