6. Snakemake
This tutorial was developed on a Unix-like system (Ubuntu 14.04).
6.1. Introduction
6.1.1. Snakemake concepts
- Inspired by GNU Make: system of rules & targets
- A rule is the recipe for a target
- Rules are combined by matching their inputs and outputs
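The idea of combining rules by matching inputs and outputs can be sketched in plain Python. This is a toy model under stated assumptions, not Snakemake's actual implementation: the `RULES` table, the `plan` function, and the rule and file names are hypothetical and chosen only for illustration.

```python
# Toy model of rule resolution: NOT Snakemake's real algorithm.
# Each hypothetical rule lists the inputs it needs and the output it makes.
RULES = {
    "sam_to_bam": {"input": ["GSM521934.sam"], "output": "GSM521934.bam"},
    "bam_sorted": {"input": ["GSM521934.bam"], "output": "GSM521934_sorted.bam"},
}


def plan(target, existing_files):
    """Return the rule names to run, in order, to produce `target`."""
    if target in existing_files:
        return []  # the file is already there, nothing to do
    for name, rule in RULES.items():
        if rule["output"] == target:
            steps = []
            # First plan everything this rule needs as input.
            for dependency in rule["input"]:
                steps += plan(dependency, existing_files)
            return steps + [name]
    raise ValueError("no rule produces " + target)


print(plan("GSM521934_sorted.bam", existing_files={"GSM521934.sam"}))
```

Asking for the sorted file makes the sketch walk backwards from the target: it finds the rule whose output matches, then resolves that rule's inputs the same way.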
6.1.2. Installation
sudo apt-get -y install python3-pip
sudo pip3 install snakemake
6.2. Downloads for practical exercises
6.2.1. Ubuntu libraries
sudo apt-get -y install zlib1g-dev # samtools (workflows 1-6)
sudo apt-get -y install libncurses5-dev libncursesw5-dev # samtools (workflows 1-6)
sudo apt-get -y install r-base-core # Rsamtools (workflows 4-6)
sudo pip3 install "rpy2<2.5.6" # Rsamtools (workflows 4-6)
sudo pip3 install pyyaml # Config management (workflows 5-6)
6.2.2. Tutorial material
wget https://github.com/rioualen/gene-regulation/archive/1.0.tar.gz
tar xvzf 1.0.tar.gz
cd gene-regulation-1.0/doc/snakemake_tutorial
6.2.3. Samtools
wget -nc http://sourceforge.net/projects/samtools/files/samtools/1.3/samtools-1.3.tar.bz2
bunzip2 -f samtools-1.3.tar.bz2
tar xvf samtools-1.3.tar
cd samtools-1.3
make
sudo make install
cd .. # back to the tutorial directory, where samtools was unpacked
6.3. Demo workflows
6.3.1. Workflow 1: Rules and targets
- Only the first rule is executed by default
- Rule all defines the target
- Rule sam_to_bam automatically produces the target
# file: workflow1.py
rule all:
    input: "GSM521934.bam"

# -b makes samtools view write BAM; without it the output is SAM text
rule sam_to_bam:
    input: "GSM521934.sam"
    output: "GSM521934.bam"
    shell: "samtools view -b {input} > {output}"
In the terminal:
snakemake -s workflow1/workflow1.py
6.3.2. Workflow 2: Introducing wildcards
- Wildcards can replace hard-coded file names in rules
- The workflow then applies to a list of files or samples
- The expand function builds the list of targets from the wildcard values
# file: workflow2.py
SAMPLES = ["GSM521934", "GSM521935"]

rule all:
    input: expand("{sample}.bam", sample = SAMPLES)

# -b makes samtools view write BAM; without it the output is SAM text
rule sam_to_bam:
    input: "{file}.sam"
    output: "{file}.bam"
    shell: "samtools view -b {input} > {output}"
In the terminal:
snakemake -s workflow2/workflow2.py
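What expand and wildcard matching do in this workflow can be approximated in plain Python. The sketch below is only an illustration under stated assumptions: `expand_sketch` and `match_output` are hypothetical helpers, not part of Snakemake, and they ignore most features of the real implementation.

```python
import re
from itertools import product


def expand_sketch(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


def match_output(pattern, filename):
    """Return the wildcard values if `filename` fits `pattern`, else None."""
    # Turn "{file}.bam" into the regular expression "(?P<file>.+)\.bam".
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        "(?P<%s>.+)" % part if index % 2 else re.escape(part)
        for index, part in enumerate(parts)
    )
    found = re.fullmatch(regex, filename)
    return found.groupdict() if found else None


SAMPLES = ["GSM521934", "GSM521935"]
targets = expand_sketch("{sample}.bam", sample=SAMPLES)
print(targets)
print(match_output("{file}.bam", targets[0]))
```

The first call lists the files requested by rule all; the second shows how a requested file name fixes the value of the `file` wildcard, which is then reused to build the matching input name.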
6.3.3. Workflow 3: Keywords
- Rules can use a variety of keywords
- An exhaustive list can be found in the Snakemake documentation
# file: workflow3.py
SAMPLES = ["GSM521934", "GSM521935"]

rule all:
    input: expand("{sample}.bam", sample = SAMPLES)

# 2> sends the messages of samtools (stderr) to the log file
rule sam_to_bam:
    input: "{file}.sam"
    output: "{file}.bam"
    params:
        threads = 2
    log: "{file}.log"
    benchmark: "{file}.json"
    shell: "(samtools view -bS --threads {params.threads} {input} > {output}) 2> {log}"
In the terminal:
snakemake -s workflow3/workflow3.py
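The placeholders in a shell command, such as {input}, {output} and {params.threads}, are replaced by the values of the corresponding keywords before the command runs. A minimal sketch of that substitution with Python's own string formatting follows. It only mimics the idea: the `render` helper is hypothetical, and the sample name is assumed for illustration.

```python
from types import SimpleNamespace


def render(template, **context):
    """Fill the placeholders of a command template (hypothetical helper)."""
    return template.format(**context)


command = render(
    "samtools view -bS --threads {params.threads} {input} > {output}",
    input="GSM521934.sam",
    output="GSM521934.bam",
    # Attribute access lets the template use {params.threads}.
    params=SimpleNamespace(threads=2),
)
print(command)
```

Keeping values such as the thread count under params, rather than hard-coding them in the command, means they can later be changed in one place.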
6.3.4. Workflow 4: Combining rules
- Dependencies are handled implicitly, by matching filenames
- Commands can be executed with the keywords run or shell
- Several languages are supported: R, bash, python
# file: workflow4.py
from snakemake.utils import R

SAMPLES = ["GSM521934", "GSM521935"]

rule all:
    input: expand("{sample}_sorted.bam", sample = SAMPLES)

# 2> sends the messages of samtools (stderr) to the log file
rule sam_to_bam:
    input: "{file}.sam"
    output: "{file}.bam"
    params:
        threads = 2
    log: "{file}.log"
    benchmark: "{file}.json"
    shell: "(samtools view -bS --threads {params.threads} {input} > {output}) 2> {log}"

rule bam_sorted:
    input: "{file}.bam"
    output: "{file}_sorted.bam"
    run:
        R("""
        library(Rsamtools)
        sortBam("{input}", "{output}")
        """)
In the terminal:
snakemake -s workflow4/workflow4.py
6.3.5. Workflow 5: Configuration file
- The configuration file can be in json or yml format
- Its contents are accessible through the global variable config
# file: workflow5.py
from snakemake.utils import R

configfile: "config.yml"

SAMPLES = config["samples"].split()
OUTDIR = config["outdir"]

rule all:
    input: expand(OUTDIR + "{sample}_sorted.bam", sample = SAMPLES)

# 2> sends the messages of samtools (stderr) to the log file
rule sam_to_bam:
    input: "{file}.sam"
    output: "{file}.bam"
    params:
        threads = config["samtools"]["threads"]
    log: "{file}.log"
    benchmark: "{file}.json"
    shell: "(samtools view -bS --threads {params.threads} {input} > {output}) 2> {log}"

rule bam_sorted:
    input: "{file}.bam"
    output: "{file}_sorted.bam"
    run:
        R("""
        library(Rsamtools)
        sortBam("{input}", "{output}")
        """)
# file: config.yml
samples: "GSM521934 GSM521935"
outdir: "gene-regulation-1.0/doc/snakemake_tutorial/results/"
samtools:
    threads: "2"
In the terminal:
snakemake -s workflow5/workflow5.py
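How workflow5.py derives its sample list and targets from the configuration can be traced in plain Python. This sketch assumes a JSON version of config.yml so that it runs with the standard library only. The `CONFIG_JSON` string is written for illustration, and the list comprehension stands in for expand.

```python
import json

# Assumed JSON equivalent of config.yml (illustration only).
CONFIG_JSON = """
{
    "samples": "GSM521934 GSM521935",
    "outdir": "gene-regulation-1.0/doc/snakemake_tutorial/results/",
    "samtools": {"threads": "2"}
}
"""

config = json.loads(CONFIG_JSON)

# Same two lines as in workflow5.py: the space-separated string
# becomes a list, and the output directory a prefix.
SAMPLES = config["samples"].split()
OUTDIR = config["outdir"]

# Stand-in for expand(OUTDIR + "{sample}_sorted.bam", sample = SAMPLES).
targets = [OUTDIR + sample + "_sorted.bam" for sample in SAMPLES]

# The thread count is quoted in the config, so it arrives as a string.
threads = int(config["samtools"]["threads"])

print(SAMPLES)
print(targets)
print(threads)
```

Because the thread count is quoted in the configuration file, it is read as a string. That is fine when it is pasted into a shell command, but it needs converting if it is used as a number.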
6.3.6. Workflow 6: Separated files
- The keyword include is used to import rules
# file: workflow6.py
from snakemake.utils import R

configfile: "config.yml"

SAMPLES = config["samples"].split()
OUTDIR = config["outdir"]

include: "sam_to_bam.rules"
include: "bam_sorted.rules"

rule all:
    input: expand(OUTDIR + "{sample}_sorted.bam", sample = SAMPLES)
# file: sam_to_bam.rules
# 2> sends the messages of samtools (stderr) to the log file
rule sam_to_bam:
    input: "{file}.sam"
    output: "{file}.bam"
    params:
        threads = config["samtools"]["threads"]
    log: "{file}.log"
    benchmark: "{file}.json"
    shell: "(samtools view -bS --threads {params.threads} {input} > {output}) 2> {log}"
# file: bam_sorted.rules
rule bam_sorted:
    input: "{file}.bam"
    output: "{file}_sorted.bam"
    run:
        R("""
        library(Rsamtools)
        sortBam("{input}", "{output}")
        """)
In the terminal:
snakemake -s workflow6/workflow6.py
6.3.7. Workflow 7: The keyword ruleorder (todo)
6.3.8. Workflow 8: Combining wildcards with zip
6.3.9. Workflow 9: Combining wildcards selectively
6.3.10. Workflow 10: Using regular expression in wildcards
6.3.11. Other
- temp()
- touch()
- target/all
6.4. Bonus: generating flowcharts
snakemake -s workflow6/workflow6.py --dag | dot -Tpng -o d.png
snakemake -s workflow6/workflow6.py --rulegraph | dot -Tpng -o r.png
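Both commands pipe a graph description in the DOT language to the Graphviz dot program, which draws it. To give an idea of what such a description looks like, here is a minimal Python sketch that writes one for the rules of workflow 6. The `EDGES` list and the `to_dot` helper are hypothetical, and Snakemake's real output is more detailed than this.

```python
# Dependencies between the rules of workflow 6, written by hand
# for illustration: (upstream rule, downstream rule).
EDGES = [("sam_to_bam", "bam_sorted"), ("bam_sorted", "all")]


def to_dot(edges):
    """Return a DOT description of a directed graph (hypothetical helper)."""
    lines = ["digraph rulegraph {"]
    for upstream, downstream in edges:
        lines.append('    "%s" -> "%s";' % (upstream, downstream))
    lines.append("}")
    return "\n".join(lines)


print(to_dot(EDGES))
```

Saving the printed text to a file and passing it to dot -Tpng would be expected to draw a three-node chain from sam_to_bam to all.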
include img