rnaseq

Proof of concept of an RNA-Seq pipeline, built with Nextflow, that goes from raw reads to a count matrix (including quality control), plus an example downstream RNA-Seq analysis in R.

Prerequisites

Unix-like OS (Linux, macOS, etc.)
Java version 8
Docker engine 1.10.x (or later)
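Before the first run, the prerequisites above can be checked with a short shell sketch. The helpers check_tools and java_major_version are hypothetical (they are not part of this repository), and the version parsing assumes the usual quoted version string printed by java -version:

```bash
#!/usr/bin/env bash
# Minimal sketch: check that required tools are on PATH and read the Java
# major version. check_tools and java_major_version are hypothetical helpers.

# Print every tool that cannot be found; return non-zero if any is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Extract the major Java version from a version string such as
# "1.8.0_292" (legacy 1.x scheme) or "11.0.2" (Java 9+ scheme).
java_major_version() {
  local version=$1 major
  major=${version%%.*}            # text before the first dot
  if [ "$major" = "1" ]; then     # legacy scheme: 1.<major>.<minor>
    major=${version#1.}
    major=${major%%.*}
  fi
  echo "$major"
}

# Example usage on a real machine:
#   check_tools java docker || echo "install the missing tools first"
#   java_major_version "$(java -version 2>&1 | awk -F'"' '/version/ {print $2}')"
```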

Necessary files

The reads to be mapped must be stored as gzip-compressed FASTQ files (.fastq.gz) in the folder data.
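Since the default reads pattern (data/*.fastq.gz) only picks up files ending in .fastq.gz, it can help to check the folder first. This is a minimal sketch with a hypothetical helper, check_reads, that lists matching files and flags FASTQ files that the default pattern would silently skip:

```bash
# Minimal sketch (hypothetical helper, not part of the pipeline): list the
# .fastq.gz files in a folder and flag FASTQ files with other endings,
# which the default pattern *.fastq.gz would not match.
check_reads() {
  local dir=${1:-data} f found=0 bad=0
  for f in "$dir"/*.fastq.gz; do
    [ -e "$f" ] || continue        # glob matched nothing
    echo "ok: $f"
    found=$((found + 1))
  done
  for f in "$dir"/*.fastq "$dir"/*.fq "$dir"/*.fq.gz; do
    [ -e "$f" ] || continue
    echo "not matched by *.fastq.gz: $f"
    bad=1
  done
  # Succeed only if at least one usable file exists and none were skipped.
  [ "$found" -gt 0 ] && [ "$bad" -eq 0 ]
}
```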

Additional necessary files

If the reads to be analyzed originate from a human RNA-Seq experiment, the following three additional files must also be stored in the folder data:

Prebuilt HISAT2 index for H. sapiens, release GRCh38

https://genome-idx.s3.amazonaws.com/hisat/grch38_snptran.tar.gz

GENCODE GTF file, release 38 (GRCh38.p13)

https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_38/gencode.v38.chr_patch_hapl_scaff.annotation.gtf.gz

UCSC BED file, assembly GRCh38/hg38, track GENCODE V38

http://genome.ucsc.edu/cgi-bin/hgTables

The BED file must be saved with a file name matching *.annotation.bed.gz.
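A quick way to confirm that the three reference files for a human run are in place is sketched below. The helper check_references is hypothetical, and the file patterns are assumptions derived from the download names above (the HISAT2 index is accepted either as the downloaded archive or as extracted .ht2 files; whether the pipeline expects one or the other is not stated here):

```bash
# Minimal sketch (hypothetical helper): report which reference files are
# present in the data folder. Patterns are assumptions based on the
# download names listed above.
check_references() {
  local dir=${1:-data} status=0
  if [ -n "$(find "$dir" -name '*.ht2' -o -name 'grch38_snptran.tar.gz' | head -n 1)" ]; then
    echo "found: HISAT2 index"
  else
    echo "missing: HISAT2 index (*.ht2 or grch38_snptran.tar.gz)"; status=1
  fi
  if [ -n "$(find "$dir" -maxdepth 1 -name '*.gtf.gz' | head -n 1)" ]; then
    echo "found: GTF annotation"
  else
    echo "missing: GTF annotation (*.gtf.gz)"; status=1
  fi
  if [ -n "$(find "$dir" -maxdepth 1 -name '*.annotation.bed.gz' | head -n 1)" ]; then
    echo "found: BED annotation"
  else
    echo "missing: BED annotation (*.annotation.bed.gz)"; status=1
  fi
  return "$status"
}
```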

To analyze another species, download the corresponding files for that organism instead.

Quick start

Because this pipeline uses HISAT2 to align the reads, it is suitable for short reads only!

Example run:

nextflow run main.nf

The example above relies on the default value of params.reads, which is set up for single-end reads, and is therefore equivalent to:

nextflow run main.nf --reads "data/*.fastq.gz"
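The quotes around the glob in this command matter: without them, the shell expands the pattern itself, so Nextflow would receive individual file names instead of the pattern. This self-contained sketch (count_args is a hypothetical helper that only counts its arguments) shows the difference in a temporary directory:

```bash
# Minimal sketch: compare an unquoted and a quoted glob passed to --reads.
count_args() { echo "$#"; }

demo=$(mktemp -d)
touch "$demo/sampleA.fastq.gz" "$demo/sampleB.fastq.gz"

# Unquoted glob: the shell replaces it with one argument per matching file.
unquoted=$(count_args --reads "$demo"/*.fastq.gz)
# Quoted glob: the pattern is passed through as a single literal argument.
quoted=$(count_args --reads "$demo/*.fastq.gz")

echo "unquoted: $unquoted arguments, quoted: $quoted arguments"
# prints: unquoted: 3 arguments, quoted: 2 arguments
rm -r "$demo"
```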

For paired-end reads, the parameter params.singleEnd in nextflow.config must additionally be set to false, and the command becomes:

nextflow run main.nf --reads "data/*_{1,2}*.fastq.gz"
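With this pattern, each sample presumably contributes two files whose names differ only in _1/_2. The hypothetical helper pair_reads below is a rough shell approximation of that pairing (it is not how Nextflow implements it) and assumes file names of the form <sample>_1.fastq.gz and <sample>_2.fastq.gz; it can be used to spot samples with a missing mate before starting a run:

```bash
# Minimal sketch (hypothetical helper): pair <sample>_1.fastq.gz with
# <sample>_2.fastq.gz and print "sample<TAB>read1<TAB>read2" per sample.
# Samples without a mate are reported on stderr and cause a non-zero exit.
pair_reads() {
  local dir=${1:-data} r1 r2 sample status=0
  for r1 in "$dir"/*_1.fastq.gz; do
    [ -e "$r1" ] || continue                 # glob matched nothing
    sample=$(basename "$r1" _1.fastq.gz)     # strip directory and suffix
    r2="$dir/${sample}_2.fastq.gz"
    if [ -e "$r2" ]; then
      printf '%s\t%s\t%s\n' "$sample" "$r1" "$r2"
    else
      echo "no mate for $r1" >&2
      status=1
    fi
  done
  return "$status"
}
```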

Optionally, you can set the output directory with the flag --outdir <folder>. By default, all results are saved in the folder output, and the folder info contains information about the most recent Nextflow run.

Installation

Clone this repository with the following command:

git clone https://github.com/maxgreil/rnaseq && cd rnaseq

Then, install Nextflow by using the following command:

curl https://get.nextflow.io | bash

The above snippet creates the nextflow launcher in the current directory.

Finally, pull the following Docker image:

docker pull maxgreil/rnaseq

Alternatively, you can build the Docker image yourself with the following command:

cd docker && docker image build . -t maxgreil/rnaseq

Arguments

Optional Arguments

Argument	Usage	Description
--reads	<files>	Directory and glob pattern of input files
--outdir	<folder>	Directory to save output files

Documentation

This pipeline is designed to:

map given reads to a genome
create a count matrix of mapped reads for subsequent RNA-Seq analysis in R
perform quality control on the created files
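featureCounts writes a table with a comment line and five annotation columns (Chr, Start, End, Strand, Length) between the gene ID and the per-sample counts. A minimal sketch of reducing that table to a plain count matrix is shown below; the helper to_count_matrix is hypothetical and assumes the standard featureCounts layout, and the result could then be loaded in R, for example with read.delim(file, row.names = 1):

```bash
# Minimal sketch (hypothetical helper): keep Geneid plus the count columns
# (column 7 onward) of a featureCounts table, dropping the "#" comment line
# and the Chr/Start/End/Strand/Length annotation columns.
to_count_matrix() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { next }                      # skip the program/command comment line
       {
         line = $1
         for (i = 7; i <= NF; i++) line = line OFS $i
         print line
       }' "$1"
}
```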

Pipeline overview

The pipeline is built using Nextflow and processes data using the following steps:

hisat2 - map given reads to genome
samtools - create sorted BAM files from HISAT2 SAM files
picard - mark duplicates in sorted BAM files
featureCounts - count reads mapped to genomic features (exons)
deepTools - create bigWig coverage tracks from BAM files for viewing in IGV
preseq - estimate and predict the complexity of the sequencing library
RSeQC - comprehensive quality evaluation of the RNA-Seq data
FastQC - BAM file quality control
MultiQC - aggregate report, describing results of the whole pipeline
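The steps above can be pictured as a per-sample command chain. The sketch below only prints approximate commands for one single-end sample; the function print_steps, the file names and the chosen options are assumptions for illustration (read_distribution.py stands in for one of the RSeQC scripts), not the commands in the pipeline's modules:

```bash
# Minimal sketch: print (do not run) approximate single-end commands for one
# sample. Arguments: sample name, HISAT2 index basename, GTF file, BED file.
# File names and options are illustrative assumptions.
print_steps() {
  local sample=$1 index=$2 gtf=$3 bed=$4
  cat <<EOF
hisat2 -x $index -U ${sample}.fastq.gz -S ${sample}.sam
samtools sort -o ${sample}.sorted.bam ${sample}.sam
samtools index ${sample}.sorted.bam
picard MarkDuplicates I=${sample}.sorted.bam O=${sample}.markdup.bam M=${sample}.markdup_metrics.txt
featureCounts -a $gtf -o ${sample}.counts.txt ${sample}.markdup.bam
bamCoverage -b ${sample}.sorted.bam -o ${sample}.bw
preseq lc_extrap -B -o ${sample}.lc_extrap.txt ${sample}.sorted.bam
read_distribution.py -i ${sample}.sorted.bam -r $bed
fastqc ${sample}.markdup.bam
multiqc .
EOF
}
```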
