DETR'PROK Home

Latest version

DETRPROK_2.1.3.sh: DETR'PROK shell script.
splitTranscriptGff.pl: Perl script that splits extended annotations; it is called by DETR'PROK.

What is DETR'PROK?

DETR'PROK (DETection of non-coding Rna in PROKaryotes) is a computational pipeline that takes as input a mapping of deep-sequencing reads (sorted BAM file). It performs successive steps of clustering, comparison with an existing annotation (GFF input file), and identification of transcribed non-coding fragments, which are classified into putative 5'UTRs (untranslated regions of mRNAs), sRNAs (independent small RNA genes), and asRNAs (transcripts produced from the antisense strand of genes).
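
Since the pipeline expects a sorted BAM file, it can help to check the sort order before running it. The sketch below is only an illustration: it assumes that "sorted" means coordinate-sorted, that SAMtools is on the PATH, and that the alignment is in a file named mapping.bam (a placeholder). The helper name is_coordinate_sorted is hypothetical and not part of DETR'PROK.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when the SAM header read on stdin
# declares coordinate sorting in its @HD line.
is_coordinate_sorted() {
    grep -q '^@HD.*SO:coordinate'
}

bam="mapping.bam"   # placeholder file name, adjust to your data

# Only act when samtools is installed and the file exists.
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    if samtools view -H "$bam" | is_coordinate_sorted; then
        echo "$bam is already coordinate-sorted"
    else
        # Write a coordinate-sorted copy next to the original file.
        samtools sort -o "${bam%.bam}.sorted.bam" "$bam"
    fi
fi
```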

Author: Fabien Alfred

Please cite: Toffano-Nioche C., Luo Y., Kuchly C., Wallon C., Steinbach D., Zytnicki M., Jacq A., Gautheret D. (2013) Detection of non-coding RNA in bacteria and archaea using the DETR'PROK Galaxy pipeline. Methods. 63(1):60-5.

Synopsis

Usage: DETRPROK.sh -bam BAMfile -gff GFFfile -read_len int [OPTIONS]

Requirements:
BEDtools (v2.25.0) toolset for genome arithmetic
SAMtools (v1.3.1) utilities for manipulating alignments in the SAM format
DETR'PROK was developed with shell (v4.2.10), awk (v1.3.3), and perl (v5.12.4)
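
A quick way to confirm that the required tools are available before a run is sketched below. The function name check_tools is hypothetical; the version flags shown are the standard ones for BEDtools and SAMtools 1.x.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which of the named commands are missing from PATH.
# Returns 0 when all are found, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_tools bedtools samtools awk perl; then
    # Print the installed versions to compare with those listed above.
    bedtools --version
    samtools --version | head -n 1
else
    echo "install the missing tools before running the pipeline" >&2
fi
```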

Output: Returns lists (GFF files) of 5'UTRs, asRNAs and sRNAs, computed from a read alignment (BAM file) and an annotation file (GFF) by feature manipulation with BEDtools.

Parameters:
[-bam|-bed] BAM or BED file name (mandatory)
-gff GFF file name (mandatory)
-read_len read length (mandatory)
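
As an illustration of the mandatory parameters, the sketch below assembles a command line and prints it without running anything. The helper build_detrprok_cmd, the file names, and the script path ./DETRPROK_2.1.3.sh are assumptions made for the example; choosing -bam or -bed from the file extension is a convenience of this sketch, not a behavior of DETR'PROK. Any extra arguments are appended unchanged, so entries from the Options list can be added at the end.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a DETR'PROK command line without running it.
# Arguments: input file (.bam or .bed), GFF file, read length, then any options.
build_detrprok_cmd() {
    local input="$1" gff="$2" read_len="$3" flag
    shift 3
    # Pick the input flag from the file extension (assumption of this sketch).
    case "$input" in
        *.bam) flag="-bam" ;;
        *.bed) flag="-bed" ;;
        *) echo "input must be a .bam or .bed file: $input" >&2; return 1 ;;
    esac
    printf '%s' "./DETRPROK_2.1.3.sh $flag $input -gff $gff -read_len $read_len"
    # Append remaining arguments (optional parameters) as given.
    if [ "$#" -gt 0 ]; then
        printf ' %s' "$@"
    fi
    printf '\n'
}

build_detrprok_cmd mapping.sorted.bam annotation.gff 50
build_detrprok_cmd pairs.bed annotation.gff 100 -srna_cov 10
```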

Options:
-asrna_cov minimal asRNA read coverage (default 10)
-asrna_min_reads minimal number of reads for asRNA (default 10)
-asrna_min_size minimal size for asRNA (default 50)
-clust_gap maximal gap between reads in a cluster (default 20)
-op_gap maximal intergenic distance within operon (default 150)
-rm_tmp remove temporary files (default true)
-rna_gap maximal gap between a cluster and a CDS for definition of extended annotation (default 25)
-rna_merge maximal distance to merge independent RNA candidates (default 50)
-srna_cov minimal sRNA read coverage (default 5)
-srna_min_reads minimal number of reads for sRNA (default 10)
-srna_min_size minimal size for sRNA (default 50)
-srna_inclusion minimal fraction of overlap between sRNA and feature to be an sRNA candidate (default 0.0)
-utr_cov minimal 5'UTR read coverage (default 0)
-utr_min_reads minimal number of reads for 5'UTR (default 10)
-utr_min_size minimal size for 5'UTR (default 50)
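
To give an intuition for what a gap threshold such as -clust_gap means, the sketch below groups reads into clusters with awk. It is not the DETR'PROK implementation, which relies on BEDtools. It assumes a tab-separated BED-like input (chromosome, start, end) sorted by chromosome and start, ignores strand, and uses a hypothetical function name, cluster_reads. Each output line gives the cluster coordinates and its read count, the kind of value that thresholds such as -srna_min_reads would apply to.

```shell
#!/usr/bin/env bash
# Hypothetical illustration: merge reads separated by at most a given gap.
# Argument: maximal gap. Input on stdin: chrom, start, end (tab-separated, sorted).
# Output: chrom, cluster start, cluster end, number of reads in the cluster.
cluster_reads() {
    awk -v gap="$1" '
        BEGIN { FS = OFS = "\t" }
        NR == 1 { chrom = $1; start = $2 + 0; stop = $3 + 0; n = 1; next }
        $1 == chrom && ($2 - stop) <= gap {
            if ($3 + 0 > stop) stop = $3 + 0
            n++
            next
        }
        { print chrom, start, stop, n; chrom = $1; start = $2 + 0; stop = $3 + 0; n = 1 }
        END { if (NR > 0) print chrom, start, stop, n }
    '
}

# Toy input: the first two reads are 10 bases apart and end up in one cluster.
printf 'chr1\t0\t50\nchr1\t60\t110\nchr1\t200\t250\nchr2\t10\t60\n' | cluster_reads 20
```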

History

april 2023:
DETRPROK_2.1.3.sh
added the -bed option to handle paired-end sequencing through BED file input; fixed FS in awk commands.
may 2016:
DETRPROK_2.1.2.sh
changes to follow BEDtools release (version 2.25.0).
nov. 2015:
DETRPROK_2.1.1.sh
changes to comments.
oct. 2015:
DETRPROK_2.1.sh
added the -srna_inclusion option, modified the -feature_list selection to accept a more precise list of features, and adapted to the then-current version of BEDtools (version v2.17.0).
sept. 2014:
DETRPROK_2.0.sh
replaced the S-Mart tool suite with the BEDTools one (version 1.14).
sept. 2013:
Galaxy workflow based on S-Mart tools suite (version 41) and some scripts.