Latest version of the DETR'PROK shell script, along with the Perl script used by DETR'PROK to split extended annotations.

What is DETR'PROK?

DETR'PROK (DETection of non-coding Rna in PROKaryotes) is a computational pipeline that takes as input a mapping of deep-sequencing reads (sorted BAM file) and performs successive steps of clustering, comparison with an existing annotation (GFF input file), and identification of transcribed non-coding fragments. These fragments are classified into putative 5'UTRs (untranslated regions of mRNAs), sRNAs (independent small RNA genes) and asRNAs (transcripts produced from the antisense strand of genes).
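The classification step can be pictured with a deliberately simplified sketch. It is not the DETR'PROK code: the rules below (antisense overlap with a gene gives an asRNA, same-strand transcription reaching a gene's 5' end within `rna_gap` gives a 5'UTR, no overlap with any annotation gives an sRNA) are assumptions made for illustration, and the names `classify`, `overlaps` and `is_upstream_of` are hypothetical. Intervals are 0-based, half-open `(start, end, strand)` tuples on a single replicon.

```python
from typing import List, Optional, Tuple

# Hypothetical interval type: (start, end, strand), 0-based half-open (BED-style).
Interval = Tuple[int, int, str]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share at least one base (strand ignored)."""
    return a[0] < b[1] and b[0] < a[1]


def is_upstream_of(cluster: Interval, gene: Interval, rna_gap: int) -> bool:
    """Same-strand cluster starting before the gene's 5' end and reaching within rna_gap of it."""
    c_start, c_end, c_strand = cluster
    g_start, g_end, g_strand = gene
    if c_strand != g_strand:
        return False
    if g_strand == "+":
        # The 5' end of a '+' gene is its start; gap = g_start - c_end.
        return c_start < g_start and c_end >= g_start - rna_gap
    # The 5' end of a '-' gene is its end; gap = c_start - g_end.
    return c_end > g_end and c_start <= g_end + rna_gap


def classify(cluster: Interval, genes: List[Interval], rna_gap: int = 25) -> Optional[str]:
    """Assign a simplified candidate class to one read cluster (assumed rules, see lead-in)."""
    # 1. Antisense overlap with any gene -> asRNA candidate.
    if any(g[2] != cluster[2] and overlaps(cluster, g) for g in genes):
        return "asRNA"
    # 2. Same-strand transcription running into a gene's 5' end -> 5'UTR candidate.
    if any(is_upstream_of(cluster, g, rna_gap) for g in genes):
        return "5'UTR"
    # 3. No overlap with any annotated gene -> independent sRNA candidate.
    if not any(overlaps(cluster, g) for g in genes):
        return "sRNA"
    # Otherwise the cluster most likely belongs to the gene's own mRNA.
    return None
```

With genes `(1000, 2000, "+")` and `(3000, 4000, "-")`, a `+` cluster at `(900, 990)` is classified as a 5'UTR, a `-` cluster at `(1500, 1600)` as an asRNA, and a `+` cluster at `(2300, 2400)` as an sRNA. The real pipeline additionally applies the read-count, size and coverage thresholds listed in the options.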

Author: Fabien Alfred

Please cite: Toffano-Nioche C., Luo Y., Kuchly C., Wallon C., Steinbach D., Zytnicki M., Jacq A., Gautheret D. (2013) Detection of non-coding RNA in bacteria and archaea using the DETR'PROK Galaxy pipeline. Methods. 63(1):60-5.


Usage: -bam BAMfile -gff GFFfile -read_len int [OPTIONS]

Dependencies:
BEDtools (v2.25.0), toolset for genome arithmetic
SAMtools (v1.3.1), utilities for manipulating alignments in the SAM format
DETR'PROK was developed with shell (v4.2.10), awk (v1.3.3) and Perl (v5.12.4).

Output: returns lists (GFF files) of 5'UTRs, asRNAs and sRNAs, computed from a read alignment (BAM file) and an annotation file (GFF) through feature manipulation with BEDtools.
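BEDtools accepts both BED (0-based, half-open) and GFF (1-based, inclusive) coordinates, so turning a BED-style interval into a GFF record only shifts the start by one base while the end stays the same. The sketch below formats one candidate as a nine-column GFF line; the source tag `DETRPROK`, the `ID=` attribute, the seqid in the usage example and the function name `to_gff_line` are hypothetical and do not describe the fields the script actually writes.

```python
def to_gff_line(seqid: str, start0: int, end0: int, strand: str,
                feature_type: str, name: str, source: str = "DETRPROK") -> str:
    """Format a 0-based half-open interval as one GFF line (1-based, inclusive)."""
    if not 0 <= start0 < end0:
        raise ValueError("expected 0 <= start < end")
    fields = [
        seqid,
        source,           # hypothetical source tag
        feature_type,     # candidate class, e.g. "sRNA" or "asRNA"
        str(start0 + 1),  # GFF start is 1-based
        str(end0),        # half-open BED end equals the inclusive 1-based end
        ".",              # score: not set in this sketch
        strand,
        ".",              # phase: only meaningful for CDS features
        f"ID={name}",     # hypothetical attribute
    ]
    return "\t".join(fields)
```

For example, `to_gff_line("chr", 99, 195, "+", "sRNA", "srna_1")` returns `"chr\tDETRPROK\tsRNA\t100\t195\t.\t+\t.\tID=srna_1"`, i.e. the same 96 bases expressed in GFF coordinates.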

-bam BAM file name (mandatory)
-gff GFF file name (mandatory)
-read_len read length (mandatory)

-asrna_cov minimal asRNA read coverage (default 10)
-asrna_min_reads minimal number of reads for asRNA (default 10)
-asrna_min_size minimal size for asRNA (default 50)
-clust_gap maximal gap between reads in a cluster (default 20)
-op_gap maximal intergenic distance within operon (default 150)
-rm_tmp remove temporary files (default true)
-rna_gap maximal gap between a cluster and a CDS for defining an extended annotation (default 25)
-rna_merge maximal distance to merge independent RNA candidates (default 50)
-srna_cov minimal sRNA read coverage (default 5)
-srna_min_reads minimal number of reads for sRNA (default 10)
-srna_min_size minimal size for sRNA (default 50)
-srna_inclusion minimal fraction of overlap between sRNA and feature to be an sRNA candidate (default 0.0)
-utr_cov minimal 5'UTR read coverage (default 0)
-utr_min_reads minimal number of reads for 5'UTR (default 10)
-utr_min_size minimal size for 5'UTR (default 50)
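The clustering and filtering options can be illustrated with a small sketch, under stated assumptions: reads are 0-based half-open `(start, end, strand)` tuples on a single replicon, a read joins the current cluster when it is on the same strand and its gap to the cluster is at most `clust_gap` (similar in spirit to `bedtools merge -s -d`), and clusters are then kept only if they reach a minimal read count and size, as the `-*_min_reads` and `-*_min_size` options suggest. The coverage thresholds (`-*_cov`) are not modeled, and `cluster_reads`/`filter_clusters` are hypothetical names, not functions of the script.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Hypothetical read type: (start, end, strand), 0-based half-open.
Read = Tuple[int, int, str]


@dataclass
class Cluster:
    start: int
    end: int
    strand: str
    n_reads: int

    @property
    def size(self) -> int:
        return self.end - self.start


def cluster_reads(reads: Iterable[Read], clust_gap: int = 20) -> List[Cluster]:
    """Group same-strand reads whose gap to the current cluster is <= clust_gap."""
    clusters: List[Cluster] = []
    # Sort by strand, then start, so each strand is swept left to right.
    for start, end, strand in sorted(reads, key=lambda r: (r[2], r[0], r[1])):
        last = clusters[-1] if clusters else None
        if last is not None and last.strand == strand and start - last.end <= clust_gap:
            # Extend the current cluster; max() handles reads nested inside it.
            last.end = max(last.end, end)
            last.n_reads += 1
        else:
            clusters.append(Cluster(start, end, strand, 1))
    return clusters


def filter_clusters(clusters: List[Cluster], min_reads: int = 10,
                    min_size: int = 50) -> List[Cluster]:
    """Keep clusters supported by enough reads and long enough."""
    return [c for c in clusters if c.n_reads >= min_reads and c.size >= min_size]
```

For twelve overlapping `+` reads spanning 100 to 195, one isolated `+` read at `(300, 340)` and one `-` read, `cluster_reads` returns three clusters, and `filter_clusters` with the default thresholds (10 reads, 50 bp) keeps only the first one (12 reads, 95 bp).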


may 2016:
changes to follow the BEDtools release (version 2.25.0).
nov. 2015:
changes to comments.
oct. 2015:
adding the -srna_inclusion option, modifying the -feature_list selection to accept a more precise list of features, adapting to the then-current version of BEDtools (v2.17.0).
sept. 2014:
replacing the S-Mart tools suite with BEDtools (version 1.14).
sept. 2013:
Galaxy workflow based on the S-Mart tools suite (version 41) and additional scripts.