####
Note: To use this pipeline, make sure that you have already installed the RNAfold[1] and Randfold[2] programs, and also the RepeatMasker[3] program if you want to remove repeat-related microRNAs.
File names written in upper case are placeholders that you can replace with your own names, but remember to keep them consistent throughout the pipeline.
The name of each deep sequencing read should end with the number of times it was sequenced (N), like "XXXX-N".
####

1. Map deep sequencing reads (SEQ_FILE) to the genome (GNOME_FILE) using the Bowtie program.
GENOME_MAP_FILE: the mapping result of deep sequencing reads to the genome.
---
./bowtie-build GNOME_FILE Genome
./bowtie -f -n 0 -a -B 1 Genome SEQ_FILE GENOME_MAP_FILE
---

2. Map deep sequencing reads (SEQ_FILE) to the known noncoding sequences excluding microRNAs (NCRNA_FILE) using the Bowtie program.
NCRNA_MAP_FILE: the mapping result of deep sequencing reads to known noncoding RNAs.
---
./bowtie-build NCRNA_FILE NcRNA
./bowtie -f -n 0 -a -B 1 NcRNA SEQ_FILE NCRNA_MAP_FILE
---

3. Filter out reads that map to known noncoding sequences, have more than 20 loci in the genome, or were sequenced fewer than 10 times.
RETAINED_MAP_FILE: the genome mapping result after removing reads that are present in the known noncoding mapping result, have more than 20 loci in the genome, or were sequenced fewer than 10 times.
RETAINED_SEQ_FILE: reads that do not map to known noncoding sequences and have <=20 loci in the genome.
---
perl retain.informative.reads.pl GENOME_MAP_FILE NCRNA_MAP_FILE SEQ_FILE RETAINED_MAP_FILE RETAINED_SEQ_FILE
---

4. Define the functional spots and extract two putative precursors (precursor.fas) and their corresponding structures (structure.data) for each spot.
---
perl spot.define.and.precursor.extract.pl GNOME_FILE RETAINED_MAP_FILE
---
output: precursor.fas and structure.data

5. Map the deep sequencing reads retained in step 3 (RETAINED_SEQ_FILE) to the precursors (precursor.fas) using the Bowtie program.
PRECURSOR_MAP_FILE: the mapping result of retained reads to precursors.
---
./bowtie-build precursor.fas Precursor
./bowtie -f -n 0 -a -B 1 Precursor RETAINED_SEQ_FILE PRECURSOR_MAP_FILE
---

6. Sort mapped hits according to the precursors.
SORTED_MAP_FILE: sorted mapping result.
---
perl sort.precursor.mapres.pl PRECURSOR_MAP_FILE precursor.fas > SORTED_MAP_FILE
---

7. Predict microRNAs based on the structure features and read mappings in precursors.
PREDICTED_MIR_FILE: result file containing all information about predicted microRNAs.
---
perl predict.microRNAs.pl structure.data SORTED_MAP_FILE > PREDICTED_MIR_FILE
---

8. Extract extended microRNA precursor sequences, scan them for repeats with the RepeatMasker program, obtain the names of microRNAs related to repeats, and get the final microRNA predictions.
PRECURSOR_GENOME_LOC: genomic locations of predicted precursors;
EXTENDED_PRE_SEQFILE: precursor sequences extended by 200 nt on each side;
REPEATS_LIBRARY: repeat library needed by the RepeatMasker program;
EXTENDED_PRE_SEQFILE.out: result of the RepeatMasker program;
REPEAT_MIR_NAME: list of microRNA names related to repeats;
FINAL_MIR_PREDICTION: final result of the microRNA prediction pipeline.
---
perl pre.mir.chrm.loc.pl < PREDICTED_MIR_FILE > PRECURSOR_GENOME_LOC
perl extend.precurosr.ud200nt.pl GNOME_FILE PRECURSOR_GENOME_LOC > EXTENDED_PRE_SEQFILE
RepeatMasker -lib REPEATS_LIBRARY -s -pa 10 -a -lcambig -poly -gff -u -xsmall -excln EXTENDED_PRE_SEQFILE
perl repeat.related.mir.name.pl < EXTENDED_PRE_SEQFILE.out > REPEAT_MIR_NAME
perl filter.repeat.mir.pl REPEAT_MIR_NAME < PREDICTED_MIR_FILE > FINAL_MIR_PREDICTION
---

1. Vienna RNA Package, http://www.tbi.univie.ac.at/~ivo/RNA/
2. Randfold program, http://bioinformatics.psb.ugent.be/software/details/Randfold
3. RepeatMasker program, http://www.repeatmasker.org/

####
If you have any questions, please contact Chong-Jian CHEN 08/09/2010
####
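
Appendix (illustrative only): the read-naming convention ("XXXX-N") and the step 3 thresholds (at most 20 genomic loci, sequenced at least 10 times, no hit to known noncoding RNAs) can be sketched in a few lines of shell. This is not the code of retain.informative.reads.pl; the helper names read_count and informative_reads are hypothetical, and the sketch assumes Bowtie's default tab-separated output with the read name in column 1 and one line per reported alignment (as produced with -a).

```shell
#!/bin/sh
# Hypothetical helper: print the sequencing count N encoded in a read name "XXXX-N".
read_count() {
    # Strip everything up to and including the last "-".
    printf '%s\n' "${1##*-}"
}

# Hypothetical helper: list the names of reads that pass the three step 3 criteria.
#   $1: ncRNA mapping file, $2: genome mapping file,
#   $3: maximum number of genomic loci (default 20),
#   $4: minimum sequencing count (default 10).
# Assumes tab-separated Bowtie output, read name in column 1.
informative_reads() {
    awk -F '\t' -v max_loci="${3:-20}" -v min_count="${4:-10}" '
        FILENAME == ARGV[1] { ncrna[$1] = 1; next }  # reads hitting known ncRNAs
        { loci[$1]++ }                               # one line per genomic locus
        END {
            for (name in loci) {
                n = split(name, parts, "-")
                count = parts[n] + 0                 # trailing "-N" of the read name
                if (!(name in ncrna) && loci[name] <= max_loci && count >= min_count)
                    print name
            }
        }' "$1" "$2" | sort
}
```

For example, `informative_reads NCRNA_MAP_FILE GENOME_MAP_FILE` would print one retained read name per line; the actual script additionally writes RETAINED_MAP_FILE and RETAINED_SEQ_FILE.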