Introduction
T
|
ranscriptome sequencing (RNA-seq) helps to find gene
expression, reconstruct the transcripts, SNP detection and alternate splicing.
There are two assembly approaches i.e. Genome dependent (alignment to reference
genome) and genome independent (de novo assembly). De-novo assembly is the process
of constructing a reference genome sequence for a newly sequenced organism. And
it is necessary because for reference based sequencing reference genome is
required and this approach is not useful for organism having partial or missing
reference genome. Secondly genome sequences are incomplete, fragmented and
altered. The disadvantage of genome
assembly over transcriptome assembly is its inability to account for structural
alterations of mRNA transcripts, i.e. alternative splicing.
In RNA-seq first mRNA is extracted and purified from
cell and then reverse transcribed to create cDNA library with the help of
high-throughput sequencing techniques. This cDNA is fragmented into various
lengths.
Algorithm behind assembly is so simple, cDNA sequence
reads are assembled into transcripts by using any short read assembler, here we
are using ‘SOAPdenovo-Trans’. Short read assemblers generally one of two basic
algorithms: overlap graphs – compute pairwise overlap between the reads and
capture this information in a graph. De-Bruijin graph: breaks the reads into
smaller sequences of DNA, called K-mers, and captures overlaps of length k-1
between these k-mers not between reads. As the number of reads are growing day
by day and it is getting difficult to determine which read should be joined to
contiguous sequence contigs, so, de-Bruijin is the solution of this problem in
which a node is defined by fixed length of k-mer and nodes are connected by
edges, if they overlap by k-1 nucleotide.
De novo assembly work flow
Input files
Paired-end
RNA-seq reads of P.chabaudi in fastq
format
File 1: ERR306016_1.fastq
File 2: ERR306016_2.fastq
Exercise 1. Quality control check using fastQC
Shell command
for running fastQC on the read file:
./fastqc ERR306016_1.fastq
A.
Removal of adaptor sequences using FASTX-Toolkit
Shell command:
./fastx_clipper -a -l -C -i ERR306016_1.fastq -o new_ERR306016_1.fastq
B.
Run Quality check again on trimmed files i.e.
new_ERR306016_1.fastq and new_ERR306016_2.fastq
Shell command:
./fastqc
Repeat Exercise
1 for ERR306016_2.fastq
Exercise 2. De novo assembly of reads as well as RPKM
calculation using
SOAPdenovo-Trans
Shell command:
./SOAPdenovo-trans-127mer all -k 23 -R
-s sample.conf -o
plas_R
Clustering of
resulted contigs or transcriptome using TGICL
Shell command:
./tgicl plas_R.contig
Result will be clustered contig
indexes and singletons indexes (contigs which are not in cluster) and asm_1
directory containing CAP3 assembly result. If no singleton file form means all
contigs are clustered. Contig sequence and Singlets sequence files inside the
asm_1 directory are merged together using linux 'cat' command
Shell command:
cat contig singlets > contig_singlets.txt
Retreive the indexes from the
cluster file by using 'grep' and ‘sed’
grep -v 'CL' plas_R.contig_clusters
> contig_index.txt
Substitute the tab space with
next line using 'sed' command.
sed 's/\t/\n/g' contig_index.txt > contig_index_new.txt
Retreive the clustered contig
sequences from SOAPdenovo-trans assembled contig file 'plas_R.contig'
grep -Fxv -f contig_index_new.txt plas_R.contig > final_singletons.txt
Merge the 'contig_singlets.txt'
into "final_singletons.txt" using ‘cat’ command
cat contig_singlets.txt
final_singletons.txt > final_assembled_transcriptome.txt
Exercise 3. Read mapping using SeqMap
Shell command:
./seqmap 2
ERR306016_1.fastq plas_R_contig.fasta > seqmap_output.txt /eland:3
/available_memory:8000
Exercise 4. Annotation using standalone ncbi-BLAST and
online tool KOBAS
Shell command:
./makeblastdb -in /home/user/Desktop/NGS_Workshop/jyoti/kobas_data/p.chabaudi.pep.fasta
-input type 'fasta' -title 'plasmo_db' -dbtype 'prot'
./blastx -db /home/user/Desktop/NGS_Workshop/jyoti/kobas_data/p.chabaudi.pep.fasta
-query /home/user/Desktop/NGS_Workshop/jyoti/plas_R.contig/
-out /kobas_input.fasta/ -outfmt 6
Run KOBAS using
'kobas_input.fasta'
KOBAS is
available on the following url:
http://kobas.cbi.pku.edu.cn/home.do
No comments:
Post a Comment