Things about High-throughput Sequencing
Complementary approaches
Illumina sequencing has four main steps:
- library construction
- cluster formation
- sequencing
- data analysis
Library prep
RNA-seq? It is actually DNA sequencing
On Illumina platforms we cannot sequence RNA directly. Instead, we reverse-transcribe the RNA (AUCG) into complementary DNA (cDNA, ATCG) and then sequence the cDNA.
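The base-pairing behind reverse transcription can be sketched in a few lines. This is a toy model of the sequence logic only (no enzymes, primers, or adapters). It assumes the RNA is written 5'->3' and that the first cDNA strand is its reverse complement in the DNA alphabet. The function names `reverse_transcribe` and `second_strand` are hypothetical.

```python
# Toy sketch: first-strand cDNA is the reverse complement of the RNA,
# written with DNA bases (RNA U pairs with DNA A; RNA A pairs with DNA T).
RNA_TO_CDNA = {"A": "T", "U": "A", "C": "G", "G": "C"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_transcribe(rna: str) -> str:
    """Return first-strand cDNA (5'->3') for an RNA given 5'->3'."""
    # Complement each base, walking the RNA backwards so the cDNA also reads 5'->3'.
    return "".join(RNA_TO_CDNA[base] for base in reversed(rna.upper()))


def second_strand(cdna: str) -> str:
    """Return the complementary DNA strand, i.e. the RNA sequence with U -> T."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(cdna))


if __name__ == "__main__":
    cdna = reverse_transcribe("AUGGCUUAA")
    print(cdna)
    print(second_strand(cdna))
```

The round trip shows why the sequenced DNA still reports the RNA: the second strand matches the original transcript, with T in place of U.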
Library prep also differs depending on which RNAs we want to capture:
- mRNA: polyA selection
- lncRNA, or lncRNA + mRNA: rRNA depletion
- sRNA: size selection of the library (18-40 nt)
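The choices above can be written as a small lookup, together with a toy version of the sRNA size-selection step. In the lab, size selection is done physically, for example with a gel or beads. Here it is only modeled as a length filter on insert sequences. `PREP_STRATEGY` and `size_select` are hypothetical names, and the inclusive 18-40 nt bounds are an assumption taken from the range in the notes.

```python
# Hypothetical lookup of the prep strategy for each target, as listed in the notes.
PREP_STRATEGY = {
    "mRNA": "polyA selection",
    "lncRNA": "rRNA depletion",
    "lncRNA+mRNA": "rRNA depletion",
    "sRNA": "size selection (18-40 nt)",
}


def size_select(inserts, min_len=18, max_len=40):
    """Keep inserts whose length lies in [min_len, max_len] (bounds inclusive)."""
    return [s for s in inserts if min_len <= len(s) <= max_len]


if __name__ == "__main__":
    # Toy inserts: 17 nt (too short), 22 nt (miRNA-like), 40 nt, 41 nt (too long).
    toy = ["A" * 17, "U" * 22, "C" * 40, "G" * 41]
    print(PREP_STRATEGY["sRNA"])
    print([len(s) for s in size_select(toy)])
```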
What can go wrong during sequencing?
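- cluster identification
- bubbles
- synthesis errors: phasing and pre-phasing (in the figure, yellow marks the cause and red the result; the upper panel shows phasing, the lower one pre-phasing). Phasing means some strands in a cluster fall behind the current cycle; pre-phasing means some strands run ahead of it.
- duplication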
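A simple probabilistic model shows why phasing and pre-phasing get worse as a read grows longer. This is a simplified model assumed for illustration. In each cycle, every strand in a cluster independently falls one base behind with probability p (phasing) or runs one base ahead with probability q (pre-phasing). The rates in the example are hypothetical, and `in_phase_fraction` is a hypothetical name.

```python
# Toy model (assumption): per cycle, each strand independently lags with
# probability p (phasing) or jumps ahead with probability q (pre-phasing).
# A strand counts as in phase only if neither event ever happened to it;
# a lag later cancelled by a jump is ignored, which is a simplification.


def in_phase_fraction(cycles: int, p: float, q: float) -> float:
    """Fraction of strands in a cluster still in sync after `cycles` cycles."""
    if cycles < 0 or p < 0 or q < 0 or p + q > 1:
        raise ValueError("need cycles >= 0, p >= 0, q >= 0 and p + q <= 1")
    return (1 - p - q) ** cycles


if __name__ == "__main__":
    # Hypothetical per-cycle rates. The in-phase share shrinks geometrically,
    # so the cluster's signal gets noisier toward the end of a read.
    for n in (1, 50, 100, 150):
        print(n, round(in_phase_fraction(n, p=0.002, q=0.001), 3))
```

Even small per-cycle rates compound over 100+ cycles. This is consistent with base-call quality typically dropping toward the 3' end of reads.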
Multiplexed sequencing
Definition:
Multiple libraries are commonly pooled and sequenced simultaneously, a process known as multiplexing.
How?
To distinguish individual libraries throughout this process, sample-specific sequences, called sample indexes or sample barcodes, are added to each fragment in the library during construction. The barcode information is then used to computationally assign the reads back to their individual libraries (demultiplexing).
The pooled libraries are sequenced simultaneously in a single sequencing run.
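The "assign reads back by barcode" step can be sketched as a minimal demultiplexer. Several assumptions apply here. Each read is given as an (index, sequence) pair. The 8-nt indexes and sample names are hypothetical. Up to one mismatch is tolerated, but only when it points to a single sample. Real demultiplexing software has its own configurable rules, and this sketch does not reproduce any particular tool.

```python
from collections import defaultdict

# Hypothetical 8-nt sample indexes (pairwise very different from each other).
SAMPLE_INDEXES = {
    "ACGTACGT": "sample_1",
    "TGCATGCA": "sample_2",
    "GATCGATC": "sample_3",
}


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def assign_sample(index: str, max_mismatches: int = 1) -> str:
    """Return the unique sample whose index is within max_mismatches, else 'undetermined'."""
    hits = [sample for idx, sample in SAMPLE_INDEXES.items()
            if len(idx) == len(index) and hamming(idx, index) <= max_mismatches]
    return hits[0] if len(hits) == 1 else "undetermined"


def demultiplex(reads):
    """Group (index, sequence) pairs into per-sample lists of sequences."""
    bins = defaultdict(list)
    for index, seq in reads:
        bins[assign_sample(index)].append(seq)
    return dict(bins)


if __name__ == "__main__":
    reads = [("ACGTACGT", "READ1"), ("ACGTACGA", "READ2"),  # exact, 1 mismatch
             ("TGCATGCA", "READ3"), ("NNNNNNNN", "READ4")]  # exact, unreadable
    print(demultiplex(reads))
```

Requiring a unique match prevents a read with a sequencing error in its index from being assigned to the wrong sample. This only works when the sample indexes differ from each other by several bases.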
Benefits:
Substantially reduces the cost of sequencing per sample and facilitates experimental scalability
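The cost saving is simple arithmetic. The sketch below splits one run's cost and read output evenly across the pooled samples. It assumes a perfectly balanced pool, which real pools only approximate. The run cost, in arbitrary units, and the 400 million total reads are hypothetical numbers, and `per_sample_share` is a hypothetical name.

```python
def per_sample_share(run_cost: float, total_reads: int, n_samples: int):
    """Idealized even split of one run's cost and reads across pooled samples."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    cost_per_sample = run_cost / n_samples
    reads_per_sample = total_reads // n_samples  # whole reads per sample
    return cost_per_sample, reads_per_sample


if __name__ == "__main__":
    # Hypothetical run: cost in arbitrary units, 400 million reads in total.
    for n in (1, 8, 16):
        print(n, per_sample_share(1000.0, 400_000_000, n))
```

The same division shows the trade-off. Each added sample lowers the cost per sample but also the reads per sample, so the sequencing depth each sample needs limits how many samples can share a run.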
Challenges:
Duplication: the fraction of mapped reads in which two or more reads share the same 5′ and 3′ coordinates. Duplicates mostly arise from PCR amplification during library construction. They can also be artifacts, e.g. the same template binding to multiple clusters on a flow cell.
Duplicates can misrepresent allele frequencies, because copies of one over-amplified fragment are counted as independent observations.
Minimizing duplicates in NGS experiments is therefore critically important. Tools for marking or removing duplicates: Picard (MarkDuplicates), SAMtools (rmdup).
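The duplicate definition and its effect on allele frequency can both be shown on toy data. This is a minimal sketch, not the algorithm used by Picard or SAMtools. It flags a read as a duplicate when its chromosome, strand, and both outer coordinates match an earlier read, and it keeps the first copy of each group. Real tools use more elaborate rules, including handling of read pairs and choosing which copy to keep. The `Read` type, the helper names, and the reads themselves are hypothetical.

```python
from typing import List, NamedTuple


class Read(NamedTuple):
    chrom: str
    strand: str
    start: int  # aligned outer coordinate at one end of the fragment
    end: int    # aligned outer coordinate at the other end
    base: str   # base this read shows at a site of interest


def mark_duplicates(reads: List[Read]) -> List[bool]:
    """True for every read whose (chrom, strand, start, end) was seen earlier."""
    seen = set()
    flags = []
    for r in reads:
        key = (r.chrom, r.strand, r.start, r.end)
        flags.append(key in seen)  # the first copy of each group stays unflagged
        seen.add(key)
    return flags


def duplicate_fraction(flags: List[bool]) -> float:
    """Share of reads flagged as duplicates."""
    return sum(flags) / len(flags) if flags else 0.0


def alt_allele_frequency(reads: List[Read], alt: str, flags=None) -> float:
    """Fraction of reads carrying `alt`, optionally skipping flagged duplicates."""
    kept = [r for i, r in enumerate(reads) if not (flags and flags[i])]
    return sum(r.base == alt for r in kept) / len(kept)


if __name__ == "__main__":
    # Toy data: four distinct reference (A) fragments plus one alternate (G)
    # fragment that PCR copied four times.
    reads = [Read("chr1", "+", s, s + 150, "A") for s in (100, 110, 120, 130)]
    reads += [Read("chr1", "+", 140, 290, "G")] * 4
    flags = mark_duplicates(reads)
    print(duplicate_fraction(flags))
    print(alt_allele_frequency(reads, "G"))         # counts every PCR copy
    print(alt_allele_frequency(reads, "G", flags))  # one copy per fragment
```

In this toy case, 3 of 8 reads are duplicates. Counting them inflates the G frequency from 1 in 5 fragments (0.2) to 4 in 8 reads (0.5), which is the kind of false allele frequency the notes warn about.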