RNA-seq has revolutionized the research community's approach to studying gene expression. This technology makes it possible to quantify the expression level of all genes at once, allowing an ex post (rather than ex ante) selection of candidates that could be of interest for a given study. The continuous drop in costs, and the independence of library preparation protocols from the model species, have convinced stakeholders to invest in this technology by creating consortia able to produce large disease-specific datasets that, in turn, have fostered transcriptomic research at a population level. Among many others, a virtuous example is The Cancer Genome Atlas. In a short time, RNA-seq has moved from a technology that merely quantifies gene expression to a powerful tool for discovering new transcripts (via de novo transcriptome assembly) and for characterizing alternative splicing variants or new cell types (through single-cell RNA sequencing). Using RNA-seq for daily diagnostic activities is no longer a dream but a consolidated reality. Although established best practices exist, managing RNA-seq data is not easy. Before sequencing, it is essential to plan library preparation carefully in order to minimize biases in downstream analysis. Budget optimization is another important factor. Sequencing more samples increases statistical power and reduces the undesired effects of noise and variability. However, more samples imply higher costs. Multiplexing has proved to be an effective way to contain the budget without sacrificing the number of samples. DNA barcoding enables combining up to 96 samples into a single lane, trading a lower sequencing depth for a higher number of sequenced samples. The downside of this technique is the increased burden of data analysis needed to achieve the same accuracy that a richer input would provide.
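The depth-versus-samples trade-off of multiplexing can be illustrated with simple arithmetic. The sketch below assumes that the reads of one lane are split evenly among the barcoded samples, which is an idealization; the lane yield `LANE_READS` and the function name `reads_per_sample` are hypothetical and chosen only for illustration.

```python
def reads_per_sample(lane_reads: int, n_samples: int) -> float:
    """Expected reads per sample when n_samples share one lane evenly."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    return lane_reads / n_samples


# Hypothetical lane yield, not taken from any specific instrument.
LANE_READS = 400_000_000

# Show how per-sample depth shrinks as more samples are pooled.
for n in (1, 12, 48, 96):
    depth = reads_per_sample(LANE_READS, n)
    print(f"{n:>2} samples -> {depth / 1e6:.1f} M reads per sample")
```

In practice barcodes are rarely represented evenly, so the figure computed here is best read as an upper-level estimate of the depth each sample receives.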
After sequencing, FASTQ data must be validated and processed to distill raw reads into a quantitative measure of gene expression. While validation is a fairly standard procedure, read counting depends on the type of RNA (microRNA, etc.) and on the target application. Usually, reads are subjected to adapter removal, aligned against a reference genome, grouped by functional unit (e.g., transcripts, genes, microRNA), counted, and normalized. Subsequent analyses can vary dramatically according to the application. In the simplest setting, the goal is to discover the subset of genes responsible for the phenotypic differences between two populations. In other cases, one may want to build the co-expression (or reverse expression) network in order to find interacting genes or a pathway related to a certain phenotype. Other applications involve the discovery of unknown cell types, the organization of cell types into homogeneous families, the identification of new molecules (e.g., new microRNA, long non-coding RNA), or the annotation of new variants or alternative splicing events.
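The last steps of this pipeline (normalizing per-gene counts and comparing two populations) can be sketched as follows. This is a minimal illustration and not the method of any specific tool: counts-per-million (CPM) is only one of several possible normalizations, the log2 fold change shown here is a descriptive ratio rather than a statistical test, and the gene names, counts, and function names are all hypothetical.

```python
import math


def counts_per_million(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw per-gene read counts of one sample to counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no counted reads")
    return {gene: c * 1e6 / total for gene, c in counts.items()}


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Per-gene log2 ratio of mean expression in group_b over group_a.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    result = {}
    for gene in group_a[0]:
        mean_a = sum(sample[gene] for sample in group_a) / len(group_a)
        mean_b = sum(sample[gene] for sample in group_b) / len(group_b)
        result[gene] = math.log2((mean_b + pseudocount) / (mean_a + pseudocount))
    return result


# Toy raw counts for two hypothetical populations (two samples each).
controls = [{"geneA": 100, "geneB": 500, "geneC": 400},
            {"geneA": 120, "geneB": 480, "geneC": 400}]
cases = [{"geneA": 400, "geneB": 300, "geneC": 300},
         {"geneA": 380, "geneB": 320, "geneC": 300}]

# Normalize every sample, then compare the two groups gene by gene.
cpm_controls = [counts_per_million(s) for s in controls]
cpm_cases = [counts_per_million(s) for s in cases]

for gene, lfc in log2_fold_change(cpm_controls, cpm_cases).items():
    print(f"{gene}: log2 fold change = {lfc:+.2f}")
```

A real differential expression analysis would add a statistical model of count variability and a correction for multiple testing, which this sketch deliberately omits.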

Geraci, F., Saha, I., Bianchini, M. (2020). Editorial: RNA-Seq Analysis: Methods, Applications and Challenges. Frontiers in Genetics, 11, 1-3 [10.3389/fgene.2020.00220].

Editorial: RNA-Seq Analysis: Methods, Applications and Challenges

Bianchini, M.
2020-01-01

Files in this record:
Ed_RNA-Seq.pdf (open access; type: publisher's PDF; license: Creative Commons; size: 167.34 kB; format: Adobe PDF)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11365/1111122