The advent of long-read sequencing technology has broadened the applications of genomics and transcriptomics and has largely overcome the limitations imposed by short read lengths. However, even the most advanced short-read data processing pipelines cannot be fully adapted to the needs of long-read sequencing data analysis. CD Genomics combines sequencing data analysis with omics expertise. We have developed a standardized long-read sequencing bioinformatics pipeline, and we also offer advanced data mining that integrates your genomic, transcriptomic, and epigenomic data, supporting discoveries that were not previously possible.
Current long-read sequencing platforms (PacBio SMRT and Oxford Nanopore sequencing) have overcome earlier limitations in accuracy and throughput and typically produce reads longer than 10 kb. These long reads enable improved de novo assembly, the identification of structural variants and transcript isoforms, and the phasing of whole human genomes to identify co-inherited alleles, haplotype information, and de novo mutations. At the same time, they place greater demands on the analytical tools and pipelines used to analyze long-read data.
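Read length is commonly summarized with the N50: the largest length L such that reads of length at least L together contain at least half of all sequenced bases. The minimal Python sketch below computes the N50 and the fraction of reads of at least 10 kb from a list of read lengths. The function names and the example lengths are hypothetical and serve only as illustration; the same N50 definition is also widely used to summarize contig lengths after assembly.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the largest length L such that reads of length >= L
    together contain at least half of all bases."""
    if not lengths:
        raise ValueError("no read lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest read down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


def fraction_at_least(lengths: Sequence[int], threshold: int = 10_000) -> float:
    """Fraction of reads whose length is >= threshold (default 10 kb)."""
    if not lengths:
        raise ValueError("no read lengths given")
    return sum(1 for n in lengths if n >= threshold) / len(lengths)


if __name__ == "__main__":
    # Hypothetical read lengths in bases, for illustration only.
    reads = [2_000, 8_000, 12_000, 15_000, 25_000, 40_000]
    print("N50:", n50(reads))
    print("Fraction >= 10 kb:", fraction_at_least(reads))
```

For these example lengths the N50 is 25,000 bases (the 40 kb and 25 kb reads already hold more than half of the 102 kb total), and four of the six reads reach 10 kb.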
In particular, the two mainstream long-read technologies, PacBio SMRT and ONT, produce data that differ from those of next-generation (short-read) sequencing. What short-read and long-read technologies have in common is that both generate large amounts of data, are computationally intensive, and involve complex workflows. Beyond the standard analysis workflow, different research purposes and projects require their own combinations of data processing steps and tools, which further increases the difficulty and computational burden of data analysis.
Overview of long-read sequencing data analysis tools and pipelines (Amarasinghe S L et al., 2022)
PacBio SMRT and Nanopore sequencing are based on completely different principles, so different tools have been developed to analyze their data.
CD Genomics has developed custom solutions for long-read sequencing data analysis, and we provide bioinformatics services that combine short-read and long-read data to meet different needs. Our bioinformatics solutions and pipelines for complex analyses help you extract informative features from massive datasets, gain new insights, and advance your project or decide on the next step in your study design.
We provide primary, secondary, and tertiary sequencing data analysis, from evaluation of the raw sequencing data through alignment to in-depth analysis and biological interpretation.
Our long-read sequencing data analysis services include, but are not limited to:
| Analysis stage | Services |
| --- | --- |
| Primary Analysis | Basecalling |
| | Quality control (FastQC / pycoQC / MinIONQC) |
| | Read filtering / trimming / adapter removal |
| Secondary Analysis | Genome Assembly |
| | Consensus Sequences & Error Correction |
| | Variant Calling |
| Tertiary Analysis | Structural Variant Analysis |
| | SNP/CNV Analysis |
| | Gene Regulation Analysis |
| | Alternative Splicing Analysis |
| | Base Modification Analysis |
| | Gene Ontology Enrichment Analysis |
| | Pathway Enrichment Analysis |
| | Protein-Protein Interaction Network |
| | Co-Expression Network Analysis |
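Read filtering in primary analysis is often a simple per-record rule, for example discarding reads that are too short or whose mean base quality is too low. The Python sketch below parses FASTQ records and keeps reads that pass a length cutoff and a mean-quality cutoff. It is a minimal illustration under stated assumptions, not the pipeline used in our services: the function names and default thresholds (1,000 bases, mean Q7) are hypothetical, quality strings are assumed to use Phred+33 encoding, and each record is assumed to occupy exactly four lines. Mean quality is computed by averaging per-base error probabilities rather than raw Phred scores, so the result corresponds to the read's expected per-base error rate.

```python
import io
import math
from typing import Iterable, Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream with exactly four lines per record."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield FastqRecord(header[1:], seq, qual)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred quality derived from the average per-base error probability."""
    if not qual:
        raise ValueError("empty quality string")
    probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_reads(records: Iterable[FastqRecord], min_len: int = 1000,
                 min_q: float = 7.0) -> Iterator[FastqRecord]:
    """Keep non-empty reads with length >= min_len and mean quality >= min_q."""
    for rec in records:
        if rec.seq and len(rec.seq) >= min_len and mean_quality(rec.qual) >= min_q:
            yield rec


if __name__ == "__main__":
    # Two toy records: 'I' encodes Q40 and '!' encodes Q0 in Phred+33.
    example = io.StringIO(
        "@read1\nACGTACGTAC\n+\nIIIIIIIIII\n"
        "@read2\nACGT\n+\n!!!!\n"
    )
    kept = filter_reads(read_fastq(example), min_len=5, min_q=7.0)
    print([r.name for r in kept])
```

In the toy example only `read1` is kept, because `read2` is shorter than the 5-base cutoff used there. Averaging error probabilities makes low-quality bases weigh heavily: a read with one Q40 base and one Q0 base scores about Q3, not Q20.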
CD Genomics provides accurate and cost-effective bioinformatics analysis services, backed by proven analytical pipelines for database mining and analysis. We can handle many file types, including FASTQ, BED, SAM, BAM, and VCF.
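As one example of what these formats encode, SAM and BAM files store each alignment's properties in a bitwise FLAG field whose bit values are defined by the SAM specification. Supplementary alignments (bit 0x800) are common with long reads, because a single read can be split across distant genomic locations, and such split alignments are one type of evidence used in structural variant detection. The short Python sketch below decodes a FLAG integer into named bits; the bit values follow the SAM specification, while the function names and the labels in the dictionary are illustrative choices.

```python
# Bit values from the SAM specification; the text labels are illustrative.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag: int) -> list[str]:
    """Return the labels of all bits set in a SAM FLAG value, lowest bit first."""
    if not 0 <= flag < 0x1000:
        raise ValueError(f"FLAG out of range: {flag}")
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_primary_mapped(flag: int) -> bool:
    """True for a mapped alignment that is neither secondary nor supplementary."""
    return (flag & (0x4 | 0x100 | 0x800)) == 0


if __name__ == "__main__":
    print(decode_flag(2064))        # 2064 = 0x800 + 0x10
    print(is_primary_mapped(2064))
```

For example, FLAG 2064 decodes to `['reverse_strand', 'supplementary']`, a supplementary alignment on the reverse strand, so `is_primary_mapped` returns `False` for it.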
Even if you don't have data yet, we can help you plan your research, provide expert experimental advice, and schedule your data generation.
If you have any questions about our bioinformatics services, such as how we can support your research or how we build your reports, please contact us for more detailed information.
Reference