[Journal]

A hybrid correcting method considering heterozygous variations by a comprehensive probabilistic model.

Author:

Liu Jiaqi | Wang Jiayin | Xiao Xiao | Lai Xin | Dai Daocheng | Zhang Xuanping | Zhu Xiaoyan | Zhao Zhongmeng | Wang Juan | Li Zhimin

Indexed by:

PubMed SCIE CPCI-S

Abstract:

The emergence of third-generation sequencing technology, featuring longer read lengths, represents a major advance over next-generation sequencing and has greatly promoted biological research. However, third-generation sequencing data have a high sequencing error rate, which inevitably affects downstream analysis. Although sequencing error rates have improved in recent years, large amounts of data were produced at high error rates, and discarding them would be a huge waste. Error correction for third-generation sequencing data is therefore especially important. Existing error correction methods perform poorly at heterozygous sites, which are ubiquitous in diploid and polyploid organisms. There is thus a lack of error correction algorithms for heterozygous loci, especially at low coverage. In this article, we propose an error correction method named QIHC. QIHC is a hybrid correction method, which requires both next-generation and third-generation sequencing data. QIHC greatly enhances the sensitivity of distinguishing heterozygous sites from sequencing errors, which leads to high error-correction accuracy. To achieve this, QIHC establishes a set of probabilistic models based on a Bayesian classifier to estimate the heterozygosity of a site, and makes a judgment by calculating the posterior probabilities. The proposed method consists of three modules, which respectively generate a pseudo reference sequence, obtain the read alignments, and estimate the heterozygosity of the sites and correct the reads harboring them. The last module is the core module of QIHC and is designed to handle the calculations for the multiple cases that arise at a heterozygous site.
The other two modules enable the reads to be mapped to the pseudo reference sequence, which to some extent overcomes the inefficiency of the multiple mappings adopted by existing error correction methods. To verify the performance of our method, we selected Canu and Jabba for comparison with QIHC in several aspects. Because QIHC is a hybrid correction method, we first conducted a group of experiments under different coverages of the next-generation sequencing data. QIHC is far ahead of Jabba in accuracy. We then varied the coverage of the third-generation sequencing data and again compared the performance of Canu, Jabba and QIHC. QIHC outperforms the other two methods in the accuracy of both correcting sequencing errors and identifying heterozygous sites, especially at low coverage. We also compared Canu and QIHC at different error rates of the third-generation sequencing data, and QIHC still performs better. Therefore, QIHC is superior to the existing error correction methods when heterozygous sites exist.
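
The abstract's idea of judging a site by posterior probability can be illustrated with a minimal Python sketch. This is not QIHC's actual model, which the abstract does not specify. The three genotype hypotheses, the binomial likelihood, the default `error_rate`, the prior values, and the function names `site_posteriors` and `classify_site` are all assumptions made for illustration only.

```python
from math import comb

# Illustrative priors only (assumed, not taken from the paper):
# most sites match the pseudo reference, few are heterozygous.
DEFAULT_PRIORS = {"hom_ref": 0.98, "het": 0.01, "hom_alt": 0.01}


def site_posteriors(ref_count, alt_count, error_rate=0.1, priors=None):
    """Posterior probability of each genotype hypothesis at one site.

    ref_count / alt_count are the numbers of aligned reads supporting the
    reference and the alternate base. Each read is assumed independent,
    so the alternate-read count follows a binomial distribution.
    """
    if priors is None:
        priors = DEFAULT_PRIORS
    n = ref_count + alt_count
    # Probability that a single read shows the alternate base under each
    # hypothesis (a simplification: errors always flip to the other allele).
    p_alt = {"hom_ref": error_rate, "het": 0.5, "hom_alt": 1.0 - error_rate}

    joint = {}
    for genotype, p in p_alt.items():
        likelihood = comb(n, alt_count) * p**alt_count * (1.0 - p) ** ref_count
        joint[genotype] = priors[genotype] * likelihood

    total = sum(joint.values())
    # Bayes' rule: normalise the joint probabilities over all hypotheses.
    return {genotype: value / total for genotype, value in joint.items()}


def classify_site(ref_count, alt_count, error_rate=0.1, priors=None):
    """Return the hypothesis with the highest posterior probability."""
    post = site_posteriors(ref_count, alt_count, error_rate, priors)
    return max(post, key=post.get)


if __name__ == "__main__":
    # A balanced site versus a site with a single discordant read.
    print(classify_site(10, 9))
    print(classify_site(19, 1))
```

Under these assumed settings, a site with roughly balanced read support is labelled heterozygous, while a single discordant read among many is treated as a sequencing error. A real corrector would also need to handle more than two alleles, base qualities, and alignment uncertainty.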

Keyword:

Error correction method | Heterozygous variant | Hybrid correction method | PacBio sequencing | Probabilistic model | Sequencing analysis | Sequencing error

Author Community:

  • [ 1 ] [Liu Jiaqi]School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
  • [ 2 ] [Wang Jiayin]School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China. wangjiayin@mail.xjtu.edu.cn
  • [ 3 ] [Xiao Xiao]School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
  • [ 4 ] [Lai Xin]School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
  • [ 5 ] [Dai Daocheng]School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
  • [ 6 ] [Zhang Xuanping]School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
  • [ 7 ] [Zhu Xiaoyan]School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
  • [ 8 ] [Zhao Zhongmeng]School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
  • [ 9 ] [Wang Juan]School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China
  • [ 10 ] [Li Zhimin]School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an, 710048, China. zhiminli@annoroad.com

Source:

BMC Genomics

ISSN: 1471-2164

Year: 2020

Issue: Suppl 10

Volume: 21

Page: 753

3.969 (JCR@2020)

ESI Discipline: MOLECULAR BIOLOGY & GENETICS;

ESI HC Threshold:108

CAS Journal Grade:2

Cited Count:

WoS CC Cited Count: 0

SCOPUS Cited Count: 1
