Reference Grade Characterization of Polymorphisms in Full-Length HLA Class I and II Genes With Short-Read

zoukankan html css js c++ java

Reference Grade Characterization of Polymorphisms in Full-Length HLA Class I and II Genes With Short-Read
Reference Grade Characterization of Polymorphisms in Full-Length HLA Class I and II Genes With Short-Read Sequencing on the ION PGM System and Long-Reads Generated by Single Molecule, Real-Time Sequencing on the PacBio Platform

基于ION PGM系统的短读测序和基于单分子的长读测序的全长HLA I类和II类基因多态性的参考等级表征，在PacBio平台上进行实时测序

Shingo Suzuki1, Swati Ranade2, Ken Osaki3, Sayaka Ito1, Atsuko Shigenari1, Yuko Ohnuki1, Akira Oka4, Anri Masuya5, John Harting2, Primo Baybayan2, Miwako Kitazume3, Junichi Sunaga3, Satoko Morishima6, Yasuo Morishima7, Hidetoshi Inoko5, Jerzy K. Kulski1,8 and Takashi Shiina1*

1Division of Basic Medical Science and Molecular Medicine, Department of Molecular Life Science, Tokai University School of Medicine, Isehara, Japan

2Molecular Biology Applications, Pacific Biosciences, Inc, Menlo Park, CA, United States

3Pacific Biosciences Division, Tomy Digital Biology Co., Ltd, Tokyo, Japan

4The Institute of Medical Sciences, Tokai University, Isehara, Japan

5GenoDive Pharma, Inc., Atsugi, Japan

6Division of Endocrinology, Diabetes, and Metabolism, Hematology, Rheumatology (Second Department of Internal Medicine), Graduate School of Medicine, University of the Ryukyus, Nishihara, Japan

7Department of Promotion for Blood and Marrow Transplantation, Aichi Medical University School of Medicine, Nagakute, Japan

8School of Psychiatry and Clinical Neurosciences, The University of Western Australia, Perth, WA, Australia

Although NGS technologies fuel advances in high-throughput HLA genotyping methods for identification and classification of HLA genes to assist with precision medicine efforts in disease and transplantation, the efficiency of these methods are impeded by the absence of adequately-characterized high-frequency HLA allele reference sequence databases for the highly polymorphic HLA gene system. Here, we report on producing a comprehensive collection of full-length HLA allele sequences for eight classical HLA loci found in the Japanese population. We augmented the second-generation short read data generated by the Ion Torrent technology with long amplicon spanning consensus reads delivered by the third-generation SMRT sequencing method to create reference grade high-quality sequences of HLA class I and II gene alleles resolved at the genomic coding and non-coding level. Forty-six DNAs were obtained from a reference set used previously to establish the HLA allele frequency data in Japanese subjects. The samples included alleles with a collective allele frequency in the Japanese population of more than 99.2%. The HLA loci were independently amplified by long-range PCR using previously designed HLA-locus specific primers and subsequently sequenced using SMRT and Ion PGM sequencers. The mapped long and short-reads were used to produce a reference library of consensus HLA allelic sequences with the help of the reference-aware software tool LAA for SMRT Sequencing. A total of 253 distinct alleles were determined for 46 healthy subjects. Of them, 137 were novel alleles: 101 SNVs and/or indels and 36 extended alleles at a partial or full-length level. Comparing the HLA sequences from the perspective of nucleotide diversity revealed that HLA-DRB1 was the most divergent among the eight HLA genes, and that the HLA-DPB1 gene sequences diverged into two distinct groups, DP2 and DP5, with evidence of independent polymorphisms generated in exon 2. We also identified two specific intronic variations in HLA-DRB1 that might be involved in rheumatoid arthritis. In conclusion, full-length HLA allele sequencing by third-generation and second-generation technologies has provided polymorphic gene reference sequences at a genomic allelic resolution including allelic variations assigned up to the field-4 level for a stronger foundation in precision medicine and HLA-related disease and transplantation studies.
Introduction

The Major Histocompatibility Complex (MHC) comprising HLA class I and class II molecules (antigens) are polymorphic cell-membrane-bound glycoproteins that regulate the immune response by presenting peptides of fragmented proteins to circulating cytotoxic and helper T lymphocytes, respectively. These peptide-presenting-antigens are encoded by the genomic regions within the MHC that in humans are known specifically as the Human Leukocyte Antigen (HLA) class I and class II gene loci. The HLA molecules are investigated continuously due to their crucial role in the regulation of innate and adaptive immune responses (1, 2), during rejection, graft-versus-host disease (GVHD) of hematopoietic stem cell transplants (3, 4) and the pathogenesis of numerous infectious and/or autoimmune diseases (5–8).

Conventional PCR-based genotyping approaches incorporating restriction fragment length polymorphism (PCR-RFLP) (9), single strand conformation polymorphism (PCR-SSCP) (10), sequence-specific oligonucleotides (PCR-SSO) (11), sequence-specific primers (PCR-SSP) (12), and Sanger sequencing-based typing (PCR-SBT) (13) have been used for HLA-testing in disease association and pre-transplantation (14–16) analysis. However, these methods are limited in their ability to decipher chromosomal phase (cis/trans) ambiguity and/or imprecise allele identification (17, 18) and may leave multiple pairs of HLA gene alleles unresolved. Moreover, the traditional methods focus only on the variations in the highly polymorphic regions of the HLA genes, and thus the majority of the assays only interrogate exons 2 and 3 or exons 2, 3, and 4 of the class I loci and exons 2 or exons 2 and 3 of class II. As a result, the genetic variations in the non-coding regions that regulate RNA expression levels (19–21), or the exonic coding sequences outside of the polymorphic domains have remained largely ignored. On the other hand, methods that use full-length gene sequences including the promoter-enhancer region, 5′ and 3′ untranslated regions (UTRs), as well as all exonic and intronic regions for HLA genotyping, would be more accurate for the discovery of new polymorphisms associated with disease susceptibility and transplantation outcome. To this end, however, not more than 10% of ~ 17,700 known alleles in the IPD-IMGT/HLA reference database [Release 3.31, https://www.ebi.ac.uk/ipd/imgt/hla/, (22)] are classified as full-length HLA gene sequences. Consequently, a comprehensive collection and classification of full-length HLA allele sequences including information from the “undetermined” regions of the HLA genes are desired to promote the future development of more accurate and reliable HLA genotyping methods.

During the past decade, next-generation sequencing (NGS) using mostly second-generation short-read sequencing technologies have employed exon sequencing for some or all of the HLA coding regions in an attempt to solve the phase ambiguity problem encountered by the conventional Sanger Sequencing-Based HLA genotyping methods (23). We previously reported on the development of super high resolution-sequence-based typing application using two NGS platforms, Ion PGM (Life Technologies) and GS Junior (Roche) (24–26). In both of the studies, long-range PCR amplicons from the promoter-enhancer region to 3′UTR for eight classical HLA loci, HLA-A, HLA-B, HLA-C, HLA-DRB1, HLA-DQA1, HLA-DQB1, HLA-DPA1, and HLA-DPB1 were shotgun sequenced. Other long-range PCR and NGS-based shotgun sequencing based HLA genotyping methods using the 454 GS-FLX (Roche) and the MiSeq/HiSeq (Illumina) platforms (18, 27–32) also can resolve many of the phase ambiguities. However, all of these methods are affected by the disadvantages associated with the short-read lengths of NGS technologies, which range from 150 to 500 bp. Although the short-read sequences can accurately separate the two phases in SNP dense regions, such as a polymorphic exon, the phasing is impossible in SNP deficient regions (Figure 1). Specifically, it is necessary to have at least two single nucleotide variants (SNVs) in one short-read sequence to separate both phases. The full-length HLA allele sequences are especially difficult to phase in the HLA-DQA1, HLA-DPA1, and HLA-DPB1 loci because their SNP densities are much less than in the other HLA loci (24), making it unlikely that two SNPs will appear in one short-read sequence.
查看全文

相关阅读:
Java线程--LockSupport使用
 Java线程--Callable使用
 Java线程--BlockingQueue使用
 Java线程--CopyOnWrite容器使用
 Java线程--ReentrantLock使用
 Java线程--Atomic原子类使用
 Java线程--ThreadPoolExecutor使用
 Java线程--ForkJoinPool使用
 自己动手搭建SSM
为什么选择SSM+Redis框架开发模式？

原文地址：https://www.cnblogs.com/wangprince2017/p/13756039.html

Reference Grade Characterization of Polymorphisms in Full-Length HLA Class I and II Genes With Short-Read

Reference Grade Characterization of Polymorphisms in Full-Length HLA Class I and II Genes With Short-Read Sequencing on the ION PGM System and Long-Reads Generated by Single Molecule, Real-Time Sequencing on the PacBio Platform

Introduction