By Savita Jayaram Ph.D., Bioinformatics Scientist; Keshav Bhojak, Bioinformatics Analyst, MedGenome Inc
It is a marvel of a billion years of evolution that vertebrates, humans included, constantly thwart attacks from an ever-expanding universe of foreign invaders such as bacteria, viruses and other pathogenic organisms throughout their lifetimes. What makes this possible is our adaptive immune system, comprising B and T cells and a host of other regulatory cell types that function as a central command to activate, mobilize and, once the threat is eliminated, suppress the army of killer cells. The puzzle of how our immune system recognizes new organisms and biomolecules that may not have existed when we were born was solved by the work of Susumu Tonegawa and others, who discovered that recognition is mediated by a family of highly diverse immune receptors expressed by the cells of the adaptive immune system: B cells, T cells and antigen-presenting cells (APCs) (Figure 1). This diversity enables the immune system to identify and mount an attack against any foreign element, whether it invades from outside the body (bacteria, viruses) or is generated inside it (tumor cells), protecting us from deadly diseases. It is estimated that healthy humans carry 10^9 to 10^11 unique B cell receptors and 10^6 to 10^8 unique T cell receptors, and that their APCs express about 301 known human leukocyte antigen (HLA) proteins.
Deeper insight into immune-receptor diversity became possible with the advent of next-generation sequencing (NGS) and powerful bioinformatics and computational tools. Through these sequencing efforts, we know that no two individuals, not even monozygotic twins, share an identical immune receptor repertoire, although each of us is capable of mounting an immune response against common pathogens, indicating that there is enormous redundancy and plasticity in the recognition process. Further, the receptor repertoire undergoes significant expansion and contraction during disease, and these changes have led to the development of novel diagnostics in the area of autoimmune diseases.
In this essay, we give an overview of the immune receptors and discuss how MedGenome is leveraging NGS data on the immune receptor repertoire to develop tools that will not only enhance fundamental knowledge of how our immune system works but also allow this diversity to be interrogated to discover biomarkers that distinguish a productive immune response, which eliminates pathogens, from an adverse response that targets the body’s own cells and leads to autoimmunity.
Immune repertoire diversity – how is it generated?
Immune receptors expressed by B cells (B cell receptors, BCRs) and T cells (T cell receptors, TCRs) are formed during B cell development in the bone marrow and T cell development in the thymus. BCRs resemble the structure of an antibody, with heavy and light chains, and are membrane-bound (Figure 2A). TCRs are heterodimers of α and β polypeptide chains (αβ TCR), or of γ and δ chains (γδ TCR). More than 90% of TCRs are αβ TCRs (Figure 2B), while the rest are γδ TCRs. Both receptors are created by recombining multiple gene segments that reside at multiple genomic loci in the germline DNA and are brought together into a single coding sequence during B and T cell development. The gene segments are referred to as the variable (V) gene, the joining (J) gene and an additional diversity (D) gene (for the heavy chain and the β chain); a constant (C) gene is then added to all receptors. Figure 2A shows the assembly of a full-length BCR, while Figure 2B shows the mechanism that generates a functional αβ TCR following V(D)J recombination and addition of the C gene. Receptor diversity arises at two levels. The first is combinatorial diversity, in which recombination joins one of the 40-50 ‘V’ gene segments with a ‘D’ and a ‘J’ gene segment from the germline DNA, followed by splicing of the C gene at the RNA level.
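As a rough illustration of the combinatorial level described above, the number of possible segment combinations for one chain is simply the product of the number of V, D and J segments available. The sketch below is a minimal example and not a description of any MedGenome tool: the function name and all segment counts are illustrative assumptions (only the 40-50 range for V segments comes from the text), and junctional diversity is deliberately ignored.

```python
def combinatorial_diversity(n_v: int, n_j: int, n_d: int = 1) -> int:
    """Count distinct V(D)J segment combinations for a single receptor chain.

    Chains without a D segment (for example TCR alpha) use the default n_d=1.
    """
    if min(n_v, n_j, n_d) < 1:
        raise ValueError("segment counts must be positive")
    return n_v * n_d * n_j


# Segment counts below are assumed placeholders for illustration only.
beta_combinations = combinatorial_diversity(n_v=48, n_d=2, n_j=13)
alpha_combinations = combinatorial_diversity(n_v=45, n_j=50)

# Pairing one alpha chain with one beta chain multiplies the two counts.
paired_combinations = alpha_combinations * beta_combinations

print(beta_combinations, alpha_combinations, paired_combinations)
```

Even with generous segment counts, the product stays in the millions, far below the repertoire sizes quoted earlier, which is why segment combination alone cannot account for the observed diversity.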
The second level of diversity is introduced by random addition and deletion of nucleotides at the junctions between gene segments (junctional diversity). Together, combinatorial and junctional diversity create the final diversity of an individual’s immune receptor repertoire and explain why no two individuals share an identical repertoire. The sequence spanning the V-D-J junction is the ‘hypervariable’ segment, which is unique to each TCR-β chain and is called the complementarity determining region 3 (CDR3). The CDR3 region recognizes the antigen. The diversity of the TCR repertoire is analysed by enumerating the number of unique CDR3 sequences present in a T cell pool. Earlier experiments using bulk RNAseq data quantitated the enrichment of the CDR3 region. However, with recent developments in single-cell RNA sequencing technology (scRNAseq), the transcriptomes of thousands of cells can be processed simultaneously, bringing an extra dimension to the analysis of TCRs from scRNAseq experiments (Ref 2). Identifying each cell’s unique TCRs using single-cell technology now enables the pairing of the α and β chains of each heterodimer, which was not possible with bulk RNA sequencing. The enormous diversity of the TCR repertoire represents a major analytical challenge, which has led to the development of specialized software that aims to characterize the TCR repertoire in greater detail.
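The idea of enumerating unique CDR3 sequences can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the input is a plain list of CDR3 amino-acid strings (one per sequenced read or cell), the function name is hypothetical, and Shannon entropy and a normalized clonality score are used as common summary measures, not necessarily the ones used in the pipelines discussed here.

```python
import math
from collections import Counter


def repertoire_summary(cdr3_sequences):
    """Summarize a T cell pool from a list of CDR3 strings (one per read/cell)."""
    counts = Counter(cdr3_sequences)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no CDR3 sequences supplied")

    richness = len(counts)  # number of unique CDR3 sequences
    shannon = -sum((c / total) * math.log(c / total) for c in counts.values())

    # Clonality = 1 - normalized entropy; a single clone is treated as fully clonal.
    clonality = 1.0 if richness == 1 else 1.0 - shannon / math.log(richness)

    return {
        "total": total,
        "richness": richness,
        "shannon": shannon,
        "clonality": clonality,
        "top_clone": counts.most_common(1)[0],
    }


# Made-up CDR3 sequences, for illustration only.
pool = ["CASSLGQAYEQYF", "CASSLGQAYEQYF", "CASSPDRGNTEAFF", "CASSQETQYF"]
summary = repertoire_summary(pool)
print(summary)
```

A clonality near 0 indicates an even, diverse pool, while values approaching 1 indicate that a few expanded clones dominate.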
Applications of immune repertoire profiling
Immune repertoire profiling holds great potential not only for understanding the development of the normal immune response but also for providing insights into disease mechanisms, leading to the development of new therapeutics and treatment modalities in infectious diseases, autoimmunity, and immuno-oncology. There is now increasing evidence that the BCR and TCR repertoires can serve as a proxy for aberrant immune responses to many infections and autoimmune conditions. These repertoires can be monitored through patient blood or plasma, helping to gain a better understanding of disease aetiology and progression (Figure 3).
Recent studies have demonstrated that TCR diversity enables monitoring and predicting the response to immunotherapy drugs and the occurrence of immune-related adverse effects. Studies investigating tumor-immune interaction in cancer patients have shown that the circulating TCR repertoire captures aspects of the tumor TCR repertoire with prognostic potential (Figure 4). Additionally, immune repertoire data is being used to distinguish viral-driven cancers from non-viral ones, and to precisely track vaccine-responsive T cell clones to enable more effective vaccine development. The diversity in the length of the CDR3 sequences has been linked to the T cell differentiation state, with longer CDR3 sequences more enriched in antigen-naïve T cells than in effector T cells.
Despite variations in clonotypic diversity between individuals, there are instances where many individuals share the same clonotypes, referred to as shared or “public” clonotypes (Figure 5). When these individuals also share a common disease, this suggests that the shared clonotypes may be directed towards a common disease-specific antigen.
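Finding public clonotypes amounts to asking which clonotypes occur in more than one individual. The sketch below is a minimal illustration, assuming each individual's repertoire has already been reduced to a list of CDR3 amino-acid strings. The function name, the `min_individuals` threshold and the example data are hypothetical.

```python
from collections import defaultdict


def public_clonotypes(repertoires, min_individuals=2):
    """Return clonotypes found in at least `min_individuals` repertoires.

    `repertoires` maps an individual ID to an iterable of clonotype strings.
    """
    carriers = defaultdict(set)
    for individual, clonotypes in repertoires.items():
        # Each individual counts once per clonotype, regardless of clone size.
        for clonotype in set(clonotypes):
            carriers[clonotype].add(individual)
    return {
        clonotype: sorted(ids)
        for clonotype, ids in carriers.items()
        if len(ids) >= min_individuals
    }


# Made-up repertoires, for illustration only.
cohort = {
    "patient_1": ["CASSLGQAYEQYF", "CASSPDRGNTEAFF"],
    "patient_2": ["CASSLGQAYEQYF", "CASSQETQYF", "CASSQETQYF"],
    "patient_3": ["CASSQETQYF", "CASSLGQAYEQYF"],
}
shared = public_clonotypes(cohort)
print(shared)
```

Raising `min_individuals` restricts the result to clonotypes shared more widely across the cohort, which may help when looking for candidates directed at a common antigen.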
Tools/resources for repertoire analysis
Given the complexity of immune repertoire data, there is a need to assemble the right tools and algorithms to estimate both the abundance and the diversity of the unique T cell clones that characterize the T cell repertoire of any individual. Currently, for TCR sequencing of samples, MedGenome offers NGS-based solutions using the SMARTer® TCR Profiling Kit (Takara Bio USA Inc) and the Single-cell V(D)J Immune Profiling solution (10X™ Genomics Inc.). Data generated using these kits are currently being analysed using CellRanger, MiXCR, and VDJtools (reviewed in Ref 3). However, developing improved tools that accurately predict which TCR sequences, out of a pool of non-binding TCRs, bind their cognate peptide-MHC complex remains an important area of research. MedGenome has created additional software that integrates with and works on top of these existing software solutions.
Much like the genomic data explosion, we are now seeing a rapid accumulation of immune repertoire data in public repositories. This growing body of immune receptor data has tremendous utility in analyzing, annotating and interpreting TCR and BCR sequence data. The Adaptive Immune Receptor Repertoire sequencing (AIRRseq) federated databases and repositories have created standardized representations of immune repertoire data to facilitate cross-dataset analysis and promote the reusability of AIRRseq data (Ref 4). The AIRR community, formed in September 2014, initiated the iReceptor resource to provide a unified gateway (http://ireceptor.irmacs.sfu.ca/) to query and access AIRRseq TCR and IG data from different repositories (Figure 6). Since its inception, the volume of AIRRseq data has grown exponentially, and the gateway currently provides access to 1.3 billion sequences and 879 samples. Several computational and statistical analysis methods are being developed to resolve the complexity and deconvolute the dynamics of adaptive immunity from these large-scale AIRRseq data.
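To give a sense of what a standardized representation makes possible, the sketch below reads a tab-separated rearrangement table and counts clonotypes. It is a minimal illustration under stated assumptions: the column names (`sequence_id`, `productive`, `v_call`, `j_call`, `junction_aa`) and the `T`/`F` encoding of booleans follow the AIRR Rearrangement schema as we understand it, the rows are made up, and defining a clonotype as a V gene, J gene and junction triple is one common convention among several. The function name is hypothetical.

```python
import csv
import io
from collections import Counter

# Made-up AIRR-style rearrangement rows, for illustration only.
EXAMPLE_TSV = (
    "sequence_id\tproductive\tv_call\tj_call\tjunction_aa\n"
    "seq1\tT\tTRBV7-9*01\tTRBJ2-1*01\tCASSLGQAYEQYF\n"
    "seq2\tT\tTRBV7-9*01\tTRBJ2-1*01\tCASSLGQAYEQYF\n"
    "seq3\tF\tTRBV5-1*01\tTRBJ1-1*01\tCASS*DRGNTEAFF\n"
    "seq4\tT\tTRBV5-1*01\tTRBJ2-5*01\tCASSQETQYF\n"
)


def count_clonotypes(handle):
    """Count productive clonotypes in an AIRR-style TSV file-like object."""
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        # Assumption: booleans are encoded as "T"/"F" in the TSV.
        if row["productive"] != "T":
            continue
        # Assumption: a clonotype is the (V gene, J gene, junction) triple.
        counts[(row["v_call"], row["j_call"], row["junction_aa"])] += 1
    return counts


clonotype_counts = count_clonotypes(io.StringIO(EXAMPLE_TSV))
for clonotype, n in clonotype_counts.most_common():
    print(n, *clonotype)
```

Because the column names are shared across repositories, the same function could in principle be applied to files downloaded from different sources without per-dataset parsing code.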
MedGenome is part of an international consortium that has been awarded a European/Canadian project grant to develop the next generation of the iReceptor platform, referred to as iReceptor-plus.
The scientific landscape is seeing an amalgamation of hypothesis-driven science and data-driven science that will have important ramifications for developing future therapeutics. The promising field of immune receptor repertoire analysis faces new scientific and analytical challenges for which no scalable solutions currently exist. With the exponential increase in genomic and transcriptomic data (both bulk and single cell), in addition to the rapidly accumulating immune receptor data, scalable solutions are expected to emerge from new developments in data aggregation, database management, cloud computing technologies and workflows for data integration, along with more scalable computational tools for analysis. Although clear opportunities exist in analysing bigger volumes of data, it is important not to lose sight of the underlying biology. Dr. Sydney Brenner, a Nobel laureate and molecular biologist, commented, “There is a crisis these days. We are drowning in data and are still thirsty for more.” He said, “If we do not clearly define the problem, we won’t know what information is important.”
1. Overview of methodologies for T-cell receptor repertoire analysis. Rosati et al. BMC Biotechnology (2017); 17:61
2. Single cell T cell receptor sequencing: Techniques and future challenges. De Simone et al. Front Immunol. (2018); 9:1638
3. Computational Strategies for dissecting the high-dimensional complexity of adaptive immune repertoires. Miho et al. Front Immunol. (2018); 9:244
4. AIRR Community Standardized Representations for Annotated Immune Repertoires. Vander Heiden et al. Front Immunol. (2018); 9:2206