By Ramesh Menon, Anjali Verma, Manjari Deshmukh, Akshi Bassi and Ravi Gupta (Bioinformatics R&D division, MedGenome Labs)
The GenomeAsia100K pilot project included 1,739 individuals from 219 population groups in 64 countries across Asia. The samples included 598 from India, 156 from Malaysia, 152 from South Korea, 113 from Pakistan, 100 from Mongolia, 70 from China, 70 from Papua New Guinea, 68 from Indonesia, 52 from the Philippines, 35 from Japan and 32 from Russia. The high-quality sequence data for the Indian samples were generated at MedGenome’s sequencing lab located at Narayana Netralaya hospital in Bangalore, India.
The study has given insights into previously unknown population genetic structure, as well as the implications of sub-population- and community-specific genetic variation for disease and drug response. It provides a useful genomic resource that will facilitate genetic studies in Asia, including India. More than 20% of the genetic variants identified in this study are not reported in previous resources such as the Exome Aggregation Consortium (ExAC), the 1000 Genomes Project (1000G) and gnomAD. In rare disease genetics, databases such as ExAC, gnomAD, 1000G and dbSNP are used to filter variants based on allele frequency. Since the majority of the samples in these databases are of European origin, variants that are present at higher frequency in a specific Asian population can appear rare or absent in them, and would otherwise be taken as rare variants. For example, when both gnomAD and the data published in this study are used to filter out common variants (allele frequency > 0.1%), the number of candidate variants is reduced roughly two-fold compared with using gnomAD alone. This study will therefore make the identification of pathogenic variants for rare diseases more accurate, particularly by improving variant filtering for individuals of South Asian ancestry.
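The filtering step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Variant` record, the field names `gnomad_af` and `asian_panel_af`, and every frequency value are invented for the example and are not taken from the study or from any database. Only the 0.1% cutoff comes from the text, and the two-fold reduction in this toy set is there by construction.

```python
from dataclasses import dataclass
from typing import List, Optional

# 0.1% allele-frequency cutoff, as quoted in the text.
AF_THRESHOLD = 0.001


@dataclass(frozen=True)
class Variant:
    """Hypothetical candidate variant; all fields are illustrative."""
    variant_id: str
    gnomad_af: Optional[float]       # None means "not observed in gnomAD"
    asian_panel_af: Optional[float]  # None means "not observed in the Asian panel"


def is_rare(af: Optional[float], threshold: float = AF_THRESHOLD) -> bool:
    """A variant counts as rare if it is unobserved or at/below the cutoff."""
    return af is None or af <= threshold


def filter_candidates(variants: List[Variant], use_asian_panel: bool) -> List[Variant]:
    """Keep variants that are rare in every reference database consulted."""
    kept = []
    for v in variants:
        if not is_rare(v.gnomad_af):
            continue  # common in gnomAD: drop
        if use_asian_panel and not is_rare(v.asian_panel_af):
            continue  # rare in gnomAD but common in the Asian panel: drop
        kept.append(v)
    return kept


# Invented frequencies, chosen only to show the effect of the second filter.
variants = [
    Variant("var_A", 0.00002, 0.00005),  # rare everywhere
    Variant("var_B", 0.0004, 0.02),      # rare in gnomAD, common in the Asian panel
    Variant("var_C", None, 0.015),       # absent from gnomAD, common in the Asian panel
    Variant("var_D", 0.05, 0.04),        # common everywhere
    Variant("var_E", None, None),        # unobserved in both
]

gnomad_only = filter_candidates(variants, use_asian_panel=False)
both_panels = filter_candidates(variants, use_asian_panel=True)

print("gnomAD only:", [v.variant_id for v in gnomad_only])
print("gnomAD + Asian panel:", [v.variant_id for v in both_panels])
```

Variants such as `var_B` and `var_C` are the cases the paragraph describes: they pass a gnomAD-only filter but are removed once a population-matched panel is consulted.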
The study also reports the complex history and population structure of Asian populations. It shows that people from India, Malaysia and Indonesia consist of multiple ancestral populations as well as multiple admixed groups. Strong founder effects have increased the rate of recessive diseases. Our study found that the indigenous and tribal population groups have higher identity by descent (IBD) than other groups. Further, we found that the urban population from Chennai (a population of 9 million) has an IBD score 1.3 times higher than that of the Finnish group. This suggests that our population group from the southern part of India has a stronger founder effect and also carries a higher chance of recessive disorders.
Ancestry-related variation in certain regions of the genome sometimes has implications for drug response. In several clinics globally, dosing recommendations for certain drugs are guided by apparent or self-reported population identity. In this study, we assessed the allele frequencies of key pharmacogenomic variants in the GenomeAsia pilot dataset to identify inter-population differences that have potential implications for drug testing and treatment. Interestingly, the study identified carbamazepine, clopidogrel, peginterferon and warfarin as the drugs whose response is most affected by ancestry-related genetic variation, and predicted adverse drug responses in several population sub-groups. For example, a genetic variant in the HLA-B gene that is associated with the risk of developing Stevens-Johnson syndrome in patients treated with carbamazepine was found to occur at an increased frequency in people of the Austronesian group (~400 million) from Indonesia, Malaysia and the Philippines. These novel findings can help the pharmaceutical industry reduce the time and investment needed to assess the efficacy and toxicity of new drugs during development. GenomeAsia has deeply catalogued population-specific genetic variants in “very important pharmacogenes” (VIP genes) such as VKORC1, IFNL3, CYP2B6, CYP2D6 and CYP2C19, which affect the dosage, efficacy and toxicity of the associated FDA-approved drugs.
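As a rough illustration of what comparing allele frequencies between populations involves, the sketch below computes the alternate-allele frequency of a single biallelic variant from diploid genotype counts, using AF = (heterozygotes + 2 × homozygous-alternate) / (2 × samples). The group names and all counts are hypothetical placeholders and do not correspond to any result in the study.

```python
def allele_frequency(n_het: int, n_hom_alt: int, n_samples: int) -> float:
    """Alternate-allele frequency for a biallelic variant in diploid samples.

    Each heterozygote carries one alternate allele, each homozygous-alternate
    individual carries two, and every sample contributes two alleles in total.
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    if n_het < 0 or n_hom_alt < 0 or n_het + n_hom_alt > n_samples:
        raise ValueError("genotype counts are inconsistent with n_samples")
    return (n_het + 2 * n_hom_alt) / (2 * n_samples)


# Hypothetical genotype counts: (heterozygous, homozygous-alternate, total samples).
genotype_counts = {
    "group_1": (18, 1, 100),
    "group_2": (2, 0, 100),
}

frequencies = {
    group: allele_frequency(*counts) for group, counts in genotype_counts.items()
}

for group, af in frequencies.items():
    print(f"{group}: alternate-allele frequency = {af:.3f}")
```

A large gap between groups for a pharmacogenomic variant is the kind of inter-population difference the paragraph refers to; a real analysis would also need to account for sample size and uncertainty in the estimates.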
Human genetic studies taking place across the world have minimal representation from Asian population groups. Most of these studies have been performed on people of European origin. Discoveries and genetic associations found in European populations cannot necessarily be translated to non-European population groups. This limits researchers in understanding human diseases accurately in non-European populations, including those from Asia (which represent 60% of the global population). Recently, there has been a slight improvement in the number of non-European studies, but these populations remain highly underrepresented. This study has also published an imputation reference panel, available at the Michigan Imputation Server (MIS; https://imputationserver.sph.umich.edu/index.html). Our analysis revealed that our panel provides much better imputation for South Asian ancestry than the existing published reference panel. This will help GWAS studies performed on individuals of South Asian ancestry.
The GenomeAsia consortium is continuously collecting and analyzing several thousand diverse genomes across Asia. This creates a unique platform for genetic studies and pharmacogenomic research, which can pave the way to the well-being of people in Asia. The pilot study genome browser is freely available and can be accessed using the following link: https://browser.genomeasia100k.org.
GenomeAsia100K Consortium (2019). The GenomeAsia 100K Project enables genetic discoveries across Asia. Nature.
David Reich et al. (2009). Reconstructing Indian population history. Nature Genetics.
Analabha Basu et al. (2016). Genomic reconstruction of the history of extant populations of India reveals five distinct ancestral components and a complex structure. PNAS.