The GenomeIndia Project — India's reference dataset for genetics and genomics — could help improve how disease is diagnosed, predict one's response to a drug and kick-start precision medicine efforts in India, according to a research article.
Launched in 2020 by the department of biotechnology, GenomeIndia aims to build a database that captures the genetic diversity of India's population.
In the first phase, the genomes of 10,000 individuals were sequenced, and the data was released in January 2025 for future research.
The Genome India Project offers us 10,000 genomes that represent the 4 broad linguistic groups, and regions across India incl tribal populations. They are available at the Indian Biological Data Centre at RCB, Faridabad for academics to analyze and unveil mysteries held in them. pic.twitter.com/YjTE07nLNc
— CCMB (@ccmb_csir) January 12, 2025
Writing in Nature Genetics, researchers from more than 20 institutes, including those of the Council of Scientific and Industrial Research (CSIR) and the All India Institute of Medical Sciences (AIIMS), have published the preliminary findings of the genomic sequencing.
Kumarasamy Thangaraj, CSIR Bhatnagar Fellow at the CSIR-Centre for Cellular and Molecular Biology, Hyderabad, told PTI, "The comment was written to announce to the scientific community that India has completed the whole genome sequencing of 10,000 individuals. We have carefully selected the 83 population groups across India representing different linguistic groups and geographical regions."
The preliminary findings of the GenomeIndia project, in which CCMB played a key role, is now out @NatureGenet: https://t.co/3exJHnHApe. This massive, multi-institute effort generated whole genome seq data of 10K healthy Indians & will pave way to new genomics research in India. pic.twitter.com/cg73tqrI3I
— CCMB (@ccmb_csir) April 8, 2025
The team looked at four major linguistic groups: Indo-European, Dravidian, Austro-Asiatic and Tibeto-Burman. Within each broad geographic region, populations belonging to distinct bio-geographies were sampled.
Further, genomes of about 160 unrelated people from each non-tribal group and 75 from each tribal group were sequenced.
The researchers identified genetic variants, describing the extensive genetic diversity "hitherto uncaptured in the Indian population".
The work is a good beginning and the first population-scale sequencing effort ever undertaken in India, Thangaraj said, in response to a question about the small fraction of genomes sequenced compared to the country's population.
Further, "the model that the GenomeIndia project has created for analysing genomes of Indians will be useful for future large-scale research projects in the country," the geneticist added.
GenomeIndia's efforts to build a dataset representative of Indians are similar to those in the UK (UK Biobank) and Europe, with the objective of creating a standard 'Indian reference genome'.
"Yes, (the dataset) is of a similar kind, but what is interesting is that being one of the oldest populations, next to the Africans, Indians are unique in their origin and hence, genetic profiles," Thangaraj said.
An individual's genome can be compared against the standard to identify differences and 'gene variants', or changes to one's DNA sequence. The principle forms the basis of genome-wide association studies, which help understand the genetic basis of a disease or trait in a population and how to prevent and treat it.
"Some of these variants might be associated with disease, while others would provide information about how a drug is metabolised in an individual's body. Therefore, when we complete the ongoing in-depth analysis of the sequenced genomes, we'll get information about all these aspects," Thangaraj said.
Findings of the ongoing analysis could be expected to be published in a peer-reviewed journal by the end of this year, he added.
The comment says, "In-depth analysis of 9,772 diverse genomes along with the blood biochemistry and anthropometry data will improve disease diagnostics, predict the genetic basis of drug responses, and kick-start precision medicine efforts in India."
The "sampling strategy of GenomeIndia is extensive, nuanced and balanced, with respect to ethnic, socio-cultural, geographic, biogeographic and linguistic diversity of India," compared to previous studies, the authors said.
GenomeIndia is also expected to "facilitate future large-scale genetic association studies in the country".
A need for a dataset representative of the Indian population has been felt for decades, as research from around the world is largely based on European populations, introducing a heavy bias in study results and findings.
Now, however, the cost of sequencing a genome, advances in India's technical and technological capacity, and government support are among the factors that have come together to aid the development of this nationally representative genetic database, Thangaraj said.