I am a PhD candidate in the Bioinformatics and Integrative Genomics PhD Program, Harvard Medical School. Co-advised by Drs. Mark Daly and Hilary Finucane, I am working at the Analytic and Translational Genetics Unit, Massachusetts General Hospital and affiliated with the Medical and Population Genetics Program, Broad Institute of MIT and Harvard. I received fellowships from the Nakajima Foundation and the Masason Foundation.
As a statistical geneticist, my research focuses on cross-population analysis of complex diseases and traits to better understand their genetic architecture and diversity across multiple populations. In my thesis project, I am creating a comprehensive atlas of putative causal variants from multiple large-scale biobanks to gain insights from complex trait fine-mapping across diverse populations. Prior to joining ATGU, I completed my B.S. degree in Japan, where I worked closely with Dr. Yukinori Okada to study the genetics of complex traits in the Japanese population using the BioBank Japan data.
My curriculum vitae can be found
|Jul 8, 2021|Added The COVID-19 Host Genetics Initiative (2021).|
|Jul 1, 2021|New website|
Selected Publications and Preprints [full list]
* denotes equal contribution
The COVID-19 Host Genetics Initiative. Nature (2021)
The genetic makeup of an individual contributes to susceptibility and response to viral infection. While environmental, clinical and social factors play a role in exposure to SARS-CoV-2 and COVID-19 disease severity [1,2], host genetics may also be important. Identifying host-specific genetic factors may reveal biological mechanisms of therapeutic relevance and clarify causal relationships of modifiable environmental risk factors for SARS-CoV-2 infection and outcomes. We formed a global network of researchers to investigate the role of human genetics in SARS-CoV-2 infection and COVID-19 severity. We describe the results of three genome-wide association meta-analyses comprising up to 49,562 COVID-19 patients from 46 studies across 19 countries. We report 13 genome-wide significant loci that are associated with SARS-CoV-2 infection or severe manifestations of COVID-19. Several of these loci correspond to previously documented associations to lung or autoimmune and inflammatory diseases [3–7]. They also represent potentially actionable mechanisms in response to infection. Mendelian Randomization analyses support a causal role for smoking and body mass index for severe COVID-19 although not for type II diabetes. The identification of novel host genetic factors associated with COVID-19, with unprecedented speed, was made possible by the community of human genetic researchers coming together to prioritize sharing of data, results, resources and analytical frameworks. This working model of international collaboration underscores what is possible for future genetic discoveries in emerging pandemics, or indeed for any complex human disease.
Leveraging fine-mapping and non-European training data to improve trans-ethnic polygenic risk scores
*Weissbrod, O., *Kanai, M., *Shi, H., ..., Gazal, S., Peyrot, W., Khera, A., Okada, Y., The Biobank Japan Project., Martin, A., Finucane, H., and Price, A. L. medRxiv (2021)
Polygenic risk scores (PRS) based on European training data suffer reduced accuracy in non-European target populations, exacerbating health disparities. This loss of accuracy predominantly stems from LD differences, MAF differences (including population-specific SNPs), and/or causal effect size differences. Here, we propose PolyPred, a method that improves trans-ethnic polygenic prediction by combining two complementary predictors: a new predictor that leverages functionally informed fine-mapping to estimate causal effects (instead of tagging effects), addressing LD differences; and BOLT-LMM, a published predictor. In the special case where a large training sample is available in the non-European target population (or a closely related population), we propose PolyPred+, which further incorporates the non-European training data, addressing MAF differences and causal effect size differences. We applied PolyPred to 49 diseases and complex traits in 4 UK Biobank populations using UK Biobank British training data (average N = 325K), and observed statistically significant average relative improvements in prediction accuracy vs. BOLT-LMM ranging from +7% in South Asians to +32% in Africans (and vs. LD-pruning + P-value thresholding (P+T) ranging from +77% to +164%), consistent with simulations. We applied PolyPred+ to 23 diseases and complex traits in UK Biobank East Asians using both UK Biobank British (average N = 325K) and Biobank Japan (average N = 124K) training data, and observed statistically significant average relative improvements in prediction accuracy of +24% vs. BOLT-LMM and +12% vs. PolyPred. In conclusion, PolyPred and PolyPred+ improve trans-ethnic polygenic prediction accuracy, ameliorating health disparities.
*Sakaue, S., *Kanai, M., Tanigawa, Y., ..., Karjalainen, J., Kurki, M., Koshiba, S., Narita, A., Konuma, T., Yamamoto, K., Akiyama, M., Ishigaki, K., Suzuki, A., Suzuki, K., Obara, W., Yamaji, K., Takahashi, K., Asai, S., Takahashi, Y., Suzuki, T., Sinozaki, N., Yamaguchi, H., Minami, S., Murayama, S., Yoshimori, K., Nagayama, S., Obata, D., Higashiyama, M., Masumoto, A., Koretsune, Y., FinnGen., Ito, K., Terao, C., Yamauchi, T., Komuro, I., Kadowaki, T., Tamiya, G., Yamamoto, M., Nakamura, Y., Kubo, M., Murakami, Y., Yamamoto, K., Kamatani, Y., Palotie, A., Rivas, M. A., Daly, M., Matsuda, K., and Okada, Y. medRxiv (2020)
The current genome-wide association studies (GWASs) do not yet capture sufficient diversity in terms of populations and scope of phenotypes. To address an essential need to expand an atlas of genetic associations in non-European populations, we conducted 220 deep-phenotype GWASs (disease endpoints, biomarkers, and medication usage) in BioBank Japan (n = 179,000), by incorporating past medical history and text-mining results of electronic medical records. Meta-analyses with the harmonized phenotypes in the UK Biobank and FinnGen ($n_{\mathrm{total}}$ = 628,000) identified over 4,000 novel loci, which substantially deepened the resolution of the genomic map of human traits, benefited from East Asian endemic diseases and East Asian specific variants. This atlas elucidated the globally shared landscape of pleiotropy as represented by the MHC locus, where we conducted fine-mapping by HLA imputation. Finally, to intensify the value of deep-phenotype GWASs, we performed statistical decomposition of matrices of phenome-wide summary statistics, and identified the latent genetic components, which pinpointed the responsible variants and shared biological mechanisms underlying current disease classifications across populations. The decomposed components enabled genetically informed subtyping of similar diseases (e.g., allergic diseases). Our study suggests a potential avenue for hypothesis-free re-investigation of human disease classifications through genetics.
Trans-biobank analysis with 676,000 individuals elucidates the association of polygenic risk scores of complex traits with human lifespan
*Sakaue, S., *Kanai, M., Karjalainen, J., ..., Akiyama, M., Kurki, M., Matoba, N., Takahashi, A., Hirata, M., Kubo, M., Matsuda, K., Murakami, Y., Daly, M. J., Kamatani, Y., and Okada, Y. Nature Medicine 26, 542–548 (2020)
While polygenic risk scores (PRSs) are poised to be translated into clinical practice through prediction of inborn health risks [1], a strategy to utilize genetics to prioritize modifiable risk factors driving health outcomes is warranted [2]. To this end, we investigated the association of the genetic susceptibility to complex traits with human lifespan in collaboration with three worldwide biobanks ($n_{\mathrm{total}}$ = 675,898; BioBank Japan (n = 179,066), UK Biobank (n = 361,194) and FinnGen (n = 135,638)). In contrast to observational studies, in which discerning the cause-and-effect can be difficult, PRSs could help to identify the driver biomarkers affecting human lifespan. A high systolic blood pressure PRS was trans-ethnically associated with a shorter lifespan (hazard ratio = 1.03 [1.02–1.04], $P_{\mathrm{meta}} = 3.9 \times 10^{-13}$) and parental lifespan (hazard ratio = 1.06 [1.06–1.07], $P = 2.0 \times 10^{-86}$). The obesity PRS showed distinct effects on lifespan in Japanese and European individuals ($P_{\mathrm{heterogeneity}} = 9.5 \times 10^{-8}$ for BMI). The causal effect of blood pressure and obesity on lifespan was further supported by Mendelian randomization studies. Beyond genotype-phenotype associations, our trans-biobank study offers a new value of PRSs in prioritization of risk factors that could be potential targets of medical treatment to improve population health.
Martin, A. R., Kanai, M., Kamatani, Y., ..., Okada, Y., Neale, B. M., and Daly, M. J. Nature Genetics 51, 584–591 (2019)
Polygenic risk scores (PRS) are poised to improve biomedical outcomes via precision medicine. However, the major ethical and scientific challenge surrounding clinical implementation of PRS is that those available today are several times more accurate in individuals of European ancestry than other ancestries. This disparity is an inescapable consequence of Eurocentric biases in genome-wide association studies, thus highlighting that, unlike clinical biomarkers and prescription drugs (which may individually work better in some populations but do not ubiquitously perform far better in European populations), clinical uses of PRS today would systematically afford greater improvement for European-descent populations. Early diversifying efforts show promise in leveling this vast imbalance, even when non-European sample sizes are considerably smaller than the largest studies to date. To realize the full and equitable potential of PRS, greater diversity must be prioritized in genetic studies, and summary statistics must be publicly disseminated to ensure that health disparities are not increased for those individuals already most underserved.
Genetic analysis of quantitative traits in the Japanese population links cell types to complex human diseases
Kanai, M., Akiyama, M., Takahashi, A., ..., Matoba, N., Momozawa, Y., Ikeda, M., Iwata, N., Ikegawa, S., Hirata, M., Matsuda, K., Kubo, M., Okada, Y., and Kamatani, Y. Nature Genetics 50, 390–400 (2018)
Clinical measurements can be viewed as useful intermediate phenotypes to promote understanding of complex human diseases. To acquire comprehensive insights into the underlying genetics, here we conducted a genome-wide association study (GWAS) of 58 quantitative traits in 162,255 Japanese individuals. Overall, we identified 1,407 trait-associated loci ($P < 5.0 \times 10^{-8}$), 679 of which were novel. By incorporating 32 additional GWAS results for complex diseases and traits in Japanese individuals, we further highlighted pleiotropy, genetic correlations, and cell-type specificity across quantitative traits and diseases, which substantially expands the current understanding of the associated genetics and biology. This study identified both shared polygenic effects and cell-type specificity, represented by the genetic links among clinical measurements, complex diseases, and relevant cell types. Our findings demonstrate that even without prior biological knowledge of cross-phenotype relationships, genetics corresponding to clinical measurements successfully recapture those measurements’ relevance to diseases, and thus can contribute to the elucidation of unknown etiology and pathogenesis.