Keio University

The Range of Big Data

Writer Profile

  • Keisuke Kataoka

    Professor, Division of Hematology, School of Medicine

    Specialization / Hematology, Cancer Genetics

2021/05/28

In recent years, the volume of data in medical sciences and healthcare has been increasing rapidly. In medical sciences, the term "big data" first appeared in "Science in the Petabyte Era," a 2008 special feature in the journal Nature. In genomic medicine in particular, the spread of next-generation sequencers means that more than an exabyte of data is now generated annually, far surpassing the volumes handled in astronomy and by Twitter and YouTube, which have traditionally been regarded as the domains of big data.

I currently practice clinical hematology at Keio University and am also affiliated with the National Cancer Center Research Centers and Institutes, where I work on genetic analysis of cancer, primarily hematological malignancies. Within genomic medicine, cancer is the field that has benefited most from next-generation sequencers: many large-scale studies have identified the genetic abnormalities that act as cancer drivers and the molecular pathways in which they cluster. Furthermore, drugs targeting these abnormalities (molecular targeted drugs) have been developed in a short period, and in several cases they have actually improved patient prognosis.

We have entered an era in which society can enjoy the benefits of big data analysis in medical sciences and healthcare. At the same time, the third AI boom, centered on machine learning and deep learning, is beginning to pass, and its limits are becoming clear. Fundamentally, big data analysis is retrospective observational research and is therefore susceptible to various biases. Furthermore, the quality of individual data varies widely, which makes selection and filtering crucial.

In actual analysis, the focus is on frequencies and correlations, and causal relationships can rarely be stated definitively. The importance of conventional interventional studies, namely clinical trials, and of research into disease mechanisms therefore remains unchanged; both complement big data analysis.

In Japan, the importance of information science, including big data analysis, has long been emphasized in medical sciences and healthcare, but understanding of its essence remains insufficient. Large-scale national projects such as the Action Plan for Whole Genome Analysis are currently underway. To promote the use of this big data and maximize its effectiveness, it is vital to build a shared understanding of the possibilities and limitations of big data, that is, its range.

*Affiliations and titles are as of the time of publication.