A Slow-Starting Relationship Heats Up
Biomedical researchers are experts at studying high-volume data sets to describe complex molecules, disease processes and genetic sequencing. As a group, though, they’ve been slow to adopt Big Data analytics tools and procedures. That’s changing quickly, and a few early successes are encouraging signs that Big Data analytics is making headway.
Lots of Data and Lots to Do
When biomedical researchers read the term Big Data, what comes to mind? Opportunities and challenges that all biomedical researchers face when they access, manage, analyze, and integrate datasets of diverse medical information such as:
- Structure and function of complex molecules (e.g., DNA, proteins)
- Disease exposure and other health-related data
- Health-related patient behavior
- Imaging data
The problem is, the data sets are larger, more diverse, and more complex than ever. And they exceed the abilities of most currently used data management and analytics approaches.
But really, how big is biomedical Big Data?
Breakthroughs in genomics and computing over the past several years have multiplied the volume of available data by hundreds or even thousands of times. Large datasets from a number of research platforms are typically more than 10 terabytes and range to several petabytes. In front-line research, a dataset of 30 billion clinical elements is not unusual.
This deluge of genomics-related data is imminent. To make Big Data analytics practical in biomedical research, scientists face two challenges. First, they must shrink the amount of time an analysis takes from months to hours. Second, they must learn to manage and organize that scale of information in ways that make high-volume, high-speed analysis possible.
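To put the first challenge in perspective, here is a back-of-the-envelope sketch of the speedup implied by going from months to hours. The figures (a 90-day analysis, a 6-hour target) are illustrative assumptions, not measurements from any study, and the function name is hypothetical.

```python
HOURS_PER_DAY = 24


def required_speedup(current_hours: float, target_hours: float) -> float:
    """Return how many times faster an analysis must run to hit the target."""
    if current_hours <= 0 or target_hours <= 0:
        raise ValueError("durations must be positive")
    return current_hours / target_hours


if __name__ == "__main__":
    # Assumed figures for illustration only: a 90-day job cut to 6 hours.
    current = 90 * HOURS_PER_DAY  # 2,160 hours
    target = 6
    print(f"Required speedup: {required_speedup(current, target):.0f}x")
```

Under these assumptions the analysis would have to run hundreds of times faster, which is why the second challenge, organizing data for parallel processing, matters as much as raw computing power.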
Analytics Technologies Advance Biomedical Research Capabilities
Can data scientists glean insights from vast quantities of data and push them into the hands of healthcare professionals? Not yet. But pioneering biomedical investigations already involve Big Data in these areas:
- Genomics (genotyping, gene expression, and next-generation DNA sequencing)
- Crowdsourcing biomedical R&D funding
- Analyzing data generated by biodots and other medical sensors
- Human microbiome research, the genetics and behavior of microorganisms in the human body
How do pioneering researchers use data analytics? They create and query computational models of diseases, looking for cause-and-effect relationships of disease symptoms and biochemical reactions in the body. Other research makes analytical systems smarter to discover which types of treatments work best on specific diseases.
Pioneering research organizations do their work with a basic toolkit of:
- Commodity hardware
- Open source software (notably Hadoop)
- Ubiquitous instrumentation
- Machine learning and natural language processing capabilities
But these technologies have only started to make a mark in actual research studies.
Enabling Technologies Still Underused
Big Data analytics in biomedicine lags behind its use in finance and commerce. Reviews of biomedical research methods show that few researchers take advantage of Hadoop, parallel computing and other commercial methods of handling large data sets.
Many bioinformatics researchers still spend much of their time structuring and organizing their data before they can harvest scientific insights. That can be a serious obstacle, because traditional relational data warehousing technology can’t efficiently handle the tens of billions of elements in biomedical datasets.
But things are changing rapidly.
The most successful approaches analyze many types of structured and unstructured data and scale up to interpret massive amounts of data. Modern healthcare analytics solutions use column store, MapReduce and similar types of architectures to analyze massive datasets. For example, scientists at Kaiser Permanente used Hadoop to create Archimedes, a Big Data analytics framework, which later became commercial disease modeling software.
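For readers unfamiliar with MapReduce, the pattern can be sketched in a few lines of plain Python. This is a minimal, single-machine illustration of the map, shuffle and reduce steps, not Hadoop itself and not how any of the organizations named above implemented their systems. The records, gene names and function names are all made up for the example.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical toy records: (patient_id, gene, variant_found)
RECORDS = [
    ("p1", "BRCA1", True),
    ("p2", "BRCA1", False),
    ("p3", "TP53", True),
    ("p4", "BRCA1", True),
    ("p5", "TP53", True),
]


def map_record(record):
    """Map step: emit (gene, 1) for each record where a variant was observed."""
    _patient_id, gene, variant_found = record
    if variant_found:
        yield gene, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: collapse the values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_record(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_counts(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    print(run_job(RECORDS))
```

The appeal of the pattern is that the map and reduce functions touch only one record or one key at a time, so a framework such as Hadoop can spread them across many commodity machines without changing the logic.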