the Data Biosphere

creating a vibrant ecosystem of interoperable modules and data environments for the biomedical community

About

Problems the Data Biosphere is trying to solve

Modern technologies in the life sciences produce enormous datasets that researchers struggle to manage. Genomics datasets can easily reach the terabyte or even petabyte range, making it difficult for all but a few to extract meaning from them. The obvious solution is cloud-based data storage and computation designed to make biomedical research more accessible. But how should these systems be designed? In our blog post, we explore the motivations for creating the Data Biosphere organization and the approaches we think are key to its mission.

The principles of the Data Biosphere

We created the Data Biosphere with the idea of cultivating a vibrant ecosystem of modular and interoperable components that can be assembled into diverse data environments. The Data Biosphere is based on four governing principles. It should be:

1. modular, composed of functional components with well-specified interfaces;
2. community-driven, created by many groups to foster a diversity of ideas;
3. open, developed under open-source licenses that enable extensibility and reuse, with users able to add custom, proprietary modules as needed; and
4. standards-based, consistent with standards developed by coalitions such as the Global Alliance for Genomics and Health (GA4GH).

Modules and Data Environments in the Data Biosphere

The Data Biosphere will work best if it is designed around key modular components, each having discrete capabilities and clear rules of interaction, and each served by multiple alternative implementations. Modular Components are the critical building blocks of the Data Biosphere, but they alone are not enough to meet the needs of the biomedical community. In addition, we need Data Environments: suites of services, assembled from the Modular Components, that let users carry out a set of related tasks easily. We envision multiple Data Environments targeting a diversity of communities, use cases, and data types.

The state of the Data Biosphere

The biomedical community has already begun the work of creating the Data Biosphere. We, along with many others, are actively contributing to the GA4GH definitions of standard interfaces and are currently developing Modular Components and Data Environments as part of the Data Biosphere effort. We share many of our reusable modules in our GitHub organization, and full Data Environments are already operational and used by many researchers every day. We invite further contributions from the community and encourage you to contact us to learn more.

Data Environments

Use ready-to-go Data Biosphere systems

Gen3

Use Gen3 to build your own Data Environment

Terra

Use Terra to find, analyze, and explore data

Dockstore

Create, share, and use bioinformatics workflows

Modular Components

The building blocks of the Data Biosphere. Explore our GitHub organization to see modules shared with the community.

Data Biosphere on GitHub

Getting Involved

Contact us to learn more