Big Data to Knowledge (BD2K)


Massive data collections derived from millions of daily interactions within the health care system are increasingly available. To put them to use, we urgently need more advanced tools to support research that can produce better personalized predictions of prognosis and response to treatment; a deeper understanding of the complex, interacting factors that influence health at the level of the patient, the health system, and society; and more effective methods of causal inference that mitigate bias and error. We also need new ways to think about data that promote its value to individuals and organizations in support of better decisions and outcomes. Accordingly, we have assembled a team of world-class experts with complementary skills who are dedicated to deep collaboration to produce high-impact methods and tools, primed for widespread use, that will unlock the information potential of data produced during routine health care delivery.

Project Objectives

The Yale Big Data to Knowledge (Yale-BD2K) project aims to develop and deliver novel methodological approaches and tools that will advance our ability to generate meaningful knowledge from large, complex health care data collections. Conventional methods used in the fields of epidemiology, health services research, and other clinical research are decades old and often do not scale well to complex, multidimensional data collections.

To overcome these limitations, we have assembled a team of international leaders in outcomes research, mathematical and modeling sciences, informatics, computer science, engineering, software development, economics, and implementation science who will work together to integrate and apply methods from their respective fields to complex questions using big health care data. We are committed to interdisciplinary work and have developed approaches that bring individuals together to work on common problems. The team spans the analytic, translational, and clinical skills needed for transformational progress.

We work on specific classes of research questions, or use cases, focusing the entire team in a joint effort to produce new methods. Examples of use cases include:

  • Health risk determination
  • Comparative effectiveness research (causal inference)
  • Surveillance
  • Discovery (hypothesis generation)

As successful approaches mature into generalizable tools, we are also committed to disseminating those tools, training researchers in their use, and collaborating externally to ensure that our work makes a difference for patients and society.