It is the responsibility of researchers to tell “the truest truth,” says Daniel Boffa, MD, professor of surgery (thoracic). The accuracy and precision of medical research depend on the data used.
The medical community has increasingly recognized the importance of data sharing. The 21st Century Cures Act, passed in 2016 to accelerate medical discovery, encourages scientists to share data more openly so that other investigators may build upon it. More recently, editors of JAMA announced that researchers must include a data sharing plan in their manuscripts. The compilation of greater data sets allows investigators to generate a more complete picture of what is happening in patients. Furthermore, it helps them understand differences among various groups of patients, which will ultimately lead to providing more personalized medicine. However, data sharing also comes with the risk of loss of patient privacy, Boffa warns in a commentary published April 18 in JAMA.
“We’ve never had a better opportunity to leverage patient information to make powerful changes,” says Boffa. “But it has to be done in a way in which patient privacy is secure—which is challenging, but possible.”
Data Sharing Projects Present Patient Privacy Risks
Boffa, a thoracic surgeon who specializes in cancer, is engaged in an initiative to compile all cancer data for patients in the United States through the National Cancer Database.
When a patient is diagnosed with cancer, the hospital generates a record with patient data. However, patients commonly receive care from more than one institution, resulting in a single patient’s data becoming scattered among various locations.
Hospitals then share their records with various cancer databases, stripped of identifying information. Because the data is anonymous, records for the same patient cannot be linked across databases, and researchers are left with an incomplete picture: data on testing, treatment, cancer stage, or patient attributes captured in one database is often missing from another. “All of these databases have unique, incompletely overlapping pictures of each cancer patient,” says Boffa.
To address this, he and his colleagues are trying to create a national cancer identifier. “We basically take the identifying information and use advanced cryptography to turn it into an encrypted identifier that cannot be reversed to reveal the patient’s identity,” he says. This new identifier is like a tag that can be used to tie all of a patient’s data together in the national database. This new tool, he says, will be “incredibly powerful.”
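The article does not say which cryptographic technique Boffa's team uses. As a rough illustration of the general idea only, the sketch below assumes a keyed one-way hash (HMAC-SHA256) over normalized identifying fields. The function names, the choice of fields, and the demo key are all hypothetical and are not taken from the project.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Canonicalize a field so formatting differences do not change the token."""
    text = unicodedata.normalize("NFKC", value)
    return " ".join(text.strip().lower().split())


def make_patient_token(secret_key: bytes, first: str, last: str, dob: str) -> str:
    """Derive a stable pseudonymous identifier (hypothetical scheme, not the project's)."""
    # The unit separator keeps ("ab", "c") from colliding with ("a", "bc").
    message = "\x1f".join(normalize(f) for f in (first, last, dob)).encode("utf-8")
    # Without the secret key, the token cannot be recomputed from guessed inputs.
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


# Assumption: a single trusted party holds the key; this value is for demonstration only.
key = b"demo-key-held-by-a-trusted-party"

# The same (made-up) patient recorded slightly differently at two hospitals.
token_hospital_a = make_patient_token(key, "Ana", "Diaz", "1960-02-29")
token_hospital_b = make_patient_token(key, " ana ", "DIAZ", "1960-02-29")

# A different date of birth yields an unrelated token.
token_other = make_patient_token(key, "Ana", "Diaz", "1960-03-01")
```

In this sketch, the two hospital records produce the same token, so they could be linked without either record carrying a name. A real system would also need key management and tolerance for misspellings and other data-entry errors, which this example does not attempt.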
“When you have the data for every single cancer patient at your fingertips, the number of discoveries we will be able to make will be mind-blowing,” he says. “You may one day be able to use artificial intelligence to ask and answer cancer questions within a massive pool of patient information, similar to how platforms like ChatGPT use internet data.”
This task, however, presents a significant challenge: protecting patient privacy. Because of advances in computer technology, the theoretical risk of reidentification of anonymous data is very high. “Anonymous data is not private,” says Boffa. “If you put all of this information together, even if no name is included, a patient can still be identified.” In collaboration with Yale computer scientists, Boffa’s team has poured massive time and energy into ensuring their project protects patient privacy.
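Boffa's point that combined attributes can single out a patient can be illustrated with a toy example. The sketch below counts how many records are unique on a given combination of fields. The records and field names are invented for illustration and do not come from any real database.

```python
from collections import Counter

# Invented, name-free records; the field names are hypothetical.
records = [
    {"zip3": "065", "birth_year": 1958, "sex": "F", "stage": "II"},
    {"zip3": "065", "birth_year": 1958, "sex": "F", "stage": "II"},
    {"zip3": "065", "birth_year": 1971, "sex": "M", "stage": "I"},
    {"zip3": "068", "birth_year": 1944, "sex": "M", "stage": "III"},
]


def unique_record_count(rows, fields):
    """Count records whose combination of values for `fields` appears exactly once."""
    combos = Counter(tuple(row[f] for f in fields) for row in rows)
    return sum(1 for count in combos.values() if count == 1)


# One attribute alone singles out no one in this toy set...
one_field = unique_record_count(records, ["sex"])
# ...but combining attributes isolates individual records.
all_fields = unique_record_count(records, ["zip3", "birth_year", "sex", "stage"])
```

Here no record is unique on sex alone, while two of the four records are unique once all four fields are combined. Anyone who already knows those attributes about a person could match them to a record, even though no name is stored.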
Boffa is excited about new data sharing policies such as JAMA’s but is concerned that they come with little guidance on how to share data safely and securely.
Researchers Must Take Steps to Secure Patient Data
Making patient data anonymous is an important first step, says Boffa, but researchers also need to share data in a way that is trackable and accountable. In other words, they should know everyone who has access to it and understand the security of the computing environment, such as whether the servers are secure and whether passwords are encrypted at the secondary institutions.
Furthermore, researchers should avoid downstream sharing and exchanging information in nonsecure ways, he says. This includes not emailing anonymous datasets or leaving them on unencrypted laptops. “I would treat anonymous data the same way I would treat data that has identifying information,” says Boffa. “It should be treated as sensitive and as potentially harmful as data that has a patient’s social security number.”
Although data sharing presents these complicated challenges, overcoming them will be critical for the future of personalized medicine. Right now, researchers are accomplishing so much with incomplete information, says Boffa. But conducting research at scale that includes many different variables will open the door to many more discoveries. “There are so many more knowable pieces to the puzzle now,” he says. “By tying all of this together, that is the most credible way of determining for every single patient, what is the best, safest, and most effective treatment for them.”
Yale Protocols Are a Model for Data Use
Other leaders at Yale are also dedicated to meeting these challenges and making data more accessible. A little over a decade ago, Harlan Krumholz, MD, Harold H. Hines, Jr. Professor of Medicine (Cardiology), and Joseph Ross, MD, professor of medicine (general medicine) and of public health (health policy and management), co-founded the Yale Open Data Access Project (YODA) with the goal of making data more widely available and promoting open science. “The data sits within a repository so that researchers can work on it in a private, safe space,” says Krumholz. “It has guardrails up so that it can be both high ethics and high science.”
As a result of the project, over 100 manuscripts have been published that would not have been possible without the sharing of data. “We’re leaving an era where most investigators had the perception that they had no ethical responsibility to ensure that the most that can come of it occurs,” says Krumholz. “We’re trying to promote this idea that ethically, for the money, time, and willingness of people to be part of studies, we ought to be working hard to figure out how we safely and securely leverage data that’s generated for the greatest amount of public good possible.”
In his JAMA commentary, Boffa commends YODA as “an accountable and transparent repository and distributor of clinical trials data” that successfully pursues the goal of sharing patient information in a more secure manner. He writes that YODA shows how sound techniques developed at the federal level can also be embraced by individual institutions and organizations.