Project seeks to protect patient privacy in health data analysis

Eric Ragan

A team of Texas A&M scholars is investigating how health science researchers can accurately analyze troves of available patient data from various sources while maintaining patient privacy.

The three-year, multidisciplinary study is supported by a $1 million grant from the Patient-Centered Outcomes Research Institute, a group that produces and promotes evidence-based health information.

In health science studies, researchers often use large datasets gathered from multiple so-called secondary sources, such as hospital discharges and insurance records. They then employ software that identifies which records from those sources refer to the same individual patient, while maintaining patients’ confidentiality.

But data from these disparate sources can be unwieldy.

“In the data, there are typos and different kinds of abbreviations and formatting,” said Eric Ragan, assistant professor of visualization, who is part of the study.

In many of these cases, he said, a computer can look at the data and determine if entries from various sources are actually referring to the same patient. But automated methods aren't perfect.

“For the especially difficult cases, a person needs to look at the data and decide whether or not to include it in the study,” said Ragan. “But since health science researchers work with such large datasets, there's a lot of these difficult cases. Random researchers or their assistants can’t have access to everyone's personal and confidential information, but the research cannot happen unless these decisions are made accurately.”

In their project, Ragan said he and his fellow researchers are studying what types of patient data need to be made visible and how much can be hidden without affecting decision accuracy.

The team is testing an interface enhanced to highlight data inconsistencies and is comparing the decision-making performance, speed and quality of novices and experts.

“Ideally, we’ll come up with a technique that will allow novices to make decisions just as well as experts,” said Ragan. “We hope we can do this and minimize the amount of data details people see.”

Richard Nira

posted June 26, 2017