Sunday, December 14, 2025
The Israel Chronicle News
  • Home
  • Israel
  • Global
  • Political
  • Defense
  • Business
  • Health
  • Sports
  • Tech
  • Entertainment
  • Lifestyle

How to Approach the Statistical De-Identification Process Effectively

by News Desk
September 26, 2025
in Health

Innovation in health care relies on the ability to figure out what the data is trying to teach us. Data analytics, including but not limited to GenAI-powered data analytics, creates an insatiable demand for large, well-curated, searchable data sets. This is already a challenge: we have lots of data, but not a lot of good data. Compounding the curation challenge, a legal, policy, ethical or business risk mandate often requires that the curated data also be “de-identified.” For data sets that include Protected Health Information (PHI), de-identification must follow one of two methods set forth in the HIPAA regulations. And the method that typically works for data analytics is the statistical method.

The statistical method is not new. And contrary to popular myth, it is not considered “less compliant” than the alternative, the so-called safe harbor method. Initially, the Office for Civil Rights, which administers HIPAA, had proposed only including the statistical method. But the regulated community wanted an easy, rinse-and-repeat standard that would not require them to obtain statistical guidance in every case, which was seen as a severe transactional burden. The safe harbor method, which requires the removal of 18 enumerated fields, extends administrative ease to the regulated community, but comes at a heavy price. In many cases, the data remaining after redacting or obfuscating everything required under safe harbor de-identification is no longer fit for purpose.

Statistical de-identification is as much a tactical activity as a strategic one. There are several concrete steps regulated organizations can take to get the most out of their statistical de-identification initiatives.

Motivation matters: Safe harbor and statistical de-identification present different strategic opportunities and compliance hurdles. Safe harbor de-identification gives a regulated party a relatively easy way to self-administer de-identification by removing 18 enumerated fields, provided none of those fields are necessary for the intended activity. It is robotic, but also inflexible. The statistical method, in contrast, is intended to provide flexibility by looking at the actual, measurable risks of re-identification presented by a range of factors, including the data but also the recipient, the other information available to the recipient, and policy and contractual safeguards. It requires a governance program to make sure the parameters of the opinion are followed, but in exchange it nearly universally enables more data to persist into the de-identified data set.
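To make “measurable risk” concrete, one common starting point is to count how many records share each combination of potentially identifying fields; a record that is unique on those fields is easier to single out. The sketch below is illustrative only: the field names and records are hypothetical, and an actual statistical opinion weighs many more factors (the recipient, other available information, contractual controls) than this single number.

```python
from collections import Counter

def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group of records that share the same
    combination of quasi-identifier values (the "k" in k-anonymity)."""
    groups = Counter(
        tuple(record[field] for field in quasi_identifiers)
        for record in records
    )
    return min(groups.values())

# Hypothetical records; field names are assumptions for illustration.
records = [
    {"age_band": "40-49", "zip3": "021", "sex": "F"},
    {"age_band": "40-49", "zip3": "021", "sex": "F"},
    {"age_band": "40-49", "zip3": "021", "sex": "M"},
    {"age_band": "50-59", "zip3": "100", "sex": "M"},
    {"age_band": "50-59", "zip3": "100", "sex": "M"},
]

# With all three fields, one record is unique; dropping "sex" removes that.
k_fine = smallest_group_size(records, ["age_band", "zip3", "sex"])
k_coarse = smallest_group_size(records, ["age_band", "zip3"])
```

Comparing the two values shows the kind of tradeoff the statistical method evaluates: keeping more fields raises utility but can shrink the smallest group.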

Involve counsel: If this is the first time you’re doing statistical de-identification or this statistical exercise is strategically or materially different from past opinions, the process will likely raise legal and compliance questions and legal advice will be important.

Think big first: The statistical exercise is a good opportunity to involve business stakeholders to understand short- and medium-term data plans. Start by thinking about (1) the maximum data that would be helpful to persist in the de-identified data set; (2) the potential recipients of the de-identified data set, and reasonable controls around their usage; and (3) the range of possible use cases and business priorities. Working with your expert, you may need to retreat from certain data fields or purposes, but by thinking broadly at the outset, you can work more effectively with your expert.

More than redaction: In setting the data dictionary element of the opinion, data redaction (the removal of certain fields) is the most obvious tool. Your statistician, however, can provide guidance with more nuance, both in terms of privacy protections and retaining data utility. For example, you can explore data randomization or data shifting, adding noise to make re-identifying patterns harder to discern, including synthetic data, creating look-alike fields, and a range of other data obfuscation techniques. Cryptographic techniques for creating private IDs will need to be carefully applied to ensure private IDs are not practically reversible, including by choosing appropriate cryptographic keys. Data transformation techniques need to be fit for purpose. In some cases, certain data manipulations might mean that the data could not be used, for example, for certain FDA-regulated purposes. But this is part of the strategic discussion.
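Two of the techniques mentioned above, keyed private IDs and date shifting, can be sketched as follows. This is a minimal illustration under stated assumptions, not a vetted design: the function names, the 30-day shift window, and the key handling are all hypothetical, and your statistician and security team would set the real parameters.

```python
import hashlib
import hmac
import secrets
from datetime import date, timedelta

def private_id(patient_id: str, secret_key: bytes) -> str:
    """Derive a private ID with HMAC-SHA256. Without the secret key, the
    private ID cannot practically be recomputed from a guessed patient ID."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()

def shift_date(d: date, patient_id: str, secret_key: bytes, max_days: int = 30) -> date:
    """Shift a date by a per-patient offset in [-max_days, +max_days].
    The same patient always gets the same offset, so the intervals between
    that patient's events are preserved."""
    digest = hmac.new(
        secret_key, b"date-shift:" + patient_id.encode("utf-8"), hashlib.sha256
    ).digest()
    offset = int.from_bytes(digest[:4], "big") % (2 * max_days + 1) - max_days
    return d + timedelta(days=offset)

# Assumption: a random 256-bit key that would be stored in a managed secret
# store and never released alongside the de-identified data.
key = secrets.token_bytes(32)

admit = shift_date(date(2025, 3, 1), "patient-001", key)
discharge = shift_date(date(2025, 3, 8), "patient-001", key)
token = private_id("patient-001", key)
```

A keyed hash is used here rather than a plain hash because an unkeyed hash of a short identifier can be reversed by simply hashing every possible value.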

More than just tables: Statistical de-identification can be used to de-identify unstructured data, including text, clinical notes and medical images. Technology and capabilities evolve rapidly, and unstructured data has moved from niche and only selectively tractable to a scalable option in just a few years. When considering the maximum data in the de-identified dataset, it’s important to validate assumptions around what’s practically achievable to ensure options aren’t artificially restricted. 

Be ready to horse trade: In many cases, a well-designed statistical opinion will present you with tradeoffs on available data fields or granularity. To illustrate with a simple example, ethnicity-related data fields may be allowed, but not in certain locations where they would be highly identifying due to the local population demographics. Instead of the opinion requiring the redaction of ethnicity or location in all cases, it can permit data fields under certain parameters but “grey out” the availability of the data fields in others. If you can implement the data architecture to do this, you create a menu of options for your business, allowing recipients to access certain data within a flexible framework.
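The “grey out” idea can be sketched as a rule that keeps a field only where enough records share its value within a location. The threshold, field names and records below are hypothetical; the actual parameters would come from the statistical opinion.

```python
from collections import Counter

def grey_out(records, location_field, sensitive_field, min_count):
    """Return copies of the records with the sensitive field blanked wherever
    fewer than min_count records share that value in the same location."""
    counts = Counter((r[location_field], r[sensitive_field]) for r in records)
    released = []
    for r in records:
        row = dict(r)  # copy so the source records are left untouched
        if counts[(r[location_field], r[sensitive_field])] < min_count:
            row[sensitive_field] = None
        released.append(row)
    return released

# Hypothetical data: ethnicity "Y" is rare in region "A" but common in "B".
records = [
    {"region": "A", "ethnicity": "X"},
    {"region": "A", "ethnicity": "X"},
    {"region": "A", "ethnicity": "X"},
    {"region": "A", "ethnicity": "Y"},
    {"region": "B", "ethnicity": "Y"},
    {"region": "B", "ethnicity": "Y"},
    {"region": "B", "ethnicity": "Y"},
]

released = grey_out(records, "region", "ethnicity", min_count=3)
```

Only the single “Y” record in region “A” loses its ethnicity value; the same value stays available in region “B”, where it is not distinguishing.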

Opinion as recipe: The data that will persist in the de-identified data set (usually called the data dictionary) is just one element in the overall opinion. The opinion will have several other ingredients — all of which matter, and you will need to comply with all of them for the opinion to be applicable. For example, the statisticians may consider the presence of certain contractual clauses or policies to be relevant to measuring risk. Or, the statistician may have taken into account the stated purpose of the de-identified data set. Just as a bread recipe wouldn’t make a loaf if you opted to forgo the yeast or ignore the water, you need to implement and comply with the opinion as a whole.

Build a statistical relationship: The initial lift for the opinion is the biggest. But the opinion will need to be renewed, typically every 18 months although time frames vary. And you may find that the assumptions in the opinion need to be reviewed or changed. If your statistical expert is a strong partner, they will help you grow and adapt the opinion in line with your strategic priorities, even between renewal periods.

Build a crosswalk: One of the insights embedded in the HIPAA de-identification standards is the need (under either method) to refresh de-identified data over time. Institutions can implement a linking code that enables them to de-identify new data as it comes in and associate it with individuals in the data set. Though not necessary for every purpose, longitudinal de-identified data sets are essential to many of the purposes described above. Tokenization and linkage technologies can also be applied to link between discrete datasets without sharing PHI or identifying elements, though it’s important to ensure the resulting linked dataset meets HIPAA de-identification standards.
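A crosswalk of this kind can be sketched as a lookup table, held inside the institution, that assigns each individual a random linking code and reuses it whenever new data arrives. The class, function and field names below are assumptions for illustration, and a real crosswalk would live in a secured, access-controlled store rather than in memory.

```python
import secrets

class Crosswalk:
    """Maps internal identifiers to random linking codes. The mapping stays
    with the institution; only the linking codes leave with the data."""

    def __init__(self):
        self._source_to_link = {}

    def link_code(self, source_id: str) -> str:
        # Issue a new random code on first sight, then reuse it every time.
        if source_id not in self._source_to_link:
            self._source_to_link[source_id] = secrets.token_hex(16)
        return self._source_to_link[source_id]

def de_identify_batch(batch, crosswalk):
    """Replace the internal identifier with the linking code in each row."""
    return [
        {"link_code": crosswalk.link_code(row["mrn"]), "lab_value": row["lab_value"]}
        for row in batch
    ]

crosswalk = Crosswalk()

# Hypothetical batches arriving months apart.
first_batch = de_identify_batch(
    [{"mrn": "1001", "lab_value": 5.4}, {"mrn": "1002", "lab_value": 6.1}], crosswalk
)
second_batch = de_identify_batch([{"mrn": "1001", "lab_value": 5.9}], crosswalk)
```

Because the code is random rather than computed from the identifier, the later record for the same person lines up with the earlier one, yet nothing in the released rows can be translated back without the crosswalk itself.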

Data puddle or data lake: In some cases, the data you need to de-identify is discrete and will be generated on a case-by-case basis applying the opinion’s parameters. In other cases, your business may present a range of future, unspecified and/or varied data use cases. In the latter case, you may want to develop a data lake: a large, curated data set at rest that is available to provision smaller data cuts for particular projects. A well-designed opinion is equally applicable to the whole and to its subsets.

De-identification versus data aggregation: Data Aggregation is a term of art under HIPAA that involves the use of PHI from multiple covered entities for benchmarking and other joint activities. The regulated community often uses “de-identified” and “aggregated” interchangeably, but they are not the same thing. Make sure that de-identified data is what you actually need for a particular project.

Invest in data tagging: Data tagging will give your organization more dexterity in the data it deems available for de-identification and will provide granularity at the field level. It’s technical, operational and administrative work that might not seem glamorous, but it’s an essential building block of lucrative data sets.
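Field-level tagging can be as simple as a catalog that labels each field and a query that returns the fields eligible for a given release. The tag vocabulary and field names below are hypothetical; a real catalog would follow the categories defined in your governance program and statistical opinion.

```python
# Hypothetical field catalog: each field carries one or more tags.
FIELD_TAGS = {
    "patient_name": {"direct_identifier"},
    "mrn": {"direct_identifier"},
    "zip_code": {"quasi_identifier"},
    "birth_date": {"quasi_identifier"},
    "diagnosis_code": {"clinical"},
    "lab_value": {"clinical"},
}

def fields_with_tags(field_tags, allowed_tags):
    """Return, in sorted order, the fields whose tags all fall within
    the allowed set for a given release."""
    return sorted(
        name for name, tags in field_tags.items() if tags <= allowed_tags
    )

clinical_only = fields_with_tags(FIELD_TAGS, {"clinical"})
clinical_plus_quasi = fields_with_tags(FIELD_TAGS, {"clinical", "quasi_identifier"})
```

With tags in place, changing what a project may receive becomes a change to the allowed set rather than a field-by-field review.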

Role of AI: It’s impossible to say anything about a health care or data topic right now without talking about AI. So we’ll just say this: AI is a burden and a gift in de-identification. AI tools can help to de-identify unstructured data (notoriously difficult) and can accelerate de-identification tools and data set analysis. AI can also be used to double check statistical assumptions on residual risk. But AI tools can also potentially change the re-identification risk calculus if AI tools can interrogate data and identify patterns leveraged for re-identification in new ways.

As data demands grow, de-identification is an essential governance and strategic priority for stakeholders in the digital data economy. De-identification projects enable engineers, business leaders, compliance leaders and counsel to collaborate and create a conversation around data governance that pays dividends beyond the data set itself.

Photo: Weiquan Lin, Getty Images

Jordan Collins is a results-oriented, strategic leader with over 20 years’ experience in analytic functions focused on enabling data-driven decisions at an enterprise level. He is currently the General Manager of Privacy Analytics, an IQVIA company. Privacy Analytics enables organizations to unleash the value of sensitive data for secondary purposes while managing privacy considerations. Jordan has a PhD in Philosophy from the University of Auckland, an MA in Applied Statistics from York University, an MSc in Pure Mathematics from McMaster University, and a BSc (Hon.) degree in Mathematics from Mount Allison University. Jordan has a strong analytics background, starting his career as a statistician. He has deep consulting experience with an entrepreneurial bent, having stood up his own statistical consulting practice focusing on statistical applications in healthcare as well as industrial process and business optimization. For the past 10 years he has applied these analytic skills to technical privacy challenges globally.

Jennifer Geetter is a partner in McDermott Will & Schulte’s DC office. With a practice focused primarily on the development, delivery and implementation of digital health solutions, data and research, Jennifer works closely with both adopters and developers to bring their innovative healthcare solutions to patients and providers. To help clients design and deploy digital health technologies effectively, Jennifer offers guidance on key issues such as patient on-boarding, provider implementation, privacy and regulatory issues. She advises global life sciences, healthcare and informatics clients on legal issues attendant to digital health, biomedical innovation, research compliance, global privacy and data security laws, and financial relationship management.

This post appears through the MedCity Influencers program. Anyone can publish their perspective on business and innovation in healthcare on MedCity News through MedCity Influencers.

Copyright © 2024 The Israel Chronicle News.
The Israel Chronicle News is not responsible for the content of external sites.
