How to Achieve Accurate Identity Matching without the Sweat Equity
If you’ve ever renovated a house or bought a “fixer-upper,” you know all about sweat equity: the painstaking investment of time and labor that goes into the project. What you might not know is that when you buy a modern identity matching solution, whether a Data Quality (DQ), Master Data Management (MDM), or Master Patient Index (MPI) tool, you are signing up for more sweat equity than you realize.
I’m not talking about the time and money to implement and configure the software itself—you no doubt accounted for that already. I’m talking about the additional time and money you haven’t accounted for in order to realize the promised benefit of highly accurate matched and deduplicated records.
The catch is that DQ, MDM, and MPI solutions automatically match only about 70% of the records that could be matched. The other 30% fall into a queue of potential matches: record pairs that are not similar enough to be automatically matched, but not different enough to be automatically treated as non-matches.
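The three outcomes described above can be pictured as two cutoffs on a similarity score. The following is only a minimal sketch: the function name, the cutoff values, and the sample scores are all invented for illustration and do not come from any particular DQ, MDM, or MPI product.

```python
def classify(score, auto_match=0.90, auto_reject=0.60):
    """Place a similarity score (0.0 to 1.0) into one of three bands.

    The cutoffs are hypothetical; real tools expose their own settings.
    """
    if score >= auto_match:
        return "match"        # similar enough to link automatically
    if score < auto_reject:
        return "non-match"    # different enough to reject automatically
    return "review"           # lands in the queue of potential matches


# Hypothetical similarity scores for six record pairs.
scores = [0.97, 0.91, 0.84, 0.72, 0.65, 0.41]

strict = [classify(s) for s in scores]
loose = [classify(s, auto_match=0.70) for s in scores]

print(strict)
print(loose)
```

Lowering `auto_match` shrinks the review queue, but every pair it moves into the "match" band is linked without anyone checking whether the two records really describe the same person.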
This leaves you with three options:
- Tweak your matching tool’s settings so that more than 70% of matches are automatically made. BUT, these looser settings will generate false positives—matches that are made despite the two records referring to different people.
- Ignore the 30% of matches that should be made and treat them all as non-matches. BUT, this excessive number of “false negatives” leaves your systems riddled with duplicates.
- Commit to sweat equity. Assemble a data stewardship team that will go through the queue of potential matches one by one, potentially taking years and costing many hundreds of thousands of dollars in operational expenses. For example, faced with 500,000 matchable record pairs, a typical MDM implementation will automatically find 350,000, leaving the other 150,000 in a queue of potential matches that will take four full-time employees two years to resolve.
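To put the third option in perspective, here is a back-of-the-envelope calculation using the figures from the example. The 2,000 working hours per year is an assumption added here (roughly 50 weeks of 40 hours), not a number from any vendor.

```python
# Figures taken from the example above.
matchable_pairs = 500_000
auto_matched = 350_000
stewards = 4
years = 2

# Assumption for illustration: about 2,000 working hours per person per year.
hours_per_year = 2_000

queued_pairs = matchable_pairs - auto_matched            # 150,000 pairs to review
pairs_per_steward_year = queued_pairs / (stewards * years)
pairs_per_hour = pairs_per_steward_year / hours_per_year
minutes_per_pair = 60 / pairs_per_hour

print(f"Queue size: {queued_pairs:,} pairs")
print(f"Per steward per year: {pairs_per_steward_year:,.0f} pairs")
print(f"Per steward per hour: {pairs_per_hour:.1f} pairs")
print(f"Time per pair: {minutes_per_pair:.1f} minutes")
```

Under that assumption, each steward has to resolve 18,750 pairs a year, or a little over nine pairs every working hour, for two years straight.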
So while many DQ, MDM, and MPI solutions claim much higher match accuracy rates, their dirty little secret is that these higher rates require you to be OK with (1) having a lot of false positives and mushing together records about different people, (2) having a lot of false negatives and creating duplicate records of the same person, or (3) getting nice and sweaty for a year or more.
Why do current matching solutions require so much sweat equity?
Current state-of-the-art matching solutions use deterministic or probabilistic algorithms to match two customer or patient records together. These algorithms match two records by comparing them directly to each other. If the identity data between the two records is close, then a match is made. This identity data includes attributes like name, address, birthdate, and social security number.
But identity data is always changing and is notoriously rife with errors. In fact, 30-40% of identity data in any given database is out-of-date, incorrect, or incomplete. Names change as people get married; addresses and phone numbers change as people move; even social security numbers change when people recover from identity theft. And manual data entry often causes identity data to be incomplete and to contain missing letters, inverted names, and transposed numbers.
This means that the identity data in two records that should match to the same person is often very different. In one record, the address might be old; in another, the last name might be misspelled and the SSN might be missing. Both records refer to the same person, but the task of matching those records—automatically and with a high degree of confidence—just got a lot harder for an algorithm.
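A toy version of direct comparison shows why such a pair gets stuck. This is a simplified sketch that scores a pair by the fraction of fields that agree exactly; real deterministic and probabilistic matchers are more sophisticated. The records, field names, and cutoffs are all made up for illustration.

```python
FIELDS = ("first_name", "last_name", "birthdate", "address", "ssn")


def normalize(value):
    """Lowercase and trim a value; missing values become None."""
    return value.strip().lower() if value else None


def direct_score(a, b):
    """Fraction of fields on which both records carry the same value."""
    agree = 0
    for field in FIELDS:
        va, vb = normalize(a.get(field)), normalize(b.get(field))
        if va is not None and va == vb:
            agree += 1
    return agree / len(FIELDS)


def classify(score, auto_match=0.8, auto_reject=0.2):
    """Hypothetical cutoffs: match, non-match, or manual review."""
    if score >= auto_match:
        return "match"
    if score < auto_reject:
        return "non-match"
    return "review"


# Two invented records for the same person (placeholder SSN).
record_a = {"first_name": "Maria", "last_name": "Gonzalez",
            "birthdate": "1980-04-12", "address": "12 Oak St",
            "ssn": "000-00-0000"}
record_b = {"first_name": "Maria", "last_name": "Gonzales",   # misspelled
            "birthdate": "1980-04-12", "address": "98 Pine Ave",  # moved
            "ssn": None}                                       # missing

score = direct_score(record_a, record_b)
print(score, classify(score))
```

Only the first name and birthdate agree, so the pair scores 2 out of 5 and falls into the review queue even though both records describe the same person.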
A better way to match that doesn’t require the sweat equity
Thankfully, there is a better way to match records. Verato has created an innovative referential matching engine that can automatically match up to 98% of customer, patient, or member records—without any sweat equity.
Rather than comparing two customer records directly to each other, Verato matches each record to an identity in its proprietary reference database, CARBON™. If both records match to the same CARBON identity, then they match to each other. Because the identity data in CARBON spans over 30 years and includes old and erroneous data as well as new, clean, and accurate data, the Verato referential matching engine can make a match even when the identity data in two records is sparse, is out-of-date, and contains errors.
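The general idea of referential matching can be sketched in a few lines. To be clear, this is not Verato's implementation or the structure of CARBON, neither of which is described here. It is a toy illustration in which `REFERENCE`, `resolve`, `same_person`, the `min_hits` rule, and all of the data are hypothetical.

```python
# Toy reference database: each identity keeps every value ever seen for it,
# including old addresses and known misspellings. All data is invented.
REFERENCE = {
    "ref-001": {
        "first_name": {"maria"},
        "last_name": {"gonzalez", "gonzales", "ortiz"},
        "birthdate": {"1980-04-12"},
        "address": {"12 oak st", "98 pine ave"},
        "ssn": {"000-00-0000"},
    },
    "ref-002": {
        "first_name": {"mario"},
        "last_name": {"gonzalez"},
        "birthdate": {"1975-11-02"},
        "address": {"7 elm rd"},
        "ssn": {"000-00-0001"},
    },
}


def normalize(value):
    """Lowercase and trim a value; missing values become None."""
    return value.strip().lower() if value else None


def resolve(record, min_hits=3):
    """Return the reference identity whose history covers the most fields."""
    best_id, best_hits = None, 0
    for ref_id, history in REFERENCE.items():
        hits = sum(1 for field, known in history.items()
                   if normalize(record.get(field)) in known)
        if hits > best_hits:
            best_id, best_hits = ref_id, hits
    return best_id if best_hits >= min_hits else None


def same_person(a, b):
    """Two records match if both resolve to the same reference identity."""
    ref_a, ref_b = resolve(a), resolve(b)
    return ref_a is not None and ref_a == ref_b


record_a = {"first_name": "Maria", "last_name": "Gonzalez",
            "birthdate": "1980-04-12", "address": "12 Oak St",
            "ssn": "000-00-0000"}
record_b = {"first_name": "Maria", "last_name": "Gonzales",
            "birthdate": "1980-04-12", "address": "98 Pine Ave", "ssn": None}
record_c = {"first_name": "Mario", "last_name": "Gonzalez",
            "birthdate": "1975-11-02", "address": "7 Elm Rd", "ssn": None}

print(same_person(record_a, record_b))  # both resolve to ref-001
print(same_person(record_a, record_c))  # resolve to different identities
```

Because the reference identity already knows about the misspelled surname and the newer address, the two Maria records agree with it on most fields and are linked through it, even though they disagree with each other on last name, address, and SSN.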
Traditional matching solutions may quote impressive statistics about match rates and numbers of false positives and false negatives, but these statistics hinge on the assumption that you will invest the time and money to manually review a large portion of your data. (I’ll discuss more of the math behind these “impressive” statistics in my next blog post.) On the other hand, the Verato approach uses a fast and easy cloud-based platform to automatically find 98% of the actual matching duplicate records within a dataset.
To read about the Verato referential matching engine in action, check out our success story with San Diego Health Connect, a leading Health Information Exchange. We were able to decrease their sweat equity by 75% and increase the total number of matches in their MPI by 110%.