In records management, a recurring challenge is that clients can’t locate their records in our system, even though the records exist, simply because the names in their database don’t match ours exactly. The cause might be a small typo, a shortened name, or inconsistent formatting, but the result is the same: missed records, time-consuming manual checks, and frustration on both sides.

I previously worked on a research project focused on detecting AI-generated content, where I used Python extensively for text analysis. That experience showed me how powerful Python can be at identifying subtle patterns and inconsistencies in large datasets, and I realised it could help us here too.

I worked with my manager to develop a process that compares names from both datasets, not to determine which one is right, but to measure how similar they are. This gives us a clear way to flag discrepancies and decide which records might need further checking. It's now part of our daily QA routine, helping us catch inconsistencies before data goes into our system and ultimately improving the quality and reliability of our records.
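
As a rough illustration of this kind of comparison, here is a minimal sketch. It assumes the standard-library `difflib.SequenceMatcher` as the similarity measure; the measure, the normalisation step, the 0.9 threshold, and the sample names are all illustrative assumptions, not the process we actually run.

```python
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so pure formatting differences do not count."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a score from 0.0 (nothing in common) to 1.0 (identical after normalising)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def flag_discrepancies(pairs, threshold=0.9):
    """Return (client_name, our_name, score) for every pair scoring below the threshold.

    The threshold is a hypothetical cut-off; in practice it would be tuned
    against records that have already been checked by hand.
    """
    flagged = []
    for client_name, our_name in pairs:
        score = similarity(client_name, our_name)
        if score < threshold:
            flagged.append((client_name, our_name, round(score, 2)))
    return flagged


if __name__ == "__main__":
    # Hypothetical sample pairs: (name in the client's database, name in our system)
    sample = [
        ("Acme  Pty Ltd", "ACME PTY LTD"),
        ("Jon Smith", "John Smith"),
        ("Jon Smith", "Zebra Holdings"),
    ]
    for client_name, our_name, score in flag_discrepancies(sample):
        print(f"Check: {client_name!r} vs {our_name!r} (similarity {score})")
```

The score does not say which name is right. It only ranks how closely the two agree, so the lowest-scoring pairs can be sent for manual checking first.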

Metadata Quality Assurance Using Python | Presented By Van Hieu Tran

Speakers

Van Hieu Tran