Extracting Structured Information from Lab Reports: Challenges and Learnings

Photo by Louis Reed / Unsplash

Medical records are fundamental to any healthcare ecosystem. They contain invaluable information that, when put together coherently, can result in massive improvements in healthcare outcomes. In India, this wealth of information is either fading away on printed paper, locked away in the archaic software systems of private hospitals, or scattered across hundreds of email and WhatsApp messages.

Have you ever wondered why we can easily track 10 rupees paid to a chai shop 2 years ago, but find it nearly impossible to gather all the thyroid reports done over the last few years? This can primarily be attributed to:

  • Non-adherence to storing health information based on coding standards such as SNOMED CT, LOINC, and ICD-10, which ensure a single source of truth for representing a medical concept
  • Non-adherence to wrapping health data in the HL7 format, which allows data interoperability across health facilities
  • Absence of a unified interface in healthcare, comparable to UPI for payments

As a result, the end user ends up with an unstructured, hard-copy version of their invaluable health data. Thankfully, recent efforts by NRCeS within the NDHM are addressing some of these challenges at a systemic level. If implemented as planned, such efforts will resolve concerns like data ownership, security, data interoperability, and consent to share.

Eka Vault

While systemic changes in India bear fruit over the years, Eka.Care is empowering consumers and making it easier for them to tap the potential of their health records. The Eka mobile app already provides a number of valuable features:

  • Smart reports: Our AI system extracts medically meaningful data from your lab reports and plots the trend of your vitals to facilitate tracking over time.
  • Search: We enable search through the content of the medical documents, even when it is an image. With this feature, you can filter your health records for a specific hospital, lab vital, or illness.
  • Auto-tag: Our algorithms automatically generate tags based on the content of your reports to facilitate organizing the records.
  • Share: You can readily share your health records with the doctors on Eka.Care, or use any third-party app to share them with any of your contacts.

Smart reports and Search features of the Eka secured vault

Structured content from lab reports

The smart report functionality in Eka.Care converts your lab report into a medically and semantically rich digital format, which involves codifying each vital using a LOINC identifier. This allows us to consolidate all your vitals across reports over time into an intuitive graph that displays their trend. In this section, we highlight some of the key challenges and learnings we have had in the process.

Isn’t extracting lab parameters from a report about performing Optical Character Recognition (OCR) on the images?

Well, while OCR is certainly the first step, it doesn’t comprise even 5% of the task. Some of the technical difficulties and nuances are described in the following sections.

Variability across labs

OCR provides us with the textual content and its spatial location, but we still need to figure out which of these elements represent lab tests, their values, units, and ranges. We exploit both the textual content and its spatial location in our neural network-based machine learning models to perform this task. One of the biggest challenges in this process is handling the different report layouts across labs. Here are some examples:

Different layout formats across labs. We see that the type of columns and their order can vary. 
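To make the role of spatial location concrete, here is a minimal sketch of how OCR tokens could be grouped into table rows using their coordinates. This is a simple geometric heuristic for illustration only, not the neural network-based model described above. The `Token` class, the `group_into_rows` function, and the `y_tolerance` value are all hypothetical names and assumptions.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One OCR word with a simplified position (hypothetical structure)."""
    text: str
    x: float  # left edge of the bounding box
    y: float  # vertical centre of the bounding box


def group_into_rows(tokens, y_tolerance=5.0):
    """Group tokens that sit on roughly the same horizontal line.

    A token joins the current row if its vertical centre is within
    y_tolerance of the first token placed in that row.
    """
    rows = []
    for tok in sorted(tokens, key=lambda t: t.y):
        if rows and abs(rows[-1][0].y - tok.y) <= y_tolerance:
            rows[-1].append(tok)
        else:
            rows.append([tok])
    # Read each row left to right.
    return [sorted(row, key=lambda t: t.x) for row in rows]


# Made-up tokens, deliberately out of reading order.
tokens = [
    Token("13.5", 200, 102),
    Token("10*3/uL", 300, 129),
    Token("Hemoglobin", 10, 100),
    Token("g/dL", 300, 99),
    Token("250", 200, 131),
    Token("Platelets", 10, 130),
]

for row in group_into_rows(tokens):
    print([t.text for t in row])
```

A heuristic like this recovers the rows, but it says nothing about which column holds the test name, value, unit, or range. That column assignment is exactly what varies from lab to lab, which is why layout-specific rules do not generalize well.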

Variations in test names

It’s not just the structural layout that varies. Different labs also often use different local terms for the same test. This problem is further accentuated by the fact that OCR itself produces errors. Given the diversity in names, spelling mistakes, and OCR errors, identifying a lab test becomes a challenging task. Here are some examples.

Different surface representations of test names across labs

The real challenge here is to design an algorithm that can truly measure semantic similarity between two concepts. Shallow, domain-unaware fuzzy string matching algorithms would match Vitamin D2 with Vitamin D3, and T3 Free with T4 Free, for instance, since each pair differs by just a single character.
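The pitfall is easy to reproduce. The sketch below uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for a generic fuzzy matcher (an assumption for illustration, not the matcher used in our system). The `similarity` helper is a hypothetical name.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


pairs = [
    ("Vitamin D2", "Vitamin D3"),  # different tests, one character apart
    ("T3 Free", "T4 Free"),        # different tests, one character apart
    ("Vit D3", "Vitamin D3"),      # same test, abbreviated name
]

for a, b in pairs:
    print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

With this scorer, the two clinically distinct pairs receive a higher score than the genuine abbreviation, so no single threshold can accept the synonym while rejecting the wrong matches. Character overlap alone is not a reliable proxy for medical equivalence.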

LOINC linking of the test names

This is the most critical and challenging step in converting your lab reports to graphs. Plotting test results on a graph would make sense only when all the data points are mapped to the same LOINC identifier. Let’s understand the LOINC linking step with the help of an example.

Let's say we encounter Creatinine as the test name. What more do we need to link this string to a LOINC identifier? Creatinine can be measured from both serum and urine specimens. To correctly link and interpret the value of Creatinine, one first has to identify its specimen from the report. Even for urine, the sample could be a spot urine sample or a 24-hour sample. In addition, the Creatinine value can be reported in two different forms, mass/volume and moles/volume, which have to be inferred from the units (mg/dL or umol/L, respectively). Both of these units might also have different surface-level variations. LOINC identifiers also differ based on the method used for the test. Only when we correctly identify all of this contextual information can we successfully link the test to a LOINC identifier.

Our system gathers this nuanced information by contextually parsing different components of your lab report such as panel names, specimens, method of the test, and so on.
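As a rough illustration of why the test name alone is not enough, the sketch below resolves an identifier only when the specimen and the unit-derived property are both known. It is a simplified lookup under stated assumptions, not our production linker: the names `LabContext`, `UNIT_TO_PROPERTY`, `CODE_TABLE`, and `link` are hypothetical, the identifiers are placeholders rather than real LOINC codes, and the method and spot versus 24-hour distinctions are left out for brevity.

```python
from typing import NamedTuple, Optional

# Unit spellings mapped to the property they imply (illustrative subset).
UNIT_TO_PROPERTY = {
    "mg/dl": "mass/volume",
    "umol/l": "moles/volume",
}


class LabContext(NamedTuple):
    """Context gathered from the report for one test (hypothetical)."""
    analyte: str
    specimen: Optional[str]
    unit: Optional[str]


# Placeholder identifiers, NOT real LOINC codes. Real codes should be
# looked up in the official LOINC database.
CODE_TABLE = {
    ("creatinine", "serum", "mass/volume"): "PLACEHOLDER-CREAT-SERUM-MASS",
    ("creatinine", "serum", "moles/volume"): "PLACEHOLDER-CREAT-SERUM-MOLES",
    ("creatinine", "urine", "mass/volume"): "PLACEHOLDER-CREAT-URINE-MASS",
    ("creatinine", "urine", "moles/volume"): "PLACEHOLDER-CREAT-URINE-MOLES",
}


def link(ctx: LabContext) -> Optional[str]:
    """Return an identifier only if the context is complete and known."""
    if ctx.specimen is None or ctx.unit is None:
        return None  # ambiguous: refuse to guess
    prop = UNIT_TO_PROPERTY.get(ctx.unit.strip().lower())
    if prop is None:
        return None  # unrecognized unit spelling
    key = (ctx.analyte.strip().lower(), ctx.specimen.strip().lower(), prop)
    return CODE_TABLE.get(key)


print(link(LabContext("Creatinine", "Serum", "mg/dL")))
print(link(LabContext("Creatinine", "Urine", "umol/L")))
print(link(LabContext("Creatinine", None, "mg/dL")))  # specimen missing
```

Returning `None` instead of a best guess is a deliberate choice in this sketch: plotting a urine value on a serum trend line would be worse than leaving the point out.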

Normalization of the test values

If four historical values of the platelet count are to be plotted in a graph, they first need to be converted to the same scale. Since there is no standardization of the units in which test results can be reported, there is wide variation in both the scale and the surface representation of these units. For example, the platelet count can be reported in 10*3/uL, in 10*5/uL, in lakh/uL (one lakh being 10*5), and in many other variants. To create graphs, one has to understand these units and convert the values to the same scale.

All historical values are normalized to the same unit
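The conversion itself is simple arithmetic once the unit has been recognized. Here is a minimal sketch for the platelet count, assuming 10*3/uL as the target scale. The `CELLS_PER_UL` table and the `to_thousands_per_ul` function are hypothetical names, and the table covers only the three variants mentioned above.

```python
# Number of cells per microlitre represented by one reported unit.
CELLS_PER_UL = {
    "10*3/ul": 1_000,
    "10*5/ul": 100_000,
    "lakh/ul": 100_000,  # one lakh is 10^5
}


def to_thousands_per_ul(value: float, unit: str) -> float:
    """Convert a platelet count to the 10*3/uL scale."""
    key = unit.replace(" ", "").lower()
    if key not in CELLS_PER_UL:
        raise ValueError(f"Unrecognized platelet unit: {unit!r}")
    return value * CELLS_PER_UL[key] / 1_000


# Made-up historical readings reported in different units.
readings = [(250, "10*3/uL"), (2.5, "lakh/uL"), (2.5, "10*5/uL")]

for value, unit in readings:
    print(f"{value} {unit} -> {to_thousands_per_ul(value, unit)} 10*3/uL")
```

All three made-up readings describe the same count once they are on a common scale. The harder part in practice is the step this sketch skips: recognizing the many spellings of each unit that appear in OCR output.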

Let's jump to some action and see this feature working in the demo video below!

Vikalp Sahni (CEO and Co-Founder, Eka.Care) demonstrating the smart report feature

Excited to try it yourself? Use the link below to download our mobile app and convert your lab reports to smart reports and visualise trends. We would love to hear your feedback and suggestions.

Download Eka.Care app  

Note: At the time of writing this blog, the feature is released as a beta.