As a medical affairs professional, you may have heard the terms "text mining" or Natural Language Processing (NLP) used to refer to a variety of approaches that transform unstructured text from documents and databases into data that can be used for research, analysis, and insights. Furthermore, with the advent of generative AI and large language models such as GPT-4 and ChatGPT, this topic has exploded into the popular consciousness.
These developments are coming against a backdrop of unrelenting growth in data volume and complexity. An estimated 97 zettabytes of data were generated in 2022 according to Statista, and the growth is exponential. Another estimate puts health-related data at 30% of all data, a staggering share. Given the broad remit of medical affairs, technology must be used to manage the overwhelming influx of information from sources such as online conversations, medical records, scientific literature, social media, and organizational data. NLP can extract structured, usable data from these complex sources, resolve ambiguities of language to pull out key facts and relationships, and provide summary views of the content to help users find what they really need.
The best part? NLP programs can do this without fatigue and in a consistent and unbiased manner. Attempting the same as a single person is a near-impossible feat.
Increasingly complex therapies and a stricter regulatory landscape mean that medical expertise is expected to operate at higher strategic and technical levels, all the way from evidence generation through to dissemination and education. This means medical affairs teams need to capture insights and evidence from the near-infinite data sources that exist and communicate them meaningfully to multiple stakeholders at various levels. And all of this must be accomplished without increasing cost to the business.
Consider a case where new evidence emerges about the efficacy of a product and generates a spike in questions and inquiries from healthcare professionals (HCPs) via medical science liaisons (MSLs). How can you quickly get a view of the latest research in your product's therapy area, and zoom in on the publications that focus on efficacy, to provide your MSLs with the right background and context, fast? Pre-built large language models, machine learning, and linguistics can be used to extract relevant content and reduce noise in the results. This can short-circuit laborious reviewing, giving your team the confidence, and crucially the data, to have the right conversations with HCPs and broader stakeholders.
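To make the idea of "zooming in" concrete, here is a minimal Python sketch that keeps only the abstracts mentioning a therapy area and ranks them by how often they use efficacy-related terms. It is a plain keyword stand-in, not the large language models or machine learning described above, and everything in it is an assumption made for illustration: the term list `EFFICACY_TERMS`, the function names, and the sample abstracts are all hypothetical.

```python
import re

# Hypothetical vocabulary for illustration; a real project would rely on
# curated ontologies or a trained model rather than a short hand-made list.
EFFICACY_TERMS = ["efficacy", "response rate", "progression-free survival", "primary endpoint"]


def efficacy_score(text: str) -> int:
    """Count occurrences of efficacy-related terms (case-insensitive)."""
    lowered = text.lower()
    return sum(
        len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
        for term in EFFICACY_TERMS
    )


def rank_by_efficacy(abstracts: list[dict], therapy_area: str) -> list[dict]:
    """Keep abstracts that mention the therapy area and at least one
    efficacy term, sorted with the highest score first."""
    area = therapy_area.lower()
    hits = []
    for item in abstracts:
        score = efficacy_score(item["text"])
        if area in item["text"].lower() and score > 0:
            hits.append({**item, "score": score})
    return sorted(hits, key=lambda h: h["score"], reverse=True)


# Invented sample data, not real publications.
abstracts = [
    {"id": "A1", "text": "Efficacy of drug X in asthma: the primary endpoint was met."},
    {"id": "A2", "text": "A cost analysis of asthma care pathways."},
    {"id": "A3", "text": "Drug X efficacy in psoriasis."},
    {"id": "A4", "text": "Asthma trial reports improved response rate."},
]

ranked = rank_by_efficacy(abstracts, "asthma")
print([r["id"] for r in ranked])  # prints ['A1', 'A4']
```

Even this toy version shows the shape of the workflow: a broad set of documents goes in, and a short, ordered reading list comes out. Production systems replace the keyword list with models that understand synonyms, context, and negation.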
NLP technology addresses challenges such as the inefficiency of manually reading and reviewing thousands of documents, or of repeating an analysis each time more current data becomes available. Similar NLP technologies can help answer the same questions across multiple disparate data sources, both simplifying processes and centralizing outputs.
NLP is also used to normalize and extract information as part of real-world observational studies, where rich information is locked in sources like clinical, admission, and/or discharge notes. In one current example, free-text data for over 4,000 patients is being submitted via an EDC (Electronic Data Capture) platform as part of a combined 2-year retrospective/4-year prospective study. The free text is used to derive the rationale for treatment decisions from prescribers in 11 countries.
Another example that is growing in importance, especially in the United States, is the identification of markers for social determinants of health (SDOH) in electronic medical records (EMR) data, to provide a better understanding of the challenges patients face and how these affect their health and their access to care.
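As a rough illustration of what "identifying markers" in free text can mean, the Python sketch below scans a clinical note for a few SDOH-related phrases using regular expressions. This is a deliberately simplified, rule-based assumption and not a description of any specific product or study method: the categories, the patterns in `SDOH_PATTERNS`, the function name, and the sample note are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny pattern set for illustration only.
SDOH_PATTERNS = {
    "housing_instability": re.compile(
        r"\b(homeless|evicted|no stable housing)\b", re.IGNORECASE
    ),
    "transport_barrier": re.compile(
        r"\b(no transportation|missed (the )?bus|cannot drive)\b", re.IGNORECASE
    ),
    "food_insecurity": re.compile(
        r"\b(food bank|skips? meals|food insecurity)\b", re.IGNORECASE
    ),
}


def find_sdoh_markers(note: str) -> dict[str, list[str]]:
    """Return the matched phrases per SDOH category found in a note.

    Categories with no match are left out of the result.
    """
    found = {}
    for category, pattern in SDOH_PATTERNS.items():
        matches = [m.group(0) for m in pattern.finditer(note)]
        if matches:
            found[category] = matches
    return found


# Invented sample note, not real patient data.
note = (
    "Patient was evicted last month and relies on a food bank. "
    "Has no transportation to clinic."
)
markers = find_sdoh_markers(note)
print(markers)
```

A sketch like this does not handle negation ("denies food insecurity"), misspellings, or context, which is exactly where more capable NLP earns its keep.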
In addition, NLP can be used with data from social media and online forums to understand how patients talk about their illnesses, treatments, and care. NLP helps medical affairs teams uncover trends and insights to increase responsiveness and to help craft effective educational materials, medical communications, and more. In one recent project, we are using machine learning to sift through noisy data and uncover the most relevant and useful tweets, for example those where patients specifically talk about their personal experience with treatments. When you consider the scale of social media data, especially for more common diseases, you need a technology capable of mining the data and alerting you as the end user. Otherwise, there would simply be too much information to digest.
Start by taking a look at your under-utilized data sources (those that have a material amount of free text or document-based information) and consider engaging an organization with deep experience in the life sciences industry, with which you can run a pilot or proof of concept for an NLP solution focused on one or two questions or challenges. At IQVIA, we've been working with NLP for two decades and have embraced the latest AI methods. Our platforms allow business teams to get value from NLP without the need for internal development.
Combining the expertise and experience of medical professionals with technology and AI is critical for life sciences organizations if they are truly to become data driven in the digital age. Technologies should empower and enable these teams by reducing or removing as much of the manual, tedious, and repetitive work as possible, allowing them to do what they do best, which is to bring impactful therapies to patients.