Transformer-based active learning for multi-class text annotation and classification

Journal article


Hussain, M. 2024. Transformer-based active learning for multi-class text annotation and classification. Digital Health. pp. 1-21. https://doi.org/10.1177/20552076241287357
Authors: Hussain, M.
Abstract

Objective

Data-driven methodologies in healthcare necessitate labeled data for effective decision-making. However, medical data, particularly in unstructured formats, such as clinical notes, often lack explicit labels, making manual annotation challenging and tedious.

Methods
This paper introduces a novel deep active learning framework designed to facilitate the annotation process for multiclass text classification, specifically using the SOAP (subjective, objective, assessment, plan) framework, a widely recognized medical protocol. Our methodology leverages transformer-based deep learning techniques to automatically annotate clinical notes, significantly easing the manual labor involved and enhancing classification performance. Transformer-based deep learning models, with their ability to capture complex patterns in large datasets, represent a cutting-edge approach for advancing natural language processing tasks.

Results

We validate our approach through experiments on a diverse set of clinical notes from publicly available datasets, comprising over 426 documents. Our model not only demonstrates superior classification accuracy, with an F1 score improvement of 4.8% over existing methods, but also provides a practical tool for healthcare professionals, potentially improving clinical documentation practices and patient care.

Conclusions

The research underscores the synergy between active learning and advanced deep learning, paving the way for future exploration of automatic text annotation and its implications for clinical informatics. Future studies will aim to integrate multimodal data and large language models to enhance the richness and accuracy of clinical text analysis, opening new pathways for comprehensive healthcare insights.

Keywords: healthcare; medical data; automatic text annotation; clinical informatics
Year: 2024
Journal: Digital Health
Journal citation: pp. 1-21
Publisher: SAGE Journals
ISSN: 2055-2076
Digital Object Identifier (DOI): https://doi.org/10.1177/20552076241287357
Web address (URL): https://journals.sagepub.com/doi/full/10.1177/20552076241287357
Output status: Published
Publication dates
Online: 17 Oct 2024
Publication process dates
Deposited: 21 Nov 2024
Permalink: https://repository.derby.ac.uk/item/qqzw5/transformer-based-active-learning-for-multi-class-text-annotation-and-classification

