Language model-based automatic prefix abbreviation expansion method for biomedical big data analysis
Journal article
Authors | Anjum, Ashiq |
---|---|
Abstract | In the biomedical domain, abbreviations appear more and more frequently in various data sets, which poses significant obstacles to biomedical big data analysis. The dictionary-based approach has been adopted to process abbreviations, but it cannot handle ad hoc abbreviations, and no dictionary can cover all abbreviations. To overcome these drawbacks, this paper proposes an automatic abbreviation expansion method called LMAAE (Language Model-based Automatic Abbreviation Expansion). In this method, the abbreviation is first divided into blocks; then, expansion candidates are generated by restoring each block; and finally, the expansion candidates are filtered and clustered, using a language model and a clustering method, to obtain the final expansion result. By restricting abbreviations to prefix abbreviations, the search space of expansion is reduced sharply. The search space is then reduced further by constraining the effectiveness and the length of the partition. To validate the effectiveness of the method, two types of experiments are designed. For standard abbreviations, the expansion results include most of the expansions in the dictionary; therefore, the method has high precision. For ad hoc abbreviations, the precision of schema matching and knowledge fusion is increased by using this method to handle the abbreviations. Although the recall for standard abbreviations needs to be improved, this does not diminish the method's value as a complement to the dictionary method. |
Keywords | Abbreviation expansion; Biomedical text analysis; Language model |
Year | 2019 |
Journal | Future Generation Computer Systems |
Publisher | Elsevier |
ISSN | 0167739X |
Digital Object Identifier (DOI) | https://doi.org/10.1016/j.future.2019.01.016 |
Web address (URL) | http://hdl.handle.net/10545/623968 (hdl:10545/623968) |
Publication dates | 28 Mar 2019 |
Publication process dates | |
Deposited | 04 Jul 2019, 15:18 |
Accepted | 13 Jan 2019 |
Contributors | University of Derby |
File | File access level: Open |
https://repository.derby.ac.uk/item/922v1/language-model-based-automatic-prefix-abbreviation-expansion-method-for-biomedical-big-data-analysis
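
The pipeline outlined in the abstract (split the abbreviation into blocks, restore each block to words it is a prefix of, then rank candidates with a language model) can be illustrated with a minimal Python sketch. This is not the LMAAE implementation from the paper: the vocabulary, the bigram counts, the add-one scoring, and the names `partitions`, `restore`, `score`, and `expand` are all hypothetical, and the clustering step is omitted.

```python
from itertools import product

# Hypothetical toy data; a real system would derive these from a biomedical corpus.
VOCAB = ["blood", "pressure", "body", "press", "heart", "rate"]
BIGRAM_COUNTS = {
    ("blood", "pressure"): 50,
    ("heart", "rate"): 40,
    ("body", "pressure"): 2,
}


def partitions(abbr, max_blocks=3, min_len=1):
    """Yield every split of abbr into 1..max_blocks contiguous blocks,
    each at least min_len characters long (the partition-length constraint)."""
    if max_blocks >= 1 and len(abbr) >= min_len:
        yield [abbr]
    if max_blocks > 1:
        for i in range(min_len, len(abbr) - min_len + 1):
            for rest in partitions(abbr[i:], max_blocks - 1, min_len):
                yield [abbr[:i]] + rest


def restore(block, vocab):
    """Return the vocabulary words that have block as a prefix."""
    return [w for w in vocab if w.startswith(block)]


def score(words, bigrams):
    """Product of add-one smoothed bigram counts (assumed toy language model).
    A single-word candidate gets the neutral score 1.0."""
    s = 1.0
    for a, b in zip(words, words[1:]):
        s *= bigrams.get((a, b), 0) + 1
    return s


def expand(abbr, vocab=VOCAB, bigrams=BIGRAM_COUNTS, top_k=3):
    """Return up to top_k candidate expansions, best score first."""
    candidates = {}
    for blocks in partitions(abbr.lower()):
        options = [restore(b, vocab) for b in blocks]
        if not all(options):
            continue  # some block matches no word, so this partition is invalid
        for words in product(*options):
            candidates[" ".join(words)] = score(words, bigrams)
    return sorted(candidates, key=candidates.get, reverse=True)[:top_k]


if __name__ == "__main__":
    for abbr in ("BP", "HR", "blpres"):
        print(abbr, "->", expand(abbr))
```

Requiring every block to be a word prefix is what keeps the candidate set small in this sketch: a partition is discarded as soon as one block matches nothing in the vocabulary, so only a few of the possible splits ever reach the scoring step.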