dc.date.accessioned: 2023-02-07T18:11:50Z
dc.date.available: 2023-02-07T18:11:50Z
dc.date.created: 2022-12-07T09:29:40Z
dc.date.issued: 2022
dc.identifier.citation: Krauer, Fabienne; Schmid, Boris Valentijn. Mapping the plague through natural language processing. Epidemics. 2022, 41
dc.identifier.uri: http://hdl.handle.net/10852/99755
dc.description.abstract: Pandemic diseases such as plague have produced a vast amount of literature providing information about the spatiotemporal extent, transmission, or countermeasures. However, the manual extraction of such information from running text is a tedious process, and much of this information remains locked into a narrative format. Natural language processing (NLP) is a promising tool for the automated extraction of epidemiological data, and can facilitate the establishment of datasets. In this paper, we explore the utility of NLP to assist in the creation of a plague outbreak dataset. We produced a gold standard list of toponyms by manual annotation of a German plague treatise published by Sticker in 1908. We investigated the performance of five pre-trained NLP libraries (Google, Stanford CoreNLP, spaCy, germaNER and Geoparser) for the automated extraction of location data compared to the gold standard. Of all tested algorithms, spaCy performed best (sensitivity 0.92, F1 score 0.83), followed closely by Stanford CoreNLP (sensitivity 0.81, F1 score 0.87). Google NLP had a slightly lower performance (F1 score 0.72, sensitivity 0.78). Geoparser and germaNER had a poor sensitivity (0.41 and 0.61, respectively). We then evaluated how well automated geocoding services such as Google geocoding, Geonames and Geoparser located these outbreaks correctly. All geocoding services performed poorly – particularly for historical regions – and returned the correct GIS information in only 60.4%, 52.7% and 33.8% of all cases, respectively. Finally, we compared our newly digitized plague dataset to a re-digitized version of the plague treatise by Biraben and provide an update of the spatio-temporal extent of the second pandemic plague outbreaks. We conclude that NLP tools have their limitations, but they are potentially useful to accelerate the collection of data and the generation of a global plague outbreak database.
dc.language: EN
dc.publisher: Elsevier BV
dc.rights: Attribution 4.0 International
dc.rights.uri: https://creativecommons.org/licenses/by/4.0/
dc.title: Mapping the plague through natural language processing
dc.title.alternative: Mapping the plague through natural language processing (English)
dc.type: Journal article
dc.creator.author: Krauer, Fabienne
dc.creator.author: Schmid, Boris Valentijn
cristin.unitcode: 185,15,29,50
cristin.unitname: Centre for Ecological and Evolutionary Synthesis
cristin.ispublished: true
cristin.fulltext: original
cristin.qualitycode: 1
dc.identifier.cristin: 2089849
dc.identifier.bibliographiccitation: info:ofi/fmt:kev:mtx:ctx&ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.jtitle=Epidemics&rft.volume=41&rft.spage=&rft.date=2022
dc.identifier.jtitle: Epidemics
dc.identifier.volume: 41
dc.identifier.pagecount: 8
dc.identifier.doi: https://doi.org/10.1016/j.epidem.2022.100656
dc.type.document: Tidsskriftartikkel (journal article)
dc.type.peerreviewed: Peer reviewed
dc.source.issn: 1755-4365
dc.type.version: PublishedVersion
cristin.articleid: 100656
dc.relation.project: NFR/288551

