Labeled entities from social media data related to avian influenza disease - INRAE - Institut national de recherche pour l’agriculture, l’alimentation et l’environnement
Journal article (Data Paper), Data in Brief, Year: 2022

Labeled entities from social media data related to avian influenza disease

Abstract

This dataset consists of spatial (e.g. location) and thematic (e.g. disease, symptom, virus) entities concerning avian influenza in English-language social media text. It was created from three corpora. The first includes 10 transcriptions of YouTube videos and 70 tweets, all manually annotated. The second contains the same textual data, annotated automatically with Named Entity Recognition (NER) tools. These two corpora were built to evaluate NER tools before applying them to a larger corpus. The third corpus is composed of 100 YouTube transcriptions automatically annotated with NER tools. The aim of the annotation task is to recognize spatial information, such as city names, and epidemiological information, such as disease names. An annotation guideline is provided to ensure consistent annotation and to support the annotators. This dataset can be used to train or evaluate Natural Language Processing (NLP) approaches such as specialized entity recognition.
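
The abstract states that the manually and automatically annotated corpora were built to evaluate NER tools. A minimal sketch of one common way to make such a comparison is given below: exact-match precision, recall, and F1 over entity spans. This is an illustration under stated assumptions, not the authors' evaluation procedure. The `Entity` layout, the label names, and the toy annotations are all hypothetical, and the dataset's actual file format is not described on this page.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    """One annotated span. Hypothetical layout, not the dataset's schema."""
    doc_id: str   # identifier of the tweet or transcription
    start: int    # character offset where the span begins
    end: int      # character offset where the span ends
    label: str    # entity type, e.g. a spatial or thematic category


def score(gold: set[Entity], predicted: set[Entity]) -> dict[str, float]:
    """Exact-match scores: a prediction counts only if the document,
    offsets, and label all agree with a manual annotation."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy annotations invented for illustration only.
gold = {
    Entity("tweet_01", 0, 5, "LOCATION"),
    Entity("tweet_01", 20, 25, "DISEASE"),
    Entity("tweet_02", 3, 8, "VIRUS"),
    Entity("tweet_02", 15, 20, "SYMPTOM"),
}
predicted = {
    Entity("tweet_01", 0, 5, "LOCATION"),
    Entity("tweet_01", 20, 25, "DISEASE"),
    Entity("tweet_02", 3, 8, "DISEASE"),  # right span, wrong label
}

result = score(gold, predicted)
print(result)
```

Exact matching is the strictest choice. A more lenient comparison could accept overlapping spans or ignore the label, which would raise the scores for tools that find the right text but disagree on boundaries or categories.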
Main file
Schaeffer_2022.pdf (281.29 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03733909, version 1 (21-07-2022)

License

Attribution

Identifiers

HAL Id: hal-03733909
DOI: 10.1016/j.dib.2022.108317

Cite

Camille Schaeffer, Roberto Interdonato, Renaud Lancelot, Mathieu Roche, Maguelonne Teisseire. Labeled entities from social media data related to avian influenza disease. Data in Brief, 2022, 43, pp.108317. ⟨10.1016/j.dib.2022.108317⟩. ⟨hal-03733909⟩