Several decades have passed since the term "named entity" was first used. Since then, new lines of research have emerged in this area, such as work on the Entity Linking (EL) task, which links the (named) entity mentions in a text collection with their corresponding knowledge-base entries. However, the literature on this task often lacks consensus on the definition of "entity".
This vocabulary aims to model fine-grained categories for EL in order to tackle this problem. Each category divides the universe of mentions into subclasses based on current entity definitions. The vocabulary is designed to extend the NIF format.
Entity Linking (EL) is a task in Information Extraction that links the entity mentions in a text collection with their corresponding knowledge-base (KB) entries. With EL, we can take advantage of the large amount of information about real-world entities and their relationships in publicly available KBs (e.g., Wikipedia, DBpedia, Wikidata) to obtain semantic information that helps to better understand text corpora. While many challenges for EL are well known, a more fundamental issue is often overlooked by the community: what is an "entity"? Though several definitions have emerged of what an entity should be [Grishman et al., 1996][Eckhardt et al., 2014][Uren et al., 2006][Perera et al., 2016], there is, as yet, no clear consensus [Borrega et al., 2007][Ling et al., 2015]. This vocabulary proposes terms representing fine-grained categories of mentions and links in order to make different design choices for the EL task explicit.
All examples in this document are written in the Turtle RDF syntax. Throughout the document, the following namespaces are used:
Prefix | Namespace | Description |
---|---|---|
owl | http://www.w3.org/2002/07/owl# | The OWL 2 Schema vocabulary (OWL 2) |
xsd | http://www.w3.org/2001/XMLSchema# | XML Schema |
rdfs | http://www.w3.org/2000/01/rdf-schema# | The RDF Schema vocabulary (RDFS) |
dc | http://purl.org/dc/terms/ | DCMI Metadata Terms |
vann | http://purl.org/vocab/vann/ | A vocabulary for annotating vocabulary descriptions |
lexinfo | http://www.lexinfo.net/ontology/2.0/lexinfo# | Version 2.0 of LexInfo Ontology, based on Lemon |
doap | http://usefulinc.com/ns/doap# | Description of a Project vocabulary |
void | http://rdfs.org/ns/void# | Vocabulary of Interlinked Datasets |
gold | http://purl.org/linguistics/gold | General Ontology for Linguistic Description |
skos | http://www.w3.org/2004/02/skos/core# | SKOS Simple Knowledge Organization System Namespace Document |
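
As a sketch, the namespaces in the table could be declared in Turtle as follows. Two points are assumptions rather than something the table states: the `fel` prefix is not listed above and is taken from the term IRIs given later in this document, and the `gold` namespace is written with a trailing `/` so that prefixed names expand cleanly.

```turtle
@prefix owl:     <http://www.w3.org/2002/07/owl#> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc:      <http://purl.org/dc/terms/> .
@prefix vann:    <http://purl.org/vocab/vann/> .
@prefix lexinfo: <http://www.lexinfo.net/ontology/2.0/lexinfo#> .
@prefix doap:    <http://usefulinc.com/ns/doap#> .
@prefix void:    <http://rdfs.org/ns/void#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .

# Assumption: the table lists this namespace without a trailing separator.
@prefix gold:    <http://purl.org/linguistics/gold/> .

# Not listed in the table; taken from the term IRIs used in this document.
@prefix fel:     <https://w3id.org/vcb/fel#> .
```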
This ontology has the following classes and properties.
This vocabulary is organized as a hierarchy, where the first-level categories are fel:BaseFormClass, fel:PartOfSpeechClass, fel:OverlapClass and fel:ReferenceClass. Each of them partitions the universe of mentions, with the goal of categorizing different types of entities. Additionally, we link to related external terms (in gray).
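The hierarchy can be sketched in Turtle using the term names defined in the rest of this document. This is an illustration, not the vocabulary's published axioms: it assumes that each lower-level category is attached to its parent with `rdfs:subClassOf`, and that the singular and plural noun-phrase classes sit under fel:NounPhrasePoS.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix fel:  <https://w3id.org/vcb/fel#> .

# First-level categories named in the text.
fel:BaseFormClass     a owl:Class .
fel:PartOfSpeechClass a owl:Class .
fel:OverlapClass      a owl:Class .
fel:ReferenceClass    a owl:Class .

# Assumption: lower levels are modelled with rdfs:subClassOf.
fel:ProperForm          rdfs:subClassOf fel:BaseFormClass .
fel:NumericTemporalForm rdfs:subClassOf fel:BaseFormClass .
fel:CommonForm          rdfs:subClassOf fel:BaseFormClass .
fel:ProForm             rdfs:subClassOf fel:BaseFormClass .

fel:FullProperForm      rdfs:subClassOf fel:ProperForm .
fel:ShortProperForm     rdfs:subClassOf fel:ProperForm .
fel:ExtendedProperForm  rdfs:subClassOf fel:ProperForm .
fel:AliasProperForm     rdfs:subClassOf fel:ProperForm .

fel:NounPhrasePoS         rdfs:subClassOf fel:PartOfSpeechClass .
fel:SingularNounPhrasePoS rdfs:subClassOf fel:NounPhrasePoS .
fel:PluralNounPhrasePoS   rdfs:subClassOf fel:NounPhrasePoS .
fel:VerbPoS               rdfs:subClassOf fel:PartOfSpeechClass .
fel:AdjectivePoS          rdfs:subClassOf fel:PartOfSpeechClass .
fel:AdverbPoS             rdfs:subClassOf fel:PartOfSpeechClass .

fel:NoOverlap           rdfs:subClassOf fel:OverlapClass .
fel:MaximalOverlap      rdfs:subClassOf fel:OverlapClass .
fel:IntermediateOverlap rdfs:subClassOf fel:OverlapClass .
fel:MinimalOverlap      rdfs:subClassOf fel:OverlapClass .

fel:AnaphoricReference   rdfs:subClassOf fel:ReferenceClass .
fel:DirectReference      rdfs:subClassOf fel:ReferenceClass .
fel:DescriptiveReference rdfs:subClassOf fel:ReferenceClass .
fel:MetaphoricReference  rdfs:subClassOf fel:ReferenceClass .
fel:MetonymicReference   rdfs:subClassOf fel:ReferenceClass .
fel:RelatedReference     rdfs:subClassOf fel:ReferenceClass .
```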
IRI: https://w3id.org/vcb/fel#BaseFormClass
IRI: https://w3id.org/vcb/fel#ProperForm
This class gathers all mentions based on names (proper nouns), e.g., 'Michael Jackson', 'USA', 'King of Pop', 'B. Obama', etc. Such mentions do not have to be nouns, provided they are based on proper nouns, as in the case of 'French', 'Orwellian', etc. Such mentions may use abbreviated or extended forms of names; we add a new level in the class hierarchy to separate them: Full, Extended, Short or Alias.
IRI: https://w3id.org/vcb/fel#FullProperForm
This class gathers all proper-form mentions that (almost) exactly match the label of the knowledge-base entity. For example, the mention 'Michael Jackson' targeting wiki:Michael_Jackson is considered Full. This class also includes mentions that are syntactically close to the label of the knowledge-base entity, sharing the same morpheme(s); for instance, 'German' pointing to wiki:Germany is also considered a FullProperForm.
IRI: https://w3id.org/vcb/fel#ShortProperForm
This class is concerned with all the proper-name mentions that are shorter than the label of the Knowledge-Base entity while still being based on the label. For instance, the mentions 'Jackson' or 'M. Jackson' targeting wiki:Michael_Jackson are considered ShortProperForm.
IRI: https://w3id.org/vcb/fel#ExtendedProperForm
This class gathers all proper-name mentions longer than the label of the Knowledge-Base entity but containing the label. For example, the mention 'Michael Joseph Jackson' targeting wiki:Michael_Jackson is considered an ExtendedProperForm.
IRI: https://w3id.org/vcb/fel#AliasProperForm
This class is concerned with all the proper-noun mentions with a different morpheme than the primary label of the knowledge-base entity to which it refers (though it may be a known alias). For instance, the mention 'King of Pop' targeting wiki:Michael_Jackson is considered an AliasProperForm.
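The four proper-form subclasses can be contrasted on the mentions of wiki:Michael_Jackson given above. The following is a minimal sketch under stated assumptions: mentions are typed directly with the FEL classes via `rdf:type`, the `wiki:` prefix is assumed to expand to English Wikipedia page IRIs, and the `ex:` mention identifiers are hypothetical. The `nif:anchorOf` and `itsrdf:taIdentRef` properties come from NIF and ITS RDF.

```turtle
@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix fel:    <https://w3id.org/vcb/fel#> .
@prefix wiki:   <https://en.wikipedia.org/wiki/> .    # assumed expansion
@prefix ex:     <http://example.org/mention/> .       # hypothetical identifiers

# Mention (almost) exactly matches the entity label.
ex:full a nif:Phrase, fel:FullProperForm ;
    nif:anchorOf      "Michael Jackson" ;
    itsrdf:taIdentRef wiki:Michael_Jackson .

# Mention is shorter than the label but still based on it.
ex:short a nif:Phrase, fel:ShortProperForm ;
    nif:anchorOf      "Jackson" ;
    itsrdf:taIdentRef wiki:Michael_Jackson .

# Mention is longer than the label.
ex:extended a nif:Phrase, fel:ExtendedProperForm ;
    nif:anchorOf      "Michael Joseph Jackson" ;
    itsrdf:taIdentRef wiki:Michael_Jackson .

# Mention uses a different morpheme than the primary label.
ex:alias a nif:Phrase, fel:AliasProperForm ;
    nif:anchorOf      "King of Pop" ;
    itsrdf:taIdentRef wiki:Michael_Jackson .
```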
IRI: https://w3id.org/vcb/fel#NumericTemporalForm
This class gathers all mentions based on numeric and temporal expressions, such as: '1', 'one', '12/23/2019', etc. (as were included in MUC-6).
IRI: https://w3id.org/vcb/fel#CommonForm
This class gathers all the mentions that have a corresponding entity in the knowledge base but that do not correspond to a Proper Form, Pro-Form or Numeric/Temporal Form. For instance, the mention 'belt' referring to wiki:Belt_(clothing) is considered CommonForm.
IRI: https://w3id.org/vcb/fel#ProForm
This class gathers all mentions based on pronouns, pro-adjectives, etc. For example, the mentions 'he', 'theirs', etc., are considered ProForm (assuming they link to a knowledge-base entity).
IRI: https://w3id.org/vcb/fel#PartOfSpeechClass
This meta-class gathers classes that divide annotations according to the part-of-speech of their mention.
IRI: https://w3id.org/vcb/fel#NounPhrasePoS
This class gathers all the noun mentions.
IRI: https://w3id.org/vcb/fel#SingularNounPhrasePoS
This class gathers all the singular noun mentions, including 'documentary', 'Germany', etc.
IRI: https://w3id.org/vcb/fel#PluralNounPhrasePoS
This class gathers all the plural noun mentions. For instance, 'political parties' may refer to wiki:Political_party.
IRI: https://w3id.org/vcb/fel#VerbPoS
This class gathers all the verb mentions. For instance, the verb mention 'assassinated' may link to wiki:Assassination.
IRI: https://w3id.org/vcb/fel#AdjectivePoS
This class gathers all the adjective mentions. For example, there is a Wikipedia page (wiki:Red) about the color 'red'.
IRI: https://w3id.org/vcb/fel#AdverbPoS
This class gathers all the adverb mentions. For instance, 'commercially' could be associated with wiki:Commerce.
IRI: https://w3id.org/vcb/fel#OverlapClass
This meta-class gathers classes that divide annotations based on whether or not their mention overlaps with others. For example, in the sentence 'Living with Michael Jackson is a television documentary', the mention 'documentary' does not overlap with another mention; for this reason it is considered non-overlapping. On the other hand, the mentions 'Living with Michael Jackson' and 'Michael Jackson' overlap.
IRI: https://w3id.org/vcb/fel#NoOverlap
This class gathers all the mentions without overlap.
IRI: https://w3id.org/vcb/fel#MaximalOverlap
This class describes all the mentions that overlap with others and that, more specifically, contain other mentions entirely inside them but are not contained in other mentions. For instance, 'Living with Michael Jackson' is considered as maximal overlap assuming 'Michael Jackson' is also annotated and it is not contained inside another mention.
IRI: https://w3id.org/vcb/fel#IntermediateOverlap
This class describes all the mentions that overlap with others and that, more specifically, both contain and are contained in other mentions. For instance, in the mention 'New York Police Department Museum', the mention 'New York Police Department' has intermediate overlap because it is contained in the overall mention and contains the mention 'New York'.
IRI: https://w3id.org/vcb/fel#MinimalOverlap
This class describes all the mentions that overlap with others and that, more specifically, are contained in but do not contain other mentions. For instance, in the annotation 'Living with Michael Jackson', the mention 'Michael Jackson' is considered to have minimal overlap.
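The three overlapping cases can be illustrated on the nested mentions of 'New York Police Department Museum' from the description above. This is a sketch under stated assumptions: mentions are typed with the FEL overlap classes via `rdf:type`, the `ex:` identifiers are hypothetical, character offsets are counted relative to the phrase itself, and links to knowledge-base entities are left out because the text does not name them.

```turtle
@prefix nif: <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix fel: <https://w3id.org/vcb/fel#> .
@prefix ex:  <http://example.org/mention/> .    # hypothetical identifiers

# Contains other mentions and is not contained in any: maximal overlap.
ex:museum a nif:Phrase, fel:MaximalOverlap ;
    nif:anchorOf   "New York Police Department Museum" ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex   "33"^^xsd:nonNegativeInteger .

# Contained in ex:museum and contains ex:city: intermediate overlap.
ex:department a nif:Phrase, fel:IntermediateOverlap ;
    nif:anchorOf   "New York Police Department" ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex   "26"^^xsd:nonNegativeInteger .

# Contained in other mentions and contains none: minimal overlap.
ex:city a nif:Phrase, fel:MinimalOverlap ;
    nif:anchorOf   "New York" ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex   "8"^^xsd:nonNegativeInteger .
```

Note that these categories depend on which mentions are actually annotated: if 'New York' were not annotated, 'New York Police Department' would contain no other mention and would fall under minimal rather than intermediate overlap.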
IRI: https://w3id.org/vcb/fel#ReferenceClass
This meta-class gathers classes that divide annotations based on how the mention references its entity. Examples of types of reference include Anaphoric, Direct, Descriptive, Metaphoric, Metonymic and Related.
IRI: https://w3id.org/vcb/fel#AnaphoricReference
This class gathers mentions that are pro-forms referring to an antecedent or postcedent in the text. For instance, in the sentence 'His son was widely regarded ...' the mention 'His' may be an anaphoric reference to wiki:Joe_Jackson_(manager). (Note that noun phrases such as 'His son' referring to wiki:Michael_Jackson should rather be marked as descriptive references.)
IRI: https://w3id.org/vcb/fel#DirectReference
This class gathers mentions with references based on the direct, literal meaning of the words and names. For instance, the reference 'Michael Jackson' referring to wiki:Michael_Jackson, or the reference 'talent manager' referring to wiki:Talent_manager, are considered direct references.
IRI: https://w3id.org/vcb/fel#DescriptiveReference
This class gathers mentions that describe the entities they refer to. For instance, the mention 'the capital of Peru' refers descriptively to wiki:Lima, and in the sentence 'Michael Jackson and his father', the mention 'his father' refers to wiki:Joe_Jackson_(manager). Note that pro-forms should rather be marked as anaphoric references.
IRI: https://w3id.org/vcb/fel#MetaphoricReference
This class gathers mentions that make reference based on a figurative rather than literal meaning of the words. For example, in the phrase 'the King of Pop', the mention 'King' can be considered a metaphoric reference to wiki:King; in the sentence 'they added spice to their relationship', the mention 'spice' (wiki:Spice) is again a metaphoric reference.
IRI: https://w3id.org/vcb/fel#MetonymicReference
This class gathers mentions that refer to something specific by reference to a broader related entity (often, but not always, countries). For example, in the phrase 'Russia announced today', the mention 'Russia' is a metonymic reference to wiki:Government_of_Russia; in the phrase 'Poland won 3-2 on penalties', 'Poland' may be a metonymic reference to wiki:Poland_national_football_team, etc.
IRI: https://w3id.org/vcb/fel#RelatedReference
This class gathers mentions that refer to something for which there is (only) something closely related in the knowledge base. For instance, in the phrase 'The Russian daily RBK', the mention 'daily' refers to a daily newspaper, but in Wikipedia we only have wiki:Newspaper, so 'daily' can be seen as a reference to the closely related wiki:Newspaper. (Such references are sometimes reflected, for example, with redirects in Wikipedia, or pointers to a subsection of an entity's article.)
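The reference categories differ in how the mention text relates to the linked entity, which is easiest to see side by side. The sketch below reuses examples from the descriptions above; it assumes that mentions are typed with the FEL reference classes via `rdf:type`, that `wiki:` expands to English Wikipedia page IRIs, and that the `ex:` identifiers are hypothetical.

```turtle
@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix fel:    <https://w3id.org/vcb/fel#> .
@prefix wiki:   <https://en.wikipedia.org/wiki/> .    # assumed expansion
@prefix ex:     <http://example.org/mention/> .       # hypothetical identifiers

# Literal meaning of the name.
ex:direct a nif:Phrase, fel:DirectReference ;
    nif:anchorOf      "Michael Jackson" ;
    itsrdf:taIdentRef wiki:Michael_Jackson .

# The mention describes its entity.
ex:descriptive a nif:Phrase, fel:DescriptiveReference ;
    nif:anchorOf      "the capital of Peru" ;
    itsrdf:taIdentRef wiki:Lima .

# 'Russia' in 'Russia announced today' stands for its government.
ex:metonymic a nif:Phrase, fel:MetonymicReference ;
    nif:anchorOf      "Russia" ;
    itsrdf:taIdentRef wiki:Government_of_Russia .

# 'daily' in 'The Russian daily RBK': only a related entity exists.
ex:related a nif:Phrase, fel:RelatedReference ;
    nif:anchorOf      "daily" ;
    itsrdf:taIdentRef wiki:Newspaper .
```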
IRI: https://w3id.org/vcb/fel#entityType
Specifies the entity type of a KB entity. The domain is the URIs/IRIs of KB entities, and the range is the types of entities, e.g., Organization, Place, Person, etc.
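A minimal sketch of how this property might be used is shown below. The description does not say which IRIs denote the entity types, so the `ex:` type terms are hypothetical placeholders; the `wiki:` prefix is likewise assumed to expand to English Wikipedia page IRIs.

```turtle
@prefix fel:  <https://w3id.org/vcb/fel#> .
@prefix wiki: <https://en.wikipedia.org/wiki/> .    # assumed expansion
@prefix ex:   <http://example.org/type/> .          # hypothetical type terms

# The subject is the KB entity, not the mention that links to it.
wiki:Michael_Jackson fel:entityType ex:Person .
wiki:Lima            fel:entityType ex:Place .
```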
An example cannot be overlooked! Below we show, in NIF format, the sentence "The program 'Living with Michael Jackson' was broadcast." and the annotation of three mentions. We include the categorization of each annotation, as well as the entity type of its link.
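The following Turtle is a reconstruction sketch for this sentence, not the vocabulary's own example. It covers only the two mentions whose categories follow from the descriptions in this document ('Living with Michael Jackson' and 'Michael Jackson'). Its assumptions: FEL categories are attached with `rdf:type`, the document IRI and the `ex:` type terms are hypothetical, `wiki:` expands to English Wikipedia page IRIs, and the chosen categories are one reading of the class definitions. Offsets are zero-based character positions in the sentence.

```turtle
@prefix nif:    <http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#> .
@prefix itsrdf: <http://www.w3.org/2005/11/its/rdf#> .
@prefix xsd:    <http://www.w3.org/2001/XMLSchema#> .
@prefix fel:    <https://w3id.org/vcb/fel#> .
@prefix wiki:   <https://en.wikipedia.org/wiki/> .    # assumed expansion
@prefix ex:     <http://example.org/type/> .          # hypothetical type terms

# The sentence itself, as a NIF context.
<http://example.org/doc1#char=0,56>
    a nif:Context, nif:RFC5147String ;
    nif:isString   "The program 'Living with Michael Jackson' was broadcast." ;
    nif:beginIndex "0"^^xsd:nonNegativeInteger ;
    nif:endIndex   "56"^^xsd:nonNegativeInteger .

# Outer mention: contains 'Michael Jackson', contained in nothing.
<http://example.org/doc1#char=13,40>
    a nif:Phrase, nif:RFC5147String,
      fel:FullProperForm, fel:SingularNounPhrasePoS,
      fel:MaximalOverlap, fel:DirectReference ;
    nif:referenceContext <http://example.org/doc1#char=0,56> ;
    nif:anchorOf         "Living with Michael Jackson" ;
    nif:beginIndex       "13"^^xsd:nonNegativeInteger ;
    nif:endIndex         "40"^^xsd:nonNegativeInteger ;
    itsrdf:taIdentRef    wiki:Living_with_Michael_Jackson .

# Inner mention: contained in the outer mention, contains nothing.
<http://example.org/doc1#char=25,40>
    a nif:Phrase, nif:RFC5147String,
      fel:FullProperForm, fel:SingularNounPhrasePoS,
      fel:MinimalOverlap, fel:DirectReference ;
    nif:referenceContext <http://example.org/doc1#char=0,56> ;
    nif:anchorOf         "Michael Jackson" ;
    nif:beginIndex       "25"^^xsd:nonNegativeInteger ;
    nif:endIndex         "40"^^xsd:nonNegativeInteger ;
    itsrdf:taIdentRef    wiki:Michael_Jackson .

# Entity types are stated on the linked entities (placeholder type terms).
wiki:Living_with_Michael_Jackson fel:entityType ex:CreativeWork .
wiki:Michael_Jackson             fel:entityType ex:Person .
```

Each mention carries one category from each of the four first-level dimensions (base form, part of speech, overlap, reference), which is what makes the design choices behind an EL annotation explicit.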
If you use the FEL vocabulary in a research work, we ask that you reference the following paper, which describes the categories in detail (note that it does not describe the vocabulary):
Henry Rosales-Méndez, Aidan Hogan, Barbara Poblete. "A Fine-Grained Categorisation for Entity Linking". In the Proceedings of the Conference on Empirical Methods in Natural Language Processing and International Joint Conference on Natural Language Processing (EMNLP–IJCNLP), Hong Kong, China, November 3–7, 2019.
The fel:BaseFormClass class gathers definitions ranging from those that mainly recognize proper nouns as entities (e.g., the MUC-6 definition) to more flexible definitions, such as those that allow pronouns, numbers, temporal expressions, and any mention with a related KB entity. All mentions fall into this category; the separation is provided by its subclasses: fel:ProperForm, fel:NumericTemporalForm, fel:CommonForm, and fel:ProForm.