MSNBC and ACE2004 are two popular gold-standard datasets, commonly used to validate the quality of Entity Linking (EL) approaches.
MSNBC, available here, was constructed from MSNBC news by means of an automatic named-entity disambiguation process. The original ACE2004, available from here, contains an annotated subset of the ACE coreference dataset.
Given the increasing availability of tools that work with NIF (NLP Interchange Format), we converted both datasets to NIF and make the new versions available: MSNBC and ACE2004. This is part of our work [1].
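For readers unfamiliar with NIF, the following minimal Python sketch shows roughly what one document with entity annotations looks like when serialized as NIF in Turtle. It is not the conversion code used for these datasets: the function names (`to_nif_turtle`, `escape_literal`), the example document URI, the sample sentence and the DBpedia link are all assumptions chosen for illustration. Only the NIF Core and ITS RDF vocabulary terms are taken from the published ontologies.

```python
# Illustrative sketch only: hypothetical helpers, not the authors' converter.
NIF = "http://persistence.uni-leipzig.org/nlp2rdf/ontologies/nif-core#"
ITSRDF = "http://www.w3.org/2005/11/its/rdf#"
XSD = "http://www.w3.org/2001/XMLSchema#"


def escape_literal(text):
    """Escape a string for use inside a double-quoted Turtle literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_nif_turtle(doc_uri, text, mentions):
    """Serialize a text and its entity mentions as NIF in Turtle.

    mentions is a list of (start, end, kb_uri) tuples, where start and end
    are Python string indices into text (end exclusive).
    """
    lines = [
        f"@prefix nif: <{NIF}> .",
        f"@prefix itsrdf: <{ITSRDF}> .",
        f"@prefix xsd: <{XSD}> .",
        "",
    ]

    # The whole document is a nif:Context identified by its character range.
    context = f"{doc_uri}#char=0,{len(text)}"
    lines += [
        f"<{context}> a nif:Context, nif:String, nif:RFC5147String ;",
        f'    nif:isString "{escape_literal(text)}" ;',
        '    nif:beginIndex "0"^^xsd:nonNegativeInteger ;',
        f'    nif:endIndex "{len(text)}"^^xsd:nonNegativeInteger .',
        "",
    ]

    # Each mention is a substring of the context, linked to a KB identifier.
    for start, end, kb_uri in mentions:
        if not 0 <= start < end <= len(text):
            raise ValueError(f"invalid offsets: {start},{end}")
        mention = f"{doc_uri}#char={start},{end}"
        lines += [
            f"<{mention}> a nif:String, nif:RFC5147String, nif:Phrase ;",
            f"    nif:referenceContext <{context}> ;",
            f'    nif:anchorOf "{escape_literal(text[start:end])}" ;',
            f'    nif:beginIndex "{start}"^^xsd:nonNegativeInteger ;',
            f'    nif:endIndex "{end}"^^xsd:nonNegativeInteger ;',
            f"    itsrdf:taIdentRef <{kb_uri}> .",
            "",
        ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical example sentence and link, not taken from either dataset.
    sentence = "Michael Jordan played for the Bulls."
    links = [(0, 14, "http://dbpedia.org/resource/Michael_Jordan")]
    print(to_nif_turtle("http://example.org/doc1", sentence, links))
```

Each mention is identified by its character offsets within the document and points to a knowledge-base entry through `itsrdf:taIdentRef`, which is the information an EL gold standard needs to carry.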
[1] Henry Rosales-Méndez, Aidan Hogan and Barbara Poblete. NIFify: Supporting NIF for Entity Linking. (in progress)