DocRED: A Large-Scale Document-Level Relation Extraction Dataset (Reading Notes)



Problem Statement

Existing datasets for document-level relation extraction (RE):

  • either only have a small number of manually-annotated relations and entities,
  • or exhibit noisy annotations from distant supervision,
  • or serve specific domains or approaches.

Contribution (DocRED)

  • constructed from Wikipedia and Wikidata
  • contains 132,375 entities and 56,354 relational facts annotated on 5,053 Wikipedia documents
  • at least 40.7% of the relational facts in DocRED can only be extracted from multiple sentences
  • also provide large-scale distantly supervised data to support weakly supervised RE research

  • experiments with existing methods indicate that document-level RE is more difficult than sentence-level RE
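
To put the dataset statistics above in perspective, the following minimal sketch derives per-document averages from the counts quoted in these notes. The constant and function names are illustrative only and do not come from the paper or its code release; the 40.7% figure is a lower bound, so the last value is a lower bound as well.

```python
# Counts quoted in the notes above (human-annotated portion of DocRED).
NUM_ENTITIES = 132_375
NUM_FACTS = 56_354
NUM_DOCS = 5_053

# "At least 40.7%" of facts need multiple sentences, so this is a lower bound.
MULTI_SENTENCE_SHARE = 0.407


def per_document(total: int, num_docs: int) -> float:
    """Average count per document (hypothetical helper for illustration)."""
    return total / num_docs


entities_per_doc = per_document(NUM_ENTITIES, NUM_DOCS)
facts_per_doc = per_document(NUM_FACTS, NUM_DOCS)

# Lower bound on the number of facts that span more than one sentence.
min_multi_sentence_facts = int(NUM_FACTS * MULTI_SENTENCE_SHARE)

print(f"entities per document: {entities_per_doc:.1f}")
print(f"relational facts per document: {facts_per_doc:.1f}")
print(f"facts needing multiple sentences (lower bound): {min_multi_sentence_facts}")
```

Under these numbers, a document holds roughly 26 entities and 11 relational facts on average, and at least about 22.9k facts cannot be recovered from a single sentence.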