existing datasets for document-level RE
- either only have a small number of manually-annotated relations and entities,
- or exhibit noisy annotations from distant supervision,
- or serve specific domains or approaches.
- DocRED is constructed from Wikipedia and Wikidata
- DocRED contains 132,375 entities and 56,354 relational facts annotated on 5,053 Wikipedia documents
- at least 40.7% of the relational facts in DocRED can only be extracted from multiple sentences
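
To make the multi-sentence statistic concrete, here is a minimal sketch of how such a share could be computed. It assumes each relational fact carries a list of supporting sentence indices; the field names (`head`, `relation`, `tail`, `evidence`) and the toy facts are hypothetical illustrations, not the actual DocRED schema or data, and this is not necessarily how the authors derived the 40.7% figure.

```python
from typing import Dict, List


def multi_sentence_share(facts: List[Dict]) -> float:
    """Fraction of facts whose supporting evidence spans more than one sentence."""
    if not facts:
        return 0.0
    # A fact counts as multi-sentence when it cites at least two distinct sentences.
    multi = sum(1 for fact in facts if len(set(fact["evidence"])) > 1)
    return multi / len(facts)


# Toy, made-up facts (hypothetical layout): a triple plus supporting sentence indices.
toy_facts = [
    {"head": "A", "relation": "r1", "tail": "B", "evidence": [0]},
    {"head": "A", "relation": "r2", "tail": "C", "evidence": [0, 2]},
    {"head": "B", "relation": "r3", "tail": "C", "evidence": [1, 3, 4]},
    {"head": "C", "relation": "r4", "tail": "D", "evidence": [2]},
]

share = multi_sentence_share(toy_facts)
print(f"{share:.1%} of toy facts need multiple sentences")
```

On real data the same count would run over every annotated fact in every document; the result depends on how evidence sentences were annotated.
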
DocRED also provides large-scale distantly supervised data to support weakly supervised RE research
results indicate that existing methods find the task of document-level RE more difficult than sentence-level RE.