Dataset | Type | Domain | Annotation | Date range | Number of samples |
---|---|---|---|---|---|
Fake.br | Claim | News | Annotated | 01/2016 - 01/2018 | 7,200 |
FakeRecogna | Source | News | Agency | 03/2017 - 05/2020 | 11,773 |
Central de Fatos | Source | News | Agency | 01/2013 - 05/2021 | 10,461 |
Fact-check_tweet (pt split) | Claim-source pair | Tweets-News | Auto-Agency | 2019 - 2021 | 656 - 656 |
FakeNewsSet | Claim-source pair | Tweets-News | Auto-Agency | - | 26,970 - 598 |
from datasets import load_dataset
data = load_dataset("fake-news-UFG/FactChecksbr")
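After loading, it can help to check which splits and columns are available before writing any processing code. The helper below is a minimal sketch: `describe_splits` is a hypothetical name, not part of the `datasets` library, and it only relies on the `num_rows` and `column_names` attributes that `datasets.Dataset` objects expose. If the repository defines several configurations, `load_dataset` may also require a configuration name as its second argument; check the dataset card for the available names.

```python
def describe_splits(dataset_dict):
    """Summarize a DatasetDict-like mapping.

    Returns {split_name: (number_of_rows, list_of_column_names)}.
    `describe_splits` is an illustrative helper, not a library function.
    """
    return {
        name: (split.num_rows, list(split.column_names))
        for name, split in dataset_dict.items()
    }


# Usage (needs network access and the `datasets` package installed):
# from datasets import load_dataset
# data = load_dataset("fake-news-UFG/FactChecksbr")
# for name, (rows, columns) in describe_splits(data).items():
#     print(name, rows, columns)
```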
We also provide the raw versions of Fake.br, FakeRecogna, Central de Fatos, and FakeNewsSet.
Review URLs are tagged with their review ID.
- The generation notebook and EDA are located at process.ipynb.
- Builder scripts for Dataset Hub are located at builders/.
There are 23,467 sources in total, of which 20,028 are unique. The biggest overlap is between "FakeRecogna" and "Central de Fatos". No source is common to all datasets.
Of the 3,303 duplicated sources, we excluded 130 contradictory examples, in which one dataset labels the source as "fake" while another labels it as "not fake".
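The deduplication described above can be sketched as follows. This is an illustration rather than the code in process.ipynb: the `url` and `label` field names, the `drop_contradictory` function, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict


def drop_contradictory(records, key="url", label="label"):
    """Keep one record per source and drop sources whose labels disagree.

    `key` and `label` are assumed field names; adjust them to the real schema.
    Returns (kept_records, contradictory_sources).
    """
    labels_by_source = defaultdict(set)
    first_record = {}
    for rec in records:
        labels_by_source[rec[key]].add(rec[label])
        first_record.setdefault(rec[key], rec)  # remember the first occurrence

    kept = [
        rec for src, rec in first_record.items()
        if len(labels_by_source[src]) == 1
    ]
    contradictory = [
        src for src, labels in labels_by_source.items() if len(labels) > 1
    ]
    return kept, contradictory


# Made-up records for illustration only.
records = [
    {"url": "https://example.org/a", "label": "fake", "dataset": "FakeRecogna"},
    {"url": "https://example.org/a", "label": "fake", "dataset": "Central de Fatos"},
    {"url": "https://example.org/b", "label": "fake", "dataset": "FakeRecogna"},
    {"url": "https://example.org/b", "label": "not fake", "dataset": "Central de Fatos"},
    {"url": "https://example.org/c", "label": "not fake", "dataset": "FakeNewsSet"},
]
kept, contradictory = drop_contradictory(records)
print(len(kept), contradictory)
```

Here the duplicate with agreeing labels is collapsed into a single record, while the source with conflicting labels is removed entirely.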
If you have evaluated a model on any of these datasets, please feel free to open a pull request. 😄
Dataset | Model | Accuracy | Precision | Recall | macro-F1 | URL |
---|---|---|---|---|---|---|
Fake.br | BERTimbau | 99.22% | - | - | - | repo |
Fake.br | GloVe 100-600D - HAN | 97% | - | - | - | paper |
Fake.br | BERTimbau + Logistic Regression | 96.14% | 96.40% | 95.49% | 96.13% | paper |
Fake.br | BoW | 96% | - | - | - | paper |
Fake.br | GloVe 100D + BiLSTM | 93.56% | - | - | - | repo |
Fake.br | TfidfVectorizer | 92.85% | 92.19% | 93.36% | - | repo |
Fake.br | BoW | 89% | 89% | 89% | 89% | paper |
Fake.br | BoW + MLP | 88.65% | - | - | - | repo |
FakeNewsSetGen | Detective | 97.93% | 97.93% | - | - | repo |
Fact-check_tweet | XLM-R | 84.08% | - | - | 83.63% | paper |
FakeRecogna | MLP + BoW | 93.1% | 93.1% | 93.1% | 93.0% | repo |
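When contributing a new row, the four metric columns can be computed from gold and predicted labels as in the sketch below. It assumes macro averaging for precision and recall as well as for F1; the table does not state how the listed works averaged precision and recall, so treat that as an assumption. `macro_scores` is a hypothetical helper written in plain Python, and the labels are made up.

```python
def macro_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for lab in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == lab and p == lab)
        fp = sum(1 for t, p in pairs if t != lab and p == lab)
        fn = sum(1 for t, p in pairs if t == lab and p != lab)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Made-up labels for illustration only.
y_true = ["fake", "fake", "true", "true"]
y_pred = ["fake", "true", "true", "true"]
scores = macro_scores(y_true, y_pred)
print({name: f"{value:.2%}" for name, value in scores.items()})
```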
@misc{FactChecksbr,
author = {R. S. Gomes, Juliana},
title = {FactChecks.br},
url = {https://github.com/fake-news-UFG/FactChecks.br},
doi = {10.57967/hf/1016},
}
This work has been supported by FAPEG (Fundação de Amparo à Pesquisa do Estado de Goiás) and ANATEL (Agência Nacional de Telecomunicações).