This repository was archived by the owner on May 4, 2021. It is now read-only.

Commit 54bbd3e

Adding monolingual data links
1 parent 3ec4526 commit 54bbd3e

1 file changed: +5 -0 lines changed

monolingual/README.md

@@ -0,0 +1,5 @@
+For monolingual [Common Crawl](http://commoncrawl.org) data and code to process it please refer to these resources:
+* [University of Edinburgh N-gram site](http://statmt.org/ngrams)
+* Code to process corpora: https://github.com/kpu/preprocess
+* Code to produce raw monolingual files from CommonCrawl: https://github.com/treigerm/CommonCrawlProcessing
+* Alternative monolingual data extraction under development in ParaCrawl project: https://github.com/paracrawl/extractor
