This repository was archived by the owner on May 4, 2021. It is now read-only.

Commit 2dd960b

Merge branch 'master' into dev
2 parents 4be52d0 + 54bbd3e commit 2dd960b

1 file changed: 5 additions, 0 deletions
monolingual/README.md

@@ -0,0 +1,5 @@
+For monolingual [Common Crawl](http://commoncrawl.org) data and code to process it please refer to these resources:
+* [University of Edinburgh N-gram site](http://statmt.org/ngrams)
+* Code to process corpora: https://github.com/kpu/preprocess
+* Code to produce raw monolingual files from CommonCrawl: https://github.com/treigerm/CommonCrawlProcessing
+* Alternative monolingual data extraction under development in ParaCrawl project: https://github.com/paracrawl/extractor
