Biswajit2902/SimpleWikiParser

SimpleWikiParser

A Simplified Wiki Data Parser

Installation

pip install simple-wikiparser

Usage

from wikiparser.core import WikiMediaDumpParser

# initialise Parser for a language (say Hindi)
wiki_dump_parser = WikiMediaDumpParser(language="Hindi")

# parse
wiki_dump_parser.parse()

# export
wiki_dump_parser.export_hf_dataset("/path/to/data.jsonl", "dataset_name")
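To inspect the exported file afterwards, something like the following sketch may help. It assumes, based only on the `.jsonl` extension in the example path, that the export writes one JSON object per line; the helper name `read_jsonl` is hypothetical and not part of this package, and the fields in each record are not documented here.

```python
import json
from pathlib import Path
from typing import Iterator


def read_jsonl(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file.

    Hypothetical helper, not part of simple-wikiparser. Assumes each
    line of the file is a standalone JSON object.
    """
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)
```

Iterating with `for record in read_jsonl("/path/to/data.jsonl"): print(record)` would then show each exported record in turn, which is a quick way to check which fields the export actually contains.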
