The dataset comprises transcripts of speeches delivered by members of the 🇦🇱 Albanian Assembly during parliamentary sessions, spanning from 2013.

KushtrimVisoka/Albania-Parliament-Transcriptions


Albania-Parliament-Transcriptions

The dataset comprises transcripts of speeches delivered by members of the Albanian Assembly during parliamentary sessions from 2013 onward. The goal of this repository is to provide a resource for researchers and professionals interested in natural language processing or political discourse analysis.

Data source

The dataset was compiled from publicly available transcripts published on the current and former official websites of the Albanian Assembly (https://parlament.al/).

Data preparation

The dataset was compiled by downloading PDF files and converting them to text using OCR. The resulting text was then cleaned to fix punctuation and spelling errors. Because PDF-to-text conversion is error-prone, the dataset may still contain typos and other errors, and it is therefore provided "as is".

To do

  • Conduct additional quality assurance checks to identify and correct any remaining errors in the dataset.
  • Add a column for the party of the speaker.

Dataset structure

The dataset contains the following fields: text, speaker, date, id, num_tokens.
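
As a quick sanity check on the fields listed above, the following minimal sketch reports which documented fields are missing from a record. Only the five field names come from this README; the helper `missing_fields` and the example record (including its value types and date format) are hypothetical and invented for illustration.

```python
from typing import Any, List, Mapping

# Field names as listed in this README; their types are not documented here.
EXPECTED_FIELDS = ("text", "speaker", "date", "id", "num_tokens")


def missing_fields(record: Mapping[str, Any]) -> List[str]:
    """Return the documented field names that are absent from a record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


# Hypothetical record: values, types, and date format are assumptions.
example = {
    "text": "Faleminderit, zoti Kryetar.",
    "speaker": "Example Speaker",
    "date": "2013-09-09",
    "id": 1,
    "num_tokens": 4,
}

incomplete = {"text": "Faleminderit."}
```

Calling `missing_fields(example)` returns an empty list, while `missing_fields(incomplete)` lists the four fields that the partial record lacks.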

Usage

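# Requires the Hugging Face `datasets` package (pip install datasets)
from datasets import load_dataset

# Download the dataset from the Hugging Face Hub
dataset = load_dataset('Kushtrim/Albania-Parliament-Transcriptions')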
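
Once the data is loaded, a simple first analysis is to total the `num_tokens` field per speaker. The sketch below works on any iterable of row dictionaries, so it runs without a download. The helper name `tokens_per_speaker` and the sample rows are hypothetical, and it assumes `num_tokens` holds a value convertible to an integer.

```python
from collections import Counter
from typing import Any, Iterable, Mapping


def tokens_per_speaker(records: Iterable[Mapping[str, Any]]) -> Counter:
    """Sum the num_tokens field for each distinct speaker."""
    totals: Counter = Counter()
    for record in records:
        # Assumption: num_tokens is an integer or an integer-like value.
        totals[record["speaker"]] += int(record["num_tokens"])
    return totals


# Hypothetical rows, invented for illustration only.
sample = [
    {"speaker": "Speaker A", "num_tokens": 120},
    {"speaker": "Speaker B", "num_tokens": 45},
    {"speaker": "Speaker A", "num_tokens": 30},
]

totals = tokens_per_speaker(sample)
```

With the real data, something like `tokens_per_speaker(dataset["train"])` should work, assuming the dataset exposes a split named `train`; check the keys of the loaded `dataset` object first.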

Citation

If you use this dataset in your research, please consider citing this repository.
