API reference: FileIO V0.3a
`FileIO` is located in `src/fileIO.py`. It contains functions used to handle file operations in the Aligner.
The description here is of module version 0.2a.
- Support for dataset formats including `bitext` and `tritext` has been dropped. All datasets now use `Dataset`.
Parameters:
- `result`: `Alignment`, detailed description of this format
- `fileName`: `str`, the file to export to
Parameters:
- `fFiles`: `list` of `str`, files for the source language to read. Order: original text, POS tag.
- `eFiles`: `list` of `str`, files for the target language to read. Order: original text, POS tag.
- `alignmentFile`: `str`, optional, the alignment file to read
- `linesToLoad`: `int`, the number of lines to read

Return:
- `Dataset`, detail of this format.
Parameters:
- `fileName`: `str`, the alignment file to read
- `linesToLoad`: `int`, the number of lines to read

Return:
- `GoldAlignment`, detail of this format.
UTF-8 text files. Each line contains one segmented sentence whose words are separated by spaces. Each file holds one language.
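As an illustration of the text file format, here is a minimal Python sketch that reads such a file into a list of word lists. The function name `read_sentences` and its `lines_to_load` parameter are hypothetical and are not part of `FileIO`; the sketch only assumes the format described above.

```python
def read_sentences(file_name, lines_to_load=None):
    """Read a UTF-8 text file holding one segmented sentence per line.

    Hypothetical helper, not part of FileIO. Returns a list of sentences,
    each a list of words. If lines_to_load is given, at most that many
    lines are read.
    """
    sentences = []
    with open(file_name, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if lines_to_load is not None and i >= lines_to_load:
                break
            # str.split() with no argument splits on any run of whitespace
            # and drops the trailing newline.
            sentences.append(line.split())
    return sentences
```

Reading the source and target files with the same `lines_to_load` value keeps the two sides aligned line by line.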
UTF-8 text files. Each line corresponds to one sentence. The word alignments within a sentence are separated by spaces. Each alignment is represented in one of the following formats:
- `"NN-MM"`, where `NN` and `MM` are integers, means that there is a certain alignment between the `NN`th word of the source sentence and the `MM`th word of the target sentence. In addition, `MM` may take the form `"M1,M2,M3,..."`, which means that there are certain alignments between the `NN`th word of the source sentence and each of the `Mi`th words of the target sentence.
- `"NN?MM"`, where `NN` and `MM` are integers, means that there is a probable alignment between the `NN`th word of the source sentence and the `MM`th word of the target sentence. In addition, `MM` may take the form `"M1,M2,M3,..."`, which means that there are probable alignments between the `NN`th word of the source sentence and each of the `Mi`th words of the target sentence.
- `"NN-MM-TT"`, where `NN` and `MM` are integers and `TT` is a `str` representing the type of the alignment. It means that there is a certain alignment between the `NN`th word of the source sentence and the `MM`th word of the target sentence, both of which are of type `TT`.
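
The three alignment formats can be parsed with a short routine. The following is a minimal sketch, not the parser used in `src/fileIO.py`: the names `parse_alignment_token` and `parse_alignment_line` and the tuple layout `(source, target, kind, type)` are assumptions made for illustration. Indices are returned exactly as written in the file, and a type suffix is accepted only on certain alignments, since that is the only typed form described above.

```python
import re

# "NN-MM", "NN-M1,M2,..." and "NN-MM-TT" (certain, optionally typed)
CERTAIN_RE = re.compile(r"(\d+)-(\d+(?:,\d+)*)(?:-(.+))?")
# "NN?MM" and "NN?M1,M2,..." (probable)
PROBABLE_RE = re.compile(r"(\d+)\?(\d+(?:,\d+)*)")


def parse_alignment_token(token):
    """Parse one alignment entry into (source, target, kind, type) tuples.

    Hypothetical helper: kind is "certain" or "probable"; type is the TT
    string, or None when the entry carries no type.
    """
    m = CERTAIN_RE.fullmatch(token)
    if m:
        source, targets, tag = m.group(1), m.group(2), m.group(3)
        kind = "certain"
    else:
        m = PROBABLE_RE.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognised alignment entry: {token!r}")
        source, targets, tag = m.group(1), m.group(2), None
        kind = "probable"
    # A comma-separated target list expands to one tuple per target word.
    return [(int(source), int(t), kind, tag) for t in targets.split(",")]


def parse_alignment_line(line):
    """Parse one line of an alignment file (entries separated by spaces)."""
    links = []
    for token in line.split():
        links.extend(parse_alignment_token(token))
    return links
```

For example, `parse_alignment_line("1-1 2?3,4")` yields one certain link `(1, 1, "certain", None)` followed by two probable links from source word 2 to target words 3 and 4.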