Tagging errors (not in original Hunpos) #29

Open
heatherleaf opened this issue Mar 26, 2021 · 9 comments

@heatherleaf

This version of hunpos behaves differently than the original compiled binary:

$ hunpos-tag suc3_suc-tags_default-setting_utf8.model < example.txt
jag	PN.UTR.SIN.DEF.SUB	
och	UO	
du	PN.UTR.SIN.DEF.SUB

Original version (downloaded from https://code.google.com/archive/p/hunpos/downloads):

$ hunpos-tag suc3_suc-tags_default-setting_utf8.model < example.txt
jag	PN.UTR.SIN.DEF.SUB	
och	KN	
du	PN.UTR.SIN.DEF.SUB	

The original version gives the correct output: "och" is the most common Swedish conjunction (KN) and not a foreign word (UO). The language model is available from here: https://github.com/spraakbanken/sparv-models/raw/master/hunpos/suc3_suc-tags_default-setting_utf8.model (beware, the model is 14MB)

I compiled it on both macOS Catalina and Devuan Linux, and it behaves the same on both platforms (i.e., it gives the wrong POS tag for "och").

Note: there are problems with at least the following common Swedish conjunctions:

  • "jag eller du": eller gets tagged as UO instead of KN
  • "jag som du": som gets tagged as HA instead of KN
  • "jag om du": om gets tagged as PL instead of SN (subjunction)
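
The phrases above can be checked against two builds in one go. This is only a sketch: it assumes hunpos-tag reads one token per line with an empty line ending each sentence, and the function names and the paths in the usage line are hypothetical.

```shell
# Turn a space-separated phrase into tagger input:
# one token per line, then an empty line to end the sentence.
phrase_to_tokens() {
    printf '%s\n' "$1" | tr -s ' ' '\n'
    printf '\n'
}

# Tag one phrase with two taggers and print the lines that differ.
# $1 = reference tagger, $2 = rebuilt tagger, $3 = model file, $4 = phrase
# Returns 0 when both outputs are identical, non-zero otherwise.
compare_taggers() {
    ref_out=$(mktemp)
    new_out=$(mktemp)
    phrase_to_tokens "$4" | "$1" "$3" > "$ref_out"
    phrase_to_tokens "$4" | "$2" "$3" > "$new_out"
    status=0
    diff "$ref_out" "$new_out" || status=$?
    rm -f "$ref_out" "$new_out"
    return "$status"
}
```

Usage would look something like `compare_taggers ./hunpos-tag-original ./hunpos-tag suc3_suc-tags_default-setting_utf8.model "jag eller du"`, where both binary paths are placeholders.
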
@Nakilon

Nakilon commented Sep 22, 2021

I had been using the old v1.0 binary until it stopped working on macOS, so I rebuilt it from source and noticed that in "looked up the date" the word "date" is now tagged as a verb. I then compared the new output against a cache I had preserved from the previous version:

diff --git a/2.txt b/1.txt
index 573cf80..5c3aa7e 100644
--- a/2.txt
+++ b/1.txt
@@ -1,19 +1,20 @@
-(	NNS	
-remember	VBP	
+(	VBZ	
+remember	VB	
 =	SYM	
-I	PRP	
+I	NNP	
 looked	VBD	
-up	RP	
-the	DT	
-date	NN	
-in	IN	
+up	IN	
+the	VBP	
+date	VB	
+in	RP	
 the	DT	
 logs	NNS	
-and	CC	
-checked	VBD	
+and	NNP	
+checked	VBN	
 which	WDT	
-comic	NN	
+comic	JJ	
 I	PRP	
-referred	VBD	
-to	TO	
+referred	VBN	
+to	JJ	
 )	VB	
+

"the date" -- "VBP VB" -- is that correct? Maybe I'm supposed to take some updated model file from somewhere?

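For longer texts, a per-token summary of changed tags can be easier to read than a raw diff. The sketch below assumes both files hold the same tokens in the same order, one `token<TAB>tag` pair per line as in the outputs above; `tag_changes` and the file names are hypothetical.

```shell
# Print every token whose tag differs between two tagged files.
# $1 = output of the old binary, $2 = output of the new binary
# Empty lines (sentence breaks) are skipped because they have no fields.
tag_changes() {
    paste "$1" "$2" |
        awk 'NF >= 4 && $2 != $4 { print $1, $2, "->", $4 }'
}
```

For example, `tag_changes old.txt new.txt` prints one line per changed token in the form `token OLDTAG -> NEWTAG`.
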
@giuliopaci
Contributor

Hi all!

Do you know the exact source code corresponding to the old binary?
Can you share the model files that you are using?

@Nakilon

Nakilon commented Sep 22, 2021

I have no idea about the source code of the old binary; I just downloaded it from https://code.google.com/archive/p/hunpos/downloads
md5 : 4baee5cc5d9d3b0c3c691e375616d2a9

md5 en_wsj.model : f666dc61f7cbf3cc69366010a4e1f29f
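
Anyone trying to reproduce this can confirm they have the same files by comparing checksums. A minimal sketch, assuming either `md5sum` (GNU coreutils) or the macOS/BSD `md5` tool is installed; the function names are hypothetical.

```shell
# Print the MD5 hex digest of a file with whichever tool is available.
file_md5() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | awk '{ print $1 }'
    else
        md5 -q "$1"
    fi
}

# Succeed only if the file's digest equals the expected one.
# $1 = file, $2 = expected MD5 hex digest
check_md5() {
    [ "$(file_md5 "$1")" = "$2" ]
}
```

For the model mentioned above this would be `check_md5 en_wsj.model f666dc61f7cbf3cc69366010a4e1f29f && echo "same model"`.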

Maybe the upload date gives a hint about the code version.

The new one was compiled without any issue following the instructions, after brew install ocaml.

@giuliopaci
Contributor

I am able to reproduce the issue with that model.
Yesterday I had a quick look at it and was reminded of issue #21.
Maybe we are having a similar issue. I will try to set up an old OCaml environment (e.g., 3.10.x, which was the recommended version for compiling Hunpos back when the binaries on Google Code were built) as soon as I have some time, and check whether reverting to that environment improves the situation.

@giuliopaci
Contributor

giuliopaci commented Sep 25, 2021

Indeed, compiling the current source code with an older OCaml version "solved" the issue (it works up to 3.12.1 and breaks starting from 4.00.0).

I guess that #21 was only partially addressed and further investigation is needed. I do not know when I will have time to investigate the issue.

In the meantime, you can either retrain the model or compile with OCaml 3.12.1.

Obviously, anyone willing to investigate and solve the issue is welcome. :-)

@Nakilon

Nakilon commented Sep 25, 2021

Oh, cool. I just wonder how to install a specific version of OCaml on macOS. I used brew install ocaml, and Homebrew does not really provide a way to install old versions of formulas. Is there an OCaml installation manager?

@giuliopaci
Contributor

giuliopaci commented Sep 25, 2021

Yes, it is called opam.

From https://opam.ocaml.org/doc/Install.html I can see:

brew install gpatch
brew install opam

Once you have opam installed, you can install specific compiler versions by following the instructions at https://ocaml.org/docs/install.html.

On a clean setup it should be something like:

# environment setup (Required only the first time you use opam):
opam init
eval $(opam env)

# install given version of the compiler
opam switch create 3.12.1

# activate the current opam switch (i.e., the one created with the opam switch command above); run this in every shell where you want to use this opam environment
eval $(opam env)

# check you got what you want
which ocaml
ocaml -version
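
Since the mis-tagging was only observed with compilers from 4.00.0 onwards, a small guard before building can catch an accidentally active newer switch. This is a sketch: `ocaml_predates_4` is a hypothetical helper, and it relies on `ocamlc -version` printing only the bare version number.

```shell
# Succeed if the given OCaml version string has a major version below 4,
# e.g. "3.12.1" passes and "4.00.0" does not.
ocaml_predates_4() {
    major=${1%%.*}   # keep everything before the first dot
    [ "$major" -lt 4 ]
}
```

It could be used as `ocaml_predates_4 "$(ocamlc -version)" || echo "warning: OCaml 4.x or later is active"`.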

@Nakilon

Nakilon commented Sep 29, 2021

Hmm, v4 is installed by default; the latest v3 available is 3.12.1, and building it fails:

# cc -I../byterun -DCAML_NAME_SPACE -DNATIVE_CODE -DTARGET_amd64 -DSYS_macosx  -O -D_FILE_OFFSET_BITS=64 -D_REENTRANT   -c -o startup.o startup.c
# startup.c:161:3: error: implicit declaration of function 'caml_debugger_init' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
#   caml_debugger_init (); /* force debugger.o stub to be linked */
#   ^
# 1 error generated.
# make[3]: *** [startup.o] Error 1
# make[2]: *** [makeruntimeopt] Error 2
# make[1]: *** [opt-core] Error 2
# make: *** [world.opt] Error 2

<><> Error report <><><><><><><><><><><><><><><><><><><><><><><><><><><><><>  🐫 
┌─ The following actions failed
│ λ build ocaml-base-compiler 3.12.1

@giuliopaci
Contributor

You are right, I tested 3.12.1 and not 3.12.2. I changed my comments above to reflect this.

As for the error you are experiencing, maybe you can open an issue with either opam or OCaml. I guess it should be possible to compile by setting some C compiler flags so that the build does not fail on this error. Removing the -Werror C compiler flag is probably enough. Unfortunately, I do not know how to do that with opam. 😅
