
Best coverage subsets for three varying numbers of datasets #18

Open
LinguList opened this issue Feb 19, 2018 · 1 comment

Comments

@LinguList
Contributor

If we follow the plan to offer three different networks, namely one high-coverage network with many languages and, say, 300 concepts, one with fewer languages but more concepts, say 600, and one with the maximum we can get, we need to use the coverage code in lingpy to account for this.

The code itself is straightforward, but the question is: do we actually still need this, or should we rather just take the full dump of 2000 concepts? Given that we know the frequency of each concept in CLICS, we can easily visualize this by showing it as node size. And the communities still make sense; so far, we do not suffer from skewed data...
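For illustration, a minimal, self-contained sketch of the coverage-subset idea (not lingpy's actual coverage code; it assumes a simplified wordlist that maps each language to the set of concepts attested in it, and `wl` is hypothetical):

```python
# Sketch: for a given subset size, pick the concepts attested
# in the largest number of languages.
from collections import Counter

def concept_frequencies(wordlist):
    """Count in how many languages each concept has a counterpart.

    `wordlist` is assumed to map language -> set of attested concepts
    (a simplification of a real lingpy wordlist).
    """
    freqs = Counter()
    for concepts in wordlist.values():
        freqs.update(concepts)
    return freqs

def best_coverage_subset(wordlist, size):
    """Greedy choice: the `size` most frequently attested concepts."""
    freqs = concept_frequencies(wordlist)
    return [concept for concept, _ in freqs.most_common(size)]

# The three networks discussed above, assuming `wl` is such a mapping:
# subsets = {n: best_coverage_subset(wl, n) for n in (300, 600, 2000)}
```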

@xrotwang
Contributor

I think this could be solved by adding some sort of frequency measure (the percentage of languages having a counterpart for a concept) to the concept labels, or by using the frequency for bubble size in the visualization.
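As a sketch of what that measure could look like (illustrative names, assuming the same language-to-concept-set mapping as in the sketch above; not the actual CLICS visualization code):

```python
from collections import Counter

def concept_coverage_percent(wordlist):
    """Percentage of languages with a counterpart for each concept.

    `wordlist` is assumed to map language -> set of attested concepts.
    """
    counts = Counter()
    for concepts in wordlist.values():
        counts.update(concepts)
    return {c: 100 * n / len(wordlist) for c, n in counts.items()}

# Attach the measure to a concept label, or use it for bubble size:
# coverage = concept_coverage_percent(wl)
# label = f"{concept} ({coverage[concept]:.0f}%)"
# bubble_size = coverage[concept]
```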

@xrotwang xrotwang transferred this issue from clics/clics2 May 22, 2019