
Best coverage subsets for three varying numbers of datasets #18

Open
LinguList opened this issue Feb 19, 2018 · 1 comment

Comments

@LinguList
Contributor

If we follow the plan to offer three different networks, namely one high-coverage network with many languages and, say, 300 concepts, one with fewer languages but more concepts, say 600, and one with the maximum we can get, we need to use the coverage code in lingpy to account for this.

The code itself is straightforward, but the question is: do we actually still need this, or should we rather just take the full dump of 2000 concepts? Given that we know the frequency of each concept in CLICS, we can easily visualize this by showing it as node size. And the communities still make sense; so far, we do not suffer from skewed data...
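For illustration, a minimal, self-contained sketch of the coverage-subset idea (not lingpy's actual coverage code; it assumes a simplified wordlist that maps each language to the set of concepts attested in it, and `wl` is hypothetical):

```python
# Sketch: for a given subset size, pick the concepts attested
# in the largest number of languages.
from collections import Counter

def concept_frequencies(wordlist):
    """Count in how many languages each concept has a counterpart.

    `wordlist` is assumed to map language -> set of attested concepts
    (a simplification of a real lingpy wordlist).
    """
    freqs = Counter()
    for concepts in wordlist.values():
        freqs.update(concepts)
    return freqs

def best_coverage_subset(wordlist, size):
    """Greedy choice: the `size` most frequently attested concepts."""
    freqs = concept_frequencies(wordlist)
    return [concept for concept, _ in freqs.most_common(size)]

# The three networks discussed above, assuming `wl` is such a mapping:
# subsets = {n: best_coverage_subset(wl, n) for n in (300, 600, 2000)}
```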

@xrotwang
Contributor

I think this could be solved by adding some sort of frequency measure (the percentage of languages having a counterpart for a concept) to the concept labels, or by using the frequency for bubble size in the visualization.
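As a sketch of what that measure could look like (illustrative names, assuming the same language-to-concept-set mapping as in the sketch above; not the actual CLICS visualization code):

```python
from collections import Counter

def concept_coverage_percent(wordlist):
    """Percentage of languages with a counterpart for each concept.

    `wordlist` is assumed to map language -> set of attested concepts.
    """
    counts = Counter()
    for concepts in wordlist.values():
        counts.update(concepts)
    return {c: 100 * n / len(wordlist) for c, n in counts.items()}

# Attach the measure to a concept label, or use it for bubble size:
# coverage = concept_coverage_percent(wl)
# label = f"{concept} ({coverage[concept]:.0f}%)"
# bubble_size = coverage[concept]
```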

@xrotwang xrotwang transferred this issue from clics/clics2 May 22, 2019