
Commit c429c86

committed
remove unnecessary text
1 parent f55ca53 commit c429c86

File tree

1 file changed: +0 −86 lines


clustering_metrics/metrics.py

@@ -1,92 +1,6 @@
  # -*- coding: utf-8 -*-

  """
- Motivation
- ----------
-
- The motivation behind this re-implementation of some clustering metrics is
- to avoid the high memory usage of the equivalent methods in Scikit-Learn.
- Using sparse dictionary maps avoids storing co-incidence matrices in memory,
- leading to acceptable performance in multiprocessing environments and on very
- large data sets.
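The sparse-dictionary idea described in the removed docstring can be sketched as follows. This is a minimal illustration, not the module's actual API; `sparse_contingency` is a hypothetical name:

```python
from collections import defaultdict

def sparse_contingency(labels_true, labels_pred):
    """Build a contingency table as a dict mapping (row, col) -> count.

    Only non-zero cells are stored, so memory scales with the number of
    occupied cells rather than with n_classes * n_clusters as a dense
    co-incidence matrix would.
    """
    table = defaultdict(int)
    for t, p in zip(labels_true, labels_pred):
        table[(t, p)] += 1
    return table

# Two clusterings of four points; only three cells are ever stored.
counts = sparse_contingency([0, 0, 1, 1], [0, 1, 1, 1])
```

Association metrics can then iterate over `counts.items()` and the row/column marginal sums without ever materializing the dense matrix.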
-
- A side goal was to investigate different association metrics with the aim of
- applying them to the evaluation of clusterings in semi-supervised learning
- and to feature selection in supervised learning.
-
- Finally, I was interested in the applicability of different association
- metrics to different types of experimental design. At present, there seems to
- be both (1) a lot of confusion about the appropriateness of different
- metrics, and (2) relatively little attention paid to the type of experimental
- design used. I believe that (1) stems, at least partially, from (2), and that
- different types of experiments call for different categories of metrics.
-
- Contingency Tables and Experimental Design
- ------------------------------------------
-
- Consider studies that deal with two variables whose respective realizations
- can be represented as rows and columns in a table. Roughly adhering to the
- terminology proposed in [1]_, we distinguish four types of experimental
- design, all involving contingency tables.
-
- ========= =================================================
- Model O   all margins and totals are variable
- Model I   only the grand total is fixed
- Model II  one margin (either row or column totals) is fixed
- Model III both margins are fixed
- ========= =================================================
-
- Model O is rarely employed in practice because researchers almost always have
- some rough total number of samples in mind before they begin the actual
- measuring. However, a Model O situation might occur when the grand total is
- not up to the researchers to fix, and so they are forced to treat it as a
- random variable. An example would be astronomy research that tests a
- hypothesis about a generalizable property, such as dark matter content, by
- looking at all galaxies in the Local Group; the researchers obviously don't
- get to choose ahead of time how many galaxies there are near ours.
-
- Model I and Model II studies are the most common, and most of the confusion
- arises from mistaking one for the other. In psychology, interrater agreement
- is an example of the Model I approach. A replication study, if performed by
- the original author, is a Model I study, but if performed by another group of
- researchers, it becomes a Model II study.
-
- Fisher's classic tea-tasting example is a Model III study [2]_. The key
- difference from a Model II study is that the subject was asked to call four
- cups as prepared by one method and four by the other. The subject was not
- free to say, for example, that none of the cups had been prepared by adding
- milk first. The hypergeometric distribution used in the subsequent Fisher's
- exact test shares the experiment's assumption that both row and column
- totals are fixed.
-
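The hypergeometric computation behind Fisher's exact test, for the fixed-margins case described above, can be sketched in pure Python. This is a one-sided illustration under stated assumptions; `fisher_exact_greater` is a hypothetical helper, not part of this module (SciPy's `scipy.stats.fisher_exact` is the standard implementation):

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    With both row and column totals fixed, the upper-left cell follows a
    hypergeometric distribution under the null; return P(cell >= a).
    """
    n = a + b + c + d
    row1, col1 = a + b, a + c
    denom = comb(n, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(row1, x) * comb(n - row1, col1 - x) / denom
    return p

# Lady-tasting-tea layout: 3 of the 4 "milk first" cups called correctly.
p = fisher_exact_greater(3, 1, 1, 3)  # 17/70, about 0.243
```

Because both margins are fixed at 4 by the design, the sum runs over only the two feasible tables at least as extreme as the observed one.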
- Choosing an Association Metric
- ------------------------------
-
- Given the types of experimental design listed above, some metrics appear more
- appropriate than others. For example, two-way correlation coefficients appear
- inappropriate for Model II studies, where their respective regression
- components seem better suited to judging association.
-
- Additionally, if there is an implied causal relationship, one-sided measures
- might be preferred. For example, when performing feature selection, it seems
- logical to measure the influence of the features on the class label, not the
- other way around.
-
- Using Monte Carlo methods, it should be possible to test the validity of the
- two propositions above, as well as to visualize the effect of the assumptions
- made.
-
- References
- ----------
-
- .. [1] `Sokal, R. R., & Rohlf, F. J. (2012). Biometry (4th edn). pp. 742-744.
-        <http://www.amazon.com/dp/0716786044>`_
-
- .. [2] `Wikipedia entry on Fisher's "Lady Tasting Tea" experiment
-        <https://en.wikipedia.org/wiki/Lady_tasting_tea>`_
-
  """

  import warnings
