KMeansClustering no longer works with custom objects. #15

Open
exhuma opened this issue Aug 22, 2014 · 6 comments

@exhuma
Owner

exhuma commented Aug 22, 2014

The following code gives an error:

from cluster import KMeansClustering
from os import urandom
from pprint import pprint
from random import randint


class ObjectWithMetadata(object):

    def __init__(self, value):
        self.value = value
        self.uid = urandom(10).encode('base64').strip()

    def __repr__(self):
        return 'ObjectWithMetadata({!r}, {!r})'.format(self.value, self.uid)


data = [ObjectWithMetadata(randint(0, 1000))
        for _ in range(200)]

cl = KMeansClustering(data, lambda x, y: float(abs(x.value-y.value)))
clustered = cl.getclusters(10)
pprint(clustered)
print(len(clustered))

The error:

Traceback (most recent call last):
  File "metadata.py", line 21, in <module>
    clustered = cl.getclusters(10)
  File "/home/exhuma/work/github/python-cluster/cluster/method/kmeans.py", line 109, in getclusters
    res = self.assign_item(item, cluster)
  File "/home/exhuma/work/github/python-cluster/cluster/method/kmeans.py", line 124, in assign_item
    if self.distance(item, centroid(cluster)) < self.distance(
  File "/home/exhuma/work/github/python-cluster/cluster/util.py", line 175, in centroid
    for i in range(len(data[0])):
TypeError: object of type 'ObjectWithMetadata' has no len()
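
The last frame of the traceback shows `centroid` calling `len(data[0])`, which suggests it expects every item to be a sequence of coordinates. The following is a minimal sketch of that assumption, not the library's actual code: `centroid_of_tuples` and `Plain` are hypothetical names used only to reproduce the same kind of failure in isolation.

```python
def centroid_of_tuples(data):
    """Component-wise mean of equal-length tuples (hypothetical helper)."""
    # This line assumes each item supports len(); plain objects do not.
    dimensions = len(data[0])
    return tuple(
        sum(item[i] for item in data) / float(len(data))
        for i in range(dimensions)
    )


class Plain(object):
    """Stand-in for a custom object that wraps a single value."""

    def __init__(self, value):
        self.value = value


# Tuples work because they have a length and can be indexed.
print(centroid_of_tuples([(1, 2), (3, 4)]))

# Custom objects fail at the len() call, as in the traceback above.
try:
    centroid_of_tuples([Plain(1), Plain(3)])
except TypeError as exc:
    print("TypeError:", exc)
```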
@exhuma
Owner Author

exhuma commented Aug 24, 2014

With the current implementation of KMeansClustering, solving this would get quite messy. Running the K-Means method requires three functions that strongly depend on the nature of the items:

  • distance
  • equality
  • centroid

It would make more sense to require non-tuple data elements to subclass an ABC that declares the above methods as abstract. Distance could be implemented as __sub__ and equality as __eq__.
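
A minimal sketch of what such an ABC could look like, assuming Python 3.4 or later. Nothing here exists in the package: `ClusterItem`, `ValueItem` and the `centroid` classmethod are hypothetical names chosen for illustration, and making `centroid` a classmethod is just one possible design.

```python
from abc import ABC, abstractmethod


class ClusterItem(ABC):
    """Hypothetical base class for non-tuple data elements."""

    @abstractmethod
    def __sub__(self, other):
        """Return the distance between self and other as a float."""

    @abstractmethod
    def __eq__(self, other):
        """Return True if both items count as the same point."""

    @classmethod
    @abstractmethod
    def centroid(cls, items):
        """Return a new item representing the centre of items."""


class ValueItem(ClusterItem):
    """Example subclass wrapping one numeric value plus metadata."""

    def __init__(self, value, uid=None):
        self.value = value
        self.uid = uid

    def __sub__(self, other):
        return float(abs(self.value - other.value))

    def __eq__(self, other):
        return isinstance(other, ValueItem) and self.value == other.value

    @classmethod
    def centroid(cls, items):
        # The centroid is a synthetic item, so it carries no uid.
        return cls(sum(item.value for item in items) / float(len(items)))


a, b = ValueItem(10, uid="a"), ValueItem(30, uid="b")
print(a - b)                             # distance between the two items
print(ValueItem.centroid([a, b]).value)  # mean of the wrapped values
```

With this shape, a clustering method would only need `item - other`, `item == other` and `type(item).centroid(items)`, and would no longer have to inspect the items itself.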

I will follow semantic versioning, and as this would change the external API, I will postpone this to 2.0.

exhuma added a commit that referenced this issue Aug 24, 2014
This should probably get dropped in favour of subclassing data-elements
(see #15)
@anupamme

anupamme commented May 10, 2017

I believe HierarchicalClustering also does not work with custom objects. For example, I have the following code:

cl = HierarchicalClustering(salary_head_probables_list, lambda x,y: _find_squared_distance(
        x['poly_center'], y['poly_center']))

where:
salary_head_probables_list is a list of custom objects
_find_squared_distance returns a float

I am getting this error trace:

File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/cluster/method/hierarchical.py", line 79, in __init__
    BaseClusterMethod.__init__(self, sorted(data), distance_function)
TypeError: unorderable types: dict() < dict()
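
The error comes from the `sorted(data)` call shown in the traceback: in Python 3, dicts do not define an ordering, so sorting a list of them raises `TypeError`. The sketch below reproduces that without the library and shows one possible way to make such records sortable. `Orderable` is a hypothetical wrapper, the `poly_center` values are made up, and whether the library needs anything beyond ordering from its items is not confirmed here.

```python
from functools import total_ordering

# Made-up records shaped like the ones in the report above.
records = [{"poly_center": (3.0, 4.0)}, {"poly_center": (0.0, 1.0)}]

# Plain dicts cannot be compared with "<" in Python 3.
try:
    sorted(records)
except TypeError as exc:
    print("TypeError:", exc)


@total_ordering
class Orderable(object):
    """Hypothetical wrapper that orders records by their poly_center."""

    def __init__(self, record):
        self.record = record

    def _key(self):
        return self.record["poly_center"]

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __getitem__(self, name):
        # Keep x['poly_center'] working inside the distance function.
        return self.record[name]


wrapped = sorted(Orderable(record) for record in records)
print([item["poly_center"] for item in wrapped])
```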

@exhuma
Owner Author

exhuma commented May 10, 2017

Thanks for raising this. I have to admit that this package has fallen a bit off my radar since I moved jobs a couple of years ago. I'll try to find some time to work on this. The package certainly could do with some love again... :)

@anupamme

Sure. Is there a quick, hacky fix for this issue that I can apply to get unblocked?

@exhuma
Owner Author

exhuma commented May 11, 2017

Sorry for the late reply... I wrote the answer above right before hitting the sack. I'll see if I can find something.

@exhuma
Owner Author

exhuma commented May 11, 2017

@anupamme I've looked at your problem, and it's not related to this issue (#15). I've opened a new one (see #23) for your issue.
