Feature Request - allow metadata in processing #14
Comments
I am not 100% certain that I understood your question. As I understood it, you want to be able to cluster more complex objects than simple numeric values. This is already possible if you specify a distance function. The distance function takes two arguments and returns a numeric distance.
If you want to run the clustering on objects where you cannot simply do
As a more practical example, assume that we have an object encapsulating the values as members
Then the code would become:
Does this help you? Does it answer the question? If not, let me know.
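A minimal sketch of a distance function for such objects. The `Reading` class and its `uid` and `value` members are hypothetical names chosen for illustration; they do not come from the thread or from the library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """Hypothetical wrapper object; the names are illustrative only."""
    uid: int
    value: float


def distance(a: Reading, b: Reading) -> float:
    """Take two objects and return a number: how far apart their values are."""
    return abs(a.value - b.value)


readings = [Reading(1, 34), Reading(2, 78), Reading(3, 77)]

# Assumption: the list of objects and this function would be handed to the
# library's clustering class together. That call is not shown here.
print(distance(readings[0], readings[1]))
```

Because the function only reads `value`, any other members on the object (an ID, a label) travel along untouched and are still available on the clustered objects.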
While working on this, I have run into an issue with
This seems to be a regression, because I think that worked in the past. So, if you are planning to use KMeans, you will still be blocked :( I will have a look at this as quickly as possible.
Thank you so much for responding. Unfortunately, I'm asking something different. Allow me to explain a bit more. Assume I have a table of data that looks like this:
Of course, I want to cluster based on the data, and so I'd create a list of said data as:
The obvious answer is to apply the cluster based on the data itself
The ideal situation (for me, at least) would be to pass a list of tuples to the processor, such as [(1, 34), (2, 78), (3, 77), (4, 35), (5, 35), (6, 22)], cluster based on the second item in each tuple (the data), and then have the output grouped by the first item in each tuple (the ID value), for example [[1, 4, 5], [2, 3], [6]]. Then I can easily refer to the original value (or any other metadata related to that datum).
Or perhaps I'm just looking at this all the wrong way... I am quite the newbie...
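The tuple idea above can be sketched as follows. This is only an illustration under stated assumptions: `group_by_threshold` is a hypothetical stand-in for the library's clustering (a simple single-linkage grouping), and the threshold of 5 is an arbitrary choice. The point is that the distance function looks only at the second item of each tuple, so the ID rides along and can be read back from the clusters.

```python
def distance(a, b):
    # Compare only the data part (index 1); the ID (index 0) is ignored.
    return abs(a[1] - b[1])


def group_by_threshold(items, dist, threshold):
    """Hypothetical stand-in for the library: single-linkage grouping.

    An item joins (and merges) every existing group that contains at least
    one member within `threshold` of it.
    """
    clusters = []
    for item in items:
        near, far = [], []
        for cluster in clusters:
            is_near = any(dist(item, member) <= threshold for member in cluster)
            (near if is_near else far).append(cluster)
        merged = [member for cluster in near for member in cluster] + [item]
        clusters = far + [merged]
    return clusters


rows = [(1, 34), (2, 78), (3, 77), (4, 35), (5, 35), (6, 22)]
clusters = group_by_threshold(rows, distance, threshold=5)

# Keep only the ID from each tuple, then sort the groups for stable output.
ids = sorted([uid for uid, _ in cluster] for cluster in clusters)
print(ids)  # [[1, 4, 5], [2, 3], [6]]
```

The full tuples are still present in `clusters`, so any other metadata stored alongside the ID remains reachable after clustering.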
Wait, I think I need to re-read your answer more closely -- perhaps you are answering it for me? I don't intuitively understand the code. I'll try your example and see if that helps me understand...
OK, yes, this works. I do understand now how to do it. Sorry for the confusion, and thanks!
No problem. Always glad to help :)
Thanks for a great library!
I do have one feature request, however. It would be incredibly useful to allow for the addition of metadata that could be passed through during processing.
As a specific use case, I'd like to be able to input data along with an ID field (for instance, as a list of tuples such as [(uid, data), (uid, data), ...]), with the resulting clusters referring to the UID and not the data (again, such as [[uid, uid, uid], [uid, uid, uid, uid], ...]). This would allow me to easily manipulate, store, and process the objects themselves. As it stands, I get the clusters I want, but I can no longer match them back to the original data, so I'm stuck... :-(
I've looked at the core code, and unfortunately making such a change and offering it as a pull request is beyond my abilities. I am hoping this might be a (somewhat) straightforward thing to do, and I'd be happy to help in any way I can.