ML - online and offline modes #11

Open
j7zAhU opened this issue Sep 25, 2022 · 2 comments
Labels
question Further information is requested

Comments

j7zAhU commented Sep 25, 2022

Hello,

I have been looking into MN to see whether it is appropriate for my use case.

I have microsecond-resolution log data that will be used as input to an ML classifier. I would like to use the same code when batch processing historical data as when the classifier is running live. The event stream system in use is proprietary.

Is MN suitable? Many thanks :)

MainRo (Member) commented Sep 26, 2022

Yes, one of the main goals of Maki-Nage is to share as much code as possible between stream and batch processing (and this is how we use it). You may have seen that, for now, the Maki-Nage package focuses mainly on the streaming use case, and more precisely on Kafka. However, the connector API can be used to plug in virtually any data source.
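For illustration, here is a minimal sketch of that code-sharing idea using plain RxPY (the reactive library that Maki-Nage and rxsci build on). This is not Maki-Nage's actual API; the log format, feature extraction, and function names are invented for the example:

```python
import rx
from rx import operators as ops

# The shared feature pipeline: defined once, used in both modes.
feature_pipeline = rx.pipe(
    ops.map(lambda line: line.strip().split(",")),   # parse a CSV log line
    ops.map(lambda fields: float(fields[1])),        # extract one feature
    ops.filter(lambda value: value >= 0.0),          # drop invalid rows
)

def run_batch(path):
    # Offline mode: replay historical log lines from a file.
    with open(path) as f:
        rx.from_(f).pipe(feature_pipeline).subscribe(
            on_next=lambda x: print("feature:", x)
        )

def run_live(live_source):
    # Online mode: the same pipeline, fed by a live observable
    # (standing in for the proprietary event stream).
    live_source.pipe(feature_pipeline).subscribe(
        on_next=lambda x: print("feature:", x)
    )
```

The point is that only the source differs between the two modes; the transformation logic is a single composed operator.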

That being said, Maki-Nage is still at an early stage, and you should be aware of this before using it in a production setting:

  • The whole codebase is written in Python, so you should consider using PyPy to get the best performance. Whether the performance is good enough really depends on your context and expectations, so there is no clear answer on this point.
  • We still regularly fix bugs in error handling, and we have not yet implemented everything we want in that area. Debugging issues can be cumbersome.

Still, I obviously encourage you to give it a try and see if it fits your needs. We are interested in any feedback. We typically use it for Kafka micro-services and as Kubeflow pipeline components.

Also, if you are willing to build on the foundation of Maki-Nage, you can write your own application/library directly with rxsci. The advantage is that, for batch processing, you can parallelize your work via ray (see rxray). The aim is to integrate ray seamlessly into Maki-Nage, but we are still far from that.
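As a rough illustration of the kind of parallelism meant here, this sketch uses ray directly rather than rxray's API; the partitioning scheme and processing function are invented for the example:

```python
import ray

ray.init()

@ray.remote
def process_partition(lines):
    # Stand-in for the shared pipeline logic, applied to one partition.
    return [float(line.split(",")[1]) for line in lines]

# Toy historical data split into partitions; each runs on a ray worker.
partitions = [["a,1.0", "b,2.0"], ["c,3.0", "d,4.0"]]
futures = [process_partition.remote(p) for p in partitions]
print(ray.get(futures))  # gather results from all workers
```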

If you need an already mature solution, then Apache Beam is undoubtedly worth considering.
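For comparison, Beam's programming model is also unified across batch and streaming, so the same transform chain can run over a bounded or unbounded source. A minimal Python sketch (the toy source and parsing are illustrative):

```python
import apache_beam as beam

with beam.Pipeline() as p:
    (
        p
        | "Read" >> beam.Create(["a,1.0", "b,2.0"])    # bounded toy source
        | "Parse" >> beam.Map(lambda line: float(line.split(",")[1]))
        | "Print" >> beam.Map(print)
    )
```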

MainRo added the "question" label on Sep 26, 2022
j7zAhU (Author) commented Sep 28, 2022

Thank you kindly. I will investigate these suggestions.
