⚠️ Update: This repository will no longer be actively maintained. Please check the Ververica fork.
See the slides for more context.
To keep things simple, this demo uses a Docker Compose setup that makes it easier to bundle up all the services you need:
docker-compose build
docker-compose up -d
docker-compose ps
You should be able to access the Flink Web UI (http://localhost:8081), as well as Superset (http://localhost:8088).
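The containers can take a while to start, so the UIs may refuse connections for the first few seconds. Below is a minimal sketch that polls both URLs until they answer. It uses only the Python standard library and assumes the default port mappings listed above. The helper names `wait_for` and `check_all` are hypothetical and not part of this repository.

```python
import time
import urllib.error
import urllib.request

# Endpoints from this README; adjust them if you changed the port mappings.
SERVICES = {
    "Flink Web UI": "http://localhost:8081",
    "Superset": "http://localhost:8088",
}


def wait_for(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until it answers with any HTTP response or `timeout` expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered with an error status (e.g. 401/404), so it is up.
            return True
        except OSError:
            # URLError (connection refused) and socket timeouts: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def check_all(timeout: float = 60.0) -> dict[str, bool]:
    """Return a mapping of service name to whether it became reachable."""
    return {name: wait_for(url, timeout=timeout) for name, url in SERVICES.items()}
```

After `docker-compose up -d`, calling `check_all()` should return `True` for both services once they are ready.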
What are people asking about most frequently on the Flink User Mailing List? How can you make sense of such a huge amount of unstructured text?
The model in this demo was trained with LDA (Latent Dirichlet Allocation), a popular topic modeling algorithm, using Gensim, a Python library that provides a good implementation of it. The trained model has learned, to some extent, which combinations of words are associated with certain topics, and can simply be passed to PyFlink as a dependency.
Don't trust the model. 👹
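To build intuition for "combinations of words associated with topics", here is a deliberately simplified sketch. It is not LDA. Gensim's LDA infers a topic mixture per document statistically, whereas this toy just sums fixed per-topic word weights and normalises the result. The topic names and weights below are hypothetical and do not come from the demo's trained model.

```python
import re
from collections import Counter

# Hypothetical topic-word weights, for illustration only. A real LDA model
# learns a probability for every vocabulary word under every topic.
TOPIC_WORDS = {
    "deployment": {"kubernetes": 0.30, "yarn": 0.25, "cluster": 0.20, "jobmanager": 0.15},
    "state": {"checkpoint": 0.35, "savepoint": 0.25, "rocksdb": 0.20, "state": 0.15},
    "sql": {"sql": 0.35, "table": 0.30, "query": 0.20, "join": 0.10},
}


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on non-letters and count tokens (a crude tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def topic_scores(text: str) -> dict[str, float]:
    """Score each topic by summing its word weights over the email's tokens,
    then normalise so the scores add up to 1 (all zeros if no word matches)."""
    bow = bag_of_words(text)
    raw = {
        topic: sum(weight * bow[word] for word, weight in words.items())
        for topic, words in TOPIC_WORDS.items()
    }
    total = sum(raw.values())
    return {topic: (score / total if total else 0.0) for topic, score in raw.items()}
```

For example, `topic_scores("My checkpoint to RocksDB times out after the savepoint")` puts all of its weight on `"state"`. With the real model, you would roughly build the bag of words with Gensim's `Dictionary.doc2bow` and ask the model for the topic mixture with `LdaModel.get_document_topics`.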
docker-compose exec jobmanager ./bin/flink run -py /opt/pyflink-nlp/pipeline.py -d
Once you get the "Job has been submitted with JobID <JobId>" green light, you can check and monitor its execution using the Flink Web UI (http://localhost:8081).
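The Flink Web UI is backed by Flink's REST API, which is served on the same port. If you prefer to check the job from a script, the sketch below queries the `/jobs/overview` endpoint and extracts each job's ID, name and state. It assumes the default port mapping from this setup. The helper names `parse_jobs` and `list_jobs` are hypothetical.

```python
import json
import urllib.request

FLINK_REST = "http://localhost:8081"  # same port as the Flink Web UI


def parse_jobs(payload: str) -> list[tuple[str, str, str]]:
    """Extract (job id, name, state) tuples from a /jobs/overview JSON response."""
    data = json.loads(payload)
    return [(job["jid"], job["name"], job["state"]) for job in data.get("jobs", [])]


def list_jobs(base_url: str = FLINK_REST, timeout: float = 5.0) -> list[tuple[str, str, str]]:
    """Fetch the job overview from a running Flink cluster."""
    with urllib.request.urlopen(f"{base_url}/jobs/overview", timeout=timeout) as resp:
        return parse_jobs(resp.read().decode("utf-8"))
```

After submitting the pipeline, `list_jobs()` should include an entry whose state is `RUNNING`. Keeping the parsing separate from the HTTP call makes it easy to test without a live cluster.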
To visualize the results, navigate to http://localhost:8088 and log in to Superset with:
username: admin
password: superset
There should be a default dashboard named "Flink User Mailing List" listed under Dashboards.
And that's it!
For the latest updates on PyFlink, follow Apache Flink on Twitter.