
Exercises for the "Introduction to Data Ingestion and NOSQL" class at EDEM.


echiner/edem-mda-data-ingestion-and-nosql


EDEM - Master Big Data and Cloud - Data Ingestion and NOSQL

Exercises for the "Introduction to Data Ingestion and NOSQL" class at EDEM. In this course we learn how to ingest data using Apache NiFi and CDC (Debezium), and how to store it in MongoDB.

Introduction

Throughout these three sessions (Data Ingestion with NiFi, CDC with Debezium, and NoSQL with MongoDB), we will work through a series of interconnected exercises. By the end, we will have completed an end-to-end project that integrates all the components.

We'll begin by using Apache NiFi to ingest data and store it in MongoDB. Next, we'll explore real-time data ingestion from databases using Debezium. Finally, we'll dive deeper into MongoDB, performing queries, setting up indexes, and more.

Initial Setup

We will use the Docker Compose file in the root folder of this repository. Let's start by pulling the images for all the services:

docker compose pull

Below is the list of components we will be using:

| Component | Description | Docker Service | Port | Credentials |
|---|---|---|---|---|
| Apache NiFi | Data flow and integration tool | nifi | 8443 | admin / ctsBtRBKHRAx69EqUghvvgEvjnaLjFEB |
| Kafka | Distributed event streaming platform | kafka | 9092 | N/A |
| Zookeeper | Coordination service for Kafka and other distributed systems | zookeeper | 2181, 2888, and 3888 | N/A |
| Kafka Connect | Tool for scalable and reliable data streaming between Kafka and other systems | connect | 8083 | N/A |
| Kafka UI | Web UI to manage Kafka topics and consumer groups | redpanda-console | 9000 | N/A |
| MySQL | Relational database management system | mysql | 3306 | Debezium: mysqluser / mysqlpw; Admin: root / debezium |
| Adminer (MySQL UI) | Web-based database management tool | adminer | 8090 | Same as MySQL |
| MongoDB | NoSQL database system | mongo | 27017 | root / example |
| Mongo Express (MongoDB UI) | Web-based MongoDB administration tool | mongo-express | 8081 | admin / pass |
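As a quick orientation, the web-facing services in the table can be opened from the host once the cluster is running. The sketch below is not part of the course material: the function name `ui_url` is hypothetical, and it assumes that the compose file publishes each port on localhost under the same number and that NiFi serves its UI over HTTPS at the `/nifi` path. Check the Docker Compose file if a URL does not respond.

```shell
# Hypothetical helper: print the local URL for a web-facing service from the table.
# Assumption: every port is published on localhost unchanged.
ui_url() {
  case "$1" in
    nifi)             echo "https://localhost:8443/nifi" ;;  # assumption: HTTPS and /nifi path
    connect)          echo "http://localhost:8083" ;;        # Kafka Connect REST API
    redpanda-console) echo "http://localhost:9000" ;;        # Kafka UI
    adminer)          echo "http://localhost:8090" ;;        # MySQL UI
    mongo-express)    echo "http://localhost:8081" ;;        # MongoDB UI
    *)                echo "unknown service: $1" >&2; return 1 ;;
  esac
}

# Example: print the address of the NiFi UI
ui_url nifi
```

The argument is the Docker service name from the table, so the same name works with the `docker compose` commands used in this course.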

Here is a view of the architecture we will be using:

[Image: architecture diagram]

Exercises

Here is the list of exercises we will follow:

Optional exercises are more advanced and will be done during the class if we have time. Otherwise, they are left as optional homework for the trainee.

Cluster administration

Here are some useful commands you might need:

# Launch all services for the first time
docker compose up -d
# Shut down and destroy the cluster
docker compose down
# Start a specific service
docker compose start <SERVICE>
# Stop a specific service
docker compose stop <SERVICE>
# List the running containers
docker ps
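When a single service misbehaves, a common pattern is to restart it and follow its logs. The sketch below combines `docker compose restart` and `docker compose logs` in one helper. The names `run` and `restart_and_tail` and the `DRY_RUN` variable are hypothetical and not part of this repository; with `DRY_RUN=1` the commands are only printed, which is a convenient way to check them before touching the cluster.

```shell
# Hypothetical wrapper: run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: restart one compose service, then follow its last 50 log lines.
restart_and_tail() {
  service="$1"
  if [ -z "$service" ]; then
    echo "usage: restart_and_tail <SERVICE>" >&2
    return 1
  fi
  run docker compose restart "$service" &&
    run docker compose logs -f --tail 50 "$service"
}

# Example: show the commands that would be run for the nifi service
DRY_RUN=1 restart_and_tail nifi
```

Without `DRY_RUN=1`, the log view stays open until interrupted with Ctrl+C; the service itself keeps running.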

Authors

This course and exercises were created by:
