Lectures

S.Yu. Papulin (papulin.study@yandex.ru)

Big Data Processing Systems

- Introduction to Big Data
- File Systems:
  - Hadoop Distributed File System (HDFS)
- Cluster Managers:
  - Yet Another Resource Negotiator (YARN)
- Batch Processing:
  - MapReduce Framework
  - Spark RDD
  - Spark DataFrame
  - User-Defined Functions (UDF) in PySpark
- Coordination:
  - Introduction to ZooKeeper
- Streaming:
  - Apache Storm
  - Spark Streaming
  - Spark Structured Streaming
- Graphs:
  - Apache Giraph
  - Spark GraphX
- Containers:
  - Introduction to Docker
  - Spark on Kubernetes

Additional topics

- Spark Machine Learning:
  - Stochastic Gradient Descent
  - Naive Bayes Classifier
  - Recommendation Systems using ALS
- Linux and Python:
  - Introduction to Linux Kernel
  - Completely Fair Scheduler
  - Virtual File System
  - Network Protocols
  - CPython