Demonstration of the salt hash join technique with Apache Spark

lrmendess/spark-salt-hash-join-technique

Salt hash join technique with Apache Spark

A common challenge in distributed processing is data skew, which occurs in Spark when data is unevenly distributed across the partitions of a DataFrame and, consequently, among the cluster's workers.

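Unbalanced partitions usually degrade performance, because a few tasks end up processing most of the data while the rest sit idle. They can also cause disk spills and out-of-memory errors, potentially making it impossible to run a simple but poorly optimized job.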
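
To make the skew concrete, here is a minimal plain-Python sketch that mimics hash partitioning on a made-up dataset where one key dominates. It does not use Spark: `partition_of`, `NUM_PARTITIONS`, and the data are assumptions for illustration, and `zlib.crc32` is only a deterministic stand-in for Spark's internal hash.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # assumed partition count, for illustration only


def partition_of(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Deterministic stand-in for a hash partitioner (not Spark's actual hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Made-up dataset: the key "hot" accounts for 90% of the rows.
keys = ["hot"] * 9_000 + [f"key_{i}" for i in range(1_000)]

# Count how many rows each partition would receive.
rows_per_partition = Counter(partition_of(k) for k in keys)

for pid in range(NUM_PARTITIONS):
    print(f"partition {pid}: {rows_per_partition[pid]} rows")
```

Every row with the key `hot` hashes to the same partition, so that partition holds at least 9,000 of the 10,000 rows no matter how many partitions are configured.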

In this project, I demonstrate how to mitigate this issue in join operations. See the salt_hash_join.ipynb file for a step-by-step explanation of the implementation and a demonstration of the results.
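
As a rough idea of what salting does, the following plain-Python sketch joins a skewed dataset with a small lookup dataset, with and without a salt. It is not the code from the notebook and it does not use Spark: `facts`, `dims`, `NUM_SALTS`, and both helper functions are hypothetical names. The skewed side gets a random salt appended to its join key, and the other side is replicated once per salt value so every salted key still finds its match.

```python
import random
from collections import Counter

NUM_SALTS = 8  # assumed number of salt values; a tuning knob in practice

# Made-up data: "facts" is the skewed side, "dims" is the small side.
facts = [("hot", i) for i in range(9_000)] + [
    (f"key_{i % 100}", i) for i in range(1_000)
]
dims = [("hot", "H")] + [(f"key_{i}", f"label_{i}") for i in range(100)]


def plain_join(facts, dims):
    """Inner join on the original key."""
    lookup = dict(dims)
    return [(k, v, lookup[k]) for k, v in facts if k in lookup]


def salted_join(facts, dims, num_salts, rng):
    """Inner join on (key, salt) instead of key."""
    # Skewed side: attach a random salt to every row.
    salted_facts = [((k, rng.randrange(num_salts)), v) for k, v in facts]
    # Other side: replicate each row once per possible salt value.
    salted_dims = {(k, s): label for k, label in dims for s in range(num_salts)}
    # Join on the salted key, then drop the salt from the output.
    joined = [
        (k, v, salted_dims[(k, s)])
        for (k, s), v in salted_facts
        if (k, s) in salted_dims
    ]
    return salted_facts, joined


rng = random.Random(42)  # fixed seed so the sketch is reproducible
salted_facts, joined = salted_join(facts, dims, NUM_SALTS, rng)

# Rows sharing a join key land in the same partition, so the largest
# key group is a lower bound on the largest partition.
rows_per_key = Counter(k for k, _ in facts)
rows_per_salted_key = Counter(ks for ks, _ in salted_facts)

print("largest group before salting:", max(rows_per_key.values()))
print("largest group after salting: ", max(rows_per_salted_key.values()))
print("same result as plain join:   ", sorted(joined) == sorted(plain_join(facts, dims)))
```

The trade-off is that the non-skewed side grows by a factor of `NUM_SALTS`, so the technique pays off mainly when that side is small relative to the skewed one. In PySpark the same idea is usually expressed with a random salt column on the skewed DataFrame and an exploded range of salt values on the other.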
