Notes

Compilation of notes for different technologies

When to use what for data analysis:

  • Python:
    • the data fits in memory
    • there is a library that does what you need in an efficient manner
    • you are proficient with pandas and geopandas
  • SQL:
    • the data does not fit in memory
    • you plan to use many other tables already present in the database
    • you plan to write very complex queries
  • Scala/Kotlin:
    • the data fits in memory
    • you could not find a Python library that does what you need efficiently
    • you need to write your own algorithms or solutions in an environment that is inherently fast and capable
  • Rust:
    • you need to write your own algorithms or solutions that run very fast and that you can expose to Python through bindings
  • PySpark:
    • your data is massive and needs to be distributed across the nodes of a cluster
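
The rules of thumb above can be sketched as a small decision helper. This is only an illustration, not a definitive rule: the function names (`fits_in_memory`, `suggest_tool`), the flags, and especially the `overhead` factor of 3 are assumptions made for this example, and the criteria about personal skill or query complexity are left out.

```python
def fits_in_memory(data_bytes: int, available_bytes: int, overhead: float = 3.0) -> bool:
    """Rough check of whether a dataset can be loaded into memory.

    `overhead` is an assumed safety factor: data loaded into a dataframe
    often needs more room than its size on disk. Tune it for your case.
    """
    return data_bytes * overhead <= available_bytes


def suggest_tool(
    data_bytes: int,
    available_bytes: int,
    *,
    python_library_exists: bool = True,
    needs_python_binding: bool = False,
    needs_cluster: bool = False,
) -> str:
    """Map the checklist above to a suggested tool (hypothetical helper)."""
    # Massive data that must be distributed over a cluster.
    if needs_cluster:
        return "PySpark"
    # Data too large for memory: let the database do the work.
    if not fits_in_memory(data_bytes, available_bytes):
        return "SQL"
    # Data fits in memory and an efficient library is available.
    if python_library_exists:
        return "Python"
    # Custom algorithm that should be callable from Python.
    if needs_python_binding:
        return "Rust"
    # Custom algorithm in a fast general-purpose environment.
    return "Scala/Kotlin"


if __name__ == "__main__":
    # Example: a 2 GB dataset on a machine with 16 GB of free memory.
    print(suggest_tool(2 * 10**9, 16 * 10**9))
```

The order of the checks is a design choice for this sketch: cluster-scale data is tested first, then memory fit, and only then the availability of a Python library.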