All the libraries needed to run the code are included in the Anaconda distribution of Python. The script was written for Python 3.*.
In this project, I used data from Kaggle's 2019 Machine Learning and Data Science Survey to better understand:
- What is the educational background of people working in Data Science fields, and which courses do they usually take?
- What are the main activities they perform in their companies? Do machine learning tasks play a big role in their daily work?
- Which programming languages, frameworks and databases do they use most?
- How well are they paid for their work in Data Science, and what factors may affect their income?
- Notebook - Jupyter Notebook (English and Portuguese versions) with the script developed to answer the questions presented above. Markdown cells alongside the script give context for each step.
- HTML file - a version of the notebook with a working table of contents to make navigation easier.
- CSV file - a copy of the dataset used in the exploratory analysis. The original can be found on the Kaggle website through the link in the Licensing section below.
- Licensing - MIT License covering this project.
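
For readers who want to explore the CSV file outside the notebook, the snippet below is a minimal sketch of one way to load it with pandas (included in Anaconda). It assumes the file follows the usual Kaggle survey layout, where the first row after the header holds the question text rather than an answer. The function name `load_survey`, the column names `Q1` and `Q4`, and the in-memory sample data are illustrative placeholders, not taken from the notebook; with the real file you would pass its path instead of the sample buffer.

```python
import io

import pandas as pd


def load_survey(source):
    """Read a survey CSV whose first data row holds the question text.

    Returns a (questions, responses) pair: a Series mapping column
    names to question text, and a DataFrame with one row per respondent.
    """
    raw = pd.read_csv(source, dtype=str)
    questions = raw.iloc[0]                          # question descriptions
    responses = raw.iloc[1:].reset_index(drop=True)  # actual answers
    return questions, responses


# Tiny in-memory stand-in for the real file (hypothetical values).
sample = io.StringIO(
    "Q1,Q4\n"
    "What is your age?,What is your highest level of education?\n"
    "25-29,Master's degree\n"
    "30-34,Bachelor's degree\n"
    "25-29,Master's degree\n"
)

questions, responses = load_survey(sample)

# Count answers for one question, e.g. educational background.
education_counts = responses["Q4"].value_counts()
print(questions["Q4"])
print(education_counts)
```

Separating the question row from the responses keeps the counts from treating the question text as an answer.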
The main findings for the questions above are presented in the Medium post available here.
Credit goes to the Kaggle platform for making the data available. You can find the licensing for the data, other descriptive information and the original dataset on the Kaggle Survey's page - link available here.