This repository contains Jupyter notebooks that accompany the tutorials for our data science lectures. The following topics are covered, each in a separate folder:
- Dataset Visualization: working with and visualizing one dataset, e.g. Boston Housing (without the linear regression) or other datasets such as Flower, MNIST digits and 20 Newsgroups; covers Matplotlib, the pandas `.describe()` method, box plots and min-max normalization (linear regression c/o dsP)
- Clustering
- Association Rule Learning (dataset yet to be determined; preferably from scikit-learn)
- Regression (linear regression on Boston Housing and Car Prices)
- Bayes Learning (for spam filtering/text classification)
- Classification with Decision Trees (starting with a small five-row dataset)
- Neural Networks: use Keras (keras.io) to build a neural network for MNIST digit classification; optionally gensim for word2vec (with a dataset picked from TensorFlow); then an autoencoder for representation learning
- MapReduce (optional)
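The min-max normalization and `.describe()` summary mentioned under Dataset Visualization can be sketched with pandas roughly as follows. This is a minimal sketch, not code taken from the notebooks: the helper `min_max_normalize` and the toy column values are hypothetical. Each numeric value x is rescaled to (x - min) / (max - min), so every column ends up in the range [0, 1].

```python
import pandas as pd


def min_max_normalize(df):
    """Hypothetical helper: rescale every numeric column to [0, 1]."""
    numeric = df.select_dtypes(include="number")
    col_min = numeric.min()
    col_range = numeric.max() - col_min
    # Constant columns have a zero range; divide by 1 instead so they map to 0.
    col_range = col_range.replace(0, 1)
    return (numeric - col_min) / col_range


# Hypothetical toy measurements, for illustration only.
df = pd.DataFrame({
    "length": [1.4, 4.7, 6.0, 1.3],
    "width": [0.2, 1.4, 2.5, 0.2],
})

# count, mean, std, min, quartiles and max for each numeric column
print(df.describe())

normalized = min_max_normalize(df)
print(normalized)
```

In the notebooks the same steps would be applied to a real dataset; scikit-learn's `MinMaxScaler` offers an equivalent transformation if you prefer not to write the helper yourself.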
See our python-tutorials for instructions on how to set up the following requirements on your machine:
- Python (>= 2.7 or >= 3.3)
- NumPy (>= 1.6.1)
- SciPy (>= 0.9)
- scikit-learn (>= 0.18.1); documentation also available as PDF, with Quick Start and Tutorials
- Matplotlib (>= 2.1.1)
- Pandas; [documentation] also available as PDF
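To check whether an existing environment meets the version requirements listed above, a small script along these lines may help. It is a hedged sketch, not part of the repository: the function names are hypothetical, and the simple version parser only handles the leading numeric parts of a version string (e.g. `2.1.1rc1` becomes `(2, 1, 1)`). Because no minimum version is stated for pandas, the script only checks that it can be imported.

```python
import importlib
import sys

# Minimum versions taken from the requirements list; pandas has no stated minimum.
MIN_VERSIONS = [
    ("numpy", (1, 6, 1)),
    ("scipy", (0, 9)),
    ("sklearn", (0, 18, 1)),  # scikit-learn is imported under the name "sklearn"
    ("matplotlib", (2, 1, 1)),
    ("pandas", None),
]


def parse_version(text):
    """Turn '1.26.4' or '2.1.1rc1' into a tuple of ints (at most three parts)."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def check_requirements():
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    py = sys.version_info[:2]
    if not (py >= (3, 3) or py == (2, 7)):
        problems.append("Python {}.{} is not supported".format(*py))
    for module_name, minimum in MIN_VERSIONS:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            problems.append("{} is not installed".format(module_name))
            continue
        if minimum is None:
            continue
        version_text = getattr(module, "__version__", "0")
        if parse_version(version_text) < minimum:
            problems.append("{} {} is older than {}".format(
                module_name, version_text, ".".join(str(n) for n in minimum)))
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All requirements satisfied.")
```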