This repository contains
- installation instructions for a minimal python environment
- Jupyter notebooks to introduce the basic concepts of python
- quizzes to test the understanding of said concepts
It is related to my follow-up repository data-science-tutorials.
- access to a computing environment with installation rights
- programming experience, i.e. familiarity with basic concepts like control flow and data structures
+--Motivation
Installation Instruction
Programming Environments
+--Data Types
Control Flow
Modules
+--package NumPy
+--package SciPy
+--package scikit-learn
+--package matplotlib (Python plotting, object-oriented)
+--package pandas (Python Data Analysis Library)
We use Python 3 via the Anaconda distribution.
- Download the latest (64-bit) Anaconda3 installer from http://continuum.io/download and launch it with
$ bash Anaconda3-1.9.1-Linux-x86_64.sh
- The installer will then ask ...
  - for your agreement on the license agreement
  - for a target directory (this is optional; default is `~/anaconda3`, my choice is `~/local/share/anaconda3`)
  - to run the `conda init` script to check/set some paths (obviously also optional) and if necessary add them to `.bashrc` (You may already have this, e.g. via `.profile`)
- Next, we'll add three channels to the default one (in this order):
$ conda config --add channels conda-forge
$ conda config --add channels defaults
$ conda config --add channels r
$ conda config --add channels bioconda
`bioconda` is for bioinformatics (what's your requirement?) and will receive the highest priority. `r` is required for bioinformatics and contains modules for the GNU R programming language. The `defaults` channel already contains plenty of packages (?TODO list?). Finally, `conda-forge` contains several community-built packages that are not already in the `defaults` channel.
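The resulting priority order can be sketched in Python. This is a toy model, not conda's own code: it assumes that `conda config --add channels NAME` puts NAME at the top of the channel list (moving it there if it is already listed) and that the list initially contains only `defaults`. The helper `add_channel` is a hypothetical name used for illustration.

```python
def add_channel(channels, name):
    """Toy model of `conda config --add channels NAME` (hypothetical helper)."""
    # Assumption: --add prepends, and an already-listed channel moves to the top.
    remaining = [c for c in channels if c != name]
    return [name] + remaining


channels = ["defaults"]  # assumed initial configuration
for name in ["conda-forge", "defaults", "r", "bioconda"]:
    channels = add_channel(channels, name)

# Highest priority first
print(channels)  # ['bioconda', 'r', 'defaults', 'conda-forge']
```

The channel added last ends up first, which is why `bioconda` receives the highest priority.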
The installer ships the full set of packages that comes with the `anaconda` meta-package. Let's update it with:
$ conda update anaconda
Here's the full package list.
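After installing and updating, a quick check from within Python shows which interpreter is actually being used. This is a minimal sketch using only the standard library. `describe_interpreter` is a hypothetical helper name, and the printed path depends on the target directory chosen during installation.

```python
import sys


def describe_interpreter():
    """Return path and version of the running Python interpreter."""
    return {
        "executable": sys.executable,
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "is_python3": sys.version_info.major == 3,
    }


info = describe_interpreter()
for key, value in info.items():
    print(f"{key}: {value}")
```

If `executable` does not point into the Anaconda directory, the paths set by `conda init` are probably not active in the current shell.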
Anaconda is not only the name of the python-distribution, but also the name of its largest meta-package. To maximize compatibility (and minimize maintenance effort), we have the following priorities:
- packages from the standard library (see below)
- packages from the anaconda meta-package (?packages marked as "In Installer" [here](https://docs.anaconda.com/anaconda/packages/py3.6_linux-64/)?)
- packages from conda's `defaults` channel, like `keras` (but not in the default installer's full list of packages?)
- ONLY IF NECESSARY packages from selected additional conda channels (`r`, `bioconda`)
(for reference, when you have to reproduce your environment outside of anaconda, e.g. in SageMath)
datetime
csv
marked "In Installer" (I)
- interface
  - `conda` (I)
  - `jupyter` (I)
- math
  - `numpy` (I)
  - `scipy` (I)
  - `statsmodels` (I)
- data analysis
  - `pandas` (I)
  - `scikit-learn` / `sklearn` (I)
- visualization
  - `matplotlib` (I)
  - `seaborn` (I)
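To see which of these packages are available in the current environment, the following sketch looks up each import name without actually importing it. It uses only the standard library. `check_available` and `IMPORT_NAMES` are hypothetical names, and the list assumes the usual import names (note that scikit-learn is imported as `sklearn`).

```python
from importlib.util import find_spec

# Assumed import names for the math, data analysis and visualization packages.
IMPORT_NAMES = [
    "numpy", "scipy", "statsmodels",
    "pandas", "sklearn",
    "matplotlib", "seaborn",
]


def check_available(names):
    """Map each top-level import name to True if it can be found, else False."""
    return {name: find_spec(name) is not None for name in names}


for name, found in check_available(IMPORT_NAMES).items():
    print(f"{name:12s} {'ok' if found else 'missing'}")
```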
keras
merely listed here for reference, because they might pop up repeatedly
- pybrain by Jürgen Schmidhuber et al. (latest release 0.3 dates from 18 Nov 2009)
:
$ conda update conda
$ conda update anaconda
This confuses me (and others), so the current recommendation in Anaconda's blog for "What 95% of People Want" is:
$ conda update --all
$ conda
Optionally :
$ conda remove anaconda
And if things break :
$ conda clean --all
- (default) python interactive shell
- MYCHOICE ipython shell
web-application for interactive python worksheets
- MYCHOICE spyder, a quick introduction by Joey Bernard
- pycharm
- Eclipse with PyDev plugin
- Emacs with ...
Note: Spyder is already available in the standard installation, but if we want/need more advanced profiling, there's
$ conda config --add channels spyder-ide
$ conda install -c spyder-ide spyder-line-profiler
$ conda install -c spyder-ide spyder-memory-profiler
are called modules in python
fast numerical computing, in particular with large arrays and matrices; it is part of the SciPy ecosystem, but is installed and imported as a package of its own
References:
- Nicolas P. Rougier, From Python to Numpy
(large) scientific computing library, based on NumPy arrays (and requiring NumPy as a dependency)
machine learning, built to work well with NumPy and SciPy -- along with the Intel extension for acceleration
Pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.
Matplotlib is a Python 2D plotting library which produces publication quality figures in a variety of hardcopy formats and interactive environments across platforms. Here's a short introduction.
Keras is a high-level neural networks API, written in Python and capable of running on top of either TensorFlow or Theano. It was developed with a focus on enabling fast experimentation. Being able to go from idea to result with the least possible delay is key to doing good research.
to simulate quantum systems. Very short introduction by Joey Bernard.
see the complete list at https://docs.python.org/3/library/
This module defines an object type which can compactly represent an array of basic values: characters, integers, floating point numbers. Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained.
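The description above matches the standard library's `array` module. A minimal sketch of the type constraint it mentions, with made-up sample values:

```python
from array import array

# Typecode "d" stores C doubles, "i" stores signed C ints.
samples = array("d", [1.0, 2.5, 4.0])
samples.append(5.5)      # behaves like a list ...
total = sum(samples)

counts = array("i", [1, 2, 3])
try:
    counts.append(1.5)   # ... but rejects values of the wrong type
    rejected = False
except TypeError:
    rejected = True

print(samples.tolist(), total, rejected)  # [1.0, 2.5, 4.0, 5.5] 13.0 True
```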
This module implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers, dict, list, set, and tuple.
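This is the `collections` module. The sketch below shows four of its containers on a made-up word list. Which one is the right choice depends on the task, so take it as an illustration rather than a recommendation.

```python
from collections import Counter, defaultdict, deque, namedtuple

words = ["spam", "egg", "spam", "ham", "spam"]

# Counter: count hashable items
counts = Counter(words)
print(counts.most_common(1))   # [('spam', 3)]

# defaultdict: missing keys get a default value (here an empty list)
by_length = defaultdict(list)
for word in words:
    by_length[len(word)].append(word)
print(by_length[3])            # ['egg', 'ham']

# deque with maxlen: keeps only the most recent items
recent = deque(maxlen=2)
for word in words:
    recent.append(word)
print(list(recent))            # ['ham', 'spam']

# namedtuple: a tuple whose fields have names
Point = namedtuple("Point", ["x", "y"])
origin = Point(x=0, y=0)
print(origin.x, origin.y)      # 0 0
```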
import/export of csv-files
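A minimal round trip with the `csv` module might look as follows. To keep the sketch self-contained it writes to an in-memory buffer instead of a file. The column names and rows are made up for illustration.

```python
import csv
import io

rows = [
    {"package": "numpy", "topic": "math"},
    {"package": "pandas", "topic": "data analysis"},
]

# Export: write a header line followed by one line per dictionary
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["package", "topic"])
writer.writeheader()
writer.writerows(rows)

# Import: read the same text back into a list of dictionaries
buffer.seek(0)
loaded = [dict(row) for row in csv.DictReader(buffer)]

print(loaded == rows)  # True
```

With a real file, open it with `newline=""` as the `csv` documentation recommends.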
This module implements a number of iterator building blocks inspired by constructs from APL, Haskell, and SML. Each has been recast in a form suitable for Python.
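This refers to `itertools`. A few of its building blocks in a short sketch, wrapped in `list(...)` only to make the lazily produced results visible:

```python
from itertools import accumulate, chain, combinations, count, islice

# chain: iterate over several iterables as if they were one
merged = list(chain([1, 2], [3]))
print(merged)        # [1, 2, 3]

# combinations: all pairs without repetition
pairs = list(combinations("abc", 2))
print(pairs)         # [('a', 'b'), ('a', 'c'), ('b', 'c')]

# accumulate: running totals
running = list(accumulate([1, 2, 3, 4]))
print(running)       # [1, 3, 6, 10]

# count is infinite; islice takes the first four values
multiples = list(islice(count(start=0, step=5), 4))
print(multiples)     # [0, 5, 10, 15]
```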
It provides access to the mathematical functions defined by the C standard.
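This is the `math` module. A short sketch with a few of its functions; the input values are arbitrary:

```python
import math

hypotenuse = math.hypot(3.0, 4.0)
print(hypotenuse)                      # 5.0

root = math.sqrt(16.0)
print(root)                            # 4.0

# Floating point results are rarely exact, so compare with a tolerance
sine_is_zero = math.isclose(math.sin(math.pi), 0.0, abs_tol=1e-12)
print(sine_is_zero)                    # True

print(math.floor(2.7), math.ceil(2.1))  # 2 3
print(math.factorial(5))                # 120
```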
multiprocessing is a package that supports spawning processes using an API similar to the threading module.
This module provides a portable way of using operating system dependent functionality.
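This is the `os` module. The sketch below builds a path, lists a directory and reads an environment variable. It works inside a temporary directory (via `tempfile`) so that nothing is left behind. The file name `notes.txt` is made up.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as workdir:
    # os.path.join uses the separator of the current operating system
    path = os.path.join(workdir, "notes.txt")
    with open(path, "w") as handle:
        handle.write("hello")

    entries = os.listdir(workdir)
    size = os.path.getsize(path)
    existed_inside = os.path.exists(path)

# The temporary directory and its content are gone now
exists_after = os.path.exists(path)

# Environment variables may be missing, so provide a fallback
home = os.environ.get("HOME", "<not set>")

print(entries, size, existed_inside, exists_after)
```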
This module provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter. It is always available.
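This is the `sys` module. The sketch collects a few of the interpreter variables it exposes. `summarize_runtime` is a hypothetical helper name, and all values depend on the machine and on how the script was started.

```python
import sys


def summarize_runtime():
    """Collect a few interpreter-level settings in a dictionary."""
    return {
        "platform": sys.platform,                    # e.g. "linux"
        "arguments": list(sys.argv),                 # command line arguments
        "search_path": list(sys.path),               # where imports are looked up
        "recursion_limit": sys.getrecursionlimit(),
    }


summary = summarize_runtime()
for key, value in summary.items():
    print(f"{key}: {value}")
```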
This module provides various time-related functions.
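This is the `time` module. A minimal sketch of two common uses, measuring elapsed time and formatting a timestamp. `timed` is a hypothetical helper, and the measured duration will differ from run to run.

```python
import time


def timed(func, *args):
    """Call func(*args) and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


result, elapsed = timed(sum, range(100_000))
print(result)                  # 4999950000
print(f"{elapsed:.6f} s")

# Second 0 of the Unix epoch, formatted in UTC
stamp = time.strftime("%Y-%m-%d", time.gmtime(0))
print(stamp)                   # 1970-01-01
```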
This module provides a standard interface to extract, format and print stack traces of Python programs. It exactly mimics the behavior of the Python interpreter when it prints a stack trace.
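This is the `traceback` module. The sketch captures a stack trace as a string instead of letting the program stop, which is handy for logging. `risky_division` is a made-up function that fails on purpose.

```python
import traceback


def risky_division(a, b):
    return a / b


try:
    risky_division(1, 0)
except ZeroDivisionError:
    # format_exc returns the same text the interpreter would print
    report = traceback.format_exc()

print(report)
```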
- Scipy Lecture Notes for Python, NumPy, Matplotlib, Scipy (with exercises)
- Code Challenge
- Python Tutorial at python.org (quite complete and extensive; probably to read selectively)
- basic Python Class by Google
- DataCamp has two Python Courses (for Data Science): Intro and Intermediate
- After Hours Programming has an interactive Python Tutorial
- LearnPython.org has an interactive Python Tutorial
- [Over 150 of the Best Machine Learning, NLP, and Python Tutorials I’ve Found](https://unsupervisedmethods.com/over-150-of-the-best-machine-learning-nlp-and-python-tutorials-ive-found-ffce2939bd78)
- Swaroop, A Byte of Python, CC-BY-SA. (Free pdf and epub download, audience: programming beginners)
- Allen B. Downey, Think Python, 2nd edition, 2015. (Caveat: the 1st edition uses Python 2. Free pdf and html download, sample code available on webpage and github, audience: python beginners with programming experience)
- Idris2016
- McKinney2012
- Codecademy's Python track
- Udacity's Intro to Computer Science
- MIT's 6.001 Introduction to Computer Science and Programming in Python by Ana Bell, Eric Grimson, and John Guttag; "for students with little or no programming experience"; available on edX
- MIT's 6.002 Introduction to Computational Thinking and Data Science by Ana Bell, Eric Grimson, and John Guttag; continuation of MIT 6.001; archived on edX
- 15 single-choice questions by TripleByte
- Mega Project List by Karan Goel
- 100 days of algorithms by Tomáš Bouda with github repository
- check the setup part for anaconda & friends (let's say as number 0)
- split every notebook into mandatory/optional part
- for numpy: improve the didactic structure. Some parts are redundant (e.g. operations), others don't follow a logical order (broadcasting)
- compare to Christin Seifert's minimal tutorial at https://github.com/chseifert/tutorials/blob/master/PythonTutorial/PythonTutorial.ipynb