diff --git a/workshops/python-workshop-1/plan/day3/5 ERP Analysis with Pandas and Seaborn2.ipynb b/workshops/python-workshop-1/plan/day3/5 ERP Analysis with Pandas and Seaborn2.ipynb new file mode 100644 index 0000000..2e9c74b --- /dev/null +++ b/workshops/python-workshop-1/plan/day3/5 ERP Analysis with Pandas and Seaborn2.ipynb @@ -0,0 +1,2943 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# pip install --upgrade xarray seaborn pandas numpy requests tqdm" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "import xarray as xr\n", + "import matplotlib.pyplot as plt \n", + "import numpy as np\n", + "import pandas as pd\n", + "import seaborn as sns" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Download the dataset" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "Downloading data/steinmetz_2016-12-14_Cori.nc: 100%|██████████| 28.2M/28.2M [00:08<00:00, 3.41MB/s]\n" + ] + } + ], + "source": [ + "from pathlib import Path\n", + "import requests\n", + "from tqdm import tqdm\n", + "\n", + "def download_from_sciebo(public_url, to_filename, is_file=True):\n", + " \"\"\"\n", + " Downloads a file or folder from a shared URL on Sciebo.\n", + " \"\"\"\n", + " # Create the folder if a longer path was described\n", + " path = Path(to_filename)\n", + " if len(path.parts) > 1:\n", + " Path(to_filename).parent.mkdir(parents=True, exist_ok=True)\n", + "\n", + " r = requests.get(public_url + \"/download\", stream=True)\n", + "\n", + " if 'Content-Length' in r.headers and is_file:\n", + " total_size = int(r.headers['Content-Length'])\n", + " progress_bar = tqdm(desc=f\"Downloading {to_filename}\", unit='B', unit_scale=True, total=total_size)\n", + " else:\n", + " progress_bar = None\n", + "\n", + " with 
open(to_filename, 'wb') as f:\n", + " for chunk in r.iter_content(chunk_size=8192):\n", + " f.write(chunk)\n", + " if progress_bar:\n", + " progress_bar.update(len(chunk))\n", + "\n", + " if progress_bar:\n", + " progress_bar.close()\n", + "\n", + "download_from_sciebo('https://uni-bonn.sciebo.de/s/JFeueaaWCTVhTZh', 'data/steinmetz_2016-12-14_Cori.nc')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# ERP Analysis With Pandas And Seaborn" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "\n", + "## Overview\n", + "\n", + "We will continue to use the [Steinmetz et al., 2019, Nature](https://www.nature.com/articles/s41586-019-1787-x) dataset. The experiment involved a mouse being presented with two gradients of varying intensities. The mouse's task was to adjust a wheel to center the brighter gradient on the screen. Simultaneously, Local Field Potential (LFP) measurements were recorded across various brain areas. Each trial contains 250 LFP samples spanning 2.5 seconds, i.e. one sample every 0.01 seconds.\n", + "\n", + "\n", + "**Analysis goals**\n", + "\n", + "In these exercises, our primary objective is to analyze and visualize the Local Field Potential (LFP) data for each brain region separately. Through this analysis, we aim to:\n", + " - compute trial statistics on LFP amplitudes (e.g. mean, min, max)\n", + " - compare these statistics between different brain areas\n", + " \n", + "\n", + "**Learning goals**\n", + "\n", + "In this notebook, we'll focus on learning Seaborn's:\n", + " - `sns.catplot()` function for categorical plots\n", + " - `sns.lineplot()` function for plotting time series\n", + " - `sns.relplot()` function for making figures faceted into rows and columns, and\n", + " - `sns.heatmap()` function for using color to compare trends."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "---\n", + "## Extracting Data from XArray Datasets into Tidy DataFrames\n", + "### Load Dataset\n", + "\n", + "In this section, we'll work with a dataset from a single session recording of Cori the mouse ('steinmetz_2016-12-14_Cori.nc'). \n", + "\n", + "Our primary objective is to read this data and convert it into a Pandas dataframe, which will serve as the foundation for the subsequent exercises.\n", + "\n", + "**Load dataset and convert to Pandas dataframe:**\n", + "\n", + "| Method/Code | Description |\n", + "|--------------------------------------------------------|-------------------------------------------------------------------------------|\n", + "| `dset = xr.load_dataset(\"path/to/file/like/this.nc\")` | Loads the dataset from the specified file path using xarray (`xr`). |\n", + "| `df = dset['column1'].to_dataframe()` | Extracts the 'column1' data variable from the dataset and converts it into a Pandas DataFrame (`df`). |\n", + "| `df.reset_index()` | Returns a copy of `df` in which the index levels become regular columns and the index is a default integer index. |\n", + "| `dset['column1'].to_dataframe().reset_index()` | All of it, together! |\n", + "| `dset[['column1', 'column2']].to_dataframe().reset_index()` | Extracts column1 and column2, converts them to a dataframe, and resets the index. |\n", + "| `sns.catplot(data=df, x='categorical_column_1', y='continuous_column', col='categorical_column_2', kind='bar')` | Makes a categorical plot of the specified `kind` (e.g. `'bar'` or `'box'`; `'count'` takes only one of `x`/`y`), split into columns by the categories in categorical_column_2. |" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Exercises**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Make a variable called `dset` by calling Xarray's `xr.load_dataset()` function on the 'steinmetz_2016-12-14_Cori.nc' session file. Confirm that the \"lfp\" data variable is there."
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<xarray.Dataset>\n", + "Dimensions: (trial: 364, time: 250, cell: 734,\n", + " waveform_component: 3, sample: 82, probe: 384,\n", + " brain_area_lfp: 7, spike_id: 2446173)\n", + "Coordinates:\n", + " * trial (trial) int32 1 2 3 4 5 6 7 ... 359 360 361 362 363 364\n", + " * time (time) float64 0.01 0.02 0.03 0.04 ... 2.48 2.49 2.5\n", + " * cell (cell) int32 1 2 3 4 5 6 7 ... 729 730 731 732 733 734\n", + " * waveform_component (waveform_component) int32 1 2 3\n", + " * probe (probe) int32 1 2 3 4 5 6 7 ... 379 380 381 382 383 384\n", + " * brain_area_lfp (brain_area_lfp) object 'ACA' 'LS' ... 'SUB' 'VISp'\n", + " * spike_id (spike_id) int32 1 2 3 4 ... 2446171 2446172 2446173\n", + "Dimensions without coordinates: sample\n", + "Data variables: (12/31)\n", + " contrast_left (trial) int8 100 0 100 0 50 0 0 ... 0 100 50 50 0 25 100\n", + " contrast_right (trial) int8 0 50 50 0 100 0 0 ... 25 100 25 25 50 0 100\n", + " gocue (trial) float64 1.027 0.8744 0.8252 ... nan nan nan\n", + " stim_onset (trial) float64 0.5 0.5 0.5 0.5 0.5 ... 0.5 0.5 0.5 0.5\n", + " feedback_type (trial) float64 1.0 1.0 1.0 1.0 -1.0 ... nan nan nan nan\n", + " feedback_time (trial) float64 1.187 1.438 0.986 2.296 ... nan nan nan\n", + " ... ...\n", + " waveform_w (cell, sample, waveform_component) float32 0.0 ... -0...\n", + " waveform_u (cell, waveform_component, probe) float32 0.0 ... 0.0\n", + " lfp (brain_area_lfp, trial, time) float64 -2.851 ... 5.571\n", + " spike_time (spike_id) float32 0.2676 2.308 0.8535 ... 2.189 2.399\n", + " spike_cell (spike_id) uint32 1 1 1 1 1 1 ... 734 734 734 734 734\n", + " spike_trial (spike_id) uint32 21 21 31 37 43 ... 364 364 364 364 364\n", + "Attributes:\n", + " session_date: 2016-12-14\n", + " mouse: Cori\n", + " stim_onset: 0.5\n", + " bin_size: 0.01