minor paper updates
TomDonoghue committed Feb 21, 2022
1 parent 9d7efce commit 8b7dc7c
Showing 2 changed files with 25 additions and 31 deletions.
48 changes: 21 additions & 27 deletions paper/paper.bib
@@ -7,34 +7,28 @@ @article{jupyter_nbgrader_2019
number = {11},
journal = {Journal of Open Source Education},
author = {Jupyter, Project and Blank, Douglas and Bourgin, David and Brown, Alexander and Bussonnier, Matthias and Frederic, Jonathan and Granger, Brian and Griffiths, Thomas and Hamrick, Jessica and Kelley, Kyle and Pacer, M and Page, Logan and Pérez, Fernando and Ragan-Kelley, Benjamin and Suchow, Jordan and Willing, Carol},
month = jan,
year = {2019},
}

-@software{chris_holdgraf_2019_2799972,
-author = {Chris Holdgraf and
-Jan Kleinert and
-Elizabeth DuPre and
-Mainak Jas and
-Alexander Morley and
-Matthew Brett and
-Matt Craig and
-Erik Sundell and
-Sam Lau and
-Luke and
-gaow and
-stafforddavidj and
-cnydw and
-Zachary Sailer and
-Tom and
-Mathieu Boudreau and
-James Mason and
-Ariel Rokem},
-title = {jupyter/jupyter-book: v0.5},
-month = may,
-year = 2019,
+@software{executable_books_community_2020_4539666,
+author = {Executable Books Community},
+title = {Jupyter Book},
+year = 2020,
publisher = {Zenodo},
-version = {v0.5},
-doi = {10.5281/zenodo.2799972},
-url = {https://doi.org/10.5281/zenodo.2799972}
-}
+version = {v0.10},
+doi = {10.5281/zenodo.4539666},
+url = {https://doi.org/10.5281/zenodo.4539666}
+}
+
+@article{donoghue_teaching_2020,
+title = {Teaching {Creative} and {Practical} {Data} {Science} at {Scale}},
+volume = {29},
+issn = {1069-1898},
+url = {https://www.tandfonline.com/doi/full/10.1080/10691898.2020.1860725},
+doi = {10.1080/10691898.2020.1860725},
+number = {sup1},
+journal = {Journal of Statistics Education},
+author = {Donoghue, Thomas and Voytek, Bradley and Ellis, Shannon E.},
+year = {2021},
+pages = {S27--S39},
+}
8 changes: 4 additions & 4 deletions paper/paper.md
@@ -27,7 +27,7 @@ bibliography: paper.bib

# Summary

-Data Science in Practice is a collection of openly available materials, including tutorials and assignments, for learning how to integrate the many skills of data science. The course materials focus on the day-to-day practicalities of hands-on data science, with a particular emphasis on gaining a working familiarity with real-world applications and gaining a 'data intuition'. This collection of materials was originally developed for a course at UC San Diego, but have been updated and publicly released, and are available at: https://datascienceinpractice.github.io/
+Data Science in Practice is a collection of openly available materials, including tutorials and assignments, for learning how to integrate the many skills of data science. The course materials focus on the day-to-day practicalities of hands-on data science, with a particular emphasis on gaining a working familiarity with real-world applications and gaining a 'data intuition'. This collection of materials was originally developed for a course at UC San Diego designed to teach creative and practical data science at scale [@donoghue_teaching_2020]. The materials for this course have been updated and made publicly available, and are hosted at: https://datascienceinpractice.github.io/.

Topics covered in the data science in practice tutorials include:

@@ -44,19 +44,19 @@ These topics are further explored in available assignments, which cover:
* Collecting web data, applying data protection policies, anonymizing data, and adversarial attacks for deanonymizing data.
* Data analyses, including statistical analyses, applying linear models, and creating visualizations.

-The materials are developed in the Python (>= 3.6) programming language, using a standard collection of packages in the scientific Python environment, which can be installed using the Anaconda distribution. All materials are built as Jupyter notebooks, with the assignments being built with the nbgrader extension [@jupyter_nbgrader_2019]. All the materials are hosted online, using the Jupyter Book tool [@chris_holdgraf_2019_2799972], from which all the source notebooks can be downloaded to be run locally.
+The materials are developed in the Python (>= 3.6) programming language, using a standard collection of packages in the scientific Python environment, which can be installed using the Anaconda distribution. All materials are built as Jupyter notebooks, with the assignments being built with the nbgrader extension [@jupyter_nbgrader_2019]. All the materials are hosted online, using the Jupyter Book tool [@executable_books_community_2020_4539666], from which all the source notebooks can be downloaded to be run locally.

# Statement of Need

The field of data science has been rapidly expanding, creating a need for accessible and scalable materials. There is high interest for instruction in data science, and a need in both academia and industry for trained and skilled practitioners. Developing such skills requires hands-on experience and expertise. To address this need, the materials here are focused on practical code-based tutorials, and guided assignments that allow users to practice applying the topics and ideas under study.

There are many available resources for topics within and related to data science, including dedicated tutorials for data science tools and software packages. What can still be difficult, for the novice, is learning how to find and navigate through these materials. A key goal of this course and these materials is to offer a curated introduction to the many topics and available tools, and some initial guided work to make sure users can start to engage with the many aspects of data science. Throughout the course materials, there are many links to other resources. The goal is that these materials be a starting place for the potential user, and a launching off point to the many other more specific resources and tutorials available.

-Data science is an interdisciplinary field, requiring expertise from across a range of relevant fields - including technical aspects such as software, computation, statistics, mathematics and machine learning, as well as topics such as research design, contextual understanding of data, ethics, and an understanding of the potential impacts. These materials aim to encompass these multiple elements of data science, focusing not only on the technical aspects of doing data science, but also acknowledging and emphasizing the social impacts and responsibilities of practicing data scientists. These materials are part of an emerging field of integrated data science, as compared to some more traditional courses and materials that focus on, for example, more detailed machine learning or computation.
+Data science is an interdisciplinary field, requiring expertise from across a range of relevant fields - including technical aspects such as software, computation, statistics, mathematics and machine learning, as well as topics such as research design, contextual understanding of data, ethics, and an understanding of the potential impacts. These materials aim to encompass these multiple elements of data science, focusing not only on the technical aspects of doing data science, but also acknowledging and emphasizing the social impacts and responsibilities of practicing data scientists. These materials are part of an emerging field of integrated data science, as compared to more traditional courses and materials that focus on, for example, detailed machine learning or computation.

# Instructional Design

-This set of materials were originally created as core materials for a university course, Data Science in Practice, taught at UC San Diego. This course was first taught in the Spring of 2017 and has about 400 students per iteration. The scale of this course originally prompted the development of standalone materials and assignments, that we are now making more generally available.
+This set of materials were originally created as core materials for a university course, Data Science in Practice, taught at UC San Diego [@donoghue_teaching_2020]. This course was first taught in the Spring of 2017 and has about 400 students per iteration. The scale of this course originally prompted the development of standalone materials and assignments, that we are now making more generally available.

The full course is supplemented by lectures and lab sections, and is designed as a project-based course. Students work through the materials and assignments presented here, with the goal of building towards doing realistic data science projects. In these projects, students must find openly available datasets, develop a proposal, and then execute analyses to come to an answer. Students must then contextualize the results as a computational notebook that lists their questions and hypotheses, background, ethical considerations, data sources and reliability, results, and conclusion, intermixed with the code and visualizations used to perform the analyses.
