Merge branch 'master' of github.com:castor-software/rethread
ErikNatanael committed Dec 9, 2024
2 parents 29adb5e + 2fa5370 commit 8bdcf05
Showing 69 changed files with 4,187 additions and 10,033 deletions.
34 changes: 34 additions & 0 deletions code/myriad/loam_paper/dataset/README.md
@@ -0,0 +1,34 @@
# Dataset

This is the **Myriad People** dataset. All of these files were generated by running `script.py` in the `../mining` folder. Each file is described below.

`all_loggedin_contributors.json`: list of all logged-in (i.e. not anonymous) GitHub contributors, with:
- `type`: type of contributor, `User` or `Bot`
- `id`: GitHub username
- `contributions`: list of repositories they contributed to, with:
  - `repo_name`: name of repository
  - `contributions`: number of contributions that they made to this project
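
As an illustration of how this nested structure can be consumed, here is a minimal Python sketch that sums contributions per contributor. It assumes the top level of the file is a JSON list of objects with the keys listed above. The sample entries and the helper names (`load_contributors`, `total_contributions_per_contributor`) are made up for the example and are not part of the dataset or of `script.py`.

```python
import json
from collections import Counter

# Made-up sample mirroring the documented structure; not real data.
sample_contributors = [
    {"type": "User", "id": "alice", "contributions": [
        {"repo_name": "owner1/repo-a", "contributions": 12},
        {"repo_name": "owner2/repo-b", "contributions": 3},
    ]},
    {"type": "Bot", "id": "some-bot", "contributions": [
        {"repo_name": "owner1/repo-a", "contributions": 40},
    ]},
]


def load_contributors(path="all_loggedin_contributors.json"):
    """Read the file, assuming its top level is a JSON list."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def total_contributions_per_contributor(contributors, include_bots=False):
    """Sum the per-repository contribution counts for each contributor."""
    totals = Counter()
    for contributor in contributors:
        if contributor["type"] == "Bot" and not include_bots:
            continue
        totals[contributor["id"]] += sum(
            repo["contributions"] for repo in contributor["contributions"]
        )
    return dict(totals)


print(total_contributions_per_contributor(sample_contributors))
```

With the real file, `total_contributions_per_contributor(load_contributors())` would be the equivalent call, provided the assumption about the top-level list holds.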

`categories_info.json`: list of categories, with:
- `category`: name of the category
- `repos`: list of names of repositories in that category
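
A short Python sketch of one way to use this file: inverting it into a lookup from repository name to category. The sample entries and the function name `repo_to_category` are hypothetical, and the sketch assumes each repository appears in only one category (if a repository were listed twice, the last category would win).

```python
# Made-up sample mirroring the documented structure; not real data.
sample_categories = [
    {"category": "cat-x", "repos": ["owner1/repo-a", "owner2/repo-b"]},
    {"category": "cat-y", "repos": ["owner3/repo-c"]},
]


def repo_to_category(categories):
    """Build a dict mapping each repository name to its category name."""
    mapping = {}
    for entry in categories:
        for repo in entry["repos"]:
            mapping[repo] = entry["category"]
    return mapping


print(repo_to_category(sample_categories))
```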

`repos_info.json`: list of all repositories for which data could be fetched from the GitHub API, with:
- `name`: name of the repository
- `category`: category it belongs to
- `exclusivity`: either the name of an artwork if this repository was exclusively used in that artwork, or `null` if it was used in at least two artworks
- `created_at`: creation date of the repository, in the Python `datetime` format
- `total_contributions`: total number of contributions
- `anonymous_contributors`: number of anonymous contributors
- `loggedin_contributors`: number of logged-in contributors
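
The `exclusivity` field can be used to separate repositories tied to a single artwork from those shared across artworks. The Python sketch below shows one way to do this. The sample entries, artwork names and the function name `split_by_exclusivity` are made up, and the sample omits the fields the function does not read. JSON `null` becomes `None` once loaded with Python's `json` module.

```python
from collections import defaultdict

# Made-up sample; only the fields used below are included.
sample_repos = [
    {"name": "owner1/repo-a", "category": "cat-x", "exclusivity": "artwork-1"},
    {"name": "owner2/repo-b", "category": "cat-x", "exclusivity": None},
    {"name": "owner3/repo-c", "category": "cat-y", "exclusivity": "artwork-1"},
]


def split_by_exclusivity(repos):
    """Return (exclusive, shared).

    exclusive maps an artwork name to the repositories used only in it;
    shared lists repositories used in at least two artworks.
    """
    exclusive = defaultdict(list)
    shared = []
    for repo in repos:
        artwork = repo["exclusivity"]
        if artwork is None:
            shared.append(repo["name"])
        else:
            exclusive[artwork].append(repo["name"])
    return dict(exclusive), shared


exclusive, shared = split_by_exclusivity(sample_repos)
print(exclusive, shared)
```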

`gh_api_failures.json`: list of repositories for which the GitHub API failed (because they are too big), with `name`, `category` and `exclusivity`, as in `repos_info.json`

`individual_repos` folder: one file per repository, in the format `owner&name.json`, with:
- `repo_name`: name of the repository
- `contributors`: list of contributors, with:
  - `type`: type of contributor, `User` or `Bot`
  - `id`: GitHub username
  - `contributions`: number of contributions that they made to this project

In all the files, repository name attributes are in the format `owner/name`.
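
Since attributes use `owner/name` while the files in `individual_repos` are named `owner&name.json`, a small conversion helper can be handy. The following Python sketch is one possible way to write it. The function names and the sample repository name are hypothetical, and the default folder path assumes the code runs from this dataset directory.

```python
from pathlib import Path


def repo_file_path(repo_name, folder="individual_repos"):
    """Map an `owner/name` attribute to its `owner&name.json` file path."""
    owner, name = repo_name.split("/", 1)
    return Path(folder) / f"{owner}&{name}.json"


def repo_name_from_file(path):
    """Recover the `owner/name` attribute from an `owner&name.json` path."""
    owner, name = Path(path).stem.split("&", 1)
    return f"{owner}/{name}"


path = repo_file_path("owner1/repo-a")  # made-up repository name
print(path.as_posix())
print(repo_name_from_file(path))
```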