Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI extension for automated collaboration analyses #303

Open
wants to merge 13 commits into
base: master
Choose a base branch
from
Open
1 change: 1 addition & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,7 @@ __kaiaulu 0.0.0.9700 (in development)__

### MINOR IMPROVEMENTS

* The exec scripts now offer interfaces for git entity analysis and developer collaboration network construction. [#302](https://github.com/sailuh/kaiaulu/issues/302)
* The line metrics notebook now provides further guidance on adjusting the snapshot and filtering.
* The R File and R Function parser can now properly parse R folders which contain folders within (not following R package structure). Both `.r` and `.R` files are also now captured (previously only one of the two were specified, but R accepts both). [#235](https://github.com/sailuh/kaiaulu/issues/235)
* Refactor GoF Notebook in Graph GoF and Text GoF Notebooks [#224](https://github.com/sailuh/kaiaulu/issues/224)
Expand Down
4 changes: 2 additions & 2 deletions R/graph.R
Original file line number Diff line number Diff line change
Expand Up @@ -434,12 +434,12 @@ temporal_graph_projection <- function(graph,mode,weight_scheme_function = NULL,t
# use the weight_scheme_cum_temporal().

combinations <- merge(combinations,edgelist,all.x=TRUE,by.x = "from_edgeid", by.y="edgeid",
sorted = FALSE)
sort = FALSE)
setnames(combinations,
old=c("from","weight","datetimetz"),
new=c("from_projection","from_weight","from_datetimetz"))
combinations <- merge(combinations,edgelist,all.x=TRUE,by.x = "to_edgeid", by.y="edgeid",
sorted = FALSE)
sort = FALSE)
setnames(combinations,
old=c("from","weight","datetimetz"),
new=c("to_projection","to_weight","to_datetimetz"))
Expand Down
171 changes: 171 additions & 0 deletions conf/kaiaulu_analysis.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,171 @@
# -*- yaml -*-
# https://github.com/sailuh/kaiaulu
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
# notice and this notice are preserved. This file is offered as-is,
# without any warranty.

# Project Configuration File #
#
# To perform analysis on open source projects, you need to manually
# collect some information from the project's website. As there is
# no standardized website format, this file serves to distill
# important data source information so it can be reused by others
# and understood by Kaiaulu.
#
# Please check https://github.com/sailuh/kaiaulu/tree/master/conf to
# see if a project configuration file already exists. Otherwise, we
# would appreciate if you share your curated file with us by sending a
# Pull Request: https://github.com/sailuh/kaiaulu/pulls
#
# Note, you do NOT need to specify this entire file to conduct analysis.
# Each R Notebook uses a different portion of this file. To know what
# information is used, see the project configuration file section at
# the start of each R Notebook.
#
# Please comment unused parameters instead of deleting them for clarity.
# If you have questions, please open a discussion:
# https://github.com/sailuh/kaiaulu/discussions

project:
website: http://itm0.shidler.hawaii.edu/kaiaulu
openhub: https://www.openhub.net/p/kaiaulu

version_control:
# Where is the git log located locally?
# This is the path to the .git of the project repository you are analyzing.
# The .git is hidden, so you can see it using `ls -a`
log: ../../rawdata/git_repo/kaiaulu/.git
# From where the git log was downloaded?
log_url: https://github.com/sailuh/kaiaulu
# List of branches used for analysis
branch:
- master

# mailing_list:
# Where is the mbox located locally?
#mbox: ../../rawdata/mbox/geronimo-dev.mbox
# What is the domain of the chosen mailing list archive?
#domain: http://mail-archives.apache.org/mod_mbox
# Which lists of the domain will be used?
#list_key:
# - geronimo-dev

# issue_tracker:
# jira:
# # Obtained from the project's JIRA URL
# domain: https://sailuh.atlassian.net
# project_key: SAILUH
# # Download using `download_jira_data.Rmd`
# issues: ../../rawdata/issue_tracker/kaiaulu/issues/
# issue_comments: ../../rawdata/issue_tracker/kaiaulu/issue_comments/
# github:
# # Obtained from the project's GitHub URL
# owner: sailuh
# repo: kaiaulu
# # Download using `download_github_comments.Rmd`
# replies: ../../rawdata/github/kaiaulu

#vulnerabilities:
# Folder path with nvd cve feeds (e.g. nvdcve-1.1-2018.json)
# Download at: https://nvd.nist.gov/vuln/data-feeds
#nvd_feed: rawdata/nvdfeed

# Commit message CVE or Issue Regular Expression (regex)
# See project's commit message for examples to create the regex
# commit_message_id_regex:
# issue_id: \#[0-9]+
# #cve_id: ?

filter:
keep_filepaths_ending_with:
- R
remove_filepaths_containing:
- test

# Third Party Tools Configuration #
#
# See Kaiaulu's README.md for details on how to setup these tools.
tool:
# Depends allow to parse file-file static dependencies.
# depends:
# accepts one language at a time: cpp, java, ruby, python, pom
# You can obtain this information on OpenHub or the project GiHub page right pane.
# code_language: java
# Specify which types of Dependencies to keep - see the Depends tool README.md for details.
# keep_dependencies_type:
# - Cast
# - Call
# - Import
# - Return
# - Set
# - Use
# - Implement
# - ImplLink
# - Extend
# - Create
# - Throw
# - Parameter
# - Contain
# Uctags allows finer file-file dependency parsing (e.g. functions, classes, structs)
uctags:
# See https://github.com/sailuh/kaiaulu/wiki/Universal-Ctags for details
# What types of file-file dependencies should be considered? If all
# dependencies are specified, Kaiaulu will use all of them if available.
keep_lines_type:
c:
- f # function definition
cpp:
- c # classes
- f # function definition
java:
- c # classes
- m # methods
python:
- c # classes
- f # functions
r:
- f # functions

# Analysis Configuration #
analysis:
# You can specify the intervals in 2 ways: window, range or enumeration
window:
# If using gitlog, use start_commit and end_commit. Timestamp is inferred from gitlog
#start_commit: 224a729f44f554af311ca52cf01b105ded87499b
#end_commit: 74cd4d4835a02e01e310476c6776192ad0d97173
# Use datetime only if no gitlog is used in the analysis.
# start_datetime: 2020-05-23 15:12:20
# end_datetime: 2023-12-08 11:50:21
# You can provide a set of dates to create time windows between these
# dates for evolutionary analyses.
ranges:
- 2020-05-23 15:12:20
- 2020-06-25 11:47:58
- 2020-08-18 08:09:27
- 2021-05-30 06:34:04
- 2021-09-05 10:57:50
- 2021-10-02 09:10:34
- 2022-11-14 22:34:57
- 2023-02-13 03:44:28
- 2023-06-09 09:29:01
- 2023-09-01 09:05:27
- 2023-12-08 11:50:21
# Instead of manually providing dates, you can also generate
# time windows automatically by specifying a desired window size in
# days. Note that explicit ranges will be preferred and used by default
# if provided.
#size_days: 365
# enumeration:
# If using gitlog, specify the commits
# commit:
# - 9eae9e96f15e1f216162810cef4271a439a74223
# - f1d2d568776b3708dd6a3077376e2331f9268b04
# - c33a2ce74c84f0d435bfa2dd8953d132ebf7a77a
# Use datetime only if no gitlog is used in the analysis. Timestamp is inferred from gitlog
# datetime:
# - 2013-05-01 00:00:00
# - 2013-08-01 00:00:00
# - 2013-11-01 00:00:00
# You can pass additional parameters for the CLI
143 changes: 143 additions & 0 deletions conf/kaiaulu_cli.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,143 @@
# # -*- yaml -*-
# https://github.com/sailuh/kaiaulu
#
# Copying and distribution of this file, with or without modification,
# are permitted in any medium without royalty provided the copyright
# notice and this notice are preserved. This file is offered as-is,
# without any warranty.

# CLI Configuration File #
#
# To perform analysis with kaiaulu's CLI, you need to specify several
# configuration options which are otherwise passed in function calls
# directly. This file serves to keep track of these parameters and make
# their choice available to others.
#
# Note that this configuration file is an extension to the standard
# project configuration file <project>.yml.
# Please check https://github.com/sailuh/kaiaulu/tree/master/conf to
# see if a project configuration file already exists. Otherwise, we
# would appreciate if you share your curated file with us by sending a
# Pull Request: https://github.com/sailuh/kaiaulu/pulls
#
# Not all of these parameters might be relevant for your analysis.
#
# Please comment unused parameters instead of deleting them for clarity.
# If you have questions, please open a discussion:
# https://github.com/sailuh/kaiaulu/discussions


# Options for the git CLI.
git:
# File-based analysis mode.
file:
# Identity matching.
identity:
# Should identities be matched when parsing the gitlog?
match: TRUE
# Columns in which identities should be matched.
columns:
- author_name_email
- committer_name_email
# Should identities be matched based on names only or also based
# on e-mail-addresses?
names_only: FALSE
# Apply file filters from the project configuration file.
filter: TRUE
# Entity-based analysis mode.
entity:
# Identity matching.
identity:
# Should identities be matched when parsing the gitlog?
match: TRUE
# Columns in which identities should be matched.
columns:
- author_name_email
- committer_name_email
# Should identities be matched based on names only or also based
# on e-mail-addresses?
names_only: FALSE
# Apply file filters from the project configuration file.
filter: TRUE
# When splitting the time windows, should the interval boundaries
# be included or excluded?
window:
# Parameters for evolutionary analyses spanning multiple time windows
# You can choose to split the git log by either author or committer
# time.
time: committer_datetimetz
start_include: TRUE
end_include: FALSE


# Options for the graph CLI.
graph:
# Bipartite networks.
bipartite:
# File-based analysis mode.
file:
# When creating bipartite networks, you can choose between different
# combinations of authors, commits and files to connect.
# Make sure to prepare and pass a suitable parsed git log to the CLI.
# Bipartite file network options: author-file, commit-file
network_type: author-file
# When creating a bipartite projection, you can choose whether to
# apply it to the first or second node.
mode: TRUE # TRUE: first node
# Networks can be directed or undirected.
directed: TRUE
# The weight scheme will determine how the edge weights between nodes
# are calculated.
# Options: weight_scheme_count_deleted_nodes, weight_scheme_sum_edges,
# weight_scheme_cum_temporal, weight_scheme_pairwise_cum_temporal
weight_scheme: weight_scheme_sum_edges
# Entity-based analysis mode.
entity:
# When creating bipartite networks, you can choose between different
# combinations of authors, committers, commits and entities to connect.
# Make sure to prepare and pass a suitable parsed git log to the CLI.
# Entity network options: author-entity, committer-entity,
# commit-entity, author-committer
network_type: author-entity
# When creating a bipartite projection, you can choose whether to
# apply it to the first or second node.
mode: TRUE # TRUE: first node
# Networks can be directed or undirected.
directed: TRUE
# The weight scheme will determine how the edge weights between nodes
# are calculated.
# Options: weight_scheme_count_deleted_nodes, weight_scheme_sum_edges,
# weight_scheme_cum_temporal, weight_scheme_pairwise_cum_temporal
weight_scheme: weight_scheme_sum_edges
# Temporal networks.
temporal:
# File-based analysis mode.
file:
# You can choose between author or committer collaboration.
mode: author
# Networks can be directed or undirected.
directed: TRUE
# You may consider only the last or all preceding developers to
# calculate the temporal network's edge weights.
# Options: one_lag, all_lag
lag: all_lag
# The weight scheme will determine how the edge weights between
# nodes are calculated.
# Options: weight_scheme_count_deleted_nodes, weight_scheme_sum_edges,
# weight_scheme_cum_temporal, weight_scheme_pairwise_cum_temporal
weight_scheme: weight_scheme_cum_temporal
# Entity-based analysis mode.
entity:
# You can choose between author or committer collaboration.
mode: author
# Networks can be directed or undirected.
directed: TRUE
# You may consider only the last or all preceding developers to
# calculate the temporal network's edge weights.
# Options: one_lag, all_lag
lag: all_lag
# The weight scheme will determine how the edge weights between
# nodes are calculated.
# Options: weight_scheme_count_deleted_nodes, weight_scheme_sum_edges,
# weight_scheme_cum_temporal, weight_scheme_pairwise_cum_temporal
weight_scheme: weight_scheme_cum_temporal
Loading
Loading