i #241 Refactor Kaiaulu Architecture (#283)
The overall architecture of Kaiaulu has been refactored
from "downloaders/parsers/networks" to an architecture
that reflects the data source or tool interface, i.e.
source code, git, mail, jira, bugzilla, etc. The functionality
of the downloaders, parsers and network transformations
now lives in the respective modules.

This reflects the intuition that a user will look for, for example,
Git functionality, or want to download JIRA issues, rather than
browse all parsers in Kaiaulu. A benefit
of this new architecture is that the parser, downloader
and network files, and their associated unit test files,
no longer grow with every new tool or data source added.

For more details and rationale, see the issue associated with
this commit.

Signed-off-by: Carlos Paradis <carlosviansi@gmail.com>
carlosparadis authored Mar 3, 2024
1 parent 7747fb4 commit 94045d1
Showing 70 changed files with 3,266 additions and 3,211 deletions.
1 change: 1 addition & 0 deletions NEWS.md
@@ -3,6 +3,7 @@ __kaiaulu 0.0.0.9700 (in development)__

### NEW FEATURES

* Kaiaulu architecture has been refactored. Instead of a parser, download, network module structure, Kaiaulu now uses a combination of data type and tool structure. Accordingly, the functions of download.R, parser.R, and network.R are now separated into git.R, jira.R, etc. When only a small part of a tool's functionality is required, functions are grouped by the data type they are associated with, for example, src.R. The Kaiaulu API documentation has been updated accordingly. Function signatures and behavior remain the same: the only modification is the new placement of functions into files. For further rationale and changes, see the issue. [#241](https://github.com/sailuh/kaiaulu/issues/241)
* Temporal bipartite projections are now weighted. The temporal projection can be parameterized by `weight_scheme_cum_temporal()` or `weight_scheme_pairwise_cum_temporal()` when all time lag edges are used, or by the existing weight schemes when a single lag is used. The all-lag weight schemes reproduce the behavior described in the Codeface paper. See the issue for details. [#229](https://github.com/sailuh/kaiaulu/issues/229)
* The functions `make_jira_issue()` and `make_jira_issue_tracker()` have been added, alongside examples and unit tests for `parse_jira()`. [#228](https://github.com/sailuh/kaiaulu/issues/228)
* We can now generate fake mailing lists with `make_mbox_reply` and `make_mbox_mailing_list` for unit testing and tool comparison. [#238](https://github.com/sailuh/kaiaulu/issues/238)
712 changes: 712 additions & 0 deletions R/bugzilla.R

Large diffs are not rendered by default.

451 changes: 0 additions & 451 deletions R/download.R

This file was deleted.

136 changes: 136 additions & 0 deletions R/dv8.R
@@ -4,6 +4,142 @@
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at https://mozilla.org/MPL/2.0/.

#' Transform parsed dependencies into a structural dsm.json file.
#'
#' Converts table of dependencies from \code{\link{parse_dependencies}} into an *-sdsm.json.
#' In the sdsm.json, the Variables are all files/methods or any variables under analysis
#' (rows/columns in dependency matrix) and the Cells (matrix cell) contain all the relations of
#' variable (src & dest) pairs.
#'
#' @param project_dependencies A parsed depends project by \code{\link{parse_dependencies}}.
#' @param sdsmj_path the path to save the structural dsm (*-sdsm.json).
#' @param is_sorted whether to sort the variables (filenames) in the sdsm.json file (optional).
#' @export
#' @family edgelists
#' @family dv8
#' @seealso \code{\link{parse_dependencies}} to get a table of parsed dependencies needed as input into \code{\link{transform_dependencies_to_sdsmj}},
#' \code{\link{transform_gitlog_to_hdsmj}} to perform a similar transformation into a *-dsm.json using a gitlog,
#' \code{\link{transform_temporal_gitlog_to_adsmj}} to perform a similar transformation into a *-dsm.json using a temporal gitlog,
#' \code{\link{graph_to_dsmj}} to generate a *-dsm.json file.
transform_dependencies_to_sdsmj <- function(project_dependencies, sdsmj_path, is_sorted=FALSE){
  # Make a copy of the table before modifying it
  project_depends <- copy(project_dependencies)

  # Convert table to long form (one row per src/dest pair and dependency type)
  project_depends[["edgelist"]] <- melt(project_depends[["edgelist"]],
                                        id.vars = c("src_filepath","dest_filepath"),
                                        variable.name = "label")

  setnames(x=project_depends[["nodes"]], old = c("filepath"), new = c("name"))

  setnames(x=project_depends[["edgelist"]], old = c("src_filepath","dest_filepath", "value"),
           new = c("from","to", "weight"))

  # Put the weight column in front of the label column
  setcolorder(project_depends[["edgelist"]], c("from", "to", "weight", "label"))

  # This is a directed graph, so no duplication of edges
  graph_to_dsmj(project_depends, sdsmj_path, dsmj_name="sdsm", is_directed=TRUE, is_sorted)
}
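
The wide-to-long step in `transform_dependencies_to_sdsmj()` can be illustrated on a toy table. This is a minimal sketch, not Kaiaulu's code path: the `Call` and `Import` column names and the file names are made up for illustration, and only the `data.table` functions already used above (`melt`, `setnames`, `setcolorder`) are called.

```r
library(data.table)

# Hypothetical wide edgelist: one column per dependency type (illustrative names).
edgelist <- data.table(
  src_filepath  = c("a.R", "a.R", "b.R"),
  dest_filepath = c("b.R", "c.R", "c.R"),
  Call          = c(2, 0, 1),
  Import        = c(1, 1, 0)
)

# Wide to long: every (src, dest, dependency type) combination becomes one row.
long <- melt(edgelist,
             id.vars = c("src_filepath", "dest_filepath"),
             variable.name = "label")

# Rename to the from/to/weight convention and move weight ahead of label.
setnames(long,
         old = c("src_filepath", "dest_filepath", "value"),
         new = c("from", "to", "weight"))
setcolorder(long, c("from", "to", "weight", "label"))

print(long)
```

Note that `melt` keeps rows whose weight is zero; the function above does not filter them either.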

#' Transform parsed git repo into a history dsm.json file.
#'
#' Converts a gitlog table into an *-hdsm.json.
#' In the hdsm.json, the Variables are all files/methods or any variables under analysis
#' (rows/columns in dependency matrix) and the Cells (matrix cell) contain all the relations of
#' variable (src & dest) pairs. The Co-change is the number of times the src & dest were committed together.
#' Note that the co-change between a file and its renamed variant will not be considered
#' using this function, so those cells won't appear in the final *-hdsm.json.
#'
#' @param project_git A parsed git project by \code{\link{parse_gitlog}}.
#' @param hdsmj_path the path to save the history dsm (*-hdsm.json).
#' @param is_sorted whether to sort the variables (filenames) in the hdsm.json file (optional).
#' @export
#' @family edgelists
#' @family dv8
#' @seealso \code{\link{parse_gitlog}} to get a table of a parsed git project needed as input into \code{\link{transform_gitlog_to_hdsmj}},
#' \code{\link{transform_temporal_gitlog_to_adsmj}} to perform a similar transformation into a *-dsm.json using a temporal gitlog,
#' \code{\link{transform_dependencies_to_sdsmj}} to perform a similar transformation into a *-dsm.json using dependencies from Depends,
#' \code{\link{graph_to_dsmj}} to generate a *-dsm.json file.
transform_gitlog_to_hdsmj <- function(project_git, hdsmj_path, is_sorted=FALSE){
  # Call preliminary functions to get graph and cochange for the files
  git_bipartite <- transform_gitlog_to_bipartite_network(project_git, mode ="commit-file")
  cochange_table <- bipartite_graph_projection(git_bipartite, mode = FALSE,
                                               weight_scheme_function = weight_scheme_count_deleted_nodes)

  # Add label column with Cochange value
  cochange_table[["edgelist"]][["label"]] <- "Cochange"

  # This is an undirected graph, so there is duplication of edges
  graph_to_dsmj(cochange_table, hdsmj_path, dsmj_name="hdsm", is_directed=FALSE, is_sorted)
}
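
The notion of co-change described above (the number of times two files were committed together) can be sketched in base R. This is an illustration under stated assumptions, not Kaiaulu's implementation: it does not call `bipartite_graph_projection()`, and the weights produced by `weight_scheme_count_deleted_nodes` may be defined differently. The commit and file names are hypothetical.

```r
# Hypothetical commit-file pairs (one row per file touched by a commit).
commit_file <- data.frame(
  commit = c("c1", "c1", "c2", "c2", "c2", "c3"),
  file   = c("a.R", "b.R", "a.R", "b.R", "c.R", "c.R"),
  stringsAsFactors = FALSE
)

# Incidence matrix: rows are commits, columns are files, 1 if the commit touched the file.
incidence <- (unclass(table(commit_file$commit, commit_file$file)) > 0) * 1

# t(X) %*% X counts, for every pair of files, the commits that contain both.
cochange <- crossprod(incidence)
diag(cochange) <- 0  # a file co-changing with itself is not of interest

# Undirected edgelist: keep each pair once (upper triangle) with a non-zero count.
idx <- which(upper.tri(cochange) & cochange > 0, arr.ind = TRUE)
edgelist <- data.frame(
  from   = rownames(cochange)[idx[, "row"]],
  to     = colnames(cochange)[idx[, "col"]],
  weight = cochange[idx],
  label  = "Cochange"
)

print(edgelist)
```

Here `a.R` and `b.R` appear together in commits `c1` and `c2`, so their co-change count is 2.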

#' Transform parsed git repo into an author dsm.json file.
#'
#' Converts a temporal gitlog table into an *-adsm.json.
#' In the adsm.json, the Variables are all the authors under analysis
#' (rows/columns in dependency matrix) and the Cells (matrix cell) contain all the relations of
#' variable (src & dest) pairs. The Collaborate value is the number of times the src author and dest author changed the same file.
#'
#' @param project_git A parsed git project by \code{\link{parse_gitlog}}.
#' @param adsmj_path the path to save the author dsm (*-adsm.json).
#' @param is_sorted whether to sort the variables (authors) in the adsm.json file (optional).
#' @export
#' @family edgelists
#' @family dv8
#' @seealso \code{\link{parse_gitlog}} to get a table of a parsed git project needed as input into \code{\link{transform_temporal_gitlog_to_adsmj}},
#' \code{\link{transform_gitlog_to_hdsmj}} to perform a similar transformation into a *-dsm.json using a gitlog,
#' \code{\link{transform_dependencies_to_sdsmj}} to perform a similar transformation into a *-dsm.json using dependencies from Depends,
#' \code{\link{graph_to_dsmj}} to generate a *-dsm.json file.
transform_temporal_gitlog_to_adsmj <- function(project_git, adsmj_path, is_sorted=FALSE){
  # Call preliminary function to get the temporal author collaboration graph
  author_table <- transform_gitlog_to_temporal_network(project_git, mode=c("author"))

  # Add label column with Collaborate value
  author_table[["edgelist"]][["label"]] <- "Collaborate"

  # This is a directed graph, so no duplication of edges
  graph_to_dsmj(author_table, adsmj_path, dsmj_name="adsm", is_directed=TRUE, is_sorted)
}

#' Transform parsed git repo into an edgelist
#'
#' @param project_git A parsed git project by \code{\link{parse_gitlog}}.
#' @param mode The network of interest: author-file, committer-file, commit-file, author-committer
#' @export
#' @family edgelists
transform_gitlog_to_bipartite_network <- function(project_git, mode = c("author-file","committer-file","commit-file","author-committer")){
  author_name_email <- author_datetimetz <- commit_hash <- committer_name_email <- committer_datetimetz <- lines_added <- lines_removed <- NULL # due to NSE notes in R CMD check
  # Check user did not specify a mode that does not exist
  mode <- match.arg(mode)
  # Select and rename relevant columns. Key = commit_hash.
  project_git <- project_git[,.(author=author_name_email,
                                author_date=author_datetimetz,
                                commit_hash=commit_hash,
                                committer=committer_name_email,
                                committer_date = committer_datetimetz,
                                file = file_pathname,
                                added = lines_added,
                                removed = lines_removed)]
  if(mode == "author-file"){
    git_graph <- model_directed_graph(project_git[,.(from=author,to=file)],
                                      is_bipartite=TRUE,
                                      color=c("black","#f4dbb5"))
  }else if(mode == "committer-file"){
    git_graph <- model_directed_graph(project_git[,.(from=committer,to=file)],
                                      is_bipartite=TRUE,
                                      color=c("#bed7be","#f4dbb5"))
  }else if(mode == "commit-file"){
    git_graph <- model_directed_graph(project_git[,.(from=commit_hash,to=file)],
                                      is_bipartite=TRUE,
                                      color=c("#afe569","#f4dbb5"))
  }else if(mode == "author-committer"){
    git_graph <- model_directed_graph(project_git[,.(from=author,to=committer)],
                                      is_bipartite=TRUE,
                                      color=c("black","#bed7be"))
  }
  return(git_graph)

}
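
The `mode <- match.arg(mode)` check used above relies on base R behavior that a short standalone sketch can show. `pick_mode()` is a hypothetical helper written only for this illustration; it is not part of Kaiaulu.

```r
# Hypothetical helper mirroring the mode argument of the function above.
pick_mode <- function(mode = c("author-file", "committer-file",
                               "commit-file", "author-committer")) {
  # With no argument supplied, match.arg() returns the first choice;
  # with an argument, it must match one of the choices or an error is raised.
  match.arg(mode)
}

default_mode <- pick_mode()               # "author-file"
chosen_mode  <- pick_mode("commit-file")  # "commit-file"

# An unknown mode raises an error instead of silently falling through.
invalid <- tryCatch(pick_mode("author-entity"),
                    error = function(e) "rejected")

print(c(default_mode, chosen_mode, invalid))
```

This is why a mode that is not listed in the function signature never reaches the `if`/`else if` chain.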

#' Transforms a gitlog table to a historical DSM JSON file.
#'
#' Converts a gitlog table into an *-hdsm.json.