i #276 Removes JirAgileR Parser dependency
Parser no longer assumes data from JIRA is obtained via JirAgileR, but instead directly from JIRA. Downloader and JIRA Fake Data Generator replacements to follow. This parser is functional with JSONs manually downloaded at this commit stage, but is incompatible with the Fake Data Generator and Downloader for JIRA present in this commit.

Signed-off-by: Carlos Paradis <carlosviansi@gmail.com>
Signed-off-by: Ian Jaymes Iwata <97856957+ian-lastname@users.noreply.github.com>
Co-authored-by: Carlos Paradis <carlosviansi@gmail.com>
Co-authored-by: Ian Jaymes Iwata <97856957+ian-lastname@users.noreply.github.com>
ian-lastname and carlosparadis authored Apr 14, 2024
1 parent 2bc8d14 commit 6b154d5
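
As the diff of R/jira.R further down shows, the rewritten parser reads a top-level `issues` array in which each entry carries a `key` and a `fields` object. A minimal base-R sketch of that layout follows. The sample values and the `field_or_na()` helper are made up for illustration and are not part of kaiaulu; the real parser uses `data.table` and `rbindlist(fill = TRUE)` rather than this helper.

```r
# Stand-in for what jsonlite::read_json() would return for one raw JIRA file.
# All values are invented for illustration.
raw <- list(issues = list(
  list(key = "KAIAULU-1",
       fields = list(summary = "First issue",
                     issuetype = list(name = "Bug"),
                     resolution = list(name = "Fixed"))),
  # The second issue has no "resolution" field at all
  list(key = "KAIAULU-2",
       fields = list(summary = "Second issue",
                     issuetype = list(name = "Task")))
))

# Hypothetical helper: walk a path of nested names, returning NA if any step is missing
field_or_na <- function(x, ...) {
  for (name in c(...)) {
    if (is.null(x)) break
    x <- x[[name]]
  }
  if (is.null(x)) NA_character_ else x
}

# One row per issue; absent fields become NA
issues <- do.call(rbind, lapply(raw[["issues"]], function(issue) {
  data.frame(
    issue_key = issue[["key"]],
    issue_type = field_or_na(issue[["fields"]], "issuetype", "name"),
    issue_resolution = field_or_na(issue[["fields"]], "resolution", "name"),
    stringsAsFactors = FALSE
  )
}))
print(issues)
```

The second row ends up with `NA` in `issue_resolution`, which mirrors the documented behavior that a field missing from an issue is reported as NA.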
Showing 21 changed files with 226 additions and 70 deletions.
3 changes: 2 additions & 1 deletion DESCRIPTION
@@ -17,7 +17,8 @@ Authors@R: c(
person('Nicholas', 'Lee', role = c('ctb')),
person('Ruben', 'Jacobo', role = c('ctb')),
person('Waylon', 'Ho', role = c('ctb')),
person('Nicole', 'Hoess', role = c('ctb'))
person('Nicole', 'Hoess', role = c('ctb')),
person('Ian Jaymes', 'Iwata', role= c('ctb'))
)
Maintainer: Carlos Paradis <cvas@hawaii.edu>
License: MPL-2.0 | file LICENSE
1 change: 1 addition & 0 deletions NAMESPACE
@@ -118,6 +118,7 @@ export(parse_gitlog_entity)
export(parse_gof_patterns)
export(parse_java_code_refactoring_json)
export(parse_jira)
export(parse_jira_latest_date)
export(parse_jira_replies)
export(parse_jira_rss_xml)
export(parse_line_metrics)
2 changes: 2 additions & 0 deletions NEWS.md
@@ -3,6 +3,8 @@ __kaiaulu 0.0.0.9700 (in development)__

### NEW FEATURES

* `parse_jira()` now parses folders containing raw JIRA JSON files without depending on JirAgileR. [#276](https://github.com/sailuh/kaiaulu/issues/276)
* The `parse_jira_latest_date()` has been added. This function returns the file name of the downloaded JIRA JSON containing the latest date for use by `download_jira_issues()` to implement a refresh capability. [#276](https://github.com/sailuh/kaiaulu/issues/276)
* Kaiaulu architecture has been refactored. Instead of using a parser, download, network module structure, Kaiaulu now uses a combination of data type and tool structure. In that manner, various parser functions of download.R, parser.R, and network.R now are separated in git.R, jira.R, git.R, etc. When only small functionality of a tool is required, functions are grouped based on the data type they are associated to, for example, src.R. Kaiaulu API documentation has been updated accordingly. Functions signature and behavior remain the same: The only modification was the new placement of functions into files. For further rationale and changes, see the issue for more details. [#241](https://github.com/sailuh/kaiaulu/issues/241)
* Temporal bipartite projections are now weighted. The temporal projection can be parameterized by `weight_scheme_cum_temporal()` `weight_scheme_pairwise_cum_temporal()` when all time lag edges are used, or the existing weight schemes can also be used when using a single lag. The all lag weight schemes reproduce the same behavior as Codeface's paper. See the issue for details. [#229](https://github.com/sailuh/kaiaulu/issues/229)
* The `make_jira_issue()` and `make_jira_issue_tracker()` have been added, alongside examples and unit tests for `parse_jira()`. [#228](https://github.com/sailuh/kaiaulu/issues/228)
208 changes: 145 additions & 63 deletions R/jira.R
@@ -6,16 +6,34 @@

############## Parsers ##############

#' Parse Jira issue and comments
#' Parse JIRA Issues and Comments
#'
#' @param json_path path to jira json (issues or issues with comments) obtained using `download_jira_data.Rmd`.
#' Parses JIRA issues, with or without comments, contained in a folder following a standardized file nomenclature,
#' as obtained from \code{\link{download_jira_issues}}. A named list with two elements (issues, comments) is returned
#' containing the issue table and optionally comments table.
#'
#' The following fields are expected on the raw data:
#'
#' issuekey, issuetype, components, creator, created, description, reporter, status, resolution
#' resolutiondate, assignee, updated, comment, priority, votes, watches, versions, fixVersions, labels
#'
#' which are the default parameters of \code{\link{download_jira_issues}}. If the `comment` field is
#' specified, then the comments table is included.
#'
#' If a field is not present in an issue, then its value will be NA.
#'
#'
#' @param json_folder_path is a folder path containing a set of jira_issues as json files.
#' @return A named list of two named elements ("issues", and "comments"), each containing a data.table.
#' Note the comments element will be empty if the downloaded json only contain issues.
#' @export
#' @family parsers
parse_jira <- function(json_path){
parse_jira <- function(json_folder_path){

json_issue_comments <- jsonlite::read_json(json_path)
file_list <- list.files(json_folder_path)

if (identical(file_list, character(0))){
stop(stringi::stri_c("cannot open the connection"))
}

# Comments list parser. Comments may occur on any json issue.
jira_parse_comment <- function(comment){
@@ -38,78 +56,142 @@ parse_jira <- function(json_path){
return(parsed_comment)
}

# names(json_issue_comments) => "base_info","ext_info"
# length([["base_info]]) == length([["ext_info]]) == n_issues.
# Choose either and store the total number of issues
n_issues <- length(json_issue_comments[["ext_info"]])
# Issues parser
jira_parse_issues <- function(jira_file){

json_issue_comments <- jsonlite::read_json(jira_file)

n_issues <- length(json_issue_comments[["issues"]])

# Prepare two lists which will contain data.tables for all issues and all comments
# Both tables can share the issue_key, so they can be joined if desired.
all_issues <- list()
all_issues_comments <- list()

for(i in 1:n_issues){

# This is the issue key
issue_key <- json_issue_comments[["issues"]][[i]][["key"]][[1]]

# All other information is contained in "fields"
issue_comment <- json_issue_comments[["issues"]][[i]][["fields"]]

# Parse all relevant *issue* fields
all_issues[[i]] <- data.table(
issue_key = issue_key,

issue_summary = issue_comment[["summary"]][[1]],
issue_parent = issue_comment[["parent"]][["name"]][[1]],
issue_type = issue_comment[["issuetype"]][["name"]][[1]],
issue_status = issue_comment[["status"]][["statusCategory"]][["name"]][[1]],
issue_resolution = issue_comment[["resolution"]][["name"]][[1]],
issue_components = stringi::stri_c(unlist(sapply(issue_comment[["components"]],"[[","name")),collapse = ";"),
issue_description = issue_comment[["description"]][[1]],
issue_priority = issue_comment[["priority"]][["name"]][[1]],
issue_affects_versions = stringi::stri_c(unlist(sapply(issue_comment[["versions"]],"[[","name")),collapse = ";"),
issue_fix_versions = stringi::stri_c(unlist(sapply(issue_comment[["fixVersions"]],"[[","name")),collapse = ";"),
issue_labels = stringi::stri_c(unlist(sapply(issue_comment[["labels"]],"[[",1)),collapse = ";"),
issue_votes = issue_comment[["votes"]][["votes"]][[1]],
issue_watchers = issue_comment[["watches"]][["watchCount"]][[1]],

issue_created_datetimetz = issue_comment[["created"]][[1]],
issue_updated_datetimetz = issue_comment[["updated"]][[1]],
issue_resolution_datetimetz = issue_comment[["resolutiondate"]][[1]],

issue_creator_id = issue_comment[["creator"]][["name"]][[1]],
issue_creator_name = issue_comment[["creator"]][["displayName"]][[1]],
issue_creator_timezone = issue_comment[["creator"]][["timeZone"]][[1]],

issue_assignee_id = issue_comment[["assignee"]][["name"]][[1]],
issue_assignee_name = issue_comment[["assignee"]][["displayName"]][[1]],
issue_assignee_timezone = issue_comment[["assignee"]][["timeZone"]][[1]],

issue_reporter_id = issue_comment[["reporter"]][["name"]][[1]],
issue_reporter_name = issue_comment[["reporter"]][["displayName"]][[1]],
issue_reporter_timezone = issue_comment[["reporter"]][["timeZone"]][[1]]
)
# Comments
# For each issue, comment/comments contain 1 or more comments. Parse them
# in a separate table.
root_of_comments_list <- json_issue_comments[["issues"]][[i]][["fields"]][["comment"]]
# If root_of_comments_list does not exist, then this is an issue only json, skip parsing
if(length(root_of_comments_list) > 0){
comments_list <- json_issue_comments[["issues"]][[i]][["fields"]][["comment"]][["comments"]]
# Even on a json with comments, some issues may not have comments, check if comments exist:
if(length(comments_list) > 0){
# Parse all comments into issue_comments
issue_comments <- rbindlist(lapply(comments_list,
jira_parse_comment))
# Add issue_key column to the start of the table
issue_comments <- cbind(data.table(issue_key=issue_key),issue_comments)
all_issues_comments[[i]] <- issue_comments
}
}
}

# Prepare two lists which will contain data.tables for all issues and all comments
# Both tables can share the issue_key, so they can be joined if desired.
all_issues <- list()
all_issues_comments <- list()

for(i in 1:n_issues){
all_issues <- rbindlist(all_issues,fill=TRUE)
all_issues_comments <- rbindlist(all_issues_comments,fill=TRUE)

# The only use of "base_info" is to obtain the issue_key
issue_key <- json_issue_comments[["base_info"]][[i]][["key"]]
parsed_issues_comments <- list()
parsed_issues_comments[["issues"]] <- all_issues
parsed_issues_comments[["comments"]] <- all_issues_comments

# All other information is contained in "ext_info"
issue_comment <- json_issue_comments[["ext_info"]][[i]]
return(parsed_issues_comments)
}

# Parse all relevant *issue* fields
all_issues[[i]] <- data.table(
issue_key = issue_key,
issues_holder <- list()
comments_holder <- list()

issue_summary = issue_comment[["summary"]][[1]],
issue_type = issue_comment[["issuetype"]][["name"]][[1]],
issue_status = issue_comment[["status"]][["name"]][[1]],
issue_resolution = issue_comment[["resolution"]][["name"]][[1]],
issue_components = stringi::stri_c(unlist(sapply(issue_comment[["components"]],"[[","name")),collapse = ";"),
issue_description = issue_comment[["description"]],
for(filename in file_list){
current_json <- paste0(json_folder_path, "/", filename)
parsed_data <- jira_parse_issues(current_json)
issues_holder <- append(issues_holder, list(parsed_data[["issues"]]))
comments_holder <- append(comments_holder, list(parsed_data[["comments"]]))
}

issue_created_datetimetz = issue_comment[["created"]][[1]],
issue_updated_datetimetz = issue_comment[["updated"]][[1]],
issue_resolution_datetimetz = issue_comment[["resolutiondate"]],
issues_holder <- rbindlist(issues_holder, fill=TRUE)
comments_holder <- rbindlist(comments_holder, fill=TRUE)

issue_creator_id = issue_comment[["creator"]][["name"]][[1]],
issue_creator_name = issue_comment[["creator"]][["displayName"]][[1]],
issue_creator_timezone = issue_comment[["creator"]][["timeZone"]][[1]],
return_info <- list()
return_info[["issues"]] <- issues_holder
return_info[["comments"]] <- comments_holder

issue_assignee_id = issue_comment[["assignee"]][["name"]][[1]],
issue_assignee_name = issue_comment[["assignee"]][["displayName"]][[1]],
issue_assignee_timezone = issue_comment[["assignee"]][["timeZone"]][[1]],
return(return_info)
}
#' Parse JIRA current issue
#'
#' Returns the file containing the most current issue in the specified folder.
#'
#' The folder assumes the following convention: "(PROJECTKEY)_issues_(unixtimestamp_lowerbound)_(unixtimestamp_upperbound).json"
#' or "(PROJECTKEY)_issue_comments_(unixtimestamp_lowerbound)_(unixtimestamp_upperbound).json".
#' For example: "KAIAULU_issues_1231234_2312413.json". This nomenclature is guaranteed by \code{\link{download_jira_issues}}.
#'
#' @param json_folder_path path to save folder containing JIRA issue and/or comments json files.
#' @return The name of the jira issue file with the latest created date that was created/downloaded for
#' use by the Jira Downloader refresher
#' @export
#' @family parsers
parse_jira_latest_date <- function(json_folder_path){
file_list <- list.files(json_folder_path)
time_list <- list()

issue_reporter_id = issue_comment[["reporter"]][["name"]][[1]],
issue_reporter_name = issue_comment[["reporter"]][["displayName"]][[1]],
issue_reporter_timezone = issue_comment[["reporter"]][["timeZone"]][[1]]
)
# Checking if the save folder is empty
if (identical(file_list, character(0))){
stop(stringi::stri_c("cannot open the connection"))
}

# Comments
# For each issue, comment/comments contain 1 or more comments. Parse them
# in a separate table.
root_of_comments_list <- json_issue_comments[["ext_info"]][[i]][["comment"]]
# If root_of_comments_list does not exist, then this is an issue only json, skip parsing
if(length(root_of_comments_list) > 0){
comments_list <- json_issue_comments[["ext_info"]][[i]][["comment"]][["comments"]]
# Even on a json with comments, some issues may not have comments, check if comments exist:
if(length(comments_list) > 0){
# Parse all comments into issue_comments
issue_comments <- rbindlist(lapply(comments_list,
jira_parse_comment))
# Add issue_key column to the start of the table
issue_comments <- cbind(data.table(issue_key=issue_key),issue_comments)
all_issues_comments[[i]] <- issue_comments
}
}
for (j in file_list){
j <- sub(".*_(\\w+)\\.[^.]+$", "\\1", j)
j <- as.numeric(j)
time_list <- append(time_list, j)
}
all_issues <- rbindlist(all_issues,fill=TRUE)
all_issues_comments <- rbindlist(all_issues_comments,fill=TRUE)

parsed_issues_comments <- list()
parsed_issues_comments[["issues"]] <- all_issues
parsed_issues_comments[["comments"]] <- all_issues_comments
overall_latest_date <- as.character(max(unlist(time_list)))

latest_issue_file <- grep(overall_latest_date, file_list, value = TRUE)

return(parsed_issues_comments)
return(latest_issue_file)
}
#' Format Parsed Jira to Replies
#'
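
The file-selection step in `parse_jira_latest_date()` relies on the upper-bound timestamp being the last underscore-separated token before the extension. The sketch below is a base-R variant of the same idea under that assumption. `latest_file_by_upper_bound()` is a hypothetical name, not a kaiaulu function, and it deliberately swaps the `as.character()` plus `grep()` lookup for `which.max()`, since `as.character()` on a double can produce scientific notation for some values.

```r
# Hypothetical helper, not part of kaiaulu: return the file whose upper-bound
# timestamp (the digits between the last underscore and the extension) is largest.
latest_file_by_upper_bound <- function(file_names) {
  # Capture the trailing digits of each file name
  upper_bounds <- as.numeric(sub(".*_(\\d+)\\.[^.]+$", "\\1", file_names))
  if (length(upper_bounds) == 0 || all(is.na(upper_bounds))) {
    stop("no file names matching the expected convention")
  }
  # which.max() indexes the winner directly, so the timestamp is never
  # converted back to a string (as.character(100000) is "1e+05")
  file_names[which.max(upper_bounds)]
}

# Made-up file names following the documented convention
file_names <- c(
  "KAIAULU_issues_1231234_2312413.json",
  "KAIAULU_issues_2312414_2400000.json",
  "KAIAULU_issues_100_1231233.json"
)
latest_file <- latest_file_by_upper_bound(file_names)
print(latest_file)
```

Indexing by position also avoids a second subtlety of the `grep()` lookup: a timestamp that appears as the lower bound of another file would match there too.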
1 change: 1 addition & 0 deletions _pkgdown.yml
@@ -110,6 +110,7 @@ reference:
Notebook for details.
- contents:
- parse_jira
- parse_jira_latest_date
- parse_jira_replies
- parse_jira_rss_xml
- make_jira_issue
4 changes: 2 additions & 2 deletions conf/geronimo.yml
@@ -58,8 +58,8 @@ issue_tracker:
domain: https://issues.apache.org/jira
project_key: GERONIMO
# Download using `download_jira_data.Rmd`
issues: ../../rawdata/issue_tracker/geronimo_issues.json
issue_comments: ../../rawdata/issue_tracker/geronimo_issue_comments.json
issues: ../../rawdata/issue_tracker/geronimo/issues
issue_comments: ../../rawdata/issue_tracker/geronimo/issue_comments
github:
# Obtained from the project's GitHub URL
owner: apache
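
With this change, the `issues` and `issue_comments` keys point at folders rather than single JSON files, and the parser iterates over every file it finds there. A small base-R sketch of enumerating such a folder is shown below. The temporary folder and file names are invented stand-ins for the configured path, and the `pattern` filter is an optional extra that the committed code does not apply.

```r
# Temporary folder standing in for ../../rawdata/issue_tracker/geronimo/issues
issues_folder <- file.path(tempdir(), "geronimo_issues_sketch")
dir.create(issues_folder, showWarnings = FALSE)

# Two placeholder JSON files plus an unrelated file
file.create(file.path(issues_folder, c(
  "GERONIMO_issues_100_200.json",
  "GERONIMO_issues_201_300.json",
  "notes.txt"
)))

# full.names = TRUE returns ready-to-read paths; pattern keeps only .json files
json_files <- list.files(issues_folder, pattern = "\\.json$", full.names = TRUE)
if (length(json_files) == 0) {
  stop("no JSON files found in ", issues_folder)
}
print(basename(json_files))
```

Filtering by extension is one way to keep stray files in the folder from reaching the JSON reader; whether that is desirable depends on how the folder is populated.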
1 change: 1 addition & 0 deletions man/parse_bugzilla_perceval_rest_issue_comments.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions man/parse_bugzilla_perceval_traditional_issue_comments.Rd
1 change: 1 addition & 0 deletions man/parse_bugzilla_rest_comments.Rd
1 change: 1 addition & 0 deletions man/parse_bugzilla_rest_issues.Rd
1 change: 1 addition & 0 deletions man/parse_bugzilla_rest_issues_comments.Rd
1 change: 1 addition & 0 deletions man/parse_commit_message_id.Rd
1 change: 1 addition & 0 deletions man/parse_dependencies.Rd
1 change: 1 addition & 0 deletions man/parse_dv8_clusters.Rd
1 change: 1 addition & 0 deletions man/parse_gitlog.Rd
21 changes: 18 additions & 3 deletions man/parse_jira.Rd