Scitools Understand Parser #309

Merged
merged 21 commits into from
Dec 8, 2024
Changes from 6 commits
3 changes: 3 additions & 0 deletions .idea/.gitignore

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

9 changes: 9 additions & 0 deletions .idea/kaiaulu.iml


6 changes: 6 additions & 0 deletions .idea/misc.xml


8 changes: 8 additions & 0 deletions .idea/modules.xml


6 changes: 6 additions & 0 deletions .idea/vcs.xml


5 changes: 3 additions & 2 deletions DESCRIPTION
@@ -2,7 +2,7 @@ Package: kaiaulu
Type: Package
Title: Kaiaulu
Version: 0.0.0.9700
Description: Kaiaulu is an R package and common interface that helps with understanding evolving software development communities, and the artifacts (gitlog, mailing list, files, etc.) which developers collaborate and communicate about. See Paradis et al., (2012) <doi:10.1007/978-3-031-15116-3_6>.
Description: Kaiaulu is an R package and common interface that helps with understanding evolving software development communities, and the artifacts (gitlog, mailing list, files, etc.) which developers collaborate and communicate about. See Paradis et al., (2012) <doi:10.1007/978-3-031-15116-3_6>.
Authors@R: c(
person('Carlos', 'Paradis', role = c('aut', 'cre'),
email = 'cvas@hawaii.edu',
@@ -21,6 +21,7 @@ Authors@R: c(
person('Anthony', 'Lau', role = c('ctb')),
person('Sean', 'Sunoo', role = c('ctb')),
person('Ian Jaymes', 'Iwata', role= c('ctb')),
person('Raven', 'Quiddaoen', role= c('ctb'))
)
Maintainer: Carlos Paradis <cvas@hawaii.edu>
License: MPL-2.0 | file LICENSE
@@ -49,4 +50,4 @@ Imports:
VignetteBuilder: knitr
URL: https://github.com/sailuh/kaiaulu
BugReports: https://github.com/sailuh/kaiaulu/issues
RoxygenNote: 7.2.3
RoxygenNote: 7.3.2
3 changes: 3 additions & 0 deletions NAMESPACE
@@ -3,6 +3,7 @@
export(annotate_src_text)
export(assign_exact_identity)
export(bipartite_graph_projection)
export(build_understand_project)
export(commit_message_id_coverage)
export(community_oslom)
export(convert_pipermail_to_mbox)
@@ -132,6 +133,7 @@ export(parse_r_dependencies)
export(parse_r_function_definition)
export(parse_r_function_dependencies)
export(parse_rfile_ast)
export(parse_understand_dependencies)
export(query_src_text)
export(query_src_text_class_names)
export(query_src_text_namespace)
@@ -157,6 +159,7 @@ export(transform_gitlog_to_temporal_network)
export(transform_r_dependencies_to_network)
export(transform_reply_to_bipartite_network)
export(transform_temporal_gitlog_to_adsmj)
export(transform_understand_dependencies_to_network)
export(weight_scheme_count_deleted_nodes)
export(weight_scheme_cum_temporal)
export(weight_scheme_pairwise_cum_temporal)
2 changes: 1 addition & 1 deletion NEWS.md
@@ -2,7 +2,7 @@ __kaiaulu 0.0.0.9700 (in development)__
=========================

### NEW FEATURES

* `build_understand_project(project_path, language, output_dir)`, `parse_understand_dependencies(understand_dir, parse_type)`, and `transform_understand_dependencies_to_network(parsed, weight_types)` have been added. These functions create node and edge tables from the XML dependency data generated by SciTools Understand. [#308](https://github.com/sailuh/kaiaulu/issues/308)
* `refresh_jira_issues()` has been added. It is a wrapper function for the previous downloader and downloads only issues greater than the greatest key already downloaded.
* `download_jira_issues()`, `download_jira_issues_by_issue_key()`, and `download_jira_issues_by_date()` have been added. This allows for downloading of Jira issues without the use of JirAgileR [#275](https://github.com/sailuh/kaiaulu/issues/275) and specification of issue Id and created ranges. It also interacts with `parse_jira_latest_date` to implement a refresh capability.
* `make_jira_issue()` and `make_jira_issue_tracker()` no longer create fake issues following JirAgileR format, but instead the raw data obtained from JIRA API. This is compatible with the new parser function for JIRA. [#277](https://github.com/sailuh/kaiaulu/issues/277)
130 changes: 130 additions & 0 deletions R/src.R
@@ -4,8 +4,101 @@
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at https://mozilla.org/MPL/2.0/.

############## Understand Project Builder ##############

#' Build a SciTools Understand project for a source folder
#'
#' Creates the Understand database (Understand.und) in \code{output_dir},
#' adds the project's source files to it, and runs the analysis.
#'
#' @param project_path path to the project folder to analyze
#' @param language the primary language of the project (language must be supported by Understand)
#' @param output_dir path to output directory (formatted output_path/)
#' @export
#' @family parsers
build_understand_project <- function(project_path, language, output_dir = "../tmp/"){

# Create variables for command line
command <- "und"
project_path <- paste0("\"", project_path, "\"")
db_dir <- paste0(output_dir, "/Understand.und")
args <- c("create", "-db", db_dir, "-languages", language)

# Build the Understand project
system2(command, args)
args <- c("-db", db_dir, "add", project_path)
system2(command, args)
args <- c("analyze", db_dir)
system2(command, args)

}

############## Parsers ##############

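#' Parse dependencies from SciTools Understand
#'
#' Exports the file or class dependencies of a built Understand project to a
#' Cytoscape XML file and parses it into a node table and an edge table.
#'
#' @param understand_dir path to the folder containing the built Understand project (the output_dir passed to build_understand_project)
#' @param parse_type Type of dependencies to export to XML (either "file" or "class")
#' @return A list with two data.tables: node_list (node_label, id, long_name) and edge_list (id_from, id_to, dependency_kind)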
#' @export
#' @family parsers
parse_understand_dependencies <- function(understand_dir="../tmp/", parse_type = c("file", "class")){
# Before running, check if parse_type is correct
parse_type <- match.arg(parse_type)

# Use Understand to parse the code folder.
# Create the variables used in command lines
db_dir <- paste0(understand_dir, "/Understand.und")
xml_dir <- paste0(db_dir, "/", parse_type, "Dependencies.xml")

# Generate the XML file
args <- c("export", "-dependencies", parse_type, "cytoscape", xml_dir, db_dir)
system2("und", args)

# Parse the XML file
xml_data <- xmlParse(xml_dir)
xml_nodes <- xmlRoot(xml_data) # The head of the xml
xml_nodes <- xmlChildren(xml_nodes) # Retrieve all the subnodes of the head (the data)

# From child nodes- filter for those with name "node"
node_elements <- lapply(xml_nodes, function(child) {
if (xmlName(child) == "node") {
# Extract the id
id <- xmlGetAttr(child, "id")
# Extract the necessary attributes from the attribute list
att_nodes <- xmlChildren(child)
node_label <- xmlGetAttr(att_nodes[[3]], "value")
long_name <- xmlGetAttr(att_nodes[[4]], "value")
return(data.table(node_label = node_label, id = id, long_name = long_name))
} else {
return(NULL)
}
})

# Remove NULLs and combine the results into a data frame
node_list <- rbindlist(node_elements[!sapply(node_elements, is.null)], use.names = TRUE, fill = TRUE)

# From child nodes- filter for those with name "edge"
edge_elements <- lapply(xml_nodes, function(child) {
if (xmlName(child) == "edge") {
# Extract the id_from and id_to
id_from <- xmlGetAttr(child, "source")
id_to <- xmlGetAttr(child, "target")
# Extract the necessary attributes from the attribute list
att_nodes <- xmlChildren(child)
dependency_kind <- xmlGetAttr(att_nodes[[5]], "value")
dependency_kind <- unlist(stri_split(dependency_kind, regex = ",\\s*"))
return(data.table(id_from = id_from, id_to = id_to, dependency_kind = dependency_kind))
} else {
return(NULL)
}
})

# Remove NULLs and combine the results into a data frame
edge_list <- rbindlist(edge_elements[!sapply(edge_elements, is.null)], use.names = TRUE, fill = TRUE)

# Create a list to return
graph <- list(node_list = node_list, edge_list = edge_list)
return(graph)
}

#' Parse dependencies from Depends
#'
@@ -215,6 +308,43 @@ parse_r_dependencies <- function(folder_path){

############## Network Transform ##############

#' Transform parsed Understand dependencies into a network
#'
#' @param parsed Parsed data from \code{\link{parse_understand_dependencies}}
#' @param weight_types The dependency kinds to keep, as reported by Understand.
#' Accepts single string and vector input
#'
#' @export
#' @family edgelists
transform_understand_dependencies_to_network <- function(parsed, weight_types) {

nodes <- parsed[["node_list"]]
edges <- parsed[["edge_list"]]

# Merge edges with nodes to get label_from
edges <- merge(edges, nodes[, .(id, node_label)], by.x = "id_from", by.y = "id", all.x = TRUE)
setnames(edges, "node_label", "label_from")

# Merge again to get label_to
edges <- merge(edges, nodes[, .(id, node_label)], by.x = "id_to", by.y = "id", all.x = TRUE)
setnames(edges, "node_label", "label_to")

# Reorder columns to have label_from and label_to on the left
edges <- edges[, .(label_from, label_to, id_from, id_to, dependency_kind)]

# Filter out by weights
edges <- edges[dependency_kind %in% weight_types]

# If filter removed all edges:
if (nrow(edges) == 0) {
stop("No edges found for the given weight_types.")
}

# Create a list to return
graph <- list(node_list = nodes, edge_list = edges)
return(graph)
}

#' Transform parsed dependencies into a network
#'
#' @param depends_parsed A parsed mbox by \code{\link{parse_dependencies}}.
85 changes: 85 additions & 0 deletions vignettes/understand_showcase.Rmd
@@ -0,0 +1,85 @@
---
title: "Understand Showcase"
output:
html_document:
toc: true
number_sections: true
vignette: >
%\VignetteEngine{knitr::rmarkdown}
%\VignetteIndexEntry{Understand Showcase}
%\VignetteEncoding{UTF-8}
---


# Introduction

Within a project, we might want to see the dependencies between files and classes. `parse_r_dependencies` and `parse_dependencies` rely on in-house code and the Depends software, respectively, to analyze projects. `parse_dependencies` only provides file dependencies, while `parse_r_dependencies` provides file and function dependencies for a set of R files. With Understand installed, we can analyze projects for both file and class dependencies in multiple languages, such as Java, PHP, HTML, C/C++, Python, Assembly, and Ada.

More information is available on the [SciTools Understand](https://scitools.com) website.

This notebook demonstrates a sample use case of the functions that generate tables from the dependency data Understand outputs, without opening Understand.


```{r warning = FALSE, message = FALSE}
rm(list = ls())
require(kaiaulu)
require(visNetwork)
require(igraph)
require(data.table)
```


# Parse a sample project folder

For sample purposes, we will use the [Houari Zegai's Calculator](https://github.com/HouariZegai/Calculator) project, saved into a folder called `sample_project`.

```{r}
folder_path <- "../tests/sample_project"
```
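
Before any parsing, the Understand project has to be built. As a rough sketch of what `build_understand_project` in R/src.R does, the chunk below assembles the three `und` argument vectors and prints them instead of running them. The paths are illustrative, and `shQuote()` is swapped in here for the manual quoting used in R/src.R.

```r
# Illustrative paths; "../tmp" mirrors the default output_dir in R/src.R.
project_path <- "../tests/sample_project"
db_dir <- file.path("../tmp", "Understand.und")

# The three argument vectors that R/src.R passes to system2("und", ...).
und_calls <- list(
  create  = c("create", "-db", db_dir, "-languages", "java"),
  add     = c("-db", db_dir, "add", shQuote(project_path)),
  analyze = c("analyze", db_dir)
)

# Print each command line instead of executing it.
for (args in und_calls) {
  cat("und", paste(args, collapse = " "), "\n")
}
```

Printing the commands first is a cheap way to check the paths before Understand writes anything to disk.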


# File Dependencies

To generate a table containing the file dependencies of the project, first build the Understand project with `build_understand_project`, providing the `project_path`, the primary project language (Java in this case), and an `output_dir` where Understand stores its data, including the generated .xml files (by default, '../tmp/'). Then call `parse_understand_dependencies` on that same folder with the parse type we want: file.

Note the format of the generated table after running the below code.

```{r}
build_understand_project(project_path = folder_path, language = "java", output_dir = "../tmp/")
file_dependencies <- parse_understand_dependencies(understand_dir = "../tmp/", parse_type = "file")
head(file_dependencies[["edge_list"]])
```
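
For readers without Understand installed, here is a minimal sketch of the shape of the parser's return value. It assumes the column names used in R/src.R; every file name and dependency kind below is made up for illustration. Base R data frames stand in for the data.tables the parser returns.

```r
# Hypothetical stand-in for the output of parse_understand_dependencies().
# Column names follow R/src.R; all values are invented.
node_list <- data.frame(
  node_label = c("App.java", "Calculator.java"),
  id = c("1", "2"),
  long_name = c("src/App.java", "src/Calculator.java"),
  stringsAsFactors = FALSE
)

# One raw edge whose kinds arrive as a single comma-separated string.
raw_kinds <- "Call, Import"
kinds <- unlist(strsplit(raw_kinds, ",\\s*"))

# As in the parser, the edge is expanded to one row per dependency kind.
edge_list <- data.frame(
  id_from = "1",
  id_to = "2",
  dependency_kind = kinds,
  stringsAsFactors = FALSE
)

parsed <- list(node_list = node_list, edge_list = edge_list)
str(parsed)
```

Because each kind gets its own row, a single pair of files can appear several times in `edge_list`.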


# Class Dependencies

Parsing class dependencies is nearly identical to parsing file dependencies: we only need to change the `parse_type` to class. In the output folder, the two .xml files are kept separate: fileDependencies.xml and classDependencies.xml, respectively.

The generated data is in the same format; however, note the different types of dependencies.

```{r}
# Assumes the Understand project in ../tmp/ has already been built with build_understand_project()
class_dependencies <- parse_understand_dependencies(understand_dir = "../tmp/", parse_type = "class")
head(class_dependencies[["edge_list"]])
```

## File

```{r}
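# Keep every dependency kind found by the parser; narrow weight_types to the kinds of interest
file_kinds <- unique(file_dependencies[["edge_list"]][["dependency_kind"]])
file_graph <- transform_understand_dependencies_to_network(parsed = file_dependencies, weight_types = file_kinds)
project_file_network <- igraph::graph_from_data_frame(d = file_graph[["edge_list"]],
directed = TRUE,
vertices = file_graph[["node_list"]])
visIgraph(project_file_network, randomSeed = 1)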
```


## Class

```{r}
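# Keep every dependency kind found by the parser; narrow weight_types to the kinds of interest
class_kinds <- unique(class_dependencies[["edge_list"]][["dependency_kind"]])
class_graph <- transform_understand_dependencies_to_network(parsed = class_dependencies, weight_types = class_kinds)
project_class_network <- igraph::graph_from_data_frame(d = class_graph[["edge_list"]],
directed = TRUE,
vertices = class_graph[["node_list"]])
visIgraph(project_class_network, randomSeed = 1)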
```
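
The transform step used in the two chunks above resolves node ids to labels and then filters by dependency kind. The sketch below reproduces that logic on a toy graph. It uses base R `merge()` in place of the data.table calls in R/src.R, and all node labels and kinds are hypothetical.

```r
# Hypothetical parsed output; values are invented for illustration.
nodes <- data.frame(
  id = c("1", "2", "3"),
  node_label = c("A.java", "B.java", "C.java"),
  stringsAsFactors = FALSE
)
edges <- data.frame(
  id_from = c("1", "1", "2"),
  id_to = c("2", "3", "3"),
  dependency_kind = c("Call", "Import", "Call"),
  stringsAsFactors = FALSE
)

# Left join on id_from to obtain label_from, then on id_to for label_to.
edges <- merge(edges, nodes, by.x = "id_from", by.y = "id", all.x = TRUE)
names(edges)[names(edges) == "node_label"] <- "label_from"
edges <- merge(edges, nodes, by.x = "id_to", by.y = "id", all.x = TRUE)
names(edges)[names(edges) == "node_label"] <- "label_to"

# Keep only the requested kinds and put the label columns first.
weight_types <- c("Call")
edges <- edges[edges$dependency_kind %in% weight_types,
               c("label_from", "label_to", "id_from", "id_to", "dependency_kind")]
print(edges)
```

Putting `label_from` and `label_to` first matters because `igraph::graph_from_data_frame` reads the first two columns of the edge table as the endpoints of each edge.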
