Merge pull request #2191 from jplag/develop
Merge develop into main
tsaglam authored Feb 20, 2025
2 parents 3fb0b8f + 6e90388 commit d1cdf08
Showing 328 changed files with 9,438 additions and 6,060 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/complete-e2e.yml
@@ -43,7 +43,7 @@ jobs:
node-version: "18"

- name: Build Assembly
run: mvn -Pwith-report-viewer -DskipTests clean package assembly:single
run: mvn -DskipTests clean package assembly:single

- name: Rename Jar
run: mv cli/target/jplag-*-jar-with-dependencies.jar cli/target/jplag.jar
1 change: 1 addition & 0 deletions .github/workflows/maven.yml
@@ -7,6 +7,7 @@ on:
- "**/pom.xml"
- "**.java"
- "**.g4"
- "report-viewer/**"
pull_request:
types: [opened, synchronize, reopened]
paths:
13 changes: 13 additions & 0 deletions .github/workflows/publish.yml
@@ -38,6 +38,19 @@ jobs:
        with:
          node-version: "18"

      - name: Set version of Report Viewer
        shell: bash
        run: |
          VERSION=$(grep "<revision>" pom.xml | grep -oPm1 "(?<=<revision>)[^-|<]+")
          MAJOR=$(echo $VERSION | cut -d '.' -f 1)
          MINOR=$(echo $VERSION | cut -d '.' -f 2)
          PATCH=$(echo $VERSION | cut -d '.' -f 3)
          json=$(cat report-viewer/src/version.json)
          json=$(echo "$json" | jq --arg MAJOR "$MAJOR" --arg MINOR "$MINOR" --arg PATCH "$PATCH" '.report_viewer_version |= { "major": $MAJOR | tonumber, "minor": $MINOR | tonumber, "patch": $PATCH | tonumber }')
          echo "$json" > report-viewer/src/version.json
          echo "Version of Report Viewer:"
          cat report-viewer/src/version.json
      - name: Build JPlag
        run: mvn -Pwith-report-viewer -U -B clean package assembly:single
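Reviewer note: the new "Set version of Report Viewer" step reads the `<revision>` value from `pom.xml`, drops any suffix such as `-SNAPSHOT`, and writes the three numeric parts into `report_viewer_version` in `report-viewer/src/version.json`. The same logic can be sketched in Python for local experimentation. The function names `parse_revision` and `with_report_viewer_version` are hypothetical and not part of the commit, and the sketch assumes the revision always has exactly three dot-separated numeric parts.

```python
import json
import re


def parse_revision(pom_text: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) from the first <revision> element.

    Mirrors the workflow's grep pattern: the match stops at '-', '|' or '<',
    so a suffix such as '-SNAPSHOT' is ignored.
    """
    match = re.search(r"<revision>([^-|<]+)", pom_text)
    if match is None:
        raise ValueError("no <revision> element found")
    major, minor, patch = (int(part) for part in match.group(1).split("."))
    return major, minor, patch


def with_report_viewer_version(version_json: str, version: tuple[int, int, int]) -> str:
    """Return the JSON text with report_viewer_version replaced by numeric parts."""
    data = json.loads(version_json)
    major, minor, patch = version
    data["report_viewer_version"] = {"major": major, "minor": minor, "patch": patch}
    return json.dumps(data, indent=2)
```

Storing the parts as numbers rather than strings matches the `tonumber` conversions in the `jq` filter.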

2 changes: 1 addition & 1 deletion .github/workflows/report-viewer-demo.yml
@@ -102,7 +102,7 @@ jobs:
npm run build-demo
- name: Deploy 🚀
uses: JamesIves/github-pages-deploy-action@v4.6.1
uses: JamesIves/github-pages-deploy-action@v4.7.2
with:
branch: gh-pages
folder: report-viewer/dist
2 changes: 1 addition & 1 deletion .github/workflows/report-viewer-dev.yml
@@ -27,7 +27,7 @@ jobs:
npm run build-dev
- name: Deploy 🚀
uses: JamesIves/github-pages-deploy-action@v4.6.1
uses: JamesIves/github-pages-deploy-action@v4.7.2
with:
branch: gh-pages
folder: report-viewer/dist
43 changes: 0 additions & 43 deletions .github/workflows/report-viewer.yml

This file was deleted.

42 changes: 42 additions & 0 deletions .github/workflows/scripts/checkCoverage.py
@@ -0,0 +1,42 @@
import os
import xml.etree.ElementTree as ET

def get_all_pom_files():
    pom_files = []
    for root, dirs, files in os.walk("../../.."):
        for file in files:
            if file == "pom.xml":
                pom_files.append(os.path.join(root, file))
    return pom_files

# get content from a file as a string
def get_file_content(file):
    with open(file, "r") as f:
        return f.read()

# extract xml field artifact id from string
def extract_artifact_id(xml):
    root = ET.fromstring(xml)
    return root.find("{http://maven.apache.org/POM/4.0.0}artifactId").text

excluded_artifacts = ["coverage-report", "aggregator", "languages"]
artifact_ids = [extract_artifact_id(get_file_content(file)) for file in get_all_pom_files()]
print("All artifacts: " + str(artifact_ids))
filtered_artifact_ids = [artifact_id for artifact_id in artifact_ids if artifact_id not in excluded_artifacts]

coverage_report_pom = ""
with open("../../../coverage-report/pom.xml", "r") as f:
    coverage_report_pom = f.read()
xml = ET.fromstring(coverage_report_pom)
coverage_report_artifacts = [dependency.find("{http://maven.apache.org/POM/4.0.0}artifactId").text for dependency in xml.find("{http://maven.apache.org/POM/4.0.0}dependencies").findall("{http://maven.apache.org/POM/4.0.0}dependency")]
print("Coverage report artifacts: " + str(coverage_report_artifacts))

only_in_coverage_report = [artifact_id for artifact_id in coverage_report_artifacts if artifact_id not in filtered_artifact_ids]
print("Only in coverage report: " + str(only_in_coverage_report))
not_in_coverage_report = [artifact_id for artifact_id in filtered_artifact_ids if artifact_id not in coverage_report_artifacts]
print("Not in coverage report: " + str(not_in_coverage_report))

if len(not_in_coverage_report) > 0:
    raise Exception("Some artifacts are not in the coverage report: " + str(not_in_coverage_report))
if len(only_in_coverage_report) > 0:
    raise Exception("Some artifacts are only in the coverage report: " + str(only_in_coverage_report))
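Reviewer note: the script above compares two lists of artifact IDs in both directions and fails if either difference is non-empty. A minimal sketch of that comparison as pure functions, which makes it testable without touching the file system, could look as follows. The names `dependency_artifact_ids` and `coverage_mismatches` are hypothetical and not part of the commit, and the sketch uses sets, so duplicate IDs are collapsed, unlike the list comprehensions in the script.

```python
import xml.etree.ElementTree as ET

# Maven POM namespace, as used by the script in this commit.
POM_NS = "{http://maven.apache.org/POM/4.0.0}"


def dependency_artifact_ids(pom_text):
    """Return the artifactId of every <dependency> listed in a POM string."""
    root = ET.fromstring(pom_text)
    dependencies = root.find(POM_NS + "dependencies")
    if dependencies is None:
        return []
    return [dep.find(POM_NS + "artifactId").text
            for dep in dependencies.findall(POM_NS + "dependency")]


def coverage_mismatches(module_ids, coverage_ids, excluded=()):
    """Return (missing, extra) as sorted lists.

    missing: modules that are not listed in the coverage report.
    extra:   coverage report entries that match no module.
    """
    expected = set(module_ids) - set(excluded)
    covered = set(coverage_ids)
    return sorted(expected - covered), sorted(covered - expected)
```

A caller would raise an error when either returned list is non-empty, which corresponds to the two `raise Exception` branches in the script.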
28 changes: 28 additions & 0 deletions .github/workflows/verify-coverage-report.yml
@@ -0,0 +1,28 @@
name: Check that all dependencies are in coverage report

on:
  workflow_dispatch:
  push:
    paths:
      - ".github/workflows/verify-coverage-report.yml"
      - "./scripts/checkCoverage.py"
      - "**/pom.xml"
  pull_request:
    types: [opened, synchronize, reopened]
    paths:
      - ".github/workflows/verify-coverage-report.yml"
      - "./scripts/checkCoverage.py"
      - "**/pom.xml"

jobs:
  check_coverage:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout 🛎️
        uses: actions/checkout@v4

      - name: Run script
        working-directory: .github/workflows/scripts
        run: |
          python checkCoverage.py
117 changes: 75 additions & 42 deletions README.md
@@ -2,18 +2,17 @@
<img alt="JPlag logo" src="core/src/main/resources/de/jplag/logo-dark.png" width="350">
</p>

# JPlag - Detecting Software Plagiarism
# JPlag - Detecting Source Code Plagiarism
[![CI Build](https://github.com/jplag/jplag/actions/workflows/maven.yml/badge.svg)](https://github.com/jplag/jplag/actions/workflows/maven.yml)
[![Latest Release](https://img.shields.io/github/release/jplag/jplag.svg)](https://github.com/jplag/jplag/releases/latest)
[![Maven Central](https://maven-badges.herokuapp.com/maven-central/de.jplag/jplag/badge.svg)](https://maven-badges.herokuapp.com/maven-central/de.jplag/jplag)
[![License](https://img.shields.io/github/license/jplag/jplag.svg)](https://github.com/jplag/jplag/blob/main/LICENSE)
[![GitHub commit activity](https://img.shields.io/github/commit-activity/y/jplag/JPlag)](https://github.com/jplag/JPlag/pulse)
[![SonarCloud Coverage](https://sonarcloud.io/api/project_badges/measure?project=jplag_JPlag&metric=coverage)](https://sonarcloud.io/component_measures?metric=Coverage&view=list&id=jplag_JPlag)
[![Report Viewer](https://img.shields.io/badge/report%20viewer-online-b80025)](https://jplag.github.io/JPlag/)
[![Java Version](https://img.shields.io/badge/java-SE%2021-yellowgreen)](#download-and-installation)


JPlag finds pairwise similarities among a set of multiple programs. It can reliably detect software plagiarism and collusion in software development, even when obfuscated. All similarities are calculated locally, and no source code or plagiarism results are ever uploaded to the internet. JPlag supports a large number of programming and modeling languages.
JPlag finds pairwise similarities among a set of multiple programs. It can reliably detect software plagiarism and collusion in software development, even when obfuscated. All similarities are calculated locally; no source code or plagiarism results are ever uploaded online. JPlag supports a large number of programming and modeling languages.

* 📈 [JPlag Demo](https://jplag.github.io/Demo/)

@@ -46,14 +45,14 @@ All supported languages and their supported versions are listed below.
| [EMF Metamodel](https://www.eclipse.org/modeling/emf/) | 2.25.0 | emf | beta | EMF |
| [EMF Model](https://www.eclipse.org/modeling/emf/) | 2.25.0 | emf-model | alpha | EMF |
| [SCXML](https://www.w3.org/TR/scxml/) | 1.0 | scxml | alpha | XML |
| Text (naive) | - | text | legacy | CoreNLP |
| Text (naive, use with caution) | - | text | legacy | CoreNLP |

## Download and Installation
You need Java SE 21 to run or build JPlag.

### Downloading a release
* Download a [released version](https://github.com/jplag/jplag/releases).
* In case you depend on the legacy version of JPlag we refer to the [legacy release v2.12.1](https://github.com/jplag/jplag/releases/tag/v2.12.1-SNAPSHOT) and the [legacy branch](https://github.com/jplag/jplag/tree/legacy).
* In case you depend on the legacy version of JPlag, we refer you to the [legacy release v2.12.1](https://github.com/jplag/jplag/releases/tag/v2.12.1-SNAPSHOT) and the [legacy branch](https://github.com/jplag/jplag/tree/legacy).

### Via Maven
JPlag is released on [Maven Central](https://search.maven.org/search?q=de.jplag); it can be included as follows:
@@ -73,64 +72,98 @@ JPlag is released on [Maven Central](https://search.maven.org/search?q=de.jplag)
3. You will find the generated JARs in the subdirectory `cli/target`.

## Usage
JPlag can either be used via the CLI or directly via its Java API. For more information, see the [usage information in the wiki](https://github.com/jplag/JPlag/wiki/1.-How-to-Use-JPlag). If you are using the CLI, you can display your results via [jplag.github.io](https://jplag.github.io/JPlag/). No data will leave your computer!
JPlag can either be used via the CLI or directly via its Java API. For more information, see the [usage information in the wiki](https://github.com/jplag/JPlag/wiki/1.-How-to-Use-JPlag). If you are using the CLI, the report viewer UI will launch automatically. No data will leave your computer!

### CLI
*Note that the [legacy CLI](https://github.com/jplag/jplag/blob/legacy/README.md) differs slightly.*
The language can either be set with the -l parameter or as a subcommand (`jplag [jplag options] <language name> [language options]`). A subcommand takes priority over the -l option.
When using the subcommand, language-specific arguments can be set. A list of language-specific options can be obtained by requesting the help page of a subcommand (e.g. `jplag java -h`).
Language-specific arguments can be set when using the subcommand. A list of language-specific options can be obtained by requesting the help page of a subcommand (e.g., `jplag java -h`).
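Reviewer note: for scripted use, the CLI invocation described above can be assembled as an argument list. The following is a minimal sketch, not an official wrapper: the helper name `build_jplag_command` is hypothetical, the JAR name `jplag.jar` is taken from the rename step in the e2e workflow, and placing the root directories after the options is an assumption about argument ordering. Only the `-l`, `-r`, and `-t` options from the help text are used.

```python
import shlex


def build_jplag_command(jar_path, root_dirs, language="java",
                        result_file="results", min_tokens=None):
    """Assemble a JPlag CLI call as an argument list (does not run it)."""
    command = ["java", "-jar", jar_path, "-l", language, "-r", result_file]
    if min_tokens is not None:
        # -t tunes sensitivity; smaller values may produce more false positives.
        command += ["-t", str(min_tokens)]
    # Positional root directories containing the submissions (ordering assumed).
    command += list(root_dirs)
    return command


if __name__ == "__main__":
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(build_jplag_command("jplag.jar", ["submissions"])))
```

The resulting list can be passed to `subprocess.run` once the JAR is available locally.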

```
Parameter descriptions:
[root-dirs[,root-dirs...]...]
Root-directory with submissions to check for plagiarism.
Root-directory with submissions to check for
plagiarism. If mode is set to VIEW, this parameter
can be used to specify a report file to open. In that
case only a single file may be specified.
-bc, --bc, --base-code=<baseCode>
Path to the base code directory (common framework used in all submissions).
-l, --language=<language>
Select the language of the submissions (default: java). See subcommands below.
-M, --mode=<{RUN, VIEW, RUN_AND_VIEW}>
The mode of JPlag: either only run analysis, only open the viewer, or do both (default: null)
-n, --shown-comparisons=<shownComparisons>
The maximum number of comparisons that will be shown in the generated report, if set to -1 all comparisons will be shown (default: 500)
Path to the base code directory (common framework used
in all submissions).
-l, --language=<language>
Select the language of the submissions (default: java).
See subcommands below.
-M, --mode=<{RUN, VIEW, RUN_AND_VIEW, AUTO}>
The mode of JPlag. One of: RUN, VIEW, RUN_AND_VIEW,
AUTO (default: null). If VIEW is chosen, you can
optionally specify a path to an existing report.
-n, --shown-comparisons=<shownComparisons>
The maximum number of comparisons that will be shown in
the generated report, if set to -1 all comparisons
will be shown (default: 2500)
-new, --new=<newDirectories>[,<newDirectories>...]
Root-directories with submissions to check for plagiarism (same as root).
--normalize Activate the normalization of tokens. Supported for languages: Java, C++.
Root-directories with submissions to check for
plagiarism (same as root).
--normalize Activate the normalization of tokens. Supported for
languages: Java, C++.
-old, --old=<oldDirectories>[,<oldDirectories>...]
Root-directories with prior submissions to compare against.
-r, --result-file=<resultFile>
Name of the file in which the comparison results will be stored (default: results). Missing .zip endings will be automatically added.
-t, --min-tokens=<minTokenMatch>
Tunes the comparison sensitivity by adjusting the minimum token required to be counted as a matching section. A smaller value increases the sensitivity but might lead to more
false-positives.
Root-directories with prior submissions to compare
against.
-r, --result-file=<resultFile>
Name of the file in which the comparison results will
be stored (default: results). Missing .zip endings
will be automatically added.
-t, --min-tokens=<minTokenMatch>
Tunes the comparison sensitivity by adjusting the
minimum token required to be counted as a matching
section. A smaller value increases the sensitivity
but might lead to more false-positives.
Advanced
--csv-export Export pairwise similarity values as a CSV file.
-d, --debug Store on-parsable files in error folder.
-m, --similarity-threshold=<similarityThreshold>
Comparison similarity threshold [0.0-1.0]: All comparisons above this threshold will be saved (default: 0.0).
-p, --suffixes=<suffixes>[,<suffixes>...]
comma-separated list of all filename suffixes that are included.
-P, --port=<port> The port used for the internal report viewer (default: 1996).
-s, --subdirectory=<subdirectory>
-d, --debug Store on-parsable files in error folder.
--log-level=<{ERROR, WARN, INFO, DEBUG, TRACE}>
Set the log level for the cli.
-m, --similarity-threshold=<similarityThreshold>
Comparison similarity threshold [0.0-1.0]: All
comparisons above this threshold will be saved
(default: 0.0).
--overwrite Existing result files will be overwritten.
-p, --suffixes=<suffixes>[,<suffixes>...]
comma-separated list of all filename suffixes that are
included.
-P, --port=<port> The port used for the internal report viewer (default:
1996).
-s, --subdirectory=<subdirectory>
Look in directories <root-dir>/*/<dir> for programs.
-x, --exclusion-file=<exclusionFileName>
All files named in this file will be ignored in the comparison (line-separated list).
-x, --exclusion-file=<exclusionFileName>
All files named in this file will be ignored in the
comparison (line-separated list).
Clustering
--cluster-alg, --cluster-algorithm=<{AGGLOMERATIVE, SPECTRAL}>
Specifies the clustering algorithm (default: spectral).
Specifies the clustering algorithm. Available
algorithms: agglomerative, spectral (default:
spectral).
--cluster-metric=<{AVG, MIN, MAX, INTERSECTION}>
The similarity metric used for clustering (default: average similarity).
The similarity metric used for clustering. Available
metrics: average similarity, minimum similarity,
maximal similarity, matched tokens (default: average
similarity).
--cluster-skip Skips the cluster calculation.
Subsequence Match Merging
--gap-size=<maximumGapSize>
Maximal gap between neighboring matches to be merged (between 1 and minTokenMatch, default: 6).
--match-merging Enables merging of neighboring matches to counteract obfuscation attempts.
Maximal gap between neighboring matches to be merged
(between 1 and minTokenMatch, default: 6).
--match-merging Enables merging of neighboring matches to counteract
obfuscation attempts.
--neighbor-length=<minimumNeighborLength>
Minimal length of neighboring matches to be merged (between 1 and minTokenMatch, default: 2).
Subcommands (supported languages):
Minimal length of neighboring matches to be merged
(between 1 and minTokenMatch, default: 2).
--required-merges=<minimumRequiredMerges>
Minimal required merges for the merging to be applied
(between 1 and 50, default: 6).
Languages:
c
cpp
csharp
@@ -141,10 +174,10 @@ Subcommands (supported languages):
javascript
kotlin
llvmir
multi
python3
rlang
rust
scala
scheme
scxml
swift
@@ -183,7 +216,7 @@ Please consider our [guidelines for contributions](https://github.com/jplag/JPla

## Contact
If you encounter bugs or other issues, please report them [here](https://github.com/jplag/jplag/issues).
For other purposes, you can contact us at jplag@ipd.kit.edu .
If you are doing research related to JPlag, we would love to know what you are doing. Feel free to contact us!
For other purposes, you can contact us at jplag@ipd.kit.edu.
We would love to hear about your research related to JPlag. Feel free to contact us!

### More information can be found in our [Wiki](https://github.com/jplag/JPlag/wiki)!