Code clone detection package for Python. It comes with a CLI and a pip package.

otzhora/potator

potator is a small code clone detection tool. It implements an algorithm from SourcererCC with adaptive prefix filtering optimizations and displays its results as HTML.

It works with JavaScript, Python, Java, Go, C++, PHP, C#, C, Swift, Kotlin and Haskell.

Supported platforms

potator supports Linux and macOS. On Windows, it can be used under WSL.

Installation

Using pip

potator can be installed using pip:

pip install potator

Using the installation script

git clone https://github.com/otzhora/potator
cd potator
./install.sh

Usage

Using potator as a standalone CLI application

potator [-h] [-d {Naive,Filtering}] [--depth DEPTH] [-t THRESHOLD] [-g GRANULARITY] [-o OUT] directory

Options

  • You can choose one of two detectors: Naive and Filtering. The Naive detector compares every pair of source code fragments and calculates the Jaccard similarity between them. The Filtering detector implements the algorithm from the SourcererCC paper with adaptive prefix filtering optimizations.
  • depth specifies the maximum depth of the adaptive prefix. depth=2 is recommended, since it offers the optimal balance between the cost of building the index and the cost of querying it.
  • threshold is the minimum similarity score that two code fragments must reach to be considered clones.
  • granularity specifies the granularity of code blocks. The options are functions and classes; functions is recommended.
  • out specifies the name of the resulting HTML file.
  • directory is the directory containing the files to search.
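To make the difference between the two detectors concrete, here is a minimal sketch of the underlying ideas. It is not potator's implementation: the function names (jaccard, naive_clones, prefix, filtered_clones), the toy token lists and the 0.7 threshold are all hypothetical, fragments are treated as plain token sets, and only basic (non-adaptive) prefix filtering is shown, so the depth parameter has no counterpart here.

```python
import math
from collections import Counter
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two token collections: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def naive_clones(fragments, threshold):
    """Score every pair of fragments (the idea behind a naive detector)."""
    return [
        (x, y)
        for x, y in combinations(sorted(fragments), 2)
        if jaccard(fragments[x], fragments[y]) >= threshold
    ]


def prefix(tokens, threshold, frequency):
    """Return the rarest-first prefix of a token set.

    If two sets have Jaccard similarity >= threshold, their prefixes of
    length n - ceil(threshold * n) + 1 must share at least one token.
    """
    ordered = sorted(set(tokens), key=lambda t: (frequency[t], t))
    # round() guards against floating-point noise before taking the ceiling
    required = math.ceil(round(threshold * len(ordered), 9))
    return set(ordered[: len(ordered) - required + 1])


def filtered_clones(fragments, threshold):
    """Score only the pairs whose prefixes overlap (basic prefix filtering)."""
    frequency = Counter(t for toks in fragments.values() for t in set(toks))
    prefixes = {
        name: prefix(toks, threshold, frequency)
        for name, toks in fragments.items()
    }
    return [
        (x, y)
        for x, y in combinations(sorted(fragments), 2)
        if prefixes[x] & prefixes[y]
        and jaccard(fragments[x], fragments[y]) >= threshold
    ]


# Hypothetical tokenized fragments, for illustration only.
fragments = {
    "add": ["def", "f", "a", "b", "return", "a", "+", "b"],
    "add_copy": ["def", "g", "a", "b", "return", "a", "+", "b"],
    "loop": ["for", "i", "in", "range", "n", "print", "i"],
}

print(naive_clones(fragments, 0.7))
print(filtered_clones(fragments, 0.7))
```

Both calls report the single pair ('add', 'add_copy'), whose token sets share 5 of 7 distinct tokens. The filtered version never scores the pairs involving "loop", because their prefixes have no token in common. For brevity this sketch still loops over all pairs to compare prefixes; a real implementation would presumably look candidates up in an inverted index built over the prefixes.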

You can also run export DEBUG=1 before the search to print profiling information.

Using potator as a Python package

You can import the detectors or the entities extractor from potator and use them to work with source code.

>>> from potator.detectors import FilteringDetector
>>> detector = FilteringDetector()
>>> detector.detect(directory, threshold, granularity)
>>> from potator.extractors import EntitiesExtractor
>>> EntitiesExtractor.extract_data_from_directory(directory, granularity)