feat: cosmetic changes
Yuriy Rogachev committed May 25, 2021
1 parent a2cb950 commit e062225
Showing 3 changed files with 10 additions and 9 deletions.
12 changes: 6 additions & 6 deletions README.md
@@ -34,12 +34,12 @@ potator [-h] [-d {Naive,Filtering}] [--depth DEPTH] [-t THRESHOLD] [-g GRANULARI

### Options

* You can choose one of two detectors: Naive and Filtering. Naive detector compares every possible combination of source code fragments and calculates jaccard similarity between them. Filtering detector implements algorithm from SourcererCC paper with an adaptive prefix filtering optimizations.
* Depth parameters specify the maximum depth of adaptive prefix. `depth=2` is recommended. Since it offers the optimal balance between costs of building index and querying it.
* Threshold is the minimum score that two code fragments should have to be considered clones.
* Granularity specifies granularity of code blocks. Options are `functions` and `classes`. `functions` is recommended.
* Out specifies the name of the resulting html
* Directory is the directory with files on which to perform search.
* You can choose one of two detectors: `Naive` and `Filtering`. The `Naive` detector compares every possible pair of source code fragments and calculates the [Jaccard similarity](https://en.wikipedia.org/wiki/Jaccard_index#Generalized_Jaccard_similarity_and_distance) between them (see the sketch after this list). The `Filtering` detector implements the algorithm from the `SourcererCC` paper with `adaptive prefix filtering` optimizations.
* The `depth` parameter specifies the maximum depth of the adaptive prefix. `depth=2` is recommended, since it offers the best balance between the costs of building the index and querying it.
* `threshold` is the minimum score that two code fragments must have to be considered clones.
* `granularity` specifies the granularity of code blocks. Options are `functions` and `classes`. `functions` is recommended.
* `out` specifies the name of the resulting HTML file.
* `directory` is the directory containing the files on which to perform the search.
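As an illustration of the similarity measure the `Naive` detector relies on, here is a minimal sketch of Jaccard similarity over token sets. The helper name `jaccard_similarity` and the toy token sets are assumptions for illustration, not code taken from the repository:

```python
def jaccard_similarity(tokens_a: set, tokens_b: set) -> float:
    """Jaccard similarity: |A ∩ B| / |A ∪ B| over two token sets."""
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Two fragments are reported as clones when the score reaches the threshold.
fragment_1 = {"def", "detect", "directory", "threshold", "granularity"}
fragment_2 = {"def", "detect", "directory", "threshold", "depth"}
print(jaccard_similarity(fragment_1, fragment_2))  # 0.666...
```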

You can also run `export DEBUG=1` before the search; profiling information will then be printed out.

4 changes: 2 additions & 2 deletions potator/detectors.py
@@ -26,7 +26,7 @@ def detect(self, directory: str, threshold: float, granularity: str) -> Detectio
files, files_data, entities = EntitiesExtractor.extract_data_from_directory(directory, granularity)

clones = []
with Profile("Validate candidates set"):
with Profile("Search for clones in candidates set"):
for i in range(len(entities)):
entity = entities[i]
for j in range(i + 1, len(entities)):
@@ -105,7 +105,7 @@ def detect(self, directory: str, threshold: float, granularity: str) -> Detectio
for token in tokens[left_bound: right_bound]:
candidates.update(indexer.get_entities_for_token(token, lang, l_depth))

with Profile("Validate candidates set"):
with Profile("Search for clones in candidates set"):
for candidate in candidates:
if not _validate_entity_candidate(entity, candidate):
continue
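The hunk above gathers clone candidates from only a prefix of each fragment's token list. In the SourcererCC-style scheme this prefix length is typically derived from the similarity threshold; a hedged sketch of that bound follows, where the formula and the helper name `prefix_length` are assumptions based on the paper, not on code visible in this diff:

```python
import math


def prefix_length(num_tokens: int, threshold: float) -> int:
    """Number of tokens to index so that any pair meeting the
    threshold is guaranteed to share at least one prefix token."""
    return num_tokens - math.ceil(threshold * num_tokens) + 1


# For a 40-token fragment and threshold 0.8, indexing 9 tokens suffices.
print(prefix_length(40, 0.8))  # 9
```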
3 changes: 2 additions & 1 deletion potator/profiler.py
@@ -10,8 +10,9 @@

def print_debug_exit():
if DEBUG:
print(f"{'NAME':>50} : {'N_OCCUR':>8} {'TIME SPENT':>12} ms")
for name, _ in sorted(DEBUG_TIMES.items(), key=lambda x: -x[1]):
print(f"{name:>50} : {DEBUG_COUNTS[name]:>6} {DEBUG_TIMES[name]:>10.2f} ms")
print(f"{name:>50} : {DEBUG_COUNTS[name]:>8} {DEBUG_TIMES[name]:>12.2f} ms")


atexit.register(print_debug_exit)
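For context, the `Profile` context manager used in `detectors.py` presumably accumulates elapsed time and call counts into `DEBUG_TIMES` and `DEBUG_COUNTS`, which the code above prints at exit. A minimal sketch of such a context manager, assuming that behavior (the actual definitions in `potator/profiler.py` are not shown in this diff):

```python
import time
from collections import defaultdict

DEBUG_TIMES = defaultdict(float)  # total milliseconds spent per profiled block
DEBUG_COUNTS = defaultdict(int)   # number of times each block was entered


class Profile:
    """Accumulate wall-clock time and call counts under a given name."""

    def __init__(self, name: str):
        self.name = name

    def __enter__(self):
        self.start = time.perf_counter()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        DEBUG_TIMES[self.name] += (time.perf_counter() - self.start) * 1000
        DEBUG_COUNTS[self.name] += 1
```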
