Skip to content

Tutorial of GemX

Jacek Banaszczyk edited this page Dec 30, 2018 · 16 revisions

Author: Toshihiro Kamiya
Created: 2008/May/7
Contact: info@ccfinder.net
Copyright: 2005-2010 © Toshihiro Kamiya. All rights reserved.

TOC

  1. Detecting Code Clone
  2. Investigating Code Clones
    1. Scatter Plot
    2. File Table
    3. Source Text
    4. Clone-Set Table
    5. Saving Detected Code Clone or Scope
  3. Analyzing Code Clones by Metrics
    1. File Metrics
    2. Clone Set Metrics
    3. Line-based Metrics

Detecting Code Clone

Specify target source code from menu File » Detect Clones. That is, select a (preprocess script of) programming language and directories of the target source code.

  • By "Add Directory" button or drag-and-drop, add directories to the "Target directory list".
  • CCFinderX will look up source files recursively from the directories.

fig3.1et.png

In the options dialog, use default values at first. Refer the manual for detail of each option.

After that, the tool will detect code clones. When the detection finished, views of the tool will be updated.

fig3.2et.png

hint When the detected code clones are too many, check "Use prescreening" in the options dialog.

fig3.3et.png

hint By default, the order of source files is a lexical order of their path (multi-byte encoding lexical oder). CCFinderX searches a file with one of the extensions that are predefined to each programming language (For example, .c, .cc, .cpp, .cxx, .h, .hh, .hpp, .hxx for C++). In order to change the order of source files and/or specify source files with special extension, make a file list by hand (file list is a text file, which includes a path of source file per line), and use GemX's menu File » Detect Clones from File List.

Investigating Code Clones

You can investigate the detected code clones with views named File Table, Clone-Set Table, Scatter Plot, and Source Text, Scrapbook, and a Search box.

fig4.1et.png

  • File Table
    This view shows a list of target source files.
  • Clone-Set Table
    This view shows a list of the detected code clones.
  • Scatter Plot
    This view shows where and how the code clones are distributed in the whole source files.
  • Source Text
    This view shows text of a source file. Code clones are illustrated as highlighted texts.
  • Scrapbook
    You can copy the code fragments shown in Source Text view to Scrapbook. This view is almost the same as Source Text, except for Scrapbook does not follow the selection change in File Table or Clone-Set Table.
  • Search box
    Search in File Table, Clone-Set Table, Source Text, or Scrapbook.

Scatter Plot

This view shows where and how the code clones are distributed in the whole source files. Checkboxes at the upper side of Scatter Plot are used to turn on/off boundary of source files, directory, etc. Refer the manual for the detail of each option.

fig4.1.1et.png

  • Gray vertical and horizontal lines are boundaries of the source files, that is, locations of the end-of-flies.
  • Squares drawn by black lines (on the main diagonal line) are directories.
  • Black lines (and dots) that located parallel to the main diagonal line are code clones
    • The corresponding orthogonal projections to the horizontal axis and vertical axis are the locations of similar code fragments.

Also, The list of source files are shown in File Table at the left of the Scatter Plot.

Narrowing Down of File List (Zooming Scatter Plot)

Select files by dragging. Then, from right-click menu Fit Scope to Selected Files, you can narrow down the target files to the selected files. (This operation will also affect File Table, that is, the list of files in it will be updated.)

fig4.1.2et.png

In order to undo this operation, use right-click menu Pop Scope. This option often cause missing number in File ID's. The file boundaries between such discontinuous file ID's are shown by dotted-line.

hint Files in a directory can be selected by clicking the upper-right corner or lower-left corner of the square corresponding to the directory.

File Table

File Table shows a list of target source files.

fig4.2.1et.png

For each file, the following information are shown:

  • Index of the file(File ID)
  • Path of the file
  • Length of the file in token(LEN)
  • Count of code clones included in the file(CLN)

On the above of the list, a "common" path of the source files is shown. The common path is a path of the root directory of the files in the list.

You can narrow down the list, by right-click menu.

  • Select files by click, CTRL + click, or SHIFT + click → Fit Scope to Selected Files or Fit Scope to Files Except for Selected Ones
  • Alternatively, Select Files That Have Clones Between Them or Selected Files UnderFit Scope to Selected Files or Fit Scope to Files Except for Selected Ones

The status of narrowing-down of files can be canceled by menu Pop Scope

hint In order to narrow down the file list with more complex selection (for example, keep files under two directories, or keep unselected files), use check boxes at the files in File Table. Refer the manual for operations about these check boxes.

Source File Search

When the File Table has focus (the focus is shown by colored frame), you can search in File List by putting file name or path in Search box. You can also select a file having a specific File ID by putting "#number" (file ID) in Search box,

fig4.2.2et.png

Looking up File Name in Scatter Plot (Scatter Plot → File Table)

When you click on Scatter Plot, the corresponding two files on vertical axis and horizontal axis will get into selected state. These files will be highlighted in File Table. By this operation, you can find out the names of the files.

fig4.2.4et.png

Source Text

This view shows text of a source file. Code clones are illustrated as highlighted text.

When you select a file by clicking in File Table, text of the file will be shown in the Source Text.

fig4.3.1et.png

  • The area is composed of three views, Map (described in the next sheet), Line Number, and Text.
  • Code clones are shown as text with gray background.
  • The highlighting of line number shows code clones, but it also shows origin of the code. A highlight at left side means the code is similar to text of the other file. A highlight at the right side means the code is similar to text within the same file.
  • Comments and the tokens (of comments or simple getters/setters) removed by a preprocess script are shown in green characters. Reserved words are shown in dark red characters.

Text Search

When Source Text has focus, you can search in text by putting word in Search box. You can also move to the specific line by putting "#number" (line number) in Search box.

fig4.3.2et.png

Multiple Text Mode

When two (or more) files are selected in File Table or in Scatter Plot, Source Text will show two of the files.

fig4.3.3et.png

Figures shown in the map are:

  • White bar: source file
  • Gray vertical bars at the left side of the white bar: Clone code fragment enclosed within the file.
  • Gray vertical bars at the right side of the white bar: Clone code fragment between the file and the other files.
  • Gray crosswise or U-shaped line: indicates the code fragments belongs to a clone set.
  • Yellow rounded rectangle: Range of shown text. You can update a Source Text view by dragging on the right side of the rectangle.

hint The number of panes of Source Text view (that is, the maximum number of source files shown in the view at the same time) can be changed by menu Preferences » GemX Preferences.

Copy to Scrapbook

By right click menu Copy to Scrapbook in Source Text, its text is copied to Scrapbook. The Scrapbook is not affected by selection change in File Table or Clone-Set Table, so the Scrapbook is convenient when you are investigating code clones around a specific code fragment(s).

fig4.3.4et.png

Clone-Set Table

Clone-Set Table shows a list of the detected code clones. A clone set is a set of code fragments, in which any pair of the code fragments is a code clone. For each clone set, ID (Clone-Set ID) and length are shown.

fig4.4.1et.png

Scope of Clone-Set Table is linked to scope of File Table. That is, if some files are removed from File Table by narrowing down, the enclosed code clones (whose code fragments appear only in these files) will be removed in Clone Set Table. (The reverse is not true. That is, if some code clones are removed from Clone-Set Table, the related files never be removed automatically.)

To Show Code Fragment of a Clone Set (Clone-Set Table → Source Text)

Show the Clone-Set Table at the left side, and show the Source Text at the right side, in advance.

fig4.4.2et.png

  • Select a clone set by clicking in Clone-Set Table, then pair of code fragments of the clone set will be shown in Source Text. By clicking the same clone set again, the next pair will be shown.
  • You can see two code fragments of a clone a clone set at the same time, by right-click menu Show Next Clone-Pair Code
  • Also, you can narrow down clone sets by right-click menu Fit Scope to Selected Clone Sets.

hint Right-click menu Show a Code Fragment is used to show one code fragment of a clone set.

Saving Detected Code Clone or Scope

File menu has menu items to save scope and file list.

fig4.5.1et.png

  • Saving scope: A scope is a state of narrow-downing of files and code clones. Scopes are shown in Scope History at the leftmost view.
    • Save: from menu File » Open Current Scope As
    • Load: from menu File » Open Clone Data
  • Saving file list: A file list contains paths of source files in File Table.
    • Save: from menu File » Save File List as
    • From menu File » Detect Clones from File List, you can re-detect code clones from the source files that have been beforehand saved as a file list.
  • Re-detection of clones: In order to detect code clones from the source files of current scope with the different detection options, use menu File » Re-Detect Clones.

Analyzing Code Clones by Metrics

Metrics of code clones and metrics of files will help you find interesting code clones or source files. Also, line-based metrics measures the statistics by count of lines. GemX provides narrow-downing operations with metrics.

File Metrics

File metrics will be shown in File Table, by menu Metrics » Show File Metrics.

fig5.1.1et.png

  • NBR (neighbor):
    Count of files that have a code-clone fragment between the file.
  • RSA (ratio of similarity between another files):
    Ratio (percentage) of tokens that are covered by a code clone between the file and one of the other files. If a file has a RSA value close to 100%, then it is possible that the file was created by copying a file.
  • RSI (ratio of similarity within the file):
    Ratio (percentage) of tokens that are covered by a code clone enclosed within the file. If a file has a RSI value close to 100%, then it is possible that the file contains a series of the similar functions or methods.
  • CVR (coverage):
    Ratio (percentage) of tokens that are covered by any code clone. By definition, max(RSA, RSI) ≤ CVR ≤ RSA+RSI
  • RNR (ratio of non-repeated code):
    Ratio of non-repeated code. 1 - (ratio of tokens appears in repetition in the text) If the value is near to zero, the code is repetition of simple statements and/or definitions.

From menu Metrics » Filter File by Metrics, You can narrow down the file list by the minimum and maximum values of a metric.

fig5.1.2et.png

hint By default, when you perform narrowing down of file list after showing file metrics, the metrics value will disappear from File Table. In order to make GemX re-calculate and show file metrics automatically after such narrowing-down, add check mark to Calc File Metrics Always in the dialog shown by menu Preferences » GemX Preferences.

Clone Set Metrics

Clone-set metrics will be shown by menu Metrics » Show Clone-Set Metrics.

fig5.2.1et.png

  • LEN (length):
    Length (in token) of a code fragment of the code clone
  • POP (population):
    Count of code fragments of the code clone
  • NIF:
    Count of source files that include one or more code fragments of the code clone. By definition, NIF ≤ POP (This metric is imported from Gemini, developed by Yoshiki Higo, Osaka Univ..)
  • RAD (radius):
    Range of the source code fragments of a code clone in the directory hierarchy. (When all the code fragments of the code clone are located in one source file, the value is 0.)
  • RNR (ratio of non-repeated tokens):
    Ratio (percentage) of tokens that are not included in repeated part of a code fragment of the code clone.
  • TKS (token set size):
    Size of a set of tokens of a code fragment of the code clone. If a code clone has a extremely small TKS value, then the code fragment is a simple statement, such as a series of declaration of variables.
  • LOOP, COND, McCabe:
    LOOP is defined as count of loops in a code fragment, COND is defined as count of conditional branches, and McCabe is defined as the sum of them. In order to focus attention on complex code, select code clones with the higher values of these metrics.

You can do filtering of code clones from menu Metrics » Filter Clone Set by Metrics.

fig5.2.2et.png

When clone-set metrics are calculated, each clone in scatter plot will be colored by RNR metric value (by default).

fig5.2.3et.png

hint Just like file metrics, the clone-set metrics can be automatically re-calculated and shown by menu Preferences » GemX Preferences.

Line-based Metrics

Line-based metrics will be shown by menu Metrics » Show Line-based Metrics.

fig5.3.1et.png

  • LOC:
    Raw line count of the source file.
  • SLOC:
    Count of lines, in case of excluding lines without any valid tokens. Here, the "tokens" are specific to CCFinderX. (For example, definitions of simple getters/setters in Java source file will be entirely neglected by CCFinderX.) Such neglected characters are shown in green color in Source Text view.
  • CLOC:
    Count of lines including at least one token of a code fragment of a code clone.
  • CVRL:
    Ratio of the lines including a token of a code fragment of a code clone. CVRL = CLOC / SLOC.