Extract Java dependency information, run a hierarchical clustering algorithm, organize your code.
To generate a dry-run folder: src/main/auto-cluster-maven-plugin1234...
mvn io.github.patrickdoc:auto-cluster-maven-plugin:cluster
To delete your existing structure and fully embrace the plugin:
⚠️ WARNING: This will delete your existing files. Please be very careful, and also use version control.
mvn io.github.patrickdoc:auto-cluster-maven-plugin:cluster -DdryRun=false
I think dependencies are an under-examined aspect of code and we can do a lot more with them.
This plugin has two goals:
- For individual projects, the goal is to make the internal dependency structure easy to analyze. By putting it front and center, you will hopefully be able to identify and resolve potential structural issues in your code.
- For the general community, the goal is to provide a language for talking about code organization and style.
I don't like organizing interfaces into an inf package or enums into an enum package, but there is no productive conversation we can have about that.
On the other hand, if you submit a pull request to increase the effect of transitive dependencies on the clustering algorithm, then we can look at concrete examples of how it would work on any codebase. This seems like a much better starting point than "I don't like X".
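To make that kind of discussion concrete, here is a minimal sketch of what "the effect of transitive dependencies" could look like as a tunable number. It is not the plugin's actual metric: the class names, the `transitiveWeight` parameter, and the distance formula are all assumptions made up for illustration. A direct dependency counts as full strength, a two-hop dependency counts as `transitiveWeight`, and the distance is one minus the stronger direction.

```java
import java.util.Map;
import java.util.Set;

public class TransitiveWeightSketch {

    /**
     * How strongly `from` depends on `to`: 1.0 for a direct edge,
     * transitiveWeight for a two-hop path, 0.0 otherwise.
     * (Hypothetical scoring, not taken from the plugin.)
     */
    public static double strength(Map<String, Set<String>> deps,
                                  String from, String to, double transitiveWeight) {
        Set<String> direct = deps.getOrDefault(from, Set.of());
        if (direct.contains(to)) {
            return 1.0;
        }
        for (String mid : direct) {
            if (deps.getOrDefault(mid, Set.of()).contains(to)) {
                return transitiveWeight;
            }
        }
        return 0.0;
    }

    /** Symmetric distance in [0, 1]: 0 means tightly coupled, 1 means unrelated. */
    public static double distance(Map<String, Set<String>> deps,
                                  String a, String b, double transitiveWeight) {
        double s = Math.max(strength(deps, a, b, transitiveWeight),
                            strength(deps, b, a, transitiveWeight));
        return 1.0 - s;
    }

    public static void main(String[] args) {
        // Hypothetical three-class chain: Controller -> Service -> Repository.
        Map<String, Set<String>> deps = Map.of(
                "Controller", Set.of("Service"),
                "Service", Set.of("Repository"),
                "Repository", Set.<String>of());

        // Raising the weight pulls Controller and Repository closer together.
        for (double w : new double[] {0.0, 0.5}) {
            System.out.println("transitiveWeight=" + w + " distance="
                    + distance(deps, "Controller", "Repository", w));
        }
    }
}
```

With a knob like this, a proposal to "count transitive dependencies more" becomes a change to one number, and its effect can be compared on any codebase.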
For a longer-form dev log and discussion, see here.
The ClassGraph repo is as good an example as any of a medium-sized project with non-trivial complexity in the code, so I've used it as an example here. Note that this is not a criticism of the existing structure. In fact, I'm quite pleased that the plugin reproduces some of the existing structure.
Running a dry run in the ClassGraph repo (with a parameter to handle multiple base packages):
mvn io.github.patrickdoc:auto-cluster-maven-plugin:cluster -DbasePackages=io.github.classgraph,nonapi.io.github.classgraph
You can see a side-by-side comparison of the original repo:
You can also browse the files in this fork.
ClassGraph: This project powers the dependency data extraction, but can also do much more
Hierarchical Clustering Primer: A useful introduction to hierarchical clustering
Clustering Algorithm: The base algorithm for clustering, available in Python and R packages as fastcluster
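For readers new to the idea, the following is a naive sketch of agglomerative hierarchical clustering in Java. It is not the fastcluster algorithm, which is much more efficient, and it is not taken from the plugin. It assumes single linkage (the distance between two clusters is the distance between their closest members), and the class names and distance values are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveAgglomerativeSketch {

    /** Single linkage: the smallest pairwise distance between two clusters. */
    static double linkage(List<Integer> a, List<Integer> b, double[][] dist) {
        double min = Double.POSITIVE_INFINITY;
        for (int i : a) {
            for (int j : b) {
                min = Math.min(min, dist[i][j]);
            }
        }
        return min;
    }

    /**
     * Repeatedly merges the two closest clusters until one remains.
     * Returns the merges in order, written as nested parentheses.
     */
    public static List<String> cluster(String[] names, double[][] dist) {
        List<List<Integer>> clusters = new ArrayList<>();
        List<String> labels = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            List<Integer> single = new ArrayList<>();
            single.add(i);
            clusters.add(single);
            labels.add(names[i]);
        }

        List<String> merges = new ArrayList<>();
        while (clusters.size() > 1) {
            int bestA = 0;
            int bestB = 1;
            double best = Double.POSITIVE_INFINITY;
            // Brute-force search for the closest pair of clusters.
            for (int a = 0; a < clusters.size(); a++) {
                for (int b = a + 1; b < clusters.size(); b++) {
                    double d = linkage(clusters.get(a), clusters.get(b), dist);
                    if (d < best) {
                        best = d;
                        bestA = a;
                        bestB = b;
                    }
                }
            }
            String merged = "(" + labels.get(bestA) + " " + labels.get(bestB) + ")";
            merges.add(merged);
            clusters.get(bestA).addAll(clusters.get(bestB));
            labels.set(bestA, merged);
            clusters.remove(bestB); // bestB is an int, so this removes by index
            labels.remove(bestB);
        }
        return merges;
    }

    public static void main(String[] args) {
        // Hypothetical classes and distances: small values mean tightly coupled.
        String[] names = {"Parser", "Lexer", "HttpClient", "HttpServer"};
        double[][] dist = {
                {0.0, 0.1, 0.9, 0.9},
                {0.1, 0.0, 0.9, 0.9},
                {0.9, 0.9, 0.0, 0.2},
                {0.9, 0.9, 0.2, 0.0}
        };
        for (String merge : cluster(names, dist)) {
            System.out.println(merge);
        }
    }
}
```

The nested parentheses describe a tree, and a tree of this shape is the kind of structure that could be read as nested packages. How the plugin itself turns its clustering into folders is not shown here.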