What do you normally do when you want to make use of some utility which is part of some Java library? I mean, if you add it as a Maven dependency is the whole library get included in the assembled JAR?
I don't have storage problems of any sort and I don't mind the JAR to get bloated but I'm just curious if there's some strategy to reduce the JAR size in this case.
Thanks
It is even worse: You do not only add the "whole library" to your project, but also its dependencies, whether you need them or not.
Joachim Sauer is right in saying that you do not bundle dependencies into your artifact unless you want it to be runnable. But IMHO this just moves the problem to a different point. Eventually, you want to run the stuff. At some point, a runnable JAR, a WAR or an EAR is built and it will incorporate the whole dependency tree (minus the fact that you get only one version per artifact).
The Maven shade plugin can help you to minimise your jar (see Minimize an Uber Jar correctly, Using Shade-Plugin) by only adding the "necessary" classes. But this is of course tricky in general:
Classes referenced by non-Java elements are not found.
Classes used by reflection are not found.
If you have some dependency injection going on, you only use the interfaces in your code. So the implementations might get kicked out.
So your minimizedJar might in general just be too small.
As you see I don't know any general solution for this.
What are sensible approaches?
Don't build JARs that have five purposes, but build JARs that are small. To avoid a huge number of different build processes, use multi-module projects.
Don't add libraries as dependencies unless you really need them. Better duplicate a method to convert Strings instead of adding a whole String library just for this purpose.
Related
I have a maven (multi-module) project that contains something like an IAction. In this project I have implemented about 50 implementation of different actions. Each Action consists of a MyAction.java and a MyAction.properties file.
I use Java's SPI (java.util.spi) to load all the implementations at runtime. This all works great, but now I want to package each Action into a single jar, so that I end up with 50 jars.
What would be a good way to accomplish this? I don't really want to create a sub-module for each action, as that takes a lot of maintenance.
You can use the Maven assembly plugin in order to create additional artifacts for building a module. However, note that you need to add these artifacts to your deployment queue once you created them, in case that you do not only want to build these artifacts. This can be achieved by for example using the build-helper-maven-plugin.
However, as for the iteration over 50 elements, you might want to consider writing your own specialized plugin. There are additional plugins that allow the iteration over another plugin, but this will blow up your POM. If a build is therefore very specific, you might therefore consider writing an individual plugin which is specialized on this task. You can do this in pure Java instead of XML and it is not as hard as it sounds. There is documentation and there are default APIs to Maven. And you can check the source code of the named plugins on how to achieve your requirements.
I am rephrasing this question to make it a little more straightforward and easy to understand, hopefully.
I have roughly 30 components (internal) that go into a single web application. That means 30 different projects with their own separate POM. I use inheritance quite a bit in my POMs so one of the things they inherit is a PMD/CPD configuration to prevent code duplication.
Even though I have CPD/PMD running, it only detects duplicate code within the same project. I would like it to detect in any of my projects if there is code shared among the projects that can be refactored out. Moreover, I was looking for something that could (using the same concept/pattern) verify that no code is shared between other open source dependencies.
It would be CPD/PMD, except it would operate on the source jars. This task would consume a large amount of memory if you scan all projects and their dependencies for duplication. Right now, I would just like to apply that to internal projects. If it works, then it would be relatively easy/straightforward to scale that out.
Walter
I'm not sure I got everything but...
I'd create an aggregating module with all projects as dependencies, use the maven-dependency-plugin and it's unpack-dependencies mojo to get all dependencies sources jar (the mojo can take a classifier as parameter) and unpack-them (maybe in target/generated-sources/java, the maven build helper plugin may help here) and finally run pmd:cpd on the whole source base.
This may need some tweaking, I didn't test this at all.
It sounds like you want to find duplicate code anywhere in your 30 projects. I can't speak for PMD; I assume you tell it to make one giant project containing all the source files from the union of the projects. But yes, this would take a lot of RAM and CPU.
Another tool that does is the Java CloneDR. The CloneDR finds duplicate code whether it is exactly the same or close (e.g., a few edits) regardless of source code layout or intervening comments. It is pretty easy to set it up to process all the files in your set of projects.
Just run PMD:CPD as a stand-alone program. All it needs is a directory, and it will recurse. At least, it did for me. I moved all my source to one directory and ran the CPD gui from the batch file distributed with PMD-4.2.5 .
You can perhaps take a look at sonar :
Sonar-CPD engine that is much more scalable and can detect cross-projects duplications.
You can try Lizard for Python.
It doesn't work on source jars, though.
"Code Duplicate Detector
lizard -Eduplicate {path to your code}"
https://pypi.org/project/lizard/
PMD/CPD provides more granularity since it allows the user to specify the number of tokens before a block of code is flagged as duplicate.
https://pmd.github.io/latest/pmd_userdocs_cpd.html#cli-options-reference
I guess this is kind of a follow-on to question 1522329.
That question talked about getting a list of all classes used at runtime via the java -verbose:class option.
What I'm interested in is automating the build of a JAR file which contains my class(es), and all other classes they rely on. Typically, this would be where I am using code from some third party open source product's "client logic" but they haven't provided a clean set of client API objects. Their complete set of code goes server-side, but I only need the necessary client bits.
This would seem a common issue but I haven't seen anything (e.g. in Eclipse) which helps with this. Am I missing something?
Of course I can still do it manually by: biting the bullet and including all the third-party code in a massive JAR (offending my purist sensibilities) / source walkthrough / trial and error / -verbose:class type stuff (but the latter wouldn't work where, say, my code runs as part of a J2EE servlet, and thus I only want to see this for a given Tomcat webapp and, ideally, only for classes related to my classes therein).
I would recommend using a build system such as Ant or Maven. Maven is designed with Java in mind, and is what I use pretty much exclusively. You can even have Maven assemble (using the assembly plugin) all of the dependent classes into one large jar file, so you don't have to worry about dependencies.
http://maven.apache.org/
Edit:
Regarding the servlet, you can also define which dependencies you want packaged up with your jar, and if you are making a stand alone application you can have the jar tool make an executable jar.
note: yes, I am a bit of a Maven advocate, as it has made the project I work on much easier. No I do not work on the project personally. :)
Take a look at ProGuard.
ProGuard is a free Java class file shrinker, optimizer, obfuscator, and preverifier. It detects and removes unused classes, fields, methods, and attributes. It optimizes bytecode and removes unused instructions. It renames the remaining classes, fields, and methods using short meaningless names. Finally, it preverifies the processed code for Java 6 or for Java Micro Edition.
What you want is not only to include the classes you rely on but also the classes, the classes you rely on, rely on. And so on, and so forth.
So that's not really a build problem, but more a dependency one. To answer your question, you can either solve this with Maven (apparently) or Ant + Ivy.
I work with Ivy and I sometimes build "ueber-jar" using the zipgroupfileset functionality of the Ant Jar task. Not very elegant would say some, but it's done in 10 seconds :-)
I have recently joined a project that is using multiple different projects. A lot of these projects are depending on each other, using JAR files of the other project included in a library, so anytime you change one project, you have to then know which other projest use it and update them too. I would like to make this much easier, and was thinking about merging all this java code into one project in seperate packages. Is it possible to do this and then deploy only some of the packages in a jar. I would like to not deploy only part of it but have been sassked if this is possible.
Is there a better way to handle this?
Approach 1: Using Hudson
If you use a continuous integration server like Hudson, then you can configure upstream/downstream projects (see Terminology).
A project can have one or several downstream projcets. The downstream projects are added to the build queue if the current project is built successfully. It is possible to setup that it should add the downstream project to the call queue even if the current project is unstable (default is off).
What this means is, if someone checks in some code into one project, at least you would get early warning if it broke other builds.
Approach 2: Using Maven
If the projects are not too complex, then perhaps you could create a main project, and make these sub-projects child modules of this project. However, mangling a project into a form that Maven likes can be quite tricky.
If you use Eclipse (or any decent IDE) you can just make one project depend on another, and supply that configuration aspect in your SVN, and assume checkouts in your build scripts.
Note that if one project depends on a certain version of another project, the Jar file is a far simpler way to manage this. A major refactoring could immediately means lots of work in all the other projects to fix things, whereas you could just drop the new jar in to each project as required and do the migration work then.
I guess it probably all depends on the specific project, but I think I would keep all the projects separate. This help keep the whole system loosely coupled. You can use a tool such as maven to help manage all the dependencies between the projects. Managing dependencies like this is one of maven's main strengths.
Using Ant as your build tool, you can package your project any way that you want. However, leaving parts of your code out of the distribution seems like it would be error prone; you might accidentally leave out necessary classes (presumably, all of your classes are necessary).
In relation to keeping your code in different projects, I have a loose guideline. Keep the code that changes together in the same project and package it in its own jar file. This works best when some of your code can be broken out into utility libraries that change less frequently than your main application.
For example, you might have an application where you've generated web service client classes from a web service WSDL (using something like the Axis library). The web service interface will likely change infrequently, so you don't want to have the regeneration step reoccurring all the time in your main application build. Create a separate project for this piece so that you only have to recreate the web service client classes when the WSDL changes. Create a separate jar and use it in your main application. This style also allows other projects to reuse these utility modules.
When following this style, you should place a version number in the jar manifest so that you can keep track of which applications are using which versions of your module. Depending on how far you want to take this, you could also keep a text file in the jar that details the changes that have occurred for each revision (much like an open source library).
It's all possible (we had the same situation some years ago). How hard or easy it'll be depends on your IDE (refactoring, merging, organizing new project) and you build tool (deploying). We used IDEA as IDE and Ant as build tool and it wasn't too hard. One sunday (nobody working+committing), 2 people on one computer.
I'm not sure what you mean by
"deploy only some of the packages in a jar"
I think you will need all of them at runtime, won't you? As I understood they depend on each other.
I'm currently working on a project that contains many different Eclipse projects referencing each other to make up one large project. Is there a point where a developer should ask themselves if they should rethink the way their development project is structured?
NOTE: My project currently contains 25+ different Eclipse projects.
My general rule of thumb is I would create a new project for every reusable component. So for example if I have some isolated functionality that can be packaged say as a jar, I would create a new project so I can build,package and distribute the component independently.
Also, if there are certain projects that you do not need to make frequent changes to, you can build them only when required and keep them "closed" in eclipse to save time on indexing, etc. Even if you think that a certain component is not reusable, as long as it is separated from the rest of the code base in terms of logic/concerns you may be well served by just separating it out. Sometimes seemingly specific code might be reusable in another project or in a future version of the same project.
When compiled, a project would typically result in a jar. So if your application consists of potentially reusable components, it is ok to use a project for each.
I'm a big fan of using a lot of projects, I feel that this "breaks down" large things beyond what I can do with packages, and helps me orient and navigate.
Of course, if you're developing Eclipse plug-ins, everything would be a project anyway.
The only thing I would watch out for has to do with your source-control and it's ability to handle moves of files between projects. Subclipse had been giving me trouble with it, or maybe it's my SVN server that did.
If your project has that many sub-projects, or modules, needed to actually compose your final artifact then it is time to look at having something like Maven and setting up a multi-module project. It will a) allow you to work on each module independently without ide worries and allow easy setup in your ide (and others' IDEs) through the mvn eclipse:eclipse goal. In addition, when building your entire top level project, maven will be able to derive from list of dependencies you have described what modules need to be built in what order.
Here's a quick link via google and a link to the book Maven: The Definitive Guide, which will explain things in much better detail in chapter 6 (once you have the basics).
This will also force your project to not be explicitly tied to Eclipse. Being able to build independent from an ide means that any Joe Schmoe can come along and easily work with your code base using whatever tools he/she needs.
Create jars for the projects you don't work in often. That should greatly reduce the clutter. If you work on all the projects often, then you can add targets to your build that will jar up the respective projects for you, which condenses everything down to one file that you can then include on the class path.
An additional method is to create many different workspaces. The benefit of separate workspaces is that you can remove some of the visual clutter/ performance overhead of having lots of projects. You can use targets to jar up all of you projects and put them in a repository so you can reference them in each workspace.
At a former job the entire application was more then +170 projects. While it was rarely necessary to have all projects checked out locally, even the 30-40 projects constantly in our scope made reindexing, etc. very slow.
Yeesh. One Project for each Project. If you are using reusable projects, make them into a library for heavens sake. Break the none re-usable projects into packages, that's what they are there for.
That's a hard question and answers span from having one eclipse project at all to having one eclipse project for every single class.
My bottomline:
You can have too few projects,
and never too many (of course use
automation e.g. mvn eclipse:eclipse)
Use
-Declipse.useProjectReferences=true/false
when using maven to switch workspace
mode btw jar and project
dependencies
Use mvn release plugin to generate
consecutive releases (automatic
version increase)
Multiple projects gives you
independent versioning which is
extremely important. E.g. one dev may work on a new version of a
module while you still depends on
the previous one and you at some
point decide to upgrade to the newer
version(possibly by increasing its version in pom.xml dependency section). Or in other scenario if one
project contains a bug you downgrade
to its previous version.
Multiple projects makes you think
about the architecture more than if
you have just packages.
Multiple projects generally make
architectural problems evident more
than if you have just one project.
Anyone would like to comment on
this?
You never know if you project
evolves into OSGI/SOA/EDA where you
need separation.
Even if you're 100% sure that you
projects will be deployed as one jar
in an old way in a single jvm, it
still does not hurt(mvn assembly
plugin) to have multiple eclipse
projects for logically independent
pieces of code
BTW, the project I work on is divided into 24 eclipse projects.
Hell, we have more than 100. Projects don't cost anything.