Maven build optimization - prevent building *-fat.jar locally - java

I have a pretty optimized Maven build using:
mvn --offline -T 9 package exec:java -DskipTests
The offline flag prevents it from looking for updates, -T 9 uses 9 threads, and -DskipTests skips the tests, but I wonder if there is a flag I can use to prevent it from creating the *-fat.jar?
I figure the fat jar is a big file, and if I avoid creating it until I need it, I might save some time.

Maven does not create anything like "-fat.jar" by default. There must be a specific definition in the pom.xml, typically the maven-assembly-plugin or maven-shade-plugin, that does it.
So you need to change your pom.xml and define dedicated profiles: one (the default) which creates the "-fat.jar" and one which does not.
Then you will be able to run something like "mvn package -Pmy-no-fat-profile" to avoid the "-fat.jar" creation.
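A minimal sketch of one variant of that idea, assuming the fat jar comes from the maven-shade-plugin (the profile and execution ids here are placeholders): the default build creates the fat jar, and the no-fat profile disables it by rebinding the same execution id to phase none.

<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <id>make-fat-jar</id>
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
<profiles>
  <profile>
    <!-- run "mvn package -Pno-fat" to skip creating the fat jar -->
    <id>no-fat</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-shade-plugin</artifactId>
          <executions>
            <execution>
              <id>make-fat-jar</id>
              <!-- rebinding the execution to phase "none" unbinds it -->
              <phase>none</phase>
            </execution>
          </executions>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>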

Related

How to programmatically query a p2 repository for information and artifacts?

I have a bunch of different p2 repositories I want to query for information programmatically. What types of bundles/features do they provide? What type of licenses (if any) are paired with the bundles? And I'd like to simply download jars.
In other words, I want to programmatically query and download just about any public information contained in a p2 repository, but I don't need to actually do anything OSGi-related with this information.
Is there a relatively simple way to do this?
I have already tried a few things and found them not adequate:
Solution 1: p2 director:
I know about the p2 director, however I want to query the information from within a non-Eclipse application, and adding Eclipse just to trigger commands via the command line seems like a bit of a weird detour. Also, that would restrict me to the rather limited interface of the p2 director (for instance, I think I can't just download a jar, I can only install it, which also unpacks it and maybe does other stuff I'm not aware of).
Solution 2: Building OSGi container manually:
Browsing the Eclipse APIs, I thought that it should be sufficient to have instances of IArtifactRepository/IMetadataRepository (see for instance: https://help.eclipse.org/luna/index.jsp?topic=%2Forg.eclipse.platform.doc.isv%2Freference%2Fapi%2Forg%2Feclipse%2Fequinox%2Fp2%2Frepository%2Fartifact%2Fclass-use%2FIArtifactRepository.html). However, it seems not exactly trivial to get the artifacts. If I start from scratch, using the information provided in this answer here: Programmatically Start OSGi (Equinox)?, I then have to build and initialize an IProvisioningAgentProvider, then an IProvisioningEventBus, then I need a registry, etc. It's quite hard to understand exactly what is needed, and a lot of the stuff is Equinox internals, so this also doesn't really seem to be the way to go.
Do any of the many equinox-related bundles offer an "easy" gateway to do this?
The bnd code base has a P2 repository that might be useful. The bnd command line allows you to use it interactively. First create a bndrun file repo.bndrun:
-standalone true
-plugin.p2 \
aQute.bnd.repository.p2.provider.P2Repository; \
url="https://bndtools.jfrog.io/bndtools/update/"
In the same directory in the shell you can then do:
$ bnd repo -w repo.bndrun list
biz.aQute.bnd.maven [4.2.0.201901301338-SNAPSHOT]
biz.aQute.bndlib [4.2.0.201901301338-SNAPSHOT]
biz.aQute.repository [4.2.0.201901301338-SNAPSHOT]
biz.aQute.resolve [4.2.0.201901301338-SNAPSHOT]
...
org.bndtools.templating.gitrepo [4.2.0.201901301338-SNAPSHOT]
org.bndtools.versioncontrol.ignores.manager [4.2.0.201901301338-SNAPSHOT]
org.bndtools.versioncontrol.ignores.plugin.git [4.2.0.201901301338-SNAPSHOT]
org.slf4j.api [1.7.2.v20121108-1250]
This will show a list of bsns (bundle symbolic names) and versions available in the p2 repo. You
can also generate an OSGi XML index for OBR from it:
bnd repo -w repo.bndrun index
This index is very easy to parse and has an OSGi-standardized format.
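For illustration, the generated index uses the OSGi Repository 1.0 XML namespace and is shaped roughly like this (values abbreviated, not taken from an actual run):

<repository xmlns="http://www.osgi.org/xmlns/repository/v1.0.0" name="bndtools" increment="1">
  <resource>
    <capability namespace="osgi.identity">
      <attribute name="osgi.identity" value="bndtools.api"/>
      <attribute name="version" type="Version" value="4.2.0.201901301338-SNAPSHOT"/>
    </capability>
    <capability namespace="osgi.content">
      <attribute name="url" value="https://..."/>
    </capability>
  </resource>
</repository>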
If you need the version of an artifact:
$ bnd repo -w repo.bndrun versions bndtools.api
[4.2.0.201901301338-SNAPSHOT]
You can also get artifacts from it:
$ bnd repo -w repo.bndrun get bndtools.api
$ ls -1
bndtools.api-4.2.0.201901301338-SNAPSHOT.jar
repo.bndrun
If you include biz.aQute.bndlib and biz.aQute.bnd.repository from Maven Central, then you can also use the P2 repository from your code.
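A rough, hedged sketch of what that could look like: it assumes bnd's RepositoryPlugin API (list / versions / get) and glosses over wiring that the bndrun launcher normally does for you; in particular, P2Repository usually obtains its HTTP client from a bnd Registry, so extra setup may be needed depending on the bnd version.

import java.io.File;
import java.util.HashMap;
import java.util.Map;

import aQute.bnd.repository.p2.provider.P2Repository;
import aQute.bnd.version.Version;

public class P2Query {
    public static void main(String[] args) throws Exception {
        // Same configuration as the -plugin.p2 line in repo.bndrun
        P2Repository repo = new P2Repository();
        Map<String, String> props = new HashMap<>();
        props.put("url", "https://bndtools.jfrog.io/bndtools/update/");
        props.put("name", "bndtools");
        repo.setProperties(props);
        // NOTE (assumption): a Registry supplying bnd's HttpClient may also
        // be required here; the bndrun launcher normally provides it.

        // Equivalent of "bnd repo list": bundle symbolic names and versions
        for (String bsn : repo.list(null)) {
            for (Version version : repo.versions(bsn)) {
                System.out.println(bsn + " " + version);
            }
        }

        // Equivalent of "bnd repo get": download a single jar
        Version latest = repo.versions("bndtools.api").last();
        File jar = repo.get("bndtools.api", latest, null);
        System.out.println("Downloaded " + jar);
    }
}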
You can install the latest bnd from brew on macOS. On other OSes you
can download the biz.aQute.bnd JAR from Maven Central, from group biz.aQute.bnd:
https://repo1.maven.org/maven2/biz/aQute/bnd/biz.aQute.bnd/4.1.0/biz.aQute.bnd-4.1.0.jar
[I am a committer on this project]

Spark, Alternative to Fat Jar

I know at least 2 ways to get my dependencies into a Spark EMR job. One is to create a fat jar and another is to specify which packages you want in spark submit using the --packages option.
The fat jar takes quite a long time to zip up. Is that normal? ~10 minutes. Is it possible that we have it incorrectly configured?
The command line option is fine, but error prone.
Are there any alternatives? I'd like it if there already existed a way to include the dependency list in the jar with Gradle, and then have it download the dependencies. Is this possible? Are there other alternatives?
Update: I'm posting a partial answer. One thing I didn't make clear in my original question was that I also care about when you have dependency conflicts because you have the same jar with different versions.
Update
Thank you for the responses relating to cutting back the number of dependencies or using provided where possible. For the sake of this question, let's say we have the minimal number of dependencies necessary to run the jar.
Spark Launcher can be used if the Spark job has to be launched from another application; with the help of Spark Launcher you can configure the path to your jar, and there is no need to create a fat.jar to run the application.
With a fat-jar you have to have Java installed and launching the Spark application requires executing java -jar [your-fat-jar-here]. It's hard to automate it if you want to, say, launch the application from a web application.
With SparkLauncher you're given the option of launching a Spark application from another application, e.g. the web application above. It is just much easier.
import org.apache.spark.launcher.SparkLauncher

object Launcher extends App {
  val spark = new SparkLauncher()
    .setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6")
    .setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar")
    .setMainClass("SparkApp")
    .setMaster("local[*]")
    .launch()
  spark.waitFor()
}
Code:
https://github.com/phalodi/Spark-launcher
Here
setSparkHome("/home/knoldus/spark-1.4.0-bin-hadoop2.6") sets the Spark home, which is used internally to call spark-submit.
setAppResource("/home/knoldus/spark_launcher-assembly-1.0.jar") specifies the jar of our Spark application.
setMainClass("SparkApp") sets the entry point of the Spark program, i.e. the driver program.
setMaster("local[*]") sets the address of the master where it starts; here we run it on the local machine.
launch() simply starts our Spark application.
What are the benefits of SparkLauncher vs java -jar fat-jar?
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-SparkLauncher.html
https://spark.apache.org/docs/2.0.0/api/java/org/apache/spark/launcher/SparkLauncher.html
http://henningpetersen.com/post/22/running-apache-spark-jobs-from-applications
For example, on Cloudera's clusters there is a set of libraries already available on all nodes, which will be on the classpath for drivers and executors.
Those are e.g. spark-core, spark-hive, hadoop, etc.
Versions are grouped by Cloudera, so e.g. you have spark-core-cdh5.9.0, where the cdh5.9.0 suffix means that all libraries with that suffix were actually verified by Cloudera to work together properly.
The only thing you should do is use libraries with the same group suffix, and you shouldn't have any classpath conflicts.
What that allows is:
set dependencies configured in an app to Maven's provided scope, so they will not be part of the fat jar but will be resolved from the classpath on the nodes (see the sketch below).
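A minimal sketch of marking such a dependency as provided (the artifact and version here are illustrative):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
    <!-- available at compile time, resolved from the cluster's classpath at runtime -->
    <scope>provided</scope>
</dependency>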
You didn't write what kind of cluster you have, but maybe you can use a similar approach.
The maven-shade-plugin may be used to create the fat jar; it additionally allows you to set which libraries you want to include in the jar, and those not in the list are not included (see the sketch below).
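For illustration, a sketch of such an include filter via the plugin's artifactSet (group ids and version are placeholders):

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>3.2.4</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <artifactSet>
                    <!-- only artifacts matching these patterns end up in the fat jar -->
                    <includes>
                        <include>com.example:*</include>
                    </includes>
                </artifactSet>
            </configuration>
        </execution>
    </executions>
</plugin>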
I think something similar is described in this answer to Spark, Alternative to Fat Jar, but using S3 as dependency storage.
HubSpot has a (partial) solution: SlimFast. You can find an explanation here http://product.hubspot.com/blog/the-fault-in-our-jars-why-we-stopped-building-fat-jars and you can find the code here https://github.com/HubSpot/SlimFast
Effectively it stores all the jars it'll ever need on S3; when it builds, it does so without packaging the jars, and when it needs to run, it gets them from S3. So your builds are quick, and downloads don't take long.
I think if it also had the ability to shade the jars' paths on upload, in order to avoid conflicts, then it would be a perfect solution.
The fat jar indeed takes a lot of time to create. I was able to optimize it a little by removing the dependencies which were not required at runtime. But it is really a pain.

Cmake: don't build project again when executing make install

I have a CMakeLists.txt which builds a Java project with Maven into a war file when running make, but when I run make install, it rebuilds the project before copying the war to the web application's installation folder.
How can I build the Java project only once with make and not again with make install? Here is the CMakeLists.txt:
add_custom_target(JavaProject ALL
                  COMMAND ${MAVEN_EXECUTABLE} package
                  WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
                  VERBATIM)

install(FILES "${JAVA_PROJECT_TARGET_DIR}/java_project.war"
        DESTINATION ${WAR_DIR})
As the documentation of add_custom_target() says, custom targets are always considered out of date, which means they will re-build with each invocation of make which includes them.
What you want instead is a custom command to produce the .war file:
add_custom_command(
OUTPUT "${JAVA_PROJECT_TARGET_DIR}/java_project.war"
COMMAND ${MAVEN_EXECUTABLE} package
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
VERBATIM
)
This tells CMake how the file named "${JAVA_PROJECT_TARGET_DIR}/java_project.war" is produced when someone requests it. For files, CMake can generate dependency checks just fine, so it will not re-build needlessly. Note that you will probably also want to include some DEPENDS in that add_custom_command(), otherwise it will never rebuild once built(1).
Then, you need one more thing: a driver for the custom command. That is something that will depend on the command's OUTPUT and actually cause it to be built. So you'll add a custom target:
add_custom_target(
JavaProject ALL
DEPENDS "${JAVA_PROJECT_TARGET_DIR}/java_project.war"
)
Then, the sequence will be as follows:
During a make, JavaProject will be considered out of date (since it's a custom target) and will be built. This means its dependencies will be checked for up-to-datedness, and re-built if they're not up to date. That's what the custom command is for. After that, the custom target itself would run its COMMAND, but it doesn't have any, so nothing else happens.
On a subsequent make invocation, JavaProject will again be considered out of date and will thus be built. Its dependencies are checked again, but this time, they're up to date (since the .war already exists). It's therefore not built again. The custom target still has no COMMAND, so nothing further happens.
This "custom target as driver for custom commands" approach is very a idiomatic piece of CMake code, and you will see it in many projects which produce additional files which do not participate in further build steps (such as documentation).
(1) If the list of dependencies is very large, you may want to move it to a separate file and include that. Something like this:
In CMakeLists.txt:
include(files.cmake)
add_custom_command(
OUTPUT "${JAVA_PROJECT_TARGET_DIR}/java_project.war"
COMMAND ${MAVEN_EXECUTABLE} package
DEPENDS ${MyFiles}
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
VERBATIM
)
In files.cmake:
set(MyFiles
a/file1.java
a/file2.java
a/b/file1.java
a/c/file1.java
# ... list all files as necessary
)
This keeps the CMakeLists file itself readable, while allowing you to explicitly depend on everything you need.
Although Angew's answer is excellent, unfortunately it does not work as I expected (i.e. when I update the source code folder and run make, it will not build the war again).
Here is the way to solve what I wanted:
set(CMAKE_SKIP_INSTALL_ALL_DEPENDENCY TRUE)
Then when I run make it will build, and make install will just copy the file to the installation folder.

How can I programmatically add an Artifact to the MavenProject?

I am writing a Maven plugin and I would like to automatically resolve specific dependencies and add them as dependencies to the project based on the parameters given to the plugin.
I have been able to successfully resolve dependencies through aether, but there seems to be a disconnect between aether and the MavenProject.
There is a method on MavenProject#addAttachedArtifact which I'm guessing is what I want to use. However, it takes a org.apache.maven.artifact.Artifact while the one retrieved from aether is org.sonatype.aether.artifact.Artifact. I found a plugin that has a conversion method between the two, but I figure there ought to be a more standard approach.
I have also tried using the DefaultArtifactFactory to create a org.apache.maven.artifact.Artifact but get a NullPointerException when trying to get an ArtifactHandler.
code:
DefaultArtifactFactory factory = new DefaultArtifactFactory();
Artifact mavenArtifact = factory.createBuildArtifact("com.beust", "jcommander", "1.27", "jar");
result:
Caused by: java.lang.NullPointerException
at org.apache.maven.artifact.factory.DefaultArtifactFactory.createArtifact(DefaultArtifactFactory.java:155)
at org.apache.maven.artifact.factory.DefaultArtifactFactory.createArtifact(DefaultArtifactFactory.java:117)
at org.apache.maven.artifact.factory.DefaultArtifactFactory.createArtifact(DefaultArtifactFactory.java:111)
at org.apache.maven.artifact.factory.DefaultArtifactFactory.createBuildArtifact(DefaultArtifactFactory.java:75)
at com.foo.bar.module.IncludeModuleFrontEndMojo.execute(IncludeModuleFrontEndMojo.java:165)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
... 20 more
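A hedged note on the NullPointerException above: it is consistent with DefaultArtifactFactory being instantiated with new rather than injected as a Plexus component, which leaves its internal ArtifactHandlerManager null. A sketch of the injected variant, using the standard maven-plugin-annotations API (the mojo name is a placeholder):

import org.apache.maven.artifact.Artifact;
import org.apache.maven.artifact.factory.ArtifactFactory;
import org.apache.maven.plugin.AbstractMojo;
import org.apache.maven.plugins.annotations.Component;
import org.apache.maven.plugins.annotations.Mojo;

@Mojo(name = "include-module-front-end")
public class IncludeModuleFrontEndMojo extends AbstractMojo {

    // Injected by the Plexus container, so its ArtifactHandlerManager is wired up
    @Component
    private ArtifactFactory artifactFactory;

    @Override
    public void execute() {
        Artifact mavenArtifact =
                artifactFactory.createBuildArtifact("com.beust", "jcommander", "1.27", "jar");
        // ... use the artifact
    }
}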
So really, these are the things I've tried; a resolution to these issues would be great, but I'm really after the right way to do this. Any ideas?
UPDATE
I wrote my own conversion method between the two classes:
private static org.apache.maven.artifact.Artifact aetherToMavenArtifactBasic(
        Artifact artifact, String scope, ArtifactHandler artifactHandler) {
    DefaultArtifact mavenArtifact = new DefaultArtifact(
            artifact.getGroupId(),
            artifact.getArtifactId(),
            artifact.getVersion(),
            scope,
            artifact.getExtension(),
            artifact.getClassifier(),
            artifactHandler);
    mavenArtifact.setFile(artifact.getFile());
    mavenArtifact.setResolved(true);
    return mavenArtifact;
}
and found that the MavenProject#addAttachedArtifact method is for attaching an artifact to an existing artifact (i.e. attaching sources/javadoc jars to an artifact), which is not my goal. Instead I got the artifacts from the MavenProject and added my artifact:
project.getArtifacts().add(mavenArtifact);
which adds my artifact to the project (my artifact is then shown when I call the project's getArtifactMap() and getCompileClasspathElements()). However, this change does not persist. This is the problem I was really worried about. So the question has evolved into:
Can I make changes to the MavenProject and have it persist?
I don't think this is possible and for my purposes I decided instead to require the user to add the dependency in the project's pom file (and error out if they don't have it).
It seems to be by design that a plugin is not allowed to muck with the project configuration to the point where it could break the build. I found a good post on advanced mojo development here. A quote from it:
If this parameter could be specified separately from the main
dependencies section, users could easily break their builds –
particularly if the mojo in question compiled project source code. In
this case, direct configuration could result in a dependency being
present for compilation, but being unavailable for testing. Therefore,
the #readonly annotation functions to force users to configure the
POM, rather than configuring a specific plugin only.

Injecting current git commit id into Java webapp

We have a git repository which contains source for a few related Java WARs and JARs. It would be nice if the Java code could somehow:
System.err.println("I was built from git commit " + commitID);
(Obviously real code might be putting this into an HTTP header, logging it on startup, or whatever, that's not important right now)
We are using Ant to build binaries (at least for production builds; it seems some programmers do their testing from inside Eclipse, which I know even less about).
Is there a canonical way to get the commit id for the current git checkout into our Java at build time? If not, can people using Ant to build suggest how they'd do it and we'll see if a canonical solution emerges? I'm sure I can invent something myself entirely from whole cloth, but this seems like a re-usable building block, so I'd rather not.
You can get the last commit SHA with
git rev-parse HEAD
but it's generally a lot more useful to use
git describe
which will give you something that looks like this:
v0.7.0-185-g83e38c7
This works if you have tags - it will tell you how many commits from the last valid tag your current checkout is at plus a partial SHA for that commit, so you can use it to base a checkout off of later. You can use this identifier just like a SHA in most circumstances, but it's much more human readable.
I don't know if there are any Ant tasks for git (I googled a bit without success); anyway, Ant can update a properties file with Piotr's option (git rev-parse HEAD), and at runtime you can read that properties file to get the revision number. This is cleaner and more IDE-friendly than having Ant generate a .java file.
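A minimal sketch of that approach (target name and file path are illustrative):

<target name="git-revision">
    <!-- capture the current commit id into a property -->
    <exec executable="git" outputproperty="git.revision" failifexecutionfails="false">
        <arg value="rev-parse"/>
        <arg value="HEAD"/>
    </exec>
    <!-- write it to a properties file that gets packaged with the app -->
    <propertyfile file="src/main/resources/version.properties">
        <entry key="git.revision" value="${git.revision}"/>
    </propertyfile>
</target>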
In case it helps someone else: I know yours is Ant, but for a Maven build you could use the git-commit-id-plugin in your pom.xml file:
<plugin>
    <groupId>pl.project13.maven</groupId>
    <artifactId>git-commit-id-plugin</artifactId>
    <version>2.2.0</version>
    <executions>
        <execution>
            <goals>
                <goal>revision</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <dotGitDirectory>${project.basedir}/.git</dotGitDirectory>
        <generateGitPropertiesFile>true</generateGitPropertiesFile>
        <generateGitPropertiesFilename>${project.build.outputDirectory}/git.properties</generateGitPropertiesFilename>
    </configuration>
</plugin>
For more info, please go through:
1. http://www.baeldung.com/spring-git-information
2. https://github.com/ktoso/maven-git-commit-id-plugin
I wrote an Ant task to get the build number using the JGit API (without the git command-line app); see jgit-buildnumber-ant-task. You can then store this build number inside the MANIFEST.MF file and read it from the classpath at runtime.
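For illustration, reading such a manifest attribute back at runtime might look like this (the attribute name Git-Commit-Id is a placeholder for whatever your build writes):

import java.io.InputStream;
import java.net.URL;
import java.util.Enumeration;
import java.util.jar.Manifest;

public final class BuildInfo {
    /** Scans all MANIFEST.MF files on the classpath for the commit attribute. */
    public static String commitId() throws Exception {
        Enumeration<URL> manifests = BuildInfo.class.getClassLoader()
                .getResources("META-INF/MANIFEST.MF");
        while (manifests.hasMoreElements()) {
            try (InputStream in = manifests.nextElement().openStream()) {
                String id = new Manifest(in).getMainAttributes().getValue("Git-Commit-Id");
                if (id != null) {
                    return id;
                }
            }
        }
        return "unknown";
    }
}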
git rev-parse HEAD will print what you probably want (i.e. the id of the HEAD commit).
You can make Ant generate a simple Java class with this id as a static constant (see the sketch below).
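A sketch of generating such a class from Ant (target name, paths, and class name are illustrative):

<target name="generate-version-class">
    <exec executable="git" outputproperty="git.revision">
        <arg line="rev-parse HEAD"/>
    </exec>
    <!-- write a tiny Java class holding the commit id as a constant -->
    <echo file="src/generated/BuildVersion.java">public final class BuildVersion {
    public static final String GIT_COMMIT = "${git.revision}";
}</echo>
</target>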
First, you can use the ident gitattribute with the $Id$ keyword (although it is probably not what you want; it is a hash of the file contents and has nothing to do with the current project version).
Second, you can do it the way the Linux kernel and Git itself do it: in the Makefile (in your case, the Ant file) there is a rule which replaces some placeholder, usually '##VERSION##' (in Perl's case it is '++VERSION++'), with the result of GIT-VERSION-GEN, which in turn uses "git describe". But for that to be useful you have to tag your releases (using annotated/signed tags). A sketch of that replacement follows.
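A hedged Ant sketch of that placeholder substitution (target name, file names, and the token are illustrative):

<target name="stamp-version">
    <exec executable="git" outputproperty="git.describe">
        <arg line="describe --tags"/>
    </exec>
    <!-- copy the template, substituting ##VERSION## with the git describe output -->
    <copy file="Version.java.in" tofile="src/generated/Version.java" overwrite="true">
        <filterset begintoken="##" endtoken="##">
            <filter token="VERSION" value="${git.describe}"/>
        </filterset>
    </copy>
</target>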
