I am building a Java/Spring Boot back-end application using Gradle. The application consists of 30 modules, all of them written in Java.
I am trying to use the Gradle build cache to get faster builds, but I am struggling with its effectiveness. On consecutive builds with the build cache activated, only ~60 of 240 tasks are "from cache"; the rest are rebuilt. This happens even when there has not been a single modification to the application code between the first and the second build. When there are modifications to the source code, even fewer tasks are cache hits.
I would have expected the build cache to avoid re-executing most if not all tasks on a second build without code modifications. Unfortunately, I find little information about how to investigate. Is my assumption incorrect? If not, how do I find out what is keeping the build cache from being more effective?
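One way to dig in, sketched here under the assumption of a reasonably recent Gradle version: the org.gradle.caching.debug system property logs the individual input hashes that make up each task's cache key, so you can diff them between two identical runs, and a build scan gives a per-task breakdown of cache hits and misses:

```sh
# Run the same build twice and diff the logged cache-key inputs of any
# task that unexpectedly re-executes:
./gradlew --build-cache -Dorg.gradle.caching.debug=true clean build > first.log
./gradlew --build-cache -Dorg.gradle.caching.debug=true clean build > second.log
diff first.log second.log

# A build scan shows per task whether it was a cache hit and which
# inputs differed:
./gradlew --build-cache --scan build
```

Typical culprits are volatile task inputs: timestamps baked into generated resources, absolute paths, or tasks that are not cacheable in the first place.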
I manage a large proprietary system composed of about a dozen services written in Java. We have a core set of Java libraries that they all share, and all the components/apps are built using Maven. Outside of the core SDK JARs, though, each app has its own unique set of dependencies. I can't figure out the best approach to both building and deploying inside Docker. Ideally I want the entire lifecycle in Docker, using a multi-stage build approach, but I can't see how to optimize this given the huge number of dependencies.
It looks like there are two approaches I could take.
Option 1: Build as we have before, using Maven and a common cache on the CI server (Jenkins) so that dependencies are fetched once, cached, and accessible to all the apps. Then have a Dockerfile for each app that just copies the product JAR and its dependencies (or a fat JAR) into the container and sets it up to execute. The downside of this approach is that the build itself can differ between developers and the CI server. We could potentially use a local Maven repository manager like Nexus just to avoid pulling dependencies from the internet every time, but that still doesn't solve the problem that a dev build won't necessarily match the CI build environment.
Option 2: Use a multi-stage Dockerfile for each project. I've tried this, and it does work, and I managed to get the Maven dependencies layer to cache so that it doesn't fetch too often. Unfortunately, that intermediate build layer was hitting 1-2 GB per application, and I can't remove the 'dangling' intermediates from the daemon or all the caching is blown away. It also means there's a tremendous amount of duplication in the JARs that have to be downloaded for each application if something changes in the POMs (i.e., they all use JUnit and Log4j and many other common libraries).
Is there a way to solve this optimally that I'm not seeing? All the blogs I've found basically focus on the two approaches above (some focus on running Maven itself in a container, which really doesn't solve anything for me). I'll probably end up going with Option 1 if there aren't any other good solutions.
I've checked around on Stack Overflow and blogs, and everything I can find seems to assume that you're building a single app rather than a suite of them, where it becomes important not to repeat the dependency downloads.
I think it is OK to use the .m2/repository filesystem cache as long as you set the --update-snapshots option in your Maven build. It scales better, because you cache each JAR only once per build environment and not once per application. Additionally, a change in a single dependency does not invalidate the entire cache, which would be the case if you relied on Docker layer caching.
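As a sketch of that approach (the image tag and paths are illustrative), the shared host repository can even be bind-mounted into containerized builds, so dev and CI builds run in the same environment while still sharing one cache:

```sh
# Build inside a Maven container while reusing the shared host cache:
docker run --rm \
  -v "$HOME/.m2:/root/.m2" \
  -v "$PWD:/usr/src/app" \
  -w /usr/src/app \
  maven:3-eclipse-temurin-17 \
  mvn --update-snapshots clean package
```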
Unfortunately, that cannot be combined well with multi-stage builds at the moment, but you are not the only one asking for it.
This issue requests adding a --volume option to the docker build command. This one asks for allowing instructions like this in the Dockerfile: RUN --mount=m2repo=/var/mvn/repo mvn install.
Both features would allow you to use the local Maven filesystem cache during your multi-stage build.
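A sketch of what that could look like, using the BuildKit-style cache-mount syntax (this assumes a BuildKit-enabled Docker; the image names and JAR path are placeholders, not something from your build):

```dockerfile
# syntax=docker/dockerfile:1
FROM maven:3-eclipse-temurin-17 AS build
WORKDIR /src
COPY pom.xml .
COPY src ./src
# The cache mount persists /root/.m2 across builds without baking the
# repository into any image layer:
RUN --mount=type=cache,target=/root/.m2 mvn -B clean package

FROM eclipse-temurin:17-jre
# app.jar is a placeholder for whatever your build actually produces:
COPY --from=build /src/target/app.jar /app/app.jar
ENTRYPOINT ["java", "-jar", "/app/app.jar"]
```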
For the moment I would advise keeping your Option 1 as the solution, unless you are facing many issues that are due to differing build environments.
I am posting the query below after failing to understand the practical purpose of Gradle in Selenium projects.
I have been quite successful in using Selenium WebDriver without Gradle or any other build tool.
I have seen Selenium developers using Gradle as a build tool for their Selenium projects, but I fail to see the reason for using Gradle beyond dependency management. I understand that it is crucial in Java development projects.
I have explored a lot as to why we need Gradle and what its practical use is in a Selenium project.
Please correct me if I am wrong. As far as I know, in a Selenium project nobody needs a continuous build (rebuilding every time the source changes) or to perform operations after the build, apart from a few things like deploying a JAR to some location, because we would make that part of the execution plan.
Having been in automation for quite some time, below are the things I do, and I believe most projects go the same way:
There is no need to perform any task after saving my script each time.
After developing a script, I schedule it as a batch for execution.
I have heard that a build tool is used only for dependency management.
Below are my queries:
What can be achieved in Gradle apart from dependency management? I understand that we can put tasks in the build file so that they are executed on every build, like creating a file or putting a JAR in some location. Could you please explain what kind of real-world tasks we would need to perform during or after the build?
What else can be done with Gradle and Selenium to utilize it to the maximum?
With a build tool you can set up a check-in/check-out workflow for Selenium code: people working on different machines can check out the code, make their changes, then commit and push the code to master.
Users can also define their own build logic, and running the test suite is quite simple in Gradle.
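As a minimal sketch of what that buys you (versions and coordinates are illustrative): with a build file like the one below, anyone who checks out the project can run the entire Selenium suite with a single ./gradlew test, and Gradle fetches the right dependency versions automatically.

```groovy
// build.gradle - dependency resolution plus a one-command test run
plugins {
    id 'java'
}

repositories {
    mavenCentral()
}

dependencies {
    testImplementation 'org.seleniumhq.selenium:selenium-java:4.21.0'
    testImplementation 'org.junit.jupiter:junit-jupiter:5.10.2'
    testRuntimeOnly 'org.junit.platform:junit-platform-launcher'
}

test {
    useJUnitPlatform()
    // Selenium suites often need environment-specific settings:
    systemProperty 'browser', System.getProperty('browser', 'chrome')
}
```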
We currently have a big Maven 2 project that is really a collection of many standalone projects with complicated dependencies, apart from some common parent POMs used for building. In the end we always have to ship the application as one piece, so I would rather convert it to one or a few big projects.
Does anyone have experience with optimizing continuous integration builds for big projects? Is the incremental build functionality of Maven or Hudson any good? I would prefer not to always wait two hours after making only a small change in one module.
On the other hand, to be safe, you would always have to rebuild and re-test at least all direct and indirect dependents of a changed module. That is also what we are currently doing with Hudson, automatically triggering all dependent jobs.
Does splitting the same project into multiple build jobs pay off? I generally do not like having artifacts on the server alongside other generated output, like reports and docs, that could possibly be out of date.
Thanks for any thoughts.
I just did some more testing and, as I found out, Maven does not really support incremental builds. Without any plugins, Maven actually behaves dangerously: if you change the code in some module and compile without a prior clean, dependent modules do not get rebuilt, meaning they reference an old, outdated version of the dependency and do not react to the updated code.
With the incremental build plugin it is possible to build without clean: every module that changed gets rebuilt, plus all dependents are cleaned and rebuilt. However, in my case compiling takes only about 10% of the build time; 90% is testing. And when I install/deploy, all tests get executed again, so the time benefit from the incremental build plugin is very small.
So I still only see the option of splitting up builds in Hudson, which is hardly ideal in my opinion.
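For reference, what the incremental build plugin automates can be approximated by hand with Maven's reactor options (the module name here is a placeholder):

```sh
# Rebuild only the changed module plus every module that depends on it:
mvn -pl my-changed-module -amd clean install

# -pl  (--projects)               limits the reactor to the listed modules
# -amd (--also-make-dependents)   adds all modules that depend on them
```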
I would strongly suggest not splitting it into different build jobs. In my experience, this can quickly get out of hand with upstream and downstream dependencies. Incremental building works well for what you need: if the dependencies are set up correctly, only the artifacts that changed and their dependents are rebuilt.
I would split up the build jobs, though, if they are completely separate applications with no or very few dependencies (if that's true, then they shouldn't be under the same reactor anyway, and thus incremental builds would not be possible).
I am currently working on a rather large project with a team distributed across the United States. Developers regularly commit code to the source repository. We have the following application builds (all managed by an application, with no manual processes):
Continuous integration: a monitor checks whether the code repository has been updated; if so, it does a build and runs our unit test suite. On errors, the team receives email notifications.
Daily build: developers use this build to verify their bug fixes or new code on an actual application server, and if "things" succeed, the developer may resolve the task.
Weekly build: testers verify the resolved-issue queue on this build. It is a more stable testing environment.
Current release build: used for demoing and as an open testing platform for potential new users.
Each build refreshes its associated database. This cleans the data and verifies that any database changes that go along with the new code are pulled in. One concern I hear from our testers is that we need to pre-populate the weekly build database with some expected testing data, as opposed to the more generic data developers work with. This seems like a legitimate concern, and it is something we are working on.
I am tossing out what we are doing to see if the SO community sees any gaps in it or has any concerns. Things seem to be working well, but it FEELS like it could be better. Your thoughts?
An additional step that is often followed: once the release build passes tests (say, a smoke test), it is qualified as a good build (a "golden build"), and some kind of labeling mechanism is used to label all the artefacts (code, install scripts, makefiles, installables, etc.) that went into the creation of the golden image. The golden build may or may not become a release candidate later.
You are probably already doing this; since you didn't mention it, I have added what I have observed.
This is pretty much the way we do it.
The testers' own DB is only reset on demand. If we refreshed it automatically every week, then:
we would lose the references to bug symptoms; if a bug is found but a developer only looks at it a few weeks later (or simply after the weekend), all evidence of that bug may have disappeared
testers might be in the middle of a big test case (taking more than one day, for instance)
We have tons of unit tests running against a DB that is refreshed (automatically, of course) each time an integration build is executed.
Regards,
Stijn
I think you have a good, comprehensive process, as long as it fits with when your customers want to see updates. One possible gap: it looks like you wouldn't be able to get a critical customer bug fix into production in less than a week, since your test builds are weekly and you'd then need time for the testers to verify the fix.
If you fancy thinking about things a different way, have a look at this article on continuous deployment - it can be a bit hard to accept the concept at first, but it definitely has some potential.
How does your team handle builds?
We use CruiseControl, but (due to lack of knowledge) we are facing some problems: code freeze in SVN, and build management.
Specifically, how do you make available a particular release when code is constantly being checked in?
Generally, can you discuss what best practices you use in release management?
I'm positively astonished that this isn't a duplicate, but I can't find another one.
Okay, here's the deal: these are two separate but related questions.
For build management, the essential point is that you should have an automatic, repeatable build that rebuilds the entire collection of software from scratch and goes all the way to your deliverable configuration. In other words, you should effectively build a release candidate every time. Many projects don't really do this, but I've seen it burn people (read: "been burned by it") too many times.
Continuous integration says that this build process should be repeated every time there is a significant change event to the code (like a check-in), if at all possible. I've done several projects where this turned into a nightly build because the code base was large enough that it took several hours to build, but the ideal is to set up your build process so that some automatic mechanism, like an Ant script or makefile, only rebuilds the pieces affected by a change.
You handle the issue of providing a specific release by preserving, in some fashion, the exact configuration of all affected artifacts for each build, so you can apply your repeatable build process to the exact configuration you had. (That's why it's called "configuration management.") The usual version control tools, like Git or Subversion, provide ways to identify and name configurations so they can be recovered; in SVN, for example, you might create a tag for a particular build. You simply need to keep a little bit of metadata around so you know which configuration you used.
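In Subversion that amounts to a cheap server-side copy; a sketch, with the repository layout and build number as placeholders:

```sh
# Record the exact configuration that went into a build as a tag:
svn copy ^/trunk ^/tags/build-1234 -m "Configuration used for build 1234"
```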
You might want to read one of the "Pragmatic Version Control" books, and of course the stuff on CI and Cruise Control on Martin Fowler's site is essential.
Look at Continuous Integration: best practices, from Martin Fowler.
Well, I have managed to find a related thread, I participated in, a year ago. You might find it useful, as well.
And here is how we do it.
We are using CruiseControl as our integration tool. We deal only with the trunk, which is the main Subversion repository in our case. We seldom pull out a new branch for new story cards, only when there is a chance of complex conflicts. Normally, we pull out a branch for a version release, create the build from it, and deliver that to our test team. Meanwhile we continue the work in trunk and wait for feedback from the test team. Once everything is tested, we create a tag from the branch, which is logically immutable in our case, so we can release any version at any time to any client if needed. In case of bugs in the release, we don't create the tag; we fix things in the branch. After getting everything fixed and approved by the test team, we merge the changes back to trunk and create a new tag from the branch specific to that release.
So the idea is that our branches and tags do not really participate in continuous integration directly. Merging branch code back to the trunk automatically makes that code part of CI (continuous integration). We normally do just bug fixes for a specific release in branches, so they don't really participate in the CI process. On the contrary, if we start doing new story cards in a branch for some reason, we don't keep that branch apart too long; we try to merge it back to trunk as soon as possible.
Precisely:
We create branches manually when we plan the next release.
We create a branch for the release and fix bugs in that branch if needed.
Once everything is good, we make a tag from that branch, which is logically immutable.
Finally, we merge the branch back to trunk if it has fixes or modifications (a sketch of the corresponding Subversion commands follows).
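A sketch of that workflow as Subversion commands, with the release numbers and repository layout as placeholders:

```sh
# 1. Branch for the planned release:
svn copy ^/trunk ^/branches/release-1.2 -m "Branch for the 1.2 release"

# 2. Fix bugs on the branch, then freeze it as a logically immutable tag:
svn copy ^/branches/release-1.2 ^/tags/release-1.2.0 -m "Tag release 1.2.0"

# 3. Merge the fixes back (run from an up-to-date trunk working copy):
svn merge --reintegrate ^/branches/release-1.2
svn commit -m "Merge release-1.2 fixes back to trunk"
```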
Release Management goes well beyond continuous integration.
In your case, you should use CruiseControl to automatically create a tag, which allows developers to go on coding while your incremental build takes place.
If your build is incremental, that means you can trigger it every x minutes (and not on every commit, because if commits are too frequent and your build is too long, it may not have time to finish before the next build tries to start). The 'x' should be tailored to be longer than a compile/unit-test cycle.
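In CruiseControl terms, that 'x' is the project's schedule interval; a rough sketch of the relevant config.xml fragment, with the project name and paths as placeholders (the interval is in seconds):

```xml
<project name="myapp">
  <modificationset quietperiod="60">
    <svn localWorkingCopy="checkout/myapp"/>
  </modificationset>
  <!-- Check for modifications and build at most every 15 minutes -->
  <schedule interval="900">
    <ant buildfile="checkout/myapp/build.xml" target="test"/>
  </schedule>
</project>
```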
Continuous integration should also include the automatic launch of unit tests.
Beyond that, a full release management process will involve:
a series of deployments to homologation servers
a full cycle of homologation / UAT (User Acceptance Test)
non-regression tests
performance / stress tests
pre-production (and parallel run tests)
before finally releasing into production.
Again "release management" is much more complex than just "continuous integration" ;)
Long story short: create a branch copied from trunk, and check out and build your release from that branch on the build server.
However, getting to that point in a completely automated fashion using CruiseControl.NET is not an easy task. I could go into the details of our build process if you like, but it's probably too fine-grained for this discussion.
I agree with Charlie about having an automatic, repeatable build from scratch. But we don't do everything for the "continuous" build, only for Nightly, Beta, Weekly, or Omega (GA/RTM/Gold) release builds, simply because some things, like generating documentation, can take a long time, and for the continuous build you want to provide developers with rapid feedback on build results.
I totally agree with preserving the exact configuration, which is why branching on release or tagging is a must. If you have to maintain a release, i.e., you can't just release another copy of trunk, then the branch-on-release approach is the way to go, but you will need to get comfortable with merging.
You can use Team Foundation Server 2008 and Microsoft Visual Studio Team System to manage your source control, branching, and releases.