I am new to Docker and have been reading a lot about it. But when I look at it from a Java application perspective, I am not sure what value it adds in terms of 'packaging dependencies', which is one of the important factors called out in the documentation.
Java is already a language that can run on multiple operating systems thanks to the JVM abstraction layer. Build once, run anywhere is not a new concept. Docker containers do allow me to ship my JRE version along with my application code, so I see that benefit, but is there any other benefit, especially when my host environments aren't going to change? I.e., I will be using Linux boxes for deployments.
A fat jar file is about as good as packaging can get for bundling all the dependencies using a Maven build. I understand that containers really help with deploying on platforms like Kubernetes, but if I judge containers strictly on the packaging question, isn't a jar package enough? I may still have to containerize it to benefit from running lightweight processes instead of running them on VMs.
Does the JRE layer get reused across all the other containers? That would be akin to installing the JRE on my VM boxes, where all apps on the box use the same JRE version, unless I need to run different JRE versions for my applications, which is highly unlikely.
If you have a working deployment system using established technologies, you should absolutely keep using it, even if there is something newer and shinier. The Java app server space is very mature and established, and if you have a pure-JVM application and a working setup, there's nothing wrong with staying on that setup even if Docker containers exist now.
As you note, a Docker image contains a JVM, and an application server, and library dependencies, and the application. If you have multiple images, they're layered correctly, and these details match exactly, then those layers can be shared; but there's also a very real possibility that one image has a slightly newer patch release of the JVM or the base Linux distribution than another. In general I'd say the Docker ecosystem broadly assumes that applications using "only" tens of megabytes of disk or memory aren't significant overhead; this is a major difference from the classic Java ecosystem, where multiple applications would run on a single shared JVM inside a single process.
# This base image will be shared between derived images, _if_ the
# version/build matches exactly
FROM tomcat:10-jdk11
# These library jars will be shared between derived images, _if_ the
# _exact_ set of files matches _exactly_, and the Tomcat layer is also shared
COPY target/libs/*.jar /usr/local/tomcat/lib/
# This jar file presumably won't be shared
COPY target/myapp.jar /usr/local/tomcat/webapps/
I'd start looking into containers if you had a need to also incorporate non-JVM services into your overall system. If you have a Java component, a Node component, and a Python component, and they all communicate over HTTP, then Docker makes them all deploy the same way, and it won't really matter which pieces are in which languages. Staying within the JVM ecosystem (and maybe using JVM-based language implementations like JRuby or Jython, if JVM-native languages like Java/Kotlin/Groovy/Scala don't meet your needs) makes sense given the way you've described your current setup.
The JVM/JRE doesn't get reused. You may find that running in an application server environment works better for you: compared to running directly on a plain Java SE installation, Docker has a higher overhead, and the advantage of running just Docker (without orchestration) is vanishingly small compared to that.
Some advantages could be:
Testing
Testing out your code on different JRE versions quickly
Automated testing. With a Dockerfile, your CI/CD pipeline can check out your code, compile it, spin up a Docker image, run the tests, and spit out a JUnit-formatted test report.
Having consistent environments (injected dependencies such as JKS keystores and configuration, OS version, JRE, etc.)
Environment by configuration.
You don't have to spend time installing the OS, JRE, etc.; that is all captured in a configuration file in your source control system of choice (see the Dockerfile sketch after this list).
This makes disaster recovery much easier
Migrations are simplified (partially)
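To make "environment by configuration" concrete, here is a minimal sketch (the base image tag, file names, and paths are illustrative assumptions, not from the question): the whole runtime environment is described in a Dockerfile that is versioned alongside the code.
# Everything that would otherwise be installed by hand on a VM (OS, JRE,
# configuration) is declared here and kept in source control.
FROM eclipse-temurin:17-jre
# Example of environment configuration baked into the image: a truststore
# and an environment variable (names are purely illustrative).
COPY config/app-truststore.jks /app/conf/app-truststore.jks
ENV APP_ENV=production
COPY target/myapp.jar /app/myapp.jar
ENTRYPOINT ["java", "-jar", "/app/myapp.jar"]
Rebuilding this image from scratch is all it takes to recreate the environment, which is what makes disaster recovery and migrations simpler.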
The advantages of running in an orchestrated environment or PaaS, for instance Kubernetes or OpenShift, are (in addition to those of base Docker):
Possibility to do canary deployments
Routing, scaling, and load balancing across the same or several machines to optimize usage per machine (there are sweet spots beyond which JRE performance lags for some operations)
"A fat jar file is as good as a packaging can get to bundle all the dependencies using maven build"
It's not as good as it can get.
Your app probably has a bunch of dependencies: Spring, Tomcat, whatever. In any enterprise application, the size of the final artifact will be made up of something like 99% dependencies, 1% your code. These dependencies probably change infrequently, only when you add some functionality or decide to bump up a version, while your code changes very frequently.
A fat JAR means that every time you deploy, every time you push to a repo host (e.g. Nexus), you are wasting time uploading or downloading something that's 99% identical every time. It's just a dumb zip file.
Docker is a smarter format. It has the concept of layers, so you can (and are encouraged to) use one layer for dependencies and another for your code. This means that if the dependency layer doesn't change, you don't have to deploy it again, or update it again in your deployments.
So you get faster CI builds that require less disk space in your repo host and can be installed faster. You can also use the layers to more easily validate assumptions, for example that only business code has changed.
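A minimal sketch of that layering (the image tag, paths, and main class are assumptions): the rarely-changing dependency jars go into one layer and the frequently-changing application jar into a later one, so a typical code change only rebuilds and pushes the last, small layer.
FROM eclipse-temurin:17-jre
# Layer with the dependency jars: changes rarely, so it is cached and reused
# across builds, pushes, and pulls.
COPY target/libs/ /app/libs/
# Layer with your own code: small, and the only thing that usually changes.
COPY target/myapp.jar /app/myapp.jar
# The JVM expands the "*" classpath entry itself, no shell needed.
ENTRYPOINT ["java", "-cp", "/app/myapp.jar:/app/libs/*", "com.example.Main"]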
Related
I want to know the difference between the two and also which one is more commonly used according to industry standards. I tried finding resources online, but there is very little content about fat JARs, and almost no comparison is drawn between the two anywhere.
At first glance it doesn't seem like a big deal. The reason for that is that the Java packaging system is really mature and has grown strong over many years. Many other ecosystems do not benefit from this and can benefit greatly from being packaged into a container image. But containers are not only about packaging; packaging could almost be considered a side effect of them.
Amongst others, some benefits of using containers over simple fat JARs are:
Simplified infrastructure
For a big (or mid-sized) enterprise built around microservices, chances are that not all of them use the same languages and tools. A container provides a predictable way of deploying all those different things in the same manner, which greatly simplifies the infrastructure and dramatically reduces costs. This becomes even more important when deploying into the cloud, especially in multi-cloud-provider scenarios, where container orchestration provided by software like Kubernetes helps a great deal without much effort.
Consistency
Another benefit of containers over regular JARs is consistency across environments. Imagine, for instance, that you deploy your JAR into DEV (running Java 8) and then into PROD (running Java 10). For some reason the JVM behaves differently, either because of the default garbage collector or something else, making your program behave differently in the two environments. When you deploy through a container image, the same image is used across environments, and thus the same Java version is always used, making the process less error-prone.
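As a sketch of that idea (the image tag and paths are assumptions): the Java version is part of the image rather than of the host, and the very same image is promoted from DEV to PROD.
# The JRE ships inside the image, so DEV and PROD cannot drift apart; pin a
# more specific tag or a sha256 digest for bit-for-bit reproducibility.
FROM eclipse-temurin:11-jre
COPY target/myapp.jar /app/myapp.jar
ENTRYPOINT ["java", "-jar", "/app/myapp.jar"]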
Resource isolation
Yet another benefit is resource isolation. Containers can make sure that your application "sees" only a predetermined amount of memory and CPU. There were also improvements in Java 10 regarding how the JVM detects and respects these container limits.
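A hedged sketch of how that looks for a containerized JVM (the image tag, paths, and the 75% figure are assumptions): the container runtime enforces the limits, and a container-aware JVM (Java 10+, or 8u191+) sizes its heap from them.
FROM eclipse-temurin:17-jre
COPY target/myapp.jar /app/myapp.jar
# Since Java 10 the JVM reads the container's memory/CPU limits (set at run
# time, e.g. "docker run --memory=512m --cpus=1 myapp") and sizes the heap
# as a percentage of that limit rather than of the host's total RAM.
ENTRYPOINT ["java", "-XX:MaxRAMPercentage=75.0", "-jar", "/app/myapp.jar"]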
Hope this provides a better point of view on the matter.
Depending on your needs, you can use either standalone fat jars or fat jars inside containers. The key point I'm making is that there is no standoff between the two; in fact, they are complementary.
A fat jar is the primary way to wrap a Java application so that it can be easily containerized.
Of course, you can use fat jars standalone too.
Pros of containerization:
You can take advantage of unified tooling for containers, i.e. you set up Docker and can run any container, be it Java with a fat jar, Node.js, or anything else.
You can take advantage of various container orchestration systems (Docker Compose, Docker Swarm, Kubernetes, etc.), meaning you use unified tooling for health checks, monitoring, rolling updates, and so on (see the HEALTHCHECK sketch after this list).
You don't need to worry about things like JRE / JDK versions on your system.
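For example (a sketch; the endpoint, port, and timings are assumptions), a health check can be declared once in the Dockerfile and is then consumed uniformly by Docker itself, Compose, and Swarm (Kubernetes uses its own probes, but the idea is the same):
FROM eclipse-temurin:17-jre
# curl is used by the health check below; install it if the base image
# doesn't already ship it.
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
COPY target/myapp.jar /app/myapp.jar
# The orchestrator only sees "healthy"/"unhealthy"; it doesn't care that the
# process inside happens to be a JVM.
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
  CMD curl -fs http://localhost:8080/health || exit 1
ENTRYPOINT ["java", "-jar", "/app/myapp.jar"]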
When you may still want to use standalone:
When you have a Java-centric architecture with established processes around it, and it would be costly to move from that to modern container orchestration.
When you are using Java as the primary scripting (or application) platform on your instance and simply don't want the overhead of containers.
I want to know the easiest way to deploy a web server written in Java or Kotlin. With Node.js, I just keep all the server code on the remote machine and edit it using the sshfs plugin for VS Code. For JVM-based servers this doesn't appear as easy, since IntelliJ doesn't provide remote editing support. Is there a method for JVM-based servers that allows a quick iterative development cycle?
Do you have to keep your server code on the remote machine? How about developing and testing it locally, and deploying only when you want to test it on the actual deployment site?
I once tried to use SSH-FS with IntelliJ, and because of the way IntelliJ builds its cache, the performance was terrible. The caching was in progress, but after 15 minutes I gave up. And IntelliJ without its caching and smart hints would be close to a regular editor.
In my professional environment, I also use Unison from time to time: https://www.cis.upenn.edu/~bcpierce/unison/. I have it configured to copy only code, not the generated sources. Most of the time it works pretty well, but it has its quirks, which can make you waste half a day debugging it.
To sum up, I see such options:
Developing and testing locally, and avoiding frequent deployments to the remote machine.
VS Code with the sshfs plugin, because why not, if it's enough for you for Node.js?
A synchronization tool like Unison.
Related answers regarding SSHFS from IntelliJ Support (several years old, but, I believe, still hold true):
https://intellij-support.jetbrains.com/hc/en-us/community/posts/206592225-Indexing-on-a-project-hosted-via-SSHFS-makes-pycharm-unusable-disable-indexing-
https://intellij-support.jetbrains.com/hc/en-us/community/posts/206599275-Working-directly-on-remote-project-via-ssh-
A professional deployment won't keep source code on the remote server, for several reasons:
It's less secure. If you can change your running application by editing source code and recompiling (or even if edits are deployed automatically), it's that much easier for an attacker to do the same.
It's less stable. What happens to users who try to access your application while you are editing source files or recompiling? At best, they get an error page; at worst, they could get a garbage response, or even a leak of customer data.
It's less testable. If you edit your source code and deploy immediately, how do you test to ensure that your application works? Throwing untested buggy code directly at your users is highly unprofessional.
It's less scalable. If you can keep your source code on the server, then by definition you only have one server. (Or, slightly better, a small number of servers that share a common filesystem.) But that's not very scalable: you're clearly hosted in only one geographic location and thus vulnerable to all kinds of single points of failure. A professional web-scale deployment will need to be geographically distributed and redundant at every level of the application.
If you want a "quick iterative development cycle" then the best way to do that is with a local development environment, which may involve a local VM (managed with something like Vagrant) or a local container (managed with something like Docker). VMs and containers both provide mechanisms to map a local directory containing your source code into the running application server.
I'm learning CoreOS/Docker and am trying to wrap my mind around a few things.
With Java infrastructure, is it possible to run the JVM in its own container and have other Java apps/services use this JVM container? If not, I'm assuming the JVM has to be bundled in each container, so you essentially have to pull the Java Dockerfile and merge in my Java services, creating a Linux machine + Java + service container running on top of the CoreOS machine.
The only other thought I had was it might be possible to run the JVM on CoreOS itself, but it seems like this isn't possible.
It is actually possible to just untar Oracle Java in /opt, but that's a kind of last resort. The Oracle binaries of the JRE and JDK don't require any system libraries, so it's pretty easy to do anywhere.
I have written some pretty small JRE and JDK images, with which I was able to run Elasticsearch and other major open-source applications. I also wrote some containers that allow me to compile jars on CoreOS (errordeveloper/mvn, errordeveloper/sbt & errordeveloper/lein).
As @ISanych pointed out, running multiple Java containers will not impact disk usage; it's pretty much equivalent to running multiple JVMs on the host. If you find that running multiple JVMs is not quite your cup of tea, then the answer is really that the JVM wouldn't have to be as complex as it is if containers had existed before it. However, Java in a container is still pretty good, as you can have one classpath that is fixed forever, and you won't get into dependency hell. Perhaps instead of building uberjars (which is what I mostly do, even though they are known to be not exactly perfect, but I am lazy), one could bundle the jars in a tarball and then use ADD jars.tar /app/lib/ in the Dockerfile.
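A rough sketch of that tarball approach (the image tag, paths, and main class are assumptions): ADD unpacks a local tar archive into the image, so the library jars land in one cacheable layer, separate from the application jar.
FROM eclipse-temurin:17-jre
# ADD auto-extracts a local tarball, so all library jars end up under
# /app/lib/ as a single layer that stays stable while the app jar changes.
ADD jars.tar /app/lib/
COPY target/myapp.jar /app/myapp.jar
# The JVM expands the "*" classpath entry itself.
ENTRYPOINT ["java", "-cp", "/app/myapp.jar:/app/lib/*", "com.example.Main"]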
Applications that run on the JVM have to have a JVM installed in the container. So if you want to split application components into separate containers, each of those containers needs to have a JVM.
On a side note, containers can talk to each other via a mechanism called container linking.
The best practice is to create an image with the JVM and then base other images on that JVM image (FROM jvm in the Dockerfile). Even if you create many different images, they will not waste much space, as Docker uses a layered filesystem and all containers will use the same single copy of the JVM image. Yes, each JVM will be a separate process eating its own memory, but isolated environments are exactly what Docker is for.
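A minimal sketch of that setup (the image names are made up): two separate Dockerfiles, one for a shared JVM base image and one per application, so the JVM layers are stored only once and reused by every derived image.
# Dockerfile for the shared base image, built once and tagged, e.g.
#   docker build -t myorg/base-jvm:11 .
FROM eclipse-temurin:11-jre
# Common tooling, certificates, or JVM defaults for all services go here.

# Dockerfile for each application image: everything below myorg/base-jvm is
# shared on disk, and only the application layer differs per service.
FROM myorg/base-jvm:11
COPY target/service-a.jar /app/service-a.jar
ENTRYPOINT ["java", "-jar", "/app/service-a.jar"]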
I'd like to run multiple Java processes on my web server, one for each web app. I'm using a web framework (Play) that has a lot of supporting classes and jar files, and the Java processes use a lot of memory. One Play process shows about 225MB of "resident private" memory. (I'm testing this on Mac OS X, with Java 1.7.0_05.) The app-specific code might only be a few MB. I know that typical Java web apps are jars added to one server process (Tomcat, etc), but it appears the standard way to run Play is as a standalone app/process. If these were C programs, most of that 200MB would be shared library and not duplicated in each app. Is there a way to make this happen in Java? I see some pages about class data sharing, but that appears to apply only to the core runtime classes.
At this time and with the Oracle VM, this isn't possible.
But I agree, it would be a nice feature, especially since Java has all the information it needs to do that automatically.
Off the top of my head, I think that the JIT is the only reason why this can't work: the JIT takes runtime behavior into account. So if app A uses some code in a different pattern than app B, that would result in different assembler code being generated at runtime.
But then, the usual "pattern" is "how often is this code used." So if app A called some method very often and B didn't, they could still share the code because A has already paid the price for optimizing/compiling it.
What you can try is deploy several applications as WAR files into a single VM. But from my experience, that often causes problems with code that doesn't correctly clean up thread locals or shutdown hooks.
The IBM JDK has a JVM parameter to achieve this. Check out http://www.ibm.com/developerworks/library/j-sharedclasses/
And this takes it to the next step: http://www.ibm.com/developerworks/library/j-multitenant-java/index.html
If you're using a servlet container with virtual host support (I believe Tomcat has it), you would be able to use the play2-war-plugin. From Play 2.1 the requirement of always being the root app is going to be lifted, so you will probably be able to use any servlet container.
One thing to keep in mind is that you will probably have to tweak the war file to move stuff from WEB-INF/lib to your servlet container's lib directory to avoid loading all the classes again, and this could affect your app if it uses singletons or other forms of class-level shared data.
The problem of sharing memory between JVM instances is more pressing on mobile platforms, and as far as I know Android has a pretty clever solution for that in Zygote: the VM is initialized and then when running the app it is fork()ed. Linux uses copy-on-write on the RAM pages, so most of the data won't be duplicated.
Porting this solution might be possible if you're running on Linux and want to try using Dalvik as your VM (I saw claims that there is a working port of Tomcat on Dalvik). I would expect this to be a huge amount of work that eventually saves you a few dollars on memory upgrades.
Recently, I was reading a book about Erlang, which has a hot deployment feature. Deployment can be done without bringing the system down: all existing requests are handled by the old version of the code, and all new requests after the deployment are served by the new code. In this case, both versions of the code are available at runtime for a while, until all the old requests have been served. Is there any approach in Java where we can keep two versions of jar files? Are there any app/web servers that support this?
If your intention is to speed up development, then JRebel is a tool for just this purpose. I wouldn't, however, recommend using it to patch a production system.
JRebel detects whenever a class file has changed and reloads it into the running appserver without throwing any old state away. This is much faster compared to what most appservers do when redeploying a whole war/ear where the whole initialization process must rerun.
There are many ways to achieve hot deployment in the Java world, so you'll probably need to be a bit more specific about your context and what you are trying to achieve.
Here are some good leads / options to consider:
OSGi is a general purpose module system that supports hot deployment
Clojure is a dynamic JVM language that enables a lot of runtime interactivity. Clojure is often used for "live coding": pretty much anything can be hot-swapped and redefined at runtime. Clojure is a functional language with a strong emphasis on immutability and concurrency, so in some ways it has interesting similarities with Erlang. Clojure has some very nice web frameworks, like Noir, that are suitable for hot-swapping web server code.
The Play Framework is designed to enable hot swapping code for productivity reasons (avoiding web server restarts). Might be relevant if you are looking primarily at hot-swapping web applications.
Most Java application servers, such as JBoss, support some form of hot-swapping for web applications.
The only reason for hot updates on a production application is to provide zero downtime to the users.
LiveRebel (based on JRebel) is the tool that could be used in conjunction with Jenkins, for instance. It can do safe hotpatching as well as rolling restarts while draining the sessions on the production cluster.
Technically, you CAN do this yourself, although I wouldn't recommend it, since it can get complicated quickly. The idea is that you create a new ClassLoader, load the new version of your class with it, and then make sure that your executing code knows about the new ClassLoader.
I would recommend just using JBoss and redeploying your jars and wars. Nice and simple for the most part.
In either case, you have to make sure you don't have any memory leaks, because you'll run out of PermGen space after a few redeployments.
Add RelProxy to your toolbox. RelProxy is a hot class reloader for Java (and Groovy) that improves development, and even production, with just one significant limitation: reloading is only possible in a subset of your code.