I'm learning CoreOS/Docker and am trying to wrap my mind around a few things.
With Java infrastructure, is it possible to run the JVM in its own container and have other Java apps/services use that JVM container? If not, I'm assuming the JVM would have to be bundled in each container, so essentially you have to pull the Java Dockerfile and merge my Java services into it; essentially creating a Linux machine + Java + service container running on top of the CoreOS machine.
The only other thought I had was it might be possible to run the JVM on CoreOS itself, but it seems like this isn't possible.
It is actually possible to just untar Oracle Java in /opt, but that's a kind of last resort. The Oracle binaries of the JRE and JDK don't require any system libraries, so it's pretty easy to do anywhere.
I have written some pretty small JRE and JDK images, with which I was able to run Elasticsearch and other major open-source applications. I also wrote some containers that allow me to compile jars on CoreOS (errordeveloper/mvn, errordeveloper/sbt & errordeveloper/lein).
As @ISanych pointed out, running multiple Java containers will not impact disk usage much; it's pretty much equivalent to running multiple JVMs on the host. If you find that running multiple JVMs is not quite your cup of tea, then the answer is really that the JVM wouldn't have to be as complex as it is had containers existed before it. However, Java in a container is still pretty good, as you can have one classpath that is fixed forever, and you won't get into dependency hell. Perhaps instead of building uberjars (which is what I mostly do, even though they are known to be not exactly perfect, but I am lazy) one could bundle the jars in a tarball and then use ADD jars.tar /app/lib/ in the Dockerfile.
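A minimal sketch of that tarball approach (the base image, paths, and main class here are illustrative assumptions, not from the original answer):

```Dockerfile
# Hypothetical sketch: ship dependency jars as a tar layer instead of an uberjar.
FROM eclipse-temurin:11-jre
# ADD automatically unpacks a local tar archive into the target directory
ADD jars.tar /app/lib/
COPY myapp.jar /app/
# Wildcard classpath picks up everything unpacked into /app/lib/
CMD ["java", "-cp", "/app/myapp.jar:/app/lib/*", "com.example.Main"]
```

As long as jars.tar is byte-identical between builds, that layer stays cached and only the thin application layer changes.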
Applications that run on the JVM have to have a JVM installed in the container. So if you want to split application components into separate containers, each of these containers needs to have a JVM.
On a side note, containers can talk to each other via a mechanism called container linking.
Best practice is to create an image with the JVM and then base other images on that JVM image (FROM the JVM image in the Dockerfile). Even if you create many different images, they won't waste much space, as Docker uses a layered file system and all containers will share the same single copy of the JVM image layers. Yes, each JVM will be a separate process eating its own memory, but isolated environments are what Docker is used for.
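A sketch of that layering (the image tags and paths are hypothetical): build one JVM base image, then start every service image FROM it, so all services share the same base layers.

```Dockerfile
# jvm-base/Dockerfile — built once, e.g. `docker build -t jvm-base .`
FROM debian:bookworm-slim
COPY jre/ /opt/jre/
ENV PATH=/opt/jre/bin:$PATH
```

```Dockerfile
# service-a/Dockerfile — every service reuses the shared jvm-base layers
FROM jvm-base
COPY service-a.jar /app/
CMD ["java", "-jar", "/app/service-a.jar"]
```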
I am new to Docker and have been reading a lot about it. But when I look at it from a Java application perspective, I am not sure what value it adds in terms of 'packaging dependencies', which is one of the important factors called out in their documentation.
Java is already a language which can run on multiple OSes using the JVM layer of abstraction. Build once, run anywhere is not a new concept. Docker containers do allow me to ship my JRE version along with my application code, so I see that benefit, but is there any other benefit I get, especially when my (host) environments aren't going to change? I.e., I will be using Linux boxes for deployments.
A fat JAR file is about as good as packaging can get to bundle all the dependencies using a Maven build. I understand that containers really help with deploying on platforms like Kubernetes, but if I have to judge containers strictly in terms of the packaging issue, isn't a JAR package enough? I may still have to containerize it to benefit from running lightweight processes instead of running them on VMs.
Does the JRE layer get reused in all other containers? That would be akin to installing the JRE on my VM boxes, where all apps on the box use the same JRE version, unless I need to run different versions of the JRE for my applications, which is highly unlikely.
If you have a working deployment system using established technologies, you should absolutely keep using it, even if there is something newer and shinier. The Java app server space is very mature and established, and if you have a pure-JVM application and a working setup, there's nothing wrong with staying on that setup even if Docker containers exist now.
As you note, a Docker image contains a JVM, an application server, library dependencies, and the application. If you have multiple images, they're layered correctly, and these details match exactly, then the layers can be shared; but there's also a very real possibility that one image has a slightly newer patch release of the JVM or the base Linux distribution than another. In general I'd say the Docker ecosystem broadly assumes that applications using "only" tens of megabytes of disk or memory aren't significant overhead; this is a major difference from the classic Java ecosystem, where multiple applications would run on a single shared JVM inside a single process.
# This base image will be shared between derived images; _if_ the
# version/build matches exactly
FROM tomcat:10-jdk11
# These libraries will be shared between derived images; _if_ the
# _exact_ set of files match _exactly_, and Tomcat is also shared
COPY target/libs/*.jar /usr/local/tomcat/lib/
# This jar file presumably won't be shared
COPY target/myapp.jar /usr/local/tomcat/webapps
I'd start looking into containers if you had a need to also incorporate non-JVM services into your overall system. If you have a Java component, and a Node component, and a Python component, and they all communicate over HTTP, then Docker will make them also all deploy the same way and it won't really matter which pieces are in which languages. Trying to stay within the JVM ecosystem (and maybe using JVM-based language implementations like JRuby or Jython, if JVM-native languages like Java/Kotlin/Groovy/Scala don't meet your needs) makes sense the way you've described your current setup.
The JVM/JRE doesn't get reused across containers. You may feel that running in an application server environment would be better; compared to running directly on a plain Java SE runtime, Docker does have a slightly higher overhead. The advantage of running just Docker by itself is diminishingly small compared to that.
Some advantages could be:
Testing
Testing out your code on different JRE versions quickly
Automated testing. With a dockerfile, your CI/CD pipeline can check out your code, compile it, spin up a docker image, run the tests and spit out a junit formatted test report.
Having consistent environments (dependency injection (like JKS, config, etc.), OS version, JRE, etc.)
Environment by configuration.
You don't have to spend time installing the OS, JRE, etc, that is a configuration file in your Source Control System of choice.
This makes disaster recovery much easier
Migrations are simplified (partially)
The advantages of running in an orchestrated environment (a PaaS, for instance, using Kubernetes, or OpenShift, or something like that) are, in addition to base Docker:
Possibility to do canary deployments
Routing, scaling, and load balancing across the same or several machines to optimize usage per machine (there are sweet spots beyond which JRE performance lags for some operations)
A fat jar file is as good as a packaging can get to bundle all the dependencies using maven build
It's not as good as it can get.
Your app probably has a bunch of dependencies: Spring, Tomcat, whatever. In any enterprise application, the size of the final artifact will be made up of something like 99% dependencies, 1% your code. These dependencies probably change infrequently, only when you add some functionality or decide to bump up a version, while your code changes very frequently.
A fat JAR means that every time you deploy, every time you push to a repository host (e.g. Nexus), you are wasting time uploading or downloading something that's 99% identical every time. It's just a dumb zip file.
Docker is a smarter format. It has the concept of layers, so you can (and are encouraged to) use one layer for dependencies and another for your code. This means that if the dependency layer doesn't change, you don't have to deploy it again, or update it again in your deployments.
So you can have faster CI builds, that require less disk space in your repo host, and can be installed faster. You can also use the layers to more easily validate your assumptions that only business code has changed, say.
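A sketch of that split (the paths and main class are illustrative assumptions): copy the rarely-changing dependencies in one layer and your own code in a later one, so a typical build only produces a new, tiny top layer.

```Dockerfile
FROM eclipse-temurin:17-jre
# Dependency layer: invalidated only when a library version changes
COPY target/dependency/ /app/lib/
# Application layer: rebuilt on every commit, but it's the 1%, not the 99%
COPY target/classes/ /app/classes/
CMD ["java", "-cp", "/app/classes:/app/lib/*", "com.example.Main"]
```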
I want to know the difference between the two, and also which one is currently used more according to industry standards. I tried finding resources online, but there is very little content about fat JARs, and almost no contrast is drawn between the two anywhere.
It may not seem like a big deal at first glance. The reason is that the Java packaging system is really mature and has grown strong over many years. Many other ecosystems do not benefit from this and can gain a lot from being packaged into a container image. But containers are not only about packaging; packaging could almost be considered a side effect of them.
Amongst others, some benefits of using containers over simple fat JARs are:
Simplified infrastructure
For a big (or mid-sized) enterprise built around microservices, chances are that not all of them are using the same languages and tools. A container provides a predictable way of deploying all those different things in the same manner, so it greatly simplifies the infrastructure, thus dramatically reducing company costs with it. This becomes even more important when deploying into the cloud, especially in multi-cloud-provider scenarios, and in that case, container orchestration provided by software like Kubernetes help a great deal without much effort.
Consistency
Another benefit of containers over regular JARs is consistency across environments. Imagine for instance that you deploy your JAR into DEV (running Java 8) and then into PROD (running Java 10). For some reason the JVM behaves differently, either because of default Garbage Collector or something else, making your program behave differently in both environments. When you deploy through a container image, the same image will be used across different environments, and thus the same Java version will be always used, making it less error-prone.
Resource isolation
Yet another benefit is resource isolation. Containers can make sure that your application "sees" only a predetermined amount of memory and CPUs. There were also some improvements in Java 10 regarding this matter; you can read more about it here.
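One way to see that interplay (the flags are real HotSpot options, but the image and jar names are hypothetical): since Java 10 the JVM respects container limits, so you can size the heap relative to the container's memory rather than the host's.

```Dockerfile
FROM eclipse-temurin:11-jre
COPY myapp.jar /app/myapp.jar
# With -XX:+UseContainerSupport (on by default since Java 10),
# MaxRAMPercentage is computed from the container's memory limit,
# e.g. the one set with `docker run -m 512m`.
CMD ["java", "-XX:MaxRAMPercentage=75.0", "-jar", "/app/myapp.jar"]
```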
Hope this provides a better point of view regarding this matter.
Depending on your needs, you can use either standalone fat JARs or fat JARs inside containers. The key point I'm making is that there is no standoff between the two; in fact, they are complementary.
A fat JAR is the primary way to wrap a Java application so it can easily be containerized.
Of course, you can use fat JARs in standalone ways too.
Pros of containerization:
You can take advantage of unified tooling for containers. I.e., you set up docker and you can run any container - be it java with fat jar or node.js or anything else.
You can take advantage of various container orchestration systems (docker compose, docker swarm, kubernetes, etc) - meaning you use unified tooling for healthchecks, monitoring, rolling updates, etc.
You don't need to worry about things like JRE / JDK versions on your system.
When you may still want to use standalone:
When you have a Java-centric architecture and established processes around it, and it would be costly to change from that to modern container orchestration.
When you are using Java as the primary scripting (or application) platform on your instance and simply don't want any container overhead.
Is it true to say that, to a high degree, what is now done in Docker containers could also be done in Java with the JVM, if someone wanted to?
Besides being able to write an application in your own language and having much customisation flexibility, does Docker basically do what Java has been doing with its virtual machine for ages? I.e., does it provide an execution environment separate from the underlying OS?
Generally, Docker containers cannot be done "within Java", because Docker serves to encapsulate the application, and "within Java" means code that is loaded after the JVM launches.
The JVM is already running when it parses the class in which it will search for the main method. So encapsulation at the process level can't be done, because the process (the JVM) is already running.
Java has encapsulation techniques which do provide protection between various Java elements (see class loader hierarchies in Tomcat, for an example); but those only isolate "application plugins" from each other, the main process that is running them all is Tomcat, which is really a program loaded into an already running JVM.
This doesn't mean you can't combine the two to achieve some objective; it just means that the types of isolation provided by the two products aren't interchangeable.
what is now done in docker containers can also be done in java with jvm if someone wanted to
Short Answer: No. You could wrap a docker container around your JVM, but you cannot wrap a JVM around a docker container, non-trivially.
docker basically do what Java has been doing with its virtual machine for ages ? i.e. it provides executable environment separate from the underlying OS.
Docker containers provide isolation from other containers without introducing a virtualisation layer. Thus, they are different and more performant than VMs.
Docker can do several things that the Java JVM cannot. However, programming in Java and running on the JVM will provide several of the advantages of running in a Docker container.
I work on a large Java project and this project is 20 years old. We have been evolving and fixing our application without any tools or compatibility issues for all these years. Also, as a bonus, the application is platform independent. Several components of it can run in both Windows and Linux. Because no effort was made initially to build a multiplatform application, there is one component that cannot run on Linux. But it would be relatively easy to make it work on that platform.
It would have been much more difficult to do the same thing using C or C++ and associated toolchain.
I am running a web-based Java application on JBoss and OFBiz. I suspect a memory leak is happening, so I did some memory profiling of the JVM on which the application, along with JBoss and OFBiz, is running. I suspect garbage collection is not working as expected for the application.
I used VisualVM, JConsole, YourKit, etc. to do the memory profiling. I could see how much heap memory is being used, how many classes are getting loaded, how many threads are getting created, and so on. But I need to know how much memory is used only by the application, how much by JBoss, and how much by OFBiz, respectively. I want to find out who is using how much memory and what the usage pattern is. That will help me identify where the memory leak is happening and where tuning is needed.
But with the memory profilers I have run so far, I was unable to differentiate the usage of each application separately. Can you please tell me which tool can help me with that?
There is no way to do this with Java since the Java runtime has no clear way to say "this is application A and this is B".
When you run several applications in one Java VM, you're just running one: JBoss. JBoss then has a very complex classloader but the app you're profiling is actually JBoss.
To do what you want, you have to apply filters, but this only works when the memory leak is in a class which isn't shared between applications (so when com.pany.app.a.Foo leaks, you can do this).
If you can't use filters, you have to look harder at the numbers to figure out what's going on. That means you'll probably have to let the app server run out of memory, create a heap dump and then look for what took most of the memory and work from there.
The only other alternative is to install a second server, deploy just one app there and watch it.
You can install Docker and create containers, allowing you to run processes in isolation. This will let you use multiple containers with the same base without having to install the JDK multiple times. The advantage of this is separation of concerns: every application can be deployed in a separate container. With this, you can then profile any specific application running on the JVM, because each namespace provides a completely isolated view of the operating environment, including process trees, network, user IDs, and mounted file systems.
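A hypothetical sketch of that setup (the image, port, and jar name are illustrative): one application per container, each exposing its own JMX port, so a profiler attaches to exactly one app at a time.

```Dockerfile
FROM eclipse-temurin:11-jre
COPY app-a.jar /app/app-a.jar
# Expose JMX so VisualVM/JConsole can attach to this one application only.
# For remote attachment you typically also need to set
# -Djava.rmi.server.hostname to an address reachable from the profiler.
EXPOSE 9010
CMD ["java", \
     "-Dcom.sun.management.jmxremote.port=9010", \
     "-Dcom.sun.management.jmxremote.rmi.port=9010", \
     "-Dcom.sun.management.jmxremote.authenticate=false", \
     "-Dcom.sun.management.jmxremote.ssl=false", \
     "-jar", "/app/app-a.jar"]
```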
Here are a couple of resources for Docker:
Deploying Java applications with Docker
JVM plus Docker: Better together
Docker
Please let me know if you have any questions!
Another good tool for finding Java memory leaks is Plumbr. You can try it out for free; it will find the cause of the java.lang.OutOfMemoryError and even show you the exact location of the problem, along with solution guidelines.
I explored various Java memory profilers and found that YourKit gave me the closest result. In the YourKit dashboard you can get links to the individual classes running, so if you are familiar with the codebase, you will know which class belongs to which app. When you click on any class, you will see the CPU and memory usage related to it. Also, if you notice any issues, YourKit can help you trace back to the particular line of code in your source Java files!
If you add YourKit to Eclipse, clicking on the object name in the 'issue area' will highlight the code line in the particular source file which is the source of the problem.
Pretty cool!!
I'd like to run multiple Java processes on my web server, one for each web app. I'm using a web framework (Play) that has a lot of supporting classes and jar files, and the Java processes use a lot of memory. One Play process shows about 225MB of "resident private" memory. (I'm testing this on Mac OS X, with Java 1.7.0_05.) The app-specific code might only be a few MB. I know that typical Java web apps are jars added to one server process (Tomcat, etc), but it appears the standard way to run Play is as a standalone app/process. If these were C programs, most of that 200MB would be shared library and not duplicated in each app. Is there a way to make this happen in Java? I see some pages about class data sharing, but that appears to apply only to the core runtime classes.
At this time and with the Oracle VM, this isn't possible.
But I agree, it would be a nice feature, especially since Java has all the information it needs to do that automatically.
Off the top of my head, I think that the JIT is the only reason why this can't work: the JIT takes runtime behavior into account. So if app A uses some code in a different pattern than app B, that would result in different assembly code being generated at runtime.
But then, the usual "pattern" is "how often is this code used." So if app A called some method very often and B didn't, they could still share the code because A has already paid the price for optimizing/compiling it.
What you can try is deploy several applications as WAR files into a single VM. But from my experience, that often causes problems with code that doesn't correctly clean up thread locals or shutdown hooks.
The IBM JDK has a JVM parameter to achieve this. Check out http://www.ibm.com/developerworks/library/j-sharedclasses/
And this takes it to the next step : http://www.ibm.com/developerworks/library/j-multitenant-java/index.html
If you're using a servlet container with virtual hosts support (I believe Tomcat does it) you would be able to use the play2-war-plugin. From Play 2.1 the requirement of always being the root app is going to be lifted so you will probably be able to use any servlet container.
One thing to keep in mind is that you will probably have to tweak the WAR file to move stuff from WEB-INF/lib to your servlet container's lib directory to avoid loading all the classes again, and this could affect your app if it uses singletons or other forms of class-shared data.
The problem of sharing memory between JVM instances is more pressing on mobile platforms, and as far as I know Android has a pretty clever solution for that in Zygote: the VM is initialized and then when running the app it is fork()ed. Linux uses copy-on-write on the RAM pages, so most of the data won't be duplicated.
Porting this solution might be possible if you're running on Linux and want to try using Dalvik as your VM (I saw claims that there is a working port of Tomcat on Dalvik). I would expect this to be a huge amount of work, eventually saving you a few dollars on memory upgrades.