My Question: What is a good way to deal with two different versions of an API? (Sub-question: Is there any way to avoid classpath reference problems if you include two libraries with the same class-names?)
Description of question: I have a project that uses an API. I have spent the last few months developing for the old version and I'm about to add features for a newer version. So far as I can tell there's only one critical difference. However, I don't have the API yet (waiting on someone to get me the jar) so I can't be sure whether there are more differences.
Description of subquestion: I'm worried that there may be class reference inconsistencies between the two APIs (like I said, I don't have the jar yet to be sure).
I realize you may want some code to look at, but this is a design question, not a coding issue. I'm hoping to get some best practices out of this. Thanks!
Not a duplicate: I've looked at a few questions which may appear to be duplicates, but they didn't really address my issue :)
Avoid the situation, or shade one of the libraries.
If you have full control of the classpath you can rely on ordering, but this is brittle and non-intuitive.
If you don't, you're at the mercy of whatever ordering you end up with, hence shading and its related headaches.
There are also classloader isolation strategies you can use. For example, if you're in a container environment (Java EE, OSGi) you can put the different versions of the libraries in different classloader contexts (i.e., different EJB jars, or different web applications) and they won't interfere with each other.
OSGi can do the same sort of thing; deploy library A.1 and A.2 into your OSGi container, and deploy project.uno (which uses A.1) and project.dos (which uses A.2) into the same JVM, and the OSGi container handles resolution of the classpaths.
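To make the classloader isolation idea concrete, here is a minimal sketch using plain URLClassLoaders outside any container. The JAR paths and the com.example.api.Client class name are hypothetical, purely for illustration:

    import java.net.URL;
    import java.net.URLClassLoader;

    public class IsolationDemo {
        public static void main(String[] args) throws Exception {
            // Hypothetical JARs; each contains its own copy of com.example.api.Client
            URL[] v1Jars = { new URL("file:libs/api-1.0.jar") };
            URL[] v2Jars = { new URL("file:libs/api-2.0.jar") };

            // A null parent keeps each loader from seeing the application classpath,
            // so neither copy of the class can shadow the other
            try (URLClassLoader loaderV1 = new URLClassLoader(v1Jars, null);
                 URLClassLoader loaderV2 = new URLClassLoader(v2Jars, null)) {

                Class<?> clientV1 = loaderV1.loadClass("com.example.api.Client");
                Class<?> clientV2 = loaderV2.loadClass("com.example.api.Client");

                // Same fully qualified name, two distinct Class objects: a class's
                // runtime identity includes the loader that defined it
                System.out.println(clientV1 == clientV2); // prints false
            }
        }
    }

This is essentially what the containers above do for you, plus the lifecycle and wiring you would otherwise have to manage by hand.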
Related
I want to know the difference between the two (a standalone Fat JAR versus a Fat JAR packaged into a container) and which one is currently used more according to industry standards. I tried finding resources online, but there is very little content about Fat JARs and almost no comparison between the two anywhere.
At first glance it seems like not a big deal. The reason is that the Java packaging system is really mature and has grown strong over many years. Many other ecosystems do not have this benefit and can gain a great deal from being packaged into a container image. But containers are not only about packaging; packaging could almost be considered a side effect of them.
Amongst others, some benefits of using containers over simple Fat JARS are:
Simplified infrastructure
For a big (or mid-sized) enterprise built around microservices, chances are that not all of them use the same languages and tools. A container provides a predictable way of deploying all those different things in the same manner, so it greatly simplifies the infrastructure and dramatically reduces costs. This becomes even more important when deploying into the cloud, especially in multi-cloud-provider scenarios, and in that case container orchestration provided by software like Kubernetes helps a great deal without much effort.
Consistency
Another benefit of containers over regular JARs is consistency across environments. Imagine, for instance, that you deploy your JAR into DEV (running Java 8) and then into PROD (running Java 10). For some reason the JVM behaves differently, either because of the default garbage collector or something else, making your program behave differently in the two environments. When you deploy through a container image, the same image is used across environments, and thus the same Java version is always used, making it less error-prone.
Resource isolation
Yet another benefit is resource isolation. Containers can make sure that your application "sees" only a predetermined amount of memory and CPU. Actually, there were some improvements in Java 10 regarding this matter; you can read more about it here.
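As a rough illustration of what "sees" means here, this small check uses only the standard Runtime API; run it on the host and then inside a container with CPU/memory limits to compare (assuming a container-aware JVM such as Java 10 or later):

    public class ResourceView {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            // Inside a container with cgroup limits and a container-aware JVM,
            // these report the container's limits rather than the host machine's
            System.out.println("CPUs visible to the JVM: " + rt.availableProcessors());
            System.out.println("Max heap the JVM will use (bytes): " + rt.maxMemory());
        }
    }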
Hope this provides you with a better point of view on the matter.
Depending on your needs, you can use either standalone fat JARs or fat JARs inside containers. The key point I'm making is that there is no standoff between the two; in fact, they are complementary.
A fat JAR is a primary way to wrap a Java application so that it can be easily containerized.
Of course, you can use fat JARs in a standalone way too.
Pros of containerization:
You can take advantage of unified tooling for containers. E.g., you set up Docker and you can run any container, be it Java with a fat JAR, Node.js, or anything else.
You can take advantage of various container orchestration systems (Docker Compose, Docker Swarm, Kubernetes, etc.), meaning you use unified tooling for health checks, monitoring, rolling updates, etc.
You don't need to worry about things like JRE / JDK versions on your system.
When you may still want to use standalone:
When you have a Java-centric architecture with established processes around it, and it would be costly to change from that to modern container orchestration.
When you are using Java as the primary scripting (or application) platform on your instance and simply don't want the overhead of containers.
Context
Running multiple versions of the same library seems to be a common need, and there are many questions about this when dealing with versioned JAR dependencies.
However, I have another constraint here: my code is part of a rolling-release MOAB where code has no version. I cannot depend on an older version of a library from the MOAB.
The use case of that question is being able to load different versions of the same code at runtime for compatibility.
Eg: GET /my/api/call?compat_version=42
I have to be able to provide several compatibility versions (i.e. code from version x that has not been changed). This must be the actual code that was running when version x was the current/latest version, and not any kind of retrocompatibility trick.
Naive solution
The "obvious" way seems to duplicate the code for each version. For instance by having per-version packages:
com.me.thing.v1
com.me.thing.v2
com.me.thing.v3
...
and dynamically loading the code from the associated package based on the provided compat_version parameter, by whatever technique. Let's suppose for now that all those versions share a common interface (API).
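As a minimal sketch of that naive dispatch, assuming a hypothetical shared ThingApi interface, with made-up stand-in classes for the frozen com.me.thing.vN packages:

    import java.util.HashMap;
    import java.util.Map;

    // The common interface every versioned package is assumed to implement
    interface ThingApi {
        String handle(String request);
    }

    // Stand-ins for the frozen per-version copies; in the real layout each class
    // lives in its own com.me.thing.vN package, untouched since that release
    class ThingApiV1 implements ThingApi {
        public String handle(String request) { return "v1:" + request; }
    }
    class ThingApiV2 implements ThingApi {
        public String handle(String request) { return "v2:" + request; }
    }

    public class ThingApiDispatcher {
        private static final Map<Integer, ThingApi> BY_VERSION = new HashMap<>();
        static {
            BY_VERSION.put(1, new ThingApiV1());
            BY_VERSION.put(2, new ThingApiV2());
        }

        // Called with the compat_version query parameter from the request
        public static ThingApi forVersion(int compatVersion) {
            ThingApi impl = BY_VERSION.get(compatVersion);
            if (impl == null) {
                throw new IllegalArgumentException("Unsupported compat_version: " + compatVersion);
            }
            return impl;
        }

        public static void main(String[] args) {
            System.out.println(forVersion(2).handle("GET /my/api/call")); // prints v2:GET /my/api/call
        }
    }

The same lookup could be driven by reflection (Class.forName on the versioned package name) instead of an explicit map, at the cost of compile-time checking.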
Challenge
I'd like to challenge that and maybe find a better option than the naive solution.
Since using the exact code from version x is a prerequisite, I don't believe I can get rid of the copy-paste (but please, tell me I'm wrong).
What technique would you suggest as a simple (but not necessarily easy) and robust implementation? Reflection? Dependency injection?
Is there a "good" pattern for doing such things? Is there any literature on that?
This was already an old problem when Java was developed, hence Sun's emphasis on binary compatibility (which existed for Solaris of course as well). This is the original guarantee offered by the platform- that you can upgrade the bits underneath and applications will continue to work, unmodified.
The way to run legacy code in the JVM world is to run the full legacy application.
Many segregation architectures have of course been developed over the years and reached various levels of maturity, like OSGi and many others before it, but there are edge cases upon edge cases and many failure modes.
Do not futz with multiple code versions within a single JVM. It was never a design goal, and in environments where it matters it only leads to pain.
I'm interested in the OSGi Enterprise specification. At the moment I'm only interested in the JDBC connectivity, but that may change.
At http://www.osgi.org/Download/Release4V42 I can find the osgi.enterprise.jar (the companion code link). Can I just install it in my Equinox container and use it?
I had the impression that some of the classes overlap (for instance org.osgi.service.component); doesn't this lead to problems? Or should I just uninstall the org.eclipse.osgi.services bundle and use the osgi.enterprise one instead?
No. You can't do that.
The file from OSGi.org is mostly interfaces; it is not a complete implementation.
The answer by "J-16 SDiZ" is correct, in that the osgi.enterprise.jar is pure interfaces rather than implementations.
You also asked about the overlap with the org.eclipse.osgi.services bundle… in fact the OSGi enterprise JAR should be a strict superset of it. There is not much problem with having both these bundles installed but it is also not really necessary, so to minimise confusion I would probably remove org.eclipse.osgi.services.
I was just about to include the HtmlUnit library in a project. I unpacked the zip-file and realised that it had no less than 12 dependencies.
I've always been concerned when it comes to introducing dependencies. I suppose I have to ship all these dependencies together with the application (8.7 MB in this particular case). Should I bother checking for, say, security updates for these libraries? Finally (and most importantly, actually what I'm most concerned about): what if I want to include another library which depends on the same libraries as this library, but with different versions? That is, what if, for instance, HtmlUnit depends on one version of xalan and another library I need depends on a different version of xalan?
The task HtmlUnit solves for me could be solved "manually" but that would probably not be as elegant.
Should I be concerned about this? What are the best practices in situations like these?
Edit: I'm interested in the general situation, not particularly involving HtmlUnit. I just used it here as an example as that was my current concern.
Handle your dependencies with care. They can bring you much speed, but can be a pain to maintain down the road. Here are my thoughts:
Use some software to maintain your dependencies. Maven is what I would use for Java to do this. Without it you will very soon lose track of your dependencies.
Remember that the various libraries have different licenses. It is not a given that a particular license works for your setting. I work for a software house and we cannot use GPL-based libraries in any of the software we ship, as the software we sell is closed source. Similarly, we should avoid LGPL as well if we can (this is due to some intricate lawyer reasoning, don't ask me why).
For unit testing I'd say go all out. It is not the end of the world if you have to rewrite your tests in the future. By then, that part of the software may either be extremely stable or perhaps no longer maintained at all. Losing those tests is not that big a deal, since you already got the speed benefit when you adopted the library.
Some libraries are harder to replace later than others. Some are like a marriage that should last the life of the software, but some other are just tools that are easily replaceable. (Think Spring versus an xml library)
Check out how the community supports older versions of the library. Do they support them? What happens when life moves on and you are stuck at a particular version? Is there an active community, or do you have the skill to maintain it yourself?
For how long is your software supposed to last? One year, five years, ten years, or beyond? If the software has a short lifespan, you can use more libraries to get where you are going, since being able to keep up with library upgrades matters less.
It could be a serious issue if there isn't an active community maintaining the libraries over the long term. It is OK to use libraries, but to be honest you should take care to get the sources and put them into your VCS.
Should I bother checking for, say, security updates for these libraries?
In general, it is probably a good idea to do this. But then so should everyone upstream and downstream of you.
In your particular case, we are talking about test code. If potential security flaws in libraries used only in testing are significant, your downstream users are doing something strange ...
Finally (and most importantly, actually what I'm most concerned about): What if I want to include another library which depends on the same libraries as this library, but with different versions? That is, what if for instance HtmlUnit depends on one version of xalan and another library I need, depends on a different version of xalan?
Ah yes. Assuming that you are building your own classpaths, etc. by hand, you need to decide which version of the dependent library to use. It is usually safe to just pick the most recent of the versions used. But if the newer version is not backward compatible with the older one (for your use case), then you've got a problem.
Should I be concerned about this?
IMO, for your particular example (where we are talking about test code), no.
What are the best practices in situations like these?
Use Maven! It explicitly exposes the dependencies to the folks who download your code, making it possible for them to deal with the issue. It also tells you when you've got dependency version conflicts and provides a simple "exclude" mechanism for dealing with it.
Maven also removes the need to create distributions. You publish just your artifacts with references to their dependencies. The Maven command then downloads the dependent artifacts from wherever they have been published.
EDIT
Obviously, if you are using HtmlUnit for production code (rather than just tests), then you need to pay more attention to security issues.
A similar thing has happened to me actually.
Two of my dependencies had the same 'transitive' dependency but a different version.
My favorite solution is to avoid "dependency creep" by not including too many dependencies. So, the simplest solution would be to remove the one I need less, or the one I could replace with a simple Util class, etc.
Too bad it's not always that simple. In the unfortunate cases where you actually need both libraries, you can try to sync their versions, i.e. downgrade one of them so that the dependency versions match.
In my particular case, I manually edited one of the jars, deleted the older dependency from it, and hoped it would still work with the new version loaded from the other jar. Luckily, it did (i.e. the maintainers of the transitive dependency hadn't removed any classes or methods that the library used).
Was it ugly? Yes (yuck!), but it worked.
I try to avoid introducing frivolous dependencies, because it is possible to run into conflicts.
One interesting technique I have seen used to avoid conflicts: renaming a library's package (if its license allows you to -- most BSD-style licenses do.) My favorite example of this is what Sun did when they built Xerces into the JRE as the de-facto JAXP XML parser: they renamed the whole of Xerces from org.apache.xerces to com.sun.org.apache.xerces.internal. Is this technique drastic, time consuming, and hard to maintain? Yes. But it gets the job done, and I think it is an important possible alternative to keep in mind.
Another possibility is -- license terms permitting -- copying/renaming single classes or even single methods out of a library.
HtmlUnit can do a lot, though. If you are really using a large portion of its functionality on a lot of varied input data, it would be hard to make a case for spending the large amount of time it would take to re-write the functionality from scratch, or repackage it.
As for the security concerns -- you might weigh the chances of a widely used library having problems, vs. the likelihood of your hand-written less-tested code having some security flaw. Ultimately you are responsible for the security of your programs, though -- so do what you feel you must.
What is classpath hell and is/was it really a problem for Java?
Classpath hell is an unfortunate consequence of dynamic linking of the kind carried out by Java.
Your program is not a fixed entity but rather the exact set of classes loaded by a JVM in a particular instance.
It is very possible to be in situations where the same command line on different platforms or even on the same one would result in completely different results because of the resolution rules.
There could be differences in standard libraries (very common). Libraries could be hidden by one another (and an older version may even be used instead of a newer one). The directory structure could mess up resolution. A different version of the same class may appear in multiple libraries and the first one encountered will be used, etc. Since Java, by specification, uses a first-encountered policy, unknown ordering dependencies can lead to problems. Of course, since this is the command line and it is part of the spec, there are no real warnings.
It is very much still a problem. For example, on Mac OS the horrible support from Apple means that your machine ends up with several JVMs and several JREs, and you can never easily port things from place to place. If you have multiple libraries that were compiled against specific but different versions of other libraries, you could have problems, etc.
However, this problem is not inherent in Java. I remember my share of DLL hell situations while programming Windows in the 90s. Any situation where you have to count on something in the file system to assemble your program, rather than having a single well-defined executable, is a problem.
However, the benefits of this model are still great, so I'm willing to tolerate this hell. There are also steps in the right direction on Sun's side. For example, Java 6 allows you to simply specify a directory of JARs rather than having to enumerate them.
BTW: Classpaths are also a problem if you are using an environment that uses a non-default classloader. For example, I have had a lot of problems running things like Hibernate or Digester under Eclipse because the classloaders were incompatible.
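When you are trying to work out which copy of a class actually won, one simple diagnostic is to ask the class where it was loaded from; the org.w3c.dom.Document class below is just an example, substitute whatever class you suspect:

    import java.security.CodeSource;

    public class WhereIsMyClass {
        public static void main(String[] args) throws Exception {
            // Substitute the class you suspect is being picked up from the wrong JAR
            Class<?> clazz = Class.forName("org.w3c.dom.Document");

            // The code source is null for classes defined by the bootstrap loader (the JRE itself)
            CodeSource source = clazz.getProtectionDomain().getCodeSource();
            System.out.println(source == null
                    ? "loaded by the bootstrap/JRE class loader"
                    : "loaded from " + source.getLocation());
        }
    }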
Classpath/jar-hell has a couple of escape hatches if they make sense for your project:
OSGi
JarJarLinks
NetBeans Module System - Not sure if this is usable outside of NetBeans
Others?
I think "classpath hell" refers to the time when the classpath of a Java app could only be set by using the CLASSPATH environment variable. This led to many applications requiring changes to the global system configuration (different for each OS), version conflicts between applications, and general confusion.
This is a somewhat more concrete example:
When two libraries (or a library and the application) require different versions of the same third library. If both versions of the third library use the same class names, there is no way to load both versions of the third library with the same classloader.
Take a look at http://en.wikipedia.org/wiki/Java_Classloader#JAR_hell for more examples.
There's a lot of good stuff at http://mindprod.com/jgloss/classpath.html and http://java.sun.com/javase/6/docs/technotes/tools/windows/classpath.html
I've only had issues with classpaths when I am not setting it myself using -cp. Trying to figure out how your third-party software sets its classpath can be a pain at times.