Is there any way to speed up the initialization of javax.xml.bind.JAXBContexts with a large (>1000) number of classes? In our XML heavy application the startup time is some 10 minutes and consists mainly of the initialization time of the JAXBContexts. :-(
We are using Sun's JAXB implementation in the JDK 1.5 and the org.jvnet.jaxb2.maven2.maven-jaxb2-plugin for the code generation from XSDs.
Clarification: The problem is not that we have many instances of a JAXBContext with the same contextpaths, but the problem is that the initialization of one single JAXBContext takes tens of seconds since it has to load and process thousands of classes. (Our XSDs are fairly large and complicated.) All JAXBContext instances have different contextpaths - we cannot reduce the number further.
The JAXB reference implementation has a sort-of-undocumented system property for exactly this reason:
-Dcom.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.fastBoot=true
or for old versions prior to the package refactoring:
-Dcom.sun.xml.bind.v2.runtime.JAXBContextImpl.fastBoot=true
This instructs JAXB to skip the expensive pre-caching of the reflection machinery it needs to do its job. Instead, it does all the reflection lazily, when the context is actually used. This makes for a slower runtime, but considerably faster initialization, especially for large numbers of classes.
However, one part of the speed problem is unavoidable, and that's the fact that JAXB has to load every single one of your classes, and classloading is slow. This is apparent if you create a 2nd context immediately after the first, with the same configuration - you'll see it's much, much faster, having already loaded the classes.
Also, you say that you have multiple JAXBContext instances because you have multiple contextpaths. Did you realise that you can put multiple context paths into a single context? You just need to pass them all as a colon-delimited string when you initialize the context, e.g.
JAXBContext.newInstance("a.b.c:x.y.z");
will load the contexts a.b.c and x.y.z. It likely won't make any difference to performance, though.
In general, you should not have to create many instances of JAXBContext, as they are thread-safe after they have been configured. In most cases just a single context is fine.
So, is there a specific reason why many instances are created? Perhaps there was an assumption that they are not thread-safe? That is understandable given that it is not clearly documented, but it is a very common pattern: you need synchronization during configuration, but not during use, as long as the configuration is not changed.
Other than that, if this is still a problem, profiling the bottlenecks and filing an issue at jaxb.dev.java.net (pointing out the hot spots from the profile) would help get things improved.
The JAXB team is very good and responsive; if you can show them where the problems are, they usually come up with good solutions.
JAXBContext is indeed thread-safe, so wrapping it in a singleton is advised. I wrote a simple singleton containing a class-to-context map that seems to do the job. You may also want to create a pool of [un]marshaller objects if your application uses many threads, as these objects are not thread-safe and you may see some initialization penalties with them as well.
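A minimal sketch of such a holder, assuming one context per root class (the class and method names here are mine, not production code):

    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBException;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    public final class JaxbContextHolder {

        private static final ConcurrentMap<Class<?>, JAXBContext> CONTEXTS =
                new ConcurrentHashMap<Class<?>, JAXBContext>();

        private JaxbContextHolder() {}

        public static JAXBContext contextFor(Class<?> type) throws JAXBException {
            JAXBContext context = CONTEXTS.get(type);
            if (context == null) {
                // newInstance is the expensive call; racing threads may both create a
                // context, but only one ends up in the map and both are equivalent.
                context = JAXBContext.newInstance(type);
                JAXBContext previous = CONTEXTS.putIfAbsent(type, context);
                if (previous != null) {
                    context = previous;
                }
            }
            return context;
        }
    }

Callers then create short-lived Marshaller/Unmarshaller instances from the cached context per request (or pool them), since those objects are not thread-safe.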
In our case, updating the JAXB libraries was a good idea. Incidentally, using the server VM instead of the client VM, even in the development environment, was also a good idea here, even though it normally slows down server startup: since the JAXB initialization takes so much time, the better compilation of the server VM helps.
Related
We came across a JAXB class-loading issue, as highlighted by a JAXB ClassCastException.
To fix it, I set com.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize=true, and that actually FIXED the issue.
However, I read that this disables JAXB's ability to work directly with bytecode and makes it fall back to the Java reflection API, so there might be a slight performance hit when initializing new JAXB contexts via "JAXBContext.newInstance".
To test the performance, I added a simple method that invokes JAXBContext.newInstance in a for loop some 500 times, and ran it with the flag set to true and to false.
In the worst case, I saw a performance hit of only about 3.5 ms per invocation on average.
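The loop looked roughly like the sketch below; Sample is a stand-in for one of our actual generated classes, not the real code:

    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.annotation.XmlRootElement;

    public class ContextInitBenchmark {

        // Stand-in for a generated JAXB class.
        @XmlRootElement
        public static class Sample {
            public String value;
        }

        public static void main(String[] args) throws Exception {
            // Toggle the flag before the first context is created, e.g.:
            // System.setProperty("com.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize", "true");
            final int iterations = 500;
            long start = System.nanoTime();
            for (int i = 0; i < iterations; i++) {
                JAXBContext.newInstance(Sample.class);   // a fresh context every time
            }
            double avgMs = (System.nanoTime() - start) / 1000000.0 / iterations;
            System.out.println("average per newInstance: " + avgMs + " ms");
        }
    }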
Has anyone had a similar issue and tried the above fix? What were your findings? I couldn't find much information on the com.sun.xml.bind.v2.bytecode.ClassTailor.noOptimize. Can you share more info on its behavior and impact?
This document from IBM about the JAXB Context initialization might help you: JAXBContext Initialization Takes A Long Time
Problem (Abstract)
JAXBContext initialization slows down application performance.
Symptom
Slow performance in WebSphere Application Server.
Cause
JAXB context (javax.xml.bind.JAXBContext) object instantiation is a resource-intensive operation. It involves pre-loading and pre-creating contexts (the pre-caching process) for all packages and classes associated with the context, and then for all packages and classes that are statically (directly or indirectly) referenced from those. The latency correlates with the number of classes passed in during JAXB context creation and processed by this pre-caching step.
We've got a custom classloader, called here MainClassLoader, sitting on top of a Java web application (specifically on Tomcat 7 where the parent classloader is the WebAppClassLoader). This custom classloader is set as the TCCL for the web application, and its purpose is to delegate the lookup of classpath resources (including classes and non-class resources) to a set of other custom classloaders, each of which represents a pluggable module to the application. (MainClassLoader itself loads nothing.)
MainClassLoader.loadClass() will do parent-first delegation, and upon a ClassNotFoundException, go one by one through the pluggable child classloaders to see which of them will provide the result. If none of them can, it then throws the ClassNotFoundException.
The actual logic is a bit more complicated than that, and since our end users may end up with several (tens of) these child modules plugged in, the classloader ends up being one of the more CPU-intensive parts of the application, given how reliant Java is today on reflection-based command-pattern implementations. (By that I mean there are a lot of Class.forName() calls to load and instantiate classes at runtime.)
We started noticing this first in periodic thread dumps of the application to catch the app "in action" to see what it is doing, plus profiling through JProfiler certain use cases that were known to be slower than desired.
I've written a very simple caching approach for MainClassLoader, where the results of a loadClass() call (including a ClassNotFoundException) are cached in a concurrent map with weak values, keyed by the String class name. With this in place, the class improved enough to drop off JProfiler's hot-spots list entirely.
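Roughly, the cache looks like the sketch below (using Guava's CacheBuilder; the class and field names are illustrative, and this simplified version only caches successful lookups, whereas the real one also remembers ClassNotFoundException results):

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import java.util.List;

    public class CachingClassLoader extends ClassLoader {

        // Weak values let unreferenced Class objects be collected;
        // maximumSize bounds how much memory the cache can consume.
        private final Cache<String, Class<?>> found = CacheBuilder.newBuilder()
                .weakValues()
                .maximumSize(10000)
                .build();

        private final List<ClassLoader> children;

        public CachingClassLoader(ClassLoader parent, List<ClassLoader> children) {
            super(parent);
            this.children = children;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            Class<?> cached = found.getIfPresent(name);
            if (cached != null) {
                return cached;
            }
            Class<?> result = delegate(name);   // parent-first, then each child loader
            found.put(name, result);
            return result;
        }

        private Class<?> delegate(String name) throws ClassNotFoundException {
            try {
                return super.loadClass(name, false);
            } catch (ClassNotFoundException notInParent) {
                for (ClassLoader child : children) {
                    try {
                        return child.loadClass(name);
                    } catch (ClassNotFoundException ignored) {
                        // fall through to the next child
                    }
                }
                throw notInParent;
            }
        }
    }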
However, I'm concerned about whether we can really safely do this. Will such a cache get in the way of intended classloader logic? What are the pitfalls one might expect in doing this?
Some obvious ones I anticipate:
(1) Memory - obviously this cache consumes memory, and if left unbounded is a possible memory drain. We can address this using a limited cache size (we're using Google's Guava CacheBuilder for this cache).
(2) Dynamic classloading, especially in development - if a new or updated class/resource is added to the classpath after our cache holds a stale result, this would confuse the system, most likely resulting in ClassNotFoundExceptions being thrown for classes that should now be loadable. A small TTL on the cached "not found" entries might help here. My bigger concern is what happens during development when we update a class and it gets hot-swapped into the JVM. That class would most likely live in one of the classloaders MainClassLoader delegates to, so the cache could conceivably hold a stale (older) version of it. Since I'm using weak values, would that help mitigate this? My understanding of weak references is that they don't go away, even when eligible for collection, until the GC runs a pass in which it decides to reclaim them.
These are my two known issues/concerns with this approach, but what scares me is that classloading is a bit of a black art (if not a dark science) that is full of gotchas when you do non-standard things here.
So what am I not worried about that I should be worried about?
UPDATE/EDIT
We ended up opting NOT to do the local caching I prototyped above (it just seemed dangerous and redundant with the caching/optimization already done by the JVM), but we did some optimization within our loadClass() method. The logic in loadClass() did not take a "best case" path through the code when it could have: for example, when there were no "customization" modules in place, we still behaved as though there were, letting that classloader throw a ClassNotFoundException, catching it, and moving on to the next check. With this pattern, a given class-load operation would nearly always pass through at least three try/catch blocks, with a ClassNotFoundException thrown in each. That is quite expensive. Some extra code to determine whether there were any URLs associated with the classloaders being delegated to allowed us to bypass those checks (and the resulting exception throw/catch), giving us an almost 25000% boost in performance for this class, as sketched below.
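The check ended up looking roughly like this; the class and field names are stand-ins for the real MainClassLoader code, and the delegation logic is simplified:

    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.List;

    public class DelegatingClassLoader extends ClassLoader {

        private final List<URLClassLoader> moduleLoaders;

        public DelegatingClassLoader(ClassLoader parent, List<URLClassLoader> moduleLoaders) {
            super(parent);
            this.moduleLoaders = moduleLoaders;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            try {
                return super.loadClass(name, resolve);   // parent-first, as before
            } catch (ClassNotFoundException notInParent) {
                for (URLClassLoader module : moduleLoaders) {
                    // Fast path: a module with no classpath entries can never provide the
                    // class, so skip it instead of paying for another throw/catch.
                    URL[] urls = module.getURLs();
                    if (urls == null || urls.length == 0) {
                        continue;
                    }
                    try {
                        return module.loadClass(name);
                    } catch (ClassNotFoundException ignored) {
                        // try the next module
                    }
                }
                throw notInParent;
            }
        }
    }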
I'd still like comment on my original question, however, to help keep the issue alive to be answered.
What are the concerns in doing our own caching in a custom classloader, other than those I already listed?
I am looking for a way to reload a class into Java at runtime. The motivation is to make debugging more efficient. The application is a typical client/server design that synchronously processes requests. A "handler" object is instantiated for each request. This is the only class I intend to dynamically replace. Since each request deals with a fresh instance, reloading this class won't have any side-effects. In short, I do not want to restart the entire application every time there is a change to this module.
In my design, the Java process becomes aware that a .class file has been updated in the classpath in between requests. When this happens, the "handler" class is unloaded and a new one is loaded.
I know I can use the ClassLoader API to load in a new class. I seem to be having trouble finding the proper way of "unloading".
Classes will be unloaded and garbage collected like any other object, if there is no remaining reference to them. That means there must be no reachable instance of the class (as loaded by that particular classloader instance) and the classloader instance itself must be eligible for garbage collection as well.
So basically, all you have to do is to create a new classloader instance to load the new version of the class, and make sure that no references to instances of the old version remain.
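As a minimal sketch of that approach, assuming the handler class lives in a directory that is not on the application classpath and implements an interface that never changes (all names here are illustrative, not your actual handler code):

    import java.net.URL;
    import java.net.URLClassLoader;

    public class HandlerReloader {

        // The interface lives on the application classpath and never changes.
        public interface RequestHandler {
            String handle(String request);
        }

        // Directory containing the compiled handler class; it must NOT also be
        // on the application classpath, or parent-first delegation will find the
        // old version there and reloading will never take effect.
        private final URL handlerClassesDir;

        public HandlerReloader(URL handlerClassesDir) {
            this.handlerClassesDir = handlerClassesDir;
        }

        public RequestHandler newHandler() throws Exception {
            // A fresh loader per reload; once no handler instance and no reference
            // to this loader remain, the old class becomes eligible for unloading.
            URLClassLoader loader = new URLClassLoader(
                    new URL[] { handlerClassesDir }, RequestHandler.class.getClassLoader());
            Class<?> handlerClass = loader.loadClass("com.example.Handler");
            return (RequestHandler) handlerClass.newInstance();
        }
    }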
I believe that you actually need a hierarchy of classloaders, and in order to reload you get rid of the low-level classloader (by normal GC means), and hence all the classes it loaded. As far as I know, this technique is used by Java EE app servers for reloading applications, and there is all manner of fun results when framework code loaded in one classloader wants to use classes loaded somewhere else.
As of 2015, class reloading is still a missing feature in Java.
Use OSGi to create a class reloading application.
Use JRebel for testing. There are a few other tools that do the same thing.
Use an application server and externalize the parts you want to reload into a separate web application, then keep deploying/undeploying. You will eventually get PermGen-space overflow errors due to dangling old ClassLoader instances.
Use a script runner to execute the parts of the code that change; the JSR-223 Java Scripting API has support for "Java" as a scripting language.
I have written a series about class reloading, but none of those methods is good for production.
The blog and source code are available on Google Code.
IMHO, class reloading in Java is messy and not worth attempting, but I would very much like it to become part of the Java specification.
Is there a recommended way to synchronize Tomcat Servlet instances that happen to be competing for the same resource (like a file, or a database like MongoDB that isn't ACID)?
I'm familiar with thread synchronization to ensure two Java threads don't access the same Java object concurrently, but not with objects that have an existence outside the JRE.
edit: I only have 1 Tomcat server running. Whether that means different JVMs or not, I am not sure (I assume it's the same JVM, but potentially different threads).
edit: particular use case (but I'm asking the question in general):
Tomcat server acts as a file store, putting the raw files into a directory, and using MongoDB to store metadata. This is a pretty simple concept except for the concurrency issue. If there are two concurrent requests to store the same file, or to manage metadata on the same object at the same time, I need a way to resolve that and I'm not sure how. I suppose the easiest approach would be to serialize / queue requests somehow. Is there a way to implement queueing in Tomcat?
Typically, your various servlets will be running in the same JVM, and if they're not, you should be able to configure your servlet runner so this is the case. So you can arrange for them to see some central, shared resource manager.
Then, for the actual machinery: if plain old synchronized isn't appropriate, look for example at the Semaphore class (the link is to part of a tutorial/example I wrote a while ago, in case it's helpful), which allows you to manage "pools" of resources.
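For example, a rough sketch using a Semaphore to bound how many requests may touch the shared store at once (the class and method names are illustrative):

    import java.util.concurrent.Semaphore;

    public class FileStoreGate {

        // At most MAX_WRITERS requests may write to the shared store at once;
        // a single permit would serialize them completely.
        private static final int MAX_WRITERS = 4;
        private static final Semaphore PERMITS = new Semaphore(MAX_WRITERS, true);

        public static void store(Runnable writeFileAndMetadata) throws InterruptedException {
            PERMITS.acquire();
            try {
                writeFileAndMetadata.run();   // e.g. write the raw file, then update the MongoDB metadata
            } finally {
                PERMITS.release();
            }
        }
    }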
If you are running one Tomcat server and all your servlets are in one context, you can always synchronize on a Java object present in that context's class loader. If you are running multiple contexts, then the "synchronization object" cannot reside in any particular context; it needs to live at a higher level that is shared by all contexts. You can use Tomcat's "common" class loader (see the Tomcat 6.0 documentation here) to place your "synchronization object" where it will be shared among all contexts.
There are two cases. If you expect to access the common resource for file editing within the same JVM, you can use synchronized on a Java method or block. If different JVMs, or non-Java threads, access the common resource, you might try manual file-locking code that gives each thread a priority number in a queue.
For the database, I believe there is no concurrency issue.
Your external resource is going to be represented by a Java object (e.g. java.io.File) in some way or another, so you can always synchronize on that object if you need to.
Of course, that implies that said object would have to be shared across your servlet instances.
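One way to share such an object is to stash a plain lock object in the ServletContext when the web application starts. A rough sketch with illustrative names (for Servlet 3.0, e.g. Tomcat 7):

    import javax.servlet.ServletContextEvent;
    import javax.servlet.ServletContextListener;
    import javax.servlet.annotation.WebListener;

    @WebListener
    public class LockRegistryListener implements ServletContextListener {

        public static final String FILE_LOCK_ATTR = "fileStoreLock";

        public void contextInitialized(ServletContextEvent sce) {
            // One object per web application; every servlet can synchronize on it.
            sce.getServletContext().setAttribute(FILE_LOCK_ATTR, new Object());
        }

        public void contextDestroyed(ServletContextEvent sce) {
            // nothing to clean up
        }
    }

    // In a servlet:
    // Object lock = getServletContext().getAttribute(LockRegistryListener.FILE_LOCK_ATTR);
    // synchronized (lock) { /* write the file, then the metadata */ }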
IMO you're asking for trouble. There are reasons why things like databases and shared file systems were invented. Trying to write your own using some Singleton class or semaphores is going to get ugly real quick. Find a storage solution that does this for you and save yourself a lot of headaches.
I am currently working on some older Java code that was developed without app servers in mind. It is basically a bunch of "black box" code with an input interface and an output interface. Everything inside the "black box" classes is static data structures holding state, which are put through algorithms at timed intervals (every 10 seconds). The black box is started from a main method.
To keep this easy for myself, I am thinking of making the "black box" a Singleton. Basically, anyone who wants to access the logic inside of the black box will get the same instance. This will allow me to use Message Driven beans as input to the black box, and a JMS Publisher of some sort as the output of the black box.
How bad of an idea is this? Any tips?
One of the main concerns I have though, is there may be Threads in the "black box" code that I am unaware of.
Is there such a thing as "application-scoped objects" in EJB?
Note: I am using Glassfish
If you use a simple singleton, you will be facing problems once you enter a clustered environment.
In such a scenario, you have multiple classloaders on multiple JVMs, and your singleton pattern will break because you will have several instances of that class.
The only acceptable use for a singleton in an app server (potentially in a clustered environment) is when the singleton is totally stateless and is only used as a convenience to access global data/functions.
I suggest checking your application server vendor's solution for this issue. Most, if not all vendors, supply some solution for requirements of your sort.
Specifically for Glassfish, which you say you are using, check out Singleton EJB support for Glassfish. It might be as simple as adding a single annotation.
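A minimal sketch of what that can look like in EJB 3.1 (the bean name and the stand-in state field are illustrative, not your actual black-box code):

    import javax.ejb.Lock;
    import javax.ejb.LockType;
    import javax.ejb.Singleton;
    import javax.ejb.Startup;

    @Singleton
    @Startup                      // create the instance at application startup
    @Lock(LockType.WRITE)         // container-managed concurrency: one caller at a time
    public class BlackBoxFacade {

        // Stand-in for the shared state currently held in static fields.
        private final StringBuilder state = new StringBuilder();

        public void accept(String input) {
            state.append(input);
        }

        public String snapshot() {
            return state.toString();
        }
    }

The container guarantees a single instance per application and handles locking for you, which also addresses the thread-safety worries around the legacy static state.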
I would say that creating a singleton is actually the only viable idea. Assuming that code inside this "black box" is known to use static fields, it is absolutely unsafe to create two instances of this facade. Results are unpredictable otherwise.
Far from being a bad idea, it actually sounds to me like potentially quite a good idea.
Just from a program design point of view: if your black box is conceptually an "object" with properties and methods that work on them, then make it into an object, even if there'll only ever be one of them instantiated.
It should work, but there are some issues you may have to deal with.
Threading, as you have mentioned. An MDB is run in the EJB container where you cannot create your own threads, so you have a potential problem there. If you have access to the actual code (which it sounds like you do), you may want to do some refactoring to either eliminate the threads or use an "approved" threading method. The CommonJ TimerManager will probably work in your stated case since it is performing some task on an interval. There are implementations available for most app servers (WAS and Weblogic have it included).
Classloading - This depends on your configuration. If the singleton is created and manipulated from MDBs within the same EAR, you will be fine. Separate EARs will mean different classloaders and multiple instances of your singleton. I can't comment on whether this would be a problem in your case without more information.
Am I missing a point? You mentioned that the "black box code" contains state. MDBs may be limited to one instance per destination, but without proper configuration you will end up with several MDBs, all of them working with your single instance of the "black box code". To me this does not seem like a good idea, because one bean will overwrite the "black box code" state that another bean created a few ticks before.
It seems to me that the artifact that best fits your requirement is a JBoss MBean (if you are considering JBoss as an application server candidate).
Standard MBean Example
MBeans can also be deployed as singletons in the case of JBoss clustering.
Clustering with JBoss
I hope that this is useful for you.
Rafa.
Fix the code to get rid of the statics as soon as possible. Singletons are not a step in the right direction - they just add extra misdirection.
Don't use Singletons where state may change.
Exposing a global instance of your black-box class doesn't seem like the way to go. Singletons often seem like they will make things easier on you, and in a way they can, but they frequently come back to bite you and you end up having to restructure a large chunk of your code.
In the webserver world, an object can be scoped to the request, the session, or the application. Perhaps what you need is a application-scope object.
Search the docs for "application scope object" or "application lifetime object".
Why not create a REST interface for the black-box thingy and let clients make HTTP calls?
IMO, it's a good idea to let the EJB container handle your singleton needs. In Java EE 6, placing an @Singleton annotation on your session bean gives you a singleton session bean.