Can a Custom Delegating Classloader Cache loadClass() results safely?


We've got a custom classloader, called here MainClassLoader, sitting on top of a Java web application (specifically on Tomcat 7 where the parent classloader is the WebAppClassLoader). This custom classloader is set as the TCCL for the web application, and its purpose is to delegate the lookup of classpath resources (including classes and non-class resources) to a set of other custom classloaders, each of which represents a pluggable module to the application. (MainClassLoader itself loads nothing.)
MainClassLoader.loadClass() will do parent-first delegation, and upon a ClassNotFoundException, go one by one through the pluggable child classloaders to see which of them will provide the result. If none of them can, it then throws the ClassNotFoundException.
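To make the delegation order concrete, here is a minimal sketch of the lookup described above. The class name DelegatingClassLoader, the modules list and addModule() are assumptions for illustration, not the actual MainClassLoader code, and the "more complicated" logic mentioned below is left out.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for MainClassLoader: it defines no classes itself.
class DelegatingClassLoader extends ClassLoader {
    // One loader per pluggable module (assumed structure).
    private final List<ClassLoader> modules = new CopyOnWriteArrayList<>();

    DelegatingClassLoader(ClassLoader parent) {
        super(parent);
    }

    void addModule(ClassLoader module) {
        modules.add(module);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        try {
            // Parent-first: the default implementation asks the parent, then
            // findClass(), which throws because this loader defines nothing.
            return super.loadClass(name, resolve);
        } catch (ClassNotFoundException notInParent) {
            // Fall back to the module loaders, one by one.
            for (ClassLoader module : modules) {
                try {
                    return module.loadClass(name);
                } catch (ClassNotFoundException tryNextModule) {
                    // Each miss costs one exception; this is the expensive part.
                }
            }
            throw new ClassNotFoundException(name);
        }
    }
}
```

With N modules plugged in, a class that lives in the last module costs N exceptions per lookup in this sketch, which matches the profile described below.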
The logic here is a bit more complicated, however, and combining that with the fact that our end users may end up having several (in the 10s) of these child modules plugged in, we're finding that the classloader ends up being one of the more CPU-intensive parts of the application, given how reliant Java is today on reflection-based command pattern implementations. (By that I mean there are a lot of Class.forName() calls to load and instantiate classes at runtime.)
We started noticing this first in periodic thread dumps of the application to catch the app "in action" to see what it is doing, plus profiling through JProfiler certain use cases that were known to be slower than desired.
I've written a very simple caching approach for MainClassLoader where the results of a loadClass() (including a ClassNotFoundException) call are cached in a concurrent map with weak values (keyed by the String className), and the performance of this class went high enough to totally fall off the hot spots list of JProfiler.
However, I'm concerned about whether we can really safely do this. Will such a cache get in the way of intended classloader logic? What are the pitfalls one might expect in doing this?
Some obvious ones I anticipate:
(1) Memory - obviously this cache consumes memory, and if left unbounded is a possible memory drain. We can address this using a limited cache size (we're using Google's Guava CacheBuilder for this cache).
(2) Dynamic classloading, especially in development - If a new or updated class/resource is added to the classpath while our cache holds a stale result, this would confuse the system, most likely by throwing a ClassNotFoundException for a class that should now be loadable. A small TTL on the cached "not found" entries might help here, but my bigger concern is what happens during development when we update a class and it gets hot-swapped into the JVM. This class would most likely be in one of the classloaders that MainClassLoader delegates to, so the cache could conceivably hold a stale (older) version of the class. Would using weak values help to mitigate this? My understanding of weak references is that, even once the referent is eligible for collection, they are not cleared until the GC runs a pass in which it decides to reclaim them.
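For reference, the kind of cache in question looks roughly like the sketch below. It uses only the JDK (a ConcurrentHashMap of WeakReferences) rather than the Guava CacheBuilder we actually use, has no size bound, and the names CachingLookup, Loader and missTtlMillis are made up for illustration.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical cache in front of an expensive class lookup.
class CachingLookup {
    // Stand-in for the real delegation logic.
    interface Loader {
        Class<?> load(String name) throws ClassNotFoundException;
    }

    private final ConcurrentMap<String, WeakReference<Class<?>>> hits = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, Long> missExpiry = new ConcurrentHashMap<>();
    private final Loader delegate;
    private final long missTtlNanos;

    CachingLookup(Loader delegate, long missTtlMillis) {
        this.delegate = delegate;
        this.missTtlNanos = missTtlMillis * 1_000_000L;
    }

    Class<?> load(String name) throws ClassNotFoundException {
        WeakReference<Class<?>> ref = hits.get(name);
        Class<?> cached = (ref == null) ? null : ref.get();
        if (cached != null) {
            return cached; // may be stale if the module was redeployed
        }
        long now = System.nanoTime();
        Long expiry = missExpiry.get(name);
        if (expiry != null && expiry - now > 0) {
            // "Not found" is only remembered for a short time.
            throw new ClassNotFoundException(name + " (cached miss)");
        }
        try {
            Class<?> loaded = delegate.load(name);
            hits.put(name, new WeakReference<>(loaded));
            missExpiry.remove(name);
            return loaded;
        } catch (ClassNotFoundException e) {
            missExpiry.put(name, now + missTtlNanos);
            throw e;
        }
    }
}
```

One thing the sketch makes visible: a Class stays strongly reachable from its defining classloader, so a weak value is only cleared once that whole loader has become unreachable. Weak values therefore do nothing about a stale entry while the old module loader is still referenced somewhere.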
These are my two known issues/concerns with this approach, but what scares me is that classloading is a bit of a black art (if not a dark science) that is full of gotchas when you do non-standard things here.
So what am I not worried about that I should be worried about?
UPDATE/EDIT
We ended up opting NOT to do the local caching as I prototyped above (it just seems dangerous and redundant with the caching/optimization done by the JVM), but did some optimization within our loadClass() method. Basically the logic we have in this loadClass() method (see comments below) did not follow a "best case" path through the code when it could have, e.g. when there were no "customization" modules in place, we were still behaving as though there were, letting that classloader throw a ClassNotFoundException and catching it and doing the next checks. This pattern meant that a given class load operation would nearly always go through at least 3 try/catch blocks with a ClassNotFoundException being thrown in each. Quite expensive. Some extra code to determine whether there were any URLs associated with the classloaders being delegated to allowed us to bypass those checks (and the resultant exception throw/catch), giving us an almost 25000% boost in performance for this class.
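The shortcut amounts to something like the following sketch. It assumes the delegates are URLClassLoader instances; LoaderFilter and withUrls() are names invented for this example, not our real code.

```java
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decide up front which delegates are worth asking.
class LoaderFilter {
    // Keeps only the loaders that have at least one URL to search.
    static List<URLClassLoader> withUrls(List<URLClassLoader> candidates) {
        List<URLClassLoader> result = new ArrayList<>();
        for (URLClassLoader candidate : candidates) {
            if (candidate.getURLs().length > 0) {
                result.add(candidate);
            }
        }
        return result;
    }
}
```

A loader with no URLs can never find anything, so skipping it avoids a guaranteed ClassNotFoundException throw/catch on every lookup.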
I'd still like comment on my original question, however, to help keep the issue alive to be answered.
What are the concerns in doing our own caching in a custom classloader, other than those I already listed?

Related

Block instances of a class at the JVM level?

Is there a way to configure the JVM to block instances of a class being created?
I'd like to do this to ensure no service running in the JVM is allowed to create instances of a class that has been identified as a security risk in a CVE; let's call that class BadClass.
NOTE: I'm looking for a general solution, so the following is purely additional information. I would normally address this by switching the library out, or upgrading it to a version that doesn't have the exploit, but it's part of a larger library that won't be addressing the issue for some time. So I'm not even using BadClass anywhere, but want to completely block it.

I do not know of a JVM parameter for this, but here are some alternatives that might put you in a position to meet your requirements:
You can write a CustomClassLoader that gives you fine control on what to do. Normal use cases would be plugin loading etc. In your case this is more security governance on devops level.
If you have a CI/CD pipeline with integration tests, you could also start the JVM with the -verbose:class parameter and see which classes are loaded when running your tests. It seems a bit hacky, but it may suit your use case. I'm just throwing every option into the ring; it's up to you to judge the best fit.
Depending on your build system (Maven?) you could restrict building applications just on your private cached libs. So you should have full control on it and put a library - review layer in between. This would also share responsibility between devs and the repository admins.
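For the custom classloader option, a minimal sketch could look like this. BlockingClassLoader and the blocked-name set are made up for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical loader that refuses to hand out certain classes by name.
class BlockingClassLoader extends ClassLoader {
    private final Set<String> blocked;

    BlockingClassLoader(ClassLoader parent, Set<String> blockedClassNames) {
        super(parent);
        this.blocked = new HashSet<>(blockedClassNames);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (blocked.contains(name)) {
            // Refuse before delegating, so the parent is never asked.
            throw new ClassNotFoundException("Blocked by policy: " + name);
        }
        return super.loadClass(name, resolve);
    }
}
```

Be aware of the limitation: this only affects lookups that go through this loader. Code that was itself loaded by the parent resolves BadClass through the parent and never reaches the check, so the application code has to be loaded by (or below) the blocking loader for it to have any effect.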

A distinct non-answer: Do not even try!
What if that larger library that has this dependency wants to call that method? What should happen then?
In other words, what is your blocking supposed to do?
Throw some Error instance, that leads to a teardown of the JVM?
Return null, so that (maybe much later) other code runs into a NPE?
Remember: that class doesn't exist in a void. There is other code invoking it. That code isn't prepared for you coming in, and well, doing what again?!
I think there are no good answers to these questions.
So, if you really want to "manipulate" things:
Try sneaking in a different version of that specific class into your classpath instead. Either an official one, that doesn't have the security issue, or something that complies to the required interface and that does something less harmful. Or, if you dare going down that path, do as the other answer suggests and get into "my own classloader" business.
In any case, your first objective: get clean on your requirements here. What does blocking mean?!

Have you considered using Java Agent?
It can intercept class loading in any classloader and manipulate the class's content before it is actually loaded. You may then either modify the class to remove or fix its bugs, or return a dummy class that throws an error in its static initializer.
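A minimal sketch of the agent hook, using only java.lang.instrument. The class name BadClassDetector and the target com/example/BadClass are placeholders. This version only detects the load; actually replacing the class body requires returning rewritten bytes, typically produced with a bytecode library, which is not shown here.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical agent class; a real agent is packaged in a jar whose manifest
// names it in Premain-Class and is started with -javaagent:agent.jar.
class BadClassDetector implements ClassFileTransformer {
    // Internal (slash-separated) name of the class to watch for; placeholder.
    static final String TARGET = "com/example/BadClass";

    final AtomicInteger sightings = new AtomicInteger();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        if (TARGET.equals(className)) {
            sightings.incrementAndGet();
            System.err.println("BadClass is being loaded by " + loader);
        }
        // Returning null leaves the class bytes unchanged.
        return null;
    }

    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new BadClassDetector());
    }
}
```

Note that throwing an exception from transform() does not stop the load: the JVM treats it the same as returning null. Blocking therefore has to be done by returning modified class bytes.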

Notification of any String object construction in Java 8 HotSpot VM

Is there a way to get notified on all invocations to constructor of String class (either directly or using reflection) without weaving or instrumenting rt.jar?
Further is it possible to filter these notifications only for calls within a specific package?
Further is it possible to make these notifications async (like events) so that actual JVM invocations are not slowed down
My use-case is to intercept all strings being created, pattern-match on their content and raise alerts based on some rules (all in the backend) as part of a platform component.
As I don't want to instrument rt.jar, AspectJ seems to be out of the question (as LTW can't be done on Java core classes). The most promising tool seems to be JVM TI, but I am not sure how to achieve this with it.
Thanks,
Harish

Is there a way to get notified on all invocations to constructor of String class (either directly or using reflection) without weaving or instrumenting rt.jar in compile time?
You are not compiling the String class, so you can only do weaving at runtime. And yes, this is the only way without creating a custom JVM.
Further is it possible to filter these notifications only for calls within a specific package?
It is possible to check the caller with Reflection.getCallerClass(n)
Further is it possible to make these notifications async (like events) so that actual JVM invocations are not slowed down
All this is very expensive as is passing work to another thread.
make a pattern match on the content
Pattern matching is very expensive compared to creating a String. If you are not careful you will slow down your application by an order of magnitude or two. I suggest you reconsider your real requirements and see if there is another way to do what you are trying to do.
Are you sure you don't want to use a profiler for this? Note: even profilers generally only sub-sample, e.g. every 10th allocation. There are plenty of free ones; in fact two come with the JVM. I suggest using Flight Recorder to track allocations, as it has a very low overhead.

Java, runtime class reloading

I am looking for a way to reload a class into Java at runtime. The motivation is to make debugging more efficient. The application is a typical client/server design that synchronously processes requests. A "handler" object is instantiated for each request. This is the only class I intend to dynamically replace. Since each request deals with a fresh instance, reloading this class won't have any side-effects. In short, I do not want to restart the entire application every time there is a change to this module.
In my design, the Java process becomes aware that a .class file has been updated in the classpath in between requests. When this happens, the "handler" class is unloaded and a new one is loaded.
I know I can use the classLoader interface to load in a new class. I seem to be having trouble finding the proper way of "unloading".

Classes will be unloaded and garbage collected like any other object, if there is no remaining reference to them. That means there must be no reachable instance of the class (as loaded by that particular classloader instance) and the classloader instance itself must be eligible for garbage collection as well.
So basically, all you have to do is to create a new classloader instance to load the new version of the class, and make sure that no references to instances of the old version remain.
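As a rough sketch of that idea: keep the loader behind a factory and swap in a fresh one whenever the .class file changes. HandlerFactory and its loaderFactory parameter are invented for this example; in practice the supplier would create something like a new URLClassLoader over the directory holding the handler class, and that directory must not be on the parent's classpath, or the parent will keep serving the old version.

```java
import java.util.function.Supplier;

// Hypothetical factory that creates handlers through a replaceable classloader.
class HandlerFactory {
    private final String handlerClassName;
    private final Supplier<ClassLoader> loaderFactory;
    private volatile ClassLoader current;

    HandlerFactory(String handlerClassName, Supplier<ClassLoader> loaderFactory) {
        this.handlerClassName = handlerClassName;
        this.loaderFactory = loaderFactory;
        this.current = loaderFactory.get();
    }

    // Call between requests when a changed .class file has been detected.
    void reload() {
        // Dropping the reference lets the old loader and its classes be
        // collected once no old handler instances remain reachable.
        current = loaderFactory.get();
    }

    ClassLoader currentLoader() {
        return current;
    }

    // One fresh handler per request, defined by whichever loader is current.
    Object newHandler() throws ReflectiveOperationException {
        return current.loadClass(handlerClassName).getDeclaredConstructor().newInstance();
    }
}
```

Callers should talk to the handler through an interface loaded by the parent loader, since a class defined by the new loader is a different type from the one defined by the old loader even though the name is the same.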

I believe that you actually need to have a hierarchy of classloaders, and in order to reload you get rid of the low-level classloader (by normal GC means), and hence all the classes it loaded. So far as I know this technique is used by Java EE app servers for reloading applications, and there are all manner of fun results when framework code loaded in one classloader wants to use classes loaded somewhere else.

As of 2015, class reloading is still a missing feature in Java.
Use OSGi to create a class reloading application.
Use JRebel for testing. There are a few others which do the same thing.
Use application server and externalize the parts which you want to reload into a separate web application. Then keep deploying/undeploying. You will eventually get some perm gen space overflow kind of errors due to dangling old ClassLoader instances.
Use a script runner to execute parts of changeable code. JSR-223 Java Scripting API support for the scripting language "Java".
I had written a series about class reloading. But all of those methods are not good for production.
The blog and source codes in google sources
IMHO this class reloading is messy in java and its not worth trying it. But I would very much like this to be a specification in java.

Necessary and sufficient validations upon application init

I have a new puzzle for you :-).
I was thinking about how an application should handle its own startup, e.g. checking for required libraries, correct versions, database connectivity, database compatibility, etc. To be specific, here is the test case. I use SWT and Log4J, for obvious reasons. Now, the questions:
Should the app check itself for the required dependencies? If yes, should the user be given specific details of what it's missing? Or just a message, and details to the logs?
What if the log4J library is unavailable?
What is the best way to do the test? Verifying the file's existence (using file.exists(), at a specified path), or loading a class, say Class.forName("org.apache.log4j.Logger")? What should be the proper order for the checks? For instance, if I test for SWT, I have no idea whether the logger is available or not, and the error will occur when I try to access it. Conversely, if I test for the logger first: a) the lib could be unavailable - I cannot log the error; b) SWT could be unavailable - I am unable to display the user message.
I discovered the apache.commons.lang framework today, and I find the method org.apache.commons.lang.SystemUtils.isJavaVersionAtLeast(Float value) very useful, and many others too, I am sure. However, doesn't importing too many libs into your project make it hard to maintain? Versions change and compatibility is lost, e.g. one cannot control a 3rd party's development style or direction.
Thank you for your answers.

I agree with your need. Checking for required runtime environment provides:
immediate feedback, instead of randomly breaking when accessing some functionality
hopefully more skilled user, as the immediate feedback is available to the guy that is installing the software, hopefully more skilled than an average user, or at least less confident (installing is always a special operation). A more skilled user is less disturbed if the error is coming in the console, he doesn't depend on a graphical interface.
improved reporting : the error message can be explicit (you're in charge), while default error messages come in many flavours (they are not always that helpful on 1. what's wrong 2. suggesting a fix).
But please note that the runtime requirements could be checked in two situations:
when installing : long verifications are always acceptable ; if a library is not here, a required database or WebService is not accessible, it won't be here at runtime either, so you can complain immediately.
when starting the execution : you can verify again (and some verifications may only happen at that point)
This suggests creating an installer for your application.
Potentially, errors would not all be blocking for the installation. Some would rather accumulate as a list of tasks to be done after installation, maybe nicely formatted in a file with all reference information.
Here, we once again hit the notion of error level in validation (similar to what happens for Log4j) : some validation errors are at fatal level, others are errors, possibly also warnings ...
In our projects, we have some sort of initialization and validation going on on startup. Based on our day-to-day experience, I would suggest the following:
When the application gets big, you don't want to have all init centralized in one class, so we have a modular structure.
A small kernel is configured with a list of module classes. Its whole init sequence is under strict control, ready for any exceptions (translating them to appropriate messages, but remembering the stack traces that are so useful to the developers), making no assumptions about the available libraries and so on... CheckStyle can be configured specially for this code.
The interface (of course, abstract class is possible) that the modules implement typically have several initialization methods. They could be:
getDependencies : returns a list of modules that this one depends on.
startup : when the whole application is starting. This will be called only once during startup, and cannot be called again.
start : when the module gets ready for regular operation
stop : reverse from start
shutdown : reverse from startup.
The kernel instantiates each of the modules in turn. Then it calls one init method on all of them, then another init method and so on as needed. Each init method can:
signal error conditions (using levels, like Log4J).
an exception thrown would be caught by the kernel, and translated to an error condition
consult another module for its status (because dependencies are the general case), and react accordingly. If needed, the dependencies could be made declaratively.
The kernel takes care of module dependencies generically:
It sorts the modules so that dependencies are respected.
It doesn't initialize a module if one of its dependencies couldn't make it.
If asked to stop a module, it will first stop the modules that depend on it.
A nice feature of this kernel approach is that it is easy to aggregate the errors, at various levels (although fatal could stop it), and report all of them at the end, using whatever means is available (SWT or not, Log4J or not ...). So instead of discovering the problems one after the other, and having to start again each time, you could deliver in one blow (nicely prioritized of course).
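The kernel behaviour described above can be sketched as follows. This is a simplified illustration, not our production code: the Module interface is cut down to name(), getDependencies() and startup(), errors are plain strings rather than levelled conditions, and dependency cycles are tolerated but not reported.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical kernel: starts modules in dependency order and collects errors.
class Kernel {
    interface Module {
        String name();
        List<String> getDependencies(); // names of the modules this one needs
        void startup() throws Exception;
    }

    final List<String> started = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    void startAll(Collection<Module> modules) {
        Map<String, Module> byName = new LinkedHashMap<>();
        for (Module m : modules) {
            byName.put(m.name(), m);
        }
        Set<String> visited = new HashSet<>();
        Set<String> failed = new HashSet<>();
        for (Module m : modules) {
            start(m.name(), byName, visited, failed);
        }
    }

    private void start(String name, Map<String, Module> byName,
                       Set<String> visited, Set<String> failed) {
        if (!visited.add(name)) {
            return; // already handled; also keeps a cycle from recursing forever
        }
        Module module = byName.get(name);
        if (module == null) {
            failed.add(name);
            errors.add("missing module: " + name);
            return;
        }
        // Depth-first: dependencies are started before the module itself.
        for (String dep : module.getDependencies()) {
            start(dep, byName, visited, failed);
            if (failed.contains(dep)) {
                failed.add(name);
                errors.add(name + " skipped: dependency " + dep + " is unavailable");
                return;
            }
        }
        try {
            module.startup();
            started.add(name);
        } catch (Exception e) {
            // Exceptions become error conditions instead of aborting the run.
            failed.add(name);
            errors.add(name + " failed: " + e.getMessage());
        }
    }
}
```

After startAll() returns, the errors list holds everything that went wrong in a single pass, ready to be shown through whatever channel turned out to be available.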
Concerning your precise questions:
Should the app check itself for the required dependencies?
Yes (see above).
If yes, should the user be given specific details of what it's missing? Or just a message, and details to the logs?
As said above, when installing, the user is better prepared to deal with this.
When starting, we show a simple message to the end user, but give access to the full stack traces for the developer (we have a button that copies the application environment, the stack traces and so on to the clipboard).
What if the log4J library is unavailable?
Log without it (see above).
What is the best way to do the test? Verifying the file's existence (using file.exists(), at a specified path), or loading a class, say Class.forName("org.apache.log4j.Logger")?
I would load a class. But if that failed, I might check for the file's existence on disk to give an improved message, including "how to fix".
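A minimal sketch of such a check, with DependencyCheck and isClassAvailable() as made-up names:

```java
// Hypothetical startup helper that probes for a class without using it.
class DependencyCheck {
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: locate and load the class, but do not run its
            // static initializers yet.
            Class.forName(className, false, DependencyCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers a class that exists but cannot be linked,
            // for example because one of its own dependencies is missing.
            return false;
        }
    }
}
```

Because the helper only depends on java.lang, it can run before either Log4J or SWT is known to be present, e.g. isClassAvailable("org.apache.log4j.Logger").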
What should be the proper order to do the checks? For instance, if i test for SWT, i have no idea if logger is available or not, and the error will occur when i try to access that. Backwards, if i test for the logger 1st : a) The lib could be unavailable - i cannot log the error; b) SWT could be unavailable - unable to display the user message.
As I said above, I suggest these low-level errors get accumulated in a small area of code (the kernel), where you could use anything that is available to display them. If nothing is available, you could simply log to the console without Log4J.

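The short answer is no. The JVM already handles this, either at initialization or at runtime. If a class requested by name (for example via Class.forName()) is not found on the classpath, a ClassNotFoundException is thrown; if a class that compiled code links against is missing, the JVM throws a NoClassDefFoundError instead. Likewise, a missing method surfaces as a NoSuchMethodException for a reflective lookup, or as a NoSuchMethodError at link time.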

Regarding 1 through 3 , there are 2 main use cases here:
application packaging is under your control, and can make sure that all required dependencies are packaged properly. Run-time validations are not useful here.
application packaging is not under your control, and you deliver the main jar and the instructions on what the requirements are. Run-time validations might be useful, but someone who wants to package your application usually has enough skill to understand what a ClassNotFoundException: org.apache.logging.LogManager means.
Regarding 4, as long as you keep the same version of the dependency included in your project, you will have no problems in keeping control. Upgrading to a newer version is a conscious decision, which requires thought and testing.

JAXBContext initialization speedup?

Is there any way to speed up the initialization of javax.xml.bind.JAXBContexts with a large (>1000) number of classes? In our XML heavy application the startup time is some 10 minutes and consists mainly of the initialization time of the JAXBContexts. :-(
We are using Sun's JAXB implementation in the JDK 1.5 and the org.jvnet.jaxb2.maven2.maven-jaxb2-plugin for the code generation from XSDs.
Clarification: The problem is not that we have many instances of a JAXBContext with the same contextpaths, but the problem is that the initialization of one single JAXBContext takes tens of seconds since it has to load and process thousands of classes. (Our XSDs are fairly large and complicated.) All JAXBContext instances have different contextpaths - we cannot reduce the number further.

The JAXB reference implementation has a sort-of-undocumented system property for exactly this reason:
-Dcom.sun.xml.internal.bind.v2.runtime.JAXBContextImpl.fastBoot=true
or for old versions prior to the package refactoring:
-Dcom.sun.xml.bind.v2.runtime.JAXBContextImpl.fastBoot=true
This instructs JAXB to skip the expensive process of pre-caching the various reflection muscles it needs to do the job. Instead, it will do all the reflection when the context gets used. This makes for a slower runtime, but considerably faster initialization, especially for large numbers of classes.
However, one part of the speed problem is unavoidable, and that's the fact that JAXB has to load every single one of your classes, and classloading is slow. This is apparent if you create a 2nd context immediately after the first, with the same configuration - you'll see it's much, much faster, having already loaded the classes.
Also, you say that you have multiple JAXBContext instances because you have multiple contextpaths. Did you realise that you can put multiple context paths into a single context? You just need to pass them all as a colon-delimited string when you initialize the context, e.g.
JAXBContext.newInstance("a.b.c:x.y.z");
will load the contexts a.b.c and x.y.z. It likely won't make any difference to performance, though.

In general, you should not have to create many instances of JAXBContext, as they are thread-safe after they have been configured. In most cases just a single context is fine.
So is there a specific reason why many instances are created? Perhaps there was an assumption that they are not thread-safe? (That would be understandable given this is not clearly documented, but it is a very common pattern: syncing is needed during configuration, but not during usage as long as the config is not changed.)
Other than this, if this is still a problem, profiling bottlenecks & filing an issue at jaxb.dev.java.net (pointing hot spots from profile) would help in getting things improved.
JAXB team is very good, responsive, and if you can show where problems are they usually come up with good solutions.

JAXBContext is indeed thread-safe, so wrapping it with a singleton is advised. I wrote a simple singleton containing a class->context map that seems to do the job. You may also want to create a pool of [un]marshaller objects if your application uses many threads, as these objects are not thread-safe and you may see some initialization penalties with these as well.

In our case updating the JAXB libraries was a good idea. Incidentally, using the server VM instead of the client VM even in the development environment was a good idea here, even though it normally slows down server startup: since the JAXB initialization takes so much time, the better compilation of the server VM helps.
