URLClassLoader.getResources("") (empty resource name) not giving roots of jars - java

Consider a URLClassLoader parameterized with a collection of URLs which is a mix of expanded directories and jar files. For example:
URL[] urls = new URL[] {
    new URL("file:/D:/work/temp/jars/spring-security-core-3.2.0.RELEASE.jar"),
    new URL("file:/D:/work/temp/jars/spring-security-config-3.2.0.RELEASE.jar"),
    ...
    new URL("file:/D:/work/temp/domain/bin/"),
    new URL("file:/D:/work/temp/web/bin/"),
    ...
};
URLClassLoader cl = new URLClassLoader(urls);
The classloader correctly handles getResources() requests for resources located somewhere inside a package, like "org/my/package/conf.properties". By "correctly handles" I mean that the classloader successfully finds all matches inside both directories and jars.
A special empty string name passed in getResources("") is supposed to yield the URLs for all available roots (in both the directories and the jars). However there is a known limitation in ClassLoaders which results in only returning roots that correspond to directories. All roots to jars are discarded.
Using classloader.getURLs() instead of classloader.getResources("") will not work for me, as I have a complex graph of interdependent URLClassLoaders, so the results would be completely different. Also, my classloaders are to be consumed by third-party classpath scanning facilities that use getResources("") calls to set up an internal search base. As a result, resources located in jars are simply not found.
I currently have a working fix where I extend from URLClassLoader and manually handle requests with an empty string by forcing roots for jars in addition to those for directories within the returned collection of URLs.
However my questions are:
What was the conceptual/technical reason for this limitation (where paths to jars are not returned)?
By fixing this manually, do I violate any important contract?
Is there any nice way to get the desired behavior?
Thanks for any thoughts on that!

What was the conceptual/technical reason for this limitation (where paths to jars are not returned)?
The behavior of ClassLoader.getResources("") is unspecified.
The implementation for loading resources from the file system in URLClassPath$Loader is solely based on URLs. It constructs a new file URL by adding the resource name to the base URL of the directory and returns the URL when it points to an existing resource. There's no special handling for an empty resource name. Whether this is wanted behavior or not is undocumented.
The implementation for JAR files in URLClassPath$JarLoader works on an index over JAR files. To get the same behavior for JAR files, the implementation would require special handling for empty resource names, i.e. it would need to check for an empty resource name first and return the file URL of the JAR file instead of searching within the index. The implementation does not have special handling for empty resource names. Whether this is wanted behavior or not is again undocumented.
Since the API specification does not specify the behavior for
empty resource names both implementations are valid.
Some may argue that exposing roots is a security issue, especially when running in a sandbox. Others may argue that getResources() should return null for empty resource names, since no resource with the name "" actually exists.
In any case, the current behavior of URLClassLoader leads to unexpected behavior in Class.getResource(). When this method is called with an empty string for a class in the default package, it returns the root directory of the class if the class was loaded from the file system. This violates the contract of the method. For details see for example this open Java bug: https://bugs.openjdk.java.net/browse/JDK-8202687.
By fixing this manually, do I violate any important contract?
As long as you only override the findResources() method of your ClassLoader (the method that getResources() delegates to after consulting the parent), call the super method and then add the additional URLs of your JAR files, you shouldn't violate any contract.
But be aware that there are already implementations out there that have special handling for URLClassLoaders. For example, Spring's PathMatchingResourcePatternResolver has special handling for class loaders that are instances of URLClassLoader, which adds additional URLs for JARs.
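To make the manual fix discussed above concrete, here is a minimal sketch of such a subclass. The class name JarRootAwareClassLoader is hypothetical, and returning the roots as jar:...!/ URLs is an assumption (a scanner might instead expect the plain file URL of the JAR); it is not the asker's actual code.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Collections;
import java.util.Enumeration;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical name; a sketch of the workaround, not a definitive implementation.
class JarRootAwareClassLoader extends URLClassLoader {

    JarRootAwareClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    public Enumeration<URL> findResources(String name) throws IOException {
        Enumeration<URL> found = super.findResources(name);
        if (!name.isEmpty()) {
            return found; // ordinary lookups are left untouched
        }
        // Key by string form so duplicates are dropped without calling URL.equals().
        Map<String, URL> roots = new LinkedHashMap<>();
        for (URL url : Collections.list(found)) {
            roots.put(url.toExternalForm(), url);
        }
        for (URL url : getURLs()) {
            // URLClassLoader treats every URL that does not end in "/" as a JAR file.
            if (!url.getPath().endsWith("/")) {
                // Assumption: consumers accept the "jar:<file-url>!/" root form.
                URL jarRoot = new URL("jar:" + url.toExternalForm() + "!/");
                roots.put(jarRoot.toExternalForm(), jarRoot);
            }
        }
        return Collections.enumeration(roots.values());
    }
}
```

Because only findResources() is overridden, parent delegation in getResources() stays as it is, so every loader in a graph of such class loaders contributes its own JAR roots.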
Is there any nice way to get the desired behavior?
There is no nice way to get the desired behavior since every solution would be based on unspecified behavior that may theoretically change with every new JRE version.
With the introduction of multi-release JAR files in Java 9 the behavior already changed:
For a multi-release JAR file with Java 8 classes and Java 9 classes, ClassLoader.getResource("") now returns a URL for the JAR file when it is executed on a JRE newer than 8. With JRE 8 it still returns no URL for the same JAR file. As a result, the URLs returned for an empty resource name now depend even on the JRE version and the type of JAR file.
There are workarounds to also get the URLs for JAR files. PathMatchingResourcePatternResolver, for example, loads JAR file names from the java.class.path system property (in case of the system class loader) and loads additional URLs by calling URLClassLoader.getURLs() (in case of a URLClassLoader). But again, these are only workarounds based on unspecified behavior.
Ideally, searches on the classpath are only performed in the context of a Java package. Frameworks like Spring (Boot) do exactly that. This avoids relying on unspecified class loader behavior and also avoids searching the JAR files of irrelevant third-party libraries. So, whenever possible, I recommend searching the classpath in the context of a Java package instead of looking up resources with an empty resource name.
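As an illustration of searching in the context of a package, the following sketch asks a class loader for every location of one package. The class and method names are hypothetical. One caveat to keep in mind: a JAR only yields a match here if it contains an explicit directory entry for the package, which not every build tool writes.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

// Hypothetical helper; a sketch of a package-scoped classpath lookup.
final class PackageScopedSearch {

    // Returns one URL per classpath location that contains the given package.
    static List<URL> packageLocations(ClassLoader loader, String packageName) throws IOException {
        // Resource names use '/' separators and no leading slash.
        String path = packageName.replace('.', '/');
        return Collections.list(loader.getResources(path));
    }
}
```

A scanner would then walk each returned location (a directory or a jar: URL) for the files it is interested in.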

Related

How to read the classes from another JRT?

From Java 11, how can I read the content of another runtime image?
In order to list the content of a Java runtime image, JEP 220 suggests the following solution:
A built-in NIO FileSystem provider for the jrt URL scheme ensures that development tools can enumerate and read the class and resource files in a run-time image by loading the FileSystem named by the URL jrt:/, as follows:
FileSystem fs = FileSystems.getFileSystem(URI.create("jrt:/"));
byte[] jlo = Files.readAllBytes(fs.getPath("modules", "java.base",
"java/lang/Object.class"));
This snippet works and will allow me to read the content of java/lang/Object.class in the runtime image of the Java installation that is executing the code.
How can I get it to read the content of java/lang/Object.class in another Java installation, given its java home?
I have read this SO answer which explains how to read a Java runtime image's content from a Java 8 runtime. Unfortunately, this won't work with newer Java runtimes, since, I believe, the filesystem for jrt:/ will always point to the current runtime image.
You may still use the jrt:/ scheme as described in this answer; you just need to provide an alternative java.home path in the environment argument when creating the FileSystem object:
public static void listModules(String javaHome) throws IOException {
    // Close the file system when done, since it was created just for this call.
    try (FileSystem fs = FileSystems.newFileSystem(
            URI.create("jrt:/"),
            Collections.singletonMap("java.home", javaHome));
         Stream<Path> stream = Files.list(fs.getPath("/modules"))) {
        stream.forEach(System.out::println);
    }
}
Or, to read a single resource:
public static byte[] readResource(String javaHome, String module, String path) throws IOException {
    // Close the file system when done, since it was created just for this call.
    try (FileSystem fs = FileSystems.newFileSystem(
            URI.create("jrt:/"),
            Collections.singletonMap("java.home", javaHome))) {
        return Files.readAllBytes(fs.getPath("modules", module, path));
    }
}
I think what you want is impossible. To wit:
Up to JDK8, you can rely on e.g. Paths.get(pathToJdk8Home, "jre", "lib", "rt.jar") to exist, which you can then turn into a URL (you're looking for jar:file:/that/path), and you can then toss that URL at FileSystems.newFileSystem (see this documentation for more).
But from JDK9 and up, the core java API is loaded in jmod files, and jmod files have an unspecified format by design - right now jmods are just zips, but unlike jars you explicitly get no guarantees that they will remain zip formatted, and there is no jmod URL scheme and no JmodFileSystemProvider. It is, in effect, impossible to read a jmod file in a way that is future compatible. Unfortunately the OpenJDK project has been on a murderous spree turning a ton of useful things, such as 'read a jmod', into implementation details. Bit user-hostile - just be aware of that, and I'm trying to do some expectation management: Stuff like this is way, way harder, and a huge maintenance burden (as you're forced to dip into workarounds, hacks, and going beyond spec thus needing to check it still works for every point release). See also this SO answer.
The jrt scheme can only load data from jmods that are actually 'loaded' into the VM's mod base, which I gather is explicitly not what you want (in fact, I'm pretty sure you cannot load e.g. the JDK11 core jmods into a JDK14, as it already loaded its jmods, and you'd get a split package violation). The jrt:// URL scheme, per its spec, isn't base file system related. You specify a module name (or nothing, and you get all loaded modules as one file system). There is no place for you to list a JDK installation path or jmod file, so that can't help you either.
Thus, you have only two options:
Accept that what you want cannot be done.
Accept that you're going to have to write hackery (as in, go beyond things that specifications guarantee you), and you accept the large maintenance burden that comes with the territory.
The hackery would involve:
Detect targeted JDK version or go on a hunting spree within the provided JDK installation directory (using e.g. Files.walk) to find a file named rt.jar. If it's there, load it up as ZipFileSystem and carry on. Modules 'do not exist', just turn any desired class into a path by replacing dots with slashes and appending .class (note that you'll need the binary name; e.g. package com.foo; class Outer { class Inner {}} means you want the name of Inner to be com.foo.Outer$Inner, so that you turn that into /com/foo/Outer$Inner.class).
For JDK9 and up, hunt for a file at JDK_HOME/jmods/java.base.jmod, and throw that at ZipFileSystem. A given class is in subdir classes. So, you're looking for e.g. the entry classes/java/lang/Object.class within the zip (that jmod is the zip). However, festoon this code with comments stating that this is a total hack and there is zero guarantee that this will work in the future. I can tell you, however, that JDK16, at least, still has zip-based jmod files.
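The two lookups above can be sketched with one helper. This is a hack in exactly the sense described: it assumes the archive (rt.jar or java.base.jmod) can be opened as a zip file, which no specification guarantees for jmod files. The class name and the entryPrefix parameter are hypothetical; pass "" for rt.jar and "classes/" for a jmod.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper. HACK: relies on the archive being zip-formatted,
// which is an implementation detail for jmod files and may change.
final class JmodClassReader {

    // binaryName is e.g. "java.lang.Object" or "com.foo.Outer$Inner".
    static byte[] readClass(Path archive, String entryPrefix, String binaryName) throws IOException {
        String entryName = entryPrefix + binaryName.replace('.', '/') + ".class";
        try (ZipFile zip = new ZipFile(archive.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new FileNotFoundException(entryName + " not found in " + archive);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return in.readAllBytes(); // requires Java 9 or newer
            }
        }
    }
}
```

For a JDK 9+ installation the call would look like readClass(jdkHome.resolve("jmods").resolve("java.base.jmod"), "classes/", "java.lang.Object"), where jdkHome is the path you were given.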
Alternatively, given that you have a JDK installation path, you can use ProcessBuilder to exec List.of("JDK_HOME/bin/jmod" /* or jmod.exe, you'll have to check which one to call! */, "extract", "JDK_HOME/jmods/java.base.jmod"), but note that this will extract all of those files into the current working directory (you can set the cwd for the invoked process to be some dir you just created for the purpose of being filled with the files inside). Quite a big bazooka if all you wanted was the one file. (You can also use the --dir option instead). The advantage is that this will still work even if hypothetically JDK17 is using some different format; presumably JDK17 will still have both bin/jmod as well as jmods/java.base.jmod, and the bin/jmod of JDK17 should be able to unpack the jmod files in your JDK17 installation. Even if you are running all this from e.g. JDK16 which wouldn't be able to read them.
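A sketch of the ProcessBuilder route, using the --dir option so that nothing lands in the current working directory. The class and method names are hypothetical, and the Windows check via os.name is an assumption about how you want to pick between jmod and jmod.exe.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; runs the target JDK's own jmod tool on its java.base.jmod.
final class JmodExtractor {

    // Builds: <jdkHome>/bin/jmod extract --dir <targetDir> <jdkHome>/jmods/java.base.jmod
    static List<String> extractCommand(Path jdkHome, Path targetDir) {
        boolean windows = System.getProperty("os.name", "")
                .toLowerCase(Locale.ROOT).startsWith("windows");
        Path tool = jdkHome.resolve("bin").resolve(windows ? "jmod.exe" : "jmod");
        Path jmod = jdkHome.resolve("jmods").resolve("java.base.jmod");
        return List.of(tool.toString(), "extract", "--dir", targetDir.toString(), jmod.toString());
    }

    static void extract(Path jdkHome, Path targetDir) throws IOException, InterruptedException {
        Files.createDirectories(targetDir);
        Process process = new ProcessBuilder(extractCommand(jdkHome, targetDir))
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("jmod extract failed with exit code " + exitCode);
        }
    }
}
```

After extract() returns, the class files should be under the classes subdirectory of targetDir, e.g. classes/java/lang/Object.class.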

Why does the Java Extension Mechanism not check the classpath for an optional package implementing LocaleServiceProvider?

I have created a maven project L and written a Java extension (i.e. an optional package) implementing (i.e. extending) the (abstract) service providers that implement (i.e. extend) LocaleServiceProvider, to support a dialect (let's call it xy) that isn't normally supported by the JRE. (I do not want to use the CLDR extension that came with Java 8, even though I'm running 8.141.)
The project compiles, and produces a jar with a META-INF/services folder that contains the provider-configuration files in UTF-8 with the qualified provider class names being on a line that ends with a line feed (\n).
I have then declared a maven dependency in my project P on the locale project L, and I thought that that would work, because the tutorial states
The extension framework makes use of the class-loading delegation
mechanism. When the runtime environment needs to load a new class for
an application, it looks for the class in the following locations, in
order:
[...]
The class path: classes, including classes in JAR files,
on paths specified by the system property java.class.path. If a JAR
file on the class path has a manifest with the Class-Path attribute,
JAR files specified by the Class-Path attribute will be searched also.
By default, the java.class.path property's value is ., the current
directory. You can change the value by using the -classpath or -cp
command-line options, or setting the CLASSPATH environment variable.
The command-line options override the setting of the CLASSPATH
environment variable.
Maven puts all dependencies on the classpath, I believe.
Yet when I run my unit test in P (in IntelliJ; L is on the classpath), it fails:
@Test
public void xyLocalePresent() {
    Locale xy = new Locale("xy");
    assertEquals("P not on classpath", xy, com.example.l.Locales.XY); // access constant in my Locale project L; should be equals to locale defined here
    SimpleDateFormat df = (SimpleDateFormat) DateFormat.getDateInstance(DateFormat.SHORT, xy);
    assertEquals("dd/MM/yy", df.toPattern()); // fails; L specifies the short date pattern as dd/MM/yy
}
I have to start it with -Djava.locale.providers=SPI,JRE -Djava.ext.dirs=/path/to/project/L/target. If I do that, it works, indicating that L's service providers were loaded successfully (indicating the jar's structure is ok).
NB: the Java 8 technotes say that the order SPI,JRE is the default.
Why, oh why does it not work when I just put L on the classpath? Why do I have to point to it explicitly?
Update: After going through the JavaDoc again, I just saw this (emphasis mine):
Implementations of these locale sensitive services are packaged using
the Java Extension Mechanism as installed extensions.
That explains things. :(
Is there any way to make this work by just putting L on the classpath when P runs, i.e. without having to install L (or having to use -D system properties)? (P uses maven, Struts2 and Spring, if that helps...)
In more complex applications, such as web servers (e.g. Tomcat), there are multiple ClassLoaders, so each WebApp served by the web server can be kept independent.
The extension mechanism is for extending the core Java functionality, i.e. features available globally within the running JVM (the web server). As such, they must be loaded by the System ClassLoader.
The standard way to add an extension to the core Java runtime is to either:
add the JAR file to the JRE_HOME/lib/ext folder, or
add extra folders to be searched by specifying the java.ext.dirs system property.
You could also just add it to the Bootstrap ClassPath yourself, but that might cause problems if the Security Manager is activated. Not sure about that part. So it's best to do it the official way.
Note that the classpath defined by the CLASSPATH environment variable, or the -cp command-line option, does not define the Bootstrap ClassPath.
To learn more, read the Java documentation "How Classes are Found".

Order of loading jars

Assuming there are two jars of different library versions on a classpath, e.g.
java -cp A-2.1.jar:A-2.2.jar ...
The package and class names in the first and second jars are the same, but class implementation is different. Is it specified whether root jvm classloader will try to find a class in A-2.1 before A-2.2?
The problem is that AWS EMR adds hadoop jars to a classpath and some of its dependencies are of older versions. However, our application needs to use new versions of the same libraries, so will prepending the classpath with newer versions of libraries be enough or is shading a recommended practice in this case? http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-hadoop-config_hadoop-user-env.sh.html
From the Setting the Class Path documentation:
The order in which you specify multiple class path entries is
important. The Java interpreter will look for classes in the
directories in the order they appear in the class path variable.
That said, overriding the dependency JARs of another library will always be risky since the library provider might not have tested that combination, so you'll either need to ask them for reassurance, do your own testing, or shade/repackage the classes as you suggested.
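The first-match-wins behavior is easy to observe with a URLClassLoader, which is documented to search its URLs in the order given. The sketch below uses two directories containing a resource with the same name as a stand-in for A-2.1.jar and A-2.2.jar; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;

// Hypothetical demo; shows that the earlier classpath entry shadows the later one.
final class ClassPathOrderDemo {

    // Returns the content of 'resource' as found through a loader over [first, second],
    // or null if neither location contains it.
    static String firstMatch(Path first, Path second, String resource) throws IOException {
        URL[] urls = { first.toUri().toURL(), second.toUri().toURL() };
        // Parent is null so that only the two given locations are consulted.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            URL hit = loader.getResource(resource);
            if (hit == null) {
                return null;
            }
            try (InputStream in = hit.openStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }
}
```

Swapping the two arguments swaps the result, which is the effect you rely on when prepending newer library versions to the classpath.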

Ignoring JARS the ugly way via classloader

So, I am trying to do something ugly here, let's say it's a desperate measure as I cannot have control over the runtime environment.
So having that said, I run some code in an environment where I cannot have control over the classpath (bad thing)... What's still worse is that the classpath has two jars, let's say productV1.jar and productV2.jar Both are exactly the same, but in different versions, so they have same classes.
For some reason, in most environments the productV2.jar is loaded and the productV1.jar is left out, but in some other environment, productV1.jar is called, and it causes the program to crash.
What I want to do as a workaround is mess with the classloader to explicitly ignore "productV1.jar", ideally by overriding some classloader function. I have done this with other resources (like the persistence.xml file from Hibernate), but I don't want to filter based on the contained classes, but rather on the jar file. Is there a way?
This is only meant to work until I can do the negotiations to get rid of the offending jar...
Edit: I will leave this question open in case there is some interesting hack. However, the issue is that everything that is in the classpath is loaded by the system classloader, and trying to switch it at runtime is not an easy task (or maybe not possible at all?). The only way I see is starting the process with a custom classloader, which is not what I am looking for.
Did you try to use URLClassLoader and manually add jars to the classpath? In constructor of URLClassLoader you pass an array of URLs which point to jars that you want to load.
You can make the following experiment: create an URLClassLoader subclass that includes only the jars you want (i.e. call super constructor with appropriate array of URLs) and then call your java with the following property:
-Djava.system.class.loader=test.CustomJarsClassLoader
To set your classloader as the default one.
The classloader may look like:
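package test;

import java.net.URL;
import java.net.URLClassLoader;

public class CustomJarsClassLoader extends URLClassLoader {
    // java.system.class.loader requires a public constructor taking the parent ClassLoader
    public CustomJarsClassLoader(ClassLoader parent) {
        super(new URL[]{ /*List of URLs to jars... */}, parent);
    }
}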
UPDATE:
Ok, if you can't add this argument to command line then try another approach:
In your main() function create a new Thread
Set the classloader that I mentioned above as this thread's context classloader (see: javadocs)
Run all your application's code inside this Thread. Your classloader should be used to load classes.
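The three steps above can be sketched as follows. The class and method names are hypothetical. Note the limitation: the context class loader only matters for code that explicitly consults it (many frameworks do); classes that your code references directly are still resolved by the loader that defined the referencing class.

```java
// Hypothetical helper; runs the application body in a thread whose
// context class loader is the custom one.
final class ContextLoaderRunner {

    static void runWith(ClassLoader loader, Runnable application) throws InterruptedException {
        Thread thread = new Thread(application, "app-main");
        thread.setContextClassLoader(loader);
        thread.start();
        thread.join(); // wait until the application code has finished
    }
}
```

From main() you would call ContextLoaderRunner.runWith(customLoader, () -> startApplication()), where customLoader and startApplication() stand for your own loader instance and entry point.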

Overriding classes in Java

Can I use ClassLoader's definePackage to override some packages from inside a jar?
For example, the application currently contains "javax.xml.bind" from abc.jar. If I call ClassLoader.definePackage(def.jar), in which the def.jar contains another version of javax.xml.bind, can I replace the classpath for the entire application to point to that of def.jar? Thanks.
No, you definitely can not use ClassLoader.definePackage to "override" some packages from inside a jar.
If I understand correctly, you want to make your JVM load any class under javax.xml.bind from def.jar while all other ones from abc.jar. In this case you can (in my personal order of preference):
1) Put def.jar before abc.jar in the CLASSPATH. This requires that no class you want loaded from abc.jar is present in def.jar.
2) Unzip def.jar, abc.jar, or both, and remove any conflicting classes so it is really irrelevant which jar comes first in the CLASSPATH. Then re-zip them. Or you can do this only on one jar and put it before the other.
3) Use a configurable classloader (sorry, no public domain one that I know of; let me know if you find one). This could be an interesting topic for an OS project, except that several initiatives with similar (but much broader) objectives are already ongoing, some at the core of the language.
4) Create a classloader for this purpose, probably extending the default one.
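For option 4, a minimal sketch of a class loader that looks in its own URLs first for one package prefix and delegates everything else to the parent as usual. The class name is hypothetical, and the sketch only helps for classes loaded through this loader, so the affected application code has to be loaded by it as well.

```java
import java.net.URL;
import java.net.URLClassLoader;

// Hypothetical name; child-first lookup for one package prefix only.
class PackageFirstClassLoader extends URLClassLoader {

    private final String packagePrefix; // e.g. "javax.xml.bind."

    PackageFirstClassLoader(URL[] urls, ClassLoader parent, String packagePrefix) {
        super(urls, parent);
        this.packagePrefix = packagePrefix;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            if (name.startsWith(packagePrefix)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded == null) {
                    try {
                        loaded = findClass(name); // search our own URLs (e.g. def.jar) first
                    } catch (ClassNotFoundException e) {
                        // not in our URLs; fall through to normal parent-first delegation
                    }
                }
                if (loaded != null) {
                    if (resolve) {
                        resolveClass(loaded);
                    }
                    return loaded;
                }
            }
            return super.loadClass(name, resolve);
        }
    }
}
```

With def.jar as the only URL and "javax.xml.bind." as the prefix, classes in that package would come from def.jar while all others are resolved by the parent.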
