Case sensitivity of Java class names - java

If one writes two public Java classes whose names differ only in case, in different directories, then the two classes cannot both be used at runtime. (I tested this on Windows, Mac and Linux with several versions of the HotSpot JVM. I would not be surprised if there are other JVMs where they are usable simultaneously.) For example, if I create a class named a and one named A like so:
// lowercase/src/testcase/a.java
package testcase;
public class a {
public static String myCase() {
return "lower";
}
}
// uppercase/src/testcase/A.java
package testcase;
public class A {
public static String myCase() {
return "upper";
}
}
Three Eclipse projects containing the code above are available from my website.
If I try calling myCase on both classes like so:
System.out.println(A.myCase());
System.out.println(a.myCase());
The typechecker succeeds, but when I run the class file generated by the code directly above I get:
Exception in thread "main" java.lang.NoClassDefFoundError: testcase/A (wrong name: testcase/a)
In Java, names are in general case sensitive. Some file systems (e.g. those on Windows) are case insensitive, so I'm not surprised the above behavior happens, but it seems wrong. Unfortunately the Java specifications are oddly noncommittal about which classes are visible. The Java Language Specification (JLS), Java SE 7 Edition (Section 6.6.1, page 166) says:
If a class or interface type is declared public, then it may be accessed by
any code, provided that the compilation unit (§7.3) in which it is declared is
observable.
In Section 7.3, the JLS defines observability of a compilation unit in extremely vague terms:
All the compilation units of the predefined package java and its subpackages lang
and io are always observable. For all other packages, the host system determines which compilation units are observable.
The Java Virtual Machine Specification is similarly vague (Section 5.3.1):
The following steps are used to load and thereby create the nonarray class or
interface C denoted by [binary name] N using the bootstrap class loader [...]
Otherwise, the Java virtual machine passes the argument N to an invocation of a
method on the bootstrap class loader to search for a purported representation of C
in a platform-dependent manner.
All of this leads to the following questions, in descending order of importance:
Are there any guarantees about which classes are loadable by the default class loader(s) in every JVM? In other words, can I implement a valid, but degenerate JVM, that won't load any classes except those in java.lang and java.io?
If there are any guarantees, does the behavior in the example above violate the guarantee (i.e. is the behavior a bug)?
Is there any way to make HotSpot load a and A simultaneously? Would writing a custom class loader work?

Are there any guarantees about which classes are loadable by the bootstrap class loader in every JVM?
The core bits and pieces of the language, plus supporting implementation classes. Not guaranteed to include any class that you write. (The normal JVM loads your classes in a separate classloader from the bootstrap one, and in fact the normal bootstrap loader loads its classes out of a JAR normally, as this makes for more efficient deployment than a big old directory structure full of classes.)
If there are any guarantees, does the behavior in the example above violate the guarantee (i.e. is the behavior a bug)?
Is there any way to make "standard" JVMs load a and A simultaneously? Would writing a custom class loader work?
Java loads classes by mapping the full name of the class into a filename that is then searched for on the classpath. Thus testcase.a goes to testcase/a.class and testcase.A goes to testcase/A.class. Some filesystems mix these things up, and may serve the other up when one is asked for. Others get it right (in particular, the variant of the ZIP format used in JAR files is fully case-sensitive and portable). There is nothing that Java can do about this (though an IDE could handle it for you by keeping the .class files away from the native FS, I don't know if any actually do and the JDK's javac most certainly isn't that smart).
However that's not the only point to note here: class files know internally what class they are talking about. The absence of the expected class from the file just means that the load fails, leading to the NoClassDefFoundError you received. What you got was a problem (a mis-deployment in at least some sense) that was detected and dealt with robustly. Theoretically, you could build a classloader that could handle such things by keeping searching, but why bother? Putting the class files inside a JAR will fix things far more robustly; those are handled correctly.
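As a rough illustration of the two points above (the name-to-path mapping, and JAR/ZIP entry names being case-sensitive), here is a minimal sketch. The class JarCaseDemo and its helper methods are hypothetical names introduced for this example, and the entry payloads are placeholder text rather than real bytecode. It writes two entries whose names differ only in case into an in-memory ZIP and reads the names back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class JarCaseDemo {
    // Maps a binary class name to the resource path searched for on the classpath.
    static String toResourcePath(String className) {
        return className.replace('.', '/') + ".class";
    }

    // Writes one entry per class name into an in-memory ZIP archive and
    // returns the entry names as read back from that archive.
    static List<String> roundTrip(String... classNames) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
                for (String name : classNames) {
                    zip.putNextEntry(new ZipEntry(toResourcePath(name)));
                    // Placeholder payload only; a real JAR would hold compiled bytecode.
                    zip.write(name.getBytes(StandardCharsets.UTF_8));
                    zip.closeEntry();
                }
            }
            List<String> names = new ArrayList<>();
            try (ZipInputStream in =
                     new ZipInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                for (ZipEntry e = in.getNextEntry(); e != null; e = in.getNextEntry()) {
                    names.add(e.getName());
                }
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Both entries survive, because archive entry names are compared case-sensitively.
        System.out.println(roundTrip("testcase.a", "testcase.A"));
    }
}
```

On a case-insensitive file system the two corresponding .class files would collide in one directory, which is why packaging them in an archive sidesteps the problem.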
More generally, if you're running into this problem for real a lot, take to doing production builds on a Unix with a case-sensitive filesystem (a CI system like Jenkins is recommended) and find which developers are naming classes with just case differences and make them stop as it is very confusing!

Donal's fine explanation leaves little to add, but let me briefly muse on this phrase:
... Java classes with the same case-insensitive name ...
Names and Strings in general are never case-insensitive in themselves; it's only their interpretation that can be. And secondly, Java doesn't do such an interpretation.
So, a correct phrasing of what you had in mind would be:
... Java classes whose file representations in a case-insensitive file-system have identical names ...

I tried to add or remove a character from one of the class names and it worked. I feel it's always better to use different class names.

Don't think just about folders.
Use explicit different namespaces ("packages") for your classes, and maybe use folders to match your classes.
When I mention "packages", I don't mean "*.JAR" files, but, just the concept of:
package com.mycompany.mytool;
// "com.mycompany.mytool.MyClass"
public class MyClass
{
// ...
} // class MyClass
When you do not specify a package for your code, the java tools (compiler, I.D.E., whatever), assume to use the same global package for all. And, in case of several similar classes, they have a list of folders, where to look for.
Packages are like "virtual" folders in your code, and apply to all your packages on your classpath, or installation of Java. You can have several classes, with the same I.D., but, if they are in different package, and you specify which package to look for, you won't have any problem.
Just my 2 cents, for your cup of Java coffee.

Related

Java class name same as the nested package name

In my Java application, I use a third-party library.
However, I found something strange, there are some nested packages, and some classes whose name may be the same as the name of the package.
I am afraid I can not make it clear. Here is an example:
package
com.xx.a
com.xx.a.a
And there is a class named 'a' inside the 'com.xx.a'.
So if I want to call this class 'a'...
I write:
a ma = new com.xx.a.a();
Then the IDE will think that I mean the package 'com.xx.a.a'.
Then I can not call it.
I wonder why?
By the way, it seems that the library provider did not want us to use these kinds of classes.
How do they do this?
The Java language allows class identifiers to be obscured by package identifiers. In your case the class com.xx.a.a is obscured by the package com.xx.a.a.
From the Java Language Specification:
6.3.2 Obscured Declarations
A simple name may occur in contexts where it may potentially be interpreted as the name of a variable, a type or a package. In these situations, the rules of §6.5 specify that a variable will be chosen in preference to a type, and that a type will be chosen in preference to a package. Thus, it may sometimes be impossible to refer to a visible type or package declaration via its simple name. We say that such a declaration is obscured.
I must say that the rules in §6.5 for classifying the meaning of an identifier are far from clear though.
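To make the quoted rule a little more concrete, here is a small sketch of the "variable is chosen in preference to a type" half of it. All names (ObscureDemo, Foo, Bar) are made up for this illustration and have nothing to do with the library in the question. The package-versus-type case cannot be shown in a single file, so this only demonstrates the general mechanism.

```java
class ObscureDemo {
    // A nested type named Foo with a static field.
    static class Foo {
        static String name = "type Foo";
    }

    // An unrelated type whose instances also carry a field called name.
    static class Bar {
        String name = "variable Foo";
    }

    // A field that reuses the simple name Foo. Where a name could denote
    // either a variable or a type, the variable wins.
    static Bar Foo = new Bar();

    static String viaAmbiguousName() {
        // Foo is resolved as the field above, so this reads Bar's instance field.
        return Foo.name;
    }

    static String viaTypeOnlyContext() {
        // In a class literal only a type can appear, so Foo still names the nested class here.
        return Foo.class.getSimpleName();
    }

    public static void main(String[] args) {
        System.out.println(viaAmbiguousName());
        System.out.println(viaTypeOnlyContext());
    }
}
```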
The reason why you still happen to have a copy of a library that violates this rule is because the rule does not apply for class files / JAR files and the JVM.
This means that you can have such naming conflicts in JAR files, but you'll never see it as output from javac. The tool that has produced these class / package names is most likely a code obfuscator which produces this kind of messy code to compress the size of the files and to obfuscate the code to prevent reverse engineering.
PS. At a closer look it may actually be a bug on the Eclipse side (assuming that's the IDE you're talking about). By letting an empty package name collide with a class name, Eclipse chokes on something javac accepts. The spec is hard to follow, but from what I can see, javac follows the spec in this case.
This is a common issue when decompiling jars.
The compiler will get confused when there is a class and a subpackage with the same name. If you can't find a decompiler with an option to add a prefix according to the kind of name (package, class, variable), you have to refactor the source files. You can do that with a regex, for example by renaming every package declaration and import from
import A.B.C
to something like
import pkgA.pkgB.C.
Of course you can't do that for the external packages from the sdk or other libraries but most of the time the used obfuscator renames them in the same way so for renaming to letters from A-Z you could use something like:
RegexFindAll("import\s+(?:[A-Z]\s*\.\s*)*([A-Z])\s*\.\s*(?:[A-Z]\s*\.\s*)*[A-Z]\s*;")
RegexFindAll("package\s+(?:([A-Z])\s*\.\s*)*([A-Z])\s*;")
And from there on you can rename every package. If your IDE doesn't offer such functionality you can also rely on the terminal with following commands.
To find all the files by name recursively(extendable with filename filter)
find -follow from https://stackoverflow.com/a/105249/4560817
To iterate over the found filenames
find . -name '*.mp3' |
while IFS= read -r filename
do
echo "$filename" # ... or any other command using $filename
done
from https://stackoverflow.com/a/9391044/4560817
To replace text inside a file with regex
sed -i 's/original/new/g' file.txt from https://askubuntu.com/a/20416
You need to do this:
com.xx.a.a ma = new com.xx.a.a();
Or import the class:
import com.xx.a.a;
a ma = new a();
The library is likely obfuscated (e.g. using proguard) to reduce size, prevent reverse engineering and "hide" stuff you're not supposed to use. Even if you manage to create an instance of this class, I would recommend against it, as you don't know what it will do or how it can/should be used.
We cannot do this in Java:
com.xx.A
com.xx.A.yy
The package name clashes with a class in the parent package.

When you have multiple package-private classes in a file, can they be referred to from other files in the package?

According to the JLS, it is valid syntax to have multiple classes in one file, so long as only a single class in the file is public. As I understand it, this is usually to allow small classes referred to only in a single file to be maintained within that file.
One area I'm not sure about is whether other files in the same package can safely refer to that second class in the original file. By the scoping rules it would seem valid, but I'm not sure if it is a problem while compiling. I have seen it work quite frequently, but I've also been told by other developers on the project that there are occasional build issues finding the symbol in question after making changes elsewhere in the system. Is this setup of referring to package-private classes embedded in other classes' files introducing some sort of compilation-order dependency into the process that is making the build fragile?
Yes, that should be absolutely fine - unless you've got a badly-configured build system, basically. You should probably be compiling all the source for the same package in one go anyway.
I can see it potentially causing a problem for some build systems which try to work out what needs recompiling - if they assume that the name of the source file matches the name of the resulting class, they could get confused here (even if you don't have multiple classes in the same file) but that's a tool problem rather than a language problem.
Note that if I have "small classes referred to only in a single file" I'd normally make them private static nested classes:
public class OuterClass
{
// Normal code...
// Only used within OuterClass
private static class Foo
{
}
}
That's cleaner (IMO) than giving something package-private access, if it's really only intended to be used from a single class.

How Do I Place Auto-generated Java Classes in a Single .java File?

As everyone knows - public java classes must be placed in their own file named [ClassName].java
( When java class X required to be placed into a file named X.java? )
However, we are auto-generating 50+ java classes, and I'd like to put them all in the same file for our convenience. This would make it substantially easier to generate the file(s), and copy them around when we need to.
Is there any way I can get around this restriction? It seems like more of a stylistic concern - and something I might be able to disable with a compiler flag.
If not, what would you recommend?
Can you put wrapper class around your classes? Something like:
public class Wrapper {
public static class A {...}
public static class B {...}
....
}
Then you can access them via Wrapper.A, Wrapper.B.
At the .class level, this is a requirement per the Java spec. Even the inner classes get broken out into their own class file in the form Outer$Inner.class. I believe the same is true at the language level.
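The Outer$Inner naming can be observed from within Java through Class.getName(). The following is a minimal sketch; the classes Wrapper and WrapperNames are illustrative stand-ins for generated code, not anything from the question.

```java
// Stand-in for a generated wrapper holding several nested classes.
class Wrapper {
    static class A { }
    static class B { }
}

class WrapperNames {
    // Returns the binary name, which is also the stem of the .class file name.
    static String binaryName(Class<?> type) {
        return type.getName();
    }

    public static void main(String[] args) {
        // The binary name joins the outer and nested class names with '$'.
        System.out.println(binaryName(Wrapper.A.class));
        System.out.println(binaryName(Wrapper.B.class));
        // The simple name is what source code uses after the qualifier.
        System.out.println(Wrapper.A.class.getSimpleName());
    }
}
```

So a single source file with nested classes is still compiled to one class file per class; only the source layout changes.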
Your best bet is to generate the files and make your copy script smart. Perhaps generate them and zip them up. Usually, if you have to move these files around then either everyone has the same generator script OR you distribute them as a JAR.
Is there any way I can get around this restriction?
You can change your generated source code to make it acceptable; e.g. by using nested classes, by putting the generated classes into their own package.
It seems like more of a stylistic concern - and something I might be able to disable with a compiler flag.
It is not just a stylistic concern:
The one file per class rule is allowed by the Java Language Specification.
It is implemented by all mainstream Java compilers.
It is implemented by all mainstream JVMs in the form of the default classloader behavior.
It is assumed by 3rd party Java tools; e.g. IDEs, style checkers, bug checkers, code generation frameworks, etc.
In short, while it would theoretically be legal to implement a Java ecosystem that didn't have this restriction, it is impractical. No such compiler switch exists, and implementing one would be impractical for the reasons above.
The nested class solution is a good one. Another alternative would be to put the generated classes into a separate package (but in separate files) to make them easier to manage.

What is the variance of java .class files across different compilers, versions, dependencies?

Hi, I was wondering how much Java class files change across different compilers. How much do the actual bytes change if a .java file is compiled by, say, a Sun JDK 1.4, 1.5 or 1.6, or even an IBM JDK? I know that class files can differ with regard to debug information and obfuscation, but let's assume for the question that those options are the same: debug information included, no obfuscation. If I ran an MD5 or SHA-1 hash on a .class file that was compiled by JDK 1.4, would the hash be different if I compiled it with JDK 1.5 but targeting 1.4? What about when targeting JDK 1.5?
Also related to that, does a binary of a class file change when different dependencies are used, or asked differently, can the binary of a class file change based on its dependencies?
And last but not least are there programmatic ways to analyse the metadata of a .class file in order to identify compiler version and or switches that were used when compiling it ?
The Java compilers have quite some freedom when creating classes and bytecode from source. They can reorder the methods, reorder the constant pool (with class names, method names and strings - this results in different method byte code, too) and reorder the actual byte code commands, as long as the result when executing them is the same.
So, using MD5 or similar hashes to prove that two class files came from the same source is not really sensible.
For the format of class files, see http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html
Yes, class files can and usually do change depending on which specific compiler is used to build them. There are many compiler implementation details which will result in different bytecode -- e.g. listing dependencies in different orders in the interfaces[] or fields[] arrays. Plus compilers are free to use different optimizations.
Adding or removing an "import" statement does not necessarily change the class file -- but using a class in one package instead of another certainly would. Not sure if this answers your 2nd question.
I don't believe compilers leave their identity in class files. Any such analysis would need to be indirect and most likely heuristic (along the lines of telling the author of a book by its style) -- unless you've got the source code and can compile with each compiler and compare.
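One piece of metadata that is recorded is the class file format version, stored in the header right after the 0xCAFEBABE magic number (major version 48 corresponds to Java 1.4, 49 to Java 5, 50 to Java 6). It reflects the target version, not which vendor's compiler produced the file. A minimal sketch for reading it follows; the class name ClassFileVersion is made up for this example.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class ClassFileVersion {
    // Reads the header of a class file and returns its major version number.
    static int majorVersion(InputStream classFile) {
        try {
            DataInputStream in = new DataInputStream(classFile);
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort();        // minor version, ignored here
            return in.readUnsignedShort(); // major version
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes this class was loaded from a location that exposes its own
        // .class file as a resource, which may not hold in every environment.
        try (InputStream self =
                 ClassFileVersion.class.getResourceAsStream("ClassFileVersion.class")) {
            if (self == null) {
                System.out.println("own class file not available as a resource");
                return;
            }
            System.out.println("major version: " + majorVersion(self));
        }
    }
}
```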
Paŭlo has answered your question about hashing well. As for your other question:
Also related to that, does a binary of a class file change when different dependencies are used, or asked differently, can the binary of a class file change based on its dependencies?
Yes. The class file contains signatures for all methods invoked, and these could have changed. Consider:
void test() {
Foo.bar(1,2);
}
where Foo in version 1 is defined by:
class Foo {
public static void bar(int x, int y) {
// do something
}
}
and in version 2 by:
class Foo {
public static <T> T bar(T... ts) {
// do something
}
}
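What changes in the calling class file is the method descriptor stored for the call to bar. As a rough sketch, the two descriptors can be printed via reflection and java.lang.invoke.MethodType. The names FooV1, FooV2 and DescriptorDemo are hypothetical; the two versions of Foo are renamed only so that both fit in one file.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Stand-in for version 1 of Foo.
class FooV1 {
    public static void bar(int x, int y) { }
}

// Stand-in for version 2 of Foo.
class FooV2 {
    @SafeVarargs
    public static <T> T bar(T... ts) { return null; }
}

class DescriptorDemo {
    // Builds the JVM method descriptor that a call site would record for m.
    static String descriptorOf(Method m) {
        return MethodType.methodType(m.getReturnType(), m.getParameterTypes())
                         .toMethodDescriptorString();
    }

    // Looks up the method named bar with the given (erased) parameter types.
    static String descriptorOfBar(Class<?> owner, Class<?>... params) {
        try {
            return descriptorOf(owner.getDeclaredMethod("bar", params));
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(descriptorOfBar(FooV1.class, int.class, int.class)); // (II)V
        // After erasure, T... becomes Object[] and T becomes Object.
        System.out.println(descriptorOfBar(FooV2.class, Object[].class));
    }
}
```

Compiling test() against version 2 also makes the compiler box the two ints and wrap them in an array, so the bytecode of the method body differs as well as the descriptor.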

How to determine which classes are referenced in a compiled .Net or Java application?

I wonder if there's an easy way to determine which classes from a library are "used" by a compiled .NET or Java application, and I need to write a simple utility to do that (so using any of the available decompilers won't do the job).
I don't need to analyze different inputs to figure out if a class is actually created for this or that input set - I'm only concerned whether or not the class is referenced in the application. Most likely the application would subclass from the class I look for and use the subclass.
I've looked through a bunch of .Net .exe's and Java .classes with a hex editor and it appears that the referenced classes are spelled out in plaintext, but I am not sure if it will always be the case - my knowledge of MSIL/Java bytecode is not enough for that. I assume that even though the application itself can be obfuscated, it'll still have to call the library classes by the original name?
Extending what overslacked said.
EDIT: For some reason I thought you asked about methods, not types.
Types
Like finding methods, this doesn't cover access through the Reflection API.
You have to locate the following in a Reflector plugin to identify referenced types and perform a transitive closure:
Method parameters
Method return types
Custom attributes
Base types and interface implementations
Local variable declarations
Evaluated sub-expression types
Field, property, and event types
If you parse the IL yourself, all you have to process from the main assembly is the TypeRef and TypeSpec metadata, which is pretty easy (of course I'm speaking from having parsed the entire byte code here). However, the transitive closure would still require you to process the full byte code of each referenced method in the referenced assembly (to get the subexpression types).
Methods
If you can write a plugin for Reflector to handle the task, it will definitely be the easiest way. Parsing the IL is non-trivial, though I've done it now so I would just use that code if I had to (just saying it's not impossible). :D
Keep in mind that you may have method dependencies you don't see on the first pass that neither method mentioned will catch. These are due to indirect dispatch via the callvirt (virtual and interface method calls) and calli (generally delegates) instructions. For each type T created with newobj and for each method M within the type, you'll have to check all callvirt, ldftn, and ldvirtftn instructions to see if the base definition for the target (if the target is a virtual method) is the same as the base method definition for M in T or M is in the type's interface map if the target is an interface method. This is not perfect, but it is about the best you can do for static analysis without a theorem prover. It is a superset of the actual methods that will be called outside of the Reflection API, and a subset of the full set of methods in the assembly(ies).
For .NET: it looks like there's an article on MSDN that should help you get started. For what it's worth, for .NET the magic Google words are ".net assembly references".
In Java, the best mechanism to find class dependencies (in a programmatic fashion) is through bytecode inspection. This can be done with libraries like BCEL or (preferably) ASM. If you wish to parse the class files with your own code, the class file structure is well documented in the Java VM specification.
Note that class inspection won't cover runtime dependencies (like classes loaded using the service API).
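For the "parse the class files with your own code" route, a rough sketch is shown below. It reads only the constant pool and collects the names held by CONSTANT_Class entries, following the tag layout in the JVM specification. The class name ClassRefs is made up for this example, and this is a simplified illustration rather than a replacement for BCEL or ASM: names mentioned only inside descriptors or signatures are not reported.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ClassRefs {
    // Returns the internal names (e.g. java/lang/Object) of all CONSTANT_Class entries.
    static Set<String> referencedClasses(InputStream classFile) {
        try {
            DataInputStream in = new DataInputStream(classFile);
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort();
            Map<Integer, String> utf8 = new HashMap<>();
            List<Integer> classNameIndexes = new ArrayList<>();
            for (int i = 1; i < count; i++) { // constant pool indexes start at 1
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8.put(i, in.readUTF()); break;                     // Utf8
                    case 7: classNameIndexes.add(in.readUnsignedShort()); break;  // Class
                    case 5: case 6: in.readLong(); i++; break;                    // Long, Double use two slots
                    case 3: case 4: case 9: case 10: case 11: case 12:
                    case 17: case 18: in.readInt(); break;                        // 4-byte entries
                    case 8: case 16: case 19: case 20:
                        in.readUnsignedShort(); break;                            // 2-byte entries
                    case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int index : classNameIndexes) {
                names.add(utf8.get(index)); // array types appear as e.g. [Ljava/lang/String;
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumes this class's own .class file is reachable as a resource.
        try (InputStream self = ClassRefs.class.getResourceAsStream("ClassRefs.class")) {
            if (self != null) {
                System.out.println(referencedClasses(self));
            }
        }
    }
}
```

Because a subclass must name its superclass in the constant pool, this kind of scan does find a library class that the application extends, as long as the library itself was not renamed.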
