When using the same JDK (i.e. the same javac executable), are the generated class files always identical? Can there be a difference depending on the operating system or hardware? Apart from the JDK version, could there be any other factors resulting in differences? Are there any compiler options to avoid differences? Is a difference only possible in theory, or does Oracle's javac actually produce different class files for the same input and compiler options?
Update 1 I'm interested in the generation, i.e. compiler output, not whether a class file can be run on various platforms.
Update 2 By 'Same JDK', I also mean the same javac executable.
Update 3 Distinction between theoretical difference and practical difference in Oracle's compilers.
[EDIT, adding paraphrased question]
"What are the circumstances where the same javac executable,when run on a different platform, will produce different bytecode?"
Let's put it this way:
I can easily produce an entirely conforming Java compiler that never produces the same .class file twice, given the same .java file.
I could do this by tweaking all kinds of bytecode construction or by simply adding superfluous attributes to my method (which is allowed).
Given that the specification does not require the compiler to produce byte-for-byte identical class files, I'd avoid depending on such a result.
However, the few times that I've checked, compiling the same source file with the same compiler with the same switches (and the same libraries!) did result in the same .class files.
Update: I've recently stumbled over this interesting blog post about the implementation of switch on String in Java 7. In this blog post, there are some relevant parts that I'll quote here (emphasis mine):
In order to make the compiler's output predictable and repeatable, the maps and sets used in these data structures are LinkedHashMaps and LinkedHashSets rather than just HashMaps and HashSets. In terms of functional correctness of code generated during a given compile, using HashMap and HashSet would be fine; the iteration order does not matter. However, we find it beneficial to have javac's output not vary based on implementation details of system classes.
This pretty clearly illustrates the issue: The compiler is not required to act in a deterministic manner, as long as it matches the spec. The compiler developers, however, realize that it's generally a good idea to try (provided it's not too expensive, probably).
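As a small illustration of the point (this is not code from javac itself, just a sketch of the difference the developers are talking about): anything derived from iterating a HashMap can change when the JDK's hashing internals change, while a LinkedHashMap always iterates in insertion order.
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrderDemo {
    public static void main(String[] args) {
        Map<String, Integer> hash = new HashMap<>();
        Map<String, Integer> linked = new LinkedHashMap<>();
        // Same insertions into both maps.
        for (String label : new String[] {"case one", "case two", "case three", "case four"}) {
            hash.put(label, label.length());
            linked.put(label, label.length());
        }
        // HashMap iteration order depends on hash codes and table size (implementation details);
        // LinkedHashMap iteration order is always the insertion order.
        System.out.println("HashMap:       " + hash.keySet());
        System.out.println("LinkedHashMap: " + linked.keySet());
    }
}
If the order of generated bytecode (say, the branches of a switch on String) were driven by the first kind of iteration, the output could legally change from one JDK build to the next, which is exactly what the javac developers chose to avoid.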
There is no obligation for compilers to produce the same bytecode on each platform. You would have to consult the different vendors' javac documentation for a specific answer.
I will show a practical example of this involving file ordering.
Let's say that we have 2 jar files: my1.jar and My2.jar. They're put in the lib directory, side by side. The compiler reads them in alphabetical order (since they are in lib), but that order is my1.jar, My2.jar when the file system is case insensitive, and My2.jar, my1.jar if it is case sensitive.
my1.jar has a class A with this method:
public class A {
public static void a(String s) {}
}
My2.jar has the same class A, but with a different method signature (it accepts Object):
public class A {
public static void a(Object o) {}
}
It is clear that if you have a call
String s = "x";
A.a(s);
the call will compile against a different method signature in the two cases. So, depending on your file system's case sensitivity, you will get a different class file as a result.
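If you want to verify which overload was picked, you can disassemble the calling class with javap (here Caller stands for whatever class contains the A.a(s) call); the invokestatic instruction records the exact descriptor that was resolved at compile time. The output below is illustrative, not copied from a real run:
javap -c Caller

  // compiled against the case-insensitive ordering (my1.jar found first):
  invokestatic #2    // Method A.a:(Ljava/lang/String;)V

  // compiled against the case-sensitive ordering (My2.jar found first):
  invokestatic #2    // Method A.a:(Ljava/lang/Object;)V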
Short Answer - NO
Long Answer
The bytecode need not be the same for different platforms. It's the JRE (Java Runtime Environment) that knows exactly how to execute the bytecode.
If you go through the Java VM specification you'll find that it need not be true that the bytecode is the same for different platforms.
The class file format chapter shows the structure of a class file as
ClassFile {
u4 magic;
u2 minor_version;
u2 major_version;
u2 constant_pool_count;
cp_info constant_pool[constant_pool_count-1];
u2 access_flags;
u2 this_class;
u2 super_class;
u2 interfaces_count;
u2 interfaces[interfaces_count];
u2 fields_count;
field_info fields[fields_count];
u2 methods_count;
method_info methods[methods_count];
u2 attributes_count;
attribute_info attributes[attributes_count];
}
Looking at the minor and major version:
minor_version, major_version
The values of the minor_version and major_version items are the minor and major version numbers of this class file. Together, a major and a minor version number determine the version of the class file format. If a class file has major version number M and minor version number m, we denote the version of its class file format as M.m. Thus, class file format versions may be ordered lexicographically, for example, 1.5 < 2.0 < 2.1. A Java virtual machine implementation can support a class file format of version v if and only if v lies in some contiguous range Mi.0 ≤ v ≤ Mj.m. Only Sun can specify what range of versions a Java virtual machine implementation conforming to a certain release level of the Java platform may support.1
Reading further into the footnotes:
1 The Java virtual machine implementation of Sun's JDK release 1.0.2
supports class file format versions 45.0 through 45.3 inclusive. Sun's
JDK releases 1.1.X can support class file formats of versions in the
range 45.0 through 45.65535 inclusive. Implementations of version 1.2
of the Java 2 platform can support class file formats of versions in
the range 45.0 through 46.0 inclusive.
So, investigating all this shows that the class files generated on different platforms need not be identical.
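If you want to poke at that structure yourself, the first few fields are easy to read directly. A minimal sketch (the file name A.class is just a placeholder):
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

public class ClassFileVersion {
    public static void main(String[] args) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream("A.class"))) {
            int magic = in.readInt();           // u4 magic, always 0xCAFEBABE
            int minor = in.readUnsignedShort(); // u2 minor_version
            int major = in.readUnsignedShort(); // u2 major_version
            System.out.printf("magic=%08X, class file format version %d.%d%n", magic, major, minor);
        }
    }
}
For example, a class compiled with -target 1.6 reports major version 50, while one compiled with -target 1.7 reports 51, so the same source can already produce class files that differ in their version bytes.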
Firstly, there's absolutely no such guarantee in the spec. A conforming compiler could stamp the time of compilation into the generated class file as an additional (custom) attribute, and the class file would still be correct. It would however produce a byte-level different file on every single build, and trivially so.
Secondly, even without such nasty tricks about, there's no reason to expect a compiler to do exactly the same thing twice in a row unless both its configuration and its input are identical in the two cases. The spec does describe the source filename as one of the standard attributes, and adding blank lines to the source file could well change the line number table.
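A concrete (and harmless) example of that second point, assuming the default javac settings that include line number debug info: the two sources below compile to the same instructions, but their LineNumberTable attributes differ because the println sits on a different line, and javap -l makes that difference visible.
// Version 1
public class Hello {
    public static void main(String[] args) {
        System.out.println("hi");   // source line 3
    }
}

// Version 2: one blank line added after the class header
public class Hello {

    public static void main(String[] args) {
        System.out.println("hi");   // now source line 4, so the LineNumberTable entry changes
    }
}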
Thirdly, I've never encountered any difference in build due to the host platform (other than that which was attributable to differences in what was on the classpath). The code which would vary based on platform (i.e., native code libraries) isn't part of the class file, and the actual generation of native code from the bytecode happens after the class is loaded.
Fourthly (and most importantly) it reeks of a bad process smell (like a code smell, but for how you act on the code) to want to know this. Version the source if possible, not the build, and if you do need to version the build, version at the whole-component level and not on individual class files. For preference, use a CI server (such as Jenkins) to manage the process of turning source into runnable code.
I believe that, if you use the same JDK, the generated byte code will always be the same, regardless of the hardware and OS used. The byte code is produced by the Java compiler, which uses a deterministic algorithm to "transform" the source code into byte code. So, the output will always be the same. Under these conditions, only an update of the source code will affect the output.
Overall, I'd have to say there is no guarantee that the same source will produce the same bytecode when compiled by the same compiler but on a different platform.
I'd look into scenarios involving different languages (code-pages), for example Windows with Japanese language support. Think multi-byte characters; unless the compiler always assumes it needs to support all languages it might optimize for 8-bit ASCII.
There is a section on binary compatibility in the Java Language Specification.
Within the framework of Release-to-Release Binary Compatibility in SOM
(Forman, Conner, Danforth, and Raper, Proceedings of OOPSLA '95), Java
programming language binaries are binary compatible under all relevant
transformations that the authors identify (with some caveats with
respect to the addition of instance variables). Using their scheme,
here is a list of some important binary compatible changes that the
Java programming language supports:
•Reimplementing existing methods, constructors, and initializers to
improve performance.
•Changing methods or constructors to return values on inputs for which
they previously either threw exceptions that normally should not occur
or failed by going into an infinite loop or causing a deadlock.
•Adding new fields, methods, or constructors to an existing class or
interface.
•Deleting private fields, methods, or constructors of a class.
•When an entire package is updated, deleting default (package-only)
access fields, methods, or constructors of classes and interfaces in
the package.
•Reordering the fields, methods, or constructors in an existing type
declaration.
•Moving a method upward in the class hierarchy.
•Reordering the list of direct superinterfaces of a class or
interface.
•Inserting new class or interface types in the type hierarchy.
This chapter specifies minimum standards for binary compatibility
guaranteed by all implementations. The Java programming language
guarantees compatibility when binaries of classes and interfaces are
mixed that are not known to be from compatible sources, but whose
sources have been modified in the compatible ways described here. Note
that we are discussing compatibility between releases of an
application. A discussion of compatibility among releases of the Java
SE platform is beyond the scope of this chapter.
Java allows you to write and compile code on one platform and run it on a different platform.
AFAIK, this is possible only because the class file generated on different platforms is the same, or technically the same, i.e. equivalent.
Edit
What I mean by the "technically the same" comment is that the class files don't need to be exactly the same if you compare them byte by byte.
So, as per the specification, the .class files of a class compiled on different platforms don't need to match byte-by-byte.
For the question:
"What are the circumstances where the same javac executable,when run on a different platform, will produce different bytecode?"
The cross-compilation example shows how we can use the javac option -target version.
This flag generates class files which are compatible with the Java version we specify when invoking the command. Hence the class files will differ depending on the attributes we supply during compilation using this option.
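For example (the class name and the bootclasspath location are just placeholders), compiling the same source for two different targets produces class files with different major version numbers, so they cannot be byte-for-byte identical:
javac -source 1.6 -target 1.6 -bootclasspath /path/to/jdk6/rt.jar Main.java
javac -source 1.7 -target 1.7 Main.java
The first command emits class files with format version 50 (Java 6), the second with version 51 (Java 7).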
Most probably, the answer is "yes", but to have a precise answer, one would need to check whether any keys or GUIDs are generated during compilation.
I can't remember a situation where this occurs. For example, the ID used for serialization purposes (serialVersionUID) is hardcoded, i.e. generated by the programmer or the IDE.
P.S. JNI can also matter.
P.P.S. I found that javac is itself written in Java. This means that it behaves identically on different platforms, and hence would not generate different code without a reason. So, it could only do so through native calls.
There are two questions.
Can there be a difference depending on the operating system or hardware?
This is a theoretical question, and the answer is clearly, yes, there can be. As others have said, the specification does not require the compiler to produce byte-for-byte identical class files.
Even if every compiler currently in existence produced the same byte code in all circumstances (different hardware, etc.), the answer tomorrow might be different. If you never plan to update javac or your operating system, you could test that version's behavior in your particular circumstances, but the results might be different if you go from, for example, Java 7 Update 11 to Java 7 Update 15.
What are the circumstances where the same javac executable, when run on a different platform, will produce different bytecode?
That's unknowable.
I don't know if configuration management is your reason for asking the question, but it's an understandable reason to care. Comparing byte codes is a legitimate IT control, but only to determine if the class files changed, not to determine if the source files did.
I would put it another way.
First, I think the question is not really about being deterministic:
Of course it is deterministic: randomness is hard to achieve in computer science, and there is no reason a compiler would introduce it here.
Second, if you reformulate it as "how similar are the bytecode files produced from the same source file?", then no, you can't rely on them being similar.
A good way of convincing yourself of this is to leave the .class files (or .pyc in my case) in your git staging area. You'll notice that, across different computers in your team, git reports changes between .pyc files even when no changes were made to the .py file (and the .pyc was simply recompiled).
At least that's what I observed. So put *.pyc and *.class in your .gitignore!
Related
I have a problem with an old application which runs on a Java Tomcat server. The source code for the application is not fully available, but the .class files are obviously all running on the Tomcat server.
Can I somehow manipulate the bytecode of a .class file (used by the JVM) so that I can change a variable's datatype (because this is what has to be done)? Or even reverse engineer it back to its old .java source code?
I have used decompilers and the javap command up to now. Can I somehow copy the whole Tomcat application and:
decompile it
do my changes
recompile it?
Well, if you decompile it to make changes and recompile, then you're not going to need to change the byte code directly.
If you change the type, you'll have to change the type of any methods (like getters and setters) that use the variable. Then you'll need to change the calls of any methods in all classes that CALL those methods, and the types of their variables that hold these values, etc. The good news is that, if you manage to decompile it successfully, your IDE will tell you where all those places are, assuming the new type is incompatible with the old type.
I would evaluate this as "theoretically possible", but problematic. With the little information you've given us, there's no way to know the size of the job AFTER you successfully decompile the entire application.
I have a wild and crazy idea; I haven't done this, but as long as we're talking about things that are theoretically possible...
IF you manage to decompile all the code and get a system that you can recompile and run (and I strongly recommend you do that before you make any changes), and if you are able to identify where the int is that you want to replace with a long, along with all the direct and indirect references to it, then hopefully (because it's just this file size limit that you mention elsewhere) you'll end up with only a handful of classes to change.
The decompile should tell you their names. Create new classes with the exact same names, containing (of course) all their decompiled code. Change the methods that you need to change.
Now put those classes in a jar that is searched before the jar containing the application. You're limiting the number of classes for which you're providing new .class files to just those. This makes it easier to see exactly what has been changed, for future programmers, if it doesn't do anything else. It's possible because of the way Java handles its runtime; the step that's equivalent to 'linking' in a traditional compiled non-virtual-machine language happens when the class is loaded, instead of at compile time.
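The "searched before" part is just classpath ordering; a sketch of how you might launch it (all names here are hypothetical):
java -cp patched-classes.jar:original-app.jar com.example.Main
The class loader returns the first matching .class it finds on the classpath, so the repaired classes in patched-classes.jar shadow the originals (on Windows the separator is ; instead of :). The same idea applies to whatever classpath your server constructs.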
I did that. Not exactly that, but something very similar. Instead of decompiling and recompiling, which is long and tedious, I directly edited the byte-code of the class file.
pros
You do not need to compile anything at all, you just edit a file
no SDK, no IDE, etc is necessary, just a java-byte code editor
for small changes you can get away with single method modification
cons
very, very error-prone, even if you know what you are doing
no way to track changes as you do with git
will probably require modifying all dependent classes
you should have some knowledge of what compiled code looks like, and how it behaves, before even attempting such a thing
you will most likely break a law or two since this will not be for "educational" purposes
you will be marked as "the hacker" and every odd job will be forwarded to you
PS: I had to edit the licensing class of a product to allow more users. The company writing it had ceased to exist, so buying was not an option. We switched to a new product anyway; this was only temporary.
Is there any limit on number of Java classes(Both Public and Non Public) that can be put in a Java Package?
tl;dr: the value is implementation-dependent since it's not defined anywhere in the JVM or Java spec.
Practically speaking, it could be somewhere between maxUint16 and maxUint64.
There are many limits due to the nature of the JVM, but the number of classes and packages is not one of them.
I find this sentence from a Java spec worthwhile in this context (source 7.2):
For example, a system that uses a database to store packages may not enforce a maximum of one public class or interface per compilation unit.
It does encourage JVM implementation to avoid such a limit even if it uses something other than a file system for storing compilation artifacts like packages and classes.
As noted in one of the comments to your question, there can be technical limits, but they're probably high enough for most use cases. It can also vary a lot depending on the implementation you're using. That being said, you probably can have millions of packages and classes without any issues.
For completeness, and so it is easy to find, I'll inline a @Holger comment here:
There is no limit in the specification, so you can put classes into the package until hitting a technical limitation. If not hitting a limit at the file system or archive format, the runtime implementation likely uses arrays or collections to hold the classes, which limits the number to something close to 2³¹. You may hit memory limitations before…
There are known compatibility issues with JDK7 compiled code using instrumentation.
As per http://www.oracle.com/technetwork/java/javase/compatibility-417013.html
Classfiles with version number 51 are exclusively verified using the type-checking verifier, and thus the methods must have StackMapTable attributes when appropriate. For classfiles with version 50, the Hotspot JVM would (and continues to) failover to the type-inferencing verifier if the stackmaps in the file were missing or incorrect. This failover behavior does not occur for classfiles with version 51 (the default version for Java SE 7).
Any tool that modifies bytecode in a version 51 classfile must be sure to update the stackmap information to be consistent with the bytecode in order to pass verification.
The solution is to use -XX:-UseSplitVerifier as summarised here:
https://community.oracle.com/blogs/fabriziogiudici/2012/05/07/understanding-subtle-new-behaviours-jdk-7
How safe it is? I suppose Oracle has put this check in for a reason. If I don't use it, I may be risking some other issues.
What can be consequences of using -XX:-UseSplitVerifier?
Thanks,
Piotr.
In short, it's perfectly safe.
Since Java 6, Oracle's compiler has made class files with a StackMapTable. The basic idea is that the compiler can explicitly specify what the type of an object is, instead of making the runtime do it. That provides a tiny speedup in the runtime, in exchange for some extra time during compile and some complexity in the compiled class file (the aforementioned StackMapTable).
As an experimental feature, it was not enabled by default in the Java 6 compiler. The runtime defaults to verifying the object types itself if no StackMapTable exists.
That changed in Java 7: Oracle made it mandatory. The compiler generates them, and the runtime verifies them. It still uses the old verifier if the StackMapTable isn't there... but only on class files from Java 6 or earlier (version 50). Java 7 class files (version 51) are required to use the StackMapTable, and so the runtime won't cut them the same slack.
That's only a problem if your classfiles were generated without a StackMapTable. For instance, if you used a non-Oracle JVM. Or if you messed with bytecode afterwards -- like instrumenting it for use with a debugger, optimizer, or code coverage analyzer.
But you can get around it! Oracle's JVM provides the -XX:-UseSplitVerifier flag to force the runtime to fall back to the old type verifier, which doesn't care about the StackMapTable.
In practice, the hoped-for optimization in runtime speed and efficiency hasn't materialized: if it exists, it hasn't been enough for anyone to notice. As the new type verifier doesn't provide any new features (just the optimization), it's perfectly safe to shut it off.
Oracle's explanation is at http://www.oracle.com/technetwork/java/javase/compatibility-417013.html if you search for JSR 202.
Yes -- it's safe. As Judebert says, it just slows class loading slightly.
To add a little more info: What exactly is a StackMap Table? Well, the Bytecode verifier needs to make two passes over the code in the class file to validate proper types of data are being passed around and used. The first pass, which is the slower one, does flow analysis of all the code's branches to see what type of data could be on the stack at each bytecode instruction. The second pass looks at each instruction to see if it can validly operate on all those types.
Here's the key: the compiler already has all the information at hand that the first pass generates - so (in Java 6 & 7) it stores it in a StackMap table in the class file.
This speeds up class loading because the class loader doesn't have to do that first pass. That's why it's called a Split Verifier, because the work is split between the compiler and the runtime loading mechanism. When you use the -XX:-UseSplitVerifier option, you tell Java to do both passes at class load time (and to ignore any StackMap table). Many products (like profilers that modify bytecode at load time) did not know about the StackMap table initially, so when they modified classes at load time, the StackMap table from the compiler was out of date and caused errors.
SO, to summarize, the -XX:-UseSplitVerifier option slows class loading. It does not affect security, runtime performance or functionality.
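In practice that just means adding the flag to the launch command; a sketch (the agent jar stands in for whatever tool is rewriting bytecode at load time):
java -XX:-UseSplitVerifier -javaagent:some-instrumentation-agent.jar -jar myapp.jar
With the flag present, classes whose StackMapTable was invalidated by the instrumentation are still verified by the older type-inferencing verifier instead of being rejected.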
Stack map frames became mandatory in Java 7, and "prashant" argues that the idea is flawed and proposes that developers always use the -XX:-UseSplitVerifier flag to avoid them.
Read more: Java 7 Bytecode Verifier: Huge backward step for the JVM
Can I remove any implicitly imported Java library?
It may not seem useful, but I think it might reduce execution time a bit!
Imports are just syntactic sugar. All they do is let you access things in other packages without having to state their fully qualified name. The code that is produced is exactly the same as if you fully-qualified everything. So there is no runtime performance penalty to having imports.
This also goes for the "implicit imports" (ie: java.lang): you don't pay any price for the classes you don't actually use.
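You can check this yourself: the two classes below (in two separate source files; the names are just for illustration) differ only in how the type name is written, and their make() methods compile to exactly the same instructions, which you can confirm with javap -c.
import java.util.ArrayList;

public class WithImport {
    Object make() {
        return new ArrayList<String>();   // resolved via the import
    }
}

public class WithoutImport {
    Object make() {
        return new java.util.ArrayList<String>();   // fully qualified, no import needed
    }
}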
This will have no effect on execution time - I think I'm correct in saying that, by default, classes are only loaded as and when they are needed, not en masse at start-up.
To improve performance you need to profile your application with a tool like Visual VM and address the bottlenecks it identifies (which will never be where you'd expect).
Java doesn't include all of the classes in java.lang.* in your program. The compiler only includes the ones you explicitly use (or are used by classes you use, etc.).
I have an app that was compiled with the built-in Eclipse "Compile" task. Then I decided to move the build procedure into Ant's javac, and the result ended being smaller files.
Later I discovered that adjusting the debuglevel to "vars,lines,source" I could embed the same debug information that Eclipse did, and in a lot of cases files stayed exactly the same size but the internal layout was different. And as a consequence, I couldn't use md5sum signatures to determine if they were exactly the same version.
Besides debug information, what can be the reason that 2 supposedly equal files get a different internal layout or size?
And how can you compare compiled .class files?
There is no required order for things such as the constant pool entries (essentially all of the symbol info) or the attributes for each field/method/class. Different compilers are free to write them out in whatever order they want.
You can compare compiled classes, but you would need to dig into the class file structure and parse it. There are libraries out there for doing that, like BCEL or ASM, but I am not 100% sure they will help you with what you want to do.
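As a sketch of that approach (assuming the ASM and asm-util jars are on the classpath), ASM can turn a class file into a textual dump with the debug attributes stripped, which is far easier to diff than raw bytes:
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.util.TraceClassVisitor;

public class ClassDump {
    public static void main(String[] args) throws Exception {
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        // SKIP_DEBUG drops line numbers, local variable names and source file info,
        // so purely cosmetic differences (e.g. the debuglevel setting) don't show up in a diff.
        new ClassReader(bytes).accept(
                new TraceClassVisitor(new PrintWriter(System.out)),
                ClassReader.SKIP_DEBUG);
    }
}
Run it on the Eclipse-compiled and the Ant-compiled version of the same class and diff the two dumps: differences that come only from constant pool ordering disappear from the text, while real differences in the emitted instructions remain visible.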
The ASM Eclipse plugin has a bytecode comparer in it. You select two classes, right click, and do Compare With / Each Other Bytecode.
An important thing to note is that Eclipse does not use javac. Eclipse has its own compiler, the JDT, so differences in the resulting .class files do not surprise me. I wouldn't expect them to be byte-for-byte identical, because they are different compilers.
Due to their differences, there exists code that compiles with javac but not with JDT, and vice versa. Typically I have seen the differences between the two become apparent in cases of heavy use of generics.
Most importantly, the stack slots for local variables can be arranged arbitrarily without changing the semantics of the code. So basically, you cannot compare compiled class files without parsing and normalizing them - quite a lot of effort.
Why do you want to do that anyway?
As Michale B said, it can be arbitrary.
I work on systems that are using file sizes as security. If the .class files change in size, the class won't be given certain permissions.
Normally that would be easy to get around, but we have fairly complete control over the environment, so it's actually pretty functional.
Anyway, any time the classes that are watched are recompiled, it seems, we have to recalculate the size.
Another thing: a special key number is generated when the file is compiled. I don't know much about this, but it often prevents classes from working together. I believe the procedure is: compile class A and save it (call it a1); compile class A again (a2); compile class B against a2; then try to run B against a1. I believe that in this case it will fail at runtime.
If you could learn more about that key number, it might give you the info you are after.
For the comparison you can decompile your class files and play with the generated sources. See this.
Is Eclipse doing some instrumentation to assist with running in the debugger?
Ultimately the configurations being used are probably making the difference. Assuming they are using the same versions of Java, there are a host of options that are available for the compile configuration (JDK compliance, class file compatibility and a host of debugging information options).