I have a binary class which I want to load, but I don't have its dependencies.
I still want to load it, though, to get its qualified name.
I understand that I will not be able to use it for anything else (and that is OK); I just need the qualified name.
So is there a way to do this with a custom class loader?
Thanks,
As for strictly answering the question: see the section on symbolic reference resolution in the JVM specification. In short: little is guaranteed as to when resolution is performed. I'm not sure what the precise behaviour of current implementations is, but if you go this way, the solution will not be completely reliable, even though it will probably work.
You said you only need its qualified name, though. That's a different story, and it's pretty easy, in fact. The JVM specification describes the format of the class file in detail. Since you have the binary, you can extract the name directly from the data, bypassing the classloading mechanism of the JVM completely. If you don't want to do it by hand, use an appropriate tool: ASM comes to mind, with its wonderful, detailed documentation. Alternatives include BCEL, Javassist, and CGLIB (no longer maintained).
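For illustration, a minimal sketch using ASM's ClassReader (it assumes the ASM library is on the classpath; the file path is hypothetical):

import java.nio.file.Files;
import java.nio.file.Paths;
import org.objectweb.asm.ClassReader;

public class ClassNameExtractor {
    public static void main(String[] args) throws Exception {
        // Read the raw bytes; nothing is loaded into the JVM, so
        // missing dependencies don't matter.
        byte[] bytes = Files.readAllBytes(Paths.get("path/to/Mystery.class"));
        ClassReader reader = new ClassReader(bytes);
        // getClassName() returns the internal name, e.g. "com/example/Foo";
        // convert it to a qualified name by replacing '/' with '.'.
        System.out.println(reader.getClassName().replace('/', '.'));
    }
}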
When using the same JDK (i.e. the same javac executable), are the generated class files always identical? Can there be a difference depending on the operating system or hardware? Apart from the JDK version, could any other factors result in differences? Are there any compiler options to avoid differences? Is a difference only possible in theory, or does Oracle's javac actually produce different class files for the same input and compiler options?
Update 1 I'm interested in the generation, i.e. compiler output, not whether a class file can be run on various platforms.
Update 2 By 'Same JDK', I also mean the same javac executable.
Update 3 Distinction between theoretical difference and practical difference in Oracle's compilers.
[EDIT, adding paraphrased question]
"What are the circumstances where the same javac executable,when run on a different platform, will produce different bytecode?"
Let's put it this way:
I can easily produce an entirely conforming Java compiler that never produces the same .class file twice, given the same .java file.
I could do this by tweaking all kinds of bytecode construction or by simply adding superfluous attributes to my method (which is allowed).
Given that the specification does not require the compiler to produce byte-for-byte identical class files, I'd avoid depending on such a result.
However, the few times that I've checked, compiling the same source file with the same compiler with the same switches (and the same libraries!) did result in the same .class files.
Update: I've recently stumbled over this interesting blog post about the implementation of switch on String in Java 7. In this blog post, there are some relevant parts, that I'll quote here (emphasis mine):
In order to make the compiler's output predictable and repeatable, the maps and sets used in these data structures are LinkedHashMaps and LinkedHashSets rather than just HashMaps and HashSets. In terms of functional correctness of code generated during a given compile, using HashMap and HashSet would be fine; the iteration order does not matter. However, we find it beneficial to have javac's output not vary based on implementation details of system classes.
This pretty clearly illustrates the issue: The compiler is not required to act in a deterministic manner, as long as it matches the spec. The compiler developers, however, realize that it's generally a good idea to try (provided it's not too expensive, probably).
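To make the quoted point concrete, here is a small illustrative sketch (not javac's actual code) showing the difference in iteration order:

import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrderDemo {
    public static void main(String[] args) {
        // HashMap iteration order depends on hash codes and internal
        // implementation details, which may change between JDK versions.
        Map<String, Integer> hash = new HashMap<>();
        // LinkedHashMap iterates in insertion order, which is stable.
        Map<String, Integer> linked = new LinkedHashMap<>();
        for (String key : new String[] {"banana", "apple", "cherry"}) {
            hash.put(key, key.length());
            linked.put(key, key.length());
        }
        System.out.println("HashMap:       " + hash.keySet());   // order unspecified
        System.out.println("LinkedHashMap: " + linked.keySet()); // [banana, apple, cherry]
    }
}

A compiler that iterates over a HashMap while emitting a lookup table could produce differently ordered (but equally correct) bytecode on different JDKs; LinkedHashMap removes that source of variation.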
There is no obligation for compilers to produce the same bytecode on each platform. You would have to consult the different vendors' javac documentation for a specific answer.
I will show a practical example of this, involving file ordering.
Let's say that we have two jar files: my1.jar and My2.jar. They're put in the lib directory, side by side. The compiler reads them in directory order; that order is my1.jar, My2.jar when the file system is case insensitive, but My2.jar, my1.jar when it is case sensitive (since 'M' sorts before 'm' in ASCII).
my1.jar contains a class A with this method:
public class A {
    public static void a(String s) {}
}
My2.jar has the same A.class, but with a different method signature (it accepts Object):
public class A {
    public static void a(Object o) {}
}
It is clear that if you have a call
String s = "x";
A.a(s);
it will compile to a method call with a different signature in each case. So, depending on your file system's case sensitivity, you will get a different class file as a result.
Short Answer - NO
Long Answer
The bytecode need not be the same across platforms. It's the JRE (Java Runtime Environment) that knows how exactly to execute the bytecode.
If you go through the Java VM specification, you'll see that the bytecode is not required to be the same across platforms.
The class file format section shows the structure of a class file as:
ClassFile {
    u4 magic;
    u2 minor_version;
    u2 major_version;
    u2 constant_pool_count;
    cp_info constant_pool[constant_pool_count-1];
    u2 access_flags;
    u2 this_class;
    u2 super_class;
    u2 interfaces_count;
    u2 interfaces[interfaces_count];
    u2 fields_count;
    field_info fields[fields_count];
    u2 methods_count;
    method_info methods[methods_count];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}
Looking at the minor_version and major_version fields:
minor_version, major_version
The values of the minor_version and major_version items are the minor and major version numbers of this class file. Together, a major and a minor version number determine the version of the class file format. If a class file has major version number M and minor version number m, we denote the version of its class file format as M.m. Thus, class file format versions may be ordered lexicographically, for example, 1.5 < 2.0 < 2.1. A Java virtual machine implementation can support a class file format of version v if and only if v lies in some contiguous range Mi.0 ≤ v ≤ Mj.m. Only Sun can specify what range of versions a Java virtual machine implementation conforming to a certain release level of the Java platform may support.[1]
Reading further into the footnotes:
[1] The Java virtual machine implementation of Sun's JDK release 1.0.2 supports class file format versions 45.0 through 45.3 inclusive. Sun's JDK releases 1.1.X can support class file formats of versions in the range 45.0 through 45.65535 inclusive. Implementations of version 1.2 of the Java 2 platform can support class file formats of versions in the range 45.0 through 46.0 inclusive.
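As a small illustration, those header fields can be read directly from any .class file:

import java.io.DataInputStream;
import java.io.FileInputStream;

public class ClassVersionReader {
    public static void main(String[] args) throws Exception {
        // args[0] is the path to a .class file
        try (DataInputStream in = new DataInputStream(new FileInputStream(args[0]))) {
            int magic = in.readInt();           // always 0xCAFEBABE
            int minor = in.readUnsignedShort(); // minor_version
            int major = in.readUnsignedShort(); // major_version, e.g. 52 for Java 8
            System.out.printf("magic=%08X version=%d.%d%n", magic, major, minor);
        }
    }
}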
So, investigating all this shows that the class files generated on different platforms need not be identical.
Firstly, there's absolutely no such guarantee in the spec. A conforming compiler could stamp the time of compilation into the generated class file as an additional (custom) attribute, and the class file would still be correct. It would however produce a byte-level different file on every single build, and trivially so.
Secondly, even without such nasty tricks about, there's no reason to expect a compiler to do exactly the same thing twice in a row unless both its configuration and its input are identical in the two cases. The spec does describe the source filename as one of the standard attributes, and adding blank lines to the source file could well change the line number table.
Thirdly, I've never encountered any difference in build due to the host platform (other than that which was attributable to differences in what was on the classpath). The code which would vary based on platform (i.e., native code libraries) isn't part of the class file, and the actual generation of native code from the bytecode happens after the class is loaded.
Fourthly (and most importantly) it reeks of a bad process smell (like a code smell, but for how you act on the code) to want to know this. Version the source if possible, not the build, and if you do need to version the build, version at the whole-component level and not on individual class files. For preference, use a CI server (such as Jenkins) to manage the process of turning source into runnable code.
I believe that, if you use the same JDK, the generated bytecode will always be the same, regardless of the hardware and OS used. Bytecode production is done by the Java compiler, which uses a deterministic algorithm to "transform" the source code into bytecode, so the output will always be the same. Under these conditions, only an update to the source code will affect the output.
Overall, I'd have to say there is no guarantee that the same source will produce the same bytecode when compiled by the same compiler but on a different platform.
I'd look into scenarios involving different languages (code-pages), for example Windows with Japanese language support. Think multi-byte characters; unless the compiler always assumes it needs to support all languages it might optimize for 8-bit ASCII.
There is a section on binary compatibility in the Java Language Specification.
Within the framework of Release-to-Release Binary Compatibility in SOM
(Forman, Conner, Danforth, and Raper, Proceedings of OOPSLA '95), Java
programming language binaries are binary compatible under all relevant
transformations that the authors identify (with some caveats with
respect to the addition of instance variables). Using their scheme,
here is a list of some important binary compatible changes that the
Java programming language supports:
•Reimplementing existing methods, constructors, and initializers to
improve performance.
•Changing methods or constructors to return values on inputs for which
they previously either threw exceptions that normally should not occur
or failed by going into an infinite loop or causing a deadlock.
•Adding new fields, methods, or constructors to an existing class or
interface.
•Deleting private fields, methods, or constructors of a class.
•When an entire package is updated, deleting default (package-only)
access fields, methods, or constructors of classes and interfaces in
the package.
•Reordering the fields, methods, or constructors in an existing type
declaration.
•Moving a method upward in the class hierarchy.
•Reordering the list of direct superinterfaces of a class or
interface.
•Inserting new class or interface types in the type hierarchy.
This chapter specifies minimum standards for binary compatibility
guaranteed by all implementations. The Java programming language
guarantees compatibility when binaries of classes and interfaces are
mixed that are not known to be from compatible sources, but whose
sources have been modified in the compatible ways described here. Note
that we are discussing compatibility between releases of an
application. A discussion of compatibility among releases of the Java
SE platform is beyond the scope of this chapter.
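For example, adding a method is a binary compatible change. A minimal sketch (class and file names are hypothetical):

// Greeter.java, version 1 (what the client below was compiled against):
//     public class Greeter {
//         public String greet() { return "hello"; }
//     }

// Greeter.java, version 2: the added overload is binary compatible, so a
// Client.class compiled against version 1 still links and runs unchanged.
public class Greeter {
    public String greet() { return "hello"; }
    public String greet(String name) { return "hello " + name; }
}

// A client compiled against version 1 and NOT recompiled:
class Client {
    public static void main(String[] args) {
        System.out.println(new Greeter().greet()); // resolved at run time
    }
}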
Java allows you to write and compile code on one platform and run it on a different platform.
AFAIK, this is possible only when the class file generated on different platforms is the same, or at least technically the same, i.e. equivalent.
Edit
What I mean by "technically the same" is that the files don't need to be exactly the same when compared byte by byte.
So, as per the specification, the .class files of a class produced on different platforms don't need to match byte for byte.
For the question:
"What are the circumstances where the same javac executable,when run on a different platform, will produce different bytecode?"
The Cross-Compilation example shows how we can use the Javac option:-target version
This flag generates class files which are compatible with the Java version we specify while invoking this command. Hence the class files will differ depending on the attributes we supply during the compaliation using this option.
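For example (Example.java is hypothetical; the exact flags accepted depend on the javac release):

# Targeting different versions changes the major_version field
# in the generated .class file:
javac -source 1.6 -target 1.6 Example.java   # major_version 50
javac -source 1.8 -target 1.8 Example.java   # major_version 52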
Most probably, the answer is "yes", but to have a precise answer, one would need to check whether any keys or GUIDs are generated during compilation.
I can't remember a situation where this occurs. For example, an ID for serialization purposes is hardcoded, i.e. generated by the programmer or the IDE.
P.S. JNI can also matter.
P.P.S. I found that javac is itself written in Java. This means that it runs identically on different platforms, so it would not generate different code without a reason. Hence, it could only do so through native calls.
There are two questions.
Can there be a difference depending on the operating system or hardware?
This is a theoretical question, and the answer is clearly, yes, there can be. As others have said, the specification does not require the compiler to produce byte-for-byte identical class files.
Even if every compiler currently in existence produced the same byte code in all circumstances (different hardware, etc.), the answer tomorrow might be different. If you never plan to update javac or your operating system, you could test that version's behavior in your particular circumstances, but the results might be different if you go from, for example, Java 7 Update 11 to Java 7 Update 15.
What are the circumstances where the same javac executable, when run on a different platform, will produce different bytecode?
That's unknowable.
I don't know if configuration management is your reason for asking the question, but it's an understandable reason to care. Comparing bytecode is a legitimate IT control, but only to determine whether the class files changed, not to determine whether the source files did.
I would put it another way.
First, I think the question is not about being deterministic:
Of course it is deterministic: randomness is hard to achieve in computer science, and there is no reason a compiler would introduce it here.
Second, if you reformulate the question as "how similar are the bytecode files produced from the same source code file?", then no, you can't rely on them being similar.
A good way of verifying this is to leave the .class files (or .pyc in my case) in your Git stage. You'll notice that, across the different computers on your team, Git reports changes between .pyc files even when no changes were made to the .py file (the .pyc was simply recompiled).
At least that's what I observed. So put *.pyc and *.class in your .gitignore!
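That is, something along these lines:

# .gitignore: keep compiled artifacts out of version control
*.class
*.pyc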
I'm new to Java. I've discovered, while trying to structure my code, that Java intimately ties source file organisation (directory structure) to package structure and package structure to external visibility of classes (a class is either visible to all other packages, or none).
This makes it quite difficult to organise the internal implementation details of my public library into logical units of related functionality while maintaining good encapsulation. JSR 294 explains it best:
Today, an implementation can be partitioned into multiple packages.
Subparts of such an implementation need to be more tightly coupled to
each other than to the surrounding software environment. Today
designers are forced to declare elements of the program that are
needed by other subparts of the implementation as public - thereby
making them globally accessible, which is clearly suboptimal.
Alternately, the entire implementation can be placed in a single
package. This resolves the issue above, but is unwieldy, and exposes
all internals of all subparts to each other.
So my question is: what workarounds exist for this limitation, and what are the pros & cons? Two are mentioned in the JSR: use packages for logical grouping (violating encapsulation), or place everything in a single package (unwieldy). Are there other pros/cons to these workarounds? Are there other solutions? (I've become vaguely aware of OSGi bundles, but I've found it hard to understand how they work and what the pros/cons might be; perhaps that's a con. Compared to vanilla packages, OSGi appears to be very intrusive for both development and deployment.)
Note: I'll upvote any good answers, but the best answer will be one that comprehensively folds in the pros & cons of others (plagiarise!).
Related (but not duplicate!) questions
Anticipating cries of 'Possible duplicate', here are similar questions that I've found on SO; I present them here for reference and also to explain why they don't answer my question.
Java : Expose only a single package in a jar file: asks how to do it, but given that it's not possible in current releases of Java, doesn't discuss workarounds. Has interesting pointers to forthcoming Modularization (Project Jigsaw) in Java 8.
Package and visibility - duplicate question of the above, basically.
Best practice for controlling access to a ".internal" package - question and answers seem to be specific to OSGi or Eclipse plug-ins.
Tools like ProGuard can be used to repackage a JAR, exposing only those classes you specify in the configuration file. (It does this in addition to optimizing, inlining, and obfuscating.) You might be able to set up ProGuard in e.g. a Maven or Ant build, so you write your library exposing methods as public, and then use ProGuard to eliminate them from the generated JAR.
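A sketch of such a configuration (the jar and package names are hypothetical, and <java.home>/lib/rt.jar applies to Java 8 and earlier):

-injars  mylib.jar
-outjars mylib-public.jar
-libraryjars <java.home>/lib/rt.jar

# Keep only the intended public API; everything else can be
# removed or renamed even if it was declared public.
-keep public class com.example.mylib.api.** {
    public *;
}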
I'll get the ball rolling. Steal this answer and add to it/correct it/elaborate please!
Use multiple packages for multiple logical groupings
Pros: effective logical grouping of related code.
Cons: when internal implementation detail classes in different packages need to use one another, they must be made public - even to the end user - violating encapsulation. (Work around this by using a standard naming convention for packages containing internal implementation details such as .internal or .impl).
Put everything in one package
Pros: effective encapsulation
Cons: unwieldy for development/maintenance of the library if it contains many classes
Use OSGi bundles
Pros: ? (do they fix the problem?)
Cons: appears to be very intrusive at development (for both library user and author) and deployment, compared to just deploying .jar files.
Wait for Jigsaw in Java 8
http://openjdk.java.net/projects/jigsaw/
Pros: fixes the problem for good?
Cons: doesn't exist yet; no specific release date is known.
I've never found this to be a problem. The workaround (if you want to call it that) is called good API design.
If you design your library well, then you can almost always do the following:
Put the main public API in one package e.g. "my.package.core" or just "my.package"
Put helper modules in other packages (according to logical groupings), but give each one its own public API subset (e.g. a factory class like "my.package.foobarimpl.FoobarFactory"; see the sketch after this list)
The main public API package uses only the public API of helper modules
Your tests should also run primarily against the public APIs (since this is what you care about in terms of regressions or functionality)
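A minimal sketch of that layout (all names hypothetical; "my.package" is replaced with a legal package name, since package is a reserved word):

// mylib/foobarimpl/Foobar.java - the helper module's public interface
package mylib.foobarimpl;

public interface Foobar {
    String process(String input);
}

// mylib/foobarimpl/FoobarFactory.java - the only public way in
package mylib.foobarimpl;

public final class FoobarFactory {
    private FoobarFactory() {}

    public static Foobar create() {
        return new DefaultFoobar(); // the implementation class never escapes
    }
}

// mylib/foobarimpl/DefaultFoobar.java - package-private implementation
// detail, invisible both to the main API package and to end users
package mylib.foobarimpl;

class DefaultFoobar implements Foobar {
    @Override
    public String process(String input) {
        return input.trim();
    }
}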
To me the "right level of encapsulation" for a package is therefore to expose enough public API that your package can be used effectively as a dependency. No more and no less. It shouldn't matter whether it is being used by another package in the same library or by an external user. If you design your packages around this principle, you increase the chance of effective re-use.
Making parts of a package "globally accessible" really doesn't do any harm as long as your API is reasonably well designed. Remember that packages aren't object instances and as a result encapsulation doesn't matter nearly as much: making elements of a package public is usually much less harmful than exposing internal implementation details of a class (which I agree should almost always be private/protected).
Consider java.lang.String for example. It has a big public API, but whatever you do with the public API can't interfere with other users of java.lang.String. It's perfectly safe to use as a dependency from multiple places at the same time. On the other hand, all hell would break loose if you allowed users of java.lang.String to directly access the internal character array (which would allow in-place mutation of immutable Strings.... nasty!!).
P.S. Honourable mention goes to OSGi because it is a pretty awesome technology and very useful in many circumstances. However its sweet spot is really around deployment and lifecycle management of modules (stopping / starting / loading etc.). You don't really need it for code organisation IMHO.
Can I remove any implicitly imported Java library?
It may not seem useful.
But I think it may reduce some execution time!
Imports are just syntactic sugar. All they do is let you access things in other packages without having to state their fully qualified name. The code that is produced is exactly the same as if you fully-qualified everything. So there is no runtime performance penalty to having imports.
This also goes for the "implicit imports" (ie: java.lang): you don't pay any price for the classes you don't actually use.
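A quick sketch of why: the two classes below compile to the same bytecode for the method body, import or not (verify with javap -c if you like):

import java.util.ArrayList;

class WithImport {
    Object make() { return new ArrayList<String>(); }
}

class WithoutImport {
    Object make() { return new java.util.ArrayList<String>(); }
}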
This will have no effect on execution time. I think I'm correct in saying that, by default, classes are only loaded as and when they are needed, not en masse at start-up.
To improve performance you need to profile your application with a tool like Visual VM and address the bottlenecks it identifies (which will never be where you'd expect).
Java doesn't include all of the classes in java.lang.* in your program. The compiler only includes the ones you explicitly use (or that are used by the classes you use, and so on).
Is there any tool that lists which classes are actually used by an app (and when), or, even better, automatically trims JAR libraries to provide only the classes that are both referenced and used?
Bear in mind that, as the halting problem implies, you can't definitively say that a particular class is or isn't used, at least not in any moderately complex application. That's because classes aren't just bound at compile time but can also be loaded:
based on XML config (e.g. Spring);
from properties files (e.g. a JDBC driver name);
dynamically via annotations;
as a result of external input (e.g. user input, data from a database, or a remote procedure call);
etc.
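For instance, a class named only in a properties file is invisible to static analysis of the source. A minimal sketch (the file and key names are hypothetical):

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Properties;

public class DynamicLoad {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        try (InputStream in = new FileInputStream("app.properties")) {
            props.load(in);
        }
        // The class name exists only as a string in a config file, so no
        // analysis of the source code can prove which class is "used".
        Class.forName(props.getProperty("jdbc.driver"));
    }
}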
So just looking at source code isn't enough. That being said, any reasonable IDE will provide you with dependency analysis tools. IntelliJ certainly does.
What you really need is runtime instrumentation of what your application is doing, but even that isn't guaranteed. After all, a particular code path might come up once in 10 million runs due to a weird combination of inputs, so you can't be sure that you're covered.
Tools like this do have some value, though. You might want to look at something like Emma. Profilers like YourKit can give you a code dump that you can analyse too (although that won't pick up transient objects terribly well).
Personally I find little value beyond what the IDE will tell you: removing unused JARs. Going more granular than that is just asking for trouble for little to no gain.
Yes, you want ProGuard. It's a completely free Java code shrinker and obfuscator. It's easy to configure, fast and effective.
You might try JarJar http://code.google.com/p/jarjar/
It trims the jar dependencies.
For most cases, you can do it quite easily using just javac.
Delete your existing class files, then call javac with the names of your entry classes. It will compile only the classes necessary, and no more. Job done.
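For example, assuming Main.java is your entry class (the name is hypothetical):

# Compiles Main.java plus every source file it references, directly
# or transitively - and nothing else.
javac Main.java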