How to circumvent the "Method too large" error in Java Compilation?


I have a parser written in the Bigloo Scheme functional language, which I need to compile into a Java class. The whole parser is written as a single function. Unfortunately, this is causing the JVM compiler to throw a "Method too large" warning and later a "far label in localvar" error. Is there any way I can circumvent this error? I read somewhere about a DontCompileHugeMethods option; does it work? Splitting the function doesn't seem to be a viable option for me :( !!

Is there any possible way where I can circumvent this error?
Well, the root cause of this compiler error is that there are hard limits in the format of bytecode files. In this case, the problem is that a single method can contain at most 65535 bytes of bytecode (the code_length item of the Code attribute must be less than 65536; see the JVM spec).
The only workaround is to split the method.

Split the method into related operations, or move utility code into separate methods.

Well, the case is a bit different here: the method consists of only a single function call. Now, this function has a huge parameter list (the whole of the parser, actually!!). So I have no clue how to split this!!
The way to split up such a beast could be:
define data holder objects for your parameters (put sets of parameters in objects according to the ontology of your data model),
build those data holder objects in their own context
pass the parameter objects to the function
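A minimal sketch of that idea in Java, assuming a hypothetical parser whose flat parameter list splits into a lexer-related group and a grammar-related group (LexerParams, GrammarParams, countKeywords and the build methods are made-up names for illustration, not anything Bigloo generates). The point is that each holder is built in its own small method, so the bytecode that constructs the arguments is spread over several methods instead of piling up in one:

```java
import java.util.Arrays;
import java.util.List;

public class ParameterObjectSketch {
    // Hypothetical holder for the lexer-related parameters.
    static final class LexerParams {
        final String input;
        final boolean trimInput;

        LexerParams(String input, boolean trimInput) {
            this.input = input;
            this.trimInput = trimInput;
        }
    }

    // Hypothetical holder for the grammar-related parameters.
    static final class GrammarParams {
        final List<String> keywords;

        GrammarParams(List<String> keywords) {
            this.keywords = keywords;
        }
    }

    // Each holder is built in its own small method, so no single method has to
    // contain all the argument-construction bytecode.
    static LexerParams buildLexerParams(String source) {
        return new LexerParams(source, true);
    }

    static GrammarParams buildGrammarParams() {
        return new GrammarParams(Arrays.asList("let", "in"));
    }

    // The former "huge parameter list" is now just two objects.
    static int countKeywords(LexerParams lexer, GrammarParams grammar) {
        String text = lexer.trimInput ? lexer.input.trim() : lexer.input;
        int count = 0;
        for (String word : text.split("\\s+")) {
            if (grammar.keywords.contains(word)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        LexerParams lexer = buildLexerParams("  let x in x  ");
        GrammarParams grammar = buildGrammarParams();
        System.out.println(countKeywords(lexer, grammar)); // prints 2
    }
}
```

Which groups make sense depends on your data model; the split only helps with the size limit if the construction code really moves out of the big method.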

Quick and Dirty: at the beginning of your function, assign all your parameters to class variables of the same name (you must rename your parameters so they don't clash with the fields). Then chop the function body into pieces and move each piece into its own function. Because every piece reads and writes the same class variables, your function should keep basically the same semantics.
But this will not lead to pretty code!
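A rough sketch of the quick and dirty approach, using a made-up word-counting "parser" (QuickAndDirtySplit, skipLeadingWhitespace and countWords are hypothetical names). The parameter is renamed, copied into a field that keeps the original name, and each chunk of the old body becomes a private method working on the same fields:

```java
public class QuickAndDirtySplit {
    // Former parameter and locals, promoted to fields that keep the original names.
    private String input;
    private int position;
    private int words;

    // The parameter was renamed (input -> inputArg) so the field can keep the name "input".
    public int parse(String inputArg) {
        input = inputArg;
        position = 0;
        words = 0;
        skipLeadingWhitespace(); // formerly the first chunk of the big method body
        countWords();            // formerly the second chunk
        return words;
    }

    // Chunk 1, moved out unchanged: it reads and writes the shared fields.
    private void skipLeadingWhitespace() {
        while (position < input.length() && Character.isWhitespace(input.charAt(position))) {
            position++;
        }
    }

    // Chunk 2: counts runs of non-whitespace characters from the current position.
    private void countWords() {
        boolean inWord = false;
        for (; position < input.length(); position++) {
            boolean whitespace = Character.isWhitespace(input.charAt(position));
            if (!whitespace && !inWord) {
                words++;
            }
            inWord = !whitespace;
        }
    }

    public static void main(String[] args) {
        System.out.println(new QuickAndDirtySplit().parse("  let x in x")); // prints 4
    }
}
```

Because all state lives in fields, an instance is neither thread-safe nor re-entrant, which is part of why this is not pretty.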

Related

Adding immutable programming rules to the Java language within a program

I'm writing a program in Java. I find that reading and debugging code is easiest when the paradigm techniques are consistent, allowing me to very quickly assume where and what a problem is.
Doing this has, as you might guess, made my programming much faster, and so I want to find a way to enforce these rules.
For example, let's say I have a method that changes the state of an object and returns a value. If the method is called from outside the class, I never want to see the call nested inside another call's argument list, like this:
somefunction(param1, param2, object.change_and_return());
Instead, I want it to be done like this:
int relevant_variable_name = object.change_and_return();
somefunction(param1, param2, relevant_variable_name);
Another example: I want to create a base class that includes certain print methods, and I want all user-defined classes to be derived from that base class, much as Java does with Object.
Within my objects, is there a way I can force myself (and anyone else) to adhere to these rules? I.e., if you try to run code that breaks the rules, it will terminate and return a custom error report. Also, if you write code that breaks the rules, will the IDE (I use Eclipse) recognize it as an error, underline it and show the appropriate Javadoc?

For the "check and underline violations" part:
You can use PMD; it is a static code analyzer.
It has a default ruleset, and you can write custom rules matching what you need.
However, your rules seem quite complex to express in "PMD language".
PMD is available in the Eclipse Marketplace.
For the "crash if not conforming" part:
I see no easy way to do it.
Hard/complex ways could be:
Write a rule within PMD, run the analysis at compile time, parse the report (still at compile time) and return an error if your rule is violated.
Write a Java Agent that does the rule check and make it crash the VM if the rule is violated (not sure it is really feasible; agents are meant for instrumentation).
Use reflection anywhere in your code to load classes, and analyze loaded class against your rules and crash the VM if the rule is violated (seriously don't do this: the code would be ugly and the rule easily bypassable).

Instrument intermediary local method call within a method body

I know that (using BCEL or ASM, for instance) it is possible to access the local variables of a method... but I need something more. What I would like is:
to get the type of such a variable (or a way to derive it from the signature)
to know (distinguish) when this variable is used (either its value is assigned, or it is passed as a parameter)
when this variable is used as a parameter, to know which method call it was passed to
to break "method chains" into their respective method calls and get their return values so I can manipulate them
The basic idea is that I would like to "instrument" methods a bit in the same way a debugger does (though limited to the first frame depth...).
Any pointer appreciated.
If more information is needed, feel free to ask.

This is only possible using a bytecode-level API. cglib does not expose such an API, so you have to choose between ASM, BCEL and Javassist; I would recommend ASM, which has the best documentation.
What you would need to do:
Parse the signature (descriptor) of the method; ASM offers utilities for that. This gives you each type by its internal name. You would then need to map these types to their local variable indexes (keeping in mind that long and double take two slots).
Find every use of the local variable at that index.
This is, however, quite a difficult task. In order to analyze your code, you would have to emulate the method's execution. The JVM is a stack machine: arguments can be placed on the operand stack as the result of an arbitrary chain of instructions. Therefore, you would effectively have to interpret every bytecode instruction that you find. You will, more or less, need to write your own simplistic interpreter, which is quite a task.
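To give an idea of what such an interpreter does, here is a toy sketch. It does not use ASM and does not read real bytecode: the Insn class with its LOAD and INVOKE operations is a hypothetical, heavily simplified instruction set in which every call returns a value. Instead of computing values, it pushes a description of where each value came from, which is enough to see which local variable was passed to which call and how a chain such as local1.first().second(local2) breaks into separate calls:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

public class SymbolicStackSketch {
    // Hypothetical toy instruction set; these are NOT real JVM opcodes.
    enum Op { LOAD, INVOKE }

    static final class Insn {
        final Op op;
        final int operand;   // LOAD: local slot; INVOKE: argument count (receiver included)
        final String method; // INVOKE only

        private Insn(Op op, int operand, String method) {
            this.op = op;
            this.operand = operand;
            this.method = method;
        }

        static Insn load(int slot) {
            return new Insn(Op.LOAD, slot, null);
        }

        static Insn invoke(String method, int argCount) {
            return new Insn(Op.INVOKE, argCount, method);
        }
    }

    // Symbolic interpretation: each stack entry describes where a value came from
    // instead of holding the value itself. Returns one entry per call, showing the
    // provenance of every argument.
    static List<String> trace(List<Insn> code) {
        Deque<String> stack = new ArrayDeque<>();
        List<String> calls = new ArrayList<>();
        for (Insn insn : code) {
            switch (insn.op) {
                case LOAD: {
                    stack.push("local" + insn.operand);
                    break;
                }
                case INVOKE: {
                    // The last argument is on top of the stack, so fill the array backwards.
                    String[] callArgs = new String[insn.operand];
                    for (int i = callArgs.length - 1; i >= 0; i--) {
                        callArgs[i] = stack.pop();
                    }
                    String call = insn.method + "(" + String.join(", ", callArgs) + ")";
                    calls.add(call);
                    // The result becomes a new stack entry, so chained calls nest.
                    stack.push(call);
                    break;
                }
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        // local1.first().second(local2), with the receiver counted as argument 0
        List<Insn> code = Arrays.asList(
                Insn.load(1), Insn.invoke("first", 1),
                Insn.load(2), Insn.invoke("second", 2));
        System.out.println(trace(code)); // prints [first(local1), second(first(local1), local2)]
    }
}
```

A real version would have to handle every instruction that touches the stack (dup, swap, field accesses, stores, branches, and so on). ASM's tree API ships an analysis package (Analyzer together with interpreters such as SourceInterpreter) that performs this kind of symbolic execution and may save you writing it from scratch.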

What is a stack map frame

I've recently been looking at The Java Virtual Machine Specification (JVMS) to try to better understand what makes my programs work, but I've found a section that I'm not quite getting...
Section 4.7.4 describes the StackMapTable Attribute, and in that section the document goes into details about stack map frames. The issue is that it's a little wordy and I learn best by example; not by reading.
I understand that the first stack map frame is derived from the method descriptor, but I don't understand how (which is supposedly explained here.) Also, I don't entirely understand what the stack map frames do. I would assume they're similar to blocks in Java, but it appears as though you can't have stack map frames inside each other.
Anyway, I have two specific questions:
What do the stack map frames do?
How is the first stack map frame created?
and one general question:
Can someone provide an explanation less wordy and easier to understand than the one given in the JVMS?

Java requires all classes that are loaded to be verified, in order to maintain the security of the sandbox and ensure that the code is safe to optimize. Note that this is done on the bytecode level, so the verification does not verify invariants of the Java language, it merely verifies that the bytecode makes sense according to the rules for bytecode.
Among other things, bytecode verification makes sure that instructions are well formed, that all the jumps are to valid instructions within the method, and that all instructions operate on values of the correct type. The last one is where the stack map comes in.
The thing is that bytecode by itself contains no explicit type information. Types are determined implicitly through dataflow analysis. For example, an iconst instruction creates an integer value. If you store it in slot 1, that slot now has an int. If control flow merges from code which stores a float there instead, the slot is now considered to have invalid type, meaning that you can't do anything more with that value until overwriting it.
Historically, the bytecode verifier inferred all the types using these dataflow rules. Unfortunately, it is impossible to infer all the types in a single linear pass through the bytecode because a backwards jump might invalidate already inferred types. The classic verifier solved this by iterating through the code until everything stopped changing, potentially requiring multiple passes.
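A toy model of that iteration, assuming a single local slot and a hand-made three-block control-flow graph with a backward jump (none of this is the real verifier's code). Equal types survive a merge, differing types become TOP ("invalid until overwritten"), and passes repeat until nothing changes:

```java
import java.util.Arrays;

public class FixpointVerifierSketch {
    // Toy verification types for one local-variable slot (slot 1).
    enum VType { UNSET, INT, FLOAT, TOP }

    // Merge at a control-flow join: equal types survive, differing types become TOP.
    static VType merge(VType a, VType b) {
        if (a == VType.UNSET) return b;
        if (b == VType.UNSET) return a;
        return a == b ? a : VType.TOP;
    }

    // Toy CFG:
    //   block 0: slot1 = int;    goto block 1
    //   block 1: (loop header)   goto block 2
    //   block 2: slot1 = float;  goto block 1   <- backward jump
    static final VType[] STORES = { VType.INT, null, VType.FLOAT };
    static final int[][] SUCCESSORS = { {1}, {2}, {1} };

    // Repeats passes over the blocks until no entry state changes. Fills `entry`
    // with the inferred type of slot 1 on entry to each block and returns the
    // number of passes performed.
    static int verify(VType[] entry) {
        Arrays.fill(entry, VType.UNSET);
        int passes = 0;
        boolean changed = true;
        while (changed) {
            changed = false;
            passes++;
            for (int b = 0; b < STORES.length; b++) {
                // Type of slot 1 on exit from block b.
                VType out = STORES[b] != null ? STORES[b] : entry[b];
                for (int s : SUCCESSORS[b]) {
                    VType merged = merge(entry[s], out);
                    if (merged != entry[s]) {
                        entry[s] = merged;
                        changed = true;
                    }
                }
            }
        }
        return passes;
    }

    public static void main(String[] args) {
        VType[] entry = new VType[STORES.length];
        int passes = verify(entry);
        // prints "passes=3, slot 1 at loop header: TOP"
        System.out.println("passes=" + passes + ", slot 1 at loop header: " + entry[1]);
    }
}
```

On the first pass the loop header still looks like it holds an int; only after the backward jump from block 2 is processed does it become TOP, and further passes are needed to propagate that and confirm that nothing else changes. That repeated work is what the StackMapTable metadata lets the newer verifier avoid.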
However, verification makes class loading slow in Java. Oracle decided to solve this issue by adding a new, faster verifier that can verify bytecode in a single pass. To do this, they required all new classes starting with Java 7 (with Java 6 in a transitional state) to carry metadata about their types, so that the bytecode can be verified in a single pass. Since the bytecode format itself can't be changed, this type information is stored separately in an attribute called StackMapTable.
Simply storing the type of every single value at every single point in the code would obviously take up a lot of space and be very wasteful. To make the metadata smaller and more efficient, they decided to list the types only at positions which are targets of jumps or exception handlers (plus instructions that follow an unconditional branch, which have no fall-through predecessor to infer their types from). If you think about it, these are the only places where you need the extra information to do single-pass verification. Between those positions, all control flow is linear, so you can infer the types at the intermediate positions using the old inference rules.
Each position where types are explicitly listed is known as a stack map frame. The StackMapTable attribute contains a list of frames in order, though they are usually expressed as a difference from the previous frame in order to reduce data size. If there are no frames in the method, which happens when the method has no jump targets or exception handlers (i.e. its code is straight-line), then the StackMapTable attribute can be omitted entirely.
So this is the basic idea of how StackMapTable works and why it was added. The last question is how the implicit initial frame is created. The answer, of course, is that at the beginning of the method, the operand stack is empty and the local variable slots have the types of the method parameters, which are determined from the method descriptor.
If you're used to Java, there are a few minor differences in how method parameter types work at the bytecode level. First, instance methods have an implicit this as their first parameter. Second, boolean, byte, char, and short do not exist as types of local variables or operand stack values; instead, they are all treated as ints behind the scenes. Third, long and double each occupy two local variable slots.
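A minimal sketch of deriving the implicit initial frame from a method descriptor. The className, methodName and isStatic inputs are assumptions of this sketch (a real tool would take them from the class file), and it assumes a well-formed descriptor. The operand stack starts empty, so only the locals are built; each list entry is one local slot, and, following the JVMS type-checking rules, the second slot of a long or double is shown as top (in the compressed StackMapTable encoding, a single Long or Double entry covers both slots). Inside a constructor, this starts out as uninitializedThis:

```java
import java.util.ArrayList;
import java.util.List;

public class InitialFrameSketch {
    // Returns the local-variable types of the implicit initial frame, one entry per slot.
    // The operand stack of the initial frame is always empty, so it is not modelled.
    static List<String> initialLocals(String className, String methodName,
                                      String descriptor, boolean isStatic) {
        List<String> locals = new ArrayList<>();
        if (!isStatic) {
            // Instance methods receive `this` in slot 0; in a constructor it is not initialized yet.
            locals.add(methodName.equals("<init>") ? "uninitializedThis" : className);
        }
        int i = 1; // skip the opening '('
        while (descriptor.charAt(i) != ')') {
            int start = i;
            while (descriptor.charAt(i) == '[') {
                i++; // consume array dimensions
            }
            char c = descriptor.charAt(i);
            if (c == 'L') {
                i = descriptor.indexOf(';', i); // an object type runs up to ';'
            }
            i++;
            String type = descriptor.substring(start, i);
            if (type.charAt(0) == '[') {
                locals.add(type); // arrays keep their descriptor, e.g. "[Z"
            } else if (c == 'L') {
                locals.add(type.substring(1, type.length() - 1)); // internal name
            } else {
                switch (c) {
                    case 'Z': case 'B': case 'C': case 'S': case 'I':
                        locals.add("int"); // boolean, byte, char and short all become int
                        break;
                    case 'F':
                        locals.add("float");
                        break;
                    case 'J':
                        locals.add("long");
                        locals.add("top"); // second slot of the long
                        break;
                    case 'D':
                        locals.add("double");
                        locals.add("top"); // second slot of the double
                        break;
                    default:
                        throw new IllegalArgumentException("Bad descriptor: " + descriptor);
                }
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        // Instance method void bar(int, long, String, boolean[]) in class com/example/Foo
        System.out.println(initialLocals("com/example/Foo", "bar", "(IJLjava/lang/String;[Z)V", false));
        // prints [com/example/Foo, int, long, top, java/lang/String, [Z]
    }
}
```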

Is Object.class.getName() Slow?

I'm writing code in the Java ME environment, so speed is absolutely an important factor. I have read in several places that reflection of any sort (even the very limited amount that is allowed on Java ME) can be a very large bottleneck.
So, my question is this: is doing String.class.getName() slow? What about myCustomObject.getClass().getName()? Is it better to simply replace those with string constants, like "java.lang.String" and "com.company.MyObject"?
In case you're wondering, I need the class names of all primitives (and non-primitives as well) because Java ME does not provide a default serialization implementation, and thus I have to implement my own. I need a generic serialization solution that will work both for communication across the network and for local storage (RMS, but also JSR-75).
Edit
I'm using Java 1.3 CLDC.

String.class.getName() should not be slow: the class literal String.class is resolved once and then reused, so at run time only the getName() call remains.
myCustomObject.getClass().getName() would be a bit slower than the previous one, because the class has to be determined at run time through getClass().

Reflection is not unnaturally slow; it's just as slow as you'd expect, but no slower. First, calling a method via reflection requires all the object creation and method calls that are obvious from the reflection API; second, if you're calling methods through reflection, HotSpot won't be able to optimize through the calls.
Calling getClass().getName() is no slower than you'd expect, either: the cost of a couple of virtual method calls plus a member-variable fetch. The .class version is essentially the same, plus or minus a variable fetch.

I can't speak for Java ME, but I'm not surprised at the overhead of using reflection on a resource-constrained system. I wouldn't think it is unbearably slow, but you would certainly see improvements from hard-coding the names into a variable.
Since you mentioned you were looking at serialization, I'd suggest you take a look at how it's done in the Kryo project. You might find some of their methods useful; heck, you might even be able to use it in Java ME. (Unfortunately, I have no experience with ME.)
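If you go the hard-coding route, one way to do it is sketched below (TypeNames and nameOf are hypothetical names). It returns constant strings for a few final wrapper classes and only falls back to getClass().getName() for everything else. It avoids generics and autoboxing, since the question targets a Java 1.3-level CLDC. Whether it is actually faster on a particular device is something you would have to measure:

```java
public final class TypeNames {
    // Hard-coded names; they match what getClass().getName() returns for these classes.
    public static final String STRING = "java.lang.String";
    public static final String INTEGER = "java.lang.Integer";
    public static final String LONG = "java.lang.Long";
    public static final String BOOLEAN = "java.lang.Boolean";

    private TypeNames() {
    }

    // Answers the common cases from constants and falls back to getClass().getName()
    // for any other type. The instanceof checks are safe because String, Integer,
    // Long and Boolean are final classes, so no subclass can be mistaken for them.
    public static String nameOf(Object value) {
        if (value instanceof String) {
            return STRING;
        }
        if (value instanceof Integer) {
            return INTEGER;
        }
        if (value instanceof Long) {
            return LONG;
        }
        if (value instanceof Boolean) {
            return BOOLEAN;
        }
        return value.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(nameOf("hello"));      // prints java.lang.String
        System.out.println(nameOf(new Object())); // prints java.lang.Object
    }
}
```

The instanceof shortcut only works for final classes: for a non-final class such as a user-defined base class, a subclass instance would be reported under the base class's name.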

Any Java counterpart for `/usr/bin/strip`?

Is there any tool that can remove debug info from Java .class files, just like /usr/bin/strip can from C/C++ object files on Linux?
EDIT: I liked both Thilo's and Peter Mmm's answers: Peter's was short and to the point, exposing my ignorance of what ships with the JDK; Thilo's ProGuard suggestion is something I'll definitely be checking out anyway for all the extra features it appears to provide. Thank you, Thilo and Peter!

ProGuard (which, for example, the Android SDK ships with to reduce code size) can do all kinds of manipulation to shrink JAR files:
Evaluate constant expressions.
Remove unnecessary field accesses and method calls.
Remove unnecessary branches.
Remove unnecessary comparisons and instanceof tests.
Remove unused code blocks.
Merge identical code blocks.
Reduce variable allocation.
Remove write-only fields and unused method parameters.
Inline constant fields, method parameters, and return values.
Inline methods that are short or only called once.
Simplify tail recursion calls.
Merge classes and interfaces.
Make methods private, static, and final when possible.
Make classes static and final when possible.
Replace interfaces that have single implementations.
Perform over 200 peephole optimizations, like replacing ...*2 by ...<<1.
Optionally remove logging code.
They do not mention removing debug info in that list, but I guess they can also do that.
Update: Yes, indeed:
By default, compiled bytecode still contains a lot of debugging information: source file names, line numbers, field names, method names, argument names, variable names, etc. This information makes it straightforward to decompile the bytecode and reverse-engineer entire programs. Sometimes, this is not desirable. Obfuscators such as ProGuard can remove the debugging information and replace all names by meaningless character sequences, making it much harder to reverse-engineer the code. It further compacts the code as a bonus. The program remains functionally equivalent, except for the class names, method names, and line numbers given in exception stack traces.
