Unsafe, reflection access vs toCharArray (performance) - java

The JDK 9 team is putting effort into helping us remove non-public dependencies (using jdeps). I am using the Unsafe class for faster access to a String's inner char array, without creating a new char array. If I want to drop the compile-time dependency on the Unsafe class, I would need to load it dynamically and call Unsafe.getObject and other methods using reflection.
Now I wonder about performance: when I use Unsafe through reflection, how does it compare with String.toCharArray? Would it make sense to keep using Unsafe?
I assume JDK >= 7.
EDIT
Yes, I know that anyone can write these tests using e.g. JMH, but it takes a lot of time to measure with different inputs and different VM versions (7, 8). So I wonder if someone has already done this, as many libraries use Unsafe.

There is a chance that there will be no backing char[] array at all in Java 9 version of String, see JEP 254. That is, toCharArray() will be your only option.
Generally you should never use Unsafe APIs unless you are absolutely sure it is necessary. But since you are asking this question, I guess you are not. On my laptop, toCharArray() completes in 25 nanoseconds for a 100-char string, i.e. I could call it 40 million times a second! Do you really have that kind of workload?
If absolutely needed, use MethodHandles instead of both Reflection and Unsafe. MethodHandles are as fast as direct field access, but unlike Unsafe they are a public, supported and safe API.
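As a rough sketch of the MethodHandles suggestion, the following reads a private char[] field through a handle that is looked up once and kept in a static final field. The class FieldHandleSketch and its data field are hypothetical stand-ins for illustration; the sketch deliberately does not touch the internals of java.lang.String.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;

public class FieldHandleSketch {

    // Hypothetical private array that we want to read without copying it.
    private final char[] data;

    public FieldHandleSketch(String s) {
        this.data = s.toCharArray();
    }

    // Look the getter up once; the lookup is the expensive part, not the call.
    private static final MethodHandle DATA_GETTER;
    static {
        try {
            DATA_GETTER = MethodHandles.lookup()
                    .findGetter(FieldHandleSketch.class, "data", char[].class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Returns the backing array itself, not a copy.
    public static char[] dataOf(FieldHandleSketch target) {
        try {
            return (char[]) DATA_GETTER.invokeExact(target);
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        FieldHandleSketch h = new FieldHandleSketch("hello");
        System.out.println(dataOf(h).length);
    }
}
```

The lookup here succeeds because the field belongs to the class that performs the lookup. Reaching into another class's private fields needs a Lookup object with the appropriate access rights.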

Related

Vector is an obsolete Collection

The inspection reports any uses of java.util.Vector or java.util.Hashtable. While still supported, these classes were made obsolete by the JDK 1.2 Collection classes and should probably not be used in new development....
I have a project in Java which uses Vector everywhere, and I'm using JDK 8, which is the latest one. I want to know if I can run that application on the latest Java.
Also, please tell me whether I should use some other class, such as ArrayList, in place of Vector on newer Java versions.
First of all, although Vector is mostly obsoleted by ArrayList, it is still perfectly legal to use, and your project should run just fine.
As noted, however, it is not recommended to use. The main reason for this is that all of its methods are synchronized, which is usually useless and could considerably slow down your application. Any local variable that's not shared outside the scope of the method can safely be replaced with an ArrayList. Method arguments, return values and data members should be inspected closely before being replaced with ArrayList, lest you unwittingly change the synchronization semantics and introduce a hard-to-discover bug.
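A minimal sketch of the two cases described above, with made-up class and method names: a local list that never leaves its method becomes a plain ArrayList, while a field that other threads may touch keeps per-call locking through Collections.synchronizedList.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class VectorReplacementSketch {

    // Possibly shared between threads: keep the per-method locking that
    // Vector provided, but make it explicit with a synchronized wrapper.
    private final List<String> shared =
            Collections.synchronizedList(new ArrayList<String>());

    // The list below never escapes this method, so no locking is needed.
    public static int countNonEmpty(String[] inputs) {
        List<String> kept = new ArrayList<String>();
        for (String s : inputs) {
            if (!s.isEmpty()) {
                kept.add(s);
            }
        }
        return kept.size();
    }

    public void record(String s) {
        shared.add(s);
    }

    public int recorded() {
        return shared.size();
    }
}
```

As with Vector, iterating over the synchronized wrapper still requires the caller to hold the list's lock for the duration of the loop.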

Performance of reflection: quality byte code in JVM

Edit 2:
Does a program with a fully object-oriented implementation give high performance? Most frameworks are written using the full power of object orientation. However, reflection is also heavily used to achieve this, for example for AOP and dependency injection. Use of reflection affects performance to a certain extent.
So, is it good practice to use reflection? Is there some alternative to reflection among the programming language's constructs? To what extent should reflection be used?
Reflection is, in itself and by nature, slow. See this question for more details.
This is caused by a few reasons. Jon Skeet explains it nicely:
Check that there's a parameterless constructor
Check the accessibility of the parameterless constructor
Check that the caller has access to use reflection at all
Work out (at execution time) how much space needs to be allocated
Call into the constructor code (because it won't know beforehand that the constructor is empty)
Basically, reflection has to perform all the above steps before invocation, whereas normal method invocation has to do much less.
The JITted code for instantiating B is incredibly lightweight.
Basically it needs to allocate enough memory (which is just
incrementing a pointer unless a GC is required) and that's about it -
there's no constructor code to call really; I don't know whether the
JIT skips it or not but either way there's not a lot to do.
With that said, there are many cases where Java is not dynamic enough to do what you want, and reflection provides a simple and clean alternative. Consider the following scenario:
You have a large number of classes which represent various items, e.g. a Car, a Boat, and a House.
They all extend/implement the same type: LifeItem.
Your user inputs one of 3 strings, "Car", "Boat", or "House".
Your goal is to access a method of LifeItem based on the parameter.
The first approach that comes to mind is to build an if/else structure, and construct the wanted LifeItem. However, this is not very scalable and can become very messy once you have dozens of LifeItem implementations.
Reflection can help here: it can be used to dynamically construct a LifeItem object based on its name, so a "Car" input would get dispatched to a Car constructor. Suddenly, what could have been hundreds of lines of if/else code turns into a single line of reflection. The if/else objection is weaker on a Java 7+ platform due to the introduction of switch statements on Strings, but even then a switch with hundreds of cases is something I'd want to avoid. Here's what the difference in cleanliness would look like in most cases:
Without reflection:
public static void main(String[] args) {
    String input = args[0];
    if (input.equals("Car"))
        doSomething(new Car(args[1]));
    else if (input.equals("Boat"))
        doSomething(new Boat(args[1]));
    else if (input.equals("House"))
        doSomething(new House(args[1]));
    ... // Possibly dozens more if/else statements
}
Whereas by utilizing reflection, it could turn into:
public static void main(String[] args) {
    String input = args[0];
    try {
        // Class.forName expects a fully qualified class name
        doSomething((LifeItem) Class.forName(input).getConstructor(String.class).newInstance(args[1]));
    } catch (Exception ie) {
        System.err.println("Invalid input: " + input);
    }
}
Personally, I'd say the latter is neater, more concise, and more maintainable than the first. In the end it's a personal preference, but that's just one of the many cases where reflection is useful.
Additionally, when using reflection, you should attempt to cache as much information as possible. In other words employ simple, logical things, like not calling get(Declared)Method everywhere if you can help it: rather, store it in a variable so you don't have the overhead of refetching the reference whenever you want to use it.
So those are the two extremes of the pros and cons of reflection. To sum it up, if reflection improves your code's readability (like it would in the presented scenario), by all means go for it. And if you do, just think about reducing the number of get* reflection calls: those are the easiest to trim.
While reflection is more expensive than "traditional code", premature optimization is the root of all evil. From a decade of empirical evidence, I assume that a method invoked via reflection will hardly affect performance unless it is invoked from a heavy loop, and even so there have been some performance enhancements to reflection:
Certain reflective operations, specifically Field, Method.invoke(),
Constructor.newInstance(), and Class.newInstance(), have been
rewritten for higher performance. Reflective invocations and
instantiations are several times faster than in previous releases
(Enhancements in J2SDK 1.4)
Note that method lookup (i.e. Class.getMethod) is not mentioned above, and choosing the right Method object usually requires additional steps (such as traversing the class hierarchy while asking for the "declared method" in case it is not public), so I tend to save the found Method in a suitable map whenever possible, so that the next time the cost is only that of a Map.get() and a Method.invoke(). I guess that any well-written framework handles this correctly.
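The caching idea can be sketched roughly as follows. The class name MethodCacheSketch, the key format, and the restriction to no-argument methods are assumptions made to keep the example short; it is intended for application classes, since making JDK-internal members accessible may be refused on recent Java versions.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MethodCacheSketch {

    // Hypothetical cache keyed by "className#methodName".
    private static final Map<String, Method> CACHE =
            new ConcurrentHashMap<String, Method>();

    // Finds a no-argument method, walking up the class hierarchy so that
    // non-public declared methods are found too. The result is cached.
    public static Method find(Class<?> type, String name) {
        String key = type.getName() + '#' + name;
        Method cached = CACHE.get(key);
        if (cached != null) {
            return cached;
        }
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                Method m = c.getDeclaredMethod(name);
                m.setAccessible(true);
                CACHE.put(key, m);
                return m;
            } catch (NoSuchMethodException ignored) {
                // not declared here, keep looking in the superclass
            }
        }
        throw new IllegalArgumentException("No such method: " + key);
    }

    // After the first call, this costs a map lookup plus Method.invoke().
    public static Object call(Object target, String name) {
        try {
            return find(target.getClass(), name).invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A call such as MethodCacheSketch.call("abc", "length") performs the hierarchy walk only the first time; later calls reuse the stored Method object.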
One should also consider that certain optimizations are not possible if reflection is used (such as method inlining or escape analysis; see Java HotSpot™ Virtual Machine Performance Enhancements). But this doesn't mean that reflection has to be avoided at all costs.
However, I think that the decision to use reflection should be based on other criteria, such as code readability, maintainability, design practices, etc. When using reflection in your own code (as opposed to using a framework that internally uses reflection), one risks transforming compile-time errors into run-time errors, which are harder to debug. In some cases, one could replace the reflective invocation with a traditional OOP pattern such as Command or Abstract Factory.
I can give you one example (but sorry, I can't show you the test results, because it was a few months ago). I wrote an XML library (custom, project-oriented) which replaced some old DOM parser code with classes + annotations. My code was half the size of the original. I did tests, and yes, reflection was more expensive, but not by much (something like 0.3 seconds out of 14-15 seconds of execution, a loss of about 2%). In places where code is executed infrequently, reflection can be used with a small performance loss.
Moreover, I am sure that my code can be improved for better performance.
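One way to read the factory suggestion, as a sketch only: a registry maps each name to a constructor reference, so a misspelled class is caught by the compiler rather than at run time. The LifeItem, Car and Boat types here are simplified stand-ins for the ones in the earlier example, and the describe() method is invented for illustration. Java 8 is assumed for the method references.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class LifeItemRegistrySketch {

    public interface LifeItem {
        String describe();
    }

    static final class Car implements LifeItem {
        private final String name;
        Car(String name) { this.name = name; }
        public String describe() { return "Car " + name; }
    }

    static final class Boat implements LifeItem {
        private final String name;
        Boat(String name) { this.name = name; }
        public String describe() { return "Boat " + name; }
    }

    // One registration line per type instead of one if/else branch.
    private static final Map<String, Function<String, LifeItem>> FACTORIES =
            new HashMap<>();
    static {
        FACTORIES.put("Car", Car::new);
        FACTORIES.put("Boat", Boat::new);
    }

    public static LifeItem create(String kind, String arg) {
        Function<String, LifeItem> factory = FACTORIES.get(kind);
        if (factory == null) {
            throw new IllegalArgumentException("Invalid input: " + kind);
        }
        return factory.apply(arg);
    }
}
```

The trade-off against Class.forName is that every new type needs a registration line, but no reflective call or unchecked cast remains.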
So, I suggest these tips:
Use reflection if you can do it in a way that is beautiful, compact & laconic;
Do not use reflection if your code will be executed many, many times;
Use reflection if you need to map a huge amount of information from another source (XML files, for example) to a Java application;
The best usage for reflection and annotations is where code is executed only once (pre-loaders).

Is Object.class.getName() Slow?

I'm writing code in the Java ME environment, so speed is absolutely an important factor. I have read in several places that reflection of any sort (even the very limited amounts that are allowed on Java ME) can be a very large bottleneck.
So, my question is this: is doing String.class.getName() slow? What about myCustomObject.getClass().getName()? Is it better to simply replace those with string constants, like "java.lang.String" and "com.company.MyObject"?
In case you're wondering, I need the class names of all primitives (and non-primitives as well) because Java ME does not provide a default serialization implementation and thus I have to implement my own. I need a generic serialization solution that will work for both communication across the network as well as local storage (RMS, but also JSR-75)
Edit
I'm using Java 1.3 CLDC.
String.class.getName() should not be slow, because the class reference is already resolved before the line executes; nothing has to be looked up from an instance.
myCustomObject.getClass().getName() would be a bit slower than the former, as the class is retrieved at execution time.
Reflection is not unnaturally slow; it's just as slow as you'd expect, but no slower. First, calling a method via reflection requires all the object creation and method calling that is obvious from the reflection API; second, if you're calling methods through reflection, Hotspot won't be able to optimize through the calls.
Calling getClass().getName() is no slower than you'd expect, either: the cost of a couple of virtual method calls plus a member-variable fetch. The .class version is essentially the same, plus or minus a variable fetch.
I can't speak for Java ME, but I'm not surprised at the overhead by using reflection on a resource constrained system. I wouldn't think it is unbearably slow, but certainly you would see improvements from hard-coding the names into a variable.
Since you mentioned you were looking at serialization, I'd suggest you take a look at how it's done in the Kryo project. You might find some of their methods useful; heck, you might even be able to use it in Java ME. (Unfortunately, I have no experience with ME.)

Memory efficient, low overhead replacement for String in Java

After reading answers on this old question, I'm a bit curious to know if there are any frameworks now that store a large number (millions) of small Strings (15-25 chars long) more efficiently than java.lang.String.
If possible, I would like to represent the strings using byte[] instead of char[].
My Strings are going to be constants, and I don't really require the numerous utility methods provided by the java.lang.String class.
Java 6 does this with -XX:+UseCompressedStrings which is on by default in some updates.
It's not in Java 5.0 or 7. It is still listed as on by default, but it's not actually supported in Java 7. :P
Depending on what you want to do you could write your own classes, but if you only have a few hundred MB of Strings I suspect it's not worth it.
Most likely this optimization is not worth the effort and complexity it brings with it. Either live with what the VM offers you (as Peter Lawrey suggests), or go to great lengths to work out your own solution (not using java.lang.String).
There is an interface CharSequence your own String class could implement. Unfortunately very few JRE methods accept a CharSequence, so be prepared that toString() will need to be used frequently on your class if you need to pass any of your 'Strings' to any other API.
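A minimal sketch of such a class, assuming the text fits in ISO-8859-1 so that one byte per character is enough (anything outside that range would be replaced by the encoder). The name Latin1String is made up for this example.

```java
import java.nio.charset.StandardCharsets;

public final class Latin1String implements CharSequence {

    private final byte[] bytes;
    private final int offset;
    private final int length;

    // Assumption: every character of s is representable in ISO-8859-1.
    public Latin1String(String s) {
        this.bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        this.offset = 0;
        this.length = bytes.length;
    }

    // Shares the backing array with the parent instead of copying it.
    private Latin1String(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        // Mask to undo sign extension of bytes above 0x7F.
        return (char) (bytes[offset + index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException(start + ".." + end);
        }
        return new Latin1String(bytes, offset + start, end - start);
    }

    // Needed whenever an API insists on a java.lang.String.
    @Override
    public String toString() {
        return new String(bytes, offset, length, StandardCharsets.ISO_8859_1);
    }
}
```

Each instance still carries an object header and its own array, so the saving is roughly half the character storage; packing many strings into one shared byte[] would be a further step.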
You could also hack String to create your Strings in a more memory-efficient (and less GC-friendly) way. String has a (package access level) constructor String(offset, count, char[]) that does not copy the chars but just takes the char[] as a direct reference. You could put all your strings into one big char[] array and construct the strings using reflection; this would avoid much of the overhead normally introduced by the char[] array in a string. I can't really recommend this method, since it relies on JRE-private functionality.

Extending ByteBuffer class

Is there any way to create a class that extends the ByteBuffer class?
Some abstract methods from ByteBuffer are package private, and if I create package java.nio, security exception is thrown.
I would want to do that for performance reasons - getInt for example has about 10 method invocations, as well as quite a few if's. Even if all checks are left, and only method calls are inlined and big/small endian checks are removed, tests that I've created show that it can be about 4 times faster.
You can't extend ByteBuffer, and thank God for that.
You can't extend it because there are no protected c-tors. Why the "thank God" part? Well, having only 2 real subclasses ensures that the JVM can heavily optimize any code involving ByteBuffer.
Last, if you need to extend the class for real, edit the byte code, and just add the protected attribute to the c-tor and the public attribute to DirectByteBuffer (and DirectByteBufferR). Extending HeapByteBuffer serves no purpose whatsoever, since you can access the underlying array anyway.
Use -Xbootclasspath/p and add your own classes there, then extend in the package you need (outside java.nio). That's how it's done.
Another way is using sun.misc.Unsafe and do whatever you need w/ direct access to the memory after address().
I would want to do that for
performance reasons - getInt for
example has about 10 method
invocations, as well as quite a few
if's. Even if all checks are left, and
only method calls are inlined and
big/small endian checks are removed,
tests that I've created show that it
can be about 4 times faster.
Now the good part, use gdb and check the truly generated machine code, you'd be surprised how many checks would be removed.
I can't imagine why a person would want to extend the classes. They exist to allow good performance not just OO polymorph execution.
edit:
How to declare any class and bypass Java verifier
On Unsafe: Unsafe has 2 methods that bypass the verifier and if you have a class that extends ByteBuffer you can just call any of them. You need some hacked version (but that's super easy) of ByteBuffer w/ public access and protected c-tor just for the compiler.
The methods are below. You can use them at your own risk. After you declare the class like that, you can even use it with the new keyword (provided there is a suitable c-tor).
public native Class defineClass(String name, byte[] b, int off, int len, ClassLoader loader, ProtectionDomain protectionDomain);
public native Class defineClass(String name, byte[] b, int off, int len);
You can disregard protection levels by using reflection, but that kinda defeats the performance goal in a big way.
You can NOT create a class in the java.nio package - doing so (and distributing the result in any way) violates Sun's Java license and could theoretically get you into legal troubles.
I don't think there's a way to do what you want to do without going native - but I also suspect that you're succumbing to the temptation of premature optimization. Assuming that your tests are correct (which microbenchmarks are often not): are you really sure that access to ByteBuffer is going to be the performance bottleneck in your actual application? It's kinda irrelevant whether ByteBuffer.get() could be 4 times faster when your app only spends 5% of its time there and 95% processing the data it's fetched.
Wanting to bypass all checks for the sake of (possibly purely theoretical) performance does not sound like a good idea. The cardinal rule of performance tuning is "First make it work correctly, THEN make it work faster".
Edit: If, as stated in the comments, the app actually does spend 20-40% of its time in the ByteBuffer methods and the tests are correct, that means a speedup potential of 15-30% - significant, but IMO not worth starting to use JNI or messing with the API source. I'd try to exhaust all other options first:
Are you using the -server VM?
Could the app be modified to make fewer calls to ByteBuffer rather than trying to speed up those it does make?
Use a profiler to see where the calls are coming from - perhaps some are outright unnecessary
Maybe the algorithm can be modified, or you can use some sort of caching
ByteBuffer is abstract so, yes, you can extend it... but I think what you want to do is extend the class that is actually instantiated which you likely cannot. It could also be that the particular one that gets instantiated overrides that method to be more efficient than the one in ByteBuffer.
I would also say that you are likely wrong in general about all of that being needed - perhaps it isn't for what you are testing, but likely the code is there for a reason (perhaps on other platforms).
If you do believe that you are correct on it open a bug and see what they have to say.
If you want to add to the nio package you might try setting the boot classpath when you call Java. It should let you put your classes in before the rt.jar ones. Type java -X to see how to do that, you want the -Xbootclasspath/p switch.
+50 bounty for a way to circumvent the access restriction (it cannot be done using reflection alone. Maybe there is a way using sun.misc.Unsafe etc.?)
Answer is: there is no way to circumvent all access restrictions in Java.
sun.misc.Unsafe works under the authority of security managers, so it won't help
Like Sarnum said:
ByteBuffer has package private
abstract _set and _get methods, so you
couldn't override it. And also all the
constructors are package private, so
you cannot call them.
Reflection allows you to bypass a lot of stuff, but only if the security manager allows it. There are many situations where you have no control on the security manager, it is imposed on you. If your code were to rely on fiddling with security managers, it would not be 'portable' or executable in all circumstances, so to speak.
The bottom line of the question is that trying to override byte buffer is not going to solve the issue.
There is no other option than implementing a class yourself, with the methods you need. Making methods final where you can will help the compiler in its effort to perform optimizations (it reduces the need to generate code for runtime polymorphism and makes inlining easier).
The simplest way to get the Unsafe instances is via reflection. However if reflection is not available to you, you can create another instance. You can do this via JNI.
I tried, in byte code, to create an instance WITHOUT calling a constructor, which would allow you to create an instance of an object with no accessible constructors. However, this did not work, as I got a VerifyError for the byte code. The object has to have had a constructor called on it.
What I do is have a ParseBuffer which wraps a direct ByteBuffer. I use reflection to obtain the Unsafe reference and the address. To avoid running off the end of the buffer and killing the JVM, I allocate more pages than I need and as long as they are not touched no physical memory will be allocated to the application. This means I have far less bounds checks and only check at key points.
Using the debug version of the OpenJDK, you can see the Unsafe get/put methods turn into a single machine code instruction. However, this is not available in all JVM and may not get the same improvement on all platforms.
Using this approach I would say you can get about a 40% reduction in timings but comes at a risk which normal Java code does not have i.e. you can kill the JVM. The usecase I have is an object creation free XML parser and processor of the data contained using Unsafe compared with using a plain direct ByteBuffer. One of the tricks I use in the XML parser is to getShort() and getInt() to examine multiple bytes at once rather than examining each byte one at a time.
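The multi-byte trick can be illustrated with a plain ByteBuffer (no Unsafe involved). This is only a sketch: the class name, the choice of the "<!--" marker, and the assumption that the buffer uses the default big-endian byte order are all mine.

```java
import java.nio.ByteBuffer;

public class MultiByteMatchSketch {

    // The four ASCII bytes of "<!--" packed into one big-endian int.
    private static final int COMMENT_START =
            ('<' << 24) | ('!' << 16) | ('-' << 8) | '-';

    // Tests four bytes with a single read and a single comparison.
    // Assumes buf is in big-endian order, which is the ByteBuffer default.
    public static boolean startsComment(ByteBuffer buf, int pos) {
        if (pos < 0 || pos + 4 > buf.limit()) {
            return false; // fewer than four bytes left at this position
        }
        return buf.getInt(pos) == COMMENT_START;
    }
}
```

Compared with four separate get(pos + i) calls, this performs one bounds check and one comparison per candidate position.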
Using reflection to get the Unsafe class is an overhead you incur once. Once you have the Unsafe instance, there is no overhead.
A Java Agent could modify ByteBuffer's bytecode and change the constructor's access modifier. Of course you'd need to install the agent in the JVM, and you still have to get your subclass to compile. If you're considering such optimizations then you must be up for it!
I've never attempted such low level manipulation. Hopefully ByteBuffer is not needed by the JVM before your agent can hook into it.
I am answering the question you WANT the answer to, not the one you asked. Your real question is "how can I make this go faster?" and the answer is "handle the integers an array at a time, and not singly."
If the bottleneck is truly the ByteBuffer.getInt() or ByteBuffer.getInt(location), then you do not need to extend the class, you can use the pre-existing IntBuffer class to grab data in bulk for more efficient processing.
// Requires: import java.nio.BufferUnderflowException;
//           import java.nio.ByteBuffer;
//           import java.nio.IntBuffer;
int totalLength = numberOfIntsInBuffer;
ByteBuffer myBuffer = whateverMyBufferIsCalled;
int[] block = new int[1024];
IntBuffer intBuff = myBuffer.asIntBuffer();
int partialLength = totalLength / 1024;
// Handle big blocks of 1024 ints at a time
try {
    for (int i = 0; i < partialLength; i++) {
        intBuff.get(block);
        // Do processing on ints, w00t!
    }
    partialLength = totalLength % 1024; // modulo to get remainder
    if (partialLength > 0) {
        intBuff.get(block, 0, partialLength);
        // Do final processing on ints
    }
} catch (BufferUnderflowException bufo) {
    // well, dang!
}
This is MUCH, MUCH faster than getting an int at a time. Iterating over the int[] array, which has set and known-good bounds, will also let your code JIT much tighter by eliminating bounds checks and the exceptions ByteBuffer can throw.
If you need further performance, you can tweak the code, or roll your own size-optimized byte[] to int[] conversion code. I was able to get some performance improvement using that in place of the IntBuffer methods with partial loop unrolling... but it's not suggested by any means.
