Java Optimizations


I am wondering if there is any performance differences between
String s = someObject.toString(); System.out.println(s);
and
System.out.println(someObject.toString());
Looking at the generated bytecode, there seem to be differences. Is the JVM able to optimize this bytecode at runtime so that both solutions perform the same?
In this simple case, of course, solution 2 seems more appropriate, but sometimes I would prefer solution 1 for readability, and I just want to be sure I am not introducing performance regressions in critical code sections.

The creation of a temporary variable (especially something as small as a String) is inconsequential to the speed of your code, so you should stop worrying about this.
Try measuring the actual time spent in this part of your code and I bet you'll find there's no performance difference at all. The time it takes to call toString() and print out the result takes far longer than the time it takes to store a temporary value, and I don't think you'll find a measurable difference here at all.
Even if the bytecode looks different here, that's because javac does very little optimization and leaves the heavy lifting to the JIT compiler. If this code really matters for speed, it will be executed many, many times, and the JIT will select it for compilation to native code. It is highly likely that both versions compile to the same native code.
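If you want to see the bytecode difference for yourself, here is a small sketch. The class name and the extra PrintStream parameter (used only so the output can be captured) are my own additions. After compiling, running javap -c TempVariableDemo should show that withTemp stores the string into a local variable slot and loads it back, while inline leaves it on the operand stack; the printed output is identical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class TempVariableDemo {
    // Solution 1: keep the result of toString() in a local variable first.
    static void withTemp(Object someObject, PrintStream out) {
        String s = someObject.toString();
        out.println(s);
    }

    // Solution 2: pass the result of toString() straight to println().
    static void inline(Object someObject, PrintStream out) {
        out.println(someObject.toString());
    }

    // Runs one of the two variants against an in-memory stream and returns what was printed.
    static String capture(boolean useTemp, Object someObject) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(buffer, true);
        if (useTemp) {
            withTemp(someObject, out);
        } else {
            inline(someObject, out);
        }
        out.flush();
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(capture(true, 42));
        System.out.print(capture(false, 42));
    }
}
```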
Finally, why are you calling System.out.println() in performance-critical code? If anything here is going to kill your performance, that will.

If you have critical code sections that demand performance, avoid using System.out.println(). There is more overhead incurred by going to standard output than there ever will be with a variable assignment.
Do solution 1.
Edit: or solution 2

There is no* code critical enough that the difference between your two samples makes any difference at all. I encourage you to test this; run both a few million times, and record the time taken.
Pick the more readable and maintainable form.
* Exaggerating for effect. If you have code critical enough, you've studied it to learn this.
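A naive version of the "run both a few million times" test might look like the sketch below. It prints to a discarding stream (OutputStream.nullOutputStream(), Java 11+) so the console itself doesn't dominate the timing; the class and method names are hypothetical. Treat the numbers with suspicion: hand-rolled System.nanoTime loops are easily skewed by JIT warm-up and dead-code elimination, which is why JMH exists.

```java
import java.io.OutputStream;
import java.io.PrintStream;

class NaiveTiming {
    // A PrintStream that throws away everything written to it.
    static final PrintStream SINK = new PrintStream(OutputStream.nullOutputStream());

    static void withTemp(Object o) {
        String s = o.toString();
        SINK.println(s);
    }

    static void inline(Object o) {
        SINK.println(o.toString());
    }

    // Runs the action n times and returns the elapsed wall-clock nanoseconds.
    static long time(Runnable action, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            action.run();
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        Object obj = Integer.valueOf(12345);
        int n = 2_000_000;
        // Warm-up rounds give the JIT a chance to compile both paths before measuring.
        for (int round = 0; round < 3; round++) {
            time(() -> withTemp(obj), n);
            time(() -> inline(obj), n);
        }
        long a = time(() -> withTemp(obj), n);
        long b = time(() -> inline(obj), n);
        System.out.printf("temp variable: %d ms%ninline call:   %d ms%n", a / 1_000_000, b / 1_000_000);
    }
}
```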

The generated bytecode is not a good measure of the performance of a given piece of code, since this bytecode will get analysed, optimised and (in the case of the server compiler) re-analysed and re-optimised if it is deemed to be a performance bottleneck.
When in doubt, use a profiler.

Compared to output to the console, I doubt that any difference in performance between the two is going to be measurable. Don't optimize before you have measured and confirmed that you have a problem.

Related

Define StringBuilder capacity when number of characters is unknown

I'm aware that, for good practice, StringBuilder should be initialised with a capacity value of the expected content. Otherwise, increasing the size after compilation is going to be an expensive operation.
My question is, if we don't know the expected size, how should one go about it? Is there a standard value/way to avoid expensive operations under the hood?
If not, is there a way to raise an alert or log something in the code if the content grows beyond the capacity given at initialisation?

You wrote: "I'm aware that, for good practice, StringBuilder should be initialised with a capacity value of the expected content. Otherwise, increasing the size after compilation is going to be an expensive operation."
This is a wildly incorrect statement. It is very bad practice to do this, even if you know exactly how large it'll be.
If I see this code:
StringBuilder sb = new StringBuilder(in1.length() + in2.length() * 3 + (loaded ? suffixLen : 0));
Then this is an additional thing to worry about, test, and keep up to date. I would assume if all this is present that for whatever reason somebody did some performance testing and actually figured out that this saves a worthwhile chunk of cycles, and somehow, in a fit of idiocy, neglected to write an enlightening comment and link to the JMH or profiler result analysis to verify this conclusion.
So, I'd either painstakingly attempt to manually analyse precisely if the calculation is still correct after an update to this code, or, I'd fix the problem and add the tests (and then be utterly befuddled, when, of course, the profile review shows this code is utterly inconsequential), or, I'd go through the considerable trouble of writing an assert based test case that will run the entire operation and then verify at the end that the size calculation done at the top is, in fact, correct.
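For a sense of what that third option costs, here is a sketch of a pre-sized builder plus the assert-based check. The method and its parameters (in1, in2, loaded, suffix) are hypothetical stand-ins modelled on the snippet above. Note how the size formula at the top has to mirror every append below it.

```java
class CapacityCheck {
    // Hypothetical method modelled on the example above.
    static String build(String in1, String in2, boolean loaded, String suffix) {
        // Up-front size calculation that must be kept in sync with the appends below.
        int estimate = in1.length() + in2.length() * 3 + (loaded ? suffix.length() : 0);
        StringBuilder sb = new StringBuilder(estimate);
        sb.append(in1);
        for (int i = 0; i < 3; i++) {
            sb.append(in2);
        }
        if (loaded) {
            sb.append(suffix);
        }
        // Verifies that the calculation at the top is still correct.
        // Only evaluated when assertions are enabled (java -ea).
        assert sb.length() == estimate : "estimated " + estimate + " but built " + sb.length();
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(build("ab", "cd", true, "!")); // abcdcdcd!
    }
}
```

If you'd rather log than fail, as the question asks, the same comparison between sb.length() and the estimate can feed a logger instead of an assert.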
I don't think you fully grasp why the hyperbolic statement "premature optimization is the root of all evil" is so popular.
Here's the problem. 99% of the system's resources are spent on 1% of the code. That's not an exaggeration; in fact, that is likely understating the issue.
Developer time is not infinite, and even if it were, a programmer's ability to comprehend code and focus on the relevant parts is limited because, in the end, programmers are human. Adding code that needs to be parsed and understood by human eyeballs and brains is therefore bad if that code does something irrelevant. We're literally talking about the same order of magnitude as throwing a glass of water into the ocean by the seashore in Europe and then watching the water level rise in Manhattan: beyond any and all ability to measure, and utterly inconsequential. Even bold type does not do justice to how little it matters. Even if this code runs 100 million times a day for 15 years, it amounts to perhaps 5 cents in IaaS deployment costs over that entire decade and a half, and that's if there is a performance impact at all, which often there isn't, because modern VMs, GCs, OSes, and CPU architectures get up to some crazy shenanigans.
Furthermore, the system optimizes. Optimizers, such as JVM HotSpot engines, are in the end pattern-matching machines. They find commonly used patterns and recognize how to run them as efficiently as possible. If you write code in ways that nobody else does, it is highly unlikely to actually outperform the common (idiomatic) case. Most likely that's because it just doesn't matter, and even if it does, the idiomatic case gets optimized much more readily.
Here is a trivial example:
List<String> someList = new ArrayList<String>();
for (int i = 0; i < 10000; i++) someList.add(someRandomString);
String[] arrayForm = someList.toArray(new String[0]);
Here you may go: Huh, well, we can optimize this code a little bit and pass new String[10000] instead; this saves the system from having to allocate an admittedly small object (a 0-size string array).
You would be wrong. The above code, with new String[0], is in fact faster. How can that be? Optimizers, with pattern matching. They recognize the pattern and realize that the system can create a new array of the requisite size, skip zeroing it out, and then run the code that fills it. The JVM spec guarantees that a new array is zeroed, but that merely means you can never observe an array that wasn't zeroed; it doesn't mean the JVM must actually zero it, and that is where the optimization of skipping it comes from. In theory the optimizer could recognize the new String[reqSize] variant too and likewise allocate the array without zeroing it. However, it doesn't: that pattern is not common enough, and somewhat more complicated.
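Both variants return equal arrays, so the choice is purely about which one the optimizer handles better. Here is a minimal side-by-side sketch; the helper names and sample data are hypothetical, and no timings are claimed here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ToArrayForms {
    // Hypothetical sample data standing in for someRandomString.
    static List<String> sample(int n) {
        List<String> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add("item" + i);
        }
        return list;
    }

    // Idiomatic form: pass a zero-length array and let toArray allocate the right size.
    static String[] zeroSized(List<String> list) {
        return list.toArray(new String[0]);
    }

    // "Hand-optimized" form: pre-size the array yourself.
    static String[] preSized(List<String> list) {
        return list.toArray(new String[list.size()]);
    }

    public static void main(String[] args) {
        List<String> list = sample(10_000);
        System.out.println(Arrays.equals(zeroSized(list), preSized(list))); // true
    }
}
```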
I'm not saying that new StringBuilder() is necessarily faster than new StringBuilder(knownSize). I'm saying it:
99.9% of the time literally does not make one iota of difference. Not a single nanosecond - the speedup is entirely theoretical: No performance test of any stripe can detect the difference. If a tree falls in the forest, and all that.
You have no idea when that 0.1% of the time even is, or whether it's actually straight up never (0%). Between CPUs that cache and pipeline (did you know modern CPUs cannot directly access main memory? At all? I bet you didn't. The basic von Neumann model of how CPUs work will totally mislead you if you use it to analyse the performance of machine code), VMs, and garbage collectors (did you know that garbage is free but live objects are expensive? Re-using an object is in fact more expensive than creating a ton of fast garbage... depending on many factors; of course this too is an oversimplification. That's the real point: this is an intractable thing; you cannot just look at code and jump to conclusions about performance), you stand no chance of knowing what's 'faster'.
The only right move is to write code as simple and as clean as you can ('clean' defined as: when you look at it, you jump to conclusions, and those conclusions are correct; it is easy to adjust in the face of changing requirements; and it is flexible in how it connects to the rest of the codebase). IF (big if!) real-life situations result in a performance issue, you first run a profiler so you know which 1% of the code is in any way relevant, and then you go ham on that, with JMH benchmarks and all sorts of performance experiments to optimize the heck out of it. If your code is clean, that's great, because this almost always requires adjusting the code that calls into the 'hot path', or the code the 'hot path' flows out to, and the cleaner your code, the easier that will be.
Needless performance optimization almost invariably reduces flexibility, and makes code harder to understand.
Hence, objectively, micro-optimizing like this just makes your code slower and buggier for literally no benefit. Not even a tiny, almost immeasurable one.
Hence, the advice is silly. The only correct call is new StringBuilder() - no pre-configured size. The one and only excuse you have to write new StringBuilder(presetCapacity) is if there's a lengthy comment that immediately precedes it that lays out in a lot of detail, or links to a ticket, the exact performance study done to indicate this indeed fixes a real performance issue and how to recreate that study, and on what schedule it should be revisited.
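As for the original question (what to do when the final size is unknown), the answer's recommendation is simply the no-argument constructor. A minimal sketch, with a hypothetical join helper: the builder starts at its default capacity of 16 characters and, whenever it fills up, reallocates to roughly double its size, so the copying cost is amortized over the appends.

```java
import java.util.List;

class DefaultBuilder {
    // Joins items with commas; the final length is unknown up front, so no capacity is given.
    static String join(List<?> items) {
        StringBuilder sb = new StringBuilder(); // default capacity: 16 chars, grows as needed
        boolean first = true;
        for (Object item : items) {
            if (!first) {
                sb.append(',');
            }
            sb.append(item);
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder probe = new StringBuilder();
        System.out.println(probe.capacity());         // 16
        probe.append("x".repeat(1000));               // buffer grows automatically
        System.out.println(probe.capacity() >= 1000); // true
        System.out.println(join(List.of(1, 2, 3)));   // 1,2,3
    }
}
```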

Does having a large method name increase latency?

Do longer method names have an increased latency when called? Is this a noticeable effect after a long period of calls?

No, method names have nothing to do with performance. And by the way, the JVM doesn't use the name to invoke the method: the invoke instruction's operand is a symbolic reference, an index into the constant pool, and that constant-pool entry is what points to the method name.
invokestatic #2 //Method fn:()V
So even the bytecode doesn't get bloated with lengthy method names.
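A quick way to check this is to compile a class like the hypothetical one below and run javap -c MethodNameDemo: both calls appear as a 3-byte invokestatic instruction (one opcode byte plus a 2-byte constant-pool index), whatever the length of the name.

```java
class MethodNameDemo {
    static int f() {
        return 1;
    }

    static int aVeryLongMethodNameThatDescribesExactlyWhatThisMethodDoes() {
        return 1;
    }

    public static void main(String[] args) {
        // In javap -c output, each call below is "invokestatic #n";
        // the name itself lives once in the constant pool.
        int a = f();
        int b = aVeryLongMethodNameThatDescribesExactlyWhatThisMethodDoes();
        System.out.println(a + b); // 2
    }
}
```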

No. The length of the method name doesn't make a difference.
The fact that you're calling a method does, though: the more method calls, the more overhead. That's the price we pay for cleaner and more maintainable code.

Just for the sake of it: be careful when diving into "performance" from this "microscopic" point of view.
Of course it is a good thing to understand the effects of using different approaches in your source code (to avoid "stupid mistakes").
But efficiency and performance too often receive "too much priority". Typically, when you focus on creating a clean, maintainable design, the outcome will have "good enough performance". If not, well, a clean, maintainable design is easier to change than a messed-up design anyway.
Keep in mind:
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil." and I agree with this. It's usually not worth spending a lot of time micro-optimizing code before it's obvious where the performance bottlenecks are. But, conversely, when designing software at a system level, performance issues should always be considered from the beginning. A good software developer will do this automatically, having developed a feel for where performance issues will cause problems. An inexperienced developer will not bother, misguidedly believing that a bit of fine tuning at a later stage will fix any problems."
What Hoare and Knuth are really saying is that software engineers should worry about other issues (such as good algorithm design and good implementations of those algorithms) before they worry about micro-optimizations such as how many CPU cycles a particular statement consumes.
from here

Which manner of processing two repetitive operations will complete faster?

Generally speaking, is there a significant difference in the processing speed of these two example segments of code, and if so, which should complete faster? Assume that "processA(int)" and "processB(int)" are void methods common to both examples.
for(int x=0;x<1000;x++){
processA(x);
processB(x);
}
or
for(int x=0;x<1000;x++){
processA(x);
}
for(int x=0;x<1000;x++){
processB(x);
}
I'm looking to see if I can speed up one of my programs. It involves cycling through blocks of data several times and processing them in different ways. Currently, it runs a separate cycle for each processing method, meaning a lot of cycles get run in total, but each cycle does very little work. I was thinking about rewriting my code so that each cycle incorporates every processing method; in other words, far fewer cycles, but each cycle has a heavier workload.
This would be a very intensive rewrite of my program structure. So unless it would give me a significant performance boost, it won't be worth the trouble.

The first case will be slightly faster than the second case because a for loop in and of itself has an effect on performance. However, the main question you should ask yourself is this: will the effect be of significance to my program? If not, you should opt for clear and readable code.
One thing to remember in such a case is that the JVM (Java Virtual Machine) does a whole lot of optimisation. In your case, the JIT compiler can unroll the for-loops, replacing groups of iterations with successive calls to processA() and processB() (which it may also inline), and this cuts the loop overhead in both versions. So even with two for loops, the difference between the two versions may shrink to nothing measurable.
To get a basic understanding of method calls, cost, and the JVM, you can read this short article:
https://plumbr.eu/blog/how-expensive-is-a-method-call-in-java
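As a rough illustration of what loop unrolling means (the JIT may do this kind of transformation on its own, so this is not a recommendation to hand-write it), here is a plain loop next to a version unrolled by 4. The summing task stands in for the question's processA/processB and is a hypothetical example.

```java
class UnrollSketch {
    // Plain loop: one element and one loop-condition check per iteration.
    static long sumPlain(int[] a) {
        long sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // Unrolled by 4: a quarter of the condition checks, plus a tail loop for leftovers.
    static long sumUnrolled(int[] a) {
        long sum = 0;
        int i = 0;
        int limit = a.length - (a.length % 4);
        for (; i < limit; i += 4) {
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        for (; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7};
        System.out.println(sumPlain(data) + " " + sumUnrolled(data)); // 28 28
    }
}
```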

Absolutely, fusing the 2 loops will be faster. (Some compilers do this automatically as an optimization.) How much faster? That depends on how many iterations the loops are running. Unless the number of iterations is very high, you can expect that the improvement will be minimal.

The single loop case will contain fewer instructions so will run faster.
But, unless processA and processB are very quick functions, such a substantial refactoring would give you negligible performance gain.
If this is production code, you should also take care as there may be side effects. You should make alterations in the context of a unit testing framework testing the code in question. (In C++ for example x may be passed by reference and could be modified by the functions! Of course Java has no such hazard but there may be other reasons why all processA functions have to run before all processB functions and the program comments may not make this clear).
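The side-effect concern is easy to see if the two methods record the order in which they run. In this sketch, processA and processB are hypothetical stand-ins that append to a log; fusing the loops interleaves the calls, which changes the result whenever processB depends on processA having finished for every element.

```java
import java.util.ArrayList;
import java.util.List;

class LoopFusion {
    // Hypothetical stand-ins that record the order in which they were called.
    static void processA(int x, List<String> log) {
        log.add("A" + x);
    }

    static void processB(int x, List<String> log) {
        log.add("B" + x);
    }

    // One loop doing both operations per element.
    static List<String> fused(int n) {
        List<String> log = new ArrayList<>();
        for (int x = 0; x < n; x++) {
            processA(x, log);
            processB(x, log);
        }
        return log;
    }

    // Two loops: all of processA first, then all of processB.
    static List<String> separate(int n) {
        List<String> log = new ArrayList<>();
        for (int x = 0; x < n; x++) {
            processA(x, log);
        }
        for (int x = 0; x < n; x++) {
            processB(x, log);
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(fused(2));    // [A0, B0, A1, B1]
        System.out.println(separate(2)); // [A0, A1, B0, B1]
    }
}
```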

The first piece of code is faster, because it only has one for loop.
If, however, you need to do something AFTER processA() has been executed n times and before processB()'s loop starts, then the second option would be ideal.

Does the JIT optimize switch statements with too few branches?

Recently I bumped into a situation where a static code analysis tool (PMD) complained about a switch statement that had too few branches. It suggested turning it into an if statement, which I did not want to do because I knew more cases would soon be added. But I wondered whether javac performs such an optimization. I decompiled the code using JAD, but it still showed a switch. Is it possible that this is optimized at runtime by the JIT?
Update: Please do not be misled by the context of my question. I'm not asking about PMD, I'm not asking about the need for micro-optimisation, etc. The question is only this: does the current (Oracle 1.6.x) JVM implementation contain a JIT that handles switches with too few branches or not?

The way to determine how the JIT compiler is optimizing switch statements is either:
read the JIT compiler source code (OpenJDK 6 and 7 are open source), or
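run the JVM with the option that tells it to dump the JIT-compiled code for the classes of interest (on HotSpot, -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly, which requires the hsdis disassembler plugin).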
Note that like all questions related to performance and optimization, the answer depends on the hardware platform and the JVM vendor and version.
Reference: Disassemble Java JIT compiled native bytecode
If this question is "mere idle curiosity", so be it.
However, it should also be pointed out that rewriting your code to use switch or if for performance reasons is probably a bad idea and/or a waste of time.
It is probably a waste of time because the chances are that the difference in time (if any) between the original and hand optimized versions will be insignificant.
It is a bad idea because your optimization may only be helpful for specific hardware and JVM combinations. On others, it may have no effect ... or even be an anti-optimization.
In short, even if you know how the JIT optimizer handles this, you probably shouldn't be taking it into account in your programming.
(The exception of course is when you have a real measurable performance problem, and profiling points to (say) a 3-branch switch as being one of the bottlenecks.)

If you compiled it in debug mode, it is normal that when you decompile it, you still get the switch. Otherwise, any debugging attempt would miss some information such as line number and the original instruction flow.
You could thus try to compile in production mode and see what the decompilation result would be.
However, a switch statement, especially one that is expected to grow, is generally considered a code smell and should be evaluated as a good candidate for refactoring.

As for your clarified question:
Since this depends so strongly on the hardware and the JVM (JVMs using the Java trademark may be developed by companies other than Oracle, as long as they adhere to the JVM specification), I'd say the only valid method is to run speed tests.
Cut out a chunk of code, lock it in a loop for a considerable amount of repetitions, check the time before and after execution of the loop. Repeat for both solutions (switch and if)
This may seem simplistic and silly, but it actually works, and is a lot faster than decompiling, reading through bytecode and memory dumps etc.
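A minimal sketch of such a test for a two-branch switch versus the equivalent if, using the same shape as the PMD example in this thread. The class and method names are hypothetical. Results are summed and printed so the work is actually used, and the measurement is repeated so later rounds run JIT-compiled code; even so, treat the numbers as a rough indication only.

```java
import java.util.function.IntUnaryOperator;

class SwitchVersusIf {
    static int withSwitch(int a) {
        switch (a) {
            case 5:
                return 1;
            default:
                return 2;
        }
    }

    static int withIf(int a) {
        if (a == 5) {
            return 1;
        } else {
            return 2;
        }
    }

    // Applies f to the inputs 0..9 repeatedly and sums the results.
    static long run(IntUnaryOperator f, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += f.applyAsInt(i % 10);
        }
        return total;
    }

    public static void main(String[] args) {
        int n = 50_000_000;
        for (int round = 0; round < 5; round++) {
            long t0 = System.nanoTime();
            long r1 = run(SwitchVersusIf::withSwitch, n);
            long t1 = System.nanoTime();
            long r2 = run(SwitchVersusIf::withIf, n);
            long t2 = System.nanoTime();
            System.out.printf("switch: %d ms, if: %d ms (results %d, %d)%n",
                    (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, r1, r2);
        }
    }
}
```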
You have to remember that Java actually uses virtual machines and bytecode. I'm pretty sure this is all handled and optimized. We use high-level languages to AVOID exactly the kind of micromanagement and optimization you're asking about.
On a more general note, I think you are trying to optimize a bit too early. If you know there are going to be more cases in that switch, why bother at all? Did you run a profiler? If not, it's no use optimizing. "Premature optimization is the root of all evil." You might be optimizing a part of the code that actually isn't the bottleneck, increasing code complexity and wasting your own time writing code that does not contribute in any way.
I don't know what type of app you are making, but a rule of thumb says that clarity is king, and you should usually choose the simpler, more elegant, self-documenting solution.

javac performs almost no optimisations. Practically all the optimisations are performed at runtime by the JIT. Unless you know you have a performance problem, I would assume you don't.
What PMD is complaining about is clarity, e.g.
if (a == 5) {
// something
} else {
// something else
}
is clearer than
switch(a) {
case 5:
// something
break;
default:
// something else
break;
}

Should I inline long code in a loop, or move it in a separate method?

Assume I have a loop (any while or for) like this:
loop{
A long code.
}
From the point of time complexity, should I divide this code in parts, write a function outside the loop, and call that function repeatedly?
I read something about functions a long time ago, saying that calling a function repeatedly takes more time or memory or something like that; I don't remember it exactly. Can you also provide a good reference about things like this (time complexity, coding style)?
Can you also provide some reference book or tutorial about heap memory, overheads etc. which affects the performance of program?

The performance difference is probably very minimal in this case. I would concentrate on clarity rather than performance until you identify this portion of your code to be a serious bottleneck.
It really does depend on what kind of code you're running in the loop, however. If you're just doing a tiny mathematical operation that isn't going to take any CPU time, but you're doing it a few hundred thousand times, then inlining the calculation might make sense. Anything more expensive than that, though, and performance shouldn't be an issue.

There is an overhead to calling a function.
So if the "long code" is fast compared to this overhead (and your application cares about performance), then you should definitely avoid the overhead.
However, if the performance is not noticeably worse, it's better to make the code more readable by using one (or, better, several) functions.

Rule one of performance optimisation: measure it.
Personally, I go for readable code first and then optimise it IF NECESSARY. Usually, it isn't necessary :-)
See the first line in CHAPTER 3 - Measurement Is Everything
"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil." - Donald Knuth
In this case, the difference in performance will probably be minimal between the two solutions, so writing clearer code is the way to do it.

There really isn't a simple "tutorial" on performance; it is a very complex subject and one that even seasoned veterans often don't fully understand. Anyway, to give you more of an idea of what the overhead of "calling" a function is: basically, you are "freezing" the state of your function (in Java there are no "functions" per se; they are all called methods), calling the method, and then "unfreezing" back to where your method was before.
The "freezing" essentially consists of pushing state information (where you were in the method, what the values of the variables were, etc.) onto the stack; "unfreezing" consists of popping the saved state off the stack and restoring the control structures to where they were before you called the function. Naturally, memory operations are far from free, but the VM is pretty good at keeping the performance impact to an absolute minimum.
Now keep in mind that Java is almost entirely heap based: the only things that really have to get pushed onto the stack are the values of pointers (small), your place in the program (again small), whatever primitives you have local to your method, and a tiny bit of control information, nothing else. Furthermore, although you cannot explicitly inline in Java (though I'm sure there are bytecode editors out there that essentially let you do that), most VMs, including the most popular one, HotSpot, will do this automatically for you. http://java.sun.com/developer/technicalArticles/Networking/HotSpot/inlining.html
So the bottom line is pretty much 0 performance impact, if you want to verify for yourself you can always run benchmarking and profiling tools, they should be able to confirm it for you.
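Here is a sketch of the refactoring being discussed: the same loop with the body written inline and with the body moved into a small named method. The body itself is a hypothetical stand-in for "a long code". Running it with java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining ExtractedBody prints HotSpot's inlining decisions; you would typically expect step to be inlined once the loop is hot, but that decision is up to the JIT.

```java
class ExtractedBody {
    // Hypothetical loop body, pulled out into a small, named method.
    static long step(long acc, int i) {
        return acc * 31 + i;
    }

    // Loop with the body written inline.
    static long inlined(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc = acc * 31 + i;
        }
        return acc;
    }

    // Same loop, calling the extracted method on every iteration.
    static long extracted(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc = step(acc, i);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(inlined(1_000_000) == extracted(1_000_000)); // true
    }
}
```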

From an execution-speed point of view it shouldn't matter, and if you still believe this is a bottleneck, it is easy to measure.
From a development performance perspective, it is a good idea to keep the code short. I would vote for turning the loop contents into one (or more) properly named methods.

Forget it! You can't gain any performance by doing the JIT's job. Let the JIT inline it for you. Keep the methods short for readability and also for performance, as the JIT works better with short methods.
There are micro-optimizations which may help you gain some performance, but don't even think about them. I suggest the following rules:
Write clean code using appropriate objects and algorithms for readability and for performance.
In case the program is too slow, profile and identify the critical parts.
Think about improving them using better objects and algorithms.
As a last resort, you may also consider micro-optimizations.
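To illustrate the third rule (better objects and algorithms usually beat micro-optimizations), here is a hypothetical duplicate check written two ways. Swapping the nested scan for a HashSet changes the cost from quadratic to roughly linear, a gain no amount of inlining or loop tweaking in the first version would match.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class BetterDataStructure {
    // O(n^2): compares every pair of elements.
    static boolean hasDuplicateQuadratic(List<Integer> xs) {
        for (int i = 0; i < xs.size(); i++) {
            for (int j = i + 1; j < xs.size(); j++) {
                if (xs.get(i).equals(xs.get(j))) {
                    return true;
                }
            }
        }
        return false;
    }

    // O(n) expected: Set.add returns false when the element is already present.
    static boolean hasDuplicateHashed(List<Integer> xs) {
        Set<Integer> seen = new HashSet<>();
        for (Integer x : xs) {
            if (!seen.add(x)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(4, 8, 15, 16, 23, 42, 8);
        System.out.println(hasDuplicateQuadratic(data) + " " + hasDuplicateHashed(data)); // true true
    }
}
```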
