Recently someone told me that a comparison involving smaller integers will be faster, e.g.,
// Case 1
int i = 0;
int j = 1000;
if (i < j) { /* do something */ }
// Case 2
int j = 6000000;
int i = 500000;
if (i < j) { /* do something else */ }
So the comparison (the if condition) in Case 1 should be faster than the one in Case 2. In Case 2 the integers will take more space to store, but whether that affects the comparison, I am not sure.
Edit 1: I was thinking of the binary representation of i and j, e.g., for i=0 it will be 0, and for j=1000 it's 1111101000 (in a 32-bit representation it should be 22 zeros followed by 1111101000; I completely forgot about 32-bit or 64-bit representation, my bad!)
I tried to look at the JVM Specification and the bytecode of a simple comparison program, but nothing made much sense to me. Now the question is: how does comparison work for numbers in Java? I guess that will also answer why (or why not) either of the cases will be faster.
Edit 2: I am just looking for a detailed explanation; I am not really worried about micro-optimizations.
If you care about performance, you really only need to consider what happens when the code is compiled to native code. In this case, it is the behaviour of the CPU which matters. For simple operations almost all CPUs work the same way.
So the comparison (the if condition) in Case 1 should be faster than the one in Case 2. In Case 2 the integers will take more space to store, but whether that affects the comparison, I am not sure.
An int is always 32-bit in Java. This means it always takes the same space. In any case, the size of the data type is not very important; e.g., short and byte are often slower because the native word size is 32-bit and/or 64-bit and the CPU has to extract the right data from a larger type and sign-extend it.
I tried to look at the JVM Specification and the bytecode of a simple comparison program, but nothing made much sense to me.
The JLS specifies behaviour, not performance characteristics.
Now the question is: how does comparison work for numbers in Java?
It works the same way it does in just about every other compiled language. It uses a single machine code instruction to compare the two values and another to perform a conditional branch as required.
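As a rough sketch, the comparison from the question compiles down to one compare-and-branch bytecode, which the JIT in turn maps onto a native compare plus a conditional jump. The instruction names in the comments are standard JVM bytecodes; the exact javac output can vary.

class CompareDemo {
    static void compare(int i, int j) {
        // javac typically compiles the if below to: iload_0; iload_1; if_icmpge <past the block>
        // i.e. load both ints, then one compare-and-branch instruction, regardless of how
        // large or small the values happen to be.
        if (i < j) {
            System.out.println("i is smaller");
        }
    }

    public static void main(String[] args) {
        compare(0, 1000);          // Case 1 from the question
        compare(500000, 6000000);  // Case 2: same bytecode, same native instructions
    }
}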
I guess that will also answer why (or why not) either of the cases will be faster.
Most modern CPUs use branch prediction. To keep the CPU's pipeline full, it attempts to predict which branch will be taken (before it knows it is the right one to take) so there is no break in the instructions executed. When this works well, the branch has almost no cost. When it mispredicts, the pipeline can be filled with instructions for a branch which was the wrong guess, and this can cause a significant delay as the CPU clears the pipeline and takes the right branch. In the worst case it can mean a delay of hundreds of clock cycles.
Consider the following:
int i = 0; // happens to be 0 this time (usually it is positive)
if (i > 0) {
    int j = 100 / i; // this division traps if executed with i == 0
}
Say the branch is usually taken. This means the pipeline could be loaded with an instruction which triggers an interrupt (or an Error in Java) before the CPU knows the branch will not be taken. This can result in a complex situation which takes a while to unwind correctly.
These are called mispredicted branches. In short, a branch which goes the same way every time (or most of the time) is faster; a branch which suddenly changes direction or is quite random (e.g. in sorting and tree data structures) will be slower.
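To make that concrete, here is a minimal sketch (not a rigorous benchmark; the class name and values are made up for illustration). The same comparison runs over sorted data, where the branch goes the same way in long runs and is easy to predict, and over random data, where it is essentially unpredictable:

import java.util.Arrays;
import java.util.Random;

class BranchDemo {
    // counts how many elements are below the limit; the if is the branch under test
    static long countBelow(int[] data, int limit) {
        long count = 0;
        for (int x : data) {
            if (x < limit) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int[] random = new Random(42).ints(1_000_000, 0, 256).toArray();
        int[] sorted = random.clone();
        Arrays.sort(sorted);
        // Timed with a proper harness (e.g. JMH), the sorted case typically runs
        // noticeably faster even though it does exactly the same amount of work.
        System.out.println(countBelow(sorted, 128) + " " + countBelow(random, 128));
    }
}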
An int might be faster on a 32-bit system and a long might be faster on a 64-bit system.
Should I bother about it? No. You don't code for a particular system configuration; you code based on your requirements. Micro-optimizations like this rarely pay off, and they might introduce unexpected issues.
Small Integer objects can be faster because Java treats them specially.
All Integer values in the range -128 <= i <= 127 are cached in IntegerCache, an inner class of Integer.
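A quick sketch of what that caching means in practice (it affects object identity under boxing, not the speed of an int comparison):

class CacheDemo {
    public static void main(String[] args) {
        Integer a = Integer.valueOf(100);
        Integer b = Integer.valueOf(100);
        System.out.println(a == b);   // true: both references come from the IntegerCache

        Integer c = Integer.valueOf(1000);
        Integer d = Integer.valueOf(1000);
        System.out.println(c == d);   // false on a default JVM: larger values are not required to be cached
    }
}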
Related
Why does the reverse() method in StringBuffer/StringBuilder classes use bitwise operator?
I would like to know the advantages of it.
public AbstractStringBuilder reverse() {
    boolean hasSurrogate = false;
    int n = count - 1;
    for (int j = (n-1) >> 1; j >= 0; --j) {
        char temp = value[j];
        char temp2 = value[n - j];
        if (!hasSurrogate) {
            hasSurrogate = (temp >= Character.MIN_SURROGATE && temp <= Character.MAX_SURROGATE)
                || (temp2 >= Character.MIN_SURROGATE && temp2 <= Character.MAX_SURROGATE);
        }
        value[j] = temp2;
        value[n - j] = temp;
    }
    if (hasSurrogate) {
        // Reverse back all valid surrogate pairs
        for (int i = 0; i < count - 1; i++) {
            char c2 = value[i];
            if (Character.isLowSurrogate(c2)) {
                char c1 = value[i + 1];
                if (Character.isHighSurrogate(c1)) {
                    value[i++] = c1;
                    value[i] = c2;
                }
            }
        }
    }
    return this;
}
Right-shifting by one means dividing by two. I don't think you'll notice any performance difference; the compiler will perform these optimizations at compile time.
Many programmers are used to right-shifting by one when dividing by two instead of writing / 2. It's a matter of style, or maybe one day it really was more efficient to right-shift instead of actually dividing by writing / 2 (prior to optimizations). Compilers know how to optimize things like that, and I wouldn't waste my time trying to write things that might be unclear to other programmers (unless they really make a difference). Anyway, the loop is equivalent to:
int n = count - 1;
for (int j = (n-1) / 2; j >= 0; --j)
As #MarkoTopolnik mentioned in his comment, the JDK was written without counting on any compiler optimization at all; this might explain why they explicitly right-shifted the number by one instead of explicitly dividing it. If they had been relying on the full power of the optimizer, they would probably have written / 2.
Just in case you're wondering why they are equivalent, the best explanation is by example. Consider the number 32. Assuming 8 bits, its binary representation is:
00100000
right shift it by one:
00010000
which has the value 16 (1 × 2^4)
In summary:
The >> operator in Java is known as the Sign Extended Right Bit Shift operator.
X >> 1 is mathematically equivalent to X / 2, for all strictly positive values of X.
X >> 1 is always faster than X / 2, in a ratio of roughly 1:16, though the difference might turn out to be much less significant in actual benchmarks due to modern processor architectures.
All mainstream JVMs can correctly perform such optimizations, but the non-optimized byte code will be executed in interpreted mode thousands of times before the optimization actually occurs.
The JRE source code uses a lot of optimization idioms, because they make an important difference in code executed in interpreted mode (and most importantly, at JVM launch time).
The systematic use of proven-to-be-effective code optimization idioms that are accepted by a whole development team is not premature optimization.
Long answer
The following discussion tries to correctly address all questions and doubts that have been raised in other comments on this page. It is so long because I felt it was necessary to put emphasis on why some approaches are better, rather than show off personal benchmark results, beliefs and practices, where mileage might vary significantly from one person to the next.
So let's take questions one at a time.
1. What does X >> 1 (or X << 1, or X >>> 1) mean in Java?
The >>, << and >>> are collectively known as the Bit Shift operators. >> is commonly known as Sign Extended Right Bit Shift, or Arithmetic Right Bit Shift. >>> is the Non-Sign Extended Right Bit Shift (also known as Logical Right Bit Shift), and << is simply the Left Bit Shift (sign extension does not apply in that direction, so there is no need for logical and arithmetic variants).
Bit Shift operators are available (though with varying notation) in many programming languages (actually, from a quick survey I would say almost every language that is more or less a descendant of the C language, plus a few others). Bit shifts are fundamental binary operations, and consequently almost every CPU ever created offers assembly instructions for them. Bit shifters are also a classic building block in electronic design which, given a reasonable number of transistors, provides its final result in a single step, with a constant and predictable stabilization time.
Concretely, a bit shift operator transforms a number by moving all of its bits by n positions, either left or right. Bits that fall out are discarded; bits that "come in" are forced to 0, except in the case of the sign-extended right bit shift, in which the left-most bit preserves its value (and therefore its sign). See Wikipedia for a graphical illustration of this.
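As a small illustration of the three operators on a negative int (a minimal sketch; the printed values assume Java's 32-bit two's-complement int):

class ShiftDemo {
    public static void main(String[] args) {
        int x = -8;                   // 32-bit pattern ...11111000
        System.out.println(x << 1);   // -16: bits move left, a zero comes in on the right
        System.out.println(x >> 1);   // -4: arithmetic shift, the sign bit is copied in on the left
        System.out.println(x >>> 1);  // 2147483644: logical shift, a zero comes in on the left
    }
}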
2. Does X >> 1 equal X / 2?
Yes, as long as the dividend is guaranteed to be positive.
More generally:
a left shift by N is equivalent to a multiplication by 2^N;
a logical right shift by N is equivalent to an unsigned integer division by 2^N;
an arithmetic right shift by N is equivalent to a non-integer division by 2^N, rounded to an integer toward negative infinity (which is also equivalent to a signed integer division by 2^N for any strictly positive dividend); the sketch below illustrates the rounding difference for negative values.
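A minimal sketch of that rounding difference, and of why the equivalence only holds for a positive dividend:

class RoundingDemo {
    public static void main(String[] args) {
        System.out.println(-7 / 2);   // -3: integer division truncates toward zero
        System.out.println(-7 >> 1);  // -4: arithmetic shift rounds toward negative infinity
        System.out.println(7 / 2);    // 3
        System.out.println(7 >> 1);   // 3: identical for non-negative dividends
    }
}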
3. Is bit shifting faster than the equivalent arithmetic operation, at the CPU level?
Yes, it is.
First of all, we can easily assert that, at the CPU's level, bit shifting does require less work than the equivalent arithmetic operation. This is true both for multiplications and divisions, and the reason is simple: both the integer multiplication and the integer division circuitry themselves contain several bit shifters. Put otherwise: a bit shift unit represents a mere fraction of the complexity of a multiplication or division unit. It is therefore guaranteed that less energy is required to perform a simple bit shift than a full arithmetic operation. Yet, in the end, unless you monitor your CPU's electric consumption or heat dissipation, I doubt you will notice that your CPU is using more energy.
Now, let's talk about speed. On processors with a reasonably simple architecture (that is, roughly, any processor designed before the Pentium or the PowerPC, plus recent processors that do not feature some form of execution pipeline), integer division (and multiplication, to a lesser degree) is generally implemented by iterating over bits (actually groups of bits, known as the radix) of one of the operands. Each iteration requires one CPU cycle, which means that integer division on a 32-bit processor would require (at most) 16 cycles (assuming a radix-2 SRT division unit, on a hypothetical processor). Multiplication units usually handle more bits at once, so a 32-bit processor might complete integer multiplication in 4 to 8 cycles. These units might use some form of variable bit shifter to quickly jump over sequences of consecutive zeros, and therefore might terminate quickly when multiplying or dividing by simple operands (such as a positive power of two); in that case, the arithmetic operation will complete in fewer cycles, but will still require more than a simple bit shift operation.
Obviously, instruction timings vary between processor designs, but the preceding ratio (bit shift = 1, multiplication = 4, division = 16) is a reasonable approximation of the actual performance of these instructions. For reference, on the Intel 486, the SHR, IMUL and IDIV instructions (for 32 bits, assuming a register by a constant) required respectively 2, 13-42 and 43 cycles (see here for a list of 486 instructions with their timings).
What about the CPUs found in modern computers? These processors are designed around pipeline architectures that allow the simultaneous execution of several instructions; the result is that most instructions nowadays require only one cycle of dedicated time. But this is misleading, since instructions actually remain in the pipeline for several cycles before being released, during which they might prevent other instructions from being completed. The integer multiplication or division unit remains "reserved" during that time, and therefore any further division will be held back. That is particularly a problem in short loops, where a single multiplication or division will end up being stalled by the previous invocation of itself that hasn't yet completed. Bit shift instructions do not suffer from such a risk: most "complex" processors have access to several bit shift units and don't need to reserve them for very long (though generally at least 2 cycles for reasons intrinsic to the pipeline architecture). Actually, to put this into numbers, a quick look at the Intel Optimization Reference Manual for the Atom seems to indicate that SHR, IMUL and IDIV (same parameters as above) respectively have latencies of 2, 5 and 57 cycles; for 64-bit operands, it is 8, 14 and 197 cycles. Similar latencies apply to most recent Intel processors.
So, yes, bit shifting is faster than the equivalent arithmetic operations, even though in some situations, on modern processors, it might actually make absolutely no difference. But in most cases, it is very significant.
4. Will the Java Virtual Machine perform such optimizations for me?
Sure, it will. Well... most certainly, and... eventually.
Unlike most language compilers, regular Java compilers perform no optimization. It is considered that the Java Virtual Machine is in the best position to decide how to optimize a program for a specific execution context. And this indeed provides good results in practice. The JIT compiler acquires a very deep understanding of the code's dynamics, and exploits this knowledge to select and apply tons of minor code transforms in order to produce very efficient native code.
But compiling byte code into optimized native methods requires a lot of time and memory. That is why the JVM will not even consider optimizing a code block before it has been executed thousands of times. Then, even though the code block has been scheduled for optimization, it might be a long time before the compiler thread actually processes that method. And later, various conditions might cause that optimized code block to be discarded, reverting back to byte code interpretation.
Though the JSE API is designed with the objective of being implementable by various vendors, it is incorrect to claim that the same holds for the JRE. The Oracle JRE is provided to everyone as the reference implementation, but its usage with another JVM is discouraged (actually, it was forbidden not so long ago, before Oracle open-sourced the JRE's source code).
Optimizations in the JRE source code are the result of adopted conventions and optimization efforts among JRE developers to provide reasonable performance even in situations where JIT optimizations haven't yet kicked in or simply can't help. For example, hundreds of classes are loaded before your main method is invoked. That early, the JIT compiler has not yet acquired sufficient information to properly optimize code. At such times, hand-made optimizations make an important difference.
5. Isn't this premature optimization?
It is, unless there is a reason why it is not.
It is a fact of modern life that whenever a programmer demonstrates a code optimization somewhere, another programmer will oppose Donald Knuth's quote on optimization (well, was it his? who knows...). It is even perceived by many as a clear assertion by Knuth that we should never try to optimize code. Unfortunately, that is a major misunderstanding of Knuth's important contributions to computer science over the last decades: Knuth has actually authored thousands of pages of literature on practical code optimization.
As Knuth put it:
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
— Donald E. Knuth, "Structured Programming with Goto Statements"
What Knuth qualifies as premature optimization are optimizations that require a lot of thinking, apply only to non-critical parts of a program, and have a strong negative impact on debugging and maintenance. Now, all of this could be debated for a long time, but let's not.
It should however be understood that small local optimizations that have been proven to be effective (that is, at least on average, overall), that do not negatively affect the overall construction of a program, do not reduce a code's maintainability, and do not require extraneous thinking are not a bad thing at all. Such optimizations are actually good, since they cost you nothing, and we should not pass up such opportunities.
Yet, and that is the most important thing to remember, an optimization that would be trivial to programmers in one context might turn out to be incomprehensible to programmers in another context. Bit shifting and masking idioms are particularly problematic for that reason. Programmers who know the idiom can read it and use it without much thinking, and the effectiveness of these optimizations is proven, though generally insignificant unless the code contains hundreds of occurrences. These idioms are rarely an actual source of bugs. Still, programmers unfamiliar with a specific idiom will lose time understanding what that specific code snippet does, and why and how it does it.
In the end, whether to favor such optimizations or not, and exactly which idioms should be used, is really a matter of team decision and code context. I personally consider a certain number of idioms to be best practice in all situations, and any new programmer joining my team quickly acquires them. Many more idioms are reserved for critical code paths. All code put into an internal shared code library is treated as a critical code path, since it might turn out to be invoked from such a critical code path. Anyway, that is my personal practice, and your mileage may vary.
It uses (n-1) >> 1 instead of (n-1)/2 to find the middle index of the internal array to be reversed. Bitwise shift operators are usually more efficient than the division operator.
In this method there's just this expression: (n-1) >> 1. I assume this is the expression you're referring to. This is called a right shift. It is equivalent to (n-1)/2, but it's generally considered faster and more efficient. It's often used in many other languages too (e.g. in C/C++).
Note though that modern compilers will optimize your code anyway even if you use division like (n-1)/2. So there are no obvious benefits of using right shifting. It's more like a question of coding preference, style, habit.
See also:
Is shifting bits faster than multiplying and dividing in Java? .NET?
I've got a little program that is a fairly pointless exercise in simple number crunching that has thrown me for a loop.
The program spawns a bunch of worker threads that do simple mathematical operations. Recently I changed the inner loop of one variant of worker from:
do
{
    int3 = int1 + int2;
    int3 = int1 * int2;
    int1++;
    int2++;
    i++;
}
while (i < 128);
to something akin to:
int3 = tempint4[0] + tempint5[0];
int3 = tempint4[0] * tempint5[0];
int3 = tempint4[1] + tempint5[1];
int3 = tempint4[1] * tempint5[1];
int3 = tempint4[2] + tempint5[2];
int3 = tempint4[2] * tempint5[2];
int3 = tempint4[3] + tempint5[3];
int3 = tempint4[3] * tempint5[3];
...
int3 = tempint4[127] + tempint5[127];
int3 = tempint4[127] * tempint5[127];
The arrays are populated by random integers no higher than 1025 in value, and the array values do not change.
The end result was that the program ran much faster, though closer examination seems to indicate that the CPU isn't actually doing anything when running the newer version of the code. It seems that the JVM has figured out that it can safely ignore the code that replaced the inner loop after one iteration of the outer loop since it is only redoing the same calculations on the same set of data over and over again.
To illustrate my point, the old code took maybe ~27000 ms to run and noticeably increased the operating temperature of the CPU (it also showed 100% utilization for all cores). The new code takes maybe 5 ms to run (sometimes less) and causes nary a spike in CPU utilization or temperature. Increasing the number of outer loop iterations does nothing to change the behavior of the new code, even when the number of iterations increases by a hundred times or more.
I have another version of the worker that is identical to the one above except that it has a division operation along with the addition and multiplication operations. In its new unrolled form, the division-enabled version is also much faster than its previous form, but it actually takes a little while (~300 ms on the first run and ~200 ms on subsequent runs, despite warmup, which is a little odd) and produces a profound spike in CPU temperature for its brief run. Increasing the number of outer loop iterations seems to cause the temperature phenomenon to mostly cease after a certain amount of time has passed while running the program, though utilization still shows 100% for all cores. My guess is that the JVM is taking much longer to figure out which operations it can safely ignore when handling division operations, and that it is not ignoring all of them.
Short of adding division operations to all my code (which isn't really a fix anyway beyond a certain number of outer loop iterations), is there any way I can get the JVM to stop reducing my code to apparent NOOPs? I've tried several solutions to the problem, such as generating new random values per iteration of the outer loop, going back to simple integer variables with incrementation, and some other nonsense, but none of those solutions have produced desirable results. Either it continues to ignore the series of instructions, or the performance hit from modifications is bad enough that my division-heavy variant actually performs better than the code without division operations.
edit: to provide some context:
i: this variable is an integer that is used as a loop counter in a do/while loop. It is defined in the class file containing the worker code. Its initial value is 0. It is no longer used in the newer version of the worker.
int1/int2: These are integers defined in the class file containing the worker code. Their initial values are both 0. They were used in the old version of the code to provide changing values for each iteration of the internal loop. All I had to do was increment them upward by one per loop iteration, and the JVM would be forced to carry out every operation faithfully. Unfortunately, this loop apparently prevented the use of SIMD. Each time the outer loop iterated, int1 and int2 had their values reset to prevent overflow of int1, int2, or int3 (I have discovered that integer overflow can slow down the code unnecessarily, as can allowing a float to reach Infinity).
tempint4/tempint5: These are references to a pair of integer arrays defined in the main class file for the program (Mathtester. Yes, unimaginative, I know). When the program first starts, there is a short do/while loop that fills each array with random integers ranging from 1 to 1025. The arrays are 128 integers in size. Each array is static, though the reference variables are not. In truth there is no particular reason for me to use the reference variables. They are leftovers from when I was trying to do an array reference swap so that, after each iteration of the outer loop, tempint4 and tempint5 would refer to the opposite array. It was my hope that the JVM would stop ignoring my code block. For the division-enabled version of the code, this seems to have worked (sort of), since it fundamentally changes the values to be calculated. Swapping tempint4 for tempint5 and vice versa does not change the results of the addition and multiplication operations, so the JVM can still ignore those.
edit: Making tempint4 and tempint5 (since they are only reference variables, I am actually referring to the main arrays, Mathtester.int4 and Mathtester.int5) volatile worked, without notably reducing the amount of CPU activity or level of CPU temperature. It did slow down the code a bit, but that is a probable indicator that the JVM was NOOPing more than I knew.
Is there any way I can get the JVM to stop reducing my code to apparent NOOPs?
Yes, by making int3 volatile.
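A minimal sketch of that fix, assuming a worker shaped roughly like the one in the question (the class and field names here are made up):

class Worker implements Runnable {
    volatile int int3;   // volatile: stores to this field are side effects the JIT cannot simply discard

    @Override
    public void run() {
        int int1 = 0;
        int int2 = 0;
        for (int i = 0; i < 128; i++) {
            int3 = int1 + int2;   // no longer dead code, so the addition is kept
            int3 = int1 * int2;   // likewise for the multiplication
            int1++;
            int2++;
        }
    }
}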
One of the first things when dealing with Java performance that you have to learn by heart is this:
"A single line of Java code means nothing at all in isolation".
Modern JVMs are very complex beasts, and do all kinds of optimization. If you try to measure some small piece of code, the chances are that you will not be measuring what you think you are - it is really complicated to do it correctly without very, very detailed knowledge of what the JVM is doing.
In this case, yes, it's entirely likely that the JVM is optimizing away the loop. There's no simple way to prevent it from doing this, and almost all techniques are fragile and JVM-version specific (because new & cleverer optimizations are developed & added to the JVM all the time).
So, let me turn the question around: "What are you really trying to achieve here? Why do you want to prevent the JVM from optimizing?"
I've seen that the JITC uses unsigned comparison for checking array bounds (the test 0 <= x < LIMIT is equivalent to x ≺ LIMIT, where ≺ treats the numbers as unsigned quantities). So I was curious whether it works for arbitrary comparisons of the form 0 <= x < LIMIT as well.
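For reference, the same trick can be spelled out directly in Java via Integer.compareUnsigned (available since Java 8); this is a sketch of the idea, not what the JIT literally emits:

class UnsignedRange {
    // For a non-negative limit, (0 <= x && x < limit) is equivalent to one unsigned
    // comparison: a negative x re-interpreted as unsigned becomes a huge value >= limit.
    static boolean inRange(int x, int limit) {
        return Integer.compareUnsigned(x, limit) < 0;
    }

    public static void main(String[] args) {
        System.out.println(inRange(5, 10));   // true
        System.out.println(inRange(-1, 10));  // false: -1 as unsigned is 4294967295
    }
}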
The results of my benchmark are pretty confusing. I've created three experiments of the form
for (int i=0; i<LENGTH; ++i) {
    int x = data[i];
    if (condition) result += x;
}
with different conditions
0 <= x, called above
x < LIMIT, called below
0 <= x && x < LIMIT, called inRange
0 <= x & x < LIMIT, called inRange2
and prepared the data so that the probabilities of the condition being true are the same.
The results should be fairly similar, just above might be slightly faster as it compares against zero. Even if the JITC couldn't use the unsigned comparison for the test, the results for above and below should still be similar.
Can anyone explain what's going on here? It's quite possible that I did something wrong...
Update
I'm using Java build 1.7.0_51-b13 on Ubuntu 2.6.32-54-generic with an i5-2400 CPU @ 3.10GHz, in case anybody cares. As the results for inRange and inRange2 near 0.00 are especially confusing, I re-ran the benchmark with more steps in this area.
The variation in the results of the benchmarks most likely has to do with CPU caching at different levels.
Since primitive ints are being used, there is no JVM-specific caching going on, as would happen with auto-boxed Integer values.
Thus all that remains, given the minimal memory consumption of the data[] array, is CPU caching of low-level values/operations. Since, as described, the distributions of values are based on random values with statistical 'probabilities' of the conditions being true across the tests, the likely cause is that, depending on the values, more or less (random) caching is going on for each test, leading to more randomness.
Further, depending on the isolation of the computer (background services/processes), the test cases may not be running in complete isolation. Ensure that everything is shut down for these tests except the core OS functions and the JVM. Set the JVM memory min/max the same, shut down any networking processes, updates, etc.
Are your test results the average of a number of runs, or did you only test each function once?
One thing I have found is that the first time you run a for loop the JVM will interpret it; then, with repeated runs, the JVM will optimize it. Therefore the first few runs may get horrible performance, but after a few runs it will be near native performance.
I also figured out that a loop will not be optimized while it's running. I have not tested whether this applies to just the loop or the whole function. If it only applies to the loop, you may get much more performance if you nest it in an inner and outer loop and work with your data one block at a time. If it's the whole function, you will have to place the inner loop in its own function.
Also run the test more than once, if you compare the code you will notice how the JIT optimizes the code in stages.
For most code this gives Java optimal performance. It allows it to skip costly optimization on code that runs rarely and makes code that runs often a lot faster. However, if you have a code block that runs only once but for a long time, it will remain horribly slow.
I'd like to have a solid understanding of when (ignoring available memory space) it makes sense to store the result of a comparison instead of recalculating it. What is the tipping point for justifying the time cost incurred by storage? Is it 2, 3, or 4 comparisons? More?
For example, in this particular case, which option (in general) will perform better in terms of speed?
Option 1:
int result = id.compareTo(node.id);
return result > 0 ? 1 : result < 0 ? -1 : 0;
Option 2:
return id.compareTo(node.id) > 0 ? 1 : id.compareTo(node.id) < 0 ? -1 : 0;
I tried to profile the two options myself in order to answer my own question, but I don't have much experience with this sort of performance testing and, as such, would rather get a more definitive answer from someone with either more experience or else a better grasp of the theoretical elements involved.
I know it's not a big deal and that most of the time the difference will be negligible. However, I'm a perfectionist, and I'd really just like to resolve this particular issue so that I can get on with my life, haha.
Additionally, I think the answer is likely to prove enlightening in regards to similar situations I may encounter in the future wherein the difference might very well be significant (such as when the cost of a comparison or memory allocation is either unable to be incurred or else complex enough to cause a real issue concerning performance).
Answers should be relevant to programming with Java and not other languages, please.
I know I've mentioned it a few times already, but PLEASE focus answers ONLY on the SPEED DIFFERENCE! I am well aware that many other factors can and should be taken into account when writing code, but here I want just a straight-forward argument for which is FASTER.
Experience tells me that option 1 should be faster, because you're making just one call to the compare method and storing the result for reuse. Facts that support this belief are that local variables live on the stack and making a method call involves a lot more work from the stack than just pushing a value onto it. However profiling is the best and safest way to compare two implementations.
The first thing to realise is that the Java compiler and JVM together may optimise your code however they wish to get the job done most efficiently (as long as certain rules are followed). Chances are there is no difference in performance, and chances are also that what is actually executed is not what you think it is.
One really important difference, however, is in debugging: if you put a breakpoint on the return statement of the store-in-variable version, you can see what was returned from the call; otherwise you can't see that in a debugger. Even more handy is when you seemingly uselessly store the value to be returned from the method in a variable and then return it, so you can see what's going to be returned from a method while debugging; otherwise there's no way to see it.
Option 1 cannot be slower than option 2; if the compiler optimizes, both could be equal, but even then option 1 is more readable, more compact, and easier to test.
So there is no argument for option 2.
If you like, you could change it to final int result = ....
Although I expect that the compiler is so clever that the final keyword makes no difference in this case, and the final makes the code a bit less readable.
Option 1 is always the preferred one, because of a real-world scenario: suppose a thread evaluates id.compareTo(node.id) > 0, and some other thread changes the value of node.id right after that check, before id.compareTo(node.id) < 0 is evaluated; the two results would then not be consistent with each other.
Performance-wise, option 1 is also better whenever there is a non-trivial amount of work inside the check.
When does it make sense to store the result of a comparison versus recalculating a comparison in terms of speed?
Most of the time, micro-optimizations like option #1 versus option #2 don't make any significant difference. Indeed, it ONLY makes a significant performance difference if:
the comparison is expensive,
the comparison is performed a large number of times, AND
performance matters.
Indeed, the chances are that you have already spent more time and money thinking about this than will be saved over the entire useful lifetime of the application.
Instead of focussing on performance, you should be focussing on making your code readable. Think about the next person who has to read and modify the code, and make it so that he/she is less likely to misread it.
In this case, the first option is more readable than the second one. THAT is why you should use it, not performance reasons. (Though, if anything, the first version is probably faster.)
I had a job interview today. We were given a programming question and were asked to solve it using C/C++/Java. I solved it in Java and its runtime was 3 sec (the test was more than 16000 lines, and the person accompanying us said the running time was reasonable); another person there solved it in C and the runtime was 0.25 sec. So I was wondering: is a factor of 12 normal?
Edit:
As I said, I don't think there was really much room for algorithmic variation, except maybe in one little thing. Anyway, there was this protocol that we had to implement:
A (client) and B (server) communicate according to some protocol p; before the messages are delivered, their validity is checked. The protocol is defined by its states and the text messages that can be sent when it is in a certain state. In all states there was only one valid message that could be sent, except in one state where there were about 10 messages that could be sent. There are 5 states, and the state transitions are defined by the protocol too.
So what I did with the state from which 10 different messages can be sent was store their string values in an ArrayList container; then, when I needed to check a message's validity in the corresponding state, I checked arrayList.contains(sentMessageStr). I would think that this operation's complexity is O(n), although I think Java has some built-in optimization for this operation. Now that I am thinking about it, maybe I should have used a HashSet container. I suppose the C implementation would have stored those predefined legal strings lexicographically in an array and implemented a binary search function.
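For illustration, a small sketch of the two lookups being weighed here (the class name and message strings are made up):

import java.util.*;

class MessageCheck {
    static final List<String> LEGAL_LIST = new ArrayList<>(Arrays.asList("HELLO", "DATA", "BYE"));
    static final Set<String> LEGAL_SET = new HashSet<>(LEGAL_LIST);

    // ArrayList.contains walks the list element by element: O(n) per check
    static boolean isValidLinear(String sentMessageStr) {
        return LEGAL_LIST.contains(sentMessageStr);
    }

    // HashSet.contains is a hash lookup: expected O(1) per check
    static boolean isValidHashed(String sentMessageStr) {
        return LEGAL_SET.contains(sentMessageStr);
    }

    public static void main(String[] args) {
        System.out.println(isValidLinear("HELLO") + " " + isValidHashed("HELLO"));
    }
}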
thanks
I would guess that the JVM likely took a significant portion of that 3 seconds just to load. Try running your Java version on the same machine 5 times in a row. Or try running both on a dataset 500 times as large. I suspect you'll see a significant constant latency for the Java version that will become insignificant when runtimes go into the minutes.
Sounds more like a case of insufficient samples and unequal implementations (and possibly unequal test beds).
One of the first rules in measurement is to establish enough samples and obtain the mean of the samples for comparison. Even a couple of runs of the same program is not sufficient. You need to tax the machine enough to obtain samples whose values can be compared. That's why test-beds need to be warmed up, so that there are little or no variables at play, except for the system under observation.
And of course, you also have different people implementing the same requirement/algorithm in different manners. It counts. Period. Unless the algorithm implementations have been "normalized", obtaining samples and comparing them are the same as comparing apples and watermelons.
I don't think I need to expand on the fact that the testbeds could have been of varying configurations, or under varying loads.
It's almost impossible to say without seeing the code - you may have a better algorithm for example that scales up much better for larger input but has a greater overhead for small input sizes.
Having said that, this kind of 12x difference is roughly what I would expect if you coded the solution using "higher level" constructs such as ArrayLists / boxed objects and the C solution was basically using optimised, low level pointer arithmetic into a pre-allocated memory region.
I'd rather maintain the higher level solution, but there are times when only hand-optimised low level code will do.....
Another potential explanation is that the JIT had not yet warmed up on your code. In general, you need to reach "steady state" (typically a few thousand iterations of every code path) before you will see top performance in JIT-compiled code.
Performance depends on implementation. Without knowing exactly what you code and what your competitor did, it's very difficult to tell exactly what happened.
But let's say, for instance, that you used objects like Vectors or whatever to solve the problem and the C guy used plain arrays; his implementation is going to be faster than yours for sure.
C code can be translated very efficiently into assembly instructions, while Java on the other hand, relies on a bunch of stuff (like the JVM) that might make the bytecode of your program fatter and probably a little bit slower.
You will be hard pressed to find something that can execute faster in Java than in C. It's true that an order of magnitude is a big difference, but in general C is more performant.
On the other hand you can produce a solution to any given problem much quicker in Java (especially taking into account the richness of the libraries).
So at the end of the day, if there is a choice at all, it comes down as a dilemma between performance and productivity.
That depends on the algorithm. Java is of course generally slower than C/C++ as it's a virtual machine but for most common applications its speed is sufficient. I would not call a factor of 12 normal for common applications.
Would be nice if you posted the C and Java codes for comparison.
A factor of 12 can be normal. So could a factor of 1 or 1/2. As some commentators mentioned, a lot has to do with how you coded your solution.
Don't forget that Java programs have to run in a JVM (unless you compile to native machine code), so any benchmarks should take that into account.
You can google for 'java and c speed comparisons' for some analysis
Back in the day I'd have said that there's nothing wrong with your Java code being 12 times slower. But nowadays I'd rather say that the C guy implemented it more efficiently. Obviously I might be wrong, but are you sure you used proper data structures and, well, simply coded it well?
Also, did you measure the memory usage? This might sound silly, but last year at uni we had a programming challenge. I don't really remember what it was, but we had to solve a graph problem in whatever language we wanted. I did two implementations of my algorithm, one in C and one in Java; the Java one was ~1.5-2x slower. BUT, for instance, I knew I didn't have to worry about memory management (I knew exactly how big the input would be and how many test samples would be run by the teacher), so I simply didn't free any memory (which took way too much time in a program that ran for ~1-2 seconds on a graph with ~15k nodes, or was it 150k?). So my Java code was better memory-wise, but it was slower. I also parsed the input myself in C (didn't do that in Java), which saved me really A LOT of time (~20-25% boost; I was amazed myself). I'd say 1.5-2x is more realistic than 12x.
Most likely the algorithm used in the implementation was different.
For instance (an oversimplification): if you want to add a number N, M times, one implementation could be:
long addTimes(long n, long m) {
    long r = 0;
    for (long i = 0; i < m; i++) {
        r += n;
    }
    return r;
}
And another implementation could simply be:
long addTimes(long n, long m) {
    return n * m;
}
Both will run mostly the same in Java and C (you don't even have to change the code), and still, one implementation will run far faster than the other.