public class Test {
    public static void main(String[] args) {
        int a = 10;
        if (a == a--)
            System.out.println("first\t");
        a = 10;
        if (a == --a)
            System.out.println("second\t");
    }
}
For the Java program, the output is "first", whereas for the C/C++ program it's "second". The functionality of postfix/prefix operations is the same in both programs to my knowledge. If anyone can shed some light on the logic, it would be great, as I am new to coding.
#include <stdio.h>

int main()
{
    int a = 10;
    if (a == a--)
        printf("first\t");
    a = 10;
    if (a == --a)
        printf("second\t");
}
In Java, you get the guarantee that you'll always observe this code printing First, and never Second.
In C/C++, you get no such guarantees. Depending on compiler, architecture, OS, and phase of the moon, it will likely print either only First or only Second, but I'm pretty sure the C and C++ specs make it 'legal' for a compiler/architecture/OS/phase-of-moon combo to end up printing BOTH First and Second, or even neither.
See the order of evaluation rules for C++: Given some binary operator construct a x b, where a and b are expressions and x is some binary operator, first a and b must be evaluated, and then the x operator is applied to the values so obtained. Unless the operator explicitly decrees an order (which for example the || and && operators do; they promise to short-circuit, that is, to not evaluate b at all if a is such that b cannot affect the result), a C(++) compiler is free to emit code such that b is evaluated before a is, or vice versa.
C is filled to the brim with such 'shoulds' and 'mays': The C spec is designed to allow C code to compile on a wide variety of chips with a ton of leeway for a compiler to apply far-reaching optimizations. It goes so far that simple primitive data types have an unspecified bitwidth.
Contrast to java, where almost everything is locked down: There are very few aspects of java code which are intentionally left unspecified, and the compiler is 'on rails' and is very very limited in what bytecode it is allowed to emit (in java, the optimizations are left to the runtime / hotspot compiler, not to javac).
That's why, in Java, the spec DOES define precisely how a x b ought to be resolved: it decrees that regardless of operator, a must always be evaluated before b is evaluated (unless, just like in C, b isn't evaluated at all due to short-circuit rules).
Going all the way back to the Java Language Specification v7, the spec explicitly dictates the left hand side MUST be evaluated first - and this hasn't changed since then (and I'm pretty sure was true since java 1.0, for what it's worth. It's probably chapter 15.7.1 in most JLS versions).
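To see that guarantee in action, here is a minimal, self-contained sketch (class and variable names are just for illustration) that you can compile and run:

public class EvalOrder {
    public static void main(String[] args) {
        int a = 10;
        // The left operand 'a' is evaluated first (10); 'a--' then also yields 10
        // and only afterwards sets a to 9, so the comparison is 10 == 10.
        System.out.println(a == a--);   // true
        System.out.println(a);          // 9

        a = 10;
        // The left operand 'a' is again evaluated first (10); '--a' then sets a
        // to 9 and yields 9, so the comparison is 10 == 9.
        System.out.println(a == --a);   // false
        System.out.println(a);          // 9
    }
}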
In C and C++, the behavior of the expressions a == a-- and a == --a is undefined:
6.5 Expressions
...
2 If a side effect on a scalar object is unsequenced relative to either a different side effect
on the same scalar object or a value computation using the value of the same scalar
object, the behavior is undefined. If there are multiple allowable orderings of the
subexpressions of an expression, the behavior is undefined if such an unsequenced side
effect occurs in any of the orderings.84)
84) This paragraph renders undefined statement expressions such as
    i = ++i + 1;
    a[i++] = i;
while allowing
    i = i + 1;
    a[i] = i;
C 2011 Online Draft
C does not force left-to-right evaluation of relational and equality expressions, and it does not require that the side effect of the -- and ++ operators be applied immediately after evaluation. The result of a == a-- can vary based on the compiler, hardware, even the surrounding code.
Java, OTOH, does force left-to-right evaluation and applies side effects immediately after evaluation, so the result is consistent and well-defined.
In C (as the code in the second example seems to be),
- The evaluation order of the subexpressions is unspecified, and
- there is no sequence point between the operands of the == operator.
Thus, whether the increment of a in if (a == a++) or the decrement of a in if (a == a--) happens before the comparison to a itself is not defined.
The expression invokes undefined behavior. It could give a different result on the very next execution on the same machine.
In contrast, Java does define this behavior.
Related
I am learning Java and am experimenting with the unary operators --expr, expr--, and -expr.
In class, I was told that --3 should evaluate to 3. I wanted to test this concept in the following assignments:
jshell> int t = 10;
t ==> 10
| created variable t : int
jshell> int g = -3;
g ==> -3
| created variable g : int
jshell>
jshell> int d = --3;
| Error:
| unexpected type
| required: variable
| found: value
| int d = --3;
| ^
jshell> int d = --t;
d ==> 9
| created variable d : int
jshell> int f = d---t;
f ==> 0
| created variable f : int
jshell> int f = 1---t;
| Error:
| unexpected type
| required: variable
| found: value
| int f = 1---t;
| ^
| update overwrote variable f : int
My questions:
1. Why does assigning -3 work and not --3? I thought --3 would give 3.
2. Are there cases where --expr can be evaluated as double negation instead of decrement?
3. Why can't values suffice where the unexpected type errors were thrown?
4. How did Java evaluate d---t? Also, in what order?
For question 4, the way I thought of it was a right-to-left evaluation. So, if d = 9 and t is 9, the rightmost - operator is the first to act on t, making its value -9. Then the same for the second -, so then t's value becomes 9 again. Then I thought the compiler would notice that d is next to the leftmost operator and subtract the values. This would be 9-9, which evaluates to 0. Jshell shows the expression also evaluated to 0, but I want to make sure my reasoning is correct or can be improved.
-- is taken as the decrement operator. Adding spaces or using brackets will allow it to be interpreted as double negation.
int x = - -3;
//or
int x = -(-3);
Why does assigning -3 work and not --3? I thought --3 would give 3.
Java tokenizes the input. A source file consists of a sequence of relevant atomic units, which we shall call words. In y +foobar, we have 3 'words': y, +, and foobar. Tokenizing is the job of splitting them up.
This process is somewhat complicated; whitespace (obviously) separates things, but whitespace isn't necessary if the two 'words' don't share any legal characters. Thus, 5+2 is legal, as is 5 + 2, and public static works, but publicstatic does not. This tokenization step occurs first and is done with limited knowledge. You could in theory surmise from context that, say, publicstatic void foo() {} can't really mean anything else, but the amount of knowledge you need to draw that conclusion is quite complicated, and tokenization just does not have it. Hence, publicstatic does not work, but 5+2 does.
Based on that rule of tokenization, int y = --3; and int y = - -3; are different for the same reason publicstatic and public static are different. -- is a single 'word', meaning: unary decrement. You can't split it up (you can't put spaces in between the two minus signs), and 2 consecutive minus signs without spaces in between are going to be tokenized as the unary decrement operator, and not as 2 consecutive binary-minus/unary-negative words. You COULD draw the conclusion that int y = --3; only has one non-stupid interpretation (2 minus signs), because the other obvious interpretation (unary decrement of the expression '3') is a compiler error: you can't decrement a constant. But that goes back to the earlier rule: if you have to take into account all complications at all times, parsing java source files is incredibly complicated, and for no meaningful gain: you really do not want to write source code where things are interpreted correctly or not based on exactly how intelligent the compiler ended up being. It ain't english; you want consistency and clarity at all times. Poetic license is not a good thing when talking directly to computers.
CONCLUSION: --3 does not work. It never can. Whoever informed you just messed up, or you misread it and they were talking about - -3 and you didn't notice the space, or it got lost in translation.
Are there cases where --expr can be evaluated as double negation instead of decrement?
Not like that, no. - -expr will, -(-expr) will, but -- written just like that, no spaces, no parens, nothing in between? No. Because of the tokenizer.
Why can't values suffice where the unexpected type errors were thrown?
Because there'd be absolutely no point to it, and it would make javac (the part that turns your bag-o-characters into a tree structure and, from there, into a class file) a few orders of magnitude more complicated. It's a computer, not english. You don't want 'best effort', and the few languages that do this (javascript, PHP) are universally derided for trying. These languages are still popular, but that's despite the fact that they try their best instead of just having a clear spec and failing when the programmer fails to adhere to it. You'll find plenty of talks about such languages, by fans of them, making fun of the corner cases: in javascript there's the famous WAT talk, and for java there is the Java Puzzlers book. Both are designed to teach something by showing off how you can use the language to write idiotic code that is nevertheless hard to read, and both were written by fans of the language. It's a bit much to try to give you some sort of proof that you do not want a language to take its best wild stab in the dark at what you meant, but hopefully a few popular books and talks will go some way towards making you realize why it works like this.
How did Java evaluate d---t? Also, in what order?
Now we're getting into specifics. Java's tokenizer is such that it will tokenize that as d -- - t or d - -- t, and that's because --- is not a known word, and the java tokenizer is based on splitting things up by applying a list of known symbols and keywords to the job.
So which one does it tokenize to? It doesn't matter. Why do you want to know? So that you can write it? Don't - what possible purpose does that serve? If you write intentionally obfuscated code, you will be fired, and it'll be justified. If you are trying to train yourself to find that code perfectly readable, that's great! But less than 1% of your average java coder, even expert ones, do this, so now you've written code nobody can read, and you find code others wrote aggravating because nobody writes it like you do. So, you're unemployable as a programmer / you can't employ anybody to help you. You can build an entire career writing software on your own for the rest of your life, but it's quite limiting. Why bother?
It gets tokenized into d - -- t or d -- - t.
If you find this sort of thing entertaining (I certainly do!), then know full well this is no more useful than playing a silly game, there is no academic or career point to it, at all.
The problem is, you don't seem to find it entertaining. If you did, you'd have done a trivial experiment to find out. If it's d -- - t (subtract X and Y, where X is d, post-unary-decrement, and Y is t), then after all that, d will be one smaller, which you can trivially test for. If it's d - --t, then t would be one smaller afterwards. If it's d - -(-t) (as in, d minus Z, where Z is negative Y, where Y is negative t), then neither will have changed. You didn't do this experiment. That eliminates 'you find it fun'. I've eliminated 'this is useful', which leaves us with: This question does not matter, therefore neither does the answer.
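For what it's worth, here is a throwaway sketch of that experiment (you can paste it straight into jshell); the printed values tell you which reading the tokenizer chose:

int d = 9;
int t = 9;
int f = d---t;
System.out.println(f);  // the subtraction result
System.out.println(d);  // if d shrank by one, it was read as d-- - t
System.out.println(t);  // if t shrank by one, it was read as d - --t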
There is a small chance this question shows up in a curriculum of some sort. If it does, do yourself a giant favour and find another curriculum. It's an incredibly strong indicator of extremely low quality java teaching, if you expect your students to know how d---t breaks down.
You talk about 'reasoning', but reasoning has no place here. d -- - t, d - -- t and d - -(-t) are all equally valid interpretations, reason isn't going to tell you which one is correct. The language designers threw some dice to decide. They knew it didn't matter (Except for the last one, which isn't possible without a 'smart' tokenizer, and you don't want one of those, but the reasoning needed to draw the conclusion that you need a smart tokenizer, let alone the reasoning needed to draw the conclusion that smart tokenizers are a bad idea, is incredibly complicated and requires a ton of experience writing parsers from scratch, or possibly a mere full year's worth of parser language courses at a uni level might give you the wherewithal to figure that one out - I think you can be excused for not picking up that detail :P).
The ternary operator is normally just the subject of philosophical discussions:
whether
a=b>5?1:0;
is more readable, faster, or cooler than
if(b>5) { a=1; } else {a=0;}
(take or leave the curly braces) I normally don't care. I like my ternary operator. But we had a discussion concerning this piece of code:
BigObject myBigObject = null;
...
do {
    myBigObject =
        myBigObject == null ?
            createBigObject() :
            myBigObject;
    ...
} while (manyIteration);
My colleague claimed that with this construct, myBigObject will be copied every loop iteration (except the first), which will waste precious time and memory,
and that he had found a case where the ternary operator is useless. The only way, according to him, is:
do {
    if (myBigObject == null)
        myBigObject = createBigObject();
    ...
} while (manyIteration);
I argued that the clever compiler will see that the object is assigned to itself and will optimize it out.
But who is right?
The definite answer lies in section 15.25 of the JLS (emphasis mine):
The resulting boolean value is then used to choose either the second or the third operand expression:
- If the value of the first operand is true, then the second operand expression is chosen.
- If the value of the first operand is false, then the third operand expression is chosen.
The chosen operand expression is then evaluated and the resulting value is converted to the type of the conditional expression as determined by the rules stated below.
This conversion may include boxing or unboxing conversion (§5.1.7, §5.1.8).
The operand expression not chosen is not evaluated for that particular evaluation of the conditional expression.
This means the two operand expressions are never both evaluated: only the one that is chosen is. So actually, neither of you is right.
You're wrong because it is not the compiler being clever: it is specified by the language itself;
Your colleague is wrong because the expression won't be evaluated if it need not be.
In the code
myBigObject = myBigObject == null ? createBigObject() : myBigObject;

The first time through the loop, myBigObject == null is true, so createBigObject() is evaluated and the third operand, myBigObject, is not.

On every later iteration, myBigObject == null is false, so createBigObject() is NOT evaluated; only the third operand, myBigObject, is.
Notice that what is executed is just assigning myBigObject to itself, which does not create a new object.
My colleague claimed that with this construct, myBigObject will be copied every loop iteration (except the first), which will waste precious time and memory, and that he had found a case where the ternary operator is useless.
You should note that myBigObject is a reference to an object. This means it is 4 bytes on most JVMs and a redundant copy isn't going to make a big difference.
I argued that the clever compiler will see that the object is assigned to itself and will optimize it out.
Possibly, though I don't see it will make much difference either way.
While a compiler can remove redundant code, it is much harder for a human reader to recognise that code is redundant. In general we code with a purpose; when code has no purpose, it is much harder for the reader to come to that conclusion. I suggest you avoid confusing anyone who has to read/maintain the code in the future.
As ever, when discussing performance, you should first consider clarity: how surprising is this construct, and how easy is it to maintain?
The cost of a redundant assignment (which may or may not be optimised away) is nothing compared with the cost of even an easy-to-fix bug (never mind a hard-to-fix one).
You should focus on what is clearer (which is subjective) and not worry about micro-tuning the code unless you have a profiler which indicates this line is a performance issue.
For me, this is clearer, but even if it were slower, I would still argue it is clearer and easier to understand what it is doing.
if (myBigObject == null)
myBigObject = createBigObject();
The fact that you have already had to have a discussion about what that code is really doing means you have spent more time than you can possibly recover.
In short, developer efficiency is usually more important than computer efficiency.
Your colleague is wrong. After the first iteration myBigObject is no longer null and hence won't be created. Sample code proves this...
public static void main(String[] args) {
    Object myBigObject = null;
    int i = 0;
    do {
        System.out.println("Iteration " + i);
        myBigObject =
            myBigObject == null ?
                createBigObject() :
                myBigObject;
    } while (i++ < 10);
}

private static Object createBigObject() {
    System.out.println("Creating bigObject");
    return new Object();
}
The output of which is
Iteration 0
Creating bigObject
Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
Iteration 6
Iteration 7
Iteration 8
Iteration 9
Iteration 10
See how the create statement is only printed once
Why does your colleague think that?
Only exactly one branch of a ternary conditional is ever evaluated. Imagine if that were not the case: so much code of the form obj == null ? null : obj.someproperty() would break!
So your object will only be created once.
I want to port a crypto function from C to Java. The function has to run in constant time, so no conditional branching (and no table lookups based on x) is allowed.
The original C code is:
int x,result;
...
result = (x==7);
...
So that 'result' is set to 1 if 'x==7' and to 0 otherwise. The 'result' variable is then used in further computations.
I am now looking for the best way to transpose this to Java. As comparison expressions in Java evaluate to booleans and not to integers, one has to simulate the above using other operators.
I currently use
int x,result;
...
result = (1<<(x-7))&1;
...
which works fine for me, as my x is in the range {0,...,15}. (Note that the shift function uses only the lower 5 bits, so that you will get false positives when x is too large.)
The expression will be evaluated millions of times, so if there is, for instance, a clever solution that uses only 2 operators instead of 3, this would make the overall computation faster.
The best option, as noted by @Hosch250, is the ternary operator. Let's take a look at the assembler generated by the JIT compiler for this method:
public static int ternary(int x) {
return x == 7 ? 1 : 0;
}
It actually depends on branch profiling. When your x has value 7 quite often, it's compiled like this:
xor %r11d,%r11d
mov $0x1,%eax
cmp $0x7,%edx
cmovne %r11d,%eax ;*ireturn
; - Test::ternary#11 (line 12)
See that the ternary was replaced with cmovne, which is not a branch instruction.
On the other hand, if you pass 7 only in very rare cases (e.g. once in 5000 calls), then a branch is used:
cmp $0x7,%edx
je <slowpath> ;*if_icmpne
; - Test::ternary#3 (line 12)
xor %eax,%eax
Now the branch is almost never taken, so it is faster to keep the conditional branch, as the CPU branch predictor will almost always be correct. Note that <slowpath> is not just return 1;: it also updates the branch profile, so if the pattern changes during program execution (7 starts to appear more often), the method will be recompiled to the first version.
In general, don't try to be smarter than JIT-compiler in such simple cases.
OK, so I think that the reason you are asking this is that if the execution time of a crypto function depends on the inputs to the function, then an attacker can gain clues as to those inputs by measuring the execution time. (Hence, the normal "premature optimization" and "don't try to outsmart the compiler" advice don't really apply.)
In the light of that, here are my suggestions:
If x is a constant at compile time (or JIT compile time) then the chances are that the code will be optimized to either
result = true; or result = false;
If x is not a constant, but there is a small range of possible values then one of the following approaches will probably work:
// It is possible but unlikely that the JIT compiler will
// turn this into conditional code.
private static final boolean[] LOOKUP = new boolean[] {
        false, false, false, false, false, false, false, true,
        false, false, false, false, false, false, false, false};
...
result = LOOKUP[x];
// This depends on how the JIT compiler translates this to native
// code.
switch (x) {
    case 0: case 1: case 2: case 3: case 4: case 5: case 6:
        result = false;
        break;
    case 7:
        result = true;
        break;
    default:            // x is in {0,...,15}, so 8..15 also give false
        result = false;
}
The problem is that in every possible approach I can think of, the JIT compiler could legally optimize non-branching code into branching code. If this is security critical, then you need to investigate the actual native code emitted for every platform that you need to certify.
The other approach is to:
analyze the Java code algorithm,
try to spot cases where conditional branching is likely,
design test inputs to trigger those branching paths,
measure execution time (on all target platforms) to see if there is a detectable difference across your set of test inputs.
Of course, the other thing to note is that this may be moot anyway; e.g. if result is then used in another part of the crypto function to decide which execution path to take.
And ...
The expression will be evaluated millions of times, so if there is, for instance, a clever solution that uses only 2 operators instead of 3, this would make the overall computation faster.
If this is your real motivation ... then my advice is Don't Bother. This is premature optimization. Leave it to the JIT compiler.
Since the goal is to accomplish
if (x == 7)
result = 1;
else
result = 0;
in some sort of algebraic fashion without branching,
result = 1 >> (x^7);
But then this does not work because right shift is masked to only a few bits. So, what you can do is,
result = 1 >> Integer.bitCount(x^7);
but it's still masked for the case of -8 (all bits are set in -8 ^ 7), so,
int bc = Integer.bitCount(x^7);
return 1 >> (bc | (bc>>1));
So, how much slower is it than a conditional branch? Using the above bitCount() solution to compare the entire int range more than once:
user 0m5.948s
Using branching, (x == 7 ? 1 : 0),
user 0m2.104s
So it's not too bad considering you get constant time comparison that works for any value, 7 being just an example. Yes, Integer.bitCount() is constant time too.
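If you want to double-check that the masked bitCount() version really agrees with the branching one on the nasty inputs, a throwaway test harness along these lines (not part of the original answer; the names are made up) will do:

public class ConstantTimeCheck {
    static int branchless(int x) {
        int bc = Integer.bitCount(x ^ 7);
        return 1 >> (bc | (bc >> 1));
    }

    public static void main(String[] args) {
        // Spot-check the match itself plus the values where a naive shift goes wrong.
        int[] samples = { 7, 0, 6, 8, -1, -8, Integer.MIN_VALUE, Integer.MAX_VALUE };
        for (int x : samples) {
            int expected = (x == 7) ? 1 : 0;
            System.out.println(x + ": got " + branchless(x) + ", expected " + expected);
        }
    }
}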
A ternary would be a good option here:
result = x == 7 ? 1 : 0;
This code assigns 1 to result if the expression x == 7 evaluates to true, and assigns 0 otherwise.
Taking advantage of the extremely limited range of x, which is in [0,15], I propose using an in-register table lookup, using one bit per table entry. The table has bit i set for those inputs that should produce a 1, and zero otherwise. This means our table is the literal constant 2^7 = 128:
int x,result;
result = (128 >> x) & 1;
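As a quick sanity check (just a disposable loop, not part of the proposal itself), you can verify the bit-table against a plain comparison over the stated range of x:

// 128 is 1 << 7: only bit 7 of the table is set, marking x == 7.
for (int x = 0; x <= 15; x++) {
    int result = (128 >> x) & 1;
    int expected = (x == 7) ? 1 : 0;
    System.out.println(x + " -> " + result + (result == expected ? " ok" : " MISMATCH"));
}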
I do understand that += and var1=var1+var2 do the same thing in Java. My question is: apart from better readability, what other benefit (if any) does the former bring? I'm asking because I was curious why the Java designers introduced it; presumably they did so because it adds some value.
It prevents the LHS of the + operation from being evaluated twice.
var.getSomething().x = var.getSomething().x + y;
This is not equivalent to
var.getSomething().x += y;
However, the added value from increased readability and reduced typing effort is not to be underestimated.
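Here is a small self-contained sketch (the Container and Holder classes and getSomething() are made up purely for illustration) that makes the difference observable by counting how often the left-hand side's receiver is evaluated:

class Holder {
    int x;
}

class Container {
    private final Holder holder = new Holder();
    int calls = 0;                    // counts evaluations of getSomething()

    Holder getSomething() {
        calls++;
        return holder;
    }
}

public class CompoundAssignDemo {
    public static void main(String[] args) {
        Container var = new Container();

        var.getSomething().x = var.getSomething().x + 5;
        System.out.println(var.calls);   // 2 -- the long form calls getSomething() twice

        var.calls = 0;
        var.getSomething().x += 5;
        System.out.println(var.calls);   // 1 -- the compound form evaluates it only once
    }
}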
x+=y and x=x+y are not the same.
x+=y is equivalent to x=(Type of x)(x+y)
byte x=4;
byte y=3;
x+=y; // x =(byte)(x+y)
x = x + y; // compile time error
IMHO these operators (+=, -=, *=, /=) are inherited from C. In the old days (before we had highly optimising compilers), they could improve performance because on many architectures, these operators translated directly into a single assembler instruction. Nowadays, all compilers should automatically produce the same code. As for Java, I don't know if the client VM will produce the same bytecode even if there are no side effects on the RHS. Shouldn't make a difference in performance though.
And Dietrich is right, += et al. prevent the argument from being evaluated twice (important when there are side effects).
This may seem like a silly question, but why is it that in many languages there exists a prefix and postfix version of the ++ and -- operator, but no similar prefix/postfix versions of other operators like += or -=? For example, it seems like if I can write this code:
myArray[x++] = 137; // Write 137 to array index at x, then increment x
I should be able to write something like
myArray[5 =+ x] = 137; // Write 137 to array index at x, then add five to x
Of course, such an operator does not exist. Is there a reason for this? It seems like a weird asymmetry in C/C++/Java.
I'd guess there are several reasons; I think among the more heavily weighted might be:
- There probably weren't thought to be too many real use cases (it may not have even occurred to some language designers in the early days).
- Pre/post increment mapped directly to machine operations (at least on several machines), so they found their way into the language (update: it turns out that this isn't exactly true, even if it's commonly thought so in computing lore. See below).
Then again, while the idea for pre/post/increment/decrement operators might have been influenced by machine operations, it looks like they weren't put into the language specifically to take advantage of such. Here's what Dennis Ritchie has to say about them:
http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
Thompson went a step further by inventing the ++ and -- operators, which increment or decrement; their prefix or postfix position determines whether the alteration occurs before or after noting the value of the operand. They were not in the earliest versions of B, but appeared along the way. People often guess that they were created to use the auto-increment and auto-decrement address modes provided by the DEC PDP-11 on which C and Unix first became popular. This is historically impossible, since there was no PDP-11 when B was developed. The PDP-7, however, did have a few `auto-increment' memory cells, with the property that an indirect memory reference through them incremented the cell. This feature probably suggested such operators to Thompson; the generalization to make them both prefix and postfix was his own. Indeed, the auto-increment cells were not used directly in implementation of the operators, and a stronger motivation for the innovation was probably his observation that the translation of ++x was smaller than that of x=x+1.
As long as y has no side effects:
#define POSTADD(x,y) (((x)+=(y))-(y))
I'll make an assumption. There are lots of use cases for ++i/i++, and in many of them the specific type of increment (pre/post) makes a difference. I can't count how many times I've seen code like while (buf[i++]) {...}. On the other hand, += is used much less frequently, as it rarely makes sense to shift a pointer by 5 elements at once.
So, there's just no common enough application where difference between postfix and prefix version of += would be important.
I guess it's because it's way too cryptic.
Some argue that even ++/-- should be avoided, because they cause confusion and are responsible for most buffer overrun bugs.
Because the -- and ++ operators map to inc(rement) and dec(rement) instructions in the CPU (in addition to ordinary adds and subtracts), and these operators were meant to map to those instructions, which is why they exist as separate operators.
Java and C++ have pre- and post- increment and decrement operators because C has them. C has them because C was written, mostly, for the PDP-11 and the PDP-11 had INC and DEC instructions.
Back in the day, optimizing compilers didn't exist, so if you wanted to use a single-cycle increment operator, either you wrote assembler for it or your language needed an explicit operator for it; C, being a portable assembly language, has explicit increment and decrement operators. Also, the performance difference between ++i and i++ rarely matters now, but it did matter in 1972.
Keep in mind that C is almost 40 years old.
If I had to guess, it's common to equate:
x += 5;
...with:
x = x + 5;
And for obvious reasons, it would be nonsensical to write:
x + 5 = x;
I'm not sure how else you would mimic the behavior of 5 =+ x using just the + operator. By the way, hi htiek!