Why does the reverse() method in StringBuffer/StringBuilder classes use bitwise operator?
I would like to know the advantages of it.
public AbstractStringBuilder reverse() {
    boolean hasSurrogate = false;
    int n = count - 1;
    // (n-1) >> 1 is the index of the middle character, i.e. (n-1)/2
    for (int j = (n-1) >> 1; j >= 0; --j) {
        char temp = value[j];
        char temp2 = value[n - j];
        if (!hasSurrogate) {
            hasSurrogate = (temp >= Character.MIN_SURROGATE && temp <= Character.MAX_SURROGATE)
                || (temp2 >= Character.MIN_SURROGATE && temp2 <= Character.MAX_SURROGATE);
        }
        // swap the characters at positions j and n - j
        value[j] = temp2;
        value[n - j] = temp;
    }
    if (hasSurrogate) {
        // Reverse back all valid surrogate pairs
        for (int i = 0; i < count - 1; i++) {
            char c2 = value[i];
            if (Character.isLowSurrogate(c2)) {
                char c1 = value[i + 1];
                if (Character.isHighSurrogate(c1)) {
                    value[i++] = c1;
                    value[i] = c2;
                }
            }
        }
    }
    return this;
}
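For context, the surrogate handling matters for characters outside the Basic Multilingual Plane, which Java stores as a pair of chars. A quick illustration (the emoji code point is just an arbitrary example of mine):

public class ReverseDemo {
    public static void main(String[] args) {
        // "ab" followed by U+1F600, stored as the surrogate pair \uD83D\uDE00
        StringBuilder sb = new StringBuilder("ab\uD83D\uDE00");
        // a naive char-by-char reversal would split the pair; reverse() keeps it intact
        System.out.println(sb.reverse()); // prints the emoji followed by "ba"
    }
}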
Right shifting by one means dividing by two. I don't think you'll notice any performance difference; the compiler will perform such optimizations at compile time.
Many programmers are used to right-shifting by one when dividing by two instead of writing / 2. It's a matter of style, or maybe one day it really was more efficient to right shift than to write / 2 (before compilers performed such optimizations). Compilers know how to optimize things like that, and I wouldn't waste my time writing things that might be unclear to other programmers (unless they really make a difference). Anyway, the loop is equivalent to:
int n = count - 1;
for (int j = (n-1) / 2; j >= 0; --j)
As @MarkoTopolnik mentioned in his comment, the JDK was written without counting on any compiler optimization at all, which might explain why they explicitly right-shifted the number by one instead of dividing it. Had they relied on the full power of the optimizer, they would probably have written / 2.
Just in case you're wondering why they are equivalent, the best explanation is by example: consider the number 32. Assuming 8 bits, its binary representation is:
00100000
right shift it by one:
00010000
which has the value 16 (1 * 2^4)
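A tiny check of that equivalence (my own sketch, not part of the original answer):

public class ShiftVsDivideDemo {
    public static void main(String[] args) {
        System.out.println(32 >> 1);              // 16, same as 32 / 2
        for (int n = 1; n <= 1000; n++) {
            // for non-negative operands the two expressions always agree
            if ((n - 1) >> 1 != (n - 1) / 2) {
                throw new AssertionError("mismatch at n = " + n);
            }
        }
        // for negative odd values they differ: -3 >> 1 is -2, while -3 / 2 is -1
        System.out.println((-3 >> 1) + " vs " + (-3 / 2));
    }
}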
In summary:
The >> operator in Java is known as the Sign Extended Right Bit Shift operator.
X >> 1 is mathematically equivalent to X / 2, for all strictly positive values of X.
X >> 1 is always faster than X / 2, in a ratio of roughly 1:16, though the difference might turn out to be much less significant in actual benchmarks due to modern processor architectures.
All mainstream JVMs can correctly perform such optimizations, but the non-optimized bytecode will be executed in interpreted mode thousands of times before these optimizations actually occur.
The JRE source code uses a lot of such optimization idioms, because they make an important difference for code executed in interpreted mode (and most importantly, at JVM launch time).
The systematic use of proven-to-be-effective code optimization idioms that are accepted by a whole development team is not premature optimization.
Long answer
The following discussion tries to address all the questions and doubts that have been raised in other comments on this page. It is so long because I felt it was necessary to put the emphasis on why some approaches are better, rather than show off personal benchmark results, beliefs and practices, where mileage may vary significantly from one person to the next.
So let's take questions one at a time.
1. What does X >> 1 (or X << 1, or X >>> 1) mean in Java?
The >>, << and >>> are collectively known as the Bit Shift operators. >> is commonly known as Sign Extended Right Bit Shift, or Arithmetic Right Bit Shift. >>> is the Non-Sign Extended Right Bit Shift (also known as Logical Right Bit Shift), and << is simply the Left Bit Shift (sign extension does not apply in that direction, so there is no need for logical and arithmetic variants).
Bit shift operators are available (though with varying notation) in many programming languages (actually, from a quick survey I would say, almost every language that is more or less a descendant of C, plus a few others). Bit shifts are fundamental binary operations, and consequently almost every CPU ever created offers assembly instructions for them. Bit shifters are also a classic building block in electronic design which, given a reasonable number of transistors, provides its final result in a single step, with a constant and predictable stabilization time.
Concretely, a bit shift operator transforms a number by moving all of its bits by n positions, either left or right. Bits that fall out are discarded; bits that "come in" are forced to 0, except in the case of the sign-extended right bit shift, where the left-most bit preserves its value (and therefore the sign). See Wikipedia for some diagrams of this.
2. Does X >> 1 equal X / 2?
Yes, as long as the dividend is guaranteed to be positive.
More generally (a short sketch illustrating these rules follows the list):
a left shift by N is equivalent to a multiplication by 2^N;
a logical right shift by N is equivalent to an unsigned integer division by 2^N;
an arithmetic right shift by N is equivalent to a non-integer division by 2^N, rounded to an integer toward negative infinity (which is also equivalent to a signed integer division by 2^N for any strictly positive integer).
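A minimal sketch of those three rules (again, my own illustration, assuming 32-bit int arithmetic):

public class ShiftRules {
    public static void main(String[] args) {
        int x = 100;
        System.out.println((x << 3) == x * 8);                    // left shift: multiply by 2^3
        System.out.println((x >> 2) == x / 4);                    // arithmetic right shift, positive x
        System.out.println((-7 >> 1) == Math.floorDiv(-7, 2));    // rounds toward negative infinity: -4
        System.out.println(-7 / 2);                               // integer division truncates: -3
        // logical right shift treats the operand as unsigned
        System.out.println((-8 >>> 1) == Integer.divideUnsigned(-8, 2));
    }
}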
3. Is bit shifting faster than the equivalent arithmetic operation, at the CPU level?
Yes, it is.
First of all, we can easily assert that, at the CPU level, a bit shift requires less work than the equivalent arithmetic operation. This is true for both multiplication and division, and the reason is simple: the circuitry for both integer multiplication and integer division itself contains several bit shifters. Put otherwise: a bit shift unit represents a mere fraction of the complexity of a multiplication or division unit. It is therefore guaranteed that less energy is required to perform a simple bit shift than a full arithmetic operation. Yet, in the end, unless you monitor your CPU's power consumption or heat dissipation, I doubt you will notice that your CPU is using more energy.
Now, let's talk about speed. On processors with reasonably simple architectures (that is, roughly, any processor designed before the Pentium or the PowerPC, plus recent processors that do not feature some form of execution pipeline), integer division (and multiplication, to a lesser degree) is generally implemented by iterating over bits (actually, groups of bits, known as the radix) of one of the operands. Each iteration requires one CPU cycle, which means that integer division on a 32-bit processor would require (at most) 16 cycles (assuming a radix-2 SRT division unit, on a hypothetical processor). Multiplication units usually handle more bits at once, so a 32-bit processor might complete an integer multiplication in 4 to 8 cycles. These units might use some form of variable bit shifter to quickly jump over sequences of consecutive zeros, and can therefore terminate early when multiplying or dividing by simple operands (such as positive powers of two); in that case, the arithmetic operation completes in fewer cycles, but still requires more than a simple bit shift operation.
Obviously, instruction timings vary between processor designs, but the preceding ratio (bit shift = 1, multiplication = 4, division = 16) is a reasonable approximation of the actual performance of these instructions. For reference, on the Intel 486, the SHR, IMUL and IDIV instructions (for 32 bits, register by a constant) required respectively 2, 13-42 and 43 cycles (see here for a list of 486 instructions with their timings).
What about CPUs found in modern computers? These processors are designed around pipeline architectures that allow the simultaneous execution of several instructions; the result is that most instructions nowadays require only one cycle of dedicated time. But this is misleading, since instructions actually remain in the pipeline for several cycles before being retired, during which they might prevent other instructions from completing. The integer multiplication or division unit remains "reserved" during that time, and any further division will therefore be held back. That is particularly a problem in short loops, where a single multiplication or division will end up being stalled by the previous invocation of itself that hasn't yet completed. Bit shift instructions do not suffer from such a risk: most "complex" processors have access to several bit shift units and don't need to reserve them for very long (though generally at least 2 cycles, for reasons intrinsic to the pipeline architecture). Actually, to put this into numbers, a quick look at the Intel Optimization Reference Manual for the Atom seems to indicate that SHR, IMUL and IDIV (same parameters as above) have latencies of 2, 5 and 57 cycles respectively; for 64-bit operands, it is 8, 14 and 197 cycles. Similar latencies apply to most recent Intel processors.
So, yes, bit shifting is faster than the equivalent arithmetic operation, even though in some situations, on modern processors, it might actually make no difference at all. But in most cases, it is very significant.
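If you want to measure this on your own JVM and CPU, a harness such as JMH is the sane way to do it. A minimal sketch (it assumes the org.openjdk.jmh dependency is on the classpath; the class and field names are mine), with the caveat that on most modern desktop CPUs the two methods should come out essentially identical once the JIT has warmed up:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class ShiftVsDivideBenchmark {
    int x = 123_456_789;

    @Benchmark
    public int divide() {
        // the JIT is usually able to strength-reduce a division by a constant
        return x / 2;
    }

    @Benchmark
    public int shift() {
        return x >> 1;
    }
}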
4. Will the Java Virtual Machine perform such optimizations for me?
Sure, it will. Well... most certainly, and... eventually.
Unlike most language compilers, regular Java compilers perform essentially no optimization. The idea is that the Java Virtual Machine is in the best position to decide how to optimize a program for a specific execution context, and this indeed gives good results in practice. The JIT compiler acquires a very deep understanding of the code's dynamics, and exploits this knowledge to select and apply tons of minor code transforms in order to produce very efficient native code.
But compiling bytecode into optimized native methods requires a lot of time and memory. That is why the JVM will not even consider optimizing a code block before it has been executed thousands of times. Then, even once the code block has been scheduled for optimization, it might be a long time before the compiler thread actually processes that method. And later on, various conditions might cause that optimized code block to be discarded, reverting back to bytecode interpretation.
Though the Java SE API is designed with the objective of being implementable by various vendors, it is incorrect to claim the same of the JRE. The Oracle JRE is provided to everyone as the reference implementation, but its use with another JVM is discouraged (actually, it was forbidden not so long ago, before Oracle open-sourced the JRE's code).
Optimizations in the JRE source code are the result of conventions adopted and optimization efforts made by the JRE developers, in order to provide reasonable performance even in situations where JIT optimizations haven't kicked in yet or simply can't help. For example, hundreds of classes are loaded before your main method is invoked. That early, the JIT compiler has not yet acquired enough information to optimize code properly. At that point, hand-made optimizations make an important difference.
5. Ain't this premature optimization?
It is, unless there is a reason why it is not.
It is a fact of modern life that whenever a programmer demonstrates a code optimization somewhere, another programmer will counter with Donald Knuth's quote on optimization (well, was it his? who knows...). It is even perceived by many as a clear assertion by Knuth that we should never try to optimize code. Unfortunately, that is a major misunderstanding of Knuth's important contributions to computer science over the last decades: Knuth has actually authored thousands of pages on practical code optimization.
As Knuth put it:
Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.
— Donald E. Knuth, "Structured Programming with Goto Statements"
What Knuth qualifies as premature optimization are optimizations that require a lot of thinking, apply only to non-critical parts of a program, and have a strong negative impact on debugging and maintenance. Now, all of this could be debated for a long time, but let's not.
It should however be understood that small local optimizations that have been proven to be effective (that is, effective at least on average, overall), that do not negatively affect the overall construction of a program, that do not reduce a code's maintainability, and that do not require extraneous thinking are not a bad thing at all. Such optimizations are actually good, since they cost you nothing, and we should not pass up such opportunities.
Yet, and that is the most important thing to remember, an optimization that is trivial to programmers in one context might turn out to be incomprehensible to programmers in another context. Bit shifting and masking idioms are particularly problematic for that reason. Programmers who know the idiom can read it and use it without much thinking, and the effectiveness of these optimizations is proven, though generally insignificant unless the code contains hundreds of occurrences. These idioms are rarely an actual source of bugs. Still, programmers unfamiliar with a specific idiom will lose time figuring out what that specific code snippet does, and why and how it does it.
In the end, whether to favor such optimizations or not, and exactly which idioms should be used, is really a matter of team decision and code context. I personally consider a certain number of idioms to be best practice in all situations, and any new programmer joining my team quickly acquires these. Many more idioms are reserved for critical code paths. All code put into internal shared libraries is treated as a critical code path, since it might turn out to be invoked from one. Anyway, that is my personal practice, and your mileage may vary.
It uses (n-1) >> 1 instead of (n-1)/2 to find the middle index of the internal array to be reversed. Bitwise shift operators are usually more efficient than the division operator.
In this method, there's just this expression: (n-1) >> 1. I assume this is the expression you're referring to. This is called right shift/shifting. It is equivalent to (n-1)/2 but it's generally considered faster, more efficient. It's used often in many other languages too (e.g. in C/C++).
Note though that modern compilers will optimize your code anyway even if you use division like (n-1)/2. So there are no obvious benefits of using right shifting. It's more like a question of coding preference, style, habit.
See also:
Is shifting bits faster than multiplying and dividing in Java? .NET?
Related
Hash Functions are incredibly useful and versatile. In general, they are used to map a space to one much smaller space. Of course that means that two objects may hash to the same
value (collision), but this is because you are reducing the space (pigeonhole principle).
The efficiency of the function largely depends on the size of the hash space.
It comes as a surprise, then, that a lot of Java hashCode functions use multiplication to produce the hash code of a new object, e.g. as follows (creating-a-hashcode-method-java):
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((email == null) ? 0 : email.hashCode());
    result = prime * result + (int) (id ^ (id >>> 32));
    result = prime * result + ((name == null) ? 0 : name.hashCode());
    return result;
}
If we want to mix two hashcodes in the same range, xor should be much better than addition and is, I think, traditionally used. If we wanted to increase the space, shifting by some bits and then xoring would still imho make sense. I guess multiplying by 31 is almost the same as shifting one hash left by 5 and then subtracting it, but it should still be less efficient than a plain xor...
As it is the recommended approach though, I think I am missing something. So my question is why would this be?
Notes:
I am not asking why we use a prime. It is pretty clear that if we use multiplication, we should go with a prime. However, multiplying by any number, even a prime, should still be suboptimal compared to xor. That is why, e.g., all these other well-known non-cryptographic hash functions - as well as most cryptographic ones - use xor and not multiplication...
I have indeed no indication (apart from all those well-known hash functions) that xor would be better. In fact, just from the fact that it is so widely accepted, I suspect that multiplying by a prime and summing is as good and in practice better. I am asking why this is...
The int type in Java can be used to represent any whole number from -2147483648 to 2147483647.
Sometimes the hashcode of an object may be its memory address (which makes sense and is efficient in a lot of situations) (if inherited from e.g. object)
The answer to this is a mixture of different factors:
On modern architecture, the time taken to perform a multiplication versus a shift may not end up being measurable overall within a given pipeline of instructions-- it has more to do with the availability of the relevant execution unit on the CPU than the "raw" time taken;
In practice when integrating with standard collections libraries in day-to-day programming, it's often more important that a hash function is correct, "good enough" and easy to automate in an IDE than for it to be as perfect as possible;
The collections libraries generally add secondary hash functions and potentially other techniques behind the scenes to overcome some of the weaknesses of what would otherwise be a poor hash function;
With resizable collections, an effective hash function has the goal of dispersing its hashes across the available range for arbitrary sizes of hash table (though, as I say, it will get help from the built-in secondary function): multiplying by a "magic" constant is often a cheap way to achieve this (or, even if multiplication turned out to be a bit more expensive than a shift, still cheap enough given the benefit); addition rather than XOR may help this 'avalanche' effect slightly. (In most practical cases, you will probably find that they work equally well. A small sketch of one concrete difference follows this list.)
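For instance, a plain xor of the field hashes is symmetric, so objects whose field values are merely swapped collide, while the multiply-and-add scheme keeps them apart. A tiny illustration (the class and field values are mine):

public class MixDemo {
    static int xorMix(Object a, Object b) {
        return a.hashCode() ^ b.hashCode();
    }

    static int multiplyMix(Object a, Object b) {
        int result = 1;
        result = 31 * result + a.hashCode();
        result = 31 * result + b.hashCode();
        return result;
    }

    public static void main(String[] args) {
        // xor cannot tell ("alice", "bob") apart from ("bob", "alice"), and maps equal fields to 0
        System.out.println(xorMix("alice", "bob") == xorMix("bob", "alice"));             // true
        System.out.println(xorMix("alice", "alice"));                                     // 0
        // the 31 * result + field scheme is order-sensitive
        System.out.println(multiplyMix("alice", "bob") == multiplyMix("bob", "alice"));   // false
    }
}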
You can generally assume that the JIT compiler "knows" about equivalences such as shifting left 5 places and subtracting the original value rather than multiplying by 31. Just because you write "*31" in the source code doesn't mean that it will literally be compiled to a multiplication instruction. (In practice, it might be, though, because despite what you might think, the multiply instruction may well be "faster" on average on the architecture in question... It's usually better to make your code stick to the required logic and let the JIT compiler handle low-level optimisations in a case such as this.)
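A quick sanity check of that strength-reduction identity, i.e. that 31 * h equals (h << 5) - h for any int (the identity holds even under overflow because int arithmetic is modular); this sketch is mine, not from the answer:

public class StrengthReductionDemo {
    public static void main(String[] args) {
        java.util.Random rnd = new java.util.Random(42);
        for (int i = 0; i < 1_000_000; i++) {
            int h = rnd.nextInt();
            // 31 * h == h * 32 - h
            if (31 * h != (h << 5) - h) {
                throw new AssertionError("mismatch for h = " + h);
            }
        }
        System.out.println("31 * h == (h << 5) - h for all sampled values");
    }
}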
This question already has answers here:
Why does Java's hashCode() in String use 31 as a multiplier?
Question is that simple:
Would combining two small values with a common basic operation like addition, division, modulo, or bit shifting be computed faster than the same operation on larger values?
This would, as far as I can tell, require the CPU to keep track of the most significant bit (which I assume is unlikely), but maybe there's something else going on.
I am specifically asking because I often see rather small primes (e.g. 31) used in some of Java's basic classes' hashCode() methods (e.g. String and List), which is surprising since larger values would most likely cause more diffusion (which is generally a good thing for hash functions).
Arithmetic
I do not think there are many pipelined processors (i.e. almost all except the very smallest) where a simple arithmetic instruction's cost would change with the value of a register or memory operand. This would make the design of the pipeline more complex and may be counterproductive in practice.
I could imagine that a very complex instruction (at least a division) that may take many cycles compared to the pipeline length could show such behaviour, since it likely introduces wait states anyway. Agner Fog writes that this is true "on AMD processors, but not on Intel processors."
If an operation cannot be computed in one instruction, like a multiplication of numbers that are larger than the native integer width, the implementation may well contain a "fast path" for cases where, e.g., the upper half of both operands is zero. A common example would be 64-bit multiplication on 32-bit x86 as generated by MSVC. Some smaller processors do not have instructions for division, sometimes not even for multiplication. The assembly for computing these operations may well terminate earlier for smaller operands. This effect would be felt more acutely on smaller architectures.
Immediate Value Encodings
For immediate values (constants) this may be different. For example there are RISC processors that allow encoding up to 16 bit immediates in a load/add-immediate instruction and require either two operations for loading a 32-bit word via load-upper-immediate + add-immediate or have to load the constant from program memory.
In CISC processors a large immediate value will likely take up more memory, which may reduce the number of instructions that can be fetched per cycle, or increase the number of cache misses.
In such cases a smaller constant may be cheaper than a large one.
I am not sure whether the encoding differences matter as much for Java, since most code will at least initially be distributed as Java bytecode, although a JIT-enabled JVM will translate the code to machine code, and some library classes may have precompiled implementations. I do not know enough about Java bytecode to determine the consequences of constant size on it. From what I have read, it seems that most constants are loaded via an index into the constant pool rather than encoded directly in the bytecode stream, so I would not expect a large difference here, if any.
Strength reduction optimizations
For very expensive operations (relative to the processor) compilers and programmers often employ tricks to replace a hard computation by a simpler one that is valid for a constant, like in the multiplication example mentioned where a multiplication is replaced by a shift and a subtraction/addition.
In the example given (multiply by 31 vs. multiply by 65,537), I would not expect a difference. For other numbers there will be a difference, but it will not correlate perfectly with the magnitude of the number. Divisions by constants are also commonly replaced by an arcane sequence of multiplications and shifts.
See for example how gcc translates a division by 13.
On an x86 processors some multiplications by small constants can be replaced by load-effective-address instructions, but only for certain constants.
All in all I would expect this effect to depend very much on the processor architecture and the operations to be performed. Since Java is supposed to run almost everywhere, I think the library authors wanted their code to be efficient over a large range of processors, including small embedded processors, where operand size will play a larger role.
I'm about to start optimizations on an enormous piece of code and I need to know exactly which operations are performed when the modulus operator is used. I have been searching for quite a while, but I can't find anything about the machine code behind it. Any ideas?
If you need to know
exactly which operations are performed when the modulus operator is used
then I would suggest you are "doing it wrong".
Modulus may be different depending on OS and underlying architecture. It may vary or it may not, but if you need to rely on the implementation, it is likely that your time could best be spent elsewhere. The implementation is not guaranteed to stay the same, or to be consistent across different machines.
Why do you believe modulus to be a major source of computation? Regardless of its implementation, the operation is very likely to take constant time - i.e., if it is used within an algorithm whose big-O complexity is greater than constant time, optimize the algorithm first.
Ask yourself why you need to optimize. Is the computation taking (significantly) longer than expected?
Then ask yourself where 90 - 99% of the computation is being spent. Try using a profiler to get numbers, even if you think you know where time is being spent. It may give you a clue or shed light on a bug.
The modulus operator on integers is built in on most platforms. An instruction with the timing comparable to that of division is performed, producing the modulus.
The compiler can perform an optimization for divisors that are powers of two: instead of performing modulos of, say, x % 512, the compiler can use a potentially faster x & 0x01FF.
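A small sketch of that power-of-two trick. Note that in Java the mask only matches % for non-negative operands, since % can return negative results, which is why the JIT typically emits a small correction when the sign is unknown (the mask variant here is my illustration):

public class ModuloDemo {
    public static void main(String[] args) {
        int x = 12345;
        System.out.println(x % 512);      // 57
        System.out.println(x & 0x1FF);    // 57 as well: 512 is 2^9, so the mask keeps the low 9 bits
        // for negative operands the two differ
        System.out.println(-5 % 512);     // -5
        System.out.println(-5 & 0x1FF);   // 507
    }
}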
Any ideas?
Yes, don't waste your time on it. There are going to be other bits of code you can improve far more than by trying to beat the compiler at its own job.
Recently someone told me that a comparison involving smaller integers will be faster, e.g.,
// Case 1
int i = 0;
int j = 1000;
if(i<j) {//do something}
// Case 2
int j = 6000000;
int i = 500000;
if(i<j){//do something else}
so, the comparison (if condition) in Case 1 should be faster than that in Case 2. In Case 2 the integers will take more space to store, but whether that can affect the comparison, I am not sure.
Edit 1: I was thinking of the binary representation of i & j, e.g., for i=0 it will be 0, and for j=1000 it is 1111101000 (in a 32-bit representation it should be: 22 zeros followed by 1111101000; I completely forgot about 32-bit or 64-bit representation, my bad!)
I tried to look at JVM Specification and bytecode of a simple comparison program, nothing made much sense to me. Now the question is how does comparison work for numbers in java? I guess that will also answer why (or why not) any of the cases will be faster.
Edit 2: I am just looking for a detailed explanation, I am not really worried about micro optimizations
If you care about performance, you really only need to consider what happens when the code is compiled to native code. In this case, it is the behaviour of the CPU which matters. For simple operations almost all CPUs work the same way.
so, comparison (if condition) in Case 1 should be faster than that in Case 2. In Case 2 the integers will take more space to store but that can affect the comparison, I am not sure.
An int is always 32-bit in Java. This means it always takes the same space. In any case, the size of the data type is not very important e.g. short and byte are often slower because the native word size is 32-bit and/or 64-bit and it has to extract the right data from a larger type and sign extend it.
I tried to look at JVM Specification and bytecode of a simple comparison program, nothing made much sense to me.
The JLS specifies behaviour, not performance characteristics.
Now the question is how does comparison work for numbers in java?
It works the same way it does in just about every other compiled language. It uses a single machine code instruction to compare the two values and another to perform a conditional branch as required.
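For what it's worth, you can see the compare-and-branch at the bytecode level too; compiling a method like the sketch below and running javap -c on it should show an if_icmpge (compare two ints and branch) for the condition, and the JIT then maps that onto the CPU's compare and conditional-jump instructions:

public class CompareDemo {
    // javap -c CompareDemo shows, roughly:
    //   iload_0        // push i
    //   iload_1        // push j
    //   if_icmpge ...  // single compare-and-branch bytecode
    static int pick(int i, int j) {
        if (i < j) {
            return 1;
        }
        return 0;
    }
}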
I guess that will also answer why (or why not) any of the cases will be faster.
Most modern CPUs use branch prediction. To keep the CPU's pipeline full, the processor attempts to predict which branch will be taken (before it knows it is the right one to take), so there is no break in the instructions executed. When this works well, the branch has almost no cost. When it mis-predicts, the pipeline can be filled with instructions for the wrong branch, and this can cause a significant delay as the CPU clears the pipeline and takes the right branch. In the worst case it can mean a delay of hundreds of clock cycles. e.g.
Consider the following.
int i; // happens to be 0 at run time
if (i > 0) {
    int j = 100 / i;
}
Say the branch is usually taken. This means the pipeline could be loaded with an instruction which triggers an interrupt (or an ArithmeticException in Java) before it is known that the branch will not be taken. This can result in a complex situation which takes a while to unwind correctly.
These are called mispredicted branches. In short, a branch which goes the same way every time or most of the time is faster; a branch which suddenly changes or is quite random (e.g. in sorting and tree data structures) will be slower.
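The classic way to see this from Java is the sorted-vs-unsorted array experiment. A rough sketch (timings are only indicative; a serious measurement should use a harness like JMH, and a clever JIT may turn the branch into a conditional move and hide the effect):

import java.util.Arrays;
import java.util.Random;

public class BranchPredictionDemo {
    public static void main(String[] args) {
        int[] data = new Random(1).ints(10_000_000, 0, 256).toArray();
        time("unsorted", data);   // branch outcome is effectively random
        Arrays.sort(data);
        time("sorted  ", data);   // branch becomes highly predictable, usually much faster
    }

    static void time(String label, int[] data) {
        long start = System.nanoTime();
        long sum = 0;
        for (int v : data) {
            if (v >= 128) {       // the branch whose predictability we are testing
                sum += v;
            }
        }
        System.out.printf("%s sum=%d took %d ms%n", label, sum, (System.nanoTime() - start) / 1_000_000);
    }
}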
int might be faster on a 32-bit system and long might be faster on a 64-bit system.
Should I bother about it? No, you don't code for a system configuration, you code based on your requirements. Micro-optimizations never work and they might introduce some unexpected issues.
Small Integer objects can be faster because Java treats them specially.
All Integer values for -128 <= i <= 127 are cached in IntegerCache, an inner class of Integer.
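A short demonstration of that cache (this relies on the default JVM setting for the cache's upper bound, 127, which can be raised with -XX:AutoBoxCacheMax):

public class IntegerCacheDemo {
    public static void main(String[] args) {
        Integer a = 127, b = 127;
        Integer c = 128, d = 128;
        // values in [-128, 127] are boxed to the same cached instance
        System.out.println(a == b);        // true  (same object from IntegerCache)
        System.out.println(c == d);        // false (two distinct Integer objects)
        System.out.println(c.equals(d));   // true  (always compare boxed values with equals)
    }
}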
This may seem like a silly question, but why is it that in many languages there exists a prefix and postfix version of the ++ and -- operator, but no similar prefix/postfix versions of other operators like += or -=? For example, it seems like if I can write this code:
myArray[x++] = 137; // Write 137 to array index at x, then increment x
I should be able to write something like
myArray[5 =+ x] = 137; // Write 137 to array index at x, then add five to x
Of course, such an operator does not exist. Is there a reason for this? It seems like a weird asymmetry in C/C++/Java.
I'd guess there are several reasons, I think among the more heavily weighted might be:
there probably weren't thought to be too many real use cases (it may not have even occurred to some language designers in the early days)
pre/post increment mapped directly to machine operations (at least on several machines), so they found their way into the language (update: it turns out that this isn't exactly true, even if it's commonly thought so in computing lore. See below).
Then again, while the idea for pre/post/increment/decrement operators might have been influenced by machine operations, it looks like they weren't put into the language specifically to take advantage of such. Here's what Dennis Ritchie has to say about them:
http://cm.bell-labs.com/cm/cs/who/dmr/chist.html
Thompson went a step further by inventing the ++ and -- operators, which increment or decrement; their prefix or postfix position determines whether the alteration occurs before or after noting the value of the operand. They were not in the earliest versions of B, but appeared along the way. People often guess that they were created to use the auto-increment and auto-decrement address modes provided by the DEC PDP-11 on which C and Unix first became popular. This is historically impossible, since there was no PDP-11 when B was developed. The PDP-7, however, did have a few `auto-increment' memory cells, with the property that an indirect memory reference through them incremented the cell. This feature probably suggested such operators to Thompson; the generalization to make them both prefix and postfix was his own. Indeed, the auto-increment cells were not used directly in implementation of the operators, and a stronger motivation for the innovation was probably his observation that the translation of ++x was smaller than that of x=x+1.
As long as y has no side effects:
#define POSTADD(x,y) (((x)+=(y))-(y))
I'll make an assumption. There are lots of use cases for ++i/i++, and in many of them the specific type of increment (pre/post) makes a difference. I can't tell how many times I've seen code like while (buf[i++]) {...}. On the other hand, += is used much less frequently, as it rarely makes sense to shift a pointer by 5 elements at once.
So, there's just no common enough application where difference between postfix and prefix version of += would be important.
I guess it's because it's way too cryptic.
Some argue that even ++/-- should be avoided, because they cause confusion and are responsible for most buffer overrun bugs.
Because the -- and ++ operators map to inc(rement) and dec(rement) instructions in the CPU (in addition to add and subtract), and these operators were meant to map to those instructions, which is why they exist as separate operators.
Java and C++ have pre- and post- increment and decrement operators because C has them. C has them because C was written, mostly, for the PDP-11 and the PDP-11 had INC and DEC instructions.
Back in the day, optimizing compilers didn't exist, so if you wanted to use a single-cycle increment instruction, either you wrote assembler for it or your language needed an explicit operator for it; C, being a portable assembly language, has explicit increment and decrement operators. Also, the performance difference between ++i and i++ rarely matters now, but it did matter in 1972.
Keep in mind that C is almost 40 years old.
If I had to guess, it's common to equate:
x += 5;
...with:
x = x + 5;
And for obvious reasons, it would be nonsensical to write:
x + 5 = x;
I'm not sure how else you would mimic the behavior of 5 =+ x using just the + operator. By the way, hi htiek!
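In Java today the closest you can get is simply two statements; a trivial sketch of what the hypothetical myArray[5 =+ x] = 137; would have to desugar to:

public class PostAddDemo {
    public static void main(String[] args) {
        int[] myArray = new int[16];
        int x = 3;
        // the hypothetical myArray[5 =+ x] = 137; written as plain Java:
        myArray[x] = 137;   // use the current value of x as the index
        x += 5;             // then add five to x
        System.out.println(myArray[3] + " " + x);   // 137 8
    }
}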