How Does Java Interpret Bytecode Constants Larger Than One Byte, Unaligned Efficiently? - java

While all Java bytecodes are one byte wide, at certain points there are variable-sized constants which can range from 1 byte to 8 bytes. How does Java fetch instructions with operands larger than one byte efficiently, given that this data access would be unaligned? And lastly, how does the Java virtual machine perform this operation on platforms that do not support unaligned data access (e.g. ARM, Alpha)?

It cannot be done efficiently. The solutions are (as you are likely aware):
micro-assembler, if feasible (not on ARM), reprogramming the CPU;
load-time transformation: a better, faster interpreted byte code (unlikely);
load-time transformation: a simple compilation;
extending the Just-In-Time compilation;
pre-compilation.
Mind that the interpretation overhead of byte code is not that much higher than that of word code, especially as the interpretation cycle itself has the largest overhead.
Nevertheless, I did some work on older processors with customizable, optimizable interpreters, and there it helped.
The GNU Java compiler (GCJ) might also be mentioned.
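To make the fetching point concrete: multi-byte operands in the class-file format are big-endian, and a bytecode interpreter typically assembles them from individual single-byte reads, which behaves the same whether or not the host CPU supports unaligned access. A minimal sketch in Java (the class and method names here are made up, not actual JVM code):

final class BytecodeReader {
    private final byte[] code;   // the method's bytecode array
    private int pc;              // current "program counter" into that array

    BytecodeReader(byte[] code) { this.code = code; }

    int readU1() {               // one unsigned operand byte
        return code[pc++] & 0xFF;
    }

    int readU2() {               // big-endian 16-bit operand, e.g. a constant-pool index
        int hi = code[pc++] & 0xFF;
        int lo = code[pc++] & 0xFF;
        return (hi << 8) | lo;
    }

    int readS4() {               // big-endian 32-bit operand, e.g. a goto_w offset
        return (readU2() << 16) | readU2();
    }
}

The cost is a couple of shifts and ORs per operand, which is small compared to the dispatch overhead of the interpretation cycle mentioned above.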

Related

Java add two byte arrays to emulate x86 instruction

I am experimenting with x86 instruction emulation in Java (just for fun) and ran into a problem with the "override prefixes" which an instruction may have.
A prefix can change the behavior of an instruction.
For example, with the "operand size override prefix" you can change the size of the operands: 16 bit to 32 bit or vice versa.
The problem is: when the program runs in 16-bit mode, all the operations are done with chars (a char is 16 bits wide); when the operand size changes to 32 bit, I would like to run the operations with integers. So I have redundant code. My idea is to implement byte-array operations, for example an algorithm for the addition of two byte arrays. The advantage would be that you could simply switch between different modes, even to 128 bit and so on. But on the other side, an addition of two byte arrays may not be as performant as an addition of two integers...
Do you know a better way to do this?
What do you think about it?
I think you need to model memory as an array of bytes, because x86 supports unaligned loads / stores. You should probably decode instructions into load / ALU / store (where each part is optional, e.g. add eax, ecx only needs ALU, not load or store).
You only have to write the code once to make an int32 from 4 bytes, or to store 4 bytes from an int32. Or if Java lets you get an Int reference to an arbitrarily-aligned 4 bytes, then you could use that as a source or destination operand when the operand-size is 32 bits.
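As a hedged sketch of that idea (the helper names are hypothetical), a byte[]-backed guest memory with little-endian 32-bit loads and stores could look like this in Java; the & 0xFF masks undo Java's sign extension of byte values:

static int load32(byte[] mem, int addr) {
    return (mem[addr] & 0xFF)
         | (mem[addr + 1] & 0xFF) << 8
         | (mem[addr + 2] & 0xFF) << 16
         | (mem[addr + 3] & 0xFF) << 24;
}

static void store32(byte[] mem, int addr, int value) {
    mem[addr]     = (byte) value;
    mem[addr + 1] = (byte) (value >>> 8);
    mem[addr + 2] = (byte) (value >>> 16);
    mem[addr + 3] = (byte) (value >>> 24);
}

java.nio.ByteBuffer.wrap(mem).order(ByteOrder.LITTLE_ENDIAN) gives you the same thing via getInt/putInt at an arbitrary byte offset, which is probably the closest Java gets to an "int reference" into unaligned bytes.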
If you can write type-generic versions of add, sub, etc., in Java, you can reuse the same code for each operand-size. So you'd have one switch() on the operand-size in the decoder, and dispatch from there to the handler functions for each instruction. If you use a table of pointers (or of Objects with methods), the same object could appear in the 8-bit table and the 32-bit table if it's generic (unlike div or mul, which use AH:AL for 8-bit but (E|R)DX:(E|R)AX for all wider operand sizes).
BTW, the possible load/store sizes x86 supports are byte/word/dword/qword (x87 and i486 cmpxchg8b) / xmm / ymm / zmm, and 6-byte (segment + 32-bit pointer les or far jmp [mem]). And also 10-byte x87 or segment + 64-bit pointer (e.g. far jmp).
The last two are handled internally as two separate loads, e.g. a 6-byte load isn't guaranteed to be atomic: Why is integer assignment on a naturally aligned variable atomic on x86?. Only power-of-2 sizes up to 8 bytes are guaranteed atomic (with some alignment restrictions).
For more ideas about emulating x86, see some BOCHS design documents, e.g.
How Bochs Works Under the Hood. It's an interpreting emulator with no JIT / dynamic recompilation, like the one you're writing.
It covers some important ideas like lazy flag handling. Some of the ideas there make the emulator's overall design more complex to gain performance, but lazy flags is pretty limited complexity and should help a lot.
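A very small sketch of what lazy flags can look like in Java (names made up, addition only): instead of computing every EFLAGS bit after each ALU operation, the emulator records the operands and result and derives individual flags only when a later instruction (Jcc, ADC, PUSHF, ...) actually asks for them:

final class LazyFlags {
    int op1, result;                 // saved from the last flag-setting instruction

    int add(int a, int b) {          // perform the ALU op, remember just enough state
        op1 = a;
        result = a + b;
        return result;
    }

    boolean zf() { return result == 0; }
    boolean sf() { return result < 0; }
    boolean cf() {                   // unsigned carry out of the addition
        return Integer.compareUnsigned(result, op1) < 0;
    }
}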

Is a (common) CPU faster at computing low values than great ones? [duplicate]

This question already has answers here:
Why does Java's hashCode() in String use 31 as a multiplier?
The question is as simple as this:
Would combining two small values with a common basic operation like addition, division, modulo, bit shift and others be computed faster than the same operation with larger values?
This would, as far as I can tell, require the CPU to keep track of the most significant bit (which I assume to be unlikely), but maybe there's something else in the business.
I am asking specifically because I often see rather small primes (e.g. 31) used in the hashCode() methods of some of Java's basic classes (e.g. String and List), which is surprising since larger values would most likely cause more diffusion (which is generally a good thing for hash functions).
Arithmetic
I do not think there are many pipelined processors (i.e. almost all except the very smallest) where a simple arithmetic instruction's cost would change with the value of a register or memory operand. This would make the design of the pipeline more complex and may be counterproductive in practice.
I could imagine that a very complex instruction (at least a division) that may take many cycles compared to the pipeline length could show such behaviour, since it likely introduces wait states anyway. Agner Fog writes that this is true "on AMD processors, but not on Intel processors."
If an operation cannot be computed in one instruction, like a multiplication of numbers that are larger than the native integer width, the implementation may well contain a "fast path" for cases where, e.g., the upper half of both operands is zero. A common example would be 64-bit multiplications on 32-bit x86, as generated by MSVC. Some smaller processors do not have instructions for division, sometimes not even for multiplication. The assembly for computing these operations may well terminate earlier for smaller operands. This effect would be felt more acutely on smaller architectures.
Immediate Value Encodings
For immediate values (constants) this may be different. For example there are RISC processors that allow encoding up to 16 bit immediates in a load/add-immediate instruction and require either two operations for loading a 32-bit word via load-upper-immediate + add-immediate or have to load the constant from program memory.
In CISC processors a large immediate value will likely take up more memory, which may reduce the number of instructions that can be fetched per cycle, or increase the number of cache misses.
In such cases a smaller constant may be cheaper than a large one.
I am not sure if the encoding differences matter as much for Java, since most code will at least initially be distributed as Java bytecode, although a JIT-enabled JVM will translate the code to machine code and some library classes may have precompiled implementations. I do not know enough about Java bytecode to determine the consequences of constant size on it. From what I read it seems to me most constants are usually loaded via an index from constant pools and not directly encoded in the bytecode stream, so I would not expect a large difference here, if any.
Strength reduction optimizations
For very expensive operations (relative to the processor) compilers and programmers often employ tricks to replace a hard computation by a simpler one that is valid for a constant, like in the multiplication example mentioned where a multiplication is replaced by a shift and a subtraction/addition.
In the example given (multiply by 31 vs. multiply by 65,537), I would not expect a difference. For other numbers there will be a difference, but it will not correlate perfectly with the magnitude of the number. Divisions by constants are also commonly replaced by an arcane sequence of multiplications and shifts.
See for example how gcc translates a division by 13.
On x86 processors some multiplications by small constants can be replaced by load-effective-address instructions, but only for certain constants.
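For the hashCode() case specifically, the multiply-by-31 is exactly the kind of constant a compiler can strength-reduce. A hedged Java illustration of the equivalent arithmetic (what a JIT may emit, not something you need to write yourself):

static int nextHash(int h, char c) {
    // return 31 * h + c;            // what the source code says
    return ((h << 5) - h) + c;       // 31*h == (h << 5) - h: one shift and one subtraction
}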
All in all I would expect this effect to depend very much on the processor architecture and the operations to be performed. Since Java is supposed to run almost everywhere, I think the library authors wanted their code to be efficient over a large range of processors, including small embedded processors, where operand size will play a larger role.

Why do Java sockets use bytes as the base structure rather than bits?

I am writing a protocol that uses a bit to represent a boolean, but Java's networking APIs do not expose any structure smaller than a byte. I want to know why network data is designed around bytes rather than bits. Wouldn't exposing the atomic structure be better?
Because the fundamental packets of IP are defined in terms of bytes rather than bits. All packets are a whole number of bytes rather than bits. Even bit fields are part of a multi-byte field. Bytes, rather than bits, are fundamental to IP.
This is ultimately because computer memory does not have addressable bits, but rather addressable bytes. Any implementation of a networking protocol would have to be based on bytes, not bits. And that is also why Java does not provide direct access to bits.
The network bandwidth saving that could be achieved by carrying single bits of payload is simply not worth the added complexity at both the hardware and software level.
Fundamentally, both at the hardware level (registers) and the software level, the minimal unit of data handling is the byte, 8 bits (or octet, if you want to be nitpicking), or a multiple of that. You cannot address memory at the bit level, only at the byte level or multiples of it. Doing otherwise would be very complicated, down to the silicon level, with no added value.
Whatever the programming language, when you declare and use a boolean, a byte (or a power-of-two multiple of bytes, as long as it can be loaded from memory into a CPU register) will actually be used to store it, and the language will only care about two cases when using it: is this byte all 0 bits, or not? At the machine code/assembly level: load this byte (or multiple bytes, if for example the register is 32 bits wide) from its memory address into register FOO, compare FOO to 0, and depending on the result JE (Jump if Equal) to code address BAR, else go on with the next machine code line. Or JNE (Jump if Not Equal) to some other code address. So your Java boolean is not actually stored as a bit. It is, at minimum, a byte.
Even the good old Ethernet frame, not even looking at the actual useful payload, starts with a 56-bit preamble to synchronize devices. 56 bits is 7 bytes. Could the synchronization be done with less than that, with something that is not a whole number of bytes? Maybe, but it is not worth the effort.
https://en.wikipedia.org/wiki/Ethernet_frame#Preamble_and_start_frame_delimiter
Pedantic edit for nitpickers:
A language such as C has a bit-field facility:
https://en.wikipedia.org/wiki/Bit_field
...but don't be fooled: the minimal storage unit at the silicon level for a bit from a bit field will still be a byte. Hence the "field" in "bit fields".
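If a protocol really does want to spend one bit per boolean, the packing has to happen in application code, because the socket API (and the wire) only moves whole bytes. An illustrative Java sketch:

static byte packFlags(boolean[] flags) {         // packs up to 8 flags into one byte
    byte b = 0;
    for (int i = 0; i < flags.length && i < 8; i++) {
        if (flags[i]) b |= (1 << i);
    }
    return b;
}

static boolean flagAt(byte b, int i) {           // reads bit i back out
    return (b & (1 << i)) != 0;
}

java.util.BitSet with toByteArray()/valueOf() does the same bookkeeping for you for longer flag sequences.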

Does using small datatypes reduce memory usage (From memory allocation not efficiency)?

Actually, my question is very similar to this one, but that post focuses on C# only. Recently I read an article which said that Java will 'promote' some small types (like short) to 4 bytes in memory even if some bits are not used, so using them can't reduce usage. (Is that true?)
So my question is how languages, especially C, C++ and Java (as Manish said in this post about Java), handle memory allocation of small datatypes. References or any approaches to figuring this out are preferred. Thanks.
C/C++ uses only the specified amount of memory but aligns the data (by default) to an address that is a multiple of some value, typically 4 bytes for 32 bit applications or 8 bytes for 64 bit.
So for example if the data is aligned on a 4 or 8 byte boundary then a "char" uses only one byte. An array of 5 chars will use 5 bytes. But the data item that is allocated after the 5 byte char array is placed at an address that skips 3 bytes to keep it correctly aligned.
This is for performance on most processors. There are usually pragmas like "pack" and "align" that can be used to change the alignment or disable it.
In C and C++, different approaches may be taken depending on how you've requested the memory.
For T* p = (T*)malloc(n * sizeof(T)); or T* p = new T[n]; then the data will occupy sizeof(T)*n bytes of memory, so if sizeof(T) is reduced (e.g. to int16_t instead of int32_t) then that space is reduced accordingly. That said, heap allocations tend to have some overheads, so few large allocations are better than a great many allocations for individual data items or very small arrays, where the overheads may be much more significant than small differences in sizeof(T).
For structures, static and stack usage, padding is more significant than for large arrays, as the following data item might be of a different type with different alignment requirements, resulting in more padding.
At the other extreme, you can apply bit-fields to effectively pack values into the minimum number of bits they need - very dense packing indeed, though you need to rely on compiler pragmas/attributes if you want explicit control: the Standard leaves it unspecified when a bit-field might start in a new memory "word" (e.g. a 32-bit memory word for a 32-bit process, 64 for 64), whether it may wrap across separate words, and where in the words the bits hold data vs. padding. Data types like C++ bitsets and vector<bool> may be more efficient than arrays of bool (which may well use an int for each element, though that is unspecified in the C++03 Standard).
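For the Java side of the question, here is a rough illustration (HotSpot-typical sizes, not guaranteed by the language) of where small types actually pay off: array elements are stored packed, so a short[] really is about half the size of an int[] of the same length, while a lone short field inside an object tends to be swallowed by 8-byte object alignment anyway:

public class SmallTypes {
    short a;                                     // 2 bytes of field data
    byte b;                                      // 1 more byte; the object size is still
                                                 // rounded up to the alignment boundary
    public static void main(String[] args) {
        short[] samples = new short[1_000_000];  // roughly 2 MB of element data
        int[] counts = new int[1_000_000];       // roughly 4 MB of element data
        System.out.println(samples.length + " shorts vs. " + counts.length + " ints");
    }
}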

Anyone using short and byte primitive types, in real apps?

I have been programming in Java since 2004, mostly enterprise and web applications. But I have never used short or byte, other than in a toy program just to see how these types work. Even in a for loop of 100 iterations, we usually go with int. And I don't remember ever coming across any code which made use of byte or short, other than some public APIs and frameworks.
Yes, I know you can use a short or byte to save memory in large arrays, in situations where the memory savings actually matter. Does anyone care to practice that? Or is it just something in the books?
[Edited]
Using byte arrays for network programming and socket communication is quite common usage. Thanks, Darren, for pointing that out. Now how about short? Ryan gave an excellent example. Thanks, Ryan.
I use byte a lot. Usually in the form of byte arrays or ByteBuffer, for network communications of binary data.
I rarely use float or double, and I don't think I've ever used short.
Keep in mind that Java is also used on mobile devices, where memory is much more limited.
I used 'byte' a lot, in C/C++ code implementing functionality like image compression (i.e. running a compression algorithm over each byte of a black-and-white bitmap), and processing binary network messages (by interpreting the bytes in the message).
However I have virtually never used 'float' or 'double'.
The primary usage I've seen for them is while processing data with an unknown structure or even no real structure. Network programming is an example of the former (whoever is sending the data knows what it means but you might not), something like image compression of 256-color (or grayscale) images is an example of the latter.
Off the top of my head grep comes to mind as another use, as does any sort of file copy. (Sure, the OS will do it--but sometimes that's not good enough.)
The Java language itself makes it unreasonably difficult to use the byte or short types. Whenever you perform any operation on a byte or short value, Java promotes it to an int first, and the result of the operation is returned as an int. Also, they're signed, and there are no unsigned equivalents, which is another frequent source of frustration.
So you end up using byte a lot because it's still the basic building block of all things cyber, but the short type might as well not exist.
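A small Java example of both complaints, the int promotion and the signedness (nothing exotic here, just the usual idioms):

public class BytePromotion {
    public static void main(String[] args) {
        byte a = 10, b = 20;
        // byte c = a + b;            // does not compile: a + b is promoted to int
        byte c = (byte) (a + b);      // an explicit narrowing cast is required

        byte raw = (byte) 0xF0;       // stored as -16, because byte is signed
        int unsigned = raw & 0xFF;    // 240: the usual idiom for reading it as 0..255

        System.out.println(c + " " + unsigned);
    }
}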
Until today I hadn't noticed how seldom I use them.
I've used byte for network-related stuff, but most of the time it was for my own tools/learning. In work projects these things are handled by frameworks (JSP, for instance).
Short? Almost never.
Long? Neither.
My preferred integer literals are always int: for loops, counters, etc.
When data comes from another place (a database, for instance) I use the proper type, but for literals I always use int.
I use bytes in lots of different places, mostly involving low-level data processing. Unfortunately, the designers of the Java language made bytes signed. I can't think of any situation in which having negative byte values has been useful. Having a 0-255 range would have been much more helpful.
I don't think I've ever used shorts in any proper code. I also never use floats (if I need floating point values, I always use double).
I agree with Tom. Ideally, in high-level languages we shouldn't be concerned with the underlying machine representations. We should be able to define our own ranges or use arbitrary precision numbers.
When we are programming for electronic devices like mobile phones, we use byte and short. In this case we should take care with memory management.
It's perhaps more interesting to look at the semantics of int. Are those arbitrary limits and silent truncation what you want? Application-level code really wants arbitrary-sized integers; it's just that Java has no way of expressing those reasonably.
I have used bytes when saving State while doing model checking. In that application the space savings are worth the extra work. Otherwise I never use them.
I found I was using byte variables when doing some low-level image processing. The .Net GDI+ draw routines were really slow so I hand-rolled my own.
Most times, though, I stick with signed integers unless I am forced to use something larger, given the problem constraints. Any sort of physics modeling I do usually requires floats or doubles, even if I don't need the precision.
Apache POI was using short quite a few times. Probably because of Excel's row/column number limitation.
A few months ago they changed to int, replacing
createCell(short columnIndex)
with
createCell(int column).
On in-memory datagrids, it can be useful.
The concept of a datagrid like Gemfire is to have a huge distributed map.
When you don't have enough memory you can overflow to disk with an LRU strategy, but the keys of all entries of your map remain in memory (at least with Gemfire).
Thus it is very important to give your keys a small footprint, particularly if you are handling very large datasets.
For the entry value, when you can, it's also better to use the appropriate type with a small memory footprint...
I have used shorts and bytes in Java apps communicating with custom USB or serial micro-controllers to receive 10-bit values wrapped in 2 bytes as shorts.
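A hedged sketch of that unpacking (the real device's byte order and bit layout may differ): if the low byte carries bits 0-7 and the next byte carries bits 8-9, the 10-bit sample fits comfortably in a short:

static short readSample(byte lo, byte hi) {
    return (short) (((hi & 0x03) << 8) | (lo & 0xFF));   // value range 0..1023
}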
Bytes and shorts are extensively used in Java Card development. Take a look at my answer to Are there any real life uses for the Java byte primitive type?.
