Java add two byte arrays to emulate x86 instruction

I am experimenting with x86 instruction emulation in Java (just for fun) and ran into a problem with "override prefixes", which an instruction may have.
A prefix can change the behavior of an instruction.
For example, with the "operand size override prefix" you can change the size of the operands, from 16 bit to 32 bit or vice versa.
The problem is: when the program runs in 16-bit mode, all operations are done with chars (a char is 16 bits wide); when the operand size changes to 32 bit, I would like to run the operations with integers. So I have redundant code. My idea is to implement byte-array operations, for example an algorithm for the addition of two byte arrays. The advantage would be that you could simply switch between different modes, even to 128 bit and so on. On the other hand, an addition of two byte arrays may not be as performant as an addition of two integers...
Do you know a better way to do this?
What do you think about it?
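For concreteness, here is a minimal sketch of the byte-array addition I have in mind (little-endian byte order as on x86, and equal-length arrays assumed; the method name is just for illustration):

// Sketch of the proposed byte-array addition (little-endian order assumed).
// Returns a + b modulo 2^(8*n); the final carry would become the x86 CF flag.
static byte[] add(byte[] a, byte[] b) {
    byte[] sum = new byte[a.length];          // assumes a.length == b.length
    int carry = 0;
    for (int i = 0; i < a.length; i++) {      // least significant byte first
        int s = (a[i] & 0xFF) + (b[i] & 0xFF) + carry;
        sum[i] = (byte) s;
        carry = s >>> 8;                      // 0 or 1
    }
    return sum;
}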

I think you need to model memory as an array of bytes, because x86 supports unaligned loads / stores. You should probably decode instructions into load / ALU / store steps (where each part is optional; e.g. add eax, ecx only needs the ALU step, not a load or store).
You only have to write the code once to make an int32 from 4 bytes, or to store 4 bytes from an int32. Or, if Java lets you get an int view of 4 arbitrarily-aligned bytes, you could use that as a source or destination operand when the operand-size is 32 bits.
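Something like this, assuming a byte[] memory model (the method names are mine):

// Load a 32-bit value from byte-addressable memory (x86 is little-endian).
static int load32(byte[] mem, int addr) {
    return (mem[addr] & 0xFF)
         | (mem[addr + 1] & 0xFF) << 8
         | (mem[addr + 2] & 0xFF) << 16
         | (mem[addr + 3] & 0xFF) << 24;
}

// Store a 32-bit value back; works at any alignment, like real x86.
static void store32(byte[] mem, int addr, int value) {
    mem[addr]     = (byte)  value;
    mem[addr + 1] = (byte) (value >>> 8);
    mem[addr + 2] = (byte) (value >>> 16);
    mem[addr + 3] = (byte) (value >>> 24);
}

A java.nio.ByteBuffer wrapped around the same array and set to ByteOrder.LITTLE_ENDIAN offers getInt/putInt at arbitrary offsets, which is about the closest Java gets to that "int view" idea.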
If you can write type-generic versions of add, sub, etc. in Java, you can reuse the same code for each operand-size. So you'd have one switch() on the operand-size in the decoder, and dispatch from there to the handler functions for each instruction. If you use a table of pointers (or of objects with methods), the same object could appear in the 8-bit table and the 32-bit table if it's generic (unlike div or mul, which use AH:AL for 8-bit but (E|R)DX:(E|R)AX for all wider operand sizes).
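A sketch of what such a size-generic dispatch could look like; the interface and names are illustrative, not from any real emulator:

// One generic handler per instruction; the decoder picks the table by operand-size.
interface AluOp { long apply(long a, long b); }

static final AluOp ADD = (a, b) -> a + b;   // size-generic: the result is masked later
static final AluOp SUB = (a, b) -> a - b;

// Masks select the operand size after the generic operation.
static final long[] MASK = { 0xFFL, 0xFFFFL, 0xFFFF_FFFFL };   // 8/16/32-bit

// The same ADD object can sit in both the 8-bit and the 32-bit dispatch table.
static final AluOp[] TABLE8  = { ADD, SUB };
static final AluOp[] TABLE32 = { ADD, SUB };

static long execute(AluOp op, long a, long b, int sizeIndex) {
    return op.apply(a, b) & MASK[sizeIndex];
}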
BTW, the possible load/store sizes x86 supports are byte / word / dword / qword (x87 and i486 cmpxchg8b) / xmm / ymm / zmm, plus 6-byte (segment + 32-bit pointer: les or far jmp [mem]) and 10-byte (x87, or segment + 64-bit pointer, e.g. far jmp).
The last two are handled internally as two separate loads; e.g. a 6-byte load isn't guaranteed to be atomic (see: Why is integer assignment on a naturally aligned variable atomic on x86?). Only power-of-2 sizes up to 8 bytes are guaranteed atomic (with some alignment restrictions).
For more ideas about emulating x86, see some of the BOCHS design documents, e.g.
How Bochs Works Under the Hood. It's an interpreting emulator with no JIT / dynamic recompilation, like the one you're writing.
It covers some important ideas like lazy flag handling. Some of the ideas there make the emulator's overall design more complex in exchange for performance, but lazy flags add only limited complexity and should help a lot.
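A minimal sketch of the lazy-flags idea, assuming 32-bit operand size (class and method names are mine): record what the last flag-setting op did, and compute individual flags only when something actually reads them.

// Record the operands and result of the last ALU op; derive flags on demand.
class LazyFlags {
    long a, b, result;

    void recordAdd(long opA, long opB) {
        a = opA & 0xFFFF_FFFFL;                 // 32-bit operand size assumed
        b = opB & 0xFFFF_FFFFL;
        result = (a + b) & 0xFFFF_FFFFL;
    }

    boolean zf() { return result == 0; }                       // computed only when read
    boolean sf() { return (result & 0x8000_0000L) != 0; }
    boolean cf() { return ((a + b) >>> 32) != 0; }             // carry out of bit 31
}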

Related

Why is byte map faster than bit map?

I have read The Garbage Collection Handbook. It says that for the card table they use a byte map instead of a bit map, and that it is faster. Is that due to the cache line? As far as I know, a cache line is normally 64 bytes, so if we make a change to a byte, contention still exists and other CPUs will still invalidate the line, the same as with a bit map. Can anyone help me with this?
Not sure I got the context right, but in general:
bit map access
requires address manipulation and a read/write of the whole BYTE/WORD/..., as most architectures do not support bit-level read/write memory access.
So for an 8-bit bit map like:
BYTE map[];
the code to read it is:
bit_value=(map[bit>>3]>>(bit&7))&1;
set:
map[bit>>3]|=1<<(bit&7);
clear:
map[bit>>3]&=255^(1<<(bit&7));
where bit is the bit you want to access. As you can see, masking and bit shifts are needed.
BYTE map access
this can be accessed directly on most architectures:
byte_value=map[byte];
set:
map[byte]=1;
clear:
map[byte]=0;
Where byte is the BYTE you want to access. As you can see, memory space is wasted if just a boolean value is stored in a whole BYTE.
So unless you have specific HW designed to work with bit maps and planes, BYTE maps are faster ... but to every rule there is an exception, so there are algorithms where you already have the masked address and bit masks; in such cases bit maps are as fast as, or faster than, byte maps ...
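In Java the same two access patterns look like this (class and method names are mine):

// Bit map vs byte map over the same logical boolean array.
class Maps {
    static final int SIZE = 1 << 20;            // number of boolean entries
    final byte[] bitMap  = new byte[SIZE / 8];
    final byte[] byteMap = new byte[SIZE];

    // Bit map: every access needs a shift and a mask.
    boolean getBit(int i)   { return ((bitMap[i >> 3] >> (i & 7)) & 1) != 0; }
    void    setBit(int i)   { bitMap[i >> 3] |=  1 << (i & 7); }
    void    clearBit(int i) { bitMap[i >> 3] &= ~(1 << (i & 7)); }

    // Byte map: one direct load or store, no bit twiddling.
    boolean getByte(int i)   { return byteMap[i] != 0; }
    void    setByte(int i)   { byteMap[i] = 1; }
    void    clearByte(int i) { byteMap[i] = 0; }
}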

Why do Java sockets use bytes as the base structure rather than bits?

I am writing a protocol that uses a bit to represent a boolean, but Java's networking does not use such a small structure. I want to know why network data is designed around bytes rather than bits. Wouldn't exposing the atomic structure be better?
Because the fundamental packets of IP are defined in terms of bytes rather than bits. All packets are a whole number of bytes rather than bits. Even bit fields are part of a multi-byte field. Bytes, rather than bits, are fundamental to IP.
This is ultimately because computer memory does not have addressable bits, but rather addressable bytes. Any implementation of a networking protocol would have to be based on bytes, not bits. And that is also why Java does not provide direct access to bits.
The network bandwidth saving that could be achieved by carrying single bits of payload, compared to the added complexity at both the hardware and the software level, is simply not worth it.
Fundamentally, both at the hardware level (registers) and the software level, the minimal unit of data handling is the byte, 8 bits (or octet, if you want to be nitpicky), or a multiple of it. You cannot address memory at the bit level, only at a multiple-of-a-byte level. Doing otherwise would be very complicated, down to the silicon level, with no added value.
Whatever the programming language, when you declare and use a boolean, a byte (or a power-of-2 multiple of bytes, so it can be loaded from memory into a CPU register) will actually be used to store it, and the language only distinguishes two cases when using it: is this byte all 0 bits, or not? At the machine code/assembly level: load this byte (or multiple bytes, if for example the register is 32 bits wide) from its memory address into register FOO, cmp FOO to 0, and depending on the result JE (Jump if Equal) to code address BAR, or else go on with the next machine code line. Or JNE (Jump if Not Equal) to some other code address. So your Java boolean is not actually stored as a bit. It is, at minimum, a byte.
Even the good old Ethernet frame, before even looking at the actual payload, starts with a 56-bit preamble to synchronize devices. 56 bits is 7 bytes. Could the synchronization be done with less than that, with something that isn't a whole number of bytes? Maybe, but it is not worth the effort.
https://en.wikipedia.org/wiki/Ethernet_frame#Preamble_and_start_frame_delimiter
Pedantic edit for nitpickers:
A language such as C has a bit-field facility:
https://en.wikipedia.org/wiki/Bit_field
...but don't be fooled: the minimal storage unit at the silicon level for a bit from a bit field will still be a byte. Hence the "field" in "bit fields".
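If you really want bit-level packing, you build it yourself above the byte-oriented API; for example with java.util.BitSet, which hands you whole bytes for the wire (the protocol layout here is hypothetical):

import java.util.BitSet;

class BitPacking {
    public static void main(String[] args) {
        BitSet flags = new BitSet(8);
        flags.set(0);                       // hypothetical bit 0 = "compressed"
        flags.set(3);                       // hypothetical bit 3 = "encrypted"
        byte[] wire = flags.toByteArray();  // whole bytes go on the wire
        // socketOutputStream.write(wire);  // e.g. a socket's OutputStream

        // Receiving side: bytes back to bits.
        BitSet received = BitSet.valueOf(wire);
        System.out.println(received.get(0));   // true
    }
}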

Does using small datatypes reduce memory usage (from memory allocation, not efficiency)?

Actually, my question is very similar to this one, but that post focuses on C# only. Recently I read an article saying that Java will 'promote' some small types (like short) to 4 bytes in memory even if some bits are not used, so using them can't reduce usage. (Is that true?)
So my question is how languages, especially C, C++ and Java (as Manish said in this post about Java), handle memory allocation for small datatypes. References or any approaches to figure this out are preferred. Thanks.
C/C++ uses only the specified amount of memory but aligns the data (by default) to an address that is a multiple of some value, typically 4 bytes for 32 bit applications or 8 bytes for 64 bit.
So, for example, if the data is aligned on a 4- or 8-byte boundary, a "char" uses only one byte. An array of 5 chars will use 5 bytes. But the data item that is allocated after the 5-byte char array is placed at an address that skips 3 bytes to keep it correctly aligned.
This is for performance on most processors. There are usually pragmas like "pack" and "align" that can be used to change the alignment or disable it.
In C and C++, different approaches may be taken depending on how you've requested the memory.
For T* p = (T*)malloc(n * sizeof(T)); or T* p = new T[n]; then the data will occupy sizeof(T)*n bytes of memory, so if sizeof(T) is reduced (e.g. to int16_t instead of int32_t) then that space is reduced accordingly. That said, heap allocations tend to have some overheads, so few large allocations are better than a great many allocations for individual data items or very small arrays, where the overheads may be much more significant than small differences in sizeof(T).
For structures, static and stack usage, padding is more significant than for large arrays, as the following data item might be of a different type with different alignment requirements, resulting in more padding.
At the other extreme, you can apply bitfields to effectively pack values into the minimum number of bits they need - very dense compression indeed, though you need to rely on compiler pragmas/attributes if you want explicit control: the Standard leaves it unspecified when a bitfield might start in a new memory "word" (e.g. a 32-bit word for a 32-bit process, 64 for 64) or wrap across separate words, where in the word the bits hold data vs padding, etc. Data types like C++ bitsets and vector<bool> may be more efficient than arrays of bool (which may well use an int for each element, but that is unspecified in the C++03 Standard).
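As for the Java part of the question: in typical JVMs, array elements do use their natural size (a short[] element takes 2 bytes in HotSpot), so smaller types really do save memory in arrays; the padding discussed above concerns individual fields and stack slots. A rough, GC-dependent way to observe this (the measurement approach is approximate; a tool like OpenJDK's JOL gives exact layouts):

// Rough demonstration: short[] takes about half the heap of int[].
class ArraySizes {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc();
        long before = rt.totalMemory() - rt.freeMemory();
        int[] ints = new int[10_000_000];        // ~40 MB of element data
        long mid = rt.totalMemory() - rt.freeMemory();
        short[] shorts = new short[10_000_000];  // ~20 MB of element data
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("int[]   ~ " + (mid - before) + " bytes");
        System.out.println("short[] ~ " + (after - mid) + " bytes");
        // Use the arrays so they stay reachable during the measurements.
        System.out.println(ints.length + shorts.length);
    }
}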

How Does Java Interpret Bytecode Constants Larger Than One Byte, Unaligned, Efficiently?

While all Java bytecodes are 1 byte wide, at points there are variable-sized constants that could range from 1 byte to 8 bytes. How does Java fetch these instructions with operands larger than one byte efficiently, since this data access would be unaligned? And lastly, how does the Java virtual machine perform this operation on platforms that do not support unaligned data access (e.g. ARM, Alpha)?
It cannot be done efficiently. The solutions are (as you are likely aware):
micro-assembler if feasible (not on ARM), reprogramming the CPU;
load time transformation: a better, faster interpreted byte code (unlikely);
load time transformation: a simple compilation;
extending the Just-In-Time compilation;
pre-compilation.
Mind that the interpretation overhead of byte code is not that much higher than that of word code, especially as the interpretation cycle itself has the largest overhead.
Nevertheless, I did some work on older processors with customizable, optimizable interpreters, and there it helped.
The GNU Java compiler might be mentioned.
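For context: an interpreter normally assembles a multi-byte operand from single-byte fetches, which avoids unaligned access entirely at the cost of extra shift/OR work per operand; that work is part of the overhead mentioned above. A sketch (method names are mine; JVM bytecode operands are big-endian):

// Fetch a signed 16-bit operand (e.g. a branch offset) one byte at a time.
// No unaligned load is needed, so this works identically on x86, ARM, or Alpha.
static int readShort(byte[] code, int pc) {
    return (code[pc] << 8) | (code[pc + 1] & 0xFF);   // sign-extends the high byte
}

// Same idea for a 32-bit operand (e.g. lookupswitch entries).
static int readInt(byte[] code, int pc) {
    return (code[pc]     & 0xFF) << 24
         | (code[pc + 1] & 0xFF) << 16
         | (code[pc + 2] & 0xFF) << 8
         | (code[pc + 3] & 0xFF);
}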

Who decides the size of data types in Java?

Who decides the size of data types such as int in Java? The JVM, the OS, or the processor?
An int is 4 bytes. Will it always be 4 bytes irrespective of OS or processor?
The Java Language Specification decides them. They're the same size on all VMs, on all OSes, on all processors. If they're not, it's not Java anymore.
It is the JVM specification that drives JVM implementations to decide the size of data types. Refer to http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-2.html#jvms-2.3
While the Java spec decides how many bits each type logically uses, the default 32-bit JVM actually pads some types in memory, using 32 bits of space to store values, even ones that don't need that much space. They still all behave (as far as the program can tell) as if they took up their real amount of storage, but the amount of space used can be much larger.
Other JVMs can do this differently, for instance they could store an array of booleans with only one bit per value, rather than 32 bits per value.
So while the size of an int will always appear to be 32 bits, because it will always wrap around at 32 bits and can only hold 32 bit values, the JVM does have the option of internally storing it using 64 bits of space, or some other number depending on the hardware and implementation.
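For example, the 32-bit wrap-around is observable no matter how the value is stored internally:

int i = Integer.MAX_VALUE;                    // 2147483647, the largest 32-bit signed value
i = i + 1;                                    // wraps to -2147483648
System.out.println(i == Integer.MIN_VALUE);   // prints true on every conforming JVM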
Also, floating point values are actually much less restricted in this regard: while the spec for float and double does indeed require a particular format, the presence of the two different libraries java.lang.Math and java.lang.StrictMath (see What's the difference between java.lang.Math and java.lang.StrictMath? for an explanation) is a clear example. Using java.lang.Math does not require that the intermediate calculations for those functions be stored in any particular way, or that a particular number of bits be used to compute everything.
