When you upload a ByteBuffer (a java.nio object holding signed Java bytes) with the LWJGL function glBufferData(), it turns out the correct way for OpenGL to interpret the data in the corresponding buffer is as GL_UNSIGNED_BYTE.
Why is this? LWJGL does not seem to convert the ByteBuffer to some other format; here is the source for the glBufferData() function:
public static void glBufferData(int target, ByteBuffer data, int usage) {
    ContextCapabilities caps = GLContext.getCapabilities();
    long function_pointer = caps.glBufferData;
    BufferChecks.checkFunctionAddress(function_pointer);
    BufferChecks.checkDirect(data);
    nglBufferData(target, data.remaining(), MemoryUtil.getAddress(data), usage, function_pointer);
}
Any idea why?
Edit:
I see why you may think no conversion is needed, because unsigned bytes and signed bytes are stored the same way. But let me clarify: I put the integral values 1, 2, 3, 4, 5, etc. into this ByteBuffer, presumably as signed bytes, because that is all Java handles. So these bytes hold 1, 2, 3, 4, 5 under a signed interpretation. Why, then, does OpenGL read back 1, 2, 3, 4, 5 with an unsigned interpretation rather than a signed one? That is the question.
Note that the data is used as an index buffer.
To begin with, do not use GL_UNSIGNED_BYTE for vertex buffer indices. OpenGL supports this at the API level, but desktop GPU hardware manufactured in the past ~14 years generally does not support it at the hardware level. The driver will convert the indices to 16-bit in order to satisfy hardware constraints, so all you are actually doing is increasing the workload on your driver. GL_UNSIGNED_SHORT is really the smallest index size you should use if you do not want to unnecessarily burden your driver. What it boils down to is unaligned memory access: you can use 8-bit indices if you want, but you will get better vertex performance with 16- or 32-bit indices.
To address the actual issue in this question: you are using GL_UNSIGNED_BYTE to interpret the vertex indices, and in this case the range of the data type is irrelevant for values < 128. GL_UNSIGNED_BYTE vs. GL_BYTE (the signed variant) really only matters for interpreting color values, because GL does fixed-point scaling in order to re-map the values from [-128, 127] -> [-1.0, 1.0] (signed) or [0, 255] -> [0.0, 1.0] (unsigned) for internal representation.
In the case of a vertex index, however, the number 5 is still 5 after it is converted from unsigned to signed or the other way around. There is no fixed-point to floating-point conversion necessary to interpret vertex indices, and so the range of values is not particularly important (assuming no overflow).
To that end, you have no choice in the matter when using vertex indices. The only valid enums are GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT and GL_UNSIGNED_INT. If your language cannot represent unsigned values, then the language binding for OpenGL will be responsible for figuring out exactly what these enums mean and how to handle them.
The main difference between signed and unsigned bytes is how you interpret the bits: negative values have the same bit patterns as values over 127. You don't need different types of storage for the two, and the conversion (which is really a no-op) works automatically using the two's complement system.
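To make this concrete, here is a small sketch of my own (not from LWJGL or the question) that writes the index values 1..5 into a direct ByteBuffer and prints both interpretations; the class and variable names are purely illustrative:

import java.nio.ByteBuffer;

public class UnsignedByteDemo {
    public static void main(String[] args) {
        ByteBuffer indices = ByteBuffer.allocateDirect(5);
        for (byte b = 1; b <= 5; b++) {
            indices.put(b);                 // stored as a signed Java byte
        }
        indices.flip();
        while (indices.hasRemaining()) {
            byte raw = indices.get();
            int unsigned = raw & 0xFF;      // how a GL_UNSIGNED_BYTE read sees the same bits
            System.out.println(raw + " -> " + unsigned);  // identical for 1..5
        }
        // The two interpretations only diverge for bit patterns >= 0x80:
        byte big = (byte) 200;
        System.out.println(big + " -> " + (big & 0xFF));  // -56 -> 200
    }
}

For small positive values the signed and unsigned readings are byte-for-byte identical, which is why no conversion happens or is needed.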
Related
This is an excerpt of code from a music tuner application. A byte[] array is created, audio data is read into the buffer arrays, and then the for loop iterates through buffer and combines the values at indices n and n+1 to create an array of 16-bit numbers that is half the length.
byte[] buffer = new byte[2 * 1200];
targetDataLine.read(buffer, 0, buffer.length);
for (int i = 0; i < n; i += 2) {
    int value = (short) ((buffer[i] & 0xFF) | ((buffer[i+1] & 0xFF) << 8)); // **Don't understand**
    a[i >> 1] = value;
}
So far, what I have is this:
From a different SO post, I learned that every byte being stored in a larger type must be ANDed with 0xFF, due to its conversion to a 32-bit number. I guess the leading 24 bits are filled with 1s (though I don't know why they aren't filled with zeros... wouldn't leading 1s change the value of the number? 000000000010 (2) is different from 111111110010 (-14), after all.), so the purpose of 0xFF is to keep only the last 8 bits (which is the whole byte).
When buffer[i+1] is shifted left by 8 bits, this makes it so that, when ORing, the eight bits from buffer[i+1] are in the most significant positions, and the eight bits from buffer[i] are in the least significant eight bits. We wind up with a 16-bit number that is of the form buffer[i+1] + buffer[i]. (I'm using + but I understand it's closer to concatenation.)
First, why are we ORing buffer[i] | buffer[i+1] << 8? This seems to destroy the original sound information unless we pull it back out in the same way; while I understand that OR will combine them into one value, I don't see how that value can be useful or used in calculations later. And the only way this data is accessed later is as its literal values:
diff += Math.abs(a[j] - a[i+j]);
If I have 101 and 111, added together I should get 12, or 1100. Yet 101 | 111 << 3 gives 111101, which is equal to 61. The closest I got to understanding was that 101 (5) | 111000 (56) is the same as adding 5+56=61. But the order matters -- doing the reverse 101 <<3 | 111 is completely different. I really don't understand how the data can remain useful, when it is OR'd in this way.
The other problem I'm having is that, because Java uses signed bytes, the eighth bit doesn't indicate a value but the sign. If I'm ORing two signed binary numbers, then in the resulting 16-bit number the bit at 2⁷ is now acting as a value instead of a sign. If I had a negative byte before running the OR, then in my final value post-operation, it would now erroneously act as though the original number had a positive 2⁷ in it. 0xFF doesn't get rid of this, because it preserves the eighth (sign) bit, so shouldn't this be a problem?
For example, 1111 (-1) and 0101, when OR'd, might give 01011111. But 1111 wasn't representing POSITIVE 1111, it was representing the signed version; yet in the final answer, it now is acting as a positive 2³.
UPDATE: I marked the accepted answer, but it took that + a little extra work to figure out where I went wrong. For anyone who may read this in the future:
As far as the signing goes, the code I have uses signed bytes. My only guess as to why this doesn't mess anything up was that all of the values received might happen to be positive, except that this doesn't make sense, given that a waveform varies in amplitude over [-1, 1]. I also figured that if there are negative values, the implementation here doesn't seem to remove the sign bit when ORing, so it probably doesn't affect the computation too much, since we're dealing with really large values (diff += means diff will be really large, so a few extra 1s shouldn't hurt the outcome given the code and the comparisons it relies on). So this was all wrong. I gave it some more thought and it's really simple, actually: the only reason this was such a problem is that I didn't know about big-endian, and once I read about it, I misunderstood exactly how it is implemented. Endianness is explained in the next bullet point.
Regarding the order in which the bits are placed, destroying the sound, etc.: the code I'm using sets bigEndian=false, meaning that the byte order goes from least significant byte to most significant byte. For this reason, combining the two indices of buffer requires taking the second index, placing its bits first, and placing the first index second (so we are now in big-endian byte order). One of the problems I had was the impression that "endianness" determines the bit order. I thought 10010101 big-endian would become 10101001 little-endian. Turns out this is not the case -- the bits in each byte remain in their original order; the difference is that the bytes are ordered "backward". So 10110101 11100001 big-endian becomes 11100001 10110101 -- the same bit order within each byte, but a different byte order.
Finally, I'm not sure why, but the accepted answer is correct: targetDataLine.read() may place the bits into a byte array only (not just in my code, but in all Java code using targetDataLine -- read() only accepts arguments where the destination var is a byte array), but the data is in fact one short split into two bytes. It is for this reason that every two indices must be combined together.
Coming back to the signing: it should be obvious by now why this isn't an issue. This is the comment that I now have in the code, which explains more coherently what it took all of the above to explain:
/* The Javadoc explains that the targetDataLine will only read into a byte-typed array.
However, because the sample size is 16-bit, it is actually storing 16-bit numbers (shorts)
there, split every eight bits. Additionally, because it is storing them in little-endian
order, bits [2^0, 2^7] are stored in index [i] in normal order (powers 7 6 5 4 3 2 1 0),
while bits [2^8, 2^15] are stored in index [i+1]. So, together they currently read as
[7-6-5-4-3-2-1-0 15-14-13-12-11-10-9-8], which is a problem. In the next for loop we take
care of this and reorganize the bytes by swapping every pair (remember the bits are fine,
but the bytes are out of order). Also, although the array is signed, this will not matter
when we combine bytes, because the sign bit (2^15) will be placed back at the beginning
where it normally is; although 2^7 currently exists as the most significant bit in its byte,
it is not a sign-indicating bit, because it is really the middle of the short that was split. */
This is combining the byte stream from the input, which arrives low byte first, into a stream of shorts in the internal byte order.
With sign extension, it is more a question of the sign encoding of the original byte stream. If the original byte stream is unsigned (encoding values from 0 to 255), then the & 0xFF masking overcomes the otherwise unwanted effects of Java treating the values as signed. So the educated guess is that the external byte stream encodes unsigned bytes.
Judging whether the code is plausible requires information on what external encoding is being handled and what internal encoding is used. E.g. (a wild guess, could be totally wrong!): the two-byte chunks read could belong to the two channels of a stereo sound encoding and be put into a single short for ease of internal processing. You should look at the encoding being read and at how the converted data is used within the application.
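A compact version of that conversion might look like this (a sketch under the assumption of 16-bit little-endian mono samples; the method and array names are mine, not from the question's code):

// Hypothetical helper: rebuild signed 16-bit samples from a little-endian byte stream.
static short[] toSamples(byte[] buffer) {
    short[] samples = new short[buffer.length / 2];
    for (int i = 0; i < samples.length; i++) {
        int lo = buffer[2 * i] & 0xFF;          // low-order byte arrives first
        int hi = buffer[2 * i + 1] & 0xFF;      // high-order byte arrives second
        samples[i] = (short) (lo | (hi << 8));  // the cast puts the sign bit back at 2^15
    }
    return samples;
}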
I'm trying to create a Java program that writes files for my Arduino to read. The Arduino is a simple 8-bit microcontroller board, and with some extra hardware it can read text files from SD cards, byte by byte.
Turns out this was a whole lot harder than I thought. Firstly, there are no unsigned values in Java. Not even bytes, for some reason! Even trying to set a byte to 0xFF gives a possible loss of precision error! This isn't very useful for this low-level code.
I would use ints and only use the positive values, but I like using byte overflow to my advantage in a lot of my code (though I could probably do this with a modulus right after the math operation or something), and the biggest problem of all is that I have no idea how to add an int as an 8-bit character to a String that gets written to a file later. Output is currently my biggest problem.
So, what would be the best way to do unsigned bit math based on some user input and then write those bits to a file as if each one was an ASCII character?
So, here's how it works.
You can treat Java bytes as unsigned. The only places where signs make a difference are
constants: just cast them to bytes
toString and parseInt
division
<, >, >=, <=
Operations where signedness does not matter:
addition
subtraction
multiplication
bit arithmetic (except for >>, just use >>> instead)
To convert bytes to their unsigned values as ints, just use & 0xFF, and to convert those to bytes use (byte).
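For example (a throwaway sketch of my own, with arbitrary values, showing each rule above):

public class UnsignedByteMath {
    public static void main(String[] args) {
        byte a = (byte) 200;                  // constants: just cast them to byte (bit pattern 0xC8)
        byte b = (byte) 100;

        byte sum = (byte) (a + b);            // add/subtract/multiply: sign never matters, wraps mod 256
        int ua = a & 0xFF;                    // unsigned value as an int: 200
        int ub = b & 0xFF;                    // 100

        System.out.println(ua < ub);               // comparison: mask first -> false
        System.out.println(ua / ub);               // division: mask first -> 2
        System.out.println(Integer.toString(ua));  // toString: mask first -> "200"
        System.out.println((a & 0xFF) >>> 1);      // logical shift of the unsigned value -> 100
        System.out.println(sum & 0xFF);            // 200 + 100 wraps to 44
    }
}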
Alternatively, if third-party libraries are acceptable, you might be interested in Guava's UnsignedBytes utility class. (Disclosure: I contribute to Guava.)
I have often heard complaints against Java for not having unsigned data types. See for example this comment. I would like to know how this is a problem. I have been programming in Java for 10 years, more or less, and never had issues with it. Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Since unsigned and signed numbers are represented with the same bit values, the only places I can think of where signedness matters are:
When converting the numbers to other bit representation. Between 8, 16 and 32 bit integer types you can use bitmasks if needed.
When converting numbers to decimal format, usually to Strings.
Interoperating with non-Java systems through API's or protocols. Again the data is just bits, so I don't see the problem here.
Using the numbers as memory or other offsets. With 32-bit ints this might be a problem for very large offsets.
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those. What am I missing? What are the actual benefits of having unsigned types in a programming language and how would having those make Java better?
Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Why not? Is "applying a bitwise AND with 0xFF" actually part of what your code is trying to represent? If not, why should it have to be part of how you write it? I actually find that almost anything I want to do with bytes beyond just copying them from one place to another ends up requiring a mask. I want my code to be cruft-free; the lack of unsigned bytes hampers this :(
Additionally, consider an API which will always return a non-negative value, or only accepts non-negative values. Using an unsigned type allows you to express that clearly, without any need for validation. Personally I think it's a shame that unsigned types aren't used more in .NET, e.g. for things like String.Length, ICollection.Count etc. It's very common for a value to naturally only be non-negative.
Is the lack of unsigned types in Java a fatal flaw? Clearly not. Is it an annoyance? Absolutely.
The comment that you quote hits the nail on the head:
Java's lack of unsigned data types also stands against it. Yes, you can work around it, but it's not ideal and you'll be using code that doesn't really reflect the underlying data correctly.
Suppose you are interoperating with another system, which wants an unsigned 16 bit integer, and you want to represent the number 65535. You claim "the data is just bits, so I don't see the problem here" - but having to pass -1 to mean 65535 is a problem. Any impedance mismatch between the representation of your data and its underlying meaning introduces an extra speedbump when writing, reading and testing the code.
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those.
The only times you would need to consider those operations is when you were naturally working with values of two different types - one signed and one unsigned. At that point, you absolutely want to have that difference pointed out. With signed types being used to represent naturally unsigned values, you should still be considering the differences, but the fact that you should is hidden from you. Consider:
// This should be considered unsigned - so a value of -1 is "really" 65535
short length = /* some value */;
// This is really signed
short foo = /* some value */;
boolean result = foo < length;
Suppose foo is 100 and length is -1. What's the logical result? The value of length represents 65535, so logically foo is smaller than it. But you'd probably go along with the code above and get the wrong result.
Of course they don't even need to represent different types here. They could both be naturally unsigned values, represented as signed values with negative numbers being logically greater than positive ones. The same error applies, and wouldn't be a problem if you had unsigned types in the language.
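To make that concrete, here is a sketch of my own that mirrors the snippet above (Short.toUnsignedInt requires Java 8+):

short length = (short) 65535;   // logically 65535, stored as -1
short foo = 100;

System.out.println(foo < length);                      // false: a plain signed comparison sees -1
System.out.println(foo < (length & 0xFFFF));           // true: mask to recover the intended 65535
System.out.println(foo < Short.toUnsignedInt(length)); // true: the same thing via the Java 8 helper

The masking is exactly the kind of cruft that a real unsigned type would make unnecessary.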
You might also want to read this interview with Joshua Bloch (Google cache, as I believe it's gone from java.sun.com now), including:
Ooh, good question... I'm going to say that the strangest thing about the Java platform is that the byte type is signed. I've never heard an explanation for this. It's quite counterintuitive and causes all sorts of errors.
If you like, yes, everything is ones and zeroes. However, your hardware arithmetic and logic unit doesn't work that way. If you want to store your bits in a signed integer value but perform operations that are not natural to signed integers, you will usually waste both storage space and processing time.
An unsigned integer type stores twice as many non-negative values in the same space as the corresponding signed integer type. So if you want to bring into Java data that is commonly handled as unsigned in another language, such as a POSIX date value (an unsigned number of seconds) as normally used with C, then in general you will need to use a wider integer type than C would use. If you are processing many such values, again you will waste both storage space and fetch-execute time.
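For instance (a minimal sketch of my own, assuming the timestamp arrives as a raw 32-bit value):

// Hypothetical example: widening an unsigned 32-bit seconds value into a long.
int rawSeconds = 0xF0000000;              // top bit set: negative as a signed int
long seconds = rawSeconds & 0xFFFFFFFFL;  // 4026531840, but now occupying 8 bytes
System.out.println(rawSeconds + " -> " + seconds);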
The times I have used unsigned data types have been when I read in large blocks of data that correspond to images, or worked with openGL. I personally prefer unsigned if I know something will never be negative, as a "safety feature" of sorts.
Unsigned types are useful for bit-by-bit comparisons, and I'm pretty sure they are used extensively in graphics.
I need to convert a piece of code from Objective-C to Java, but I have a problem understanding the Primitive Types in Objective-C. So I had their data types in my Objective-C code :
UInt64, UInt32, UInt8,
which are unsigned integers (as I understand from the internet). So my question is: can I use the Java primitive types byte (8-bit) instead of UInt8, int (32-bit) instead of UInt32, and long (64-bit) instead of UInt64?
Unfortunately, it isn't a straight translation, and without knowing more about your program, it's hard to suggest what the "right" approach is.
If your UInt8 values really range from 0-255, you may have to use a Java signed int to be able to hold the entire range.
If you are dealing with byte streams or memory layouts and really need to use just a single byte of memory, then you could try byte, but you may have to test for and handle the cases where the high bit is set (value > 127). Ditto with the other unsigned types.
Ideally, if your code just kind of "defaulted" to the unsigned types, but really the signed versions would have worked fine too (i.e. the ranges of your values never equal or exceed 2^7, 2^31, or 2^63 respectively), then you may be fine with the "straight" translation to byte, int, and long.
Yes, those are the correctly sized data types to use in Java. Make sure you take into account that Java does not have unsigned types and the trick is to use the next largest size. 64 bit unsigned arithmetic requires special consideration.
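As a rough sketch of that mapping (my own example values; the Java 8 unsigned helpers are assumed to be available):

// UInt8 -> byte, UInt32 -> int, UInt64 -> long; mask or use helpers when the value is "really" unsigned.
byte u8 = (byte) 0xF0;
System.out.println(u8 & 0xFF);                    // 240

int u32 = 0xFFFFFFF0;                             // negative as a signed int
System.out.println(u32 & 0xFFFFFFFFL);            // 4294967280 as a long

long u64 = 0xFFFFFFFFFFFFFFF0L;                   // negative as a signed long
System.out.println(Long.toUnsignedString(u64));   // 18446744073709551600
System.out.println(Long.divideUnsigned(u64, 10)); // unsigned 64-bit arithmetic helper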
So I'm learning Java, and I have a question. It seems that the types int, boolean and String will be good for just about everything I'll ever need in terms of variables, except that perhaps float could be used when a number needs decimals.
My question is, are the other types such as long, double, byte, char etc ever used in normal, everyday programming? What are some practical things these could be used for? What do they exist for?
With the possible exception of "short", which is arguably a bit of a waste of space -- sometimes literally -- they're all horses for courses:
Use an int when you don't need fractional numbers and you've no reason to use anything else; on most processors/OS configurations, this is the size of number that the machine can deal with most efficiently;
Use a double when you need fractional numbers and you've no reason to use anything else;
Use a char when you want to represent a character (or possibly rare cases where you need two-byte unsigned arithmetic);
Use a byte if either you specifically need to manipulate a signed byte (rare!), or when you need to move around a block of bytes;
Use a boolean when you need a simple "yes/no" flag;
Use a long for those occasions where you need a whole number, but where the magnitude could exceed 2 billion (file sizes, time measurements in milliseconds/nanoseconds, in advanced uses for compacting several pieces of data into a single number);
Use a float for those rare cases where you either (a) are storing a huge number of them and the memory saving is worthwhile, or (b) are performing a massive number of calculations and can afford the loss in accuracy. For most applications, "float" offers very poor precision, but operations can be twice as fast -- it's worth testing this on your processor, though, to confirm that it's actually the case! [*]
Use a short if you really need 2-byte signed arithmetic. There aren't so many cases...
[*] For example, in Hotspot on Pentium architectures, float and double operations generally take exactly the same time, except for division.
Don't get too bogged down in the memory usage of these types unless you really understand it. For example:
every object size is rounded to 16 bytes in Hotspot, so an object with a single byte field will take up precisely the same space as a single object with a long or double field;
when passing parameters to a method, every type takes up 4 or 8 bytes on the stack: you won't save anything by changing a method parameter from, say, an int to a short! (I've seen people do this...)
Obviously, there are certain API calls (e.g. various calls for non-CPU intensive tasks that for some reason take floats) where you just have to pass it the type that it asks for...!
Note that String isn't a primitive type, so it doesn't really belong in this list.
A Java int is 32 bits, while a long is 64 bits, so when you need to represent integers larger than 2^31, long is your friend. For a typical example of the use of long, see System.currentTimeMillis().
A byte is 8 bits, and the smallest addressable entity on most modern hardware, so it is needed when reading binary data from a file.
A double has twice the size of a float, so you would usually use a double rather than a float, unless you have some restrictions on size or speed and a float has sufficient capacity.
A short is two bytes, 16 bits. In my opinion, this is the least necessary datatype, and I haven't really seen it in actual code, but again, it might be useful for reading binary file formats or doing low-level network protocols. For example, IP port numbers are 16-bit.
Char represents a single character, which is 16 bits. This is the same size as a short, but a short is signed (-32768 to 32767) while a char is unsigned (0 to 65535). (This means that an IP port number is probably more correctly represented as a char than a short, but this seems to be outside the intended scope for chars...)
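For instance (an illustrative snippet of my own, not something you would normally do with char):

char port = 65535;                // fits: char is unsigned 16-bit
short asShort = (short) port;     // -1 when the same bits are read as signed
int recovered = asShort & 0xFFFF; // 65535 again after masking
System.out.println((int) port + " " + asShort + " " + recovered);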
For the truly authoritative source on these details, see the Java Language Specification.
You can have a look here about the primitive types in Java.
The main difference between these types is memory usage. For example, an int uses 32 bits while a byte uses only 8 bits.
Imagine that you work with large structures (arrays, matrices, ...); then you will want to pay attention to the type you are using in order to reduce memory usage.
I guess there are several purposes to types of that kind:
1) They enforce restrictions on the size (and sign) of variables that can be stored in them.
2) They can add a bit of clarity to code (e.g. if you use a char, then anyone reading the code knows what you plan to store in it).
3) They can save memory. If you have a large array of numbers, all of which will be non-negative and below 256, you can declare it as an array of bytes, saving some memory compared with declaring an array of ints (see the sketch after this list).
4) You need long if the numbers you need to store can exceed what an int holds (about 2^31), and a double for very large floating point numbers.
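A rough illustration of point 3 (element sizes only; real heap usage also includes array headers):

int n = 100_000;
byte[] smallValues = new byte[n]; // roughly 100 KB of element data
int[]  wideValues  = new int[n];  // roughly 400 KB of element data for the same count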
The primitive data types are required because they are the basis of every complex collection.
Types such as byte and short are used when you need only a small integer (or whatever) and do not want to waste heap space; long and double cover the cases where int and float are not big or precise enough.
I know there's plenty of RAM these days, but you should not waste it.
I need the "small ones" for database and stream operations.
Integers should be used for numbers in general.
Doubles are the basic data type used to represent decimals.
Strings can represent essentially any data as text, but it is easier to use ints for numbers, and using String for anything other than text is confusing.
Chars are used when you only wish to hold a single character, although they are essentially only for clarity.
Shorts, longs, and floats may not be necessary, but if you are, for instance, creating an array of size 100,000 that only needs to hold numbers less than 1,000, then you would want to use shorts, simply to save space.
It's relative to the data you're dealing with. There's no point using a data type which reserves a large portion of memory when you're only dealing with a small amount of data. For example, a lot of data types reserve memory before they've even been used. Take arrays, for example: they'll reserve a default amount (say, 256 bytes <-- just an example!) even if you're only using 4 bytes of that.