What is the best way to extract bit fields in Java prior to 1.7? I know I can use bitwise operations, but isn't it better methods?
I need to extract the portion of bit array with endianness control as integer values.
Related
While going through the ways of converting primitive arrays to Streams, I found that char[] are not supported while other primitive array types are supported. Any particular reason to leave them out in the stream?
Of course, the answer is "because that's what the designers decided". There is no technical reason why CharStream could not exist.
If you want justification, you usually need to turn the the OpenJDK mailing list*. The JDK's documentation is not in the habit of justifying why anything is why it is.
Someone asked
Using IntStream to represent char/byte stream is a little
inconvenient. Should we add CharStream and ByteStream as well?
The reply from Brian Goetz (Java Language Architect) says
Short answer: no.
It is not worth another 100K+ of JDK footprint each for these forms
which are used almost never. And if we added those, someone would
demand short, float, or boolean.
Put another way, if people insisted we had all the primitive
specializations, we would have no primitive specializations. Which
would be worse than the status quo.
Source
He also says the same elsewhere
If you want to deal with them as chars, you can downcast them to
chars easily enough. Doesn't seem like an important enough use case
to have a whole 'nother set of streams. (Same with Short, Byte,
Float).
Source
TL;DR: Not worth the maintenance cost.
*In case you're curious, the google query I used was
site:http://mail.openjdk.java.net/ charstream
As Eran said, it's not the only one missing.
A BooleanStream would be useless, a ByteStream (if it existed) can be handled as an InputStream or converted to IntStream (as can short), and float can be handled as a DoubleStream.
As char is not able to represent all characters anyway (see linked), it would be a bit of a legacy stream. Although most people don't have to deal with codepoints anyway, so it can seem strange. I mean you use String.charAt() without thinking "this doesn't actually work in all cases".
So some things were left out because they weren't deemed that important. As said by JB Nizet in the linked question:
The designers explicitly chose to avoid the explosion of classes and
methods by limiting the primitive streams to 3 types, since the other
types (char, short, float) can be represented by their larger
equivalent (int, double) without any significant performance penalty.
The reason BooleanStream would be useless, is because you only have 2 values and that limits the operations a lot. There's no mathematical operations to do, and how often are you working with lots of boolean values anyway?
As can be seen from the comments, a BooleanStream is not needed. If it were, there would be a lot of actual use cases instead of theoretical situations, a use case going back to Java 1.4, and a fallacious comparison to while loop.
It's not only char arrays that are not supported.
There are only 3 types of primitive streams - IntStream, LongStream and DoubleStream.
As a result, Arrays has methods that convert int[], long[] and double[] to the corresponding primitive streams.
There are no corresponding methods for boolean[], byte[], short[], char[] and float[], since these primitive types have no corresponding primitive streams.
char is a dependent part of String - storing UTF-16 values. A Unicode symbol, a code point, is sometimes a surrogate pair of chars. So any simple solution with chars only covers part of the Unicode domain.
There was a time that char had its own right to be a public type. But nowadays it is better to use code points, an IntStream. A stream of char could not straightforwardly handle surrogate pairs.
The other more prosaic reason is that the JVM "processor" model uses an int as smallest "register", keeping booleans, bytes, shorts and also chars in such an int sized storage location. To not necessarily bloat java classes, one refrained from all possible copy variants.
In the far future one might expect primitive types allowed to function as generic type parameters, providing a List<int>. Then we might see a Stream<char>.
For the moment better avoid char, and maybe use java.text.Normalizer for a unique canonical form of code points / Unicode strings.
I have often heard complaints against Java for not having unsigned data types. See for example this comment. I would like to know how is this a problem? I have been programming in Java for 10 years more or less and never had issues with it. Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Since unsigned and signed numbers are represented with the same bit values, the only places I can think of where signedness matters are:
When converting the numbers to other bit representation. Between 8, 16 and 32 bit integer types you can use bitmasks if needed.
When converting numbers to decimal format, usually to Strings.
Interoperating with non-Java systems through API's or protocols. Again the data is just bits, so I don't see the problem here.
Using the numbers as memory or other offsets. With 32 bit ints this might be problem for very huge offsets.
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those. What am I missing? What are the actual benefits of having unsigned types in a programming language and how would having those make Java better?
Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Why not? Is "applying a bitwise AND with 0xFF" actually part of what your code is trying to represent? If not, why should it have to be part of have you write it? I actually find that almost anything I want to do with bytes beyond just copying them from one place to another ends up requiring a mask. I want my code to be cruft-free; the lack of unsigned bytes hampers this :(
Additionally, consider an API which will always return a non-negative value, or only accepts non-negative values. Using an unsigned type allows you to express that clearly, without any need for validation. Personally I think it's a shame that unsigned types aren't used more in .NET, e.g. for things like String.Length, ICollection.Count etc. It's very common for a value to naturally only be non-negative.
Is the lack of unsigned types in Java a fatal flaw? Clearly not. Is it an annoyance? Absolutely.
The comment that you quote hits the nail on the head:
Java's lack of unsigned data types also stands against it. Yes, you can work around it, but it's not ideal and you'll be using code that doesn't really reflect the underlying data correctly.
Suppose you are interoperating with another system, which wants an unsigned 16 bit integer, and you want to represent the number 65535. You claim "the data is just bits, so I don't see the problem here" - but having to pass -1 to mean 65535 is a problem. Any impedance mismatch between the representation of your data and its underlying meaning introduces an extra speedbump when writing, reading and testing the code.
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those.
The only times you would need to consider those operations is when you were naturally working with values of two different types - one signed and one unsigned. At that point, you absolutely want to have that difference pointed out. With signed types being used to represent naturally unsigned values, you should still be considering the differences, but the fact that you should is hidden from you. Consider:
// This should be considered unsigned - so a value of -1 is "really" 65535
short length = /* some value */;
// This is really signed
short foo = /* some value */;
boolean result = foo < length;
Suppose foo is 100 and length is -1. What's the logical result? The value of length represents 65535, so logically foo is smaller than it. But you'd probably go along with the code above and get the wrong result.
Of course they don't even need to represent different types here. They could both be naturally unsigned values, represented as signed values with negative numbers being logically greater than positive ones. The same error applies, and wouldn't be a problem if you had unsigned types in the language.
You might also want to read this interview with Joshua Bloch (Google cache, as I believe it's gone from java.sun.com now), including:
Ooh, good question... I'm going to say that the strangest thing about the Java platform is that the byte type is signed. I've never heard an explanation for this. It's quite counterintuitive and causes all sorts of errors.
If you like, yes, everything is ones and zeroes. However, your hardware arithmetic and logic unit doesn't work that way. If you want to store your bits in a signed integer value but perform operations that are not natural to signed integers, you will usually waste both storage space and processing time.
An unsigned integer type stores twice as many non-negative values in the same space as the corresponding signed integer type. So if you want to take into Java any data commonly used in a language with unsigned values, such as a POSIX date value (unsigned number of seconds) that is normally used with C, then in general you will need to use a wider integer type than C would use. If you are processing many such values, again you will waste both storage space and fetch-execute time.
The times I have used unsigned data types have been when I read in large blocks of data that correspond to images, or worked with openGL. I personally prefer unsigned if I know something will never be negative, as a "safety feature" of sorts.
Unsigned types are useful for bit-by-bit comparisons, and I'm pretty sure they are used extensively in graphics.
I need direction to get started with a project. I have a binary file. Out of this file, I need to parse data. Now, I know length of each data; however, there are a few fields that are tricky. For example, there is an 8-bit field, in which the 4 least significant bits and 4 most significant bits represent different data. Are there any built libs available? Or do I have to use java bytebuffer and use bitwise operations?
Bitwise Operators: http://leepoint.net/notes-java/data/expressions/bitops.html
I need to convert a piece of code from Objective-C to Java, but I have a problem understanding the Primitive Types in Objective-C. So I had their data types in my Objective-C code :
UInt64, Uint32, UInt8 ,
which are unsigned integers (as I understand from internet). So my question is, can I use Java primitive types like byte (8bit) - instead of UInt8, int (32bit) - instead of UInt32, and long (64bit) - instead of UInt64.
Unfortunately, it isn't a straight translation and without knowing more about your program, its hard to suggest what the "right" approach is.
If your UInt8 values really range from 0-255, you may have to use Java signed int to be able to hold the entire range.
If you are dealing with byte streams or memory layouts and really need to use just a single byte of memory, than you could try byte, but you may have to test and handle cases to handle when the high-bit is set (value > 127). Ditto with the other unsigned types.
Ideally, if your code just kind of "defaulted" to the unsigned types, but really the signed versions would have worked fine too (i.e. the ranges of your values never equal or exceed 2^7, 2^15, or 2^31 respectively), then you may be fine with the "straight" translation to byte, int, and long.
Yes, those are the correctly sized data types to use in Java. Make sure you take into account that Java does not have unsigned types and the trick is to use the next largest size. 64 bit unsigned arithmetic requires special consideration.
My bit masks are bytes, and I'd like to keep them exactly as they are, but I think they're sign extended. I don't care if the byte is considered positive or negative, as long as it has the same bits set. I just spent a few hours debugging my code, and then I found I'm only having a problem with my byte bit masks when they happen to be negative, it took a while to find out. I can't be the only one who's had a problem with this. Is there a way to make a byte behave as if it was unsigned?
If you don't want a byte to sign extend when you use it in arithmetic (or bitwise) operators, you need to explicitly bitwise-and it with 0xFF. It looks slightly ugly but is unavoidable if what you have is a byte (and hopefully a decent JIT will be able to recognize the idiom and make efficient code out of it anyway).
Do you have right-shift in your code? do you use '>>' instead of '>>>'? There is your problem.