Does Python have an equivalent to Java's Byte.MAX_VALUE representing the maximum byte value? I had a look at Python's sys module, but I only managed to find sys.maxint. Does it have anything like sys.maxbyte?
UPDATE:
In my case, I am doing an HBase row key scan. My row key looks like rk1_rk2. In order to scan all results for rk1 without knowing the exact rk2, my Java code looks like:
byte[] startRowBytes = "rk1".getBytes();
byte[] endRowBytes = ("rk1" + (char) Byte.MAX_VALUE).getBytes();
HbaseScanQuery query = new HbaseScanQuery(tableName, colFamily);
query.setStartRow(startRowBytes).setStopRow(endRowBytes);
I am just trying to work out the Python equivalent of the Byte.MAX_VALUE part.
I think you will have to define the value yourself. A byte has 2^8 = 256 unique states, so the largest integer it can represent is 255. Java's byte type, however, is signed, so half the states are reserved for positives (and 0) and the other half is used for negatives. Therefore the equivalent of Java's Byte.MAX_VALUE is 127, and the equivalent of Java's Byte.MIN_VALUE is -128.
Since Python bytes are unsigned, the equivalent of Java's Byte.MIN_VALUE would be 128, which is the representation of -128 in two's complement notation (the de facto standard for representing signed integers). Thanks to Ignacio Vazquez-Abrams for pointing that out.
I haven't dealt with Python in a while, but I believe what you want is ("rk1" + chr(127)).
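For what it's worth, a quick Java-side check (just a sketch to confirm what the Python string has to match) shows that appending (char) Byte.MAX_VALUE adds a single 0x7F byte, which is exactly what chr(127) produces:
byte[] endRowBytes = ("rk1" + (char) Byte.MAX_VALUE).getBytes();
System.out.println(Byte.MAX_VALUE);                       // 127
System.out.println(endRowBytes[endRowBytes.length - 1]);  // 127, i.e. 0x7F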
Given your update, there is an even better answer: Don't worry about what the max byte value is. According to the HBase documentation, the setStartRow and setStopRow methods work just like Python's slicing; namely, the start is inclusive, but the stop is exclusive, meaning your endRowBytes should simply be 'rk2'.
Also, the documentation mentions that you can make the stop row inclusive by adding a zero byte, so another alternative is 'rk1' + chr(0) (or 'rk1\0' or 'rk1\x00', whichever is clearest to you). In fact, the example used to explain HBase scans in the linked documentation illustrates exactly your use case.
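Applied to the wrapper shown in the question (HbaseScanQuery is the asker's own class; only the calls already shown above are assumed), the exclusive-stop version would be:
byte[] startRowBytes = "rk1".getBytes();
byte[] endRowBytes = "rk2".getBytes();   // stop row is exclusive, so this covers every key starting with "rk1"
HbaseScanQuery query = new HbaseScanQuery(tableName, colFamily);
query.setStartRow(startRowBytes).setStopRow(endRowBytes);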
For example, I apply the hexadecimal string "DE82C38142C69491" to Oracle's TO_NUMBER. The result is as follows:
select TO_NUMBER('DE82C38142C69491', 'XXXXXXXXXXXXXXXX') from dual
/*
result :
16033592583330894993
*/
I tried this in Java and the code is as below.
Long.parseUnsignedLong("DE82C38142C69491", 16);
/*
result :
-2413151490378656623
*/
I must be misunderstanding something.
Is there a way to use Oracle's TO_NUMBER in Java?
Interpreted as a signed 64-bit long, the hexadecimal number DE82C38142C69491 is -2413151490378656623.
Java longs are signed, and D has the top bit set, so DE82C38142C69491 represents a negative number.
Run System.out.println(Long.toHexString(-2413151490378656623L)); and you'll see you get de82c38142c69491 back (the same hex digits, just in lowercase).
So you have all the bits of your original hex number.
You can use new BigInteger("DE82C38142C69491", 16); which will get you a BigInteger containing 16033592583330894993, but -2413151490378656623 already contains all your bits.
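A minimal check of both points (nothing beyond java.lang and java.math is needed):
import java.math.BigInteger;

public class HexCheck {
    public static void main(String[] args) {
        long signed = Long.parseUnsignedLong("DE82C38142C69491", 16);
        System.out.println(signed);                    // -2413151490378656623
        System.out.println(Long.toHexString(signed));  // de82c38142c69491 -- same bits

        // BigInteger gives the Oracle TO_NUMBER reading of those bits.
        System.out.println(new BigInteger("DE82C38142C69491", 16));  // 16033592583330894993
    }
}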
Java's long is a signed 64-bit integer type supporting values in the range of -2⁶³ to 2⁶³-1.
In Java 8 however, some methods were added to the type long for unsigned handling, for example Long.compareUnsigned(long x, long y). While the older methods will continue to consider long values as signed, these new ones work with a range of 0 to 2⁶⁴-1. The documentation states:
In Java SE 8 and later, you can use the long data type to represent an
unsigned 64-bit long, which has a minimum value of 0 and a maximum
value of 2⁶⁴-1.
It's just a different interpretation. For that reason...
System.out.println(Long.parseUnsignedLong("DE82C38142C69491", 16));
...which is effectively calling Long.toString(), will output:
-2413151490378656623
But if you do...
System.out.println(Long.toUnsignedString(Long.parseUnsignedLong("DE82C38142C69491", 16)));
...you will get:
16033592583330894993
That's because 16033592583330894993 is bigger than 2⁶³-1 but also smaller than 2⁶⁴-1.
Still, if you're dealing with numbers that big, you're probably better off using BigInteger.
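If you do stay with long, here is a short sketch of the unsigned helpers mentioned above (all standard since Java 8), using the value from this question:
long v = Long.parseUnsignedLong("DE82C38142C69491", 16);

System.out.println(Long.toUnsignedString(v));                // 16033592583330894993
System.out.println(Long.compareUnsigned(v, Long.MAX_VALUE)); // > 0: as unsigned, v exceeds 2^63-1
System.out.println(Long.toUnsignedString(Long.divideUnsigned(v, 10)));  // 1603359258333089499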
In the Random class, define a nextByte method that returns a value of the primitive type
byte. The values returned in a sequence of calls should be uniformly distributed over all the
possible values in the type.
In the Random class, define a nextInt method that returns a value of the primitive type
int. The values returned in a sequence of calls should be uniformly distributed over all the possible
values in the type.
(Hint: Java requires implementations to use the twos-complement representation for integers.
Figure out how to calculate a random twos-complement representation from four random byte
values using Java’s shift operators.)
Hi, I was able to do part 3 and now I need to use it to solve part 4, but I do not know what to do. I was thinking of using nextByte to make an array of 4 bytes, then taking the two's complement of each so I wouldn't have negative numbers, and then putting them together into one int.
Suppose nextByte returns these bytes: byte[] bytes = {42, -15, -7, 8};
Then I would take the two's complement of each, which I think would give {42, 241, 249, 8}. Is this what it would look like, and why doesn't this code work:
public static int twosComplement(int input_value, int num_bits) {
    int mask = (int) Math.pow(2, (num_bits - 1));
    return -(input_value & mask) + (input_value & ~mask);
}
Then I would use the following to put all four bytes into an int; would this work?
int i= (bytes[0]<<24)&0xff000000|
(bytes[1]<<16)&0x00ff0000|
(bytes[2]<< 8)&0x0000ff00|
(bytes[3]<< 0)&0x000000ff;
Please be as specific as possible.
The assignment says that Java already uses two's complement integers. This is a useful property that simplifies the rest of the code: it guarantees that if you group together 32 random bits (or in general however many bits your desired output type has), then this covers all possible values exactly once and there are no invalid patterns.
That might not be true of some other integer representations, which might only have 2³²-1 different values (leaving an invalid pattern that you would have to avoid) or have 2³² valid patterns but both a "positive" and a "negative" zero, which would cause a random bit pattern to have a biased "interpreted value" (with zero occurring twice as often as it should).
So that is not something for you to do; it is a convenient property for you to use to keep the code simple. Actually, you already used it. This code:
int i= (bytes[0]<<24)&0xff000000|
(bytes[1]<<16)&0x00ff0000|
(bytes[2]<< 8)&0x0000ff00|
(bytes[3]<< 0)&0x000000ff;
It works properly thanks to those properties. By the way, it can be simplified a bit: after shifting left by 24, there is no longer an issue with sign extension, because all the sign-extended bits have been shifted out. And shifting left by 0 is obviously a no-op. So (bytes[0]<<24)&0xff000000 can be written as (bytes[0]<<24), and (bytes[3]<< 0)&0x000000ff as bytes[3]&0xff. But you can keep it as it was, with the nice regular structure.
The twosComplement function is not necessary.
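Concretely, part 4 then comes down to calling nextByte four times and combining the results, along these lines (a sketch that assumes your class already has the nextByte() method from part 3):
public int nextInt() {
    // Four random bytes give 32 random bits. Because Java ints are two's
    // complement, every 32-bit pattern is a valid, equally likely int value.
    return (nextByte() << 24)
         | (nextByte() << 16) & 0x00ff0000
         | (nextByte() <<  8) & 0x0000ff00
         |  nextByte()        & 0x000000ff;
}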
I'm reading a file format that specifies some types are unsigned integers and shorts. When I read the values, I get them as a byte array. The best route to turning them into shorts/ints/longs I've seen is something like this:
ByteBuffer wrapped = ByteBuffer.wrap(byteArray);
int x = wrapped.getInt();
That looks like it could easily overflow for unsigned ints. Is there a better way to handle this scenario?
Update: I should mention that I'm using Groovy, so I absolutely don't care if I have to use a BigInteger or something like that. I just want the maximum safety on keeping the value intact.
A 32-bit value, signed or unsigned, can always be stored losslessly in an int*. This means that you never have to worry about putting unsigned values in signed types from a data-safety point of view.
The same is true for 8-bit values in bytes, 16-bit values in shorts and 64-bit values in longs.
Once you've read an unsigned value into the corresponding signed type, you can promote it to a signed value of a larger type to work more easily with the intended value:
Integer.toUnsignedLong(int)
Short.toUnsignedInt(short)
Byte.toUnsignedInt(byte)
Since there's no primitive type larger than long, you can either go via BigInteger, or use the convenience methods on Long to do unsigned operations:
new BigInteger(Long.toUnsignedString(long))
Long.divideUnsigned(long,long) and friends
* This is thanks to the JVM requiring integer types to be two's complement.
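For example, reading unsigned 8-, 16-, 32- and 64-bit fields from the byteArray in the question could look like this (the field layout here is invented purely for illustration):
ByteBuffer buf = ByteBuffer.wrap(byteArray);

int  u8  = Byte.toUnsignedInt(buf.get());         // 0 .. 255
int  u16 = Short.toUnsignedInt(buf.getShort());   // 0 .. 65,535
long u32 = Integer.toUnsignedLong(buf.getInt());  // 0 .. 4,294,967,295

// Unsigned 64-bit: keep the long and use the unsigned helpers, or widen to BigInteger.
long raw64 = buf.getLong();
BigInteger u64 = new BigInteger(Long.toUnsignedString(raw64));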
To hold an unsigned int/short/byte, you need to use the next "bigger" type, i.e. long/int/short. If you already hold the value in the signed type that can overflow, the conversion can be done as follows:
int unsignedVal = byteVal & 0xff;
If you just cast it, the sign bit will be preserved (sign-extended) and you will still end up with a negative value.
If you have to handle unsigned longs you need to "switch" to java.math.BigInteger.
Unsigned primitives are a pain in Java.
There's no clean way of handling them, except using larger types with more bits, and taking care to avoid automatic sign extension when casting.
In your case, you can do something like this:
ByteBuffer wrapped = ByteBuffer.wrap(byteArray);
int signedInt = wrapped.getInt();
long unsigned = signedInt & 0xffffffffL;
I usually write the required conversion(s) in a utility class someplace, since they're easy to get wrong. If you copy and paste that one-liner conversion everywhere, eventually one will be wrong.
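Something along these lines, for instance (the class and method names are just placeholders, not an existing library):
public final class Unsigned {
    private Unsigned() {}

    // Widen to the next larger type so the full unsigned range fits.
    public static int  fromByte(byte b)   { return b & 0xff; }
    public static int  fromShort(short s) { return s & 0xffff; }
    public static long fromInt(int i)     { return i & 0xffffffffL; }

    // Usage: long unsigned = Unsigned.fromInt(wrapped.getInt());
}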
Note that if you need unsigned longs, the only larger type is BigInteger.
If you need anything more than simple conversions, I suggest using Guava since it has some nice classes for dealing with unsigned types. See documentation here.
I'm trying to create a Java program that writes files for my Arduino to read. The Arduino is a simple 8-bit microcontroller board, and with some extra hardware, it can read text files from SD cards, byte by byte.
Turns out this was a whole lot harder than I thought. Firstly, there are no unsigned values in Java. Not even bytes, for some reason! Even trying to set a byte to 0xFF gives a possible-loss-of-precision error! This isn't very useful for low-level code like this.
I would use ints and only use the positive values, but I like using byte overflow to my advantage in a lot of my code (though I could probably do this with a modulus right after the math operation or something), and the biggest problem of all is that I have no idea how to add an int as an 8-bit character to a String that gets written to a file later. Output is currently my biggest problem.
So, what would be the best way to do unsigned bit math based on some user input and then write those bits to a file as if each one was an ASCII character?
So, here's how it works.
You can treat Java bytes as unsigned. The only places where signs make a difference are
constants: just cast them to bytes
toString and parseInt
division
<, >, >=, <=
Operations where signedness does not matter:
addition
subtraction
multiplication
bit arithmetic (except for >>, just use >>> instead)
To convert bytes to their unsigned values as ints, just use & 0xFF, and to convert those to bytes use (byte).
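Put together, the kind of code the question describes might look like this (a sketch; data.bin is just an example file name): do the arithmetic on the raw bytes, mask with & 0xFF wherever the unsigned interpretation matters, and write individual bytes instead of going through a String.
import java.io.FileOutputStream;
import java.io.IOException;

public class WriteBytes {
    public static void main(String[] args) throws IOException {
        byte a = (byte) 200;   // stored as -56, but "means" 200
        byte b = 100;

        // Addition wraps identically for signed and unsigned bytes.
        byte sum = (byte) (a + b);                    // 200 + 100 = 300, wraps to 44

        // For comparisons and printing, mask to recover the unsigned value.
        System.out.println(sum & 0xFF);               // 44
        System.out.println((a & 0xFF) > (b & 0xFF));  // true: 200 > 100

        try (FileOutputStream out = new FileOutputStream("data.bin")) {
            out.write(a);            // write(int) keeps only the low 8 bits: 0xC8
            out.write(sum & 0xFF);   // 0x2C
        }
    }
}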
Alternatively, if third-party libraries are acceptable, you might be interested in Guava's UnsignedBytes utility class. (Disclosure: I contribute to Guava.)
I have often heard complaints against Java for not having unsigned data types. See for example this comment. I would like to know how is this a problem? I have been programming in Java for 10 years more or less and never had issues with it. Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Since unsigned and signed numbers are represented with the same bit values, the only places I can think of where signedness matters are:
When converting the numbers to another bit representation. Between 8-, 16- and 32-bit integer types you can use bitmasks if needed.
When converting numbers to decimal format, usually to Strings.
Interoperating with non-Java systems through API's or protocols. Again the data is just bits, so I don't see the problem here.
Using the numbers as memory or other offsets. With 32-bit ints this might be a problem for very large offsets.
Instead, I find it easier that I don't need to consider operations between unsigned and signed numbers, or the conversions between them. What am I missing? What are the actual benefits of having unsigned types in a programming language, and how would having them make Java better?
Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Why not? Is "applying a bitwise AND with 0xFF" actually part of what your code is trying to represent? If not, why should it have to be part of how you write it? I actually find that almost anything I want to do with bytes beyond just copying them from one place to another ends up requiring a mask. I want my code to be cruft-free; the lack of unsigned bytes hampers this :(
Additionally, consider an API which will always return a non-negative value, or only accepts non-negative values. Using an unsigned type allows you to express that clearly, without any need for validation. Personally I think it's a shame that unsigned types aren't used more in .NET, e.g. for things like String.Length, ICollection.Count etc. It's very common for a value to naturally only be non-negative.
Is the lack of unsigned types in Java a fatal flaw? Clearly not. Is it an annoyance? Absolutely.
The comment that you quote hits the nail on the head:
Java's lack of unsigned data types also stands against it. Yes, you can work around it, but it's not ideal and you'll be using code that doesn't really reflect the underlying data correctly.
Suppose you are interoperating with another system, which wants an unsigned 16 bit integer, and you want to represent the number 65535. You claim "the data is just bits, so I don't see the problem here" - but having to pass -1 to mean 65535 is a problem. Any impedance mismatch between the representation of your data and its underlying meaning introduces an extra speedbump when writing, reading and testing the code.
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those.
The only times you would need to consider those operations is when you were naturally working with values of two different types - one signed and one unsigned. At that point, you absolutely want to have that difference pointed out. With signed types being used to represent naturally unsigned values, you should still be considering the differences, but the fact that you should is hidden from you. Consider:
// This should be considered unsigned - so a value of -1 is "really" 65535
short length = /* some value */;
// This is really signed
short foo = /* some value */;
boolean result = foo < length;
Suppose foo is 100 and length is -1. What's the logical result? The value of length represents 65535, so logically foo is smaller than it. But you'd probably go along with the code above and get the wrong result.
Of course they don't even need to represent different types here. They could both be naturally unsigned values, represented as signed values with negative numbers being logically greater than positive ones. The same error applies, and wouldn't be a problem if you had unsigned types in the language.
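To make that concrete, here is the workaround as it has to be written in Java today; the masking (or, since Java 8, Short.toUnsignedInt) must be remembered at every comparison site:
short length = (short) 65535;   // stored as -1, "really" 65535
short foo = 100;

System.out.println(foo < length);                        // false: compares 100 < -1
System.out.println((foo & 0xFFFF) < (length & 0xFFFF));  // true:  compares 100 < 65535
System.out.println(Short.toUnsignedInt(foo) < Short.toUnsignedInt(length));  // true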
You might also want to read this interview with Joshua Bloch (Google cache, as I believe it's gone from java.sun.com now), including:
Ooh, good question... I'm going to say that the strangest thing about the Java platform is that the byte type is signed. I've never heard an explanation for this. It's quite counterintuitive and causes all sorts of errors.
If you like, yes, everything is ones and zeroes. However, your hardware arithmetic and logic unit doesn't work that way. If you want to store your bits in a signed integer value but perform operations that are not natural to signed integers, you will usually waste both storage space and processing time.
An unsigned integer type stores twice as many non-negative values in the same space as the corresponding signed integer type. So if you want to take into Java any data commonly used in a language with unsigned values, such as a POSIX date value (unsigned number of seconds) that is normally used with C, then in general you will need to use a wider integer type than C would use. If you are processing many such values, again you will waste both storage space and fetch-execute time.
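For example, a 32-bit unsigned POSIX timestamp that C stores in 4 bytes has to be widened to a 64-bit long as soon as it enters Java (readRawInt here is a hypothetical stand-in for however those 4 bytes are actually read):
int raw = readRawInt();                      // the same 32 bits C wrote
long seconds = Integer.toUnsignedLong(raw);  // 0 .. 4,294,967,295 -- needs 8 bytes in Java
java.time.Instant when = java.time.Instant.ofEpochSecond(seconds);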
The times I have used unsigned data types have been when reading in large blocks of data that correspond to images, or working with OpenGL. I personally prefer unsigned if I know something will never be negative, as a "safety feature" of sorts.
Unsigned types are useful for bit-by-bit comparisons, and I'm pretty sure they are used extensively in graphics.