What is the difference between the Unsafe.putAddress(long address, long x) method and the Unsafe.putLong(long address, long x) method?
The javadoc is pretty clear on that. For putAddress() it says:
The number of bytes actually written at the target address may be determined by consulting #addressSize.
Whereas putLong() puts all bits of the long value.
In other words: an address might consume all 64 bits of a long value, but it doesn't have to! And when it doesn't, writing all 64 bits to that spot in memory is most likely not a good idea!
Therefore you need to be able to distinguish these two use cases (writing n out of 64 bits versus writing exactly 64 bits).
But to be specific, the javadoc for addressSize() says:
Report the size in bytes of a native pointer ... This value will be either 4 or 8.
So I guess that, for all practical purposes, the two methods do the same thing, because these days (almost?!) all existing JVMs A) implement these methods and B) are 64-bit JVMs. (So I assume that a 32-bit JVM would return 4 instead of 8.)
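To make the difference concrete, here is a minimal sketch (assuming a JVM where sun.misc.Unsafe is still obtainable via reflection; the class and these methods are not supported API):

import java.lang.reflect.Field;
import sun.misc.Unsafe;

public class PutAddressDemo {
    public static void main(String[] args) throws Exception {
        // Unsafe is not meant for application code; grab the singleton via reflection.
        Field f = Unsafe.class.getDeclaredField("theUnsafe");
        f.setAccessible(true);
        Unsafe unsafe = (Unsafe) f.get(null);

        System.out.println("addressSize = " + unsafe.addressSize()); // 4 or 8

        long block = unsafe.allocateMemory(16);
        try {
            unsafe.putLong(block, 0x1122334455667788L);        // always writes all 8 bytes
            unsafe.putAddress(block + 8, 0x1122334455667788L); // writes addressSize() bytes
            System.out.println(Long.toHexString(unsafe.getLong(block)));
            System.out.println(Long.toHexString(unsafe.getAddress(block + 8)));
        } finally {
            unsafe.freeMemory(block);
        }
    }
}

On a 64-bit JVM both printed values should match; on a 32-bit JVM, putAddress() would only write the low 4 bytes.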
I have often heard complaints against Java for not having unsigned data types. See for example this comment. I would like to know how this is a problem. I have been programming in Java for 10 years, more or less, and have never had issues with it. Occasionally, when converting bytes to ints, an & 0xFF is needed, but I don't consider that a problem.
Since unsigned and signed numbers are represented with the same bit values, the only places I can think of where signedness matters are:
When converting the numbers to another bit representation. Between 8-, 16- and 32-bit integer types you can use bitmasks if needed.
When converting numbers to decimal format, usually to Strings.
Interoperating with non-Java systems through APIs or protocols. Again, the data is just bits, so I don't see the problem here.
Using the numbers as memory or other offsets. With 32-bit ints this might be a problem for very large offsets.
Instead, I find it easier not to have to consider operations between unsigned and signed numbers, or the conversions between them. What am I missing? What are the actual benefits of having unsigned types in a programming language, and how would having them make Java better?
Occasionally, when converting bytes to ints, an & 0xFF is needed, but I don't consider that a problem.
Why not? Is "applying a bitwise AND with 0xFF" actually part of what your code is trying to represent? If not, why should it have to be part of how you write it? I actually find that almost anything I want to do with bytes beyond just copying them from one place to another ends up requiring a mask. I want my code to be cruft-free; the lack of unsigned bytes hampers this :(
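For example, a trivial sketch of the cruft in question:

byte b = (byte) 0xFF;      // bit pattern 11111111
int asSigned = b;          // sign-extension gives -1
int asUnsigned = b & 0xFF; // the mask is needed to get 255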
Additionally, consider an API which will always return a non-negative value, or only accepts non-negative values. Using an unsigned type allows you to express that clearly, without any need for validation. Personally I think it's a shame that unsigned types aren't used more in .NET, e.g. for things like String.Length, ICollection.Count etc. It's very common for a value to naturally only be non-negative.
Is the lack of unsigned types in Java a fatal flaw? Clearly not. Is it an annoyance? Absolutely.
The comment that you quote hits the nail on the head:
Java's lack of unsigned data types also stands against it. Yes, you can work around it, but it's not ideal and you'll be using code that doesn't really reflect the underlying data correctly.
Suppose you are interoperating with another system, which wants an unsigned 16 bit integer, and you want to represent the number 65535. You claim "the data is just bits, so I don't see the problem here" - but having to pass -1 to mean 65535 is a problem. Any impedance mismatch between the representation of your data and its underlying meaning introduces an extra speedbump when writing, reading and testing the code.
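A quick sketch of that mismatch:

short wire = (short) 65535;        // the bit pattern 0xFFFF fits...
System.out.println(wire);          // ...but prints -1
System.out.println(wire & 0xFFFF); // the mask recovers 65535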
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those.
The only times you would need to consider those operations is when you were naturally working with values of two different types - one signed and one unsigned. At that point, you absolutely want to have that difference pointed out. With signed types being used to represent naturally unsigned values, you should still be considering the differences, but the fact that you should is hidden from you. Consider:
// This should be considered unsigned - so a value of -1 is "really" 65535
short length = /* some value */;
// This is really signed
short foo = /* some value */;
boolean result = foo < length;
Suppose foo is 100 and length is -1. What's the logical result? The value of length represents 65535, so logically foo is smaller than it. But you'd probably go along with the code above and get the wrong result.
Of course, they don't even need to represent different types here. They could both be naturally unsigned values, represented as signed values with negative numbers being logically greater than positive ones. The same error applies, and it wouldn't be a problem if you had unsigned types in the language.
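(As an aside: Java 8 later added helper methods that make the unsigned intent explicit, though the types themselves remain signed. A sketch of the corrected comparison:

short length = (short) 65535; // "really" 65535
short foo = 100;
boolean wrong = foo < length;                                           // false: the signed view says 100 < -1
boolean right = Short.toUnsignedInt(foo) < Short.toUnsignedInt(length); // true: compares 100 < 65535

The bug is still easy to write, though; nothing forces you to go through the unsigned view.)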
You might also want to read this interview with Joshua Bloch (Google cache, as I believe it's gone from java.sun.com now), including:
Ooh, good question... I'm going to say that the strangest thing about the Java platform is that the byte type is signed. I've never heard an explanation for this. It's quite counterintuitive and causes all sorts of errors.
If you like, yes, everything is ones and zeroes. However, your hardware arithmetic and logic unit doesn't work that way. If you want to store your bits in a signed integer value but perform operations that are not natural to signed integers, you will usually waste both storage space and processing time.
An unsigned integer type stores twice as many non-negative values in the same space as the corresponding signed integer type. So if you want to take into Java any data commonly used in a language with unsigned values, such as a POSIX date value (an unsigned number of seconds) as normally used with C, then in general you will need to use a wider integer type than C would use. If you are processing many such values, again you will waste both storage space and fetch-execute time.
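For example, to carry a C uint32_t-style value into Java correctly, you have to widen it into the next larger signed type (a small sketch):

int raw = 0xF0000000;             // as read from a C struct: a large unsigned 32-bit value
long seconds = raw & 0xFFFFFFFFL; // 4026531840; a 64-bit long spent on 32 bits of data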
The times I have used unsigned data types have been when I read in large blocks of data that correspond to images, or worked with openGL. I personally prefer unsigned if I know something will never be negative, as a "safety feature" of sorts.
Unsigned types are useful for bit-by-bit comparisons, and I'm pretty sure they are used extensively in graphics.
I was wondering if there is a difference in the memory occupied by Integer n and int n.
I know int n normally occupies 4 bytes; how about Integer n?
In general, the heap memory used by a Java object in Hotspot consists of:
an object header, consisting of a few bytes of "housekeeping" information;
memory for primitive fields, according to their size (an int n field -> 32 bits);
memory for reference fields, 4 bytes each (an Integer n field is a reference -> 32 bits, plus the separately allocated Integer object it points to);
padding: potentially a few "wasted" unused bytes after the object data, to make every object start at an address that is a convenient multiple of bytes and reduce the number of bits required to represent a pointer to an object.
As per the suggestion of Mark Peters, I would like to add the link below:
http://www.javamex.com/tutorials/memory/object_memory_usage.shtml
An Integer object in Java occupies 16 bytes.
I don't know whether running a 64-bit vs. a 32-bit JVM makes a difference. For primitive types, it does not matter. But I cannot say for certain how the memory footprint of an object changes (if at all) under a 64-bit system.
You can test this for yourself here:
Java Tip 130: Do you know your data size?
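For reference, the measurement idea from that article is roughly the following sketch: allocate many instances and compare used heap before and after. The numbers are approximate and depend on the JVM:

public class SizeOfIntegerSketch {
    public static void main(String[] args) throws Exception {
        final int COUNT = 1_000_000;
        Integer[] holder = new Integer[COUNT]; // allocated up front so only the Integers are measured
        long before = usedMemory();
        for (int i = 0; i < COUNT; i++) {
            holder[i] = new Integer(i); // 'new' deliberately bypasses the Integer cache
        }
        long after = usedMemory();
        System.out.println("approx bytes per Integer: " + (after - before) / COUNT);
    }

    static long usedMemory() throws InterruptedException {
        // Coax the JVM into collecting garbage so the reading is halfway stable.
        for (int i = 0; i < 4; i++) { System.gc(); Thread.sleep(100); }
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }
}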
int is a primitive data type which takes 32 bits (4 bytes) to store.
When your Java code uses the new operator to create an instance of a Java object, much more data is allocated than you might expect. For example, it might surprise you to know that the size ratio of an int value to an Integer object (the smallest object that can hold an int value) is typically 1:4.
Integer is an object, and it takes 128 bits (16 bytes) to store an int value.
When we create a new Integer using the new operator, memory is allocated as follows:
Class pointer (32 bits): a pointer to the class information that describes the object; in our case it points to the java.lang.Integer class.
Flags (32 bits): a collection of flags that describe the state of the object, such as whether it has a hash code and whether it is an array (i.e. its shape).
Lock (32 bits): the synchronization information of the object, i.e. whether the object is currently locked.
The above three fields are known as the metadata of the object.
Finally, the metadata is followed by the object data (32 bits) itself; in the case of Integer, that is the single int value.
All of the above applies to a 32-bit processor architecture; it can differ by JVM version and vendor.
For int: 4 bytes used per element without wrappers, and 16 per element with a wrapper.
A wrapped double reports as 24 bytes per element, with the actual double value as 64 bits (8 bytes).
For more details, see here.
In the guidelines for writing a good hashCode() given in Effective Java, the author mentions the following step if the field is a long:
If the field is a long, compute (int) (f ^ (f >>> 32)).
I am not able to see why this is done. Why are we doing this?
In Java, a long is 64-bit, and an int is 32-bit.
So this is simply taking the upper 32 bits, and bitwise-XORing them with the lower 32 bits.
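For example, with the arithmetic spelled out:

long f = 0x123456789ABCDEF0L;
int high = (int) (f >>> 32);       // 0x12345678 (upper half)
int low  = (int) f;                // 0x9ABCDEF0 (lower half)
int hash = (int) (f ^ (f >>> 32)); // equals high ^ low == 0x88888888

Incidentally, java.lang.Long.hashCode() is specified to compute exactly this expression.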
Because hashCode is a 32-bit integer value while long is 64-bit. You need the hashCode to differ for long values that share the same lower 32 bits, and this function ensures that.
Just to be clear, you're hashing a 64-bit value into a 32-bit one. Also, a good hash function will produce an even distribution of values (for hopefully obvious reasons!).
You could ignore half the bits, but then all the long values that differ only in the ignored bits would produce one single hash code. So you want to take all the bits into account somehow when producing the hashcode.
Options for mashing the bits together are: AND, OR, XOR. If you think about it, AND and OR aren't going to produce an even distribution of values at all. XOR does, so it's the only good choice.
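You can see the bias in the truth tables: AND outputs a 1 bit only when both inputs are 1, OR outputs a 1 unless both inputs are 0, while XOR outputs each value half the time. A tiny sketch:

System.out.println(Integer.toBinaryString(0b1100 & 0b1010)); // 1000 - biased towards 0 bits
System.out.println(Integer.toBinaryString(0b1100 | 0b1010)); // 1110 - biased towards 1 bits
System.out.println(Integer.toBinaryString(0b1100 ^ 0b1010)); // 110  - balanced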
hashCode returns an int, not a long. A good hashCode algorithm tries to produce different values for different inputs.
Given Java's "write once, run anywhere" paradigm and the fact that the Java tutorials give explicit bit sizes for all the primitive data types without the slightest hint that this is dependent on anything, I would say that, yes, an int is always 32 bit.
But are there any caveats? The language spec defines the value range, but says nothing about the internal representation, and I guess that it probably shouldn't. However, I have some code which does bitwise operations on int variables that assume 32 bit width, and I was wondering whether that code is safe on all architectures.
Are there good in-depth resources for this type of question?
Java code always works as though ints are 32-bit, regardless of the native architecture.
In the specification, there's also a part that is definitive about representation:
The integral types are byte, short, int, and long, whose values are 8-bit, 16-bit, 32-bit and 64-bit signed two's-complement integers, respectively, and char, whose values are 16-bit unsigned integers representing UTF-16 code units
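A small illustration of what this guarantee buys you: since an int is exactly 32 bits in two's complement everywhere, both overflow and bitwise behaviour are fully defined, so the following prints the same results on any JVM and any architecture:

int x = Integer.MAX_VALUE;
x++;                           // wraps around: defined behaviour, not undefined as in C
System.out.println(x);         // -2147483648 (Integer.MIN_VALUE)
System.out.println(-1 >>> 28); // 15: the unsigned shift sees exactly 32 bits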
While the behaviour of Java's primitives is specified completely and exactly in the language spec, there is one caveat: on a 64-bit architecture, it's possible that ints will be word-aligned, which means that an array of ints (or of any non-64-bit primitive type) could take twice as much memory as on a 32-bit architecture.
You may also want to check the JVM spec: each bitwise operation gets its own opcode (ISHL, IOR, IAND, etc.).
Yes, an int is always 32 bits, and there is no sizeof operator in Java.
According to Bruce Eckel's Thinking in Java:
short: 16 bits
int: 32 bits
long: 64 bits
These values don't vary between architectures.
This isn't an answer (there is already a good one), but I thought I'd point out that the reason this is so for Java, when it wasn't for C or C++, is that Java compiles to a virtual machine (the Java VM, or JVM). Because the JVM runs the same bytecode and has the same internal structure no matter which machine it is on, primitive types have the same size on every machine. C and C++ did not try to emulate any particular behaviour and were subject to the whims of processor implementations on a variety of machines.
So I'm learning Java, and I have a question. It seems that the types int, boolean and String will be good for just about everything I'll ever need in terms of variables, except perhaps float, which could be used when decimal numbers are needed.
My question is, are the other types such as long, double, byte, char etc ever used in normal, everyday programming? What are some practical things these could be used for? What do they exist for?
With the possible exception of short, which is arguably a bit of a waste of space (sometimes literally), they're all horses for courses:
Use an int when you don't need fractional numbers and you've no reason to use anything else; on most processors/OS configurations, this is the size of number that the machine can deal with most efficiently;
Use a double when you need fractional numbers and you've no reason to use anything else;
Use a char when you want to represent a character (or possibly rare cases where you need two-byte unsigned arithmetic);
Use a byte if either you specifically need to manipulate a signed byte (rare!), or when you need to move around a block of bytes;
Use a boolean when you need a simple "yes/no" flag;
Use a long for those occasions where you need a whole number, but where the magnitude could exceed 2 billion (file sizes, time measurements in milliseconds/nanoseconds, in advanced uses for compacting several pieces of data into a single number);
Use a float for those rare cases where you either (a) are storing a huge number of them and the memory saving is worthwhile, or (b) are performing a massive number of calculations and can afford the loss in accuracy. For most applications, float offers very poor precision, but operations can be twice as fast; it's worth testing this on your processor, though, to confirm that it's actually the case! [*]
Use a short if you really need 2-byte signed arithmetic. There aren't so many cases...
[*] For example, in Hotspot on Pentium architectures, float and double operations generally take exactly the same time, except for division.
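One of the "advanced uses" mentioned for long above is compacting several pieces of data into a single number. A hypothetical sketch of packing two ints into one long and unpacking them again:

int hi = -42, lo = 7;
long packed = ((long) hi << 32) | (lo & 0xFFFFFFFFL); // the mask stops sign-extension of lo
int hiAgain = (int) (packed >>> 32); // -42
int loAgain = (int) packed;          // 7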
Don't get too bogged down in the memory usage of these types unless you really understand it. For example:
every object size is rounded to 16 bytes in Hotspot, so an object with a single byte field will take up precisely the same space as a single object with a long or double field;
when passing parameters to a method, every type takes up 4 or 8 bytes on the stack: you won't save anything by changing a method parameter from, say, an int to a short! (I've seen people do this...)
Obviously, there are certain API calls (e.g. various calls for non-CPU intensive tasks that for some reason take floats) where you just have to pass it the type that it asks for...!
Note that String isn't a primitive type, so it doesn't really belong in this list.
A Java int is 32 bits, while a long is 64 bits, so when you need to represent integers larger than 2^31 - 1, long is your friend. For a typical example of the use of long, see System.currentTimeMillis().
A byte is 8 bits, and the smallest addressable entity on most modern hardware, so it is needed when reading binary data from a file.
A double has twice the size of a float, so you would usually use a double rather than a float, unless you have some restrictions on size or speed and a float has sufficient capacity.
A short is two bytes, 16 bits. In my opinion, this is the least necessary data type, and I haven't really seen it in actual code, but again, it might be useful for reading binary file formats or doing low-level network protocols. For example, IP port numbers are 16-bit.
A char represents a single character, which is 16 bits. This is the same size as a short, but a short is signed (-32768 to 32767) while a char is unsigned (0 to 65535). (This means that a port number is probably more correctly represented as a char than as a short, but this seems to be outside the intended scope for chars...)
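A quick sketch of that difference, using a port number above 32767:

short sPort = (short) 50000; // same 16 bits, signed interpretation
char cPort = (char) 50000;   // same 16 bits, unsigned interpretation
System.out.println(sPort);       // -15536
System.out.println((int) cPort); // 50000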
For the really authoritative source on these details, see the Java Language Specification.
You can have a look here about the primitive types in Java.
The main difference between these types is memory usage. For example, an int uses 32 bits while a byte only uses 8 bits.
Imagine that you work with large structures (arrays, matrices...); then it pays to take care over which types you use, in order to reduce memory usage.
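For instance (a rough sketch; actual sizes also include array headers and depend on the JVM):

byte[] samples = new byte[1_000_000]; // ~1 MB of element data
int[] sameData = new int[1_000_000];  // ~4 MB for the same element count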
I guess there are several purposes to types of that kind:
1) They enforce restrictions on the size (and sign) of variables that can be stored in them.
2) They can add a bit of clarity to code (e.g. if you use a char, then anyone reading the code knows what you plan to store in it).
3) They can save memory. If you have a large array of numbers, all of which will be non-negative and below 256, you can declare it as an array of bytes, saving some memory compared with declaring an array of ints.
4) You need a long if the numbers you need to store are larger than 2^31 - 1, and a double for very large floating-point numbers.
The primitive data types are required because they are the basis of every complex collection.
long, double, byte, etc. are used so that, when you only need a small integer (or whatever), you don't waste heap space.
I know there's plenty of RAM these days, but you still shouldn't waste it.
I need the "small ones" for database and stream operations.
Integers should be used for numbers in general.
Doubles are the basic data type used to represent decimals.
Strings can hold a textual representation of essentially any data type, but it is easier to use ints for numbers, and it is confusing to use Strings for anything except text.
Chars are used when you only wish to hold one letter, although they are essentially only for clarity.
Shorts, longs, and floats may not be necessary, but if you are, for instance, creating an array of size 100,000 which only needs to hold numbers less than 1,000, then you would want to use shorts, simply to save space.
It's relative to the data you're dealing with. There's no point using a data type which reserves a large portion of memory when you're only dealing with a small amount of data. For example, a lot of data types reserve memory before they've even been used. Take arrays, for example: they'll reserve a default amount (say, 256 bytes, just as an example!) even if you're only using 4 bytes of it.