So I'm learning Java, and I have a question. It seems that the types int, boolean and String will be good for just about everything I'll ever need in terms of variables, except perhaps float, which could be used when I need decimal numbers.
My question is, are the other types such as long, double, byte, char etc ever used in normal, everyday programming? What are some practical things these could be used for? What do they exist for?
With the possible exception of "short", which arguably is a bit of a waste of space (sometimes literally), they're all horses for courses:
Use an int when you don't need fractional numbers and you've no reason to use anything else; on most processors/OS configurations, this is the size of number that the machine can deal with most efficiently;
Use a double when you need fractional numbers and you've no reason to use anything else;
Use a char when you want to represent a character (or possibly rare cases where you need two-byte unsigned arithmetic);
Use a byte if either you specifically need to manipulate a signed byte (rare!), or when you need to move around a block of bytes;
Use a boolean when you need a simple "yes/no" flag;
Use a long for those occasions where you need a whole number, but where the magnitude could exceed 2 billion (file sizes, time measurements in milliseconds/nanoseconds, in advanced uses for compacting several pieces of data into a single number);
Use a float for those rare cases where you either (a) are storing a huge number of them and the memory saving is worthwhile, or (b) are performing a massive number of calculations and can afford the loss in accuracy. For most applications, "float" offers very poor precision, but operations can be twice as fast -- it's worth testing this on your processor, though, to check whether that's actually the case! [*]
Use a short if you really need 2-byte signed arithmetic. There aren't so many cases...
[*] For example, in Hotspot on Pentium architectures, float and double operations generally take exactly the same time, except for division.
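To make the list above concrete, here's a small sketch (the variable names and values are purely illustrative, and the statements are assumed to sit inside some method):

    int count = 42;                     // general-purpose whole numbers
    double price = 19.99;               // general-purpose fractional numbers
    boolean ready = true;               // simple yes/no flag
    char initial = 'J';                 // a single UTF-16 character
    long fileSize = 3000000000L;        // whole numbers beyond the int range of ~2 billion
    byte[] buffer = new byte[4096];     // a block of raw bytes, e.g. for I/O
    float approx = 3.14f;               // fractional numbers where lower precision is acceptable
    short smallSigned = -12345;         // 16-bit signed arithmetic (rarely needed)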
Don't get too bogged down in the memory usage of these types unless you really understand it. For example:
every object size is rounded to 16 bytes in Hotspot, so an object with a single byte field will take up precisely the same space as a single object with a long or double field;
when passing parameters to a method, every type takes up 4 or 8 bytes on the stack: you won't save anything by changing a method parameter from, say, an int to a short! (I've seen people do this...)
Obviously, there are certain API calls (e.g. various calls for non-CPU intensive tasks that for some reason take floats) where you just have to pass it the type that it asks for...!
Note that String isn't a primitive type, so it doesn't really belong in this list.
A Java int is 32 bits, while a long is 64 bits, so when you need to represent integers larger than 2^31 - 1 (about 2.1 billion), long is your friend. For a typical example of the use of long, see System.currentTimeMillis().
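For instance, both of these standard library calls return a long, because the values quickly outgrow what an int can hold:

    long now = System.currentTimeMillis(); // milliseconds since 1970-01-01
    long timer = System.nanoTime();        // nanosecond-resolution timer value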
A byte is 8 bits, and the smallest addressable entity on most modern hardware, so it is needed when reading binary data from a file.
A double is twice the size of a float, so you would usually use a double rather than a float, unless you have some restriction on size or speed and a float has sufficient precision.
A short is two bytes, 16 bits. In my opinion, this is the least necessary data type, and I haven't really seen it used in actual code, but again, it might be useful for reading binary file formats or implementing low-level network protocols. For example, IP port numbers are 16 bits.
A char represents a single character and is 16 bits. This is the same size as a short, but a short is signed (-32768 to 32767) while a char is unsigned (0 to 65535). (This means that a port number is arguably more correctly represented as a char than as a short, but that seems to be outside the intended scope for chars...)
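A quick sketch of that difference, reinterpreting the same 16 bits (assumed to be inside some main method):

    short asShort = (short) 65535;    // same bit pattern, but signed: prints -1
    char asChar = 65535;              // unsigned 16-bit: prints 65535
    System.out.println(asShort);
    System.out.println((int) asChar); // cast to int so the numeric value is printed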
For the really authoritative source on these details, see the Java Language Specification.
You can have a look here about the primitive types in Java.
The main difference between these types is their memory usage. For example, an int uses 32 bits while a byte only uses 8 bits.
Imagine that you are working with large structures (arrays, matrices...); then you will want to pay closer attention to the types you are using in order to reduce memory usage.
I guess there are several purposes to types of that kind:
1) They enforce restrictions on the size (and sign) of variables that can be stored in them.
2) They can add a bit of clarity to code (e.g. if you use a char, then anyone reading the code knows what you plan to store in it).
3) They can save memory. If you have a large array of numbers, all of which are small enough to fit in a byte, you can declare it as an array of bytes, saving some memory compared with declaring an array of ints (see the sketch after this list).
4) You need a long if the numbers you need to store can be larger than 2^31 - 1 (the maximum int), and a double for very large floating-point numbers.
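For point 3, a rough sketch of the difference in element storage (ignoring array headers and alignment):

    byte[] small = new byte[1000000];  // about 1 MB of element data
    int[] medium = new int[1000000];   // about 4 MB of element data
    long[] large = new long[1000000];  // about 8 MB of element data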
The primitive data types are required because they are the basis of every complex collection.
long, double, byte etc. are used when they fit the data better; for example, use a byte when you only need a small integer, so that you don't waste heap space.
I know there's plenty of RAM these days, but you still shouldn't waste it.
I need the "small ones" for database and stream operations.
Integers should be used for numbers in general.
Doubles are the basic data type used to represent decimals.
Strings can hold essentially any data as text, but it is easier to use ints for numbers, and it is confusing to use String for anything except text.
Chars are used when you only wish to hold one letter, although they are essentially only for clarity.
Shorts, longs, and floats may not be necessary, but if you are, for instance, creating an array of 100,000 elements that only needs to hold numbers less than 1,000, then you would want to use shorts, simply to save space.
It's relative to the data you're dealing with. There's no point using a data type that reserves a large portion of memory when you're only dealing with a small amount of data. For example, some data structures reserve memory before they've even been used: a buffer might reserve a default amount (say, 256 bytes, just as an example) even if you're only using 4 bytes of it.
I don't understand how an int 63823, takes up less space than a double 1.0. Is there not more information stored in the int, in this particular instance?
Good question. What you're seeing when you see 63823 and 1.0 is a representation of the underlying data, you are not seeing the underlying data. It is specially formatted so that you can read it, but it is not how the machine sees it.
Java uses very special formats for representing int and double. You need to look at those representations to understand why 63823 takes thirty-two bits when represented as a Java int and 1.0 takes sixty-four bits when represented as a Java double.
In particular, 63823 as an int in Java is represented as:
00000000000000001111100101001111
and 1.0 as a double is represented in Java as:
0011111111110000000000000000000000000000000000000000000000000000
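If you want to reproduce those bit strings yourself, the standard library can print them (note that leading zero bits are dropped from the output):

    // prints 1111100101001111 (the 16 leading zeros are omitted)
    System.out.println(Integer.toBinaryString(63823));
    // prints ten 1s followed by fifty-two 0s: the raw 64-bit layout of 1.0
    System.out.println(Long.toBinaryString(Double.doubleToLongBits(1.0)));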
If you want to explore more, I recommend Two's Complement and What Every Computer Scientist Should Know About Floating-Point Arithmetic.
Not exactly. The double 1.0 represents more information because, by the definition of a double as a 64 bit float, there are more values that it could be. To use your example, if you had a special data type that could only have two values, 63823 and 98321234213474932, then it would only take 1 bit to represent the number 63823, though it would be far less useful than an int.
In terms of implementation, it's often a lot easier and faster to work with fixed-size data types, so that you can allocate a fixed chunk of memory (that's what a variable is) without having to know its value and constantly reallocate space. Examples of variables with a different approach would be String and BigInteger, which do allocate space to accommodate their values. Note that both are immutable in Java -- that's not a coincidence.
These primitive data types need to be defined somewhere for you to use them. A primitive is not a flexible container where you can stuff in whatever you want; it's more like a bottle, which takes up the same space whether it's full or empty. They also have a maximum they can contain.
Read more yourself here.
The zeros that are not shown also count. Approximately, ignoring the fact that the numbers are actually stored in binary and not in decimal, when you write both numbers with the implied zero digits included, you get:
1.0 = 1.00000000000000000*10^0000
63823 = 0000063823
As you can see, 1.0 is twice as long as 63823. Therefore it requires twice as much storage.
The int and double don't contain decimal digits at all; those are just for display. The int itself has room for 32 binary digits. The double has room for a 52-bit mantissa (53 significant bits counting the implied leading bit), an 11-bit exponent, and a sign bit.
How do algorithms using doubles perform compared to int values? Is there much difference, or is it negligible?
In my case, I have a canvas that uses Integers so far. But now as I'm implementing a scaling, I'm probably going to switch everything to Double. Would this have a big impact on calculations?
If so, would rounding the doubles to only a few fractional digits optimize performance?
Or am I totally on the path of over-optimization and should just use doubles without any headache?
You're in GWT, so ultimately your code will be JavaScript, and JavaScript has a single type for numeric data: Number, which corresponds to Java's Double.
Using integers in GWT can either mean (I have no idea what the GWT compiler exactly does, it might also be dependent on context, such as crossing JSNI boundaries) that the generated code is doing more work than with doubles (doing a narrowing conversion of numbers to integer values), or that the code won't change at all.
All in all, expect the same or slightly better performance using doubles (unless you later have to convert back to integers, of course); but generally speaking you're over-optimizing (also: optimization needs measurement/metrics; if you don't have them, you're on the “premature optimization” path).
There is a sizable difference between integers and doubles, however generally doubles are also very fast.
The difference is that integers are still faster than doubles, because it takes very few clock cycles to do arithmetic operations on integers.
Doubles are also fast because they are generally natively supported by a floating-point unit, which means the calculation is done by dedicated hardware. Unfortunately, they are generally somewhere between 2x and 40x slower than integer operations.
Having said this, the CPU will usually spend quite a bit of time on housekeeping like loops and function calls, so if it is fast enough with integers, most of the time (perhaps even 99% of the time), it will be fast enough with doubles.
The only time floating point numbers are orders of magnitude slower is when they must be emulated, because there is no hardware support. This generally only occurs on embedded platforms, or where uncommon floating point types are used (eg. 128-bit floats, or decimal floats).
The result of some benchmarks can be found at:
http://pastebin.com/Kx8WGUfg
https://stackoverflow.com/a/2550851/1578925
but generally,
32-bit platforms have a greater disparity between doubles and integers
integers are always at least twice as fast on adding and subtracting
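If you want to get a feel for the difference on your own machine, a very rough sketch like the one below can help, but treat the numbers with suspicion: JIT warm-up, dead-code elimination and the OS all skew naive timings, so use a proper harness such as JMH for anything serious. The class name here is made up.

    public class IntVsDoubleSketch {
        public static void main(String[] args) {
            final int n = 100_000_000;

            long t0 = System.nanoTime();
            int intSum = 0;
            for (int i = 0; i < n; i++) {
                intSum += i;
            }
            long t1 = System.nanoTime();

            double doubleSum = 0.0;
            for (int i = 0; i < n; i++) {
                doubleSum += i;
            }
            long t2 = System.nanoTime();

            // print the sums so the JIT cannot discard the loops entirely
            System.out.println("int:    " + intSum + " in " + (t1 - t0) / 1_000_000 + " ms");
            System.out.println("double: " + doubleSum + " in " + (t2 - t1) / 1_000_000 + " ms");
        }
    }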
If you are going to change from int to double in your program, you will also have to rewrite the lines of code that compare two integers. For example, if a and b are ints and you compare them with if (a == b), then after changing their type to double you should also change that line and use a comparison method such as Double.compare, or a tolerance, since rounding error makes == unreliable for doubles.
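A sketch of what that change looks like in practice (statements assumed to be inside some main method); note that Double.compare is exact, so for values affected by rounding error you usually also want a tolerance:

    double a = 0.1 + 0.2;
    double b = 0.3;

    System.out.println(a == b);                 // false: rounding error
    System.out.println(Double.compare(a, b));   // 1: still "not equal", compare() is exact
    System.out.println(Math.abs(a - b) < 1e-9); // true: tolerance-based comparison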
Not knowing the exact needs of your program, my instinct is that you're over-optimizing. When choosing between using ints or doubles, you usually base the decision on what type of value you need over which will run faster. If you need floating point values that allow for (not necessarily precise) decimal values, go for doubles. If you need precise integer values, go for ints.
A couple more points:
Rounding your doubles to certain fractions should have no impact on performance. In fact, the overhead required to round them in the first place would probably have a negative impact.
While I would argue not to worry about the performance differences between int and double, there is a significant difference between int and Integer. While an int is a primitive data type that can be used efficiently, an Integer is an object that essentially just holds an int. This incurs a significant overhead. Integers are useful in that they can be stored in collections like Vector while ints cannot, but in all other cases it's best to use ints.
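A small illustration of that last point (a fragment, assuming java.util.List and java.util.ArrayList are imported); autoboxing converts between the two behind the scenes, at a cost:

    int primitive = 42;                       // plain 32-bit value, no object overhead
    Integer boxed = primitive;                // autoboxing wraps the int in an object

    List<Integer> values = new ArrayList<>(); // collections can only hold objects...
    values.add(primitive);                    // ...so the int is boxed on the way in
    int back = values.get(0);                 // and unboxed again on the way out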
In general maths that naturally fits as an integer will be faster than maths that naturally fits as a double, BUT trying to force double maths to work as an integer is almost always slower, moving back and forth between the two costs more than the speed boost you get.
If you're considering something like:
I only want 1 decimal place within my 'quasi-integer float', so I'll just multiply everything by 10;
5.5*6.5
so 5.5 --> 55 and
so 6.5 --> 65
with a special multiplying function
public int specialIntegerMultiply(int a, int b) {
    return a * b / 10;
}
Then, for the love of god, don't: it'll probably be slower with all the extra overhead, and it'll be really confusing to write.
P.S. Rounding the doubles will make no difference at all, as the remaining decimal places will still exist; they'll just all be 0 (in decimal, that is; in binary that won't even be true).
Is it a good idea to hold numeric values in a String field in an object (also passing them into methods and returning them as String) and only convert them to BigDecimal later, when actually operating on them?
The intention behind using String in method signatures and POJOs is that the operation can then be performed in double or any other type the implementer prefers.
For e.g.
public String operate(String value1, String value2) {
    BigDecimal val1 = new BigDecimal(value1);
    BigDecimal val2 = new BigDecimal(value2);
    return val1.multiply(val2).toString();
}
Or it is just an overhead?
Would it affect performance?
Converting from one format to another is always overhead. Store the value in the format that lets you perform all the operations on it.
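For example, a hypothetical sketch (the class and method names are made up) that parses once at the boundary, keeps the BigDecimal internally, and only produces a String for output:

    import java.math.BigDecimal;

    public class Price {
        private final BigDecimal amount;

        public Price(String text) {              // boundary: the String comes in once
            this.amount = new BigDecimal(text);
        }

        private Price(BigDecimal amount) {
            this.amount = amount;
        }

        public Price times(Price other) {        // all arithmetic stays in BigDecimal
            return new Price(this.amount.multiply(other.amount));
        }

        public String asString() {               // boundary: the String goes out once
            return amount.toPlainString();
        }
    }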
It's a bad idea from two standpoints:
All of the operations will require you to translate from/to the string representation
Memory overhead is enormous.
Consider this example. An ordinary int occupies 4 bytes in memory. An int represented as a String can occupy up to 72 bytes:
16 bytes -- object overhead,
4 bytes -- cached hash code
8 bytes -- reference to char array
24 + 2 * 10 bytes -- char array itself (10 digits max)
up to 8 bytes for padding
It is 18 (!) times more than a primitive representation of an int.
Not sure why this would be a good idea. "Stringly typed" code is not a very good thing. And yes, BigDecimal holds a number more compactly, and it also optimizes small values by holding them in a long, whereas a String stores the decimal representation as two-byte characters.
There are a lot of problems with doing this. First, there is the overhead of creating and destroying extra objects, at least one for every conversion to/from a String. Depending on how many times this happens, it could be significant. Second, every time you convert from String to numeric and back, you risk parsing issues and such. Last, you circumvent any built-in checks and balances Java provides in regards to type casting and such.
My personal opinion is that you should minimize conversion when possible, and store data in the most specific object/primitive you can. I think this makes the code easier to read, easier to maintain, and easier to troubleshoot when there is a bug.
I have often heard complaints against Java for not having unsigned data types. See for example this comment. I would like to know how is this a problem? I have been programming in Java for 10 years more or less and never had issues with it. Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Since unsigned and signed numbers are represented with the same bit values, the only places I can think of where signedness matters are:
When converting the numbers to other bit representation. Between 8, 16 and 32 bit integer types you can use bitmasks if needed.
When converting numbers to decimal format, usually to Strings.
Interoperating with non-Java systems through API's or protocols. Again the data is just bits, so I don't see the problem here.
Using the numbers as memory or other offsets. With 32-bit ints this might be a problem for very large offsets.
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those. What am I missing? What are the actual benefits of having unsigned types in a programming language and how would having those make Java better?
Occasionally when converting bytes to ints a & 0xFF is needed, but I don't consider that as a problem.
Why not? Is "applying a bitwise AND with 0xFF" actually part of what your code is trying to represent? If not, why should it have to be part of how you write it? I actually find that almost anything I want to do with bytes beyond just copying them from one place to another ends up requiring a mask. I want my code to be cruft-free; the lack of unsigned bytes hampers this :(
Additionally, consider an API which will always return a non-negative value, or only accepts non-negative values. Using an unsigned type allows you to express that clearly, without any need for validation. Personally I think it's a shame that unsigned types aren't used more in .NET, e.g. for things like String.Length, ICollection.Count etc. It's very common for a value to naturally only be non-negative.
Is the lack of unsigned types in Java a fatal flaw? Clearly not. Is it an annoyance? Absolutely.
The comment that you quote hits the nail on the head:
Java's lack of unsigned data types also stands against it. Yes, you can work around it, but it's not ideal and you'll be using code that doesn't really reflect the underlying data correctly.
Suppose you are interoperating with another system, which wants an unsigned 16 bit integer, and you want to represent the number 65535. You claim "the data is just bits, so I don't see the problem here" - but having to pass -1 to mean 65535 is a problem. Any impedance mismatch between the representation of your data and its underlying meaning introduces an extra speedbump when writing, reading and testing the code.
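For example, the value has to be smuggled through a signed short and recovered on the other side (a fragment; Short.toUnsignedInt requires Java 8 or later):

    short wire = (short) 65535;             // stored as -1 in Java's signed short
    int manual = wire & 0xFFFF;             // manual mask: 65535
    int helper = Short.toUnsignedInt(wire); // Java 8+ helper: also 65535
    System.out.println(manual + " " + helper);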
Instead I find it easier that I don't need to consider operations between unsigned and signed numbers and the conversions between those.
The only times you would need to consider those operations is when you were naturally working with values of two different types - one signed and one unsigned. At that point, you absolutely want to have that difference pointed out. With signed types being used to represent naturally unsigned values, you should still be considering the differences, but the fact that you should is hidden from you. Consider:
// This should be considered unsigned - so a value of -1 is "really" 65535
short length = /* some value */;
// This is really signed
short foo = /* some value */;
boolean result = foo < length;
Suppose foo is 100 and length is -1. What's the logical result? The value of length represents 65535, so logically foo is smaller than it. But you'd probably go along with the code above and get the wrong result.
Of course they don't even need to represent different types here. They could both be naturally unsigned values, represented as signed values with negative numbers being logically greater than positive ones. The same error applies, and wouldn't be a problem if you had unsigned types in the language.
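The workaround you need today, using foo and length from the snippet above, is to mask both sides before comparing, which is exactly the kind of cruft that unsigned types would remove:

    boolean wrong = foo < length;                       // false: signed comparison
    boolean right = (foo & 0xFFFF) < (length & 0xFFFF); // true: compares the 16 bits as unsigned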
You might also want to read this interview with Joshua Bloch (Google cache, as I believe it's gone from java.sun.com now), including:
Ooh, good question... I'm going to say that the strangest thing about the Java platform is that the byte type is signed. I've never heard an explanation for this. It's quite counterintuitive and causes all sorts of errors.
If you like, yes, everything is ones and zeroes. However, your hardware arithmetic and logic unit doesn't work that way. If you want to store your bits in a signed integer value but perform operations that are not natural to signed integers, you will usually waste both storage space and processing time.
An unsigned integer type stores twice as many non-negative values in the same space as the corresponding signed integer type. So if you want to take into Java any data commonly used in a language with unsigned values, such as a POSIX date value (unsigned number of seconds) that is normally used with C, then in general you will need to use a wider integer type than C would use. If you are processing many such values, again you will waste both storage space and fetch-execute time.
The times I have used unsigned data types have been when I read in large blocks of data that correspond to images, or worked with openGL. I personally prefer unsigned if I know something will never be negative, as a "safety feature" of sorts.
Unsigned types are useful for bit-by-bit comparisons, and I'm pretty sure they are used extensively in graphics.
I can create a literal long by appending an L to the value; why can't I create a literal short or byte in some similar way? Why do I need to use an int literal with a cast?
And if the answer is "Because there was no short literal in C", then why are there no short literals in C?
This doesn't actually affect my life in any meaningful way; it's easy enough to write (short) 0 instead of 0S or something. But the inconsistency makes me curious; it's one of those things that bother you when you're up late at night. Someone at some point made a design decision to make it possible to enter literals for some of the primitive types, but not for all of them. Why?
In C, int at least was meant to have the "natural" word size of the CPU, and long was probably meant to be the "larger natural" word size (not sure about that last part, but it would also explain why int and long have the same size on x86).
Now, my guess is: for int and long, there's a natural representation that fits exactly into the machine's registers. On most CPUs however, the smaller types byte and short would have to be padded to an int anyway before being used. If that's the case, you can as well have a cast.
I suspect it's a case of "don't add anything to the language unless it really adds value" - and it was seen as adding sufficiently little value to not be worth it. As you've said, it's easy to get round, and frankly it's rarely necessary anyway (only for disambiguation).
The same is true in C#, and I've never particularly missed it in either language. What I do miss in Java is an unsigned byte type :)
Another reason might be that the JVM doesn't know about short and byte. All calculation and storage inside the JVM is done with ints, longs, floats and doubles.
There are several things to consider.
1) As discussed above, the JVM has no real notion of byte or short types. Generally these types are not used in computation at the JVM level, so one might expect less use for such literals.
2) For initialization of byte and short variables, if the int expression is constant and in the allowed range of the type, it is implicitly cast to the target type.
3) One can always cast the literal, e.g. (short) 10.
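A quick sketch of points 2 and 3, and of the suffixes that do exist (a fragment, assumed to be inside some method):

    long big = 10000000000L;     // L suffix: the value doesn't fit in an int
    float ratio = 1.5f;          // f suffix for float literals
    short s = 10;                // constant int in range: implicitly narrowed, no cast needed
    byte b = (byte) 200;         // out of byte range, so an explicit cast is required
    // short t = 70000;          // would not compile: constant too large for a short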