Java Character vs char: what about memory usage?

In one of my classes I have a field of type Character. I preferred it over char because sometimes the field has "no value", and null seems to me the cleanest way to represent this lack of information.
However, I'm wondering about the memory footprint of this approach. I'm dealing with hundreds of thousands of objects, and the difference between the two options, negligible for a single field, may now deserve some investigation.
My first bet is that a char takes two bytes whereas a Character is an object, so it takes much more in order to support its life cycle. But I know boxed primitives like Integer, Character and so on are not ordinary classes (think about boxing and unboxing), so I wonder whether the JVM can make some kind of optimization under the hood.
Furthermore, are Characters garbage collected like everything else, or do they have a different life cycle? Are they pooled from a shared repository? Is this standard or JVM implementation-dependent?
I wasn't able to find any clear information about this on the Internet. Can you point me to some?

If you are using Character, prefer Character.valueOf over the constructor to create instances:
Character.valueOf('c'); // returns a cached instance for 'c'
Character c = new Character('c'); // prefer to avoid
The following is an excerpt from the Javadoc:
If a new Character instance is not required, this method (Character.valueOf(char)) should generally be used in preference to the constructor Character(char), as this method is likely to yield significantly better space and time performance by caching frequently requested values.
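A quick way to see the caching in action (the Javadoc guarantees a cache at least for values in the range '\u0000' to '\u007F', i.e. the ASCII range):
Character a = Character.valueOf('c');
Character b = Character.valueOf('c');
System.out.println(a == b);                  // true: both references point at the cached instance
System.out.println(new Character('c') == a); // false: the constructor always allocates a new object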

Use int instead, and use -1 to represent "no char".
There's plenty of precedent for this pattern, for example int read() in java.io.Reader.
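A minimal sketch of that pattern (the field name is just illustrative):
int letter = -1;              // -1 means "no value", just like java.io.Reader.read()
// ...
if (letter != -1) {
    char c = (char) letter;   // narrow back to char only when a value is actually present
    System.out.println(c);
}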

As you stated, a Character object can be null, so it has to take more space in RAM than a regular char, which cannot be null: in a way, Characters are a superset of chars.
However, in a given part of your code, the JIT compiler might be able to detect that your Character is never null and is always used as a regular char, and optimize that part so that your Character uses no more RAM and execution is no slower. I'm just speculating on this point, though; I don't know whether a JIT can actually perform this precise optimization.


Best way to define a constant identifier?

In my Java application one of my objects has exactly one value from a set of values. Now I wonder how to define them to get the best performance:
private static final String ITEM_TYPE1 = "type1";
private static final int ITEM_TYPE1 = 1;
Is defining an int better than a String? (I would have to convert the value to a string anyway, so I'd like to define it as a String, but I fear for performance, because comparing ints is presumably simpler than comparing Strings.)
EDIT: I am aware of enums, but I just want to know whether ints perform better than Strings or not. This depends on how the JDK and JRE handle it under the hood (on Android's Dalvik or ART, for instance).
In my java application one of my objects has exactly one value from a set of values
That is what Java enums are for.
Regarding the question "do ints have better performance than strings", that is almost nonsensical.
You are talking about static constants. Even if they are used a hundred or a thousand times in your app, performance doesn't matter here. What matters is writing code that is easy to read and maintain. Then the JIT can kick in and turn it into nicely optimized machine code.
Please understand: premature optimisation is the root of all evil! Good or bad performance of your app depends on many other factors, definitely not on representing constants as ints or strings.
Beyond that: the type of something in Java should reflect its nature. If it is a string, make it a String (for example when you mainly want to use it as a string and concatenate it to other strings). When you have numbers and deal with them as numbers, make it an int.
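As a rough sketch of the enum suggestion above (ItemType and its constants are made-up names):
enum ItemType { TYPE1, TYPE2, TYPE3 }

ItemType type = ItemType.TYPE1;
if (type == ItemType.TYPE1) {     // enum comparison is a plain reference check
    System.out.println(type);     // prints TYPE1; name() gives you the string form when you need it
}
If you ever need a string or a number, name() and ordinal() (or an explicit field on the enum) cover that without giving up type safety.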
First of all, an int always has a fixed size in memory; in Java it is 4 bytes.
A String is a complex type, which means it takes not only the bytes of the actual string data but also additional data, such as the length of the string and so on.
So if you have the choice between String and int, you should choose int: it takes less space and is faster to operate with.

Using chars as digits

I am writing a simulation that will use a lot of memory and needs to be fast. Instead of ints I am using chars (16 bits rather than 32). I need to operate on them as if they were ints.
To achieve that I have done something like this:
char a = 1;
char b = 2;
System.out.println(a*1 + b*1); // prints 3 in the console, so it has int-like behavior
I don't know what's going on under the hood when I multiply a char by an integer. Is this the fastest way to do it?
Thanks for your help!
Performance-wise it's not worth using char instead of int, because all modern hardware architectures are optimized for 32- or 64-bit-wide memory and register access.
The only reason to use char would be to reduce the memory footprint, i.e. if you work with large amounts of data.
Additional info: Performance of built-in types: char vs. short vs. int vs. float vs. double
A char is simply a number (in Java a 16-bit unsigned value, not a 32-bit int) that is normally displayed as the character it encodes. When you multiply it by an integer, the compiler performs an implicit conversion from char to int; that's why 3 is printed on the console instead of a character.
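A short sketch of that promotion in action:
char a = 1;
char b = 2;
int sum = a + b;              // a and b are promoted to int, so the result is an int
char c = (char) (a + b);      // narrowing back into a char requires an explicit cast
System.out.println(sum);      // 3
System.out.println((int) c);  // 3; without the cast to int, println would print the character '\u0003'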
You should use ints. The size of the value will not affect the fetch speed of the data, nor the processing time, unless it is composed of multiple machine words, which is unlikely. There is no reason to do this. Moreover, chars are stored as integers and are therefore the same size. An alternative would be to use small ints, but again this is not helpful for what you want.
Also, I notice your code has System.out.println, which means you are using Java. There are several implications of this. The first is that no, you are not going to be going very fast or using very little memory; there is a large amount of overhead involved in running the JVM, JIT, garbage collector and other parts of the Java platform. If efficiency is a relevant factor, you are starting off wrong. The second implication is that your choice of data types will have no impact on processing time, because they will be identical on the physical hardware. Only the virtual machine distinguishes between them, and in the case of primitives there is no difference anyway.

Java 7 String - substring complexity

Until Java 6, we had a constant-time substring on String. In Java 7, why did they decide to copy the char array (degrading to linear time complexity) when something like StringBuilder was exactly meant for that?
Why they decided is discussed in Oracle bug #4513622 : (str) keeping a substring of a field prevents GC for object:
When you call String.substring as in the example, a new character array for storage is not allocated. It uses the character array of the original String. Thus, the character array backing the original String cannot be GC'd until the substring's references can also be GC'd. This is an intentional optimization to prevent excessive allocations when using substring in common scenarios. Unfortunately, the problematic code hits a case where the overhead of the original array is noticeable. It is difficult to optimize for both edge cases. Any optimization for space/size trade-offs is generally complex and can often be platform-specific.
There's also this note, noting that what once was an optimization had become a pessimization according to tests:
For a long time preparations and planning have been underway to remove the offset and count fields from java.lang.String. These two fields enable multiple String instances to share the same backing character buffer. Shared character buffers were an important optimization for old benchmarks, but with current real-world code and benchmarks it's actually better not to share backing buffers. Shared char array backing buffers only "win" with very heavy use of String.substring. The negatively impacted situations can include parsers and compilers; however, current testing shows that overall this change is beneficial.
If you have a long lived small substring of a short lived, large parent string, the large char[] backing the parent string will not be eligible for garbage collection until the small substring moves out of scope. This means a substring can take up much more memory than people expect.
The only time the Java 6 way performed significantly better was when someone took a large substring from a large parent string, which is a very rare case.
Clearly they decided that the tiny performance cost of this change was outweighed by the hidden memory problems caused by the old way. The determining factor is that the problem was hidden, not that there is a workaround.
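A self-contained sketch of the pre-Java 7u6 pitfall described above (the sizes are arbitrary):
char[] big = new char[1_000_000];
String page = new String(big);                     // a large, short-lived string
String id = page.substring(0, 8);                  // before 7u6: id shares page's million-char array
// Even once page is unreachable, the whole array stays alive through id.
String idCopy = new String(page.substring(0, 8));  // the classic workaround: force a copy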
This will impact the complexity of data structures like Suffix Arrays by a fair margin. Java should provide some alternate method for getting a part of the original string.
It's just their crappy way of fixing some JVM garbage collection limitations.
Before Java 7, if we wanted to avoid the garbage-collection issue, we could always copy the substring instead of keeping the substring reference. It was just an extra call to the copy constructor:
String smallStr = new String(largeStr.substring(0,2));
But now we can no longer have a constant-time substring. What a disaster.
The main motivation, I believe, is the eventual "co-location" of a String and its char[]. Right now they live at a distance from each other, which is a major penalty on cache lines. If every String owns its char[], the JVM can merge them together, and reading will be much faster.

Using boolean instead of byte or int in Java

Is using boolean instead of byte (when I only need two states) in Java useful for performance, or is it just an illusion? Is all the space saving lost to alignment?
You should use whichever is clearer, unless you have profiled your code, and decided that making this optimization is worth the cost in readability. Most of the time, this sort of micro-optimization isn't worth the performance increase.
According to Oracle:
boolean: ... This data type represents one bit of information, but its "size" isn't something that's precisely defined.
To give you an idea, I once consulted at a minicomputer shop (16-bit machines).
Sometimes people would have a "flag word": a global int containing space for 16 boolean flags.
This was to save space.
Never mind that testing a flag required two 16-bit instructions, and setting one required three or more.
Yes, boolean may use only 1 bit. But more important, it makes it clearer for another developer reading your code that there are only two possible states.
The answer depends on your JVM, and on your code. The only way to find out for sure is by profiling your actual code.
If you only have two states to represent and you want to reduce memory usage, you can use a java.util.BitSet.
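A minimal sketch of that approach (the size and indices are arbitrary):
java.util.BitSet flags = new java.util.BitSet(1_000_000);  // roughly one bit per flag
flags.set(42);                  // mark flag 42 as true
boolean on = flags.get(42);     // read it back
flags.clear(42);                // reset it; a boolean[] would typically cost a byte per element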
On most JVMs a boolean uses the same amount of space as a byte. Accessing a byte/boolean can be more work than accessing an int or long, so if performance is the only consideration, an int or long can be faster. When you share a value between threads, there can be an advantage to reserving a whole cache line for the field (in the most extreme cases); a cache line is 64 bytes on many CPUs.

Anyone using short and byte primitive types, in real apps?

I have been programming in Java since 2004, mostly enterprise and web applications. But I have never used short or byte, other than in a toy program just to see how these types work. Even in a for loop of 100 iterations, we usually go with int. And I don't remember ever coming across code that made use of byte or short, other than in some public APIs and frameworks.
Yes, I know you can use a short or byte to save memory in large arrays, in situations where the memory savings actually matter. Does anyone actually practice that? Or is it just something in the books?
[Edited]
Using byte arrays for network programming and socket communication is quite common. Thanks, Darren, for pointing that out. Now, how about short? Ryan gave an excellent example. Thanks, Ryan.
I use byte a lot. Usually in the form of byte arrays or ByteBuffer, for network communications of binary data.
I rarely use float or double, and I don't think I've ever used short.
Keep in mind that Java is also used on mobile devices, where memory is much more limited.
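A small sketch of the kind of binary handling where byte (and occasionally short) is the natural fit; the values here are made up:
java.nio.ByteBuffer buf = java.nio.ByteBuffer.allocate(64);
buf.put((byte) 0x7F);           // a raw protocol byte
buf.putShort((short) 1023);     // e.g. a 10-bit sensor reading fits comfortably in a short
buf.flip();                     // switch the buffer from writing to reading
byte flag = buf.get();
short reading = buf.getShort();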
I used 'byte' a lot, in C/C++ code implementing functionality like image compression (i.e. running a compression algorithm over each byte of a black-and-white bitmap), and processing binary network messages (by interpreting the bytes in the message).
However I have virtually never used 'float' or 'double'.
The primary usage I've seen for them is while processing data with an unknown structure or even no real structure. Network programming is an example of the former (whoever is sending the data knows what it means but you might not), something like image compression of 256-color (or grayscale) images is an example of the latter.
Off the top of my head grep comes to mind as another use, as does any sort of file copy. (Sure, the OS will do it--but sometimes that's not good enough.)
The Java language itself makes it unreasonably difficult to use the byte or short types. Whenever you perform any operation on a byte or short value, Java promotes it to an int first, and the result of the operation is returned as an int. Also, they're signed, and there are no unsigned equivalents, which is another frequent source of frustration.
So you end up using byte a lot because it's still the basic building block of all things cyber, but the short type might as well not exist.
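A short sketch of that friction:
byte a = 10;
byte b = 20;
// byte c = a + b;            // does not compile: the result of a + b is an int
byte c = (byte) (a + b);      // every arithmetic result has to be cast back down
short s = (short) (a * b);    // the same applies to short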
Until today I hadn't noticed how seldom I use them.
I've used byte for network-related stuff, but most of the time it was for my own tools/learning. In work projects these things are handled by frameworks (JSP, for instance).
Short? Almost never.
Long? Neither.
My preferred integer type is always int: for loops, counters, etc.
When data comes from somewhere else (a database, for instance) I use the proper type, but for literals I always use int.
I use bytes in lots of different places, mostly involving low-level data processing. Unfortunately, the designers of the Java language made bytes signed. I can't think of any situation in which having negative byte values has been useful. Having a 0-255 range would have been much more helpful.
I don't think I've ever used shorts in any proper code. I also never use floats (if I need floating point values, I always use double).
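A small sketch of the usual workaround for the missing unsigned byte:
byte raw = (byte) 0xC8;                   // the bit pattern for 200 is stored as -56
int unsigned = raw & 0xFF;                // masking recovers the 0-255 value: 200
int sameThing = Byte.toUnsignedInt(raw);  // equivalent helper, available since Java 8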
I agree with Tom. Ideally, in high-level languages we shouldn't be concerned with the underlying machine representations. We should be able to define our own ranges or use arbitrary precision numbers.
When we are programming for electronic devices like mobile phones, we use byte and short. In that case we should take care with memory management.
It's perhaps more interesting to look at the semantics of int. Are those arbitrary limits and silent truncation what you want? Application-level code really wants arbitrary-sized integers; it's just that Java has no reasonable way of expressing those.
I have used bytes when saving state while doing model checking. In that application the space savings are worth the extra work. Otherwise I never use them.
I found I was using byte variables when doing some low-level image processing. The .Net GDI+ draw routines were really slow so I hand-rolled my own.
Most times, though, I stick with signed integers unless I am forced to use something larger, given the problem constraints. Any sort of physics modeling I do usually requires floats or doubles, even if I don't need the precision.
Apache POI used short quite a few times, probably because of Excel's row/column number limits.
A few months ago they changed to int, replacing
createCell(short columnIndex)
with
createCell(int column).
On in-memory data grids, it can be useful.
The concept of a data grid like Gemfire is to have a huge distributed map.
When you don't have enough memory you can overflow to disk with an LRU strategy, but the keys of all entries of your map remain in memory (at least with Gemfire).
Thus it is very important to give your keys a small footprint, particularly if you are handling very large datasets.
For the entry values, when you can, it's also better to use a type with a small memory footprint.
I have used shorts and bytes in Java apps communicating with custom USB or serial micro-controllers, receiving 10-bit values wrapped in 2 bytes as shorts.
bytes and shorts are extensively used in Java Card development. Take a look at my answer to Are there any real life uses for the Java byte primitive type?.
