Why use char[] instead of String?


In Thread.java, line 146, I noticed that the author used a char[] instead of a String for the name field. Are there any performance reasons that I am not aware of? getName() also wraps the characters in a String before returning the name. Isn't it better to just use a String?

In general, yes. I suspect char[] was used in Thread for performance reasons, back in the days when such things in Java required every effort to get decent performance. With the advent of modern JVMs, such micro-optimizations have long since become unimportant, but it's just been left that way.
There's a lot of weird code in the old Java 1.0-era source; I wouldn't pay too much attention to it.

Hard to say. Perhaps they had some optimizations in mind, perhaps the person who wrote this code was simply more used to C-style char* arrays for strings, or perhaps at the time this code was written they were not sure whether strings would be immutable. But with this code, every call to Thread.getName() creates a new String, which copies the char array, so this code is actually heavier on the GC than just using a String.
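To make the allocation point concrete, here is a minimal sketch of the pattern the question describes. NamedTask is a hypothetical stand-in, not the JDK's Thread source; it only mirrors the char[] field plus the copy in the setter and getter.

```java
// Hypothetical stand-in for a class that keeps its name in a char[]; not JDK code.
class NamedTask {
    private char[] name;

    void setName(String name) {
        // toCharArray() allocates a fresh array and copies the characters
        this.name = name.toCharArray();
    }

    String getName() {
        // new String(char[]) copies the array again, so every call allocates
        return new String(name);
    }

    public static void main(String[] args) {
        NamedTask t = new NamedTask();
        t.setName("worker-1");
        String a = t.getName();
        String b = t.getName();
        System.out.println(a.equals(b)); // true: same characters
        System.out.println(a == b);      // false: two separate String objects
    }
}
```

With a plain String field, both calls would simply return the same reference and nothing would be allocated on read.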

Maybe the reason was security? A String can be changed with reflection, so the author may have wanted a copy on every read and write. If you are doing that, you might as well use a char array for faster copying.

Related

Memory efficient, low overhead replacement for String in Java

After reading the answers to this old question, I'm curious whether there are now any frameworks that store a large number (millions) of small Strings (15-25 chars long) more efficiently than java.lang.String.
If possible, I would like to represent the strings using byte[] instead of char[].
My Strings are going to be constants, and I don't really need the numerous utility methods provided by the java.lang.String class.

Java 6 does this with -XX:+UseCompressedStrings, which is on by default in some updates.
It's not in Java 5.0 or 7. It is still listed as on by default, but it's not actually supported in Java 7. :P
Depending on what you want to do, you could write your own classes, but if you only have a few hundred MB of Strings I suspect it's not worth it.

Most likely this optimization is not worth the effort and complexity it brings with it. Either live with what the VM offers you (as Peter Lawrey suggests), or go to great lengths to build your own solution (not using java.lang.String).
There is an interface, CharSequence, that your own String class could implement. Unfortunately very few JRE methods accept a CharSequence, so be prepared to call toString() frequently on your class if you need to pass any of your 'Strings' to any other API.
You could also hack String to create your Strings in a more memory-efficient (and less GC-friendly) way. String has a package-private constructor String(offset, count, char[]) that does not copy the chars but takes the char[] as a direct reference. You could put all your strings into one big char[] array and construct the strings using reflection; this would avoid much of the overhead normally introduced by the per-String char[] array. I can't really recommend this method, since it relies on JRE private functionality.
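As a rough sketch of the CharSequence route, the class below keeps one byte per character. Latin1Text is a hypothetical name, and the restriction to ISO-8859-1 is an assumption made for the example; text outside that range would need a different encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical class: stores ISO-8859-1 text in one byte per character.
final class Latin1Text implements CharSequence {
    private final byte[] data;

    Latin1Text(String s) {
        // getBytes would silently replace unmappable characters with '?', so reject them up front
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                throw new IllegalArgumentException("not representable in ISO-8859-1: " + s);
            }
        }
        this.data = s.getBytes(StandardCharsets.ISO_8859_1);
    }

    private Latin1Text(byte[] data) {
        this.data = data;
    }

    @Override
    public int length() {
        return data.length;
    }

    @Override
    public char charAt(int index) {
        // Mask to undo sign extension of bytes above 0x7F
        return (char) (data[index] & 0xFF);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > data.length || start > end) {
            throw new IndexOutOfBoundsException("start=" + start + ", end=" + end);
        }
        return new Latin1Text(Arrays.copyOfRange(data, start, end));
    }

    @Override
    public String toString() {
        // Converting back allocates a regular String, which is the cost mentioned above
        return new String(data, StandardCharsets.ISO_8859_1);
    }
}
```

Every API that wants a real String forces a toString() call and a fresh allocation, so this only pays off if the values mostly stay inside your own code.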

Why does setName in the Thread class assign to a character array? Why not a String?

While I was studying threads in Java, I looked at the Thread.java source file. I noticed that the setName() method assigns the string to a character array called "name[]". Java has a String data type, so why are they using a character array?
In the source file the field is declared like this,
private char name[]; // why not "private String name;"
and in the setName() method,
public final void setName(String name) {
    checkAccess();
    this.name = name.toCharArray();
}
Please help me. Thanks in advance.

This name is accessed from native code, so it is easier to handle char arrays than to wrangle Java types. The core-libs-dev mailing list discussed this question a while ago; here's a link to one mail from the thread. The original question stated that "a fair amount of time goes to that Thread.setName call which I believe a significant portion is to do new char allocation and copy char array etc". Quoting bits of the answer:
There was an RFE for this way back in late 2002:
4745629 (thread) Thread.setName does needless string allocations
(don't use char[])
The initial eval in 2002 stated:
"I can't imagine that this seriously impacts the performance of any
real program. Furthermore, changing the fields in Thread is
problematic due to the close relationship of this class with the VM.
That said, it might be worth addressing this in the context of some
Thread code-cleanup."
Then in 2005 it was closed as "will not fix":
"There are dependencies on the name representation being a char array
in the JVM and this RFE must be respectfully rejected."

I think that this is most likely a historical artifact; i.e. something that was done a long time ago for reasons that are no longer relevant.
As Curtisk's comment indicates, it has been suggested that this be fixed. But it sounds like the idea has been set aside because the effort to make the fix exceeds the benefit. And it is pretty clear that the benefit of fixing this anomaly is minuscule ... unless you are creating lots of threads that do little real work.
The RFE (4745629) is no longer visible to Google, but this mailing list posting by David Holmes @ Oracle refers to it:
Xiaobin Lu said the following on 08/11/10 08:07:
Thanks for your reply. For a lot of enterprise applications (such as the
one I work for), a fair amount of time goes to that Thread.setName call
which I believe a significant portion is to do new char allocation and
copy char array etc. So I think we should give a second thought about
how we can efficiently store that field.
There was an RFE for this way back in late 2002:
4745629 (thread) Thread.setName does needless string allocations (don't
use char[])
The initial eval in 2002 stated:
"I can't imagine that this seriously impacts the performance of any real
program. Furthermore, changing the fields in Thread is problematic due
to the close relationship of this class with the VM. That said, it
might be worth addressing this in the context of some Thread code-cleanup."
Then in 2005 it was closed as "will not fix":
"There are dependencies on the name representation being a char array in
the JVM and this RFE must be respectfully rejected."
Changing both the VM and the Java code is, as you know, a real pain to
coordinate, so there would have to be some compelling performance
evidence to support this (assuming it can be changed). Personally I
agree with the initial eval above - if setName is impacting your overall
performance then your threads can not be doing too much real work and
you would seem to be creating far too many threads - so I'd be
interested to here more about the context in which this occurs.

What is more efficient for printing logs with log4j: String or StringBuilder?

I have a lot of log statements in my small Java utility, which made me think of this question: in a very big system where efficiency matters and a lot of logs are generated (using log4j), which is the better object to hold the logging messages,
String or StringBuilder?

If your choice is between
logger.debug(string1 + string2);
and
logger.debug(new StringBuilder(string1).append(string2).toString());
then there is no difference.
But if there are lots of checks and constructs like logString += <something>, then using a StringBuilder is better.
Note that the biggest efficiency issue with log4j comes from evaluating expressions without checking the log level. You always need
if (logger.isDebugEnabled())
    logger.debug(..);
A lot of CPU cycles are wasted concatenating strings and evaluating other expressions whose results are immediately discarded because the logger is set to a higher level.
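Here is a small sketch of the level-check effect. To stay self-contained it uses java.util.logging rather than log4j, so isLoggable(Level.FINE) stands in for log4j's debug-level check; GuardedLogging and describeState are made-up names for the example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch using java.util.logging as a stand-in for log4j; names are illustrative only.
class GuardedLogging {
    static int calls;

    // Stand-in for an expensive message computation
    static String describeState() {
        calls++;
        return "state=" + System.nanoTime();
    }

    // Returns how many times the message was built while FINE logging is disabled
    static int countMessageBuilds() {
        calls = 0;
        Logger logger = Logger.getLogger("guard-demo");
        logger.setLevel(Level.INFO); // FINE (debug-like) messages are disabled

        // Unguarded: the argument is built even though nothing is logged
        logger.fine("unguarded " + describeState());

        // Guarded: the message is only built when FINE is enabled
        if (logger.isLoggable(Level.FINE)) {
            logger.fine("guarded " + describeState());
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(countMessageBuilds()); // prints 1
    }
}
```

Only the unguarded call pays for building the message; the guarded one skips it entirely.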

If you're really logging that much stuff, I would imagine the volume of stuff you're outputting to, say, a file, makes more difference than which in-memory object you pick.
Which is to say: you're maybe logging too much?
That said, StringBuilder appending will be more efficient than just String appending, assuming you keep adding to a log message's contents, but I'd be very surprised if it made any noticeable difference.

Depends. Are you planning to manipulate the strings such as concatenating them together? If so, StringBuilder would be better. If not, String would be better.

Most logging systems supply some kind of mechanism to insert parameters (e.g. log.debug("foo = {}", getFooValue())). This is the preferred and most efficient way.
Some people will suggest using a StringBuilder like this: stringBuilder.append(foo).append(bar).toString(); however, this is not more efficient than foo + bar.
I cannot find an online source for it now, but I remember that if you look at the bytecode of those two code fragments, it will be identical.

If efficiency is your concern, keep your logs to the minimum required. Logging can become a major overhead in your system if you log too much, and when you do, the performance difference between String and StringBuffer fades into insignificance.

I would think that if you can pass it as a String you are better off, because you save the call to toString on the StringBuilder.
However, if you really want to improve efficiency you might consider passing something like a String Supplier. This would allow you to check the logging level prior to producing the log message in the cases where this is not simply a String.
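One way to sketch the supplier idea is with java.util.logging, whose Logger has had overloads taking a Supplier<String> since Java 8. This is not log4j, and SupplierLogging and expensiveMessage are names invented for the example.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Sketch using java.util.logging; the class and method names are illustrative only.
class SupplierLogging {
    static int built;

    // Stand-in for a message that is costly to produce
    static String expensiveMessage() {
        built++;
        return "details #" + built;
    }

    // Returns how many times the message was actually produced
    static int countSupplierCalls() {
        built = 0;
        Logger logger = Logger.getLogger("supplier-demo");
        logger.setLevel(Level.INFO);

        // FINE is below INFO, so the supplier is never invoked
        logger.fine(() -> expensiveMessage());

        // INFO is enabled, so the supplier is invoked once and the message is logged
        logger.info(() -> expensiveMessage());
        return built;
    }

    public static void main(String[] args) {
        System.out.println(countSupplierCalls()); // prints 1
    }
}
```

The level check happens inside the logger, so the call site stays a single line without an explicit if.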

String concatenation can be very expensive. StringBuilder takes care of part of this. However, a much better approach is to make use of String.format. This method takes a pattern String and a variable number of arguments, which are converted to Strings in a null-safe way for substitution.

What's a real-world example of using StringBuffer?

I'm using Java 6.
I've only written a couple of multi-threaded applications so I've never encountered a time when I had several threads accessing the same StringBuffer.
Could somebody give me a real world example when StringBuffer might be useful?
Thanks.
EDIT: Sorry I think I wasn't clear enough. I always use StringBuilder because in my applications, only one thread accesses the string at a time. So I was wondering what kind of scenario would require multiple threads to access StringBuffer at the same time.

The only real-world example I can think of is if you are targeting Java versions before 1.5. The StringBuilder class was introduced in 1.5, so for older versions you have to use StringBuffer instead.
In most other cases StringBuilder should be preferred to StringBuffer for performance reasons: the extra thread safety provided by StringBuffer is rarely required. I can't think of any obvious situations where a StringBuffer would make more sense. Perhaps there are some, but I can't think of one right now.
In fact it seems that even the Java library authors admit that StringBuffer was a mistake:
Evaluation by the libraries team:
It is by design that StringBuffer and StringBuilder share no
common public supertype. They are not intended to be alternatives:
one is a mistake (StringBuffer), and the other (StringBuilder)
is its replacement.
If StringBuilder had been added to the library first StringBuffer would probably never have been added. If you are in the situation that multiple threads appending to the same string seems like a good idea you can easily get thread safety by synchronizing access to a StringBuilder. There's no need for a whole extra class and all the confusion it causes.
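For what it's worth, synchronizing access to a StringBuilder can look roughly like this. SharedLog is a hypothetical wrapper written for this example; the only important part is that every access goes through the same lock.

```java
// Hypothetical wrapper: a StringBuilder guarded by the wrapper's own monitor.
class SharedLog {
    private final StringBuilder sb = new StringBuilder();

    // Every read and write takes the same lock
    synchronized void append(String s) {
        sb.append(s);
    }

    synchronized int length() {
        return sb.length();
    }

    // Two threads append 1000 one-character strings each; returns the final length
    static int appendFromTwoThreads() {
        SharedLog log = new SharedLog();
        Runnable writer = () -> {
            for (int i = 0; i < 1000; i++) {
                log.append("x");
            }
        };
        Thread t1 = new Thread(writer);
        Thread t2 = new Thread(writer);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return log.length();
    }

    public static void main(String[] args) {
        System.out.println(appendFromTwoThreads()); // prints 2000
    }
}
```

A wrapper like this also lets you make a whole multi-part message atomic by appending it inside one synchronized method, which StringBuffer's per-call locking cannot do.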
It also might be worth noting that the .NET base class library which is heavily inspired by Java's libraries has a StringBuilder class but no StringBuffer and I've never seen anyone complaining about that.

A simple case can be when you have a log file and multiple threads are logging errors or warnings and writing to that log file.

In general, these types of buffered string objects are useful when you are dynamically building strings. They attempt to minimize the amount of memory allocation and deallocation that is created when you continually append strings of a fixed size together.
As a real-world example, imagine you are manually building the HTML for a page, where you do roughly 100 string appends. If you did this with immutable strings, the Java virtual machine would do quite a bit of memory allocation and deallocation, whereas with a StringBuffer it would do far less.
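A minimal sketch of that HTML case follows. It uses StringBuilder rather than StringBuffer because only one thread is involved; HtmlList and render are names made up for the example, and the items are assumed to need no escaping.

```java
// Illustrative only: builds a small HTML list with one growing buffer.
class HtmlList {
    static String render(String[] items) {
        StringBuilder html = new StringBuilder("<ul>");
        for (String item : items) {
            // Each append writes into the same buffer instead of creating a new String
            html.append("<li>").append(item).append("</li>");
        }
        return html.append("</ul>").toString();
    }

    public static void main(String[] args) {
        System.out.println(render(new String[] {"a", "b"}));
        // prints <ul><li>a</li><li>b</li></ul>
    }
}
```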

StringBuffer is a very popular choice with programmers.
It has the advantage over standard String objects, in that it is not an immutable object. Therefore, if a value is appended to the StringBuffer, a new object is not created (as it would be with String), but simply appended to the end.
This gives StringBuffers (under certain situations that cannot be compensated by the compiler) a performance advantage.
I tend to use StringBuffers anywhere that I dynamically add data to a string output, such as a log file writer, or other file generation.
The other alternative is StringBuilder. However, this is not thread-safe; it was deliberately designed that way to offer even better performance in single-threaded applications. Apart from the method signatures containing the synchronized keyword in StringBuffer, the classes are almost identical.
StringBuilder is recommended over StringBuffer in single-threaded applications, however, due to the performance gains (or, if you look at it the other way around, due to the performance overhead of StringBuffer).

Is encapsulating Strings as byte[] in order to save memory overkill ? (Java)

Was recently reviewing some Java Swing code and saw this:
byte[] fooReference;
String getFoo() {
    return new String(fooReference);
}
void setFoo(String foo) {
    this.fooReference = foo.getBytes();
}
The above can be useful to save on your memory footprint, or so I'm told.
Is this overkill? Is anyone else encapsulating their Strings in this way?

That's a really, really bad idea. Don't use the platform default encoding. There's nothing to say that if you call setFoo and then getFoo that you'll get back the same data.
If you must do something like this, then use UTF-8 which can represent the whole of Unicode for certain... but I really wouldn't do it. It potentially saves some memory, but at the cost of performing conversions unnecessarily for most of the time - and being error-prone, in terms of failing to use an appropriate encoding.
I dare say there are some applications where this would be appropriate, but for 99.99% of them, it's a terrible idea.
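To illustrate the encoding risk, here is a small sketch that round-trips a string through an explicit charset. US-ASCII is used here only as an example of a charset that cannot represent every character; the platform default could be any charset, which is exactly the problem. EncodingRoundTrip is a name invented for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: encode to bytes and decode again with the same charset.
class EncodingRoundTrip {
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String text = "5 \u20ac"; // "5 " followed by the euro sign

        // UTF-8 can represent all of Unicode, so the text survives
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true

        // US-ASCII has no euro sign; getBytes substitutes '?'
        System.out.println(roundTrip(text, StandardCharsets.US_ASCII)); // prints 5 ?
    }
}
```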

This is not really useful:
1. You are copying the string every time getFoo or setFoo is called, therefore increasing both CPU and memory usage
2. It's obscure

A little historical excursion...
Using byte arrays instead of String objects actually used to have some considerable advantages in the early days of Java (1.0/1.1) if you could be sure that you would never need anything outside of ISO-8859-1. With the VMs of that time it was more than 10 times faster to use drawBytes() compared to drawString() and it actually does save memory which was still very scarce at that time and applets used to have a hard coded memory barrier of 32 and later 64 MB anyway. Not only is a byte[] smaller than the embedded char[] of String objects but you could also save the comparatively heavy String object itself which did make quite a difference if you had lots of short strings. Besides that accessing a plain byte array is also faster than using the accessor methods of String with all their extra bounds checks.
But since drawBytes ceased to be any faster in Java 1.2 and since current JITs are much better than the Symantec JIT of that time the remaining minimal performance advantage of byte[] arrays over strings is no longer worth the hassle. The memory advantage is still there and it might thus still be an option in some very rare extreme scenarios but nowadays it's nothing that should be considered if it's not really necessary.

It may well be overkill, and it may even consume more memory, since you now have two copies of the string. How long the actual string lives depends upon the client, but as with many such hacks, it smells a lot like premature optimization.

If you anticipate that you'll have a lot of identical strings, another much better way you can save memory is with the String.intern() method.
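A quick sketch of what intern() does with equal strings. InternDemo is a name made up for the example, and new String(...) is used only to force two distinct objects.

```java
// Illustrative only: equal strings collapse to one canonical instance after intern().
class InternDemo {
    static boolean sameObject(String a, String b) {
        return a == b; // reference comparison on purpose
    }

    public static void main(String[] args) {
        String a = new String("hello");
        String b = new String("hello");

        System.out.println(sameObject(a, b));                   // false: two objects
        System.out.println(sameObject(a.intern(), b.intern())); // true: one shared instance
    }
}
```

The saving only materializes if you keep the interned reference and let the duplicates become garbage.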

Each call to getFoo() instantiates a new String. How is this saving memory? If anything, you're adding overhead for your garbage collector, which has to clean up these new instances once they become unreferenced.

This does indeed not make any sense. If it were a compile time constant which you don't need to massage back to a String, then it would make a bit more sense. You still have the character encoding problem.
It would make more sense to me if it were a char[] constant. In the real world there are several JSP compilers which optimize String constants away into a char[], which in turn can easily be written with Writer#write(char[]). This ends up "slightly" more efficient, but those little bits count a lot in large and heavily used applications like Google Search and so on.
Tomcat's JSP compiler Jasper does this as well. Check the genStringAsCharArray setting. It then generates
static final char[] text1 = "some static text".toCharArray();
instead of
static final String text1 = "some static text";
which ends up with less overhead. It doesn't need a whole String instance around those characters.
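As a sketch of how such a constant gets used, the snippet below writes a char[] straight to a Writer. This is not Jasper's generated code; StaticText and its methods are hypothetical, and a StringWriter stands in for the response writer.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Hypothetical example of static template text kept as a char[] constant.
class StaticText {
    static final char[] TEXT1 = "some static text".toCharArray();

    static void render(Writer out) throws IOException {
        // Writer.write(char[]) takes the array directly; no String is involved here
        out.write(TEXT1);
    }

    // Convenience for the demo: render into memory and return the result
    static String renderToString() {
        StringWriter sw = new StringWriter();
        try {
            render(sw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sw.toString();
    }

    public static void main(String[] args) {
        System.out.println(renderToString()); // prints some static text
    }
}
```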

If, after profiling your code, you find that memory usage for strings is a problem, you're much better off using a general string compressor and storing compressed strings, rather than trying to use UTF-8 strings for the minor reduction in space they give you. With English language strings, you can generally compress them to 1-2 bits per character; most other languages are probably similar. Getting to <1 bit per character is hard, but possible if you have a lot of data.
