I'm using Java 6.
I've only written a couple of multi-threaded applications so I've never encountered a time when I had several threads accessing the same StringBuffer.
Could somebody give me a real world example when StringBuffer might be useful?
Thanks.
EDIT: Sorry I think I wasn't clear enough. I always use StringBuilder because in my applications, only one thread accesses the string at a time. So I was wondering what kind of scenario would require multiple threads to access StringBuffer at the same time.
The only real world example I can think of is if you are targeting Java versions before 1.5. The StringBuilder class was introduced in 1.5, so for older versions you have to use StringBuffer instead.
In most other cases StringBuilder should be preferred to StringBuffer for performance reasons - the extra thread safety provided by StringBuffer is rarely required. I can't think of any obvious situation where a StringBuffer would make more sense; perhaps there are some, but none come to mind right now.
In fact it seems that even the Java library authors admit that StringBuffer was a mistake:
Evaluation by the libraries team:
It is by design that StringBuffer and StringBuilder share no
common public supertype. They are not intended to be alternatives:
one is a mistake (StringBuffer), and the other (StringBuilder)
is its replacement.
If StringBuilder had been added to the library first, StringBuffer would probably never have been added. If you are in a situation where multiple threads appending to the same string seems like a good idea, you can easily get thread safety by synchronizing access to a StringBuilder, as the sketch below shows. There's no need for a whole extra class and all the confusion it causes.
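For example, a minimal sketch of that approach (the class and method names are my own, not any library API): all access to the shared StringBuilder goes through the same lock.

public class SharedReport {
    private final StringBuilder sb = new StringBuilder();

    // Every caller synchronizes on the same monitor, so appends never interleave.
    public void append(String line) {
        synchronized (sb) {
            sb.append(line).append('\n');
        }
    }

    public String snapshot() {
        synchronized (sb) {
            return sb.toString();
        }
    }
}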
It might also be worth noting that the .NET base class library, which is heavily inspired by Java's libraries, has a StringBuilder class but no StringBuffer, and I've never seen anyone complain about that.
A simple case can be when you have a log file and multiple threads are logging errors or warnings and writing to that log file.
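A hedged sketch of that scenario (the class and worker setup are my own illustration): several threads append to one shared StringBuffer, and because StringBuffer's methods are synchronized, each individual append is atomic.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class BufferedLogDemo {
    // Shared buffer; StringBuffer synchronizes each call internally.
    private static final StringBuffer LOG = new StringBuffer();

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            final int id = i;
            pool.execute(new Runnable() {
                public void run() {
                    // Build the whole line first, then append it in a single
                    // synchronized call so lines from different threads don't interleave.
                    LOG.append("worker " + id + ": warning\n");
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.print(LOG);   // in a real logger this would be flushed to the file
    }
}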
In general, these types of buffered string objects are useful when you are dynamically building strings. They attempt to minimize the amount of memory allocation and deallocation that is created when you continually append strings of a fixed size together.
So for a real-world example, imagine you are manually building the HTML for a page and do roughly 100 string appends. If you did this with immutable Strings, the Java virtual machine would do quite a bit of memory allocation and deallocation, whereas with a StringBuffer it would do far less.
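Roughly like this (a sketch with invented markup, using StringBuilder, the unsynchronized equivalent recommended elsewhere in this thread for single-threaded code): every append mutates one internal buffer instead of allocating a fresh String per concatenation.

StringBuilder html = new StringBuilder(4096);   // pre-sized to avoid internal resizing
html.append("<table>");
for (int row = 0; row < 100; row++) {
    html.append("<tr><td>row ").append(row).append("</td></tr>");
}
html.append("</table>");
String page = html.toString();   // one final String instead of ~100 intermediate copies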
StringBuffer is a very popular choice with programmers.
It has the advantage over standard String objects that it is not immutable. Therefore, if a value is appended to the StringBuffer, a new object is not created (as it would be with String); the value is simply appended to the end of the existing buffer.
This gives StringBuffers (under certain situations that cannot be compensated by the compiler) a performance advantage.
I tend to use StringBuffers anywhere that I dynamically add data to a string output, such as a log file writer, or other file generation.
The other alternative is StringBuilder. However, this is not thread-safe; it was deliberately designed that way to offer even better performance in single-threaded applications. Apart from the synchronized keyword in StringBuffer's method signatures, the classes are almost identical.
StringBuilder is recommended over StringBuffer in single-threaded applications, however, due to the performance gains (or, looking at it the other way around, due to the synchronization overhead of StringBuffer).
Related
Guava contains utilities for splitting and joining Strings, but it requires the instantiation of a Splitter/Joiner object to do so. These are small objects that typically only contain the character(s) on which to split/join. Is it a good idea to maintain references to these objects in order to reuse them, or is it preferable to just create them whenever you need them and let them be garbage collected?
For example, I could implement this method in the following two ways:
String joinLines(List<String> lines) {
return Joiner.on("\n").join(lines);
}
OR
static final Joiner LINE_JOINER = Joiner.on("\n");
String joinLines(List<String> lines) {
return LINE_JOINER.join(lines);
}
I find the first way more readable, but it seems wasteful to create a new Joiner each time the method is called.
To be honest, this sounds like premature optimization to me. I agree with @Andy Turner: write whatever is easiest to understand and maintain.
If you plan to use Joiner.on("\n") in a few places, make it a well named constant; go with option two.
If you only plan to use it in your joinLines method, a constant seems overly verbose; go with option one.
It depends greatly on how often you expect the code to be called and what tradeoffs you want to make between CPU time, memory consumption and readability. Since Joiner is such a small thing, it's not going to make a huge difference either way: if you make it a constant, you save the (fairly minimal) costs of allocating it and GCing it for each call, while adding the (fairly minimal) memory consumption overhead to the program.
It also depends in part on what platform you're running the code on: if you're running on the server, typically you'll have plenty of memory so keeping a constant won't be an issue. On the other hand, if you're running on Android you're more memory constrained, but you also want to avoid unnecessary allocations since garbage collection is going to be worse and more impactful to your performance.
Personally, I tend to allocate a constant unless I know it's only going to be used some fixed number of times as opposed to repeatedly throughout the program.
Java programs can be very memory hungry. For example, a Double object takes 24 bytes: 8 bytes of data and 16 bytes of JVM-imposed overhead. In general, the objects that wrap the primitive types are very expensive.
The same happens for any collection in the Java Standard Library. There are even some counterintuitive facts such as a HashSet being more memory hungry than a HashMap, since a HashSet contains a HashMap inside (http://docs.oracle.com/javase/7/docs/api/java/util/HashSet.html).
Could you come up with some advice when modeling data and delegation of objects in high performance settings so that these "weaknesses" of Java are mitigated?
Some techniques I use to reduce memory:
Make your own IntArrayList (etc) class that prevents boxing (a sketch follows this list)
Make your own IntHashMap (etc) class where keys are primitives
Use nio's ByteBuffer to store large arrays of data efficiently (and in native memory, outside heap). It's like a byte array but contains methods to store/retrieve all primitive types from the buffer at any arbitrary offset (trade memory for speed)
Don't use pooling because pools keep unused instances explicitly alive.
Use threads sparingly; they're very memory hungry (in native memory, outside the heap)
When making substrings of big strings and discarding the original, the substrings can still refer to the original's character array, so use new String(...) on the substring to let the old big string be garbage collected.
A linear array is smaller than a multidimensional array, and if the size of all but the last dimension is a power of two, calculating indices is fastest: array[x|y<<4] for a 16xN array.
Initialize collections and StringBuilder with an initial capacity chosen such that it prevents internal reallocation in a typical circumstance.
Use StringBuilder instead of string concatenation, because the compiled class files use new StringBuilder() without initial capacity to concatenate strings.
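As promised above, a minimal IntArrayList sketch (a toy class of my own, not a library type) that stores primitives directly and so avoids one Integer object per element:

public class IntArrayList {
    private int[] data;
    private int size;

    public IntArrayList(int initialCapacity) {
        data = new int[initialCapacity];
    }

    public void add(int value) {
        if (size == data.length) {
            // Grow by doubling; guard against a zero initial capacity.
            data = java.util.Arrays.copyOf(data, Math.max(1, data.length * 2));
        }
        data[size++] = value;
    }

    public int get(int index) {
        if (index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return data[index];
    }

    public int size() {
        return size;
    }
}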
Depends on the application, but generally speaking
Lay out data structures in (parallel) arrays of primitives (a sketch follows below)
Try to make big "flat" objects, inlining otherwise sensible sub-structures
Specialize collections of primitives
Reuse objects, use object pools, ThreadLocals
Go off-heap
I cannot say these practices are "best", because they unfortunately make you suffer, losing much of the point of using Java and reducing flexibility, supportability, reliability, testability and other "good" properties of the codebase.
But they certainly allow you to lower the memory footprint and GC pressure.
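To illustrate the first bullet, a sketch of the same data in an object layout versus parallel primitive arrays (the names are invented for illustration):

// Array-of-objects layout: one object header and one reference per element.
class Point {
    double x, y;
}

// Parallel-arrays layout: two flat primitive arrays, no per-element headers or references.
class Points {
    final double[] xs;
    final double[] ys;

    Points(int n) {
        xs = new double[n];
        ys = new double[n];
    }

    // Element i is the pair (xs[i], ys[i]).
    double distanceFromOrigin(int i) {
        return Math.sqrt(xs[i] * xs[i] + ys[i] * ys[i]);
    }
}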
One of the memory problems that are easy to overlook in Java is memory leakage. Nicholas Greene already pointed you to memory profiling.
Many people assume that Java's garbage collection prevents memory leaks, but that is not actually true - all it takes is one forgotten reference somewhere to keep an object around in perpetuity. Paradoxically, trying to optimize your program may introduce more opportunities for memory leaks because you end up with more complex data structures.
One example of a memory leak is if you are implementing, for instance, a stack:
Integer stack[];
stack = new Integer[10];
int stackPtr = 0;
// a few push operations on our stack.
stack[stackPtr++] = new Integer(5);
stack[stackPtr++] = new Integer(3);
// and pop from the stack again
--stackPtr;
--stackPtr;
// at this point, the stack is logically empty, but
// the Integer objects are still referenced by the array,
// and are basically leaked.
The correct solution would have been:
stack[--stackPtr] = null;
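Wrapped up as a method (a sketch that assumes stack and stackPtr are fields of a small stack class), a pop that clears the vacated slot looks like this:

Integer pop() {
    Integer top = stack[--stackPtr];
    stack[stackPtr] = null;   // drop the array's reference so the Integer can be collected
    return top;
}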
If you have high performance constraints and need to use collections for simple types, you might take a look at some implementations of Primitive Collections for Java.
Some are:
HPPC
GNU Trove
Apache Commons Primitives
Also, as a reference take a look at this question: Why can Java Collections not directly store Primitives types?
Luís Bianchin already gave you a few libraries which implement optimal collections in Java.
Nevertheless, it seems that you are especially concerned about Java collections' memory allocation. In that case, there are a few alternatives which are quite straightforward.
Cache
You could use a cache to limit the memory the collection (the cache) can allocate. By doing that, you only load into main memory the most frequently used entries, and you don't need to load the whole data set from disk/network/whatever. I highly recommend Guava Cache as it's very well documented and pretty mature.
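A small sketch of a bounded Guava loading cache (the Record type and the loadFromDisk method are placeholders of mine, not part of Guava):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// At most 10,000 entries are held in memory; evicted entries are simply
// reloaded on demand through the loader.
LoadingCache<String, Record> cache = CacheBuilder.newBuilder()
        .maximumSize(10000)
        .build(new CacheLoader<String, Record>() {
            @Override
            public Record load(String key) throws Exception {
                return loadFromDisk(key);   // placeholder for the real lookup
            }
        });

Record r = cache.getUnchecked("some-key");   // loads on first access, cached afterwards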
Persistent Collections
Sometimes a cache is not a solution for your problem. For example, in an ETL solution, you might know you will only load each entry once. For this scenario I recommend going with persistent collections. These are disk-backed collections that are way faster than traditional databases but have nice Java APIs.
MapDB and PCollections are for me the best libraries.
Profile memory usage
On top of that, if you really want to know the actual state of your program's memory allocation, I highly recommend using a profiler. This way you will not only know how much memory your collections occupy, but also how the GC behaves over time.
In fact, you should only try an alternative to Java's collections and data structures if there is an actual memory problem, and that is something a profiler can tell you.
The JDK has a profiler called VisualVM which does a great job. Nevertheless, I recommend using a commercial profiler if you can afford it. Commercial profilers usually have a lower impact on the application's performance compared to VisualVM.
Memory-optimal data over the network
Finally, this is not strictly related to your question, but it's closely connected. In case you want to serialize your Java objects into an optimal binary representation, I recommend Google Protocol Buffers in Java. Protocol buffers are ideal for transferring data structures through the network using the least bandwidth possible, with really fast encoding/decoding.
Well, there are a lot of things you can do.
Here are a few problems and solutions:
When you change the value of a string in java, the string is not actually overwritten. Instead, a new string is created to replace the old one. However, the old string still exists. This can be a problem when using RAM efficiently is a concern. Here are some solutions to this problem:
When using a string to specify something like the "state" of an object or anything else that can only have a specific set of possible values, don't use a string. Instead use an enum. If you don't know what an enum is or how to use one yet, here's a link to a tutorial on what enums are and how to use them!
If you are using a string as a variable whose value will change at some point in the program, don't define a String the way you usually would. Instead, use the StringBuilder class from the java.lang package. StringBuilder is a class used to create strings and change their values, and it handles strings differently than usual. When it is used to change the value of a string, StringBuilder doesn't create a duplicate String with a different value to replace the old one; it actually changes the contents of the original buffer. Therefore, since you aren't creating duplicate Strings, this saves RAM. Here is a link to the StringBuilder class in the Java API.
Writer and Reader objects such as FileWriters and FileReaders also take up RAM. If you have a lot of them, this can also cause problems. Here are some solutions:
All Reader and Writer objects have a method called close(). As you can probably guess, it closes the writer or reader and releases the resources it holds. Whenever you have a reader or writer object and you reach the point in your code where you know you will never use it again, call this method; it lets the object be garbage collected and frees some RAM.
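On Java 7 and later the usual way to guarantee that is try-with-resources; a short sketch (the file name and the process method are placeholders of mine):

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// try-with-resources closes the reader automatically, even if an exception is thrown.
try (BufferedReader reader = new BufferedReader(new FileReader("data.txt"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        process(line);   // placeholder for whatever you do with each line
    }
} catch (IOException e) {
    e.printStackTrace();
}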
Every object in java takes up memory. When you have an object that you won't use anymore, it's not very convenient to keep it around.
The Object class has a method called finalize(). This method has the same effect as the close() method in reader and writer objects. When you aren't going to use an object anymore, use the finalize() method to get rid of it and free some RAM.
Beware of early optimisation.
See When is optimisation premature?
While not knowing the exact requirements of your application or runtime environment, in my experience Java was able to handle anything I threw at it. Doing some profiling on your demo / proof-of-concept app might be time well spent if performance or garbage collection (you tagged memory leaks) is an issue.
I am aware that Strings are immutable and I know when to use a StringBuilder or a StringBuffer. I also read that the bytecode for these two snippets would end up being the same:
//Snippet 1
String variable = "text";
this.class.getResourceAsStream("string"+variable);
//Snippet 2
StringBuilder sb = new StringBuilder("string");
sb.append("text");
getClass().getResourceAsStream(sb.toString());
But I obviously have something wrong. When debugging through Snippet 1 in Eclipse, I am actually taken to the StringBuilder constructor and to the append method. I suppose I'm missing details on how bytecode is interpreted and how the debugger refers back to the lines in the source code; if anyone could explain this a bit, I'd really appreciate it. Also, maybe you can point out what's JVM-specific and what isn't (I'm, for example, running Oracle's v6). Thanks!
Why do StringBuilders pop up when debugging String concatenation?
Because string concatenation (via the '+' operator) is typically compiled to code that uses a StringBuffer or StringBuilder to do the concatenation. The JLS explicitly permits this behaviour.
"An implementation may choose to perform conversion and concatenation in one step to avoid creating and then discarding an intermediate String object. To increase the performance of repeated string concatenation, a Java compiler may use the StringBuffer class or a similar technique to reduce the number of intermediate String objects that are created by evaluation of an expression." JLS 15.18.1.
(If your code is using a StringBuffer rather than a StringBuilder, it is probably because it was compiled using a really old Java compiler, or because you have specified a really old target JVM. The StringBuilder class is a relatively recent addition to Java. Older versions of the JLS used to mention StringBuffer instead of StringBuilder, IIRC.)
Also, maybe you can point out what's JVM specific and what isn't.
The bytecodes produced for "string" + variable depend on how the Java compiler handles the concatenation. (In fact, all generated bytecodes are Java-compiler dependent to some degree. The JLS and JVM specs do not dictate what bytecodes must be generated. The specifications are more about how the program should behave and what individual bytecodes do.)
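With a typical modern javac, the expression in Snippet 1 compiles to roughly the shape below (a hedged sketch of the generated code, not literal decompiler output; the exact constructor and append sequence can vary between compilers):

String variable = "text";
// What getResourceAsStream("string" + variable) effectively becomes:
String concatenated = new StringBuilder("string").append(variable).toString();
getClass().getResourceAsStream(concatenated);

This is why stepping through the concatenation in a debugger lands you inside StringBuilder.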
@supercat comments:
I wonder why string concatenation wouldn't use e.g. a String constructor overload which accepts two String objects, allocates a buffer of the proper combined size, and joins them? Or, when joining more strings, an overload which takes a String[]? Creating a String[] containing references to the strings to be joined should be no more expensive than creating a StringBuilder, and being able to create a perfect-sized backing store in one shot should be an easy performance win.
Maybe ... but I'd say probably not. This is a complicated area involving complicated trade-offs. The chosen implementation strategy for string concatenation has to work well across a wide range of different use-cases.
My understanding is that the original strategy was chosen after looking at a number of approaches, and doing some large-scale static code analysis and benchmarking to try to figure out which approach was best. I imagine they considered all of the alternatives that you proposed. (After all, they were / are smart people ...)
Having said that, the complete source code base for Java 6, 7 and 8 are available to you. That means that you could download it, and try some experiments of your own to see if your theories are right. If they are ... and you can gather solid evidence that they are ... then submit a patch to the OpenJDK team.
@StephenC I am still not convinced by the explanation. The compiler may do whatever optimization it wants, but when you debug through Eclipse, the source view should stay on your own code; it should not jump from one section of code to another within the same source file.
The following description in the question suggests that the source code and byte code are not in sync, i.e., he is not running the latest code.
When debugging through Snippet 1 in eclipse, I am actually taken to the StringBuffer constructor and to the append method
and
how the debugger refers back to the lines in the source code
In Thread.java, line 146, I noticed that the author used a char[] instead of a String for the name field. Are there any performance reasons that I am not aware of? getName() also wraps the characters in a String before returning the name. Isn't it better to just use a String?
In general, yes. I suspect char[] was used in Thread for performance reasons, back in the days when such things in Java required every effort to get decent performance. With the advent of modern JVMs, such micro-optimizations have long since become unimportant, but it's just been left that way.
There's a lot of weird code in the old Java 1.0-era source, I wouldn't pay too much attention to it.
Hard to say. Perhaps they had some optimizations in mind, perhaps the person who wrote this code was simply more used to C-style char* arrays for strings, or perhaps by the time this code was written they were not sure whether Strings would be immutable or not. But with this code, every time Thread.getName() is called a new copy of the characters is created, so this code is actually heavier on the GC than just using a String.
Maybe the reason was security protection? A String can be changed with reflection, so the author wanted copy-on-read and copy-on-write. If you are doing that, you might as well use a char array for faster copying.
Was recently reviewing some Java Swing code and saw this:
byte[] fooReference;
String getFoo() {
return new String(fooReference);
}
void setFoo(String foo) {
this.fooReference = foo.getBytes();
}
The above can be useful to save on your memory footprint, or so I'm told.
Is this overkill? Is anyone else encapsulating their Strings in this way?
That's a really, really bad idea. Don't use the platform default encoding. There's nothing to say that if you call setFoo and then getFoo that you'll get back the same data.
If you must do something like this, then use UTF-8, which can represent the whole of Unicode for certain... but I really wouldn't do it. It potentially saves some memory, but at the cost of performing unnecessary conversions most of the time, and of being error-prone in terms of failing to use an appropriate encoding.
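If you were going to do it anyway, a sketch of the charset-explicit version (StandardCharsets is Java 7+; on Java 6 you would pass a Charset obtained from Charset.forName instead):

import java.nio.charset.StandardCharsets;

byte[] fooReference;

String getFoo() {
    return new String(fooReference, StandardCharsets.UTF_8);
}

void setFoo(String foo) {
    this.fooReference = foo.getBytes(StandardCharsets.UTF_8);
}

That removes the platform-default-encoding problem, but the conversion cost and the extra copies remain.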
I dare say there are some applications where this would be appropriate, but for 99.99% of them, it's a terrible idea.
This is not really useful:
1. You are copying the string every time getFoo or setFoo are called, therefore increasing both CPU and memory usage
2. It's obscure
A little historical excursion...
Using byte arrays instead of String objects actually used to have some considerable advantages in the early days of Java (1.0/1.1), if you could be sure that you would never need anything outside ISO-8859-1. With the VMs of that time it was more than 10 times faster to use drawBytes() compared to drawString(), and it actually did save memory, which was still very scarce at that time; applets used to have a hard-coded memory barrier of 32 and later 64 MB anyway. Not only is a byte[] smaller than the embedded char[] of String objects, but you could also save the comparatively heavy String object itself, which made quite a difference if you had lots of short strings. Besides that, accessing a plain byte array is also faster than using the accessor methods of String with all their extra bounds checks.
But drawBytes ceased to be any faster in Java 1.2, and since current JITs are much better than the Symantec JIT of that time, the remaining minimal performance advantage of byte[] arrays over Strings is no longer worth the hassle. The memory advantage is still there, so it might still be an option in some very rare extreme scenarios, but nowadays it's nothing that should be considered unless it's really necessary.
It may well be overkill, and it may even consume more memory, since you now have two copies of the string. How long the actual string lives depends upon the client, but as with many such hacks, it smells a lot like premature optimization.
If you anticipate that you'll have a lot of identical strings, another much better way you can save memory is with the String.intern() method.
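For instance (a tiny sketch; readNextField is a placeholder of mine for whatever produces the repeated values):

// intern() returns the canonical instance from the JVM's string pool, so many
// records holding the same value can share a single String object.
String value = readNextField();
String shared = value.intern();   // store this reference instead of 'value'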
Each call to getFoo() instantiates a new String. How is this saving memory? If anything, you're adding overhead for your garbage collector, which has to clean up these new instances once they become unreferenced.
This indeed doesn't make any sense. If it were a compile-time constant which you didn't need to massage back into a String, then it would make a bit more sense. You'd still have the character-encoding problem, though.
It would make more sense to me if it were a char[] constant. In the real world there are several JSP compilers which optimize String constants away into a char[], which in turn can easily be written to a Writer#write(char[]). This is ultimately "slightly" more efficient, but those little bits count a lot in large and heavily used applications like Google Search and so on.
Tomcat's JSP compiler Jasper does this as well. Check the genStringAsCharArray setting. It then generates
static final char[] text1 = "some static text".toCharArray();
instead of
static final String text1 = "some static text";
which ends up with less overhead. It doesn't need a whole String instance around those characters.
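Roughly, a generated class then writes the constant like this (a simplified sketch of the idea, not Jasper's actual output):

import java.io.IOException;
import java.io.Writer;

class StaticText {
    // Materialized once as a char[]; no String instance needs to be kept for it.
    static final char[] TEXT1 = "some static text".toCharArray();

    static void writeTo(Writer out) throws IOException {
        out.write(TEXT1);   // the characters go to the Writer without an intermediate String
    }
}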
If, after profiling your code, you find that memory usage for strings is a problem, you're much better off using a general string compressor and storing compressed strings, rather than trying to use UTF-8 strings for the minor reduction in space they give you. With English language strings, you can generally compress them to 1-2 bits per character; most other languages are probably similar. Getting to <1 bit per character is hard, but possible if you have a lot of data.