I am trying to execute the lines of code below. My understanding was that both #1 and #2 should generate strings in the String pool, and hence there should not be any difference between the two executions. But when I analysed the heap dump, in the case of intern() the strings were being generated in the String pool (as can be inferred from the limited number of String objects), whereas in the case of #1 the strings are being generated on the heap (as there is a large number of String objects in the heap dump) and the system runs out of memory faster than in the previous case. Can somebody explain why this is so? I am using Java 6 to execute the lines of code below.
import java.util.LinkedList;
public class LotsOfStrings {
private static final LinkedList<String> LOTS_OF_STRINGS = new LinkedList<String>();
public static void main(String[] args) throws Exception {
int iteration = 0;
while (true) {
for (int i = 0; i < 100; i++) {
for (int j = 0; j < 1000; j++) {
String s= "String " + j;
LOTS_OF_STRINGS.add(s); // #1
//LOTS_OF_STRINGS.add(new String("String " + j).intern()); //#2
}
}
iteration++;
System.out.println("Survived Iteration: " + iteration);
Thread.sleep(100);
}
}
}
[Screenshot: heap dump objects when using intern() (#2)]
[Screenshot: heap dump objects in case #1]
If you create a String without interning it, it just goes to the heap. So there can be multiple copies of equal strings. If you intern the string, there will be only one string for each equality class.
Creating the string "String" + j multiple times for the same j is much more memory consuming without interning the strings.
Interning saves memory, but it also can slow down the program, because every string is held in some kind of HashSet and creating a string implies looking up if it already exists in that HashSet.
Note: Some strings are interned automatically, e.g. String literals in Source Code.
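The difference can be observed without a heap dump by counting distinct objects by identity. This is only a minimal sketch: the class name InternIdentityDemo, the method distinctObjects, and the loop sizes are assumptions made for illustration, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

class InternIdentityDemo {
    // Builds rounds * perRound strings the same way the question does and
    // returns how many distinct String objects (by identity) were retained.
    static int distinctObjects(int rounds, int perRound, boolean intern) {
        List<String> kept = new ArrayList<String>();
        for (int i = 0; i < rounds; i++) {
            for (int j = 0; j < perRound; j++) {
                String s = "String " + j; // runtime concatenation creates a new object
                kept.add(intern ? s.intern() : s);
            }
        }
        // An identity-based set counts objects, not equal values.
        Set<String> identities =
                Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        identities.addAll(kept);
        return identities.size();
    }

    public static void main(String[] args) {
        System.out.println("without intern: " + distinctObjects(10, 100, false));
        System.out.println("with intern:    " + distinctObjects(10, 100, true));
    }
}
```

Without intern() every retained element is its own object, so the count equals rounds * perRound; with intern() only one object per distinct value is retained.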
I have a task which is to create a memory efficient method that takes a String consisting of numbers and removes any beginning zeros.
For instance "001112" becomes "1112".
public static String hej (String v)
{
StringBuilder h = new StringBuilder(v);
// Check the length first so an empty or all-zero input does not throw.
while (h.length() > 0 && h.charAt(0) == '0')
{
h.deleteCharAt(0);
}
return h.toString();
}
This is my solution. Of course it does the job, but my question is: is it memory efficient to use the StringBuilder, or is it more efficient to use the String itself with, for instance, v.substring()? I can't find much information about which is more efficient. If anyone has links to some documentation, please share them.
Cheers
Using the String.substring(int) method will use the least memory:
public static String hej(String input)
{
int i;
for(i = 0; i < input.length(); i++)
if(input.charAt(i) != '0')
break;
return input.substring(i);
}
Source code from String:
public String substring(int beginIndex) {
if (beginIndex < 0) {
throw new StringIndexOutOfBoundsException(beginIndex);
}
int subLen = value.length - beginIndex;
if (subLen < 0) {
throw new StringIndexOutOfBoundsException(subLen);
}
return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
}
This calls the String(char[], int, int) constructor
public String(char value[], int offset, int count) {
if (offset < 0) {
throw new StringIndexOutOfBoundsException(offset);
}
if (count <= 0) {
if (count < 0) {
throw new StringIndexOutOfBoundsException(count);
}
if (offset <= value.length) {
this.value = "".value;
return;
}
}
// Note: offset or count might be near -1>>>1.
if (offset > value.length - count) {
throw new StringIndexOutOfBoundsException(offset + count);
}
this.value = Arrays.copyOfRange(value, offset, offset+count);
}
Using a StringBuilder costs some extra memory, because the StringBuilder allocates a buffer sized for the whole input, while String.substring(int) uses only as much memory as is needed to represent the modified input.
If your string has n leading zeros and you strip them one at a time with String operations, each step creates a new String, so you allocate about n intermediate strings. A String allocates new memory every time its contents change, so StringBuilder is the way to go.
Keep in mind
Every string builder has a capacity. As long as the length of the
character sequence contained in the string builder does not exceed the
capacity, it is not necessary to allocate a new internal buffer. If
the internal buffer overflows, it is automatically made larger.
Oracle Docs
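The quoted capacity behaviour can be observed through StringBuilder.capacity(). The following is a small sketch; the class name CapacityDemo and the helper countResizes are made up for illustration, and the exact growth policy is an implementation detail, so the code only counts capacity changes rather than assuming specific sizes.

```java
class CapacityDemo {
    // Appends n characters to a builder created with the given initial
    // capacity and returns how many times capacity() changed on the way.
    static int countResizes(int initialCapacity, int n) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        int resizes = 0;
        int lastCapacity = sb.capacity();
        for (int i = 0; i < n; i++) {
            sb.append('x');
            if (sb.capacity() != lastCapacity) {
                // The internal buffer overflowed and was replaced by a larger one.
                resizes++;
                lastCapacity = sb.capacity();
            }
        }
        return resizes;
    }

    public static void main(String[] args) {
        System.out.println("resizes starting at 16:   " + countResizes(16, 1000));
        System.out.println("resizes starting at 1000: " + countResizes(1000, 1000));
    }
}
```

If the final length is known up front, passing it to the constructor avoids the intermediate buffers entirely.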
So
String
String is immutable ( once created can not be changed )object . The
object created as a String is stored in the Constant String Pool .
Every immutable object in Java is thread safe ,that implies String is
also thread safe . String can not be used by two threads
simultaneously. String once assigned can not be changed.
String demo = " hello " ; // The above object is stored in constant string pool and its value can not be modified.
demo="Bye" ; // new "Bye" string is created in constant pool and referenced by the demo variable
// "hello" string still exists in string constant pool and its value is not overrided but we lost reference to the "hello" string
StringBuffer
StringBuffer is mutable means one can change the value of the object .
The object created through StringBuffer is stored in the heap .
StringBuffer has the same methods as the StringBuilder , but each
method in StringBuffer is synchronized that is StringBuffer is thread
safe .
Due to this it does not allow two threads to simultaneously access
the same method . Each method can be accessed by one thread at a time
.
But being thread safe has disadvantages too as the performance of the
StringBuffer hits due to thread safe property . Thus StringBuilder is
faster than the StringBuffer when calling the same methods of each
class.
StringBuffer value can be changed , it means it can be assigned to the
new value . Nowadays its a most common interview question ,the
differences between the above classes . String Buffer can be converted
to the string by using toString() method.
StringBuffer demo1 = new StringBuffer("Hello") ; // The above object stored in heap and its value can be changed .
demo1=new StringBuffer("Bye"); // Above statement is right as it modifies the value which is allowed in the StringBuffer
Java Hungry
In the past, I've always used printf to format printing to the console but the assignment I currently have (creating an invoice report) wants us to use StringBuilder, but I have no idea how to do so without simply using " " for every gap needed. For example... I'm supposed to print this out
Invoice  Customer  Salesperson         Subtotal        Fees       Taxes    Discount       Total
INV001   Company   Eccleston, Chris  $  2357.60  $    40.00  $   190.19  $  -282.91  $  2304.88
But I don't know how to get everything to line up using the StringBuilder. Any advice?
StringBuilder aims to reduce the overhead associated with creating strings.
As you may or may not know, strings are immutable. What this means is that something like
String a = "foo";
String b = "bar";
String c = a + b;
String d = c + c;
creates a new string for each line. If all we are concerned about is the final string d, the line with string c is wasting space because it creates a new String object when we don't need it.
StringBuilder simply delays actually building the String object until you call .toString(). At that point, it converts an internal char[] to an actual string.
Let's take another example.
String foo() {
StringBuilder sb = new StringBuilder();
for (int i = 0; i < 100; i++)
sb.append(i);
return sb.toString();
}
Here, we only create one string. StringBuilder will keep track of the chars you have added to your string in its internal char[] value. Note that value.length will generally be larger than the total chars you have added to your StringBuilder, but value might run out of room for what you're appending if the string you are building gets too big. When that happens, it'll resize, which just means replacing value with a larger char[], and copying over the old values to the new array, along with the chars of whatever you appended.
Finally, when you call sb.toString(), the StringBuilder will call a String constructor that takes an argument of a char[].
That means only one String object was created, and we only needed enough memory for our char[] and to resize it.
Compare with the following:
String foo() {
String toReturn = "";
for (int i = 0; i < 100; i++)
toReturn += "" + i;
return toReturn;
}
Here, we have 101 string objects created (maybe more, I'm unsure). We only needed one though! This means that at every call, we're disposing the original string toReturn represented, and creating another string.
With a large string, especially, this is very expensive, because at every call you need to first acquire as much memory as the new string needs, and dispose of as much memory as the old string had. It's not a big deal when things are kept short, but when you're working with entire files this can easily become a problem.
In a nutshell: if you're appending or removing information before finalizing an output, use a StringBuilder. If your strings are very short, I think it is OK to just concatenate normally for convenience, but it is up to you to define what "short" is.
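Coming back to the alignment part of the question: StringBuilder itself has no padding support, so one option is to format each row with String.format and append the result. This is only a sketch; the column widths and the names InvoiceFormatDemo, row and report are assumptions, not anything prescribed by the assignment.

```java
import java.util.Locale;

class InvoiceFormatDemo {
    // Column widths are assumptions chosen to fit the sample data.
    private static final String HEADER_FORMAT =
            "%-8s %-10s %-18s %11s %11s %11s %11s %11s%n";
    private static final String ROW_FORMAT =
            "%-8s %-10s %-18s $%10.2f $%10.2f $%10.2f $%10.2f $%10.2f%n";

    // Formats one invoice line; "-" left-aligns text, numbers are right-aligned.
    static String row(String invoice, String customer, String salesperson,
                      double subtotal, double fees, double taxes,
                      double discount, double total) {
        return String.format(Locale.US, ROW_FORMAT, invoice, customer, salesperson,
                subtotal, fees, taxes, discount, total);
    }

    // Builds the whole report in a single StringBuilder.
    static String report() {
        StringBuilder sb = new StringBuilder();
        sb.append(String.format(HEADER_FORMAT, "Invoice", "Customer", "Salesperson",
                "Subtotal", "Fees", "Taxes", "Discount", "Total"));
        sb.append(row("INV001", "Company", "Eccleston, Chris",
                2357.60, 40.00, 190.19, -282.91, 2304.88));
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Each header column is one character wider than the numeric width below it to account for the "$" sign, so header and rows end up the same length.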
question - Arrange all the distinct substrings of a given string in lexicographical order and concatenate them. Print the Kth character of the concatenated string. It is assured that given value of K will be valid i.e. there will be a Kth character
Input Format
First line will contain a number T i.e. number of test cases.
First line of each test case will contain a string containing characters (a−z) and second line will contain a number K.
Output Format
Print Kth character ( the string is 1 indexed )
Constraints
1≤T≤5
1≤length≤105
K will be an appropriate integer.
Sample Input #00
1
dbac
3
Sample Output #00
c
Explanation #00
The substrings when arranged in lexicographic order are as follows
a, ac, b, ba, bac, c, d, db, dba, dbac
On concatenating them, we get
aacbbabaccddbdbadbac
The third character in this string is c and hence the answer.
This is my code :
import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;
public class Solution
{
public static void gen(String str,int k)
{
int i,c;ArrayList<String>al=new ArrayList<String>();
for(c=0;c<str.length();c++)
{
for(i=1;i<=str.length()-c;i++)
{
String sub = str.substring(c,c+i);
al.add(sub);
}
}
HashSet hs = new HashSet();
hs.addAll(al);
al.clear();
al.addAll(hs);
String[] res = al.toArray(new String[al.size()]);
Arrays.sort(res);
StringBuilder sb= new StringBuilder();
for(String temp:res)
{
sb.append(temp);
}
String s = sb.toString();
System.out.println(s.charAt(k-1));
}
public static void main(String[] args)
{
Scanner sc = new Scanner (System.in);
int t = Integer.parseInt(sc.nextLine());
while((t--)>0)
{
String str = sc.nextLine();
int k = Integer.parseInt(sc.nextLine());
gen(str,k);
}
}
}
This code works well for small inputs like the test case above, but for large inputs it either times out or shows the error below. I understand that the problem is with memory. Is there an alternative way to solve this, or any way to reuse the same memory?
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOfRange(Arrays.java:2694)
at java.lang.String.<init>(String.java:203)
at java.lang.String.substring(String.java:1913)
at Solution.gen(Solution.java:19)
at Solution.main(Solution.java:54)
With the constraints you are given (up to 105 characters) you shouldn't be having out-of-memory problems. Perhaps you were testing with very big strings.
So in case you have, here are some places where you are wasting memory:
After you fill the set, you copy it to your list. This means two copies of the collection of substrings, while you are not going to use the set any more.
After you copy the list to an array, you now have three copies of the collection of substrings, although you are not going to use the list anymore.
Now you create a StringBuilder and put all the substrings into it. But it's not really interesting to know the entire concatenated string. We only need one character in it, so why put the concatenation in memory at all? In addition, in all the wasteful copies above, at least you didn't duplicate the substrings themselves. But now that you are appending them to the StringBuilder, you are creating a duplicate of them. And that's going to be a very long string.
And then you copy the StringBuilder's content to a new string by using toString(). This creates a copy of the very large concatenated string (which we already said we don't actually need).
You already got sound advice to use a TreeSet and fill it directly rather than creating a list, a set, and a sorted list. The next step is to extract the correct character from that set without actually keeping the concatenated string around.
So, assuming your set is called set:
Iterator<String> iter = set.iterator();
int lengthSoFar = 0;
String str = null;
while ( lengthSoFar < k && iter.hasNext() ) {
str = iter.next(); // Got the next substring;
lengthSoFar += str.length();
}
// At this point we have the substring where we expect the k'th
// character to be.
System.out.println( str.charAt( k - lengthSoFar + str.length() - 1 ) );
Note that it will take the program longer to get to high values of k than low values, but generally it will be faster than building the whole concatenated string, because you'll stop as soon as you get to the correct substring.
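Put together, the approach described above could look roughly like this. It is a sketch under assumptions: the class KthCharDemo and method kthChar are invented names, k is assumed to be 1-indexed and valid, and the TreeSet still holds every distinct substring, so very long inputs will remain expensive.

```java
import java.util.Set;
import java.util.TreeSet;

class KthCharDemo {
    // Returns the k-th (1-indexed) character of the concatenation of all
    // distinct substrings of input in lexicographic order, without ever
    // building the concatenated string.
    static char kthChar(String input, int k) {
        // TreeSet removes duplicates and keeps the substrings sorted.
        Set<String> sorted = new TreeSet<String>();
        for (int start = 0; start < input.length(); start++) {
            for (int end = start + 1; end <= input.length(); end++) {
                sorted.add(input.substring(start, end));
            }
        }
        int lengthSoFar = 0; // total length of the substrings already skipped
        for (String sub : sorted) {
            if (lengthSoFar + sub.length() >= k) {
                // The k-th character falls inside this substring.
                return sub.charAt(k - lengthSoFar - 1);
            }
            lengthSoFar += sub.length();
        }
        throw new IllegalArgumentException("k exceeds the concatenated length");
    }

    public static void main(String[] args) {
        System.out.println(kthChar("dbac", 3));
    }
}
```

For the sample input "dbac" and k = 3 this selects the second character of "ac", matching the expected answer c.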
You are running out of memory. You can increase the memory available to the JVM by starting it with -Xms256m -Xmx1024m, and you can try some optimizations.
public static void gen(String str, int k) {
int i, c;
//Adding directly to the Set prevents a larger list because you remove the duplicates
Set<String> set = new TreeSet<String>();
for (c = 0; c < str.length(); c++) {
for (i = 1; i <= str.length() - c; i++) {
String sub = str.substring(c, c + i);
set.add(sub);
}
}
//TreeSet already orders by the String comparator
StringBuilder sb = new StringBuilder();
for (String temp : set) {
sb.append(temp);
if(sb.length()>k){
break;
}
}
String s = sb.toString();
System.out.println(s.charAt(k - 1));
}
[EDIT] Added small performance boost. Try it to see if it gets faster or not, I did not look at the performance of StringBuilder.length() to see if it will improve or decrease.
With the new operator, String creates the string in the heap and puts a copy in the string constant pool, so the hash codes are the same in the case below:
String s1 = new String("Test");
String s2 = new String("Test");
System.out.println(s1.hashCode() + " "+ s2.hashCode() + " " + s1.equals(s2));
But without using the new operator it still gives the same hash code:
String s1 = new String("Test");
String s2 = "Test";
System.out.println(s1.hashCode() + " "+ s2.hashCode() + " " + s1.equals(s2));
Then what is the difference between the above two notations for creating a string, given that both refer to the same string in the string constant pool?
Based on Effective Java:
It is often appropriate to reuse a single object instead of creating a
new functionally equivalent object each time it is needed. Reuse can
be both faster and more stylish. An object can always be reused if it
is immutable. As an extreme example of what not to do,
consider this statement:
String s = new String("stringette"); // DON'T DO THIS!
The statement creates a new String instance each time it is
executed, and none of those object creations is necessary. The
argument to the String constructor ("stringette") is itself a
String instance, functionally identical to all of the objects created
by the constructor. If this usage occurs in a loop or in a frequently
invoked method, millions of String instances can be created
needlessly.
The improved version is simply the following:
String s = "stringette";
This version uses a single String instance, rather than creating a new
one each time it is executed. Furthermore, it is guaranteed that the
object will be reused by any other code running in the same virtual
machine that happens to contain the same string literal.
Therefore, creating unnecessary new String objects, or any other objects, is expensive.
From the docs and the java.lang.String class, the internal implementation of hashCode():
/**
* Returns a hash code for this string. The hash code for a
* String object is computed as
*
* s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
*
* using int arithmetic, where s[i] is the
* ith character of the string, n is the length of
* the string, and ^ indicates exponentiation.
* (The hash value of the empty string is zero.)
*
* @return a hash code value for this object.
*/
public int hashCode() {
int h = hash;
int len = count;
if (h == 0 && len > 0) {
int off = offset;
char val[] = value;
for (int i = 0; i < len; i++) {
h = 31*h + val[off++];
}
hash = h;
}
return h;
}
More about String hashcode here
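Because hashCode() is computed from the characters alone, two equal strings have the same hash code regardless of whether they are the same object. A small sketch to check this; the class HashCodeDemo and the helper manualHash are illustrative names, and manualHash simply re-implements the documented formula:

```java
class HashCodeDemo {
    // Re-implements the documented formula
    // s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        String s1 = new String("Test"); // a separate object on the heap
        String s2 = "Test";             // the pooled literal
        System.out.println(s1 == s2);                           // identity comparison
        System.out.println(s1.hashCode() == s2.hashCode());     // content-based hash
        System.out.println(manualHash("Test") == s2.hashCode());
    }
}
```

So equal hash codes say nothing about whether the strings share a pool entry; only == (or a heap dump) distinguishes the two objects.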
Both expressions give you a String object, but there is a difference between them. When you create a String object using the new operator, it always creates a new object in heap memory. On the other hand, if you create an object using String literal syntax, e.g. String s2 = "Test";, it may return an existing object from the String pool (a cache of String objects in PermGen space, which has been moved to heap space in recent Java releases) if it already exists. Otherwise it will create a new String object and put it in the String pool for future reuse.
for further reading see:here
String str = new String("String");
always creates a new object on the heap.
This creates a new String whose value is that of the constant "String" and assigns its reference to the variable str.
String str = "String";
uses the String pool.
This assigns the reference associated with the constant "String" to the variable str.
This question has been asked many times on StackOverflow but none of them were based on performance.
In Effective Java book it's given that
If String s = new String("stringette"); occurs in a loop or in a
frequently invoked method, millions of String instances can be created
needlessly.
The improved version is simply the following:
String s = "stringette"; This version uses a single String instance, rather than
creating a new one each time it is executed.
So, I tried both and found significant improvement in performance:
for (int j = 0; j < 1000; j++) {
String s = new String("hello World");
}
takes about 399 372 nanoseconds.
for (int j = 0; j < 1000; j++) {
String s = "hello World";
}
takes about 23 000 nanoseconds.
Why is there so much performance improvement? Is there any compiler optimization happening inside?
In the first case, a new object is being created in each iteration, in the second case, it's always the same object, being retrieved from the String constant pool.
In Java, when you do:
String bla = new String("xpto");
You force the creation of a new String object, this takes up some time and memory.
On the other hand, when you do:
String muchMuchFaster = "xpto"; //String literal!
The String will only be created the first time (a new object), and it'll be cached in the String constant pool, so every time you refer to it in its literal form, you're getting the exact same object, which is amazingly fast.
Now you may ask... what if two different points in the code retrieve the same literal and change it, aren't there problems bound to happen?!
No, because Strings, in Java, as you may very well know, are immutable! So any operation that would mutate a String returns a new String, leaving any other references to the same literal happy on their way.
This is one of the advantages of immutable data structures, but that's another issue altogether, and I could write a couple of pages on the subject.
Edit
Just a clarification, the constant pool isn't exclusive to String types, you can read more about it here, or if you google for Java constant pool.
http://docs.oracle.com/javase/specs/jvms/se7/jvms7.pdf
Also, a little test you can do to drive the point home:
String a = new String("xpto");
String b = new String("xpto");
String c = "xpto";
String d = "xpto";
System.out.println(a == b);
System.out.println(a == c);
System.out.println(c == d);
With all this, you can probably figure out the results of these Sysouts:
false
false
true
Since c and d are the same object, the == comparison holds true.
The performance difference is in fact much greater: HotSpot has an easy time compiling the entire loop
for (int j = 0; j < 1000; j++)
{String s="hello World";}
out of existence so the runtime is a solid 0. This, however, happens only after the JIT compiler kicks in; that's what warmup is for, a mandatory procedure when microbenchmarking anything on the JVM.
This is the code I ran:
public static void timeLiteral() {
for (int j = 0; j < 1_000_000_000; j++)
{String s="hello World";}
}
public static void main(String... args) {
for (int i = 0; i < 10; i++) {
final long start = System.nanoTime();
timeLiteral();
System.out.println((System.nanoTime() - start) / 1000);
}
}
And this is a typical output:
1412
38
25
1
1
0
0
1
0
1
You can observe the JIT taking effect very soon.
Note that I don't iterate one thousand, but one billion times in the inner method.
As has already been answered, the second retrieves the instance from the String pool (remember, Strings are immutable).
Additionally, you should check the intern() method, which lets you put a new String() into the pool when you do not know the value of the string until runtime, e.g.:
String s = stringVar.intern();
or
new String(stringVar).intern();
One additional fact: a String object also caches its hash code once it has been computed, which speeds up HashMap lookups keyed by String in the relevant data structures (instead of recomputing the hash code each time).
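To see what intern() does for a string whose value is only known at runtime, consider the following sketch. The class InternLookupDemo and the helper buildAtRuntime are hypothetical names used only for this illustration.

```java
class InternLookupDemo {
    // Builds a string at runtime, so the result is not a compile-time constant
    // and is therefore not taken from the pool automatically.
    static String buildAtRuntime(String prefix, int n) {
        return new StringBuilder(prefix).append(n).toString();
    }

    public static void main(String[] args) {
        String literal = "xpto1";                  // pooled literal
        String runtime = buildAtRuntime("xpto", 1); // fresh object with equal content

        System.out.println(runtime == literal);          // different objects
        System.out.println(runtime.equals(literal));     // same content
        System.out.println(runtime.intern() == literal); // intern() returns the pooled object
    }
}
```

After interning, reference comparison against the literal succeeds because both references point at the single pooled instance.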
The JVM maintains a pool of references to unique String objects that are literals. In your new String example you are wrapping the literals with an instance of each.
See http://www.precisejava.com/javaperf/j2se/StringAndStringBuffer.htm