Related
I'm making a socket application in Java that receives some HTML data from the server in ASCII and then parse the data accordingly.
byte[] receivedContent = new byte[12500];
receivedSize = inputStream.read(receivedContent);
receivedContent = Arrays.copyOf(receivedContent, receivedSize+1);
if (receivedSize == -1) {
System.out.println("ERROR! NO DATA RECEIVED");
System.exit(-1);
}
lastReceived = new String(receivedContent, StandardCharsets.US_ASCII);
This should really be quite straight forward but it's not. I printed out some debug messages and found that despite receiving some bytes of data, (for exmaple priting receivedSize tells me its received 784 bytes), the resulting string from those bytes is only a few chars long, like this:
Ard</a></li><li><a
I'm expecting a full HTML document, and so this is clearly wrong. There's also no obvious pattern as to when might this happen. It seems totally random. Since I'm allocating new memory for the buffer there really shouldn't be any old data in it that messes with the new data from the socket. Can someone shed some light on this strange behavior? Also this seems to happen less frequently on my Windows machine running OracleJDK rather than my remote Ubunut machine that runs OpenJDK, could that be the reason and how would I fix that?
UPDATE:
at the end I manually inspected the byte array's ASCII encoding against a ASCII table and found that the server is intentionally sending garbled data. Mystery solved.
Instead of using:
inputStream.read(receivedContent);
You need to read all data from the stream. Using something like (from apache commons io):
IOUtils.readFully(inputStream, receivedContent)
In Swing, the password field has a getPassword() (returns char[]) method instead of the usual getText() (returns String) method. Similarly, I have come across a suggestion not to use String to handle passwords.
Why does String pose a threat to security when it comes to passwords?
It feels inconvenient to use char[].
Strings are immutable. That means once you've created the String, if another process can dump memory, there's no way (aside from reflection) you can get rid of the data before garbage collection kicks in.
With an array, you can explicitly wipe the data after you're done with it. You can overwrite the array with anything you like, and the password won't be present anywhere in the system, even before garbage collection.
So yes, this is a security concern - but even using char[] only reduces the window of opportunity for an attacker, and it's only for this specific type of attack.
As noted in the comments, it's possible that arrays being moved by the garbage collector will leave stray copies of the data in memory. I believe this is implementation-specific - the garbage collector may clear all memory as it goes, to avoid this sort of thing. Even if it does, there's still the time during which the char[] contains the actual characters as an attack window.
While other suggestions here seem valid, there is one other good reason. With plain String you have much higher chances of accidentally printing the password to logs, monitors or some other insecure place. char[] is less vulnerable.
Consider this:
public static void main(String[] args) {
Object pw = "Password";
System.out.println("String: " + pw);
pw = "Password".toCharArray();
System.out.println("Array: " + pw);
}
Prints:
String: Password
Array: [C#5829428e
To quote an official document, the Java Cryptography Architecture guide says this about char[] vs. String passwords (about password-based encryption, but this is more generally about passwords of course):
It would seem logical to collect and store the password in an object
of type java.lang.String. However, here's the caveat: Objects of
type String are immutable, i.e., there are no methods defined that
allow you to change (overwrite) or zero out the contents of a String
after usage. This feature makes String objects unsuitable for
storing security sensitive information such as user passwords. You
should always collect and store security sensitive information in a
char array instead.
Guideline 2-2 of the Secure Coding Guidelines for the Java Programming Language, Version 4.0 also says something similar (although it is originally in the context of logging):
Guideline 2-2: Do not log highly sensitive information
Some information, such as Social Security numbers (SSNs) and
passwords, is highly sensitive. This information should not be kept
for longer than necessary nor where it may be seen, even by
administrators. For instance, it should not be sent to log files and
its presence should not be detectable through searches. Some transient
data may be kept in mutable data structures, such as char arrays, and
cleared immediately after use. Clearing data structures has reduced
effectiveness on typical Java runtime systems as objects are moved in
memory transparently to the programmer.
This guideline also has implications for implementation and use of
lower-level libraries that do not have semantic knowledge of the data
they are dealing with. As an example, a low-level string parsing
library may log the text it works on. An application may parse an SSN
with the library. This creates a situation where the SSNs are
available to administrators with access to the log files.
Character arrays (char[]) can be cleared after use by setting each character to zero and Strings not. If someone can somehow see the memory image, they can see a password in plain text if Strings are used, but if char[] is used, after purging data with 0's, the password is secure.
Some people believe that you have to overwrite the memory used to store the password once you no longer need it. This reduces the time window an attacker has to read the password from your system and completely ignores the fact that the attacker already needs enough access to hijack the JVM memory to do this. An attacker with that much access can catch your key events making this completely useless (AFAIK, so please correct me if I am wrong).
Update
Thanks to the comments I have to update my answer. Apparently there are two cases where this can add a (very) minor security improvement as it reduces the time a password could land on the hard drive. Still I think it's overkill for most use cases.
Your target system may be badly configured or you have to assume it is and you have to be paranoid about core dumps (can be valid if the systems are not managed by an administrator).
Your software has to be overly paranoid to prevent data leaks with the attacker gaining access to the hardware - using things like TrueCrypt (discontinued), VeraCrypt, or CipherShed.
If possible, disabling core dumps and the swap file would take care of both problems. However, they would require administrator rights and may reduce functionality (less memory to use) and pulling RAM from a running system would still be a valid concern.
I don't think this is a valid suggestion, but, I can at least guess at the reason.
I think the motivation is wanting to make sure that you can erase all trace of the password in memory promptly and with certainty after it is used. With a char[] you could overwrite each element of the array with a blank or something for sure. You can't edit the internal value of a String that way.
But that alone isn't a good answer; why not just make sure a reference to the char[] or String doesn't escape? Then there's no security issue. But the thing is that String objects can be intern()ed in theory and kept alive inside the constant pool. I suppose using char[] forbids this possibility.
The answer has already been given, but I'd like to share an issue that I discovered lately with Java standard libraries. While they take great care now of replacing password strings with char[] everywhere (which of course is a good thing), other security-critical data seems to be overlooked when it comes to clearing it from memory.
I'm thinking of e.g. the PrivateKey class. Consider a scenario where you would load a private RSA key from a PKCS#12 file, using it to perform some operation. Now in this case, sniffing the password alone wouldn't help you much as long as physical access to the key file is properly restricted. As an attacker, you would be much better off if you obtained the key directly instead of the password. The desired information can be leaked manifold, core dumps, a debugger session or swap files are just some examples.
And as it turns out, there is nothing that lets you clear the private information of a PrivateKey from memory, because there's no API that lets you wipe the bytes that form the corresponding information.
This is a bad situation, as this paper describes how this circumstance could be potentially exploited.
The OpenSSL library for example overwrites critical memory sections before private keys are freed. Since Java is garbage-collected, we would need explicit methods to wipe and invalidate private information for Java keys, which are to be applied immediately after using the key.
As Jon Skeet states, there is no way except by using reflection.
However, if reflection is an option for you, you can do this.
public static void main(String[] args) {
System.out.println("please enter a password");
// don't actually do this, this is an example only.
Scanner in = new Scanner(System.in);
String password = in.nextLine();
usePassword(password);
clearString(password);
System.out.println("password: '" + password + "'");
}
private static void usePassword(String password) {
}
private static void clearString(String password) {
try {
Field value = String.class.getDeclaredField("value");
value.setAccessible(true);
char[] chars = (char[]) value.get(password);
Arrays.fill(chars, '*');
} catch (Exception e) {
throw new AssertionError(e);
}
}
when run
please enter a password
hello world
password: '***********'
Note: if the String's char[] has been copied as a part of a GC cycle, there is a chance the previous copy is somewhere in memory.
This old copy wouldn't appear in a heap dump, but if you have direct access to the raw memory of the process you could see it. In general you should avoid anyone having such access.
There is nothing that char array gives you vs String unless you clean it up manually after use, and I haven't seen anyone actually doing that. So to me the preference of char[] vs String is a bit exaggerated.
Take a look at the widely used Spring Security library here and ask yourself - are Spring Security guys incompetent or char[] passwords just don't make much sense. When some nasty hacker grabs memory dumps of your RAM be sure s/he'll get all the passwords even if you use sophisticated ways to hide them.
However, Java changes all the time, and some scary features like String Deduplication feature of Java 8 might intern String objects without your knowledge. But that's a different conversation.
Edit: Coming back to this answer after a year of security research, I realize it makes the rather unfortunate implication that you would ever actually compare plaintext passwords. Please don't. Use a secure one-way hash with a salt and a reasonable number of iterations. Consider using a library: this stuff is hard to get right!
Original answer: What about the fact that String.equals() uses short-circuit evaluation, and is therefore vulnerable to a timing attack? It may be unlikely, but you could theoretically time the password comparison in order to determine the correct sequence of characters.
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = value.length;
// Quits here if Strings are different lengths.
if (n == anotherString.value.length) {
char v1[] = value;
char v2[] = anotherString.value;
int i = 0;
// Quits here at first different character.
while (n-- != 0) {
if (v1[i] != v2[i])
return false;
i++;
}
return true;
}
}
return false;
}
Some more resources on timing attacks:
A Lesson In Timing Attacks
A discussion about timing attacks over on Information Security Stack Exchange
And of course, the Timing Attack Wikipedia page
Strings are immutable and cannot be altered once they have been created. Creating a password as a string will leave stray references to the password on the heap or on the String pool. Now if someone takes a heap dump of the Java process and carefully scans through he might be able to guess the passwords. Of course these non used strings will be garbage collected but that depends on when the GC kicks in.
On the other side char[] are mutable as soon as the authentication is done you can overwrite them with any character like all M's or backslashes. Now even if someone takes a heap dump he might not be able to get the passwords which are not currently in use. This gives you more control in the sense like clearing the Object content yourself vs waiting for the GC to do it.
String is immutable and it goes to the string pool. Once written, it cannot be overwritten.
char[] is an array which you should overwrite once you used the password and this is how it should be done:
char[] passw = request.getPassword().toCharArray()
if (comparePasswords(dbPassword, passw) {
allowUser = true;
cleanPassword(passw);
cleanPassword(dbPassword);
passw = null;
}
private static void cleanPassword (char[] pass) {
Arrays.fill(pass, '0');
}
One scenario where the attacker could use it is a crashdump—when the JVM crashes and generates a memory dump—you will be able to see the password.
That is not necessarily a malicious external attacker. This could be a support user that has access to the server for monitoring purposes. He/she could peek into a crashdump and find the passwords.
The short and straightforward answer would be because char[] is mutable while String objects are not.
Strings in Java are immutable objects. That is why they can't be modified once created, and therefore the only way for their contents to be removed from memory is to have them garbage collected. It will be only then when the memory freed by the object can be overwritten, and the data will be gone.
Now garbage collection in Java doesn't happen at any guaranteed interval. The String can thus persist in memory for a long time, and if a process crashes during this time, the contents of the string may end up in a memory dump or some log.
With a character array, you can read the password, finish working with it as soon as you can, and then immediately change the contents.
Case String:
String password = "ill stay in StringPool after Death !!!";
// some long code goes
// ...Now I want to remove traces of password
password = null;
password = "";
// above attempts wil change value of password
// but the actual password can be traced from String pool through memory dump, if not garbage collected
Case CHAR ARRAY:
char[] passArray = {'p','a','s','s','w','o','r','d'};
// some long code goes
// ...Now I want to remove traces of password
for (int i=0; i<passArray.length;i++){
passArray[i] = 'x';
}
// Now you ACTUALLY DESTROYED traces of password form memory
A string in Java is immutable. So whenever a string is created, it will remain in memory until it is garbage collected. So anyone who has access to the memory can read the value of the string.
If the value of the string is modified then it will end up creating a new string. So both the original value and the modified value stay in the memory until it is garbage collected.
With the character array, the contents of the array can be modified or erased once the purpose of the password is served. The original contents of the array will not be found in memory after it is modified and even before the garbage collection kicks in.
Because of the security concern it is better to store password as a character array.
It is debatable as to whether you should use String or use Char[] for this purpose because both have their advantages and disadvantages. It depends on what the user needs.
Since Strings in Java are immutable, whenever some tries to manipulate your string it creates a new Object and the existing String remains unaffected. This could be seen as an advantage for storing a password as a String, but the object remains in memory even after use. So if anyone somehow got the memory location of the object, that person can easily trace your password stored at that location.
Char[] is mutable, but it has the advantage that after its usage the programmer can explicitly clean the array or override values. So when it's done being used it is cleaned and no one could ever know about the information you had stored.
Based on the above circumstances, one can get an idea whether to go with String or to go with Char[] for their requirements.
A lot of the previous answers are great. There is another point which I am assuming (please correct me if I am wrong).
By default Java uses UTF-16 for storing strings. Using character arrays, char[] array, facilitates use of Unicode, regional characters, etc. This technique allows all character set to be respected equally for storing the passwords and henceforth will not initiate certain crypto issues due to character set confusion. Finally, using the character array, we can convert the password array to our desired character set string.
In Swing, the password field has a getPassword() (returns char[]) method instead of the usual getText() (returns String) method. Similarly, I have come across a suggestion not to use String to handle passwords.
Why does String pose a threat to security when it comes to passwords?
It feels inconvenient to use char[].
Strings are immutable. That means once you've created the String, if another process can dump memory, there's no way (aside from reflection) you can get rid of the data before garbage collection kicks in.
With an array, you can explicitly wipe the data after you're done with it. You can overwrite the array with anything you like, and the password won't be present anywhere in the system, even before garbage collection.
So yes, this is a security concern - but even using char[] only reduces the window of opportunity for an attacker, and it's only for this specific type of attack.
As noted in the comments, it's possible that arrays being moved by the garbage collector will leave stray copies of the data in memory. I believe this is implementation-specific - the garbage collector may clear all memory as it goes, to avoid this sort of thing. Even if it does, there's still the time during which the char[] contains the actual characters as an attack window.
While other suggestions here seem valid, there is one other good reason. With plain String you have much higher chances of accidentally printing the password to logs, monitors or some other insecure place. char[] is less vulnerable.
Consider this:
public static void main(String[] args) {
Object pw = "Password";
System.out.println("String: " + pw);
pw = "Password".toCharArray();
System.out.println("Array: " + pw);
}
Prints:
String: Password
Array: [C#5829428e
To quote an official document, the Java Cryptography Architecture guide says this about char[] vs. String passwords (about password-based encryption, but this is more generally about passwords of course):
It would seem logical to collect and store the password in an object
of type java.lang.String. However, here's the caveat: Objects of
type String are immutable, i.e., there are no methods defined that
allow you to change (overwrite) or zero out the contents of a String
after usage. This feature makes String objects unsuitable for
storing security sensitive information such as user passwords. You
should always collect and store security sensitive information in a
char array instead.
Guideline 2-2 of the Secure Coding Guidelines for the Java Programming Language, Version 4.0 also says something similar (although it is originally in the context of logging):
Guideline 2-2: Do not log highly sensitive information
Some information, such as Social Security numbers (SSNs) and
passwords, is highly sensitive. This information should not be kept
for longer than necessary nor where it may be seen, even by
administrators. For instance, it should not be sent to log files and
its presence should not be detectable through searches. Some transient
data may be kept in mutable data structures, such as char arrays, and
cleared immediately after use. Clearing data structures has reduced
effectiveness on typical Java runtime systems as objects are moved in
memory transparently to the programmer.
This guideline also has implications for implementation and use of
lower-level libraries that do not have semantic knowledge of the data
they are dealing with. As an example, a low-level string parsing
library may log the text it works on. An application may parse an SSN
with the library. This creates a situation where the SSNs are
available to administrators with access to the log files.
Character arrays (char[]) can be cleared after use by setting each character to zero and Strings not. If someone can somehow see the memory image, they can see a password in plain text if Strings are used, but if char[] is used, after purging data with 0's, the password is secure.
Some people believe that you have to overwrite the memory used to store the password once you no longer need it. This reduces the time window an attacker has to read the password from your system and completely ignores the fact that the attacker already needs enough access to hijack the JVM memory to do this. An attacker with that much access can catch your key events making this completely useless (AFAIK, so please correct me if I am wrong).
Update
Thanks to the comments I have to update my answer. Apparently there are two cases where this can add a (very) minor security improvement as it reduces the time a password could land on the hard drive. Still I think it's overkill for most use cases.
Your target system may be badly configured or you have to assume it is and you have to be paranoid about core dumps (can be valid if the systems are not managed by an administrator).
Your software has to be overly paranoid to prevent data leaks with the attacker gaining access to the hardware - using things like TrueCrypt (discontinued), VeraCrypt, or CipherShed.
If possible, disabling core dumps and the swap file would take care of both problems. However, they would require administrator rights and may reduce functionality (less memory to use) and pulling RAM from a running system would still be a valid concern.
I don't think this is a valid suggestion, but, I can at least guess at the reason.
I think the motivation is wanting to make sure that you can erase all trace of the password in memory promptly and with certainty after it is used. With a char[] you could overwrite each element of the array with a blank or something for sure. You can't edit the internal value of a String that way.
But that alone isn't a good answer; why not just make sure a reference to the char[] or String doesn't escape? Then there's no security issue. But the thing is that String objects can be intern()ed in theory and kept alive inside the constant pool. I suppose using char[] forbids this possibility.
The answer has already been given, but I'd like to share an issue that I discovered lately with Java standard libraries. While they take great care now of replacing password strings with char[] everywhere (which of course is a good thing), other security-critical data seems to be overlooked when it comes to clearing it from memory.
I'm thinking of e.g. the PrivateKey class. Consider a scenario where you would load a private RSA key from a PKCS#12 file, using it to perform some operation. Now in this case, sniffing the password alone wouldn't help you much as long as physical access to the key file is properly restricted. As an attacker, you would be much better off if you obtained the key directly instead of the password. The desired information can be leaked manifold, core dumps, a debugger session or swap files are just some examples.
And as it turns out, there is nothing that lets you clear the private information of a PrivateKey from memory, because there's no API that lets you wipe the bytes that form the corresponding information.
This is a bad situation, as this paper describes how this circumstance could be potentially exploited.
The OpenSSL library for example overwrites critical memory sections before private keys are freed. Since Java is garbage-collected, we would need explicit methods to wipe and invalidate private information for Java keys, which are to be applied immediately after using the key.
As Jon Skeet states, there is no way except by using reflection.
However, if reflection is an option for you, you can do this.
public static void main(String[] args) {
System.out.println("please enter a password");
// don't actually do this, this is an example only.
Scanner in = new Scanner(System.in);
String password = in.nextLine();
usePassword(password);
clearString(password);
System.out.println("password: '" + password + "'");
}
private static void usePassword(String password) {
}
private static void clearString(String password) {
try {
Field value = String.class.getDeclaredField("value");
value.setAccessible(true);
char[] chars = (char[]) value.get(password);
Arrays.fill(chars, '*');
} catch (Exception e) {
throw new AssertionError(e);
}
}
when run
please enter a password
hello world
password: '***********'
Note: if the String's char[] has been copied as a part of a GC cycle, there is a chance the previous copy is somewhere in memory.
This old copy wouldn't appear in a heap dump, but if you have direct access to the raw memory of the process you could see it. In general you should avoid anyone having such access.
There is nothing that char array gives you vs String unless you clean it up manually after use, and I haven't seen anyone actually doing that. So to me the preference of char[] vs String is a bit exaggerated.
Take a look at the widely used Spring Security library here and ask yourself - are Spring Security guys incompetent or char[] passwords just don't make much sense. When some nasty hacker grabs memory dumps of your RAM be sure s/he'll get all the passwords even if you use sophisticated ways to hide them.
However, Java changes all the time, and some scary features like String Deduplication feature of Java 8 might intern String objects without your knowledge. But that's a different conversation.
Edit: Coming back to this answer after a year of security research, I realize it makes the rather unfortunate implication that you would ever actually compare plaintext passwords. Please don't. Use a secure one-way hash with a salt and a reasonable number of iterations. Consider using a library: this stuff is hard to get right!
Original answer: What about the fact that String.equals() uses short-circuit evaluation, and is therefore vulnerable to a timing attack? It may be unlikely, but you could theoretically time the password comparison in order to determine the correct sequence of characters.
public boolean equals(Object anObject) {
if (this == anObject) {
return true;
}
if (anObject instanceof String) {
String anotherString = (String)anObject;
int n = value.length;
// Quits here if Strings are different lengths.
if (n == anotherString.value.length) {
char v1[] = value;
char v2[] = anotherString.value;
int i = 0;
// Quits here at first different character.
while (n-- != 0) {
if (v1[i] != v2[i])
return false;
i++;
}
return true;
}
}
return false;
}
Some more resources on timing attacks:
A Lesson In Timing Attacks
A discussion about timing attacks over on Information Security Stack Exchange
And of course, the Timing Attack Wikipedia page
Strings are immutable and cannot be altered once they have been created. Creating a password as a string will leave stray references to the password on the heap or on the String pool. Now if someone takes a heap dump of the Java process and carefully scans through he might be able to guess the passwords. Of course these non used strings will be garbage collected but that depends on when the GC kicks in.
On the other side char[] are mutable as soon as the authentication is done you can overwrite them with any character like all M's or backslashes. Now even if someone takes a heap dump he might not be able to get the passwords which are not currently in use. This gives you more control in the sense like clearing the Object content yourself vs waiting for the GC to do it.
String is immutable and it goes to the string pool. Once written, it cannot be overwritten.
char[] is an array which you should overwrite once you used the password and this is how it should be done:
char[] passw = request.getPassword().toCharArray()
if (comparePasswords(dbPassword, passw) {
allowUser = true;
cleanPassword(passw);
cleanPassword(dbPassword);
passw = null;
}
private static void cleanPassword (char[] pass) {
Arrays.fill(pass, '0');
}
One scenario where the attacker could use it is a crashdump—when the JVM crashes and generates a memory dump—you will be able to see the password.
That is not necessarily a malicious external attacker. This could be a support user that has access to the server for monitoring purposes. He/she could peek into a crashdump and find the passwords.
The short and straightforward answer would be because char[] is mutable while String objects are not.
Strings in Java are immutable objects. That is why they can't be modified once created, and therefore the only way for their contents to be removed from memory is to have them garbage collected. It will be only then when the memory freed by the object can be overwritten, and the data will be gone.
Now garbage collection in Java doesn't happen at any guaranteed interval. The String can thus persist in memory for a long time, and if a process crashes during this time, the contents of the string may end up in a memory dump or some log.
With a character array, you can read the password, finish working with it as soon as you can, and then immediately change the contents.
Case String:
String password = "ill stay in StringPool after Death !!!";
// some long code goes
// ...Now I want to remove traces of password
password = null;
password = "";
// above attempts wil change value of password
// but the actual password can be traced from String pool through memory dump, if not garbage collected
Case CHAR ARRAY:
char[] passArray = {'p','a','s','s','w','o','r','d'};
// some long code goes
// ...Now I want to remove traces of password
for (int i=0; i<passArray.length;i++){
passArray[i] = 'x';
}
// Now you ACTUALLY DESTROYED traces of password form memory
A string in Java is immutable. So whenever a string is created, it will remain in memory until it is garbage collected. So anyone who has access to the memory can read the value of the string.
If the value of the string is modified then it will end up creating a new string. So both the original value and the modified value stay in the memory until it is garbage collected.
With the character array, the contents of the array can be modified or erased once the purpose of the password is served. The original contents of the array will not be found in memory after it is modified and even before the garbage collection kicks in.
Because of the security concern it is better to store password as a character array.
It is debatable as to whether you should use String or use Char[] for this purpose because both have their advantages and disadvantages. It depends on what the user needs.
Since Strings in Java are immutable, whenever some tries to manipulate your string it creates a new Object and the existing String remains unaffected. This could be seen as an advantage for storing a password as a String, but the object remains in memory even after use. So if anyone somehow got the memory location of the object, that person can easily trace your password stored at that location.
Char[] is mutable, but it has the advantage that after its usage the programmer can explicitly clean the array or override values. So when it's done being used it is cleaned and no one could ever know about the information you had stored.
Based on the above circumstances, one can get an idea whether to go with String or to go with Char[] for their requirements.
A lot of the previous answers are great. There is another point which I am assuming (please correct me if I am wrong).
By default Java uses UTF-16 for storing strings. Using character arrays, char[] array, facilitates use of Unicode, regional characters, etc. This technique allows all character set to be respected equally for storing the passwords and henceforth will not initiate certain crypto issues due to character set confusion. Finally, using the character array, we can convert the password array to our desired character set string.
I have an assignment that requires use of a socket. On the client and server side I have
char[] value= "END STREAM".toCharArray(), which signals the stream to shutdown.
Since I have both these arrays in the 2 different files, my intention is that the client sends the message value > the server. Then the server does the function
while(!Arrays.equals(clientSentence, value))
{
...
inFromClient.read(clientSentence, 0, length); //to read in from client side
.....
}
In essence, while it does not send the END STREAM message, keep reading. My issue is that the array equality does not work as intended. I even test this by doing
System.out.println(Arrays.equals(value, clientSentence));
System.out.println(new String(value));
System.out.println(new String(clientSentence));
and it prints
false
END STREAM
END STREAM
How can it be false when it is printing the same values. I have made sure that both arrays initialize to the same length so where is it going wrong? I have been stuck on this for hours and searched for answers but cannot find a solution. Thanks
EDIT: added my read function. I use a BufferedReader
On my box:
char[] x = "END STREAM".toCharArray();
char[] y = "END STREAM".toCharArray();
System.out.println(Arrays.equals(x, y));
System.out.println(Arrays.toString(x));
System.out.println(Arrays.toString(y));
works which makes me think a few things:
The arrays, as declared in your code, are not equal.
There is a character set incompatibility, you said you were using a BufferedReader and, if you're using the Files.newBufferedReader() functionality of Java 1.7, you need specify a charset when using it which may be causing a problem.
End of line issues either from cross platform systems or something else e.g. \r vs \n
Thinking a little more about it...it's probably #2, check this out for further information: http://docs.oracle.com/javase/tutorial/i18n/text/string.html
I have a simple question.
byte[] responseData = ...;
String str = new String(responseData);
String withKey = "{\"Abcd\":" + str + "}";
in the above code, are these three lines taking 3X memory. for example if the responseData is 1mb, then line 2 will take an extra 1mb in memory and then line 3 will take extra 1mb + xx. is this true? if no, then how it is going to work. if yes, then what is the optimal way to fix this. will StringBuffer help here?
Yes, that sounds about right. Probably even more because your 1MB byte array needs to be turned into UTF-16, so depending on the encoding, it may be even bigger (2MB if the input was ASCII).
Note that the garbage collector can reclaim memory as soon as the variables that use it go out of scope. You could set them to null as early as possible to help it make this as timely as possible (for example responseData = null; after you constructed your String).
if yes, then what is the optimal way to fix this
"Fix" implies a problem. If you have enough memory there is no problem.
the problem is that I am getting OutOfMemoryException as the byte[] data coming from server is quite big,
If you don't, you have to think about a better alternative to keeping a 1MB string in memory. Maybe you can stream the data off a file? Or work on the byte array directly? What kind of data is this?
The problem is that I am getting OutOfMemoryException as the byte[] data coming from server is quite big, thats why I need to figure it out first that am I doing something wrong ....
Yes. Well basically your fundamental problem is that you are trying to hold the entire string in memory at one time. This is always going to fail for a sufficiently large string ... even if you code it in the most optimal memory efficient fashion possible. (And that would be complicated in itself.)
The ultimate solution (i.e. the one that "scales") is to do one of the following:
stream the data to the file system, or
process it in such a way that you don't need ever need the entire "string" to be represented.
You asked if StringBuffer will help. It might help a bit ... provided that you use it correctly. The trick is to make sure that you preallocate the StringBuffer (actually a StringBuilder is better!!) to be big enough to hold all of the characters required. Then copy data into it using a charset decoder (directly or using a Reader pipeline).
But even with optimal coding, you are likely to need a peak of 3 times the size of your input byte[].
Note that your OOME problem is probably nothing to do with GC or storage leaks. It is actually about the fundamental space requirements of the data types you are using ... and the fact that Java does not offer a "string of bytes" data type.
There is no such OutOfMemoryException in my apidocs. If it's OutOfMemoryError, especially on the server-side, you definitely got a problem.
When you receive big requests from clients, those String related statements are not the first problem. Reducing 3X to 1X is not the solution.
I'm sorry I can't help without any further codes.
Use back-end storage
You should not store the whole request body on byte[]. You can store them directly on any back-end storage such as a local file, a remote database, or cloud storage.
I would
copy stream from request to back-end with small chunked buffer
Use streams
If can use Streams not Objects.
I would
response.getWriter().write("{\"Abcd\":");
copy <your back-end stored data as stream>);
response.getWriter().write("}");
Yes, if you use a Stringbuffer for the code you have, you would save 1mb of heap space in the last step. However, considering the size of data you have, I recommend an external memory algorithm where you bring only part of your data to memory, process it and put it back to storage.
As others have mentioned, you should really try not to have such a big Object in your mobile app, and that streaming should be your best solution.
That said, there are some techniques to reduce the amount memory your app is using now:
Remove byte[] responseData entirely if possible, so the memory it used can be released ASAP (assuming it is not used anywhere else)
Create the largest String first, and then substring() it, Android uses Apache Harmony for its standard Java library implementation. If you check its String class implementation, you'll see that substring() is implemented simply by creating a new String object with the proper start and end offset to the original data and no duplicate copy is created. So doing the following would cuts the overall memory consumption by at least 1/3:
String withKey = StringBuilder().append("{\"Abcd\").append(str).append("}").toString();
String str = withKey.substring("{\"Abcd\".length(), withKey.length()-"}".length());
Never ever use something like "{\"Abcd\":" + str + "}" for large Strings, under the hood "string_a"+"string_b" is implemented as new StringBuilder().append("string_a").append("string_b").toString(); so implicitly you are creating two (or at least one if the compiler is mart) StringBuilders. For large Strings, it's better that you take over this process yourself as you have deep domain knowledge about your program that the compiler doesn't, and knows how to best manipulate the strings.