I learned about using char[] to store passwords back in the Usenet days in comp.lang.java.*.
Searching Stack Overflow, you can also easily find highly upvoted questions like this:
Why is char[] preferred over String for passwords? which agrees with what I learned a long, long time ago.
I still write my APIs to use char[] for passwords. But is that just a hollow ideal now?
For example, look at Atlassian Jira's Java API:
LoginManager.authenticate which takes your password as a String.
Or Thales' Luna Java API:
login() method in LunaSlotManager. Of all people, an HSM vendor using String for the HSM slot password.
I think I've also read somewhere that the internals of URLConnection (and many other classes) use String internally to handle the data. So if you ever send a password (although the password is encrypted by TLS over the wire), it will be in a String in your server's memory.
Is accessing server memory an attack vector so difficult to achieve that it is okay to store passwords as String now? Or is Thales doing it because your password will end up in a String anyway due to classes written by others?
First, let’s recall the reason for the recommendation to use char[] instead of String: Strings are immutable, so once the string is created, there is limited control over the contents of the string until (potentially well after) the memory is garbage collected. An attacker that can dump the process memory can thus potentially read the password data. Meanwhile the contents of the char[] object can be overwritten after it has been created. Assuming that this is done, and that the GC hasn’t moved the object to another physical memory location in the interim, this means that the password contents can be destroyed (somewhat) deterministically after it has been used. An attacker reading the process memory after that point won’t be able to get the password.
So using char[] instead of String lets you prevent a very specific attack scenario, where the attacker has full access to the process memory,1 but only at specific points in time rather than continuously. And even under this scenario, using char[] and overwriting its contents does not prevent the attack, it just reduces its chance of success (if the attacker happens to read the process memory between the creation and the erasure of the password, they can read it).
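The overwrite step described above can be sketched as follows. This is a minimal illustration, not a complete defence: `authenticate` is a hypothetical stand-in for whatever consumes the password, and the sketch assumes the password never existed as a String beforehand.

```java
import java.util.Arrays;

public class PasswordWipeExample {

    // Hypothetical consumer of the password; a real one would verify it.
    static boolean authenticate(char[] password) {
        return password.length > 0;
    }

    // Uses the password, then overwrites the array even if authenticate() throws.
    static boolean loginAndWipe(char[] password) {
        try {
            return authenticate(password);
        } finally {
            Arrays.fill(password, '\0');
        }
    }

    public static void main(String[] args) {
        char[] password = {'s', '3', 'c', 'r', 'e', 't'};
        boolean ok = loginAndWipe(password);
        System.out.println("authenticated: " + ok);
        System.out.println("wiped: " + (password[0] == '\0'));
    }
}
```

The try/finally only narrows the window during which the password is readable; it does nothing about copies the GC or other libraries may have made.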
I am not aware of any evidence that shows (a) how frequent this scenario is, nor (b) how much this mitigation reduces the probability of success under that scenario. As far as I know, this is pure guesswork.
In fact, on most systems, this scenario likely does not exist at all: an attacker who can get access to another process’ memory can also gain full tracing access. For instance, on both Linux and Windows any process that can read another process’ memory can also inject arbitrary logic into that process (e.g. via LD_PRELOAD and similar mechanisms2). So I would say that this mitigation at best has a limited benefit, and potentially none at all.
… Actually I can think of one specific counter-example: an application that loads an untrusted plugin library. As soon as that library is loaded via conventional means (i.e. in the same memory space), it has access to the parent application. In this scenario, it might make sense to use char[] instead of String, and overwrite its contents when done with it, if the password is handled before the plugin is loaded. But a better solution would be not to load untrusted plugins into the same memory space. A common alternative is to launch it in a separate process and communicate via IPC.
(See the answer by Gilles for more vulnerable scenarios. I still think that the benefit is relatively limited, but it’s clearly not nil.)
1 As shown in Gilles’ answer, this is not correct: no full memory access is required to mount a successful attack.
2 Although LD_PRELOAD specifically requires the attacker to not only have access to another process but either to launch that process, or to have access to its parent process.
(Note: I am a security expert but not a Java expert.)
Yes, there is a significant security advantage in using char[] rather than strings for passwords. This also applies to some extent to other highly confidential data, although most highly confidential data (e.g. cryptographic keys) tends to be bytes and not characters.
The old, and still valid, reason to use char[] is to clean up memory as soon as it is used, which is not possible with String. This is a very firmly established security practice. For example, in the (in)famous FIPS 140 requirements for cryptographic processing, which are generally considered to be security requirements, there are in fact extremely few security requirements at level 1 (the easiest level). Just two, in fact: one is that you may only use approved cryptographic algorithms, and the other one is that keys, passwords and other sensitive data must be wiped after use.
This practice is one of the reasons why production implementations of cryptographic primitives are usually implemented in languages with manual memory management such as C, C++ or Rust: cryptography implementers want to retain control of where sensitive data goes, and to be sure to wipe all copies of sensitive material.
As an example of what can go wrong, consider the (in)famous Heartbleed bug. It allowed anyone on the Internet connecting to a vulnerable server to dump some of the memory of the server, without being detected. The attacker didn't get much control over which part of the memory, but could try again and again. An attacker could make requests that would cause the dumpable part to move around the heap, and thus could potentially dump the whole memory.
Are such bugs common? No. This one got a lot of buzz because it was in very popular software and the consequences were bad. But such bugs do exist and it's good to protect against them.
In addition, since Java 8, there is another reason, which is to avoid string deduplication. String deduplication means that if two String objects have the same content, they may be merged. String deduplication is problematic if an attacker can mount a side channel attack when the deduplication is attempted. The attack does not require the password to be deduplicated (although it is easier in this case): there's a problem as soon as some code compares the password against another string.
The usual way to compare strings for equality is:
If the lengths are different, return false.
Otherwise compare the characters one by one. As soon as there are different characters at one position, return false.
If the end of the strings is reached without encountering a difference, return true.
This has a timing side channel: the time of the middle step depends on the number of identical characters at the beginning of the string. Suppose that an attacker can measure this time, and can upload some strings for comparison (e.g. by making legitimate requests to a server). The attacker notices that comparing with sssssssss takes slightly longer than comparing with aaaaaaaaa, so the password must begin with s. Then the attacker tries to vary the second character, and finds that comparing with swwwwwwww takes again slightly longer. And thus, in relatively short time, the attacker can reconstruct the password character by character.
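To make the difference concrete, here is a small sketch contrasting the early-exit comparison described above with one that examines every byte. The class and method names are made up for illustration; `MessageDigest.isEqual` is the standard JDK method, and this assumes a JDK version whose documentation states that it examines all bytes.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class ComparisonExample {

    // Early-exit comparison: running time grows with the length of the matching prefix.
    static boolean earlyExitEquals(byte[] a, byte[] b) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return false; // stops at the first difference
            }
        }
        return true;
    }

    // Delegates to the JDK comparison, which does not stop at the first difference.
    static boolean fullScanEquals(byte[] a, byte[] b) {
        return MessageDigest.isEqual(a, b);
    }

    public static void main(String[] args) {
        byte[] secret = "swordfish".getBytes(StandardCharsets.UTF_8);
        byte[] guess = "sssssssss".getBytes(StandardCharsets.UTF_8);
        System.out.println(earlyExitEquals(secret, guess)); // false
        System.out.println(fullScanEquals(secret, guess));  // false
    }
}
```

Both methods return the same answers; only the timing behaviour differs. Comparing hashes of the passwords rather than the passwords themselves further limits what a timing leak can reveal.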
In the context of string deduplication, the attack is harder, because (as far as I know) the deduplication code first hashes the strings to compare. This may mean that the attacker has to first guess the hash value. But the total number of hash values in a given hash table (that's the number of hash buckets, not the full range of the hash method) is small enough that it's practical to enumerate.
This is not an easy attack, to be sure. But I would absolutely not rule it out, especially with a local attacker, but even with a remote attacker. Remote timing attacks are practical (still).
In conclusion, yes, you should not use String for passwords. Read them as char[], keep careful track of any copies, hash them as soon as possible if you're verifying them, and wipe all copies.
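As a rough sketch of "hash as soon as possible and wipe all copies", the following uses the JDK's PBKDF2 support. The iteration count, key length and class name are illustrative assumptions, not recommendations; `PBEKeySpec` keeps its own copy of the password, which is why it is cleared as well.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

public class PasswordHashExample {

    static final int ITERATIONS = 10_000; // illustrative value only
    static final int KEY_BITS = 256;

    // Derives a hash from the password, then wipes both the spec's copy and the caller's array.
    static byte[] hashAndWipe(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("PBKDF2 not available", e);
        } finally {
            spec.clearPassword();        // clears the copy held inside the spec
            Arrays.fill(password, '\0'); // clears the caller's array
        }
    }

    public static void main(String[] args) {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        char[] password = {'h', 'u', 'n', 't', 'e', 'r', '2'};
        byte[] hash = hashAndWipe(password, salt);
        System.out.println("hash length: " + hash.length);
        System.out.println("wiped: " + (password[0] == '\0'));
    }
}
```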
If you need to store a password for a third-party service, it's a good idea to store it in encrypted form even if there is no separate access control for the encryption key. Copies of an encrypted password are less prone to leaking through side channels than copies of the password itself, which is a printable string with low entropy.
I think I've also read somewhere that the internals of URLConnection (and many other classes) use String internally to handle the data. So if you ever send a password (although the password is encrypted by TLS over the wire), it will be in a String in your server's memory.
I'm not a Java expert, but this doesn't sound right: the plaintext of a connection (TLS or otherwise) is a byte stream, not a character stream. It should be arrays of 8-bit bytes, not arrays of Unicode code points.
Or is Thales doing it because your password will end up in a String anyway due to classes written by others?
Possibly. Or possibly because they aren't Java experts, or because the people who write the high-level layers are often not the foremost security experts.
Almost everyone else's answer plus one additional point:
Swap space on storage media.
If the JVM heap is ever paged out to disk and the password is still in memory as a String (immutable and not yet GC'd), it will be written to the swap file. This swap file can then be scanned for password values. So it's essentially another attack vector, one that's time-based and still rather difficult to exploit, but obviously not that difficult, because we're here :D.
Wiping the mutable array at least reduces the time during which the password is in memory.
The story I heard was that if an attacker can attack your process (e.g. with a DDoS) and trigger it to swap out, it's somewhat easier to attack the swap space than the memory, AND swap space is preserved across boots/crashes/etc. This allows for yet another attack vector where the attacker pulls the swap drive out to scan the swap space.
Lots of detail in the answers but here's the short of it: yes, in theory, putting the password in an array and wiping it provides security benefits. In practice, that only helps if you can avoid the password ever being stored in a String. That is, if you take a password stored in a String and put the contents of the String into a char[], it doesn't magically make the String disappear from the heap. The necessary requirement is that the password never is placed in a String at all. I'd be interested to see that successfully implemented in a real Java application.
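For what it's worth, one piece of that puzzle can be sketched: turning a char[] into bytes (e.g. for hashing or sending) without ever creating a String. The class and method names below are made up for illustration, and this says nothing about what the rest of the application or its libraries do with the data.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsToBytesExample {

    // Encodes a char[] as UTF-8 without creating an intermediate String.
    static byte[] toUtf8(char[] chars) {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        if (encoded.hasArray()) {
            // Wipe the encoder's temporary buffer, which also held the password bytes.
            Arrays.fill(encoded.array(), (byte) 0);
        }
        return result;
    }

    public static void main(String[] args) {
        char[] password = {'p', 'w'};
        byte[] bytes = toUtf8(password);
        System.out.println(bytes.length); // 2
        // The caller is still responsible for wiping both arrays when done.
        Arrays.fill(password, '\0');
        Arrays.fill(bytes, (byte) 0);
    }
}
```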
This isn't about the moment of transfer over the network. There you're indeed better off using a String, as it's just more convenient to send over the network, making sure of course that it's properly encrypted.
For using passwords in applications it's different due to stack-dumps and reverse engineering, and the problem of the String being immutable:
In case the password has been entered, even if the reference to the variable is changed to another string, there is no certainty about when the garbage collector will actually remove the String from the heap. So a hacker being able to see the dump will also be able to see the password. Using an array of char prevents this as you can change the data in the array directly without relying on the garbage collector.
Now you might say: well then, when sending it over the network as a String, it'll still be visible, no? Well yes, but that's why encrypting it before sending it is important. Whenever possible, never send plain-text passwords over the network.
My application uses Google protocol buffers to send sensitive data between client and server instances. The network link is encrypted with SSL, so I'm not worried about eavesdroppers on the network. I am worried about the actual loading of sensitive data into the protobuf because of memory concerns explained in this SO question.
For example:
Login login = Login.newBuilder()
    .setPassword(password) // problem
    .build();
Is there no way to do this securely since protocol buffers are immutable?
Protobuf does not provide any option to use char[] instead of String. On the contrary, Protobuf messages are intentionally designed to be fully immutable, which provides a different kind of security: you can share a single message instance between multiple sandboxed components of a program without worrying that one may modify the data in order to interfere with another.
In my personal opinion as a security engineer -- though others will disagree -- the "security" described in the SO question to which you link is security theater, not actually worth pursuing, for a number of reasons:
If an attacker can read your process's memory, you've already lost. Even if you overwrite the secret's memory before discarding it, if the attacker reads your memory at the right time, they'll find the password. But, worse, if an attacker is in a position to read your process's memory, they're probably in a position to do much worse things than extract temporary passwords: they can probably extract long-lived secrets (e.g. your server's TLS private key), overwrite parts of memory to change your app's behavior, access any and all resources to which your app has access, etc. This simply isn't a problem that can be meaningfully addressed by zeroing certain fields after use.
Realistically, there are too many ways that your secrets may be copied anyway, over which you have no control, making the whole exercise moot:
Even if you are careful, the garbage collector could have made copies of the secret while moving memory around, defeating the purpose. To avoid this you probably need to use a ByteBuffer backed by non-managed memory.
When reading the data into your process, it almost certainly passes through library code that doesn't overwrite its data in this way. For example, an InputStream may do internal buffering, and probably doesn't zero out its buffer afterwards.
The operating system may page your data out to swap space on disk at any time, and is not obliged to zero that data afterwards. So even if you zero out the memory, it may persist in swap. (Encrypting swap ensures that these secrets are effectively gone when the system shuts down, but doesn't necessarily protect against an attacker present on the local machine who is able to extract the swap encryption key out of the kernel.)
Etc.
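On the point above about a ByteBuffer backed by non-managed memory, a minimal sketch could look like this. The class and method names are hypothetical. A direct buffer may live outside the garbage-collected heap, so the GC should not copy its contents around, but this does nothing about swap or about copies made by other code.

```java
import java.nio.ByteBuffer;

public class DirectBufferExample {

    // Copies the secret into a direct (off-heap) buffer and prepares it for reading.
    static ByteBuffer store(byte[] secret) {
        ByteBuffer buf = ByteBuffer.allocateDirect(secret.length);
        buf.put(secret);
        buf.flip();
        return buf;
    }

    // Overwrites the whole buffer with zeros.
    static void wipe(ByteBuffer buf) {
        buf.clear(); // resets position and limit; does not erase the contents
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, (byte) 0);
        }
    }

    public static void main(String[] args) {
        byte[] secret = {1, 2, 3};
        ByteBuffer buf = store(secret);
        System.out.println(buf.isDirect()); // true
        wipe(buf);
        System.out.println(buf.get(0)); // 0
    }
}
```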
So, in my opinion, using mutable objects in Java specifically to be able to overwrite secrets in this way is not a useful strategy. These threats need to be addressed elsewhere.
String val = "Hello";
// blocks of code
int c = val.hashCode(); // say I get 101 as the memory location
Say I don't know what "val" is at memory location 101, and I want to store val = "abc" at this location. How can I use the memory location to append the value? Is this possible in Java?
Java doesn't allow direct manipulation of memory locations. It is built as a layer between the user and the actual memory.
Not to mention that hashcode has nothing to do with the memory address.
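A quick way to see this for String: its hashCode() is computed from the characters, so two distinct objects with the same content report the same value. The class name here is made up for the example.

```java
public class HashCodeExample {

    // Returns true when two references point to different objects that hash identically.
    static boolean distinctButSameHash(String a, String b) {
        return a != b && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        String a = new String("Hello");
        String b = new String("Hello");
        System.out.println(a == b);                    // false: two separate objects
        System.out.println(distinctButSameHash(a, b)); // true: hash depends on content only
        System.out.println(a.hashCode());              // 69609650
    }
}
```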
Per @ValekHalfHeart's comment I did a quick search, and apparently there is an API available for unsafe operations (including changes to the memory location). I have no experience with this, but it might look like a backdoor to do what you want.
Do note that Java is explicitly built to abstract all these things away from you. Trying to use them after all is abusing your tools.
The short answer to your question is "no". Java "protects" you from knowing (or caring) about the memory locations of objects, allocation, destruction, etc.
If you like that kind of thing, might I recommend you use C? It's a much maligned language - but it's great for that kind of "close to the silicon" programming. With all the perils that entails...
I am working on Spring Web MVC and recently encountered java.lang.OutOfMemoryError: Java heap space.
So I was reading about it, and the major mistake I am making is that I am not dereferencing the used objects, so the GC is not cleaning up a lot of memory.
Now the question is when to dereference it.
Here is the basic flow:
From the front end, the user sends a request.
The server calls a library with the user's request.
The library returns a big array of results.
The server forwards it to the front end.
Now, until this point I cannot dereference the results array, as I need the result object. Am I correct?
So when the user sends a new request, should I clean the results array and call the library with the new request?
Also, I used -XX:+HeapDumpOnOutOfMemoryError to get a dump file. But I don't see the dump file in the project folder. In the log I see that the dump file is created. Did anyone run into this case?
Generally the solution for this type of problem is:
(Obvious) Increase the maximum heap setting using -Xmx, or get bigger hardware. This approach might not be scalable, but could provide a short-term solution to the problem.
Ask yourself whether you really need a big chunk. If not, try requesting smaller chunks instead to conserve heap usage. Make sure you are not holding a reference to an object any longer than you should. As soon as you know a variable is no longer needed, set it to null so the object it refers to can be garbage collected.
It will be very difficult to help you without an SSCCE (http://sscce.org/). The JVM does the GC for you, and if your objects have a well-defined scope they should get GCed automatically.
I would recommend you start by increasing the heap memory and figuring out the following:
What is the scope of the Result array (global, or restricted to the method call hierarchy, i.e. passed through the method invocation stack to the front end)?
In case it has global scope, is it part of a singleton instance or created per request? What objects are stored in the Result array, and are they referenced anywhere else in your code?
You can use jhat to get the object reference graph and find out who else is referencing the objects (DISCLAIMER: it gets messy if the objects stored in the array contain references to other objects, which is usually the case).
Better ways to identify objects not getting garbage collected?
If I have a List<Object>, would it be possible to run some method on each Object to see how much memory each is consuming? I know nothing about each Object; it may be an entire video file loaded onto the heap or just a two-byte string. I ultimately would like to know which objects to drop first before running out of memory.
I think Runtime.totalMemory() shows the memory currently used by the JVM, but I want to see the memory used by a single object.
SoftReference looks kinda like what you need. Create a list of soft references to your objects, and if those objects are not referenced anywhere and you run out of memory, JVM will delete some of them. I don't know how smart the algorithm for choosing what to delete is, but it could as well be removing those that will free most memory.
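A minimal sketch of that idea, with a made-up class name: entries are held only through SoftReference, so the JVM is allowed to clear them under memory pressure, and `get()` returns null once that has happened. Which entries get cleared first is up to the JVM.

```java
import java.lang.ref.SoftReference;
import java.util.ArrayList;
import java.util.List;

public class SoftCacheExample {

    private final List<SoftReference<byte[]>> entries = new ArrayList<>();

    // Stores the data behind a soft reference; the list itself does not keep it alive.
    void add(byte[] data) {
        entries.add(new SoftReference<>(data));
    }

    // Counts entries whose referent has not been cleared by the garbage collector.
    int liveCount() {
        int live = 0;
        for (SoftReference<byte[]> ref : entries) {
            if (ref.get() != null) {
                live++;
            }
        }
        return live;
    }

    public static void main(String[] args) {
        SoftCacheExample cache = new SoftCacheExample();
        byte[] data = new byte[1024];
        cache.add(data);
        // 'data' is still strongly referenced here, so it cannot have been cleared.
        System.out.println(cache.liveCount()); // 1
    }
}
```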
If you are in a container you can use JConsole: http://java.sun.com/developer/technicalArticles/J2SE/jconsole.html
The JDK has come with heap dump utilities since 1.5... Are you in a container or in Eclipse? Also, why do you have a List of Objects?
There is no clean way to do it. You can create a dummy OutputStream which will do nothing but counting number of bytes written. So, you can make some estimation about your object graph size by serializing it to such stream.
I would not advise to do it in production system. I, personally, did it once for experimenting and making estimations.
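Roughly, the counting-stream trick could look like the sketch below. The class names are invented for the example, it only works for Serializable objects, and the serialized size is just an estimate that can differ a lot from the real heap footprint.

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class SerializedSizeExample {

    // Discards everything written to it and only counts the bytes.
    static final class CountingOutputStream extends OutputStream {
        private long count;

        @Override
        public void write(int b) {
            count++;
        }

        @Override
        public void write(byte[] b, int off, int len) {
            count += len;
        }

        long count() {
            return count;
        }
    }

    // Serializes the object graph into the counting stream and returns the byte count.
    static long serializedSize(Serializable obj) {
        CountingOutputStream counter = new CountingOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(counter)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.count();
    }

    public static void main(String[] args) {
        System.out.println(serializedSize(new byte[10]));
        System.out.println(serializedSize(new byte[1000]));
    }
}
```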
Actually another possible tactic is just to make a crap load of instances of the class you want to check (like a million in an array).
The sheer number of objects should negate the overhead (as in the overhead of other stuff will be much smaller than your crap load of objects).
You will want to run this in isolation of course (i.e. in its own public static void main()).
I will admit you will need lots of memory for this test.
Something you could do is make a Map<Object, Long> which maps each object to its memory size.
Then to measure the size of a particular object, you have to do it at instantiation of each object: measure the JVM's used memory (Runtime.totalMemory() minus Runtime.freeMemory(), since totalMemory() alone is the heap currently reserved, not what is in use) before and after building the object, and take the difference between the two. That is roughly the size of the object in memory, though GC activity in between can skew it. Then add the Object and Long to your map. From there you should be able to loop through all of the keys in the map and find the object using the largest amount of space.
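Here is a rough sketch of that before/after measurement. The class and method names are made up, `System.gc()` is only a hint, and other allocations or a collection in between can make the result inaccurate or even negative, so treat the number as a ballpark at best.

```java
public class HeapDeltaExample {

    // Heap currently in use: the reserved heap minus the free part of it.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Rough estimate of the heap consumed by allocating a byte[] of the given (positive) length.
    static long estimateArrayCost(int length) {
        System.gc(); // only a hint; the JVM may ignore it
        long before = usedMemory();
        byte[] block = new byte[length];
        long after = usedMemory();
        block[0] = 1; // touch the array so it stays reachable until after the measurement
        return after - before;
    }

    public static void main(String[] args) {
        System.out.println("used now: " + usedMemory());
        System.out.println("estimated cost: " + estimateArrayCost(10_000_000));
    }
}
```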
I am not sure there is a way to do it per object after you already have your List<Object>... I hope this is helpful!