Has anyone ever seen an implementation of java.nio.ByteBuffer that will grow dynamically if a putX() call overruns the capacity?
The reason I want to do it this way is twofold:
I don't know how much space I need ahead of time.
I'd rather not do a new ByteBuffer.allocate() then a bulk put() every time I run out of space.
In order for asynchronous I/O to work, you must have contiguous memory. In C you can attempt to realloc an array, but in Java you must allocate new memory. You could write to a ByteArrayOutputStream, and then convert it to a ByteBuffer at the time you are ready to send it. The downside is that you are copying memory, and one of the keys to efficient I/O is reducing the number of times memory is copied.
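For illustration, a minimal sketch of that approach (class and method names are mine, not from the question):
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;

class MessageBuilder {
    // Gather bytes of unknown length in a ByteArrayOutputStream, then wrap
    // them in a ByteBuffer just before sending. toByteArray() is the single
    // copy into contiguous memory mentioned above.
    static ByteBuffer buildMessage(byte[] payload) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(payload);          // grows on demand
        out.write(0x0A);             // e.g. a trailing delimiter
        return ByteBuffer.wrap(out.toByteArray());
    }
}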
A ByteBuffer cannot really work this way, as its design concept is to be just a view of a specific array, which you may also have a direct reference to. It could not try to swap that array for a larger array without weirdness happening.
What you want to use is a DataOutput. The most convenient way is to use the (pre-release) Guava library:
ByteArrayDataOutput out = ByteStreams.newDataOutput();
out.write(someBytes);
out.writeInt(someInt);
// ...
return out.toByteArray();
But you could also create a DataOutputStream from a ByteArrayOutputStream manually, and just deal with the spurious IOExceptions by chaining them into AssertionErrors.
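A sketch of that plain-JDK variant (class and method names are mine):
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class GrowingDataOutput {
    static byte[] encode(byte[] someBytes, int someInt) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            out.write(someBytes);
            out.writeInt(someInt);
            out.flush();
        } catch (IOException e) {
            // ByteArrayOutputStream never actually throws, so treat this as a bug
            throw new AssertionError(e);
        }
        return bytes.toByteArray();
    }
}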
Another option is to use direct memory with a large buffer. This consumes virtual memory, but only uses as much physical memory as you actually touch (per page, which is typically 4 KB).
So if you allocate a 1 MB buffer, it consumes 1 MB of virtual memory, but the OS only gives physical pages to the application for the parts it actually uses.
The effect is that your application appears to use a lot of virtual memory but a relatively small amount of resident memory.
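For example, a minimal sketch of that idea:
import java.nio.ByteBuffer;

// Reserve a generous direct buffer up front: it costs virtual address space,
// but physical pages are only committed as you actually write into it.
ByteBuffer buffer = ByteBuffer.allocateDirect(1024 * 1024); // 1 MB reserved
buffer.putLong(42L); // only the pages you touch become resident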
Have a look at Mina IOBuffer https://mina.apache.org/mina-project/userguide/ch8-iobuffer/ch8-iobuffer.html which is a drop-in replacement (it wraps the ByteBuffer).
However, I suggest you allocate more than you need and don't worry about it too much. If you allocate a buffer (especially a direct buffer) the OS gives it virtual memory, but it only uses physical memory when it's actually used. Virtual memory should be very cheap.
It may also be worth having a look at Netty's DynamicChannelBuffer. Things that I find handy are:
slice(int index, int length)
unsigned operations
separated writer and reader indexes
Indeed, auto-extending buffers are so much more intuitive to work with. If you can afford the performance luxury of reallocation, why wouldn't you!?
Netty's ByteBuf gives you exactly this. It's like they've taken java.nio's ByteBuffer and scraped away the edges, making it much easier to use.
Furthermore, it's on Maven in an independent netty-buffer package, so you don't need to include the full Netty suite to use it.
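For example, a minimal sketch (assuming the netty-buffer artifact is on the classpath):
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;

// Unpooled.buffer() starts small and expands automatically as you write.
ByteBuf buf = Unpooled.buffer();          // or Unpooled.buffer(initialCapacity)
buf.writeBytes(new byte[]{1, 2, 3});
buf.writeInt(42);                         // capacity grows behind the scenes
byte[] result = new byte[buf.readableBytes()];
buf.readBytes(result);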
I'd suggest using an input stream to receive data from a file (with a separate thread if you need non-blocking), then read the bytes into a ByteArrayOutputStream, which lets you get the result as a byte array. Here's a simple example without too many workarounds.
try (InputStream inputStream = Files.newInputStream(
        Paths.get("filepath"), StandardOpenOption.READ)) {
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    int byteRead;
    while ((byteRead = inputStream.read()) != -1) {
        baos.write(byteRead);
    }
    ByteBuffer byteBuffer = ByteBuffer.allocate(baos.size());
    byteBuffer.put(baos.toByteArray());
    // ... use the buffer however you want
} catch (InvalidPathException pathException) {
    System.out.println("Path exception: " + pathException);
} catch (IOException exception) {
    System.out.println("I/O exception: " + exception);
}
Another solution for this would be to allocate more than enough memory, fill the ByteBuffer and then only return the occupied byte array:
Initialize a big ByteBuffer:
ByteBuffer byteBuffer = ByteBuffer.allocate(1000);
After you're done putting things into it:
private static byte[] getOccupiedArray(ByteBuffer byteBuffer)
{
    int position = byteBuffer.position();
    return Arrays.copyOfRange(byteBuffer.array(), 0, position);
}
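Putting the two together, usage would look something like this:
ByteBuffer byteBuffer = ByteBuffer.allocate(1000);
byteBuffer.putInt(42);
byteBuffer.put((byte) 7);
byte[] occupied = getOccupiedArray(byteBuffer);  // 5 bytes: the int plus one byte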
However, using an org.apache.commons.io.output.ByteArrayOutputStream from the start would probably be the best solution.
Netty's ByteBuf is pretty good at that.
A Vector allows for continuous growth
Vector<Byte> bFOO = new Vector<Byte>();
bFOO.add((byte) 0x00);
To serialize something, you need an object to begin with. What you can do is put your objects into a collection, then loop over the collection with an iterator and write them into a byte array. Then call ByteBuffer.allocate(byteArray.length). That is what I did and it worked for me.
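A rough sketch of that idea using standard Java serialization (class and method names are mine):
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.ByteBuffer;
import java.util.List;

class SerializeToBuffer {
    // Serialize the objects to bytes first, then size the ByteBuffer to fit.
    static ByteBuffer toBuffer(List<? extends Serializable> objects) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            for (Serializable o : objects) {
                oos.writeObject(o);
            }
        }
        byte[] bytes = baos.toByteArray();
        ByteBuffer buffer = ByteBuffer.allocate(bytes.length);
        buffer.put(bytes);
        buffer.flip();
        return buffer;
    }
}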
I am writing a Java program for fun which stores sensitive information from users.
For this reason I want to ensure that the garbage collection does not touch it, so that in the future when I am finished I can wipe it from memory.
So far I have this line of code creating 2048 bytes which is more than enough to store any user's passwords.
My question is how do I store a String such as "secret123", and after delete it? This is a very basic question I know but I could not see it in the documentation. I am probably making this more difficult than it is in my head, but better safe than sorry.
ByteBuffer pass = ByteBuffer.allocateDirect(2048);
I am aware of other risks such as swap page files, the computer being coldboot attacked etc...
Thanks!
EDIT:
In response to the first answer - I mean to fill the memory with '0' characters afterwards, not to free it.
You can't explicitly free the allocated memory, but you can clear the buffer and then write zeros (or random bytes) to the buffer when you are done. This will destroy any data that was previously stored in the buffer, reducing the window of attack.
pass.clear();
while (pass.hasRemaining())
pass.put((byte) 0);
As an alternative to @erickson's approach, if you allocate the byte array yourself and create the ByteBuffer by wrapping it, then you can clear the array with a call to Arrays.fill().
byte[] byteArray = new byte[2048];
ByteBuffer bb = ByteBuffer.wrap(byteArray);
//... do your thing here
Arrays.fill(byteArray, (byte)0);
As long as you maintain a reference to either the byteArray or the ByteBuffer, garbage collection won't touch the byte array. You can also get the array back later by calling ByteBuffer.array() and then zeroing it out. (NB: You are not guaranteed an actual array if you try this with a ByteBuffer created by allocateDirect().)
My code needs to take an integer value between 0 and 255 and write it to a file as a string. It needs to be fast as it may be called repeatedly very quickly, so any optimisation will become noticeable when under heavy load. There are other questions on here dealing with efficient ways to write large amounts of data to file, but how about small amounts of data?
Here's my current approach:
public static void writeInt(final String filename, final int value)
{
    try
    {
        // Convert the int to a string representation in a byte array
        final String string = Integer.toString(value);
        final byte[] bytes = new byte[string.length()];
        for (int i = 0; i < string.length(); i++)
        {
            bytes[i] = (byte) string.charAt(i);
        }

        // Now write the byte array to file
        final FileOutputStream fileOutputStream = new FileOutputStream(filename);
        fileOutputStream.write(bytes, 0, bytes.length);
        fileOutputStream.close();
    }
    catch (IOException exception)
    {
        // Error handling here
    }
}
I don't think a BufferedOutputStream will help here: the overhead of creating and flushing the buffer is probably counter-productive for a 3-character write, isn't it? Are there any other improvements I can make?
I think this is about as efficient as you can get given the 0-255 range requirement. Using a buffered writer will be less efficient, since it would create temporary structures that you don't need when writing so few bytes.
static byte[][] cache = new byte[256][];

public static void writeInt(final String filename, final int value)
{
    // Time will be spent on integer-to-string conversion, so cache that
    byte[] bytesToWrite = cache[value];
    if (bytesToWrite == null) {
        bytesToWrite = cache[value] = String.valueOf(value).getBytes();
    }

    FileOutputStream fileOutputStream = null;
    try {
        // Now write the byte array to file
        fileOutputStream = new FileOutputStream(filename);
        fileOutputStream.write(bytesToWrite);
    } catch (IOException exception) {
        // Error handling here
    } finally {
        if (fileOutputStream != null) {
            try {
                fileOutputStream.close();
            } catch (IOException ignored) {
                // Nothing useful to do if close() fails
            }
        }
    }
}
You cannot make it faster, IMO. BufferedOutputStream would be of no help here, if anything it would hurt. If we look at the source, we'll see that FileOutputStream.write(byte b[], int off, int len) sends the byte array directly to a native method, while BufferedOutputStream.write(byte b[], int off, int len) is synchronized and copies the array to its buffer first, and on close it flushes the bytes from the buffer to the actual stream.
Besides, the slowest part in this case is opening / closing the file.
I think the bottleneck here is I/O, and these two improvements could help:
Think about the granularity of updates. I.e. if you need no more than 20 updates per second out of your app, then you could optimize it to write no more than 1 update per 1/20 second. Depending on the environment, this can be very beneficial.
Java NIO has proved to be much faster for large sizes, so it also makes sense to experiment with small sizes, e.g. writing to a Channel instead of an OutputStream (see the sketch below).
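A minimal sketch of that NIO variant (the class and method names are mine, and it assumes the java.nio.file API is available on your target platform):
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class NioWriter {
    // Write the (cached) bytes through a FileChannel instead of a stream.
    static void writeBytes(String filename, byte[] bytesToWrite) throws IOException {
        try (FileChannel channel = FileChannel.open(Paths.get(filename),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            channel.write(ByteBuffer.wrap(bytesToWrite));
        }
    }
}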
Sorry for coming so late to the party :)
I think trying to optimise the code is probably not the right approach. If you're writing the same tiny file repeatedly, and you have to write it each time rather than buffering it in your application, then by far the biggest consideration will be filesystem and the storage hardware.
The point is that if you're actually hitting the hardware every time then you will upset it severely. If your system is caching the writes, though, then you might be able to have it not hit the hardware very often at all: the data will have been overwritten before it gets there, and only the new data will be written.
But this depends on two things. For one, what does your filesystem do when it gets a new write before it's written the old one? Some filesystems might still end up writing extra entries in a journal, or even writing the old file in one place and then the new file in a different physical location. That would be a killer.
For another, what does your hardware do when asked to overwrite something? If it's a conventional hard drive, it will probably just overwrite the old data. If it's flash memory (as it might well be if this is Android), the wear levelling will kick in, and it'll keep writing to different bits of the drive.
You really need to do whatever you can, in terms of disk caching and filesystem, to ensure that if you send 1000 updates before anything pushes the cache to disk, only the last update gets written.
Since this is Android, you're probably looking at ext2/3/4. Look carefully at the journalling options, and investigate what the effect of the delayed allocation in ext4 would be. Perhaps the best option will be to go with ext4, but turn the journalling off.
A quick Google search brought up a benchmark of different write/read operations with files of various sizes:
http://designingefficientsoftware.wordpress.com/2011/03/03/efficient-file-io-from-csharp/
The author comes to the conclusion that WinFileIO.WriteBlocks performs fastest for writing data to a file, although I/O operations depend heavily on multiple factors, such as operating system file caching, file indexing, disk fragmentation, filesystem caching, etc.
I have a simple question.
byte[] responseData = ...;
String str = new String(responseData);
String withKey = "{\"Abcd\":" + str + "}";
In the above code, are these three lines taking 3x the memory? For example, if responseData is 1 MB, then line 2 will take an extra 1 MB in memory and line 3 will take an extra 1 MB plus a bit more. Is this true? If not, how does it work? If yes, what is the optimal way to fix this? Will StringBuffer help here?
Yes, that sounds about right. Probably even more, because your 1 MB byte array needs to be turned into UTF-16, so depending on the encoding it may be even bigger (2 MB if the input was ASCII).
Note that the garbage collector can reclaim memory as soon as the variables that use it go out of scope. You could set them to null as early as possible to help it make this as timely as possible (for example responseData = null; after you constructed your String).
if yes, then what is the optimal way to fix this
"Fix" implies a problem. If you have enough memory there is no problem.
the problem is that I am getting OutOfMemoryException as the byte[] data coming from server is quite big,
If you don't, you have to think about a better alternative to keeping a 1MB string in memory. Maybe you can stream the data off a file? Or work on the byte array directly? What kind of data is this?
The problem is that I am getting OutOfMemoryException as the byte[] data coming from server is quite big, thats why I need to figure it out first that am I doing something wrong ....
Yes. Well basically your fundamental problem is that you are trying to hold the entire string in memory at one time. This is always going to fail for a sufficiently large string ... even if you code it in the most optimal memory efficient fashion possible. (And that would be complicated in itself.)
The ultimate solution (i.e. the one that "scales") is to do one of the following:
stream the data to the file system, or
process it in such a way that you don't ever need the entire "string" to be represented.
You asked if StringBuffer will help. It might help a bit ... provided that you use it correctly. The trick is to make sure that you preallocate the StringBuffer (actually a StringBuilder is better!!) to be big enough to hold all of the characters required. Then copy data into it using a charset decoder (directly or using a Reader pipeline).
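For illustration, a sketch of that approach (the method name and the JSON prefix/suffix handling are mine, and it assumes the data is UTF-8):
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

class DecodeIntoBuilder {
    // Preallocate the StringBuilder, then decode the bytes through a Reader
    // in chunks so no second full-size intermediate String is created.
    static String decode(byte[] responseData) throws IOException {
        StringBuilder sb = new StringBuilder("{\"Abcd\":".length() + responseData.length + 1);
        sb.append("{\"Abcd\":");
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(responseData), StandardCharsets.UTF_8);
        char[] chunk = new char[8192];
        int n;
        while ((n = reader.read(chunk)) != -1) {
            sb.append(chunk, 0, n);
        }
        sb.append('}');
        return sb.toString();   // note: toString() still makes one more copy
    }
}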
But even with optimal coding, you are likely to need a peak of 3 times the size of your input byte[].
Note that your OOME problem is probably nothing to do with GC or storage leaks. It is actually about the fundamental space requirements of the data types you are using ... and the fact that Java does not offer a "string of bytes" data type.
There is no such OutOfMemoryException in my API docs. If it's OutOfMemoryError, especially on the server side, you definitely have a problem.
When you receive big requests from clients, those String-related statements are not the first problem. Reducing 3x to 1x is not the solution.
I'm sorry, I can't help further without more of your code.
Use back-end storage
You should not store the whole request body in a byte[]. You can store it directly in any back-end storage such as a local file, a remote database, or cloud storage.
I would
copy the stream from the request to the back-end storage with a small chunked buffer
Use streams
If possible, use Streams, not Objects.
I would
response.getWriter().write("{\"Abcd\":");
// copy <your back-end stored data as stream> to the response
response.getWriter().write("}");
Yes, if you use a StringBuffer for the code you have, you would save 1 MB of heap space in the last step. However, considering the size of your data, I recommend an external-memory algorithm where you bring only part of your data into memory, process it, and put it back to storage.
As others have mentioned, you should really try not to have such a big Object in your mobile app, and that streaming should be your best solution.
That said, there are some techniques to reduce the amount of memory your app is using now:
Remove byte[] responseData entirely if possible, so the memory it used can be released ASAP (assuming it is not used anywhere else)
Create the largest String first, and then substring() it. Android uses Apache Harmony for its standard Java library implementation. If you check its String class implementation, you'll see that substring() is implemented simply by creating a new String object with the proper start and end offsets into the original data, and no duplicate copy is created. So doing the following would cut the overall memory consumption by at least 1/3:
String withKey = new StringBuilder().append("{\"Abcd\":").append(str).append("}").toString();
String str = withKey.substring("{\"Abcd\":".length(), withKey.length() - "}".length());
Never ever use something like "{\"Abcd\":" + str + "}" for large Strings. Under the hood, "string_a" + "string_b" is implemented as new StringBuilder().append("string_a").append("string_b").toString(), so implicitly you are creating two (or at least one, if the compiler is smart) StringBuilders. For large Strings it's better to take over this process yourself, as you have deep domain knowledge about your program that the compiler doesn't, and you know how best to manipulate the strings.
As a very novice Java programmer, I probably should not mess with this kind of thing. Unfortunately, I'm using a library which has a method that accepts a ByteBuffer object and throws when I try to use it:
Exception in thread "main" java.lang.NullPointerException: Unable to retrieve native address from ByteBuffer object
Is it because I'm using a non-direct buffer?
edit:
There's not a lot of my code there. The library I'm using is jNetPcap, and I'm trying to dump a packet to file. My code takes an existing packet and extracts a ByteBuffer out of it:
byte[] bytes = m_packet.getByteArray(0, m_packet.size());
ByteBuffer buffer = ByteBuffer.wrap(bytes);
Then it calls one of the dump methods of jNetPcap that takes a ByteBuffer.
Many JNI calls expect a direct ByteBuffer. Even the standard libraries in Oracle Java 6.0 expect this, and if you provide them with a heap ByteBuffer they copy your data to/from a direct one for you. In your case, you have a byte[] which can be copied to a direct ByteBuffer. Note: creating a direct ByteBuffer is expensive and you should cache/recycle them if you can.
// the true size is always a multiple of a page anyway.
static final ByteBuffer buffer = ByteBuffer.allocateDirect(4096);
// synchronize the buffer if you need to, or use a ThreadLocal buffer as a simple cache.
byte[] bytes = m_packet.getByteArray(0, m_packet.size());
buffer.clear();
buffer.put(bytes);
buffer.flip();
Based on the information you've provided it appears you are using a ByteBuffer implementation that doesn't allow the Native code to get access to the underlying memory structure. It is attempting to access the direct memory in your ByteBuffer, which it probably shouldn't be doing, and is failing because the class deriving from ByteBuffer doesn't store data directly.
If this is critical code you can't change, your best bet would be to create a ByteBuffer using the Java implementation, then copy the original data into your temporary buffer; Pass the new buffer to your native method. I would then profile the code to see if it is a performance impact.
Here is an example of how to do this. I am a little hesitant to use rewind() and limit() as I don't know what the implementation of your ByteBuffer will return so check to make sure it implements the interface of ByteBuffer correctly.
This code illegally accesses index 3 on purpose, to show that extra data isn't added.
public static void main(String[] args) {
    // This will be your implementation of ByteBuffer that
    // doesn't allow direct access.
    ByteBuffer originalBuffer = ByteBuffer.wrap(new byte[]{12, 50, 70});
    originalBuffer.rewind();

    byte[] newArray = new byte[originalBuffer.limit()];
    originalBuffer.get(newArray, 0, newArray.length);
    ByteBuffer newBuffer = ByteBuffer.wrap(newArray);

    System.out.println("Limit: " + newBuffer.limit());
    System.out.println("Index 0: " + newBuffer.get(0));
    System.out.println("Index 1: " + newBuffer.get(1));
    System.out.println("Index 2: " + newBuffer.get(2));
    System.out.println("Index 3: " + newBuffer.get(3));
}
Output:
Limit: 3
Index 0: 12
Index 1: 50
Index 2: 70
Exception in thread "main" java.lang.IndexOutOfBoundsException
at java.nio.Buffer.checkIndex(Buffer.java:514)
at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:121)
at stackoverflow_4534583.Main.main(Main.java:35)
wrap() does not create a 'direct' byte buffer. A direct byte buffer typically comes from ByteBuffer.allocateDirect() or the memory-mapping API. Whoever wrote the JNI code you are using wasn't kind to you, insofar as they didn't write the code to tolerate a non-direct buffer.
However, all is not lost: http://download.oracle.com/javase/6/docs/api/java/nio/ByteBuffer.html#allocateDirect(int)
will do what you need.
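For example, something along these lines (reusing the byte[] from your snippet above):
// Copy the packet bytes into a direct buffer before handing it to the dump call.
byte[] bytes = m_packet.getByteArray(0, m_packet.size());
ByteBuffer direct = ByteBuffer.allocateDirect(bytes.length);
direct.put(bytes);
direct.flip();   // ready to be read by the native method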
I have the following code, which reads in a file, appends \r\n to the end of each line and puts the result in a string buffer:
public InputStream getInputStream() throws Exception {
    StringBuffer holder = new StringBuffer();
    try {
        FileInputStream reader = new FileInputStream(inputPath);
        BufferedReader br = new BufferedReader(new InputStreamReader(reader));
        String strLine;
        // Read file line by line
        boolean start = true;
        while ((strLine = br.readLine()) != null) {
            if (!start)
                holder.append("\r\n");
            holder.append(strLine);
            start = false;
        }
        // Close the input stream
        reader.close();
    } catch (Throwable e) { // this is where the heap error is caught, up to 2Gb
        System.err.println("Error: " + e.getMessage());
    }
    return new StringBufferInputStream(holder.toString());
}
I tried reading in a 400Mb file, and I changed the max heap space to 2Gb and yet it still gives the out of memory heap exception. Any ideas?
It may be to do with how the StringBuffer resizes when it reaches capacity - This involves creating a new char[] double the size of the previous one and then copying the contents across into the new array. Together with the points already made about characters in Java being stored as 2 bytes this will definitely add to your memory usage.
To resolve this you could create a StringBuffer with sufficient capacity to begin with, given that you know the file size (and hence approximate number of characters to read in). However, be warned that the array allocation will also occur if you then attempt to convert this large StringBuffer into a String.
Another point: You should typically favour StringBuilder over StringBuffer as the operations on it are faster.
You could consider implementing your own "CharBuffer", using for example a LinkedList of char[] to avoid expensive array allocation / copy operations. You could make this class implement CharSequence and perhaps avoid converting to a String altogether. Another suggestion for more compact representation: If you're reading in English text containing large numbers of repeated words you could read and store each word, using the String.intern() function to significantly reduce storage.
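A very rough sketch of that chunked idea (the class name, chunk size, and method set are mine; it is not a drop-in class):
import java.util.ArrayList;
import java.util.List;

// Append into fixed-size char[] blocks so no single huge array ever has to be
// reallocated and copied when the buffer grows.
class ChunkedCharBuffer implements CharSequence {
    private static final int CHUNK = 8192;
    private final List<char[]> chunks = new ArrayList<>();
    private int length = 0;

    void append(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            if (length % CHUNK == 0) {
                chunks.add(new char[CHUNK]);   // grow by one block, no copying
            }
            chunks.get(length / CHUNK)[length % CHUNK] = s.charAt(i);
            length++;
        }
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) {
        return chunks.get(index / CHUNK)[index % CHUNK];
    }

    @Override public CharSequence subSequence(int start, int end) {
        StringBuilder sb = new StringBuilder(end - start);
        for (int i = start; i < end; i++) {
            sb.append(charAt(i));
        }
        return sb;
    }
}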
To begin with, Java strings are UTF-16 (i.e. 2 bytes per character), so assuming your input file is ASCII or a similar one-byte-per-character format, holder will be ~2x the size of the input data, plus the extra \r\n per line and any additional overhead. That's ~800 MB straight away, assuming a very low storage overhead in StringBuffer.
I could also believe that the contents of your file is buffered twice - once at the I/O level and once in the BufferedReader.
However, to know for sure, it's probably best to look at what's actually on the heap - use a tool like HPROF to see exactly where your memory has gone.
In terms of solving this, I suggest you process a line at a time, writing out each line after you have added the line termination. That way your memory usage should be proportional to the length of a line, instead of the entire file.
It's an interesting question, but rather than stress over why Java is using so much memory, why not try a design that doesn't require your program to load the entire file into memory?
You have a number of problems here:
Unicode: characters take twice as much space in memory as on disk (assuming a 1 byte encoding)
StringBuffer resizing: could double (permanently) and triple (temporarily) the occupied memory, though this is the worst case
StringBuffer.toString() temporarily doubles the occupied memory since it makes a copy
All of these combined mean that you could require temporarily up to 8 times your file's size in RAM, i.e. 3.2G for a 400M file. Even if your machine physically has that much RAM, it has to be running a 64bit OS and JVM to actually get that much heap for the JVM.
All in all, it's simply a horrible idea to keep such a huge String in memory - and it's totally unnecessary as well. Since your method returns an InputStream, all you really need is a FilterInputStream that adds the line breaks on the fly.
It's the StringBuffer. The empty constructor creates a StringBuffer with an initial capacity of 16 characters. Now if you append something and the capacity is not sufficient, it does an array copy of the internal char array into a new buffer.
So in fact, with each line appended the StringBuffer has to create a copy of its complete internal array, which nearly doubles the required memory when appending the last line. Together with the UTF-16 representation, this results in the observed memory demand.
Edit
Michael is right when saying that the internal buffer is not incremented in small portions - it roughly doubles in size each time you need more memory. But still, in the worst case, say the buffer needs to expand capacity just with the very last append, it creates a new array twice the size of the current one - so in that case, for a moment, you need roughly three times the amount of memory.
Anyway, I've learned the lesson: StringBuffer (and StringBuilder) may cause unexpected OutOfMemory errors, and I'll always initialize them with a size, at least when I have to store large Strings. Thanks for the question :)
At the last insert into the StringBuffer, you need three times the memory allocated, because the StringBuffer always expands by (size + 1) * 2 (which is already double because of Unicode). So a 400 MB file could require an allocation of 800 MB * 3 == 2.4 GB at the end of the inserts. It may be something less; that depends on exactly when the threshold is reached.
The suggestion to concatenate Strings rather than using a Buffer or Builder is in order here. There will be a lot of garbage collection and object creation (so it will be slow), but a much lower memory footprint.
[At Michael's prompting, I investigated this further, and concat wouldn't help here, as it copies the char buffer, so while it wouldn't require triple, it would require double the memory at the end.]
You could continue to use the Buffer (or better yet Builder in this case) if you know the maximum size of the file and initialize the size of the Buffer on creation and you are sure this method will only get called from one thread at a time.
But really such an approach of loading such a large file into memory at once should only be done as a last resort.
I would suggest you use the OS file cache instead of copying the data into Java memory via characters and back to bytes again. If you re-read the file as required (perhaps transforming it as you go) it will be faster and very likely to be simpler
You need over 2 GB because 1-byte letters use char (2 bytes) in memory, and when your StringBuffer resizes you need double that (to copy the old array to the larger new array). The new array is typically 50% larger, so you need up to 6x the original file size. As if the performance wasn't bad enough, you are using StringBuffer instead of StringBuilder, which synchronizes every call when it is clearly not needed. (This only slows you down, but it uses the same amount of memory.)
Others have explained why you're running out of memory. As to how to solve this problem, I'd suggest writing a custom FilterInputStream subclass. This class would read one line at a time, append the "\r\n" characters and buffer the result. Once the line has been read by the consumer of your FilterInputStream, you'd read another line. This way you'd only ever have one line in memory at a time.
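A rough sketch of that idea (extending InputStream directly rather than FilterInputStream for brevity; the class name is mine, and it assumes the file content is ASCII-compatible):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Serve the file one line at a time, inserting "\r\n" on the fly, so only a
// single line is ever held in memory.
class LineTerminatingInputStream extends InputStream {
    private final BufferedReader reader;
    private byte[] current = new byte[0];
    private int pos = 0;
    private boolean first = true;

    LineTerminatingInputStream(BufferedReader reader) {
        this.reader = reader;
    }

    @Override
    public int read() throws IOException {
        while (pos >= current.length) {
            String line = reader.readLine();
            if (line == null) {
                return -1;                           // end of underlying file
            }
            String out = first ? line : "\r\n" + line;
            first = false;
            current = out.getBytes(StandardCharsets.US_ASCII);
            pos = 0;
        }
        return current[pos++] & 0xFF;
    }
}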
I also recommend checking out the Commons IO FileUtils class for this. Specifically: org.apache.commons.io.FileUtils#readFileToString. You can also specify the encoding if you know you are only using ASCII.
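For example (one of the readFileToString overloads takes the encoding name; this just illustrates the call, not the line-ending handling):
String contents = org.apache.commons.io.FileUtils.readFileToString(new java.io.File(inputPath), "US-ASCII");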