ByteArrayOutputStream capacity restriction - java

I create ByteArrayOutputStream barr = new ByteArrayOutputStream(1);, i.e. with a capacity of 1 byte, and then write more than 1 byte to it: barr.write("123456789000000".getBytes());. No error occurs, and when I check the length of barr it is 15. Why was my write not blocked or wrapped? Is there a way to prevent writing more than the capacity, and which OutputStream could be used for that?
I am very limited in available memory and don't want to write more there than my limitations allow.
P.S. Thanks a lot for the answers! I have a follow-up question; it would be great if you could take a look.

ByteArrayOutputStream will grow the backing array if you try to write more bytes. This is usually thought of as a good thing.
If you want different behavior, you can always write your own OutputStream implementation that throws an IOException if the number of bytes to write goes beyond the capacity.
ByteArrayOutputStream is not final, so you can extend it. I think all you would have to do is override write(int) and write(byte[], int, int) to throw an exception if the number of bytes to write exceeds the amount remaining. The fields buf and count are protected, so your subclass can see how much of the backing array has been written and the length of the array.
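A minimal sketch of what that subclass could look like (the class name is my own; note that ByteArrayOutputStream's write methods don't declare IOException, so an unchecked exception is thrown instead):
import java.io.ByteArrayOutputStream;

// Hypothetical bounded variant: refuses to grow past the initial capacity.
public class BoundedByteArrayOutputStream extends ByteArrayOutputStream {
    private final int capacity;

    public BoundedByteArrayOutputStream(int capacity) {
        super(capacity);
        this.capacity = capacity;
    }

    @Override
    public synchronized void write(int b) {
        if (count + 1 > capacity) {
            throw new IllegalStateException("capacity exceeded: " + capacity);
        }
        super.write(b);
    }

    @Override
    public synchronized void write(byte[] b, int off, int len) {
        if (count + len > capacity) {
            throw new IllegalStateException("capacity exceeded: " + capacity);
        }
        super.write(b, off, len);
    }
}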

That is because the capacity that you specify to the constructor is the initial size of the buffer. If you write more data, the buffer will be automatically re-allocated with a larger size, to fit more data.
As far as I know, there is no way with ByteArrayOutputStream to limit the growth of the buffer. You could use something else instead, for example a java.nio.ByteBuffer, which has a fixed size.
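For example (a small sketch; the sizes are arbitrary), a fixed-size ByteBuffer rejects writes past its capacity:
ByteBuffer buf = ByteBuffer.allocate(8);   // fixed capacity of 8 bytes, never grows
buf.put("12345678".getBytes());            // fills the buffer exactly
try {
    buf.put((byte) '9');                   // one byte too many
} catch (BufferOverflowException e) {
    System.out.println("buffer full, write rejected");
}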

I had the same problem, and eventually turned up an existing implementation of the same thing within Hadoop - BoundedByteArrayOutputStream.java

Related

Java direct ByteBuffer - decode the characters

I would like to read the bytes into the direct ByteBuffer and then decode them without rewrapping the original buffer into the byte[] array to minimize memory allocations.
Hence I'd like to avoid using StandardCharsets.UTF_8.decode(), as it allocates a new array on the heap.
I'm stuck on how to decode the bytes. Consider the following code that writes a string into the buffer and then reads it again.
ByteBuffer byteBuffer = ByteBuffer.allocateDirect(2 << 16);
byteBuffer.put("Hello Dávid".getBytes(StandardCharsets.UTF_8));
byteBuffer.flip();
CharBuffer charBuffer = byteBuffer.asCharBuffer();
for (int i = charBuffer.position(); i < charBuffer.length(); i++) {
System.out.println(charBuffer.get());
}
The code output:
䡥汬漠
How can I decode the buffer?
I would like to read the bytes into the direct ByteBuffer and then decode them without rewrapping the original buffer into the byte[] array to minimize memory allocations.
ByteBuffer.asCharBuffer() fits your need, indeed, since both wrappers share the same underlying buffer.
This method's javadoc says:
The new buffer's position will be zero, its capacity and its limit will be the number of bytes remaining in this buffer divided by two
Although it isn't stated explicitly, this is a hint that the CharBuffer interprets the underlying buffer as UTF-16. Since you have no control over which encoding the CharBuffer uses, you have little choice but to write the character bytes in that encoding:
byteBuffer.put("Hello Dávid".getBytes(StandardCharsets.UTF_16));
One thing about your printing loop: be careful that CharBuffer.length() is actually the number of chars remaining between the buffer's position and limit, so it decreases as you call CharBuffer.get(). You should either use the absolute get(int) or change the loop's termination condition to limit().
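A sketch of the corrected version (note that getBytes(StandardCharsets.UTF_16) prepends a two-byte BOM, which decodes as an invisible '\uFEFF' first char; UTF_16BE avoids that if it bothers you):
ByteBuffer byteBuffer = ByteBuffer.allocateDirect(2 << 16);
byteBuffer.put("Hello Dávid".getBytes(StandardCharsets.UTF_16));
byteBuffer.flip();
CharBuffer charBuffer = byteBuffer.asCharBuffer();
for (int i = charBuffer.position(); i < charBuffer.limit(); i++) {
    System.out.println(charBuffer.get(i)); // absolute get(int) does not advance the position
}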
You can't specify the encoding of a CharBuffer. See here: What Charset does ByteBuffer.asCharBuffer() use?
Also, since buffers are mutable and Strings are always immutable, I don't see how you could ever create a String from a buffer without a memory re-allocation...

Efficient growable java byte array that allows access to bytes

Does anyone know of a Java class to store bytes that satisfies the following conditions?
Stores bytes efficiently (i.e. not one object per byte).
Grows automatically, like a StringBuilder.
Allows indexed access to all of its bytes (without copying everything to a byte[]).
Nothing I've found so far satisfies these. Specifically:
byte[] : Doesn't satisfy 2.
ByteBuffer : Doesn't satisfy 2.
ByteArrayOutputStream : Doesn't satisfy 3.
ArrayList : Doesn't satisfy 1 (AFAIK, unless there's some special-case optimisation).
If I can efficiently remove bytes from the beginning of the array, that would be nice. If I were writing it from scratch, I would implement it as something like
{ ArrayList<byte[]> data; /* 256-byte chunks */ int startOffset; int size; }
and then the obvious functions. Does something like this exist?
The most straightforward approach would be to subclass ByteArrayOutputStream and add functionality to access the underlying byte[].
Removal of bytes from the beginning can be implemented in different ways depending on your requirements. If you need to remove a chunk, System.arraycopy should work fine; if you need to remove single bytes, I would introduce a headIndex that keeps track of the beginning of the data (performing an arraycopy once enough data has been "removed").
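A rough sketch of that idea (class and method names are my own):
import java.io.ByteArrayOutputStream;

// Hypothetical subclass: indexed access plus removal from the front.
public class IndexedByteArrayOutputStream extends ByteArrayOutputStream {

    // Read the byte at the given index without copying the whole buffer.
    public synchronized byte byteAt(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("index: " + index + ", size: " + count);
        }
        return buf[index];
    }

    // Remove the first n bytes by shifting the remainder to the front.
    public synchronized void removeFromFront(int n) {
        if (n < 0 || n > count) {
            throw new IndexOutOfBoundsException("n: " + n + ", size: " + count);
        }
        System.arraycopy(buf, n, buf, 0, count - n);
        count -= n;
    }
}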
There are some implementations of high-performance primitive collections, such as hppc or Koloboke.
You'd have to write one. Off the top of my head, what I would do is create an ArrayList internally and pack four bytes into each int, with appropriate functions for masking off the bytes. Performance will be suboptimal for removing and adding individual bytes. However, it will store the data in minimal space if that is a real consideration, wasting no more than 3 bytes of storage (on top of the overhead of the ArrayList).
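A sketch of the masking idea (all names are hypothetical; append and indexed read only, no removal shown):
import java.util.ArrayList;

// Hypothetical: pack four bytes into each Integer of an ArrayList.
public class PackedByteList {
    private final ArrayList<Integer> words = new ArrayList<>();
    private int size;

    public void add(byte b) {
        if (size % 4 == 0) {
            words.add(0);                     // start a new 4-byte word
        }
        int wordIndex = size / 4;
        int shift = (size % 4) * 8;
        words.set(wordIndex, words.get(wordIndex) | ((b & 0xFF) << shift));
        size++;
    }

    public byte get(int index) {
        int shift = (index % 4) * 8;
        return (byte) (words.get(index / 4) >>> shift);
    }

    public int size() {
        return size;
    }
}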
The laziest method would be ArrayList. It's not as inefficient as you seem to believe, since Byte instances can and will be shared, meaning there will be only 256 Byte objects in the entire VM unless you do a "new Byte()" yourself somewhere.

What is the initial "mode" of ByteBuffer?

While studying the ByteBuffer class I got to thinking about an array wrapped ByteBuffer that might be constructed as follows:
byte data[] = new byte[10];
// Populate data array
ByteBuffer myBuffer = ByteBuffer.wrap(data);
int i = myBuffer.getInt();
Which, I thought, might retrieve the first 4 bytes of my byte array as an int value. However, as I studied further, I seemed to find that the ByteBuffer has two modes, read and write, and that we can flip between them using the flip() method. But since flip is basically a toggle, it presupposes that one knows the initial value in order to meaningfully flip between the read and write states.
What is the definition of the initial state of a ByteBuffer?
write?
read?
A function of how it was created (e.g. allocate vs. wrap)?
Strictly speaking, the ByteBuffer itself doesn't track whether it is in "read" or "write" mode; that's merely a function of how it is used. A ByteBuffer can be read from and written to at any time. The reason we say flip() switches the "mode" is that it is part of the common pattern of writing to the buffer, flipping it, and then reading from it.
Indeed, both allocate and wrap set the limit and capacity to be equal to the array size, and the position to zero. This means that read operations can read up to the whole array, and write operations can fill the whole array. You can therefore do either reading or writing with a newly allocated or wrapped ByteBuffer.
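For example, reading from a freshly wrapped buffer works immediately, without any flip() (a small sketch):
byte[] data = new byte[10];
data[3] = 42;                             // bytes 0-3 form the first int (big-endian by default)
ByteBuffer myBuffer = ByteBuffer.wrap(data);

System.out.println(myBuffer.position()); // 0
System.out.println(myBuffer.limit());    // 10
System.out.println(myBuffer.getInt());   // 42, read from bytes 0-3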

Using sun.misc.Unsafe, what is the fastest way to scan bytes from a Direct ByteBuffer?

BACKGROUND
Assume I have a direct ByteBuffer:
ByteBuffer directBuffer = ByteBuffer.allocateDirect(1024);
and assume I am passing the buffer to an AsynchronousSocketChannel to read chunks of data off that socket up to X bytes at a time (1024 in the example here).
The transfer time off the socket into the direct ByteBuffer is fantastic because it is all occurring in native OS memory space; I haven't passed through the JVM "blood-brain" barrier yet...
QUESTION
Assuming my job is to scan through all the bytes read back in from the direct byte buffer, what is the fastest way for me to do this?
I originally asked "... utilizing sun.misc.Unsafe" but maybe that is the wrong assumption.
POSSIBLE APPROACHES
I currently see three approaches and the one I am most curious about is #3:
(DEFAULT) Use ByteBuffer's bulk-get to pull bytes directly from native OS space into an internal byte[1024] construct.
(UNSAFE) Use Unsafe's getByte ops to pull the values directly out of the ByteBuffer skipping all the bounds-checking of ByteBuffer's standard get ops. Peter Lawrey's answer here seemed to suggest that those raw native methods in Unsafe can even be optimized out by the JIT compiler ("intrinsics") to single machine instructions leading to even more fantastic access time. (===UPDATE=== interesting, it looks like the underlying DirectByteBuffer class does exactly this with the get/put ops for those interested.)
(BANANAS) In some crime-against-humanity sort of way, using Unsafe, can I copy the memory region of the direct ByteBuffer to the same memory address my byte[1024] exists at inside of the VM, and just start accessing the array using standard int indexes? (This makes the assumption that the "copyMemory" operation can potentially do something fantastically optimized at the OS level.)
It does occur to me that, assuming the copyMemory operation does exactly what it advertises even in the more-optimal OS space, approach #2 above is probably still the most optimized, since I am not creating duplicates of the buffer before beginning to process it.
This IS different than the "can I use Unsafe to iterate over a byte[] faster?" question as I am not even planning on pulling the bytes into a byte[] internally if it isn't necessary.
Thanks for the time; just curious if anyone (Peter?) has gotten nuts with Unsafe to do something like this.
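For concreteness, here is roughly what I have in mind for #2 (an untested sketch of my own; it assumes sun.misc.Unsafe and the private java.nio.Buffer.address field are reflectively reachable, which newer JDKs restrict):
Field theUnsafeField = Unsafe.class.getDeclaredField("theUnsafe");
theUnsafeField.setAccessible(true);
Unsafe unsafe = (Unsafe) theUnsafeField.get(null);

Field addressField = Buffer.class.getDeclaredField("address");
addressField.setAccessible(true);
long base = addressField.getLong(directBuffer);     // native address of the direct buffer

long sum = 0;
for (int i = 0; i < directBuffer.limit(); i++) {     // after flip(), [0, limit) holds the data
    sum += unsafe.getByte(base + i);                  // raw read, no bounds checks
}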
ByteBuffer methods are extremely fast, because they are intrinsics: the VM maps them to very low-level instructions. Compare these two approaches:
byte[] bytes = new byte[N];
long sum = 0;
for (int m = 0; m < M; m++)
    for (int i = 0; i < bytes.length; i++)
        sum += bytes[i];

ByteBuffer bb = ByteBuffer.allocateDirect(N);
sum = 0;
for (int m = 0; m < M; m++)
    for (int i = 0; i < bb.remaining(); i++)
        sum += bb.get(i);
On my machine, the difference is 0.67 ns vs 0.81 ns (per loop iteration).
I'm a little surprised that ByteBuffer is not quite as fast as byte[], but I think you should definitely NOT copy it into a byte[] before accessing it.

Custom java serialization of message

While writing a message on the wire, I want to write the number of bytes in the data followed by the data itself.
Message format:
{num of bytes in data}{data}
I can do this by writing the data to a temporary ByteArrayOutputStream, obtaining the byte array size from it, and then writing the size followed by the byte array. This approach involves a lot of overhead: unnecessary creation of temporary byte arrays, temporary streams, etc.
Is there a better way (considering both CPU and garbage creation) of achieving this?
A typical approach would be to introduce a re-useable ByteBuffer. For example:
ByteBuffer out = ...
int oldPos = out.position(); // Remember current position.
out.position(oldPos + 2); // Leave space for message length (unsigned short)
out.putInt(...); // Write out data.
// Finally prepend buffer with number of bytes.
out.putShort(oldPos, (short)(out.position() - (oldPos + 2)));
Once the buffer is populated you could then send the data over the wire using SocketChannel.write(ByteBuffer) (assuming you are using NIO).
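Putting it together with a concrete payload (a sketch; the payload and buffer size are arbitrary):
ByteBuffer out = ByteBuffer.allocate(256);                  // re-used across messages
byte[] payload = "hello".getBytes(StandardCharsets.UTF_8);

int oldPos = out.position();
out.position(oldPos + 2);                                   // reserve 2 bytes for the length
out.put(payload);                                           // write the data itself
out.putShort(oldPos, (short) (out.position() - (oldPos + 2)));

out.flip();                                                 // ready for channel.write(out)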
Here’s what I would do, in order of preference.
Don't bother about memory consumption and stuff. Most likely this is already the optimal solution, unless creating the byte representation of your data takes so long that creating it twice has a noticeable impact.
(Actually this would be more like #37 on my list, with #2 to #36 being empty.) Include a method in all your data objects that can calculate the size of the byte representation using fewer resources than it would take to create the byte representation itself.
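For instance, something along these lines (a hypothetical interface; the names are made up):
// Hypothetical: each message can report its encoded size without building the bytes first.
interface WireSerializable {
    int serializedSize();                        // cheap size computation, no allocation
    void writeTo(DataOutputStream out) throws IOException;
}

// Writing then becomes:
// out.writeShort(message.serializedSize());
// message.writeTo(out);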
