What are some practical areas where ByteArrayInputStream and/or ByteArrayOutputStream are used? Examples are also welcome.
If one searches for examples, one usually finds something like:
byte[] buf = { 16, 47, 12 };
ByteArrayInputStream byt = new ByteArrayInputStream(buf);
That does not explain where or why one should use them. I know that they are used when working with images, ZIP files, or writing to a ServletOutputStream.
ByteArrayInputStream: every time you need an InputStream (typically because an API takes that as argument), and you have all the data in memory already, as a byte array (or anything that can be converted to a byte array).
ByteArrayOutputStream: every time you need an OutputStream (typically because an API writes its output to an OutputStream) and you want to store the output in memory, and not in a file or on the network.
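As a minimal sketch of both cases at once: GZIPOutputStream only writes to an OutputStream and GZIPInputStream only reads from an InputStream, so the byte-array streams let the whole round trip stay in memory. The class and method names here (InMemoryGzip, compress, decompress) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, for illustration only.
final class InMemoryGzip {

    // The API insists on an OutputStream; we want the result in memory.
    static byte[] compress(byte[] raw) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(sink)) {
            gzip.write(raw);
        } // closing the GZIP stream flushes the trailer into sink
        return sink.toByteArray();
    }

    // The API insists on an InputStream; we already hold the bytes in memory.
    static byte[] decompress(byte[] gzipped) throws IOException {
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(gzipped))) {
            return gzip.readAllBytes(); // Java 9+
        }
    }
}
```

The same pattern applies to any stream-only API, such as an image writer or a ZIP stream, when the data should never touch the disk.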
I have been trying to decompress text that was compressed in GZIP format.
Below is the method I implemented:
private byte[] decompress(String compressed) throws Exception {
ByteArrayOutputStream out = new ByteArrayOutputStream();
ByteArrayInputStream in = new
ByteArrayInputStream(compressed.getBytes(StandardCharsets.UTF_8));
GZIPInputStream ungzip = new GZIPInputStream(in);
byte[] buffer = new byte[256];
int n;
while ((n = ungzip.read(buffer)) >= 0) {
out.write(buffer, 0, n);
}
return out.toByteArray();
}
Now I am testing the solution with the following compressed text:
H4sIAAAAAAAACjM0MjYxBQAcOvXLBQAAAA==
It throws a "Not in GZIP format" exception.
I tried different approaches, but the error persists. Does anyone have an idea what I am doing wrong?
That's not gzip formatted. In general, compressed cannot be a string, because compressed data is bytes, and a string isn't bytes. (Some languages / tutorials / 1980s thinking conflate the two, but it's the 2020s. We don't do that anymore. There are more characters than what's used in English.)
It looks like perhaps the following has occurred:
Someone has some data.
They gzipped it.
They then turned the gzipped stream (which are bytes) into characters using Base64 encoding.
They sent it to you.
You now want to get back to the data.
Given that 2 transformations occurred (first, gzip it, then, base64 it), you need to also do 2 transformations, in reverse. You need to:
Take the input string, and de-base64 it, giving you bytes.
You then need to take these bytes and decompress them.
and now you have the original data back.
Thus:
byte[] gzipped = java.util.Base64.getDecoder().decode(compressed);
var in = new GZIPInputStream(new ByteArrayInputStream(gzipped));
return in.readAllBytes();
Note:
Pushing the data from an input stream to an output stream by hand, as in the question, is a waste of resources and a bunch of finicky code. There is no need to write this; just call readAllBytes (available since Java 9).
If the incoming Base64 is large, there are ways to do this in a streaming fashion. This would require that this method takes in a Reader (instead of a String, which cannot be streamed) and returns an InputStream instead of a byte[]. Of course, if the input is not particularly large, there is no need. The above approach is somewhat wasteful: the base64-ed data, the un-base64ed data, and the decompressed data are all in memory at the same time. You can't avoid this, nor can the garbage collector collect any of it in between (most likely because the caller still references the base64-ed string).
In other words, if the compressed ratio is, say, 50%, and the total uncompressed data is 100MB in size, this method takes MORE than:
100MB (uncompressed) + 50MB (compressed) + 67MB (compressed, then base64-ed: 50 * 4/3) = ~217MB of memory.
You know better than we do how much heap your VM is running on, and how large the input data is likely to ever get.
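For the streaming variant mentioned in the note, here is a rough sketch. The JDK's Base64.Decoder.wrap works on an InputStream of Base64 text rather than on a Reader, so this version assumes the caller can supply the Base64 characters as a byte stream (for example straight from a socket or a file). StreamingUngzip and open are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, for illustration only.
final class StreamingUngzip {

    // base64Text: a stream of Base64 characters (ASCII bytes), an assumption
    // of this sketch. Nothing is decoded until the caller reads from the
    // returned stream, so only small internal buffers are held in memory.
    static InputStream open(InputStream base64Text) throws IOException {
        InputStream gzippedBytes = Base64.getDecoder().wrap(base64Text);
        return new GZIPInputStream(gzippedBytes);
    }
}
```

The caller is responsible for closing the returned stream, which also closes the wrapped ones.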
NB: Base64 transfer is extremely inefficient, taking 4 bytes of base64 content for every 3 bytes of input data, and if the data transfer is in UTF-16, it's 8 bytes per 3, even. Ouch. Given that the content was GZipped, this feels a bit daft: First we painstakingly reduce the size of this thing, and then we casually inflate it by 33% for probably no good reason. You may want to check the 'pipe' that leads you to this, possibly you can just... eliminate the base64 aspect of this.
For example, if you have a wire protocol and someone decided that JSON was a good idea, then.. simply.. don't. JSON is not a good idea if you have the need to transfer a bunch of raw data. Use protobuf, or send a combination of JSON and blobs, etc.
I'm writing a binary file header from java, and I had been using fixed values for the file size in the header. That was easy:
OutputStream os = new FileOutputStream(filename);
os.write(0x36);//LSB
os.write(0x10);
os.write(0x0E);
os.write(0x00);//MSB
But now I want to be more dynamic and write whatever size buffer I have to a file. So I might get the size of my array as say 4054; I want to take that and either break it apart and do four os.writes, or maybe there's a way to write it all at once.
OutputStream seems to only take one byte at a time, but I'd like to still use it as all the rest of my header code is already using it.
Use a ByteBuffer, so you can control whether it writes LSB or MSB first.
ByteBuffer buf = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN);
buf.putInt(value); // an int fills the 4 bytes; putLong needs 8 and would throw BufferOverflowException
os.write(buf.array());
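A sketch of a 4-byte, LSB-first size field written to any OutputStream, once with a ByteBuffer and once with plain shifts so the byte order is visible. HeaderInts and its method names are invented for this example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper, for illustration only.
final class HeaderInts {

    // ByteBuffer variant: the byte order is an explicit, swappable setting.
    static void writeIntLE(OutputStream os, int value) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        buf.putInt(value);
        os.write(buf.array()); // write(byte[]) emits all four bytes at once
    }

    // Shift variant: OutputStream.write(int) emits only the low 8 bits.
    static void writeIntLEShift(OutputStream os, int value) throws IOException {
        os.write(value);        // LSB
        os.write(value >>> 8);
        os.write(value >>> 16);
        os.write(value >>> 24); // MSB
    }
}
```

For a size of 4054 (0x0FD6), both variants emit the bytes D6 0F 00 00.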
I am implementing some kind of file viewer/file explorer as a Web-Application. Therefore I need to read files from the hard disk of the system. Of course I have to deal with small files and large files and I want the fastest and most performant way of doing this.
Now I have the following code and want to ask the "big guys" who have a lot of knowledge about efficiently reading (large) files if I am doing it the right way:
RandomAccessFile fis = new RandomAccessFile(filename, "r");
FileChannel fileChannel = fis.getChannel();
// Don't load the whole file into the memory, therefore read 4096 bytes from position on
MappedByteBuffer mappedByteBuffer = fileChannel.map(MapMode.READ_ONLY, position, 4096);
byte[] buf = new byte[4096];
StringBuilder sb = new StringBuilder();
while (mappedByteBuffer.hasRemaining()) {
// Math.min(..) to avoid BufferUnderflowException
mappedByteBuffer.get(buf, 0, Math.min(4096, mappedByteBuffer.remaining()));
sb.append(new String(buf));
}
LOGGER.debug(sb.toString()); // Debug purposes
I hope you can help me and give me some advice.
When you are going to view arbitrary files, including potentially large files, I’d assume that there’s a possibility that these files are not actually text files, or that you may encounter files which have different encodings.
So when you are going to view such files as text on a best-effort basis, you should think about which encoding you want to use and make sure that failures do not harm your operation. The constructor you use with new String(buf) does replace invalid characters, but it is redundant to construct a new String instance and append it to a StringBuilder afterwards.
Generally, you shouldn’t take so many detours. Since Java 7, you don’t need a RandomAccessFile (or FileInputStream) to get a FileChannel. A straightforward solution would look like
// Instead of StandardCharsets.ISO_8859_1 you could also use Charset.defaultCharset()
CharsetDecoder decoder = StandardCharsets.ISO_8859_1.newDecoder()
.onMalformedInput(CodingErrorAction.REPLACE)
.onUnmappableCharacter(CodingErrorAction.REPLACE)
.replaceWith(".");
try(FileChannel fileChannel=FileChannel.open(Paths.get(filename),StandardOpenOption.READ)) {
//Don't load the whole file into the memory, therefore read 4096 bytes from position on
ByteBuffer mappedByteBuffer = fileChannel.map(MapMode.READ_ONLY, position, 4096);
CharBuffer cb = decoder.decode(mappedByteBuffer);
LOGGER.debug(cb.toString()); // Debug purposes
}
You can operate on the resulting CharBuffer directly or invoke toString() on it to get a String instance (but of course, avoid doing it multiple times). The CharsetDecoder also allows re-using a CharBuffer; however, that may not have a big impact on performance. What you should definitely avoid is concatenating all these chunks into one big string.
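One practical detail when fetching chunks: the requested range can run past the end of the file. The following sketch is a variant, not the code above: it swaps the memory mapping for a plain positional read and clamps the length to what the file actually holds. ChunkReader and readChunk are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class ChunkReader {

    // Returns up to maxBytes of the file, starting at position, decoded as text.
    static String readChunk(Path file, long position, int maxBytes) throws IOException {
        CharsetDecoder decoder = StandardCharsets.ISO_8859_1.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE)
                .replaceWith(".");
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long available = channel.size() - position;
            if (available <= 0) {
                return ""; // position is at or past the end of the file
            }
            ByteBuffer buf = ByteBuffer.allocate((int) Math.min(maxBytes, available));
            long pos = position;
            while (buf.hasRemaining()) {
                int n = channel.read(buf, pos); // positional read, may be partial
                if (n < 0) {
                    break; // file shrank while reading
                }
                pos += n;
            }
            buf.flip();
            return decoder.decode(buf).toString();
        }
    }
}
```

Each call returns one independent chunk, so the viewer can page through a large file without ever holding more than one chunk as a String.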
Currently I have the problem that this piece of code will be called more than 500k times. The size of the compressed byte[] is less than 1KB. Every time the method is called, all of the streams have to be created. So I am looking for a way to improve this code.
private byte[] unzip(byte[] data) throws IOException, DataFormatException {
byte[] unzipData = new byte[4096];
try (ByteArrayInputStream in = new ByteArrayInputStream(data);
GZIPInputStream gzipIn = new GZIPInputStream(in);
ByteArrayOutputStream out = new ByteArrayOutputStream()) {
int read = 0;
while( (read = gzipIn.read(unzipData)) != -1) {
out.write(unzipData, 0, read);
}
return out.toByteArray();
}
}
I already tried to replace ByteArrayOutputStream with a ByteBuffer, but at the time of creation I don't know how many bytes I need to allocate.
Also, I tried to use Inflater, but I stumbled across the problem described here.
Any other ideas on what I could do to improve the performance of this code?
UPDATE#1
Maybe this lib helps someone.
Also there is an open JDK-Bug.
Profile your application, to be sure that you're really spending optimizable time in this function. It doesn't matter how many times you call this function; if it doesn't account for a significant fraction of overall program execution time, then optimization is wasted.
Pre-size the ByteArrayOutputStream. The default buffer size is 32 bytes, and resizes require copying all existing bytes. If you know that your decoded arrays will be around 1k, use new ByteArrayOutputStream(2048).
Rather than reading a byte at a time, read a block at a time, using a pre-allocated byte[]. Beware that you must use the return value from read as an input to write. Better, use something like IOUtils.copy() from Apache Commons IO (formerly Jakarta Commons) to avoid mistakes.
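A sketch of the pre-sizing and block-copy advice using only the JDK. It substitutes InputStream.transferTo (Java 9+) for IOUtils.copy, and the 2048-byte initial capacity is the guess from above for payloads of around 1KB, not a measured value. PresizedUnzip is a made-up name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, for illustration only.
final class PresizedUnzip {

    // Assumed typical output size; tune it after profiling real payloads.
    private static final int EXPECTED_SIZE = 2048;

    static byte[] unzip(byte[] data) throws IOException {
        try (GZIPInputStream gzipIn =
                 new GZIPInputStream(new ByteArrayInputStream(data));
             ByteArrayOutputStream out = new ByteArrayOutputStream(EXPECTED_SIZE)) {
            // transferTo copies block by block and handles partial reads,
            // so there is no hand-written read/write loop to get wrong.
            gzipIn.transferTo(out);
            return out.toByteArray();
        }
    }
}
```

If the output outgrows the initial capacity, ByteArrayOutputStream still resizes itself, so a wrong guess costs a copy but not correctness.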
I'm not sure if it applies in your case, but I've found incredible speed difference when comparing using the default buffer size of GZIPInputStream vs increasing to 65536.
Example, using a 500MB input file:
new GZIPInputStream(new FileInputStream(path.toFile())) // takes 4 mins to process
vs
new GZIPInputStream(new FileInputStream(path.toFile()), 65536) // takes 10s
More details can be found here http://java-performance.info/java-io-bufferedinputstream-and-java-util-zip-gzipinputstream/
Both BufferedInputStream and GZIPInputStream have internal buffers.
Default size for the former one is 8192 bytes and for the latter one
is 512 bytes. Generally it worth increasing any of these sizes to at
least 65536.
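A sketch of opening a gzip file with the larger internal buffer. The 65536 value is the one quoted above; whether it helps for a given workload is something to measure. BigBufferGzip and open are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, for illustration only.
final class BigBufferGzip {

    static final int BUFFER_SIZE = 65536; // default would be 512 bytes

    static InputStream open(Path path) throws IOException {
        InputStream raw = Files.newInputStream(path);
        try {
            // The second argument sets GZIPInputStream's internal buffer size.
            return new GZIPInputStream(raw, BUFFER_SIZE);
        } catch (IOException e) {
            raw.close(); // header check failed: do not leak the file handle
            throw e;
        }
    }
}
```

For tiny in-memory inputs like the sub-1KB arrays in the question, the buffer size is unlikely to matter much; the effect described above concerns large files.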
You can use the Inflater class method reset() to reuse the Inflater object without having to recreate it each time. You will have a little bit of added programming to do in order to decode the gzip header and perform the integrity check with the gzip trailer. You would then use Inflater with the nowrap option to decompress the raw deflated data after the gzip header and before the trailer.
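A rough sketch of that approach, assuming each input array holds exactly one complete gzip member. The header parsing follows the gzip layout (10 fixed bytes, then optional extra, name, comment, and header CRC fields selected by the flag byte). ReusableGunzip is a hypothetical class, and an instance is not thread-safe because it shares one Inflater.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

// Hypothetical helper, for illustration only. Not thread-safe.
final class ReusableGunzip {
    private final Inflater inflater = new Inflater(true); // nowrap: raw deflate
    private final CRC32 crc = new CRC32();
    private final byte[] chunk = new byte[4096];

    byte[] unzip(byte[] data) throws IOException, DataFormatException {
        int start = headerLength(data);
        inflater.reset(); // reuse instead of allocating a new Inflater
        crc.reset();
        // Hand over everything after the header; the inflater stops at the
        // end of the deflate stream and leaves the trailer unconsumed.
        inflater.setInput(data, start, data.length - start);
        ByteArrayOutputStream out = new ByteArrayOutputStream(2048);
        while (!inflater.finished()) {
            int n = inflater.inflate(chunk);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new IOException("Truncated or unsupported deflate stream");
            }
            crc.update(chunk, 0, n);
            out.write(chunk, 0, n);
        }
        int remaining = inflater.getRemaining();
        if (remaining < 8) throw new IOException("Missing gzip trailer");
        int trailer = data.length - remaining;
        long size = inflater.getBytesWritten() & 0xFFFFFFFFL; // ISIZE is mod 2^32
        if (readUInt32LE(data, trailer) != crc.getValue()
                || readUInt32LE(data, trailer + 4) != size) {
            throw new IOException("Corrupt gzip trailer (CRC or size mismatch)");
        }
        return out.toByteArray();
    }

    // Returns the offset of the first deflate byte.
    private static int headerLength(byte[] d) throws IOException {
        if (d.length < 18 || (d[0] & 0xFF) != 0x1F || (d[1] & 0xFF) != 0x8B || d[2] != 8) {
            throw new IOException("Not in GZIP format");
        }
        int flags = d[3] & 0xFF;
        int pos = 10; // magic, method, flags, mtime, xfl, os
        if ((flags & 0x04) != 0) { // FEXTRA: 2-byte little-endian length + data
            pos += 2 + ((d[pos] & 0xFF) | ((d[pos + 1] & 0xFF) << 8));
        }
        if ((flags & 0x08) != 0) { // FNAME: zero-terminated
            while (pos < d.length && d[pos++] != 0) { }
        }
        if ((flags & 0x10) != 0) { // FCOMMENT: zero-terminated
            while (pos < d.length && d[pos++] != 0) { }
        }
        if ((flags & 0x02) != 0) pos += 2; // FHCRC
        if (pos > d.length - 8) throw new IOException("Truncated gzip header");
        return pos;
    }

    private static long readUInt32LE(byte[] d, int off) {
        return (d[off] & 0xFFL) | (d[off + 1] & 0xFFL) << 8
             | (d[off + 2] & 0xFFL) << 16 | (d[off + 3] & 0xFFL) << 24;
    }
}
```

Create one instance per worker thread and call unzip repeatedly; whether this beats GZIPInputStream for sub-1KB inputs is worth confirming with a profiler before adopting it.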
I simply use
IOUtils.copy(myInputStream, myOutputStream);
And I see that before calling IOUtils.copy the input stream is available to read, and afterwards it is not.
flux.available()
(int) 1368181 (before)
(int) 0 (after)
I saw some explanation in this post, and I see I can copy the bytes from my InputStream to a ByteArrayInputStream and then use mark(0) and reset() in order to read an input stream multiple times.
Here is the resulting code (which is working).
I find this code very verbose, and I'd like to know if there is a better solution.
ByteArrayInputStream fluxResetable = new ByteArrayInputStream(IOUtils.toByteArray(myInputStream));
fluxResetable.mark(0);
IOUtils.copy(fluxResetable, myOutputStream);
fluxResetable.reset();
An InputStream, unless otherwise stated, is single shot: you consume it once and that's it.
If you want to read it many times, that isn't just a stream any more, it's a stream with a buffer. Your solution reflects that accurately, so it is acceptable. The one thing I would probably change is to store the byte array and always create a new ByteArrayInputStream from it when needed, rather than resetting the same one:
byte [] content = IOUtils.toByteArray(myInputStream);
IOUtils.copy(new ByteArrayInputStream(content), myOutputStream);
doSomethingElse(new ByteArrayInputStream(content));
The effect is more or less the same but it's slightly easier to see what you're trying to do.
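The same idea can be sketched with the JDK alone, replacing IOUtils.toByteArray and IOUtils.copy with InputStream.readAllBytes and transferTo (both Java 9+). ReplayableContent and newStream are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration only.
final class ReplayableContent {
    private final byte[] content;

    // Consumes the single-shot source exactly once and keeps its bytes.
    ReplayableContent(InputStream source) throws IOException {
        this.content = source.readAllBytes();
    }

    // Every caller gets a fresh stream positioned at the start.
    InputStream newStream() {
        return new ByteArrayInputStream(content);
    }
}
```

Usage would look like content.newStream().transferTo(myOutputStream) followed by doSomethingElse(content.newStream()), with no mark or reset bookkeeping.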