Gunzipping Contents of a URL - Java

So as the title suggests, I'm trying to get and gunzip a string from an HTTP request.
urlConn = url.openConnection();
int len = CONTENT_LENGTH;
byte[] gbytes = new byte[len];
gbuffer = new GZIPInputStream(urlConn.getInputStream(), len);
System.out.println(gbuffer.read(gbytes)+"/"+len);
System.out.println(gbytes);
result = new String(gbytes, "UTF-8");
gbuffer.close();
System.out.println(result);
With some URLs, it works fine. I get output like this:
42/42
[B@96e8209
The entire 42 bytes of my data. Abcdefghij.
With others, it gives me something like the following output:
22/77
[B@1d94882
The entire 77 bytes of
As you can see, the first few bytes of the data are very similar, if not identical, so they shouldn't be causing these issues. I really can't pin it down. Increasing CONTENT_LENGTH doesn't help, and data streams both larger and smaller than the ones giving me issues work fine.
EDIT: The issue also does not lie within the raw gzipped data, as Cocoa and Python both gunzip it without issue.
EDIT: Solved. Including final code:
urlConn = url.openConnection();
int offset = 0, len = CONTENT_LENGTH;
byte[] gbytes = new byte[len];
gbuffer = new GZIPInputStream(urlConn.getInputStream(), len);
while (offset < len)
{
    offset += gbuffer.read(gbytes, offset, len - offset);
}
result = new String(gbytes, "UTF-8");
gbuffer.close();

It's possible that the data isn't available in the stream. The first println() you have says you've only read 22 bytes, so only 22 bytes were available when you called read(). You can try looping until you've read CONTENT_LENGTH worth of bytes. Maybe something like:
int index = 0;
int bytesRead = gbuffer.read(gbytes);
while (bytesRead > 0 && index < len) {
    index += bytesRead;
    bytesRead = gbuffer.read(gbytes, index, len - index);
}

GZIPInputStream.read() is not guaranteed to read all data in one call. You should use a loop:
byte[] buf = new byte[1024];
int len = 0, total = 0;
while ((len = gbuffer.read(buf)) > 0) {
    total += len;
    // do something with the data in buf
}
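For completeness, here is a minimal self-contained sketch of the whole flow that loops until the stream is exhausted instead of trusting a hard-coded content length. The URL is a placeholder, and buffering everything in a ByteArrayOutputStream assumes the payload fits in memory:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.zip.GZIPInputStream;

public class GunzipUrl {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://example.com/data.gz"); // placeholder URL
        try (InputStream in = new GZIPInputStream(url.openConnection().getInputStream());
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buf = new byte[4096];
            int n;
            // read() may return fewer bytes than requested, so loop until EOF (-1)
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            System.out.println(out.toString("UTF-8"));
        }
    }
}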

Related

Facing performance issue converting raw data from byte array of http request

We are using the code below to extract raw data from an HTTP request, and it's taking quite a long time. CPU utilization also peaks during this time. The request header has an XML with close to 4000-5000 characters. Is there any way we can rewrite the code below to save on time and utilization?
private byte[] getRequestBytes(HttpServletRequest request) throws IOException {
    byte[] requestBytes = null;
    byte[] streamBytes = new byte[1];
    InputStream stream = request.getInputStream();
    int length = 0;
    ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream();
    while ((length = stream.read(streamBytes, 0, 1)) != -1) {
        arrayOutputStream.write(streamBytes);
    }
    requestBytes = arrayOutputStream.toByteArray();
    return requestBytes;
}
Java version is 1.7u45
Here are some issues with the code:
byte[] streamBytes = new byte[1]: this buffer is too small; use something like 4096.
You are not closing your stream, which may lead to a resource leak.
stream.read(streamBytes, 0, 1) reads only one byte per loop iteration, which leads to poor performance.
The length variable is redundant as written; you could just test stream.read(streamBytes, 0, 1) != -1. Note, however, that once you enlarge the buffer you do need the returned count: arrayOutputStream.write(streamBytes) writes the whole array, so it must become arrayOutputStream.write(streamBytes, 0, length).
A corrected version along these lines is sketched below.
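This is a minimal sketch, assuming it is fine to buffer the whole request body in memory; try-with-resources needs Java 7, which matches the 1.7u45 mentioned above:
private byte[] getRequestBytes(HttpServletRequest request) throws IOException {
    byte[] buffer = new byte[4096];                   // larger buffer: far fewer read() calls
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    // try-with-resources guarantees the stream is closed even if read() throws
    try (InputStream stream = request.getInputStream()) {
        int length;
        while ((length = stream.read(buffer)) != -1) {
            out.write(buffer, 0, length);             // write only the bytes actually read
        }
    }
    return out.toByteArray();
}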

InputStream reader

I'm currently trying to read in an image file from the server, but I'm either getting incomplete data or an
Exception in thread "main" java.lang.NegativeArraySizeException.
Does this have something to do with the buffer size? I have tried using a static size instead of the content length. Please kindly advise.
URL myURL = new URL(url);
HttpURLConnection connection = (HttpURLConnection) myURL.openConnection();
connection.setRequestMethod("GET");
status = connection.getResponseCode();
if (status == 200)
{
    int size = connection.getContentLength() + 1024;
    byte[] bytes = new byte[size];
    InputStream input = new ByteArrayInputStream(bytes);
    FileOutputStream out = new FileOutputStream(file);
    input = connection.getInputStream();
    int data = input.read(bytes);
    while (data != -1) {
        out.write(bytes);
        data = input.read(bytes);
    }
    out.close();
    input.close();
}
Let's examine the code:
int size = connection.getContentLength() + 1024;
byte[] bytes = new byte[size];
Why do you add 1024 bytes to the size? What's the point? The buffer size should be large enough to avoid too many reads, but small enough to avoid consuming too much memory. Set it to 4096, for example.
InputStream input = new ByteArrayInputStream(bytes);
FileOutputStream out = new FileOutputStream(file);
input = connection.getInputStream();
Why do you create a ByteArrayInputStream, and then forget about it completely? You don't need a ByteArrayInputStream, since you don't read from a byte array, but from the connection's input stream.
int data = input.read(bytes);
This reads bytes from the input. The max number of bytes read is the length of the byte array. The actual number of bytes read is returned and stored in data.
while (data != -1) {
    out.write(bytes);
    data = input.read(bytes);
}
So you have read data bytes, but you don't write only the first data bytes of the array; you write the whole array. That is wrong. Suppose your array is of size 4096 and data is 400: instead of writing the 400 bytes that were actually read, you write those 400 bytes plus the remaining 3696 bytes of the array, which could be zeros or could hold values left over from a previous read. It should be
out.write(bytes, 0, data);
Finally:
out.close();
input.close();
If any exception occurs before, those two streams will never be closed. Do that a few times, and your whole OS won't have any file descriptors available anymore. Use the try-with-resources statement to be sure your streams are closed, no matter what happens.
This code can help you:
input = connection.getInputStream();
byte[] buffer = new byte[4096];
int n = -1;
OutputStream output = new FileOutputStream(file);
while ((n = input.read(buffer)) != -1)
{
    if (n > 0)
    {
        output.write(buffer, 0, n);
    }
}
output.close();
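Putting the two answers together, here is a sketch that also applies the try-with-resources advice from above; connection and file are assumed to be set up as in the question:
// Both streams are closed automatically, even if read() or write() throws
try (InputStream in = connection.getInputStream();
     OutputStream out = new FileOutputStream(file)) {
    byte[] buffer = new byte[4096];
    int n;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n); // write only the n bytes actually read
    }
}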

How to close a connection without filling the byte array

I am trying to open a telnet session many times to some devices to send a command. When I run this program, it works with a 512-byte response. If I increase the byte array size, the program doesn't start another session, even when the session is closed, because there is a 2048-byte response. How can I fix this problem?
byte[] buff = new byte[512];
int ret_read = 0;
do {
    ret_read = instr.read(buff);
    if (ret_read > 0) {
        // sending some commands
    }
} while (ret_read >= 0);
Maybe try to use:
BufferedInputStream is = new BufferedInputStream(socket.getInputStream(), 512);
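As a sketch of how that suggestion fits into the loop above (socket and the command-sending logic are assumptions carried over from the question):
// Buffer the raw socket stream; the second argument is the internal buffer size
BufferedInputStream is = new BufferedInputStream(socket.getInputStream(), 512);
byte[] buff = new byte[512];
int ret_read;
// read() returns -1 once the remote side closes the session
while ((ret_read = is.read(buff)) != -1) {
    if (ret_read > 0) {
        // process buff[0..ret_read) and send commands here
    }
}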

Reading chunked data

I have a client that sends chunked data. My server is expected to read that data. On the server I am using Tomcat 7.0.42 and expecting this data to be loaded via an existing servlet.
I searched Google to see if I could find any examples that read chunked data, but unfortunately I haven't stumbled upon any.
I found a few references to the ChunkedInputStream provided by Apache HttpClient and the ChunkedInputFilter provided by Tomcat, but I couldn't find any decent examples of how best to use these.
If any of you guys have any experience with reading/parsing chunked data, please share pointers around those.
Java version used - 1.7.0.45
In my existing servlet code, I have been handling simple POST requests using NIO. But now, if a client has set the transfer encoding to chunked, I need to handle that specifically, so I have forking code in place. Something like below:
inputStream = httpServletRequest.getInputStream();
if ("chunked".equals(getRequestHeader(httpServletRequest, "Transfer-Encoding"))) {
    // Need to process chunked data
} else {
    // normal request data
    if (inputStream != null) {
        int contentLength = httpServletRequest.getContentLength();
        if (contentLength <= 0) {
            return new byte[0];
        }
        ReadableByteChannel channel = Channels.newChannel(inputStream);
        byte[] postData = new byte[contentLength];
        ByteBuffer buf = ByteBuffer.allocateDirect(contentLength);
        int numRead = 0;
        int counter = 0;
        while (numRead >= 0) {
            buf.rewind();
            numRead = channel.read(buf);
            buf.rewind();
            for (int i = 0; i < numRead; i++) {
                postData[counter++] = buf.get();
            }
        }
        return postData;
    }
}
So if you observe, the normal request case is based on the "content-length" being available, while for chunked encoding, that is not present. And hence an alternative process to handle chunked data.
Thanks,
Vicky
See HTTP/1.1 Chunked Transfer Coding.
Your servlet will be served with chunks of variable size. You'll get the size of each chunk in its first line. The protocol is quite simple, so you could implement it yourself.
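If you ever need to decode the framing yourself (for example, from a raw socket rather than a servlet stream), here is a minimal decoder sketch; readChunked and readLine are hypothetical helpers written for illustration, and chunk extensions and trailers are simply skipped:
// Reads "<hex size>\r\n<data>\r\n" blocks until a zero-size chunk arrives
public static byte[] readChunked(InputStream in) throws IOException {
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    while (true) {
        String sizeLine = readLine(in);
        int semi = sizeLine.indexOf(';');           // strip optional ";extension"
        if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
        int size = Integer.parseInt(sizeLine.trim(), 16);
        if (size == 0) break;                       // last chunk
        byte[] chunk = new byte[size];
        int off = 0;
        while (off < size) {                        // read() may return short counts
            int n = in.read(chunk, off, size - off);
            if (n < 0) throw new IOException("stream ended mid-chunk");
            off += n;
        }
        body.write(chunk);
        readLine(in);                               // consume the CRLF after the data
    }
    return body.toByteArray();
}

// Reads up to (and consumes) CRLF; returns the line without the line break
private static String readLine(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    int b;
    while ((b = in.read()) != -1 && b != '\n') {
        if (b != '\r') sb.append((char) b);
    }
    return sb.toString();
}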
The following NIO-based code worked for me:
ReadableByteChannel channel = Channels.newChannel(chunkedInputStream);
// content length is not known upfront, hence an initial size
int bufferLength = 2048;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ByteBuffer byteBuffer = ByteBuffer.allocate(bufferLength);
int numRead = 0;
while (numRead >= 0) {
    byteBuffer.rewind();
    // Read bytes from the channel
    numRead = channel.read(byteBuffer);
    byteBuffer.rewind();
    if (numRead > 0) {
        byte[] dataBytes = byteBuffer.array();
        // write only the numRead bytes actually read, not the whole backing array
        baos.write(dataBytes, 0, numRead);
    }
    byteBuffer.clear();
}
return baos.toByteArray();

Why initialize byte array to 1024 while reading or writing a file?

In Java input/output stream code, there is almost always a byte array of size 1024.
Just like below:
URL url = new URL(src);
URLConnection connection = url.openConnection();
InputStream is = connection.getInputStream();
OutputStream os = new FileOutputStream("D:\\images" + "\\" + getName(src) + getExtension(src));
byte[] byteArray = new byte[1024];
int len = 0;
while ((len = is.read(byteArray)) != -1) {
    os.write(byteArray, 0, len);
}
is.close();
os.close();
Why initialize this array to 1024?
That is called buffering; you overwrite the contents of the buffer each time you go through the loop.
It simply reads the file in chunks instead of allocating memory for the entire file content at once.
The reason for doing this is that you would otherwise be a clear victim of an OutOfMemoryError if the file were too large.
As for the specific question: it need not be 1024; you could even use 500. But 1024 is a common choice.
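As a small illustration of the trade-off, here is a copy-loop sketch where the buffer size is the only knob; the file names are placeholders, and 8192 is just as valid a choice as 1024:
import java.io.*;

public class BufferedCopy {
    // Larger buffers mean fewer read() system calls, at the cost of a little more memory
    static void copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream("in.dat");    // placeholder paths
             OutputStream out = new FileOutputStream("out.dat")) {
            copy(in, out, 8192); // 1024, 4096, 8192 all work; only the speed differs
        }
    }
}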
