I have a client that sends chunked data, and my server is expected to read that data. On the server I am using Tomcat 7.0.42 and expect this data to arrive via an existing servlet.
I searched Google for examples that read chunked data, but unfortunately I haven't stumbled upon any.
I found a few references to ChunkedInputStream (provided by Apache HttpClient) and ChunkedInputFilter (provided by Tomcat), but I couldn't find any decent examples of how best to use them.
If any of you have experience with reading/parsing chunked data, please share some pointers.
Java version used: 1.7.0_45
In my existing servlet code, I have been handling simple POST requests using NIO. But now, if a client sets the transfer encoding to chunked, I need to handle that specifically, so I have forking code in place. Something like the below:
inputStream = httpServletRequest.getInputStream();

if ("chunked".equals(getRequestHeader(httpServletRequest, "Transfer-Encoding"))) {
    // Need to process chunked data
} else {
    // Normal request data
    if (inputStream != null) {
        int contentLength = httpServletRequest.getContentLength();
        if (contentLength <= 0) {
            return new byte[0];
        }
        ReadableByteChannel channel = Channels.newChannel(inputStream);
        byte[] postData = new byte[contentLength];
        ByteBuffer buf = ByteBuffer.allocateDirect(contentLength);
        int numRead = 0;
        int counter = 0;
        while (numRead >= 0) {
            buf.rewind();
            numRead = channel.read(buf);
            buf.rewind();
            // Copy only the bytes read in this pass
            for (int i = 0; i < numRead; i++) {
                postData[counter++] = buf.get();
            }
        }
        return postData;
    }
}
As you can see, the normal request case relies on the Content-Length header being available, while for chunked encoding it is not present; hence the need for an alternative path to handle chunked data.
Thanks,
Vicky
See HTTP/1.1 Chunked Transfer Coding.
Your servlet will be served chunks of variable size. You'll get the size of each chunk in its first line. The protocol is quite simple, so you could implement it yourself.
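For reference, here is a minimal sketch of such a decoder (an editor's illustration, not part of the original answer). It assumes you are reading a raw stream yourself; note that a servlet container such as Tomcat typically de-chunks the request body for you, so getInputStream() usually already delivers the decoded entity. Chunk extensions and trailer headers are skipped rather than processed.

import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedDecoder {

    // Decodes a chunked-encoded body: each chunk is "<hex size>[;extensions]CRLF"
    // followed by that many data bytes and a trailing CRLF; a size of 0 ends the body.
    public static byte[] readChunkedBody(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';');
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi); // drop chunk extensions
            }
            int chunkSize = Integer.parseInt(sizeLine.trim(), 16);
            if (chunkSize == 0) {
                break; // last chunk; trailers, if any, would follow
            }
            byte[] chunk = new byte[chunkSize];
            int read = 0;
            while (read < chunkSize) {
                int n = in.read(chunk, read, chunkSize - read);
                if (n < 0) {
                    throw new EOFException("Stream ended mid-chunk");
                }
                read += n;
            }
            out.write(chunk);
            readLine(in); // consume the CRLF that terminates the chunk data
        }
        return out.toByteArray();
    }

    // Reads a CRLF-terminated line, returning it without the line break.
    private static String readLine(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1 && c != '\n') {
            if (c != '\r') {
                sb.append((char) c);
            }
        }
        return sb.toString();
    }
}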
The following NIO-based code worked for me:
ReadableByteChannel channel = Channels.newChannel(chunkedInputStream);
// Content length is not known upfront, hence an initial buffer size
int bufferLength = 2048;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ByteBuffer byteBuffer = ByteBuffer.allocate(bufferLength);
int numRead = 0;
while (numRead >= 0) {
    // Read bytes from the channel into the buffer
    numRead = channel.read(byteBuffer);
    if (numRead > 0) {
        // Write only the bytes read in this pass; byteBuffer.array() exposes
        // the whole backing array, so its full length must not be used here
        baos.write(byteBuffer.array(), 0, numRead);
    }
    byteBuffer.clear();
}
return baos.toByteArray();
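In other words, the ByteArrayOutputStream grows as needed, so the initial 2048-byte buffer only bounds how much is read per pass, not the total size of the body.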
Related
I want to get the request body as a stream and while it's uploading, pass it to a web socket. This is my current implementation:
InputStream is = request.getInputStream();
int total = 0;
int bytes;
byte[] buff = new byte[8192];
while ((bytes = is.read(buff)) != -1) {
    total = total + bytes;
    this.simpMessagingTemplate.convertAndSend("test", Arrays.toString(buff));
}
this.simpMessagingTemplate.convertAndSend("test", "END");
When sending it now, I'm not sure whether the data gets corrupted or duplicated, but I can't parse the request. Once the data is sent over, I send END so I know that I can parse the data on the frontend.
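One likely culprit here (an assumption, since the receiving side isn't shown): read() usually fills only part of buff, but the loop sends the entire 8 KB array every time, and Arrays.toString() turns the bytes into a comma-separated decimal string rather than the raw payload. A sketch that forwards only the bytes actually read:

// Sketch: send only the filled portion of the buffer on each pass.
while ((bytes = is.read(buff)) != -1) {
    total = total + bytes;
    this.simpMessagingTemplate.convertAndSend("test", Arrays.copyOfRange(buff, 0, bytes));
}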
I am reading this file: https://www.reddit.com/r/tech/top.json?limit=100 into a BufferedReader from an HttpURLConnection. I've got it to read some of the file, but it only reads about a tenth of what it should. Changing the size of the input buffer doesn't change anything; it prints the same thing, just in smaller chunks:
try {
    URL url = new URL(urlString);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
    StringBuilder sb = new StringBuilder();
    int charsRead;
    char[] inputBuffer = new char[500];
    while (true) {
        charsRead = reader.read(inputBuffer);
        if (charsRead < 0) {
            break;
        }
        if (charsRead > 0) {
            sb.append(String.copyValueOf(inputBuffer, 0, charsRead));
            Log.d(TAG, "Value read " + String.copyValueOf(inputBuffer, 0, charsRead));
        }
    }
    reader.close();
    return sb.toString();
} catch (Exception e) {
    e.printStackTrace();
}
I believe the issue is that the text is all on one line since it's not formatted in json correctly, and BufferedReader can only take a line so long. Is there any way around this?
read() should continue to read as long as charsRead > 0. Every time a call to read is made, the reader marks where it last read from, and the next call starts at that place and continues until there is no more to read. There is no limit to the size it can read; the only limit is the size of the array passed in, and there is no limit on the overall size of the file.
You could try the following:
try (InputStream is = connection.getInputStream();
     ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
    int read = 0;
    byte[] buffer = new byte[4096];
    while ((read = is.read(buffer)) > 0) {
        baos.write(buffer, 0, read);
    }
    return new String(baos.toByteArray(), StandardCharsets.UTF_8);
} catch (Exception ex) {
    ex.printStackTrace(); // at least log the exception instead of swallowing it
}
The above method uses purely the bytes from the stream, reading them into the output stream and then creating the string from that.
I suggest using a third-party HTTP client. It could literally reduce your code to just a few lines, and you don't have to worry about all those little details. The bottom line is: someone has already written the code that you are trying to write, and it works and is well tested. A few suggestions:
Apache HttpClient - a well-known and popular HTTP client, but it might be a bit bulky and complicated for a simple case like yours.
OkHttp - another well-known HTTP client.
And finally, my favorite (because it is written by me): the MgntUtils open-source library, which includes an HTTP client. Maven artifacts can be found here; GitHub, which includes the library itself as a jar file, source code, and Javadoc, can be found here; and the Javadoc is here.
Just to demonstrate the simplicity of what you want to do, here is the code using the MgntUtils library. (I tested the code and it works like a charm.)
private static void testHttpClient() {
    HttpClient client = new HttpClient();
    client.setContentType("application/json; charset=utf-8");
    client.setConnectionUrl("https://www.reddit.com/r/tech/top.json?limit=100");
    String content = null;
    try {
        content = client.sendHttpRequest(HttpMethod.GET);
    } catch (IOException e) {
        content = client.getLastResponseMessage() + TextUtils.getStacktrace(e, false);
    }
    System.out.println(content);
}
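For comparison, here is a minimal sketch of the same request with OkHttp, which is mentioned above (an editor's illustration assuming OkHttp 3.x is on the classpath; not part of the original answer):

import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;

// Sketch: response.body().string() reads the whole body and applies the
// charset declared in the Content-Type response header.
OkHttpClient client = new OkHttpClient();
Request request = new Request.Builder()
        .url("https://www.reddit.com/r/tech/top.json?limit=100")
        .build();
try (Response response = client.newCall(request).execute()) {
    System.out.println(response.body().string());
}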
My wild guess is that your default platform charset is UTF-8 and encoding problems were raised. For remote content, the encoding should be specified explicitly and not assumed to be equal to the default encoding on your machine.
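If so, one fix is to pass an explicit charset when constructing the reader instead of relying on the platform default; a minimal sketch (assuming for illustration that the content really is UTF-8):

// Sketch: decode the stream with an explicit charset
// (requires java.nio.charset.StandardCharsets).
BufferedReader reader = new BufferedReader(
        new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8));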
The charset of the response data must be correct. For that, the headers must be inspected. The default should be Latin-1 (ISO-8859-1), but browsers interpret that as Windows Latin-1 (Cp1252).
String charset = connection.getContentType().replaceFirst("^.*?(charset=|$)", "");
if (charset.isEmpty()) {
    charset = "Windows-1252"; // Windows Latin-1
}
Then it is better to read bytes, as there is no exact correspondence between the number of bytes read and the number of chars read. If the first char of a surrogate pair (two UTF-16 chars that form one Unicode code point above U+FFFF) falls at the end of a buffer, I do not know how efficient the underlying "repair" is.
BufferedInputStream in = new BufferedInputStream(connection.getInputStream());
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] buffer = new byte[512];
while (true) {
    int bytesRead = in.read(buffer);
    if (bytesRead < 0) {
        break;
    }
    if (bytesRead > 0) {
        out.write(buffer, 0, bytesRead);
    }
}
return out.toString(charset);
And indeed it is safe to do:
sb.append(inputBuffer, 0, charsRead);
(Taking a copy was probably a repair attempt.)
By the way, char[500] takes almost twice the memory of byte[512]: 500 chars at 2 bytes each is 1000 bytes, versus 512.
I saw in my browser that the site uses gzip compression. That makes sense for text such as JSON. I mimicked it by setting the request header Accept-Encoding: gzip.
URL url = new URL("https://www.reddit.com/r/tech/top.json?limit=100");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestProperty("Accept-Encoding", "gzip");
try (InputStream rawIn = connection.getInputStream()) {
    String charset = connection.getContentType().replaceFirst("^.*?(charset=|$)", "");
    if (charset.isEmpty()) {
        charset = "Windows-1252"; // Windows Latin-1
    }
    boolean gzipped = "gzip".equals(connection.getContentEncoding());
    System.out.println("gzip=" + gzipped);
    try (InputStream in = gzipped ? new GZIPInputStream(rawIn)
                                  : new BufferedInputStream(rawIn)) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[512];
        while (true) {
            int bytesRead = in.read(buffer);
            if (bytesRead < 0) {
                break;
            }
            if (bytesRead > 0) {
                out.write(buffer, 0, bytesRead);
            }
        }
        return out.toString(charset);
    }
}
It might be that for clients that do not conform to gzip ("browsers"), the Content-Length of the compressed content was erroneously set in the response, which would be a bug.
I believe the issue is that the text is all on one line since it's not formatted in json correctly, and BufferedReader can only take a line so long.
This explanation is not correct:
You are not reading a line at a time, and BufferedReader is not treating the text as line based.
Even when you do read from a BufferedReader a line at a time (i.e. using readLine()) the only limits on the length of a line are the inherent limits of a Java String length (2^31 - 1 characters), and the size of your heap.
Also, note that "correct" JSON formatting is subjective. The JSON specification says nothing about formatting. It is common for JSON emitters not to waste CPU cycles and network bandwidth on formatting JSON that a human will only rarely read. Application code that consumes JSON needs to be able to cope with this.
So what is actually going on?
Unclear, but here are some possibilities:
A StringBuilder also has an inherent limit of 2^31 - 1 characters. However, with (at least) some implementations, if you attempt to grow a StringBuilder beyond that limit, it will throw an OutOfMemoryError. (This behavior doesn't appear to be documented, but it is clear from reading the source code in Java 8.)
Maybe you are reading the data too slowly (e.g. because your network connection is too slow) and the server is timing out the connection.
Maybe the server has a limit on the amount of data that it is willing to send in a response.
Since you haven't mentioned any exceptions and you always seem to get the same amount of data, I suspect the 3rd explanation is the correct one.
I'm trying to create a simple Java program that makes an HTTP request to an HTTP server hosted locally, using a Socket.
This is my code:
try
{
    //Create Connection
    Socket s = new Socket("localhost", 80);
    System.out.println("[CONNECTED]");
    DataOutputStream out = new DataOutputStream(s.getOutputStream());
    DataInputStream in = new DataInputStream(s.getInputStream());

    String header = "GET / HTTP/1.1\n"
            + "Host:localhost\n\n";
    byte[] byteHeader = header.getBytes();
    out.write(byteHeader, 0, header.length());

    String res = "";
    /////////////READ PROCESS/////////////
    byte[] buf = new byte[in.available()];
    in.readFully(buf);
    System.out.println("\t[READ PROCESS]");
    System.out.println("\t\tbuff length->" + buf.length);
    for (byte b : buf)
    {
        res += (char) b;
    }
    System.out.println("\t[/READ PROCESS]");
    /////////////END READ PROCESS/////////////

    System.out.println("[RES]");
    System.out.println(res);
    System.out.println("[CONN CLOSE]");
    in.close();
    out.close();
    s.close();
} catch (Exception e)
{
    e.printStackTrace();
}
But when I run it, the server responds with a '400 Bad Request' error.
What is the problem? Maybe there are some HTTP headers to add, but I don't know which ones.
There are a couple of issues with your request:
String header = "GET / HTTP/1.1\n"
+ "Host:localhost\n\n";
The line break to be used must be Carriage-Return/Newline, i.e. you should change that to
String header = "GET / HTTP/1.1\r\n"
+ "Host:localhost\r\n\r\n";
Next problem comes when you write the data to the OutputStream:
byte[] byteHeader = header.getBytes();
out.write(byteHeader,0,header.length());
The call to getBytes() without specifying a charset uses the system's charset, which might be different from the one needed here; better use getBytes("8859_1"). When writing to the stream, you use header.length(), which might differ from the length of the resulting byte array if the charset used converts one character into multiple bytes (e.g., with UTF-8 as encoding). Better use byteHeader.length.
out.write(byteHeader,0,header.length());
String res = "";
/////////////READ PROCESS/////////////
byte[] buf = new byte[in.available()];
After sending the header data, you should flush the OutputStream to make sure that no internal buffer in the streams being used prevents the data from actually being sent to the server.
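Putting the write-side fixes together, it might look like this:

byte[] byteHeader = header.getBytes("8859_1"); // explicit charset instead of the platform default
out.write(byteHeader, 0, byteHeader.length);   // length of the byte array, not of the String
out.flush();                                   // push the request out before reading the response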
in.available() only returns the number of bytes you can read from the InputStream without blocking. It's not the length of the data being returned from the server. As a simple solution for starters, you can add Connection: close\r\n to your header data and simply read the data you're receiving from the server until it closes the connection:
StringBuffer sb = new StringBuffer();
byte[] buf = new byte[4096];
int read;
while ((read = in.read(buf)) != -1) {
    sb.append(new String(buf, 0, read, "8859_1"));
}
String res = sb.toString();
Oh, and independent of the topic of making an HTTP request on your own:
String res = "";
for (byte b : buf)
{
    res += (char) b;
}
This is a performance and memory nightmare because each += creates a brand-new String and copies all the characters accumulated so far. A response of 100 KB in size would mean on the order of 5 GB of cumulative allocations, leading to a lot of garbage-collection runs in the process.
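A sketch of a cheaper alternative for that loop (same byte-to-char mapping as the Latin-1 decoding above, with no intermediate String objects):

StringBuilder sb = new StringBuilder(buf.length);
for (byte b : buf) {
    sb.append((char) (b & 0xFF)); // map each unsigned byte to the corresponding Latin-1 char
}
String res = sb.toString();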
Oh, and about the response of the server: this most likely comes from the invalid line breaks being used. The server regards the whole header, including the empty line, as a single line and complains about the malformed GET request due to the additional data after HTTP/1.1.
According to HTTP/1.1:
HTTP/1.1 defines the sequence CR LF as the end-of-line marker for all
protocol elements except the entity-body [...].
So, you'll need all of your request to be ending with \r\n.
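Combining the points from both answers, a corrected request header might look like this (the Connection: close line is an addition suggested above so the end of the response can be detected by the connection closing):

String header = "GET / HTTP/1.1\r\n"
        + "Host: localhost\r\n"
        + "Connection: close\r\n"
        + "\r\n";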
We are using the code below to extract raw data from an HTTP request, and it is taking quite a long time. CPU utilization also peaks during this time. The request carries an XML of close to 4000-5000 characters. Is there any way we can rewrite the code below to save time and utilization?
private byte[] getRequestBytes(HttpServletRequest request) throws IOException {
    byte[] requestBytes = null;
    byte[] streamBytes = new byte[1];
    InputStream stream = request.getInputStream();
    int length = 0;
    ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream();
    while ((length = stream.read(streamBytes, 0, 1)) != -1) {
        arrayOutputStream.write(streamBytes);
    }
    requestBytes = arrayOutputStream.toByteArray();
    return requestBytes;
}
Java version is 1.7u45
Here are some issues with the code:
byte[] streamBytes = new byte[1] - this buffer is too small; use something like 4096.
You are not closing your stream, which may lead to a resource leak.
stream.read(streamBytes, 0, 1) reads only one byte per loop iteration, which leads to poor performance.
The length variable is redundant; you could just do stream.read(streamBytes, 0, 1) != -1. A reworked sketch follows below.
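Putting those points together, a sketch of the reworked method (same signature as in the question):

private byte[] getRequestBytes(HttpServletRequest request) throws IOException {
    ByteArrayOutputStream arrayOutputStream = new ByteArrayOutputStream();
    byte[] buffer = new byte[4096];                     // a reasonably sized buffer
    try (InputStream stream = request.getInputStream()) {
        int length;
        while ((length = stream.read(buffer)) != -1) {
            arrayOutputStream.write(buffer, 0, length); // write only the bytes actually read
        }
    }
    return arrayOutputStream.toByteArray();
}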
So as the title suggests, I'm trying to get and gunzip a string from an HTTP request.
urlConn = url.openConnection();
int len = CONTENT_LENGTH;
byte[] gbytes = new byte[len];
gbuffer = new GZIPInputStream(urlConn.getInputStream(), len);
System.out.println(gbuffer.read(gbytes) + "/" + len);
System.out.println(gbytes);
result = new String(gbytes, "UTF-8");
gbuffer.close();
System.out.println(result);
With some URLs, it works fine. I get output like this:
42/42
[B#96e8209
The entire 42 bytes of my data. Abcdefghij.
With others, it gives me something like the following output:
22/77
[B#1d94882
The entire 77 bytes of
As you can see, the first some-odd bytes of data are very similar if not the same, so they shouldn't be causing these issues. I really can't seem to pin it down. Increasing CONTENT_LENGTH doesn't help, and data streams of sizes both larger and smaller than the ones giving me issues work fine.
EDIT: The issue also does not lie within the raw gzipped data, as Cocoa and Python both gunzip it without issue.
EDIT: Solved. Including final code:
urlConn = url.openConnection();
int offset = 0, len = CONTENT_LENGTH;
byte[] gbytes = new byte[len];
gbuffer = new GZIPInputStream(urlConn.getInputStream(), len);
while (offset < len)
{
    // read() may return fewer bytes than requested, so keep filling the buffer
    offset += gbuffer.read(gbytes, offset, len - offset);
}
result = new String(gbytes, "UTF-8");
gbuffer.close();
It's possible that the data isn't available in the stream. The first println() you have says you've only read 22 bytes, so only 22 bytes were available when you called read(). You can try looping until you've read CONTENT_LENGTH worth of bytes. Maybe something like:
int index = 0;
int bytesRead = gbuffer.read(gbytes);
while (bytesRead > 0 && index < len) {
    index += bytesRead;
    bytesRead = gbuffer.read(gbytes, index, len - index);
}
GZIPInputStream.read() is not guaranteed to read all data in one call. You should use a loop:
byte[] buf = new byte[1024];
int len = 0, total = 0;
while ((len = gbuffer.read(buf)) > 0) {
    total += len;
    // do something with the data in buf[0..len)
}