HTTPClient MultipartEntity seems to be adding garbage text to StringBody parts - java

I am trying to use Apache HttpClient to send a multipart POST request with a binary file and a couple of string parameters.
However, it seems that somewhere along the line, some garbage text is making its way into my string parameters. For instance, as confirmed through the debugger, the sizeBody variable here is indeed holding the value "100":
StringBody sizeBody = new StringBody("100", Charset.forName("UTF-8"));
However, if I listen to the request with Wireshark, I see this:
--o2mm51iGsng9w0Pb-Guvf8XDwXgG7BPcupLnaa
Content-Disposition: form-data; name="x"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
100
a5
--o2mm51iGsng9w0Pb-Guvf8XDwXgG7BPcupLnaa
Note the a5 after the 100.
What could be causing this? Where should I look?

What you are seeing are most likely chunk headers added by so-called chunked transfer encoding [1]: each chunk is prefixed with its size in hexadecimal, so the a5 is the size of the next chunk, not part of your data. Check whether the message header has a Transfer-Encoding: chunked field.
[1] http://en.wikipedia.org/wiki/Chunked_transfer_encoding

I had this same issue testing my POSTs with NanoHTTPD receiving them. It's indeed that HttpClient is using chunked transfer encoding, which NanoHTTPD doesn't support. It did that in my case because the binary file was supplied via an InputStreamBody, and since that cannot determine its own content length (it just sends back -1), the client uses chunked encoding.
I switched to using a ByteArrayBody for the file contents, and since that and StringBody can supply content lengths, the requests now do not use chunked encoding.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
IOUtils.copy(fileInputStream, baos); // from Apache Commons IO, or roll your own
ContentBody filePart = new ByteArrayBody(baos.toByteArray(), fileName);
Of course, if your file is huge, loading the whole thing into a byte array as above could cause memory problems.
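If the file really is too large to buffer but you do know its size up front, another possibility (a sketch, assuming HttpClient 4.x's HttpMime module) is to subclass InputStreamBody so that it reports the real length instead of -1:
import java.io.InputStream;
import org.apache.http.entity.mime.content.InputStreamBody;

// Reports a known content length so HttpClient can send Content-Length
// instead of falling back to chunked transfer encoding.
public class KnownLengthStreamBody extends InputStreamBody {
    private final long length;

    public KnownLengthStreamBody(InputStream in, String filename, long length) {
        super(in, filename);
        this.length = length;
    }

    @Override
    public long getContentLength() {
        return length; // InputStreamBody's default implementation returns -1
    }
}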

Related

How to parse http header to get uploaded file and save it to disk

I am developing an HTTP web server in Java using sockets. It reads the POST header from an InputStream; I then process the header by splitting the String on the 'boundary' and '\r\n', collect all the headers and cookies into HashMaps, store the file contents in a String, and save that String to a file on the server. This works fine when I upload a text file or a Java source file, but doc, pdf and image files come out corrupted.
PrintWriter out;
try {
    out = new PrintWriter(new OutputStreamWriter(
            new FileOutputStream(UploadPath + "\\" + FileName)));
    out.print(FileData);
    out.close();
} catch (Exception e) {
}
The above code saves the contents of 'FileData' at 'UploadPath' under 'FileName'.
For jpg or doc files, the String FileData holds the binary contents of the uploaded file, which the code above then saves. I checked the sizes of both files and they are equal in bytes, and I also compared the contents of the actual file with the FileData String while debugging the application.
The actual uploaded image file and the FileData String match byte for byte, yet the saved image is totally corrupted.
After searching the internet for an entire day I have not been able to find a solution. Please help.
I do not want to use Apache Commons, which is what most pages suggest.
If you want to see more code, I will post it.
As you are dealing with binary data, you should use byte and OutputStream instead of String and Writer: if you put bytes into a String, they are decoded with some charset, and for binary content that decoding corrupts the data.
So once you have found the boundaries of the binary data in your request (represented by a byte array), copy the content byte-wise, directly to an output stream.
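For example, a sketch where requestBytes, bodyStart and bodyEnd are hypothetical names for the buffered request and the offsets of the file payload between its boundaries:
// Write the payload bytes verbatim: no String round-trip, no charset decoding.
OutputStream os = new FileOutputStream(path);
try {
    os.write(requestBytes, bodyStart, bodyEnd - bodyStart);
} finally {
    os.close();
}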
This only works if your request is already completely in memory; with file uploads that is not always the case, because large files can run you out of memory.
So the best way to implement a file upload is to read only the next byte from the stream: this is the difference between splitting and parsing. You actually need a real parser for multipart form data. Now things get complex, and this is why everybody uses commons-fileupload: detecting the boundaries is not easy when your look-ahead is just a few bytes.
I had to write a clean-room implementation for legal reasons. If that is not your situation, look at the source of commons-fileupload. And have a look at the RFC.
Since you are on Java 7, this is quite easy: use Files.copy().
Also, DO NOT store file contents as Strings; those will only ever be valid for text files. Use classic InputStream/OutputStream to read and write.
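A minimal sketch, assuming partStream is an InputStream positioned at (and limited to) the uploaded file's payload (Files, Paths and StandardCopyOption are from java.nio.file):
// Streams the bytes straight to disk; nothing is ever decoded into a String.
Files.copy(partStream, Paths.get(uploadPath, fileName),
        StandardCopyOption.REPLACE_EXISTING);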
You could read it into a byte array like the following:
InputStream is = ...
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead;
byte[] data = new byte[16384];
while ((nRead = is.read(data, 0, data.length)) != -1) {
    buffer.write(data, 0, nRead);
}
buffer.flush();
return buffer.toByteArray();
I solved my problem like this,
while (inputRequest.available() > 0) {
    try {
        int t = inputRequest.read();
        ch = (char) t;
        // here I checked each byte of data
    } catch (IOException e) {
    }
}
The problem was that the input stream contained the HTTP header fields along with the file content, which could be located anywhere in the stream. So I first stored the bytes in a temporary String until I hit '\r' and '\n' in the stream; that gave me the boundary of the multipart/form-data header. I then compared the temporary String against the boundary and the other known header content, and sent the input stream on to the file output stream. But in some cases the header may contain other content after the file content, and it will definitely have a closing boundary, so I kept track of every byte I read and sent each byte individually to the file output stream. Here is a sample HTTP request:
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://localhost/index.html
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------274761981030199
Content-Length: 1405
-----------------------------274761981030199
Content-Disposition: form-data; name="name1"
pppppp
-----------------------------274761981030199
Content-Disposition: form-data; name="name2"
rrrrrrrrr
-----------------------------274761981030199
Content-Disposition: form-data; name="name3"
eeeeeeee
-----------------------------274761981030199
Content-Disposition: form-data; name="name4"
2
-----------------------------274761981030199
Content-Disposition: form-data; name="name5"; filename="CgiPost.java"
Content-Type: text/x-java-source
import java.io.*;
// This appears in Core Web Programming from
// Prentice Hall Publishers, and may be freely used
// or adapted. 1997 Marty Hall, hall#apl.jhu.edu.
public class CgiPost extends CgiGet
{
public static void main(String[] args)
{
try
{
DataInputStream in
= new DataInputStream(System.in);
String[] data = { in.readLine() };
CgiPost app = new CgiPost("CgiPost", data, "POST");
app.printFile();
} catch(IOException ioe) {
System.out.println
("IOException reading POST data: " + ioe);
}
}
public CgiPost(String name, String[] args,
String type) {
super(name, args, type);
}
}
-----------------------------274761981030199
Content-Disposition: form-data; name="name6"
pppppppppp
-----------------------------274761981030199--
NOTE: In some cases your application code may reach inputRequest.available() before the browser has sent the request; inputRequest.available() will then always return 0 and your while loop will exit immediately. To avoid this, first read one byte with inputRequest.read(), which blocks until data arrives, and then run the rest of the loop; in an HTTP header you can tell the first byte apart from the others.
If you keep a byte counter, use a long rather than an int, because the loop can stop when an int counter reaches its limit on large streams.
Pass the int value returned by int t = inputRequest.read() straight to fileOutputStream.write(t).
inputRequest.available() keeps decreasing as you read bytes from the input stream; it returns the number of bytes currently available in the stream.
This way you can upload large files without any corruption, as sketched below.
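Putting these notes together, the core of the loop looks roughly like this (a sketch; inputRequest and fileOutputStream are the streams from the code above):
long count = 0; // long, so large uploads don't overflow the counter
while (inputRequest.available() > 0) {
    int t = inputRequest.read();  // one byte at a time
    fileOutputStream.write(t);    // pass the int straight through, no char cast
    count++;
}
fileOutputStream.close();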
Leave a comment if you need more details about this.

How do I send an HTTP response without Transfer Encoding: chunked?

I have a Java Servlet that responds to the Twilio API. It appears that Twilio does not support the chunked transfer encoding my responses are using. How can I avoid Transfer-Encoding: chunked?
Here is my code:
// response is HttpServletResponse
// xml is a String with XML in it
response.getWriter().write(xml);
response.getWriter().flush();
I am using Jetty as the Servlet container.
I believe that Jetty will use chunked responses when it doesn't know the response content length and/or it is using persistent connections. To avoid chunking you either need to set the response content length, or avoid persistent connections by setting a "Connection: close" header on the response.
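Both are one-liners on the response; a sketch, where bodyBytes is a hypothetical name for the already-encoded response body:
// Option 1: declare the length up front so the container need not chunk.
response.setContentLength(bodyBytes.length);

// Option 2: forgo the persistent connection for this response instead.
response.setHeader("Connection", "close");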
Try setting the Content-Length before writing to the stream. Don't forget to calculate the number of bytes according to the correct encoding, e.g.:
final byte[] content = xml.getBytes("UTF-8");
response.setContentLength(content.length);
response.setContentType("text/xml"); // or "text/xml; charset=UTF-8"
response.setCharacterEncoding("UTF-8");
final OutputStream out = response.getOutputStream();
out.write(content);
The container decides between Content-Length and Transfer-Encoding based on the amount of data written through the Writer or OutputStream. If the data is larger than HttpServletResponse.getBufferSize(), the response will be chunked; if not, Content-Length is used.
In your case, just removing the second line, the explicit flush(), should solve your problem: flushing commits the response before the container can determine the content length.
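In other words, a sketch of the fix: write the XML and let the container flush and size the response itself.
// response is HttpServletResponse, xml is a String with XML in it
response.setContentType("text/xml;charset=UTF-8");
response.getWriter().write(xml);
// no explicit flush(): flushing commits the response early, which forces
// the container to fall back to Transfer-Encoding: chunked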

How to force browser to download file?

Everything works fine, but only if the file is small, about 1 MB. With bigger files, like 20 MB, my browser displays the content instead of forcing a download. I have tried many headers so far; my code now looks like this:
PrintWriter out = response.getWriter();
String fileName = request.getParameter("filename");
File f = new File(fileName);
InputStream in = new FileInputStream(f);
BufferedInputStream bin = new BufferedInputStream(in);
DataInputStream din = new DataInputStream(bin);
while (din.available() > 0) {
    out.print(din.readLine());
    out.print("\n");
}
response.setContentType("application/force-download");
response.setContentLength((int) f.length());
response.setHeader("Content-Transfer-Encoding", "binary");
response.setHeader("Content-Disposition", "attachment; filename=\"" + "xxx\""); // fileName
in.close();
bin.close();
din.close();
You are setting the response headers after writing the contents of the file to the output stream, which is far too late in the response lifecycle. The correct sequence is to set the headers first and then write the contents of the file to the servlet's output stream.
Your method should therefore be written as follows (this won't compile as-is; it is a mere representation):
response.setContentType("application/force-download");
response.setContentLength((int)f.length());
//response.setContentLength(-1);
response.setHeader("Content-Transfer-Encoding", "binary");
response.setHeader("Content-Disposition","attachment; filename=\"" + "xxx\"");//fileName);
...
...
File f= new File(fileName);
InputStream in = new FileInputStream(f);
BufferedInputStream bin = new BufferedInputStream(in);
DataInputStream din = new DataInputStream(bin);
while(din.available() > 0){
out.print(din.readLine());
out.print("\n");
}
The reason for the failure is that the actual headers sent by the servlet container can differ from what you intend to send. Headers appear before the body in an HTTP response, so if the container does not know which headers you want by the time the body is committed, it may set appropriate headers itself to keep the response valid; setting headers after the file has been written is then futile and redundant, as the container may already have sent its own. You can confirm this by looking at the network traffic with Wireshark, or with an HTTP debugging proxy like Fiddler or WebScarab.
You may also refer to the Java EE API documentation for ServletResponse.setContentType to understand this behavior:
Sets the content type of the response being sent to the client, if the response has not been committed yet. The given content type may include a character encoding specification, for example, text/html;charset=UTF-8. The response's character encoding is only set from the given content type if this method is called before getWriter is called.
This method may be called repeatedly to change content type and character encoding. This method has no effect if called after the response has been committed.
...
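Putting it together, a compilable sketch of the reordered method (it also copies raw bytes rather than using readLine()/print(), since a Writer-based loop would itself corrupt binary files):
// Headers first, while the response is still uncommitted.
response.setContentType("application/force-download");
response.setContentLength((int) f.length());
response.setHeader("Content-Transfer-Encoding", "binary");
response.setHeader("Content-Disposition",
        "attachment; filename=\"" + f.getName() + "\"");

// Then the body, copied as raw bytes.
InputStream in = new FileInputStream(f);
OutputStream out = response.getOutputStream();
try {
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
        out.write(buf, 0, n);
    }
} finally {
    in.close();
}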
Set the content type and the other headers before you write the file out. For small files the content fits in the response buffer, so the browser gets the headers first; for big files the data comes first.
This is from a php script which solves the problem perfectly with every browser I've tested (FF since 3.5, IE8+, Chrome)
header("Content-Disposition: attachment; filename=\"".$fname_local."\"");
header("Content-Type: application/force-download");
header("Content-Transfer-Encoding: binary");
header("Content-Length: ".filesize($fname));
So as far as I can see, you're doing everything correctly. Have you checked your browser settings?

Bad encoding of streamed CSV with Stripes / Tomcat

I'm trying to stream a CSV file. I set the encoding to windows-1252, but it still seems to be streamed as a UTF-8 file.
final String encoding = "windows-1252";
exportResolution = new StreamingResolution(builder.getContentType() + ";charset=" + encoding.toLowerCase()) {
    @Override
    public void stream(HttpServletResponse response) throws Exception {
        // Set response headers
        response.setHeader("Cache-control", "private, max-age=0");
        response.setCharacterEncoding(encoding);
        OutputStream os = response.getOutputStream();
        writeExportStream(os, builder);
    }
}.setFilename(filename);
writeExportStream just streams the content to the output stream (with pagination and DB calls, it takes some time).
It doesn't work locally (Jetty plugin) or on dev (Tomcat), with either Firefox or Chrome.
I haven't tested it myself, but people at work told me it works better when we don't stream the content and instead write the file in one go, after loading all the objects we want from the DB.
Does anybody know what is happening? Thanks.
Btw my headers:
HTTP/1.1 200 OK
Content-Language: fr-FR
Content-Type: text/csv;charset=windows-1252
Content-Disposition: attachment;filename="export_rshop_01-02-11.csv"
Cache-Control: private, max-age=0
Transfer-Encoding: chunked
Server: Jetty(6.1.14)
I want the file to be importable into Excel as windows-1252, but I can't: it just opens as UTF-8 even though my header says windows-1252.
The problem lies in the writeExportStream(os,builder); method. We can't see what encoding operations it is performing, but I'm guessing it is writing UTF-8 data.
The output operation needs to perform two encoding tasks:
Tell the client what encoding the response text is in (via the headers)
Encode the data written to the client in a matching encoding (e.g. via a writer)
Step 1 is being done correctly. Step 2 is probably the source of the error.
If you use the provided writer, it will encode character data in the appropriate response encoding.
If pre-encoded data is written via the raw byte stream (getOutputStream()), you need to make sure this process uses the same encoding.
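For example, inside stream(), step 2 could look like this (a sketch that assumes writeExportStream is changed to accept a Writer; OutputStreamWriter is from java.io):
// Pin the Writer to the same charset the headers advertise, so the bytes
// on the wire really are windows-1252.
Writer w = new OutputStreamWriter(response.getOutputStream(), encoding);
writeExportStream(w, builder);
w.flush();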

How to gzip ajax requests with Struts 2?

How do I gzip an ajax response with Struts 2? I tried to create a filter, but it didn't work. On the client side I'm using jQuery, and the ajax response I'm expecting is JSON.
This is the code I used on server:
ByteArrayOutputStream out = new ByteArrayOutputStream();
GZIPOutputStream gz = new GZIPOutputStream(out);
gz.write(json.getBytes());
gz.close();
I'm redirecting the response to a dummy jsp page defined in struts.xml.
The reason I want to gzip the data is that in one situation I must send a relatively large JSON response back to the client.
Any reference provided will be appreciated.
Thanks.
You shouldn't randomly gzip responses. You can only gzip the response when the client has told the server that it accepts (understands) gzipped responses, which you can determine by checking whether the Accept-Encoding request header contains gzip. If it is there, you can safely wrap the response's OutputStream in a GZIPOutputStream. You just need to add the Content-Encoding header beforehand, with a value of gzip, to tell the client what encoding the content is being sent in, so that it knows to ungzip it.
In a nutshell:
response.setContentType("application/json");
response.setCharacterEncoding("UTF-8");
OutputStream output = response.getOutputStream();
String acceptEncoding = request.getHeader("Accept-Encoding");
if (acceptEncoding != null && acceptEncoding.contains("gzip")) {
response.setHeader("Content-Encoding", "gzip");
output = new GZIPOutputStream(output);
}
output.write(json.getBytes("UTF-8"));
(note that you will want to set the content type and character encoding as well; this is taken into account in the example)
You could also configure this at appserver level. Since it's unclear which one you're using, here's just a Tomcat-targeted example: check the compression and compressableMimeType attributes of the <Connector> element in /conf/server.xml: HTTP connector reference. This way you can just write to the response without worrying about gzipping it.
If your response is JSON, I would recommend using the struts2-json plugin (http://struts.apache.org/2.1.8/docs/json-plugin.html) and setting the enableGZIP param to true.
