Servlet handling file upload: why is the uploaded file bigger than the original? - java

My servlet's doPost method handles file uploads like this:
InputStream in = req.getInputStream();
File file = new File("c:/8.dat");
OutputStream out = new FileOutputStream(file);
byte[] buffer = new byte[1024];
int len =0;
while((len=in.read(buffer))!=-1){
out.write(buffer, 0, len);
}
out.close();
in.close();
Does the request's getInputStream() method also return the HTTP header information?
Why is the uploaded file bigger than the original?

Sending files in an HTTP request is usually done using multipart/form-data encoding. This enables the server to distinguish multiple form data parts in a single request (it would otherwise not be possible to send multiple files and/or input fields along in a single request). Each part is separated by a boundary and preceded by form data headers. The entire request body roughly looks like this (taking an example form with 3 plain <input type="text"> fields with names name1, name2 and name3 which have the values value1, value2 and value3 filled):
--SOME_BOUNDARY
content-disposition: form-data;name="name1"
content-type: text/plain;charset=UTF-8

value1
--SOME_BOUNDARY
content-disposition: form-data;name="name2"
content-type: text/plain;charset=UTF-8

value2
--SOME_BOUNDARY
content-disposition: form-data;name="name3"
content-type: text/plain;charset=UTF-8

value3
--SOME_BOUNDARY--
With a single <input type="file"> field with the name file1 the entire request body looks like this:
--SOME_BOUNDARY
content-disposition: form-data;name="file1";filename="some.ext"
content-type: application/octet-stream

binary file content here
--SOME_BOUNDARY--
That is basically what you are reading from request.getInputStream(). You should be parsing the binary file content out of the request body. It is exactly the boundary and the form data headers which make your uploaded file seem bigger (and actually also corrupted). If you're on Servlet 3.0 or newer, you should use request.getPart() instead to get just the file content (the servlet must be annotated with @MultipartConfig for this to work).
InputStream content = request.getPart("file1").getInputStream();
// ...
If you're still on servlet 2.5 or older, then you can use among others Apache Commons FileUpload to parse it.
See also:
How to upload files to server using JSP/Servlet?
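To make the size difference concrete, here is a minimal sketch using only the JDK (no servlet container). It builds a single-file multipart body in memory and then cuts the file bytes back out. The class name MultipartOverheadDemo, the helper names and the sample bytes are made up for illustration; in a real servlet you would rely on request.getPart() or Commons FileUpload as described above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class MultipartOverheadDemo {

    // Builds a request body with one file part, roughly as a browser would send it.
    static byte[] buildBody(String boundary, String fileName, byte[] content) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"file1\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "\r\n"; // empty line: end of the part headers
        byte[] headBytes = head.getBytes(StandardCharsets.ISO_8859_1);
        byte[] tail = ("\r\n--" + boundary + "--\r\n").getBytes(StandardCharsets.ISO_8859_1);
        body.write(headBytes, 0, headBytes.length);
        body.write(content, 0, content.length);
        body.write(tail, 0, tail.length);
        return body.toByteArray();
    }

    // Returns the first index >= from at which needle occurs in haystack, or -1.
    static int indexOf(byte[] haystack, byte[] needle, int from) {
        outer:
        for (int i = from; i <= haystack.length - needle.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    // Extracts the content of the first part: everything between the empty line
    // that ends the part headers and the CRLF that precedes the next boundary.
    static byte[] extractFirstPart(byte[] body, String boundary) {
        byte[] headerEnd = "\r\n\r\n".getBytes(StandardCharsets.ISO_8859_1);
        byte[] delimiter = ("\r\n--" + boundary).getBytes(StandardCharsets.ISO_8859_1);
        int start = indexOf(body, headerEnd, 0);
        if (start < 0) {
            throw new IllegalArgumentException("no part headers found");
        }
        start += headerEnd.length;
        int end = indexOf(body, delimiter, start);
        if (end < 0) {
            throw new IllegalArgumentException("no closing boundary found");
        }
        return Arrays.copyOfRange(body, start, end);
    }

    public static void main(String[] args) {
        byte[] original = {0x00, (byte) 0xFF, 0x0D, 0x0A, 0x42}; // arbitrary binary data
        byte[] body = buildBody("SOME_BOUNDARY", "some.ext", original);
        byte[] extracted = extractFirstPart(body, "SOME_BOUNDARY");
        System.out.println("body is " + (body.length - original.length) + " bytes larger than the file");
        System.out.println("extracted matches original: " + Arrays.equals(original, extracted));
    }
}
```

Writing the whole body to disk, as the code in the question does, stores the boundary lines and part headers along with the file, which is where the extra bytes come from.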


Cisco ThreatGrid Submit Sample API... How do I send a Sample File?

I am not able to successfully call the ThreatGrid Submit Sample API using Java. I've used Java to call APIs in the past, so I have experience setting up these calls.
I should be POSTing to https://panacea.threatgrid.com/api/v2/samples and provide parameters in the body of my request.
I also need to write the sample file (the file being evaluated) into the body of the request.
I understand that I'll need to set the 'Content-Type' to 'multipart/form-data;' and provide a Boundary string to separate the parts of the request.
Upon calling the submit API, I am receiving an HTTP 400 Bad Request with the following error return:
{"api_version":2,"id":7162013,"error":{"message":"The parameter sample is required. ","code":400,"errors":[{"code":400,"message":"The parameter sample is required. ","help":"/doc/main/index.html","report":"support@threatgrid.com"}]}}
This is saying that I am not providing the 'sample' parameter. Sample is the file being submitted for threat evaluation. Note that my second part (section of data) that I am sending in the request body was given the name 'sample'.
Here's how I am setting the request headers in my connection:
connection.addRequestProperty("Content-Type", "multipart/form-data; boundary=BOUNDARY");
connection.addRequestProperty("cache-control", "no-cache");
connection.addRequestProperty("accept", "*/*");
connection.addRequestProperty("Content-Length", "164784" );
connection.addRequestProperty("Host", "panacea.threatgrid.com");
Here's an example of what I believe I am writing to the connection's output stream:
--BOUNDARY
Content-Disposition: form-data; name="application/json"
{"private":"true","vm":"win7-x64","email_notification":false}
--BOUNDARY
Content-Disposition: form-data; name="sample"; filename="GracePeriod.pdf"
Content-Type: application/pdf
[Bytes of the Sample File being submitted to ThreatGrid api]
--BOUNDARY--
Code that builds the body of my request:
String boundaryString = "BOUNDARY";
String LINE_FEED = "\r\n";
File sampleFileToUpload = new File(fileUrl);
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes("--" + boundaryString);
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes("Content-Disposition: form-data; name=\"application/json\"");
outputStream.writeBytes(LINE_FEED);
// Build the parameters that get placed into the Header
Map<String, Object> headers = new HashMap<String, Object>();
headers.put("private", "true");
headers.put("vm", "win7-x64");
headers.put("email_notification", false);
Gson gson = new Gson();
String body = gson.toJson(headers);
outputStream.writeBytes( body );
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes("--" + boundaryString);
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes("Content-Disposition: form-data; name='sample'; filename='"+sampleFileToUpload.getName()+"'");
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes("Content-Type: application/pdf");
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes(LINE_FEED);
// Write the contents of the file being submitted...
FileInputStream inputStream = new FileInputStream(sampleFileToUpload);
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead;
byte[] dataArray = new byte[16384];
while ((nRead = inputStream.read(dataArray, 0, dataArray.length)) != -1) {
buffer.write(dataArray, 0, nRead);
}
buffer.flush();
byte[] bytes = buffer.toByteArray();
inputStream.close();
outputStream.write(bytes);
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes("--" + boundaryString + "--");
outputStream.writeBytes(LINE_FEED);
outputStream.writeBytes(LINE_FEED);
outputStream.flush();
outputStream.close();
I should be getting the HTTP 200 message and the response message that contains details about my submission.
Hopefully, someone has done this before and can show me the error in my ways.
Thank you!
EDIT: I forgot to mention that I can use the Postman app to set up and call this API successfully. I set the 'private', 'vm', 'email_notification' and 'sample' items in the body of the request as form-data. Postman allows you to set these items as either text or file (there is a dropdown). In the case of 'sample', I set it to file and Postman allows me to 'attach' the file. I used the Postman console to look at what is being sent in the request and I tried to emulate that in my Java code as best as possible. There must be other detail that I need that Postman doesn't show me in the console.
I was finally able (head slightly bloodied) to get the API to respond successfully (HTTP 200 Message). I'll provide the details that got it to work if this can help anyone in the future.
Upon looking at the definition of the API, it states that "The request parameters are to encoded as 'multipart/form-data'". I was sending some of the parameters as JSON data. I decided that I needed to send each parameter as a separate form variable, each separated by a Boundary marker (I tried this once earlier, but I came back to that same idea).
After doing that, I started paying attention to the line breaks (CRLFs) after each item in the request body. The API is very sensitive to how the data is formatted in the body. I found that it requires an empty line (a bare CRLF) between each part's headers and the actual value of the form data that you are sending.
Here's an example of the request body as I am sending it:
--BOUNDARY
Content-Disposition: form-data; name="private"
[empty line (CRLF only)]
true
--BOUNDARY
Content-Disposition: form-data; name="vm"
[empty line (CRLF only)]
win7-x64
--BOUNDARY
Content-Disposition: form-data; name="email_notification"
[empty line (CRLF only)]
false
--BOUNDARY
Content-Disposition: form-data; name="sample"; filename="CourseCompletionCertificate.pdf"
Content-Type: application/pdf
[empty line (CRLF only)]
[data stream of the Sample file in a byte array...]
--BOUNDARY--
I found examples of multipart/form-data and I noticed the use of CRLFs in the data and I did my best to copy how that data was being sent. It was after that detail that the API responded with success.
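For anyone who wants to see that layout in code, here is a rough sketch of a helper that writes each parameter as its own part with the CRLFs in the right places. MultipartBodyBuilder and its method names are my own invention, not part of any ThreatGrid or JDK API, and the PDF bytes are a placeholder.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class MultipartBodyBuilder {
    private static final String CRLF = "\r\n";
    private final String boundary;
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();

    MultipartBodyBuilder(String boundary) {
        this.boundary = boundary;
    }

    private void writeText(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        body.write(bytes, 0, bytes.length);
    }

    // A plain form field: boundary line, one header, empty line, value.
    MultipartBodyBuilder addField(String name, String value) {
        writeText("--" + boundary + CRLF);
        writeText("Content-Disposition: form-data; name=\"" + name + "\"" + CRLF);
        writeText(CRLF); // empty line between the headers and the value
        writeText(value + CRLF);
        return this;
    }

    // A file part: same layout, plus filename and Content-Type, then the raw bytes.
    MultipartBodyBuilder addFile(String name, String fileName, String contentType, byte[] content) {
        writeText("--" + boundary + CRLF);
        writeText("Content-Disposition: form-data; name=\"" + name
                + "\"; filename=\"" + fileName + "\"" + CRLF);
        writeText("Content-Type: " + contentType + CRLF);
        writeText(CRLF); // empty line between the headers and the file bytes
        body.write(content, 0, content.length);
        writeText(CRLF);
        return this;
    }

    // Appends the closing boundary ("--" on both sides) and returns the body.
    byte[] build() {
        writeText("--" + boundary + "--" + CRLF);
        return body.toByteArray();
    }

    public static void main(String[] args) {
        byte[] fakePdf = "%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII);
        byte[] requestBody = new MultipartBodyBuilder("BOUNDARY")
                .addField("private", "true")
                .addField("vm", "win7-x64")
                .addField("email_notification", "false")
                .addFile("sample", "sample.pdf", "application/pdf", fakePdf)
                .build();
        System.out.println(new String(requestBody, StandardCharsets.US_ASCII));
        System.out.println("Content-Length would be " + requestBody.length);
    }
}
```

Note the double quotes around the name and filename values and the absence of any CRLF before the first boundary. Building the body in memory first also means the length is known exactly, so it can be passed to HttpURLConnection.setFixedLengthStreamingMode(requestBody.length) instead of hard-coding a Content-Length value.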

How to parse http header to get uploaded file and save it to disk

I am developing an HTTP web server in Java using sockets. It reads the POST request from the InputStream, splits it on the multipart boundary and on '\r\n', and stores all headers and cookies in HashMaps. The contents of the uploaded file end up in a String, which I save to a file on the server. This works fine when I upload a text file or a Java source file, but doc, pdf and image files come out corrupted.
PrintWriter out;
try {
out = new PrintWriter(new OutputStreamWriter(
new FileOutputStream(UploadPath + "\\" + FileName)));
out.print(FileData);
out.close();
} catch (Exception e) {
}
Above code will save contents of 'FileData' at 'UploadPath' with 'FileName'.
For a jpg or doc file, the String FileData holds the binary contents of the uploaded file, and the code above saves it. I compared the original and the saved file and both have the same size in bytes. While debugging, I also compared the contents of the original file with the contents of the FileData String.
I also checked the actual uploaded image file against the FileData String and both match byte for byte, yet the saved image is totally corrupted.
After searching the internet for a whole day I have not been able to find a solution. Please help.
I do not want to use Apache Commons, which was suggested on most of the pages.
If you want to see more code, I will post it.
As you are dealing with binary data, you should use byte and OutputStream instead of String and Writer. If you put bytes in a String, they are decoded using a character encoding, and bytes that are not valid in that encoding can be replaced or lost.
So if you have found the boundaries of the binary data in your request (represented by a byte array), copy the content byte-wise directly to an output stream.
This only works if your request is already completely in memory. For file uploads this is not always possible, because you can run out of memory with large files.
So the best way to implement a file upload is to read only the next byte from the stream. This is the difference between splitting and parsing: you actually need a real parser for multipart form data. Now things get complex, and this is the reason why everybody uses commons-fileupload: it's not that easy to detect the boundaries if your "look ahead" is just a few bytes.
I had to write a clean-room implementation for legal reasons. If that is not your situation, look at the source of commons-fileupload, and have a look at the RFC for multipart/form-data (RFC 7578, which obsoletes RFC 2388).
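To give an idea of what such byte-wise boundary detection can look like, here is a deliberately simple sketch: it keeps a sliding window as long as the delimiter and only releases a byte to the output once that byte can no longer be part of the delimiter. BoundaryCopy, copyUntil and contentBefore are hypothetical names, and this is nowhere near a complete multipart parser (no part headers, no multiple parts).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class BoundaryCopy {

    // Copies bytes from in to out until delimiter is seen; the delimiter itself
    // is consumed but not written. Returns the number of bytes written.
    static long copyUntil(InputStream in, OutputStream out, byte[] delimiter) throws IOException {
        if (delimiter.length == 0) {
            throw new IllegalArgumentException("empty delimiter");
        }
        byte[] window = new byte[delimiter.length]; // the last bytes read, not yet written
        int filled = 0;
        long written = 0; // long, so uploads beyond 2 GB do not overflow the counter
        int b;
        while ((b = in.read()) != -1) {
            if (filled == window.length) {
                // The oldest byte cannot belong to the delimiter any more: release it.
                out.write(window[0]);
                written++;
                System.arraycopy(window, 1, window, 0, window.length - 1);
                filled--;
            }
            window[filled++] = (byte) b;
            if (filled == window.length && Arrays.equals(window, delimiter)) {
                return written;
            }
        }
        throw new EOFException("stream ended before the delimiter was found");
    }

    // Convenience wrapper for in-memory data, used for the demo below.
    static byte[] contentBefore(byte[] data, byte[] delimiter) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            copyUntil(new ByteArrayInputStream(data), out, delimiter);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // For a multipart part, the delimiter is CRLF + "--" + boundary.
        byte[] delimiter = "\r\n--HereGoes".getBytes(StandardCharsets.US_ASCII);
        byte[] stream = "binary\r\n-data\r\n--HereGoes--\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] content = contentBefore(stream, delimiter);
        System.out.println(content.length + " content bytes before the boundary");
    }
}
```

The window comparison costs a delimiter-length check per byte, which is fine for a sketch; with a real socket you would at least wrap the stream in a BufferedInputStream so that read() does not hit the network for every byte.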
Since you use Java 7, this is quite easy: use Files.copy().
Also, DO NOT store file contents as Strings, those will only ever be valid for text files. Use classical InputStream/OutputStreams to read/write.
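A small sketch of both points, assuming UTF-8 as the charset (the platform default may behave differently, but the principle is the same): pushing binary data through a String changes the bytes, while Files.copy() on an InputStream leaves them alone. The class and method names are only illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

class BinarySafeSave {

    // What happens implicitly when file content is kept in a String:
    // the bytes are decoded to characters and later encoded again.
    static byte[] roundTripThroughString(byte[] data) {
        String asText = new String(data, StandardCharsets.UTF_8);
        return asText.getBytes(StandardCharsets.UTF_8);
    }

    // Saves the stream content to a temporary file with Files.copy() and reads
    // it back, so the result can be compared with the input.
    static byte[] saveAndReload(byte[] data) {
        try {
            Path target = Files.createTempFile("upload-", ".bin");
            try (InputStream in = new ByteArrayInputStream(data)) {
                // createTempFile already created the file, hence REPLACE_EXISTING
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
            byte[] reloaded = Files.readAllBytes(target);
            Files.delete(target);
            return reloaded;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The first bytes of a typical JPEG file; not valid UTF-8 text.
        byte[] jpegLike = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 0x00, 0x10};
        System.out.println("String round trip intact: "
                + Arrays.equals(jpegLike, roundTripThroughString(jpegLike)));
        System.out.println("Files.copy intact: "
                + Arrays.equals(jpegLike, saveAndReload(jpegLike)));
    }
}
```

In a server, the InputStream passed to Files.copy() would be the part of the request that holds the file content, not the whole request body.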
You could read it using an array of bytes like the following
InputStream is = ...
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
int nRead;
byte[] data = new byte[16384];
while ((nRead = is.read(data, 0, data.length)) != -1) {
buffer.write(data, 0, nRead);
}
buffer.flush();
return buffer.toByteArray();
I solved my problem like this,
while (inputRequest.available()>0) {
try {
int t = inputRequest.read();
ch = (char) t;
//here i checked each byte data
} catch (IOException e) {
}
}
The problem was that the input stream contains the HTTP header fields and the multipart part headers along with the file content, so the file content can start anywhere in the stream. I first collected bytes in a temporary String until I reached '\r' and '\n'. That gave me the boundary of the multipart/form-data request, and I kept comparing the temporary String against the boundary and the other known part headers until I reached the start of the file content. From there I sent the input stream to the file output stream. Because other parts may follow the file content, and there is always a closing boundary, I kept track of every byte I read and wrote each byte individually to the file output stream. Here is a sample HTTP request (headers and body):
Host: localhost
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:21.0) Gecko/20100101 Firefox/21.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
DNT: 1
Referer: http://localhost/index.html
Connection: keep-alive
Content-Type: multipart/form-data; boundary=---------------------------274761981030199
Content-Length: 1405
-----------------------------274761981030199
Content-Disposition: form-data; name="name1"
pppppp
-----------------------------274761981030199
Content-Disposition: form-data; name="name2"
rrrrrrrrr
-----------------------------274761981030199
Content-Disposition: form-data; name="name3"
eeeeeeee
-----------------------------274761981030199
Content-Disposition: form-data; name="name4"
2
-----------------------------274761981030199
Content-Disposition: form-data; name="name5"; filename="CgiPost.java"
Content-Type: text/x-java-source
import java.io.*;
// This appears in Core Web Programming from
// Prentice Hall Publishers, and may be freely used
// or adapted. 1997 Marty Hall, hall@apl.jhu.edu.
public class CgiPost extends CgiGet
{
public static void main(String[] args)
{
try
{
DataInputStream in
= new DataInputStream(System.in);
String[] data = { in.readLine() };
CgiPost app = new CgiPost("CgiPost", data, "POST");
app.printFile();
} catch(IOException ioe) {
System.out.println
("IOException reading POST data: " + ioe);
}
}
public CgiPost(String name, String[] args,
String type) {
super(name, args, type);
}
}
-----------------------------274761981030199
Content-Disposition: form-data; name="name6"
pppppppppp
-----------------------------274761981030199--
NOTE: Your code may reach inputRequest.available() before the browser has sent any of the request. In that case inputRequest.available() returns 0 and the while loop exits immediately. To avoid this, first read one byte with the blocking inputRequest.read() and only then run the loop; that first byte is easy to account for, since an HTTP request starts with the method name.
If you use a counter, make it a long instead of an int, because an int overflows once an upload exceeds about 2 GB.
Pass the int value returned by int t = inputRequest.read() directly to fileoutputstream.write(t).
inputRequest.available() decreases as you read bytes from the input stream. Note that it only reports how many bytes can currently be read without blocking, not how many remain in the whole request.
In this way you can upload large files without corrupting them.
Leave a comment if you need more details about this.
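As an alternative to looping on available(), here is a sketch that reads the request head with blocking reads until the empty line, takes the body length from the Content-Length header and then copies exactly that many bytes. It assumes the client sends Content-Length (no chunked encoding); RequestReader and its methods are hypothetical names. The copied body still contains the multipart boundaries and part headers, which have to be parsed separately.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class RequestReader {

    // Reads up to and including the empty line (CRLF CRLF) that ends the request head.
    static String readHead(InputStream in) throws IOException {
        ByteArrayOutputStream head = new ByteArrayOutputStream();
        byte[] end = {'\r', '\n', '\r', '\n'};
        int matched = 0; // how many bytes of the end marker have been seen in a row
        while (matched < end.length) {
            int b = in.read(); // blocks until a byte arrives, unlike available()
            if (b == -1) {
                throw new EOFException("stream ended inside the request head");
            }
            head.write(b);
            if (b == end[matched]) {
                matched++;
            } else {
                matched = (b == '\r') ? 1 : 0;
            }
        }
        return new String(head.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    // Finds the Content-Length value in the head (header names are case-insensitive).
    static long contentLength(String head) {
        for (String line : head.split("\r\n")) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Content-Length")) {
                return Long.parseLong(line.substring(colon + 1).trim());
            }
        }
        throw new IllegalArgumentException("no Content-Length header");
    }

    // Copies exactly length bytes of body; the counter is a long on purpose.
    static void copyBody(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buffer = new byte[8192];
        long remaining = length;
        while (remaining > 0) {
            int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (n == -1) {
                throw new EOFException("stream ended " + remaining + " bytes early");
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
    }

    // Convenience wrapper for in-memory data, used for the demo below.
    static byte[] bodyOf(byte[] rawRequest) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try {
            InputStream in = new ByteArrayInputStream(rawRequest);
            copyBody(in, body, contentLength(readHead(in)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return body.toByteArray();
    }

    public static void main(String[] args) {
        String raw = "POST /upload HTTP/1.1\r\nHost: localhost\r\nContent-Length: 5\r\n\r\nhello";
        byte[] body = bodyOf(raw.getBytes(StandardCharsets.ISO_8859_1));
        System.out.println(body.length + " body bytes read");
    }
}
```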

REST - HTTP Post Multipart with JSON

I need to receive an HTTP Post Multipart which contains only 2 parameters:
A JSON string
A binary file
Which is the correct way to set the body?
I'm going to test the HTTP call using Chrome REST console, so I'm wondering if the correct solution is to set a "label" key for the JSON parameter and the binary file.
On the server side I'm using Resteasy 2.x, and I'm going to read the Multipart body like this:
@POST
@Consumes("multipart/form-data")
public String postWithPhoto(MultipartFormDataInput multiPart) {
Map <String, List<InputPart>> params = multiPart.getFormDataMap();
String myJson = params.get("myJsonName").get(0).getBodyAsString();
InputPart imagePart = params.get("photo").get(0);
//do whatever I need to do with my json and my photo
}
Is this the way to go?
Is it correct to retrieve my JSON string using the key "myJsonName", which identifies that particular content-disposition?
Is there any other way to receive these 2 pieces of content in one HTTP multipart request?
If I understand you correctly, you want to compose a multipart request manually from an HTTP/REST console. The multipart format is simple; a brief introduction can be found in the HTML 4.01 spec. You need to come up with a boundary, which is a string not found in the content, let’s say HereGoes. You set request header Content-Type: multipart/form-data; boundary=HereGoes. Then this should be a valid request body:
--HereGoes
Content-Disposition: form-data; name="myJsonString"
Content-Type: application/json

{"foo": "bar"}
--HereGoes
Content-Disposition: form-data; name="photo"
Content-Type: image/jpeg
Content-Transfer-Encoding: base64

<...JPEG content in base64...>
--HereGoes--

HTTPClient MultipartEntity seems to be adding garbage text to StringBody parts

I am trying to use Apache Commons's HttpClient to send a multipart POST request with a binary file and a couple of string parameters.
However, it seems that somewhere along the line, some garbage text is making its way into my string parameters. For instance, as confirmed through the debugger, the sizeBody variable here is indeed holding the value "100":
StringBody sizeBody = new StringBody("100", Charset.forName("UTF-8"));
However, if I listen to the request with Wireshark, I see this:
--o2mm51iGsng9w0Pb-Guvf8XDwXgG7BPcupLnaa
Content-Disposition: form-data; name="x"
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
100
a5
--o2mm51iGsng9w0Pb-Guvf8XDwXgG7BPcupLnaa
Note the a5 after the 100.
What could be causing this? Where should I look?
What you are seeing are likely chunk headers used by the so-called chunked transfer encoding [1]. Check whether the message head has a Transfer-Encoding: chunked field.
[1] http://en.wikipedia.org/wiki/Chunked_transfer_encoding
I had this same issue testing my POSTs with NanoHTTPD receiving them. It's indeed that HttpClient is using chunked transfer encoding, which NanoHTTPD doesn't support. It did that in my case because the binary file was supplied via an InputStreamBody, and since that cannot determine its own content length (it just sends back -1), the client uses chunked encoding.
I switched to using a ByteArrayBody for the file contents, and since that and StringBody can supply content lengths, the requests now do not use chunked encoding.
ByteArrayOutputStream baos = new ByteArrayOutputStream();
IOUtils.copy (fileInputStream, baos); // from Apache Commons IO, or roll your own
ContentBody filePart = new ByteArrayBody (baos.toByteArray(), fileName);
Of course, if your file is huge, loading the whole thing into a byte array as above could cause memory problems.
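To illustrate what those stray values are: in chunked transfer encoding every chunk is preceded by its size in hexadecimal on a line of its own, so the "a5" in the capture would announce a following chunk of 0xa5 = 165 bytes. Below is a minimal decoder sketch for a body that is already completely in memory; ChunkedDecoder is a hypothetical helper, and it ignores chunk extensions and trailers.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class ChunkedDecoder {

    // Returns the index of the next CRLF at or after from, or -1 if there is none.
    static int indexOfCrlf(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    // Strips the chunk framing and returns the payload bytes.
    static byte[] decode(byte[] chunked) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            int lineEnd = indexOfCrlf(chunked, pos);
            if (lineEnd < 0) {
                throw new IllegalArgumentException("missing chunk size line");
            }
            String sizeLine = new String(chunked, pos, lineEnd - pos, StandardCharsets.US_ASCII);
            int semicolon = sizeLine.indexOf(';'); // optional chunk extensions are ignored
            if (semicolon >= 0) {
                sizeLine = sizeLine.substring(0, semicolon);
            }
            int size = Integer.parseInt(sizeLine.trim(), 16); // the size is hexadecimal
            pos = lineEnd + 2;
            if (size == 0) {
                return payload.toByteArray(); // last chunk; trailers are ignored
            }
            if (pos + size + 2 > chunked.length) {
                throw new IllegalArgumentException("truncated chunk");
            }
            payload.write(chunked, pos, size);
            pos += size + 2; // skip the chunk data and the CRLF after it
        }
    }

    public static void main(String[] args) {
        byte[] wire = "3\r\n100\r\na\r\n0123456789\r\n0\r\n\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] payload = decode(wire);
        System.out.println(payload.length + " payload bytes after removing the chunk framing");
    }
}
```

A server that does not understand chunked encoding sees the size lines as part of the body, which is how they end up looking like garbage inside the multipart parts.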

How to gzip ajax requests with Struts 2?

How to gzip an ajax response with Struts2? I tried to create a filter but it didn't work. At client-side I'm using jQuery and the ajax response I'm expecting is in json.
This is the code I used on server:
ByteArrayOutputStream out = new ByteArrayOutputStream();
GZIPOutputStream gz = new GZIPOutputStream(out);
gz.write(json.getBytes());
gz.close();
I'm redirecting the response to dummy jsp page defined at struts.xml.
The reason why I want to gzip the data back is because there's a situation where I must send a relatively big sized json back to the client.
Any reference provided will be appreciated.
Thanks.
You shouldn't gzip responses unconditionally. You can only gzip the response when the client has notified the server that it accepts (understands) gzipped responses. You can check that by determining whether the Accept-Encoding request header contains gzip. If it does, you can safely wrap the OutputStream of the response in a GZIPOutputStream. You only need to add the Content-Encoding header beforehand with a value of gzip to inform the client what encoding the content is being sent in, so that the client knows that it needs to ungzip it.
In a nutshell:
response.setContentType("application/json");
response.setCharacterEncoding("UTF-8");
OutputStream output = response.getOutputStream();
String acceptEncoding = request.getHeader("Accept-Encoding");
if (acceptEncoding != null && acceptEncoding.contains("gzip")) {
response.setHeader("Content-Encoding", "gzip");
output = new GZIPOutputStream(output);
}
output.write(json.getBytes("UTF-8"));
output.close(); // finishes the gzip stream so that its trailer is written
(Note that you should set the content type and character encoding as well; the example already does this.)
You could also configure this at appserver level. Since it's unclear which one you're using, here's just a Tomcat-targeted example: check the compression and compressableMimeType attributes of the <Connector> element in /conf/server.xml: HTTP connector reference. This way you can just write to the response without worrying about gzipping it.
If your response is JSON I would recommend using the struts2-json plugin http://struts.apache.org/2.1.8/docs/json-plugin.html and setting the enableGZIP param to true.
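If you want to try the negotiation logic outside a container, here is a plain-JDK sketch of it. The class GzipNegotiation and its methods are made up for illustration and stand in for the servlet request and response; the header check is the same simple contains test as above and does not handle quality values such as gzip;q=0.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipNegotiation {

    // True if the Accept-Encoding header value mentions gzip.
    static boolean acceptsGzip(String acceptEncoding) {
        return acceptEncoding != null
                && acceptEncoding.toLowerCase(Locale.ROOT).contains("gzip");
    }

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer; without it the
        // client receives a truncated stream.
        try (GZIPOutputStream gz = new GZIPOutputStream(buffer)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gz.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Returns the bytes to send: gzipped only when the client accepts gzip.
    static byte[] encodeBody(String json, String acceptEncoding) {
        byte[] raw = json.getBytes(StandardCharsets.UTF_8);
        return acceptsGzip(acceptEncoding) ? gzip(raw) : raw;
    }

    public static void main(String[] args) {
        String json = "{\"foo\": \"bar\"}";
        byte[] plain = encodeBody(json, null);
        byte[] zipped = encodeBody(json, "gzip, deflate");
        System.out.println("plain body: " + plain.length + " bytes");
        System.out.println("round trip ok: " + Arrays.equals(plain, gunzip(zipped)));
    }
}
```

For a payload this small the gzipped form is larger than the original; compression pays off with the relatively big JSON responses the question is about.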
