How to POST form-like image data from Java

Hey, I've tried researching how to POST data from Java, and nothing seems to do what I want. Basically, there's a form for uploading an image to a server, and I want to post an image to the same server, but from Java. It also needs to have the right parameter name (whatever the form input's name is). I would also like to return the response from this method.
It baffles me as to why this is so difficult to find, since this seems like something so basic.
EDIT ---- Added code
Based on some of the stuff BalusC showed me, I created the following method. It still doesn't work, but it's the most successful thing I've gotten yet (it seems to post something to the other server and returns some kind of response, though I'm not sure I'm reading the response correctly):
EDIT2 ---- added to code based on BalusC's feedback
EDIT3 ---- posting code that pretty much works, but seems to have an issue:
....
FileItemFactory factory = new DiskFileItemFactory();
// Create a new file upload handler
ServletFileUpload upload = new ServletFileUpload(factory);
// Parse the request
List<FileItem> items = upload.parseRequest(req);
// Process the uploaded items
for(FileItem item : items) {
if( ! item.isFormField()) {
String fieldName = item.getFieldName();
String fileName = item.getName();
String itemContentType = item.getContentType();
boolean isInMemory = item.isInMemory();
long sizeInBytes = item.getSize();
// POST the file to the cdn uploader
postDataRequestToUrl("<the host im uploading too>", "uploadedfile", fileName, item.get());
} else {
throw new RuntimeException("Not expecting any form fields");
}
}
....
// Post a request to specified URL. Get response as a string.
public static void postDataRequestToUrl(String url, String paramName, String fileName, byte[] requestFileData) throws IOException {
URLConnection connection=null;
try{
String boundary = Long.toHexString(System.currentTimeMillis()); // Just generate some unique random value.
String charset = "utf-8";
connection = new URL(url).openConnection();
connection.setDoOutput(true);
connection.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + boundary);
PrintWriter writer = null;
OutputStream output = null;
try {
output = connection.getOutputStream();
writer = new PrintWriter(new OutputStreamWriter(output, charset), true); // true = autoFlush, important!
// Send binary file.
writer.println("--" + boundary);
writer.println("Content-Disposition: form-data; name=\""+paramName+"\"; filename=\"" + fileName + "\"");
writer.println("Content-Type: " + URLConnection.guessContentTypeFromName(fileName));
writer.println("Content-Transfer-Encoding: binary");
writer.println();
output.write(requestFileData, 0, requestFileData.length);
output.flush(); // Important! Output cannot be closed. Close of writer will close output as well.
writer.println(); // Important! Indicates end of binary boundary.
// End of multipart/form-data.
writer.println("--" + boundary + "--");
} finally {
if (writer != null) writer.close();
if (output != null) output.close();
}
//* screw the response
int status = ((HttpURLConnection) connection).getResponseCode();
logger.info("Status: "+status);
for (Map.Entry<String, List<String>> header : connection.getHeaderFields().entrySet()) {
logger.info(header.getKey() + "=" + header.getValue());
}
} catch(Throwable e) {
logger.info("Problem",e);
}
}
I can see this code uploading the file, but only after I shutdown the tomcat. This leads me to believe that I'm leaving some sort of connection open.
This worked!

The core API you'd like to use is java.net.URLConnection. It is, however, pretty low level and verbose. You'd need to learn the HTTP specifics in detail and take them into account (headers, etcetera). You can find a related question with a lot of examples here.
A more convenient HTTP client API is the Apache Commons HttpComponents Client. You can find an example here.
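To give an idea of what "low level" means for the URLConnection route, here is a minimal sketch of building a single-file multipart/form-data body by hand. The class and method names are made up for illustration; the point is only that every header line ends in an explicit "\r\n" and that the file bytes are written untouched.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library: builds a one-file
// multipart/form-data body entirely in memory.
class MultipartBodySketch {

    private static final String CRLF = "\r\n";

    static byte[] build(String boundary, String paramName, String fileName,
                        String contentType, byte[] fileData) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();

        // Part headers are text and always end in CRLF, whatever the platform.
        String head = "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + paramName
                + "\"; filename=\"" + fileName + "\"" + CRLF
                + "Content-Type: " + contentType + CRLF
                + CRLF;
        byte[] headBytes = head.getBytes(StandardCharsets.UTF_8);
        body.write(headBytes, 0, headBytes.length);

        // The raw file bytes go in unchanged.
        body.write(fileData, 0, fileData.length);

        // A CRLF ends the part; the closing delimiter carries a trailing "--".
        byte[] tail = (CRLF + "--" + boundary + "--" + CRLF)
                .getBytes(StandardCharsets.UTF_8);
        body.write(tail, 0, tail.length);

        return body.toByteArray();
    }
}
```

The result would be sent with the request header Content-Type set to "multipart/form-data; boundary=" plus the same boundary, and written to connection.getOutputStream(). Since this buffers the whole body in memory, it only suits small files; for large ones you would stream the three pieces instead.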
Update: as per your update: you should read the response as a character stream, not as a binary stream and attempt to cast a byte to a char. This ain't going to work. Head to the Gathering HTTP response information part in the linked question with examples. Here's what it should look like:
BufferedReader reader = null;
StringBuilder builder = new StringBuilder();
try {
reader = new BufferedReader(new InputStreamReader(response, charset));
for (String line; (line = reader.readLine()) != null;) {
builder.append(line);
}
} finally {
if (reader != null) try { reader.close(); } catch (IOException logOrIgnore) {}
}
return builder.toString();
Update 2: as per your second update. Seeing how you continue to attempt reading/writing streams, I think it's high time to learn basic Java IO :) Well, this part is also answered in the linked question. You would like to use Apache Commons FileUpload to parse a multipart/form-data request in a servlet. How to use it is also described/linked in the linked question. Look at the bottom of the Uploading files chapter. By the way, the content length header would return zero since you are not explicitly setting it (and also cannot do so without buffering the entire request in memory).
Update 3:
I can see this code uploading the file, but only after I shutdown the tomcat. This leads me to believe that I'm leaving some sort of connection open.
You need to close the OutputStream with which you wrote the file to the disk. Once again, read the above linked basic Java IO tutorial.
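As a rough illustration of that last point, the sketch below writes uploaded bytes to disk and always closes the stream. It assumes Java 7 or later for try-with-resources; the class and method names are invented for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: saves uploaded bytes and releases the file handle.
class SaveUploadSketch {

    static void save(byte[] data, Path target) {
        // try-with-resources closes the stream even if write() throws.
        // Closing flushes any buffered bytes and frees the file handle,
        // so the file is complete without waiting for the JVM to exit.
        try (OutputStream out = Files.newOutputStream(target)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On older Java versions the same effect needs an explicit close() in a finally block.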

What have you tried? If you google for Http Post Java, dozens of pages appear - what's wrong with them? This one, http://www.devx.com/Java/Article/17679/1954 for example, appears pretty decent.

Related

What is wrong with my HttpURLConnection request?

I am trying to call a REST API with a PUT Request but I am receiving a 400 Error Code (Bad Request). Can someone spot what I may be doing wrong?
I have successfully called this API with a REST Client, here are the headers and body used:
https://imgur.com/dZVyawn
https://imgur.com/lMtn2JB
String credentials = Base64.getEncoder().encodeToString(("wcadmin:wcadmin").getBytes());
URL url = new URL(getURL());
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
connection.setRequestMethod("PUT");
connection.setDoOutput(true);
connection.setDoInput(true);
//Set Headers
String fileUrl = "c:\\0000000050.xml";
File fileToUpload = new File(fileUrl);
long length = fileToUpload.length();
String FORM_DATA_BOUNDARY = "------FormBoundary" + System.currentTimeMillis();
connection.setRequestProperty("csrf_nonce", getNonceValue());
connection.setRequestProperty("Accept", "application/xml");
connection.setRequestProperty("Authorization", "Basic " + credentials);
connection.setRequestProperty("Content-Type", "multipart/form-data; boundary=" + FORM_DATA_BOUNDARY);
connection.setRequestProperty("Content-Length", Long.toString(length));
//Setup Request Body Writer
OutputStream requestBodyOutputStream = connection.getOutputStream();
BufferedWriter requestBodyWriter = new BufferedWriter(new OutputStreamWriter(requestBodyOutputStream));
//Write Body
requestBodyWriter.write("\r\n\r\n");
requestBodyWriter.write(FORM_DATA_BOUNDARY);
requestBodyWriter.write("\r\n");
requestBodyWriter.write("Content-Disposition: form-data; name=\"file\"; filename=\"" + fileUrl + "\"");
requestBodyWriter.write("\r\n");
requestBodyWriter.write("Content-Type: text/xml");
requestBodyWriter.write("\r\n\r\n");
requestBodyWriter.flush();
FileInputStream uploadFileStream = new FileInputStream(fileToUpload);
int bytesRead;
byte[] dataBuffer = new byte[1024];
while ((bytesRead = uploadFileStream.read(dataBuffer)) != -1) {
requestBodyOutputStream.write(dataBuffer, 0, bytesRead);
}
requestBodyOutputStream.flush();
requestBodyWriter.write("\r\n");
requestBodyWriter.write(FORM_DATA_BOUNDARY);
requestBodyWriter.flush();
//Close the streams
requestBodyOutputStream.close();
requestBodyWriter.close();
uploadFileStream.close();
//Read Response
String inputLine;
StringBuffer content = new StringBuffer();
InputStream inputStream = connection.getInputStream();
if (inputStream != null) {
BufferedReader responseReader = new BufferedReader(new InputStreamReader(inputStream));
if (responseReader != null) {
while ((inputLine = responseReader.readLine()) != null) {
content.append(inputLine);
}
responseReader.close();
}
}
connection.disconnect();
Error 400 Bad Request response received
First and most important: you cannot write to both an OutputStream and an OutputStreamWriter which wraps that same OutputStream. They will conflict with each other.
Do not use OutputStreamWriter at all; instead, convert text to bytes yourself:
OutputStream requestBodyOutputStream = connection.getOutputStream();
requestBodyOutputStream.write("\r\n\r\n".getBytes(StandardCharsets.UTF_8));
requestBodyOutputStream.write(FORM_DATA_BOUNDARY.getBytes(StandardCharsets.UTF_8));
// etc.
Second, you are converting between bytes and strings using the system’s default charset, which means exactly what gets written depends on the system where the code is running. Don’t call String.getBytes without specifying an explicit Charset. Usually getBytes(StandardCharsets.UTF_8) is what you want.
Similarly, you need to pass a Charset to your InputStreamReader creation (although this isn’t the cause of your problem, since you aren’t getting a valid response at the moment). Don’t assume a charset; use the charset MIME type parameter from the response’s Content-Type header. If you’re using a version of Java older than 11, you can parse the Content-Type value with the javax.activation.MimeType class, but be aware that the javax.activation package has been removed from Java SE as of Java 11. For Java 11 and later, Java Activation can be downloaded as a stand-alone library. Another option is to use the JavaMail library, specifically its ContentType class, for parsing.
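If pulling in Java Activation or JavaMail feels heavy, a hand-rolled lookup of the charset parameter may be enough for simple cases. The sketch below is such a swapped-in simplification, not the MimeType approach described above; the names are made up, and it ignores edge cases such as whitespace around the "=" sign.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: deliberately simple Content-Type parameter parsing.
class ResponseCharsetSketch {

    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;               // header absent
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.regionMatches(true, 0, "charset=", 0, 8)) {
                String name = param.substring(8).trim();
                // Strip optional surrounding quotes, as in charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    return fallback;       // malformed or unsupported name
                }
            }
        }
        return fallback;                   // no charset parameter present
    }
}
```

It could be used as new InputStreamReader(inputStream, ResponseCharsetSketch.charsetOf(connection.getContentType(), StandardCharsets.UTF_8)), where the UTF-8 fallback is itself an assumption about the server.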
Third, the Content-Length header inside the body part (between the boundaries) should use the file’s length as a Content-Length. The Content-Length of the entire request body must be the length of everything you’ve written: the boundaries, the body part headers, and the file content.
The good news is, I think (though I’m not positive) that URLConnection will set the request’s overall Content-Length automatically, based on the bytes you write, so you probably don’t need to compute the length yourself; you can simply refrain from setting "Content-Length" at all.
When you do pass a correct request, you will find that you are dropping the newlines in the response. If the response is supposed to be human-readable text, those newlines are likely to matter. If you’re using Java 10 or later, you can use Reader.transferTo with a StringWriter:
StringWriter responseBody = new StringWriter();
responseReader.transferTo(responseBody);
String content = responseBody.toString();
If you’re using a version of Java older than 10:
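The snippet that belonged here is not shown, so the following is only a guess at an equivalent: a plain read loop over a char buffer, which keeps every newline because it never goes through readLine. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper for Java versions without Reader.transferTo.
class ReadAllSketch {

    static String readAll(Reader reader) throws IOException {
        StringBuilder content = new StringBuilder();
        char[] buffer = new char[4096];
        int count;
        // read() returns how many chars were placed in the buffer, or -1 at the end.
        while ((count = reader.read(buffer)) != -1) {
            content.append(buffer, 0, count);
        }
        return content.toString();
    }
}
```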
new BufferedReader cannot return null, so checking for null is pointless. In Java, the new operator always, no matter what, returns a new object (unless an exception is thrown, in which case new doesn’t return at all).
You should use StringBuilder, not StringBuffer. They are identical except that StringBuffer is an older class that provides thread safety for every method, creating unnecessary overhead for nearly all use cases.
You are copying your file into the request without any buffering, which is going to be slow and inefficient. Consider using Files.copy(fileToUpload.toPath(), requestBodyOutputStream) instead.
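The Files.copy suggestion could look roughly like this. The wrapper class is only there to make the sketch self-contained; in the question's code the second argument would be requestBodyOutputStream.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical wrapper around Files.copy(Path, OutputStream).
class CopyFileSketch {

    // Copies the whole file into the given stream and returns the byte count.
    static long copyInto(Path file, OutputStream requestBody) throws IOException {
        long copied = Files.copy(file, requestBody);
        requestBody.flush();   // push the bytes out before the closing boundary is written
        return copied;
    }
}
```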

HttpURLConnection FileNotFoundException on large request properties

I'm using HttpURLConnection to send JSON data from an Android Application to my Tomcat Server.
The POST works fine with small sized JSONs. On bigger data sets it fails with a FileNotFoundException.
What can it be?
Here's the code:
try {
URL url = new URL(urlIn);
strOut = "";
huc = (HttpURLConnection) url.openConnection();
huc.setRequestProperty("Connection", "Close");
huc.setRequestMethod("POST");
huc.setRequestProperty("User", userId);
huc.setRequestProperty("Action", action);
huc.setRequestProperty("JSON", jsonData);
huc.setConnectTimeout(10000);
in = new BufferedReader(new InputStreamReader(huc.getInputStream()));
while ((inputLine = in.readLine()) != null){
if (strOut.equalsIgnoreCase("")){
strOut = inputLine;
} else {
strOut = strOut + inputLine;
}
}
} catch (Exception e) {
strOut = "";
e.printStackTrace();
}
When jsonData gets to a certain size (around 10000 chars), the POST fails with the error mentioned. The content of the JSON does not have any special characters.
Thanks in advance.
Best regards, Federico.
HttpURLConnection throws a FileNotFoundException if the server responds with a 404 response code, so the cause seems to lie on the server side rather than the client side. Most likely the server is configured to accept request headers only up to a particular length and returns an error if that size is exceeded. A short Google search brought up a couple of results; sizes of 16 KB are mentioned, but smaller limits are also plausible.
As I mentioned in my comment to your question, you should change your process to receive the JSON-data (and the other values for User and Action as well BTW) as part of the request body, e.g. as url-encoded query string or as multipart formdata. Both ways are supported by HTTP client libraries you can use or are easily built manually.
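A minimal sketch of the url-encoded variant, under the assumption that the server is changed to read User, Action and JSON from the request body instead of from headers. The class and method names are invented; only standard java.net calls are used.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URLEncoder;
import java.util.Map;

// Hypothetical helper: sends values as an application/x-www-form-urlencoded body.
class FormBodySketch {

    // Turns the fields into key=value pairs joined by '&', each part URL-encoded.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return body.toString();
    }

    // Writes the encoded body to a connection that has not been opened yet.
    static void writeTo(HttpURLConnection connection, String body) throws IOException {
        byte[] bytes = body.getBytes("UTF-8");
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded; charset=UTF-8");
        connection.setFixedLengthStreamingMode(bytes.length);
        OutputStream out = connection.getOutputStream();
        try {
            out.write(bytes);
        } finally {
            out.close();
        }
    }
}
```

In the question's code this would replace the three setRequestProperty calls for User, Action and JSON; the response is then read from huc.getInputStream() as before.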
After lots of reading and trying, I gave up on configuring Tomcat to accept larger headers.
So I convinced the team in charge of the Tomcat app to make a servlet that is able to receive this data in the body, just as Lothar suggested.
Thanks!

Resumable upload from Java client to Grails web application?

After almost 2 workdays of Googling and trying several different possibilities I found throughout the web, I'm asking this question here, hoping that I might finally get an answer.
First of all, here's what I want to do:
I'm developing a client and a server application with the purpose of exchanging a lot of large files between multiple clients on a single server. The client is developed in pure Java (JDK 1.6), while the web application is done in Grails (2.0.0).
As the purpose of the client is to allow users to exchange a lot of large files (usually about 2GB each), I have to implement it in a way, so that the uploads are resumable, i.e. the users are able to stop and resume uploads at any time.
Here's what I did so far:
I actually managed to do what I wanted to do and stream large files to the server while still being able to pause and resume uploads using raw sockets. I would send a regular request to the server (using Apache's HttpClient library) to get the server to send me a port that was free for me to use, then open a ServerSocket on the server and connect to that particular socket from the client.
Here's the problem with that:
Actually, there are at least two problems with that:
I open those ports myself, so I have to manage open and used ports myself. This is quite error-prone.
I actually circumvent Grails' ability to manage a huge amount of (concurrent) connections.
Finally, here's what I'm supposed to do now and the problem:
As the problems I mentioned above are unacceptable, I am now supposed to use Java's URLConnection/HttpURLConnection classes, while still sticking to Grails.
Connecting to the server and sending simple requests is no problem at all, everything worked fine. The problems started when I tried to use the streams (the connection's OutputStream in the client and the request's InputStream in the server). Opening the client's OutputStream and writing data to it is as easy as it gets. But reading from the request's InputStream seems impossible to me, as that stream is always empty, as it seems.
Example Code
Here's an example of the server side (Groovy controller):
def test() {
InputStream inStream = request.inputStream
if(inStream != null) {
int read = 0;
byte[] buffer = new byte[4096];
long total = 0;
println "Start reading"
while((read = inStream.read(buffer)) != -1) {
println "Read " + read + " bytes from input stream buffer" //<-- this is NEVER called
}
println "Reading finished"
println "Read a total of " + total + " bytes" // <-- 'total' will always be 0 (zero)
} else {
println "Input Stream is null" // <-- This is NEVER called
}
}
This is what I did on the client side (Java class):
public void connect() {
final URL url = new URL("myserveraddress");
final byte[] message = "someMessage".getBytes(); // Any byte[] - will be a file one day
HttpURLConnection connection = url.openConnection();
connection.setRequestMethod("GET"); // other methods - same result
// Write message
DataOutputStream out = new DataOutputStream(connection.getOutputStream());
out.writeBytes(message);
out.flush();
out.close();
// Actually connect
connection.connect(); // is this placed correctly?
// Get response
BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line = null;
while((line = in.readLine()) != null) {
System.out.println(line); // Prints the whole server response as expected
}
in.close();
}
As I mentioned, the problem is that request.inputStream always yields an empty InputStream, so I am never able to read anything from it (of course). But as that is exactly what I'm trying to do (so I can stream the file to be uploaded to the server, read from the InputStream and save it to a file), this is rather disappointing.
I tried different HTTP methods, different data payloads, and also rearranged the code over and over again, but did not seem to be able to solve the problem.
What I hope to find
I hope to find a solution to my problem, of course. Anything is highly appreciated: hints, code snippets, library suggestions and so on. Maybe I'm even having it all wrong and need to go in a totally different direction.
So, how can I implement resumable file uploads for rather large (binary) files from a Java client to a Grails web application without manually opening ports on the server side?
The HTTP GET method has a special header for range retrieval: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.35 Most download tools use it to resume downloads from a server.
As I understand it, there is no standard practice for using this header with POST/PUT requests, but that's up to you, right? You can write a fairly standard Grails controller that accepts a normal HTTP upload carrying a header like Range: bytes=500-999. The controller should then write those 500 uploaded bytes into the file, starting at position 500.
In this case you don't need to open any sockets or invent your own protocol.
P.S. 500 bytes is just an example; you'd probably use much bigger parts.
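The "write the chunk at its offset" step could be sketched as below. This is only the file-writing half, with invented names; parsing the range header and wiring it into a controller are left out. It sticks to java.io so that it also compiles on JDK 1.6.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper: stores one uploaded chunk at a given byte offset.
class ChunkWriterSketch {

    static void writeChunk(File target, long offset, byte[] chunk) throws IOException {
        // "rw" creates the file if it does not exist yet.
        RandomAccessFile file = new RandomAccessFile(target, "rw");
        try {
            file.seek(offset);   // jump to where this chunk belongs
            file.write(chunk);
        } finally {
            file.close();
        }
    }
}
```

To resume, the client could first ask the server how many bytes it already has (target.length() on the server) and continue sending from that offset.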
Client Side Java Programming:
public class NonFormFileUploader {
static final String UPLOAD_URL= "http://localhost:8080/v2/mobileApp/fileUploadForEOL";
static final int BUFFER_SIZE = 4096;
public static void main(String[] args) throws IOException {
// file path is hard-coded here; it could be taken from args[0] instead
String filePath = "G:/study/GettingStartedwithGrailsFinalInfoQ.pdf";
File uploadFile = new File(filePath);
System.out.println("File to upload: " + filePath);
// creates a HTTP connection
URL url = new URL(UPLOAD_URL);
HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
httpConn.setDoOutput(true);
httpConn.setRequestMethod("POST");
// sets file name as a HTTP header
httpConn.setRequestProperty("fileName", uploadFile.getName());
// opens output stream of the HTTP connection for writing data
OutputStream outputStream = httpConn.getOutputStream();
// Opens input stream of the file for reading data
FileInputStream inputStream = new FileInputStream(uploadFile);
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead = -1;
while ((bytesRead = inputStream.read(buffer)) != -1) {
System.out.println("bytesRead:"+bytesRead);
outputStream.write(buffer, 0, bytesRead);
outputStream.flush();
}
System.out.println("Data was written.");
outputStream.flush();
outputStream.close();
inputStream.close();
int responseCode = httpConn.getResponseCode();
if (responseCode == HttpURLConnection.HTTP_OK) {
// reads server's response
BufferedReader reader = new BufferedReader(new InputStreamReader(
httpConn.getInputStream()));
String response = reader.readLine();
System.out.println("Server's response: " + response);
} else {
System.out.println("Server returned non-OK code: " + responseCode);
}
}
}
Server Side Grails Programme:
Inside the controller:
def fileUploadForEOL(){
def result
try{
result = mobileAppService.fileUploadForEOL(request);
}catch(Exception e){
log.error "Exception in fileUploadForEOL service",e
}
render result as JSON
}
Inside the Service Class:
def fileUploadForEOL(request){
def status = false;
int code = 500
def map = [:]
try{
String fileName = request.getHeader("fileName");
File saveFile = new File(SAVE_DIR + fileName);
System.out.println("===== Begin headers =====");
Enumeration<String> names = request.getHeaderNames();
while (names.hasMoreElements()) {
String headerName = names.nextElement();
System.out.println(headerName + " = " + request.getHeader(headerName));
}
System.out.println("===== End headers =====\n");
// opens input stream of the request for reading data
InputStream inputStream = request.getInputStream();
// opens an output stream for writing file
FileOutputStream outputStream = new FileOutputStream(saveFile);
byte[] buffer = new byte[BUFFER_SIZE];
int bytesRead = inputStream.read(buffer);
long count = 0
while(bytesRead != -1) {
outputStream.write(buffer, 0, bytesRead);
// count only bytes actually written; the final read returns -1
count += bytesRead
bytesRead = inputStream.read(buffer);
}
println "count:"+count
System.out.println("Data received.");
outputStream.close();
inputStream.close();
System.out.println("File written to: " + saveFile.getAbsolutePath());
code = 200
}catch(Exception e){
mLogger.log(java.util.logging.Level.SEVERE,"Exception in fileUploadForEOL",e);
}finally{
map <<["code":code]
}
return map
}
I have tried the above code and it worked for me, but only for files of 3 to 4 MB. For small files some bytes are missing or nothing arrives at all, even though the Content-Length request header is present; I'm not sure why this happens.

Incrementally handling twitter's streaming api using apache httpclient?

I am using Apache HTTPClient 4 to connect to twitter's streaming api with default level access. It works perfectly well in the beginning but after a few minutes of retrieving data it bails out with this error:
2012-03-28 16:17:00,040 DEBUG org.apache.http.impl.conn.SingleClientConnManager: Get connection for route HttpRoute[{tls}->http://myproxy:80->https://stream.twitter.com:443]
2012-03-28 16:17:00,040 WARN com.cloudera.flume.core.connector.DirectDriver: Exception in source: TestTwitterSource
java.lang.IllegalStateException: Invalid use of SingleClientConnManager: connection still allocated.
Make sure to release the connection before allocating another one.
at org.apache.http.impl.conn.SingleClientConnManager.getConnection(SingleClientConnManager.java:216)
at org.apache.http.impl.conn.SingleClientConnManager$1.getConnection(SingleClientConnManager.java:190)
I understand why I am facing this issue. I am trying to use this HttpClient in a flume cluster as a flume source. The code looks like this:
public Event next() throws IOException, InterruptedException {
try {
HttpHost target = new HttpHost("stream.twitter.com", 443, "https");
new BasicHttpContext();
HttpPost httpPost = new HttpPost("/1/statuses/filter.json");
StringEntity postEntity = new StringEntity("track=birthday",
"UTF-8");
postEntity.setContentType("application/x-www-form-urlencoded");
httpPost.setEntity(postEntity);
HttpResponse response = httpClient.execute(target, httpPost,
new BasicHttpContext());
BufferedReader reader = new BufferedReader(new InputStreamReader(
response.getEntity().getContent()));
String line = null;
StringBuffer buffer = new StringBuffer();
while ((line = reader.readLine()) != null) {
buffer.append(line);
if(buffer.length()>30000) break;
}
return new EventImpl(buffer.toString().getBytes());
} catch (IOException ie) {
throw ie;
}
}
I am trying to buffer 30,000 characters from the response stream into a StringBuffer and then return this as the data received. I am obviously not closing the connection, but I guess I do not want to close it just yet. Twitter's dev guide talks about this here. It reads:
Some HTTP client libraries only return the response body after the
connection has been closed by the server. These clients will not work
for accessing the Streaming API. You must use an HTTP client that will
return response data incrementally. Most robust HTTP client libraries
will provide this functionality. The Apache HttpClient will handle
this use case, for example.
It clearly tells you that HttpClient will return response data incrementally. I've gone through the examples and tutorials, but I haven't found anything that comes close to doing this. If you guys have used a httpclient (if not apache) and read the streaming api of twitter incrementally, please let me know how you achieved this feat. Those who haven't, please feel free to contribute to answers. TIA.
UPDATE
I tried doing this: 1) I moved obtaining the stream handle to the open method of the flume source. 2) I am using a plain InputStream and reading data into a byte buffer. So here is what the method body looks like now:
byte[] buffer = new byte[30000];
while (true) {
int count = instream.read(buffer);
if (count == -1)
continue;
else
break;
}
return new EventImpl(buffer);
This works to an extent: I get tweets, and they are nicely being written to a destination. The problem is with the instream.read(buffer) return value. Even when there is little or no data on the stream, the buffer still holds all 30,000 bytes, with the unfilled ones left at the default \u0000, and the whole thing gets written to the destination. So the destination file looks like this: " tweets..tweets..tweeets.. \u0000\u0000\u0000\u0000\u0000\u0000\u0000...tweets..tweets... ". I understand the count won't return -1 because this is a never-ending stream, so how do I figure out whether the buffer has new content from the read command?
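On the return value itself: InputStream.read(byte[]) returns how many bytes it actually placed in the buffer, so one way to avoid the \u0000 padding is to hand on only that many bytes. A minimal sketch with made-up names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper: returns only the bytes that were really read.
class ReadChunkSketch {

    // Returns the bytes read (at most maxBytes), or null at end of stream.
    static byte[] readChunk(InputStream in, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        int count = in.read(buffer);   // blocks until at least one byte is available
        if (count == -1) {
            return null;               // the stream has ended
        }
        return Arrays.copyOf(buffer, count);   // drop the unused tail of the buffer
    }
}
```

The method body above would then become something like return new EventImpl(ReadChunkSketch.readChunk(instream, 30000)), with a null check for a closed stream. Note that a chunk cut this way can end in the middle of a tweet.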
The problem is that your code is leaking connections. Please make sure that no matter what you either close the content stream or abort the request.
InputStream instream = response.getEntity().getContent();
try {
BufferedReader reader = new BufferedReader(
new InputStreamReader(instream));
String line = null;
StringBuffer buffer = new StringBuffer();
while ((line = reader.readLine()) != null) {
buffer.append(line);
if (buffer.length()>30000) {
httpPost.abort();
// connection will not be re-used
break;
}
}
return new EventImpl(buffer.toString().getBytes());
} finally {
// if request is not aborted the connection can be re-used
try {
instream.close();
} catch (IOException ex) {
// log or ignore
}
}
It turns out that it was a flume issue. Flume is optimized to transfer events of size 32kb. Anything beyond 32kb, Flume bails out. (The workaround is to tune event size to be greater than 32KB). So, I've changed my code to buffer 20,000 characters at least. It kind of works, but it is not fool proof. This can still fail if the buffer length exceeds 32kb, however, it hasn't failed so far in an hour of testing - I believe it has to do with the fact that Twitter doesn't send a lot of data on its public stream.
while ((line = reader.readLine()) != null) {
buffer.append(line);
if(buffer.length()>20000) break;
}

HTTP GET request not working in java when HTTP is 1.1?

So I wrote a little code that can download 4chan pages. I get the raw HTML page and parse it for my needs. The code below was working fine, but it suddenly stopped working. When I run it, the server does not accept my request; it seems it's waiting for something more. However, I know that the HTTP request is as below:
GET /ck HTTP/1.1
Host: boards.4chan.org
(extra new line)
If I change this format in any way, I receive a "400 bad request" status code. But if I change HTTP/1.1 to 1.0, the server responds with a "200 ok" status and I get the whole page. This makes me think the error is in the Host line, since that became mandatory in HTTP/1.1. Still, I cannot figure out what exactly needs to be changed.
The call is simply this, to get one whole board:
downloadHTMLThread( "ck", -1);
For a specific thread, you just change -1 to that thread's number. For example, for the link below the call looks like this:
//http://boards.4chan.org/ck/res/3507158
//url.getDefaultPort() is 80
//url.getHost() is boards.4chan.org
//url.getFile() is /ck/res/3507158
downloadHTMLThread( "ck", 3507158);
Any advice would be appreciated, thanks.
public static final String BOARDS = "boards.4chan.org";
public static final String IMAGES = "images.4chan.org";
public static final String THUMBS = "thumbs.4chan.org";
public static final String RES = "/res/";
public static final String HTTP = "http://";
public static final String SLASH = "/";
public String downloadHTMLThread( String board, int thread) {
BufferedReader reader = null;
PrintWriter out = null;
Socket socket = null;
String str = null;
StringBuilder input = new StringBuilder();
try {
URL url = new URL(HTTP+BOARDS+SLASH+board+(thread==-1?SLASH:RES+thread));
socket = new Socket( url.getHost(), url.getDefaultPort());
reader = new BufferedReader( new InputStreamReader( socket.getInputStream()));
out = new PrintWriter(socket.getOutputStream(), true);
out.println( "GET " +url.getFile()+ " HTTP/1.1");
out.println( "HOST: " + url.getHost());
out.println();
long start = System.currentTimeMillis();
while ((str = reader.readLine()) != null) {
input.append( str).append("\r\n");
}
long end = System.currentTimeMillis();
System.out.println( input);
System.out.println( "\nTime: " +(end-start)+ " milliseconds");
} catch (Exception ex) {
ex.printStackTrace();
input = null;
} finally {
if( reader!=null){
try {
reader.close();
} catch (IOException ioe) {
// nothing to see here
}
}
if( socket!=null){
try {
socket.close();
} catch (IOException ioe) {
// nothing to see here
}
}
if( out!=null){
out.close();
}
}
return input==null? null: input.toString();
}
Try using Apache HttpClient instead of rolling your own:
static String getUriContentsAsString(String uri) throws IOException {
HttpClient client = new DefaultHttpClient();
HttpResponse response = client.execute(new HttpGet(uri));
return EntityUtils.toString(response.getEntity());
}
If you are doing this to really learn the internals of HTTP client requests, then you might start by playing with curl from the command line. This will let you get all your headers and request body squared away. Then it will be a simple matter of adjusting your request to match what works in curl.
From the code, I think you are sending 'HOST' instead of 'Host'. Since this header is compulsory in HTTP/1.1 but ignored in HTTP/1.0, that might be the problem.
Anyway, you could use a program to capture the packet sent (i. e. wireshark), just to make sure.
Using println is quite useful, but the line separator appended to the command depends on the system property line.separator. I think (although I'm not sure) that the line separator used in http protocol has to be '\r\n'. If you're capturing the packet, I think it'd be a good idea to check that each line sent ends with '\r\n' (bytes x0D0A) (just in case your os line separator is different)
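To take line.separator out of the picture, the request text can be built with explicit "\r\n" and written as bytes. The sketch below uses invented names. The Connection: close line is my own addition rather than something from the question: HTTP/1.1 connections stay open by default, so a readLine loop that waits for end of stream may otherwise sit there until the server gives up.

```java
// Hypothetical helper: builds a minimal HTTP/1.1 GET request with explicit CRLF line endings.
class RawGetRequestSketch {

    static String build(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"   // assumption: ask the server to close after responding
             + "\r\n";                   // the empty line ends the header section
    }
}
```

With the question's socket this would be sent as socket.getOutputStream().write(RawGetRequestSketch.build(url.getHost(), url.getFile()).getBytes(java.nio.charset.StandardCharsets.US_ASCII)), followed by a flush.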
Use www.4chan.org as the host instead. Since boards.4chan.org is a 302 redirect to www.4chan.org, you won't be able to scrape anything from boards.4chan.org.
