Can I get cached images using HttpClient? - java

Is it possible to load a login page once using HttpClient and then get the image file of an img element from the cache, rather than from its src link, without reloading? This matters because I need to save the captcha for the page I just loaded; if I load it again from the src link, I get a different captcha. I tried:
DefaultHttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://www.mysite/login.jsp");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream instream = entity.getContent();
OutputStream outstream = new FileOutputStream("d://file.html");
org.apache.commons.io.IOUtils.copy(instream, outstream);
outstream.close();
instream.close();
but the saved page contains no images. I also tried HtmlUnitDriver from the Selenium library, and it returns no images either. Should I try something else? Can you help me with this?
Thanks and sorry for my English.

As mentioned here: HttpClient Get images from response, DefaultHttpClient/HttpClient fetches only one resource per request, which in your case is the HTML page served from http://www.mysite/login.jsp. You then need to parse that HTML page, find the img tag you want, and download only its src (ONLY that, without resending the login.jsp request!). If you download a captcha image, fetch it as soon as possible, or it could be overwritten by another user who tries to log in.
You need to do what the browser does: download the HTML, parse it, then request each src/link/etc. that you need.
DefaultHttpClient doesn't cache by default.
CachingHttpClient has caching enabled by default. In that case you need to look at the If-Modified-Since and If-None-Match headers to tell whether a request is sent to the remote server or answered from the cache. If nothing has changed on the server, you get the cached data, provided it was cached previously.
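The parse-then-download step can be sketched with the JDK alone. This is only an illustration under stated assumptions: the class name ImgSrcExtractor, the regular expression, and the sample HTML are made up here, and a real HTML parser (Jsoup, for instance) would be more robust than a regex. It shows how to pull the src out of the page you already fetched and resolve it against the page URL.

```java
import java.net.URI;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds the first <img> src in already-downloaded HTML.
final class ImgSrcExtractor {

    // Naive pattern for a quoted src attribute inside an <img> tag (assumption:
    // the attribute is quoted; unquoted or script-generated tags are not handled).
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the absolute URL of the first image, resolved against the page URL.
    static Optional<URI> firstImage(String html, String pageUrl) {
        Matcher m = IMG_SRC.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(URI.create(pageUrl).resolve(m.group(1)));
    }

    public static void main(String[] args) {
        // Sample markup invented for the demo; the real page will differ.
        String html = "<html><body><form>"
                + "<img id=\"captcha\" src=\"captcha.jsp?id=42\">"
                + "</form></body></html>";
        System.out.println(firstImage(html, "http://www.mysite/login.jsp"));
    }
}
```

The resulting URL should then be requested once, through the same client instance that loaded login.jsp, so that any session cookie set by the first response is presumably sent along with the image request.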

Related

Does inputstream contain data of iframe?

URL url = new URL("https://www.cs.tut.fi/~jkorpela/html/iframe-pdf.html");
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
InputStream in = connection.getInputStream();
After calling getInputStream, I turn all the bytes into a string. But why am I not seeing any sign of the data in the iframe?
My goal is to download the PDF.
If you request a URL, you will only get the contents of that file. An iframe is effectively a separate page, so you need to request it separately. A browser normally does all this transparently.
I would recommend using a library such as JSoup which contains lots of methods for parsing HTML, which you will need to get the URL of the iframe (and the URL of the PDF).

Apache Httppost retrieve image from server Java

Right now I am using Httppost to Post some parameters in the form of xml to a server. When the post occurs, a geotiff or .tif file is downloaded. I have successfully posted the document to the server and successfully downloaded the file simply by attaching the parameters to the url but I can't seem to combine the two. I have to use post because just using the URL leaves out elevation data in the geotiff.
In short, I am not sure how to simultaneously post and retrieve the image of the post. This is what I have thus far...
// Get target URL
String strURL = POST;
// Get file to be posted
String strXMLFilename = XML_PATH;
File input = new File(strXMLFilename);
// Prepare HTTP post
HttpPost post = new HttpPost(strURL);
post.setEntity(new InputStreamEntity(
new FileInputStream(input), input.length()));
// Specify content type and encoding
post.setHeader(
"Content-type", "text/xml");
// Get HTTP client
HttpClient httpclient = new DefaultHttpClient();
//Locate file to store data in
FileEntity entity = new FileEntity(newTiffFile, ContentType.create("image/geotiff"));
post.setEntity(entity);
// Execute request
try {
System.out.println("Connecting to Metoc site...\n");
HttpResponse result = httpclient.execute(post);
I was under the impression that the entity would contain the resulting image. Any help is much appreciated!
Thanks for the help, guys. The entity set on the request is what gets sent to the server. I also had code that tried to read the image from the response, but it wasn't working because replacing the request entity with a FileEntity messed up the POST. After removing that part, it works great!
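To make the request/response distinction concrete, here is a small sketch that uses only the JDK (HttpURLConnection rather than Apache HttpClient, which is a substitution and not what the poster used). The class name, the /render path, the throwaway local server, and the 4 placeholder bytes it returns are all assumptions for the demo. The point is simply that the XML goes out in the request body and the image comes back in the response body.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class PostAndSaveResponse {

    // POSTs the XML and writes whatever the server answers to 'target'.
    // Returns the number of bytes saved.
    static long postXmlAndSave(URL endpoint, byte[] xml, Path target) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) endpoint.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "text/xml");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(xml); // request body: what is sent to the server
        }
        try (InputStream in = conn.getInputStream()) {
            // response body: what the server sends back (the image)
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } finally {
            conn.disconnect();
        }
    }

    // Demo against a local stand-in server that answers each POST with 4 bytes.
    static long demo() {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            byte[] placeholder = {0x49, 0x49, 0x2A, 0x00}; // stand-in for image data
            server.createContext("/render", exchange -> {
                exchange.getRequestBody().readAllBytes(); // consume the posted XML
                exchange.sendResponseHeaders(200, placeholder.length);
                try (OutputStream body = exchange.getResponseBody()) {
                    body.write(placeholder);
                }
            });
            server.start();
            try {
                int port = server.getAddress().getPort();
                URL url = URI.create("http://127.0.0.1:" + port + "/render").toURL();
                Path target = Files.createTempFile("result", ".tif");
                long saved = postXmlAndSave(
                        url, "<request/>".getBytes(StandardCharsets.UTF_8), target);
                Files.delete(target);
                return saved;
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("saved " + demo() + " bytes");
    }
}
```

With HttpClient the same split applies: post.setEntity(...) carries the XML, and the image is read from result.getEntity().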

How to get dynamic cookie

I want to get the id cookie that Google issues when you opt-in at the ads settings page (if you're already accepting target advertisement, you must opt out first to see the page to which I am referring).
I've found that, in order to get this cookie, you have to perform an HTTP GET to the action URL in the form that is in this page. The problem is that this URL contains a hash that changes for every new HTTP connection so, first, I must go to this page and get this URL and, then, perform the GET to the URL.
I'm using HttpComponents to get http://www.google.com/ads/preferences but when I parse the contents with JSOUP there is only a script and no form can be found.
I'm afraid this happens because the contents are loaded dynamically using some sort of timeout... Does anyone know a workaround for this?
EDIT: by the way, the code I am using so far is:
HttpClient httpclient = new DefaultHttpClient();
// Create a local instance of cookie store
CookieStore cookieStore = new BasicCookieStore();
// Bind custom cookie store to the local context
((AbstractHttpClient) httpclient).setCookieStore(cookieStore);
CookieSpecFactory csf = new CookieSpecFactory() {
public CookieSpec newInstance(HttpParams params) {
return new BrowserCompatSpec() {
@Override
public void validate(Cookie cookie, CookieOrigin origin)
throws MalformedCookieException {
// Allow all cookies
System.out.println("Allowed cookie: " + cookie.getName() + " "
+ cookie.getValue() + " " + cookie.getPath());
}
};
}
};
((AbstractHttpClient) httpclient).getCookieSpecs().register("EASY", csf);
// Create local HTTP context
HttpContext localContext = new BasicHttpContext();
// Bind custom cookie store to the local context
localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
HttpGet httpget = new HttpGet(doubleClickURL);
// Override the default policy for this request
httpclient.getParams().setParameter(
ClientPNames.COOKIE_POLICY, "EASY");
// Pass local context as a parameter
HttpResponse response = httpclient.execute(httpget, localContext);
HttpEntity entity = response.getEntity();
if (entity != null) {
// Read the whole body before parsing; EntityUtils.toString also closes the stream
String html = EntityUtils.toString(entity);
// Find action attribute of form
Document document = Jsoup.parse(html);
Element form = document.select("form").first();
String optinURL = form.attr("action");
URL connection = new URL(optinURL);
// ... get id Cookie
}
You may have better luck using HtmlUnit, Selenium or jWebUnit for such a task. JSoup does not interpret JavaScript, and the Google page you're pointing to is full of JavaScript that has to be executed by a browser to produce what you're seeing.
HtmlUnit is OS independent and does not need anything else installed, but I've never used it for complicated Javascript sites. HtmlUnit can also extract data from the web page like JSoup does, but you can still feed the html to JSoup if you prefer using it.
Finally I found it! I found the following site describing the doubleclick cookie protocol:
Privacy Advisory
Then it is as easy as setting a cookie in that domain with name id and value A. Then make an HTTP request to http://www.google.com/ads/preferences and they'll set a correct ID value.
It is a very specific question, but I hope it helps future viewers.
By the way, I found that amazon.com, for example, is a member of the AdSense network. A script on its main page sends an HTTP request to doubleclick at:
http://ad.doubleclick.net/adj/amzn.us.gw.atf
There you can find a script that seems to be the actual code that gives you the id cookie. Nevertheless, if you access this URL with the id cookie set to the value A, it will set the doubleclick id.
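Pre-setting that cookie could look roughly like the sketch below. It uses java.net.CookieManager from the JDK rather than the HttpComponents cookie store from the question, and the domain ".doubleclick.net", the path "/", and the class name are assumptions: the post only says "that domain", name id, value A.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.List;

// Hypothetical helper that seeds a cookie store with id=A before any request.
final class SeedIdCookie {

    static CookieManager seededManager() {
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        HttpCookie seed = new HttpCookie("id", "A");
        seed.setDomain(".doubleclick.net"); // assumed domain, see note above
        seed.setPath("/");
        manager.getCookieStore().add(URI.create("http://doubleclick.net/"), seed);
        return manager;
    }

    // Lists the cookies the store would offer for the given URL.
    static List<HttpCookie> cookiesFor(CookieManager manager, String url) {
        return manager.getCookieStore().get(URI.create(url));
    }

    public static void main(String[] args) {
        CookieManager manager = seededManager();
        // Make HttpURLConnection use this store for subsequent requests.
        CookieHandler.setDefault(manager);
        System.out.println(
                cookiesFor(manager, "http://ad.doubleclick.net/adj/amzn.us.gw.atf"));
    }
}
```

With DefaultHttpClient the equivalent would be adding a cookie to the BasicCookieStore already created in the question's code before executing the GET.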

How can i programmatically upload a file to a website?

I have to upload a file to a server that only exposes a JSF web page with a file upload button (over HTTP). I have to automate a process (run as a standalone Java process) that generates a file and uploads it to the server. Sadly, the server the file has to be uploaded to does not provide FTP or SFTP. Is there a way to do this?
Thanks,
Richie
When programmatically submitting a JSF-generated form, you need to take the following 3 things into account:
Maintain the HTTP session (certainly if website has JSF server side state saving turned on).
Send the name-value pair of the javax.faces.ViewState hidden field.
Send the name-value pair of the button which is virtually to be pressed.
Otherwise the action may not be invoked at all. For the rest, it's no different from "regular" forms. The flow is basically as follows:
Send a GET request on the page with the form.
Extract the JSESSIONID cookie.
Extract the value of the javax.faces.ViewState hidden field from the response. If necessary (certainly if it has a dynamically generated name that may change on every request), extract the names of the file input field and the submit button as well. Dynamically generated IDs/names are recognizable by the j_id prefix.
Prepare a multipart/form-data POST request.
Set the JSESSIONID cookie (if not null) on that request.
Set the name-value pair of javax.faces.ViewState hidden field and the button.
Set the file to be uploaded.
You can use any HTTP client library to perform the task. The standard Java SE API offers java.net.URLConnection for this, which is pretty low level. To end up with less verbose code, you could use Apache HttpClient to do the HTTP requests and manage the cookies and Jsoup to extract data from the HTML.
Here's a kickoff example, assuming that the page has only one <form> (otherwise you need to include a unique identifier of that form in Jsoup's CSS selectors):
String url = "http://localhost:8088/playground/test.xhtml";
String viewStateName = "javax.faces.ViewState";
String submitButtonValue = "Upload"; // Value of upload submit button.
HttpClient httpClient = new DefaultHttpClient();
HttpContext httpContext = new BasicHttpContext();
httpContext.setAttribute(ClientContext.COOKIE_STORE, new BasicCookieStore());
HttpGet httpGet = new HttpGet(url);
HttpResponse getResponse = httpClient.execute(httpGet, httpContext);
Document document = Jsoup.parse(EntityUtils.toString(getResponse.getEntity()));
String viewStateValue = document.select("input[type=hidden][name=" + viewStateName + "]").val();
String uploadFieldName = document.select("input[type=file]").attr("name");
String submitButtonName = document.select("input[type=submit][value=" + submitButtonValue + "]").attr("name");
File file = new File("/path/to/file/you/want/to/upload.ext");
InputStream fileContent = new FileInputStream(file);
String fileContentType = "application/octet-stream"; // Or whatever specific.
String fileName = file.getName();
HttpPost httpPost = new HttpPost(url);
MultipartEntity entity = new MultipartEntity();
entity.addPart(uploadFieldName, new InputStreamBody(fileContent, fileContentType, fileName));
entity.addPart(viewStateName, new StringBody(viewStateValue));
entity.addPart(submitButtonName, new StringBody(submitButtonValue));
httpPost.setEntity(entity);
HttpResponse postResponse = httpClient.execute(httpPost, httpContext);
// ...
Try using HttpClient, here's an article that I think describes what you want, towards the bottom there's a section titled "Using HttpClient-Based FileUpload".
Hope this helps.
Probably that webpage just sends a POST request to the server with the contents of the form. You can easily send such a POST request yourself from Java, without using that page. For example this article shows an example of sending POST requests from Java
What you'll need to do is to examine the HTML on the page and work out what parameters are needed to post the form. It'll probably look something like this:
<form action="/RequestURL">
<input type=file name=file1>
<input type=textbox name=value1>
</form>
Based on that you can write some code to do a POST request to the url:
String data = URLEncoder.encode("value1", "UTF-8") + "=" + URLEncoder.encode("value1", "UTF-8");
data += "&" + URLEncoder.encode("file1", "UTF-8") + "=" + URLEncoder.encode(FileData, "UTF-8");
// Send data
URL url = new URL("http://servername.com/RequestURL");
URLConnection conn = url.openConnection();
conn.setDoOutput(true);
OutputStreamWriter wr = new OutputStreamWriter(conn.getOutputStream());
wr.write(data);
wr.flush();
wr.close();
Remember that the person who wrote the page might do some checks to make sure the POST request came from the same site. In that case you might be in trouble, and you might need to set the user agent correctly.
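One caveat on the snippet above: forms with a file input are usually submitted as multipart/form-data rather than URL-encoded, so if the target form declares that encoding, the body has to be assembled differently. The following is a rough sketch of building such a body by hand with the JDK only; the class name MultipartBody and its methods are hypothetical, the boundary is whatever string you pick (it must not occur in the data), and field names are not escaped.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical builder for a multipart/form-data request body.
final class MultipartBody {
    private static final String CRLF = "\r\n";
    private final String boundary;
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    MultipartBody(String boundary) {
        this.boundary = boundary;
    }

    // Adds a plain text field, e.g. a hidden input or the submit button.
    MultipartBody addField(String name, String value) {
        write("--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + name + "\"" + CRLF
                + CRLF
                + value + CRLF);
        return this;
    }

    // Adds the file part; 'content' is written as raw bytes, not encoded.
    MultipartBody addFile(String name, String fileName, String contentType, byte[] content) {
        write("--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + name
                + "\"; filename=\"" + fileName + "\"" + CRLF
                + "Content-Type: " + contentType + CRLF
                + CRLF);
        out.write(content, 0, content.length);
        write(CRLF);
        return this;
    }

    // Appends the closing boundary and returns the body; call once, at the end.
    byte[] finish() {
        write("--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }

    // Value for the request's Content-Type header.
    String contentTypeHeader() {
        return "multipart/form-data; boundary=" + boundary;
    }

    private void write(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }
}
```

To send it, set conn.setRequestProperty("Content-Type", body.contentTypeHeader()) on the URLConnection and write the bytes from finish() to conn.getOutputStream() instead of the URL-encoded string.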
You could try to use HtmlUnit for this. It provides a very simple API for simulating browser actions. I have already used this approach for similar requirements. It's very easy. You should give it a try.

HttpClient - Cookies - and JEditorPane

I've successfully managed to logon to a site using httpclient and print out the cookies that enable that logon.
However, I am now stuck because I wanted to display subsequent pages in a JEditorPane using .setPage(url) function. However, when I do that and analyse my GET request using Wireshark I see that the user agent is not my httpclient but the following:
User-Agent: Java/1.6.0_17
The GET request (which is coded somewhere inside JEditorPane's setPage(URL url) method) does not carry the cookies that were retrieved using the httpclient. My question is: how can I transfer the cookies received with httpclient so that my JEditorPane can display URLs from the site?
I'm beginning to think it's not possible and I should try and logon using normal Java URLconnection etc but would rather stick with httpclient as it's more flexible (I think). Presumably I would still have a problem with the cookies??
I had thought of extending the JEditorPane class and overriding setPage(), but I don't know what code to put in it, as I can't find out how setPage() actually works.
Any help/suggestions would be greatly appreciated.
Dave
As I mentioned in the comment, HttpClient and the URLConnection used by the JEditorPane to fetch the URL content don't talk to each other. So, any cookies that HttpClient may have fetched won't transfer over to the URLConnection. However, you can subclass JEditorPane like so:
final HttpClient httpClient = new DefaultHttpClient();
/* initialize httpClient and fetch your login page to get the cookies */
JEditorPane myPane = new JEditorPane() {
protected InputStream getStream(URL url) throws IOException {
HttpGet httpget = new HttpGet(url.toExternalForm());
HttpResponse response = httpClient.execute(httpget);
HttpEntity entity = response.getEntity();
// important! by overriding getStream you're responsible for setting content type!
setContentType(entity.getContentType().getValue());
// another thing that you're now responsible for... this will be used to resolve
// the images and other relative references. also beware whether it needs to be a url or string
getDocument().putProperty(Document.StreamDescriptionProperty, url);
// using commons-io here to take care of some of the more annoying aspects of InputStream
InputStream content = entity.getContent();
try {
return new ByteArrayInputStream(IOUtils.toByteArray(content));
}
catch(RuntimeException e) {
httpget.abort(); // per example in HttpClient, abort needs to be called on unexpected exceptions
throw e;
}
finally {
IOUtils.closeQuietly(content);
}
}
};
// now you can do this!
myPane.setPage(new URL("http://www.google.com/"));
By making this change, you'll be using HttpClient to fetch the URL content for your JEditorPane. Be sure to read the JavaDoc here http://download.oracle.com/javase/1.4.2/docs/api/javax/swing/JEditorPane.html#getStream(java.net.URL) to make sure that you catch all the corner cases. I think I've got most of them sorted, but I'm not an expert.
Of course, you can change around the HttpClient part of the code to avoid loading the response into memory first, but this is the most concise way. And since you're going to be loading it up into an editor, it will all be in memory at some point. ;)
Under Java 5 & 6, there is a default cookie manager which "automatically" supports HttpURLConnection, the type of connection JEditorPane uses by default.
Based on this blog entry, writing something like
CookieManager manager = new CookieManager();
manager.setCookiePolicy(CookiePolicy.ACCEPT_ALL); // ACCEPT_NONE would reject every cookie
CookieHandler.setDefault(manager);
seems to be enough to support cookies in JEditorPane.
Make sure to add this code before any internet communication with JEditorPane takes place.
