HttpClient - Cookies - and JEditorPane - java

I've successfully managed to log on to a site using HttpClient and print out the cookies that enable that logon.
I am now stuck because I want to display subsequent pages in a JEditorPane using its setPage(url) method. When I do that and analyse my GET request using Wireshark, I see that the user agent is not my HttpClient but the following:
User-Agent: Java/1.6.0_17
The GET request (which is coded somewhere inside JEditorPane's setPage(URL url) method) does not have the cookies that were retrieved using HttpClient. My question is: how can I transfer the cookies received with HttpClient so that my JEditorPane can display URLs from the site?
I'm beginning to think it's not possible and that I should try to log on using the normal Java URLConnection etc., but I would rather stick with HttpClient as it's more flexible (I think). Presumably I would still have a problem with the cookies?
I had thought of extending the JEditorPane class and overriding setPage(), but I don't know the actual code I should put in it, as I can't seem to find out how setPage() actually works.
Any help/suggestions would be greatly appreciated.
Dave

As I mentioned in the comment, HttpClient and the URLConnection used by the JEditorPane to fetch the URL content don't talk to each other. So, any cookies that HttpClient may have fetched won't transfer over to the URLConnection. However, you can subclass JEditorPane like so :
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import javax.swing.JEditorPane;
import javax.swing.text.Document;
import org.apache.commons.io.IOUtils;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.DefaultHttpClient;

final HttpClient httpClient = new DefaultHttpClient();
/* initialize httpClient and fetch your login page to get the cookies */
JEditorPane myPane = new JEditorPane() {
    @Override
    protected InputStream getStream(URL url) throws IOException {
        HttpGet httpget = new HttpGet(url.toExternalForm());
        HttpResponse response = httpClient.execute(httpget);
        HttpEntity entity = response.getEntity();
        // important! by overriding getStream you're responsible for setting content type!
        // (getContentType() may be null when the server sends no Content-Type header)
        if (entity.getContentType() != null) {
            setContentType(entity.getContentType().getValue());
        }
        // another thing that you're now responsible for... this will be used to resolve
        // the images and other relative references. also beware whether it needs to be a url or string
        getDocument().putProperty(Document.StreamDescriptionProperty, url);
        // using commons-io here to take care of some of the more annoying aspects of InputStream
        InputStream content = entity.getContent();
        try {
            return new ByteArrayInputStream(IOUtils.toByteArray(content));
        }
        catch (RuntimeException e) {
            httpget.abort(); // per example in HttpClient, abort needs to be called on unexpected exceptions
            throw e;
        }
        finally {
            IOUtils.closeQuietly(content);
        }
    }
};
// now you can do this!
myPane.setPage(new URL("http://www.google.com/"));
By making this change, you'll be using HttpClient to fetch the URL content for your JEditorPane. Be sure to read the JavaDoc here http://download.oracle.com/javase/1.4.2/docs/api/javax/swing/JEditorPane.html#getStream(java.net.URL) to make sure that you catch all the corner cases. I think I've got most of them sorted, but I'm not an expert.
Of course, you can change around the HttpClient part of the code to avoid loading the response into memory first, but this is the most concise way. And since you're going to be loading it up into an editor, it will all be in memory at some point. ;)

Since Java 6, the JDK ships a default cookie manager (java.net.CookieManager) which "automatically" supports HttpURLConnection, the type of connection JEditorPane uses by default.
Based on this blog entry, writing something like
CookieManager manager = new CookieManager();
manager.setCookiePolicy(CookiePolicy.ACCEPT_ALL);
CookieHandler.setDefault(manager);
seems to be enough to support cookies in JEditorPane.
Make sure to add this code before any internet communication with JEditorPane takes place.
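To carry the login cookies over from HttpClient, one option is to copy them into the default cookie manager that HttpURLConnection consults. The following is a minimal JDK-only sketch under a few assumptions: the names CookieBridge, install and cookieHeaderFor are hypothetical, the cookies are handed over as a plain name/value map (with HttpClient 4.x they would typically be read from the client's cookie store), and every cookie is scoped to the whole host with path "/".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper: seeds the JVM-wide cookie manager with known cookies.
final class CookieBridge {
    private CookieBridge() {}

    /** Installs a default CookieManager pre-loaded with the given cookies for one site. */
    static CookieManager install(URI site, Map<String, String> cookies) {
        // null store = the JDK's default in-memory cookie store
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        for (Map.Entry<String, String> entry : cookies.entrySet()) {
            HttpCookie cookie = new HttpCookie(entry.getKey(), entry.getValue());
            cookie.setDomain(site.getHost());
            cookie.setPath("/");   // assumption: cookie applies to the whole host
            cookie.setVersion(0);  // plain "name=value" (Netscape style) cookie
            manager.getCookieStore().add(site, cookie);
        }
        // HttpURLConnection (and therefore JEditorPane.setPage) uses this handler
        CookieHandler.setDefault(manager);
        return manager;
    }

    /** Returns the Cookie header value the manager would attach to a request for uri. */
    static String cookieHeaderFor(CookieManager manager, URI uri) {
        try {
            Map<String, List<String>> headers =
                    manager.get(uri, Collections.<String, List<String>>emptyMap());
            List<String> values = headers.get("Cookie");
            return (values == null || values.isEmpty()) ? "" : String.join("; ", values);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling install(...) before the first setPage(...) call should make requests to that host carry the copied cookies; cookieHeaderFor is only there to inspect what would be sent.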

Related

Can I get cached images using HttpClient?

Is it possible to load the login page once using HttpClient and get the image file of an img element from cache, not from the src link, without reloading? This is important because I need to save the captcha for the page that was just loaded; if I load it from the src link, it will be a different captcha. I tried:
DefaultHttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://www.mysite/login.jsp");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
InputStream instream = entity.getContent();
OutputStream outstream = new FileOutputStream("d://file.html");
org.apache.commons.io.IOUtils.copy(instream, outstream);
outstream.close();
instream.close();
but there aren't any images. I also tried HtmlUnitDriver from the Selenium library, and there aren't any images either. Maybe I should try something else? Can you help me with this?
Thanks and sorry for my English.
As mentioned here: HttpClient Get images from response, DefaultHttpClient/HttpClient fetches only one resource per request, which in your case is the HTML page (served from http://www.mysite/login.jsp). You then need to parse that HTML page, find the relevant img tag and its src, and download only that image (ONLY that, without re-sending the login.jsp request!). If you download a captcha image, you need to fetch it as soon as possible, or it could be overwritten by another user who tries to log in.
You need to do what the browser does: download the HTML, parse it, then request whichever src/link/etc. resources you need.
DefaultHttpClient doesn't cache by default.
CachingHttpClient has caching enabled by default. It relies on conditional request headers such as If-Modified-Since and If-None-Match to decide whether a request is sent to the remote server or its result is returned from the cache. If nothing has changed on the server, you will get the cached data, provided the response was cached previously.
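The parse-then-download step described above can be sketched with the JDK alone. This is a minimal illustration, not a robust HTML parser: the class name ImageSources and the regular expression are assumptions made for the example, and a real page may be better served by a proper parser such as jsoup. It only shows how to pull the img src values out of HTML that has already been fetched and resolve them against the page URL.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds <img src="..."> values in already-downloaded HTML.
final class ImageSources {
    // Naive pattern: quoted src attribute inside an img tag (assumption: attributes are quoted)
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private ImageSources() {}

    /** Returns every img src found in html, resolved against the URL of the page. */
    static List<URI> extract(String html, URI pageUri) {
        List<URI> result = new ArrayList<>();
        Matcher matcher = IMG_SRC.matcher(html);
        while (matcher.find()) {
            // resolve() turns relative references such as "captcha.jpg" into absolute URLs
            result.add(pageUri.resolve(matcher.group(2).trim()));
        }
        return result;
    }
}
```

Each resulting URL would then be requested once, ideally with the same HttpClient instance that fetched the page, so that the session cookies are sent along and the captcha matches the loaded page.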

How to get dynamic cookie

I want to get the id cookie that Google issues when you opt-in at the ads settings page (if you're already accepting target advertisement, you must opt out first to see the page to which I am referring).
I've found that, in order to get this cookie, you have to perform an HTTP GET to the action URL in the form that is in this page. The problem is that this URL contains a hash that changes for every new HTTP connection so, first, I must go to this page and get this URL and, then, perform the GET to the URL.
I'm using HttpComponents to get http://www.google.com/ads/preferences but when I parse the contents with JSOUP there is only a script and no form can be found.
I'm afraid this happens because the contents are loaded dynamically using some sort of timeout... Does anyone know a workaround for this?
EDIT: by the way, the code that I'm using at the moment is:
HttpClient httpclient = new DefaultHttpClient();
// Create a local instance of cookie store
CookieStore cookieStore = new BasicCookieStore();
// Bind custom cookie store to the client
((AbstractHttpClient) httpclient).setCookieStore(cookieStore);
CookieSpecFactory csf = new CookieSpecFactory() {
    public CookieSpec newInstance(HttpParams params) {
        return new BrowserCompatSpec() {
            @Override
            public void validate(Cookie cookie, CookieOrigin origin)
                    throws MalformedCookieException {
                // Allow all cookies
                System.out.println("Allowed cookie: " + cookie.getName() + " "
                        + cookie.getValue() + " " + cookie.getPath());
            }
        };
    }
};
((AbstractHttpClient) httpclient).getCookieSpecs().register("EASY", csf);
// Create local HTTP context
HttpContext localContext = new BasicHttpContext();
// Bind custom cookie store to the local context
localContext.setAttribute(ClientContext.COOKIE_STORE, cookieStore);
HttpGet httpget = new HttpGet(doubleClickURL);
// Override the default policy for this request
httpclient.getParams().setParameter(
        ClientPNames.COOKIE_POLICY, "EASY");
// Pass local context as a parameter
HttpResponse response = httpclient.execute(httpget, localContext);
HttpEntity entity = response.getEntity();
if (entity != null) {
    // Read the whole body; closing the stream first or reading a single
    // line would lose most of the content
    String html = EntityUtils.toString(entity);
    // Find action attribute of form
    Document document = Jsoup.parse(html);
    Element form = document.select("form").first();
    if (form != null) { // first() returns null when the page has no form
        String optinURL = form.attr("action");
        URL connection = new URL(optinURL);
        // ... get id Cookie
    }
}
You may have more luck using HtmlUnit, Selenium or jWebUnit for such a task. JSoup does not interpret JavaScript, and the Google page you're pointing to is full of JavaScript that has to be executed by a browser to produce what you're seeing.
HtmlUnit is OS independent and does not need anything else installed, but I've never used it for complicated Javascript sites. HtmlUnit can also extract data from the web page like JSoup does, but you can still feed the html to JSoup if you prefer using it.
Finally I found it! I found the following site describing the doubleclick cookie protocol:
Privacy Advisory
Then it is as easy as setting a cookie in that domain with the name id and the value A. Then make an HTTP request to http://www.google.com/ads/preferences and they'll set a correct ID value.
It is a very specific question, but I hope this helps future viewers.
By the way, I found that amazon.com, for example, is a member of the AdSense network. An HTTP request to doubleclick is sent by means of a script in the main page to:
http://ad.doubleclick.net/adj/amzn.us.gw.atf
There you can find a script that seems to be the actual code that gives you the id cookie. Nevertheless, if you access this with the cookie set to the value A, it will set the doubleclick id.

Posting to a form based on what HttpGet returns with Apache's HttpClient

I'm posting data to a website form using Apache's HttpClient class. The form is retrieved using the following lines of code:
HttpGet get = new HttpGet(url);
HttpResponse response = client.execute(get);
The website that I'm retrieving the form from requires authentication to access the form. If the request isn't authenticated, the website redirects the request to a login form page that will subsequently redirect back to the original page on successful authentication.
I want to cleanly detect whether or not the GET request returns the login page or the desired form page so that I can either POST login data or form data. The only way I can think of to do this is by reading from the content InputStream of the entity of the response and parsing each line. But that seems somewhat convoluted. I haven't worked with the Apache HttpComponents api before so I'm not sure if this would be the only and best way to accomplish what I want to accomplish.
EDIT: To clarify question, I'm asking if there is a set way to handle forms with Apache's HttpClient. I somewhat know how to achieve what I'm looking to do, but it looks very ugly and I'm hoping there is an easier and faster way to achieve it. For example, if there was some way to do the following:
HttpGet get = new HttpGet(url);
HttpResponse response = client.execute(get);
if(parseElements(response.getEntity()).hasFormWithId("login")) {
// post authentication data
} else {
// post actual form data
}
Because of my inexperience with Apache's HttpClient api, I'm not sure if what I'm looking for in the API is too abstract for the intent of the API.
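You can modify the behavior of HttpClient by setting its parameters:
DefaultHttpClient client = new DefaultHttpClient();
client.getParams().setBooleanParameter(ClientPNames.HANDLE_REDIRECTS, false);
This disables automatic redirect handling, so an unauthenticated GET returns the redirect itself (a 3xx status with a Location header) instead of the login page, and you can tell the two cases apart without parsing the body.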
See also:
Automatic redirect handling
HTTP Authentication
DefaultHttpClient API
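With automatic redirects disabled, the decision between "post login data" and "post form data" can be made from the status code and the Location header alone. The sketch below is JDK-only and rests on assumptions: the class and method names are hypothetical, and the login page is assumed to live at a known path such as "/login". With HttpClient 4.x the two inputs would typically come from response.getStatusLine().getStatusCode() and response.getFirstHeader("Location").

```java
import java.net.URI;

// Hypothetical helper: decides whether a response is a redirect to the login page.
final class LoginRedirectCheck {
    private LoginRedirectCheck() {}

    /** True for the redirect status codes that normally carry a Location header. */
    static boolean isRedirect(int status) {
        return status == 301 || status == 302 || status == 303
                || status == 307 || status == 308;
    }

    /**
     * @param status     HTTP status code of the response to the GET
     * @param location   value of the Location header, or null if absent
     * @param requestUri URI that was requested (used to resolve relative locations)
     * @param loginPath  path of the login page, e.g. "/login" (an assumption)
     */
    static boolean redirectsToLogin(int status, String location, URI requestUri, String loginPath) {
        if (!isRedirect(status) || location == null) {
            return false;
        }
        URI target = requestUri.resolve(location.trim());
        return loginPath.equals(target.getPath());
    }
}
```

If redirectsToLogin(...) returns true, the authentication data would be posted first and the original GET repeated afterwards; otherwise the form data can be posted directly.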

Multiple Difficulties with HttpURLConnection class in Java for Android

--Update--
Apologies to those who helped me; it turns out this is just a problem with Eclipse's debugger. After suspecting that it was misleading me, I added a couple of System.out.println calls to watch the variables, and according to them the values ARE being changed; the debugger was just showing me old information for whatever reason. No clue why that's happening, but the important thing is that the code apparently does work.
I'm working on a method to share with Twitter for an Android application, and I'm having errors when setting up the HttpURLConnection. I create the connection object as usual, using the openConnection method of a URL and then casting the result to HttpURLConnection, and when I subsequently call setRequestMethod("POST") on the connection, it does absolutely nothing. When I run the code in the debugger line by line, the request method just remains the default ("GET") after that line. Does anyone have any idea why this may be happening? I'm getting the same problem with setDoOutput(true), which also doesn't change anything. However, adding a request property does still work. I've been searching around and haven't been able to find anything on this problem, not even another person reporting it.
I am not sure whether using HttpURLConnection is the best choice here.
Did you try the following way?
// Building the POST request
final BasicNameValuePair message = new BasicNameValuePair("yourField", "yourContent");
final List<NameValuePair> list = new ArrayList<NameValuePair>(1);
list.add(message);
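final HttpPost httppost = new HttpPost(YOUR_URL); // YOUR_URL: placeholder for the target endpoint
httppost.setEntity(new UrlEncodedFormEntity(list)); // may throw UnsupportedEncodingException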
// Building the HTTP client
final HttpParams httpParameters = new BasicHttpParams();
HttpConnectionParams.setConnectionTimeout(httpParameters, YOUR_CHOSEN_CONN_TIMEOUT);
HttpConnectionParams.setSoTimeout (httpParameters, YOUR_CHOSEN_SO_TIMEOUT);
final HttpClient httpClient = new DefaultHttpClient(httpParameters);
// Execution of the POST request
final HttpResponse response = httpClient.execute(httppost);
This is the way I usually do it, with no problems.
[EDIT: 04-25-2014] Apache's HttpClient was the best approach for Froyo and earlier versions. Now, according to this article from the Android Developers Blog (written after this Q&A), it is better to use URLConnection.
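For the URLConnection route, a form POST can be set up roughly as follows. This is a minimal sketch using only java.net: the class name FormPost, the method names and the field names are hypothetical, and the body is assumed to be form-encoded. It also makes it easy to confirm with plain assertions or println calls, rather than the debugger, that setRequestMethod and setDoOutput do take effect.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;

// Hypothetical helper for form-encoded POST requests over HttpURLConnection.
final class FormPost {
    private FormPost() {}

    /** Encodes fields as application/x-www-form-urlencoded, e.g. "a=1&b=two+words". */
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> field : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(field.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(field.getValue(), "UTF-8"));
            }
        } catch (IOException e) { // UnsupportedEncodingException; UTF-8 is always available
            throw new UncheckedIOException(e);
        }
        return body.toString();
    }

    /** Creates and configures the connection; nothing is sent over the network yet. */
    static HttpURLConnection preparePost(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true); // required before writing a request body
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the body and returns the HTTP status code; this is where the request is sent. */
    static int send(HttpURLConnection conn, String body) {
        try {
            byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
            conn.setFixedLengthStreamingMode(bytes.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(bytes);
            }
            return conn.getResponseCode();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

On Android, send(...) would have to run off the main thread, as with any network call.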

How to use HttpClient to download continuous data stream?

I'm currently using HttpURLConnection to stream live content such as a radio broadcast. However it seems that using HttpClient is a better option since it's well supported by Android and it's a better implementation. Also, there seems to be a logic for automatic reconnection from a lost connection.
My problem is that I can't get this to work. It's always hanging when calling httpclient.execute(...).
What am I doing wrong?
HttpClient httpclient = new DefaultHttpClient();
HttpGet httpget = new HttpGet("http://208.76.243.123:7100");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
Run it in the debugger and, when it hangs, pause execution. Then find the thread that is executing your code and look at the stack trace to see exactly where it is blocked. You will see whether it is blocked on IO or whether something else is happening. With that data it will be easier to identify the problem.
Are you sure your server understands the HTTP protocol? (I assume yes, it sounds like you had a different client working). It is possible the execute method is blocking because it has not seen a valid Response header yet.
You probably want entity.getContent() which will return a handle to a stream. See this question.
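For a never-ending stream such as a radio broadcast, the body has to be consumed incrementally from the stream that entity.getContent() returns, rather than buffered as a whole. The following is a minimal JDK-only sketch of such a read loop; the class name StreamPump, the method pump and the maxBytes cap are assumptions added for illustration (the cap simply gives the example a way to stop).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical helper: copies a (possibly endless) stream in small chunks.
final class StreamPump {
    private StreamPump() {}

    /**
     * Copies from in to out until the stream ends or maxBytes have been copied.
     * Returns the number of bytes actually copied.
     */
    static long pump(InputStream in, OutputStream out, long maxBytes) {
        byte[] buffer = new byte[4096];
        long total = 0;
        try {
            while (total < maxBytes) {
                int wanted = (int) Math.min(buffer.length, maxBytes - total);
                int read = in.read(buffer, 0, wanted); // blocks until some data arrives
                if (read == -1) {
                    break; // server closed the stream
                }
                out.write(buffer, 0, read);
                total += read;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }
}
```

In a player, out would be whatever consumes the audio data, and the loop would run on its own thread so that the blocking read does not freeze the UI.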
