Java Apache HttpClient Submitting Form - java

I am trying to submit a form on this website, and get back the resulting misspellings from the text area as a string (only the "Reverse letters" checkbox should be selected). I have the code below, adapted from here:
private static void sendPost() throws Exception {
String url = "http://tools.seobook.com/spelling/keywords-typos.cgi";
HttpClient client = new DefaultHttpClient();
HttpPost post = new HttpPost(url);
post.setHeader("User-Agent", "Mozilla/5.0"); // add header
List<NameValuePair> urlParameters = new ArrayList<NameValuePair>();
//the input text area
urlParameters.add(new BasicNameValuePair("user_input", "tomato potato"));
//the checkbox
urlParameters.add(new BasicNameValuePair("reverse_letters", "reverse_letters"));
//the submit button (?)
urlParameters.add(new BasicNameValuePair("", "generate typos"));
post.setEntity(new UrlEncodedFormEntity(urlParameters));
HttpResponse response = client.execute(post);
System.out.println("\nSending 'POST' request to URL : " + url);
System.out.println("Post parameters : " + post.getEntity());
System.out.println("Response Code : " +
response.getStatusLine().getStatusCode());
BufferedReader rd = new BufferedReader(new InputStreamReader(
response.getEntity().getContent()));
StringBuffer result = new StringBuffer();
String line = "";
while ((line = rd.readLine()) != null) {
result.append(line + "\n");
}
System.out.println(result.toString());
}
If I copy and paste the lines from the console and search through them in an editor for the misspellings, I do in fact find the input text and the resulting text area contents in the huge string. The string contains all of the HTML, however, and I would like only the misspellings as a string. How would I extract only the resulting misspellings from this site, perhaps with a method from the Apache HttpClient library, or am I taking the wrong approach?
Thanks, Dan

I think you are trying to put a square peg in a round hole; Selenium would probably be a better bet.
Apache HttpClient is best used for handling requests and response headers, not for processing the body of a response.
An overcomplicated way would be to split the "result" variable using regexes.
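For illustration, the regex route mentioned above could look roughly like the sketch below. It uses only the JDK and pulls the inner text out of every textarea in an HTML string. The class name, the method name and the sample markup (including the "results" field name) are assumptions made for this example; the real page's markup may differ, and a proper HTML parser would be more robust than a regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextareaExtractor {

    // Matches <textarea ...>content</textarea>, case-insensitively and across line breaks.
    private static final Pattern TEXTAREA = Pattern.compile(
            "<textarea\\b[^>]*>(.*?)</textarea>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed inner text of every textarea element found in the given HTML. */
    public static List<String> extractTextareas(String html) {
        List<String> contents = new ArrayList<>();
        Matcher m = TEXTAREA.matcher(html);
        while (m.find()) {
            contents.add(m.group(1).trim());
        }
        return contents;
    }

    public static void main(String[] args) {
        // Hypothetical response body standing in for the "result" string from the question.
        String html = "<form><textarea name=\"user_input\">tomato potato</textarea>"
                + "<textarea name=\"results\">otmato\ntmoato</textarea></form>";
        for (String text : extractTextareas(html)) {
            System.out.println(text);
            System.out.println("---");
        }
    }
}
```

In the question's code, the argument would be result.toString(), and the misspellings would be whichever list entry corresponds to the output text area.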

Related

How to parse/query a static html page using java and httpclient

Here is the HTTP POST request using the HttpClient API. This piece of code lets me fetch the entire page content in raw HTML format. My requirement is to send some parameters to this method so that it fetches only the required data (as from a database) in a readable format. By 'readable' I mean without the HTML tags. I don't know if JSON has to be brought in here.
I went through JSoup, but it seems to be doing the job of a scraper.
So how should I proceed with the html content I currently have?
private void sendPost() throws Exception {
String url = "https://www.elitmus.com/jobs";
HttpClient client = HttpClientBuilder.create().build();
HttpPost post = new HttpPost(url);
// add header
post.setHeader("User-Agent", USER_AGENT);
List<NameValuePair> urlParameters = new ArrayList<NameValuePair>();
urlParameters.add(new BasicNameValuePair("sn", "C02G8416DRJM"));
urlParameters.add(new BasicNameValuePair("cn", ""));
urlParameters.add(new BasicNameValuePair("locale", ""));
urlParameters.add(new BasicNameValuePair("caller", ""));
urlParameters.add(new BasicNameValuePair("num", "12345"));
post.setEntity(new UrlEncodedFormEntity(urlParameters));
HttpResponse response = client.execute(post);
System.out.println("\nSending 'POST' request to URL : " + url);
System.out.println("Post parameters : " + post.getEntity());
System.out.println("Response Code : " +
response.getStatusLine().getStatusCode());
BufferedReader rd = new BufferedReader(
new InputStreamReader(response.getEntity().getContent()));
StringBuffer result = new StringBuffer();
String line = "";
while ((line = rd.readLine()) != null) {
result.append(line);
}
System.out.println(result.toString());
}
}
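As a rough illustration of what "readable" output could mean here, the sketch below strips tags from an HTML string with plain JDK regexes. It is deliberately crude: it drops script and style blocks, removes the remaining tags, decodes only a handful of entities and collapses whitespace. The class and method names are made up for this example. A real HTML parser such as jsoup (for instance Jsoup.parse(html).text()) is likely to be more reliable on real pages.

```java
public class HtmlToText {

    /** Crude tag stripper: removes script/style blocks and tags, decodes a few entities, collapses whitespace. */
    public static String toPlainText(String html) {
        String text = html
                .replaceAll("(?is)<(script|style)\\b.*?</\\1>", " ") // drop script and style blocks entirely
                .replaceAll("(?s)<[^>]*>", " ")                      // drop all remaining tags
                .replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&amp;", "&");                              // decode &amp; last to avoid double decoding
        return text.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Hypothetical page fragment; the real response from the site will look different.
        String html = "<html><body><h1>Jobs</h1><p>Java &amp; SQL developer</p></body></html>";
        System.out.println(toPlainText(html));
    }
}
```

Applied to the question's code, the input would be result.toString(). Picking out only specific fields would still need a selector-based parser or a site API that returns structured data.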

Curl to Java Post

How do I make an HTTP GET, POST, PUT or DELETE request using Java?
I'm using CouchDB and I can post data into the database using cURL. How do I do the same thing using Java? I cannot find any well-documented information on this.
curl -X PUT http://anna:secret@127.0.0.1:5984/somedatabase/
Could someone please convert this cURL request to Java? Otherwise, please recommend libraries to do so.
Thank you.
You can use HttpClient by Apache.
Here is an example of how to send a POST request:
String url = "https://your.url.to.post.to/";
HttpClient client = HttpClientBuilder.create().build();
HttpPost post = new HttpPost(url);
// Form parameters are sent URL-encoded in the request body.
List<NameValuePair> urlParameters = new ArrayList<NameValuePair>();
urlParameters.add(new BasicNameValuePair("param1", "value1"));
post.setEntity(new UrlEncodedFormEntity(urlParameters));
HttpResponse response = client.execute(post);
System.out.println("Response Code : "
        + response.getStatusLine().getStatusCode());
// Read the response body line by line into a single string.
BufferedReader rd = new BufferedReader(
        new InputStreamReader(response.getEntity().getContent()));
StringBuffer result = new StringBuffer();
String line = "";
while ((line = rd.readLine()) != null) {
    result.append(line);
}
I do recommend that you check this article for more examples.
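Since the question asks about a PUT with credentials rather than a form POST, here is a minimal sketch of that case. Note that it swaps in the JDK's own java.net.http client (Java 11+) instead of Apache HttpClient, so that it has no external dependencies. The class and method names are invented for this example, and the host, database name and credentials are simply the ones from the curl command. It assumes the server accepts HTTP Basic authentication, which is what curl sends for user:password URLs.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CouchPut {

    /** Builds a body-less PUT request that carries HTTP Basic credentials. */
    public static HttpRequest buildPut(String url, String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Basic " + encoded)
                .PUT(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    /** Sends the request and returns the status code; needs a reachable server. */
    public static int send(HttpRequest request) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
        return response.statusCode();
    }

    public static void main(String[] args) {
        // Only builds and prints the request; call send(request) against a running CouchDB.
        HttpRequest request = buildPut("http://127.0.0.1:5984/somedatabase/", "anna", "secret");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

GET and DELETE requests can be built the same way with the builder's GET() and DELETE() methods, and POST(...) takes a body publisher just like PUT(...).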

Possible redirect issue while trying to log in to website with HttpClient

I've copied Mkyong's Apache HttpClient example almost word for word except for swapping out the deprecated methods and my own log in information (I even managed to copy his typos!): Mkyong's example
private void sendPost(String url, List<NameValuePair> postParams)
throws Exception {
post.setParams(params);
post.setHeader("Host", "accounts.google.com");
post.setHeader("User-Agent", USER_AGENT);
post.setHeader("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");
post.setHeader("Accept-Language", "en-US,en;q=0.5");
post.setHeader("Cookie", getCookies());
post.setHeader("Connection", "keep-alive");
post.setHeader("Referer", "https://accounts.google.com/ServiceLoginAuth");
post.setHeader("Content-Type", "application/x-www-form-urlencoded");
post.setEntity(new UrlEncodedFormEntity(postParams));
HttpResponse response = client.execute(post);
int responseCode = response.getStatusLine().getStatusCode();
System.out.println("\nSending 'POST' request to URL : " + url);
System.out.println("Post parameters : " + postParams);
System.out.println("Response Code : " + responseCode);
BufferedReader rd = new BufferedReader(
new InputStreamReader(response.getEntity().getContent()));
StringBuffer result = new StringBuffer();
String line = "";
while ((line = rd.readLine()) != null) {
result.append(line);
}
System.out.println(result.toString());
}
The print statement at the end of the method returns that the page has been 'moved temporarily'.
<HTML><HEAD><TITLE>Moved Temporarily</TITLE></HEAD><BODY BGCOLOR="#FFFFFF" TEXT="#000000"><H1>Moved Temporarily</H1>The document has moved here.</BODY></HTML>
I believe this is a redirect, though I am not entirely sure. Here is the main method:
public static void main(String[] args) throws Exception {
String url = "https://accounts.google.com/ServiceLoginAuth";
String gmail = "https://mail.google.com/mail/";
CookieHandler.setDefault(new CookieManager());
HttpCilentExample http = new HttpCilentExample();
String page = http.GetPageContent(url);
List<NameValuePair> postParams =
http.getFormParams(page, "example@gmail.com","examplePassword");
http.sendPost(url, postParams);
System.out.println("past here");
String result = http.GetPageContent(gmail);
System.out.println(result);
}
The code does not proceed past String result = http.GetPageContent(gmail);
I believe that this has something to do with the issue (this is from Handling HttpClient Redirects)
10.3.3 302 Found ...
If the 302 status code is received in response to a request other
than GET or HEAD, the user agent MUST NOT automatically redirect the
request unless it can be confirmed by the user, since this might
change the conditions under which the request was issued.
I have attempted to override the isRedirected() method, but am unsure as to what I am supposed to do. Am I to return false for all cases, or check the parameters for POST cases?
I am unsure if this is the entire problem and am unsure if I am approaching this correctly at all.
However, I would appreciate any explanations that would help me understand the process of logging into a website using Httpclient.
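To make the quoted rule concrete, the sketch below encodes it as a small JDK-only decision helper: a 302 is followed automatically only if the original request was a GET or HEAD, otherwise the caller is expected to issue the follow-up request itself. The class and method names and the example URL are hypothetical and are not part of HttpClient. As far as I know, HttpClient 4.x also ships a LaxRedirectStrategy that can be set on HttpClientBuilder to follow redirects after POST as well, which may be simpler than overriding isRedirected() by hand.

```java
import java.util.Locale;

public class RedirectRule {

    /**
     * Mirrors the quoted rule: a 302 may be followed automatically only when
     * the original request method was GET or HEAD.
     */
    public static boolean mayAutoFollow(String method, int statusCode) {
        if (statusCode != 302) {
            return false; // this sketch only covers the 302 case discussed above
        }
        String m = method.toUpperCase(Locale.ROOT);
        return m.equals("GET") || m.equals("HEAD");
    }

    /** Describes what a login flow would do next after receiving the given response. */
    public static String nextStep(String method, int statusCode, String location) {
        if (statusCode == 302 && location != null) {
            return mayAutoFollow(method, statusCode)
                    ? "client follows to " + location
                    : "issue a new GET to " + location + " yourself";
        }
        return "read the response body";
    }

    public static void main(String[] args) {
        // Hypothetical redirect target, used only to show the two outcomes.
        System.out.println(nextStep("POST", 302, "https://example.com/next"));
        System.out.println(nextStep("GET", 302, "https://example.com/next"));
    }
}
```

In a login flow like the one in the question, this corresponds to reading the Location header from the 302 response to the POST and then requesting that URL with a GET, keeping the same cookies.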

Getting URL after a redirect using HttpClient.Execute(HttpGet)

I have searched for a while and I am not finding a clear answer. I am trying to log into a website.
https://hrlink.healthnet.com/
This website redirects to a login page whose URL is not consistent. I have to post my login credentials to the redirected URL.
I am trying to code this in Java, but I do not understand how to get the URL from the response. It may look a bit messy, but I have it this way while I am testing.
HttpGet httpget = new HttpGet("https://hrlink.healthnet.com/");
HttpResponse response = httpclient.execute(httpget);
HttpEntity entity = response.getEntity();
String redirectURL = "";
for (org.apache.http.Header header : response.getHeaders("Location")) {
    redirectURL += "Location: " + header.getValue() + "\r\n";
}
InputStream is;
is = entity.getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(is, "iso-8859-1"), 8);
StringBuilder sb = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null) {
    sb.append(line + "\n");
}
is.close();
String result = sb.toString();
I know I get redirected because my result string shows me the actual login page, but I am not able to get the new URL.
In Firefox I am using TamperData. When I navigate to this website, https://hrlink.healthnet.com/, I see a GET with a 302 Found and the Location of the login page, then another GET to the actual login page.
Any help is greatly appreciated thank you.
Check out the W3C documentation:
10.3.3 302 Found
The temporary URI SHOULD be given by the Location field in the response. Unless the request method was HEAD, the entity of the response SHOULD contain a short hypertext note with a hyperlink to the new URI(s).
If the 302 status code is received in response to a request other than GET or HEAD, the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.
One solution is to use the POST method, which prevents automatic redirection on the client side:
HttpPost request1 = new HttpPost("https://hrlink.healthnet.com/");
HttpResponse response1 = httpclient.execute(request1);
// Expect a 302 response.
if (response1.getStatusLine().getStatusCode() == 302) {
    String redirectURL = response1.getFirstHeader("Location").getValue();
    // The client does not redirect automatically, so send the follow-up request manually.
    HttpGet request2 = new HttpGet(redirectURL);
    HttpResponse response2 = httpclient.execute(request2);
    ... ...
}
Hope this helps.
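One detail worth guarding against when reading the Location header manually: servers sometimes send a relative value (for example just a path), in which case it has to be resolved against the URL of the request that produced the 302 before it can be used for the next request. A minimal JDK-only sketch of that step follows. The class name, method name and the example path are assumptions for illustration, not the site's real login URL.

```java
import java.net.URI;

public class LocationResolver {

    /** Resolves a Location header value against the URI of the request that produced it. */
    public static URI resolve(String requestUrl, String locationHeader) {
        // URI.resolve returns an absolute Location unchanged and resolves a relative one.
        return URI.create(requestUrl).resolve(locationHeader.trim());
    }

    public static void main(String[] args) {
        // Hypothetical Location value; the real server decides what it sends.
        URI target = resolve("https://hrlink.healthnet.com/", "/login/index.jsp?x=1");
        System.out.println(target);
    }
}
```

The resolved URI's toString() value can then be passed to new HttpGet(...) or new HttpPost(...) for the follow-up request.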

HttpClient is not showing all HTML Inputs on a web page

I am using an HTTP GET to request a web page with a total of 7 HTML inputs on it.
However, when I get the page and output it, only 5 inputs appear (whether printed to my console or written to a text file).
The website I'm trying to get the 7 inputs for is an intranet site, so it'd be of no use to provide the address.
This is my code / "HTTP getting" method:
//GET a web page and store as string in "htmlpage"
DefaultHttpClient httpclient = new DefaultHttpClient();
//Sometimes I need the below code, but not this time
//httpclient.setRedirectStrategy(new RedirectStrategy());
HttpGet httget = new HttpGet("example-website.aspx");
HttpResponse response = httpclient.execute(httget);
HttpEntity entity = response.getEntity();
InputStream in = entity.getContent();
StringBuffer charBuf = new StringBuffer();
do{char c = (char)in.read();
charBuf.append(c);
}while (charBuf.length() < entity.getContentLength());
String htmlpage = charBuf.toString();
charBuf.delete(0, charBuf.length()-1);
in.close();
EntityUtils.consume(entity);
httget.abort();
FileOutputStream fo = new FileOutputStream("please_have_7_this_time.html");
fo.write(htmlpage.getBytes());
fo.flush();
fo.close();
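One thing that may be worth ruling out is the read loop itself. It casts each byte to a char and stops based on entity.getContentLength(), which returns a negative number when the length is not known, so the body may be cut short or mis-decoded. The sketch below shows a JDK-only way to read a stream to the end and decode it with an explicit charset, plus a helper that counts input tags. The class and method names are made up for this example and UTF-8 is an assumption about the page's encoding. With HttpClient, EntityUtils.toString(entity) should do a similar job. If the count is still 5 after this, the missing inputs might be added by JavaScript in the browser, which a plain HTTP GET will not see.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BodyReader {

    /** Reads the stream until end-of-stream, closes it, and decodes the bytes with the given charset. */
    public static String readAll(InputStream in, Charset charset) {
        try (InputStream body = in) {
            return new String(body.readAllBytes(), charset); // readAllBytes requires Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts opening <input ...> tags, case-insensitively. */
    public static int countInputs(String html) {
        Matcher m = Pattern.compile("<input\\b", Pattern.CASE_INSENSITIVE).matcher(html);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for entity.getContent(); the markup here is invented for the example.
        String sample = "<form><input name=\"a\"><INPUT name=\"b\"/></form>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        String htmlpage = readAll(in, StandardCharsets.UTF_8);
        System.out.println(countInputs(htmlpage) + " inputs found");
    }
}
```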
