How to get info from a website once logged in with HtmlUnit? - java

I posted about this before and have since learned more about how to do it, but I still cannot get it to work. Below is the main part of the code. When I run it, the console fills with CSS-related warnings and the login does not work. I am trying to read the logged-in user's name, as noted in the comment near the end of the code. The website is my school's: https://lionel2.kgv.edu.hk/login/index.php . I have included a copy of the logged-in page (with most elements removed except my user section) in case that helps. Thanks in advance,
Vijay.
website:
https://drive.google.com/a/kgv.hk/file/d/0B-O_Xw0mAw7tajJhVlRxTkFhOE0/view?usp=sharing
// most of this is from https://gist.github.com/harisupriyanto/6805988
String loginUrl = "http://lionel2.kgv.edu.hk";
int loginFormNum = 1;
String usernameInputName = "nameinput";
String passwordInputName = "passinput";
String submitLoginButtonValue = "Sign In";
// create the HtmlUnit WebClient instance
WebClient wclient = new WebClient();
// configure the WebClient as desired
wclient.getOptions().setPrintContentOnFailingStatusCode(false);
wclient.getOptions().setCssEnabled(true);
wclient.getOptions().setThrowExceptionOnFailingStatusCode(false);
wclient.getOptions().setThrowExceptionOnScriptError(false);
try {
    // load the login page and pick the login form by its index
    final HtmlPage loginPage = (HtmlPage) wclient.getPage(loginUrl);
    final HtmlForm loginForm = loginPage.getForms().get(loginFormNum);
    // username and password are defined elsewhere (not shown in the post)
    final HtmlTextInput txtUser = loginForm.getInputByName(usernameInputName);
    txtUser.setValueAttribute(username);
    final HtmlPasswordInput txtpass = loginForm.getInputByName(passwordInputName);
    txtpass.setValueAttribute(password);
    // submit the form and keep the page that comes back
    final HtmlSubmitInput submitLogin = loginForm.getInputByValue(submitLoginButtonValue);
    final HtmlPage returnPage = submitLogin.click();
    final HtmlElement returnBody = returnPage.getBody();
    // TODO: if there is an element with the class "Login info", print out its node value
} catch (FailingHttpStatusCodeException e) {
    e.printStackTrace();
} catch (Exception e) {
    e.printStackTrace();
}
} // closes the enclosing method (its signature is not shown in the post)

You most likely do not need the CSS, so you can disable it.
To improve performance and reduce warnings and errors, I disable or limit as much as possible:
webClient.setJavaScriptTimeout(30 * 1000); // 30s
webClient.getOptions().setTimeout(300 * 1000); // 300s
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setThrowExceptionOnScriptError(false); // no Exceptions because of javascript
webClient.getOptions().setPopupBlockerEnabled(true);

Related

Find a form with Java and htmlUnit

I have written a simple program that should log in via a form on a website.
Unfortunately, the form in the HTML has no name or id.
I use the latest version of HtmlUnit and Java 11.
I tried to find the form with the getForms() method, but without success.
HTML snippet from the website I am trying to log in to (image not included):
Here is my code to find the form:
//Get the form
HtmlForm form = LoginPage.getFormByName("I tried several options here");
//Get the Submit button
final HtmlButton loginButton = form.getButtonByName("Anmelden");
//Get the text fields for password and username
final HtmlTextInput username = form.getInputByName("text");
final HtmlTextInput password = form.getInputByName("password");
Whatever I tried, I didn't find any form.
This is my connection class if it helps:
public HtmlPage CslPlasmaConnection(){
//Create Webclient to connect to CslPlasma
WebClient CslPlasmaConnection = new WebClient(BrowserVersion.BEST_SUPPORTED);
//helper variable ini with null
HtmlPage CslPlasmaLoginPage = null;
//Get the content from CslPlasma
try {
CslPlasmaLoginPage = CslPlasmaConnection.getPage(URL);
} catch (IOException e) {
e.printStackTrace();
}
//Return CslPlasma Login Page
return CslPlasmaLoginPage;
}
Without knowing the page, I can only guess...
Have a look at this answer: https://stackoverflow.com/a/54188201/4804091
Then try working with the current page once background JavaScript has finished (maybe some JS creates the form):
webClient.getPage(url);
webClient.waitForBackgroundJavaScript(10000);
HtmlPage page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
If you're sure this is the only form on the page or you know which form number it is, you can use page.getForms() to get all forms of the page and get yours from the resulting list.
Like so:
HtmlForm form = LoginPage.getForms().get(0); // if it's the only form, its index is 0

HtmlUnit button click

I'm trying to send a message on www.meetme.com but can't figure out how to do it. I can type the message into the comment area, but clicking the Send button doesn't do anything. What am I doing wrong? When I log in and press the Login button, the page does change and everything is fine. Does anyone have any ideas or clues?
HtmlPage htmlPage = null;
HtmlElement htmlElement;
WebClient webClient = null;
HtmlButton htmlButton;
HtmlForm htmlForm;
try{
// Create and initialize WebClient object
webClient = new WebClient(BrowserVersion.FIREFOX_17 );
webClient.setCssEnabled(false);
webClient.setJavaScriptEnabled(false);
webClient.setThrowExceptionOnFailingStatusCode(false);
webClient.setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getCookieManager().setCookiesEnabled(true);
/*webClient.setRefreshHandler(new RefreshHandler() {
public void handleRefresh(Page page, URL url, int arg) throws IOException {
System.out.println("handleRefresh");
}
});*/
htmlPage = webClient.getPage("http://www.meetme.com");
htmlForm = htmlPage.getFirstByXPath("//form[@action='https://ssl.meetme.com/login']");
htmlForm.getInputByName("username").setValueAttribute("blah@gmail.com");
htmlForm.getInputByName("password").setValueAttribute("blah");
//Signing in
htmlButton = htmlForm.getElementById("login_form_submit");
htmlPage = (HtmlPage) htmlButton.click();
htmlPage = webClient.getPage("http://www.meetme.com/member/1234567890");
System.out.println("BEFORE CLICK");
System.out.println(htmlPage.asText());
//type message in text area
HtmlTextArea commentArea = (HtmlTextArea)htmlPage.getFirstByXPath("//textarea[@id='profileQMBody']");
commentArea.setText("Testing");
htmlButton = (HtmlButton) htmlPage.getHtmlElementById("profileQMSend");
htmlPage = (HtmlPage)htmlButton.click();
webClient.waitForBackgroundJavaScript(7000);
//The print is exactly the same as the BEFORE CLICK print
System.out.println("AFTER CLICK");
System.out.println(htmlPage.asText());
}catch(ElementNotFoundException e){
e.printStackTrace();
}catch(Exception e){
e.printStackTrace();
}
I don't know much about the page you're accessing, but you cannot perform an AJAX request with JavaScript disabled, and your code calls webClient.setJavaScriptEnabled(false). If enabling JavaScript doesn't fix it, you will have to keep debugging, but JavaScript must be on for the Send button to do anything.
Additionally, make sure you're using HtmlUnit 2.12 and replace all the deprecated methods in your code (for example, the setters called directly on webClient have equivalents under webClient.getOptions()).
BTW, I'd also recommend turning the JavaScript warnings off. Check this answer to see how you can do that.
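One common way to quiet the console is to raise the log level of HtmlUnit's loggers. The sketch below uses only java.util.logging and assumes HtmlUnit's logging ends up there (the usual case when no other logging backend such as Log4j is on the classpath). The class name QuietHtmlUnitLogging is made up for illustration. The logger name com.gargoylesoftware.htmlunit matches the package shown in the stack traces on this page; HtmlUnit 3.x uses org.htmlunit instead.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietHtmlUnitLogging {
    // java.util.logging only holds loggers weakly, so keep strong references;
    // otherwise the level set below could be lost if the logger is collected.
    private static final Logger HTMLUNIT = Logger.getLogger("com.gargoylesoftware.htmlunit");
    private static final Logger HTTP_CLIENT = Logger.getLogger("org.apache.http");

    /** Turns off output from the HtmlUnit and HttpClient loggers and their children. */
    public static void silence() {
        HTMLUNIT.setLevel(Level.OFF);
        HTTP_CLIENT.setLevel(Level.OFF);
    }

    public static void main(String[] args) {
        // Call this once, before creating the WebClient.
        silence();
        System.out.println(HTMLUNIT.getLevel());
    }
}
```

Child loggers (for example the CSS and JavaScript error reporters under com.gargoylesoftware.htmlunit) inherit the level, so one call covers them as long as they have no level of their own.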

Maintaining login credentials across pages in HTMLunit WebClient

My question is very similar to the one at this page, except that I don't have access to the remote server, nor know how it does its authentication.
I'm trying to maintain logged in status across webpages that I can request using webclient.getPage(). The website I'm accessing uses a standard login form with username, password pair. What I've done before is to create a little function to do that for me:
public static HtmlPage logIn(HtmlPage page) {
HtmlPage nextpage = null;
final HtmlForm form = page.getFormByName("login_form");
final HtmlSubmitInput button = form.getInputByValue("Login");
final HtmlTextInput username = form.getInputByName("username");
final HtmlPasswordInput password = form.getInputByName("password");
username.setValueAttribute("user_foo");
password.setValueAttribute("pwd_bar");
// hit submit button and return the requested page
try {
nextpage = button.click();
} catch (IOException e) {
e.printStackTrace();
}
return nextpage;
}
The problem with this is that I have to manually search the page returned by this function to find the link to the page I want. More troubling, this only works for the page right after login, but not for other pages.
Instead, I would like to hold the login information within the browser simulator, "webclient", so that I can seamlessly access any protected page within the site. In addition to attempting the solution in the previous question (linked above), I have attempted the following solution without success:
private static void setCredentials(WebClient webClient) {
    String username = "user_foo";
    String password = "pwd_bar";
    // reuse the client's existing provider instead of creating a new DefaultCredentialsProvider()
    DefaultCredentialsProvider creds = (DefaultCredentialsProvider) webClient.getCredentialsProvider();
    try {
        creds.addCredentials(username, password);
        webClient.setCredentialsProvider(creds);
    } catch (Exception e) {
        System.out.println("!!! Problem logging in");
        e.printStackTrace();
    }
}
Edited: here is the main function showing how I use webClient:
public static void main(String[] args) throws Exception {
// Create and initialize WebClient object
WebClient webClient = new WebClient(/*BrowserVersion.CHROME_16*/);
webClient.setThrowExceptionOnScriptError(false);
webClient.setJavaScriptEnabled(false);
webClient.setCssEnabled(false);
webClient.getCookieManager().setCookiesEnabled(true);
setCredentials(webClient);
HtmlPage subj_page = null;
//visit login page and get it
String url = "http://www.website.com/index.php";
HtmlPage page = (HtmlPage) webClient.getPage(url);
HtmlAnchor anchor = null;
page = logIn(page);
// search for content
page = searchPage(page, "recent articles");
// click on the paper link
anchor = (HtmlAnchor) page.getAnchorByText("recent articles");
page = (HtmlPage) anchor.click();
// loop through found articles
//{{{page
int curr_pg = 1;
int last_pg = 5;
page = webClient.getPage(<starting URL of the first article>); // such URLs look like: "www.website.com/view_articles.php?publication_id=17&page=1"
do {
// find sections on this page;
List <HtmlDivision> sections = new ArrayList<HtmlDivision>();
List <HtmlDivision> artdivs = new ArrayList<HtmlDivision>();
List <HtmlDivision> tagdivs = new ArrayList<HtmlDivision>();
sections = (List<HtmlDivision>) page.getByXPath("//div[@class='article_section']");
artdivs = (List<HtmlDivision>) page.getByXPath("//div[@class='article_head']");
tagdivs = (List<HtmlDivision>) page.getByXPath("//div[@class='article_tag']");
int num_ques = sections.size();
HtmlDivision section, artdiv, tagdiv;
// for every section, get its sub-articles
for (int i = 0; i < num_ques; i++) {
section = sections.get(i);
artdiv = artdivs.get(i);
tagdiv = tagdivs.get(i);
// find the sub-article details and print to xml file
String xml = getXMLArticle(artdiv, section.asText(), tagdiv);
System.out.println(xml);
System.out.println("-----------------------------");
}
// wait() must be called while holding the object's monitor, otherwise it throws IllegalMonitorStateException
synchronized (webClient) {
webClient.wait(2000); // wait for 2 seconds
}
String href = "?publication_id=17&page=" + curr_pg;
anchor = page.getAnchorByHref(href);
page = anchor.click();
System.out.println("anchor val: " + anchor.getHrefAttribute());
curr_pg++;
} while (curr_pg < last_pg);
//}}}page
webClient.closeAllWindows();
}
Other info: I do not have info about the remote site server's authentication mechanism since I have no access to it, but your help would be great. Thank you!
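One thing worth checking, assuming the site uses an ordinary cookie-based session (the question does not say): an HTML login form is normally remembered through a session cookie, and a WebClient keeps its cookies for as long as the same instance is reused, so pages requested with webClient.getPage() after logIn() should already carry the session. DefaultCredentialsProvider, by contrast, serves HTTP-level authentication (Basic, NTLM and similar), not HTML login forms. The sketch below illustrates the cookie principle with the JDK's own java.net.CookieManager rather than HtmlUnit. The class name, the cookie name PHPSESSID and its value are made up for illustration.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class SessionCookieSketch {
    /** Stores the Set-Cookie header of a (pretend) login response in the cookie jar. */
    public static void recordLoginResponse(CookieManager jar, URI loginUri, String setCookie)
            throws IOException {
        jar.put(loginUri, Collections.singletonMap("Set-Cookie", Collections.singletonList(setCookie)));
    }

    /** Returns the Cookie header value the jar would attach to a request for pageUri. */
    public static String cookieHeaderFor(CookieManager jar, URI pageUri) throws IOException {
        Map<String, List<String>> headers = jar.get(pageUri, Collections.emptyMap());
        return String.join("; ", headers.getOrDefault("Cookie", Collections.emptyList()));
    }

    public static void main(String[] args) throws IOException {
        CookieManager jar = new CookieManager();
        // Pretend the login response set a session cookie for the whole site.
        recordLoginResponse(jar, URI.create("http://www.website.com/index.php"),
                "PHPSESSID=abc123; Path=/");
        // A later request to another page on the same host gets the cookie back ...
        System.out.println(cookieHeaderFor(jar,
                URI.create("http://www.website.com/view_articles.php?publication_id=17&page=1")));
        // ... while a request to a different host does not.
        System.out.println(cookieHeaderFor(jar, URI.create("http://other.example.org/")));
    }
}
```

If the protected pages still come back as the login page when the same WebClient is reused, the site may depend on JavaScript (disabled in the main function above) or on a hidden form token, which this sketch does not cover.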

HtmlUnit can't retrieve page after downloading a file

I'm having this weird problem with HtmlUnit in Java. I am using it to download some data from a website, the process is something like this:
1 - Login
2 - For each element (cars)
----- 3 Search for car
----- 4 Download zip file from a link
The code:
Creation of the webclient:
webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
webClient.setJavaScriptEnabled(true);
webClient.setThrowExceptionOnScriptError(false);
DefaultCredentialsProvider provider = new DefaultCredentialsProvider();
provider.addCredentials(USERNAME, PASSWORD);
webClient.setCredentialsProvider(provider);
webClient.setRefreshHandler(new ImmediateRefreshHandler());
Login:
public void login() throws IOException
{
page = (HtmlPage) webClient.getPage(URL);
HtmlForm form = page.getFormByName("formLogin");
String user = USERNAME;
String password = PASSWORD;
// Enter login and password
form.getInputByName("LoginSteps$UserName").setValueAttribute(user);
form.getInputByName("LoginSteps$Password").setValueAttribute(password);
// Click Login Button
page = (HtmlPage) form.getInputByName("LoginSteps$LoginButton").click();
webClient.waitForBackgroundJavaScript(3000);
// Click on Campa area
HtmlAnchor link = (HtmlAnchor) page.getElementById("ctl00_linkCampaNoiH");
page = (HtmlPage) link.click();
webClient.waitForBackgroundJavaScript(3000);
System.out.println(page.asText());
}
Search for car in website:
private void searchCar(String _regNumber) throws IOException
{
// Open search window
page = page.getElementById("search_gridCampaNoi").click();
webClient.waitForBackgroundJavaScript(3000);
// Write plate number
HtmlInput element = (HtmlInput) page.getElementById("jqg1");
element.setValueAttribute(_regNumber);
webClient.waitForBackgroundJavaScript(3000);
// Click on search
HtmlAnchor anchor = (HtmlAnchor) page.getByXPath("//*[@id=\"fbox_gridCampaNoi_search\"]").get(0);
page = anchor.click();
webClient.waitForBackgroundJavaScript(3000);
System.out.println(page.asText());
}
Download pdf:
try
{
InputStream is = _link.click().getWebResponse().getContentAsStream();
File path = new File(new File(DOWNLOAD_PATH), _regNumber);
if (!path.exists())
{
path.mkdir();
}
writeToFile(is, new File(path, _regNumber + "_pdfs.zip"));
}
catch (Exception e)
{
e.printStackTrace();
}
}
The problem:
The first car works okay, pdf is downloaded, but as soon as I search for a new car, when I get to this line:
page = page.getElementById("search_gridCampaNoi").click();
I get this exception:
Exception in thread "main" java.lang.ClassCastException: com.gargoylesoftware.htmlunit.UnexpectedPage cannot be cast to com.gargoylesoftware.htmlunit.html.HtmlPage
After debugging, I've realized that the moment I make this call:
InputStream is = _link.click().getWebResponse().getContentAsStream();
the type returned by page.getElementById("search_gridCampaNoi").click() changes from HtmlPage to UnexpectedPage (the type named in the exception above), so instead of receiving a new page, I receive the file that I already downloaded again.
Debugger screenshots (not included here) showed this: on the first call the return type is OK, and on the second call the return type has changed and I no longer receive an HtmlPage.
Thanks in advance!
Just in case someone encounters the same problem, I found a workaround. Changing the line:
InputStream is = _link.click().getWebResponse().getContentAsStream();
to
InputStream is = _link.openLinkInNewWindow().getWebResponse().getContentAsStream();
seems to do the trick. I still have problems over several iterations (sometimes it works, sometimes it doesn't), but at least I have something now.
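The writeToFile helper called in the download step is not shown in the question. A minimal sketch of what such a helper might look like, using only the JDK, follows. The class name DownloadWriter and the sample file names are hypothetical, and the stream is assumed to be the one returned by getContentAsStream().

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class DownloadWriter {
    /**
     * Copies the stream into target, creating missing parent directories,
     * closes the stream, and returns the number of bytes written.
     */
    public static long writeToFile(InputStream in, File target) throws IOException {
        Path path = target.toPath();
        Path parent = path.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent); // no error if the directory already exists
        }
        try (InputStream source = in) {
            return Files.copy(source, path, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the download stream; the directory layout mirrors the question.
        File dir = Files.createTempDirectory("downloads").toFile();
        File target = new File(new File(dir, "AB123CD"), "AB123CD_pdfs.zip");
        InputStream fake = new ByteArrayInputStream("demo".getBytes(StandardCharsets.UTF_8));
        System.out.println(writeToFile(fake, target) + " bytes written to " + target);
    }
}
```

Because the helper creates the parent directories itself, the explicit path.mkdir() check in the download code would no longer be needed with it.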

Password hacking

I have two files, a list of usernames and a list of passwords. I need to write a program that checks each username against the list of passwords. Then I need to go to a website and see whether it logs in. I am not sure how to go about the comparing, or how to make the program log in to the website and enter the information. Could you please help me out with this? It's a homework problem.
Regardless of the language you choose to implement this in, the basic idea is to simulate log-ins programmatically. This can be done by logging in manually and looking at the HTTP headers, then sending "forged" headers programmatically, changing the user/password fields.
Most log-ins will use POST, and making a POST is not entirely straightforward. If you are allowed to use external libraries, you can try cURL. Simply set the appropriate headers and look at the response to check whether your attempt was successful. If not, try again with a new combination.
In pseudo code:
bool simulate_login(user, password):
    request = new request(url)
    request.set_method('POST')
    request.set_header('name', user)
    request.set_header('pass', password)
    response = request.fetch_response()
    return response.contains("Login successful")

success = []
foreach user:
    foreach password:
        if (simulate_login(user, password)):
            success.append((user, password))
            break
If you would like to use Java, you can try HtmlUnit (see: http://htmlunit.sourceforge.net/gettingStarted.html), or if you are allowed Groovy, you can go with http://www.gebish.org/
Here is the example from getting started guide that is relevant to your case:
public void login() throws Exception {
final WebClient webClient = new WebClient();
// Get the first page
final HtmlPage page1 = webClient.getPage("http://some_url");
// Get the form that we are dealing with and within that form,
// find the submit button and the field that we want to change.
final HtmlForm form = page1.getFormByName("myform");
final HtmlSubmitInput button = form.getInputByName("submitbutton");
final HtmlTextInput textField = form.getInputByName("userid");
// Change the value of the text field
textField.setValueAttribute("username");
// Do similar for password and that's all
// Now submit the form by clicking the button and get back the second page.
final HtmlPage page2 = button.click();
webClient.closeAllWindows();
}