HtmlUnit unable to find element - java

I am using HtmlUnit to scrape a webpage, but it cannot get the elements in the main content. I suspect this is because the page is rendered with Vue.js.
This is the page I am scraping; I want to get the contents inside <div id="app">.
This is the output when I print the page using page.asXml(). The <div id="app"> is empty.
This is the WebClient code I am using. I have enabled JavaScript.
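import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;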
WebClient webClient = new WebClient();
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.setJavaScriptErrorListener(new SilenceJavaScriptErrorListner());
webClient.setCssErrorHandler(new SilentCssErrorHandler());
This is the code inside a function where I wait for a certain element inside <div id="app"> to exist before returning. I also call waitForBackgroundJavaScript().
HtmlPage page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
webClient.waitForBackgroundJavaScript(10000);
for (int i = 0; i < 10; i++) {
page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
webClient.waitForBackgroundJavaScript(10000);
log.info("Current page \n" + page.asXml());
List<Object> quoteNumberOptionList = page.getByXPath("someXPath");
if (quoteNumberOptionList.size() > 0) {
break;
}
Thread.sleep(5000);
}

Since you mentioned in the comments above that you can't share the URL (and it likely isn't publicly accessible anyway), I've done a bit of a write-up here that may help you: Parsing web javascript content to string using android

Related

Click a Html Button using HtmlUnit in Java

I am currently developing a system in which I need to click a website button using Java, so I am using the HtmlUnit library. There is a website called https://tempmail.ninja/ that generates temporary emails. I need my program to click the Generate button on tempmail.ninja so that it generates a temporary email. The problem is that the click has no effect: no email is generated.
Here is the code I have tried:
try
{
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setCssEnabled(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getCookieManager().setCookiesEnabled(true);
HtmlPage page = webClient.getPage("https://tempmail.ninja");
// Look up the button by its id. I also tried HtmlAnchor, HtmlSubmitInput, etc.,
// but none of them worked.
HtmlButton htmlButton = page.getHtmlElementById("generaEmailTemporal");
htmlButton.click();
webClient.waitForBackgroundJavaScript(5000 * 2);
// Print the generated email (currently nothing is displayed)
HtmlTextInput htmlTextInput = (HtmlTextInput) page.getElementById("emailtemporal");
System.out.println(htmlTextInput.getText());
webClient.close();
} catch(ElementNotFoundException | FailingHttpStatusCodeException | IOException ex) {
Logger.getLogger(WebTesting.class.getName()).log(Level.SEVERE, null, ex);
}
Here is the HTML code for the button. I get this using Inspect Element.
<p class="text-center" id="btnGeneraEmailTemporal">
<button class="btn btn-labeled btn-primary" id="generaEmailTemporal" type="button">
<span class="btn-label"><i class="fa fa-hand-o-right" aria-hidden="true"></i></span> Generate Temp Mail
</button>
</p>
I am a newbie to HtmlUnit. Can anybody help me? I would really appreciate it.
Thanks for your help.
The .click() call returns the changed page, so you should do something like this:
HtmlButton htmlButton = page.getHtmlElementById("generaEmailTemporal");
HtmlPage pageAfterClick = (HtmlPage)htmlButton.click();
webClient.waitForBackgroundJavaScript(5000 * 2);
System.out.println(pageAfterClick.asXml()); // often displaying the page-source is more useful during development than .asText()
Since waitForBackgroundJavaScript(..) is experimental and does not always work, I prefer polling until an expected text appears.
private static final int AJAX_MAX_TRIES_SECONDS = 30;
private static final int ONE_SEC_IN_MILLISEC = 1000;
/** Waits until the given 'text' appears, or throws a
* RuntimeException if the 'text' does not appear before the timeout.
* @param page the page to poll
* @param text The text which indicates that ajax has finished updating the page
* @param waitingLogMessage Text for the log-output. Should indicate where in the code we are, and what we are waiting for
* @throws RuntimeException if the text has not appeared after AJAX_MAX_TRIES_SECONDS seconds
*/
public static void waitForAjaxCallWaitUntilTextAppears(//
@Nonnull final HtmlPage page, //
@Nonnull final String text, //
@Nonnull final String waitingLogMessage) {
LOGGER.debug("_5fd3fc9247_ waiting for ajax call to complete ... [" + waitingLogMessage + "]");
final StringBuilder waitingdots = new StringBuilder(" ");
for (int i = 0; i < AJAX_MAX_TRIES_SECONDS; i++) {
if (page.asText().contains(text)) {
waitingdots.append(" ajax has finished ['").append(text).append("' appeared]");
LOGGER.debug("_8cd5a34faf_ " + waitingdots);
return;
}
waitingdots.append('.');
final long startTime = System.currentTimeMillis();
while (System.currentTimeMillis() - startTime < ONE_SEC_IN_MILLISEC) {
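try {
// wait() must be called while holding the monitor of the object waited on
synchronized (page) {
page.wait(ONE_SEC_IN_MILLISEC);
}
}
catch (final InterruptedException e) {
// ignore
}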
}
}
LOGGER.debug("_de5091bc9e_ "
+ waitingdots.append(" ajax timeout ['").append(text).append("' appeared NOT]").toString());
LOGGER.debug("_f1030addf1_ page source:\n" + page.asXml());
throw new RuntimeException("_ec3df4f228_");
}
I solved this problem by using Selenium WebDriver: I used Selenium to click the button and generate the email, then passed the result to HtmlUnit.

How to call post method after setting the value of the form for screen scraping using java

Background: I have a webpage (.aspx) which has a few dropdown lists. Each list is populated by an Ajax call based on the selection in the previous dropdown. After selecting a value in every dropdown, we can click the download button and the data will be downloaded. Based on the downloaded data, we need to perform some other operations.
What I have already done: I am able to set the dropdown data by triggering the Ajax calls correctly, but sending the POST request is a problem. Here is the code snippet/pseudocode.
Feel free to use any tool along with Java.
public static void main(String[] args) throws FailingHttpStatusCodeException, IOException {
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17);
WebRequest request = new WebRequest(new URL(DataDownloader.MY_URL),HttpMethod.POST);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.setJavaScriptTimeout(10000);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setTimeout(10000);
HtmlPage page = webClient.getPage(request);
HtmlSelect firstDd = (HtmlSelect) page.getElementById("dd1_id");
List<HtmlOption> firstOption = firstDd.getOptions();
firstDd.setSelectedAttribute(firstOption.get(2), true);
webClient.waitForBackgroundJavaScript(3000);
HtmlPage pgAfterFirstDd = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
HtmlSelect secondDd = (HtmlSelect) pgAfterFirstDd.getElementById("dd2_id");
List<HtmlOption> secondOption = secondDd.getOptions();
secondDd.setSelectedAttribute(secondOption.get(2), true);
webClient.waitForBackgroundJavaScript(10000);
//set the value for all other dropdowns
HtmlPage finalpage = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
HtmlForm form = finalpage.getHtmlElementById("aspnetForm");
webClient.waitForBackgroundJavaScript(10000);
request.setRequestBody("REQUESTBODY");
Page redirectPage = webClient.getPage(request);
// HtmlSubmitInput submitInput=form.getInputByName("btnSubmit");
// submitInput.click();
/*HtmlButton submitButton = (HtmlButton) pageAfterWard.createElement("btnSubmit");
submitButton.setAttribute("type", "submit");
form.appendChild(submitButton);
HtmlPage nextPage = (HtmlPage) submitButton.click();*/
}
Why do you hide your error details? Is there any secret? If you would like helpful answers, you have to provide as much information as possible.
So I will take a wild guess...
submitInput.click();
will return a PDF. In this case you have to do something like
Page pdfPage = submitInput.click();
WebResponse resp = pdfPage.getWebResponse();
if("application/pdf".equals(resp.getContentType())) {
.... process the bytes
.... resp.getContentAsStream()
}
HtmlUnit has four kinds of pages: HtmlPage, XmlPage, TextPage, and UnexpectedPage. Binary content such as PDF or office documents is handled as an UnexpectedPage. Processing this content is up to you.
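To make the "process the bytes" step more concrete, here is a minimal sketch that uses only the JDK. It assumes you already hold the InputStream returned by resp.getContentAsStream(); the class DownloadSaver and the method savePdf are made-up names for this illustration and are not part of HtmlUnit. The sketch copies the stream to a file and then checks whether the saved content starts with the usual PDF signature "%PDF-".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;

// Hypothetical helper (not part of HtmlUnit): stores a binary response body on disk.
public class DownloadSaver {

    // A PDF file normally begins with these bytes.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Copies the stream to target and reports whether the saved content looks like a PDF. */
    public static boolean savePdf(InputStream in, Path target) throws IOException {
        try (InputStream body = in) {
            Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        }
        byte[] head = new byte[PDF_MAGIC.length];
        try (InputStream saved = Files.newInputStream(target)) {
            int read = saved.readNBytes(head, 0, head.length);
            return read == head.length && Arrays.equals(head, PDF_MAGIC);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for resp.getContentAsStream(); a real run would pass the response stream.
        InputStream fake = new ByteArrayInputStream("%PDF-1.4 demo".getBytes(StandardCharsets.US_ASCII));
        Path target = Files.createTempFile("download", ".pdf");
        System.out.println("Looks like a PDF: " + savePdf(fake, target));
        Files.delete(target);
    }
}
```

Checking the signature in addition to the content type is a design choice for this sketch: some servers send binary downloads with a generic content type, so the header bytes can be a useful second check.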
As you mentioned in the comment under RBRi's answer, you were getting a typecast error. Can you please mention:
what exact error you were getting
what type of file/response you were expecting.
The code looks good to me and should work.
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17);
It looks like you are using an old version; please use the latest one.
WebRequest request = new WebRequest(new URL(DataDownloader.MY_URL),HttpMethod.POST);
With HtmlUnit you usually do not work with requests. The idea is to work more 'browser like'. Use something like getPage(final URL url).
List<HtmlOption> firstOption = firstDd.getOptions();
firstDd.setSelectedAttribute(firstOption.get(2), true);
Work in a more 'browser-like' way:
firstOption.get(2).setSelected(true);
This will do all the background work for you, such as deselecting the other options and processing events.
Regarding submitting the form your idea of
HtmlSubmitInput submitInput=form.getInputByName("btnSubmit");
HtmlPage nextPage = submitInput.click();
looks correct. Maybe you have to wait after that as well.
If you still have problems you have to provide the URL you are working with to enable us to reproduce/debug your case.

HTMLUnit Wait For JS Issue

I'm using HtmlUnit to put text in an input box and then click on a link which is actually a JS call.
The problem comes up when I put text in the input using inputBox.setValueAttribute("example input");. In this case, after clicking the button the page does not change at all.
On the other hand, once I delete inputBox.setValueAttribute("example input"); and then click the button, the page content does change and includes an error for empty input.
Below is the code I've used to put text in the relevant input and then click the button.
public void addressToBlockPlot(){
WebClient client = new WebClient(BrowserVersion.FIREFOX_24);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnScriptError(false);
client.setJavaScriptTimeout(10000);
client.getOptions().setJavaScriptEnabled(true);
client.setAjaxController(new NicelyResynchronizingAjaxController());
client.getOptions().setTimeout(10000);
try {
HtmlPage page = client.getPage("http://mapi.gov.il/Pages/LotAddressLocator.aspx");
HtmlInput inputBox = (HtmlInput)page.getHtmlElementById("AddressInput");
final HtmlAnchor a = (HtmlAnchor) page.getElementById("helkaSubmit");
inputBox.setValueAttribute("example input");
a.click();
client.waitForBackgroundJavaScript(2000);
HtmlPage page2= (HtmlPage) client.getCurrentWindow().getEnclosedPage();
System.out.println(page2.asXml());
} catch (Exception e) {
}
}
Any ideas for solving this issue?
EDIT:
I've tried using inputBox.setValueAttribute("");, which resulted in the same error that I got when no input text was set at all.
I had a very similar issue, where the JS takes some time to load. I solved it by waiting and retrying, with a condition that tells me when the page has loaded successfully:
int input_length = page.getByXPath("//input").size();
int tries = 5;
// retry until the expected number of inputs is present or we run out of tries
while (tries > 0 && input_length < 12) {
tries--;
synchronized (page) {
page.wait(2000);
}
input_length = page.getByXPath("//input").size();
}
if (input_length >= 12) {
System.out.println("page loaded successfully");
}
Here input_length determines whether the required page has loaded. You can use a similar condition according to the structure of your page.
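The same wait-and-retry idea can be pulled out into a small reusable helper. The following is only a sketch using the plain JDK; the class Poller and the method waitUntil are hypothetical names, not HtmlUnit API. It re-evaluates a condition at a fixed interval until the condition holds or a timeout expires.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of HtmlUnit: retries a check until it passes or time runs out.
public class Poller {

    /**
     * Evaluates condition every pauseMillis until it is true or timeoutMillis has elapsed.
     * Returns true if the condition was met, false on timeout.
     */
    public static boolean waitUntil(BooleanSupplier condition, long timeoutMillis, long pauseMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(pauseMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Stand-in condition: becomes true after roughly 300 ms.
        boolean ready = waitUntil(() -> System.currentTimeMillis() - start >= 300, 2000, 50);
        System.out.println("condition met: " + ready);
    }
}
```

With HtmlUnit, the condition from the answer above could be passed as () -> page.getByXPath("//input").size() >= 12. Returning a boolean instead of throwing leaves it to the caller to decide whether a timeout is an error.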

HtmlUnit form refresh with Ajax does not work

I am trying to fill in and submit an HTML form using HtmlUnit. One select element and its options are loaded using <body onLoad="...">.
My problem: I cannot retrieve this select element via getSelectByName, getChildElements etc. (ElementNotFoundException is thrown), although I can see that the data has been loaded when looking at the org.apache.http.wire log.
When printing page.asXml(), I see only the unaltered HTML document.
My code:
public static void main(final String[] args) throws Exception {
final URL url = new URL("http://www.rce-event.de/modules/meldung/annahme.php?oid=471&pid=1&ac=d98482bbf174f62eaaa4664c&tkey=468&portal=www.dachau.de&ortsbox=1&callpopup=1");
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6); // tried also FIREFOX_3
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage(url);
webClient.waitForBackgroundJavaScript(10000); // tried also Thread.sleep()
// tried also to use webClient.getCurrentWindow().getEnclosedPage() instead of 'page'
final HtmlForm form = page.getFormByName("formular");
// ElementNotFoundException thrown here:
final HtmlSelect select = form.getSelectByName("event.theme");
final HtmlOption option = select.getOptionByText("Sport/Freizeit");
final Page newPage = select.setSelectedAttribute(option, false);
// submit etc.
}
Stacktrace:
Exception in thread "main" com.gargoylesoftware.htmlunit.ElementNotFoundException: elementName=[select] attributeName=[name] attributeValue=[event.theme]
at com.gargoylesoftware.htmlunit.html.HtmlForm.getSelectByName(HtmlForm.java:449)
at Xyzzy.main(Xyzzy.java:58)
I tried everything written here, here, and here (and even more), without any success.
Update:
I simplified my code and started a bounty.
Your problem is that the select named "event.theme" only gets loaded once the select named "event.datapool" has the value "1" selected.
So you need to change the "event.datapool" select value to "1":
[........]
final HtmlSelect selectBase = form.getSelectByName("event.datapool");
final HtmlOption optionBase = selectBase.getOptionByText("Freizeit / Tourismus");
final Page newPage = selectBase.setSelectedAttribute(optionBase, true);
[........]
But you may still have problems, because the HTML for the "event.theme" select is loaded via Ajax. So I do not think the Java HtmlSelect class will load the "event.theme" select into the form the way JavaScript does during an actual user interaction.
A solution to that would be to:
1. Load your page "http://www.rce-event.de/modules/meldung/annahme.php?oid=471&pid=1&ac=d98482bbf174f62eaaa4664c&tkey=468&portal=www.dachau.de&ortsbox=1&callpopup=1"
2. Load the page "http://www.rce-event.de/modules/meldung/js/xmlhttp_querys.php?get_kat=1&time=1338409551228&id=1&block=kat" > which will return the "event.theme" select data/values
3. Then use the data loaded in step 2 to update the page loaded in step 1 by inserting a "select list with id and name set to <event.theme>" in the HTML element "kat_content"
Then your form/loaded webpage should have the new select named "event.theme", and therefore the following code shouldn't produce errors anymore.
final HtmlSelect select = form.getSelectByName("event.theme");
final HtmlOption option = select.getOptionByText("Sport/Freizeit");
final Page newPage = select.setSelectedAttribute(option, false);

HtmlUnit, trying to get 2 forms, get exception at second form

I am currently using HtmlUnit.
Getting the first login page is no problem: I log in successfully, get the next page, and "click" the link to get the "MyDetails" page.
After getting the "MyDetails" page, I want to get its form the same way I get the first login form.
I need the form because I want to change the password, and the fields are in a form.
When I try to get the second form, I get the following exception:
com.gargoylesoftware.htmlunit.ElementNotFoundException: elementName=[form] attributeName=[name] attributeValue=[form2]
The exception is thrown at this line of code:
HtmlForm form2 = page3.getFormByName("form2");
Note: the first form name is "form1" & second form name is "form2".
Is this a problem with HtmlUnit?
Code:
try {
WebClient webclient = new WebClient(BrowserVersion.FIREFOX_3_6);
HtmlPage page1 = webclient.getPage("http://www.highveld.mobi/pages/clubvip/login.aspx");
HtmlForm form = page1.getFormByName("form1");
HtmlSubmitInput buttonLogin = form.getInputByName("cmdLogin");
HtmlTextInput cellLogin = form.getInputByName("txtCellNumber");
HtmlPasswordInput passLogin = form.getInputByName("txtLoginPassword");
cellLogin.setValueAttribute(change);
passLogin.setValueAttribute(oldPass);
HtmlPage page2 = buttonLogin.click();
HtmlAnchor link = page2.getAnchorByHref("updateprofile.aspx");
HtmlPage page3 = link.click();
System.out.println(page3.getUrl());
HtmlForm form2 = page3.getFormByName("form2");
HtmlPasswordInput pass = form2.getInputByName("txtPassword");
HtmlPasswordInput passConfirm = form2.getInputByName("txtConfirmPassword");
HtmlSubmitInput button = form2.getInputByName("cmdUpdate");
pass.setValueAttribute(newPass);
passConfirm.setValueAttribute(newPass);
HtmlPage page4 = button.click();
}
First of all, please update to HtmlUnit 2.9 in case you are using an older version.
Secondly, replace this:
System.out.println(page3.getUrl());
HtmlForm form2 = page3.getFormByName("form2");
With this:
System.out.println(page3.getUrl());
System.out.println(page3.asXml());
HtmlForm form2 = page3.getFormByName("form2");
And check whether the form2 element exists; I'm pretty sure it is not there, since an ElementNotFoundException is being thrown.
I usually use XPath instead of getFormByName; you can give it a try too.
Hope this helps!
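To illustrate the XPath suggestion, here is a small sketch that runs an XPath expression with the JDK's own javax.xml.xpath classes against a made-up XML snippet. The class XPathDemo, the method countMatches, and the sample markup are assumptions for illustration only; they stand in for the real page and are not HtmlUnit API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Illustration only: counts the nodes an XPath expression matches in a small XML snippet.
public class XPathDemo {

    public static int countMatches(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Made-up markup standing in for the output of page3.asXml().
        String xml = "<html><body><form name=\"form1\"><input name=\"txtPassword\"/></form></body></html>";
        System.out.println("form2 matches: " + countMatches(xml, "//form[@name='form2']"));
        System.out.println("form1 matches: " + countMatches(xml, "//form[@name='form1']"));
    }
}
```

In HtmlUnit itself, the same expression could be passed to page3.getByXPath("//form[@name='form2']"), which returns a list that is empty when nothing matches, so the absence of the form can be checked without catching an exception.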
