If you go to
https://taxtest.navajocountyaz.gov/Pages/WebForm1.aspx?p=1&apn=205-27-014
view page source and search for grdCPhist in the page source, you won't find it.
But, if you click on Taxes, then click on the CP in the 7th column of the 5th row, THEN view page source, you will find a grdCPhist. There is a table with id="grdCPhist".
I want to access that table from Java code using HtmlUnit.
For that, I developed the program below:
import com.gargoylesoftware.htmlunit.*;
import com.gargoylesoftware.htmlunit.html.*;
import com.gargoylesoftware.htmlunit.javascript.*;
import java.io.*;
public class ClickOnCell {
public static void ClickOnCell () {
try (final WebClient webClient = new WebClient()) {
System.getProperties().put("org.apache.commons.logging.simplelog.defaultlog", "fatal");
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setCssEnabled(false);
webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());
webClient.setCssErrorHandler(new SilentCssErrorHandler());
HtmlPage page = webClient.getPage("http://taxtest.navajocountyaz.gov/Pages/WebForm1.aspx?p=1&apn=205-27-014");
webClient.waitForBackgroundJavaScriptStartingBefore(10000);
page = (HtmlPage) page.getEnclosingWindow().getEnclosedPage();
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());
HtmlTable grdTaxHistory = (HtmlTable) page.getElementById("grdTaxHistory");
HtmlTableDataCell cpCell = (HtmlTableDataCell) grdTaxHistory.getCellAt(4,6);
System.out.println("cpCell.getTextContent() = " + cpCell.getTextContent());
cpCell.click();
webClient.waitForBackgroundJavaScriptStartingBefore(1000000000);
page = (HtmlPage) page.getEnclosingWindow().getEnclosedPage();
HtmlTable grdCPHistory = (HtmlTable) page.getElementById("grdCPhist");
System.out.println("grdCPHistory = " + grdCPHistory);
}
catch (Exception e) {
System.out.println("Error: "+ e);
}
}
public static void main(String[] args) {
File file = new File("validParcelIDs.txt");
ClickOnCell();
}
}
I compiled and ran the program using the following two commands:
javac -classpath ".:/opt/htmlunit_2.69.0/*" ClickOnCell.java
java -classpath ".:/opt/htmlunit_2.69.0/*" ClickOnCell
The program compiled fine. No errors or warnings. However, when I ran the program, I got the following output to the screen:
WARNING: Obsolete content type encountered: 'text/javascript'.
Jan 13, 2023 5:51:29 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jan 13, 2023 5:51:29 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jan 13, 2023 5:51:30 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jan 13, 2023 5:51:30 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Jan 13, 2023 5:51:30 PM com.gargoylesoftware.htmlunit.IncorrectnessListenerImpl notify
WARNING: Obsolete content type encountered: 'text/javascript'.
cpCell.getTextContent() =
CP
grdCPHistory = null
What I am unhappy about in the above is the grdCPHistory table being equal to null. That tells me that HtmlUnit was unable to find the table with id="grdCPhist", as if I hadn't put cpCell.click(); in the code.
How do I change the above code to be able to access, from my Java program, the table with id="grdCPhist"?
Since StackOverflow doesn't allow me to thank you after you suggest to me what to try, thanks in advance.
Clicking on the cell has no effect because the action is triggered by the anchor enclosed in the cell, not by the cell itself.
<td align="right">
<a onclick="$('#');" id="grdTaxHistory_lnkViewPayments_4" href="javascript:__doPostBack('grdTaxHistory$ctl06$lnkViewPayments','')">$2948.15</a>
</td>
You have to select the anchor and click on it.
String url = "http://taxtest.navajocountyaz.gov/Pages/WebForm1.aspx?p=1&apn=205-27-014";
try (final WebClient webClient = new WebClient(BrowserVersion.FIREFOX)) {
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setCssEnabled(false);
// don't disable logging if you are hunting for problems
webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());
HtmlPage page = webClient.getPage(url);
webClient.waitForBackgroundJavaScriptStartingBefore(1_000);
page = (HtmlPage) page.getEnclosingWindow().getEnclosedPage();
// System.out.println("-----------------------------------------------------");
// System.out.println(page.asNormalizedText());
// System.out.println("-----------------------------------------------------");
// no need to click - the table is already there, only not visible
// page.getAnchorByText("Taxes").click();
// webClient.waitForBackgroundJavaScriptStartingBefore(1_000);
// page = (HtmlPage) page.getEnclosingWindow().getEnclosedPage();
// System.out.println("-----------------------------------------------------");
// System.out.println(page.asNormalizedText());
// System.out.println("-----------------------------------------------------");
HtmlTable grdTaxHistory = (HtmlTable) page.getElementById("grdTaxHistory");
HtmlTableDataCell cpCell = (HtmlTableDataCell) grdTaxHistory.getCellAt(4,6);
// System.out.println("cpCell.getTextContent() = " + cpCell.getTextContent().trim());
// the action is triggered by the anchor inside the cell
// todo this is a hack - make finding the enclosing anchor more robust
((HtmlAnchor) cpCell.getFirstChild().getNextSibling()).click();
webClient.waitForBackgroundJavaScriptStartingBefore(1_000);
page = (HtmlPage) page.getEnclosingWindow().getEnclosedPage();
HtmlTable grdCPHistory = (HtmlTable) page.getElementById("grdCPhist");
System.out.println("-----------------------------------------------------");
System.out.println(grdCPHistory.asNormalizedText());
System.out.println("-----------------------------------------------------");
}
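The getFirstChild().getNextSibling() chain above is marked as a hack, most likely because it depends on the first child of the cell being a whitespace text node. A minimal sketch of the idea behind a more robust lookup follows, using only the JDK's XML and XPath classes rather than HtmlUnit so that it runs without extra jars. The class name AnchorInCell and the simplified markup (href replaced by "#") are assumptions for illustration. In HtmlUnit itself, the analogous calls should be cpCell.getFirstByXPath(".//a") or page.getHtmlElementById("grdTaxHistory_lnkViewPayments_4"), the latter using the id from the markup quoted in the answer.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class AnchorInCell {
    // Same shape as the cell markup quoted in the answer; the href is
    // simplified to "#" (an assumption, to keep the sample short).
    static final String CELL =
        "<td align=\"right\">\n"
        + "  <a id=\"grdTaxHistory_lnkViewPayments_4\" href=\"#\">$2948.15</a>\n"
        + "</td>";

    // Parses the sample cell and returns its <td> element.
    public static Element parseCell() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(CELL)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Finds the first <a> below the cell, regardless of whitespace nodes.
    public static Element findAnchor(Element cell) {
        try {
            return (Element) XPathFactory.newInstance().newXPath()
                .evaluate(".//a", cell, XPathConstants.NODE);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element cell = parseCell();
        // The first child is the line break and indentation, not the anchor.
        System.out.println(cell.getFirstChild().getNodeType() == Node.TEXT_NODE);
        System.out.println(findAnchor(cell).getAttribute("id"));
    }
}
```

Selecting by XPath or by id keeps working if the server changes the whitespace around the anchor, which the child/sibling chain would not.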
Related
I am using HtmlUnit to scrape a webpage, but it seems unable to get the elements in the main content. I suspect this is because the page is rendered using Vue.js.
This is the page I am scraping; I want to get the contents inside <div id="app">.
This is the output when I print the page using page.asXml(). The <div id="app"> is empty.
This is the WebClient code I am using. I have enabled JavaScript.
import com.gargoylesoftware.htmlunit.NicelyResynchronizingAjaxController;
import com.gargoylesoftware.htmlunit.SilentCssErrorHandler;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.javascript.SilentJavaScriptErrorListener;
WebClient webClient = new WebClient();
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.setJavaScriptErrorListener(new SilentJavaScriptErrorListener());
webClient.setCssErrorHandler(new SilentCssErrorHandler());
This is the code inside a function where I wait for a certain element inside <div id="app"> to exist before returning. I have also tried the waitForBackgroundJavaScript() method.
HtmlPage page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
webClient.waitForBackgroundJavaScript(10000);
for (int i = 0; i < 10; i++) {
page = (HtmlPage) webClient.getCurrentWindow().getEnclosedPage();
webClient.waitForBackgroundJavaScript(10000);
log.info("Current page \n" + page.asXml());
List<Object> quoteNumberOptionList = page.getByXPath("someXPath");
if (quoteNumberOptionList.size() > 0) {
break;
}
Thread.sleep(5000);
}
Since you mentioned in the comments above that you can't share the URL (and it likely isn't publicly accessible anyway), I've done a bit of a write-up that may help you: "Parsing web javascript content to string using android".
I am currently developing a system in which I need to click a button on a website from Java, so I am using the HtmlUnit library. There is a website, https://tempmail.ninja/, that generates temporary emails. I need my program to click the Generate button on tempmail.ninja so that it generates a temporary email. The problem is that the click has no effect: no email is generated.
Here is what I have tried:
try
{
WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.getOptions().setCssEnabled(true);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getCookieManager().setCookiesEnabled(true);
HtmlPage page = webClient.getPage("https://tempmail.ninja");
// Look the button up by its id. I also tried HtmlAnchor, HtmlSubmitInput, etc.,
// but none of those worked.
HtmlButton htmlButton = page.getHtmlElementById("generaEmailTemporal");
htmlButton.click();
webClient.waitForBackgroundJavaScript(5000 * 2);
// Print the generated email; currently nothing is displayed
HtmlTextInput htmlTextInput = (HtmlTextInput) page.getElementById("emailtemporal");
System.out.println(htmlTextInput.getText());
webClient.close();
} catch(ElementNotFoundException | FailingHttpStatusCodeException | IOException ex) {
Logger.getLogger(WebTesting.class.getName()).log(Level.SEVERE, null, ex);
}
Here is the HTML code for the button. I get this using Inspect Element.
<p class="text-center" id="btnGeneraEmailTemporal">
<button class="btn btn-labeled btn-primary" id="generaEmailTemporal" type="button">
<span class="btn-label"><i class="fa fa-hand-o-right" aria-hidden="true"></i></span> Generate Temp Mail
</button>
</p>
I am a newbie to HtmlUnit. Can anybody help me? I would really appreciate it.
Thanks for your help.
The .click() method returns the changed page, so you should do something like this:
HtmlButton htmlButton = page.getHtmlElementById("generaEmailTemporal");
HtmlPage pageAfterClick = (HtmlPage)htmlButton.click();
webClient.waitForBackgroundJavaScript(5000 * 2);
System.out.println(pageAfterClick.asXml()); // often displaying the page-source is more useful during development than .asText()
Since waitForBackgroundJavaScript(..) is experimental and does not always work, I prefer polling until an expected text appears.
private static final int AJAX_MAX_TRIES_SECONDS = 30;
private static final int ONE_SEC_IN_MILLISEC = 1000;

/**
 * Waits until the given 'text' appears on the page, or throws a
 * RuntimeException if the 'text' does not appear before we time out.
 *
 * @param page the page to poll
 * @param text the text which indicates that ajax has finished updating the page
 * @param waitingLogMessage text for the log output; should indicate where in the code we are and what we are waiting for
 * @throws RuntimeException if the text has not appeared after AJAX_MAX_TRIES_SECONDS seconds
 */
public static void waitForAjaxCallWaitUntilTextAppears(
        @Nonnull final HtmlPage page,
        @Nonnull final String text,
        @Nonnull final String waitingLogMessage) {
    LOGGER.debug("_5fd3fc9247_ waiting for ajax call to complete ... [" + waitingLogMessage + "]");
    final StringBuilder waitingdots = new StringBuilder(" ");
    for (int i = 0; i < AJAX_MAX_TRIES_SECONDS; i++) {
        if (page.asText().contains(text)) {
            waitingdots.append(" ajax has finished ['").append(text).append("' appeared]");
            LOGGER.debug("_8cd5a34faf_ " + waitingdots);
            return;
        }
        waitingdots.append('.');
        // wait one second before checking the page again
        try {
            Thread.sleep(ONE_SEC_IN_MILLISEC);
        }
        catch (final InterruptedException e) {
            // restore the interrupt flag and stop waiting
            Thread.currentThread().interrupt();
            break;
        }
    }
    LOGGER.debug("_de5091bc9e_ "
        + waitingdots.append(" ajax timeout ['").append(text).append("' appeared NOT]").toString());
    LOGGER.debug("_f1030addf1_ page source:\n" + page.asXml());
    throw new RuntimeException("_ec3df4f228_");
}
I have solved this problem by using Selenium WebDriver. I used Selenium to click the button and generate the email, and then passed the result to HtmlUnit.
I am writing Selenium code to check the contents of a td element in the row that matches a specific date.
Here is my code:
public static void main(String[] args) {
System.setProperty("webdriver.chrome.driver",
"C://chromedriver.exe");
WebDriver driver = new ChromeDriver();
String baseUrl = "http://espn.go.com/nba";
String endPoint = "/team/schedule/_/name/chi/year/2015/chicago-bulls";
// launch browser and direct it to the Base URL
driver.get(baseUrl + endPoint);
WebElement table_element = driver.findElement(By.cssSelector(".mod-container.mod-table.mod-no-header-footer"));
List<WebElement> tr_collection = table_element.findElements(By.xpath("div/table/tbody/tr"));
for (WebElement trElement : tr_collection) {
List<WebElement> td_collection = trElement.findElements(By.xpath("td"));
if (td_collection.get(0).getText().equals("Wed, Oct 29")) {
WebElement tdElement = td_collection.get(2);
System.out.println(tdElement.getText().split("\\n")[1]);
break;
}
}
// close browser
driver.close();
}
Here I am getting all the rows of table and then checking if the first column has value "Wed, Oct 29", if it matches then fetching the 3rd column value.
Is there a way to directly look for a td element whose value matches "Wed, Oct 29"? or a simplified logic for this?
To get the 3rd column, which I suppose is the result column, you can use this XPath:
.//td[contains(text(),'Wed, Oct 29')]/following-sibling::td[2]
The predicate selects the td by its text, and following-sibling::td[2] then takes its second following sibling, i.e. the 3rd column.
With this XPath you obtain the WebElement, and after that you can call getText(). This should return the text of the node and of all its children.
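To see what the expression selects, here is a small sketch that evaluates the same XPath with the JDK's built-in XPath engine against a made-up two-row table. The class name SiblingCellXPath and the placeholder cell values are assumptions for illustration, not data from the ESPN page. With Selenium, the same expression would be passed to By.xpath(...) instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SiblingCellXPath {
    // Made-up sample rows; only the date column mirrors the question.
    static final String TABLE =
        "<table><tbody>"
        + "<tr><td>Wed, Oct 29</td><td>opponent-1</td><td>result-1</td></tr>"
        + "<tr><td>Fri, Oct 31</td><td>opponent-2</td><td>result-2</td></tr>"
        + "</tbody></table>";

    // Returns the text of the 3rd column in the row whose 1st column
    // contains the given date, or "" if no row matches.
    // Assumes the date contains no single quote.
    public static String thirdColumnFor(String date) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(TABLE)));
            String expr = ".//td[contains(text(),'" + date
                + "')]/following-sibling::td[2]";
            return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(thirdColumnFor("Wed, Oct 29"));
    }
}
```

Note that following-sibling::td[2] counts from the matched cell, so the index is 2 even though the target is the table's 3rd column.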
I'm using HtmlUnit to put text in an input box and then click on a link, which is actually a JS call.
The problem comes up when I put text in the input using inputBox.setValueAttribute("example input");. In this case, after clicking the button, the page does not change at all.
On the other hand, once I delete inputBox.setValueAttribute("example input"); and then click the button, the page content does change and includes an error for empty input.
Below is the code I've used to put text in the relevant input and then click the button.
public void addressToBlockPlot(){
WebClient client = new WebClient(BrowserVersion.FIREFOX_24);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnScriptError(false);
client.setJavaScriptTimeout(10000);
client.getOptions().setJavaScriptEnabled(true);
client.setAjaxController(new NicelyResynchronizingAjaxController());
client.getOptions().setTimeout(10000);
try {
HtmlPage page = client.getPage("http://mapi.gov.il/Pages/LotAddressLocator.aspx");
HtmlInput inputBox = (HtmlInput)page.getHtmlElementById("AddressInput");
final HtmlAnchor a = (HtmlAnchor) page.getElementById("helkaSubmit");
inputBox.setValueAttribute("example input");
a.click();
client.waitForBackgroundJavaScript(2000);
HtmlPage page2= (HtmlPage) client.getCurrentWindow().getEnclosedPage();
System.out.println(page2.asXml());
} catch (Exception e) {
}
}
Any ideas for solving this issue?
EDIT:
I've tried using inputBox.setValueAttribute("");, which resulted in the same error that I got when no input text was set at all.
I had a similar issue, where the JS takes some time to load. I solved it by waiting and retrying until a condition shows that the page has loaded successfully:
int input_length = page.getByXPath("//input").size();
int tries = 5;
// retry until the expected number of inputs is present or we run out of tries
while (tries > 0 && input_length < 12) {
    tries--;
    synchronized (page) {
        page.wait(2000);
    }
    input_length = page.getByXPath("//input").size();
}
if (input_length >= 12) {
    System.out.println("page loaded successfully");
}
Here input_length determines whether the required page is loaded or not. You can use a similar condition according to the structure of your page.
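The wait-and-retry idea can be factored into a small helper that takes the condition as a parameter. The following is only a sketch using plain JDK classes; the class name PollUntil and the method pollUntil are hypothetical names, not part of HtmlUnit.

```java
import java.util.function.BooleanSupplier;

public class PollUntil {
    // Checks the condition up to maxTries times, sleeping sleepMillis
    // between checks, plus one final check after the last sleep.
    // Returns true as soon as the condition holds, false otherwise.
    public static boolean pollUntil(BooleanSupplier condition,
                                    int maxTries, long sleepMillis) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            if (condition.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                // restore the interrupt flag and give up
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return condition.getAsBoolean();
    }

    public static void main(String[] args) {
        // Stand-in condition: becomes true on the third check.
        int[] checks = {0};
        boolean loaded = pollUntil(() -> ++checks[0] >= 3, 5, 100);
        System.out.println("loaded = " + loaded);
    }
}
```

With HtmlUnit, the condition from the snippet above could be passed as () -> page.getByXPath("//input").size() >= 12, where the threshold 12 is specific to that answer's page.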
I am trying to fill and submit an HTML form using HtmlUnit. One select element and its options are loaded using <body onLoad="...">.
My problem: I cannot retrieve this select element via getSelectByName, getChildElements etc. (ElementNotFoundException is thrown), although I can see that the data has been loaded when looking at the org.apache.http.wire log.
When printing page.asXml(), I see only the unaltered HTML document.
My code:
public static void main(final String[] args) throws Exception {
final URL url = new URL("http://www.rce-event.de/modules/meldung/annahme.php?oid=471&pid=1&ac=d98482bbf174f62eaaa4664c&tkey=468&portal=www.dachau.de&ortsbox=1&callpopup=1");
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6); // tried also FIREFOX_3
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
final HtmlPage page = webClient.getPage(url);
webClient.waitForBackgroundJavaScript(10000); // tried also Thread.sleep()
// tried also to use webClient.getCurrentWindow().getEnclosedPage() instead of 'page'
final HtmlForm form = page.getFormByName("formular");
// ElementNotFoundException thrown here:
final HtmlSelect select = form.getSelectByName("event.theme");
final HtmlOption option = select.getOptionByText("Sport/Freizeit");
final Page newPage = select.setSelectedAttribute(option, false);
// submit etc.
}
Stacktrace:
Exception in thread "main" com.gargoylesoftware.htmlunit.ElementNotFoundException: elementName=[select] attributeName=[name] attributeValue=[event.theme]
at com.gargoylesoftware.htmlunit.html.HtmlForm.getSelectByName(HtmlForm.java:449)
at Xyzzy.main(Xyzzy.java:58)
I tried everything written here, here, and here (and even more), without any success.
Update:
I simplified my code and started a bounty.
Your problem is that the select named "event.theme" only gets loaded once the select named "event.datapool" has the value "1" selected.
So you need to change the "event.datapool" select value to "1":
[........]
final HtmlSelect selectBase = form.getSelectByName("event.datapool");
final HtmlOption optionBase = selectBase.getOptionByText("Freizeit / Tourismus");
final Page newPage = selectBase.setSelectedAttribute(optionBase, true);
[........]
But you may have problems because the HTML for the "event.theme" select is loaded via ajax. So I do not think your Java HtmlSelect class will load the "event.theme" select into the form the way JavaScript does on an actual user interaction.
A solution to that would be to:
1. Load your page "http://www.rce-event.de/modules/meldung/annahme.php?oid=471&pid=1&ac=d98482bbf174f62eaaa4664c&tkey=468&portal=www.dachau.de&ortsbox=1&callpopup=1"
2. Load the page "http://www.rce-event.de/modules/meldung/js/xmlhttp_querys.php?get_kat=1&time=1338409551228&id=1&block=kat" > which will return the "event.theme" select data/values
3. Then use the data loaded in step 2 to update the page loaded in step 1 by inserting a "select list with id and name set to <event.theme>" in the HTML element "kat_content"
Then your form/loaded webpage should have the new select named "event.theme", and therefore the following code shouldn't produce errors anymore.
final HtmlSelect select = form.getSelectByName("event.theme");
final HtmlOption option = select.getOptionByText("Sport/Freizeit");
final Page newPage = select.setSelectedAttribute(option, false);