I'm having trouble clicking a button to submit a comment on a thread using HtmlUnit. I get the element like this:
HtmlPage newOne = webClient.getPage(string);
//System.out.println(newOne.asText());
HtmlTextArea area = (HtmlTextArea) newOne.getElementById("ctrl_message_html");
area.focus();
area.type('t');
newOne.getHtmlElementByAccessKey('s').click();
Which throws
com.gargoylesoftware.htmlunit.FailingHttpStatusCodeException: 404 Not Found for http://forums.bukkit.org/threads/plugin-idea-to-get-tons-of-downloads.271185/members/jthort.90885864/post
at com.gargoylesoftware.htmlunit.WebClient.throwFailingHttpStatusCodeExceptionIfNecessary(WebClient.java:514)
at com.gargoylesoftware.htmlunit.WebClient.loadDownloadedResponses(WebClient.java:2067)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.doProcessPostponedActions(JavaScriptEngine.java:717)
at com.gargoylesoftware.htmlunit.javascript.JavaScriptEngine.processPostponedActions(JavaScriptEngine.java:804)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1322)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1265)
at com.gargoylesoftware.htmlunit.html.HtmlElement.click(HtmlElement.java:1213)
at HTMLUnit.main(HTMLUnit.java:87)
When I print out the element, I get this
HtmlSubmitInput[<input type="submit" class="button primary MenuCloser" value="Post" accesskey="s">]
But I don't understand why it throws an exception when it clearly has the right object. I have JavaScript disabled, along with CSS. I am using the Chrome web browser.
Edit: By "right object" I mean that it's the correct button on the page that I want to click, and that it is an HtmlSubmitInput.
I'm not sure why you think having the right object will mean you won't get any exception. You're not getting a NullPointerException, if that is what you mean by the right object.
Having said that, if you read the exception it is a FailingHttpStatusCodeException with the following message:
404 Not Found for http://forums.bukkit.org/threads/plugin-idea-to-get-tons-of-downloads.271185/members/jthort.90885864/post
So that URL is giving you a 404 error when you perform an HTTP POST on it. You should keep debugging why this is happening. I would try enabling JavaScript; the site may need it for the submission.
I'm attempting to press a Javascript Button on a webpage using HTMLUnit 2.36 which navigates to another page, and so on...:
ScriptResult result = page.executeJavaScript("__doPostBack('LinkBtn_thebutton','')");
Page page = result.getNewPage();
I've attempted to use the code above which causes the following error, supposedly because getNewPage() is no longer supported:
The method GetNewPage() is undefined for type ScriptResult
I've also attempted to add a cast with getJavaScriptResult() as shown below with no luck:
HtmlPage page1 = (HtmlPage) result.getJavaScriptResult();
Causing the following error:
Exception in thread "main" java.lang.ClassCastException: class net.sourceforge.htmlunit.corejs.javascript.Undefined cannot be cast to class com.gargoylesoftware.htmlunit.html.HtmlPage
You are not supposed to cast the value returned by result.getJavaScriptResult(); treat the call as if it returned void. If your page is going to be redirected, make sure that redirecting is enabled: webClient.getOptions().setRedirectEnabled(true);
I'm trying to go to the next page on an aspx form using JSoup.
I can find the next button itself. I just don't know what to do with it.
The idea is that, for that particular form, if the next button exists, we would simulate a click and go to the next page. But any other solution other than simulating a click would be fine, as long as we get to the next page.
I also need to update the results once we go to the next page.
// Connecting, entering the data and making the first request
...
// Submitting the form
Document searchResults = form.submit().cookies(resp.cookies()).post();
// reading the data. Everything up to this point works as expected
...
// finding the next button (this part also works as expected)
Element nextBtn = searchResults.getElementById("ctl00_MainContent_btnNext");
if (nextBtn != null) {
// click? I don't know what to do here.
searchResults = ??? // updating the search results to include the results from the second page
}
The page itself is www.somePage.com/someForm.aspx, so I can't use the solution stated here:
Android jsoup, how to select item and go to next page
I was unable to find any other suggestions.
Any ideas? What am I missing? Is simulating a click even possible with JSoup? The documentation says nothing about it. But I'm sure people are able to navigate these types of forms.
Also, I'm working with Android, so I can't use HtmlUnit, as stated here:
importing HtmlUnit to Android project
Thank you.
This is not a job for Jsoup! Jsoup is a parser with a nice DOM API that lets you deal with wild HTML as if it were well-formed and free of errors and nonsense.
In your specific case you may be able to scrape the target site directly from your app by finding links and retrieving HTML pages recursively. Something like
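import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

private void scrape(String url) throws IOException {
    Document doc = Jsoup.connect(url).get();
    // Analyze current document content here...
    // Then continue; btnNext is an id in the question, so select it with '#'
    for (Element link : doc.select("#ctl00_MainContent_btnNext")) {
        // absUrl resolves a relative href against the page URL
        scrape(link.absUrl("href"));
    }
}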
But in the general case what you want to do requires far more functionality than Jsoup provides: a user agent capable of interpreting HTML, CSS and JavaScript, with a scriptable API that you can call from your app to simulate a click. For example Selenium:
WebDriver driver = new FirefoxDriver();
driver.findElement(By.name("next_page")).click();
Selenium can't be bundled in an Android app, so I suggest you put your Selenium code on a server and make it accessible with some REST API.
Pagination on ASPX can be a pain. The best thing you can do is to use your browser to see the data parameters it sends to the server, then try to emulate this in code.
I've written a detailed tutorial on how to handle it here but it uses the univocity HTML parser (which is commercial closed source) instead of JSoup.
In short, you should try to get a <form> element with id="aspnetForm", and read the form elements to generate a POST request for the next page. The form data usually comes out with stuff such as this:
__EVENTTARGET =
__EVENTARGUMENT =
__VIEWSTATE = /wEPDwUKMTU0OTkzNjExNg8WBB4JU29ydE9yZ ... a very long string
__VIEWSTATEGENERATOR = 32423F7A
... and other gibberish
Then you need to look at each one of these and compare it with what your browser sends. Sometimes you need to get values from other elements of the page to generate a similar POST request. You may have to remove some of the parameters you get. Again, make your code behave exactly the same as your browser.
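Once you have decided which parameters to keep, they have to be sent as a form-encoded body. Here is a minimal sketch of that encoding step using only the JDK (Java 10 or later for this URLEncoder overload). The class name AspxFormBody and the field values in main are made up for illustration; the real names and values must come from your target page and from what your browser sends.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class AspxFormBody {
    // Encodes the fields as application/x-www-form-urlencoded,
    // keeping the iteration order of the map that is passed in.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8))
                .append('=')
                .append(URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Hypothetical values; read the real ones from the form and your browser's request.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("__EVENTTARGET", "ctl00$MainContent$btnNext");
        fields.put("__EVENTARGUMENT", "");
        fields.put("__VIEWSTATE", "/wEPDwUK...");
        System.out.println(encode(fields));
    }
}
```

A LinkedHashMap keeps the parameters in the order you add them, which makes it easier to compare your request side by side with the one captured from the browser.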
After some (frustrating) trial and error you will get it working. The server should return a pipe-delimited result, which you can break down and parse. Something like:
25081|updatePanel|ctl00_ContentPlaceHolder1_pnlgrdSearchResult|
<div>
<div style="font-weight: bold;">
... more stuff
|__EVENTARGUMENT||343908|hiddenField|__VIEWSTATE|/wEPDwU... another very long string ...1Pni|8|hiddenField|__VIEWSTATEGENERATOR|32423F7A| other gibberish
From THAT sort of response you need to generate new POST requests for the subsequent pages, for example:
String viewState = substringBetween(ajaxResponse, "__VIEWSTATE|", "|");
Then:
request.setDataParameter("__VIEWSTATE", viewState);
There will be more data parameters to get from each response, but a lot depends on the site you are targeting.
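The substringBetween call above is not part of the JDK. If you don't have a library that provides it, a plain Java version could look like the sketch below. The class name AjaxResponseFields is made up, and the sample response in main is a shortened, invented string in the same pipe-delimited shape as the one shown earlier.

```java
class AjaxResponseFields {
    // Returns the text between the first occurrence of 'open' and the next
    // occurrence of 'close' after it, or null if either marker is missing.
    static String substringBetween(String text, String open, String close) {
        int start = text.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = text.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return text.substring(start, end);
    }

    public static void main(String[] args) {
        // Invented sample in the same shape as an ASP.NET partial response.
        String ajaxResponse =
            "1|hiddenField|__VIEWSTATE|/wEPDwU1Pni|8|hiddenField|__VIEWSTATEGENERATOR|32423F7A|";
        System.out.println(substringBetween(ajaxResponse, "__VIEWSTATE|", "|"));
        System.out.println(substringBetween(ajaxResponse, "__VIEWSTATEGENERATOR|", "|"));
    }
}
```

Including the trailing pipe in the opening marker matters: "__VIEWSTATE|" does not match inside "__VIEWSTATEGENERATOR|", so the two fields are not confused.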
Hope this helps a little.
I need to test (with Selenium) whether the links in a given page are valid. I found a good post about it here:
http://ardesco.lazerycode.com/index.php/2012/07/how-to-download-files-with-selenium-and-why-you-shouldnt/
But the problem is, what if an error page redirects to a custom error page? Then I would get a 200 or a 302 instead of a 404. How should I go about checking the validity of URLs for websites that redirect their 404s?
You should assert on an element of the page that contains a known, specific text. Use a dedicated reusable function for this.
Then when you hit a page call the function as a check. If you find the specific element present then click the browser back button after recording the URL. If not you can continue your test as desired. There is another post in regards to recursively finding all links and testing them. How to browse a whole website using selenium?
if (checkError()) // calls the specific check for the error on the custom error page
{
    // Log URL (Url is a property in the C# bindings)
    string badURL = driver.Url;
    // Save somewhere in a list for output later...
    // navigate to previous page
    driver.Navigate().Back();
}
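For the check itself, here is a rough Java sketch of the idea: look for a known marker text in the page and remember the URLs where it shows up. The class BrokenLinkLog and the marker string are made up. With the Selenium Java bindings the two inputs would typically come from driver.getCurrentUrl() and driver.getPageSource(), but the sketch takes plain strings so it does not depend on a running browser.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class BrokenLinkLog {
    private final String errorMarker;
    private final List<String> badUrls = new ArrayList<>();

    // errorMarker is a text that only appears on the site's custom error page.
    BrokenLinkLog(String errorMarker) {
        this.errorMarker = errorMarker;
    }

    // Returns true and records the URL when the page looks like the custom error page.
    boolean check(String url, String pageSource) {
        boolean isError = pageSource != null && pageSource.contains(errorMarker);
        if (isError) {
            badUrls.add(url);
        }
        return isError;
    }

    // Read-only view of the URLs collected so far, for output later.
    List<String> badUrls() {
        return Collections.unmodifiableList(badUrls);
    }

    public static void main(String[] args) {
        BrokenLinkLog log = new BrokenLinkLog("Sorry, this page does not exist");
        log.check("http://example.com/ok", "<h1>Welcome</h1>");
        log.check("http://example.com/gone", "<h1>Sorry, this page does not exist</h1>");
        System.out.println(log.badUrls());
    }
}
```

Matching on a specific element's text rather than the whole page source would be stricter; the plain substring check is only the simplest form of the idea.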
Can you use Jsoup to submit a search to Google, but instead of sending your request via "Google Search" use "I'm Feeling Lucky"? I would like to capture the name of the site that would be returned.
I see lots of examples of submitting forms, but never a way to specify a specific button to perform the search or form submission.
If Jsoup won't work, what would?
According to the HTML source of http://google.com the "I am feeling lucky" button has a name of btnI:
<input value="I'm Feeling Lucky" name="btnI" type="submit" onclick="..." />
So, just adding the btnI parameter to the query string should do (the value doesn't matter):
http://www.google.com/search?hl=en&btnI=1&q=your+search+term
So, this Jsoup should do:
String url = "http://www.google.com/search?hl=en&btnI=1&q=balusc";
Document document = Jsoup.connect(url).get();
System.out.println(document.title());
However, this gave a 403 (Forbidden) error.
Exception in thread "main" java.io.IOException: 403 error loading URL http://www.google.com/search?hl=en&btnI=1&q=balusc
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:387)
at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
at test.Test.main(Test.java:17)
Perhaps Google was sniffing the user agent and discovering it to be Java. So, I changed it:
String url = "http://www.google.com/search?hl=en&btnI=1&q=balusc";
Document document = Jsoup.connect(url).userAgent("Mozilla").get();
System.out.println(document.title());
This yields (as expected):
The BalusC Code
The 403 is however an indication that Google isn't necessarily happy with bots like that. You might get (temporarily) IP-banned when you do this too often.
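The URLs above hard-code the search term. If the term comes from user input it has to be URL-encoded first. Here is a small sketch using only the JDK (Java 10 or later for this URLEncoder overload); the class name LuckyUrl is made up, and the fixed part of the URL is the one used in this answer.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class LuckyUrl {
    // Builds the "I'm Feeling Lucky" URL for a search term.
    // The btnI value doesn't matter, so 1 is used as in the answer.
    static String build(String searchTerm) {
        return "http://www.google.com/search?hl=en&btnI=1&q="
                + URLEncoder.encode(searchTerm, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(build("your search term"));
        System.out.println(build("C++ & Java"));
    }
}
```

The result can be passed straight to Jsoup.connect(...) in place of the literal string.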
I'd try HtmlUnit for navigating through a site, and Jsoup for scraping.
Yes it can, if you are able to figure out how Google search queries are made. But Google does not allow this, even if you succeed. You should use their official API to make automated search queries.
http://code.google.com/intl/en-US/apis/customsearch/v1/overview.html
I'm trying to access a JavaScript function from Servlet code, but I'm getting the error shown below.
Here is the code:
out.println("<FRAME src=\"javascript:parent.newWindow('" + URL+ "') \" scrolling=No noresize />");
And this is the error that occurs in JavaScript:
Object does not support this property or method;
You can't access a Javascript function from your servlet code. Javascript executes on the client (= your user's browser) and the servlet code executes on your server (for example Tomcat, JBoss, whatever you're using).
What are you trying to accomplish with your code? I'm sure there's a simpler way to do it than what you just described.
[edited]
I see you just updated your description, so here's my view:
I'm guessing that you want to display a page to the user and when the page is displayed, you want to open a new window which will display another page using the URL parameter to point its address. If this is the case, you should probably just do this in the first page's onLoad() Javascript event using window.open().
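A rough sketch of what the servlet could print instead, assuming that guess is right. The class PopupPage and its method names are made up; in the servlet you would pass the result to out.println(...). The escaping only covers the characters that would break the single-quoted JavaScript string or the double-quoted HTML attribute, so treat it as a starting point rather than complete output encoding.

```java
class PopupPage {
    // Escapes a value for use inside a single-quoted JavaScript string
    // that itself sits inside a double-quoted HTML attribute.
    static String escapeForInlineJs(String value) {
        // JavaScript string escaping first...
        String js = value.replace("\\", "\\\\").replace("'", "\\'");
        // ...then HTML attribute escaping.
        return js.replace("&", "&amp;").replace("\"", "&quot;").replace("<", "&lt;");
    }

    // Builds a page whose onload handler opens the given URL in a new window.
    static String render(String url) {
        return "<html><body onload=\"window.open('"
                + escapeForInlineJs(url)
                + "')\"></body></html>";
    }

    public static void main(String[] args) {
        System.out.println(render("http://example.com/page?a=1&b=2"));
    }
}
```

The browser decodes &amp; back to & before running the handler, so the URL that window.open() receives is the original one.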
There is no newWindow property on a window object (which is what parent references), so this is not unexpected.
Maybe you are looking for the open method instead?
If so, then:
Putting it as the src of an iframe is a very strange thing to do
It will probably be zapped by pop-up blockers
OK. You are generating JavaScript code inside Servlet code. That output is sent to the web browser, which sees it as an HTML document with JavaScript inside. So your error comes from the web browser and is a JavaScript error, probably caused by the newWindow method. To open a new window you should call the window.open() function, I guess.