I am trying to automate a login page which appears to be using Knockout.js.
HtmlUnit doesn't seem to load the full page: all the input fields are missing, which makes it impossible to actually log in.
I have tried ensuring that the JavaScript timeouts are set and have also enabled NicelyResynchronizingAjaxController. After the page has loaded, I am waiting using:
waitForBackgroundJavaScript
waitForBackgroundJavaScriptStartingBefore
Thread.sleep (just for good measure)
I have even checked for additional windows (WebClient.getWebWindows), but there just seems to be the one.
It appears Knockout (assuming it is actually Knockout) is creating the inputs. Is this just too much for HtmlUnit, or have I missed something?
This is a known problem (see https://github.com/HtmlUnit/htmlunit/issues/37).
Hopefully I will find some time to figure out what is going wrong here.
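For the waiting part of the question, polling for the condition you actually care about (the inputs existing) tends to be more predictable than a fixed Thread.sleep. Below is a minimal sketch of such a loop using only the JDK. The Poll class and its until method are hypothetical helpers, not part of HtmlUnit, and the condition in main is a stand-in. With HtmlUnit, the condition would instead inspect the loaded page. Note that this only helps if the inputs are eventually created; it does not work around the issue linked above.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not an HtmlUnit class): polls a condition until it holds or a timeout elapses. */
final class Poll {
    private Poll() {}

    /**
     * Returns true as soon as the condition holds, or false if the timeout
     * elapses (or the thread is interrupted) before that happens.
     */
    static boolean until(BooleanSupplier condition, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in condition: becomes true on the third check.
        // In a real test this would look at the page, e.g. "does any input element exist yet?".
        int[] checks = {0};
        boolean met = until(() -> ++checks[0] >= 3, Duration.ofSeconds(2), Duration.ofMillis(50));
        System.out.println("condition met: " + met);
    }
}
```

If until returns false, the inputs never appeared within the timeout, which points to the scripts not running rather than to a timing problem.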
I am in the process of writing a program whose purpose is centered around generating custom URLs for intelius.com and then extracting data from them with selenium. I have observed interesting behavior that I am unsure how to address.
My program creates URLs after the following pattern: https://intelius.com/people-search/LASTNAME/CITY-STATE, but I have found that attempting to access these constructed links consistently leads to a timeout error.
For example, http://intelius.com/people-search/Williams/Brooklyn-NY does not load the expected results page.
Digging around in the website's source, I have found what appears to be a link validator script — what exactly that means, I do not know — and am unsure how to proceed.
How exactly would I go about authenticating my queries, without programming selenium to manually input the data into the search textbox and to press the submit button? Is my link-construction approach flawed in some blatantly obvious manner? I am a bit lost and would appreciate some direction. Thanks!
I think your problem is using http instead of https and omitting www from the URL. So this works:
https://www.intelius.com/people-search/Williams/Brooklyn-NY
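As a minimal sketch of that fix, the URL can be built in one place so that the scheme and host are always https and www.intelius.com. The PeopleSearchUrl class and its build method are hypothetical names. The path layout follows the LASTNAME/CITY-STATE pattern from the question, and the way spaces are encoded is an assumption, since the thread does not say how the site expects multi-word cities.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper that builds a people-search URL in the form shown above. */
final class PeopleSearchUrl {
    // Always https and always the www host, per the answer above.
    private static final String BASE = "https://www.intelius.com/people-search/";

    private PeopleSearchUrl() {}

    static String build(String lastName, String city, String state) {
        return BASE + encode(lastName) + "/" + encode(city) + "-" + encode(state);
    }

    private static String encode(String segment) {
        // URLEncoder produces form encoding, where a space becomes '+'.
        // In a path segment, %20 is the safer choice (assumption about what the site accepts).
        return URLEncoder.encode(segment.trim(), StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        System.out.println(build("Williams", "Brooklyn", "NY"));
    }
}
```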
The problem lies in the way the URL is being formed. You need to construct and pass the arguments in the form the web application understands. The following works:
https://www.intelius.com/people-search/William-Brooklyn/NY
I started with HtmlUnit recently, had some success scraping some pages and interacting with it, really powerful tool...
But, as far as my knowledge goes, I just retrieved a page with a certain state... My next step is to make HtmlUnit to read the messages from a chat room, constantly, and store/do something when a certain string/regexp matches. I was thinking even about interacting with the chat room.
I'm not sure if HtmlUnit goes that far, I did some research and found something about webDriver, webWindow, etc, maybe I will need to work with Threads to do this....
Can you guys point me in the right direction?
Thank you very much
HtmlUnit tries to simulate as much of a real browser's behavior as possible.
If the target website is simple, HtmlUnit will work. But in some cases the website is too complex for the current HtmlUnit, and you need to isolate the root cause so that it can be fixed.
You can start with WebDriver, and you can easily change the implementation from e.g. ChromeDriver/FirefoxDriver to HtmlUnitDriver with a single line change.
I'm trying to read in the HTML from a webpage and parse information from it using a URLConnection in Java. It works, but the page only loads part of the content; the rest is loaded as the user scrolls down the page. Is there any way for a Java program to trigger this? My program doesn't actually open the webpage in a browser, just a connection to the page. If it's relevant, I can add the URL I'm accessing.
I've been trying to find the answer, and found a few similar topics on here, most of them without answers. However, I eventually made my way to this topic, which sounds like what I need, but I looked at the URLs of the calls being made and they're not always the same, so I can't just type them into the program. I looked at the topic it was supposedly a duplicate of, but that didn't seem to apply to my problem either, unless I misunderstood something. Is there any way to find these URLs each time the program runs, or any way to trick the connection into thinking I'm scrolling down the page? Or can I make a general "request" or "POST" as I've seen in some related topics, that will automatically call the appropriate URL (An explanation of a "POST" would be appreciated as well)?
I ran into this problem yesterday and haven't been able to find a solution to it.
Once a user logs out, how do I prevent them from hitting the back button and loading the cached previous page?
I ran into this post and read the suggested article, but I'm unsure if any of these suggestions are the correct way to handle this problem.
I even ran the sample apps from Play! notably the Forms app and it has the same problem. I thought their apps would at least show how to handle this.
Any help would be appreciated. Thanks.
You can disable caching in the response headers (no-cache or must-revalidate) for every page that needs to check the credentials.
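A minimal sketch of the headers this usually means is shown below. The NoCacheHeaders class is a hypothetical helper that only collects the name/value pairs. How they are attached to a response depends on the Play version and is not shown here. The no-store directive and the Pragma and Expires entries are additions beyond what the answer names, included because older browsers and proxies may ignore Cache-Control alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: the response headers commonly used to stop browsers from caching a page. */
final class NoCacheHeaders {
    private NoCacheHeaders() {}

    static Map<String, String> headers() {
        Map<String, String> headers = new LinkedHashMap<>();
        // HTTP/1.1: do not store the page, and revalidate before reusing any copy.
        headers.put("Cache-Control", "no-cache, no-store, must-revalidate");
        // HTTP/1.0 clients and proxies.
        headers.put("Pragma", "no-cache");
        // Treat the response as already expired.
        headers.put("Expires", "0");
        return headers;
    }

    public static void main(String[] args) {
        // Print the headers in the order they would be set on each protected response.
        headers().forEach((name, value) -> System.out.println(name + ": " + value));
    }
}
```

With these set on every page that requires a login, pressing the back button after logging out should make the browser request the page again, at which point the credential check can redirect to the login page.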
I am wondering if anyone could expand on any of these attempts or has any other ideas for catching JS errors using WebDriver that will work in Firefox, Chrome, Internet Explorer, and Safari.
Here is what's been tried so far:
Attempt – Problem:
JSErrorCollector.jar – Works fine, but is a Firefox-only solution.
Inject JS into page source – I injected window.onerror code into the page’s source code using WebDriver, but any initial errors are missed because the injection is too late.
BrowserMob – I can intercept the HTTP response and planned to inject the window.onerror code into the response body, but the author has not implemented the getBody() method yet, so as far as I am aware only headers can be modified. The body is always null for all responses. (I was on a webpage where the author talked about implementing getBody(), but it hasn’t happened yet and I cannot find that page again.)
Fiddler – JS will inject correctly, but Fiddler is Windows only so Safari won’t work.
Parent/Child windows – I use JavaScript to open and store a reference to the test page’s window. The window.onerror code is contained in the parent window so it will not miss startup errors in the child window. I cannot get this to work in anything but Firefox and, to some extent, Chrome. I already asked a question about it here.
Selenium RC – I haven’t tried it because all my tests use WebDriver, but I know it has some kind of method like captureNetworkTraffic(), but I don’t think it can be used in WebDriver.
IE error popup – I was going to use the parent/child solution for Firefox/Chrome and then look for the IE error popup. This popup displays when the setting is checked to display it. The popup is a native Windows window (I think), so I cannot use Selenium to access it.
Read browser console – I could not find a way to do this in all browsers. In Chrome I found a way to save the console log to a file and then read the file. That is as far as I got.
I would like a solution similar to BrowserMob since it seems like it would be a cross browser solution. Are there any other proxies that can be put in the test and intercept the response? It would have been excellent if the getBody() method was implemented. I also like the parent/child solution because it also seems like a simple, cross browser solution, but it is not working for IE (parent/child question again).
Thanks for any help.
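The proxy idea from the list above (put the window.onerror handler into the response body before the browser parses it) comes down to a string transformation on the HTML. Below is a minimal sketch of that step in plain Java. The ErrorCollectorInjector class, the inject method, and the window.__jsErrors variable are hypothetical names, and the sketch does not depend on any particular proxy. It assumes whatever intercepts the response can hand over the body as a String.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: inserts an error-collecting script as the first thing inside <head>. */
final class ErrorCollectorInjector {
    // Matches <head> or <head ...attributes>, but not <header>.
    private static final Pattern HEAD_OPEN =
            Pattern.compile("<head(\\s[^>]*)?>", Pattern.CASE_INSENSITIVE);

    // Collects errors into window.__jsErrors (hypothetical name) so a test can read them later.
    static final String COLLECTOR_SCRIPT =
            "<script>window.__jsErrors=[];"
            + "window.onerror=function(msg,url,line){"
            + "window.__jsErrors.push(msg+' @ '+url+':'+line);return false;};</script>";

    private ErrorCollectorInjector() {}

    static String inject(String html) {
        Matcher m = HEAD_OPEN.matcher(html);
        if (m.find()) {
            // Place the collector right after the opening tag so it runs before any page script.
            return html.substring(0, m.end()) + COLLECTOR_SCRIPT + html.substring(m.end());
        }
        // No <head> found: fall back to prepending (may affect doctype handling in some browsers).
        return COLLECTOR_SCRIPT + html;
    }

    public static void main(String[] args) {
        System.out.println(inject("<html><head><title>t</title></head><body></body></html>"));
    }
}
```

A WebDriver test could then read the collected messages back with JavascriptExecutor.executeScript("return window.__jsErrors"). Because the handler is installed before any page script runs, it should also see errors raised during startup.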
I don't know of any way for a test framework to catch JavaScript code errors directly. If I were to guess, I would use PhantomJS. Or maybe something like MITM Proxy would work?
As a side note, if you run a Selenium 2 Grid hub with a separate node, you can pass Java options to the node's JVM, as in the script below, so that traffic is proxied through Fiddler. Fiddler listens on port 8888 by default. With this method you can watch packets.
:: Batch script: pass Java options to the node's JVM via JAVA_OPTS
SET "JAVA_OPTS=-Dwebdriver.chrome.driver=%CHROMEDRIVER%"
:: When PROXY_TO_FIDDLER is true, route HTTP traffic through Fiddler on 127.0.0.1:8888
IF "%PROXY_TO_FIDDLER%"=="true" SET "JAVA_OPTS=%JAVA_OPTS% -DproxySet=true -Dhttp.proxyHost=127.0.0.1 -Dhttp.proxyPort=8888"
I created scripts you can use to start your grid and node here. It seems to me that you could also use this method to talk to BrowserMob Proxy on port 8080, but I have not tried that.