Suppose we are on the site http://site1.tld, whose HTML page index.html includes images from another site, say http://site2.tld. This other site requires basic authentication, and we do have the credentials.
We are using Selenium RC and starting a Firefox 15.0.1 browser. We are writing our tests in Java 1.7.
We can use Selenium to navigate to the protected page using the http://username:password@site2.tld URL and thus gain access.
My question is: is there a way to let the RC use the credentials just whenever it needs to load a resource from the protected site?
Simply put, can it "see" URLs like http://site2.tld/image.png as if they were http://username:password@site2.tld/image.png?
@rrufai: Your edit does not make my question clearer, so I am reverting it in part. However, since you did misinterpret it, I believe a clarification is indeed necessary: I would like Selenium RC to actually read the files on site2.tld as if their location were http://username:password@site2.tld.
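For context, the navigation described above looks roughly like this with the Selenium RC Java client (host names, credentials and file names are placeholders); the open question is whether the cached credentials are also reused when index.html pulls in images from site2.tld:

import com.thoughtworks.selenium.DefaultSelenium;
import com.thoughtworks.selenium.Selenium;

public class BasicAuthPrimer {
    public static void main(String[] args) {
        // Placeholder hosts and credentials.
        Selenium selenium = new DefaultSelenium("localhost", 4444, "*firefox", "http://site1.tld");
        selenium.start();

        // Visit the protected host once with the credentials embedded in the URL,
        // so the browser caches them for that realm...
        selenium.open("http://username:password@site2.tld/");

        // ...then load the page that embeds images from site2.tld.
        selenium.open("http://site1.tld/index.html");

        selenium.stop();
    }
}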
Is there any way to do this? I have an automated script which searches through a website protected by a captcha, and I noticed that if I open this website manually there is only the "I'm not a robot" checkbox, but when I open it using Selenium there are also several puzzles to solve, so the server must recognize that my browser "is being controlled by automated test software", as Chrome says.
I tried using incognito mode, but it doesn't help.
The main goal of a captcha is to prevent unauthorized and automated access to a website by any tool or bot. Selenium (or any automation tool) cannot read CAPTCHAs.
So, if you can, mock this service in the staging/test environment.
Also, the captcha is an outside service, so it's not really necessary to test its behavior, right?
However (as with everything, a hack is available), you can use software or websites that convert an image to text, which will not always be correct.
Another option: you can set a cookie for the captcha. The captcha's secret key is used in this case.
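If the team exposes such a bypass in the staging environment, setting the cookie from the test might look like this. This is only a sketch: the cookie name, the shared secret and the staging URLs are all hypothetical and would have to match whatever the application actually checks.

import org.openqa.selenium.Cookie;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class CaptchaBypassExample {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();

        // Open the domain first so the cookie can be attached to it.
        driver.get("https://staging.example.com");

        // Hypothetical bypass cookie agreed with the dev team; the captcha
        // check on the staging server would verify this shared secret and
        // skip the challenge.
        driver.manage().addCookie(new Cookie("captcha_bypass_token", "shared-secret-value"));

        // Reload the page that normally shows the captcha.
        driver.get("https://staging.example.com/login");

        driver.quit();
    }
}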
I want to download the source of a web page to a file (*.htm), i.e. the entire content with all HTML markup, from this URL:
http://isap.sejm.gov.pl/DetailsServlet?id=WDU20061831353
which works perfectly fine with the FileUtils.copyURLToFile method.
However, that page also contains some links, for instance one which I'm very interested in:
http://isap.sejm.gov.pl/RelatedServlet?id=WDU20061831353&type=9&isNew=true
This link works perfectly fine if I open it in a regular browser, but when I try to download it in Java by means of FileUtils, I get only a no-content page with the single message "trwa ladowanie danych" (which means: "loading data...") -- nothing further happens and the target page is never loaded.
Could anyone help me with this? From the URL I can see that the page uses Servlets -- is there a special way to download pages created with servlets?
This isn't a servlet issue - that just happens to be the technology used to implement the server, but generally clients don't need to care about that. I strongly suspect it's just that the server is responding with different data depending on the request headers (e.g. User-Agent). I see a very different response when I fetch it with curl compared to when I load it in Chrome, for example.
I suggest you experiment with curl, making a request which looks as close as possible to a request from a browser, and then fiddling until you can find out exactly which headers are involved. You might want to use Wireshark or Fiddler to make it easy to see the exact requests/responses involved.
Of course, even if you can fetch the original HTML correctly, there's still all the Javascript - it would be entirely feasible for the HTML to contain none of the data, but for it to include Javascript which does the actual data fetching. I don't believe that's the case for this particular page, but you may well find it happens for other pages.
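If the headers do turn out to be the difference, the same fetch can be done in plain Java while copying a browser's headers; this is just a sketch, and which headers the server actually keys on is exactly what you would have to find out by experiment:

import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class BrowserLikeFetch {
    public static void main(String[] args) throws IOException {
        URL url = new URL("http://isap.sejm.gov.pl/RelatedServlet?id=WDU20061831353&type=9&isNew=true");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // Headers copied from a real browser request; trim the list until you
        // find which ones the server actually keys on.
        conn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 6.1; rv:15.0) Gecko/20100101 Firefox/15.0.1");
        conn.setRequestProperty("Accept",
                "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8");

        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, Paths.get("related.htm"), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}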
Try using Selenium WebDriver (HtmlUnitDriver) to load the main page:
// needs org.openqa.selenium.htmlunit.HtmlUnitDriver, org.openqa.selenium.By and java.util.concurrent.TimeUnit
HtmlUnitDriver driver = new HtmlUnitDriver(true);   // true enables JavaScript
driver.manage().timeouts().implicitlyWait(30, TimeUnit.SECONDS);
driver.get(baseUrl);
and then navigate to the link:
driver.findElement(By.linkText("text of the link")).click();   // By.linkText matches an <a> by its visible text
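Since the goal was a *.htm file, you can then write out the rendered markup with the same Commons IO FileUtils you already use (the target file name is arbitrary):

// needs java.io.File and org.apache.commons.io.FileUtils
String html = driver.getPageSource();                       // markup after the JavaScript has run
FileUtils.writeStringToFile(new File("related.htm"), html, "UTF-8");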
UPDATE: I checked the following: if I turn off cookies in Firefox and then try to load the page
http://isap.sejm.gov.pl/RelatedServlet?id=WDU20061831353&type=9&isNew=true
then I get the same incorrect result as in my Java app (i.e. the page with the "loading data" message instead of the proper content).
So how can I manage cookies in Java to download this page properly?
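A sketch of one approach that might work without a browser (untested against this site): install a java.net.CookieManager so every URLConnection in the JVM keeps cookies, hit the DetailsServlet page first to pick up the session cookie, and only then fetch the RelatedServlet page. FileUtils.copyURLToFile goes through the same URLConnection machinery, so the handler set here should apply to it as well.

import java.io.InputStream;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CookieAwareDownload {
    public static void main(String[] args) throws Exception {
        // Make every URLConnection in this JVM remember cookies, as a browser would.
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));

        // First request establishes the session cookies.
        try (InputStream in = new URL(
                "http://isap.sejm.gov.pl/DetailsServlet?id=WDU20061831353").openStream()) {
            Files.copy(in, Paths.get("details.htm"), StandardCopyOption.REPLACE_EXISTING);
        }

        // Second request reuses those cookies, which this server seems to require.
        try (InputStream in = new URL(
                "http://isap.sejm.gov.pl/RelatedServlet?id=WDU20061831353&type=9&isNew=true").openStream()) {
            Files.copy(in, Paths.get("related.htm"), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}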
So, I have made a proxy-like Java applet. It lets you navigate a site, handling cookies and everything, and it supports authentication on the remote site. The thing is that after logging in on the remote site, I try to navigate by clicking the links and it complains that I have JavaScript disabled. To clarify, the site functions perfectly well when accessed directly.
My question is: can I somehow enable JavaScript inside my applet? Is it something to do with the browser, or is it some HTTP header I must include? Am I missing something in the picture?
Thanks in advance! :)
I am creating an application in which there will be multiple iframes within the main window: forms will be opened for submission in the main window, and each form's target will be one of the many available iframes...
I want to be able to access the response of each form submission, i.e. I want to access content in a child iframe from code in the main window.
Please clarify the following for me:
(1) As I understand it, the Same Origin Policy does not permit the above scenario. Am I correct?
(2) Is there some way to enable the access to the child iframe that I require, in any web browser? I saw some posts on SO about this and even tried some of the solutions, but nothing works (I tried Google Chrome, Firefox 6, Firefox 3.6 and Safari).
(3) In case it's not possible to get such data access in the browser, can I get it by embedding a browser component in my Java desktop app? In that case, which browser component do you recommend?
Only if the content of the child iframes is loaded from another domain.
Generally not. In some newer browsers, the target domain can use HTTP Access Control (CORS) headers to allow cross-site requests to be made to it, but there is no way for the source site to make that decision.
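For what it's worth, those headers have to come from the site that serves the form response, and they only help if the cross-site read is done with XMLHttpRequest rather than by poking into the iframe's DOM. A minimal, hypothetical servlet on the target domain might send them like this (the class name and allowed origin are placeholders):

import java.io.IOException;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class FormResultServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // The target domain opts in; put the main window's origin here.
        resp.setHeader("Access-Control-Allow-Origin", "http://main-app.example.com");
        resp.setContentType("text/html;charset=UTF-8");
        resp.getWriter().write("<p>form accepted</p>");
    }
}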
I'm not familiar with Java browser components, so I'll let someone else answer this part.
Real World Problem:
I have my app hosted on Heroku, who (to my knowledge) are unable to offer a solution for running a Headless (GUI-less) Browser - such as HTMLUnit - for generating HTML Snapshots for Googlebot to index my AJAX content.
My Proposed Solution:
If you haven't already, I suggest reading Google's Full Specification for Making AJAX Applications Crawlable.
Imagine I have:
a Sinatra app hosted on Heroku on the domain http://example.com
the app has tabs along the top of the page TabA, TabB and TabC
under each tab is SubTab1, SubTab2, SubTab3
on load, if the URL is http://example.com#!tab=TabA&subtab=SubTab3, then client-side JavaScript takes the location.hash and loads the TabA, SubTab3 content via AJAX.
Note: the Hash Bang (#!) is part of the google spec.
I would like to build a simple "web service" hosted on Google App Engine (GAE) that:
Accepts a URL param e.g. http://htmlsnapshot.appspot.com?url=http://example.com#!tab=TabA&subtab=SubTab3 (url param should be URLEncoded)
Runs HTMLUnit to open http://example.com#!tab=TabA&subtab=SubTab3 and run the client-side JavaScript on the server (a rough sketch of this step follows this list).
HTMLUnit returns the DOM once everything is complete (or something like 45 seconds has passed).
The returned content could be sent back via JSON/JSONP, or alternatively a URL could be returned to a file generated and stored on the Google App Engine server (for file-based "cached" results)... open to suggestions here. If a URL to a file were returned, you could then curl it to get the source code (aka an HTML snapshot).
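To make the HTMLUnit step above concrete, here is a rough, untested sketch of what the snapshot servlet could look like (the class name, parameter name and wait time are my own choices; whether stock HTMLUnit runs unmodified on App Engine is a separate question):

import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

// Hypothetical servlet mapped to / on htmlsnapshot.appspot.com.
public class SnapshotServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        String target = req.getParameter("url");   // the URL-encoded #! address to render

        WebClient client = new WebClient();
        client.getOptions().setThrowExceptionOnScriptError(false);   // tolerate script errors on the target page

        HtmlPage page = client.getPage(target);
        // Let the AJAX calls finish, staying well under App Engine's request deadline.
        client.waitForBackgroundJavaScript(20000);

        resp.setContentType("text/html;charset=UTF-8");
        resp.getWriter().write(page.asXml());   // the rendered DOM is the HTML snapshot

        client.closeAllWindows();
    }
}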
My http://example.com app would need to manage the call to http://htmlsnapshot.appspot.com... basically:
Catch Googlebots call to http://example.com/?_escaped_fragment_=tab=TabA%26subtab=SubTab3 (googlebot crawler escapes certain characters e.g. %26 = &).
Send request from the backend to http://htmlsnapshot.appspot.com?url=http://example.com#!tab=TabA&subtab=SubTab3 (url param should be URLEncoded)
Render the returned HTML Snapshot to the frontend.
Google Indexes the content and we rejoice!
I don't have any experience with Google App Engine or Java or HTMLUnit.
I might be able to figure it out... and will post my results if I do.
Otherwise I feel this is a VERY good opportunity for someone to write a kick-ass blog post that outlines a novice's step-by-step guide to setting up a web service like this.
This will introduce more people to the excellent (and free!) Google App Engine. Also it will undoubtedly encourage more people to adopt Google's specs for crawlable AJAX content... something we can all benefit from!
As Google's specification gains more acceptance the "hurdle" of setting up a Headless Browser is going to send many devs Googling for answers! Get in now with an answer for fame and glory! (edit: at the very least I will sing your praises).
Hit me up on Twitter @_chrisjacob if you would like to discuss solutions.
I have successfully used HTMLUnit on AppEngine. My GWT code to do this is available in the gwt-platform project; the results I got were similar to those of the HTMLUnit-AppEngine test application by Amit Manjhi.
It should be relatively easy to use GWTP's current HTMLUnit support to do exactly what you describe, although you could likely do it in a simpler app. One problem I see is that AppEngine requests have a 30-second timeout, so you can't have a page that takes HTMLUnit longer than that to process.
UPDATE:
It's been a while, but I finally closed the long-standing issue about making GWT applications crawlable using GWTP. The documentation is not entirely there yet, but check out the issue:
http://code.google.com/p/gwt-platform/issues/detail?id=1