I'm trying to get the page https://secure.twitch.tv/login with PhantomJS in Java using Selenium, but the driver.get(...) call always crashes with the error below. I've tried implementing this:
String [] phantomJsArgs = {"--web-security=no", "--ignore-ssl-errors=yes"};
desireCaps.setCapability(PhantomJSDriverService.PHANTOMJS_GHOSTDRIVER_CLI_ARGS, phantomJsArgs);
But that doesn't seem to make a difference. Does anyone know a workaround?
Here is some code:
private void setup(){
DesiredCapabilities desireCaps = new DesiredCapabilities();
desireCaps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, "C:\\Users\\Scott\\workspace\\Twitch Bot v2\\libs\\phantomjs.exe");
desireCaps.setCapability("takesScreenshot", true);
String [] phantomJsArgs = {"--disable-web-security"};
desireCaps.setCapability(PhantomJSDriverService.PHANTOMJS_GHOSTDRIVER_CLI_ARGS, phantomJsArgs);
driver = new PhantomJSDriver(desireCaps);
//driver = new HtmlUnitDriver();
}
This is what the console prints out when I try to grab the Twitch page.
It seems you are trying to load the page with async XMLHttpRequest, but the server does not provide cross origin headers (Access-Control-Allow-Origin) in its response. Loading such resource with async XMLHttpRequest is discouraged for security reasons.
To bypass this limitation, add the flag --disable-web-security to phantomJsArgs.
Just another guess at what might be going on: PhantomJS still defaults to SSL 3.0 requests, but lots of websites have disabled SSL 3.0, so these requests will fail. To use more modern protocols, pass the following option to PhantomJS:
--ssl-protocol=any
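A sketch of how that flag could be wired in from Java, assuming the PhantomJSDriver bindings used in the question. Note that SSL options belong in PHANTOMJS_CLI_ARGS (arguments passed to the PhantomJS binary itself); PHANTOMJS_GHOSTDRIVER_CLI_ARGS only configures the embedded GhostDriver, so SSL flags set there have no effect:

```java
// Sketch: pass the SSL flags to the PhantomJS binary itself via
// PHANTOMJS_CLI_ARGS. The executable path below is illustrative.
DesiredCapabilities caps = new DesiredCapabilities();
caps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY,
        "C:\\path\\to\\phantomjs.exe");
caps.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS,
        new String[] {"--ssl-protocol=any", "--ignore-ssl-errors=yes", "--web-security=no"});
WebDriver driver = new PhantomJSDriver(caps);
```

This is a driver-configuration fragment, not a complete program; it assumes the phantomjsdriver and Selenium jars already on the classpath in the question.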
I'm working on an automated web test stack using Selenium, Java and testNG.
For authentication and safety reasons, I need to enrich the headers of the requests to the website I am accessing through Kubernetes.
For example, I can successfully use this CURL command in terminal to retrieve the page I want to access: curl -H 'Host: staging.myapp.com' -H 'X-Forwarded-Proto: https' http://nginx.myapp.svc.cluster.local.
So as you can see, I only need to add 2 headers for Host and X-Forwarded-Proto.
For a couple of days I have been trying to create a proxy that enriches headers in my @BeforeMethod method, but I'm still stuck; there are so many grey areas that I can't find a way to debug anything and understand what's wrong. For now, no matter what my code looks like, I keep getting a "No internet" (ERR_PROXY_CONNECTION_FAILED) error page in my driver when I launch it.
For example, one version of my code:
BrowserMobProxy browserMobProxy = new BrowserMobProxyServer();
browserMobProxy.setTrustAllServers(true);
browserMobProxy.addHeader("Host", "staging.myapp.com");
browserMobProxy.addHeader("X-Forwarded-Proto", "https");
browserMobProxy.start();
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.setProxy(ClientUtil.createSeleniumProxy(browserMobProxy));
driver = new ChromeDriver(chromeOptions);
driver.get("http://nginx.myapp.svc.cluster.local");
I tried several other code structures like:
defining browserMobProxyServer.addRequestFilter to add headers in requests
using only org.openqa.selenium.Proxy
setting up proxy with setHttpProxy("http://nginx.myapp.svc.cluster.local:8888");
But nothing works, I always get ERR_PROXY_CONNECTION_FAILED.
Anybody have any clue about that? Thanks!
OK so, after days of research, I found out 2 things:
Due to whatever configuration on my Mac, I need to force the host address (other people running the same code had no issue...):
proxy.setHttpProxy(Inet4Address.getLocalHost().getHostAddress() + ":" + browserMobProxyServer.getPort());
I have to manually alter headers via a response filter instead of using the .addHeader method:
browserMobProxyServer.addResponseFilter((response, content, messageInfo)->{
//Do something here related to response.headers()
});
I hope it will help some lost souls here.
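For anyone assembling these pieces, here is a sketch of the combined setup, assuming the BrowserMob Proxy API from the question (the host names and headers are the ones from the question, and the filter-based approach is used because addHeader() alone wasn't enough here):

```java
// Sketch: start a BrowserMob proxy that injects the two headers into
// every outgoing request, then point ChromeDriver at it using the
// explicit local host address.
BrowserMobProxyServer proxy = new BrowserMobProxyServer();
proxy.setTrustAllServers(true);
proxy.addRequestFilter((request, contents, messageInfo) -> {
    request.headers().set("Host", "staging.myapp.com");
    request.headers().set("X-Forwarded-Proto", "https");
    return null; // null = continue with the (modified) request
});
proxy.start();

Proxy seleniumProxy = new Proxy();
String hostAddress = Inet4Address.getLocalHost().getHostAddress();
seleniumProxy.setHttpProxy(hostAddress + ":" + proxy.getPort());

ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.setProxy(seleniumProxy);
WebDriver driver = new ChromeDriver(chromeOptions);
```

This is a configuration sketch, not a tested program; it assumes the net.lightbody.bmp and Selenium dependencies already present in the question's project.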
I need to download files using a headless web browser in Java. I checked HtmlUnit, where I was able to download a file in some simple cases, but I could not download when Ajax initialized the download (actually it is more complicated, as there are two requests: the first one downloads the URL, and the second one actually downloads the file from that URL). I have replaced HtmlUnit with Selenium and already checked two WebDrivers: HtmlUnitDriver and ChromeDriver.
HtmlUnitDriver - similar behaviour to HtmlUnit
ChromeDriver - I am able to download files in visible mode, but when I turn headless mode on, files are no longer downloaded
ChromeOptions lChromeOptions = new ChromeOptions();
HashMap<String, Object> lChromePrefs = new HashMap<String, Object>();
lChromePrefs.put("profile.default_content_settings.popups", 0);
lChromePrefs.put("download.default_directory", _PATH_TO_DOWNLOAD_DIR);
lChromeOptions.setExperimentalOption("prefs", lChromePrefs);
lChromeOptions.addArguments("--headless");
return new ChromeDriver(lChromeOptions);
I know that downloading files in headless mode is turned off for security reasons, but there must be some workaround.
I used HtmlUnit 2.28 before; a few minutes ago I started to work with 2.29, but it still seems that the Ajax function stops somewhere. This is the way I retrieve data after the click, expecting file data: _link.click().getWebResponse().getContentAsStream()
Does WebConnectionWrapper show all the requests/responses made on the website? Do you know how I can debug this to get better insight? I see that the first part of the Ajax function after the link is clicked is being properly called (there are 2 HTTP requests in this function). I even tried to create my own custom HTTP request to retrieve the data/file after the first response is fetched inside WebConnectionWrapper -> getResponse, but it returns a 404 error, which indicates that this second request was somehow made; yet I don't see any log/debug information in either _link.click().getWebResponse().getContentAsStream() or WebConnectionWrapper -> getResponse().
Regarding HtmlUnit, you can try this:
Calling click() on a DOM element is a sync call. This means it returns after the response of this call is retrieved and processed. Usually all the JS libs out there do some async magic (like starting some processing with setTimeout(..., 10)) for various (good) reasons. Your code has to be aware of this.
A better approach is to do something like this:
Page page = _link.click();
webClient.waitForBackgroundJavaScript(1000);
Sometimes the Ajax requests do a redirect to the new content. We have to address this new content by checking the current window:
page = page.getEnclosingWindow().getEnclosedPage();
Or maybe better:
In the case of downloads, the (binary) response might be opened in a new window:
WebWindow tmpWebWindow = webClient.getCurrentWindow();
tmpWebWindow = tmpWebWindow.getTopWindow();
page = tmpWebWindow.getEnclosedPage();
This might be the response you are looking for.
page.getWebResponse().getContentAsStream();
It's a bit tricky to guess what is going on with your web application. If you like, you can reach me via private mail or discuss this on the HtmlUnit user mailing list.
Can anyone explain to me how I would set cookies for a domain I have not visited, using a plugin with Selenium for geckodriver? I have been trying to set a cookie to avoid seeing a login page, but the domain for the cookie redirects, so I cannot set the cookie by visiting the domain, and I cannot figure out another way to do it.
I have tried this, but it looks as though I cannot specify this in Selenium, as I cannot visit the page.
Cookie cookie11 = new Cookie("SID",
"cookievalue",
".google.com",
"/",
expiry1,
false,
false);
I found a plugin called Cookies Export/Import, and I am trying to figure out whether it's possible to use it to import the cookies.
Any help would be appreciated!
If you wish to use the specified extension to do this, I recommend looking at the SO answer on How do you use a firefox plugin within a selenium webdriver program written in java?, and you should be good from there.
However, I believe you can achieve this without using an extension, via the addCookie() method.
WebDriver driver = new FirefoxDriver();
Cookie cookie = new Cookie("SID",
"cookievalue",
".example.com",
"/",
expiry1,
false,
false);
driver.manage().addCookie(cookie);
driver.get("http://www.example.com/login");
Assuming your cookie details are correct, you should be able to get past the login redirect.
See also:
WebDriver – How to Restore Cookies in New Browser Window
You cannot do that. See https://w3c.github.io/webdriver/webdriver-spec.html#add-cookie
I opened an issue about this with the spec: https://github.com/w3c/webdriver/issues/1238
You need to rebuild the browser without those validations if you want to get past this issue.
Here are the changes to make to Firefox (Marionette) to get past it:
https://gist.github.com/nddipiazza/1c8cc5ec8dd804f735f772c038483401
At the following website I try to access the login and password forms with HtmlUnit: https://zof.interreport.com/diveport#
However this very simple javascript returns an empty list [].
void homePage() throws Exception{
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
final HtmlPage page = webClient.getPage("https://zof.interreport.com/diveport#");
System.out.println(page.getForms());
}
So somehow HtmlUnit doesn't recognize the forms on the page. How can I fix this?
At first: you only show some Java code, but you talk about JavaScript - is there anything missing?
Regarding the form: the page you are trying to test is one of those pages that do some work on the client side. This means that after the page is loaded, the real page/DOM is created inside your browser by invoking JavaScript. When using HtmlUnit you have to take care of that. In simple cases it is sufficient to wait for the JavaScript to be processed.
This code works for me:
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
final HtmlPage page = webClient.getPage("https://zof.interreport.com/diveport#");
webClient.waitForBackgroundJavaScriptStartingBefore(5000);
System.out.println(page.getForms());
Take care to use the latest SNAPSHOT build of HtmlUnit.
I have not worked with that API, but here is the trick:
Open the same page in your browser with JavaScript disabled - the page does not work.
This means the page loads its content using some JavaScript DOM operations.
If you cannot get the HTML that way, there must be some way out in the API you are using.
Check the HtmlUnit API documentation (the class JavaDoc). There is a method:
public ScriptResult executeJavaScript(String sourceCode)
The key here is that the API you are using will not execute the JavaScript on its own, and you have to code for it.
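A minimal sketch of what that could look like for this page, combining executeJavaScript with a wait for the background scripts (the script body here is illustrative, not taken from the site):

```java
// Sketch: let the client-side rendering run, then query the page both
// from Java and via an explicitly executed JS snippet.
WebClient webClient = new WebClient(BrowserVersion.CHROME);
HtmlPage page = webClient.getPage("https://zof.interreport.com/diveport#");
webClient.waitForBackgroundJavaScriptStartingBefore(5000);

ScriptResult result = page.executeJavaScript("document.forms.length");
System.out.println(result.getJavaScriptResult()); // how many forms the JS created
System.out.println(page.getForms());
```

This is a sketch against the HtmlUnit API named in the answer, not a tested program.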
I'd first like to start by saying that I've managed this using PhantomJS and Selenium: I load PhantomJS, load the URL (sports.coral.co.uk), and then check my balance. I am, however, trying to find a more lightweight option.
I have tried manually sending HTTP GET/POST requests using Apache's HttpClient. Monitoring the login process with Postman for Chrome shows 4 requests sent once the login button has been pressed. I have tried editing and re-sending them using Postman. However, from what I can tell, there's a requestId that gets sent along with the requests, generated by the JavaScript on the page.
var requestId = (new Date().getTime()) + Math.round(Math.random() * 1000000);
var failedTimer = setTimeout('iapiRequestFailed(' + requestId + ')', iapiConf['loginDomainRetryInterval'] * 1000);
iapiRegisterRequestId(requestId, iapiCALLOUT_MESSAGES, failedTimer, request[3], request[4], request[5]);
return;
It looks like the original ID is a randomly generated number that then gets registered using another JavaScript function. I'm guessing the login partly fails because I cannot provide an acceptable requestId. When I re-send the old requests, the user is partly logged in; once I click on my account it says an error occurred. The only explanation would be the requestId.
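For reference, the client-side scheme in the snippet above (epoch milliseconds plus a random offset up to 1,000,000) is trivial to reproduce in Java, so generating an acceptable-looking value is unlikely to be the hard part; what can't be reproduced this way is the iapiRegisterRequestId bookkeeping on the page:

```java
// The requestId scheme from the page's JavaScript, reproduced in Java:
// current epoch milliseconds plus a random integer in [0, 1_000_000].
public final class RequestIds {
    public static long newRequestId() {
        return System.currentTimeMillis() + Math.round(Math.random() * 1_000_000);
    }

    public static void main(String[] args) {
        System.out.println(newRequestId());
    }
}
```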
I then decided to give HtmlUnit a go. This seems like the type of thing I require. I did some research on using HttpClient with a JavaScript engine, such as Rhino, and it seems HtmlUnit is the tool for that.
Before I even try to log in to the page, I get errors caused by the JavaScript on the page.
Here's the simple bit of code I use to connect to the page:
@Test
public void htmlunit() throws Exception {
LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
WebClient client = new WebClient(BrowserVersion.CHROME);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);
HtmlPage page = client.getPage("http://sports.coral.co.uk");
System.out.println(page.asText());
client.close();
}
When I comment out the LogFactory bit, I can see that loads of warnings are thrown:
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Feb 09, 2016 4:33:34 PM com.gargoylesoftware.htmlunit.html.HtmlScript isExecutionNeeded
WARNING: Script is not JavaScript (type: application/ld+json, language: ). Skipping execution. etc...
I'm guessing this means that HtmlUnit isn't compatible with the JavaScript that's being executed on the page?
I'm not very good with JavaScript, and the scripts on the page are obfuscated, which makes them even harder to read. What I don't understand is: why does the JS execute without error when using PhantomJS or ChromeDriver, but not HtmlUnit? Is it because the Rhino engine isn't good enough to execute it? Am I missing something obvious?
This code will turn off all the JavaScript warnings caused by the HtmlUnit library itself, not by your code.
LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
WebClient client = new WebClient(BrowserVersion.CHROME);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);
HtmlPage page = client.getPage("http://sports.coral.co.uk");