Logging in to a website that uses complex JavaScript - Java

I'd first like to start by saying that I've managed this using PhantomJS and Selenium: I load PhantomJS, load the URL (sports.coral.co.uk), and then check my balance. I am, however, trying to find a more lightweight option.
I have tried manually sending HTTP GET/POST requests using Apache's HttpClient. Monitoring the login process with Postman for Chrome shows four requests sent once the login button has been pressed. I have tried editing and re-sending them using Postman. However, from what I can tell, there's a requestId that gets sent along with the requests. It is generated by the JavaScript on the page:
var requestId = (new Date().getTime()) + Math.round(Math.random() * 1000000);
var failedTimer = setTimeout('iapiRequestFailed(' + requestId + ')', iapiConf['loginDomainRetryInterval'] * 1000);
iapiRegisterRequestId(requestId, iapiCALLOUT_MESSAGES, failedTimer, request[3], request[4], request[5]);
return;
It looks like the original ID is a randomly generated number that then gets registered via another JavaScript function. I'm guessing the login is partly failing because I can't provide an acceptable requestId. When I re-send the old requests, the user is partially logged in, but once I click on My Account it says an error occurred. The only explanation I can see is the requestId.
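Reproducing the number itself would be trivial, for example (a sketch; whether the server accepts an id that was never registered through iapiRegisterRequestId is exactly what I don't know):

// Same scheme as the page's JS: epoch milliseconds plus a random offset.
// Assumption: the server validates only the id itself, not its registration.
long requestId = System.currentTimeMillis() + Math.round(Math.random() * 1000000);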
I then decided to give HtmlUnit a go, and it seems like the kind of tool I need. I did some research on using HttpClient with a JavaScript engine such as Rhino, and HtmlUnit appears to be the tool for that.
Before I even try to log in to the page, I get errors caused by the JavaScript on the page.
Here's the simple bit of code I use to connect to the page:
@Test
public void htmlunit() throws Exception {
    // Silence Commons Logging and the HtmlUnit/HttpClient JUL loggers
    LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
    java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
    java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

    WebClient client = new WebClient(BrowserVersion.CHROME);
    client.getOptions().setJavaScriptEnabled(true);
    client.getOptions().setThrowExceptionOnScriptError(false);
    client.getOptions().setThrowExceptionOnFailingStatusCode(false);

    HtmlPage page = client.getPage("http://sports.coral.co.uk");
    System.out.println(page.asText());
    client.close();
}
When I comment out the LogFactory bit, I can see that loads of warnings are thrown:
WARNING: Obsolete content type encountered: 'application/x-javascript'.
Feb 09, 2016 4:33:34 PM com.gargoylesoftware.htmlunit.html.HtmlScript isExecutionNeeded
WARNING: Script is not JavaScript (type: application/ld+json, language: ). Skipping execution. etc...
I'm guessing this means that HtmlUnit isn't compatible with the JavaScript that's being executed on the page?
I'm not very good with JavaScript, and the scripts on the page are obfuscated, which makes them even harder to read. What I don't understand is why the JS gets executed without error when using PhantomJS or ChromeDriver but not HtmlUnit. Is it because the Rhino engine isn't good enough to execute it? Am I missing something obvious?

This code will turn off all the JavaScript warnings that come from the HtmlUnit library itself rather than from your code:
LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
java.util.logging.Logger.getLogger("com.gargoylesoftware").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);

WebClient client = new WebClient(BrowserVersion.CHROME);
client.getOptions().setJavaScriptEnabled(true);
client.getOptions().setThrowExceptionOnScriptError(false);
client.getOptions().setThrowExceptionOnFailingStatusCode(false);
HtmlPage page = client.getPage("http://sports.coral.co.uk");

Related

How to download a file using headless (GUI-less) Selenium WebDriver

I need to download files using a headless web browser in Java. I checked HtmlUnit, where I was able to download files in some simple cases, but I was not able to download when Ajax initiated the download (actually it is more complicated, as there are two requests: the first one fetches the URL from which the second request actually downloads the file). I have replaced HtmlUnit with Selenium and have already checked two WebDrivers, HtmlUnitDriver and ChromeDriver.
HtmlUnitDriver - similar behaviour to HtmlUnit
ChromeDriver - I am able to download files in visible mode, but when I turn on headless mode, files are no longer downloaded:
ChromeOptions lChromeOptions = new ChromeOptions();
HashMap<String, Object> lChromePrefs = new HashMap<String, Object>();
// Disable the download pop-up and set a fixed download directory
lChromePrefs.put("profile.default_content_settings.popups", 0);
lChromePrefs.put("download.default_directory", _PATH_TO_DOWNLOAD_DIR);
lChromeOptions.setExperimentalOption("prefs", lChromePrefs);
lChromeOptions.addArguments("--headless");
return new ChromeDriver(lChromeOptions);
I know that downloading files in headless mode is turned off for security reasons, but there must be some workaround.
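Edit: one workaround I have seen referenced is that headless Chrome refuses downloads until they are explicitly allowed over the DevTools protocol. With Selenium 4's ChromeDriver that is a single call (a sketch; _PATH_TO_DOWNLOAD_DIR is the same placeholder as above):

// Allow downloads in headless mode and point them at a fixed directory
Map<String, Object> params = new HashMap<String, Object>();
params.put("behavior", "allow");
params.put("downloadPath", _PATH_TO_DOWNLOAD_DIR);
((ChromeDriver) driver).executeCdpCommand("Page.setDownloadBehavior", params);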
I used HtmlUnit 2.28 before; a few minutes ago I started to work with 2.29, but it still seems that the Ajax function stops somewhere. This is the way I retrieve data after the click when I expect file data: _link.click().getWebResponse().getContentAsStream()
Does WebConnectionWrapper show all the requests/responses that are made on the website? Do you know how I can debug this to get better insight? I can see that the first part of the Ajax function after the link is clicked is properly called (there are two HTTP requests in this function). I even tried to create my own HTTP request to retrieve the data/file after the first response is fetched inside WebConnectionWrapper -> getResponse(), but it returns a 404 error, which indicates the second request had somehow been made already. Yet I don't see any log/debug information in either _link.click().getWebResponse().getContentAsStream() or WebConnectionWrapper -> getResponse().
Regarding HtmlUnit, you can try this:
Calling click() on a DOM element is a synchronous call. This means it returns after the response of this call is retrieved and processed. But usually all the JS libs out there do some async magic (like deferring work with setTimeout(..., 10)) for various (good) reasons. Your code has to be aware of this.
A better approach is to do something like this:
Page page = _link.click();
webClient.waitForBackgroundJavaScript(1000);
Sometimes the Ajax requests redirect to new content. We have to address this new stuff by checking the current window's content:
page = page.getEnclosingWindow().getEnclosedPage();
Or maybe better: in the case of downloads, the (binary) response might be opened in a new window:
WebWindow tmpWebWindow = webClient.getCurrentWindow();
tmpWebWindow = tmpWebWindow.getTopWindow();
page = tmpWebWindow.getEnclosedPage();
This might be the response you are looking for:
page.getWebResponse().getContentAsStream();
It's a bit tricky to guess what is going on with your web application. If you like, you can reach me via private mail or discuss this on the HtmlUnit user mailing list.
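And regarding the WebConnectionWrapper question: yes, such a wrapper sees every request/response pair the WebClient makes, including the Ajax ones, which makes it a useful debugging tool. A rough sketch tying the pieces together (all classes are from the com.gargoylesoftware.htmlunit packages; the 1000 ms value is arbitrary):

// Log every request/response the client makes; constructing the wrapper
// installs it as the client's web connection.
new WebConnectionWrapper(webClient) {
    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        WebResponse response = super.getResponse(request);
        System.out.println(request.getHttpMethod() + " " + request.getUrl()
                + " -> " + response.getStatusCode());
        return response;
    }
};

Page page = _link.click();
webClient.waitForBackgroundJavaScript(1000); // let the async JS finish

// The (binary) response may have ended up in a new top window
Page result = webClient.getCurrentWindow().getTopWindow().getEnclosedPage();
InputStream data = result.getWebResponse().getContentAsStream();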

Testing URL Redirect Using Selenium

I want to test whether a website URL will redirect to the secured site or not. For example, if I type example.com in the address bar, it should redirect to https://example.com.
From Selenium, I tried using both get("") and navigate("") with no luck; it throws an exception about an invalid URL. How can I test this, or how else can I proceed?
Even JavaScript did not work.
It's very easy to achieve this using get() and getCurrentUrl(). You should type the actual URL, like www.example.com, instead of just example.com. Even though you type the URL without the www, the browser makes that change automatically, but Selenium does not, hence the exception. Try something like this:
driver.get("www.example.com");
// add a wait for the page to load completely
if (driver.getCurrentUrl().startsWith("https"))
    System.out.println("Success");
else
    System.out.println("Failure");

HtmlUnit can't find forms on website

At the following website I try to access the login and password forms with HtmlUnit: https://zof.interreport.com/diveport#
However, this very simple javascript returns an empty list [].
void homePage() throws Exception {
    final WebClient webClient = new WebClient(BrowserVersion.CHROME);
    final HtmlPage page = webClient.getPage("https://zof.interreport.com/diveport#");
    System.out.println(page.getForms());
}
So somehow HtmlUnit doesn't recognize the forms on the page. How can I fix this?
At first: you only show some Java code, but you talk about javascript - is there anything missing?
Regarding the form: the page you are trying to test is one of those pages that do a lot of work on the client side. This implies that after the page is loaded, the real page/DOM is created inside your browser by invoking JavaScript. When using HtmlUnit you have to take care of that. In simple cases it is sufficient to wait for the JavaScript to be processed.
This code works for me:
final WebClient webClient = new WebClient(BrowserVersion.CHROME);
final HtmlPage page = webClient.getPage("https://zof.interreport.com/diveport#");
webClient.waitForBackgroundJavaScriptStartingBefore(5000);
System.out.println(page.getForms());
Take care to use the latest SNAPSHOT build of HtmlUnit.
I have not worked with that API, but here is the trick:
Open the same page in your browser with JavaScript disabled - it does not work. This means the page loads its content using some JavaScript DOM operations.
If you cannot get the HTML that way, there must be some way out in the API you are using.
Check the HtmlUnit API documentation - the HtmlPage Javadoc. There is the method:
public ScriptResult executeJavaScript(String sourceCode)
The key here is that the API will not execute the JavaScript on its own; you have to code for it.
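For example, you can run a snippet yourself and then re-read the page, since the script may have replaced it (a sketch; the expression is only illustrative):

// Run an arbitrary snippet in the page's JS context
ScriptResult result = page.executeJavaScript("document.forms.length");
System.out.println(result.getJavaScriptResult());

// Re-fetch the page from its window in case the script replaced it
HtmlPage current = (HtmlPage) page.getEnclosingWindow().getEnclosedPage();
System.out.println(current.getForms());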

Using Jsoup to get an Element from a page

I want to log in to an HTTPS website using Jsoup and then make subsequent calls to 3-4 services to check whether a job is done or not.
public class JSOUPTester {
    public static void main(String[] args) {
        System.out.println("Inside the JSOUP testing method");
        String url = "https://someloginpage.com";
        try {
            Document doc = Jsoup.connect(url).get();
            String S = doc.getElementById("username").text();  // LINE 1
            String S1 = doc.getElementById("password").text(); // LINE 2
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
Exception:
java.lang.NullPointerException
JSOUPTester.main(JSOUPTester.java:7)
I have checked in Chrome that the page contains elements with the ids "username" and "password", yet the lines marked LINE 1 and LINE 2 above throw a NullPointerException. What am I doing wrong here?
A number of things can be the cause of this. Without the URL I can't be certain, but here are some clues:
Some pages load their content via Ajax. Jsoup can't deal with this, since it does not interpret any JavaScript. You can check for this by downloading the page with curl, or in a browser with JavaScript turned off. To deal with pages that use JavaScript to render themselves, you can use tools like Selenium WebDriver or HtmlUnit.
The web server of the page you are trying to load might require a cookie to be present. You need to look at the network traffic that happens while loading that page. In Chrome or Firefox you can see this in the developer tools, in the Network tab.
The web server might respond differently to different clients. That is why you may have to set the User-Agent string to a known browser in your Jsoup HTTP request:
Jsoup.connect("url").userAgent("Mozilla/5.0")
Jsoup has a size limit of 1 MB for the downloaded HTML source. You can turn this off or set it to a larger value if needed:
Jsoup.connect("url").maxBodySize(0)
Jsoup might time out on the request. To change the timeout behavior, use:
Jsoup.connect("url").timeout(milliseconds)
There might be other reasons I did not think of right now.
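Putting the usual suspects from this list together in a single request often resolves it. A sketch, reusing the login URL from the question (the follow-up path and header values are made up):

// Fetch the page with browser-like settings and keep the session cookies
Connection.Response response = Jsoup.connect("https://someloginpage.com")
        .userAgent("Mozilla/5.0")          // present a known browser
        .timeout(10000)                    // 10 s instead of the default
        .maxBodySize(0)                    // lift the 1 MB body cap
        .method(Connection.Method.GET)
        .execute();
Document doc = response.parse();

// Reuse the cookies for the subsequent service calls
Document status = Jsoup.connect("https://someloginpage.com/jobStatus") // hypothetical path
        .cookies(response.cookies())
        .get();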

SOAP web service calls from Javascript

I'm struggling to make a successful call to a SOAP web service from a web page. The web service is a Java web service that uses JAX-WS.
Here is the web method that I'm trying to call:
@WebMethod
public String sayHi(@WebParam(name = "name") String name)
{
    System.out.println("Hello " + name + "!");
    return "Hello " + name + "!";
}
I've tried doing the web service call using the jQuery plugin jqSOAPClient (http://plugins.jquery.com/project/jqSOAPClient).
Here is the code that I've used:
var processResponse = function(respObj)
{
alert("Response received: "+respObj);
};
SOAPClient.Proxy = url;
var body = new SOAPObject("sayHi");
body.ns = ns;
body.appendChild(new SOAPObject("name").val("Bernhard"));
var sr = new SOAPRequest(ns+"sayHi",body);
SOAPClient.SendRequest(sr,processResponse);
No response seems to come back. When I log the xData.responseXML data member in jqSOAPClient.js, I get 'undefined'. On the web service side I see the warning:
24 Mar 2011 10:49:51 AM com.sun.xml.ws.transport.http.server.WSHttpHandler handleExchange
WARNING: Cannot handle HTTP method: OPTIONS
I've also tried using a JavaScript library, soapclient.js (http://www.codeproject.com/kb/Ajax/JavaScriptSOAPClient.aspx). The client-side code that I use here is:
var processResponse = function(respObj)
{
alert("Response received: "+respObj);
};
var parameters = new SOAPClientParameters();
parameters.add("name", "Bernhard");
SOAPClient.invoke(url, "sayHi", parameters, true, processResponse);
I've bypassed the part in soapclient.js that fetches the WSDL, since it doesn't work (I get an IOException - "An established connection was aborted by the software in your host machine" - on the web service side). The WSDL is only retrieved to determine the appropriate namespace to use, so I've just replaced the variable ns with the actual namespace.
I get exactly the same warning on the web service side as before ("Cannot handle HTTP method: OPTIONS"), and in the browser's error console I get the error "document is null". When I log the value of req.responseXML in soapclient.js, I see that it is null.
Could anyone advise on what might be going wrong and what I should do to get this to work?
I found out what was going on here. It is the same scenario as in this thread: jQuery $.ajax(), $.post sending "OPTIONS" as REQUEST_METHOD in Firefox.
Basically, I'm using Firefox, and when one makes a cross-domain call (the domain of the web service is not the same as the domain of the web page) from Firefox using Ajax, Firefox first sends an OPTIONS HTTP message (before it transmits the POST message) to determine from the web service whether the call should be allowed. The web service must then respond to this OPTIONS message to say whether it allows the request to come through.
Now, the warning from JAX-WS ("Cannot handle HTTP method: OPTIONS") suggests that it won't handle any OPTIONS HTTP messages. That's OK - the web service will eventually run on Glassfish.
The question now is how I can configure Glassfish to respond to the OPTIONS message.
In the thread referenced above Juha says that he uses the following code in Django:
def send_data(request):
    if request.method == "OPTIONS":
        response = HttpResponse()
        response['Access-Control-Allow-Origin'] = '*'
        response['Access-Control-Allow-Methods'] = 'POST, GET, OPTIONS'
        response['Access-Control-Max-Age'] = 1000
        response['Access-Control-Allow-Headers'] = '*'
        return response
    if request.method == "POST":
        # ...
Access-Control-Allow-Origin gives a pattern indicating which origins (requesting sites) will be accepted (mine might be a bit stricter than simply allowing any origin), and Access-Control-Max-Age tells the client after how many seconds it has to ask for permission again.
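Translated to the Java side, the same response could come from a servlet filter (a sketch of the idea only; the filter name is made up, and it would still need to be registered in web.xml or via @WebFilter("/*")):

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: stamp the same CORS headers as the Django example on every
// response, and answer the OPTIONS preflight without calling the service.
public class CorsFilter implements Filter {
    public void init(FilterConfig config) {}
    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletResponse response = (HttpServletResponse) res;
        response.setHeader("Access-Control-Allow-Origin", "*");
        response.setHeader("Access-Control-Allow-Methods", "POST, GET, OPTIONS");
        response.setHeader("Access-Control-Max-Age", "1000");
        response.setHeader("Access-Control-Allow-Headers", "*");
        if (!"OPTIONS".equals(((HttpServletRequest) req).getMethod())) {
            chain.doFilter(req, res); // only non-preflight requests reach the service
        }
    }
}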
How do I do this in Glassfish?
Have you actually tested that the web service is working properly? You can use SoapUI to inspect the request/response, etc.
Once you confirm that the web service works from SoapUI, inspect the format of the raw SOAP message there. Then inspect how the message looks before it is sent by the .js method, and compare the two.
It might help you understand what is wrong.
Check if this helps:
http://bugs.jquery.com/attachment/ticket/6029/jquery-disable-firefox3-cross-domain-magic.patch
It's marked as invalid (http://bugs.jquery.com/ticket/6029), but it might give you some hint.
On the other hand, instead of overriding the proper settings for cross-domain scripting, it might be better to create and call a local page that does the request to the web service and returns the result.
Or, even better, you can create a page that receives a URL as a parameter, does the request to that URL, and just returns the result. That way it is more generic and reusable; see the sketch below.
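That generic page could be a small servlet along these lines (a sketch only: error handling is omitted, the "url" parameter name is made up, and in real use you would whitelist the allowed targets instead of proxying anything):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Sketch: same-origin proxy, so the browser never makes the cross-domain call
public class SoapProxyServlet extends HttpServlet {
    protected void doPost(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        URL target = new URL(req.getParameter("url"));
        HttpURLConnection conn = (HttpURLConnection) target.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", req.getContentType());
        String soapAction = req.getHeader("SOAPAction");
        if (soapAction != null) {
            conn.setRequestProperty("SOAPAction", soapAction);
        }

        copy(req.getInputStream(), conn.getOutputStream()); // forward the envelope
        resp.setContentType(conn.getContentType());
        copy(conn.getInputStream(), resp.getOutputStream()); // stream the reply back
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        for (int n; (n = in.read(buffer)) != -1; ) {
            out.write(buffer, 0, n);
        }
    }
}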
