Is it possible to teach HTMLUnit to ignore certain javascript scripts/files on a web page? Some of them are just out of my control (like jQuery) and I can't do anything with them. Warnings are annoying, for example:
[WARN] com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument:
getElementById(script1299254732492) did a getElementByName for Internet Explorer
Actually I'm using JSFUnit and HTMLUnit works under it.
If you want to avoid exceptions because of any JavaScript errors:
webClient.setThrowExceptionOnScriptError(false);
I have yet to find a direct way to do that, but I have found an effective workaround: extend FalsifyingWebConnection. Look at the example code below.
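import java.io.IOException;

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.WebRequest;
import com.gargoylesoftware.htmlunit.WebResponse;
import com.gargoylesoftware.htmlunit.util.FalsifyingWebConnection;

public class PinConnectionWrapper extends FalsifyingWebConnection {

    public PinConnectionWrapper(WebClient webClient)
            throws IllegalArgumentException {
        super(webClient);
    }

    @Override
    public WebResponse getResponse(WebRequest request) throws IOException {
        WebResponse res = super.getResponse(request);
        // Replace toolbar.js with an empty script so HtmlUnit has nothing to execute.
        if (res.getWebRequest().getUrl().toString().endsWith("/toolbar.js")) {
            return createWebResponse(res.getWebRequest(), "",
                    "application/javascript", 200, "Ok");
        }
        return res;
    }
}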
With the code above, whenever HtmlUnit requests toolbar.js, the wrapper returns a fake empty response instead. You can plug the wrapper class into HtmlUnit as shown below; its constructor installs it as the WebClient's connection.
final WebClient webClient = new WebClient(BrowserVersion.FIREFOX_3_6);
new PinConnectionWrapper(webClient);
Take a look at WebClient.setScriptPreProcessor. It will give you the opportunity to modify (or in your case, stop) a given script before it is executed.
Also, if it is just the warnings getting on your nerves I would suggest changing the log level.
If you are interested in ignoring all warning log entries, you can set the log level to ERROR for com.gargoylesoftware.htmlunit.javascript.host.html.HTMLDocument in the log4j.properties file (INFO would still let warnings through).
LogFactory.getFactory().setAttribute("org.apache.commons.logging.Log", "org.apache.commons.logging.impl.NoOpLog");
java.util.logging.Logger.getLogger("com.gargoylesoftware.htmlunit").setLevel(Level.OFF);
java.util.logging.Logger.getLogger("org.apache.commons.httpclient").setLevel(Level.OFF);
Insert this code before the WebClient is created.
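One caveat with the java.util.logging calls above: the JDK may hold named loggers only weakly, so a level set on a logger that nothing else references can be lost after garbage collection. A minimal sketch of a workaround, assuming HtmlUnit's output actually ends up in java.util.logging in your setup; QuietLoggers is a hypothetical helper name, not part of HtmlUnit:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper (not part of HtmlUnit): switches off java.util.logging
// loggers by name and keeps them strongly reachable.
final class QuietLoggers {
    // Strong references, so the configured loggers are not garbage collected.
    private static final List<Logger> SILENCED = new ArrayList<>();

    private QuietLoggers() {
    }

    // Sets every named logger to OFF; child loggers without their own level inherit it.
    static synchronized void silence(String... loggerNames) {
        for (String name : loggerNames) {
            Logger logger = Logger.getLogger(name);
            logger.setLevel(Level.OFF);
            SILENCED.add(logger);
        }
    }

    public static void main(String[] args) {
        silence("com.gargoylesoftware.htmlunit", "org.apache.commons.httpclient");
        // Discarded, because the parent logger is OFF.
        Logger.getLogger("com.gargoylesoftware.htmlunit.javascript").warning("not shown");
    }
}
```

Calling QuietLoggers.silence(...) once at startup is enough; the static list keeps the settings alive for the lifetime of the class.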
I am given a URL, and I need to fetch the HTML of that page and extract the site's links from it.
I thought about using a headless browser. I'm using Java, so I would like to do all of it within a Java process.
An example would be the CNN site.
So far I have tried:
testCompile 'net.sourceforge.htmlunit:htmlunit:2.32'
@Test
public void htmlUnitTest() throws Exception {
    try (final WebClient webClient = new WebClient(BrowserVersion.CHROME)) {
        webClient.waitForBackgroundJavaScriptStartingBefore(20000);
        webClient.getOptions().setThrowExceptionOnScriptError(false);
        final HtmlPage page = webClient.getPage(URL);
        WebResponse response = page.getWebResponse();
        String content = response.getContentAsString();
        List<HtmlAnchor> anchors = page.getAnchors();
        System.out.println("anchors.size() : " + anchors.size());
        System.out.println("***********");
        System.out.println(content);
        System.out.println("***********");
        try (BufferedWriter writer = new BufferedWriter(new FileWriter("htmlUnit.txt"))) {
            writer.write(content);
        }
    }
}
But the response I am getting is the original HTML without being rendered (in my case the JavaScript has not run and has not created the page anchors).
Can someone recommend another library, or tell me whether I am misusing HtmlUnit and suggest a working solution? It would be very helpful.
The waitForBackgroundJavaScriptXX methods are not options; you have to call them AFTER getPage(URL) or any other interaction like click().
One of the major differences between HtmlUnit and Selenium is the integration of all parts. In HtmlUnit the JavaScript engine is part of the implementation, which means the API is able to get information about the current status. As a result, the waitForBackgroundJavaScriptXX methods only wait if there is some JavaScript pending. If there is none, they are no-ops.
I have a problem.
I have a form in Play Framework, and to open a new one I use this link: link/projeto/novo
There are some fields that I need to save, and I validate them. If any field is blank, I return a "badRequest", like this:
public static Result grava() throws IOException {
    Long id;
    Http.Request request = request();
    Form<Projeto> projetoFormRequest = projetoForm.bindFromRequest();
    listaDeErros = new ArrayList<String>();
    Projeto projeto = projetoFormRequest.get();
    if (StringUtils.isEmpty(projeto.getNomeProjeto())) {
        listaDeErros.add(Messages.get("projeto.form.validacao.nomeProjetoObrigatorio"));
    }
    if (projeto.getTipoProjeto().getIdTipoProjeto() == null) {
        listaDeErros.add(Messages.get("projeto.form.validacao.tipoDeProjetoObrigatorio"));
    }
    ...
    if (listaDeErros.size() > 0) {
        return badRequest(cadastro.render(projetoForm, listaDeErros));
    }
    ...
My routes:
GET /projeto/novo controllers.ProjetoController.cadastroProjeto()
POST /projeto/grava controllers.ProjetoController.grava()
but the URL in my browser changes to link/projeto/grava
I'd like to keep the same URL, link/projeto/novo
How can I do this?
Thanks
As you are returning badRequest() in the grava action it stays there... badRequest is a Result NOT a redirect.
You can try to redirect to the cadastroProjeto in case of error, anyway you'll need to pass projetoForm somehow, maybe using cache?
TIP: It's a good habit to use English names for actions, models, views, etc., even if the routes are Portuguese ;)
I am coding in GWT 2.3 using Eclipse. While I have had coding experience, it has been limited to client-side. My current project involves creating a mapping program, which takes a list of points from an Excel sheet and places them on a predefined image. Now, I have my servlet and my client code connected, and I already have some idea how to read the Excel file.
My current problem: I get the following error when I load my application on Firefox using Development Mode:
Something other than an int was returned from JSNI method '#com.google.gwt.user.client.rpc.impl.ClientSerializationStreamReader::readInt()': JS value of type undefined, expected int
Development Mode's console doesn't give me any errors when I run, though it does show a [WARN] about two things I'm not using (images which I misnamed, but which are never loaded).
Currently, my code is as follows:
In my Floor.java client side code:
MyServiceAsync service = (MyServiceAsync) GWT.create(MyService.class);
AsyncCallback<String> callback = new AsyncCallback<String>() {
    public void onFailure(Throwable caught) {
        printerModel.setText("FAILED");
        String details = caught.getMessage();
        printerModel.setText(details);
    }

    @Override
    public void onSuccess(String result) {
        //I purposefully have this as an empty method so I could figure out the error
    }
};
service.readFile("PrinterList.xls", callback);
In my MyService.java:
public String readFile(String s);
In MyServiceImpl.java:
public String readFile(String s) {
    // TODO Auto-generated method stub
    try {
    } catch (Exception e) {
    }
    return "foo";
}
My AsyncCallback type is String, which seems to be causing the error. The method my client code calls returns a single String at this point, "fubar" (for simplicity). I thought that Strings were automatically serializable, but I am not sure. So, how do I get this error to go away? And how do I make the server code serialized?
What the exception says is basically this:
Client was trying to read an object from the data stream. Based on the signature of called method (or some other hint) the stream reader was expecting an int but found undefined instead.
As for the serializability of String, your assumption is correct. They are serializable without any effort on your part.
Without looking at the code and/or exception trace, it's difficult to say anything more.
EDIT:
Your code seems fine to me. Is there a chance that you are mixing GWT versions? That is, you compiled your GWT application with 2.3, but the server classpath contains an older GWT jar (or vice versa). Take a look at:
Project GWT version settings: Project -> Properties -> Google -> Web Toolkit. Which version of GWT is selected there?
Compare the GWT settings with Project -> Properties -> Java Build Path -> Libraries. How many GWT related jars do you see there? Which version? Are there more than one gwt-servlet-x.y.jar?
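The same check can be approximated at run time by listing the GWT jars that are on the classpath. This is only a rough sketch: GwtJarFinder is a made-up helper, it matches jars purely by the gwt- file name prefix, and it looks only at the java.class.path system property, so jars that a servlet container loads from WEB-INF/lib will not show up.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: lists classpath entries whose file name looks like a GWT jar.
final class GwtJarFinder {
    private GwtJarFinder() {
    }

    // Returns the file names (e.g. gwt-servlet-2.3.0.jar) of all entries in
    // classPath that start with "gwt-" and end with ".jar".
    static List<String> findGwtJars(String classPath, String pathSeparator) {
        List<String> jars = new ArrayList<>();
        for (String entry : classPath.split(Pattern.quote(pathSeparator))) {
            // Strip the directory part, accepting both / and \ as separators.
            int slash = Math.max(entry.lastIndexOf('/'), entry.lastIndexOf('\\'));
            String fileName = entry.substring(slash + 1);
            if (fileName.startsWith("gwt-") && fileName.endsWith(".jar")) {
                jars.add(fileName);
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        // More than one version of the same jar in this list suggests a mix-up.
        findGwtJars(System.getProperty("java.class.path"), File.pathSeparator)
                .forEach(System.out::println);
    }
}
```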
I'm just getting started with HTMLUnit and what I'm looking to do is take a webpage and extract out the raw text from it minus all the html markup.
Can htmlunit accomplish that? If so, how? Or is there another library I should be looking at?
for example if the page contains
<body><p>para1 test info</p><div><p>more stuff here</p></div>
I'd like it to output
para1 test info more stuff here
thanks
http://htmlunit.sourceforge.net/gettingStarted.html indicates that this is indeed possible.
@Test
public void homePage() throws Exception {
    final WebClient webClient = new WebClient();
    final HtmlPage page = webClient.getPage("http://htmlunit.sourceforge.net");
    assertEquals("HtmlUnit - Welcome to HtmlUnit", page.getTitleText());
    final String pageAsXml = page.asXml();
    assertTrue(pageAsXml.contains("<body class=\"composite\">"));
    final String pageAsText = page.asText();
    assertTrue(pageAsText.contains("Support for the HTTP and HTTPS protocols"));
}
NB: the page.asText() command seems to offer exactly what you are after.
See the Javadoc for asText (inherited by HtmlPage from DomNode).
My cell phone provider offers a limited number of free text messages on their website. I frequently use the service although I hate constantly having a tab open in my browser.
Does anyone know how, or can anyone point me in the right direction on how, I could create a jar file or command-line utility that fills out the appropriate forms on the site? I've always wanted to code up a project like this in Java, in case anyone asks why I'm not using something else.
Kind Regards,
Lar
Try with Webdriver from Google or Selenium.
Sounds like you need a framework designed for doing functional testing. These act as browsers and can navigate web sites for testing and automation. You don't need the testing functionality, but it would still serve your needs.
Try HtmlUnit, or LiFT, which is a higher-level abstraction built on HtmlUnit.
Use Watij with the Eclipse IDE. When you're done, compile as an .exe or run with a batch file.
Here is some sample code I wrote for filling in fields for a Google search, which can be adjusted for the web form you want to control :
package goog;

import junit.framework.TestCase;
import watij.runtime.ie.IE;
import static watij.finders.SymbolFactory.*;

public class GTestCases extends TestCase {

    private static watij.runtime.ie.IE activeIE_m;

    public static IE attachToIE(String url) throws Exception {
        if (activeIE_m == null) {
            activeIE_m = new IE();
            activeIE_m.start(url);
        } else {
            activeIE_m.goTo(url);
        }
        activeIE_m.bringToFront();
        return (activeIE_m);
    }

    public static String getActiveUrl() throws Exception {
        String currUrl = activeIE_m.url().toString();
        return currUrl;
    }

    public void testGoogleLogin() throws Exception {
        IE ie = attachToIE("http://google.com");
        if (ie.containsText("/Sign in/")) {
            ie.div(id, "guser").link(0).click();
            if (ie.containsText("Sign in with your") ||
                    ie.containsText("Sign in to iGoogle with your")) {
                ie.textField(name, "Email").set("test@gmail.com");
                ie.textField(name, "Passwd").set("test");
                if (ie.checkbox(name, "PersistentCookie").checked()) {
                    ie.checkbox(name, "PersistentCookie").click();
                }
                ie.button(name, "signIn").click();
            }
        }
        System.out.println("Login finished.");
    }

    public void testGoogleSearch() throws Exception {
        //IE ie = attachToIE( getActiveUrl() );
        IE ie = attachToIE("http://www.google.com/advanced_search?hl=en");
        ie.div(id, "opt-handle").click();
        ie.textField(name, "as_q").set("Watij");
        ie.selectList(name, "lr").select("English");
        ie.button(value, "Advanced Search").click();
        System.out.println("Search finished.");
    }

    public void testGoogleResult() throws Exception {
        IE ie = attachToIE(getActiveUrl());
        ie.link(href, "http://groups.google.com/group/watij").click();
        System.out.println("Followed link.");
    }
}
It depends on how they are sending the form information.
If they are using a simple GET request, all you need to do is fill in the appropriate url parameters.
Otherwise you will need to post the form information to the target page.
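For the GET case, filling in the URL parameters mostly comes down to URL-encoding each name and value. A minimal sketch using only the JDK (Java 10+ for this URLEncoder overload); the base URL and the parameter names to and message are invented for illustration and would have to be read from the real form:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds a GET URL from a base URL and form parameters.
final class QueryStringBuilder {
    private QueryStringBuilder() {
    }

    // Appends the URL-encoded parameters to baseUrl as a query string.
    static String withParams(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(URLEncoder.encode(p.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return params.isEmpty() ? baseUrl : baseUrl + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("to", "0851234567");               // placeholder field name and value
        params.put("message", "Running late & sorry"); // placeholder field name and value
        System.out.println(withParams("http://example.com/sms/send", params));
    }
}
```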
You could use Watij, which provides a Java/COM interface onto Internet Explorer. Then write a small amount of Java code to navigate the form, insert values and submit.
Alternatively, if it's simple, then check out HttpClient, which is a simple Java HTTP client API.
Whatever you do, watch out that you don't contravene your terms of service (easy during testing - perhaps you should work against a mock interface initially?)
WebTest is yet another webapp testing framework that may be easier to use than the alternatives cited by others.
Check out the Apache Commons Net package. With it you can send a POST request to a page. This is quite low level but may do what you want (if not, you might check out the functional testing suites, though they are probably not as easy to dig into).
As jjnguy says, you'll need to dissect the form to find out all the parameters.
With them you can form your own request using Apache's HTTP Client and fire it off.
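Once the form's parameters are known, the POST can be assembled by hand. The sketch below uses the JDK's built-in java.net.http API (Java 11+) rather than Apache's HTTP Client, purely to stay dependency-free. The class name, URL and field names are placeholders, and a real site will probably also need a login session or cookies.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper: turns form fields into a ready-to-send POST request.
final class FormPost {
    private FormPost() {
    }

    // Encodes the fields as an application/x-www-form-urlencoded body.
    static String encodeForm(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    // Builds (but does not send) a POST request carrying the form fields.
    static HttpRequest buildPost(String url, Map<String, String> fields) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(encodeForm(fields)))
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Building it separately makes it easy to inspect what would be posted before touching the real site.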