Embedded Browser in the background for crawling JAVA - java

I am looking for a web browser that can run in the background of my Java application. It should fetch all resources related to a URL, build the DOM, run the page's startup JS scripts, and so on. It should do everything a browser does, but it does not need a UI; what it does need is an API to control the rendered page, execute JS scripts on it, and so on.
It should support the latest HTML, CSS and JS implementations.
Is there something like that out there?

This is easy to do with a product like PhantomJS. The reason it is easy in PhantomJS and not in Java is that PhantomJS is a headless browser built on WebKit (via QtWebKit) and its JavaScriptCore engine. That is not Chrome's V8, but the effect is similar: PhantomJS is effectively an invisible browser. There's no similar support available in Java.
Java does support custom scripting engines (see Scripting Engine in Java). But that is just one part of the story; you also need to be able to load HTML, build the DOM, interpret CSS, etc.
So my suggestion is to call PhantomJS from your Java app. You can explore using JNI to control PhantomJS's behavior.
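A minimal sketch of calling PhantomJS from Java, using a plain child process rather than JNI. It assumes a phantomjs binary is on the PATH and a hypothetical script named render.js that prints whatever you want back to standard output; both names are placeholders, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class PhantomLauncher {

    // Builds the command line "phantomjs <script> <url>".
    // The script path and its single URL argument are assumptions for this sketch.
    static List<String> buildCommand(String phantomBinary, String scriptPath, String url) {
        List<String> command = new ArrayList<>();
        command.add(phantomBinary);
        command.add(scriptPath);
        command.add(url);
        return command;
    }

    // Starts the process, collects everything it prints, and waits for it to exit.
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();

        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }

        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Process exited with code " + exitCode);
        }
        return output.toString();
    }
}
```

Calling PhantomLauncher.run(PhantomLauncher.buildCommand("phantomjs", "render.js", url)) would then return whatever the script printed, for example the serialized DOM.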

Related

Automatically navigating a website with java

A few years ago I made a program in .NET that uses the webbrowser control. With that I was able to automatically log in to a website, navigate, and download pictures. It was GUI based since it was using the webbrowser control. It had the advantage that I could follow along and see if something went wrong.
What is the best way forward to replicate that idea in Java? Is there a similar free control that acts as a webbrowser and gives access to the DOM?
I suspect the optimal way would be to use the Google Chrome Developer tools to replicate the login via GET/POST methods, but at first would prefer the webbrowser approach.
You can use Selenium for that. It is a free (open source) automated testing suite for web applications across different browsers and platforms. It mainly focuses on automating web-based applications.
In Java, you can use Selenium, which gives you full control over web browsers as well as the DOM.
In Selenium, WebDriver is the interface that provides full automated control of the browser you want to use.
This may help you!
Thanks!
You could use the JavaFX WebView, class javafx.scene.web.WebView.
It uses a WebKit engine that is HTML5 compliant and seems to be up to date (it was in Java 8 and 9).
The engine can interact with the page's JS engine, which may help you introspect and navigate the page.
Example to get the "window" JS object:
JSObject window = (JSObject) webView.getEngine().executeScript("window");
WebView example:
JavaFx Webview HTML5 DragAndDrop

Java webdriver for angular application not protractor?

I will be responsible for test automation of an Angular application. I know that the Protractor tool exists, but I prefer WebDriver with Java (I feel more comfortable with it than with JavaScript and Protractor). May I use Java with WebDriver, or must I use Protractor because Selenium will not handle it?
Of course, you can still use the regular Java selenium bindings to test AngularJS applications. It's just that Protractor is simply more suitable/convenient to use for specifically AngularJS applications because of the several unique things it provides:
it works in sync with Angular - it always knows when Angular is "ready" to be interacted with
it provides Angular specific locators like by.model, by.binding, by.repeater etc
it allows you to easily mock AngularJS modules on the fly
it is developed and supported by Google developers (and of course the github community) - meaning it is in sort of a sync with Angular development cycle
it has a very nice and documented API
and many more
It's also important to understand that Protractor is actually a wrapper around the WebDriverJS - JavaScript selenium bindings. And, as a side note, Protractor can also be used to test non-angular apps (just turn the sync off).
There is also ngWebDriver package that might actually be your solution:
We have taken JavaScript from Angular's Protractor project. While
ngWebDriver perfectly complements the Java version of WebDriver, it
has to pass JavaScript up to the browser to interoperate with Angular,
and the Protractor project has done the hard work (including testing)
to make that solid, and ngWebDriver benefits from that work.
Also see:
Use protractor with Java
how to implement protractor JavaScript API in Java to use in existing Selenium Java Frameworks

How can I use HTML & CSS to create the user interface for my java app?

I have just read this article: http://docs.oracle.com/javafx/2/webview/jfxpub-webview.htm It shows how to build a java app using the javafx library and also how to use some classes such as WebEngine and WebView to display a web page in the app, basically turning it into a browser.
Here is some relevant info from the article:
The embedded browser component is based on WebKit, an open source web
browser engine. It supports Cascading Style Sheets (CSS), JavaScript,
Document Object Model (DOM), and HTML5.
The embedded browser enables you to perform the following tasks in
your JavaFX applications:
Render HTML content from local and remote URLs
Obtain Web history
Execute JavaScript commands
Perform upcalls from JavaScript to JavaFX
Manage web pop-up windows
Apply effects to the embedded browser
I would basically like to entirely dispense with Java or JavaFX GUI tools, except for those required to display HTML and CSS, as described in the article, and build the entire user interface for my app in HTML and CSS. I would like various HTML buttons to cause events to transpire in my java code.
Does this seem like a good idea? And since it does seem like a good idea to me, I'm also wondering why would anyone ever use any other method to build a GUI in java.
I'm creating an entire desktop application with a single WebView; it is available on GitHub. Basically, its UI is a single HTML file which links a dozen JS files. To call Java from JS I wrap my requests in JSON and call a Java facade bean. It is also possible to call JS from Java in the same fashion. Though it is possible to call Java directly by invoking a method on a Java bean with parameters of any type, I had a few application crashes, after which I decided to make it completely safe and stay with JSON. This app uses AngularJS and Twitter Bootstrap to render pages.
I had created a ticket in Oracle's JIRA for better Java integration (JSR-223) inside a WebView and their answer was it could be scheduled for Java 9.
The development is pretty fast, when the process is set up - it's hard to debug the app in the beginning because there is no debugger. Some top-level JS exceptions are not being caught as well. At the moment I'm having no issues with WebView in JavaFX 8. JavaFX 7 is unusable for me because of the problem with fonts.
Answering your last question - I have no idea, but the situation is completely the opposite. For some reason Oracle puts resources for JavaFX native components development, but not for better WebView integration.
You can use an MVC framework such as Struts2 or Spring MVC together with AJAX and build the user interface (the view component) completely with HTML/CSS. Sometimes, using a template engine such as FreeMarker also helps replace default Java rendering with pure HTML/CSS solutions.
Vaadin
Vaadin is a sophisticated servlet-based framework that creates server-side based apps that run your pure Java code while automatically rendering the user interface on the client side in HTML, CSS, and JavaScript.
You don't need to know about or do any programming in those web client technologies; Vaadin does all the web-related work for you. No web templates, no pages, none of that. Your Java code simply creates labels, fields, buttons, and layouts. Vaadin transforms those on the fly to be rendered in the browser. Pure Java on the server, no Java at all on the client/browser side.
Xojo
Vaadin is not "yet another web app framework". It really has no direct competitors in terms of architecture except the non-Java Xojo Web Edition which uses its own proprietary OOP language.
It looks like a very good idea.
An alternative is to use Jetty or a similar open source server, but then you have much more work to do to adapt it to your application.
Another alternative is to build a Java EE application, but that is not very agile for a simple web view of your app. Java EE lets you build dynamic web pages with JavaServer Pages and handle user requests with Servlets, but you have to conform to its structure, which is not worthwhile for a simple application like yours.

Chrome Browser and Java Message Passing

I have developed a Chrome extension and it captures some data in a webpage.
My ultimate goal is to pass this final result to my Java Application.
I have following few options in my mind, but I was not able to find any resources for them yet.
Access the localStorage externally.
Run Chrome browser through the Java app, So I guess we have the control of its data.
If no API is found, write the result to a file and access it from the Java App.
Is there any API to achieve any of the first 2 options? Or any other interface other than the file system?
I checked with berkelium and The Chromium Embedded Framework. But they are just chrome wrappers, and we cannot run a chrome instance from it.
Edit
For the 2nd option I tried with Selenium Webdriver, but I think it hasn't any method to access the localStorage.
It sounds like you are looking for Native Messaging, which allows communication between a Chrome Extension and a native application (e.g. a Java Desktop Application).
There are plenty of questions here on SO regarding the implementation of Native Messaging, and there is also the "official" example.
I suggest the above solution, but if your application will interact heavily with the extension (and you feel like reverse engineering), there is the open-source NetBeans Connector Chrome Extension, which uses a different approach (Sockets or WebSockets - I am not sure).
Take a look at this answer for info on how to get at the sources.
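On the Java side, Native Messaging comes down to a simple framing rule: each message is UTF-8 encoded JSON preceded by a 32-bit length in native byte order, exchanged over the host's standard input and output. A minimal sketch of that framing follows; the class and method names are made up for illustration, and the JSON is treated as an opaque string rather than parsed.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class NativeMessaging {

    // Reads one length-prefixed message and returns its JSON text.
    // Throws EOFException (an IOException) if the stream ends mid-message.
    static String readMessage(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        byte[] header = new byte[4];
        data.readFully(header);
        int length = ByteBuffer.wrap(header).order(ByteOrder.nativeOrder()).getInt();
        if (length < 0) {
            throw new IOException("Invalid message length: " + length);
        }
        byte[] body = new byte[length];
        data.readFully(body);
        return new String(body, StandardCharsets.UTF_8);
    }

    // Writes one message: 4-byte length in native byte order, then the UTF-8 JSON.
    static void writeMessage(OutputStream out, String json) throws IOException {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        byte[] header = ByteBuffer.allocate(4)
                .order(ByteOrder.nativeOrder())
                .putInt(body.length)
                .array();
        out.write(header);
        out.write(body);
        out.flush();
    }
}
```

In a real host, readMessage would be called on System.in and writeMessage on System.out, and nothing else should be printed to standard output because it would corrupt the framing.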
Is there a limitation preventing you from exposing a REST API with your java application?
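If nothing prevents it, the Java application could expose a small local HTTP endpoint using the JDK's built-in com.sun.net.httpserver package. The sketch below is only one possible shape: the /result path, the class name, and the idea that the extension POSTs its data as the request body are all assumptions, and the extension would need permission to reach that local address.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

final class ResultReceiver {

    private final HttpServer server;
    private final AtomicReference<String> lastResult = new AtomicReference<>();

    // Binds to the loopback interface only; port 0 asks the OS for a free port.
    ResultReceiver(int port) throws IOException {
        server = HttpServer.create(new InetSocketAddress("127.0.0.1", port), 0);
        server.createContext("/result", exchange -> {
            try {
                if (!"POST".equals(exchange.getRequestMethod())) {
                    exchange.sendResponseHeaders(405, -1); // method not allowed, no body
                    return;
                }
                try (InputStream body = exchange.getRequestBody()) {
                    lastResult.set(new String(body.readAllBytes(), StandardCharsets.UTF_8));
                }
                exchange.sendResponseHeaders(204, -1); // accepted, no response body
            } finally {
                exchange.close();
            }
        });
    }

    void start() { server.start(); }

    void stop() { server.stop(0); }

    // The port actually bound, useful when the constructor was given 0.
    int port() { return server.getAddress().getPort(); }

    // The most recently posted body, or null if nothing has arrived yet.
    String lastResult() { return lastResult.get(); }
}
```

The application would create the receiver once at startup, call start(), and read lastResult() (or replace the AtomicReference with a queue) whenever the extension has posted new data.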

Developing traditional style webapps with GWT

In many JVM web framework surveys and Indeed.com trend graphs, GWT appears to be the most popular JVM framework (or among the most popular).
But AFAIK, GWT excels when the application is, to some degree at least, a one-page app (like GMail, Google Reader ...).
Does this mean that newly developed Java webapps are mainly one-page apps (not the traditional request / new-page response style)?
Is it possible to develop SpringMVC or Struts2 style webapps with GWT, or is it not recommended?
Absolutely, it is definitely possible to build Struts apps with GWT; at some level, all GWT is, is a convenient way of writing JavaScript (in my Perl days, we called it 'syntactic sugar').
You can still use GWT-RPC, JSON, or HTML forms to communicate with the server.
You can attach to any arbitrary HTML element using RootPanel.get("id"); to add javascript widgets.
Almost every javascripty component on my employer's website is written in GWT: http://www.cohomefinder.com/ . The backend is an old fork of struts.
And here's a nice, though a bit dated, article about building 'normal' websites in GWT: http://www.canoo.com/blog/2007/03/13/building-a-regular-website-with-the-google-web-toolkit/
I'm not sure, however, that this use of GWT is where it really shines. One downside is a lot of javascript parsing on every page load. Code splitting can help with that, some.
