Reduce HTML using Applets

Reduce HTML using Applets - java

My supervisor has tasked me with programmatically reducing a website's content by looking at the HTML tags to reveal only the core content. Importantly, this particular piece of the project must be written in Java.
Now having learnt about the differences betweenPlugins, Extensions, Applets, and Widgets, I think I want to use an Extension that calls a client-side Applet. My approach was going to be this:
Using the Google-Chrome API, I was going to display a button that
the user can click.
If clicked, the action is to launch a new browser tab that has the
Applet embedded within it.
The applet automatically sources the called tab's HTML code and
filters it.
Once filtered, the reduced copy of the original site appears.
So I have a few questions. To start, is it even possible to use an Extension with an Applet? Moreover, is it possible for an applet to look # another tabs HTML code? If not, is it possible to just reload the original tab with the Applet now embedded within it and complete the function. Thanks.

Javascript is already on most mobile web platforms. Java is not, and there is no reasonable way mobile customers will be able to install Java. Android, which runs many, but not all, mobile devices has a Java run time environment, and is basically a loader for Java apps. But an Apple iPhone is not an Android device... nor is a Windows Phone.
If you want to summarize content on the client, and in Javascript, as I see it you have two choices:
Succeed with some inner burst of genius where dozens of the best expert PhDs in Natural Language Computing have just begun exploring how to extract "true meaning" from text; OR
look at document.title and be done with it.
The 2nd approach assumes that the authors of web pages set titles and set a title appropriate for summarizing their website. This isn't a perfect assumption, but it is OK
most of the time. It is also a lot less expensive than #1
With the 1st approach you can get a head start with a "natural language toolkit" that can do things like scan text for unusual words and phrases. To get a rough idea of the kinds of software that have been built in this area, review wikipedia: Outline of natural language processing:: toolkits. A popular tookit for python is called NLTK. Whether you use a toolkit from java, or python, it means working on the server because the client will not have the storage, network speed, or CPU. For python there are server side app frameworks like django or web2py that can make building out a server app faster, and on Java there are servlets frameworks. Ultimately you'll need a lot of help, training, or luck and as I have hinted above it can easily be beyond the capabilities of a small team of fresh hires, and certainly way beyond what a single new developer eager to prove his/her capabilities can do in a few weeks on their own with limited help.
Most web pages have titles set like this near the beginning of the downloaded HTML:
<head><title>My Furry Kittens!</title></head>
You don't need to write a parser. If you are running in the browser, the title has been parsed into the DOM or Document Object Model already. The string "My Furry Kittens!" in this example would be available in the global variable document.title.
If you like, you could put a button into a plugin and let people push it to summarize the website. Or, they could just look up at the title. It is already on the page. Of course, if the goal is to scrape titles one can avoid writing a parser and use a "fake" headless scriptable browser like phantomJS or similar.
You can read more about document.title on the Mozilla Developer Network. MDN is a great reference for learning how web browsers work. They are the maintainers of the Mozilla Firefox browser. Most of what you can learn there will also work on Chrome, Internet Explorer, and various mobile platforms.
Good Luck!

How about implementing a local proxy server on the mobile device. The browser would just need to be configured to use the proxy, while the custom proxy implementation can transform the requested html however it likes.

Related

Java Desktop app with webview UI

I want some way of creating a dedicated browser window for a browser (chrom-e/ium or firefox). Its content needs to be controlled by a java application (a http call to localhost or better a more direct way of communicating). These two should be bundled together in some way.
A little Background
I want to write a java desktop app but don't want to use Swing or javaFX for the UI. The UI should be written like a one page app and may be ported (at least partially) to the web. I have taken a look at the javafx WebView but would rather have a full fledged browser on my hands. It would also be nice to have a little more control over said browser to send files and read files in a more desktopish way. The only real requirement is that there has to be some java backend behind it and that is has to work offline.
Is something like this possible at all or is it just a pipe dream?

I am very almost a year late for the party, but:
There are a few (that I know) technologies that can help you:
Electron. It is basically what you want, you can use web
technologies to "forge" a desktop app, it's quite well known, I never used it but for what I have read that you can stick almost anything to it's "backend".
JavaFxWebView. There are some really nice ways to use it, you can
even use bootstrap and AngularJs, here is a example (not by me)

Yes it's possible and not all that unusual. Your app can open a default browser as described here -
https://stackoverflow.com/a/10967469/5087125
And then proceed to respond to http requests to your app.

how to extract HTML data from a webpage which scrolls down for a fixed number of times?

I want to extract HTML data from a website using JAVA. The problem is the webpage keeps scrolling down once the user reaches the bottom of the page. Number of times it scrolls down is fixed. My JAVA code can extract only for the 1st part. How do I extract for the remaining scrolls? Is there a way to load the whole page at once with JAVA? ANy help would be appreciated :)

This might be the type of thing that PhantomJS (http://phantomjs.org/) was designed for. It will crawl entire web pages and even execute JavaScript, using a "real" browser in headless mode. I suggest stopping what you're doing with Java and take a look at PhantomJS instead. It could save you a LOT of time. :)

This type of behavior is implemented in the browser, interpreting the user's scrolling actions to load more content via AJAX and dynamically modifying the in-memory DOM in the browser. Consider that your Java runs in a web container on the server, and that web container (i.e. Tomcat, JBoss, etc) provides a huge amount of underlying code so your app doesn't have to worry about the plumbing.
Conceptually, a similar thing occurs at the client, with the DHTML web page running in its own "container" (the browser), which provides a wealth of functionality, from UI to networking, to DOM, etc. If you remove the browser from the equation and replace it with a Java program, you will need to provide the equivalent of the browser in which the DHTML/Javascript can execute.
I believe that HTMLUnit may fill the bill, but have not worked with it personally.

Wondering about capabilities of titanium appcelerator?

We have a web application that uses (java/Java EE, Struts, Hibernate) running on Apache tomcat using MySQL as the DB. It has been up and running for quite a few years, so we have a very large pool of data (millions of row).
We need to convert this web app to a mobile application (cross platform, ios, Android), so we've decided to use the Titanium Appcelerator.
I have quite a few concerns before implementation:
I've heard that titanium gives you very good gui, but what about the functionality? What happens when a user clicks a buttons (sending/retrieving data from db)?
Can I use java to handle this??
I have seen examples of interacting through database, but approx all are using PHP as as a server side language, but nobody knows PHP here.
Though our team has some android exp(all sort of JSON, small client app), I am not sure whether it would be helpful.
Out goal is to convert a huge CRUD web app to a cross platform mobile app (I dont want to lose java on the server). Can Titanium appcelerator handle this?

See the App as something separate. It doesn't matter what is on the other end, as long as you get either JSON or XML (or something else if you prefer).
Titanium Appcelerator is a JavaScript tool that can handle (both build-in) JSON and XML.
To answer your questions:
1: Functionality is really good. It cannot be done by Java, but you'll get events (in JavaScript) which handles click/swipe/press/doubleclick etc. Events are always defined in the Documentation. In your case, the button. You can see what events it can handle there, and what properties you can set.
An example from the docs page adding a button, and having the click event.
var button = Titanium.UI.createButton({
title: 'Hello',
top: 10,
width: 100,
height: 50
});
button.addEventListener('click',function(e)
{
Titanium.API.info("You clicked the button");
});
2: Whatever server side language you use, as long as you export usable content (JSON/XML) it is useable by Titanium. It acts like a client. No need to worry there.
3: as answered above, you can do everything with it you want. On server side you only need to write an API which can handle everything.
I hope this will take away your concerns. If you need more help on other questions, just enter a new question on SO and I'll see them pass by.

As already stated by Topener, Titanium is able to handle your requirements. I'd like to point out something more fundamental:
We need to convert this web app to a mobile application (..), so we've decided to use the Titanium Appcelerator.
I'm somewhat surprised by this reasoning, kinda "We needed a car, so we decided to buy a Nissan." Why not a Ford, a Holden or a Porsche?
There are in fact well over 30 technologies claiming to be able to do cross-platform mobile development. I took a deep look at 16 of them during the course of last year for my master's thesis.
I'd suggest you have a look two other technologies as well. Why? You are converting a web app to a mobile app. Why not consider a framework that allows you to write your app's UI using web technologies? You might be able to port some of the existing UI-code, after all.
PhoneGap (free, now owned by Adobe): You implement the entire app in JavaScript, basically as a WebApp, but you get a native, installable binary that can be distributed using the AppStores. Easy to combine with a SenchaTouch HTML5-UI.
Rhodes (free, now owned by Motorola Systems): You implement the UI in HTML5 and the logic in Ruby. Rhodes provides a really good Object-Mapper and Sync capabilities. As you seem to have quite a bit of data to handle, this could provide a significant advantage over Titanium's SQLite Database. Learning the bits of ruby should not cost you more than a week or so.
If you definitely need a native UI, then the AQUA-Framework might be worth a look... but I havn't tested that one.

How does a browser interact with a Flash Player or a Java Applet?

I've been trying to understand how flash animations or a Java Applet work within a browser.
I can think of a couple of ways -
The Flash Player/Java Applet are machine code that's dynamically linked it, and given
some parameters about the area of the screen that belongs to them; after that, they
run within the same process space.
The browser exposes an API that the player/applet use to talk to it and they live
in a separate process. (Presumably they talk via sockets?) The API could correspond to
openGL/X11/some custom calls.
These possibilities still don't explain things like how a button click can make the
player full-screen, how it can play music, how it can inspect the DOM, etc. For that matter,
is the video displayed by decoding to a sequence of images, and rendering them
one at a time, or is there a more efficient way, e.g., of pushing the deltas in the image?
The Wikipedia page on Java Applets (1)
talks about how the applet is run in a sandbox (presumably a separate process), but
it doesn't say how the browser and the applet communicate.
Perhaps the answer depends on the underlying platform?
Any pointers to systematic discussion of this topic would be appreciated (as would
a reference to the APIs).
(My interest in this stems from an insatiable curiosity.)

I'm pretty sure plugins like Java applets and Flash run via NPAPI in most browsers. I looked into this matter myself some time ago and NPAPI was the answer I found.

In the case of browser and Java applets, the applets are typically run within the Java plugin, which runs as a separate process (you can see it e.g. in the task administrator in Windows).
The plugin creates an object for each applet in the DOM, and you can thus interact with the applet from Javascript. Anyway, calls to the applet that take a while to return do have the effect to freeze the browser, therefore I'd say the communication with the plugin runs in the same thread as the main refresh loop. This seems at least to be the case with Firefox.

Launching a website from within a program, and inputting data to specific fields

Although I've been programming for a few years I've only really dabbled in the web side of things, it's been more application based for computers up until now. I was wondering, in java for example, what library defined function or self defined function I would use to have a program launch a web browser to a certain site? Also as an extension to this how could I have it find a certain field in the website like a search box for instance (if it wasnt the current target of the cursor) and then populate it with a string and submit it to the server? (maybe this is a kind of find by ID scenario?!)
Also, is there a way to control whethere this is visible or not to the user. What I mean is, if I want to do something as a background task whilst the user carries on using the program, I will want the program to be submitting data to a webpage without the whole visual side of things that would interrupt the user?
This may be basic but like I say, I've never tried my hand at it so perhaps if someone could just provide some rough code outlines I'd really appreciate it.
Many thanks

I think Selenium might be what you are looking for.
Selenium allows you to start a Web browser, launch it to a certain website and interact with it. Also, there is a Java API (and a lot of other languages, by the way) allowing you to control the launched browser from a Java application.
There are some tweaking to do, but you can also launch Selenium in background, using a headless Web browser.

as i understand it you want to submit data to a server via the excisting webinterface?
in that case you need to find out how the URL for the request is build and then make a http-call using the corresponding URL
i advice reading this if it involves a POST submit

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.