Download web page using Java after javascript executes - java

I have a problem that doesn't seem to be answered clearly on Stack Overflow. I want to download a page using Java and retrieve some data from it in order to feed some values into an application I'm developing. The page belongs to a betting site, so it contains JavaScript that updates the betting values.
To run some tests, I downloaded the page manually with Ctrl-S and then wrote a program (using FileReader, BufferedReader, etc.) that retrieves the data. This worked perfectly, so one option would be a bash script that downloads the page every time the user opens my application.
After that I looked for ways to download the page programmatically (I tried Jsoup, URL, ...). What I noticed is that the JavaScript variable values couldn't be printed, because the JavaScript code was never executed.
What I want to know is whether there is a way to programmatically download the page after its JavaScript has executed (i.e. capture the computed values) without relying on a bash script that has to run every time before someone opens my app.

Try HtmlUnit. It is mainly used for automated testing, but it should fit your purpose well too!
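For example, a minimal sketch of fetching the page after its scripts have run might look like this (the URL is a placeholder, and the imports assume the classic HtmlUnit 2.x package name; newer releases moved to org.htmlunit):

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class RenderedPageFetcher {
    public static void main(String[] args) throws Exception {
        try (WebClient client = new WebClient()) {
            client.getOptions().setJavaScriptEnabled(true);
            client.getOptions().setThrowExceptionOnScriptError(false);
            // Load the page, then give its scripts time to finish updating the DOM.
            HtmlPage page = client.getPage("https://example.com/odds"); // placeholder URL
            client.waitForBackgroundJavaScript(5000);
            // asXml() serializes the DOM *after* the JavaScript has run,
            // so the computed betting values appear in the output.
            System.out.println(page.asXml());
        }
    }
}

How long waitForBackgroundJavaScript needs to wait depends on the site's AJAX calls, so treat the 5000 ms as a starting point.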

Related

Automate web form filling to export reports

I have a daily task of downloading a report (an Excel file). Before I click the download button, certain fields have to be filled in and some checkboxes need to be checked. On clicking an executable file on my desktop, the whole process should happen in one go. I'm looking for an open-source solution, in either JavaScript or jQuery, to automate this download.
Given the information you shared, I'm afraid you'll have to write an application in a desktop programming language. It doesn't really matter which language you use, and I'm not going to make suggestions since I don't know your situation.
The fun part will be determining what the browser does. The browser basically sends an HTTP request, either a POST or a GET (I don't know which, since I don't know the page). Open the developer tools in your browser, check what it sends, how it sends it, and where it sends it to, then recreate that request in your language of choice. You then read the response and turn it into an Excel file.
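A rough Java sketch of that idea, assuming the report is exported via a form POST (the endpoint, field names and output file below are made up; copy the real ones from the request shown in the developer tools):

import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class ReportDownloader {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://example.com/reports/export"); // hypothetical endpoint
        String form = "from=2024-01-01&to=2024-01-31&includeDetails=on"; // hypothetical fields

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(form.getBytes(StandardCharsets.UTF_8));
        }

        // The response body is the file the site would normally download, so save it as-is.
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, Paths.get("report.xlsx"), StandardCopyOption.REPLACE_EXISTING);
        }
        System.out.println("HTTP " + conn.getResponseCode() + " -> report.xlsx");
    }
}

If the site requires a login, you would also have to replay the session cookie captured in the developer tools, which this sketch leaves out.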
Now, this is quite challenging and I don't know how experienced you are. If you are willing to give up the desktop-executable requirement, you can write a Greasemonkey script (in Firefox) or a Chrome extension (for Chrome, duh) to automate this from the browser. Both use JavaScript and are arguably easier to create, since you don't need to recreate the HTTP request or reverse engineer what the browser does.

how to extract HTML data from a webpage which scrolls down for a fixed number of times?

I want to extract HTML data from a website using Java. The problem is that the webpage keeps loading more content each time the user scrolls to the bottom of the page, and the number of times it does this is fixed. My Java code can only extract the first part. How do I extract the content from the remaining scrolls? Is there a way to load the whole page at once with Java? Any help would be appreciated :)
This might be the type of thing that PhantomJS (http://phantomjs.org/) was designed for. It will crawl entire web pages and even execute JavaScript, using a "real" browser in headless mode. I suggest pausing what you're doing with Java and taking a look at PhantomJS instead. It could save you a LOT of time. :)
This type of behavior is implemented in the browser, which interprets the user's scrolling actions, loads more content via AJAX, and dynamically modifies the in-memory DOM. Consider that your Java runs in a web container on the server, and that web container (e.g. Tomcat, JBoss, etc.) provides a huge amount of underlying code so your app doesn't have to worry about the plumbing.
Conceptually, a similar thing occurs at the client, with the DHTML web page running in its own "container" (the browser), which provides a wealth of functionality, from UI to networking, to DOM, etc. If you remove the browser from the equation and replace it with a Java program, you will need to provide the equivalent of the browser in which the DHTML/JavaScript can execute.
I believe that HtmlUnit may fill the bill, but I haven't worked with it personally.
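If you do try HtmlUnit for this, a sketch might look like the following; the URL and the scroll count are placeholders, and whether a programmatic scroll actually fires the page's load-more handler depends on how that handler is wired up, so treat this as an assumption about the page:

import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlPage;

public class ScrollingScraper {
    public static void main(String[] args) throws Exception {
        try (WebClient client = new WebClient()) {
            client.getOptions().setThrowExceptionOnScriptError(false);
            HtmlPage page = client.getPage("https://example.com/feed"); // placeholder URL
            int scrolls = 5; // the fixed number of times the page loads more content
            for (int i = 0; i < scrolls; i++) {
                // Fire the same scroll the browser would, then wait for the AJAX calls to finish.
                page.executeJavaScript("window.scrollTo(0, document.body.scrollHeight);");
                client.waitForBackgroundJavaScript(3000);
            }
            System.out.println(page.asXml()); // DOM including the lazily loaded items
        }
    }
}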

Launching a website from within a program, and inputting data to specific fields

Although I've been programming for a few years, I've only really dabbled in the web side of things; it's been mostly desktop applications up until now. I was wondering, in Java for example, what library function (or self-written function) I would use to have a program launch a web browser at a certain site. Also, as an extension to this, how could I have it find a certain field on the website, like a search box for instance (if it isn't the current target of the cursor), and then populate it with a string and submit it to the server? (Maybe this is a kind of find-by-ID scenario?!)
Also, is there a way to control whether this is visible to the user or not? What I mean is, if I want to do something as a background task while the user carries on using the program, I want the program to be submitting data to a webpage without the whole visual side of things interrupting the user.
This may be basic but like I say, I've never tried my hand at it so perhaps if someone could just provide some rough code outlines I'd really appreciate it.
Many thanks
I think Selenium might be what you are looking for.
Selenium allows you to start a web browser, point it at a certain website, and interact with it. There is also a Java API (along with APIs for a lot of other languages, by the way) that lets you control the launched browser from a Java application.
It takes some tweaking, but you can also run Selenium in the background, using a headless web browser.
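A minimal sketch with the Selenium WebDriver Java API (this assumes a ChromeDriver setup; the site URL and the field id are placeholders):

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

public class SearchSubmitter {
    public static void main(String[] args) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless"); // run without a visible browser window
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get("https://example.com"); // placeholder site
            // "Find by id" is exactly the scenario from the question.
            WebElement box = driver.findElement(By.id("search")); // placeholder field id
            box.sendKeys("my query");
            box.submit(); // submits the form that contains the field
        } finally {
            driver.quit();
        }
    }
}

Dropping the --headless argument makes the browser window visible, which covers the second part of the question.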
As I understand it, you want to submit data to the server via the existing web interface?
In that case you need to find out how the URL for the request is built, and then make an HTTP call to the corresponding URL.
I advise reading this if it involves a POST submit.
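For the simpler case where the form submits via GET, building the request URL by hand can be enough. A sketch with a made-up endpoint and field names:

import java.io.InputStream;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FormGet {
    public static void main(String[] args) throws Exception {
        // The parameter names come from the form's HTML (or the developer tools).
        String query = "q=" + URLEncoder.encode("hello world", "UTF-8")
                     + "&lang=" + URLEncoder.encode("en", "UTF-8");
        URL url = new URL("https://example.com/search?" + query); // placeholder endpoint
        try (InputStream in = url.openStream()) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}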

Java: "Control" External Application

Is it possible to programmatically start an application from Java and then send commands to it and receive the program's output?
I'm trying to realize this scenario:
I want to access a website that uses lots of JavaScript and special HTML + CSS features, so the website isn't displayed properly in swt.browser or any of the other available browser widgets. But the website can be displayed without any problems in Firefox. So I want to run a hidden instance of Firefox, load the website and get the data. (It would be nice if Firefox could be embedded in a JFrame or similar.)
Has anybody got an idea how to realize this?
Any help would really be appreciated!
EDIT: The website loads some JavaScript that does some HTML magic and loads some pictures. When I only read the HTML from the website, I see nothing more than some JavaScript calls. But when the website is loaded in a browser, it displays some images overlaid with text. That's what I'm trying to show the user of my app.
To start Firefox from within the application, you could use:
Runtime runtime = Runtime.getRuntime();
try {
    String path = "/path/to/firefox";
    // Pass the command as an array so the URL is not split on whitespace
    // (the single-string form of exec() tokenizes its argument).
    Process process = runtime.exec(new String[] { path, url });
} catch (IOException e) {
    // ...
}
To manipulate processes once they have started, one can often use process.getInputStream() and process.getOutputStream(), but that would not help you in the case of Firefox.
You should probably look into ways of solving your specific problem other than trying to interact directly between your application and a browser instance. Consider either moving the whole interface into a Java GUI, or doing a web app from the ground up -- not half and half.
See this article - it will teach you how to start a process, read its output and write to its input stream.
However, this solution may not be the best for your problem. What kind of data do you need to get from the web page? Would it be better to read the HTML with an HTTP GET and then parse it with an HTML parser?
If you have a text-mode browser available (like links2 on Linux), you might want to see how well it can render the page. For example, the command "links -dump http://someurl.com" will format the page as text and exit immediately, producing output that might be easily parseable using the methods that Ray Myers and kgiannakakis suggest.
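Combining that with the process-control approach above, a small sketch that runs the text-mode browser and captures its output (this assumes links is installed and on the PATH):

import java.io.BufferedReader;
import java.io.InputStreamReader;

public class TextBrowserDump {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder("links", "-dump", "http://someurl.com");
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process process = pb.start();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // the page rendered as plain text
            }
        }
        process.waitFor();
    }
}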
If the website is static, you could use a web scraper like Jericho to load the URL, parse the HTML and wander your way through the DOM to the info you need.
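As a sketch of that static-page approach, here is roughly what it looks like with Jsoup rather than Jericho (the URL and CSS selector are placeholders):

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class StaticScrape {
    public static void main(String[] args) throws Exception {
        // Neither Jericho nor Jsoup executes JavaScript, so this only helps
        // if the text and image URLs are already present in the raw HTML.
        Document doc = Jsoup.connect("https://example.com/page").get(); // placeholder URL
        String caption = doc.select("div.caption").text(); // placeholder selector
        System.out.println(caption);
    }
}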
Although a feature similar to what you describe is planned for Firefox in the future, it is not available yet. The feature is dubbed TaskFox, and from the linked wiki, "its aim is to allow users to quickly access information and perform tasks that would normally take several steps to complete."
News of the upcoming TaskFox feature broke just today, in fact. Perhaps you should consider a career as a psychic instead of a programmer.

How to programmatically send input to a Java app running in a browser window?

Consider the most excellent wordle tag cloud generator:
http://www.wordle.net/create
Entering text into the "textform" textarea and clicking the go button starts up the Wordle Java applet on that page. No traffic goes back to the server.
How can I cause this to happen programmatically? No hack too cheap!!
background for this question:
"tag cloud" generators?
If you mean starting it programmatically from a browser page, you can use the same type of JavaScript that that page uses, which calls the function Wordle.t() to start the applet.
If you want to call it from a Java program, you can download the Wordle.class or jar file yourself, and call the functions directly.
I'm the creator of Wordle.
In case anyone finds this page in the future, I thought it would be useful to explain that Wordle invokes its applet by constructing an applet tag with a huge <param> containing a sanitized version of whatever text you pasted in. It is the cheapest hack imaginable.
