Fetching the entire web page from a specific URL using Java

Can I fetch the entire web page, including CSS and images, using Java? That is basically what happens when you use the "Save as" action in a browser. I can use any free third-party library.
edit:
The HtmlUnit library seems to do exactly what I need. This is how I use it to grab the entire web page:
WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage(new URL("..."));
page.save(new File("..."));   // saves the page together with its images and CSS
webClient.close();            // free the engine's resources

Java has some built-in classes you can use to open a stream to an external source, say a web server, and request a page, which returns the page's source to you. You would then need to parse out the links to external images and CSS, request those, and save them accordingly.
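As a rough illustration of that approach, here is a minimal sketch using java.net.URL; the target URL and class name are placeholders:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;

public class PageFetcher {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://example.com/");  // hypothetical target
        StringBuilder html = new StringBuilder();
        // openStream() issues the HTTP request and returns the response body
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        // html now holds the raw source; the <img> and <link> references
        // still have to be parsed out and fetched separately
        System.out.println(html);
    }
}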

Maybe the Lobo browser can help you. It is a free, open-source browser written entirely in Java. It ships some JAR libraries that can be added to your project.

Related

Jsoup get dynamically generated HTML

I can connect to most sites and get the HTML just fine, but when I try to connect to a website where most of the content is generated with JavaScript after the initial page load, it does not get any of that data. Is there any way to do this with Jsoup, or does it not support it?
JSoup has some basic connection handling included, but it is not a web browser. It excels at parsing static HTML content, but it does not run any JavaScript, so you are out of luck there. However, there are different options you might follow:
You can analyze the page you want to retrieve and find out how the content you are interested in gets loaded. Often it is not very hard to tap the original source of the loaded content (usually a JSON or HTML endpoint) and work with that directly, as sketched below. This approach has the benefit that you get what you want without extra libraries, and the retrieval will be fast.
You can use a (full) browser and automate the loading of the page. A very good tool for this is Selenium WebDriver in combination with the headless WebKit browser PhantomJS. This, however, requires extra software and extra libraries in your project and will run much slower than the first solution.
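For the first option, a minimal Jsoup sketch; the endpoint URL is hypothetical and would in practice be discovered in the browser's network tab:
import org.jsoup.Jsoup;

public class ApiTap {
    public static void main(String[] args) throws Exception {
        // hypothetical endpoint that the page's JavaScript calls for its data
        String body = Jsoup.connect("https://example.com/api/items")
                .ignoreContentType(true)  // the response may be JSON, not HTML
                .execute()
                .body();
        System.out.println(body);
    }
}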

How to display MS-Excel and PDF document in web browser?

In the project I'm currently working on, I need to display PDF or Excel files to users in their web browser.
We are using Java to build the server side and jQuery as the main JS lib for the front-end.
What should I do to make this possible?
Or rather, what JAR or JS lib do I need to rely on (preferably a JS lib, but, well... I have no clue right now)?
Thanks in advance. :)
It's simple, buddy, just follow these steps:
Open http://www.scribd.com/
Sign up
Sign in
Upload your excel or pdf file
Get its iframe code
Place the iframe code in your webpage.
Done
Here are a few clues:
For PDF you can use your browser's built-in viewer, if any, or you could use pdf.js.
For Microsoft documents, you should use Apache POI on the server side, and maybe convert them to another format like CSV or JSON to send back to your JS client.
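As a rough sketch of the POI route (the workbook file name is a placeholder, and the CSV output is naive, with no quoting of commas):
import java.io.FileInputStream;
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.ss.usermodel.WorkbookFactory;

public class XlsxToCsv {
    public static void main(String[] args) throws Exception {
        try (Workbook wb = WorkbookFactory.create(new FileInputStream("report.xlsx"))) {
            Sheet sheet = wb.getSheetAt(0);
            DataFormatter fmt = new DataFormatter(); // renders cells as displayed text
            for (Row row : sheet) {
                StringBuilder line = new StringBuilder();
                for (Cell cell : row) {
                    if (line.length() > 0) line.append(',');
                    line.append(fmt.formatCellValue(cell));
                }
                System.out.println(line);  // one CSV line per spreadsheet row
            }
        }
    }
}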
You can make use of the jQuery DataTables tool to enable export to PDF/Excel. I have used this in an earlier project and it works very well. The best part is that it is configurable.
http://www.datatables.net/

Direct file upload using Ajax or JQuery (with or without a form)

I'm trying to use the second "Direct file upload" method described at the end of the page here: http://www.playframework.org/documentation/2.0/JavaFileUpload
How do I implement the required Ajax/jQuery/JS function that will allow me to use this? Can anyone please provide some hints or snippets?
Thanks.
You cannot upload files using AJAX, at least not in browsers that do not support the HTML5 File API. For those browsers you could use an existing file upload control such as Uploadify, Blueimp File Upload, Valums File Uploader, ...
Those controls detect whether the browser supports the File API and use it if available. If not, they fall back to other techniques such as a hidden <iframe>, Flash, Silverlight, ...

Java embedded browser with resources in memory

We have a Java desktop app with an embedded browser, now using XULRunner (the Firefox engine) on SWT. This browser's API allows us to load pages by specifying a URI or the HTML content directly.
What we need is to load HTML pages including their resources, with everything held in memory. The ideal solution would be a listener invoked when the engine tries to load a resource, so that we can hand it the appropriate content.
Any ideas? Thank you!
It sounds like you need a small HTTP / web server. There is Jetty, and there are also a few smaller ones; just search for "small java web server" or so.
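A minimal sketch of serving in-memory content with embedded Jetty, assuming the Jetty 9 handler API; the port and markup are placeholders:
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Server;
import org.eclipse.jetty.server.handler.AbstractHandler;

public class InMemoryServer {
    public static void main(String[] args) throws Exception {
        Server server = new Server(8080);
        server.setHandler(new AbstractHandler() {
            @Override
            public void handle(String target, Request baseRequest,
                    HttpServletRequest request, HttpServletResponse response)
                    throws java.io.IOException {
                // look up the requested resource in memory instead of on disk
                response.setContentType("text/html;charset=utf-8");
                response.getWriter().print("<html><body>served from memory</body></html>");
                baseRequest.setHandled(true);
            }
        });
        server.start();   // point the embedded browser at http://localhost:8080/
        server.join();
    }
}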
In HTML5 you can put your resources inside the HTML itself.
So you can use SWT with a browser that supports HTML5 and prepare your pages to carry their resources inline (for example as data: URIs).
With the SWT Browser you can simply do browser.setText(html) to load the page from memory.
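A minimal SWT sketch of that idea; the truncated base64 image data is a placeholder:
import org.eclipse.swt.SWT;
import org.eclipse.swt.browser.Browser;
import org.eclipse.swt.layout.FillLayout;
import org.eclipse.swt.widgets.Display;
import org.eclipse.swt.widgets.Shell;

public class InMemoryPage {
    public static void main(String[] args) {
        Display display = new Display();
        Shell shell = new Shell(display);
        shell.setLayout(new FillLayout());
        Browser browser = new Browser(shell, SWT.NONE);
        // the image is inlined as a base64 data: URI, so no file or server is needed
        String html = "<html><body><h1>Hello</h1>"
                + "<img src=\"data:image/png;base64,iVBORw0...\"/>"
                + "</body></html>";
        browser.setText(html);   // load the page entirely from memory
        shell.open();
        while (!shell.isDisposed()) {
            if (!display.readAndDispatch()) display.sleep();
        }
        display.dispose();
    }
}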

How can I get html content from a browser that can do the html correction and js scripting?

I need a solution for getting HTML content from a browser. When a page is rendered in a browser its JavaScript is run; when the page is merely fetched, it is not. So HTML libraries like lxml, BeautifulSoup and the others are not going to work.
I've found a project named pywebkitgtk, but its purpose is to create a browser with a front end.
Is there any way to feed a URL into a "fake browser", render it, run all of its JavaScript, and save the result to an HTML file? I don't need any front end; a back end alone is fine.
I need to use Python or Java to do that.
Selenium RC lets you drive an actual browser for your purpose, under the control of any of several languages of your choice, including both Python and Java. Check it out!
For a detailed example of use with Python, see here.
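And since the answer mentions Java as an option, a minimal sketch with Selenium's Java WebDriver bindings; the target URL and output file are placeholders:
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.firefox.FirefoxDriver;

public class RenderedHtmlDump {
    public static void main(String[] args) throws Exception {
        WebDriver driver = new FirefoxDriver();   // drives a real browser, so JS runs
        try {
            driver.get("https://example.com/");   // hypothetical target
            // getPageSource() returns the DOM after scripts have executed
            String html = driver.getPageSource();
            Files.write(Paths.get("page.html"), html.getBytes(StandardCharsets.UTF_8));
        } finally {
            driver.quit();
        }
    }
}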
