Export data from a webpage - java

I want to export some data, mostly strings and images, from an HTML webpage into the Java/Android program that I want to write. Can anyone give me a hint about how to do that?

You could use the Apache HttpClient library.
The linked site has examples for it.
If you go that way, remember to download the Apache Commons Logging jar as well, as it is a dependency of the HttpClient library.
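To make the idea concrete, here is a minimal sketch of fetching a page and pulling out its image URLs. It deliberately uses the JDK's built-in java.net.http.HttpClient (Java 11+) instead of Apache HttpClient, so it needs no extra jars; that client is not available on Android, where a different HTTP client would be needed. The class name PageScraper, the method names, and the regex-based extraction are assumptions made for illustration, and a real HTML parser would be more robust than a regex.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class PageScraper {

    // Matches the quoted src attribute of <img> tags (single or double quotes).
    private static final Pattern IMG_SRC = Pattern.compile(
            "<img\\b[^>]*?\\ssrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    /** Downloads the raw HTML of a page as a string. */
    static String fetchHtml(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Returns the image URLs found in the HTML, resolved against the page URL. */
    static List<String> imageUrls(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        List<String> urls = new ArrayList<>();
        Matcher matcher = IMG_SRC.matcher(html);
        while (matcher.find()) {
            // group(2) is the attribute value between the quotes
            urls.add(base.resolve(matcher.group(2)).toString());
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        // The default URL is only a placeholder.
        String pageUrl = args.length > 0 ? args[0] : "https://example.com/";
        String html = fetchHtml(pageUrl);
        imageUrls(html, pageUrl).forEach(System.out::println);
    }
}
```

Each resulting URL can then be downloaded with the same client, using a byte-array or file body handler instead of a string one.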

Related

Best way to convert HTML page into PDF file

I am writing a new service, Convert-HTML-TO-PDF, but I am unsure which approach to prefer.
The options I see:
Use a headless browser to load the HTML page and convert it to PDF
Use a Java/Node library for the conversion, which creates the PDF components corresponding to the HTML and then renders them
Please help me understand which is the best way to implement the service, and why.
[update]
I would also like to know the advantages and disadvantages of each approach.
In my view, the best way forward always depends on what you already have experience with and what approach you take. There is no right or wrong here, everyone has to decide that for themselves based on their preferences.
Each approach has advantages and disadvantages. Some of them are:
Headless Browser:
Advantage:
No large libraries are necessary, which saves memory
Disadvantage:
The desired browser must be installed on the computer/server
Rendering may differ between browsers
Library:
Advantage:
Different libraries are available
For the popular libraries there is good documentation and code examples
Disadvantage:
When upgrading to a newer version, code usually needs to be adapted.
When upgrading to a newer version, the result may look different.
In my projects I use a headless Chrome browser. For this I found an easy-to-use API on GitHub, which uses the DevTools of Chrome.
It also includes a simple example of how to print a page to a PDF.
For my purposes I customized this example so that it writes the HTML into a temporary file and then navigates to that file.
// Navigate to HTML-File
page.navigate(htmlTempFile.getAbsolutePath());
I can't say whether this is the best way, but for me it was the easiest and most understandable one.
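For comparison, the same temporary-file idea can be sketched without any DevTools client library, by starting Chrome from the command line with its --headless and --print-to-pdf flags. This is a different technique from the API used in the answer, not a reconstruction of it. The class name HeadlessPdf, the method names, and the binary name "google-chrome" are assumptions; the binary is named differently on some systems.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// Hypothetical helper class, not part of any library.
final class HeadlessPdf {

    /** Writes the HTML (as UTF-8) to a temporary file and returns its path. */
    static Path writeTempHtml(String html) throws IOException {
        Path file = Files.createTempFile("page-", ".html");
        Files.writeString(file, html);
        return file;
    }

    /** Builds the headless Chrome command line for one conversion. */
    static List<String> chromeCommand(String chromeBinary, Path htmlFile, Path pdfFile) {
        return List.of(
                chromeBinary,
                "--headless",
                "--disable-gpu",
                "--print-to-pdf=" + pdfFile.toAbsolutePath(),
                htmlFile.toUri().toString()); // pass the page as a file: URL
    }

    /** Runs Chrome on the given HTML and returns the process exit code. */
    static int convert(String chromeBinary, String html, Path pdfFile)
            throws IOException, InterruptedException {
        Path htmlFile = writeTempHtml(html);
        try {
            Process process = new ProcessBuilder(chromeCommand(chromeBinary, htmlFile, pdfFile))
                    .inheritIO()
                    .start();
            return process.waitFor();
        } finally {
            Files.deleteIfExists(htmlFile); // clean up the temporary file
        }
    }

    public static void main(String[] args) throws Exception {
        // "google-chrome" is an assumed binary name; adjust for your system.
        int exit = convert("google-chrome", "<h1>Hello PDF</h1>", Path.of("out.pdf"));
        System.out.println("Chrome exited with code " + exit);
    }
}
```

Starting a new browser process per document is simple but slow, so a long-running service would more likely keep one browser open and drive it through DevTools, as the answer does.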

Download Jar file from private GitHub repo in Java

I have 3 private GitHub repos. Using Java, I would like to log in to my account and download a jar file from the RAW section of a repo. This would be a simple task if the repos were public, but it is not when they are private.
I have thought about using Apache's HttpClient. However, I have no clue (and googling didn't help either) how GitHub's auth is laid out.
I thought there might be some kind of GitHub library for Java, but the only one I can find doesn't allow downloading files. It only supports logging in to GitHub and pushing commits, fetching repos, etc., which isn't what I am looking to do.
Any help is greatly appreciated, thanks.
Since I can't create an answer: using the linked API, you can create a download service and download the repo. I've not worked out how yet, but I'm pretty sure it's possible, and I will update this answer once I've done it. I need to grab the "IRepositoryIdProvider".
You can use the Apache Commons HttpClient library to get better control over the credentials/cookies/auth handling, which should let you get access. I think that is the problem, right?
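One way to handle the auth part is GitHub's REST API, whose "get repository content" endpoint returns a file's raw bytes when the request carries a personal access token and a raw media type in the Accept header. The sketch below is a minimal illustration of that, using the JDK's java.net.http.HttpClient (Java 11+) rather than Apache HttpClient. The class name GitHubJarDownloader, the GITHUB_TOKEN environment variable, and the owner, repo, and path values are all assumptions for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper class, not part of any library.
final class GitHubJarDownloader {

    /** Builds a request for the raw content of one file in a repository. */
    static HttpRequest rawFileRequest(String owner, String repo, String path, String token) {
        URI uri = URI.create(
                "https://api.github.com/repos/" + owner + "/" + repo + "/contents/" + path);
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/vnd.github.raw+json") // ask for raw bytes
                .header("Authorization", "Bearer " + token)          // personal access token
                .GET()
                .build();
    }

    /** Sends the request and streams the response body into the target file. */
    static Path download(HttpRequest request, Path target)
            throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpResponse<Path> response =
                client.send(request, HttpResponse.BodyHandlers.ofFile(target));
        if (response.statusCode() != 200) {
            // Note: on failure the target file holds the error body, not a jar.
            throw new IOException("GitHub returned HTTP " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        String token = System.getenv("GITHUB_TOKEN"); // assumed variable name
        if (token == null) {
            System.err.println("Set GITHUB_TOKEN to a personal access token first.");
            return;
        }
        // Owner, repo, and path are placeholders.
        HttpRequest request = rawFileRequest("my-user", "my-repo", "libs/app.jar", token);
        System.out.println("Saved to " + download(request, Path.of("app.jar")));
    }
}
```

Reading the token from the environment keeps it out of the source code; the token needs read access to the private repository.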

How do I get the rendered web page from a URL?

I don't want just the source code. I want the rendered page. This is an important distinction that I apparently cannot make by simply searching Google.
Does anyone know how I can get the rendered page from a URL?
This needs to be done in Java, hopefully without an extra library.
Another solution would be to use HtmlUnit, which is a "GUI-less browser for Java". It is recommended by Google for generating snapshots of Ajax-based webpages to make them crawlable.
You can try using a library that wraps a web browser, for example Berkelium. If you need it in Java, a Google search produced this Java wrapper API for Berkelium (I haven't tried it personally).
sites.google has an example of its use:

Uploading Files in GWT without GAE and Apache Commons

I would just like to ask if there is a way to upload files in GWT WITHOUT using the Google App Engine and Apache Commons archives? I've been searching for ways to upload files in GWT but all of the solutions I find all make use of these two. I would just like to know if there is a way, because our app won't work if we use GAE and Apache Commons... Thank you very much!
Yes, the simplest would be (assuming you are saving files to Blobstore):
On GWT side use FileUpload. Here is an example on how to use it.
On GAE side use BlobStore upload handler.
The other option would be to use gwt-upload with GAE upload handler.
GWT does not have such a feature. You need to use an existing library or handle multipart form submits yourself (FileUpload class). There is, for example, the gwtupload library, which I have used and which worked pretty well (but it is based on commons-upload AFAIR). There are surely other libs as well.
http://code.google.com/p/gwtupload/

Clientside Javascript --> Serverside Java --> user is served a .doc

I am helping someone out with a javascript-based web app (even though I know next to nothing about web development) and we are unsure about the best way to implement a feature we'd like to have.
Basically, the user will be using our tool to view all kinds of boring data in tables, columns, etc. via javascript. We want to implement a feature where the user can click a button or link that then allows the user to download the displayed data in a .doc file.
Our basic idea so far is something like:
call a Java function on the server with the desired data passed in as a String when the link is clicked
generate the .doc file on the server
automatically "open" a link to the file in the client's browser to initiate the download
Is this possible? If so, is it feasible? Or, can you recommend a better solution?
edit: the data does not reside on the server; rather, it is queried from a SQL database
Yep, it's possible. Your saviour is the Apache POI library. Its HWPF component will help you generate Microsoft Word files using Java. The rest is just clever use of HTTP.
Your basic idea sounds a bit Rube-Goldbergesque.
Is the data you want in the document present on the server? If so, then all you need to do is display a plain HTML link with GET parameters that describes the data (i.e. data for customer X from date A to date B). The link will be handled on the server by a Servlet that gets the data and produces the .DOC file as its output to be downloaded by the browser - a very simple one-step process that doesn't even involve any JavaScript.
Passing a large amount of data around as GET/POST parameters might not be the best idea. You could just pass in the same parameters you used to generate the HTML page earlier. You don't even need a third-party library to generate the DOC. You could just generate a plain old HTML file with a DOC extension and Word will be happy to open it.
Sounds like the Docmosis Java library could help. Check out the online demo, since it shows something similar to what you're asking: generating a real doc file from a web site based on selections in the web page. Docmosis can query from databases and run pretty much anywhere.
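The HTML-with-a-DOC-extension idea can be sketched roughly as follows. This is only an illustration using plain JDK classes: the class name WordHtmlExport and its methods are hypothetical, and the rows stand in for whatever the SQL query returns.

```java
import java.util.List;

// Hypothetical helper class, not part of any library.
final class WordHtmlExport {

    /** Escapes the characters that are special in HTML text. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Renders rows of cells as a minimal HTML document containing one table. */
    static String toHtmlTable(List<List<String>> rows) {
        StringBuilder html = new StringBuilder("<html><body><table border=\"1\">");
        for (List<String> row : rows) {
            html.append("<tr>");
            for (String cell : row) {
                html.append("<td>").append(escape(cell)).append("</td>");
            }
            html.append("</tr>");
        }
        return html.append("</table></body></html>").toString();
    }

    /** Header value that asks the browser to download the response as a file. */
    static String contentDisposition(String fileName) {
        return "attachment; filename=\"" + fileName + "\"";
    }

    public static void main(String[] args) {
        // Sample rows; in the real app these would come from the database query.
        List<List<String>> rows = List.of(
                List.of("Customer", "Total"),
                List.of("Smith & Sons", "42"));
        System.out.println("Content-Disposition: " + contentDisposition("report.doc"));
        System.out.println(toHtmlTable(rows));
    }
}
```

In a servlet, the generated string would be written to the response body after setting the Content-Disposition header to the value above and the content type to application/msword, so that the browser offers the result as a download.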
