Download wav files in HTMLUnit - java

Before someone tells me that this question has already been asked here, I must say I've tried basically every example I've found.
The URL I'm trying to download has a type of 'audio/wav' and is embedded in a video tag, or at least that is what I see in Chrome's element inspector.
The thing is, the URL (which I can't post here) does not point to a .wav file but to an ASP page, which seems to generate the audio.
The problem is that I can't actually download the audio.
My WebClient is created like this:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38); // Also tried Chrome here.
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setPopupBlockerEnabled(false);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
HtmlPage page = (HtmlPage)webClient.getPage(URL);
I've tried creating an anchor element that links to the page containing the audio file:
HtmlElement createdElement = (HtmlElement) page.createElement("a");
createdElement.setAttribute("id", "link_som");
createdElement.setAttribute("href", "../sound.asp?app=audio");
page.appendChild(createdElement);
HtmlAnchor anc =(HtmlAnchor) page.getElementById("link_som", true); //tried this just to make sure it was returning the right anchor
InputStream inputStream = anc.click().getWebResponse().getContentAsStream();
//Writing the inputStream to a file generates a file which has 0 KB.
Also tried running the javascript that links to new URL through HtmlUnit:
ScriptResult resultado = page.executeJavaScript("window.open('../sound.asp?app=audio');");
webClient.waitForBackgroundJavaScript(5000);
HtmlPage paginaRes = (HtmlPage)resultado.getNewPage();
InputStream inputStream =paginaRes.getWebResponse().getContentAsStream(); //Here the inputStream also generates a 0 KB file
Interestingly, in all the cases I tried, if I write the inputStream to the console, it prints the main page source, for example:
int binary = 0;
while ((binary = inputStream.read()) != -1)
{
System.out.print((char)binary); //prints the old page source, and in some other tests, prints nothing.
}
P.S.: When I open the URL manually, Chrome shows an embedded player, while Firefox asks for QuickTime.

I am able to get the audio element using HtmlUnit.
FYI, my version is 2.15.

I solved this a long time ago; I'm posting this just to let others know.
The solution was to give up on HtmlUnit and use Selenium with PhantomJS.

Related

Download a file process by server, with HtmlUnit

I want to automate the file conversion available at:
https://www.gpsvisualizer.com/map_input?form=googleearth.
My problem is that gpsvisualizer only converts one file at a time, but I have 500 files to convert.
So I used HtmlUnit to automate the process.
With the following code, I am able to set "select" elements such as:
"Output file type"
"Add DEM elevation data"
upload my file, and get the URL of the redirected HTML page where I can download the converted file.
My problem is that I cannot find a way to download the file.
Does anyone have a suggestion?
Thanks in advance.
Here is my code:
WebClient webClient = new WebClient();
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setRedirectEnabled(true);
//fetching the web page
String url = "https://www.gpsvisualizer.com/map_input?form=googleearth";
//String url = "https://www.reddit.com/r/scraping/";
HtmlPage page = webClient.getPage(url);
System.out.println(page.getUrl());
System.out.println(page.getTitleText());
//Select set .kml file
HtmlSelect selectFileType = (HtmlSelect) page.getElementByName("googleearth_zip");
System.out.println(selectFileType.getOption(0).asText());
//System.out.println(selectFileType.getOption(1).asText());
HtmlOption kmlFile = selectFileType.getOptionByText(".kml (uncompressed)");
System.out.println(kmlFile.asText());
selectFileType.setSelectedAttribute(kmlFile, true);
//Select add elevation on file
HtmlSelect selectelevation = (HtmlSelect) page.getElementByName("add_elevation");
System.out.println(selectelevation.getOption(4).asText());
HtmlOption europeSRTM1 = selectelevation.getOptionByText("NASA SRTM1 (30m res., NoAm, Europe, more)");
System.out.println(europeSRTM1.asText());
selectelevation.setSelectedAttribute(europeSRTM1, true);
//add file
HtmlForm myForm = page.getFormByName("main");
HtmlFileInput fileInput = myForm.getInputByName("uploaded_file_1");
fileInput.setValueAttribute("/media/Stock/Projets/Suratram/Ressources/Traces_WS/puissance/kml_files/01_douce-signoret.kml");
HtmlElement submitBtn = page.getElementByName("submitted");
//page google
HtmlPage page2 = submitBtn.click();
System.out.println(page2.getUrl());
Because I have no sample file, I can only give some general advice.
HtmlUnit is a bit unusual about downloads. In general it works like this:
There is no download as such: every response is loaded into a window. HtmlUnit either replaces the content of the current window or creates a new window containing an UnknownPage that makes the content available as a stream. Whether a new window is created depends on the content type (and some other factors, e.g. the target of an anchor). As a rule of thumb, you can expect the download to be in a new window if a real browser would show a download dialog.
What does this mean for you? I guess your page returns something that HtmlUnit detects as a separate download. You can ask the WebClient for the available windows (webClient.getWebWindows()), and there might be a new one after the submit/click (you may have to add a wait if async JS is involved). This new window will contain an UnknownPage as its enclosed page, and you can ask the unknown page for the response, similar to this:
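// newbWin is the new WebWindow found after the click, e.g. one of webClient.getWebWindows()
Page newPage = newbWin.getEnclosedPage(); // UnknownPage inside window
WebResponse newResponse = newPage.getWebResponse();
// "download.bin" is a placeholder output file name
try (InputStream in = newResponse.getContentAsStream();
     OutputStream outStream = new FileOutputStream("download.bin")) {
    IOUtils.copy(in, outStream);
} catch (IOException e) {
    e.printStackTrace();
}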
As an alternative, you can implement a WebWindowListener (it has to be registered with the client) to be informed when a new window gets created.
Hope that helps. If you need more, please open an issue on GitHub and provide your input file together with the code so that I can reproduce your case.
Here is the answer to my problem.
Following the HtmlUnit documentation, I had trouble trying to convert the download page to a "WebWindow" object.
HtmlPage page = webClient.getPage(uri);
WebWindow window = page.getEnclosingWindow();
In the end, I did not need to convert it to a "WebWindow".
I just had to go through the page's anchors to find mine and read its "WebResponse" to get the processed file.
You can find more details at:
https://github.com/HtmlUnit/htmlunit/issues/352
Thanks to RBRi for his help.
Best

JSP- How to hide toolbar in backend when viewing PDF on browser?

I have a JSP page that displays a PDF document when it is called. Assuming I generate the URL in this format:
http://localhost:8080/repository/file/view/viewPDF.jsp?fileID=27455
and send it to another user. The user can view the document (id:27455) on his browser with no problem. But let's say I want to hide the PDF toolbar shown so user is not allowed to access that toolbar.
I found that by entering this link:
http://localhost:8080/repository/file/view/viewPDF.jsp?fileID=27455#toolbar=0
This hides the toolbar, but it's vulnerable since the other user can change its value to 1 and the toolbar appears again. I am wondering whether it's possible to hide it internally in the back-end code instead, but I couldn't figure out how.
My viewPDF.jsp:
<%@page import="java.io.*"%>
<%@include file="../../../WEB-INF/jspf/mcre.jspf" %>
<%
response.setContentType("application/pdf");
boolean debug = true;
try {
String snodeid = request.getParameter("nodeID");
long nodeid = Long.parseLong(snodeid);
Pdfinfo pdf = PPFacade.getPDFInfo(nodeid);
String pdfpath = pdf.getFfullpath();
if (debug) {
System.out.println("=============== PDF STREAM ================");
System.out.println("pdfpath = "+ pdfpath);
}
//int len = (int)new File("D://test.pdf").length();
int len = (int)new File(pdfpath).length();
response.setContentLength(len);
byte[] buf = new byte[len];
FileInputStream pdfin = new FileInputStream(pdfpath);
pdfin.read(buf);
pdfin.close();
OutputStream pdfout = response.getOutputStream();
pdfout.write(buf,0,len);
pdfout.flush();
if (debug) {
System.out.println("=============== END PDF STREAM ================");
}
} catch (Exception e) {
System.out.println(e.getMessage());
}
%>
<head>
PDF
</head>
Of course I know hiding #toolbar is not foolproof since any user with such knowledge can easily bypass it.
As the toolbar is a function of the browser, not the server or the pdf file, there's no way to force it to be shown or not without notifying the browser in some way.
And anyone intercepting the information that goes to the browser can indeed modify that information.
Of course even were you able to prevent that, they could always download the PDF to their file system and open it using any tool that allows reading PDFs, including such tools as they can create themselves.
So no, you can't lock down such things. And why would you even want to?
Closest you could come is embedding the PDF in a div that loads the PDF viewer browser plugin and uses an AJAX request to the server to retrieve the PDF content. But even then someone can intercept the request to the server and replicate that request using say curl and download the stream to a file directly.

How do you open a PDF in a new tab and show it in the browser (don't ask to download)?

I have a link to a PDF and when I click on it I want it to open a new tab and render in the new tab as opposed to asking me to download it. How do I do that?
note, I'm asking this question so I can answer it. This information can be pieced together from other answers, but I'd like it to be all in one place
To open a link in a new tab (PDF or not), you must change the HTML of that link from
<a href="link_to_pdf.pdf">PDF</a>
to
<a href="link_to_pdf.pdf" target="_blank">PDF</a>
To open a PDF in the browser you must make a server side change to the response header. In Java, you would do this:
response.setContentType("application/pdf");
response.addHeader("content-disposition", "inline; filename=link_to_pdf.pdf");
The key part is inline. If you use attachment instead, the browser will try to download the file.
The trick is to return an InputStreamResource from the handler method instead of a ResponseEntity:
@GetMapping(path = "/pdf/{key}", produces = MediaType.APPLICATION_PDF_VALUE)
@ResponseBody
public InputStreamResource pdf(@PathVariable("key") String key){
InputStream file = pdfservice.get(key);
return new InputStreamResource(file);
}

PDF export printing in Internet Explorer

protected static byte[] exportReportToPdf(JasperPrint jasperPrint)
throws JRException {
JRPdfExporter exporter = new JRPdfExporter();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
exporter.setParameter(JRExporterParameter.JASPER_PRINT, jasperPrint);
exporter.setParameter(JRExporterParameter.OUTPUT_STREAM, baos);
exporter.setParameter(JRPdfExporterParameter.PDF_JAVASCRIPT,
"this.print({bUI: true,bSilent: false,bShrinkToFit: true});");
exporter.exportReport();
return baos.toByteArray();
}
We are using code like this to export a PDF document from a Jasper application.
The line
exporter.setParameter(JRPdfExporterParameter.PDF_JAVASCRIPT,
"this.print({bUI: true,bSilent: false,bShrinkToFit: true});");
adds JavaScript to send the PDF document directly to the printer.
The expected behavior is that a print dialog will come up with a preview of the PDF document.
This works fine most of the time - except I am having problems about one out of every 5-6 times in Internet Explorer 8 and Firefox.
What happens is - the print preview dialog with the PDF document does not appear or it appears with a blank document in the preview window.
- I've tried a number of different JavaScript snippets (different params to this.print() via exporter.setParameter).
- I've tried setting different response headers such as
response.setContentType("application/pdf");
response.setHeader("Content-disposition","inline; filename=\""
+ reportName
+ "\"");
response.setContentLength(baos.size());
These did not seem to help.
This seems to be an IE and FF issue. Has anyone ever dealt with this problem? I need to get it to work across all browsers 100% of the time. Perhaps a different approach to accomplish the goal of sending the PDF document export directly to the printer? or a third party library that will work across browsers?
Maybe it isn't getting a chance to update the UI. The following code delays the print perhaps giving it the chance it needs. I didn't test as I don't have your environment.
exporter.setParameter(JRPdfExporterParameter.PDF_JAVASCRIPT,
"app.setTimeOut('this.print({bUI: true,bSilent: false,bShrinkToFit: true});',200);");

Java URL problem

A webpage contains a link to an executable (i.e. If we click on the link, the browser will download the file on your local machine).
Is there any way to achieve the same functionality with Java?
Thank you
Yes there is.
Here is a simple example:
You can have a JSF (JavaServer Faces) page with a backing bean that contains a method annotated with @PostConstruct. This means that any action (for example, downloading) will occur when the page is created.
There is already a very similar question; have a look at: Invoke JSF managed bean action on page load
You can use Java's URL class to download a file, but it requires a little work. You will need to do the following:
1. Create a URL object pointing at the file
2. Call openStream() to get an InputStream
3. Open the file you want to write to (a FileOutputStream)
4. Read from the InputStream and write to the file until there is no more data left to read
5. Close the input and output streams
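The steps above can be sketched with only the standard library. This is a minimal sketch rather than the answerer's own code: the class name UrlDownloader and the method name download are made up for illustration, and error handling is left to the caller.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URL;
import java.nio.file.Path;

// Hypothetical helper, named for illustration only.
final class UrlDownloader {

    /** Copies everything behind the URL into the destination file and returns the byte count. */
    static long download(URL source, Path destination) throws IOException {
        long total = 0;
        byte[] buffer = new byte[8192];
        // try-with-resources closes both streams, even if the copy fails
        try (InputStream in = source.openStream();
             OutputStream out = new FileOutputStream(destination.toFile())) {
            int read;
            // read() returns -1 once there is no more data left
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
        }
        return total;
    }
}
```

Copying raw bytes through a buffer, rather than reading characters, keeps binary files such as executables or audio intact.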
It doesn't really matter what type of file you are downloading (the fact that it's an executable file is irrelevant) since the process is the same for any type of file.
Update: It sounds like what you actually want is to plug the URL of a webpage into the Java app, and have the Java app find the link in the page and then download that link. If that is the case, the wording of your question is very unclear, but here are the basic steps I would use:
First, use steps 1 and 2 above to get an InputStream for the page
Use something like TagSoup or jsoup to parse the HTML
Find the <a> element that you want and extract its href attribute to get the URL of the file you need to download (if it's a relative URL instead of absolute, you will need to resolve that URL against the URL of the original page)
Use the steps above to download that URL
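Resolving a relative href against the URL of the original page can be done with java.net.URI. The following is a minimal sketch; the class name LinkResolver, the method name resolve, and the example.com addresses are assumptions for illustration.

```java
import java.net.URI;

// Hypothetical helper, named for illustration only.
final class LinkResolver {

    /** Resolves an href (relative or absolute) against the URL of the page it was found on. */
    static String resolve(String pageUrl, String href) {
        // URI.resolve returns an already-absolute href unchanged
        return URI.create(pageUrl).resolve(href).toString();
    }
}
```

For example, LinkResolver.resolve("http://example.com/app/page/index.asp", "../sound.asp?app=audio") yields http://example.com/app/sound.asp?app=audio.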
Here's a slight shortcut, based on jsoup (which I've never used before, I'm just writing this from snippets stolen from their webpage). I've left out a lot of error checking, but hey, I usually charge for this:
Document doc = Jsoup.connect(pageUrl).get();
Element aElement = doc.getElementsByTag("a").first(); // Obviously you may need to refine this
String newUrl = aElement.attr("abs:href"); // This is a piece of jsoup magic that ensures that the destination URL is absolute
// assert newUrl != null
URL fileUrl = new URL(newUrl);
String destPath = fileUrl.getPath();
int lastSlash = destPath.lastIndexOf('/');
if (lastSlash != -1) {
destPath = destPath.substring(lastSlash + 1); // skip the slash itself so only the file name remains
}
// Assert that this is really a valid filename
// Now just download fileUrl and save it to destPath
The proper way to determine what the destination filename should be (unless you hardcode it) is actually to look for the Content-Disposition header, and look for the bit after filename=. In that case, you can't use openStream() on the URL; you will need to use openConnection() instead, to get a URLConnection. Then you can use getInputStream() to get your InputStream and getHeaderField("Content-Disposition") to read the response header and figure out your filename. In case that header is missing or malformed, you should then fall back to using the method above to determine the destination filename.
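A minimal sketch of that filename logic follows. It takes the header value as a plain string (for instance the result of URLConnection.getHeaderField("Content-Disposition"), which is null when the header is absent) and falls back to the last segment of the URL path. The class and method names are hypothetical, and the pattern only handles the simple filename= form, not the encoded filename*= variant.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
final class DownloadNames {

    // Matches filename="quoted name" or filename=bareToken (case-insensitive).
    private static final Pattern FILENAME =
            Pattern.compile("(?i)\\bfilename\\s*=\\s*(?:\"([^\"]*)\"|([^;\\s]+))");

    /** Prefers the name from the Content-Disposition value, else the last URL path segment. */
    static String fromHeaderOrPath(String contentDisposition, String urlPath) {
        if (contentDisposition != null) {
            Matcher m = FILENAME.matcher(contentDisposition);
            if (m.find()) {
                String name = m.group(1) != null ? m.group(1) : m.group(2);
                // Drop any directory part the server may have sent
                name = lastSegment(name);
                if (!name.isEmpty()) {
                    return name;
                }
            }
        }
        return lastSegment(urlPath);
    }

    /** Returns the text after the last '/' or '\', or the whole string if there is none. */
    static String lastSegment(String path) {
        int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
        return path.substring(slash + 1);
    }
}
```

Stripping directory parts from the header value is a defensive choice here, since the name comes from the server and should not decide where on disk the file lands.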
You can do this using FileUtils from Apache Commons IO:
http://commons.apache.org/io/apidocs/org/apache/commons/io/FileUtils.html#copyURLToFile(java.net.URL, java.io.File)
Edit:
I was able to successfully download a zip file from the SourceForge site (it is not empty). I did something like this:
import java.io.File;
import java.net.URL;
import org.apache.commons.io.FileUtils;
public class Test
{
public static void main(String args[])
{
try {
URL url = new URL("http://sourceforge.net/projects/gallery/files/gallery3/3.0.2/gallery-3.0.2.zip/download");
FileUtils.copyURLToFile(url, new File("test.zip"));
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
I was able to successfully download tomcat.exe too:
URL url = new URL("http://archive.apache.org/dist/tomcat/tomcat-6/v6.0.16/bin/apache-tomcat-6.0.16.exe");
