I want to automate the file conversion available at:
https://www.gpsvisualizer.com/map_input?form=googleearth
GPS Visualizer only converts one upload at a time, but I have 500 files to convert.
So I used HtmlUnit to automate the process.
With the following code, I am able to set "select" elements such as:
"Output file type"
"Add DEM elevation data"
upload my file, and get the URL of the redirected HTML page where I can download the converted file.
My problem is that I cannot find a way to download the file.
Does anyone have a suggestion?
Thanks in advance.
Here is my code:
WebClient webClient = new WebClient();
webClient.getOptions().setCssEnabled(false);
webClient.getOptions().setJavaScriptEnabled(true);
webClient.getOptions().setThrowExceptionOnFailingStatusCode(false);
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setRedirectEnabled(true);
//fetching the web page
String url = "https://www.gpsvisualizer.com/map_input?form=googleearth";
//String url = "https://www.reddit.com/r/scraping/";
HtmlPage page = webClient.getPage(url);
System.out.println(page.getUrl());
System.out.println(page.getTitleText());
//Select set .kml file
HtmlSelect selectFileType = (HtmlSelect) page.getElementByName("googleearth_zip");
System.out.println(selectFileType.getOption(0).asText());
//System.out.println(selectFileType.getOption(1).asText());
HtmlOption kmlFile = selectFileType.getOptionByText(".kml (uncompressed)");
System.out.println(kmlFile.asText());
selectFileType.setSelectedAttribute(kmlFile, true);
//Select add elevation on file
HtmlSelect selectelevation = (HtmlSelect) page.getElementByName("add_elevation");
System.out.println(selectelevation.getOption(4).asText());
HtmlOption europeSRTM1 = selectelevation.getOptionByText("NASA SRTM1 (30m res., NoAm, Europe, more)");
System.out.println(europeSRTM1.asText());
selectelevation.setSelectedAttribute(europeSRTM1, true);
//add file
HtmlForm myForm = page.getFormByName("main");
HtmlFileInput fileInput = myForm.getInputByName("uploaded_file_1");
fileInput.setValueAttribute("/media/Stock/Projets/Suratram/Ressources/Traces_WS/puissance/kml_files/01_douce-signoret.kml");
HtmlElement submitBtn = page.getElementByName("submitted");
//page google
HtmlPage page2 = submitBtn.click();
System.out.println(page2.getUrl());
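Since the goal is to repeat this for about 500 files, the upload step would probably end up inside a loop over the input directory. Below is a minimal sketch of just that outer loop, using only the JDK. The class name KmlBatch and the helper listKmlFiles are made up for illustration, and the HtmlUnit upload itself is left as a placeholder comment.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class KmlBatch {

    /** Returns the .kml files directly inside dir (case-insensitive extension), sorted by path. */
    static List<Path> listKmlFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().toLowerCase().endsWith(".kml"))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical input directory: pass the real one as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path kml : listKmlFiles(dir)) {
            // Placeholder: run the HtmlUnit upload/submit steps for this file here,
            // e.g. fileInput.setValueAttribute(kml.toAbsolutePath().toString());
            System.out.println("would convert: " + kml.toAbsolutePath());
        }
    }
}
```

Being polite to the site (a pause between uploads, for example) is probably a good idea when running this over hundreds of files.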
Because I have no sample file, I can only give some general advice.
HtmlUnit is a bit unusual about downloads. In general it works like this:
There is no separate download step: every response is loaded into a window. HtmlUnit either replaces the content of the current window or creates a new window holding an UnknownPage that makes the content available as a stream. The decision to open a new window is based on the content type (and some other factors, e.g. the target of an anchor). As a rule of thumb, you can expect the download to be in a new window whenever a real browser would show a download dialog.
What does this mean for you? I guess your page returns something that HtmlUnit detects as a separate download. You can ask the WebClient for the available windows (webClient.getWebWindows()), and there might be a new one after the submit/click (you may have to add some waiting if async JS is involved). This new window will contain an UnknownPage as its enclosed page, and you can ask that page for the response, similar to this:
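// newWin is the new WebWindow found via webClient.getWebWindows()
Page newPage = newWin.getEnclosedPage(); // UnknownPage inside the window
WebResponse newResponse = newPage.getWebResponse();
try (InputStream in = newResponse.getContentAsStream()) {
    IOUtils.copy(in, outStream); // outStream: your target OutputStream
} catch (IOException e) {
    e.printStackTrace();
}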
As an alternative, you can implement a WebWindowListener (it has to be registered with the client) to be informed when a new window is created.
Hope that helps. If you need more, please open an issue on GitHub and provide your input file together with the code so I can reproduce your case.
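For the final step (writing the response stream to disk), the JDK alone is enough if you would rather not pull in IOUtils. The sketch below is not HtmlUnit-specific: the class name StreamSaver is made up, and the ByteArrayInputStream in main is only a stand-in for whatever getContentAsStream() returns.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class StreamSaver {

    /** Copies everything from in to target, replacing an existing file; returns the bytes written. */
    static long save(InputStream in, Path target) throws IOException {
        try (InputStream source = in) { // closes the stream when done
            return Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for newResponse.getContentAsStream(); the bytes are dummy data.
        InputStream fake = new ByteArrayInputStream("<kml/>".getBytes(StandardCharsets.UTF_8));
        Path target = Files.createTempFile("converted", ".kml");
        System.out.println(save(fake, target) + " bytes written to " + target);
    }
}
```

If the returned byte count is 0, the window you picked most likely does not hold the download.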
Here is the answer to my problem.
Following the HtmlUnit documentation, I ran into a problem trying to get a "WebWindow" object from the download page:
HtmlPage page = webClient.getPage(uri);
WebWindow window = page.getEnclosingWindow();
In the end, I did not need the "WebWindow" at all.
I just had to go through the page's anchors to find mine and read its "WebResponse" to get the processed file.
You can find more detail at:
https://github.com/HtmlUnit/htmlunit/issues/352
Thanks to RBRi for his help.
Best
I have a JSP page that displays a PDF document when it is called. Assuming I generate the URL in this format:
http://localhost:8080/repository/file/view/viewPDF.jsp?fileID=27455
and send it to another user. The user can view the document (id: 27455) in their browser with no problem. But let's say I want to hide the PDF toolbar so that the user cannot access it.
I found that by entering this link:
http://localhost:8080/repository/file/view/viewPDF.jsp?fileID=27455#toolbar=0
This hides the toolbar, but it is fragile, since the other user can change the value to 1 and the toolbar reappears. I am wondering whether it is possible to hide it in the back-end code instead, but I couldn't figure out how.
My viewPDF.jsp:
<%@page import="java.io.*"%>
<%@include file="../../../WEB-INF/jspf/mcre.jspf" %>
<%
response.setContentType("application/pdf");
boolean debug = true;
try {
String snodeid = request.getParameter("nodeID");
long nodeid = Long.parseLong(snodeid);
Pdfinfo pdf = PPFacade.getPDFInfo(nodeid);
String pdfpath = pdf.getFfullpath();
if (debug) {
System.out.println("=============== PDF STREAM ================");
System.out.println("pdfpath = "+ pdfpath);
}
//int len = (int)new File("D://test.pdf").length();
int len = (int)new File(pdfpath).length();
response.setContentLength(len);
byte[] buf = new byte[len];
FileInputStream pdfin = new FileInputStream(pdfpath);
pdfin.read(buf);
pdfin.close();
OutputStream pdfout = response.getOutputStream();
pdfout.write(buf,0,len);
pdfout.flush();
if (debug) {
System.out.println("=============== END PDF STREAM ================");
}
} catch (Exception e) {
System.out.println(e.getMessage());
}
%>
<head>
PDF
</head>
Of course I know hiding #toolbar is not foolproof since any user with such knowledge can easily bypass it.
As the toolbar is a function of the browser, not the server or the pdf file, there's no way to force it to be shown or not without notifying the browser in some way.
And anyone intercepting the information that goes to the browser can indeed modify that information.
Of course even were you able to prevent that, they could always download the PDF to their file system and open it using any tool that allows reading PDFs, including such tools as they can create themselves.
So no, you can't lock down such things. And why would you even want to?
Closest you could come is embedding the PDF in a div that loads the PDF viewer browser plugin and uses an AJAX request to the server to retrieve the PDF content. But even then someone can intercept the request to the server and replicate that request using say curl and download the stream to a file directly.
Before someone tells me this question has already been asked here: I have tried basically every example I could find.
The URL I'm trying to download has a type of 'audio/wav' and is embedded in a video tag, or at least that is what I see in Chrome's element inspector.
The catch is that the URL (which I can't post here) does not point to a .wav file, but to an ASP page that seems to generate the audio.
So far so good; the problem is that I can't actually download the audio.
Basically, my WebClient is created like this:
WebClient webClient = new WebClient(BrowserVersion.FIREFOX_38); // Also tried Chrome here.
webClient.getOptions().setThrowExceptionOnScriptError(false);
webClient.getOptions().setUseInsecureSSL(true);
webClient.getOptions().setPopupBlockerEnabled(false);
webClient.setAjaxController(new NicelyResynchronizingAjaxController());
HtmlPage page = (HtmlPage)webClient.getPage(URL);
I've tried creating an anchor element that links to the page containing the audio file:
HtmlElement createdElement = (HtmlElement) page.createElement("a");
createdElement.setAttribute("id", "link_som");
createdElement.setAttribute("href", "../sound.asp?app=audio");
page.appendChild(createdElement);
HtmlAnchor anc =(HtmlAnchor) page.getElementById("link_som", true); //tried this just to make sure it was returning the right anchor
InputStream inputStream = anc.click().getWebResponse().getContentAsStream();
//Writing the inputStream to a file generates a file which has 0 KB.
Also tried running the javascript that links to new URL through HtmlUnit:
ScriptResult resultado = page.executeJavaScript("window.open('../sound.asp?app=audio');");
webClient.waitForBackgroundJavaScript(5000);
HtmlPage paginaRes = (HtmlPage)resultado.getNewPage();
InputStream inputStream =paginaRes.getWebResponse().getContentAsStream(); //Here the inputStream also generates a 0 KB file
Interestingly, in all the cases I tried, if I write the inputStream to the console, it returns the main page source, for example:
int binary = 0;
while ((binary = inputStream.read()) != -1)
{
System.out.print((char)binary); //prints the old page source, and in some other tests, prints nothing.
}
Ps.: When opening the URL on chrome manually, it has an embedded player, on FireFox, it asks for Quicktime.
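One way to tell quickly whether a stream holds audio or just the page source again is to look at its first bytes instead of printing everything: a WAV file starts with the ASCII tag "RIFF", followed by four size bytes and then "WAVE". The sketch below only does that check. The class name WavCheck and the sample bytes in main are made up for illustration, and it says nothing about why HtmlUnit returned the wrong content.

```java
import java.nio.charset.StandardCharsets;

final class WavCheck {

    /** Returns true if the first 12 bytes carry the RIFF/WAVE signature of a WAV file. */
    static boolean looksLikeWav(byte[] head) {
        if (head == null || head.length < 12) {
            return false; // too short to hold the signature (covers the 0 KB case)
        }
        String riff = new String(head, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(head, 8, 4, StandardCharsets.US_ASCII);
        return riff.equals("RIFF") && wave.equals("WAVE");
    }

    public static void main(String[] args) {
        // Dummy data standing in for the first bytes read from the response stream.
        byte[] html = "<html><head></head>".getBytes(StandardCharsets.US_ASCII);
        byte[] wav = "RIFF\0\0\0\0WAVEfmt ".getBytes(StandardCharsets.US_ASCII);
        System.out.println("html sample is WAV? " + looksLikeWav(html));
        System.out.println("wav sample is WAV?  " + looksLikeWav(wav));
    }
}
```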
I am able to get the audio element using HtmlUnit.
FYI, my version is 2.15.
I solved this a long time ago, so this is just to let others know.
The solution was to give up on HtmlUnit and use Selenium with PhantomJS.
I have a link to a PDF and when I click on it I want it to open a new tab and render in the new tab as opposed to asking me to download it. How do I do that?
Note: I'm asking this question so I can answer it. This information can be pieced together from other answers, but I'd like it to be all in one place.
To open a link in a new tab (PDF or not), you must modify the HTML of that link from
<a href="link_to_pdf.pdf">PDF</a>
to
<a href="link_to_pdf.pdf" target="_blank">PDF</a>
To open a PDF in the browser you must make a server side change to the response header. In Java, you would do this:
response.setContentType("application/pdf");
response.addHeader("content-disposition", "inline; filename=link_to_pdf.pdf");
Of key importance is the inline. If you put attachment, your browser will try to download it instead. You can read more here.
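To make the inline/attachment switch explicit, here is a small sketch that only builds the header value, so it runs without a servlet container. The class name ContentDisposition and the method headerValue are made up for illustration, and it assumes a plain ASCII file name (non-ASCII names need extra encoding that is not shown here).

```java
final class ContentDisposition {

    /**
     * Builds a Content-Disposition value.
     * inline = true asks the browser to display the content, false asks it to download.
     */
    static String headerValue(boolean inline, String filename) {
        // Quote the name and escape backslashes and double quotes inside it.
        String escaped = filename.replace("\\", "\\\\").replace("\"", "\\\"");
        return (inline ? "inline" : "attachment") + "; filename=\"" + escaped + "\"";
    }

    public static void main(String[] args) {
        // In a servlet this would be passed to
        // response.setHeader("Content-Disposition", ...).
        System.out.println(headerValue(true, "link_to_pdf.pdf"));
        System.out.println(headerValue(false, "link_to_pdf.pdf"));
    }
}
```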
The secret is to return an InputStreamResource from the method instead of a ResponseEntity:
@GetMapping(path = "/pdf/{key}", produces = MediaType.APPLICATION_PDF_VALUE)
@ResponseBody
public InputStreamResource pdf(@PathVariable("key") String key){
InputStream file = pdfservice.get(key);
return new InputStreamResource(file);
}
I want to show the file content in a new tab in the browser. This is what I have done:
int BUFF_SIZE = 102400;
FileInputStream is = null;
byte[] buffer = new byte[BUFF_SIZE];
int a = -1;
try
{
is = new FileInputStream(file);
ByteArrayOutputStream out = new ByteArrayOutputStream();
while((a = is.read(buffer)) != -1)
{
out.write(buffer, 0, a); // write only the bytes actually read
}
out.flush();
out.close();
ServletOutputStream os = null;
os = response.getOutputStream();
os.write(out.toByteArray());
os.close();
is.close();
}
catch(Exception e)
{
// Exception handling
}
But this leads to the file being downloaded instead of its content being opened in a new tab.
I cannot find what I am doing wrong.
Any help would be great!
Actually, all you should need to do now is add jQuery to your web page and use jQuery.get. Once you get the HTML from the servlet, use jQuery or plain JavaScript to set the text in your tab.
BTW, you might want to set other details on the servlet response, such as the file type and length. Just a thought.
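As a rough idea of what "file type and length" could look like, here is a sketch that computes the two header values for a file using only the JDK. A missing or generic content type is one plausible reason a browser offers a download instead of displaying the content, though that is not confirmed for this case. The class name ResponseHeaders and the tiny extension table are made up for illustration; a real application would more likely rely on its container's MIME mapping.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

final class ResponseHeaders {

    // Illustrative table only; extend or replace it as needed.
    private static final Map<String, String> TYPES = new HashMap<>();
    static {
        TYPES.put("pdf", "application/pdf");
        TYPES.put("txt", "text/plain");
        TYPES.put("html", "text/html");
    }

    /** Maps a file name to a MIME type, falling back to a generic binary type. */
    static String contentTypeFor(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return TYPES.getOrDefault(ext, "application/octet-stream");
    }

    /** Computes the Content-Type and Content-Length header values for a file. */
    static Map<String, String> headersFor(Path file) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", contentTypeFor(file.getFileName().toString()));
        headers.put("Content-Length", Long.toString(Files.size(file)));
        return headers;
    }

    public static void main(String[] args) throws IOException {
        Path sample = Files.createTempFile("sample", ".txt");
        Files.write(sample, new byte[] {'h', 'i'});
        // In a servlet these would go to response.setContentType(...) and
        // response.setContentLengthLong(...) before writing the body.
        System.out.println(headersFor(sample));
    }
}
```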
You could also try this with the omnifaces library
Faces.sendFile(file, false);//true makes it as an attachment
more information on http://omnifaces.org/docs/javadoc/1.8/org/omnifaces/util/Faces.html#sendFile(java.io.File,%20boolean)
A web application might not even know what a browser is. It receives requests over the HTTP protocol and sends responses over the same protocol. The protocol itself knows nothing about browsers and tabs.
You must use JavaScript for anything that happens at the browser level. Other answers advised you to use jQuery. It is a well-known JavaScript library that hides differences between browsers, but there are others around (Dojo, Ext JS, ...): search and make your choice.
By the way, if all you want is to open a URL in a new tab, that's one of the very few operations you can do at the HTML level. Just look at this example
from W3Schools.com:
<a href="http://www.w3schools.com" target="_blank">Visit W3Schools!</a>
which opens www.w3schools.com in a new tab (if the browser has tabs, which is now common) or a new window.
I need to download a file from the following URL.
http://www.census.gov/manufacturing/m3/
(Advance report highlights->excel selection menu ->Table 1)
As soon as you pick an entry from the drop-down, the web browser directly prompts to download and save the Excel file.
Can you suggest a way to go about this?
I am attempting this using HtmlUnit with the following code.
I get a NullPointerException at the line HtmlOption option = select.getOption(2);
If there is a better solution with another class, I would not mind considering that as well.
public static void startDownload() throws Exception {
final WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage("http://www.census.gov/manufacturing/m3/");
HtmlSelect select = (HtmlSelect) page.getElementById("advance_xls");
HtmlOption option = select.getOption(2);
webClient.closeAllWindows();/**/
}
Update the code as follows; there was a minor mistake:
final WebClient webClient = new WebClient();
HtmlPage page = webClient.getPage("http://www.census.gov/manufacturing/m3/");
HtmlSelect select = (HtmlSelect) page.getElementByName("advanced_xls");
InputStream is = select.setSelectedAttribute("/manufacturing/m3/adv/table2a.xls",true).getWebResponse().getContentAsStream();
The Excel sheet is returned as an InputStream; you then need to write it to an Excel file yourself.