Certain PDF files are not downloading correctly - java

I have very little experience in Java (this is my first real program) and have been looking for a solution for hours. I hacked together a small program to download PDF files from a link. It works fine for most links, but some of them just don't work.
The content type for all the links that work shows up as application/pdf, but some links report a content type of text/html for some reason.
I keep trying to rewrite the code using whatever I can find online but I keep getting the same result.
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.net.URL;
import java.net.URLConnection;

public class Main {
    public static void main(String[] args) throws Exception {
        String link = "https://www.menards.com/main/items/media/UNITE051/SDS/SpectracideVegetationKillerReadyToUse2-228-714-8845-SDS-Feb16.pdf";
        String fileName = "File Name.pdf";
        URL url1 = new URL(link);
        try {
            URLConnection urlConn = url1.openConnection();
            byte[] buffer = new byte[1024];
            double downloaded = 0.00;
            int read = 0;
            System.out.println(urlConn.getContentType()); // This shows as text/html but it should be PDF
            FileOutputStream fos1 = new FileOutputStream(fileName);
            BufferedInputStream is1 = new BufferedInputStream(urlConn.getInputStream());
            BufferedOutputStream bout = new BufferedOutputStream(fos1, 1024);
            try {
                // Copy the response body to the file in 1 KB chunks
                while ((read = is1.read(buffer, 0, 1024)) >= 0) {
                    bout.write(buffer, 0, read);
                    downloaded += read;
                }
                bout.close();
                fos1.flush();
                fos1.close();
                is1.close();
            } catch (Exception e) {
                e.printStackTrace(); // don't swallow I/O errors silently
            }
        } catch (Exception e) {
            e.printStackTrace(); // don't swallow connection errors silently
        }
    }
}
I need to be able to download the PDF from the link in the code.
This is what is saved in a text document of the PDF:
<html>
<head>
<META NAME="robots" CONTENT="noindex,nofollow">
<script src="/_Incapsula_Resource?SWJIYLWA=5074a744e2e3d891814e9a2dace20bd4,719d34d31c8e3a6e6fffd425f7e032f3">
</script>
<body>
</body></html>

The website implemented a check to make sure I was using a browser. I copied the user agent from chrome and it allowed me to download the PDF.
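For anyone landing here later, the user agent fix might look roughly like the sketch below. It assumes the site only inspects the User-Agent header, which is all that was needed in this case; a stricter bot check could require more. The class name PdfDownloader, the helper openWithUserAgent, and the user agent string itself are illustrative placeholders, so paste in whatever string your own browser sends.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class PdfDownloader {
    // Placeholder value (an assumption): substitute the string copied from your browser.
    static final String USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";

    // Creates the connection object and sets the header; nothing is sent yet.
    static URLConnection openWithUserAgent(String link) throws IOException {
        URLConnection conn = new URL(link).openConnection();
        conn.setRequestProperty("User-Agent", USER_AGENT);
        return conn;
    }

    // Streams the response body to a file, refusing to save anything that is not a PDF.
    static void download(String link, Path target) throws IOException {
        URLConnection conn = openWithUserAgent(link);
        String type = conn.getContentType(); // this call triggers the actual request
        if (type == null || !type.startsWith("application/pdf")) {
            throw new IOException("Expected application/pdf but got " + type);
        }
        try (InputStream in = conn.getInputStream()) {
            Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: PdfDownloader <url> <output file>");
            return;
        }
        download(args[0], Paths.get(args[1]));
    }
}
```

Checking the content type before writing means an HTML challenge page fails loudly instead of being saved under a .pdf name.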

The URL that you are fetching doesn't point to a PDF file. It points to an HTML file which embeds the PDF file. You probably need to look closely at what the actual URL of the PDF file is. Your code seems alright.
Just run cURL on the URL and see. It will most probably return an HTML file.

Related

OpenShift Java - Use image out of Data dir

I'm trying to create an upload-image button and afterward showing the image on a different jsp page.
I want to do this by uploading into the app-root/data/images folder. This works with the below filepath: filePath = System.getenv("OPENSHIFT_DATA_DIR") + "images/";
But how can I show this image on my jsp? I tried using:
<BODY>
    <h1>SNOOP PAGE</h1>
    Go back
    <% String filepath = System.getenv("OPENSHIFT_DATA_DIR") + "images/";
       out.println("<img src='"+filepath+"logo21.jpg'/>");
    %>
    <img src="app-root/data/images/logo21.jpg"/>
</BODY>
Both these options don't work. I also read that I need to create a symbolic link. But when I'm in my app-root/data or app-root/data/images or in app-root the command ln -s returns missing file operand
The logo21.jpg does show up in my Git bash
@developercorey is right (gave you +1 👍); I just feel the need to explain why:
Your uploaded images end up in a folder on your server
(String filepath = System.getenv("OPENSHIFT_DATA_DIR") + "images/" is the folder path on the server).
Your rendered HTML, "<img src='"+filepath+"logo21.jpg'/>", gets sent to the client (the user's browser) with the server's file path as the URL.
When the user's browser tries to locate the image using the server's path, which doesn't exist on the local machine, it won't work.
The best solution, as @developercorey suggested, is to add a new servlet or a filter to serve photos from the OPENSHIFT_DATA_DIR folder:
You'll have a new URL mapped to the servlet serving your photos, something like http://your-server/uploads/
And you can use <img src="http://your-server/uploads/logo21.jpg" /> in your JSP.
Here's the snippet from How-To: Upload and Serve files using Java Servlets on OpenShift
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import javax.activation.MimetypesFileTypeMap;
import javax.servlet.ServletException;
import javax.servlet.annotation.MultipartConfig;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.Part;
@WebServlet(name = "uploads",urlPatterns = {"/uploads/*"})
@MultipartConfig
public class Uploads extends HttpServlet {
private static final long serialVersionUID = 2857847752169838915L;
int BUFFER_LENGTH = 4096;
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
PrintWriter out = response.getWriter();
for (Part part : request.getParts()) {
InputStream is = request.getPart(part.getName()).getInputStream();
String fileName = getFileName(part);
FileOutputStream os = new FileOutputStream(System.getenv("OPENSHIFT_DATA_DIR") + fileName);
byte[] bytes = new byte[BUFFER_LENGTH];
int read = 0;
while ((read = is.read(bytes, 0, BUFFER_LENGTH)) != -1) {
os.write(bytes, 0, read);
}
os.flush();
is.close();
os.close();
out.println(fileName + " was uploaded to " + System.getenv("OPENSHIFT_DATA_DIR"));
}
}
protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
String filePath = request.getRequestURI();
File file = new File(System.getenv("OPENSHIFT_DATA_DIR") + filePath.replace("/uploads/",""));
InputStream input = new FileInputStream(file);
response.setContentLength((int) file.length());
response.setContentType(new MimetypesFileTypeMap().getContentType(file));
OutputStream output = response.getOutputStream();
byte[] bytes = new byte[BUFFER_LENGTH];
int read = 0;
while ((read = input.read(bytes, 0, BUFFER_LENGTH)) != -1) {
output.write(bytes, 0, read);
output.flush();
}
input.close();
output.close();
}
private String getFileName(Part part) {
for (String cd : part.getHeader("content-disposition").split(";")) {
if (cd.trim().startsWith("filename")) {
return cd.substring(cd.indexOf('=') + 1).trim()
.replace("\"", "");
}
}
return null;
}
}
The best way to serve user uploaded images that you are storing in your OPENSHIFT_DATA_DIR would be to use a servlet as described here: https://forums.openshift.com/how-to-upload-and-serve-files-using-java-servlets-on-openshift?noredirect
This servlet basically takes the path/name of the image that is being requested, reads it from the filesystem and then serves it to the requester.
The OPENSHIFT_DATA_DIR directory is not web-accessible. You can make images stored in the OPENSHIFT_DATA_DIR (aka app-root/data) directory web-accessible by creating a symlink to them from the publicly accessible OPENSHIFT_REPO_DIR.
For one-time use, as a proof of concept:
rhc ssh -a <your_app_name> -n <your_namespace>
ln -sf ${OPENSHIFT_DATA_DIR}images ${OPENSHIFT_REPO_DIR}images
You should now be able to access logo21.jpg at https://<your_app_name>-<your_namespace>.rhcloud.com/images/logo21.jpg, or <img src="/images/logo21.jpg"/>.
The contents of the OPENSHIFT_REPO_DIR are overwritten when you push changes, so you'll want to create the symlink with a deploy hook to re-create it each time you deploy. In .openshift/action_hooks/deploy:
#!/bin/bash
# This deploy hook gets executed after dependencies are resolved and the
# build hook has been run but before the application has been started back
# up again.
# create the images directory if it doesn't exist
if [ ! -d ${OPENSHIFT_DATA_DIR}images ]; then
mkdir ${OPENSHIFT_DATA_DIR}images
fi
# create symlink to uploads directory
ln -sf ${OPENSHIFT_DATA_DIR}images ${OPENSHIFT_REPO_DIR}images
You can upload the file to the data directory, then copy it from the data directory to any folder in the home directory.
After that you should be able to reference the image as usual in your page. However, it appears OpenShift only displays items from a previous deployment or git push, so it may be best to save the file in a database and read it directly from there.

Insert image into MySQL database

I've been trying for days to do this and got absolutely nowhere. I know it can be done, but I've been trawling SO for answers and got nothing working.
Upload a picture using my REST client
Insert that uploaded picture into the MySQL database.
What I have tried:
Following Load_File doesn't work, I'm using OS X so I don't know how to change ownership of folders etc... how do I do this? I never got an answer in my last post about this. How do I do this?
I've also tried doing it another way: http://examples.javacodegeeks.com/enterprise-java/rest/jersey/jersey-file-upload-example/
This does not work at all. I keep getting the error described in this post: Jersey REST WS Error: "Missing dependency for method... at parameter at index X", but the answer doesn't help me as I still don't know what it should be...
Can anyone please guide me through it?
I'm using a Jersey REST client in Java. Many of the tutorials to do this mention a pom.xml file, I don't have one or know what it is.
Thank you,
Omar
EDIT:
This is the file upload:
package com.omar.rest.apimethods;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.ws.rs.Consumes;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.Response;
import com.sun.jersey.core.header.FormDataContentDisposition;
import com.sun.jersey.multipart.FormDataParam;
@Path("/files")
public class FileUpload {
private String uploadLocationFolder = "/Users/Omar/Pictures/";
@POST
@Path("/upload")
@Consumes(MediaType.MULTIPART_FORM_DATA)
public Response uploadFile(
@FormDataParam("file") InputStream fileInputStream,
@FormDataParam("file") FormDataContentDisposition contentDispositionHeader) {
String filePath = "/Users/Omar/Pictures/" + contentDispositionHeader.getFileName();
// save the file to the server
saveFile(fileInputStream, filePath);
String output = "File saved to server location : " + filePath;
return Response.status(200).entity(output).build();
}
// save uploaded file to a defined location on the server
private void saveFile(InputStream uploadedInputStream,
String serverLocation) {
try {
// open the target file once; opening it a second time would leak the first stream
OutputStream outpuStream = new FileOutputStream(new File(serverLocation));
int read = 0;
byte[] bytes = new byte[1024];
while ((read = uploadedInputStream.read(bytes)) != -1) {
outpuStream.write(bytes, 0, read);
}
outpuStream.flush();
outpuStream.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Schema for the table (one I created for testing):
image_id: int auto-incrementing PK, picture: BLOB.
I could make it a file link and just load the image on my website but I can't even get that far yet.
I would recommend storing your image in some kind of cheap, well permissioned flat storage like network storage, and then storing a path to that storage location in the database. If you're storing your image as a blob, the database is going to do something similar to this already anyways, but I believe there will be some overhead involved with making the database manage storing and retrieving these images. These images will eat through a lot of your database's disk space, and if you want to add more space for images, adding space to flat storage should be easier than adding space to a database.
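A rough sketch of that suggestion, under some assumptions: the storage location is just a local directory here (network storage would be mounted the same way), and the class ImageStore and method store are made-up names for illustration. The returned string is what you would put in a text column of your table instead of the BLOB; the database insert itself is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.UUID;

public class ImageStore {
    // Copies the upload into storageDir under a generated name and returns the
    // path string that would be saved in the database instead of the image bytes.
    static String store(InputStream upload, Path storageDir, String originalName) throws IOException {
        Files.createDirectories(storageDir);
        // Keep only a plain extension; a generated name avoids collisions and path tricks.
        int dot = originalName.lastIndexOf('.');
        String ext = dot >= 0 ? originalName.substring(dot) : "";
        if (!ext.matches("\\.[A-Za-z0-9]{1,8}")) {
            ext = "";
        }
        Path target = storageDir.resolve(UUID.randomUUID() + ext);
        Files.copy(upload, target, StandardCopyOption.REPLACE_EXISTING);
        return target.toString();
    }

    public static void main(String[] args) throws IOException {
        // Demo with a temporary directory and three dummy bytes standing in for an image.
        Path dir = Files.createTempDirectory("images");
        String saved = store(new ByteArrayInputStream(new byte[] {1, 2, 3}), dir, "logo21.jpg");
        System.out.println("Store this path in the database: " + saved);
    }
}
```

In the Jersey resource from the question, the fileInputStream parameter could be passed straight to a method like this.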

Java Print API scaling with HTML

I am stuck on this now. I have checked almost every popular question on SO about printing HTML files with the Java Print API (including third-party libraries such as Flying Saucer, iText, CSSBox, etc.), but I still couldn't get it to work on my end.
Here are the links of my previous questions:
https://stackoverflow.com/questions/28106757/java-print-api-prints-html-with-huge-size
How to print HTML and not the code using Java Print API?
Basically I am trying to print the HTML file that contains some CSS with <style> tag. This CSS has classes applied for <table> and <p> tags for example. I cannot change CSS code inside HTML as it should be viewed exactly with this style in browser.
Below is my program
import java.awt.print.PrinterException;
import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import javax.print.PrintException;
import javax.print.PrintService;
import javax.print.PrintServiceLookup;
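import javax.print.attribute.HashPrintRequestAttributeSet;
import javax.print.attribute.HashPrintServiceAttributeSet;
import javax.print.attribute.PrintRequestAttributeSet;
import javax.print.attribute.PrintServiceAttributeSet;
import javax.print.attribute.standard.Copies;
import javax.print.attribute.standard.MediaSizeName;
import javax.print.attribute.standard.OrientationRequested;
import javax.print.attribute.standard.PrinterName;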
import javax.swing.JEditorPane;
public class Print {
public static void main(String[] args) throws PrintException {
String printerName = "\\\\network-path\\myPrinter";
String fileName = "C:\\log\\myLog.html";
URL url = null;
try {
url = (new File(fileName)).toURI().toURL();
} catch (MalformedURLException e) {
e.printStackTrace();
}
JEditorPane editorPane = new JEditorPane();
editorPane.setEditable(false);
if (url != null) {
try {
editorPane.setPage(url);
} catch (IOException e) {
System.err.println("Attempted to read a bad URL: " + url);
}
} else {
System.err.println("Couldn't find file: " + fileName);
}
PrintServiceAttributeSet printServiceAttributeSet = new HashPrintServiceAttributeSet();
printServiceAttributeSet.add(new PrinterName(printerName, null));
PrintService[] printServices = PrintServiceLookup.lookupPrintServices(null, printServiceAttributeSet); // list of printers
PrintService printService = printServices[0];
PrintRequestAttributeSet pras = new HashPrintRequestAttributeSet();
Copies copies = new Copies(1);
pras.add(copies);
pras.add(OrientationRequested.PORTRAIT);
pras.add(MediaSizeName.ISO_A4);
try {
editorPane.print(null, null, false, printService, pras, false);
} catch (PrinterException e) {
throw new PrintException("Print error occurred:" + e.getMessage());
}
}
}
The code above works, and I get a good print of the HTML with proper CSS styling, but the output is scaled up. The HTML looks one way when opened in IE and another way when printed by this code. I would prefer the print to match what is shown in IE.
I also tried to get it done by passing SimpleDoc object to the printer. My printService supports below formats:
image/gif [B
image/gif java.io.InputStream
image/gif java.net.URL
image/jpeg [B
image/jpeg java.io.InputStream
image/jpeg java.net.URL
image/png [B
image/png java.io.InputStream
image/png java.net.URL
application/x-java-jvm-local-objectref java.awt.print.Pageable
application/x-java-jvm-local-objectref java.awt.print.Printable
application/octet-stream [B
application/octet-stream java.net.URL
application/octet-stream java.io.InputStream
But nothing works with SimpleDoc. I then tried converting the HTML to .png using CSSBox. It works, but for multipage HTML the generated image is shrunk and is not usable for printing. With Flying Saucer and iText version 2.0.8 I get a NoSuchMethodError. Even if I get it to work (by compiling the source against that iText version), the output is broken.
Can someone please help? I would prefer to stick to the Java Print API rather than use any third-party library. Am I missing something with the SimpleDoc approach? What settings are needed to print the above HTML using a SimpleDoc object and the available printService formats?

How to print an email to pdf programmatically [closed]

I want to generate a PDF document from a "raw" email. This email could contain HTML or just text. I don't care about attachments.
The resulting pdf should contain the proper formatting (from css and html) and also embedded images.
My first idea was to render the email using an email client like thunderbird and then print it to pdf. Does thunderbird offer such an API or are there java libraries available to print an email to pdf?
I've found a better solution than the one I posted before: save the email as HTML, use JTidy to clean it up into XHTML, and lastly use the Flying Saucer HTML renderer to save it as a PDF.
Here is an example I wrote:
import com.lowagie.text.DocumentException;
import org.w3c.tidy.Tidy;
import org.xhtmlrenderer.pdf.ITextRenderer;
import java.io.*;
import java.util.*;
import javax.mail.*;
public class Email2PDF {
public static void main(String[] args) {
Properties props = new Properties();
props.setProperty("mail.store.protocol", "imaps");
try {
Session session = Session.getInstance(props, null);
Store store = session.getStore();
//read your latest email
store.connect("imap.gmail.com", "youremail@gmail.com", "password");
Folder inbox = store.getFolder("INBOX");
inbox.open(Folder.READ_ONLY);
Message msg = inbox.getMessage(inbox.getMessageCount());
Multipart mp = (Multipart) msg.getContent();
BodyPart bp = mp.getBodyPart(0);
String filename = msg.getSubject();
FileOutputStream os = new FileOutputStream(filename + ".html");
msg.writeTo(os);
os.close(); // make sure the file is fully written before JTidy reads it
//use jtidy to clean up the html
cleanHtml(filename);
//save it into pdf
createPdf(filename);
} catch (Exception mex) {
mex.printStackTrace();
}
}
public static void cleanHtml(String filename) {
File file = new File(filename + ".html");
InputStream in = null;
try {
in = new FileInputStream(file);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
OutputStream out = null;
try {
out = new FileOutputStream(filename + ".xhtml");
} catch (FileNotFoundException e) {
e.printStackTrace();
}
final Tidy tidy = new Tidy();
tidy.setQuiet(false);
tidy.setShowWarnings(true);
tidy.setShowErrors(0);
tidy.setMakeClean(true);
tidy.setForceOutput(true);
org.w3c.dom.Document document = tidy.parseDOM(in, out);
}
public static void createPdf(String filename)
throws IOException, DocumentException {
OutputStream os = new FileOutputStream(filename + ".pdf");
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(new File(filename + ".xhtml"));
renderer.layout();
renderer.createPDF(os) ;
os.close();
}
}
Enjoy!
I put together a piece of software that converts EML files to PDFs by parsing (and cleaning) the MIME structure, converting it to HTML, and then using wkhtmltopdf to convert that to a PDF file.
It also handles inline images and corrupt MIME headers, and can use a proxy.
The code is available on GitHub under the Apache License 2.0.
import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.*;
import javax.mail.*;
public class Email2PDF {
public static void main(String[] args) {
Properties props = new Properties();
props.setProperty("mail.store.protocol", "imaps");
try {
Session session = Session.getInstance(props, null);
Store store = session.getStore();
store.connect("imap.gmail.com", "youremail@gmail.com", "password");
Folder inbox = store.getFolder("INBOX");
inbox.open(Folder.READ_ONLY);
Message msg = inbox.getMessage(inbox.getMessageCount());
Multipart mp = (Multipart) msg.getContent();
BodyPart bp = mp.getBodyPart(0);
createPdf(msg.getSubject(), (String) bp.getContent());
} catch (Exception mex) {
mex.printStackTrace();
}
}
public static void createPdf(String filename, String body)
throws DocumentException, IOException {
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream(filename + ".pdf"));
document.open();
document.add(new Paragraph(body));
document.close();
}
}
I've used iText as the PDF library.
You can read the HTML content using an email client and then use iText to convert it into a PDF.
Look into fpdf and fpdi, both free libraries for PHP are used in the creation of PDF docs.
Since the Internet message format (RFC 5322) has conventions, actually strict rules, you can always count on the first empty line to mark the end of the headers and the start of the message content. So you can parse everything after that first empty line to get the entire body of the message.
For embedded images, you'll need a base 64 decoder (usually) or some other decoder based on the email's attachment encoding type to transform the data into a human readable image.
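As a small illustration of the decoding step, here is a sketch in Java using the standard java.util.Base64 class. It assumes the part's Content-Transfer-Encoding is base64 and that you have already cut the part's body out of the message; the class and method names are made up for the example.

```java
import java.util.Base64;

public class MimeImageDecoder {
    // Decodes the body of a MIME part whose Content-Transfer-Encoding is base64.
    // The MIME decoder skips the line breaks that mail bodies are wrapped with.
    static byte[] decodeBase64Body(String body) {
        return Base64.getMimeDecoder().decode(body);
    }

    public static void main(String[] args) {
        // The Base64 text of the 8-byte PNG file signature, wrapped across two lines
        // the way it could appear inside an email.
        byte[] bytes = decodeBase64Body("iVBORw0K\r\nGgo=");
        System.out.println("Decoded " + bytes.length + " bytes");
    }
}
```

The resulting byte array can be written to a file or handed to whatever renders the PDF.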
You could try the Apache PDFbox library.
It seems to have a nice API and it also supports printing. PrintPDF
You would have to run the print command from CLI with your file as a parameter.
Edit: It is Java and open-source.
Hope it helps!

Base64 String corrupt from Java

I have a phonegap plugin I altered. The Java part outputs a base64 string:
package org.apache.cordova;
import java.io.ByteArrayOutputStream;
import java.io.File;
import org.apache.cordova.api.Plugin;
import org.apache.cordova.api.PluginResult;
import org.json.JSONArray;
import android.annotation.TargetApi;
import android.graphics.Bitmap;
import android.os.Environment;
import android.util.Base64;
import android.view.View;
public class Screenshot extends Plugin {
@Override
public PluginResult execute(String action, JSONArray args, String callbackId) {
// starting on ICS, some WebView methods
// can only be called on UI threads
final Plugin that = this;
final String id = callbackId;
super.cordova.getActivity().runOnUiThread(new Runnable() {
//@Override
@TargetApi(8)
public void run() {
View view = webView.getRootView();
view.setDrawingCacheEnabled(true);
Bitmap bitmap = Bitmap.createBitmap(view.getDrawingCache());
view.setDrawingCacheEnabled(false);
File folder = new File(Environment.getExternalStorageDirectory(), "Pictures");
if (!folder.exists()) {
folder.mkdirs();
}
File f = new File(folder, "screenshot_" + System.currentTimeMillis() + ".png");
System.out.println(folder);
System.out.println("screenshot_" + System.currentTimeMillis() + ".png");
ByteArrayOutputStream baos = new ByteArrayOutputStream();
bitmap.compress(Bitmap.CompressFormat.PNG, 100, baos);
byte[] b = baos.toByteArray();
String base64String = Base64.encodeToString(b, Base64.DEFAULT);
String mytextstring = "data:image/png;base64,"+base64String;
System.out.println(mytextstring);
that.success(new PluginResult(PluginResult.Status.OK, mytextstring), id);
}
});
PluginResult imageData = new PluginResult(PluginResult.Status.NO_RESULT);
imageData.setKeepCallback(true);
System.out.println("imageData=============>>>>>"+imageData);
return imageData;
}
}
I then pass this to some Javascript and then send the string to a server. I have checked the string that the .php file receives, and the base64 string is identical. However when I decode the base64 string it seems corrupt. For a better example copy the contents of this text file into a decoder.
http://dl.dropbox.com/u/91982671/base64.txt
Note: When the .php file tries to decode it, data:image/png;base64, is in front. I have removed it here just to make it easier for you to paste into a decoder.
Decoder found here:
http://www.motobit.com/util/base64-decoder-encoder.asp
All I can think is that for some reason I may not be outputting the base64 string correctly from the Java. Does anyone have any idea what's going on, or what may cause this?
I played about with this for a good few hours last night and took some of these suggestions into consideration.
Firstly I checked the image before I encoded it. It was fine.
However, decoding it before it went to the JavaScript showed that it was corrupted, which meant it had to be something to do with the Java encoding process. I don't claim to 100% understand why it happens, but the problem seems to lie with this code:
String mytextstring = "data:image/png;base64,"+base64String;
and the way I was adding "data:image/png;base64," before I sent it to the JavaScript and on to the PHP decoder. To resolve this I removed it from the Java code so it became:
String mytextstring = base64String;
And in my JavaScript function that sends it to the server I added it to the string there. This works, and I received an uncorrupted image. In case anyone wonders or cares, the JavaScript function where I add it instead is below:
function returnScreenshotImage(imageData) {
    base64string = "data:image/png;base64,"+imageData;
    console.log("String: "+base64string);
    var url = 'http://www.websitename.co.uk/upload.php';
    var params = {image: imageData};
    document.basicfrm.oldscreenshotimg.value = document.basicfrm.screenshotimg.value;
    // send the data
    $.post(url, params, function(data) {
        document.basicfrm.screenshotimg.value = data;
    });
}
As you can see the line:
base64string = "data:image/png;base64,"+imageData;
adds the section previously added by the Java. This works now. Hope this helps people in the future. If anyone knows why this is and would care to comment and explain, feel free. :)
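One possible explanation, offered only as a guess: a Base64 decoder understands just the payload, not the "data:image/png;base64," prefix, and in the function above params carries imageData rather than base64string, so the server now receives the bare payload. A decoder that accepts both forms could look like the sketch below. It is plain Java with java.util.Base64 rather than the Android or PHP code from the question, and the class and method names are invented for illustration. The MIME decoder is used because android.util.Base64.DEFAULT wraps its output with line breaks.

```java
import java.util.Base64;

public class DataUriDecoder {
    // Accepts either a bare Base64 string or a data URI and returns the decoded bytes.
    static byte[] decode(String value) {
        String payload = value;
        if (value.startsWith("data:")) {
            // Everything up to and including the first comma is metadata, not image data.
            int comma = value.indexOf(',');
            if (comma < 0) {
                throw new IllegalArgumentException("data URI without a comma");
            }
            payload = value.substring(comma + 1);
        }
        // The MIME decoder tolerates line breaks inside the payload.
        return Base64.getMimeDecoder().decode(payload);
    }

    public static void main(String[] args) {
        byte[] withPrefix = decode("data:image/png;base64,iVBORw0KGgo=");
        byte[] bare = decode("iVBORw0KGgo=");
        System.out.println(withPrefix.length == bare.length);
    }
}
```

Stripping the prefix in one place like this means it no longer matters which layer added it.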
