I've done converting PNG to PDF using itext.
How can i convert BASE64 into a PDF File, how can i achieve this.
//b64Image contains base64code from html2canvas, this code is for converting
base64 to png
String b64Image=request.getParameter("img_val");
b64Image = b64Image.replace("data:image/png;base64,", "");
b64Image = b64Image.replace(" ", "+");
byte[] imageByte;
String path = "C:\\Users\\P.Balraj\\Desktop\\Screenshot.png";
BASE64Decoder decoder = new BASE64Decoder();
imageByte = decoder.decodeBuffer(b64Image);
File outputfile = new File(path);
BufferedOutputStream stream = new BufferedOutputStream(
new FileOutputStream(outputfile));
stream.write(imageByte, 0, imageByte.length);
stream.flush();
stream.close();
Thank you
You can add b64 image to pdf like this:
import java.io.FileOutputStream;
//com.lowagie... old version
//com.itextpdf... recent version
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.text.Image;
import org.apache.commons.codec.binary.Base64;
public class Image64ToPDF {
public static void main(String ... args) {
Document document = new Document();
//
String b64Image = "iVBORw0KGgoAAAANSUhEUgAAAJcAAACXCAMAAAAvQTlLAAABF1BMVEX////qQzU0qFNChfT7vAU8gvTz9/5pm/bh6f3m7f4edvMhePPqPi/7tgD7uAD7ugAyfvPpOCjpNCIopUv0+vb62NYAnjf97ezU6tnrTkKv2LhCrF7whX4YokLpLRnC4cn+9/bymZPoHgDt8v4zqkHo9Ov0qqX2vrrpOzb8wgB1vob+7s+ExZNUsmv/+e9XkPXT4PzsX1WZzqX4ycbudW374+JiuHdKqU7btQPsuxaTtfj8w0U8lrSux/prrEQDpljF1vsmf+Gmz8SRsD40pGf/4Kj86cGmsjY3oH85maKAqfc/jtQ/k8Q5nZL3qBT81oTtXC/xfCb1lxvvbyHNtyb8y2MAaPK1syznAhj2nzjuZirzjSD80nX4U1R1AAAFc0lEQVR4nO2YC1faSBiGQwCtEAzJxBgICSGFoMilQERbWy+tdXfb3bXdrXvt//8dO1wOEDIZkkmGePbMo+eoR8DHb9755hs4jsFgMBgMBoPBYDAYDMYzxDo+HzsN1x1C3FqjPT62rNSdjpza0LRNU5aEGZJs2rbkNtpHKUqN3UoGCmV8CLKcqTSOrDSsxg3BRjmt3Oyhc75rq/ZQkIOdlmoZd5dmljO0MZVaR7LdnSVtPJRDWs3MBNfahdWRG7ZWSzPJoa/lCFI0qykm7ZidD83oVtOS2W2aWuMKQbHmyDV6Wk7UZK1juseUtGrbOxYOqWLRsLJcsmitvDJUwu8SR2uOUKHRYC033iLCalGJV8xsUaoW14idLSrVam+rliBJ8hQJjoe7q9aRjZWSzQqcn502BM7Tw4y5uUMoVcsa4sY/W2qMz63Vo+G079rm+jOEDJ1RB5N5OVND/k3HXZlRqhY3Dsy8YNeCWqXVrph0q3VcCSwWfnpxMhLFanFOwCoK9rZp7xyWjFa1uOOAvSgPQ5x2DZtWtbga+lg0h6H+oEPr0nH2+wlSi+KUF4rH3L8IMdNNWatfyp3+mtk0k3Zz88LweJiDYl+8YoJJK8xh6V+JOcjpV4+YlOJ7NXMeZlpQ7LeTlZncSFuL+36YW4it1lLIpG3F9RflmoqJ/yzEZKo31FA8lHIrTucNQ0i7RXBry7gI2bRh2OO0rTjuSsx5xMQvJ0IlbSl4BuV8fP0r/XQtu8R6yf7GTRF7kSD2+njo8xK/Yx5/MNiPArHYo9/r8CPOq5CPQOElqdeTfx1LfaxXNgLFV4Ra/dcIL9wTonmV35F6IeJ1laDXG1Kvks/rEBf7iF75Twl64WIf1Wuf0OvM71VK0muQoNcD8/pfeD2HfKH24+Pz9BKfnoPXoe8cEnMJepH2r/4V1XObuN8j54mzxLyIz0fk/IUL/o7mCeS8+qRgvP4oFosF+DH/LEx/mn9XKPu9iOcv7sGvJV5rmCe8COLgjV+MfF71BV/8dmGMiF7qE6Jg5BcPr5eY+5PneYPolYo+qzz5vcMbMDF3fTP1IinYK/+OIN+O3o4vXs60eL7Xiv5CiHiRb0eOs1YLKX6bWxEV7GCA6BMvyL2WCymKH/glauSCvUOknvR0nLF4h0J8fX2z8gITTA9D4k99rHhxix0pXvIeQMSVRKQrWyDuqjMeSnANf7jxevFqPcpr3CK08gPyLjEDtocPm1qwYriuv0k2n/gywuRfXvi1YMVCi+3tI7Sy2XjLCFvYBcIKoofdlPuIVSSfCVdUDbRYuIztIbXipn6KoqO9eNDd/uSXA6QW8Wi/Th0EiBn6lpAp7++QWtnybQJe3CRIDIARrsNqHfUeKZZIueC/HeQFzdRRQP6V+kQFPGi+Lfu3YzHO0bhGUPRni8l3EBugNZoY8/+m+ZOvYmXii9AmXYwYDwxjUtVaremSKkqrpdW7QF2VuHl/t1GxAfEAvYmiBy/lQk3vTTqQSU83DO+Dwf1nz7lduE1KC65LULPwCECQv2i+XTuKYp9AHjQ1hFggzR+X6Y99YG8Q2MVCAfTP8/jnY9yCAsRiVQyAn8vJdVQP1VgV45u/5PM0tOJWDIbsrhjjDoRB29IutmAY8acINC0d12C3APgoQ240FGznx2v1ot6iIlEN6J3brFTs6JEASuDYg8HoRbpBkVHnIy4mMLqUizVHqfIRagbAhOCdFjJaVRCymQG1Q28bIlDqvaDxYb1UemdntVqijXQVkzSgGp3qTnLlQ9FGPZ03NusGABwNe516OlILWvVqd6IbqqoakOkXfdId1XcaqiAUONdrWh2iaS34Q9o+DAaDwWAwGAwGg8FgMJD8B0Y0n3erXn7HAAAAAElFTkSuQmCC"; // .gif and .jpg are ok too!
String output = "c:/output.pdf";
try {
PdfWriter.getInstance(document, new FileOutputStream(output));
document.open();
byte[] decoded = Base64.decodeBase64(b64Image.getBytes());
document.add(Image.getInstance(decoded));
document.close();
}
catch (Exception e) {
e.printStackTrace();
}
}
}
Related
Anybody have any idea about,How to handle unstructured data like Audio,Video and Images using Hbase.I tried for this alot but i didn't get any idea.please any help is appreciated.
Option 1: convert image to byte array and you can prepare put request and insert to table. Similarly audio and video files also can be achieved.
See https://docs.oracle.com/javase/7/docs/api/javax/imageio/package-summary.html
import javax.imageio.ImageIO;
/* * Convert an image to a byte array
*/
private byte[] convertImageToByteArray (String ImageName)throws IOException {
byte[] imageInByte;
BufferedImage originalImage = ImageIO.read(new File(ImageName));
// convert BufferedImage to byte array
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(originalImage, "jpg", baos);
imageInByte = baos.toByteArray();
baos.close();
return imageInByte;
}
Option 2 : You can do that in below way using Apache commons lang API. probably this is best option than above which will be applicable to all objects including image/audio/video etc.. This can be used NOT ONLY for hbase you can save it in hdfs as well
See my answer for more details.
For ex : byte[] mediaInBytes = org.apache.commons.lang.SerializationUtils.serialize(Serializable obj)
for deserializing, you can do this static Object deserialize(byte[] objectData)
see the doc in above link..
Example usage of the SerializationUtils
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.commons.lang.SerializationUtils;
public class SerializationUtilsTest {
public static void main(String[] args) {
try {
// File to serialize object to it can be your image or any media file
String fileName = "testSerialization.ser";
// New file output stream for the file
FileOutputStream fos = new FileOutputStream(fileName);
// Serialize String
SerializationUtils.serialize("SERIALIZE THIS", fos);
fos.close();
// Open FileInputStream to the file
FileInputStream fis = new FileInputStream(fileName);
// Deserialize and cast into String
String ser = (String) SerializationUtils.deserialize(fis);
System.out.println(ser);
fis.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
Note :jar of apache commons lang always available in hadoop cluster.(not external dependency)
I need to be able to parse the text contained in a file online with a given url, i.e. http://website.com/document.pdf.
I am making a search engine which basically can tell me if the searched word is in some file online, and retrieve the file's URL, so I don't need to download the file but to just read it.
I was looking for a way and found something with InputStream and OpenConnection but didn't managed to actually do it.
I am using jsoup in order to crawl around a website in order to retrieve the URLs, and I was trying to parse it with a Jsoup method, but it does not work.
So what is the best way to do this?
EDIT:
I want to be able to do something like this:
File in = new File("http://website.com/document.pdf");
Document doc = Jsoup.parse(in, "UTF-8");
System.out.println(doc.toString());
You can use URL instead of file for access to the URL. So using Apache Tika you should be able to grab a string of the content this way.
import org.apache.tika.parser.pdf.PDFParser;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.ContentHandler;
public class URLReader {
public static void main(String[] args) throws Exception {
URL url = new URL("http://website.com/document.pdf");
ContentHandler contenthandler = new BodyContentHandler();
Metadata metadata = new Metadata();
PDFParser pdfparser = new PDFParser();
pdfparser.parse(is, contenthandler, metadata, new ParseContext());
System.out.println(contenthandler.toString());
}
}
You can use this code first download the PDF file then read the text with apache lib. you need to add some jar manually.
you need to set your local pdf file address which is by defualt "download.pdf".
import com.gnostice.pdfone.PdfDocument;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.ConnectException;
import java.net.URL;
import java.net.URLConnection;
public class LoadDocumentFromURL {
public static void main(String[] args) throws IOException {
URL url1 = new URL("https://arxiv.org/pdf/1811.06933.pdf");
byte[] ba1 = new byte[1024];
int baLength;
FileOutputStream fos1 = new FileOutputStream("download.pdf");
try {
// Contacting the URL
// System.out.print("Connecting to " + url1.toString() + " ... ");
URLConnection urlConn = url1.openConnection();
// Checking whether the URL contains a PDF
if (!urlConn.getContentType().equalsIgnoreCase("application/pdf")) {
System.out.println("FAILED.\n[Sorry. This is not a PDF.]");
} else {
try {
// Read the PDF from the URL and save to a local file
InputStream is1 = url1.openStream();
while ((baLength = is1.read(ba1)) != -1) {
fos1.write(ba1, 0, baLength);
}
fos1.flush();
fos1.close();
is1.close();
// Load the PDF document and display its page count
//System.out.print("DONE.\nProcessing the PDF ... ");
PdfDocument doc = new PdfDocument();
try {
doc.load("download.pdf");
// System.out.println("DONE.\nNumber of pages in the PDF is " + doc.getPageCount());
// System.out.println(doc.getAuthor());
// System.out.println(doc.getKeywords());
// System.out.println(doc.toString());
doc.close();
} catch (Exception e) {
System.out.println("FAILED.\n[" + e.getMessage() + "]");
}
} catch (ConnectException ce) {
//System.out.println("FAILED.\n[" + ce.getMessage() + "]\n");
}
}
} catch (NullPointerException npe) {
//System.out.println("FAILED.\n[" + npe.getMessage() + "]\n");
}
File file = new File("your local pdf file address which is download.pdf");
PDDocument document = PDDocument.load(file);
PDFTextStripper pdfStripper = new PDFTextStripper();
String text = pdfStripper.getText(document);
System.out.println(text);
document.close();
}
}
I am new to Java programming. My current project requires me to read embedded(ole) files in an excel sheet and get text contents in them. Examples for reading embedded word file worked fine, however I am unable to find help reading an embedded pdf file. Tried few things by looking at similar examples.... which didn't work out.
http://poi.apache.org/spreadsheet/quick-guide.html#Embedded
I have code below, probably with help I can get in right direction. I have used Apache POI to read embedded files in excel and pdfbox to parse pdf data.
public class ReadExcel1 {
public static void main(String[] args) {
try {
FileInputStream file = new FileInputStream(new File("C:\\test.xls"));
POIFSFileSystem fs = new POIFSFileSystem(file);
HSSFWorkbook workbook = new HSSFWorkbook(fs);
for (HSSFObjectData obj : workbook.getAllEmbeddedObjects()) {
String oleName = obj.getOLE2ClassName();
if(oleName.equals("Acrobat Document")){
System.out.println("Acrobat reader document");
try{
DirectoryNode dn = (DirectoryNode) obj.getDirectory();
for (Iterator<Entry> entries = dn.getEntries(); entries.hasNext();) {
DocumentEntry nativeEntry = (DocumentEntry) dn.getEntry("CONTENTS");
byte[] data = new byte[nativeEntry.getSize()];
ByteArrayInputStream bao= new ByteArrayInputStream(data);
PDFParser pdfparser = new PDFParser(bao);
pdfparser.parse();
COSDocument cosDoc = pdfparser.getDocument();
PDFTextStripper pdfStripper = new PDFTextStripper();
PDDocument pdDoc = new PDDocument(cosDoc);
pdfStripper.setStartPage(1);
pdfStripper.setEndPage(2);
System.out.println("Text from the pdf "+pdfStripper.getText(pdDoc));
}
}catch(Exception e){
System.out.println("Error reading "+ e.getMessage());
}finally{
System.out.println("Finally ");
}
}else{
System.out.println("nothing ");
}
}
file.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Below is the output in eclipse
Acrobat reader document
Error reading Error: End-of-File, expected line
Finally
nothing
The PDF weren't OLE 1.0 packaged, but somehow differently embedded - at least the extraction worked for me.
This is not a general solution, because it depends on how the embedding application names the entries ... of course for PDFs you could check all DocumentNode-s for the magic number "%PDF" - and in case of OLE 1.0 packaged elements this needs to be done differently ...
I think, the real filename of the pdf is somewhere hidden in the \1Ole or CompObj entries, but for the example and apparently for your use case that's not necessary to determine.
import java.io.*;
import java.net.URL;
import org.apache.poi.hssf.usermodel.*;
import org.apache.poi.poifs.filesystem.*;
import org.apache.poi.util.IOUtils;
public class EmbeddedPdfInExcel {
public static void main(String[] args) throws Exception {
NPOIFSFileSystem fs = new NPOIFSFileSystem(new URL("http://jamesshaji.com/sample.xls").openStream());
HSSFWorkbook wb = new HSSFWorkbook(fs.getRoot(), true);
for (HSSFObjectData obj : wb.getAllEmbeddedObjects()) {
String oleName = obj.getOLE2ClassName();
DirectoryNode dn = (DirectoryNode)obj.getDirectory();
if(oleName.contains("Acro") && dn.hasEntry("CONTENTS")){
InputStream is = dn.createDocumentInputStream("CONTENTS");
FileOutputStream fos = new FileOutputStream(obj.getDirectory().getName()+".pdf");
IOUtils.copy(is, fos);
fos.close();
is.close();
}
}
fs.close();
}
}
This question already has answers here:
Converting HTML files to PDF [closed]
(8 answers)
Closed 9 years ago.
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import com.itextpdf.text.Document;
import com.itextpdf.text.Paragraph;
import com.itextpdf.text.pdf.PdfWriter;
public class GeneratePDF {
public static void main(String[] args) {
try {
String k = "<html><body> This is my Project </body></html>";
OutputStream file = new FileOutputStream(new File("E:\\Test.pdf"));
Document document = new Document();
PdfWriter.getInstance(document, file);
document.open();
document.add(new Paragraph(k));
document.close();
file.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}
This is my code to convert HTML to PDF. I am able to convert it but in PDF file it saves as whole HTML while I need to display only text. <html><body> This is my Project </body></html> gets saved to PDF while it should save only This is my Project.
You can do it with the HTMLWorker class (deprecated) like this:
import com.itextpdf.text.html.simpleparser.HTMLWorker;
//...
try {
String k = "<html><body> This is my Project </body></html>";
OutputStream file = new FileOutputStream(new File("C:\\Test.pdf"));
Document document = new Document();
PdfWriter.getInstance(document, file);
document.open();
HTMLWorker htmlWorker = new HTMLWorker(document);
htmlWorker.parse(new StringReader(k));
document.close();
file.close();
} catch (Exception e) {
e.printStackTrace();
}
or using the XMLWorker, (download from this jar) using this code:
import com.itextpdf.tool.xml.XMLWorkerHelper;
//...
try {
String k = "<html><body> This is my Project </body></html>";
OutputStream file = new FileOutputStream(new File("C:\\Test.pdf"));
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, file);
document.open();
InputStream is = new ByteArrayInputStream(k.getBytes());
XMLWorkerHelper.getInstance().parseXHtml(writer, document, is);
document.close();
file.close();
} catch (Exception e) {
e.printStackTrace();
}
This links might be helpful to convert.
https://code.google.com/p/flying-saucer/
https://today.java.net/pub/a/today/2007/06/26/generating-pdfs-with-flying-saucer-and-itext.html
If it is a college Project, you can even go for these,
http://pd4ml.com/examples.htm
Example is given to convert HTML to PDF
I am trying to encode images using base64 encoding on the image URL.
But it gives the same encoding for all the URLs.
My code is as follows:
Object namee = url.openConnection().getContentType();
String name = (String) namee;
BufferedImage image = ImageIO.read(url);
//getting image extension from content type
String ext = name.substring(name.lastIndexOf("/") + 1);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(image, ext, baos);
byte[] imageData = baos.toByteArray();
String base64value = Base64.encodeBase64URLSafeString(imageData);
This Code convert your image file/data to Base64 Format which is nothing but text representation of your Image.
To convert this you need Encoder & Decoder which you will get from http://www.source-code.biz/base64coder/java/. It is File Base64Coder.java you will need.
Now to access this class as per your requirement you will need class below:
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.InputStream;
import java.io.IOException;
import java.io.OutputStream;
public class Base64 {
public static void main(String args[]) throws IOException {
/*
* if (args.length != 2) {System.out.println(
* "Command line parameters: inputFileName outputFileName");
* System.exit(9); } encodeFile(args[0], args[1]);
*/
File sourceImage = new File("back3.png");
File sourceImage64 = new File("back3.txt");
File destImage = new File("back4.png");
encodeFile(sourceImage, sourceImage64);
decodeFile(sourceImage64, destImage);
}
private static void encodeFile(File inputFile, File outputFile) throws IOException {
BufferedInputStream in = null;
BufferedWriter out = null;
try {
in = new BufferedInputStream(new FileInputStream(inputFile));
out = new BufferedWriter(new FileWriter(outputFile));
encodeStream(in, out);
out.flush();
} finally {
if (in != null)
in.close();
if (out != null)
out.close();
}
}
private static void encodeStream(InputStream in, BufferedWriter out) throws IOException {
int lineLength = 72;
byte[] buf = new byte[lineLength / 4 * 3];
while (true) {
int len = in.read(buf);
if (len <= 0)
break;
out.write(Base64Coder.encode(buf, 0, len));
out.newLine();
}
}
static String encodeArray(byte[] in) throws IOException {
StringBuffer out = new StringBuffer();
out.append(Base64Coder.encode(in, 0, in.length));
return out.toString();
}
static byte[] decodeArray(String in) throws IOException {
byte[] buf = Base64Coder.decodeLines(in);
return buf;
}
private static void decodeFile(File inputFile, File outputFile) throws IOException {
BufferedReader in = null;
BufferedOutputStream out = null;
try {
in = new BufferedReader(new FileReader(inputFile));
out = new BufferedOutputStream(new FileOutputStream(outputFile));
decodeStream(in, out);
out.flush();
} finally {
if (in != null)
in.close();
if (out != null)
out.close();
}
}
private static void decodeStream(BufferedReader in, OutputStream out) throws IOException {
while (true) {
String s = in.readLine();
if (s == null)
break;
byte[] buf = Base64Coder.decodeLines(s);
out.write(buf);
}
}
}
In Android you can convert your Bitmap to Base64 for Uploading to Server/Web Service.
Bitmap bmImage = //Data
ByteArrayOutputStream baos = new ByteArrayOutputStream();
bmImage.compress(Bitmap.CompressFormat.JPEG, 100, baos);
byte[] imageData = baos.toByteArray();
String encodedImage = Base64.encodeArray(imageData);
This “encodedImage” is text representation of your Image. You can use this for either uploading purpose or for diplaying directly into HTML Page as below:
<img alt="" src="data:image/png;base64,<?php echo $encodedImage; ?>" width="100px" />
<img alt="" src="data:image/png;base64,/9j/4AAQ...........1f/9k=" width="100px" />
Here is a working example on how to convert image to base64 with Java.
public static String encodeToString(BufferedImage image, String type) {
String base64String = null;
ByteArrayOutputStream bos = new ByteArrayOutputStream();
try {
ImageIO.write(image, type, bos);
byte[] imageBytes = bos.toByteArray();
BASE64Encoder encoder = new BASE64Encoder();
base64String = encoder.encode(imageBytes);
bos.close();
} catch (IOException e) {
e.printStackTrace();
}
return base64String;
}