Some background: my app produces PDF reports, but these reports need to be image-based, and I was told that the report server cannot do this. The report server is called from a PL/SQL procedure and the result is a BLOB, so the only tool I have for this conversion is a Java stored procedure. Here is what I came up with (using Apache PDFBox):
create or replace and compile java source named "APDFUtil"
as
import oracle.sql.*;
import oracle.jdbc.driver.*;
import java.sql.*;
import oracle.sql.BLOB;
import java.sql.Blob;
import javax.imageio.ImageIO;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.io.*;
import java.util.*;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.rendering.*;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.rendering.PDFRenderer;
import org.apache.pdfbox.tools.imageio.ImageIOUtil;
import java.awt.image.BufferedImage;
public class APDFUtil{
static OracleDriver ora = new OracleDriver();
static Connection conn;
static ByteArrayOutputStream out;
static {
try {
conn = ora.defaultConnection();
} catch (Exception ex) {
// Ignored on purpose: flattenPDF retries lazily if conn is still null.
}
}
public static oracle.sql.BLOB flattenPDF (oracle.sql.BLOB value) throws Exception {
if (conn == null) conn = ora.defaultConnection();
BLOB retBlob = BLOB.createTemporary(conn, true, oracle.sql.BLOB.DURATION_SESSION);
/*BEGIN TO_JPG*/
InputStream inputStream = value.getBinaryStream();
PDDocument document = PDDocument.load(inputStream);
PDFRenderer pdfRenderer = new PDFRenderer(document);
int noOfPages = document.getNumberOfPages();
BufferedImage[] pdfJPEG = new BufferedImage[noOfPages];
for (int page = 0; page < noOfPages; ++page) {
BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
pdfJPEG[page] = bim;
}
/*write images to new pdf*/
PDDocument documentOut = new PDDocument();
for (int page = 0; page < noOfPages;++page) {
/*get page from old document to determine width and height*/
PDPage oldPage = document.getPage(page);
Float pw = oldPage.getMediaBox().getWidth();
Float ph = oldPage.getMediaBox().getHeight();
PDRectangle rec = new PDRectangle(pw,ph);
PDPage newPage = new PDPage(rec);
documentOut.addPage(newPage);
PDImageXObject pdImage = LosslessFactory.createFromImage(documentOut, pdfJPEG[page]);
PDPageContentStream contents = new PDPageContentStream(documentOut, newPage);
contents.drawImage(pdImage, 0, 0,pw,ph);
contents.close();
}
ByteArrayOutputStream out = new ByteArrayOutputStream();
documentOut.save(out);
documentOut.close();
document.close();
/*END OF TO_JPG*/
/*out - we used to get this back from TO_JPG*/
// java.sql.Blob positions are 1-based; try-with-resources also closes the BLOB stream
try (java.io.OutputStream outStr = retBlob.setBinaryStream(1)) {
outStr.write(out.toByteArray());
outStr.flush();
} finally {
out.close();
}
return retBlob;
}
}
The PDFBox jars have been loaded into the database.
The database is Oracle 19c Standard Edition 2, release 19.0.0.0.0.
I tried this code as a standalone Java project, the only difference being that the PDF is read from disk and the new file is written to disk, and there it works flawlessly.
The issue:
I believe the problem starts at this line: BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
but I don't know what is causing the error or how to debug it, especially since it works in a standalone project. (I came to this conclusion by a painstaking process of eliminating code and throwing exceptions.) Java stored procedures are not a specialty of mine, and this code was pieced together from many different sources online.
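Since the failure only shows up inside the database JVM, one low-tech way to see the real cause is to capture the full stack trace as text and pass it back to PL/SQL, for example as the message of a rethrown exception. The following is a minimal sketch of that idea; `StackTraces` and `describe` are hypothetical names, not part of PDFBox or any Oracle API.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

final class StackTraces {
    // Render a Throwable, including its "Caused by" chain, as one String.
    static String describe(Throwable t) {
        StringWriter buffer = new StringWriter();
        try (PrintWriter writer = new PrintWriter(buffer)) {
            t.printStackTrace(writer);
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        try {
            // Stand-in for the rendering call that fails in the database.
            throw new IllegalStateException("render failed",
                    new UnsupportedOperationException("no AWT"));
        } catch (Throwable t) {
            System.out.println(describe(t));
        }
    }
}
```

In flattenPDF, the body could be wrapped in a try block that catches Throwable and rethrows `new Exception(StackTraces.describe(t))`. Catching Throwable rather than Exception may matter here, because class-loading problems such as NoClassDefFoundError are Errors and would slip past a `catch (Exception ex)`.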
We are using PDFBox 2.0.17 (main reason: it is free) with Java 8 to merge two types of PDF documents (normal PDF/A files and PDFs converted from TIFF files).
We found that the resulting PDF is quite big: basically the combined size of all the source PDFs. I am trying to find a way to reduce the resulting file size.
I found the Stack Overflow question "How to reduce the size of merged PDF/A-1b files with pdfbox or other java library", but it did not seem to help.
Is there any way to reduce the size of the resulting PDF through any of the following?
Font optimization, image optimization and compression, PDF compression
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.multipdf.PDFMergerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
public class MergerTest {
public static void main(String[] args) throws IOException {
File file1 = new File("C:\\Test\\PdfBox_Examples\\doc1.pdf");
File file2 = new File("C:\\Test\\PdfBox_Examples\\doc2.pdf");
//Instantiating PDFMergerUtility class
PDFMergerUtility PDFmerger = new PDFMergerUtility();
//Setting the destination file
PDFmerger.setDestinationFileName("C:\\Test\\PdfBox_Examples\\merged.pdf");
//adding the source files
PDFmerger.addSource(file1);
PDFmerger.addSource(file2);
//Merging the two documents
PDFmerger.mergeDocuments(null);
System.out.println("Documents merged");
File file = new File("C:\\Test\\PdfBox_Examples\\merged.pdf");
PDDocument doc = PDDocument.load(file);
Map<String, COSBase> fontFileCache = new HashMap<>();
for (int pageNumber = 0; pageNumber < doc.getNumberOfPages();
pageNumber++) {
final PDPage page = doc.getPage(pageNumber);
COSDictionary pageDictionary = (COSDictionary)
page.getResources().getCOSObject().getDictionaryObject
(COSName.FONT);
if(pageDictionary !=null) {
for (COSName currentFont : pageDictionary.keySet()) {
COSDictionary fontDictionary = (COSDictionary)
pageDictionary.getDictionaryObject(currentFont);
for (COSName actualFont : fontDictionary.keySet()) {
COSBase actualFontDictionaryObject =
fontDictionary.getDictionaryObject(actualFont);
if (actualFontDictionaryObject instanceof COSDictionary)
{
COSDictionary fontFile = (COSDictionary)
actualFontDictionaryObject;
if (fontFile.getItem(COSName.FONT_NAME) instanceof
COSName) {
COSName fontName = (COSName)
fontFile.getItem(COSName.FONT_NAME);
fontFileCache.computeIfAbsent(fontName.getName(), key ->
fontFile.getItem(COSName.FONT_FILE2));
fontFile.setItem(COSName.FONT_FILE2,
fontFileCache.get(fontName.getName()));
}
}
}
}
}else {
System.out.println("pageDictionary is null - likely Converted PDF from Tiff");
}
}
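final ByteArrayOutputStream baos = new ByteArrayOutputStream();
doc.save(baos);
doc.close();
final File compressed = new
File("C:\\Test\\PdfBox_Examples\\test_compressed.pdf");
// try-with-resources closes the file stream even if writing fails
try (FileOutputStream fos = new FileOutputStream(compressed)) {
baos.writeTo(fos);
}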
System.out.println("Documents compressed");
}
}
//Note: I have also tested using tiff_1.pdf and tiff_2.pdf as the inputs.
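For the pages that come from TIFF files, the code above finds no font dictionary at all, so any savings there would have to come from the page images rather than from fonts. As a rough sketch of the "image optimization" part only, the snippet below downscales a BufferedImage using plain java.awt; `ImageDownscaler` and `downscale` are hypothetical names, and extracting the image from the PDF and re-embedding it (for example with PDFBox's JPEGFactory) is not shown.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

final class ImageDownscaler {
    // Return an RGB copy of the source scaled by factor, where 0 < factor <= 1.
    static BufferedImage downscale(BufferedImage source, double factor) {
        if (factor <= 0 || factor > 1) {
            throw new IllegalArgumentException("factor must be in (0, 1]");
        }
        int width = Math.max(1, (int) Math.round(source.getWidth() * factor));
        int height = Math.max(1, (int) Math.round(source.getHeight() * factor));
        BufferedImage target = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = target.createGraphics();
        try {
            // Bilinear interpolation is a middle ground between speed and quality.
            g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                    RenderingHints.VALUE_INTERPOLATION_BILINEAR);
            g.drawImage(source, 0, 0, width, height, null);
        } finally {
            g.dispose();
        }
        return target;
    }

    public static void main(String[] args) {
        BufferedImage page = new BufferedImage(2480, 3508, BufferedImage.TYPE_INT_RGB);
        BufferedImage smaller = downscale(page, 0.5);
        System.out.println(smaller.getWidth() + " x " + smaller.getHeight());
    }
}
```

Whether halving the resolution is acceptable depends on how the scanned pages will be read or printed, so the factor is something to tune per document type.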
Is there a way to automatically compress/optimize images in a Spring Boot application?
In my application, users can put arbitrary images into a folder themselves, so I cannot make sure they are compressed well. And since the images are not uploaded through the application, I cannot create an optimized version at upload time either.
So what I would like to do is compress/optimize the images when they are requested, and maybe keep them in a kind of "image cache" for a while.
Or is there a Tomcat/Apache module that already does this kind of thing out of the box?
Thanks for your help
You can use the classes and interfaces in javax.imageio to compress an image. Below is an example that compresses a JPG image. You can move the code from the main method below into a service in your Spring Boot application.
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Iterator;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
public class ImageCompression {
public static void main(String[] args) throws FileNotFoundException, IOException{
File imageFile = new File("YOUR_IMAGE.jpg");
File compressedImageFile = new File("YOUR_COMPRESSED_IMAGE.jpg");
InputStream inputStream = new FileInputStream(imageFile);
OutputStream outputStream = new FileOutputStream(compressedImageFile);
float imageQuality = 0.3f;
//Create the buffered image
BufferedImage bufferedImage = ImageIO.read(inputStream);
//Get image writers
Iterator<ImageWriter> imageWriters = ImageIO.getImageWritersByFormatName("jpg");
if (!imageWriters.hasNext())
throw new IllegalStateException("Writers Not Found!!");
ImageWriter imageWriter = imageWriters.next();
ImageOutputStream imageOutputStream = ImageIO.createImageOutputStream(outputStream);
imageWriter.setOutput(imageOutputStream);
ImageWriteParam imageWriteParam = imageWriter.getDefaultWriteParam();
//Set the compress quality metrics
imageWriteParam.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
imageWriteParam.setCompressionQuality(imageQuality);
//Created image
imageWriter.write(null, new IIOImage(bufferedImage, null, null), imageWriteParam);
// close all streams; the image stream must be closed before the file stream it wraps
imageWriter.dispose();
imageOutputStream.close();
outputStream.close();
inputStream.close();
}
}
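To get closer to the "compress on request and cache for a while" idea from the question, the same javax.imageio calls can work on byte arrays and sit behind a small in-memory cache. This is only a sketch under assumptions: `CachedJpegCompressor`, its method names and the cache key are hypothetical, the cache is unbounded and never expires, and it is not wired into Spring in any way.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageWriteParam;
import javax.imageio.ImageWriter;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;

final class CachedJpegCompressor {
    // Hypothetical cache; the caller picks the key (e.g. file path plus last-modified time).
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    // Return the cached result for key, compressing the original only on a miss.
    byte[] compressed(String key, byte[] original, float quality) throws IOException {
        byte[] hit = cache.get(key);
        if (hit != null) {
            return hit;
        }
        byte[] result = compress(original, quality);
        cache.put(key, result);
        return result;
    }

    // Decode any image format ImageIO can read and re-encode it as JPEG.
    static byte[] compress(byte[] original, float quality) throws IOException {
        BufferedImage image = ImageIO.read(new ByteArrayInputStream(original));
        if (image == null) {
            throw new IOException("Input is not a readable image");
        }
        return encode(image, quality);
    }

    // Encode as JPEG with an explicit quality between 0.0 (smallest) and 1.0 (best).
    static byte[] encode(BufferedImage image, float quality) throws IOException {
        Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("jpg");
        if (!writers.hasNext()) {
            throw new IllegalStateException("No JPEG writer available");
        }
        ImageWriter writer = writers.next();
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            ImageWriteParam param = writer.getDefaultWriteParam();
            param.setCompressionMode(ImageWriteParam.MODE_EXPLICIT);
            param.setCompressionQuality(quality);
            writer.write(null, new IIOImage(image, null, null), param);
        } finally {
            writer.dispose();
        }
        return bytes.toByteArray();
    }

    // Deterministic noisy test image, used only for the demo below.
    static BufferedImage sampleImage(int size) {
        BufferedImage image = new BufferedImage(size, size, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < size; y++) {
            for (int x = 0; x < size; x++) {
                image.setRGB(x, y, ((x * 7919) ^ (y * 104729)) & 0xFFFFFF);
            }
        }
        return image;
    }

    public static void main(String[] args) throws IOException {
        byte[] original = encode(sampleImage(64), 0.9f);
        byte[] smaller = new CachedJpegCompressor().compressed("demo", original, 0.2f);
        System.out.println(original.length + " bytes -> " + smaller.length + " bytes");
    }
}
```

In a real service the map would probably be replaced by a cache with size limits and expiry, and images with an alpha channel would need converting to RGB before JPEG encoding.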
Good evening,
I want to fill in the Windows file properties of a JPG photo.
Apparently these correspond to the following Exif tags:
[Exif IFD0] Windows XP Title
[Exif IFD0] Windows XP Author
[Exif IFD0] Windows XP Subject
I looked through icafe.jar but did not find these tags.
Can I do this with ICAFE or another Java library?
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.ArrayList;
import com.icafe4j.image.meta.Metadata;
import com.icafe4j.image.meta.exif.Exif;
import com.icafe4j.image.meta.jpeg.JpegExif;
import com.icafe4j.image.meta.exif.ExifTag;
import com.icafe4j.image.tiff.TiffTag;
import com.icafe4j.image.tiff.FieldType;
fin = new FileInputStream(Fm_filePathIn);
fout = new FileOutputStream(Fm_filePathOut);
List<Metadata> metaList = new ArrayList<Metadata>();
metaList.add(populateExif(JpegExif.class));
Exif populateExif(Class<?> exifClass) throws IOException {
Exif exif = new JpegExif();
exif.addImageField(ExifTag.WINDOWS_XP_AUTHOR, FieldType.WINDOWSXP, "Toto");
exif.addImageField(ExifTag.WINDOWS_XP_KEYWORDS, FieldType.WINDOWSXP, "Copyright;Authorbisou");
// Insert ThumbNailIFD
// Since we don't provide thumbnail image, it will be created later from the input stream
exif.setThumbnailRequired(true);
return exif;
}
fin.close();
fout.close();
Those tags do exist in ICAFE, but they are defined in TiffTag, not ExifTag. Replace ExifTag with TiffTag and it will work; TestMetada.java shows this clearly.
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import com.icafe4j.image.meta.Metadata;
import com.icafe4j.image.meta.exif.Exif;
import com.icafe4j.image.meta.jpeg.JpegExif;
import com.icafe4j.image.meta.exif.ExifTag;
import com.icafe4j.image.tiff.TiffTag;
import com.icafe4j.image.tiff.FieldType;
public class TestWindowsXP {
public static void main(String[] args) throws IOException {
FileInputStream fin = new FileInputStream(Fm_filePathIn);
FileOutputStream fout = new FileOutputStream(Fm_filePathOut);
List<Metadata> metaList = new ArrayList<Metadata>();
Exif exif = new JpegExif();
exif.addImageField(TiffTag.WINDOWS_XP_AUTHOR, FieldType.WINDOWSXP, "Toto");
exif.addImageField(TiffTag.WINDOWS_XP_KEYWORDS, FieldType.WINDOWSXP, "Copyright;Authorbisou");
// Insert ThumbNailIFD
// Since we don't provide thumbnail image, it will be created later from the input stream
exif.setThumbnailRequired(true);
metaList.add(exif);
Metadata.insertMetadata(metaList, fin, fout);
fin.close();
fout.close();
}
}
And the following is a screenshot when I right-click the resulting image->show properties. You can see the information you wanted to insert is showing.
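As background on why these fields need their own field type: the Windows XP tags (XPTitle, XPAuthor, XPSubject, XPKeywords, XPComment) hold their text as UTF-16LE bytes with a two-byte null terminator rather than as ASCII strings, which is presumably what FieldType.WINDOWSXP takes care of. The sketch below only illustrates that byte layout; `WindowsXpTagValue` is a hypothetical helper and not part of ICAFE.

```java
import java.nio.charset.StandardCharsets;

final class WindowsXpTagValue {
    // Encode text as UTF-16LE followed by a 0x00 0x00 terminator.
    static byte[] encode(String text) {
        byte[] chars = text.getBytes(StandardCharsets.UTF_16LE);
        byte[] value = new byte[chars.length + 2]; // last two bytes stay zero
        System.arraycopy(chars, 0, value, 0, chars.length);
        return value;
    }

    // Decode a tag value, dropping the terminator and anything after it.
    static String decode(byte[] value) {
        String text = new String(value, StandardCharsets.UTF_16LE);
        int end = text.indexOf('\0');
        return end >= 0 ? text.substring(0, end) : text;
    }

    public static void main(String[] args) {
        byte[] raw = encode("Toto");
        System.out.println(raw.length + " bytes, decoded: " + decode(raw));
    }
}
```

This is useful mainly when inspecting a file with a generic Exif dump tool, where the values show up as byte arrays instead of readable text.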
Description:
Output: PDF file
Input: index.css, bootstrap.min.css, index.html
Problem: with index.css alone, without Bootstrap, it works fine, but when I add Bootstrap it throws an exception.
Code is here:
package test.test1;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.apache.commons.codec.Charsets;
import com.google.common.io.CharStreams;
import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.PageSize;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.Pipeline;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerHelper;
import com.itextpdf.tool.xml.css.CssFile;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;
public class Table {
File oFile = new File("c:\\test\\1.pdf");
Document document = new Document(PageSize.A4, 0, 0, 0, 0);
PdfWriter writer =null;
public Table() throws IOException, DocumentException {
oFile.createNewFile();
writer=PdfWriter.getInstance(document,new FileOutputStream(oFile));
InputStream htmlpathtest = Thread.currentThread()
.getContextClassLoader()
.getResourceAsStream("index.html");
String htmlstring = CharStreams.toString(new InputStreamReader(htmlpathtest, Charsets.UTF_8));
InputStream is = new ByteArrayInputStream(htmlstring.getBytes());
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, baos);
writer.setInitialLeading(12.5f);
document.open();
HtmlPipelineContext htmlContext = new HtmlPipelineContext(null);
htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());
// CSS
CSSResolver cssResolver = new StyleAttrCSSResolver();
InputStream csspathtest = Thread.currentThread()
.getContextClassLoader()
.getResourceAsStream("css\\index.css");
InputStream csspathtest1 = Thread.currentThread()
.getContextClassLoader()
.getResourceAsStream("css\\bootstrap.min.css");
CssFile cssfiletest = XMLWorkerHelper.getCSS(csspathtest);
cssResolver.addCss(cssfiletest);
cssResolver.addCss(XMLWorkerHelper.getCSS(csspathtest1));
Pipeline<?> pipeline = new CssResolverPipeline(cssResolver,
new HtmlPipeline(htmlContext, new PdfWriterPipeline(
document, writer)));
XMLWorker worker = new XMLWorker(pipeline, true);
XMLParser p = new XMLParser(worker);
p.parse(is);
document.close();
}
public static void main(String[] args) throws IOException, DocumentException { new Table();}
}
Exception:
Exception in thread "main" java.lang.NumberFormatException: For input string: "100%"
at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1222)
at java.lang.Float.parseFloat(Float.java:422)
at com.itextpdf.tool.xml.css.FontSizeTranslator.getFontSize(FontSizeTranslator.java:186)
at com.itextpdf.tool.xml.css.FontSizeTranslator.translateFontSize(FontSizeTranslator.java:165)
at com.itextpdf.tool.xml.html.AbstractTagProcessor.startElement(AbstractTagProcessor.java:120)
at com.itextpdf.tool.xml.pipeline.html.HtmlPipeline.open(HtmlPipeline.java:105)
at com.itextpdf.tool.xml.XMLWorker.startElement(XMLWorker.java:103)
at com.itextpdf.tool.xml.parser.XMLParser.startElement(XMLParser.java:372)
at com.itextpdf.tool.xml.parser.state.TagEncounteredState.process(TagEncounteredState.java:104)
at com.itextpdf.tool.xml.parser.XMLParser.parseWithReader(XMLParser.java:237)
at com.itextpdf.tool.xml.parser.XMLParser.parse(XMLParser.java:215)
at com.itextpdf.tool.xml.parser.XMLParser.parse(XMLParser.java:188)
at test.test1.Table.<init>(Table.java:95)
at test.test1.Table.main(Table.java:104)
AFAIK, CSS support in the iText HTML renderer (XML Worker) is very limited. I recently used a more complete HTML-to-PDF library called Flying Saucer, which supports CSS2 and some CSS3 features and uses iText as its backend. You should give it a go: it might not support all Bootstrap features, but it may still fulfill your requirements.
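The stack trace makes the immediate cause visible: a font-size value of "100%" reaches Float.parseFloat, which only accepts plain numbers. The sketch below reproduces that failure and shows what a percentage-aware parser could look like. `FontSizeDemo` and `resolve` are hypothetical names for illustration; this is not iText's code and not a patch for it.

```java
final class FontSizeDemo {
    // Resolve "NN%" against the parent font size; otherwise parse a plain number.
    static float resolve(String value, float parentSize) {
        String trimmed = value.trim();
        if (trimmed.endsWith("%")) {
            String number = trimmed.substring(0, trimmed.length() - 1);
            return parentSize * Float.parseFloat(number) / 100f;
        }
        return Float.parseFloat(trimmed);
    }

    public static void main(String[] args) {
        try {
            Float.parseFloat("100%"); // same call as in the stack trace
        } catch (NumberFormatException e) {
            System.out.println("parseFloat rejects it: " + e.getMessage());
        }
        System.out.println("resolved against 12pt: " + resolve("100%", 12f));
    }
}
```

If switching libraries is not an option, one workaround to try is a trimmed-down copy of the Bootstrap stylesheet in which the percentage font sizes are replaced by absolute values.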
I am using docx4j to convert DOCX to HTML. The conversion works and I get the HTML output, which I will embed as the body of an email. However, I have the following issues:
Images are not displayed in the email body
Spaces and bullets are lost
Here is the code I have written:
WordprocessingMLPackage wordMLPackage;
wordMLPackage = Docx4J.load(new java.io.File(resourcePath2));
HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
htmlSettings.setImageDirPath(imageFolder + resourcePath2 + "_files");
htmlSettings.setImageTargetUri(imageFolder +resourcePath2.substring(resourcePath2.lastIndexOf("/")+1) + "_files");
htmlSettings.setWmlPackage(wordMLPackage);
OutputStream os;
os = new ByteArrayOutputStream();
Docx4jProperties.setProperty("docx4j.Convert.Out.HTML.OutputMethodXML", true);
Docx4J.toHTML(htmlSettings, os, Docx4J.FLAG_SAVE_FLAT_XML);
DOCX = ((ByteArrayOutputStream)os).toString();
You could do it like this in your code (this example uses Apache POI with the XDocReport XHTML converter rather than docx4j):
package tcg.doc.web.managedBeans;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.converter.core.FileImageExtractor;
import org.apache.poi.xwpf.converter.core.FileURIResolver;
import org.apache.poi.xwpf.converter.xhtml.XHTMLConverter;
import org.apache.poi.xwpf.converter.xhtml.XHTMLOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.context.annotation.Scope;
import org.springframework.stereotype.Component;
@Component
@Scope("session")
@Qualifier("ConvertWord")
public class ConvertWord {
private static final String docName = "TestDocx.docx";
private static final String outputlFolderPath = "d:/";
String htmlNamePath = "docHtml.html";
String zipName="_tmp.zip";
File docFile = new File(outputlFolderPath+docName);
File zipFile = new File(zipName);
public void ConvertWordToHtml() {
try {
// 1) Load DOCX into XWPFDocument
InputStream doc = new FileInputStream(new File(outputlFolderPath+docName));
System.out.println("InputStream"+doc);
XWPFDocument document = new XWPFDocument(doc);
// 2) Prepare XHTML options (here we set the IURIResolver to load images from a "word/media" folder)
XHTMLOptions options = XHTMLOptions.create(); //.URIResolver(new FileURIResolver(new File("word/media")));;
// Extract image
String root = "target";
// use the document name here; concatenating the InputStream would yield its toString() value
File imageFolder = new File( root + "/images/" + docName );
options.setExtractor( new FileImageExtractor( imageFolder ) );
// URI resolver
options.URIResolver( new FileURIResolver( imageFolder ) );
OutputStream out = new FileOutputStream(new File(htmlPath()));
XHTMLConverter.getInstance().convert(document, out, options);
System.out.println("OutputStream "+out.toString());
} catch (FileNotFoundException ex) {
ex.printStackTrace(); // do not swallow the error silently
} catch (IOException ex) {
ex.printStackTrace();
}
}
public static void main(String[] args) {
ConvertWord cwoWord=new ConvertWord();
cwoWord.ConvertWordToHtml();
System.out.println();
}
public String htmlPath(){
// d:/docHtml.html
return outputlFolderPath+htmlNamePath;
}
public String zipPath(){
// d:/_tmp.zip
return outputlFolderPath+zipName;
}
}
Maven dependency for pom.xml:
<dependency>
<groupId>fr.opensagres.xdocreport</groupId>
<artifactId>org.apache.poi.xwpf.converter.xhtml</artifactId>
<version>1.0.4</version>
</dependency>
or download it from Here
For images to work in an email body, I guess you need to use either a data URI or publish them to a web-reachable location.
In either case, you'll need to write an implementation of:
public interface ConversionImageHandler {
/**
* @param picture
* @param relationship of the image
* @param part of the image, if it is an internal image, otherwise null
* @return uri for the image we've saved, or null
* @throws Docx4JException this exception will be logged, but not propagated
*/
public String handleImage(AbstractWordXmlPicture picture, Relationship relationship, BinaryPart part) throws Docx4JException;
}
and configure docx4j to use it with htmlSettings.setImageHandler.
You can look at some of the existing implementations in the docx4j source code, and take advantage of the helper methods in AbstractConversionImageHandler (eg createEncodedImage if you want data URIs).
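For the data URI route, the value an image handler returns would have the shape `data:<mime type>;base64,<encoded bytes>`. The following is a minimal sketch of building such a string with the JDK only; `DataUris` is a hypothetical helper, not docx4j's createEncodedImage, and the MIME type is assumed to be known by the caller.

```java
import java.util.Base64;

final class DataUris {
    // Build a data URI such as "data:image/png;base64,...." from raw image bytes.
    static String of(String mimeType, byte[] content) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        byte[] fakeImage = {1, 2, 3}; // stand-in for real image bytes
        System.out.println(of("image/png", fakeImage));
    }
}
```

Mail clients differ in whether they render data URIs at all, so it is worth testing with the clients your recipients use before choosing this over web-hosted images.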