Convert image to pdf with iText and Java

I successfully converted image files (GIF, PNG, JPG, BMP) to PDFs using iText 1.3.
I can't change the version, since we can't simply swap jar versions in a professional environment.
My problem is that the image appears larger in the PDF than the original image. I am not talking about the file size, but about the displayed size when the zoom is set to 100% for both the original image file and the PDF.
The PDF shows the image about 20% to 30% bigger than the original.
What am I doing wrong?
public void convertOtherImages2pdf(byte[] in, OutputStream out, String title, String author) throws IOException {
Image image = Image.getInstance(in);
Rectangle imageSize = new Rectangle(image.width() + 1f, image.height() + 1f);
image.scaleAbsolute(image.width(), image.height());
com.lowagie.text.Document document = new com.lowagie.text.Document(imageSize, 0, 0, 0, 0);
PdfWriter writer = PdfWriter.getInstance(document, out);
document.open();
document.add(image);
document.close();
writer.close();
}

Create a MultipleImagesToPdf class:
public void imagesToPdf(String destination, String pdfName, String imagFileSource) throws IOException, DocumentException {
Document document = new Document(PageSize.A4, 20.0f, 20.0f, 20.0f, 150.0f);
String desPath = destination;
File destinationDirectory = new File(desPath);
if (!destinationDirectory.exists()) {
// mkdirs() also creates missing parent folders and reports whether it succeeded
if (destinationDirectory.mkdirs()) {
System.out.println("DESTINATION FOLDER CREATED -> " + destinationDirectory.getAbsolutePath());
} else {
System.out.println("DESTINATION FOLDER NOT CREATED!!!");
}
} else {
System.out.println("DESTINATION FOLDER ALREADY CREATED!!!");
}
File file = new File(destinationDirectory, pdfName + ".pdf");
FileOutputStream fileOutputStream = new FileOutputStream(file);
PdfWriter pdfWriter = PdfWriter.getInstance(document, fileOutputStream);
document.open();
System.out.println("CONVERTER START.....");
String[] splitImagFiles = imagFileSource.split(",");
for (String singleImage : splitImagFiles) {
Image image = Image.getInstance(singleImage);
document.setPageSize(image);
document.newPage();
image.setAbsolutePosition(0, 0);
document.add(image);
}
document.close();
System.out.println("CONVERTER STOPPED.....");
}
public static void main(String[] args) {
try {
MultipleImagesToPdf converter = new MultipleImagesToPdf();
Scanner sc = new Scanner(System.in);
System.out.print("Enter your destination folder where save PDF \n");
// Destination = D:/Destination/;
String destination = sc.nextLine();
System.out.print("Enter your PDF File Name \n");
// Name = test;
String name = sc.nextLine();
System.out.print("Enter your selected image files name with source folder \n");
String sourcePath = sc.nextLine();
// Source = D:/Source/a.jpg,D:/Source/b.jpg;
// Use && and isEmpty(): comparing strings with != checks references, not contents
if (sourcePath != null && !sourcePath.isEmpty()) {
converter.imagesToPdf(destination, name, sourcePath);
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
This uses the itextpdf-5.4.0 library.

You need to scale the image by its DPI, because PDF units are points (1/72 inch) rather than pixels.
float scale = 72f / dpi;
I don't know whether such an ancient iText exposes that image information; more recent iText versions do.
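To make the arithmetic concrete: an image added at one point per pixel is drawn larger than its pixel size suggests whenever the relevant resolution exceeds 72 dpi. The sketch below computes the scale factor and the resulting size in points. The class and method names are made up for illustration, and 96 dpi is only an assumed example value, not something stated in the question.

```java
public class DpiScale {
    /** Scale factor that maps pixels at the given resolution to PDF points (1/72 inch). */
    static float scaleFor(float dpi) {
        return 72f / dpi; // float arithmetic, so no integer division
    }

    /** Length in points occupied by a run of pixels at the given resolution. */
    static float toPoints(int pixels, float dpi) {
        return pixels * scaleFor(dpi);
    }

    public static void main(String[] args) {
        float dpi = 96f; // assumed example resolution
        System.out.println("scale = " + scaleFor(dpi));
        System.out.println("width in points = " + toPoints(960, dpi));
    }
}
```

With iText, a factor like this would typically be applied through the image's scaling methods instead of calling scaleAbsolute with the raw pixel dimensions; whether iText 1.3 offers the same methods as later versions is worth checking against its Javadoc.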

Related

Convert PDF to JPG2000 file(s)

I recently started working on a project where I need to convert a PDF file into JPEG 2000 files (one .jp2 file per page).
The goal is to replace our previous PDF-to-JPEG converter in order to reduce the size of the output files.
Based on code I found on the internet, I wrote the PDF-to-JPEG 2000 converter method below, and I have been changing the setEncodingRate parameter value and comparing the results.
I managed to get smaller JPEG 2000 output files, but the quality is very poor compared to the JPEG ones, especially for colored text or images.
Here is what my original PDF file looks like:
When I set setEncodingRate to 0.8, it looks like this:
The output file size is 850 KB, which is even bigger than the JPEG ones (around 600 KB), with lower quality.
With setEncodingRate at 0.1, the file is considerably smaller, 111 KB, but basically unreadable.
So what I'm trying to get is smaller output files (under 600 KB) with better quality, and I'm wondering whether that is feasible with the JPEG 2000 format.
public class ImageConverter {
public void compressor(String inputFile, String outputFile) throws IOException {
J2KImageWriteParam iwp = new J2KImageWriteParam();
PDDocument document = PDDocument.load(new File (inputFile), MemoryUsageSetting.setupMixed(10485760L));
PDFRenderer pdfRenderer = new PDFRenderer(document);
int nbPages = document.getNumberOfPages();
int pageCounter = 0;
BufferedImage image;
for (PDPage page : document.getPages()) {
if (page.hasContents()) {
image = pdfRenderer.renderImageWithDPI(pageCounter, 300, ImageType.RGB);
if (image == null)
{
System.out.println("If no registered ImageReader claims to be able to read the resulting stream");
}
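Iterator<ImageWriter> writers = ImageIO.getImageWritersByFormatName("JPEG2000");
String name = null;
ImageWriter writer = null;
// Compare class names with equals(); != on strings compares references, not contents.
// hasNext() guards against running past the last registered writer.
while (writers.hasNext() && !"com.sun.media.imageioimpl.plugins.jpeg2000.J2KImageWriter".equals(name)) {
writer = writers.next();
name = writer.getClass().getName();
System.out.println(name);
}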
File f = new File(outputFile+"_"+pageCounter+".jp2");
long s = System.currentTimeMillis();
ImageOutputStream ios = ImageIO.createImageOutputStream(f);
writer.setOutput(ios);
J2KImageWriteParam param = (J2KImageWriteParam) writer.getDefaultWriteParam();
IIOImage ioimage = new IIOImage(image, null, null);
param.setSOP(true);
param.setWriteCodeStreamOnly(true);
param.setProgressionType("layer");
param.setLossless(true);
param.setCompressionMode(J2KImageWriteParam.MODE_EXPLICIT);
param.setCompressionType("JPEG2000");
param.setCompressionQuality(0.01f);
param.setEncodingRate(1.01);
param.setFilter(J2KImageWriteParam.FILTER_53 );
writer.write(null, ioimage, param);
System.out.println(System.currentTimeMillis() - s);
writer.dispose();
ios.flush();
ios.close();
image.flush();
}
// Advance the index for every page, including empty ones, so it stays aligned with the renderer
pageCounter++;
}
}
public static void main(String[] args) {
String input = "E:/IMGTEST/mail-DOC0002.pdf";
String output = "E:/IMGTEST/mail-DOC0002/docamail-DOC0002-";
ImageConverter imgcv = new ImageConverter();
try {
imgcv.compressor(input, output);
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
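As a rough sanity check on the sizes reported above: in the JAI Image I/O J2KImageWriteParam, the encoding rate is, as far as I know, a budget in bits per pixel, so the output size follows almost directly from the pixel count. The sketch below does that arithmetic. The A4 page size is an assumption (the question does not state the page size), and the class and method names are invented for illustration.

```java
public class RateEstimate {
    /** Approximate output size in bytes for a given bits-per-pixel budget. */
    static long estimatedBytes(int widthPx, int heightPx, double bitsPerPixel) {
        double bits = (double) widthPx * heightPx * bitsPerPixel;
        return Math.round(bits / 8.0);
    }

    public static void main(String[] args) {
        // Assumed page: A4 (8.27 x 11.69 in) rendered at 300 dpi is about 2480 x 3508 pixels
        int width = 2480;
        int height = 3508;
        for (double rate : new double[] {0.8, 0.1}) {
            long kib = estimatedBytes(width, height, rate) / 1024;
            System.out.println(rate + " bits per pixel -> about " + kib + " KiB");
        }
    }
}
```

Under those assumptions the estimates land close to the two file sizes given in the question, which suggests that the rate setting largely determines the file size here. If that holds, rendering at a lower DPI reduces the pixel count, so the same byte budget leaves more bits for each pixel.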

IText visualization of components

I am using NetBeans 8.2. I need to transfer a JFrame (which represents an invoice) into an iText Paragraph. Everything works and is displayed. However, the text and details in the generated PDF are not sharp.
For example, the strings have a shaded look: instead of being pure black, they show gray tones. This is particularly annoying in print, where the smaller text is unreadable. The effect does not occur on screen.
I tried changing the JFrame size and scaling the Graphics2D instance. I then went back to the normal size and changed the scale of the paragraph instead, which gave a slightly better but still unsatisfactory result.
Any suggestions?
Edit: this is the code of the method that creates the paragraph:
private Paragraph creaParPdf(float scaleFactorW, float scaleFactorH) throws BadElementException, IOException {
BufferedImage img = new BufferedImage(fatt.getWidth(), fatt.getHeight(),
BufferedImage.TYPE_INT_ARGB);
Graphics2D img2D = img.createGraphics();
img2D.setRenderingHint(RenderingHints.KEY_ANTIALIASING,
RenderingHints.VALUE_ANTIALIAS_ON);
fatt.paint(img2D);
Paragraph p = new Paragraph();
com.itextpdf.text.Image itextImg =
com.itextpdf.text.Image.getInstance(img, Color.white, false);
itextImg.scaleAbsolute(PageSize.A4.getWidth()*scaleFactorW, PageSize.A4.getHeight()*scaleFactorH);
p.add(itextImg);
return p;
}
And this is the code of the method that manages the creation of the pdf:
public void printPDF() throws FileNotFoundException, DocumentException, SQLException, ClassNotFoundException, IOException{
fatt = this;
Document d = new Document(PageSize.A4,10,10,10,10);
d.setMargins(10, 10, 0, 10);
String salvPath = "C:\\"+cliente+".pdf";
filePDF = new File(salvPath);
filePDF.getParentFile().mkdirs();
if (filePDF.exists())
out.println("The file " + salvPath + " exists");
else try {
if (filePDF.createNewFile())
out.println("The file " + salvPath + " has been created");
else
out.println("The file " + salvPath + " cannot be created");
} catch (IOException ex) {
getLogger(JFrameStart.class.getName()).log(SEVERE, null, ex);
}
FileOutputStream fs = new FileOutputStream (salvPath);
PdfWriter writer = PdfWriter.getInstance(d, fs);
writer.setFullCompression();
d.open ();
try {
float scaleFactorW = 0.98f;
float scaleFactorH = 0.98f;
float offset = 0.0f;
p = creaParPdf(scaleFactorW,scaleFactorH);
} catch (BadElementException ex) {
Logger.getLogger(JFrameTracc.class.getName()).log(Level.SEVERE, null, ex);
}
d.add(p);
d.close();
}
In the final result, the smaller parts are practically illegible when the PDF is printed.
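One possible cause of the soft, gray-edged text is that the component is rasterized at screen resolution and the resulting bitmap is then stretched over an A4 page. A minimal sketch of painting a component into an image with more pixels per side follows; the class name, method name, and scale value are hypothetical, and this is an illustration of the idea, not a confirmed fix for the code above.

```java
import java.awt.Component;
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;

public class HiResRender {
    /** Paints a component into an image that has `scale` times as many pixels per side. */
    static BufferedImage render(Component component, double scale) {
        int width = (int) Math.ceil(component.getWidth() * scale);
        int height = (int) Math.ceil(component.getHeight() * scale);
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setRenderingHint(RenderingHints.KEY_TEXT_ANTIALIASING,
                    RenderingHints.VALUE_TEXT_ANTIALIAS_ON);
            // The component keeps painting in its own coordinates; the transform enlarges the output
            g.scale(scale, scale);
            component.paint(g);
        } finally {
            g.dispose();
        }
        return image;
    }
}
```

The larger image would then be placed with the same scaleAbsolute call as before, so it covers the same area on the page while carrying more pixels, for example render(fatt, 3.0) for roughly three times the screen resolution.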

Read data from image in PDF

I am using iText's Java text extraction to read text from a PDF file. The code below works fine for English PDFs. Now I have a PDF whose data is contained in an image, and I want to read the data from that image.
public class pdfreader {
public static void main(String[] args) throws IOException, DocumentException, TransformerException {
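// Expect the source PDF and the destination text file as the two arguments
if (args.length < 2) {
System.err.println("Usage: pdfreader <source.pdf> <destination.txt>");
return;
}
String SRC = args[0];
String DEST = args[1];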
File file = new File(DEST);
file.getParentFile().mkdirs();
new pdfreader().readText(SRC, DEST);
}
public void readText(String src, String dest) throws IOException, DocumentException, TransformerException {
try {
PdfReader pdfReader = new PdfReader(src);
PdfReaderContentParser PdfParser = new PdfReaderContentParser(
pdfReader);
PrintWriter out = new PrintWriter(new FileOutputStream(
dest));
TextExtractionStrategy textStrategy;
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++) {
textStrategy = PdfParser.processContent(i,
new SimpleTextExtractionStrategy());
out.println(textStrategy.getResultantText());
}
out.flush();
out.close();
pdfReader.close();
} catch (Exception e) {
e.printStackTrace();
}
}
}

You can implement an OCR workflow with iText. As Amedee already hinted, this is something we have tried at iText, with very promising results.
The algorithm (high level):
1. Implement IEventListener to parse the pages of your document.
2. Look out for ImageRenderInfo events; they are fired when the PDF parser hits an image.
3. Call getImage() on the event and ultimately obtain a BufferedImage.
4. Feed the BufferedImage to Tesseract.
5. Apply the coordinate transform (Tesseract does not use the same coordinate space as iText).
Now that you have the text in the image, and its location, you can use iText to overlay text on your PDF, or simply extract it.
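The coordinate transform mentioned above can be sketched in plain Java. This assumes the OCR engine reports word boxes in image pixels measured from the top-left corner, and that the image is placed on the page as an unrotated rectangle given in PDF points from the bottom-left corner. The class, method, and parameter names are hypothetical and not part of iText or Tesseract.

```java
public class OcrCoordinates {
    /**
     * Converts a box in image pixels (origin top-left, y pointing down) into
     * PDF points (origin bottom-left, y pointing up).
     * Returns {x, y, width, height} in points.
     */
    static double[] toPdfBox(double px, double py, double pw, double ph,
                             int imageWidthPx, int imageHeightPx,
                             double placedX, double placedY,
                             double placedWidth, double placedHeight) {
        double sx = placedWidth / imageWidthPx;   // points per pixel, horizontally
        double sy = placedHeight / imageHeightPx; // points per pixel, vertically
        double x = placedX + px * sx;
        // Flip the vertical axis: measure the box's lower edge from the image's bottom edge
        double y = placedY + (imageHeightPx - py - ph) * sy;
        return new double[] {x, y, pw * sx, ph * sy};
    }

    public static void main(String[] args) {
        // A 200x100 px image drawn at (50, 500) with a size of 100x50 points
        double[] box = toPdfBox(20, 10, 40, 20, 200, 100, 50, 500, 100, 50);
        System.out.println(java.util.Arrays.toString(box));
    }
}
```

A rotated or skewed image would need the full transformation matrix of the image instead of the two scale factors used here.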

iText doesn't support OCR to extract text from images. Try to use Tesseract or something else.

Get Image from the document using Apache POI

I am using Apache POI to read images from a .docx file.
Here is my code:
public Image ReadImg(int imageid) throws IOException {
XWPFDocument doc = new XWPFDocument(new FileInputStream("import.docx"));
BufferedImage jpg = null;
List<XWPFPictureData> pic = doc.getAllPictures();
XWPFPictureData pict = pic.get(imageid);
String extract = pict.suggestFileExtension();
byte[] data = pict.getData();
//try to read image data using javax.imageio.* (JDK 1.4+)
jpg = ImageIO.read(new ByteArrayInputStream(data));
return jpg;
}
It reads the images correctly, but not in order.
For example, if the document contains
image1.jpeg
image2.jpeg
image3.jpeg
image4.jpeg
image5.jpeg
It reads
image4
image3
image1
image5
image2
Could you please help me resolve this?
I want to read the images in the order in which they appear in the document.
Thanks,
Sithik

public static void extractImages(XWPFDocument docx) {
try {
List<XWPFPictureData> piclist = docx.getAllPictures();
// traverse through the list and write each image to a file
Iterator<XWPFPictureData> iterator = piclist.iterator();
int i = 0;
while (iterator.hasNext()) {
XWPFPictureData pic = iterator.next();
byte[] bytepic = pic.getData();
BufferedImage imag = ImageIO.read(new ByteArrayInputStream(bytepic));
ImageIO.write(imag, "jpg", new File("D:/imagefromword/" + pic.getFileName()));
i++;
}
} catch (Exception e) {
System.exit(-1);
}
}
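The list returned by getAllPictures() need not follow the order in which the images appear in the text. The order of appearance is recorded in the document body itself: inside a .docx package, word/document.xml references each picture through an r:embed id, and word/_rels/document.xml.rels maps that id to the media file. The sketch below is a swapped-in technique that reads those two XML parts with the JDK only and does not use POI; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DocxImageOrder {
    static final String DRAWING_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";
    static final String REL_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Returns the media targets in the order their references occur in the document body. */
    static List<String> imageTargetsInOrder(String documentXml, String relsXml) throws Exception {
        Map<String, String> targetById = new HashMap<>();
        NodeList rels = parse(relsXml).getElementsByTagNameNS("*", "Relationship");
        for (int i = 0; i < rels.getLength(); i++) {
            Element rel = (Element) rels.item(i);
            targetById.put(rel.getAttribute("Id"), rel.getAttribute("Target"));
        }
        List<String> ordered = new ArrayList<>();
        // getElementsByTagNameNS walks the tree in document order
        NodeList blips = parse(documentXml).getElementsByTagNameNS(DRAWING_NS, "blip");
        for (int i = 0; i < blips.getLength(); i++) {
            String id = ((Element) blips.item(i)).getAttributeNS(REL_NS, "embed");
            String target = targetById.get(id);
            if (target != null) {
                ordered.add(target);
            }
        }
        return ordered;
    }

    static String read(ZipFile zip, String entry) throws IOException {
        try (InputStream in = zip.getInputStream(zip.getEntry(entry))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        try (ZipFile zip = new ZipFile(args[0])) { // path to a .docx file, supplied by the caller
            imageTargetsInOrder(read(zip, "word/document.xml"),
                    read(zip, "word/_rels/document.xml.rels")).forEach(System.out::println);
        }
    }
}
```

Within POI itself, walking the paragraphs and their runs and asking each run for its embedded pictures should give the same ordering; the sketch only shows where that ordering comes from.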

File not found when inserting image file into PDF using itext

How do I add images and design a header and footer in a PDF using iText?
I have written this, but I get a file-not-found exception:
Image image = Image.getInstance("\resources\image.gif");
thanks

I used the following code to insert an image from the classpath. This is typically useful when you need to include an image that is not accessible from a public URL.
Image img = Image.getInstance(getClass().getClassLoader().getResource("MyImage.jpg"));
In my case, I use Maven, so I put MyImage.jpg in src/main/resources.
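One thing to watch with this approach: getResource returns null when the resource is missing, so a wrong name only shows up later as a confusing error. A small guard, sketched here with a hypothetical helper class, turns that into a clear message. Resource names passed to a class loader use forward slashes and no leading slash.

```java
import java.io.FileNotFoundException;
import java.net.URL;

public class ResourceLocator {
    /** Resolves a classpath resource, or fails with a descriptive exception. */
    static URL require(String name) throws FileNotFoundException {
        URL url = ResourceLocator.class.getClassLoader().getResource(name);
        if (url == null) {
            throw new FileNotFoundException("Not found on the classpath: " + name);
        }
        return url;
    }
}
```

The returned URL can then be handed to Image.getInstance in place of the inline getResource call, for example Image.getInstance(ResourceLocator.require("MyImage.jpg")).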

Take a look at this example:
import java.io.*;
import com.lowagie.text.*;
import com.lowagie.text.pdf.*;
public class CreatePDF{
public static void main(String arg[])throws Exception{
try{
Document document=new Document();
FileOutputStream fos=new FileOutputStream("C:/header-footer.pdf");
PdfWriter writer = PdfWriter.getInstance(document, fos);
document.open();
Image image1 = Image.getInstance("C:/image1.jpg");
Image image2 = Image.getInstance("C:/image2.jpg");
image1.setAbsolutePosition(0, 0);
image2.setAbsolutePosition(0, 0);
PdfContentByte byte1 = writer.getDirectContent();
PdfTemplate tp1 = byte1.createTemplate(600, 150);
tp1.addImage(image2);
PdfContentByte byte2 = writer.getDirectContent();
PdfTemplate tp2 = byte2.createTemplate(600, 150);
tp2.addImage(image1);
byte1.addTemplate(tp1, 0, 715);
byte2.addTemplate(tp2, 0, 0);
Phrase phrase1 = new Phrase(byte1 + "", FontFactory.getFont(FontFactory.TIMES_ROMAN, 7, Font.NORMAL));
Phrase phrase2 = new Phrase(byte2 + "", FontFactory.getFont(FontFactory.TIMES_ROMAN, 7, Font.NORMAL));
HeaderFooter header = new HeaderFooter(phrase1, true);
HeaderFooter footer = new HeaderFooter(phrase2, true);
document.setHeader(header);
document.setFooter(footer);
document.close();
System.out.println("File is created successfully showing header and footer.");
}
catch (Exception ex){
System.out.println(ex);
}
}
}
