Make Tess4J get image from PDF file

How to make Tess4J get image from PDF file?
I've started converting image files to text using OCR (Tess4J). It works fine: I have tested it on an image and the result is great.
File imageFile = new File("D:\\HEAD2.png");
Tesseract instance = Tesseract.getInstance(); // JNA Interface Mapping
// Tesseract1 instance = new Tesseract1(); // JNA Direct Mapping
try {
String result = instance.doOCR(imageFile);
System.out.println(result);
} catch (TesseractException e) {
System.err.println(e.getMessage());
}
But now I'm facing a problem: I need to parse a PDF file that contains images. I don't know how to do that, and I haven't found any Tess4J example that works with PDFs.
I tested this example with Asprise, but I can't find an equivalent for Tess4J:
import com.asprise.util.pdf.PDFReader;
import com.asprise.util.ocr.OCR;
PDFReader reader = new PDFReader(new File("my.pdf"));
reader.open(); // open the file.
int pages = reader.getNumberOfPages();
for(int i=0; i < pages; i++) {
BufferedImage img = reader.getPageAsImage(i);
// recognizes both characters and barcodes
String text = new OCR().recognizeAll(img);
System.out.println("Page " + i + ": " + text);
}
reader.close(); // finally, close the file.

Use PdfUtilities.convertPdf2Png (in net.sourceforge.tess4j.util) to convert the PDF pages to PNG files, then run OCR on them as you did before with images.

Tess4J depends on PDFBox, so you can use that library directly. It could look something like this:
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.logging.Level;
import java.util.logging.Logger;
import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;
PDDocument document = PDDocument.load(new File("YOUR_PDF_FILE_PATH"));
PDFRenderer pdfRenderer = new PDFRenderer(document);
ITesseract tesseract = new Tesseract();
tesseract.setDatapath("tessdata");
tesseract.setLanguage("spa");
for (int page = 0; page < document.getNumberOfPages(); page++) {
BufferedImage bufferedImage = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
try {
String str = tesseract.doOCR(bufferedImage);
System.out.println(str);
} catch (TesseractException ex) {
Logger.getLogger(OCR.class.getName()).log(Level.SEVERE, null, ex);
}
}
document.close();
I'm using Tess4J 4.5 and PDFBox 2.0 here.
You can also check
https://colwil.com/how-to-extract-text-from-a-scanned-pdf-using-ocr-in-java/.

Related

Read multi page Tiff image and write to a pdf in Java

I'm trying to convert a multi-page TIFF to a PDF using PDFBox, without success so far. I can't use Apache Commons Imaging at my company because it doesn't have a stable release.
Problem: I can't read a multi-page TIFF and write it to a PDF.
What works so far: only the first page gets written and saved to the PDF. A single-page TIFF converts fine.
Below is the code:
PDDocument doc = new PDDocument();
log.info("Read Image");
log.info("Process Image parts");
//Get the number of pages
int pages = 0;
try(ImageInputStream imageInputStream = ImageIO.createImageInputStream(new File("src/main/resources/output/testpdf.tiff"))) {
if (imageInputStream != null && imageInputStream.length() != 0) {
Iterator<ImageReader> iteratorIO = ImageIO.getImageReaders(imageInputStream);
if (iteratorIO != null && iteratorIO.hasNext()) {
ImageReader reader = iteratorIO.next();
reader.setInput(imageInputStream);
pages = reader.getNumImages(true);
log.info("Number of pages in the tiff is " + pages);
}
}
}
//Need a reader here for different page ?
for (int i=0; i<pages; i++) {
BufferedImage bimage = ImageIO.read(file);
PDPage page = new PDPage();
doc.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(doc, page);
try {
// the .08F can be tweaked. Go up for better quality,
// but the size of the PDF will increase
PDImageXObject image = JPEGFactory.createFromImage(doc, bimage, 0.08f);
Dimension scaledDim = getScaledDimension(new Dimension(image.getWidth(), image.getHeight()),
new Dimension((int) page.getMediaBox().getWidth(), (int) page.getMediaBox().getHeight()));
contentStream.drawImage(image, 1, 1, scaledDim.width, scaledDim.height);
} finally {
contentStream.close();
}
}
doc.save("src/main/resources/output/testpdf.pdf");
doc.close();
Do I need a reader that ImageIO doesn't provide?
OR
Do I need to split the multi-page TIFF into individual pages and then write them to a PDF?
I haven't worked with image manipulation much, but I appreciate the quality ImageIO delivers after the conversion!
Thanks
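
The code in the question calls getScaledDimension without showing it. The sketch below is one plausible version, assuming the helper is meant to fit the image inside the page's media box while keeping its aspect ratio. The class name ScaleToFit and the method body are assumptions for illustration, not the asker's code.

```java
import java.awt.Dimension;

final class ScaleToFit {
    /**
     * Returns the largest size with the image's aspect ratio that still
     * fits inside the boundary. Images smaller than the boundary are
     * scaled up as well.
     */
    static Dimension getScaledDimension(Dimension image, Dimension boundary) {
        // The smaller of the two ratios is the one that keeps both sides inside.
        double ratio = Math.min(
                boundary.getWidth() / image.getWidth(),
                boundary.getHeight() / image.getHeight());
        return new Dimension(
                (int) Math.round(image.getWidth() * ratio),
                (int) Math.round(image.getHeight() * ratio));
    }

    public static void main(String[] args) {
        // A 200x100 image placed in a 50x50 box keeps its 2:1 shape.
        Dimension d = getScaledDimension(new Dimension(200, 100), new Dimension(50, 50));
        System.out.println(d.width + " x " + d.height);
    }
}
```

With PDFBox the boundary would be the page's media box, whose width and height are expressed in points.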

Try this; you need the PDFBox jar and the sun.jai.codec jar.
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.image.LosslessFactory;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import com.sun.media.jai.codec.FileSeekableStream;
import com.sun.media.jai.codec.ImageCodec;
import com.sun.media.jai.codec.ImageDecoder;
import com.sun.media.jai.codec.SeekableStream;
import com.sun.media.jai.codec.TIFFDecodeParam;
public class FinalTtoP {
public static void main(String args[]) throws IOException
{
PDDocument document=new PDDocument();
File file = new File("C:/nn.tif"); //Enter Tiff file path
ImageInputStream isb = ImageIO.createImageInputStream(file);
Iterator<ImageReader> iterator = ImageIO.getImageReaders(isb);
if (iterator == null || !iterator.hasNext())
{
throw new IOException("Image file format not supported by ImageIO: ");
}
ImageReader reader = (ImageReader) iterator.next();
iterator = null;
reader.setInput(isb);
int nbPages = reader.getNumImages(true);
System.out.println(nbPages);
for(int p=0;p<nbPages;p++)
{
BufferedImage bufferedImage = reader.read(p);
PDPage page = new PDPage();
document.addPage(page);
PDImageXObject i = LosslessFactory.createFromImage(document, bufferedImage);
PDPageContentStream content =new PDPageContentStream(document, page);
content.drawImage(i, 0,0 ,page.getMediaBox().getWidth(),page.getMediaBox().getHeight());
content.close();
}
document.save("C:/nnnnm.pdf"); //Enter path to save your file with .pdf extension
document.close();
}
}

Refer to this code, which improves speed. If it is still too slow, you may need to use iText.
public static byte[] convertTiffToPdf(File tiffFile) throws IOException {
ByteArrayOutputStream outStream = null;
PDDocument document = null;
ImageInputStream imgInputStream = null;
try {
outStream = new ByteArrayOutputStream();
document = new PDDocument();
PDRectangle pageSize = PDRectangle.LETTER;
int noOfPages = 0;
imgInputStream = ImageIO.createImageInputStream(tiffFile);
Iterator<ImageReader> iterator = ImageIO.getImageReaders(imgInputStream);
if (iterator == null || !iterator.hasNext()) {
throw new IOException("Image file format not supported by ImageIO: ");
}
ImageReader reader = (ImageReader) iterator.next();
iterator = null;
reader.setInput(imgInputStream);
noOfPages = reader.getNumImages(true);
for (int i = 0; i < noOfPages; i++) {
PDPage page = new PDPage(pageSize);
document.addPage(page);
// CCITTFactory reads page i straight from the TIFF file, so the page is not decoded with reader.read(i) first.
// BufferedImage bufferedImage = reader.read(i); // only needed for the LosslessFactory/JPEGFactory variants below
// PDImageXObject imgObject = LosslessFactory.createFromImage(document, bufferedImage); //Commented for PR 1028
PDImageXObject imgObject = CCITTFactory.createFromFile(document, tiffFile, i); //PR 1028
//PDImageXObject imgObject = JPEGFactory.createFromImage(document, bufferedImage);
// try-with-resources closes the content stream even if drawImage throws
try (PDPageContentStream content = new PDPageContentStream(document, page)) {
content.drawImage(imgObject, 0, 0, pageSize.getWidth(), pageSize.getHeight());
}
}
document.save(outStream);
byte[] fileBytes = outStream.toByteArray();
return fileBytes;
} finally {
if (document != null) {
document.close();
}
if (imgInputStream != null) {
imgInputStream.close();
}
if (outStream != null) {
outStream.close();
}
}
}

You can use CCITTFactory.createFromFile(PDDocument document, File file, int number) which works for most bitonal tiff files. If that one doesn't work (because the TIFF file is tiled or in color), then read the individual pages into BufferedImage objects (see here) and then use LosslessFactory.createFromImage(PDDocument document, BufferedImage image) with the result.
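
As a rough illustration of that decision, the sketch below checks whether a decoded page is bitonal (one bit per pixel). The class and method names are hypothetical, and the check is only a heuristic: it cannot tell whether the TIFF is tiled, which the answer names as another case where CCITTFactory may fail.

```java
import java.awt.image.BufferedImage;

final class TiffPageKind {
    /** True when the decoded page stores one bit per pixel (black and white). */
    static boolean isBitonal(BufferedImage page) {
        return page.getColorModel().getPixelSize() == 1;
    }

    public static void main(String[] args) {
        BufferedImage fax = new BufferedImage(16, 16, BufferedImage.TYPE_BYTE_BINARY);
        BufferedImage photo = new BufferedImage(16, 16, BufferedImage.TYPE_INT_RGB);
        // Bitonal pages are candidates for CCITTFactory.createFromFile;
        // everything else would go through LosslessFactory.createFromImage.
        System.out.println("fax bitonal: " + isBitonal(fax));
        System.out.println("photo bitonal: " + isBitonal(photo));
    }
}
```

Decoding a page just to inspect it costs time, so in practice it may be simpler to try CCITTFactory first and fall back to LosslessFactory when it throws.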

How to get an image of a whole PDF page (including text), not the images embedded in the page

I tried to convert a PDF page to an image, but the code only extracted the individual images embedded in the page, not an image of the whole page.
Code below:
public class ExtractionPDFtoThumbImgs {
static String filePath = "/Users/tmdtjq/Downloads/PDFTest/test.pdf";
static String outputFilePath = "/Users/tmdtjq/Downloads/PDFTest/pageimages";
public static void change(File inputFile, File outputFolder) throws IOException {
//TODO check the input file exists and is PDF
//TODO for the treatment of PDF encrypted
PDDocument doc = null;
try {
doc = PDDocument.load(inputFile);
List<PDPage> allPages = doc.getDocumentCatalog().getAllPages();
for (int i = 0; i <allPages.size(); i++) {
PDPage page = allPages.get(i);
BufferedImage image = page.convertToImage();
ImageIO.write(image, "jpg", new File(outputFolder.getAbsolutePath() + File.separator + (i + 1) + ".jpg"));
}
} finally {
if (doc != null) {
doc.close();
}
}
}
public static void main(String[] args) {
File inputFile = new File(ExtractionPDFtoThumbImgs.filePath);
File outputFolder = new File(ExtractionPDFtoThumbImgs.outputFilePath);
if(!outputFolder.exists()){
outputFolder.mkdirs();
}
try {
ExtractionPDFtoThumbImgs.change(inputFile, outputFolder);
} catch (IOException e) {
e.printStackTrace();
}
}
}
The code above extracts the images embedded in the PDF page; it does not render the page itself (text included) as an image.
Is there a conversion tool (PDF page to image), or a PDFBox class that does this?
Please suggest how to get an image of a whole PDF page, including its text, rather than the images inside the page.

Try pdftocairo; it's part of Poppler.
I was using ImageMagick to convert PDFs to images, but it relies on Ghostscript, which is sometimes picky about the PDFs you feed it, so it was hit or miss.
So far pdftocairo has been solid.
http://poppler.freedesktop.org
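
Since the question is about Java, here is a minimal sketch of calling pdftocairo from Java with ProcessBuilder. It assumes the pdftocairo binary is installed and on the PATH; the class name PdfToCairo and its methods are hypothetical. The -png and -r (resolution in DPI) options are pdftocairo's own.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

final class PdfToCairo {
    /** Builds: pdftocairo -png -r <dpi> <pdf> <outputPrefix> */
    static List<String> command(Path pdf, Path outputPrefix, int dpi) {
        return List.of("pdftocairo", "-png", "-r", Integer.toString(dpi),
                pdf.toString(), outputPrefix.toString());
    }

    /** Runs pdftocairo and returns its exit code (0 means success). */
    static int run(Path pdf, Path outputPrefix, int dpi)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(pdf, outputPrefix, dpi))
                .inheritIO() // show pdftocairo's own messages on the console
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Only prints the command; call run(...) to actually render the pages.
        System.out.println(String.join(" ",
                command(Paths.get("test.pdf"), Paths.get("page"), 150)));
    }
}
```

pdftocairo writes one PNG per page, named from the output prefix, and each one is a render of the whole page, text included.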

Text is missing when converting pdf file into image in java using pdfbox

I want to convert a PDF page to an image file, but the text is missing when I convert the page using Java.
The file I want to convert is 46_2.pdf; after conversion the result looks like 46_2.png.
Code:
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;
import javax.imageio.ImageIO;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
public class ConvertPDFPageToImageWithoutText {
public static void main(String[] args) {
try {
String oldPath = "C:/PDFCopy/46_2.pdf";
File oldFile = new File(oldPath);
if (oldFile.exists()) {
PDDocument document = PDDocument.load(oldPath);
List<PDPage> list = document.getDocumentCatalog().getAllPages();
for (PDPage page : list) {
BufferedImage image = page.convertToImage();
File outputfile = new File("C:/PDFCopy/image.png");
ImageIO.write(image, "png", outputfile);
}
// Close the document once, after all pages have been rendered
document.close();
}
} catch (Exception e) {
e.printStackTrace();
}
}
}

Since you're using PDFBox, try using PDFImageWriter.writeImage instead of PDPage.convertToImage. This post seems relevant to what you are trying to do.

I had the same problem. I found an article (unfortunately I can't remember where, because I've read hundreds of them) whose author complained that such problems appeared in PDFBox after they updated Java to version 7.21. So I'm using 7.17, and it works for me :)

Use the latest version of PDFBox (I am using 2.0.9) and add the JAI Image I/O dependency from here. This is sample code running on Java 7.
public void pdfToImageConvertorUsingPdfBox(String inputPdfPath) throws Exception {
File sourceFile = new File(inputPdfPath);
String formatName = "png";
if (sourceFile.exists()) {
PDDocument document = PDDocument.load(sourceFile);
PDFRenderer pdfRenderer = new PDFRenderer(document);
int count = document.getNumberOfPages();
for (int i = 0; i < count; i++) {
BufferedImage image = pdfRenderer.renderImageWithDPI(i, 200, ImageType.RGB);
String output = FilenameUtils.removeExtension(inputPdfPath) + "_" + (i + 1) + "." + formatName;
ImageIO.write(image, formatName, new File(output));
}
document.close();
} else {
logger.error(sourceFile.getName() + " File not exists");
}
}

Get Image from the document using Apache POI

I am using Apache POI to read images from a .docx file.
Here is my code:
public Image ReadImg(int imageid) throws IOException {
XWPFDocument doc = new XWPFDocument(new FileInputStream("import.docx"));
BufferedImage jpg = null;
List<XWPFPictureData> pic = doc.getAllPictures();
XWPFPictureData pict = pic.get(imageid);
String extract = pict.suggestFileExtension();
byte[] data = pict.getData();
//try to read image data using javax.imageio.* (JDK 1.4+)
jpg = ImageIO.read(new ByteArrayInputStream(data));
return jpg;
}
It reads the images correctly, but not in order.
For example, if the document contains
image1.jpeg
image2.jpeg
image3.jpeg
image4.jpeg
image5.jpeg
It reads
image4
image3
image1
image5
image2
Could you please help me resolve this?
I want to read the images in document order.
Thanks,
Sithik

public static void extractImages(XWPFDocument docx) {
try {
List<XWPFPictureData> piclist = docx.getAllPictures();
// traverse through the list and write each image to a file
Iterator<XWPFPictureData> iterator = piclist.iterator();
int i = 0;
while (iterator.hasNext()) {
XWPFPictureData pic = iterator.next();
byte[] bytepic = pic.getData();
BufferedImage imag = ImageIO.read(new ByteArrayInputStream(bytepic));
ImageIO.write(imag, "jpg", new File("D:/imagefromword/" + pic.getFileName()));
i++;
}
} catch (Exception e) {
System.exit(-1);
}
}

Splitting a multipage TIFF image into individual images (Java)

Been tearing my hair on this one.
How do I split a multipage / multilayer TIFF image into several individual images?
Demo image available here.
(Would prefer a pure Java (i.e. non-native) solution. Doesn't matter if the solution relies on commercial libraries.)

You can use the Java Advanced Imaging library (JAI) to split a multipage TIFF by using an ImageReader:
ImageInputStream is = ImageIO.createImageInputStream(new File(pathToImage));
if (is == null || is.length() == 0){
// handle error
}
Iterator<ImageReader> iterator = ImageIO.getImageReaders(is);
if (iterator == null || !iterator.hasNext()) {
throw new IOException("Image file format not supported by ImageIO: " + pathToImage);
}
// We are just looking for the first reader compatible:
ImageReader reader = (ImageReader) iterator.next();
iterator = null;
reader.setInput(is);
Then you can get the number of pages:
int nbPages = reader.getNumImages(true);
and read each page separately:
BufferedImage page = reader.read(numPage);
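
Putting those pieces together, a minimal end-to-end sketch could look like the one below. It assumes a TIFF-capable ImageIO plugin is available (the JDK bundles one from Java 9 onward; older JDKs need a plugin such as JAI Image I/O). The class name TiffSplitter and the page_N.tif naming are illustrative choices, not part of the answer above.

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

final class TiffSplitter {
    /** Writes every page of a multipage TIFF to outDir as page_0.tif, page_1.tif, ... */
    static List<File> split(File tiff, File outDir) throws IOException {
        List<File> written = new ArrayList<>();
        try (ImageInputStream in = ImageIO.createImageInputStream(tiff)) {
            if (in == null) {
                throw new IOException("Cannot open " + tiff);
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("No ImageIO reader found for " + tiff);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                int pages = reader.getNumImages(true);
                for (int i = 0; i < pages; i++) {
                    BufferedImage page = reader.read(i);
                    File out = new File(outDir, "page_" + i + ".tif");
                    // ImageIO.write returns false when no writer handles the format.
                    if (!ImageIO.write(page, "tiff", out)) {
                        throw new IOException("No TIFF writer available");
                    }
                    written.add(out);
                }
            } finally {
                reader.dispose(); // release the reader's native/cached resources
            }
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TiffSplitter.java input.tif outputDirectory
        for (File f : split(new File(args[0]), new File(args[1]))) {
            System.out.println("Saved " + f);
        }
    }
}
```

Each page is decoded and re-encoded, so the output files use the writer's default compression rather than the compression of the source file.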

A fast but non-Java solution is tiffsplit. It is part of the libtiff library.
An example command to split a TIFF file into all its layers would be:
tiffsplit image.tif
The manpage says it all:
NAME
tiffsplit - split a multi-image TIFF into single-image TIFF files
SYNOPSIS
tiffsplit src.tif [ prefix ]
DESCRIPTION
tiffsplit takes a multi-directory (page) TIFF file and creates one or more single-directory (page) TIFF files
from it. The output files are given names created by concatenating a prefix, a lexically ordered suffix in the
range [aaa-zzz], the suffix .tif (e.g. xaaa.tif, xaab.tif, xzzz.tif). If a prefix is not specified on the
command line, the default prefix of x is used.
OPTIONS
None.
BUGS
Only a select set of "known tags" is copied when splitting.
SEE ALSO
tiffcp(1), tiffinfo(1), libtiff(3TIFF)
Libtiff library home page: http://www.remotesensing.org/libtiff/
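
If you run tiffsplit from a Java program and then need to pick up its output, the naming scheme described in the man page can be reproduced with a small helper. This is a sketch based only on the man page text quoted above (prefix, three-letter suffix from aaa to zzz, then .tif), not on tiffsplit's source. The class name is hypothetical, and the mapping of page N to the Nth suffix in lexical order is an assumption.

```java
final class TiffsplitNames {
    private static final int MAX = 26 * 26 * 26; // number of suffixes from aaa to zzz

    /** File name for the page at the given zero-based index, e.g. xaaa.tif for 0. */
    static String name(String prefix, int index) {
        if (index < 0 || index >= MAX) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        // Treat the index as a three-digit base-26 number written with a..z.
        char first = (char) ('a' + index / (26 * 26));
        char second = (char) ('a' + (index / 26) % 26);
        char third = (char) ('a' + index % 26);
        return prefix + first + second + third + ".tif";
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(name("x", i));
        }
    }
}
```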

I used the sample above with a TIFF plugin I found called imageio-tiff.
Maven dependency:
<dependency>
<groupId>com.tomgibara.imageio</groupId>
<artifactId>imageio-tiff</artifactId>
<version>1.0</version>
</dependency>
I was able to get the buffered images from a tiff resource:
Resource img3 = new ClassPathResource(TIFF4);
ImageInputStream is = ImageIO.createImageInputStream(img3.getInputStream());
Iterator<ImageReader> iterator = ImageIO.getImageReaders(is);
if (iterator == null || !iterator.hasNext()) {
throw new IOException("Image file format not supported by ImageIO: ");
}
// We are just looking for the first reader compatible:
ImageReader reader = (ImageReader) iterator.next();
iterator = null;
reader.setInput(is);
int nbPages = reader.getNumImages(true);
LOGGER.info("No. of pages for tiff file is {}", nbPages);
BufferedImage image1 = reader.read(0);
BufferedImage image2 = reader.read(1);
BufferedImage image3 = reader.read(2);
But then I found another project called Apache Commons Imaging.
Maven dependency:
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-imaging</artifactId>
<version>1.0-SNAPSHOT</version>
</dependency>
In one line you can get the buffered images:
List<BufferedImage> bufferedImages = Imaging.getAllBufferedImages(img3.getInputStream(), TIFF4);
LOGGER.info("No. of pages for tiff file is {} using apache commons imaging", bufferedImages.size());
Then, a sample that writes them to files:
final Map<String, Object> params = new HashMap<String, Object>();
// set optional parameters if you like
params.put(ImagingConstants.PARAM_KEY_COMPRESSION, Integer.valueOf(TiffConstants.TIFF_COMPRESSION_CCITT_GROUP_4));
int i = 0;
for (Iterator<BufferedImage> iterator1 = bufferedImages.iterator(); iterator1.hasNext(); i++) {
BufferedImage bufferedImage = iterator1.next();
LOGGER.info("Image type {}", bufferedImage.getType());
File outFile = new File("C:\\tmp" + File.separator + "shane" + i + ".tiff");
Imaging.writeImage(bufferedImage, outFile, ImageFormats.TIFF, params);
}
In actual performance testing, Apache Commons Imaging is a lot slower...
Or use an old version of iText, which is a lot faster:
private ByteArrayOutputStream convertTiffToPdf(InputStream imageStream) throws IOException, DocumentException {
Image image;
ByteArrayOutputStream out = new ByteArrayOutputStream();
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, out);
writer.setStrictImageSequence(true);
document.open();
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(imageStream);
int pages = TiffImage.getNumberOfPages(ra);
for (int i = 1; i <= pages; i++) {
image = TiffImage.getTiffImage(ra, i);
image.setAbsolutePosition(0, 0);
image.scaleToFit(PageSize.A4.getWidth(), PageSize.A4.getHeight());
document.setPageSize(PageSize.A4);
document.newPage();
document.add(image);
}
document.close();
out.flush();
return out;
}

This is how I did it with ImageIO:
public List<BufferedImage> extractImages(InputStream fileInput) throws Exception {
List<BufferedImage> extractedImages = new ArrayList<BufferedImage>();
try (ImageInputStream iis = ImageIO.createImageInputStream(fileInput)) {
ImageReader reader = getTiffImageReader();
reader.setInput(iis);
int pages = reader.getNumImages(true);
for (int imageIndex = 0; imageIndex < pages; imageIndex++) {
BufferedImage bufferedImage = reader.read(imageIndex);
extractedImages.add(bufferedImage);
}
}
return extractedImages;
}
private ImageReader getTiffImageReader() {
Iterator<ImageReader> imageReaders = ImageIO.getImageReadersByFormatName("TIFF");
if (!imageReaders.hasNext()) {
throw new UnsupportedOperationException("No TIFF Reader found!");
}
return imageReaders.next();
}
I took part of the code from this blog.

All the proposed solutions require reading the multipage image page by page and writing the pages back to new TIFF images. Unless you want to save the individual pages in a different image format, there is no point in decoding the image. Given the special structure of a TIFF file, you can split a multipage TIFF into single-page TIFF images without decoding.
The TIFF tweaking tool I am using (part of a larger image-related library, "icafe") is written from scratch in pure Java. It can delete pages, insert pages, retain certain pages, split pages out of a multipage TIFF, and merge multipage TIFF images into one TIFF image, all without decompressing them.
After trying the TIFF tweaking tool, I was able to split the image into 3 pages: page#0, page#1, and page#2.
NOTE1: For some reason the original demo image contains an "incorrect" StripByteCounts value of 1, which is not the actual number of bytes needed for the image strip. It turns out that the image data are not compressed, so the actual byte count for each image strip can be worked out from other TIFF field values such as RowsPerStrip, SamplesPerPixel, ImageWidth, etc.
NOTE2: When splitting the TIFF, the above-mentioned library doesn't need to decode and re-encode the image, so it's fast and it also keeps the original encoding and additional metadata of each page!
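
To illustrate the "special structure" this relies on: a classic TIFF starts with an 8-byte header that points to the first image file directory (IFD), and each IFD ends with the offset of the next one, so the pages form a linked list that can be walked without touching pixel data. The sketch below only counts pages that way. It is not icafe code, the class name is hypothetical, and it assumes a classic (non-BigTIFF) file held in memory. A real splitter would also have to copy each page's strips or tiles and rewrite the offsets.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

final class TiffIfdCounter {
    /** Counts the IFDs (pages) by following the next-IFD offsets. */
    static int countPages(byte[] tiff) {
        if (tiff.length < 8) {
            throw new IllegalArgumentException("Too short for a TIFF header");
        }
        ByteBuffer buf = ByteBuffer.wrap(tiff);
        // Bytes 0-1: "II" = little-endian (Intel), "MM" = big-endian (Motorola).
        if (tiff[0] == 'I' && tiff[1] == 'I') {
            buf.order(ByteOrder.LITTLE_ENDIAN);
        } else if (tiff[0] == 'M' && tiff[1] == 'M') {
            buf.order(ByteOrder.BIG_ENDIAN);
        } else {
            throw new IllegalArgumentException("Missing TIFF byte-order mark");
        }
        if (buf.getShort(2) != 42) {
            throw new IllegalArgumentException("Not a classic TIFF (magic number 42 expected)");
        }
        int pages = 0;
        long offset = buf.getInt(4) & 0xFFFFFFFFL; // offset of the first IFD
        while (offset != 0) {
            if (offset + 2 > tiff.length || pages > 65535) {
                throw new IllegalArgumentException("Corrupt or cyclic IFD chain");
            }
            // An IFD is: 2-byte entry count, 12 bytes per entry, 4-byte next-IFD offset.
            int entries = buf.getShort((int) offset) & 0xFFFF;
            long nextField = offset + 2 + 12L * entries;
            if (nextField + 4 > tiff.length) {
                throw new IllegalArgumentException("IFD runs past the end of the file");
            }
            pages++;
            offset = buf.getInt((int) nextField) & 0xFFFFFFFFL; // 0 ends the chain
        }
        return pages;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java TiffIfdCounter.java image.tif
        System.out.println(countPages(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```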

It works if you set the compression to Deflate with param.setCompression(32946); (32946 is the value of TIFFEncodeParam.COMPRESSION_DEFLATE).
public static void doitJAI(String mutitiff) throws IOException {
FileSeekableStream ss = new FileSeekableStream(mutitiff);
ImageDecoder dec = ImageCodec.createImageDecoder("tiff", ss, null);
int count = dec.getNumPages();
TIFFEncodeParam param = new TIFFEncodeParam();
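param.setCompression(32946); // TIFFEncodeParam.COMPRESSION_DEFLATE
param.setLittleEndian(false); // false = big-endian (Motorola) byte order; Intel order would be true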
System.out.println("This TIF has " + count + " image(s)");
for (int i = 0; i < count; i++) {
RenderedImage page = dec.decodeAsRenderedImage(i);
File f = new File("D:/PSN/SCB/SCAN/bin/Debug/Temps/test/single_" + i + ".tif");
System.out.println("Saving " + f.getCanonicalPath());
ParameterBlock pb = new ParameterBlock();
pb.addSource(page);
pb.add(f.toString());
pb.add("tiff");
pb.add(param);
RenderedOp r = JAI.create("filestore",pb);
r.dispose();
}
}

The code below splits multipage TIFFs into individual images and produces an Excel sheet listing the TIFF images.
Create the folder C:/FAX, place your TIFF images in it, then run this code.
You need the jars below.
1.sun-as-jsr88-dm-4.0-sources
2.sun-jai_codec
3.sun-jai_core
4.Apache POI (for the org.apache.poi imports used to write the Excel file)
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.image.RenderedImage;
import java.awt.image.renderable.ParameterBlock;
import java.io.File;
import java.io.IOException;
import javax.media.jai.JAI;
import javax.media.jai.RenderedOp;
import com.sun.media.jai.codec.FileSeekableStream;
import com.sun.media.jai.codec.ImageCodec;
import com.sun.media.jai.codec.ImageDecoder;
import com.sun.media.jai.codec.TIFFEncodeParam;
import java.io.*;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Calendar;
import javax.swing.JOptionPane;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Row;
public class TIFF_Sepreator {
File folder = new File("C:/FAX/");
public static void infoBox(String infoMessage, String titleBar)
{
JOptionPane.showMessageDialog(null, infoMessage, "InfoBox: " + titleBar, JOptionPane.INFORMATION_MESSAGE);
}
public void splitting() throws IOException, AWTException
{
boolean FinalFAXFolder = (new File("C:/Final_FAX")).mkdirs();
File[] listOfFiles = folder.listFiles();
String dateFormat = new SimpleDateFormat("yyyyMMdd_HHmmss").format(Calendar.getInstance().getTime());
try{
if (listOfFiles.length > 0)
{
for(int file=0; file<listOfFiles.length; file++)
{
System.out.println(listOfFiles[file]);
FileSeekableStream ss = new FileSeekableStream(listOfFiles[file]);
ImageDecoder dec = ImageCodec.createImageDecoder("tiff", ss, null);
int count = dec.getNumPages();
TIFFEncodeParam param = new TIFFEncodeParam();
param.setCompression(TIFFEncodeParam.COMPRESSION_GROUP4);
param.setLittleEndian(false); // false = big-endian (Motorola) byte order; Intel order would be true
System.out.println("This TIF has " + count + " image(s)");
for (int i = 0; i < count; i++)
{
RenderedImage page = dec.decodeAsRenderedImage(i);
File f = new File("C:\\Final_FAX\\"+dateFormat+ file +i + ".tif");
System.out.println("Saving " + f.getCanonicalPath());
ParameterBlock pb = new ParameterBlock();
pb.addSource(page);
pb.add(f.toString());
pb.add("tiff");
pb.add(param);
RenderedOp r = JAI.create("filestore",pb);
r.dispose();
}
}
TIFF_Sepreator.infoBox("Find your split TIFF images in location 'C:/Final_FAX/' " , "Done :)");
WriteListOFFilesIntoExcel();
}
else
{
TIFF_Sepreator.infoBox("No files were found in location 'C:/FAX/' " , "Empty folder");
System.out.println("No files found");
}
}
catch(Exception e)
{
TIFF_Sepreator.infoBox("Unable to run due to this error: " +e , "Error");
System.out.println("Error: "+e);
}
}
public void WriteListOFFilesIntoExcel(){
File[] listOfFiles = folder.listFiles();
ArrayList<File> files = new ArrayList<File>(Arrays.asList(folder.listFiles()));
try {
String filename = "C:/Final_FAX/List_Of_Fax_Files.xls" ;
HSSFWorkbook workbook = new HSSFWorkbook();
HSSFSheet sheet = workbook.createSheet("FirstSheet");
for (int file=0; file<listOfFiles.length; file++) {
System.out.println(listOfFiles[file]);
Row r = sheet.createRow(file);
r.createCell(0).setCellValue(files.get(file).toString());
}
FileOutputStream fileOut = new FileOutputStream(filename);
workbook.write(fileOut);
fileOut.close();
System.out.println("Your excel file has been generated!");
}
catch(Exception ex){
TIFF_Sepreator.infoBox("Unable to run due to this error: " +ex , "Error");
System.out.println("Error: "+ex);
}
}
public static void main(String[] args) throws IOException, AWTException {
new TIFF_Sepreator().splitting();
}
}
