We are attempting to generate documents using iText, building them largely from "template" files - smaller PDF files that are combined into one composite file using the PdfContentByte.addTemplate method. We then automatically and silently print the new file using the *nix command lp. This usually works; occasionally, however, a generated file will fail to print. The document proceeds through all queues and arrives at the printer itself (a Lexmark T652n, in this case), the printer's display shows the job in progress, and its mechanical components even whir up in preparation - then the print job vanishes without a trace and the printer returns to ready.
The oddity is in how specific this issue tends to be. For starters, the files in question print without fail when printed manually through Adobe PDF Viewer, and can be opened fine by editors like Adobe Live Cycle. Furthermore, the content of the file affects whether it is plagued by this issue, but not in any clear way - adding a specific template 20 times could cause the problem, while adding it 19 or 21 times might be fine, and using a different template changes the pattern entirely, perhaps causing the failure only after 37 repetitions. Generating a document with the exact same content is consistent in whether or not the issue occurs, but any subtle and seemingly irrelevant change in content can change whether the problem happens.
While it could be considered a hardware issue, the fact remains that certain iText-generated files have this issue while others do not. Is our method of file creation sometimes producing files that only the printer, and only sometimes, considers corrupt?
Here is a relatively small code example that generates documents using the repeated-template method, similar to our main program. It uses this file as a template and repeats it a specified number of times.
public class PDFFileMaker {
private static final int INCH = 72;
final private static float MARGIN_TOP = INCH / 4;
final private static float MARGIN_BOTTOM = INCH / 2;
private static final String DIREC = "/pdftest/";
private static final String OUTPUT_FILEPATH = DIREC + "cooldoc_%d.pdf";
private static final String TEMPLATE1_FILEPATH = DIREC + "template1.pdf";
private static final Rectangle PAGE_SIZE = PageSize.LETTER;
private static final Rectangle TEMPLATE_SIZE = PageSize.LETTER;
private ByteArrayOutputStream workingBuffer;
private ByteArrayOutputStream storageBuffer;
private ByteArrayOutputStream templateBuffer;
private float currPosition;
private int currPage;
private int formFillCount;
private int templateTotal;
private static final int DEFAULT_NUMBER_OF_TIMES = 23;
public static void main (String [] args) {
System.out.println("Starting...");
PDFFileMaker maker = new PDFFileMaker();
File file = null;
try {
file = maker.createPDF(DEFAULT_NUMBER_OF_TIMES);
}
catch (Exception e) {
e.printStackTrace();
}
if (file == null || !file.exists()) {
System.out.println("File failed to be created.");
}
else {
System.out.println("File creation successful.");
}
}
public File createPDF(int inCount) throws Exception {
templateTotal = inCount;
String sFilepath = String.format(OUTPUT_FILEPATH, templateTotal);
workingBuffer = new ByteArrayOutputStream();
storageBuffer = new ByteArrayOutputStream();
templateBuffer = new ByteArrayOutputStream();
startPDF();
doMainSegment();
finishPDF(sFilepath);
return new File(sFilepath);
}
private void startPDF() throws DocumentException, FileNotFoundException {
Document d = new Document(PAGE_SIZE);
PdfWriter w = PdfWriter.getInstance(d, workingBuffer);
d.open();
d.add(new Paragraph(" "));
d.close();
w.close();
currPosition = 0;
currPage = 1;
formFillCount = 1;
}
protected void finishPDF(String sFilepath) throws DocumentException, IOException {
//Transfers data from buffer 1 to builder file
PdfReader r = new PdfReader(workingBuffer.toByteArray());
PdfStamper s = new PdfStamper(r, new FileOutputStream(sFilepath));
s.setFullCompression();
r.close();
s.close();
}
private void doMainSegment() throws FileNotFoundException, IOException, DocumentException {
File fTemplate1 = new File(TEMPLATE1_FILEPATH);
for (int i = 0; i < templateTotal; i++) {
doTemplate(fTemplate1);
}
}
private void doTemplate(File f) throws FileNotFoundException, IOException, DocumentException {
PdfReader reader = new PdfReader(new FileInputStream(f));
//Transfers data from the template input file to temporary buffer
PdfStamper stamper = new PdfStamper(reader, templateBuffer);
stamper.setFormFlattening(true);
AcroFields form = stamper.getAcroFields();
//Get size of template file via looking for "end" Acrofield
float[] area = form.getFieldPositions("end");
float size = TEMPLATE_SIZE.getHeight() - MARGIN_TOP - area[4];
//Requires Page Break
if (size >= PAGE_SIZE.getHeight() - MARGIN_TOP - MARGIN_BOTTOM + currPosition) {
PdfReader subreader = new PdfReader(workingBuffer.toByteArray());
PdfStamper substamper = new PdfStamper(subreader, storageBuffer);
currPosition = 0;
currPage++;
substamper.insertPage(currPage, PAGE_SIZE);
substamper.close();
subreader.close();
workingBuffer = storageBuffer;
storageBuffer = new ByteArrayOutputStream();
}
//Set Fields
form.setField("field1", String.format("Form Text %d", formFillCount));
form.setField("page", String.format("Page %d", currPage));
formFillCount++;
stamper.close();
reader.close();
//Read from working buffer, stamp to storage buffer, stamp template from template buffer
reader = new PdfReader(workingBuffer.toByteArray());
stamper = new PdfStamper(reader, storageBuffer);
reader.close();
reader = new PdfReader(templateBuffer.toByteArray());
PdfImportedPage page = stamper.getImportedPage(reader, 1);
PdfContentByte cb = stamper.getOverContent(currPage);
cb.addTemplate(page, 0, currPosition);
stamper.close();
reader.close();
//Reset buffers - working buffer takes on storage buffer data, storage and template buffers clear
workingBuffer = storageBuffer;
storageBuffer = new ByteArrayOutputStream();
templateBuffer = new ByteArrayOutputStream();
currPosition -= size;
}
}
Running this program with a DEFAULT_NUMBER_OF_TIMES of 23 produces this document and causes the failure when sent to the printer. Changing it to 22 times produces this similar-looking document (simply with one less "line") which does not have the problem and prints successfully. Using a different PDF file as a template component completely changes these numbers or makes it so that it may not happen at all.
While this problem is likely too specific, with too many factors, for other people to reasonably be expected to reproduce, the question of possibilities remains. What about the file generation could cause this unusual behavior? What might make one file acceptable to a specific printer while another, generated in the same manner and differing only in seemingly trivial ways, is unacceptable? Is there a bug in iText triggered by using the stamper template commands too heavily? This has been a long-running bug for us, so any assistance is appreciated; additionally, I am willing to answer questions or have extended conversations in chat as necessary in an effort to get to the bottom of this.
The design of your application more or less abuses the otherwise perfectly fine PdfStamper functionality.
Allow me to explain.
The contents of a page can be expressed as a single stream object or as an array of stream objects. When you change a page using PdfStamper, the content of that page always becomes an array of stream objects, consisting of the original stream object (or the original array of stream objects) to which extra elements are added.
By adding the same template through a newly created PdfStamper object over and over again, you increase the number of elements in the page contents array dramatically. You also introduce a huge number of q and Q operators that save and restore the graphics state. The reason why you see seemingly random behavior is clear: the memory and CPU available to process the PDF can vary from one moment to another. One time there will be sufficient resources to deal with 20 q operators (which save the state), the next time there will only be enough for 19. The problem occurs when the process runs out of resources.
While the PDFs you're creating aren't illegal according to ISO-32000-1, some PDF processors simply choke on these PDFs. iText is a toolbox that allows you to create PDFs that can make me very happy when I look under the hood, but it also allows you to create horrible PDFs if you don't use the toolbox wisely. The latter is what happened in your case.
You should solve this by reusing the PdfStamper instance instead of creating a new PdfStamper over and over again. If that's not possible, please post another question, using fewer words, explaining exactly what you want to achieve.
Suppose that you have many different source files with PDF snippets that need to be added to a single page. For instance: suppose that each PDF snippet were a coupon and you needed to create a sheet with 30 coupons. Then you'd use a single PdfWriter instance, import pages with getImportedPage(), and add them at the correct positions using addTemplate().
Of course, I have no idea what your project is about. The idea of coupons on a page was inspired by your test PDF.
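In case it helps, here is a minimal sketch of that coupon-sheet idea: one Document and one PdfWriter, the snippet imported once with getImportedPage(), and addTemplate() called for each placement. The file names, positions, and iText 5 package names are assumptions for illustration, not taken from your code:

public class CouponSheet {
    public static void main(String[] args) throws Exception {
        com.itextpdf.text.Document document =
                new com.itextpdf.text.Document(com.itextpdf.text.PageSize.LETTER);
        com.itextpdf.text.pdf.PdfWriter writer = com.itextpdf.text.pdf.PdfWriter.getInstance(
                document, new java.io.FileOutputStream("coupon_sheet.pdf"));
        document.open();
        // Import the snippet once and reuse the imported page for every placement
        com.itextpdf.text.pdf.PdfReader snippet = new com.itextpdf.text.pdf.PdfReader("coupon.pdf");
        com.itextpdf.text.pdf.PdfImportedPage coupon = writer.getImportedPage(snippet, 1);
        com.itextpdf.text.pdf.PdfContentByte canvas = writer.getDirectContent();
        float x = 36, y = 720;               // illustrative starting position
        for (int i = 0; i < 30; i++) {
            canvas.addTemplate(coupon, x, y);
            y -= 100;                        // illustrative vertical step
            if (y < 36) {                    // start a new page when the current one is full
                document.newPage();
                canvas = writer.getDirectContent();
                y = 720;
            }
        }
        document.close();                    // close the document before the reader
        snippet.close();
    }
}

The key point is that only one writer touches the output, so the page content stays a single, reasonably sized stream instead of an ever-growing array of nested streams.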
I'm using iText (v 2.1.7) and I need to find the size, in bytes, of a specific page.
I've written the following code:
public static long[] getPageSizes(byte[] input) throws IOException {
PdfReader reader;
reader = new PdfReader(input);
int pageCount = reader.getNumberOfPages();
long[] pageSizes = new long[pageCount];
for (int i = 0; i < pageCount; i++) {
pageSizes[i] = reader.getPageContent(i+1).length;
}
reader.close();
return pageSizes;
}
But it doesn't work properly. The reader.getPageContent(i+1).length instruction returns very small values (usually <= 100), even for large pages of more than 1 MB, so this is clearly not the correct way to do it.
But what IS the correct way? Is there one?
Note: I've already checked this question, but the offered solution consists of writing each page of the PDF to disk and then checking the file size, which is extremely inefficient and may even be wrong, since I'm assuming this would repeat the PDF header and metadata each time. I was searching for a more "proper" solution.
Well, in the end I managed to get hold of the source code for the original program I was working with, which only accepted PDFs as input with a maximum "page size" of 1 MB. Turns out what it actually meant by "page size" was fileSize / pageCount -_-^
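In other words, a trivial sketch of what that program apparently computes, reusing the input array and reader from the method above:

// "Page size" as the original program defined it:
// total file size divided by the page count.
long approxPageSize = input.length / reader.getNumberOfPages();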
For anyone who actually needs the precise size of a "standalone" page, with all content included, I've tested the following solution and it seems to work well, though it probably isn't very efficient, since it writes out an entire PDF document for each page. Using a memory stream instead of a disk-based one helps, but I don't know by how much.
public static int[] getPageSizes(byte[] input) throws IOException {
PdfReader reader;
reader = new PdfReader(input);
int pageCount = reader.getNumberOfPages();
int[] pageSizes = new int[pageCount];
for (int i = 0; i < pageCount; i++) {
try {
Document doc = new Document();
ByteArrayOutputStream bous = new ByteArrayOutputStream();
PdfCopy copy= new PdfCopy(doc, bous);
doc.open();
PdfImportedPage page = copy.getImportedPage(reader, i+1);
copy.addPage(page);
doc.close();
pageSizes[i] = bous.size();
} catch (DocumentException e) {
e.printStackTrace();
}
}
reader.close();
return pageSizes;
}
I am trying to compress a PDF document in Java. The original file size is 1.5-2 MB and we need to bring it down to less than 1 MB. I tried using iText compression on it, but the results are not very effective and the file size is still greater than 1 MB.
byte[] mergedFileContent = byteArrayOS.toByteArray();
reader = new PdfReader(mergedFileContent);
PdfStamper stamper = new PdfStamper(reader, byteArrOScomp);
stamper.setFullCompression();
stamper.close();
reader.close();
Has anyone worked on something similar? Any inputs would be appreciated.
You might want to look into the official iText examples; in particular, the sample HelloWorldCompression shows how to apply different degrees of compression, both at initial PDF creation time and as a post-processing step.
The following method from that sample may help you along.
public void compressPdf(String src, String dest) throws IOException, DocumentException {
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest), PdfWriter.VERSION_1_5);
stamper.getWriter().setCompressionLevel(9);
int total = reader.getNumberOfPages() + 1;
for (int i = 1; i < total; i++) {
reader.setPageContent(i, reader.getPageContent(i));
}
stamper.setFullCompression();
stamper.close();
reader.close();
}
If you wonder how I found it: I googled for "itextpdf example full compression" and it was the second result. (The first find contains the same method but is not from the official iText site.)
You could gzip, zip, etc. the file afterwards. That isn't really PDF compression, but if you are constrained and want better compression, then compressing the entire file may give good results, since it can also compress meta-level data.
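A minimal sketch of that idea using java.util.zip (the file names are illustrative, and the result is a .gz file rather than a PDF, so it has to be decompressed again before it can be viewed or printed):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.zip.GZIPOutputStream;

public class GzipPdf {
    public static void main(String[] args) throws Exception {
        try (FileInputStream in = new FileInputStream("merged.pdf");
             GZIPOutputStream out = new GZIPOutputStream(
                     new FileOutputStream("merged.pdf.gz"))) {
            // Copy the PDF bytes through the gzip stream
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}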
I have, for example, 1000 images whose names are all very similar and differ only in the number: "ImageNmbr0001", "ImageNmbr0002", ..., "ImageNmbr1000", etc.
I would like to load every image and store them in an ImageProcessor array.
So, for example, if I call a method on an element of this array, the method is applied to that picture, for instance counting the black pixels in it.
I can use a for loop to get the numbers from 1 to 1000, turn them into strings, build the file names from them, and load each image.
However, I would still have to turn the result into an element I can store in an array, and I don't yet have a method that takes a string (the file path) and returns the ImageProcessor stored at its end.
Also, my current approach seems rather clumsy and not too elegant, so I would be very happy if someone could show me a better way to do this using methods from these packages:
import ij.ImagePlus;
import ij.plugin.filter.PlugInFilter;
import ij.process.ImageProcessor;
I think I found a solution:
Opener opener = new Opener();
String imageFilePath = "somePath";
ImagePlus imp = opener.openImage(imageFilePath);
ImageProcessor ip = imp.getProcessor();
That does the job, but thank you for your time/effort.
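For the numbered files from the question, that solution might be wrapped in a loop along these lines (a sketch; the directory, file extension, and zero-padded naming are assumptions based on the names described above):

import ij.ImagePlus;
import ij.io.Opener;
import ij.process.ImageProcessor;

public class ImageLoader {
    public static ImageProcessor[] loadImages(String dir) {
        Opener opener = new Opener();
        ImageProcessor[] processors = new ImageProcessor[1000];
        for (int i = 1; i <= 1000; i++) {
            // Build names like "ImageNmbr0001.png"; adjust the extension as needed
            String path = String.format("%s/ImageNmbr%04d.png", dir, i);
            ImagePlus imp = opener.openImage(path);
            if (imp != null) {
                processors[i - 1] = imp.getProcessor();
            }
        }
        return processors;
    }
}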
I'm not sure if I understand exactly what you want... but I definitely would not save the information for each image in a separate file, for two reasons:
- It's slower to save and read the content of multiple files compared with one medium-sized file
- Each file adds overhead (files need a path, a minimum size on disk, etc.)
If you want performance, group multiple image descriptions into a single description file.
If you don't want to make a binary description file, you can always use a database, which is built for this and performs well on reads and usually on writes.
I don't know exactly what your needs are, but I guess you can try making a binary file with fixed-size data and reading it later.
Example:
public static void main(String[] args) throws IOException {
FileOutputStream fout = null;
FileInputStream fin = null;
try {
fout = new FileOutputStream("description.bin");
DataOutputStream dout = new DataOutputStream(fout);
for (int x = 0; x < 1000; x++) {
dout.writeInt(10); // Write Int data
}
fin = new FileInputStream("description.bin");
DataInputStream din = new DataInputStream(fin);
for (int x = 0; x < 1000; x++) {
System.out.println(din.readInt()); // Read Int data
}
} catch (Exception e) {
e.printStackTrace(); // don't silently swallow I/O errors
} finally {
if (fout != null) {
fout.close();
}
if (fin != null) {
fin.close();
}
}
}
In this example, the code writes integers to the "description.bin" file and then reads them back.
This kind of fixed-size binary I/O is pretty fast in Java, and it can be made faster still by wrapping the streams in buffered streams or using NIO channels.
I am working on the task of merging some input PDF documents using iText 5.4.5. The input documents may or may not contain AcroForms and I want to merge the forms as well.
I am using the example pdf files found here and this is the code example:
public class TestForms {
@Test
public void testNoForms() throws DocumentException, IOException {
test("pdf/hello.pdf", "pdf/hello_memory.pdf");
}
@Test
public void testForms() throws DocumentException, IOException {
test("pdf/subscribe.pdf", "pdf/filled_form_1.pdf");
}
private void test(String first, String second) throws DocumentException, IOException {
OutputStream out = new FileOutputStream("/tmp/out.pdf");
InputStream stream = getClass().getClassLoader().getResourceAsStream(first);
PdfReader reader = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream)), null);
InputStream stream2 = getClass().getClassLoader().getResourceAsStream(second);
PdfReader reader2 = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream2)), null);
Document pdfDocument = new Document(reader.getPageSizeWithRotation(1));
PdfCopy pdfCopy = new PdfCopy(pdfDocument, out);
pdfCopy.setFullCompression();
pdfCopy.setCompressionLevel(PdfStream.BEST_COMPRESSION);
pdfCopy.setMergeFields();
pdfDocument.open();
pdfCopy.addDocument(reader);
pdfCopy.addDocument(reader2);
pdfCopy.close();
reader.close();
reader2.close();
}
}
With input files containing forms I get a NullPointerException with or without compression enabled.
With standard input docs, the output file is created but when I open it with Acrobat it says there was a problem (14) and no content is displayed.
With standard input docs AND compression disabled the output is created and Acrobat displays it.
Questions
I previously did this using PdfCopyFields, but it's now deprecated in favor of the boolean mergeFields flag in PdfCopy; is this correct? There's no javadoc on that flag and I couldn't find documentation about it.
Assuming the answer to the previous question is Yes, is there anything wrong with my code?
Thanks
We are using PdfCopy to merge different files, some of which may have form fields. We use version 5.5.3.0. The code is simple and it seems to work fine, BUT sometimes the resulting file is impossible to print!
Our code:
Public Shared Function MergeFiles(ByVal sourceFiles As List(Of Byte())) As Byte()
Dim document As New Document()
Dim output As New MemoryStream()
Dim copy As iTextSharp.text.pdf.PdfCopy = Nothing
Dim readers As New List(Of iTextSharp.text.pdf.PdfReader)
Try
copy = New iTextSharp.text.pdf.PdfCopy(document, output)
copy.SetMergeFields()
document.Open()
For fileCounter As Integer = 0 To sourceFiles.Count - 1
Dim reader As New PdfReader(sourceFiles(fileCounter))
reader.MakeRemoteNamedDestinationsLocal()
readers.Add(reader)
copy.AddDocument(reader)
Next
Catch exception As Exception
Throw exception
Finally
If copy IsNot Nothing Then copy.Close()
document.Close()
For Each reader As PdfReader In readers
reader.Close()
Next
End Try
Return output.GetBuffer()
End Function
Your usage of PdfCopy.setMergeFields() is correct and your merging code is fine.
The issues you described are because of bugs that have crept into 5.4.5. They should be fixed in rev. 6152 and the fixes will be included in the next release.
Thanks for bringing this to our attention.
It's just to say that we have the same problem: iText mergeFields in PdfCopy creates an invalid PDF. So it is still not fixed in version 5.5.3.0.
I have been trying to split one big PDF file into multiple PDF files based on size. I was able to split it, but it only creates one single file and the rest of the file data is lost; it does not create more than one output file. Can anyone please help? Here is my code
public static void main(String[] args) {
try {
PdfReader Split_PDF_By_Size = new PdfReader("C:\\Temp_Workspace\\TestZip\\input1.pdf");
Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileOutputStream("C:\\Temp_Workspace\\TestZip\\File1.pdf"));
document.open();
int number_of_pages = Split_PDF_By_Size.getNumberOfPages();
int pagenumber = 1; /* To generate file name dynamically */
// int Find_PDF_Size; /* To get PDF size in bytes */
float combinedsize = 0; /* To convert this to Kilobytes and estimate new PDF size */
for (int i = 1; i < number_of_pages; i++ ) {
float Find_PDF_Size;
if (combinedsize == 0 && i != 1) {
document = new Document();
pagenumber++;
String FileName = "File" + pagenumber + ".pdf";
copy = new PdfCopy(document, new FileOutputStream(FileName));
document.open();
}
copy.addPage(copy.getImportedPage(Split_PDF_By_Size, i));
Find_PDF_Size = copy.getCurrentDocumentSize();
combinedsize = (float)Find_PDF_Size / 1024;
if (combinedsize > 496 || i == number_of_pages) {
document.close();
combinedsize = 0;
}
}
System.out.println("PDF Split By Size Completed. Number of Documents Created:" + pagenumber);
}
catch (Exception i)
{
System.out.println(i);
}
}
}
(BTW, it would have been great if you had tagged your question with itext, too.)
PdfCopy used to close the PdfReaders it imported pages from whenever the source PdfReader for page imports switched or the PdfCopy was closed. This was due to the original intended use case to create one target PDF from multiple source PDFs in combination with the fact that many users forget to close their PdfReaders.
Thus, after you close the first target PdfCopy, the PdfReader is closed, too, and no further pages are extracted.
If I interpret the most recent checkins into the iText SVN repository correctly, this implicit closing of PdfReaders is in the process of being removed from the code. Therefore, with one of the next iText versions, your code may work as intended.
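Until that change ships, one possible workaround (just a sketch, assuming the source file fits comfortably in memory; the paths, the 496 KB threshold, and the iText 5 package names mirror your example but are otherwise illustrative) is to keep the source bytes and open a fresh PdfReader for each output file, so the implicit close only affects a reader you are already done with:

import java.io.FileOutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfCopy;
import com.itextpdf.text.pdf.PdfReader;

public class SplitBySize {
    public static void main(String[] args) throws Exception {
        byte[] src = Files.readAllBytes(Paths.get("C:/Temp_Workspace/TestZip/input1.pdf"));
        int fileNumber = 1;
        PdfReader reader = new PdfReader(src);
        Document document = new Document();
        PdfCopy copy = new PdfCopy(document, new FileOutputStream("File" + fileNumber + ".pdf"));
        document.open();
        int pages = reader.getNumberOfPages();
        for (int i = 1; i <= pages; i++) {
            copy.addPage(copy.getImportedPage(reader, i));
            // Start a new output file once the current one exceeds ~496 KB
            if (copy.getCurrentDocumentSize() > 496 * 1024 && i < pages) {
                document.close();            // this also closes the current reader
                reader = new PdfReader(src); // so open a fresh one for the next chunk
                fileNumber++;
                document = new Document();
                copy = new PdfCopy(document, new FileOutputStream("File" + fileNumber + ".pdf"));
                document.open();
            }
        }
        document.close();
    }
}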