I didn't find anything specific to this topic, so I'm posting here in the hope of getting some help.
What I have: a very large PDF document (Adobe Reader reports 21,399.4 x 15,123.7 mm, and the zoom factor needed to fit it on screen when it opens is ~1.64%). It's a construction drawing, but that doesn't matter. When I inspect the structure in Sublime, I find the following:
PDF version: 1.6
CropBox[0.0 0.0 14400.0 10177.0]
/Rotate 0/Type/Page/UserUnit 4.2125
What I need: a smaller document. Because of its size, I can't continue processing the file.
What I tried: using iText to reset the UserUnit to its default of 1 as a first step. This should make some things easier. The code in my Java program looks like this:
PdfReader reader = new PdfReader(inFile.getAbsolutePath());
try (FileOutputStream outStream = new FileOutputStream(outFile)) {
    PdfStamper pdfStamper = new PdfStamper(reader, outStream);
    PdfWriter writer = pdfStamper.getWriter();
    writer.setUserunit(1f);
    pdfStamper.close();
}
Another thing I tried was:
PdfDictionary pageDict;
pageDict = reader.getCatalog();
pageDict.put(PdfName.USERUNIT, new PdfNumber(1f));
Neither approach worked, so my questions are:
Is it possible to change the UserUnit of an existing file? Or do I need to create a new document with the properties I want and then write the content of the existing PDF into it?
If it is possible, what else do I need to do to change the UserUnit?
With greetings from Heidelberg,
Sincerely,
D. Pfizenmaier
Please take a look at the ScaleRotate example. It was written in answer to the question "Rotating in Itextsharp while preserving comment location & orientation", where you'll find more info about the user unit.
This is the code that is relevant:
PdfReader reader = new PdfReader(src);
int n = reader.getNumberOfPages();
PdfDictionary page;
for (int p = 1; p <= n; p++) {
    page = reader.getPageN(p);
    page.put(PdfName.USERUNIT, new PdfNumber(1f));
}
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
stamper.close();
reader.close();
Note that you are getting the root of the document using getCatalog(), but the name you give the variable suggests that you assume it is a page dictionary. That won't work: UserUnit is an entry of each page dictionary, not of the catalog, so it has to be set on every page.
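To see what resetting the user unit does to the page dimensions, here is a small sketch of the arithmetic in plain Java. It assumes the CropBox and UserUnit values quoted in the question and relies on the fact that one default user space unit is 1/72 inch; the class and method names are made up for illustration and nothing here uses iText.

```java
// Sketch only: converts a length in user space units to millimetres.
// UserUnitMath and toMillimetres are hypothetical names, not iText API.
class UserUnitMath {
    static final double UNITS_PER_INCH = 72.0; // default user space: 1 unit = 1/72 inch
    static final double MM_PER_INCH = 25.4;

    // userUnit is the page's /UserUnit value (1.0 when the entry is absent)
    static double toMillimetres(double userSpaceLength, double userUnit) {
        return userSpaceLength * userUnit / UNITS_PER_INCH * MM_PER_INCH;
    }

    public static void main(String[] args) {
        double width = 14400.0;  // CropBox width from the question
        double height = 10177.0; // CropBox height from the question

        System.out.printf("UserUnit 4.2125: %.1f x %.1f mm%n",
                toMillimetres(width, 4.2125), toMillimetres(height, 4.2125));
        System.out.printf("UserUnit 1:      %.1f x %.1f mm%n",
                toMillimetres(width, 1.0), toMillimetres(height, 1.0));
    }
}
```

With UserUnit 4.2125 this comes out at roughly 21,400 x 15,124 mm, which is close to what Adobe Reader reports; with UserUnit 1 the same CropBox describes a page of about 5,080 x 3,590 mm. The page content itself is not touched by this change, only the scale at which it is interpreted.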
Related
I am trying to programmatically create a PDF by superimposing two PDF files using itextpdf. For some reason the resulting PDF goes through a flattening process. How do I skip the flattening or make the process faster?
PdfReader reader = new PdfReader(template);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
PdfReader r;
PdfImportedPage page;
int i=1;
for (String path : patterns) {
    r = new PdfReader(path);
    for (int j = 1; j <= r.getNumberOfPages(); j++) {
        page = stamper.getImportedPage(r, j);
        PdfContentByte canvas = stamper.getUnderContent(i++);
        canvas.addTemplate(page, 0, 0);
        stamper.getWriter().freeReader(r);
    }
    r.close();
}
stamper.close();
The PDF that was generated from Adobe Illustrator had a masked image instead of a proper component. I am sorry if the answer seems vague, as I am not a designer, but the flattening process happens when one or more of the original PDFs being merged aren't well-formed.
I have the iText code below. I want to copy one page from a source PDF file into another PDF file (I have an existing PdfStamper, here called mainPdfStamper).
PdfReader srcReader = new PdfReader(new FileInputStream("source.pdf"));
File file = File.createTempFile("temporary", ".pdf");
PdfStamper pdfStamper = new PdfStamper(srcReader, new FileOutputStream(file));
PdfImportedPage importedPage = pdfStamper.getImportedPage(srcReader, 1);
// copying extracted page from src pdf to existing pdf
mainPdfStamper.getOverContent(1).addTemplate(importedPage, 10,10);
pdfStamper.close();
srcReader.close();
This is not working and I don't know how to achieve this. In short, I want to copy one page from a source PDF into an existing PDF. Please help.
UPDATE
The code below worked, following the answer from Bruno.
PdfReader reader2 = new PdfReader(srcPdf.getAbsolutePath());
PdfImportedPage page = pdfStamper.getImportedPage(reader2, 1);
pdfStamper.insertPage(1, reader2.getPageSize(1));
pdfStamper.getUnderContent(1).addTemplate(page, 100, 100);
// Close the stamper and the readers
pdfStamper.close();
reader2.close();
Please read the documentation, for instance chapter 6 of iText in Action. If you go to section 6.3.4 ("Inserting pages into an existing document"), you'll find the InsertPages example.
You only need the following code, where p is the page number indicating where you want to insert the page, main_file is the path to your main file, to_be_inserted is the path to the file that needs to be inserted, and dest is the path to the resulting file:
PdfReader reader = new PdfReader(main_file);
PdfReader reader2 = new PdfReader(to_be_inserted);
// Create a stamper
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
// Create an imported page to be inserted
PdfImportedPage page = stamper.getImportedPage(reader2, 1);
stamper.insertPage(p, reader2.getPageSize(1));
stamper.getUnderContent(p).addTemplate(page, 0, 0);
// Close the stamper and the readers
stamper.close();
reader.close();
reader2.close();
This is only one way to combine pages from two files. You can also use PdfCopy for this purpose. The advantage of PdfCopy is that you preserve the interactive features of the inserted page. When using PdfStamper, you lose any interactive features (e.g. all links) that were present in the inserted page.
I have a PDF that is 15 GB in size and I intend to read it with the iText library. When my code constructs the PdfReader, it takes about 5 minutes to get the PdfReader object. This is slower than I'd like.
When this code is executed, I have to wait about five minutes:
PdfReader reader = new PdfReader(new RandomAccessFileOrArray(filename), null);
After that, once I have the PdfReader object, I can get any page very quickly.
Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileOutputStream("result.pdf"));
document.open();
for (int i = 2400000; i <= 2400010; i++) {
    copy.addPage(copy.getImportedPage(reader, i));
}
document.close();
Is there any way to read a PDF more efficiently? I think I could read just the xref table, write it to the file system, and later read those bytes back and turn them into a PdfReader, but I don't know if that is possible.
More generally, is there a library that can usefully work with PDFs that don't fit in RAM?
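As an illustration of the "read just the xref table" idea, the sketch below shows only the very first step in plain Java: locating where the last cross-reference section starts. It assumes a conventionally structured file whose final lines are the keyword startxref, a byte offset, and %%EOF; the class name, method name, and the 1024-byte tail size are assumptions made for this example. It does not parse the cross-reference data itself and says nothing about whether a PdfReader can be built from it.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

// Sketch only: StartXrefFinder and findStartXref are hypothetical names, not iText API.
class StartXrefFinder {
    // Returns the offset written after the last "startxref" keyword,
    // or -1 if no such keyword with a number is found in the file's tail.
    static long findStartXref(String path) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) {
            // Only the tail of the file is read, regardless of the total size.
            int tailLength = (int) Math.min(1024L, file.length());
            byte[] tail = new byte[tailLength];
            file.seek(file.length() - tailLength);
            file.readFully(tail);

            String text = new String(tail, StandardCharsets.ISO_8859_1);
            int keyword = text.lastIndexOf("startxref");
            if (keyword < 0) {
                return -1;
            }
            // Skip the keyword and surrounding whitespace, then collect the digits.
            String rest = text.substring(keyword + "startxref".length()).trim();
            int end = 0;
            while (end < rest.length() && Character.isDigit(rest.charAt(end))) {
                end++;
            }
            return end == 0 ? -1 : Long.parseLong(rest.substring(0, end));
        }
    }
}
```

Calling StartXrefFinder.findStartXref(filename) touches at most the last kilobyte of the file, so its cost does not grow with the size of the document.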
I'm generating PDFs using iText and it works fine. But at some point I need a way to import HTML-styled information from an existing PDF.
I know I could just use the XMLWorker class to generate the text directly from HTML in my own document. But because I'm not sure whether it actually supports all the HTML features I need, I'm looking for a workaround.
Therefore a PDF is generated from the HTML using XSLT. The content of this PDF should then be copied into my document.
There are two ways described in the book ("iText in Action").
One parses the PDF and gets you the text (or other information) from the document using PdfReaderContentParser and a TextExtractionStrategy.
It looks like this:
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
TextExtractionStrategy strategy;
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    strategy = parser.processContent(i, new LocationTextExtractionStrategy());
    document.add(new Chunk(strategy.getResultantText()));
}
But this only prints plain text to the document. Obviously there are more extraction strategies, and maybe one of them does exactly what I want, but I haven't found it yet.
The second way is to add each page of the PDF to your document as a com.itextpdf.text.Image. This is obviously not a good idea, because it adds the entire page to your document even if there is only one line of text in the existing PDF. It's done like this:
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(RESULT));
PdfReader reader = new PdfReader(pdf);
PdfImportedPage page;
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    page = writer.getImportedPage(reader, i);
    document.add(Image.getInstance(page));
}
As I said, this copies all the empty lines at the end of the PDF as well, but I need to continue my text immediately after the last line of text.
If I could convert this com.itextpdf.text.Image into a java.awt.image.BufferedImage, I could use getSubimage() together with information I can extract from the PDF to cut away all the empty lines. But I wasn't able to find a way to do this.
These are the two ways I found. But since neither of them is suitable for my purpose as it stands, my question is:
Is there a way to import everything except the empty lines at the end, but including text style information, tables, and everything else, from a PDF into my document using iText?
You can trim away empty space of the XSLT generated PDF and then import the trimmed pages as in your code.
Example code
The following code borrows from the code in my answer to Using iTextPDF to trim a page's whitespace. In contrast to the code there, though, we have to manipulate the media box, not the crop box, because this is the only box respected by PdfWriter.getImportedPage.
Before importing a page from a given PdfReader, crop it using this method:
static void cropPdf(PdfReader reader) throws IOException
{
    int n = reader.getNumberOfPages();
    for (int i = 1; i <= n; i++)
    {
        PdfReaderContentParser parser = new PdfReaderContentParser(reader);
        MarginFinder finder = parser.processContent(i, new MarginFinder());
        Rectangle rect = new Rectangle(finder.getLlx(), finder.getLly(), finder.getUrx(), finder.getUry());
        PdfDictionary page = reader.getPageN(i);
        page.put(PdfName.MEDIABOX, new PdfArray(new float[]{rect.getLeft(), rect.getBottom(), rect.getRight(), rect.getTop()}));
    }
}
(excerpt from ImportPageWithoutFreeSpace.java)
The extended render listener MarginFinder is taken as is from the question linked to above. You can find a copy here: MarginFinder.java.
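The idea behind such a margin finder can be pictured as accumulating the bounding box of everything drawn on the page. The following is only a sketch of that idea in plain Java, not the MarginFinder source: the class name, the method, and the use of java.awt.geom.Rectangle2D are assumptions made for illustration, and a real render listener would receive the rectangles from the content parser rather than from a list.

```java
import java.awt.geom.Rectangle2D;
import java.util.List;

// Sketch only: BoundingBoxSketch is a hypothetical helper, not part of iText.
class BoundingBoxSketch {
    // Returns the smallest rectangle enclosing all items, or null for an empty list.
    static Rectangle2D boundingBox(List<? extends Rectangle2D> items) {
        Rectangle2D box = null;
        for (Rectangle2D item : items) {
            if (box == null) {
                // Start from a copy of the first item so the input is not modified.
                box = (Rectangle2D) item.clone();
            } else {
                // Rectangle2D.add grows the box to the union of both rectangles.
                box.add(item);
            }
        }
        return box;
    }
}
```

The resulting getMinX(), getMinY(), getMaxX(), and getMaxY() values play the role of the llx, lly, urx, and ury coordinates that cropPdf writes into the media box.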
Example run
Using this code
PdfReader readerText = new PdfReader(docText);
cropPdf(readerText);
PdfReader readerGraphics = new PdfReader(docGraphics);
cropPdf(readerGraphics);
try (FileOutputStream fos = new FileOutputStream(new File(RESULT_FOLDER, "importPages.pdf")))
{
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, fos);
    document.open();
    document.add(new Paragraph("Let's import 'textOnly.pdf'", new Font(FontFamily.HELVETICA, 12, Font.BOLD)));
    document.add(Image.getInstance(writer.getImportedPage(readerText, 1)));
    document.add(new Paragraph("and now 'graphicsOnly.pdf'", new Font(FontFamily.HELVETICA, 12, Font.BOLD)));
    document.add(Image.getInstance(writer.getImportedPage(readerGraphics, 1)));
    document.add(new Paragraph("That's all, folks!", new Font(FontFamily.HELVETICA, 12, Font.BOLD)));
    document.close();
}
finally
{
    readerText.close();
    readerGraphics.close();
}
(excerpt from unit test method testImportPages in ImportPageWithoutFreeSpace.java)
I imported both the page from the docText document and the page from the docGraphics document into a new document with some text before, between, and after. The result:
As you can see, source styles are preserved but free space around is discarded.
I have multiple PDFs that get populated with multiple records (a.pdf,b.pdf,c[0-9].pdf,d[0-9].pdf,ez.pdf) using AcroForms and PDFBox.
The resulting files (aflat.pdf,bflat.pdf,c[0-9]flat.pdf,d[0-9]flat.pdf,ezflat.pdf) should have their forms (dictionaries and whatever Adobe uses) removed, but the field values should remain in the PDF as raw text (setReadOnly is not what I want!).
PdfStamper can only remove fields without saving their content, but I've found some references to PdfContentByte as a way to save the content. Alas, the documentation is too brief for me to understand how I should do this.
As a last resort I could use FieldPosition to write directly on the PDF. Has anyone ever encountered such a problem? How do I solve it?
UPDATE: Saving a single page of b.pdf yields a valid bfilled.pdf but a blank bflattened.pdf. Saving the whole document solved the issue.
populateB();
try (PDDocument doc = new PDDocument(); FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
    //importing the page will corrupt the fields
    /*wrong approach*/doc.importPage((PDPage)pdfDocuments.get(0).getDocumentCatalog().getAllPages().get(0));
    /*wrong approach*/doc.save(stream);
    //save the whole document instead
    pdfDocuments.get(0).save(stream);//<---right approach
}
try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
    PdfStamper stamper = new PdfStamper(new PdfReader("bfilled.pdf"), stream);
    stamper.setFormFlattening(true);
    stamper.close();
}
Use PdfStamper.setFormFlattening(true) to get rid of the fields and write their values as page content.
When working with AcroForms, always save the whole document rather than a single imported page.
populateB();
try (PDDocument doc = new PDDocument(); FileOutputStream stream = new FileOutputStream("bfilled.pdf")) {
    // wrong approach, kept for reference: importing a single page corrupts the fields
    // doc.importPage((PDPage) pdfDocuments.get(0).getDocumentCatalog().getAllPages().get(0));
    // doc.save(stream);
    // right approach: save the whole document instead
    pdfDocuments.get(0).save(stream);
}
try (FileOutputStream stream = new FileOutputStream("bflattened.pdf")) {
    // flatten the filled form so the field values become regular page content
    PdfStamper stamper = new PdfStamper(new PdfReader("bfilled.pdf"), stream);
    stamper.setFormFlattening(true);
    stamper.close();
}