Convert PdfReader to byte[] - iText Java [duplicate]

How do I get a byte array from an iText PdfReader?
// Target page size: US Letter (8.5 x 11 inches) at 72 user units per inch
float width = 8.5f * 72;
float height = 11f * 72;
float tolerance = 1f;
PdfReader reader = new PdfReader("source.pdf");
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    Rectangle cropBox = reader.getCropBox(i);
    float widthToAdd = width - cropBox.getWidth();
    float heightToAdd = height - cropBox.getHeight();
    if (Math.abs(widthToAdd) > tolerance || Math.abs(heightToAdd) > tolerance)
    {
        // Grow (or shrink) the crop box equally on all sides
        float[] newBoxValues = new float[] {
            cropBox.getLeft() - widthToAdd / 2,
            cropBox.getBottom() - heightToAdd / 2,
            cropBox.getRight() + widthToAdd / 2,
            cropBox.getTop() + heightToAdd / 2
        };
        PdfArray newBox = new PdfArray(newBoxValues);
        PdfDictionary pageDict = reader.getPageN(i);
        // Use the new box as both the crop box and the media box
        pageDict.put(PdfName.CROPBOX, newBox);
        pageDict.put(PdfName.MEDIABOX, newBox);
    }
}
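The box arithmetic in this loop does not depend on iText, so it can be checked on its own. The following is a minimal sketch, not part of the original question: the class and method names are hypothetical, and a box is assumed to be a plain {llx, lly, urx, ury} float array rather than an iText Rectangle.

```java
import java.util.Arrays;

/** Hypothetical helper: the crop-box arithmetic from the loop, without iText. */
final class CropBoxResize {
    private CropBoxResize() {}

    /**
     * Returns a box of the target size centered on the given box
     * ({llx, lly, urx, ury}), or null if both dimensions are already
     * within the tolerance (the same test as the loop's if statement).
     */
    static float[] resizeCentered(float[] box, float width, float height, float tolerance) {
        float widthToAdd = width - (box[2] - box[0]);
        float heightToAdd = height - (box[3] - box[1]);
        if (Math.abs(widthToAdd) <= tolerance && Math.abs(heightToAdd) <= tolerance) {
            return null; // close enough to the target size: leave the page alone
        }
        // Spread the difference equally over both sides
        return new float[] {
            box[0] - widthToAdd / 2,
            box[1] - heightToAdd / 2,
            box[2] + widthToAdd / 2,
            box[3] + heightToAdd / 2
        };
    }

    public static void main(String[] args) {
        float[] a4 = {0f, 0f, 595f, 842f};
        // Resize an A4 box to US Letter (612 x 792 user units)
        System.out.println(Arrays.toString(resizeCentered(a4, 8.5f * 72, 11f * 72, 1f)));
        // prints [-8.5, 25.0, 603.5, 817.0]
    }
}
```

Returning null when no resize is needed mirrors the loop above, which only touches the page dictionary when one of the differences exceeds the tolerance.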
From the code above, I need to get a byte array from the reader object. How?
1) Not working: I get an empty byte array.
OutputStream out = new ByteArrayOutputStream();
PdfStamper stamper = new PdfStamper(reader, out);
stamper.close();
byte byteArray[] = (((ByteArrayOutputStream)out).toByteArray());
2) Not working: I get java.io.IOException: Error: Header doesn't contain versioninfo
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
for (int i = 1; i <= reader.getNumberOfPages(); i++)
{
    outputStream.write(reader.getPageContent(i));
}
PDDocument pdDocument = PDDocument.load(outputStream.toByteArray());
Is there any other way to get a byte array from a PdfReader?

Let's look at the question from a different angle. It seems to me that you want to render a PDF page by page. If so, you are asking the wrong question. As I already indicated, extracting the page content stream is not sufficient: no renderer can render such a stream on its own, because it doesn't include the resources the page needs, such as fonts and Form and Image XObjects.
If you want to render separate pages from a PDF, you need to burst the document into separate single-page, full-blown PDF documents. These single-page documents need to contain all the information necessary to render the page. This isn't memory friendly: suppose you have a 100 KByte document of 10 pages where every page shows the same 80 KByte logo. You'll end up with 10 documents that are each at least 80 KByte. That already adds up to 800 KByte, much more than the original 10-page document, in which a single Image XObject is shared by all 10 pages.
You'd need to do something like this:
PdfReader reader = new PdfReader("source.pdf");
int n = reader.getNumberOfPages();
reader.close();
ByteArrayOutputStream baos;
PdfStamper stamper;
for (int i = 0; i < n; ) {
    // Re-read the source for every page, then keep only page i + 1
    reader = new PdfReader("source.pdf");
    reader.selectPages(String.valueOf(++i));
    baos = new ByteArrayOutputStream();
    stamper = new PdfStamper(reader, baos);
    stamper.close();
    reader.close();
    // baos now holds a complete single-page PDF
    doSomethingWithBytes(baos.toByteArray());
}
In this case, baos.toByteArray() will contain the bytes of a valid PDF file. This wasn't the case in any of your attempts.
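A rough way to tell a complete PDF file apart from raw content-stream bytes is to look at its header and end-of-file marker. The sketch below is a quick sanity check, not a validator, and the class name is hypothetical. It assumes the %PDF- header sits at offset 0 (as in files written by iText) and only looks for %%EOF in the last 1024 bytes. A concatenation of page content streams, as in the second attempt, has no such header, which is consistent with the "Header doesn't contain versioninfo" error.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sanity check: does a byte array look like a complete PDF file? */
final class PdfBytesCheck {
    private PdfBytesCheck() {}

    static boolean looksLikePdf(byte[] bytes) {
        byte[] header = "%PDF-".getBytes(StandardCharsets.US_ASCII);
        if (bytes == null || bytes.length < header.length) {
            return false;
        }
        // A PDF file starts with "%PDF-" followed by the version, e.g. "%PDF-1.4"
        for (int i = 0; i < header.length; i++) {
            if (bytes[i] != header[i]) {
                return false;
            }
        }
        // The "%%EOF" marker is near the end; trailing line breaks are common
        int from = Math.max(0, bytes.length - 1024);
        String tail = new String(bytes, from, bytes.length - from, StandardCharsets.US_ASCII);
        return tail.contains("%%EOF");
    }

    public static void main(String[] args) {
        // A page content stream: drawing operators only, no file structure
        byte[] contentStream = "BT /F1 12 Tf 72 712 Td (Hello) Tj ET".getBytes(StandardCharsets.US_ASCII);
        System.out.println(looksLikePdf(contentStream)); // false
    }
}
```

Applied to the baos.toByteArray() result of each loop iteration, a check like this should pass, whereas the output of getPageContent() will not.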

PdfReader reader = new PdfReader("source.pdf");
byte[] byteArray = reader.getPageContent(1); // content stream of page 1 only, not a complete PDF
Also have a look at this link

Related

Remove PdfName.Rotate value without rotation

I have to combine multiple pages from several files into one new PDF. All the pages must be in portrait orientation.
After this is done, I use a couple of programs to reset the rotation to zero without actually rotating the page.
I want to use iText to remove the rotation value.
Based on the iText examples, I've tried something like this:
protected void manipulatePdf(String dest) throws Exception {
    PdfDocument pdfDoc = new PdfDocument(new PdfReader(SRC), new PdfWriter(dest));
    int n = pdfDoc.getNumberOfPages();
    PdfPage page;
    PdfNumber rotate;
    for (int p = 1; p <= n; p++) {
        page = pdfDoc.getPage(p);
        rotate = page.getPdfObject().getAsNumber(PdfName.Rotate);
        page.setRotation(0);
    }
    // Close the document once, after all pages have been processed
    pdfDoc.close();
}
I also tried this:
PdfDictionary diccionario = page.getPdfObject();
diccionario.Remove(iText.Kernel.Pdf.PdfName.Rotate);
I also tried the CopyPagesTo function, with the same result: the page orientation is altered.
Here is an example file with pages rotated 0, 90, 180 and 270 degrees.
The goal is to set the rotate value of all pages to zero while keeping portrait mode:
https://filebin.ca/4vep0uuU1p2s/1.pdf
Any advice would be greatly appreciated.
I have found a solution using the SetIgnorePageRotationForContent function.
VB.NET example:
Dim srcPdf As iText.Kernel.Pdf.PdfDocument = New iText.Kernel.Pdf.PdfDocument(New iText.Kernel.Pdf.PdfReader(srcFile))
Dim destPDF As New iText.Kernel.Pdf.PdfDocument(New iText.Kernel.Pdf.PdfWriter(destFile))
For contador = 1 To srcPdf.GetNumberOfPages
    Dim srcPage = srcPdf.GetPage(contador)
    Dim rotacion As iText.Kernel.Pdf.PdfNumber = srcPage.GetPdfObject().GetAsNumber(iText.Kernel.Pdf.PdfName.Rotate)
    If IsNothing(rotacion) OrElse rotacion.IntValue = 0 Then
        srcPdf.CopyPagesTo(contador, contador, destPDF)
        Continue For
    End If
    Dim destPage As iText.Kernel.Pdf.PdfPage = destPDF.AddNewPage(New iText.Kernel.Geom.PageSize(srcPage.GetPageSizeWithRotation))
    If rotacion.IntValue = 180 Then
        destPage.GetPdfObject().Put(iText.Kernel.Pdf.PdfName.Rotate, New iText.Kernel.Pdf.PdfNumber(180))
    Else
        destPage.GetPdfObject().Put(iText.Kernel.Pdf.PdfName.Rotate, New iText.Kernel.Pdf.PdfNumber(rotacion.IntValue + 180))
    End If
    destPage.SetIgnorePageRotationForContent(True)
    Dim canvas As New iText.Kernel.Pdf.Canvas.PdfCanvas(destPage)
    Dim pageCopy As iText.Kernel.Pdf.Xobject.PdfFormXObject = srcPage.CopyAsFormXObject(destPDF)
    canvas.AddXObject(pageCopy, 0, 0)
    destPage.GetPdfObject().Remove(iText.Kernel.Pdf.PdfName.Rotate)
Next
destPDF.Close()
srcPdf.Close()
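Why removing /Rotate by itself changes the orientation can be sketched in plain Java (hypothetical names, no iText types): a viewer displays the page box rotated by the /Rotate value, so for 90 and 270 degrees the displayed width and height are swapped. Remove /Rotate and the swap disappears while the page box and content stay as they were, so a landscape box rotated into portrait shows up as landscape again. This is also why the answer above creates the new page from GetPageSizeWithRotation and then redraws the content.

```java
/** Hypothetical helper: how /Rotate affects the size a viewer displays. */
final class PageRotation {
    private PageRotation() {}

    /** Normalizes a /Rotate value (a multiple of 90) to 0, 90, 180 or 270. */
    static int normalize(int rotate) {
        if (rotate % 90 != 0) {
            throw new IllegalArgumentException("/Rotate must be a multiple of 90: " + rotate);
        }
        return ((rotate % 360) + 360) % 360;
    }

    /** Returns {displayedWidth, displayedHeight}: width and height swap for 90 and 270. */
    static float[] displayedSize(float width, float height, int rotate) {
        return normalize(rotate) % 180 == 0
                ? new float[] {width, height}
                : new float[] {height, width};
    }

    /** True when the page, as displayed, is at least as tall as it is wide. */
    static boolean displaysAsPortrait(float width, float height, int rotate) {
        float[] size = displayedSize(width, height, rotate);
        return size[1] >= size[0];
    }

    public static void main(String[] args) {
        // A landscape page box (842 x 595) with /Rotate 90 is displayed as portrait...
        System.out.println(displaysAsPortrait(842f, 595f, 90)); // true
        // ...but once /Rotate is removed, the same box is displayed as landscape
        System.out.println(displaysAsPortrait(842f, 595f, 0)); // false
    }
}
```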

iText 7: how to flatten an existing PDF document

I have masked an existing PDF document with images, as described in this question: iText7 Image Transparency
My issue is that somebody using Acrobat Reader DC Pro can still edit the document and remove the images, making the masking ineffective.
I have been thinking of flattening the PdfDocument, but it seems that the flattening API applies to forms, not to the entire document.
I have tried the code below, but it is still possible to edit the pdf and remove the masking images.
Do you have any advice for this?
// Read the pdf input
PdfReader pdfReader = new PdfReader(value);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
PdfWriter pdfWriter = new PdfWriter(outputStream);
PdfDocument pdfDoc = new PdfDocument(pdfReader, pdfWriter);
Document document = new Document(pdfDoc);
// Creating an ImageData object
ImageData data = ImageDataFactory.create(fileName);
// Tile the image over the page (x1 and y1 are defined elsewhere, not shown)
for (int x = 1; x < 800; ) {
    for (int y = 1; y < 1000; ) {
        Image image = new Image(data);
        image.setFixedPosition(x, y);
        document.add(image);
        y = y + y1 + 40;
    }
    x = x + x1 + 40;
}
PdfAcroForm.getAcroForm(pdfDoc, true).flattenFields();
// The content has now been modified, return it as a stream
document.close();
I expect that either the images cannot be removed or the document cannot be edited.

Non removable watermark on PDF file using iText in Java

We have a requirement to add a text watermark to magazines that have multiple rich images on each page. I tried com.itextpdf.jar version 5.0.6 to add the watermark, but I am still able to remove it using Adobe Acrobat Pro.
I also tried the option below, but that didn't work either:
stamper.setFreeTextFlattening(true);
Is it possible with iText to add a watermark that cannot be removed without considerable effort?
Below is my implementation.
public static void addWaterMark() throws IOException, DocumentException {
    PdfReader reader = new PdfReader("C:/Trade-catalog/Catalog2017.pdf");
    ByteArrayOutputStream outputPdf = new ByteArrayOutputStream();
    PdfStamper stamper = new PdfStamper(reader, outputPdf);
    String bodyWatermarkText = "12345 - John Smith";
    String bodyWatermarkRotation = "35";
    String footerWatermarkText = "Richard Parker";
    BaseFont font = BaseFont.createFont("/fonts/micross.ttf", "Cp1250", BaseFont.EMBEDDED);
    PdfGState state = new PdfGState();
    state.setFillOpacity(0.3f);
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        Rectangle thisPageSize = reader.getPageSize(i);
        PdfPatternPainter bodyPainter = stamper.getOverContent(i).createPattern(thisPageSize.getWidth(),
                thisPageSize.getHeight());
        bodyPainter.setColorFill(new BaseColor(0, 0, 0));
        bodyPainter.beginText();
        bodyPainter.setTextRenderingMode(PdfPatternPainter.TEXT_RENDER_MODE_FILL);
        bodyPainter.setFontAndSize(font, 60);
        bodyPainter.showTextAlignedKerned(Element.ALIGN_CENTER, bodyWatermarkText, thisPageSize.getWidth() / 2,
                thisPageSize.getHeight() / 2, Integer.valueOf(bodyWatermarkRotation));
        bodyPainter.showTextAlignedKerned(Element.ALIGN_RIGHT, footerWatermarkText, thisPageSize.getWidth() * 0.97f,
                thisPageSize.getHeight() * 0.015f, 0);
        bodyPainter.endText();
        PdfContentByte overContent = stamper.getOverContent(i);
        overContent.setGState(state);
        overContent.setColorFill(new PatternColor(bodyPainter));
        overContent.rectangle(thisPageSize.getLeft(), thisPageSize.getBottom(), thisPageSize.getWidth(),
                thisPageSize.getHeight());
        overContent.fill();
        overContent.setFlatness(100);
    }
    stamper.close();
    FileOutputStream outputStream = new FileOutputStream(
            "C:/Trade-catalog/output/TradeCatalog2017Watermarked_bodyPainter.pdf");
    outputPdf.writeTo(outputStream);
    outputStream.close();
    outputPdf.close();
    reader.close();
}

Retrieve the page number of an image in a PDF - iText

I am using the code from the link below to extract the images:
MyImageRenderListener - IText
Below is the try block of my code. What I am doing is computing the DPI of each image and, if it is below 300, writing the image's file name to a text file.
Now I also want to write the page numbers on which these images are located in the PDF. How can I obtain the page number of an image?
try {
    String filename;
    FileOutputStream os;
    PdfImageObject image = renderInfo.getImage();
    BufferedImage img = null;
    String txtfile = "results/results.txt";
    PdfDictionary imageDict = renderInfo.getImage().getDictionary();
    float widthPx = imageDict.getAsNumber(PdfName.WIDTH).floatValue();
    float heightPx = imageDict.getAsNumber(PdfName.HEIGHT).floatValue();
    float widthUu = renderInfo.getImageCTM().get(Matrix.I11);
    float heigthUu = renderInfo.getImageCTM().get(Matrix.I22);
    float widthIn = widthUu / 72;
    float heightIn = heigthUu / 72;
    float imagepdi = widthPx / widthIn;
    filename = String.format(path, renderInfo.getRef().getNumber(), image.getFileType());
    System.out.println(filename + "-->" + imagepdi);
    if (imagepdi < 300) {
        File file = new File("C:/Users/Abhinav/workspace/itext/results/result.txt");
        if (filename != null) {
            if (!file.exists()) {
                file.createNewFile();
            }
            FileWriter fw = new FileWriter(file.getAbsoluteFile(), true);
            file.setReadable(true, false);
            file.setExecutable(true, false);
            file.setWritable(true, false);
            BufferedWriter bw = new BufferedWriter(fw);
            bw.write(filename);
            bw.write("\r\n");
            bw.close();
        }
    }
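The DPI part of this code can be isolated into a small function, which makes the formula easier to check: the image's width in user units divided by 72 gives its displayed width in inches, and the pixel width divided by that gives the DPI. This is a hedged sketch with hypothetical names. Like the question's code, it assumes the default of 72 user units per inch and an image that is neither rotated nor skewed, so that the CTM's I11 entry is the displayed width.

```java
/** Hypothetical helper: the DPI formula from the question's code. */
final class ImageDpi {
    private ImageDpi() {}

    /** Horizontal DPI of an image widthPx pixels wide, drawn widthUserUnits wide. */
    static float dpi(float widthPx, float widthUserUnits) {
        float widthInches = widthUserUnits / 72f; // 72 user units per inch
        return widthPx / widthInches;
    }

    static boolean isLowResolution(float widthPx, float widthUserUnits, float thresholdDpi) {
        return dpi(widthPx, widthUserUnits) < thresholdDpi;
    }

    public static void main(String[] args) {
        // 600 px drawn 144 user units (2 inches) wide
        System.out.println(dpi(600f, 144f)); // 300.0
        // The same image drawn 4 inches wide is only 150 dpi
        System.out.println(isLowResolution(600f, 288f, 300f)); // true
    }
}
```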
This is a strange question, because it is incomplete and illogical.
Why is your question incomplete?
You are using MyImageRenderListener in the context of another example, ExtractImages:
PdfReader reader = new PdfReader(filename);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MyImageRenderListener listener = new MyImageRenderListener(RESULT);
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    parser.processContent(i, listener);
}
reader.close();
In this example, you loop over every page number to examine every separate page. Hence you know the page number whenever MyImageRenderListener returns an image.
Images are stored inside a PDF as external objects (aka XObjects). MyImageRenderListener returns what's stored in such a stream object (the bytes of the image). So far, so good.
Why is your question illogical?
Because the whole purpose of storing images as XObjects is to be able to reuse the same image stream. Imagine an image of a logo. That image can be present on every page of the document. In this case, MyImageRenderListener will give you the same image (from the same stream) as many times as there are pages, but in reality there is only one image, and it's external to the page content. It doesn't make sense for that image to "know" the page it is on: it is on every page. The same logic applies even when the image is used on only one page. That is inherent to the design of PDF: an image stream doesn't know which page it belongs to. The link between the image stream and the page exists through the /XObject entry in the /Resources of the page dictionary.
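The one-directional link described here can be modeled with plain collections: each page's /Resources /XObject dictionary points to image streams, and finding the pages that use an image means walking the pages and inverting that map. This is a simplified, hypothetical model (no iText types; object numbers stand in for indirect references, and images nested inside Form XObjects are ignored), meant only to show that the page information lives on the page side.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical model: page -> image object numbers, inverted to image -> pages. */
final class ImagePageIndex {
    private ImagePageIndex() {}

    static Map<Integer, Set<Integer>> pagesPerImage(Map<Integer, List<Integer>> imagesPerPage) {
        Map<Integer, Set<Integer>> result = new TreeMap<>();
        for (Map.Entry<Integer, List<Integer>> page : imagesPerPage.entrySet()) {
            for (Integer imageRef : page.getValue()) {
                // The same image stream may be referenced by many pages
                result.computeIfAbsent(imageRef, k -> new TreeSet<>()).add(page.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Object 12 is a logo shared by pages 1-3; object 15 appears only on page 2
        Map<Integer, List<Integer>> imagesPerPage = new TreeMap<>();
        imagesPerPage.put(1, List.of(12));
        imagesPerPage.put(2, List.of(12, 15));
        imagesPerPage.put(3, List.of(12));
        System.out.println(pagesPerImage(imagesPerPage)); // {12=[1, 2, 3], 15=[2]}
    }
}
```

List.of requires Java 9 or later; on older versions, Arrays.asList works the same way here.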
What would be an elegant way to solve this?
Create a member-variable in MyImageRenderListener, e.g.:
protected int pagenumber;
public void setPagenumber(int pagenumber) {
    this.pagenumber = pagenumber;
}
Use the setter from your loop:
PdfReader reader = new PdfReader(filename);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MyImageRenderListener listener = new MyImageRenderListener(RESULT);
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    listener.setPagenumber(i);
    parser.processContent(i, listener);
}
reader.close();
Now you can use pagenumber in the renderImage(ImageRenderInfo renderInfo) method. This way, you'll always know which page is being examined when this method is triggered.
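Combining the setter approach with the DPI check from the question, a stripped-down version of the listener might look like the following. To keep it runnable without iText, onImage(String, float) is a hypothetical stand-in for renderImage(ImageRenderInfo): in real code, the file name and DPI would be computed from renderInfo as in the question, and the class name is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, iText-free sketch of a page-aware image listener. */
final class PageAwareImageCollector {
    private int pagenumber;
    private final float thresholdDpi;
    private final List<String> lowResImages = new ArrayList<>();

    PageAwareImageCollector(float thresholdDpi) {
        this.thresholdDpi = thresholdDpi;
    }

    /** Called by the page loop before each page is processed. */
    void setPagenumber(int pagenumber) {
        this.pagenumber = pagenumber;
    }

    /** Stand-in for renderImage(): records "page N: filename" below the threshold. */
    void onImage(String filename, float dpi) {
        if (dpi < thresholdDpi) {
            lowResImages.add("page " + pagenumber + ": " + filename);
        }
    }

    List<String> getLowResImages() {
        return lowResImages;
    }

    public static void main(String[] args) {
        PageAwareImageCollector collector = new PageAwareImageCollector(300f);
        // Simulates the page loop: a shared image is reported once per page
        for (int i = 1; i <= 2; i++) {
            collector.setPagenumber(i);
            collector.onImage("Img12.png", 150f);
        }
        System.out.println(collector.getLowResImages()); // [page 1: Img12.png, page 2: Img12.png]
    }
}
```

Because the page number is set from the outside, the same image stream shows up once for every page that uses it, which matches the explanation of shared XObjects above.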

iText PDFDocument page size inaccurate

I am trying to add a header to existing PDF documents in Java with iText. I can add the header at a fixed position, but the documents have different page sizes, so it is not always at the top of the page. I have tried getting the page size so that I can calculate the position of the header, but the page size doesn't seem to be what I need. On some documents, calling reader.getPageSize(i).getTop(20) places the text in the right spot at the top of the page; on other documents, it places it halfway down the page. Most of the pages were scanned by a Xerox copier, if that makes a difference. Here is the code I am using:
PdfReader reader = new PdfReader(readFilePath);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(writeFilePath));
BaseFont bf = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
    PdfContentByte cb = stamper.getOverContent(i);
    cb.beginText();
    cb.setFontAndSize(bf, 14);
    float x = reader.getPageSize(i).getWidth() / 2;
    float y = reader.getPageSize(i).getTop(20);
    cb.showTextAligned(PdfContentByte.ALIGN_CENTER, "Copy", x, y, 0);
    cb.endText();
}
stamper.close();
PDF that works correctly
PDF that works incorrectly
Take a look at the StampHeader1 example. I adapted your code, introducing ColumnText.showTextAligned() and using a Phrase for the sake of simplicity (maybe you can change that part of your code too):
public void manipulatePdf(String src, String dest) throws IOException, DocumentException {
    PdfReader reader = new PdfReader(src);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    Phrase header = new Phrase("Copy", new Font(FontFamily.HELVETICA, 14));
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        float x = reader.getPageSize(i).getWidth() / 2;
        float y = reader.getPageSize(i).getTop(20);
        ColumnText.showTextAligned(
            stamper.getOverContent(i), Element.ALIGN_CENTER,
            header, x, y, 0);
    }
    stamper.close();
    reader.close();
}
As you have found out, this code assumes that no rotation was defined.
Now take a look at the StampHeader2 example. I'm using your "Wrong" file and I've added one extra line:
stamper.setRotateContents(false);
By telling the stamper not to rotate the content I'm adding, I'm adding the content using the coordinates as if the page isn't rotated. Please take a look at the result: stamped_header2.pdf. We added "Copy" at the top of the page, but as the page is rotated, we see the word appear on the side. The word is rotated because the page is rotated.
Maybe that's what you want, maybe it isn't. If it isn't, please take a look at StampHeader3 in which I calculate x and y differently, based on the rotation of the page:
if (reader.getPageRotation(i) % 180 == 0) {
    x = reader.getPageSize(i).getWidth() / 2;
    y = reader.getPageSize(i).getTop(20);
}
else {
    x = reader.getPageSize(i).getHeight() / 2;
    y = reader.getPageSize(i).getRight(20);
}
Now the word "Copy" appears on what is perceived as the "top of the page" (but in reality, it could be the side of the page): stamped_header3.pdf
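The StampHeader3 calculation can be expressed without iText to check the numbers for a given page. This is a sketch with hypothetical names: the page box is a plain {llx, lly, urx, ury} array, getTop(margin) and getRight(margin) are written out as ury - margin and urx - margin, and, as in the code above, half the width (or height) is used as the center, which assumes the box starts at 0.

```java
/** Hypothetical helper: header position depending on page rotation. */
final class HeaderPosition {
    private HeaderPosition() {}

    /** Returns {x, y} for a header centered at the perceived top of the page. */
    static float[] headerPosition(float[] box, int rotation, float margin) {
        float width = box[2] - box[0];
        float height = box[3] - box[1];
        if (rotation % 180 == 0) {
            // Not rotated (or upside down): center horizontally, margin below the top
            return new float[] {width / 2, box[3] - margin};
        }
        // Rotated 90 or 270 degrees: use the height and the right edge instead
        return new float[] {height / 2, box[2] - margin};
    }

    public static void main(String[] args) {
        float[] letter = {0f, 0f, 612f, 792f};
        float[] p0 = headerPosition(letter, 0, 20f);
        float[] p90 = headerPosition(letter, 90, 20f);
        System.out.println(p0[0] + ", " + p0[1]);   // 306.0, 772.0
        System.out.println(p90[0] + ", " + p90[1]); // 396.0, 592.0
    }
}
```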
