Retrieve the page number of an image in pdf- IText

Retrieve the page number of an image in pdf- IText - java

I am using the code from the below link to render the images
MyImageRenderListener - IText
Below is my try block of the Code. What I am actually doing is finding the DPI of the image and if the dpi of the image is below 300 then writing it in a text file.
NOW, I also want to write the page numbers where these images are located in the PDF. How can I obtain the Page Number of that image?
try {
String filename;
FileOutputStream os;
PdfImageObject image = renderInfo.getImage();
BufferedImage img = null;
String txtfile = "results/results.txt";
PdfDictionary imageDict = renderInfo.getImage().getDictionary();
float widthPx = imageDict.getAsNumber(PdfName.WIDTH).floatValue();
float heightPx = imageDict.getAsNumber(PdfName.HEIGHT).floatValue();
float widthUu = renderInfo.getImageCTM().get(Matrix.I11);
float heigthUu = renderInfo.getImageCTM().get(Matrix.I22);
float widthIn = widthUu/72;
float heightIn = heigthUu/72;
float imagepdi = widthPx/widthIn;
filename = String.format(path, renderInfo.getRef().getNumber(), image.getFileType());
System.out.println(filename+"-->"+imagepdi);
if(imagepdi < 300){
File file = new File("C:/Users/Abhinav/workspace/itext/results/result.txt");
if(filename != null){
if (!file.exists()) {
file.createNewFile();
}
FileWriter fw = new FileWriter(file.getAbsoluteFile(),true);
file.setReadable(true, false);
file.setExecutable(true, false);
file.setWritable(true, false);
BufferedWriter bw = new BufferedWriter(fw);
bw.write(filename);
bw.write("\r\n");
bw.close();
}
}

This is a strange question, because it is incomplete and illogical.
Why is your question incomplete?
You are using MyImageRenderListener in the context of another example, ExtractImages:
PdfReader reader = new PdfReader(filename);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MyImageRenderListener listener = new MyImageRenderListener(RESULT);
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
parser.processContent(i, listener);
}
reader.close();
In this example, you loop over every page number to examine every separate page. Hence you know the page number whenever MyImageRenderListener returns an image.
Images are stored inside a PDF as external objects (aka XObject). MyImageRenderListener returns what's stored in such a stream object (containing the bytes of the image). So far, so good.
Why is your question illogical?
Because the whole purpose of storing images in XObject is to be able to reuse the same image stream. Imagine an image of a logo. That image can be present on every page of the document. In this case, MyImageRenderListener will give you the same image (from the same stream) as many times as there are pages, but in reality, there is only one image, and it's external to the page content. It doesn't make sense for that image to "know" the page it is on: it is on every page. The same logic applies even when the image is only used on one page. That is inherent to the design of PDF: an image stream doesn't know which page it belongs to. The link between the image stream and the page exists through the /XObject entry in the /Resources of the page dictionary.
What would be an elegant way to solve this?
Create a member-variable in MyImageRenderListener, e.g.:
protected int pagenumber;
public void setPagenumber(int pagenumber) {
this.pagenumber = pagenumber;
}
Use the setter from your loop:
PdfReader reader = new PdfReader(filename);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
MyImageRenderListener listener = new MyImageRenderListener(RESULT);
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
listener.setPagenumber(i);
parser.processContent(i, listener);
}
reader.close();
Now you can use pagenumber in the renderImage(ImageRenderInfo renderInfo) method. This way, you'll always know which page is being examined when this method is triggered.

Related

How to use iText to parse paths (such as lines in the document)

I am using iText to parse text in a PDF document, and i am using PdfContentStreamProcessor with a RenderListener. Such as:
PdfReader reader = new PdfReader(file.toURI().toURL());
int numberOfPages = reader.getNumberOfPages();
MyRenderListener listener = new MyRenderListener ();
PdfContentStreamProcessor processor = new PdfContentStreamProcessor(listener);
for (int pageNumber = 1; pageNumber <= numberOfPages; pageNumber++) {
PdfDictionary pageDic = reader.getPageN(pageNumber);
PdfDictionary resourcesDic = pageDic.getAsDict(PdfName.RESOURCES);
Rectangle pageSize = reader.getPageSize(pageNumber);
listener.startPage(pageNumber, pageSize);
processor.processContent(ContentByteUtils.getContentBytesForPage(reader, pageNumber), resourcesDic);
}
I have no problem to get the text with the renderText(TextRenderInfo) method, but how do I parse the graphic content appart from images? For example in my case I would like to get:
Text content which is in a box
Horizontal lines

Per mkl comment, by using ExtRenderListener I am able to get the geometries. I used How to extract the color of a rectangle in a PDF, with iText for reference

I am not able to create two pages of pdf using PdfDocument of Android

Devs! I am using PdfDocument to try to save the text as a pdf file. So I wrote this code :
public void Convert(String text, String fileName){
PdfDocument myPdfDocument = new PdfDocument();
PdfDocument.PageInfo myPageInfo = new PdfDocument.PageInfo.Builder(this.pageWidth,this.pageHeight,this.pageNumber).create();
PdfDocument.Page myPage = myPdfDocument.startPage(myPageInfo);
Paint myPaint = new Paint();
Paint backPaint = new Paint();
backPaint.setColor(this.backgroundcolor);
myPaint.setTextSize(this.textSize);
myPaint.setColor(this.fontcolor);
//To find size of screen so that line is perfectly fit
DisplayMetrics dm = this.context.getResources().getDisplayMetrics();
int widthOfScreen = dm.widthPixels;
int periodForSplittingText = widthOfScreen / Math.round(this.textSize);
// To make line split in width of the screen here;
String insert = "\n";
StringBuilder builder = new StringBuilder(
text.length() + insert.length() * (text.length()/periodForSplittingText)+1);
int index = 0;
String prefix = "";
while (index < text.length()){
builder.append(prefix);
prefix = insert;
builder.append(text.substring(index, Math.min(index + periodForSplittingText, text.length())));
index += periodForSplittingText;
}
String myString = builder.toString();
for (String line:myString.split("\n")){
myPage.getCanvas().drawPaint(backPaint);
myPage.getCanvas().drawText(line, this.x, this.y, myPaint);
y+=myPaint.descent()-myPaint.ascent();
}
myPdfDocument.finishPage(myPage);
// Started another page
myPdfDocument.startPage(myPageInfo);
myPage.getCanvas().drawText("Second Page Text", 14, 25, myPaint);
myPdfDocument.finishPage(myPage);
String myFilePath = Environment.getExternalStorageDirectory().getPath() + this.filePath + "/" + fileName;
File myFile = new File (myFilePath);
try {
myPdfDocument.writeTo(new FileOutputStream(myFile));
} catch (Exception e) {
e.printStackTrace();
}
myPdfDocument.close();
}
But if I entered the long text then it is not able to create it on two pages. According to Android Developer's
This class enables generating a PDF document from native Android content. You create a new document and then for every page you want to add you start a page, write content to the page, and finish the page. After you are done with all pages, you write the document to an output stream and close the document. After a document is closed you should not use it anymore. Note that pages are created one by one, i.e. you can have only a single page to which you are writing at any given time. This class is not thread-safe.
But I am not understanding what does it mean?
How can I solve this? Help
Edit:
Now adding this
// Started another page
myPdfDocument.startPage(myPageInfo);
myPage.getCanvas().drawText("Second Page Text", 14, 25, myPaint);
myPdfDocument.finishPage(myPage);
results this error : Attempt to invoke virtual method 'void android.graphics.Canvas.drawText(java.lang.String, float, float, android.graphics.Paint)' on a null object reference

When you start a new page you are not assigning it to a variable
Change to
// Started another page
myPage = myPdfDocument.startPage(myPageInfo);

Replacing images with same resource in PDFBox

I have a pdf containing 2 blank images. I need to replace both the images with 2 separate images using PDFBox. The problem is, both the blank images appear to have the same resource. So, if I replace one, the other one is replaced with the same image as well.
I followed this example and tried overriding the processOperator() method and replaced the images based on the imageHeight. However, it still ends up replacing both the images with the same image. This is my code thus far:
protected void processOperator( PDFOperator operator, List arguments ) throws IOException
{
String operation = operator.getOperation();
if( INVOKE_OPERATOR.equals(operation) )
{
COSName objectName = (COSName)arguments.get( 0 );
Map<String, PDXObject> xobjects = getResources().getXObjects();
PDXObject xobject = (PDXObject)xobjects.get( objectName.getName() );
if( xobject instanceof PDXObjectImage )
{
PDXObjectImage blankImage = (PDXObjectImage)xobject;
int imageWidth = blankImage.getWidth();
int imageHeight = blankImage.getHeight();
System.out.println("Image width >>> "+imageWidth+" height >>>> "+imageHeight);
// Check if it is blank image 1 based on height
if(imageHeight < 480){
File logo = new File("abc.jpg");
BufferedImage bufferedImage = ImageIO.read(logo);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write( bufferedImage, "jpg", baos );
baos.flush();
byte[] logoImageInBytes = baos.toByteArray();
baos.close();
// label will be used to replace the blank image
label = logoImageInBytes;
}
BufferedImage img = ImageIO.read(new ByteArrayInputStream(label));
BufferedImage resizedImage = Scalr.resize(img, Scalr.Method.BALANCED, Scalr.Mode.FIT_EXACT, img.getWidth(), img.getHeight());
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ImageIO.write(resizedImage, "jpg", baos);
// Replace empty image in template with the image generated from shipping label byte array
PDXObjectImage validImage = new PDJpeg(doc, new ByteArrayInputStream(baos.toByteArray()));
blankImage.getCOSStream().replaceWithStream(validImage.getCOSStream());
}
Now, when I remove the if block which checks if (imageHeight < 480), it prints the imageHeight as 30 and 470 for the blank images. However, when I add the if block, it prints the imageHeight as 480 and 1500 and never goes inside the if block because of which both the blank images end up getting replaced by the same image.
What's going on here? I'm new to PDFBox, so I am unsure if my code is correct.

While first thinking about a generic way to actually replace the existing Image by the new Images, I agree with #TilmanHausherr that a more simple solution would be to simply add an extra content stream with two images in the size / position you need covering the existing Image.
This approach is easier to implement (even generically) and less error-prone than actual replacement.
In a generic solution we do not have the Image positions beforehand. To determine them, we can use this helper class (which essentially is a rip-off of the PDFBox example PrintImageLocations):
public class ImageLocator extends PDFStreamEngine
{
private static final String INVOKE_OPERATOR = "Do";
public ImageLocator() throws IOException
{
super(ResourceLoader.loadProperties("org/apache/pdfbox/resources/PDFTextStripper.properties", true));
}
public List<ImageLocation> getLocations()
{
return new ArrayList<ImageLocation>(locations);
}
protected void processOperator(PDFOperator operator, List<COSBase> arguments) throws IOException
{
String operation = operator.getOperation();
if (INVOKE_OPERATOR.equals(operation))
{
COSName objectName = (COSName) arguments.get(0);
Map<String, PDXObject> xobjects = getResources().getXObjects();
PDXObject xobject = (PDXObject) xobjects.get(objectName.getName());
if (xobject instanceof PDXObjectImage)
{
PDXObjectImage image = (PDXObjectImage) xobject;
PDPage page = getCurrentPage();
Matrix matrix = getGraphicsState().getCurrentTransformationMatrix();
locations.add(new ImageLocation(page, matrix, image));
}
else if (xobject instanceof PDXObjectForm)
{
// save the graphics state
getGraphicsStack().push((PDGraphicsState) getGraphicsState().clone());
PDPage page = getCurrentPage();
PDXObjectForm form = (PDXObjectForm) xobject;
COSStream invoke = (COSStream) form.getCOSObject();
PDResources pdResources = form.getResources();
if (pdResources == null)
{
pdResources = page.findResources();
}
// if there is an optional form matrix, we have to
// map the form space to the user space
Matrix matrix = form.getMatrix();
if (matrix != null)
{
Matrix xobjectCTM = matrix.multiply(getGraphicsState().getCurrentTransformationMatrix());
getGraphicsState().setCurrentTransformationMatrix(xobjectCTM);
}
processSubStream(page, pdResources, invoke);
// restore the graphics state
setGraphicsState((PDGraphicsState) getGraphicsStack().pop());
}
}
else
{
super.processOperator(operator, arguments);
}
}
public class ImageLocation
{
public ImageLocation(PDPage page, Matrix matrix, PDXObjectImage image)
{
this.page = page;
this.matrix = matrix;
this.image = image;
}
public PDPage getPage()
{
return page;
}
public Matrix getMatrix()
{
return matrix;
}
public PDXObjectImage getImage()
{
return image;
}
final PDPage page;
final Matrix matrix;
final PDXObjectImage image;
}
final List<ImageLocation> locations = new ArrayList<ImageLocation>();
}
(ImageLocator.java)
In contrast to the example class this helper stores the locations in a list instead of printing them.
We now can cover existing images using code like this:
try ( InputStream resource = getClass().getResourceAsStream("sample.pdf");
InputStream left = getClass().getResourceAsStream("left.png");
InputStream right = getClass().getResourceAsStream("right.png");
PDDocument document = PDDocument.load(resource) )
{
if (document.isEncrypted())
{
document.decrypt("");
}
PDJpeg leftImage = new PDJpeg(document, ImageIO.read(left));
PDJpeg rightImage = new PDJpeg(document, ImageIO.read(right));
// Locate images
ImageLocator locator = new ImageLocator();
List<?> allPages = document.getDocumentCatalog().getAllPages();
for (int i = 0; i < allPages.size(); i++)
{
PDPage page = (PDPage) allPages.get(i);
locator.processStream(page, page.findResources(), page.getContents().getStream());
}
// cover images
for (ImageLocation location : locator.getLocations())
{
// Decide on a replacement
PDRectangle cropBox = location.getPage().findCropBox();
float center = (cropBox.getLowerLeftX() + cropBox.getUpperRightX()) / 2.0f;
PDJpeg image = location.getMatrix().getXPosition() < center ? leftImage : rightImage;
AffineTransform transform = location.getMatrix().createAffineTransform();
PDPageContentStream content = new PDPageContentStream(document, location.getPage(), true, false, true);
content.drawXObject(image, transform);
content.close();
}
document.save(new File(RESULT_FOLDER, "sample-changed.pdf"));
}
(OverwriteImage)
This sample covers all images on the left half of their respective page with left.png and all others with right.png.

I have no implementation or example, but I want to illustrate you a possible way to do what you want by the following steps:
Since you need 2 Images (lets tell them imageA and imageB) in the pdf instead of 1 (which is the blank one). You have to add both of them to the pdf.
save the file temporary - optional, it could work without rewriting the pdf
reopen the file - optional, if you don't need step 2, you also don't need this step
Then replace the blank image with imageA or imageB
Remove the blank image from the pdf
Save the pdf

How do you access an attachment stored as MIME Part?

It seems to me there are two ways to store an attachment in a NotesDocument.
Either as a RichTextField or as a "MIME Part".
If they are stored as RichText you can do stuff like:
document.getAttachment(fileName)
That does not seem to work for an attachment stored as a MIME Part. See screenshot
I have thousands of documents like this in the backend. This is NOT a UI issue where I need to use the file Download control of XPages.
Each document as only 1 attachment. An Image. A JPG file. I have 3 databases for different sizes. Original, Large, and Small. Originally I created everything from documents that had the attachment stored as RichText. But my code saved them as MIME Part. that's just what it did. Not really my intent.
What happened is I lost some of my "Small" pictures so I need to rebuild them from the Original pictures that are now stored as MIME Part. So my ultimate goal is to get it from the NotesDocument into a Java Buffered Image.
I think I have the code to do what I want but I just "simply" can't figure out how to get the attachment off the document and then into a Java Buffered Image.
Below is some rough code I'm working with. My goal is to pass in the document with the original picture. I already have the fileName because I stored that out in metaData. But I don't know how to get that from the document itself. And I'm passing in "Small" to create the Small image.
I think I just don't know how to work with attachments stored in this manner.
Any ideas/advice would be appreciated! Thanks!!!
public Document processImage(Document inputDoc, String fileName, String size) throws IOException {
// fileName is the name of the attachment on the document
// The goal is to return a NEW BLANK document with the image on it
// The Calling code can then deal with keys and meta data.
// size is "Original", "Large" or "Small"
System.out.println("Processing Image, Size = " + size);
//System.out.println("Filename = " + fileName);
boolean result = false;
Session session = Factory.getSession();
Database db = session.getCurrentDatabase();
session.setConvertMime(true);
BufferedImage img;
BufferedImage convertedImage = null; // the output image
EmbeddedObject image = null;
InputStream imageStream = null;
int currentSize = 0;
int newWidth = 0;
String currentName = "";
try {
// Get the Embedded Object
image = inputDoc.getAttachment(fileName);
System.out.println("Input Form : " + inputDoc.getItemValueString("form"));
if (null == image) {
System.out.println("ALERT - IMAGE IS NULL");
}
currentSize = image.getFileSize();
currentName = image.getName();
// Get a Stream of the Imahe
imageStream = image.getInputStream();
img = ImageIO.read(imageStream); // this is the buffered image we'll work with
imageStream.close();
Document newDoc = db.createDocument();
// Remember this is a BLANK document. The calling code needs to set the form
if ("original".equalsIgnoreCase(size)) {
this.attachImage(newDoc, img, fileName, "JPG");
return newDoc;
}
if ("Large".equalsIgnoreCase(size)) {
// Now we need to convert the LARGE image
// We're assuming FIXED HEIGHT of 600px
newWidth = this.getNewWidth(img.getHeight(), img.getWidth(), 600);
convertedImage = this.getScaledInstance(img, newWidth, 600, false);
this.attachImage(newDoc, img, fileName, "JPG");
return newDoc;
}
if ("Small".equalsIgnoreCase(size)) {
System.out.println("converting Small");
newWidth = this.getNewWidth(img.getHeight(), img.getWidth(), 240);
convertedImage = this.getScaledInstance(img, newWidth, 240, false);
this.attachImage(newDoc, img, fileName, "JPG");
System.out.println("End Converting Small");
return newDoc;
}
return newDoc;
} catch (Exception e) {
// HANDLE EXCEPTION HERE
// SAMLPLE WRITE TO LOG.NSF
System.out.println("****************");
System.out.println("EXCEPTION IN processImage()");
System.out.println("****************");
System.out.println("picName: " + fileName);
e.printStackTrace();
return null;
} finally {
if (null != imageStream) {
imageStream.close();
}
if (null != image) {
LibraryUtils.incinerate(image);
}
}
}

I believe it will be some variation of the following code snippet. You might have to change which mimeentity has the content so it might be in the parent or another child depending.
Stream stream = session.createStream();
doc.getMIMEEntity().getFirstChildEntity().getContentAsBytes(stream);
ByteArrayInputStream bais = new ByteArrayInputStream(stream.read());
return ImageIO.read(bais);
EDIT:
session.setConvertMime(false);
Stream stream = session.createStream();
Item itm = doc.getFirstItem("ParentEntity");
MIMEEntity me = itm.getMIMEEntity();
MIMEEntity childEntity = me.getFirstChildEntity();
childEntity.getContentAsBytes(stream);
ByteArrayOutputStream bo = new ByteArrayOutputStream();
stream.getContents(bo);
byte[] mybytearray = bo.toByteArray();
ByteArrayInputStream bais = new ByteArrayInputStream(mybytearray);
return ImageIO.read(bais);

David have a look at DominoDocument,http://public.dhe.ibm.com/software/dw/lotus/Domino-Designer/JavaDocs/XPagesExtAPI/8.5.2/com/ibm/xsp/model/domino/wrapped/DominoDocument.html
There you can wrap every Notes document
In the DominoDocument, there such as DominoDocument.AttachmentValueHolder where you can access the attachments.
I have explained it at Engage. It very powerful
http://www.slideshare.net/flinden68/engage-use-notes-objects-in-memory-and-other-useful-java-tips-for-x-pages-development

iText Flying Saucer pdf headers and ignoring html

We use xhtml to pdf with good success, but a new requirement came up to have headers and page count on every page. We are using newset release of Flying Saucer.
I followed example here: http://today.java.net/pub/a/today/2007/06/26/generating-pdfs-with-flying-saucer-and-itext.html#page-specific-features
...but this would not work. The header would be top left on first page.
If I use the r7 version, headers and page numbering works perfectly, but none of the passed in html is rendered, whilst in r8 the headers\ page numbers are ignored, but the html is rendered perfectly. xHTML used for tests is copied from url above.
I know I must be missing something very simple, if anyone has any ideas\ comments, I would be very grateful to hear.

I think they changed this functionality in r8.... try this method instead:
https://gist.github.com/626264

We use the same method and everything works perfectly, I have however decided not to use flying-saucer's built in headers/footers and use a PdfStamper to add them after the PDF is generated, it works quite well, here is an example.
public void modifyPdf(PdfStamper stamper) {
this.reader = stamper.getReader();
PdfContentByte under = null;
PdfPTable header = null;
PdfPTable footer = null;
final int total = this.reader.getNumberOfPages();
for (int page = 1; page <= total; page++) {
under = stamper.getUnderContent(page);
final PdfDocument doc = under.getPdfDocument();
final Rectangle rect = this.reader.getPageSizeWithRotation(page);
header = ... //build your header
footer = ... // build your footer
final float x = 0;
//write header to PDF
if (header != null) {
float y = (rect.getTop() - 0);
header.writeSelectedRows(0, -1, x, y, under);
}
//write footer to PDF
if (footer != null) {
float y = (rect.getBottom() + 20);
footer.writeSelectedRows(0, -1, x, y, under);
}
}
}
you can build your stamper like this:
final PdfReader reader = new PdfReader(/*your pdf file*/);
final PdfStamper stamper = new PdfStamper(reader, /* output */);
Hope you find this helpful.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Retrieve the page number of an image in pdf- IText - java

Related

How to use iText to parse paths (such as lines in the document)

I am not able to create two pages of pdf using PdfDocument of Android

Replacing images with same resource in PDFBox

How do you access an attachment stored as MIME Part?

iText Flying Saucer pdf headers and ignoring html

Categories

Resources