I know there are many suggestions on how to solve problems with editing an existing PDF, but among all of those I couldn't find a solution for my problem.
I need to add information about file acceptance ("Document accepted by Tom Smith, 2020-01-01", possibly multiple acceptances) to the last page of the PDF. I need to determine whether the page is full or there is enough space left for my text.
I wanted to find the position (y) of the last element on the last page of the PDF and check it against the page size. If the page is full, I'm going to add a new page and then add the new text.
I have no idea how to resolve this. I tried looking for answers with iText and PDFBox, but found no satisfactory solutions.
Raster Image based approach:
Render the last page of the PDF to a bitmap image with any library you are comfortable with (Ghostscript?). 72 dpi should be enough for your purpose.
Then use any image processing library, such as OpenCV, to check rectangular areas from the bottom up for non-white pixels. OpenCV's countNonZero() function is very fast; since white pixels count as non-zero, invert the image first (or compare the count against the area of the rectangle).
You can also find any large white zone anywhere in the image, not just at the bottom. This link could be your starting point.
https://answers.opencv.org/question/72939/how-to-find-biggest-white-zone-in-an-scanned-image/
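The bottom-up scan described above can also be sketched in plain Java without OpenCV. This is only a minimal illustration, assuming the last page has already been rendered to a BufferedImage by whatever renderer you use; the class name, method names and the whiteness threshold are made up for this example.

```java
import java.awt.image.BufferedImage;

final class LastInkRow {
    /**
     * Scans rows from the bottom up and returns the index of the lowest row
     * that contains a non-white pixel, or -1 if the whole image is white.
     * A pixel counts as "ink" if any RGB channel is below whiteThreshold.
     */
    static int lowestInkRow(BufferedImage page, int whiteThreshold) {
        for (int y = page.getHeight() - 1; y >= 0; y--) {
            for (int x = 0; x < page.getWidth(); x++) {
                int rgb = page.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (r < whiteThreshold || g < whiteThreshold || b < whiteThreshold) {
                    return y;
                }
            }
        }
        return -1;
    }

    /**
     * Converts the empty pixel rows below inkRow into PDF points (1/72 inch),
     * given the resolution the page was rendered at.
     */
    static double freePointsBelow(BufferedImage page, int inkRow, double dpi) {
        int freePixelRows = page.getHeight() - 1 - inkRow;
        return freePixelRows * 72.0 / dpi;
    }
}
```

If freePointsBelow returns less than the height of the acceptance text plus the bottom margin, add a new page first. Headers, footers or page numbers near the bottom edge will count as ink, so you may want to scan only the rows above the footer area.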
Related
I have a requirement where I need to clip a rectangular part of an OCRed PDF into an image (the PDF was originally scanned, so we had to perform OCR).
I was not able to find any library that can do this, so I have split it into two parts.
1. Clip the rectangular part from the PDF using iText. The result will be a PDF.
2. Convert the clipped PDF into images using PDFBox.
But when converting the clipped PDF into images using PDFBox, the result is not as expected. For example, we do not get the checkbox in the JPEG image if the clipped PDF contains only a checkbox.
I have searched Stack Overflow for all the possible solutions, but with no success.
My code is the same as the solution provided by Tilman Hausherr here. I have also tried this
Is there any direct way to achieve the above two steps in one, or some better way to convert PDF to image?
Please don't mark it as a duplicate, as I have not been able to find a solution even after a lot of searching.
I am generating PDF report using iTextPDF for Selenium WebDriver scripts developed in TestNG.
The report contains text blocks (String) and images. Each image is always preceded by a text block.
The issue I am facing is that, in the generated document, the text blocks and images do not appear in the order in which they occur in the test case. I believe this is because the image to be inserted is larger than the PDF page.
Consider a scenario where the order of occurrence in the test is as follows
Text Block1
Image1
Text Block2
Text Block3
Image2
Text Block4
But the PDF shows as
Text Block1
Image1
Text Block2
Text Block3
Text Block4
Image2
My code is not wrong. I have triple checked it.
No, I cannot post the code because it is huge (>500 lines) and is in my company system.
I want to know whether we can create a PDF page and then change its size dynamically when the image to be inserted turns out to be too large.
Your code is not wrong. When an image doesn't fit and there's text that follows the image, adding the image is postponed. You can change this behavior by using the following line:
writer.setStrictImageSequence(true);
In this case writer is your PdfWriter instance.
This solves one problem: the sequence of the text and images will now be correct. However, due to the image size, you will end up with plenty of white space in your document because images that don't fit will trigger a new page.
You could try to solve this by changing the page size. This involves using the setPageSize() method as explained in my answer to this question: iText create document with unequal page sizes
If you want to match page sizes to image sizes, take a look at my answer to this question: Add multiple images into a single pdf file with iText using java
The Image class extends Rectangle and we can use an Image object as a parameter when we create a Document instance, or we can use an Image object when we change the page size:
document.setPageSize(img);
document.newPage();
Important: when you change the page size, the new size will only go into effect on the next page. You can't change the size of the current page (it has already been initialized and changing it after initialization might screw up the content that was already added).
Also: it isn't sufficient for you to change the page size to the size of the image because you're also adding text. You could use ColumnText in simulation mode to find out how much space you need for the text, and then use ColumnText once more to add the text for real after you've created a page with a size that accommodates both the text and the image.
See "Can I tell iText how to clip text to fit in a cell" and look for the getYLine() method.
I wonder if it wouldn't be easier for you to scale down the images so that they fit the page... Of course: if the size of the images can vary, you'd have the risk that large images would become illegible.
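The arithmetic behind scaling an image down to fit the page can be sketched as follows. This is only an illustration in plain Java, not iText code: the class and method names are made up, and the maximum width and height are assumed to be the page size minus margins (and minus whatever space the text needs). iText's Image class also has a scaleToFit(width, height) method you can use on the image itself.

```java
final class FitToPage {
    /**
     * Returns {width, height} of the image scaled to fit inside a box of
     * maxW x maxH, preserving the aspect ratio. Images that already fit
     * are returned unchanged (no upscaling).
     */
    static float[] scaleToFit(float imgW, float imgH, float maxW, float maxH) {
        // Use the smaller of the two ratios so both dimensions fit,
        // and cap it at 1 so small images are not enlarged.
        float scale = Math.min(1f, Math.min(maxW / imgW, maxH / imgH));
        return new float[] { imgW * scale, imgH * scale };
    }
}
```

For example, on an A4 page (595 x 842 points) with 36-point margins the usable box is 523 x 770 points, so a 1200 x 800 image would be scaled by 523 / 1200. A very small resulting scale factor is a hint that the image may become illegible and that a larger page is the better option.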
P.S. All the answers I refer to are also available in the free ebook The Best iText Questions on StackOverflow. I bundled hundreds of my answers in this book so that I could easily search for already answered questions when answering a new question.
I'm using Apache POI to search Word files (doc and docx) to find specified paragraphs and tables. Using various Q/A's from SO and the API, so far this works fine.
What I'd like to do next is to convert the Word file into an image and highlight my search results by drawing boxes around my found paragraphs/tables.
I already wrote the parts where I draw boxes around text on a PDF and convert those to images (using PDFBox), and I read that Tika will be able to print my Word doc to a PDF.
But I'm totally clueless as to how to retrieve the position of my text paragraphs/tables. I've searched the API, but the closest I was able to find was the character position in a paragraph (as in the "i-th character in the paragraph"), which tells me nothing right now about where I'm supposed to start/stop drawing my box.
My "Plan-B" would be to "empty print" all the paragraphs I'm not interested in and only "visibly print" my found part into a PDF and retrieve the coordinates there. But I'd really like to avoid that, since I'm afraid there will be other complications to retrieve the exact position if I change the text appearances.
I don't want to draw the box (or otherwise highlight) the text directly in the word doc, since I'm planning to port the presentation part to a web application (and draw the boxes with <div> or something).
Does anyone have an idea how I can proceed on this or know some place I might find a hint or solution?
I've got a peculiar problem. I am a business analyst working on a dashboarding product based on a Java applet. I don't have access to any elements in the Java applet; it's like a black box.
I am looking to print a section of the page without using HTML elements (since I don't have access to the code). The ideal case would be something like window.print() that takes start (X,Y) and end (X,Y) coordinates. Then, in the print dialog, I can select the pdf995 option to print as PDF.
For example, (23,45) to (93,100) pixels should print the area within that range. Percentages instead of pixels would be fine as well.
Any help is greatly appreciated. Thanks guys.
Maybe this will help: How to get the X,Y coordinates of a point in a PDF
That question should have everything you need.
You can use java.awt.Robot and java.awt.Image to take a screenshot and write it to an image file (BMP, JPEG, or whatever you like). Next, you can use JasperReports to create a PDF and view the report file. JasperReports is a powerful library for creating reports; website here
You can find some details on how to take screenshots here. It's also possible to take a screenshot of a region given by coordinates. You can see some examples here
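A region capture with java.awt.Robot could look roughly like this. It is a minimal sketch: the class and helper names are made up for the example, the coordinates are screen pixels (not coordinates inside the applet), and it needs a real display, so it will not work in a headless environment.

```java
import java.awt.AWTException;
import java.awt.Rectangle;
import java.awt.Robot;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

final class RegionCapture {
    /** Builds the capture rectangle from two opposite corners given in screen pixels. */
    static Rectangle regionFromCorners(int x1, int y1, int x2, int y2) {
        return new Rectangle(Math.min(x1, x2), Math.min(y1, y2),
                Math.abs(x2 - x1), Math.abs(y2 - y1));
    }

    /** Same, but with the corners given as percentages (0-100) of the screen size. */
    static Rectangle regionFromPercent(int screenW, int screenH,
            double leftPct, double topPct, double rightPct, double bottomPct) {
        int x1 = (int) Math.round(screenW * leftPct / 100.0);
        int y1 = (int) Math.round(screenH * topPct / 100.0);
        int x2 = (int) Math.round(screenW * rightPct / 100.0);
        int y2 = (int) Math.round(screenH * bottomPct / 100.0);
        return regionFromCorners(x1, y1, x2, y2);
    }

    /** Captures the given screen region and writes it to a PNG file. */
    static void captureToFile(Rectangle region, File out) throws AWTException, IOException {
        BufferedImage shot = new Robot().createScreenCapture(region);
        ImageIO.write(shot, "png", out);
    }
}
```

For the (23,45) to (93,100) case from the question, you would call captureToFile(regionFromCorners(23, 45, 93, 100), new File("region.png")) and then place the resulting image in a PDF with the reporting or PDF library of your choice.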
I have a PDF page that has a logo, 4 address lines and content. I want to move the 4 address lines about 3-4 inches to the left, but keep everything else the same. Is it possible to do this using the Java version of iText?
As far as I know, that's going to be a hard one. That's not a typical iText mission. iText is good at creating PDFs from scratch, not so good at reading or editing them. (You can easily add content, though.)