I would like to know whether, and how, one can iterate through the content of an input PDF file using iText in Java, find all the path elements (the lines one can draw in a PDF) and, if one or more exist, change their properties, such as the stroke width or the overprint settings.
I don't have any code to start from, so I am hoping for some direction and examples to follow.
I have looked into many code examples here https://www.demo2s.com/java/itext-pdf-folder-section.html, but I cannot find anything in regard to my needs.
Related
I know there are many suggestions on how to resolve problems with editing existing PDFs, but among all of those I couldn't find a solution for my problem.
I need to add information about file acceptance ("Document accepted by Tom Smith, 2020-01-01", possibly multiple acceptances) to the last page of the PDF. I need to determine whether the page is full or whether there is enough space for my text.
I wanted to find the position (y) of the last element on the last page of the PDF and check it against the page size. If the page is full, I'm going to add a new page and then add the new text.
I have no idea how to resolve this. I tried looking for answers with iText and PDFBox, but found no satisfying solutions.
Raster Image based approach:
Render the last page of the PDF to a bitmap image with any library you are comfortable with (Ghostscript?). 72 dpi should be enough for your purpose.
Then you can use any image processing library, such as OpenCV, and check rectangular areas from the bottom up for pixels that are not background. OpenCV's countNonZero() function is very fast for this.
You can also find any large white zone anywhere in the image, not just at the bottom. This link could be your starting point:
https://answers.opencv.org/question/72939/how-to-find-biggest-white-zone-in-an-scanned-image/
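The bottom-up check described above can also be sketched in plain Java without OpenCV. This is only a minimal sketch: it assumes the last page has already been rendered to a BufferedImage on a white background, and the class name, method names and threshold value are made up for illustration. At 72 dpi one pixel corresponds to one PDF point, so the number of free rows approximates the free height in points.

```java
import java.awt.image.BufferedImage;

// Hypothetical helper: finds how much empty space is left at the bottom
// of a rendered page image (white background assumed).
final class LastContentRow {

    /** Creates an all-white RGB image, standing in for a rendered blank page. */
    static BufferedImage blankPage(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                img.setRGB(x, y, 0xFFFFFF);
            }
        }
        return img;
    }

    /**
     * Scans rows from the bottom up and returns the index of the lowest row
     * that contains a pixel darker than the threshold in any channel,
     * or -1 if the whole page is blank.
     */
    static int lowestInkRow(BufferedImage page, int threshold) {
        for (int y = page.getHeight() - 1; y >= 0; y--) {
            for (int x = 0; x < page.getWidth(); x++) {
                int rgb = page.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                if (r < threshold || g < threshold || b < threshold) {
                    return y;
                }
            }
        }
        return -1;
    }

    /** Number of completely blank rows below the lowest row with content. */
    static int freeRowsAtBottom(BufferedImage page, int threshold) {
        return page.getHeight() - 1 - lowestInkRow(page, threshold);
    }

    public static void main(String[] args) {
        BufferedImage page = blankPage(595, 842); // roughly A4 at 72 dpi
        page.setRGB(100, 600, 0x000000);          // simulate some content
        System.out.println("Free rows: " + freeRowsAtBottom(page, 250));
    }
}
```

If the free height is smaller than the height needed for the acceptance text (plus a margin), a new page would be added before writing.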
I'm using Apache POI to search Word files (doc and docx) to find specified paragraphs and tables. Using various Q/A's from SO and the API, so far this works fine.
What I'd like to do next is to convert the Word file into an image and highlight my search results by drawing boxes around my found paragraphs/tables.
I already wrote the parts where I draw boxes around text in a PDF and convert those to images (using PDFBox), and I read that Tika will be able to print my Word doc to a PDF.
But I'm totally clueless as to how to retrieve the position of my text paragraphs/tables. I've searched the API, but the closest I was able to find was the character position in a paragraph (as in the "i-th character in the paragraph"), which tells me nothing about where I'm supposed to start/stop drawing my box.
My "Plan B" would be to "empty print" all the paragraphs I'm not interested in, "visibly print" only my found part into a PDF, and retrieve the coordinates there. But I'd really like to avoid that, since I'm afraid there will be other complications in retrieving the exact position if I change the text appearance.
I don't want to draw the box around (or otherwise highlight) the text directly in the Word doc, since I'm planning to port the presentation part to a web application (and draw the boxes with <div> or something).
Does anyone have an idea how I can proceed on this or know some place I might find a hint or solution?
I'm currently trying to generate a PDF with PDFBox for a manual cover, and I was wondering whether it is possible to take a precise zone of text in my PDF and move it (to the left) depending on my manual's thickness (which is determined by the number of pages the manual has).
I manage to create my PDF just fine, but I have not found a way to get only a block of text.
Is it possible to do so with PDFBox?
Note: I tried to search on the web and in other questions, but none of them were useful.
Thanks
Wrap saveGraphicsState() and restoreGraphicsState() around that block. Within that, use moveTextPositionByAmount() (after beginText()!), setTextTranslation() or, more generally, concatenate2CTM(1, 0, 0, 1, tx, ty) to move the position.
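To see what the translation matrix (1, 0, 0, 1, tx, ty) does to coordinates, here is a small sketch using java.awt.geom.AffineTransform, whose six-value constructor takes its values in the same order as a PDF matrix. It only illustrates the math and does not touch PDFBox; the class name, the sample point and the offset are hypothetical.

```java
import java.awt.geom.AffineTransform;
import java.awt.geom.Point2D;

// Illustration only: shows how a pure translation matrix moves a point.
final class TranslateCtmDemo {

    /** Applies the matrix [1 0 0 1 tx ty] to the point (x, y). */
    static Point2D shift(double x, double y, double tx, double ty) {
        AffineTransform ctm = new AffineTransform(1, 0, 0, 1, tx, ty);
        return ctm.transform(new Point2D.Double(x, y), null);
    }

    public static void main(String[] args) {
        // Hypothetical values: text block at (100, 700), moved 12.5 units left.
        Point2D moved = shift(100, 700, -12.5, 0);
        System.out.println(moved.getX() + ", " + moved.getY());
    }
}
```

A negative tx moves everything drawn afterwards to the left, and restoring the graphics state afterwards undoes the shift for the rest of the page.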
As I said in a comment, we decided to create a new PDF with the information needed instead of trying to edit an existing one.
We tried to edit the PDF, but in vain, and once we decided to start from scratch it was much easier.
Thanks to the people who tried to help me!
First, I created a PDF using iText and Java and added a table with a cell:
PdfPTable table = new PdfPTable(2);
table.setWidths(new int[]{1, 2});
PdfPCell cell;
table.addCell("Address:");
cell = new PdfPCell(new Phrase(""));
cell.setFixedHeight(60);
table.addCell(cell);
I have another program which reads this PDF file:
PdfReader reader = new PdfReader("path_of_previously_created_pdf");
Now I want to get the table cell and change its height with cell.setFixedHeight(new_Fixed_Height);
Is this possible? If yes, how?
Thanks in advance
If your PDF contains just that simple 1x2 table, it would of course be possible to implement something that gives you the PDF with a cell height of your choice.
But I assume it eventually is meant to contain more. Already the code you provided via your google drive included more (more table cells plus form elements), and that code, too, does look unfinished concerning the PDF construction. Thus,...
The direct answer
It is not possible.
First of all the table and cell objects you have while creating the PDF are not present as such in the resulting file, they merely are drawn as a number of lines and some text (or whatever you put into the cells).
Thus, you cannot even retrieve the cells you want to change, let alone change them.
The twisted answer
You could, of course, try and parse the page content stream to find the commands for drawing lines, find those ones among them which were drawn for the cell you are interested in, and try to derive the original cell dimension attributes from the line coordinates. Afterwards you can attempt to move everything below the cell down to create the extra space you want.
Depending on the information you have (Do you know the approximate position of the cell? If not, do you at least know some unique content of it?) reading the current cell height will include some guesswork and much coding because unfortunately the iText parser framework does not yet support parsing path operations.
Essentially you have to enhance the classes in the PDF parser package to also process and emit events for PDF path operators (if you know your way around in iText and the PDF specification that should not take more than a week or two) and create an appropriate event listener to find the lines surrounding the cell position you already know (not more than one day of work). Some iText code analysis will show how the fixed cell height and the distance of the surrounding lines relate.
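To give an idea of what finding the lines in a content stream means, here is a deliberately naive sketch in plain Java. It is not iText code and not a real PDF parser: it assumes the content stream is already decompressed and available as a string, that it contains no strings, arrays, inline images or curves, and that no transformation matrix is in effect. The class and method names and the sample stream are made up for illustration. It collects the straight segments drawn with the m, l and re operators.

```java
import java.util.ArrayList;
import java.util.List;

// Naive sketch: extracts straight line segments from a simplified,
// whitespace-separated PDF content stream (many PDF features ignored).
final class PathScanSketch {

    static final class Segment {
        final double x1, y1, x2, y2;

        Segment(double x1, double y1, double x2, double y2) {
            this.x1 = x1; this.y1 = y1; this.x2 = x2; this.y2 = y2;
        }

        boolean isHorizontal() {
            return Math.abs(y1 - y2) < 1e-6;
        }
    }

    /** Returns the token as a number, or null if it is an operator. */
    private static Double parseNumber(String token) {
        try {
            return Double.parseDouble(token);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    static List<Segment> segments(String content) {
        List<Segment> out = new ArrayList<>();
        List<Double> ops = new ArrayList<>(); // operands seen since the last operator
        double curX = 0, curY = 0;            // current point of the path
        for (String tok : content.trim().split("\\s+")) {
            Double num = parseNumber(tok);
            if (num != null) {
                ops.add(num);
                continue;
            }
            int n = ops.size();
            if (tok.equals("m") && n >= 2) {          // moveto
                curX = ops.get(n - 2);
                curY = ops.get(n - 1);
            } else if (tok.equals("l") && n >= 2) {   // lineto
                double x = ops.get(n - 2), y = ops.get(n - 1);
                out.add(new Segment(curX, curY, x, y));
                curX = x;
                curY = y;
            } else if (tok.equals("re") && n >= 4) {  // rectangle: x y width height
                double x = ops.get(n - 4), y = ops.get(n - 3);
                double w = ops.get(n - 2), h = ops.get(n - 1);
                out.add(new Segment(x, y, x + w, y));
                out.add(new Segment(x + w, y, x + w, y + h));
                out.add(new Segment(x + w, y + h, x, y + h));
                out.add(new Segment(x, y + h, x, y));
            }
            ops.clear(); // every operator consumes its operands
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up stream: two horizontal borders and one vertical border.
        String stream = "q 0.5 w 36 700 m 559 700 l S 36 640 m 559 640 l S 36 640 m 36 700 l S Q";
        for (Segment s : segments(stream)) {
            System.out.println((s.isHorizontal() ? "horizontal " : "other ")
                    + s.x1 + "," + s.y1 + " -> " + s.x2 + "," + s.y2);
        }
    }
}
```

The distance between the two horizontal segments enclosing a known position would then be the starting point for estimating the cell height. A real implementation has to track the graphics state and the current transformation matrix, which is where most of the effort estimated above goes.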
Most likely, though, this is the smaller part of your work. The bigger part is actually manipulating the page content:
If you are lucky, all your page content is located in a single content stream. In that case you merely have to analyse all the page content again, but this time to actually change it. The easiest way would be to enhance the classes in the parser package once again (because they already do much of the necessary math and book-keeping) to signal every command from the content stream with normalized coordinates (this might take a week or two). Based on the information signaled to you, you build an all-new content stream in which you leave everything above your cell unchanged, move down everything below it, and stretch everything crossing the line on which the bottom border of your cell lies (another week maybe).
If you are less lucky you have to fight with multiple included form xobjects crossing the line. As those xobjects may be used from other streams also, you cannot change them but have to either change a copy or include the xobject content in your newly created stream.
Then what about images crossing the line? Or interesting patterns? In that case stretching the cell will utterly distort everything.
And then there are annotations, e.g. your form fields. You need to shift and stretch them, too.
Thus, while this approach is possible to follow, please be aware that (depending on how generic the solution has to become) its implementation will take someone knowing iText and PDF some months.
An alternative approach
You say in a comment
I am working on Pdf Form.I have created itext form using TextField(MULTILINE TEXT) once. After read this pdf and fill up the form but when the content increases it shows scroll Bar and content hide. My problem is Once i print the pdf it did't print hide content.
Why don't you simply build an individual PDF for each set of data, with all the cells big enough for the form contents of the respective data set, and copy the field values into this new PDF? This is a fairly simple approach, yet flexible enough to not waste too much space while at the same time not hiding content.
I am trying to solve the problem of template matching: given a pattern image, I have to find whether it exists as a sub-image inside another image, i.e. the source/search image.
I have implemented SAD (Sum of Absolute Differences) in Java, following the solution given on Wikipedia.
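For reference, a fixed-scale SAD search of the kind described here might look roughly like the following. This is only a sketch under assumptions: the images are plain grayscale arrays indexed as [row][column], the template is no larger than the image, and the class and method names are made up. It is not the asker's actual code.

```java
// Baseline sketch: exhaustive SAD template matching at a single scale.
final class SadMatcher {

    /**
     * Slides the template over the image and returns {row, column, sad}
     * for the position with the smallest sum of absolute differences.
     */
    static long[] bestMatch(int[][] image, int[][] template) {
        int imageHeight = image.length, imageWidth = image[0].length;
        int tplHeight = template.length, tplWidth = template[0].length;
        long bestSad = Long.MAX_VALUE;
        int bestRow = -1, bestCol = -1;
        for (int r = 0; r + tplHeight <= imageHeight; r++) {
            for (int c = 0; c + tplWidth <= imageWidth; c++) {
                long sad = 0;
                for (int i = 0; i < tplHeight; i++) {
                    for (int j = 0; j < tplWidth; j++) {
                        sad += Math.abs(image[r + i][c + j] - template[i][j]);
                    }
                }
                if (sad < bestSad) { // keep the first best position on ties
                    bestSad = sad;
                    bestRow = r;
                    bestCol = c;
                }
            }
        }
        return new long[] {bestRow, bestCol, bestSad};
    }

    public static void main(String[] args) {
        int[][] image = {
            {0, 0, 0, 0},
            {0, 0, 9, 8},
            {0, 0, 7, 6},
            {0, 0, 0, 0}
        };
        int[][] template = {{9, 8}, {7, 6}};
        long[] match = bestMatch(image, template);
        System.out.println("row=" + match[0] + " col=" + match[1] + " sad=" + match[2]);
    }
}
```

A SAD of 0 means an exact match at that position. The search only compares the template at its given size, which is exactly the limitation the rest of this question is about.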
Now, I want to implement a scale-invariant template matching solution for the given problem.
I don't care at which scale the pattern image is given; if there is a match at some scale, it should be reported.
Things I have tried so far:
1. I have looked at SIFT, but it is too complicated to implement.
2. Image fingerprinting as used by TinEye is not that effective either, as it is useful for finding similar images, not sub-images.
3. I cannot use the exhaustive approach I used earlier, as there could be many scales at which there is a match.
4. Custom approach: classify the image into several color groups, e.g. 255 colors, but I am not sure that would help with the scale-invariance problem.
I saw some questions, but they are about rotation and scale invariance; I just need scale invariance.
This is one of my semester projects, which I am currently working on.
Can someone guide me about possible solutions and approaches that I can take?