Automating filling out PDFs with iText and Java

I'm looking to automate filling out a PDF contract by putting variable text (such as a date or dollar value) in predefined locations. I've been trying to wrap my head around iText as a solution but am having trouble actually stamping text onto a PDF. I would really appreciate an example snippet that simply stamps a piece of text at a specific (x,y) coordinate of a PDF file.
If there is a better solution than iText to my problem, I would also love to hear about it.
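
For orientation, whichever library does the stamping, what ends up on the page is a short run of PDF text operators appended to the page's content stream. Below is a minimal sketch of those operators in plain Java. StampOperators and its methods are made-up names, not iText API, and the font resource name F1 is an assumption (the page would need a matching font resource). Coordinates are in points (1/72 inch) with the origin at the bottom-left corner of the page by default.

```java
import java.util.Locale;

final class StampOperators {
    /** Escapes the characters that are special inside a PDF literal string. */
    static String escape(String text) {
        return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /**
     * Builds the operators that show `text` with its baseline starting at (x, y).
     * q/Q save and restore the graphics state so nothing leaks into later content.
     */
    static String textAt(String fontResource, float fontSize, float x, float y, String text) {
        return String.format(Locale.ROOT,
                "q BT /%s %.2f Tf %.2f %.2f Td (%s) Tj ET Q",
                fontResource, fontSize, x, y, escape(text));
    }

    public static void main(String[] args) {
        // Hypothetical contract values placed one inch from the left, near the top of a page.
        System.out.println(textAt("F1", 12f, 72f, 700f, "2024-01-31"));
        System.out.println(textAt("F1", 12f, 72f, 680f, "$1,250.00 (net)"));
    }
}
```

A stamping library wraps exactly this kind of fragment for you and also takes care of registering the font on the page.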

Related

How to replace a variable in a pptx-slide with an image with PPTX4J?

I need to replace a variable ${var} with an image in a pptx slide. The image also has to have a specific size (I don't know which yet). I think inserting the image at a given position in the slide would do the same for me, but the documentation for pptx4j (which gives an example of how to insert a picture in a slide) couldn't really help me.
Many thanks in advance! :-)

It is relatively hard to program such image manipulation for PowerPoint files.
Maybe SlideMight would be a suitable solution for your use case:
This utility lets you merge data with PowerPoint templates: both text and images, in slides and tables. The usage is in principle similar to mail merge, with some more advanced features.
See www.slidemight.com.
Disclaimer: I am the developer and seller of SlideMight.

Retrieving text position (as px/in/cm coordinates) in a Word document

I'm using Apache POI to search Word files (doc and docx) to find specified paragraphs and tables. Using various Q/A's from SO and the API, so far this works fine.
What I'd like to do next is to convert the Word file into an image and highlight my search results by drawing boxes around my found paragraphs/tables.
I already wrote the parts where I draw boxes around text on a PDF and convert those to images (using PDFBox), and I read that Tika will be able to print my Word doc to a PDF.
But I'm totally clueless as to how to retrieve the position of my text paragraphs/tables. I've searched the API, but the closest I was able to find was the character position in a paragraph (as in the "i-th character in the paragraph"), which tells me nothing about where I'm supposed to start/stop drawing my box.
My "Plan-B" would be to "empty print" all the paragraphs I'm not interested in and only "visibly print" my found part into a PDF and retrieve the coordinates there. But I'd really like to avoid that, since I'm afraid there will be other complications to retrieve the exact position if I change the text appearances.
I don't want to draw the box (or otherwise highlight) the text directly in the word doc, since I'm planning to port the presentation part to a web application (and draw the boxes with <div> or something).
Does anyone have an idea how I can proceed on this or know some place I might find a hint or solution?

How to move block of text in a PDF (using PDFBox)

I'm currently trying to generate a PDF with PDFBox for a manual cover, and I was wondering if it is possible to take a precise zone of text in my PDF and move it (to the left) depending on my manual's thickness (which will be determined by the number of pages the manual has).
I manage to create my PDF just fine, but I did not find a way to get only a block of text.
Is it possible to do so with PDFBox?
Note: I tried to search on the web and in other questions, but none of them were useful.
Thanks

Wrap saveGraphicsState() and restoreGraphicsState() around that block. Within that, use moveTextPositionByAmount() (after beginText() !) or setTextTranslation() or (more general) concatenate2CTM(1, 0, 0, 1, tx, ty) to move the position.
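
To make the last option concrete: the six numbers form a PDF transformation matrix, and (1, 0, 0, 1, tx, ty) is a pure translation. Here is a small plain-Java sketch of the arithmetic. ShiftBlock and the spine value are made up for illustration; this is not PDFBox code.

```java
final class ShiftBlock {
    /** PDF matrix [a b c d e f] for a pure translation, in the order the cm operator takes it. */
    static double[] translation(double tx, double ty) {
        return new double[] {1, 0, 0, 1, tx, ty};
    }

    /** Applies a PDF matrix to a point: x' = a*x + c*y + e, y' = b*x + d*y + f. */
    static double[] apply(double[] m, double x, double y) {
        return new double[] {m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]};
    }

    public static void main(String[] args) {
        double spine = 12.5; // hypothetical spine thickness in points
        // Moving "to the left" means a negative tx.
        double[] moved = apply(translation(-spine, 0), 300, 500);
        System.out.println(moved[0] + ", " + moved[1]); // prints 287.5, 500.0
    }
}
```

Everything drawn between the save and the restore is shifted by the same amount, so the block's own coordinates do not have to change.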

Like I said in a comment, we decided to create a new PDF with the information needed instead of trying to edit an existing one.
We tried to edit the PDF, but in vain, and once we decided to start from scratch it was much easier.
Thanks to the people who tried to help me!

How to read a PDF using iText and Java and get a table cell's height

First I created a PDF using iText and Java and put a table with a table cell in it:
PdfPTable table = new PdfPTable(2);
table.setWidths(new int[]{1, 2});
PdfPCell cell;
table.addCell("Address:");
cell = new PdfPCell(new Phrase(""));
cell.setFixedHeight(60);
table.addCell(cell);
I have another program which reads this PDF file:
PdfReader reader = new PdfReader("path_of_previously_created_pdf");
Now I want to get the table cell and change its height with cell.setFixedHeight(new_Fixed_Height);
Is that possible?
If yes, how?
Thanks in advance

If your PDF contains just that simple 1x2 table, it would of course be possible to implement something that gives you the PDF with a cell height of your choice.
But I assume it is eventually meant to contain more. The code you provided via your Google Drive already included more (more table cells plus form elements), and that code, too, looks unfinished as far as the PDF construction is concerned. Thus, ...
The direct answer
It is not possible.
First of all the table and cell objects you have while creating the PDF are not present as such in the resulting file, they merely are drawn as a number of lines and some text (or whatever you put into the cells).
Thus, you cannot even retrieve the cells you want to change, let alone change it.
The twisted answer
You could, of course, try and parse the page content stream to find the commands for drawing lines, find those ones among them which were drawn for the cell you are interested in, and try to derive the original cell dimension attributes from the line coordinates. Afterwards you can attempt to move everything below the cell down to create the extra space you want.
Depending on the information you have (Do you know the approximate position of the cell? If not, do you at least know some unique content of it?) reading the current cell height will include some guesswork and much coding because unfortunately the iText parser framework does not yet support parsing path operations.
Essentially you have to enhance the classes in the PDF parser package to also process and emit events for PDF path operators (if you know your way around in iText and the PDF specification that should not take more than a week or two) and create an appropriate event listener to find the lines surrounding the cell position you already know (not more than one day of work). Some iText code analysis will show how the fixed cell height and the distance of the surrounding lines relate.
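
Once the horizontal rules have been extracted (which is the hard part described above), the guesswork itself is small. Here is a rough sketch that assumes the listener has already collected the horizontal line segments and that you know a point inside the cell. CellHeightGuess and HLine are hypothetical names, not iText classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalDouble;

final class CellHeightGuess {
    /** A horizontal line segment from (x1, y) to (x2, y) in page coordinates. */
    static final class HLine {
        final double x1, x2, y;
        HLine(double x1, double x2, double y) {
            this.x1 = Math.min(x1, x2);
            this.x2 = Math.max(x1, x2);
            this.y = y;
        }
        boolean covers(double x) { return x1 <= x && x <= x2; }
    }

    /**
     * Distance between the nearest rules above and below (px, py).
     * In PDF coordinates y grows upwards, so "above" means a larger y.
     */
    static OptionalDouble heightAround(List<HLine> lines, double px, double py) {
        double below = Double.NEGATIVE_INFINITY;
        double above = Double.POSITIVE_INFINITY;
        for (HLine line : lines) {
            if (!line.covers(px)) continue;
            if (line.y <= py && line.y > below) below = line.y;
            if (line.y > py && line.y < above) above = line.y;
        }
        if (Double.isInfinite(below) || Double.isInfinite(above)) return OptionalDouble.empty();
        return OptionalDouble.of(above - below);
    }

    public static void main(String[] args) {
        // Made-up rules of a small table; the probe point lies in the middle row.
        List<HLine> rules = Arrays.asList(
                new HLine(50, 500, 700), new HLine(50, 500, 640), new HLine(50, 500, 600));
        System.out.println(heightAround(rules, 300, 660));
    }
}
```

The result is only the distance between the border lines; how that relates to the fixed height set at creation time still has to come from the iText code analysis mentioned above.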
Most likely, though, this is the smaller part of your work. The bigger part is actually manipulating the page content:
If you are lucky, all your page content is located in a single content stream. In that case you merely have to analyse all the page content again, but this time to actually change it. The easiest way would be to enhance the classes in the parser package once again (because they already do much of the necessary math and book-keeping) to signal every command from the content stream with normalized coordinates (this might take a week or two). Based on the information signaled to you, you build an all-new content stream in which you leave everything above your cell, move down everything below, and stretch everything crossing the line on which the bottom border of your cell lies (another week maybe).
If you are less lucky you have to fight with multiple included form xobjects crossing the line. As those xobjects may be used from other streams also, you cannot change them but have to either change a copy or include the xobject content in your newly created stream.
Then what about images crossing the line? Or interesting patterns? In that case stretching the cell will utterly distort everything.
And then there are annotations, e.g. your form fields. You need to shift and stretch them, too.
Thus, while this approach is possible to follow, please be aware that (depending on how generic the solution has to become) its implementation will take someone knowing iText and PDF some months.
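
For the simplest case above (a single content stream, each item reduced to a bounding box), the keep / move / stretch decision might look like the following sketch. GrowCell is a hypothetical helper, and real content needs far more than a box per item.

```java
import java.util.Arrays;

final class GrowCell {
    /**
     * Returns {newBottom, newTop} for an item after `extra` points of height are
     * inserted at the cell's bottom border `borderY` (y grows upwards in PDF).
     */
    static double[] adjust(double bottom, double top, double borderY, double extra) {
        if (bottom >= borderY) {
            return new double[] {bottom, top};                 // entirely above: untouched
        }
        if (top <= borderY) {
            return new double[] {bottom - extra, top - extra}; // entirely below: moved down
        }
        return new double[] {bottom - extra, top};             // crossing the border: stretched
    }

    public static void main(String[] args) {
        double borderY = 640, extra = 20; // made-up values
        System.out.println(Arrays.toString(adjust(650, 700, borderY, extra)));
        System.out.println(Arrays.toString(adjust(500, 600, borderY, extra)));
        System.out.println(Arrays.toString(adjust(600, 700, borderY, extra)));
    }
}
```

The third case is the one that distorts images and patterns, and annotations such as form fields need the same treatment through their rectangles.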
An alternative approach
You say in a comment
I am working on Pdf Form.I have created itext form using TextField(MULTILINE TEXT) once. After read this pdf and fill up the form but when the content increases it shows scroll Bar and content hide. My problem is Once i print the pdf it did't print hide content.
Why don't you simply build an individual PDF for each set of data, with all the cells big enough for the form contents of the respective data set, and copy the field values into this new PDF? This is a fairly simple approach, yet flexible enough to not waste too much space while still not hiding content.
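
One way to decide how big "big enough" is: wrap the field value to the cell width, count the lines, and multiply by the leading. Below is a rough sketch in which measure stands in for whatever text-width function your PDF library offers. FieldHeight and its parameters are hypothetical names.

```java
import java.util.function.ToDoubleFunction;

final class FieldHeight {
    /** Greedy word wrap; a single word wider than maxWidth stays on its own line. */
    static int countLines(String text, double maxWidth, ToDoubleFunction<String> measure) {
        int lines = 0;
        for (String paragraph : text.split("\n", -1)) {
            lines++; // every paragraph occupies at least one line, even if empty
            StringBuilder current = new StringBuilder();
            for (String word : paragraph.trim().split("\\s+")) {
                if (word.isEmpty()) continue;
                String candidate = current.length() == 0 ? word : current + " " + word;
                if (current.length() > 0 && measure.applyAsDouble(candidate) > maxWidth) {
                    lines++;                 // word does not fit: start a new line with it
                    current.setLength(0);
                    current.append(word);
                } else {
                    current.setLength(0);
                    current.append(candidate);
                }
            }
        }
        return lines;
    }

    /** Height needed for the wrapped text plus padding at top and bottom. */
    static double requiredHeight(String text, double maxWidth, double leading,
                                 double padding, ToDoubleFunction<String> measure) {
        return countLines(text, maxWidth, measure) * leading + 2 * padding;
    }

    public static void main(String[] args) {
        // Crude stand-in metric: every character is 6 points wide.
        ToDoubleFunction<String> measure = s -> s.length() * 6.0;
        System.out.println(requiredHeight("12 Example Street\nSpringfield", 60, 14, 2, measure));
    }
}
```

The computed value would then be what you pass to setFixedHeight when building the PDF for that data set.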

Alternative to latex / a way to typeset good looking documents from Java to PDF

I'm working on an application in Java that will maintain a database of song lyrics in plain text and print out songbooks/chordbooks (that is, create a PDF file from selected songs). I was planning for the Java application to generate source code for pdflatex, so that after compiling this source the user gets a PDF file.
Lately I've run into a lot of problems because of LaTeX limitations: fixed memory size (some pictures will also be drawn to the PDF) with an error when it is exceeded, no way to query the end of a line or end of a page dynamically, it's very hard to override the LaTeX placement algorithm in a complex way, ... See also some of my other questions regarding LaTeX. I have come to the conclusion that LaTeX is not a good option for automated PDF generation.
So I need replacement. I need to be able to typeset:
Chords over lyrics when the lyrics are in variable char width so I need to be able to measure text width
Chord diagrams that means I'll have to draw quite complex pictures
Each song on separate double page
Different fonts etc.
Thanks for all answers
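
To illustrate the first requirement: placing a chord over proportionally spaced lyrics comes down to measuring the width of the lyric text that precedes it. Below is a sketch with a hypothetical per-character width table in 1/1000 em units (the convention PDF font metrics use). The numbers are made up, and a real font's metrics would replace them.

```java
import java.util.HashMap;
import java.util.Map;

final class ChordPlacement {
    // Hypothetical advance widths in 1/1000 em; not taken from any real font.
    static final Map<Character, Integer> WIDTHS = new HashMap<>();
    static final int DEFAULT_WIDTH = 500;
    static {
        WIDTHS.put('i', 250);
        WIDTHS.put('l', 250);
        WIDTHS.put(' ', 250);
        WIDTHS.put('m', 800);
        WIDTHS.put('w', 750);
    }

    /** Width of `text` in points at the given font size. */
    static double width(String text, double fontSize) {
        long units = 0;
        for (char c : text.toCharArray()) {
            units += WIDTHS.getOrDefault(c, DEFAULT_WIDTH);
        }
        return units * fontSize / 1000.0;
    }

    /** Horizontal offset at which a chord should start to sit over lyric.charAt(charIndex). */
    static double chordX(String lyric, int charIndex, double fontSize) {
        return width(lyric.substring(0, charIndex), fontSize);
    }

    public static void main(String[] args) {
        // Chord change on "you" in a made-up lyric line, 10 pt font.
        System.out.println(chordX("will you", 5, 10));
    }
}
```

Any replacement for LaTeX therefore needs to expose font metrics in some form, which is worth checking first when evaluating a library.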

Here are some PDF open source APIs
http://java-source.net/open-source/pdf-libraries
This has been asked many times. You might want to look at this post

iText is a free library which offers lots of capabilities for creating PDFs programmatically.

Rather than trying to manage/calculate the complexities of the desired layout, you could try Docmosis. It lets you lay out a document as a template using the doc or odt format. This means that if you can make a doc or odt look the way you want, you can turn it into a template and get Docmosis to render it as a PDF. Text and images can be placed inside or outside tables, which makes layout fairly easy to manage.

ConTeXt is another TeX system, but it is easier to control the layout than with LaTeX. For drawing you could use PGF/TikZ or MetaPost. Support for both is available in ConTeXt. With ConTeXt's built in Lua scripting you could draw the chords automatically, assuming you have them stored in some sort of data structure.

Why not just use LilyPond with LaTeX? It's meant for typesetting music.
