How to (vertically) align text of PDTextField in PDFBox? - java

I have written a form with LibreOffice that includes some named fields to fill out via PdfBox. The fields are properly set up with text alignment (all left-centered), but PdfBox simply ignores the vertical alignment, both on text fields and on labels. The fields get aligned to the bottom and the labels to the top. Besides the visual difference in presentation, letters with descenders like "g" also get cut off at the bottom of the field values.
I would provide an image, but I don't have enough reputation...
Assuming I can't be the only one with this problem, I searched and found this answer here on SO: "How to (horizontally) align text of PDTextField in PDFBox?".
It mentions a method that provides horizontal alignment:
textBox.setQ(PDTextField.QUADDING_CENTERED);
I've tested this and it works.
The overall call in my program is rather simple:
PDDocumentCatalog docCatalog = _pdf.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
PDTextField field = (PDTextField) acroForm.getField(name);
if (field != null)
{
    field.setQ(PDTextField.QUADDING_LEFT);
    field.setValue(value);
}
else
{
    LOGGER.log(Level.ERROR, "No field found with name:" + name);
}
Unfortunately (or out of sheer stupidity on my end; I blame the temperatures), I didn't find any way to force PdfBox to any alignment on the vertical axis.
Any help would be much appreciated. Thanks.
EDIT:
As requested, here is an example (not the real version, because that is a document from work; nonetheless it has the same construction as the original):
https://drive.google.com/file/d/1JKyuFqkb9B8Q9M61uxncAJXzMOo7zvJ5/view?usp=sharing
The field is named err_desc.
I have also included in the PDF a picture of the result in the original, filled-in form.
I've also tried simply making the text field taller. Unfortunately that didn't work: the text seems to stick to the bottom, and the extra height just leaves more space above it.
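For reference, vertical alignment of a single line comes down to where the baseline sits inside the field rectangle. The following is a minimal sketch of that arithmetic only, not a PdfBox API: the class VerticalAlign and its methods are hypothetical, font metrics are assumed to be in the usual 1/1000 glyph-space units (ascent positive, descent negative, as in a PDF font descriptor), and y = 0 is assumed to be the bottom edge of the field box.

```java
/**
 * Hypothetical helper (not part of PdfBox): baseline arithmetic for vertically
 * centering one line of text in a field box. Metrics are in 1/1000 glyph-space
 * units (ascent > 0, descent < 0); y = 0 is the bottom edge of the box.
 */
final class VerticalAlign {
    private VerticalAlign() {}

    /** Baseline y (points) leaving equal space above the ascent and below the descent. */
    static float centeredBaseline(float boxHeight, float fontSize, float ascent, float descent) {
        float ascentPt = ascent / 1000f * fontSize;
        float descentPt = -descent / 1000f * fontSize; // distance below the baseline, made positive
        float lineHeight = ascentPt + descentPt;
        // Center the line box in the field, then lift the baseline by the descent.
        return (boxHeight - lineHeight) / 2f + descentPt;
    }

    /** True if descenders (as in "g") stay inside the box for the given baseline. */
    static boolean descendersFit(float baseline, float fontSize, float descent) {
        return baseline + descent / 1000f * fontSize >= 0f;
    }
}
```

As an illustration, a 20 pt high field with 10 pt Helvetica (AFM ascender 718, descender -207) gives a baseline at about 7.4 pt, leaving roughly 5.4 pt above and below the text. Any baseline closer to the bottom edge than the scaled descent (2.07 pt here) clips descenders, which would match the cut-off "g" described above.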

Related

Placeholders for a text in a pdf Java-PDFBox?

Can we make placeholders for a text in a pdf and mark them with an id (similar to html tags) and just fill that placeholder with our text, of whichever length in Java, using PdfBox?
Can we make placeholders for a text in a pdf and mark them with an id (similar to html tags) and just fill that placeholder with our text, of whichever length
No, at least not without a great deal of coding around it.
The reason is that PDF is a format for documents with a finished layout.
If you fill that placeholder with text of arbitrary length (in particular, with a long text), the contents of the document would have to be re-flowed: text following the placeholder would have to be moved down, text already at the bottom of the page body would have to be moved to the next page, etc.
As PDF documents in general don't contain information on stuff like margins, text alignments, etc., that task is severely non-trivial.
(There also are other issues, e.g. embedded font subsets without the glyphs of your replacement text or backgrounds or borders without linkage to the "backgrounded" or "bordered" text.)
I'm not aware of an automated general-purpose implementation of that task, in particular not in free PDF libraries.

Different position of text by flatten pdf with iText

I have a problem with iText and flattening form fields in PDFs.
I pass a PDF with form fields created in Acrobat to my Java method. On a website I have created a form to fill the form fields in the PDF. The form fields are filled correctly, but as soon as I flatten the document, the text moves to a slightly different position. The biggest difference is seen in multiline form fields: there the text sits right at the upper-left border of the field, whereas in Acrobat and before flattening the text has some padding at the top.
Here is my Java code that calls the iText methods:
PdfReader template = new PdfReader(templ);
XfdfReader xfdfReader = new XfdfReader(xfdf);
OutputStream outputStream = new FileOutputStream(output);
PdfStamper stamper = new PdfStamper(template, outputStream, '\0');
AcroFields form = stamper.getAcroFields();
Set<String> fields = form.getFields().keySet();
form.setFields(xfdfReader);
stamper.setFreeTextFlattening(true);
stamper.setFormFlattening(true);
stamper.close();
template.close();
Does anyone have an idea why the text moves when I flatten the PDF? How can I avoid this?
I already tried different versions of iText, from 4.x to 5.x. The difference appears in all versions.
I also tried to move the form fields in code with iText, but then the whole field moves, and the position difference is much bigger and not predictable.
In my project the text must be at exactly the same position as in Acrobat, so I must find a workaround for this misbehavior. I hope somebody can help me.
Thanks for your help in advance.
The position of the baseline of a field in a PDF file has changed over the years. You'll even see differences depending on the version of Acrobat you are using.
There is no solution for your problem unless you know the exact offset. If you do, you can use the setExtraMargin() method to change the offset of all fields when flattening the document.
We created this method to deal with specific forms that have a baseline that differs from what is expected. The values you choose may differ from form to form.

Find invisible text in iText

I am creating a multi-page PDF document using iText. On one of the pages in the middle of the document I am adding some unique text, but making it invisible, as follows:
Chunk chunk = new Chunk("invisible text here");
chunk.setTextRenderMode(PdfContentByte.TEXT_RENDER_MODE_INVISIBLE, 0f, null);
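Paragraph paragraph = new Paragraph(chunk);
paragraph.setAlignment(Element.ALIGN_JUSTIFIED); // new Paragraph(Element.ALIGN_JUSTIFIED, chunk) would resolve to Paragraph(float leading, Chunk)
iTextDoc.add(paragraph); // iTextDoc is the com.lowagie.text.Document being built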
The reason for adding this invisible text is to identify this particular page at the time of onEndPage(). But it is failing.
To detect it in onEndPage(), I have the following code:
boolean b = (pdfWriter.getDirectContent().toString()).contains("invisible text here");
I get the value of b as false.
If I check for any other (visible) text on that page, b is true.
I tried manually searching for the invisible text in the PDF reader, and it finds the text.
What could I modify to achieve this?
It is never a good idea to assume you can recognize text in the content without elaborate parsing. The text may be split into multiple segments, the encoding might not be the platform's default character encoding, etc. Thus don't try something like
boolean b = (pdfWriter.getDirectContent().toString()).contains("invisible text here");
You can achieve your goal
The reason for adding this invisible text is to identify this particular page at the time of onEndPage().
much more easily. Simply add a member variable to your PdfPageEvent implementation, i.e. the class with your onEndPage() method, and set it at the point where you used to add the invisible text to the page.
Now you can test that member variable directly in your onEndPage(). Don't forget to reset the variable afterwards, preferably in onEndPage() itself!
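A minimal sketch of that member-variable approach, with the iText types left out so it runs on its own. MarkerPageEvent, markCurrentPage() and the int-only onEndPage() are hypothetical simplifications; in real code the class would extend iText's PdfPageEventHelper and override onEndPage(PdfWriter, Document).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical, simplified stand-in for the page event class. In iText it would
 * extend PdfPageEventHelper; here onEndPage only receives a page number so the
 * flag handling can be shown on its own.
 */
final class MarkerPageEvent {
    private boolean markerOnCurrentPage = false;          // replaces the invisible marker text
    private final List<Integer> markedPages = new ArrayList<>();

    /** Call this where the invisible chunk used to be added to the document. */
    void markCurrentPage() {
        markerOnCurrentPage = true;
    }

    /** Mirrors onEndPage(): test the member variable, then reset it right away. */
    void onEndPage(int pageNumber) {
        if (markerOnCurrentPage) {
            markedPages.add(pageNumber);
        }
        markerOnCurrentPage = false;
    }

    List<Integer> markedPages() {
        return markedPages;
    }
}
```

Resetting the flag inside onEndPage() itself, as the answer recommends, ensures a marker set for one page cannot leak into the next.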

PDF Fields Appear Invisible with PDF Clown

I use the Java version of PDF Clown to fill out the fields of PDF Acroforms. This works great and I'm able to programmatically fill out forms and save them without any issues.
However, some PDF viewers render some of the text invisible in the fields I'm filling out, unless you click on them in which case they become visible. This forum post explains that this can happen in form-fillable PDFs in general and that it can be fixed by setting the background color of the PDF field to "None", even if the GUI already says that the background color is "None." This has worked for others and I'd like to try it for myself.
Unfortunately, I'm stuck on how to actually do this in PDF Clown. There isn't a direct method like field.setBackgroundColor(null) in the Field class, and I haven't been able to figure out a way to do it using one of the other accessor methods, like getDefaultAppearanceState().
Is there anyone who knows how to do this in PDF Clown?
EDIT: A sample PDF with this issue can be found here. Everything in this PDF was filled in with PDF Clown. Note in particular that the two fields in the upper left (labeled "Name") are invisible until clicked on. The five fields on the right are also invisible until clicked on, except for the "Charisma" field, which was previously invisible but became visible after I manually typed in the value. Everything else was also filled in by PDF Clown but, unlike these fields, is visible.
EDIT 2: It has since been discovered that this only happens when you overwrite values in an existing form-fillable character sheet. An original can be downloaded here.
As a first analysis:
Nearly as suspected in my original comment, the field "Name Line 1" contains the value (field dictionary V) "Doc Lightning" but a normal appearance stream (field dictionary AP -> appearances dictionary, key N) which displays no text.
Furthermore, the interactive form dictionary entry NeedAppearances is not set to true; thus, the PDF viewer is made to believe that the appearance streams are up to date. Only when you click into the field and, therefore, signal that you want to edit, the PDF viewer generates a new appearance of the stream, an appearance of its own making which it understands completely for the task of editing.
If you filled in that form field and no other tool changed your results afterwards, therefore, something is wrong either in your code or in PDF Clown. Please provide some self-contained sample code and a not-yet-filled-in document to reproduce the issue.
EDIT:
I just applied the current (trunk) PDF Clown AcroFormFillingSample.java sample to the not-yet-filled-in Character Sheet (i.e. the revision consisting of the initial 1458834 bytes of your file), and the result is OK: all field contents are visible even without clicking into them. Thus there is something special in your source... (or do you use an older version?)
In detail:
Page 1 of the character sheet of Doc Lightning references the annotation in object 162:
/MK <<>>
/F 4
/Type /Annot
/Subtype /Widget
/Rect [37.0108, 617.055, 156.923, 631.717]
/FT /Tx
/DA /Helv 12 Tf 0 g
/T (Name Line 1)
/V (Doc Lightning)
/P 47 0 R
/AP 537 0 R
Thus, the value of the field indeed is "Doc Lightning".
On the other hand, the appearances dictionary in object 537 references the normal appearance stream:
/N 538 0 R
And the stream in object 538 only contains:
/Tx BMC
q
1 0 0 1 2 -7.331 cm
/Helv 12 Tf
Q
EMC
So the normal appearance stream positions in the field (setting the current transformation matrix accordingly) and selects a font (Helvetica, properly defined in the resources, BTW), and then prints... nothing!
The interactive form dictionary (object 144) does not contain a NeedAppearances entry at all. According to the PDF specification ISO 32000-1:2008, Table 218, this entry is
A flag specifying whether to construct appearance streams and appearance dictionaries for all widget annotations in the document (see 12.7.3.3, “Variable Text”). Default value: false.
Thus, the PDF viewer behaves exactly as expected when it shows the empty appearance stream instead of the value "Doc Lightning" of "Name Line 1".
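The "prints... nothing!" diagnosis can be checked mechanically: an appearance stream only displays a value if it contains a text-showing operator (Tj, TJ, ' or "). The following is a rough sketch written for illustration, not PDF Clown or any library API; AppearanceCheck is a hypothetical name, and it assumes an already decompressed stream and does not handle inline images.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: does a decoded appearance stream show any text? */
final class AppearanceCheck {
    private static final String DELIMITERS = "()<>[]{}/%";
    private static final Set<String> TEXT_SHOWING = new HashSet<>(Arrays.asList("Tj", "TJ", "'", "\""));
    private static final Set<String> KEYWORDS = new HashSet<>(Arrays.asList("true", "false", "null"));

    private AppearanceCheck() {}

    /** Minimal tokenizer: skips strings, names, numbers and delimiters, returns operators. */
    static List<String> operators(String s) {
        List<String> ops = new ArrayList<>();
        int i = 0, n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c) || "[]{})".indexOf(c) >= 0) {
                i++;
            } else if (c == '%') {                               // comment runs to end of line
                while (i < n && s.charAt(i) != '\n' && s.charAt(i) != '\r') i++;
            } else if (c == '(') {                               // literal string, may nest and contain escapes
                int depth = 0;
                do {
                    char d = s.charAt(i);
                    if (d == '\\') i++;                          // skip the escaped character
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                } while (i < n && depth > 0);
            } else if (c == '<' || c == '>') {
                if (i + 1 < n && s.charAt(i + 1) == c) {
                    i += 2;                                      // "<<" or ">>" dictionary delimiter
                } else if (c == '<') {
                    int end = s.indexOf('>', i);                 // hex string
                    i = end < 0 ? n : end + 1;
                } else {
                    i++;
                }
            } else {                                             // name, number, keyword or operator
                int start = i++;
                while (i < n && !Character.isWhitespace(s.charAt(i)) && DELIMITERS.indexOf(s.charAt(i)) < 0) i++;
                String tok = s.substring(start, i);
                if (c != '/' && !KEYWORDS.contains(tok) && !tok.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                    ops.add(tok);
                }
            }
        }
        return ops;
    }

    static boolean showsText(String appearance) {
        for (String op : operators(appearance)) {
            if (TEXT_SHOWING.contains(op)) return true;
        }
        return false;
    }
}
```

Applied to the stream from object 538 quoted above, operators() yields only BMC, q, cm, Tf, Q and EMC, so showsText() is false; a stream that actually shows the value would need something like "(Doc Lightning) Tj" between BT and ET.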
After revisiting this issue and carefully looking at the source code, I realized that the Sample.java class of PDF Clown's samples has an applyDocumentSettings() method containing three lines of code that were missing from my source:
//Previously we instantiated "document" from org.pdfclown.files.File.getDocument()
ViewerPreferences view = new ViewerPreferences(document); // Instantiates viewer preferences inside the document context.
document.setViewerPreferences(view); // Assigns the viewer preferences object to the viewer preferences function.
view.setDisplayDocTitle(true);
I'm not sure that the last line is actually necessary, but I went ahead and kept it in for good measure.
The user mkl wrote in his answer that "the PDF viewer generates a new appearance of the stream, an appearance of its own making which it understands completely for the task of editing." It seems that the lines of code above cause an appearance to be generated that is understood to be for reading (and maybe editing?).

PDF find out if text is underlined or a table cell

I have been playing around with PdfBox and its PDFTextStripperByArea class.
I was able to extract information if the text is bold or italic, but I'm unable to get the underline information.
As far as I understand it, underlining in PDF is done by drawing lines. So in theory I should be able to get some sort of information about lines somewhere around the text. Given this information, I could then find out whether text is underlined or inside a table.
Here is my code so far:
List<TextPosition> textPos = charactersByArticle.get(index);
for (TextPosition t : textPos)
{
    if (t.getFont().getFontDescriptor() != null)
    {
        if (t.getFont().getFontDescriptor().getFontWeight() > BOLD_WEIGHT ||
            t.getFont().getFontDescriptor().isForceBold())
        {
            isBold = true;
        }
        if (t.getFont().getFontDescriptor().isItalic())
        {
            isItalic = true;
        }
    }
}
I have tried to play around with the PDGraphicsState object, which is processed in the processEncodedText method of the PDFStreamEngine class, but found no information about lines there.
Any suggestions on where this information could be retrieved from?
Here is what I have found out so far:
PDFBox uses a resource file to bind PDF operators/instructions to certain classes, which then process the information.
If we take a look at the PDFTextStripper.properties resource file under:
pdfbox\src\main\resources\org\apache\pdfbox\resources\
we can see that for instance the BT operator is bound to the
org.apache.pdfbox.util.operator.BeginText class and so on.
The PDFTextStripper under
pdfbox\src\main\java\org\apache\pdfbox\util\
takes this into account and processes the PDF using these classes.
BUT all graphical objects are ignored, so there is no information about underlines or table structure!
Now if we take a look at the PageDrawer.properties resource file, we can see that it binds almost all available operators. It is used by the PageDrawer class under
pdfbox\src\main\java\org\apache\pdfbox\pdfviewer\
The "trick" is now to find out which graphical operators represent underlines and tables and to use them in combination with PDFTextStripper.
Now this would mean reading the PDF file specification, which is currently way too much work.
If someone knows which operators are responsible for which actions to draw underlines and table lines please let me know.
As you mention, PDFBox uses resource files to bind PDF operators/instructions to visitors which process the information.
You'd probably best start by copying PDFBox's existing visitor into your own source folder, and then adding to or extending the implementation from there.
My long-ago PostScript experience recalls 'moveto' and 'lineto' operators. Since PDF is roughly PostScript-based, you'll be looking for something similar (in PDF these are the m and l operators; re appends a rectangle, and S or f then strokes or fills the path).
http://learnpostscript.wordpress.com/category/lineto/
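To make the operator hunt concrete: underlines and table rules are typically either a stroked m/l pair or a thin filled re rectangle. The following is a simplified, hypothetical scanner (HorizontalRules is not a PDFBox class) that assumes a decoded content stream with plain numeric operands, ignores the current transformation matrix (cm) and line width, and returns the horizontal segments that actually get painted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical scanner for horizontal rules built from m/l or thin re paths. */
final class HorizontalRules {
    /** A horizontal segment in user space: from x0 to x1 at height y. */
    static final class Segment {
        final float x0, x1, y;
        Segment(float xa, float xb, float y) { this.x0 = Math.min(xa, xb); this.x1 = Math.max(xa, xb); this.y = y; }
    }

    private HorizontalRules() {}

    /** A segment counts as horizontal if its height is at most maxThickness units. */
    static List<Segment> find(String content, float maxThickness) {
        List<Segment> result = new ArrayList<>();
        List<Segment> pending = new ArrayList<>();      // path built but not yet painted
        List<Float> operands = new ArrayList<>();
        float cx = 0, cy = 0;                           // current point
        for (String tok : content.trim().split("\\s+")) {
            if (tok.matches("[+-]?(\\d+\\.?\\d*|\\.\\d+)")) {
                operands.add(Float.parseFloat(tok));
                continue;
            }
            int k = operands.size();
            switch (tok) {
                case "m":
                    cx = operands.get(k - 2); cy = operands.get(k - 1);
                    break;
                case "l": {
                    float x = operands.get(k - 2), y = operands.get(k - 1);
                    if (Math.abs(y - cy) <= maxThickness) pending.add(new Segment(cx, x, (cy + y) / 2));
                    cx = x; cy = y;
                    break;
                }
                case "re": {
                    float x = operands.get(k - 4), y = operands.get(k - 3);
                    float w = operands.get(k - 2), h = operands.get(k - 1);
                    if (Math.abs(h) <= maxThickness) pending.add(new Segment(x, x + w, y + h / 2));
                    break;
                }
                case "S": case "s": case "f": case "F": case "f*": case "B": case "B*": case "b": case "b*":
                    result.addAll(pending);             // path is painted: keep its horizontal parts
                    pending.clear();
                    break;
                case "n":
                    pending.clear();                    // path ends without painting
                    break;
                default:
                    break;                              // other operators are ignored in this sketch
            }
            operands.clear();
        }
        return result;
    }
}
```

For "72 700 m 200 700 l S 72 650 128 0.8 re f", this reports a stroked rule at y = 700 and a filled rule at about y = 650.4, both spanning x = 72 to 200; a vertical m/l pair is skipped.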
The PDF format is a b*tch: it's HTML done wrong. It represents graphical implementation, not semantics. Even reconstructing sentences is difficult: words or even individual characters are positioned, and spaces or newlines must be reconstructed algorithmically. In short, Adobe are a*holes. And Reader is a non-ergonomic, bug-riddled, insecure, bloated pig.
However, you can accomplish your requirement if you are willing to put in, say, 12+ hours of work. Besides detecting underlines by position, you can use the fact that they will typically be emitted in the PDF immediately after their text, so you can latch your detection onto PDF document order, not just page position.
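Detecting by position can be sketched as a simple geometric test. UnderlineMatch is hypothetical, coordinates are assumed to be PDF user space (y grows upwards), and the thresholds (a rule at most 0.35 em below the baseline, covering at least 80% of the text width) are guesses to tune against real documents.

```java
/**
 * Hypothetical position test: a horizontal rule counts as an underline of a text
 * run if it lies a little below the baseline and covers most of the run's width.
 * Thresholds are illustrative guesses, not values from any specification.
 */
final class UnderlineMatch {
    private UnderlineMatch() {}

    static boolean isUnderline(float textX0, float textX1, float baseline, float fontSize,
                               float lineX0, float lineX1, float lineY) {
        float gap = baseline - lineY;                           // how far the rule sits below the baseline
        if (gap < 0f || gap > 0.35f * fontSize) return false;   // must be just under the text
        float overlap = Math.min(textX1, lineX1) - Math.max(textX0, lineX0);
        return overlap >= 0.8f * (textX1 - textX0);             // rule covers most of the text width
    }
}
```

A rule that passes this test but is much wider than the text, or that is accompanied by vertical rules, is more plausibly a table border; that distinction is a heuristic left to the caller.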
Also, try constructing a trivial two-line PDF with underlined text, then see what you can make of it when parsing it back in! The underline should stick out like dog's bananas, and once you can detect that, you'll be well on the way.
PDFBox is not very good for extensibility, it's mainly just a big pile of algorithms. For this reason, just copy the PDFTextStripper source (and maybe have PageDrawer for reference) and prototype from there.
Hope this helps!
You can use iText to generate PDF reports.
With iText you can easily add lines.
Try the following:
document.add(new LineSeparator(0.5f, 50, null, 0, 198));
The code above draws a line in the PDF report; adjust the arguments (in iText 5: line width, width percentage, color, alignment and vertical offset) to your needs.
Hope this helps.
As far as I understand PDFBox, there is no option for reading underlines. Maybe you can try iText (itextpdf) for this purpose.
According to the API, getFont() returns the font.
You can use the getStyle() method, and it will return STYLE_UNDERLINE for an underlined font. Thus you can retrieve the underline style.
