PDFBox inconsistent PDTextField behaviour after setValue - java

PDFBox setValue() does not set the value of every PDTextField; only some of the fields keep their values. It fails for fields whose getFullyQualifiedName() values are similar.
Note: for the fields named (via field.getFullyQualifiedName()) customdutiesa, customdutiesb, customdutiesc, etc., it works for customdutiesa but not for customdutiesb, customdutiesc and so on.
@Test
public void testb3Generator() throws IOException {
File f = new File(inputFile);
outputFile = String.format("%s_b3-3.pdf", "123");
try (PDDocument document = PDDocument.load(f)) {
PDDocumentCatalog catalog = document.getDocumentCatalog();
PDAcroForm acroForm = catalog.getAcroForm();
int i = 0;
for (PDField field : acroForm.getFields()) {
i=i+1;
if (field instanceof PDTextField) {
PDTextField textField = (PDTextField) field;
textField.setValue(Integer.toString(i));
}
}
document.getDocumentCatalog().getAcroForm().flatten();
document.save(new File(outputFile));
// no explicit document.close() needed: try-with-resources already closes the document
}
catch (Exception e) {
e.printStackTrace();
}
}
Input pdf link: https://s3-us-west-2.amazonaws.com/kx-filing-docs/b3-3.pdf
Output pdf link: https://kx-filing-docs.s3-us-west-2.amazonaws.com/123_b3-3.pdf

The problem is that under certain conditions PDFBox does not construct appearances for fields whose value it sets, and therefore, when flattening, it loses the field content completely:
// in case all tests fail the field will be formatted by acrobat
// when it is opened. See FreedomExpressions.pdf for an example of this.
if (actions == null || actions.getF() == null ||
widget.getCOSObject().getDictionaryObject(COSName.AP) != null)
{
... generate appearance ...
}
(org.apache.pdfbox.pdmodel.interactive.form.AppearanceGeneratorHelper.setAppearanceValue(String))
I.e. if a JavaScript action for value formatting is associated with the field and no appearance stream is present yet, PDFBox assumes it does not need to create an appearance (and it would probably get it wrong anyway, as it does not apply that formatting action).
For a use case that flattens the form afterwards, this assumption obviously is wrong.
To force PDFBox to generate appearances for those fields, too, simply remove the actions before setting field values:
if (field instanceof PDTextField) {
PDTextField textField = (PDTextField) field;
textField.setActions(null);
textField.setValue(Integer.toString(i));
}
(from FillAndFlatten test testLikeAbubakarRemoveAction)
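Putting the workaround together, a minimal sketch of the whole fill-and-flatten pass might look like this. It assumes PDFBox 2.x (as suggested by PDDocument.load in the question). Unlike the original test, it walks getFieldTree() so that text fields nested below non-terminal parent fields are reached too, and it numbers only the text fields; the class and method names are made up for illustration.

```java
import java.io.File;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTextField;

public class FillAndFlatten {

    /**
     * Fills every text field with a running number and flattens the form.
     * Returns the number of text fields that were filled.
     */
    public static int fillAndFlatten(PDDocument document) throws IOException {
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
        if (acroForm == null) {
            return 0; // the document has no interactive form
        }
        int filled = 0;
        // getFieldTree() visits nested fields too, not only the root fields
        for (PDField field : acroForm.getFieldTree()) {
            if (field instanceof PDTextField) {
                PDTextField textField = (PDTextField) field;
                // drop the /AA (e.g. JavaScript format) actions so that
                // setValue() builds an appearance stream for this field
                textField.setActions(null);
                filled++;
                textField.setValue(Integer.toString(filled));
            }
        }
        // flattening draws the generated appearances into the page content
        acroForm.flatten();
        return filled;
    }

    public static void main(String[] args) throws IOException {
        try (PDDocument document = PDDocument.load(new File(args[0]))) {
            fillAndFlatten(document);
            document.save(new File(args[1]));
        }
    }
}
```

Run it as, e.g., `java FillAndFlatten b3-3.pdf 123_b3-3.pdf`. Removing the actions means any JavaScript formatting of those fields is lost, which is usually acceptable when the form is flattened anyway.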

Related

How to get a input field's title using PDFBox

I want to get all fields from a PDF file together with all their data: field type, id, default value, title, popup text and so on.
I can get almost everything except the title. If I understand correctly, the field titles can be found in the PDF page content, but how can I match them to the fields?
I use this code to get the field information:
try(PDDocument document = Loader.loadPDF(pdfFileBinary)) {
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
if (acroForm == null) {
return Collections.emptyList();
}
return acroForm.getFields()
.stream()
.flatMap(unfoldFunction)
.map(PdfFieldImpl::new)
.collect(Collectors.toList());
} catch (IOException e) {
throw new RuntimeException("Can't parse pdf document", e); // keep the cause
}
If someone knows a solution using a different library, that would also be great.
Sorry if this is a stupid question :)
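As a hedged starting point: if the "title" you mean is the tooltip Acrobat shows for a field, that text is stored in the field's /TU entry (the alternate field name), which PDFBox exposes as PDField.getAlternateFieldName(). Text printed next to a field on the page is ordinary page content with no link to the field, so it cannot be read from the field object. The sketch below assumes PDFBox 3.x (matching Loader.loadPDF in the question) and uses getFieldTree() in place of the question's unfoldFunction, whose definition isn't shown; FieldInfo and describeFields are hypothetical names.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.form.PDAcroForm;
import org.apache.pdfbox.pdmodel.interactive.form.PDField;
import org.apache.pdfbox.pdmodel.interactive.form.PDTerminalField;

public class FieldInfo {

    /** One line per terminal field: qualified name | field type | tooltip (/TU, may be null). */
    public static List<String> describeFields(PDDocument document) {
        List<String> result = new ArrayList<>();
        PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
        if (acroForm == null) {
            return result;
        }
        // getFieldTree() also visits fields nested below non-terminal parents
        for (PDField field : acroForm.getFieldTree()) {
            if (field instanceof PDTerminalField) {
                result.add(field.getFullyQualifiedName()
                        + " | " + field.getFieldType()            // e.g. Tx, Btn, Ch, Sig
                        + " | " + field.getAlternateFieldName()); // the /TU tooltip text
            }
        }
        return result;
    }

    public static List<String> describeFields(byte[] pdfFileBinary) throws IOException {
        try (PDDocument document = Loader.loadPDF(pdfFileBinary)) {
            return describeFields(document);
        }
    }
}
```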

add custom value for signature file using pdfbox java

How can I set my own value (e.g. xyz123) as the signature field name when adding a signature? When I add a signature, I can't specify a custom name, and a field called "signature1" is added to the document automatically.
Instead of "signature1", I want the field to be named with my custom value (xyz123).
throws IOException {
PDSignature pdSignature = new PDSignature();
pdSignature.setFilter(PDSignature.FILTER_ADOBE_PPKLITE);
pdSignature.setSubFilter(PDSignature.SUBFILTER_ADBE_PKCS7_DETACHED);
pdSignature.setName("jvmfy");
pdSignature.setReason("Learn how to sign pdf with jvmfy.com!");
pdSignature.setLocation("location");
// the signing date, needed for valid signature
pdSignature.setSignDate(Calendar.getInstance());
// register signature dictionary and sign interface
document.addSignature(pdSignature, signature);
// write incremental (only for signing purpose)
// use saveIncremental to add signature, using plain save method may break up a
// document
document.saveIncremental(output);
}
private void signDetached(SignatureInterface signature, PDDocument document, OutputStream output)
throws IOException {
PDSignature pdSignature = new PDSignature();
pdSignature.setFilter(PDSignature.FILTER_ADOBE_PPKLITE);
pdSignature.setSubFilter(PDSignature.SUBFILTER_ADBE_PKCS7_DETACHED);
pdSignature.setSignDate(Calendar.getInstance());
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm();
if (acroForm == null)
{
acroForm = new PDAcroForm(document);
document.getDocumentCatalog().setAcroForm(acroForm);
}
PDSignatureField signatureField = new PDSignatureField(acroForm);
signatureField.setPartialName("xyz123");
signatureField.setValue(pdSignature);
signatureField.getWidgets().get(0).setPage(document.getPage(0));
document.addSignature(pdSignature, signature);
document.saveIncremental(output);
}
You need your PDSignatureField object; once you have it, do this:
signatureField.setPartialName("xyz123");
If the code doesn't create its own PDSignatureField object (as in the example for invisible signature fields), PDFBox does it for you. You can get all PDSignatureField objects by calling PDDocument.getSignatureFields().
If you created the file yourself, there is only one such field. If you are signing existing files, it's trickier; then I'd recommend comparing the field names or the results of getCOSObject() before and after adding the signature (i.e. create two sets). Don't assume that the last field is the right one (in some cases it isn't).
Or you create the field yourself:
PDAcroForm acroForm = document.getDocumentCatalog().getAcroForm(null); // PDFBox 3.0; in 2.0 use getAcroForm()
if (acroForm == null)
{
acroForm = new PDAcroForm(document);
document.getDocumentCatalog().setAcroForm(acroForm);
}
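PDSignatureField signatureField = new PDSignatureField(acroForm);
signatureField.setPartialName("xyz123"); // your custom field name
signatureField.setValue(pdSignature); // the PDSignature, not the SignatureInterface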
signatureField.getWidgets().get(0).setPage(document.getPage(0));
acroForm.getFields().add(signatureField);
// page annotation, only needed if PDF/A
document.getPage(0).getAnnotations().add(signatureField.getWidgets().get(0));
document.getPage(0).getCOSObject().setNeedToBeUpdated(true);
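The two-set comparison suggested above could be sketched as follows, assuming you let addSignature() create the field itself. It records the COS dictionaries of the existing signature fields in an identity set, calls addSignature(), and renames whichever signature field is new. NamedSignature and addSignatureWithFieldName are hypothetical names; call saveIncremental() afterwards, as in the question.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.interactive.digitalsignature.PDSignature;
import org.apache.pdfbox.pdmodel.interactive.digitalsignature.SignatureInterface;
import org.apache.pdfbox.pdmodel.interactive.form.PDSignatureField;

public class NamedSignature {

    /** Adds the signature and gives the newly created field the given partial name. */
    public static PDSignatureField addSignatureWithFieldName(PDDocument document,
            PDSignature pdSignature, SignatureInterface signer, String fieldName) throws IOException {
        // identity set: compare the underlying dictionaries by reference, not by content
        Set<COSDictionary> before = Collections.newSetFromMap(new IdentityHashMap<>());
        for (PDSignatureField field : document.getSignatureFields()) {
            before.add(field.getCOSObject());
        }
        document.addSignature(pdSignature, signer);
        for (PDSignatureField field : document.getSignatureFields()) {
            if (!before.contains(field.getCOSObject())) {
                field.setPartialName(fieldName);
                // precaution so the renamed field is written by saveIncremental()
                field.getCOSObject().setNeedToBeUpdated(true);
                return field;
            }
        }
        return null; // no new signature field was found
    }
}
```

Comparing dictionary identities rather than names avoids guessing which entry in the list is the new one, in line with the warning not to rely on the last field.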

Why is my form being flattened without calling the flattenFields method?

I am testing my method with this form https://help.adobe.com/en_US/Acrobat/9.0/Samples/interactiveform_enabled.pdf
It is being called like so:
Pdf.editForm("./src/main/resources/pdfs/interactiveform_enabled.pdf", "./src/main/resources/pdfs/FILLEDOUT.pdf");
where Pdf is just a worker class and editForm is a static method.
The editForm method looks like this:
public static int editForm(String inputPath, String outputPath) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
Map<String, PdfFormField> m = form.getFormFields();
for (String s : m.keySet()) {
if (s.equals("Name_First")) {
m.get(s).setValue("Tristan");
}
if (s.equals("BACHELORS DEGREE")) {
m.get(s).setValue("Off"); // On or Off
}
if (s.equals("Sex")) {
m.get(s).setValue("FEMALE");
}
System.out.println(s);
}
pdf.close();
logger.info("Completed");
} catch (IOException e) {
logger.error("Unable to fill form " + outputPath + "\n\t" + e);
return 1;
}
return 0;
}
Unfortunately the FILLEDOUT.pdf file is no longer a form after calling this method. Am I doing something wrong?
I was using this resource for guidance. Note that I am not calling form.flattenFields(). If I do call that method, however, I get a java.lang.IllegalArgumentException.
Thank you for your time.
Your form is Reader-enabled, i.e. it contains a usage rights digital signature by a key and certificate issued by Adobe to indicate to a regular Adobe Reader that it shall activate a number of additional features when operating on that very PDF.
If you stamp the file as in your original code, the existing PDF objects get rearranged and slightly changed. This breaks the usage rights signature, and Adobe Reader, recognizing that, reports: "The document has been changed since it was created and use of extended features is no longer available."
If you stamp the file in append mode, though, the changes are appended to the PDF as an incremental update. Thus, the signature still correctly signs its original byte range and Adobe Reader does not complain.
To activate append mode, use StampingProperties when you create your PdfDocument:
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath), new StampingProperties().useAppendMode());
(Tested with iText 7.1.1-SNAPSHOT and Adobe Acrobat Reader DC version 2018.009.20050)
By the way, Adobe Reader does not merely check the signature; it also tries to determine whether the changes in the incremental update stay within the scope of the additional features activated by the usage rights signature.
Otherwise you could simply take a small Reader-enabled PDF and in append mode replace all existing pages by your own content of choice. This of course is not in Adobe's interest...
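Applied to the question's editForm, filling fields in append mode could be sketched like this. It assumes iText 7 (the answer was tested with 7.1.1-SNAPSHOT); AppendModeFill and fillInAppendMode are illustrative names, and looking fields up with form.getField(name) replaces the loop over getFormFields().keySet().

```java
import com.itextpdf.forms.PdfAcroForm;
import com.itextpdf.forms.fields.PdfFormField;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.StampingProperties;
import java.io.IOException;
import java.util.Map;

public class AppendModeFill {

    /**
     * Sets the given field values and writes dest as an incremental update of src,
     * so that a usage rights signature over the original bytes stays valid.
     * Returns how many of the requested fields were found and set.
     */
    public static int fillInAppendMode(String src, String dest, Map<String, String> values)
            throws IOException {
        int filled = 0;
        try (PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest),
                new StampingProperties().useAppendMode())) {
            PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
            for (Map.Entry<String, String> entry : values.entrySet()) {
                PdfFormField field = form.getField(entry.getKey()); // null if there is no such field
                if (field != null) {
                    field.setValue(entry.getValue());
                    filled++;
                }
            }
        } // closing the document appends the changes after the original bytes
        return filled;
    }
}
```

With the question's data this would be called as, e.g., `fillInAppendMode(inputPath, outputPath, Map.of("Name_First", "Tristan", "Sex", "FEMALE"))` (Map.of needs Java 9+).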
The filled-in PDF is still an AcroForm; otherwise the second fill in the example below could not change anything, and it would produce the same PDF twice.
public class Main {
public static final String SRC = "src/main/resources/interactiveform_enabled.pdf";
public static final String DEST = "results/filled_form.pdf";
public static final String DEST2 = "results/filled_form_second_time.pdf";
public static void main(String[] args) throws Exception {
File file = new File(DEST);
file.getParentFile().mkdirs();
Main main = new Main();
Map<String, String> data1 = new HashMap<>();
data1.put("Name_First", "Tristan");
data1.put("BACHELORS DEGREE", "Off");
main.fillPdf(SRC, DEST, data1, false);
Map<String, String> data2 = new HashMap<>();
data2.put("Sex", "FEMALE");
main.fillPdf(DEST, DEST2, data2, false);
}
private void fillPdf(String src, String dest, Map<String, String> data, boolean flatten) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
//Delete print field from acroform because it is defined in the contentstream not in the formfields
form.removeField("Print");
Map<String, PdfFormField> m = form.getFormFields();
for (String d : data.keySet()) {
for (String s : m.keySet()) {
if(s.equals(d)){
m.get(s).setValue(data.get(d));
}
}
}
if(flatten){
form.flattenFields();
}
pdf.close();
System.out.println("Completed");
} catch (IOException e) {
System.out.println("Unable to fill form " + dest + "\n\t" + e);
}
}
}
The issue you are facing has to do with the 'reader enabled forms'.
In short, the PDF file initially fed to your program is Reader-enabled, which is why you can open it in Adobe Reader and fill in the form. Reader enabling allows Acrobat users to extend the behaviour of Adobe Reader for a particular document.
Once iText has filled in and closed the PDF, it is saved as 'not Reader-extended'.
The AcroForm can still be filled using iText, but when you open the PDF in Adobe Reader, the extended functionality you saw in the original PDF is gone. This does not mean the form is flattened.
iText cannot make a form Reader-enabled; in fact, the only way to create a Reader-enabled form is with Acrobat Professional. This is how Acrobat and Adobe Reader interact, and it is not something iText can imitate or work around. You can find more info and a possible solution at this link.
The IllegalArgumentException you get when calling form.flattenFields() is caused by the way the PDF document was constructed.
The "Print form" button should have been defined in the AcroForm, yet it is defined in the content stream of the PDF, so the button in the AcroForm has an empty text value, and this is what causes the exception.
You can fix this by removing the Print field from the AcroForm before you flatten.
The IllegalArgumentException issue has been fixed in iText 7.1.5.

PDFBox RichText formatted field

I am currently trying to open, edit & save a PDF file using PDFBox.
With plain-text fields it already works, but I'm having a hard time setting Rich Text Format text as the value: every time I use setRichTextValue, save and reopen the document, the field is empty (unchanged).
Code is as follows (stripped from multiple functions):
PDDocument pdfDoc = PDDocument.load(new File("my pdf path"));
PDDocumentCatalog docCatalog = pdfDoc.getDocumentCatalog();
PDAcroForm acroForm = docCatalog.getAcroForm();
PDField field = acroForm.getField("field-to-change");
if (field instanceof PDTextField) {
PDTextField tfield = (PDTextField) field;
COSDictionary dict = field.getCOSObject();
//COSString defaultAppearance = (COSString) dict.getDictionaryObject(COSName.DA);
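//if (defaultAppearance != null && !font.isEmpty() && size > 0)
// dict.setString(COSName.DA, "/" + font + " " + size + " Tf 0 g");
boolean rtf = true;
// backslashes must be doubled in a Java string literal ("\a", "\d", "\c" do not compile, "\r" is a carriage return)
String val = "{\\rtf1\\ansi\\deff0 {\\colortbl;\\red0\\green0\\blue0;\\red255\\green0\\blue0;} \\cf2 Red RTF Text \\cf1 }";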
tfield.setRichText(rtf);
if (rtf)
tfield.setRichTextValue(val);
else
tfield.setValue(val);
}
// save document etc.
Digging through the PDFBox documentation, I found this for setRichTextValue(String richTextValue):
* Set the fields rich text value.
* Setting the rich text value will not generate the appearance
* for the field.
* You can set {@link PDAcroForm#setNeedAppearances(Boolean)} to
* signal a conforming reader to generate the appearance stream.
* Providing null as the value will remove the default style string.
* @param richTextValue a rich text string
so I added
pdfDoc.getDocumentCatalog().getAcroForm().setNeedAppearances(true);
...directly after loading the PDDocument, and it didn't change anything. So I searched further and found the AppearanceGenerator class, which I expected to create the styles automatically, but it doesn't seem to, and you can't call it manually.
I'm at a loss here and Google is no help either. It seems nobody has ever used this before, or I'm just too stupid. I want the solution to be done in PDFBox since you don't pay for licenses and it already works for everything else I am doing (getting & replacing images, removing text fields), so it must be possible, right?
Thanks in advance.

How to remove user-defined style for XLS via ApachePOI?

I want to remove some predefined styles for XLS, for example "Good". For XLSX there is no problem: create a new CTCellStyle (unfortunately via reflection), setName("Good"), setBuiltinId(26) and setHidden(true); now Excel (2016) doesn't show the "Good" style. Can I do something like this for XLS?
EDIT
Sample code:
Hiding a style for XLSX (no problem here):
StylesTable styleSource = xssfWorkbook.getStylesSource(); // xssfWorkbook is instance of XSSFWorkbook
try {
// get ctCellStyles (by reflection)
Field field = StylesTable.class.getDeclaredField("doc");
field.setAccessible(true);
Object obj = field.get(styleSource);
StyleSheetDocument ssd = (StyleSheetDocument) obj;
CTStylesheet ctStyleSheet = ssd.getStyleSheet();
CTCellStyles ctCellStyles = ctStyleSheet.getCellStyles();
// find style "Good"
for (int i = 0; i < ctCellStyles.sizeOfCellStyleArray(); i++) {
CTCellStyle ctCellStyle = ctCellStyles.getCellStyleArray(i);
if ("Good".equals(ctCellStyle.getName())) { // null-safe: unnamed styles have no name
XmlBoolean hiddenXml = XmlBoolean.Factory.newInstance();
hiddenXml.setStringValue("1");
ctCellStyle.xsetHidden(hiddenXml);
}
}
} catch (Exception e) { e.printStackTrace(); } // don't swallow failures silently
Hiding a style for XLS:
If the style exists in the workbook I can get it, but how do I set it as "hidden"?
try {
// get InternalWorkbook (by reflection)
Field field = HSSFWorkbook.class.getDeclaredField("workbook");
field.setAccessible(true);
Object iwb = field.get(hssfWorkbook); // hssfWorkbook is instance of HSSFWorkbook
InternalWorkbook internalWorkbook = (InternalWorkbook) iwb;
// find style "Good"
for (int xfIndex = 0; xfIndex < internalWorkbook.getNumRecords(); xfIndex++) {
// try to get every record as StyleRecord from internalWorkbook
StyleRecord styleRecord = internalWorkbook.getStyleRecord(xfIndex);
if (styleRecord != null && styleRecord.getName() != null) {
if (styleRecord.getName().equals("Good")) {
new DebugUtil(styleRecord.getName());
// TODO set here sth like "hidden" for styleRecord or maybe:
// get style with current id from workbook
HSSFCellStyle hssfCellStyle = hssfWorkbook.getCellStyleAt((short) xfIndex); // workbook is instance of org.apache.poi.ss.usermodel.Workbook
// TODO set here sth like "hidden" for hssfCellStyle
}
}
}
} catch (Exception e) { e.printStackTrace(); } // don't swallow failures silently
Even if I could mark a style as "hidden", there is another problem: if I iterate from 0 to internalWorkbook.getNumRecords(), I only get existing styles. So if I create the workbook myself, I probably need to create a new StyleRecord and/or HSSFCellStyle and mark it as "hidden". I tried this:
HSSFCellStyle newHssfCellStyle = hssfWorkbook.createCellStyle();
newHssfCellStyle.setAlignment((short) 3); // align right, for tests, to see difference between original and created "Good" style
// the STYLE record must point at the new style's XF index (getSize() returns the workbook size in bytes, not an index)
StyleRecord newStyleRecord = internalWorkbook.createStyleRecord(newHssfCellStyle.getIndex());
newStyleRecord.setName("Good");
// TODO set here sth like "hidden" for newStyleRecord and/or for newHssfCellStyle
This is how I set my own "Good" style. If I don't, Excel (2016) shows the default "Good" style.
You should be able to use HSSFWorkbook.getCellStyleAt(int index) to access styles at a given position.
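A sketch of that lookup, assuming POI 4.x (where getCellStyleAt and getNumCellStyles use int): it walks all cell styles and compares HSSFCellStyle.getUserStyleName(), which returns null for built-in styles. As far as I know, calling setUserStyleName() on a style created with createCellStyle() also creates the STYLE record internally, so it may replace the reflective createStyleRecord() call from the question; it does not, however, provide a way to mark the style hidden. HssfStyleLookup and findUserStyle are illustrative names.

```java
import org.apache.poi.hssf.usermodel.HSSFCellStyle;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

public class HssfStyleLookup {

    /** Returns the index of the first cell style with the given user-defined name, or -1. */
    public static int findUserStyle(HSSFWorkbook workbook, String name) {
        for (int i = 0; i < workbook.getNumCellStyles(); i++) {
            HSSFCellStyle style = workbook.getCellStyleAt(i);
            // getUserStyleName() is null for built-in styles and for XFs without a STYLE record
            if (name.equals(style.getUserStyleName())) {
                return i;
            }
        }
        return -1;
    }
}
```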
