Problems scanning the 1042-S 2015 PDF file with iText

I am building a program that will automatically write into a PDF file. I am using the iText library to do so.
To check the names of the fields, I run this small piece of code:
public static void main(String[] args) throws IOException {
    PdfReader reader = new PdfReader(PDF_PATH);
    AcroFields fields = reader.getAcroFields();
    // Print every fully qualified field name with its current value
    Set<String> fldNames = fields.getFields().keySet();
    for (String fldName : fldNames) {
        System.out.println(fldName + ": " + fields.getField(fldName));
    }
}
The output is something like:
topmostSubform[0].CopyA[0].Group12-13[0].Line13d-g[0].Line13e[0]: 13e
topmostSubform[0].CopyB[0].Group1-11[0].Line3[0].Line3[0]: 0
topmostSubform[0].CopyE[0].Group1-11[0].Line7[0]: 7
topmostSubform[0].CopyD[0].Group14-24[0].Line16[0].Line15i[0]: 15i
The part before the colon, for example topmostSubform[0].CopyE[0].Group1-11[0].Line7[0], is the field name I am looking for; what comes after the colon is the value I entered in the original PDF to keep track of each field's name.
So far so good, but I am having a problem with one specific field: field number 16. I entered the value 16 to keep track of it, but my output contains only one field with that value, although there should be five, one for each copy: CopyA, CopyB, CopyC, CopyD and CopyE. All I find is this:
topmostSubform[0].CopyA[0].Group14-24[0].Line16[0]
and when I try to write to this field using this code:
form.setField("topmostSubform[0].CopyA[0].Group14-24[0].Line16[0]", "BLA BLA BLA");
it does not work. Obviously something odd is happening with field 16.
The PDF can be downloaded at: https://www.irs.gov/pub/irs-prior/f1042s--2015.pdf
Thank you.

The form is a hybrid XFA form (or, as I like to call such forms, an abomination). In a hybrid XFA form, the fields of the form are described twice, once using PDF syntax (pure AcroForm technology), once using XML (the XML Forms Architecture, aka XFA).
This is problematic because:
There are differences between the form functionality offered by AcroForm technology and that offered by the XML Forms Architecture.
There's always the risk that the form described in PDF syntax doesn't correspond with the form in XML syntax.
That's why I always throw away the XML syntax. See the FillHybridForm example:
public void manipulatePdf(String src, String dest) throws DocumentException, IOException {
    PdfReader reader = new PdfReader(src);
    PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
    AcroFields form = stamper.getAcroFields();
    // Drop the XFA (XML) description so that only the AcroForm fields remain
    form.removeXfa();
    form.setField("topmostSubform[0].CopyA[0].Group14-24[0].Line16[0]", "16");
    stamper.close();
    reader.close();
}
This line is the one you probably don't have in your code:
form.removeXfa();
Please read my answers to the following questions for more info:
How to check a checkbox in PDF file with the same variable name with iText and Java
How to change the text color of an AcroForm field?
Is it safe to remove XFA?
If you only have time to read one Q&A from the list above, choose the last one in the list.
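One way to see which copies actually expose a given field is to group the printed field names by their copy segment. The following is a minimal sketch in plain Java, not part of iText: the class CopyCoverage and its methods are hypothetical helpers, and the list of expected copies is taken from the question. It only inspects name strings such as those printed by the field-listing code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class CopyCoverage {
    // Copies the form is expected to contain, as listed in the question (assumption).
    static final List<String> EXPECTED_COPIES =
            Arrays.asList("CopyA", "CopyB", "CopyC", "CopyD", "CopyE");

    /** Returns the expected copies that have no field whose last name segment is lineName. */
    static List<String> missingCopies(Iterable<String> fieldNames, String lineName) {
        Set<String> found = new TreeSet<>();
        for (String name : fieldNames) {
            String[] parts = name.split("\\.");
            if (parts.length < 3) {
                continue; // not of the form root.copy...field
            }
            // parts[1] is the copy segment, e.g. "CopyA[0]"; the last part is the field itself.
            String copy = stripIndex(parts[1]);
            String leaf = stripIndex(parts[parts.length - 1]);
            if (leaf.equals(lineName)) {
                found.add(copy);
            }
        }
        List<String> missing = new ArrayList<>(EXPECTED_COPIES);
        missing.removeAll(found);
        return missing;
    }

    /** Removes a trailing index such as "[0]" from a name segment. */
    static String stripIndex(String segment) {
        int bracket = segment.indexOf('[');
        return bracket < 0 ? segment : segment.substring(0, bracket);
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
                "topmostSubform[0].CopyA[0].Group14-24[0].Line16[0]",
                "topmostSubform[0].CopyE[0].Group1-11[0].Line7[0]",
                "topmostSubform[0].CopyD[0].Group14-24[0].Line16[0].Line15i[0]");
        System.out.println(missingCopies(names, "Line16"));
    }
}
```

With these three sample names the sketch prints [CopyB, CopyC, CopyD, CopyE]. Note that in the CopyD sample, Line16[0] appears as a parent segment rather than as the field itself, which is worth checking when a field seems to be absent from a copy.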

Related

How to export data set using pdf template with itext?

In my project, some data sets are needed to be exported in PDF format.
I learned that iText is helpful, and PdfPTable can do the work, but it needs much code to deal with styles. Using a PDF template saves time and styling code, but then I can only fill the fixed fields defined in the template.
Can you suggest a way to render the data sets using something like a foreach loop? Thanks in advance!
Here is my code using PdfPTable, which does the work, but the code is a little ugly:
PdfPTable pdfTable = createNewPDFTable();
for (int i = 0; i < dataSet.size(); i++) {
    MetaObject metaObject = SystemMetaObject.forObject(dataSet.get(i));
    // One cell per field of the current record
    for (String field : fields) {
        Phrase phrase = new Phrase(String.valueOf(metaObject.getValue(field) != null ? metaObject.getValue(field) : "")
                , PDFUtil.createChineseSong(DEFAULT_CELL_FONT_SIZE));
        PdfPCell fieldCell = new PdfPCell(phrase);
        fieldCell.setBorder(Rectangle.NO_BORDER);
        fieldCell.setFixedHeight(DEFAULT_COLUMN_HEIGHT);
        fieldCell.setHorizontalAlignment(Element.ALIGN_CENTER);
        fieldCell.setVerticalAlignment(Element.ALIGN_MIDDLE);
        pdfTable.addCell(fieldCell);
    }
}
Here is some code using a PDF template, copied from the iText examples. The work is unfinished because I haven't found a proper way to show the data set.
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(dest));
AcroFields form = stamper.getAcroFields();
form.setField("text_1", "Bruno Lowagie");
form.setFieldProperty("text_1", "setfflags", PdfFormField.FF_READ_ONLY, null);

There is an inconsistency in your question. You write: PdfPTable can do the work, but it needs much code to deal with styles. However, in your first code snippet, you don't really create your PDFs the way one would expect. Instead of producing a high volume of finished PDFs, you use PdfPTable to create a template. I assume you then use that template to create a high volume of finished PDFs.
If you want to use a template and populate it afterwards, you shouldn't create your form using iText. Create it manually, for instance using Open Office or Libre Office. See for instance the example in chapter 6 of my book (section 6.3.5). Create the template with a tool that has a GUI, then fill out that template many times using iText.
This approach has some downsides: all the content has to fit the fields you define, and every field has a fixed position on a fixed page.
If "applying styles through code" is a problem, you may want to follow the approach described in the ZUGFeRD book. In that book, we create HTML first: Creating HTML invoices.
Once you have the HTML, then convert the HTML to PDF, and use CSS to apply styles: Creating PDF invoices.
This is how we create a ZugferdDocument:
ZugferdDocument pdfDocument = new ZugferdDocument(
    new PdfWriter(fos), ZugferdConformanceLevel.ZUGFeRDComfort,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
pdfDocument.addFileAttachment(
    "ZUGFeRD invoice", dom.toXML(), "ZUGFeRD-invoice.xml",
    PdfName.ApplicationXml, new PdfDictionary(), PdfName.Alternative);
pdfDocument.setTagged();
HtmlConverter.convertToPdf(
    new ByteArrayInputStream(html), pdfDocument, getProperties());
The getProperties() method looks like this:
public ConverterProperties getProperties() {
    if (properties == null) {
        properties = new ConverterProperties()
            .setBaseUri("resources/zugferd/");
    }
    return properties;
}
You can find other examples on how to use HTML to PDF here: pdfHTML add-on (read the introduction).
Note that you are using an old version of iText. The examples I shared are using iText 7. There's a huge difference between iText 5 and iText 7.

How to make a particular sub-string Bold while printing a string in Pdf using iText in java eclipse?

I am using iText in Java to convert a HTML to PDF.
I want to pass a paragraph, in which some words are bold and some are bold and underlined, to the Java code as a string and have it converted to PDF using the iText library.
I am unable to find a suitable method for this. How should I do this?

If you want to convert XHTML to PDF, you need iText + XML Worker.
You can find a number of examples here: http://itextpdf.com/sandbox/xmlworker
The simplest example looks like this:
public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    XMLWorkerHelper.getInstance().parseXHtml(writer, document,
            new FileInputStream(HTML));
    // step 5
    document.close();
}
Note that the HTML file is passed as a FileInputStream in this case. You want to pass a String. This means you'll have to do something like this:
XMLWorkerHelper.getInstance().parseXHtml(writer, document,
new StringReader("<p>The <b>String</b> I want to render to PDF</p>"));
There are more complex examples in the Sandbox in case you need support for images, special fonts, and so on. For instance, one of these examples converts XHTML to a series of iText objects instead of rendering them to a page right away.
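Since the question asks for some words in bold and some in bold plus underline, the XHTML string itself has to carry that markup before it is handed to the parser. Below is a minimal sketch in plain Java of how such a string might be assembled. The class XhtmlSnippet and its methods are hypothetical helpers, not iText API, and it is an assumption that your XML Worker version renders the b and u tags as bold and underline.

```java
class XhtmlSnippet {
    /** Escapes the characters that are special in XHTML text content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps the escaped text in a bold tag. */
    static String bold(String text) {
        return "<b>" + escape(text) + "</b>";
    }

    /** Wraps the escaped text in bold and underline tags. */
    static String boldUnderlined(String text) {
        return "<b><u>" + escape(text) + "</u></b>";
    }

    /** Joins already-marked-up parts into one paragraph element. */
    static String paragraph(String... parts) {
        return "<p>" + String.join("", parts) + "</p>";
    }

    public static void main(String[] args) {
        String xhtml = paragraph(
                escape("Total for "), bold("Smith & Co"),
                escape(" is "), boldUnderlined("42"), escape("."));
        // Prints: <p>Total for <b>Smith &amp; Co</b> is <b><u>42</u></b>.</p>
        System.out.println(xhtml);
    }
}
```

The resulting string can then be wrapped in a StringReader and passed to parseXHtml as in the snippet above. Escaping the plain text first keeps characters such as & from producing invalid XHTML.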

iText: PDF Generation. One Template. More Inputs. One Output

I am trying to generate a PDF with iText. First I read in an existing template and stamp the form data in the method stampFormular(Formular formular, PdfStamper stamper). The stamp method works, but I have a problem adding more forms to the output file.
For each Formular I want to stamp the PDF template "yellow". I tried document.add(), but that doesn't work. Then I tried the PdfWriter, but that doesn't work either. Any idea how I can stamp the PDF template with the data of one Formular, start a new page, and stamp the same template with the data of the next Formular?
public static File createForm(List<Formular> formulars) {
    Document document = new Document();
    File pdf = null;
    document.open();
    try {
        // "YELLOW" stands in for the path of the yellow template
        PdfReader pdfTemplate = new PdfReader("YELLOW");
        PdfStamper stamper = new PdfStamper(pdfTemplate,
                new FileOutputStream("output.pdf"));
        PdfWriter writer;
        for (Formular f : formulars) {
            stamper = stampFormular(f, stamper);
            writer = stamper.getWriter();
            writer.newPage();
        }
        stamper.close();
        pdfTemplate.close();
        pdf = new File("output.pdf");
        Desktop.getDesktop().open(pdf);
    } catch (DocumentException | IOException e) {
        e.printStackTrace();
    }
    return pdf;
}

A couple of observations:
You can't take the PdfWriter object from a PdfStamper, call newPage() and expect it to work. That's the equivalent of opening the hood of your car and starting to rewire tubes that fit without knowing anything about the art of motor maintenance. When you want to add a new page to a stamper, you're supposed to use the insertPage() method as explained in the documentation.
Second observation: you're not telling us if you're flattening the content of the forms. If you do, then it's simple, just use the example mentioned in the documentation and you're all set. In other words: combine PdfStamper with PdfSmartCopy. Especially if you're using the same template over and over again, PdfSmartCopy will give you much better results than PdfCopy for the reason explained in chapter 6.
Suppose that your template needs to remain interactive, then you may have a problem for a reason that is also explained in that chapter: different visualizations of a field with a specific name must always have the same value. For instance: if your template has a field named name, then every occurrence of this field in the document must have the same value. If you don't want this, you need to rename name, for instance to name1, name2, etc...
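The renaming idea can be sketched as a simple mapping from each template field name to a name that is unique per copy. This is a minimal illustration in plain Java; the class FieldRenamer and the suffix scheme are hypothetical. Applying the mapping to a real document would be done with the library, for example through a rename method on AcroFields in iText 5, which you should verify against your version.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldRenamer {
    /** Maps each template field name to a name made unique by appending the copy number. */
    static Map<String, String> renameMap(Iterable<String> fieldNames, int copyNumber) {
        Map<String, String> renamed = new LinkedHashMap<>();
        for (String name : fieldNames) {
            renamed.put(name, name + copyNumber);
        }
        return renamed;
    }

    public static void main(String[] args) {
        List<String> templateFields = Arrays.asList("name", "city");
        // One mapping per filled-in copy of the template
        for (int copy = 1; copy <= 2; copy++) {
            System.out.println(renameMap(templateFields, copy));
        }
    }
}
```

Run as is, this prints {name=name1, city=city1} followed by {name=name2, city=city2}, so the first copy's name field and the second copy's name field can hold different values after concatenation.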
Concatenation of templates that need to remain interactive used to be done with PdfCopyFields (see documentation). Here, the documentation is somewhat outdated. In the latest version of iText, we now have a method addDocument() in PdfCopy and PdfSmartCopy. This method allows you to add a full document at once, preserving the interactivity.

How to automate PDF form-filling in Java

I am doing some "pro bono" development for a food pantry near where I live. They are inundated with forms and paperwork, and I would like to develop a system that simply reads data from their MySQL server (which I set up for them on a previous project) and feeds data into PDF versions of all the forms they are required to fill out. This will help them out enormously and save them a lot of time, as well as get rid of a lot of human errors that are made when filling out these forms.
Not knowing anything about the internals of PDF files, I can foresee two avenues here:
Harder Way: It is possible to scan a paper document, turn it into a PDF, and then have software that "fills out" the PDF simply by saying "add text excerpt blah to the following (x,y) coordinates..."; or
Easier Way: PDF specification already allows for the construct of "fields" that can be filled out; this way I just write code that says "add text excerpt blah to the field called *address_value*...", etc.
So my first question is: which of the two avenues am I facing? Does PDF have a concept of "fields" or do I need to "fill out" these documents by telling the PDF library the pixel coordinates of where to place data?
Second, I obviously need an open source (and Java) library to do this. iText seems to be a good start but I've heard it can be difficult to work with. Can anyone lend some ideas or general recommendations here? Thanks in advance!

You can easily merge data into a PDF's fields using FDF (Forms Data Format) technology.
Adobe provides a library to do that: the Acrobat Forms Data Format (FDF) Toolkit.
Apache PDFBox can also be used to do that.
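To give an idea of what FDF data looks like, here is a rough sketch in plain Java that writes field name and value pairs in FDF syntax. The class FdfSketch is a hypothetical helper rather than part of any toolkit, and the sketch assumes flat (non-hierarchical) field names and ASCII-only values. A real tool would also handle other encodings and nested fields.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FdfSketch {
    /** Escapes backslashes and parentheses, which are special in PDF literal strings. */
    static String escape(String value) {
        return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /** Builds a minimal FDF document with one /T (name) and /V (value) pair per field. */
    static String toFdf(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<< /T (").append(escape(e.getKey()))
              .append(") /V (").append(escape(e.getValue())).append(") >>\n");
        }
        sb.append("] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        // Field names here are made up for illustration
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "CALIFORNIA");
        fields.put("capital", "Sacramento");
        System.out.print(toFdf(fields));
    }
}
```

The generated text would then be merged into the form by a tool or library that understands FDF. The values themselves could come from any source, such as rows read from a database.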

Please take a look at the chapter about interactive forms in the free ebook The Best iText Questions on StackOverflow. It bundles the answers to questions such as:
How to fill out a pdf file programatically?
How can I flatten a XFA PDF Form using iTextSharp?
Checking off pdf checkbox with itextsharp
How to continue field output on a second page?
finding out required fields to fill in pdf file
and so on...
Or you can watch this video where I explain how to use forms for reporting step by step.
See for instance:
public void manipulatePdf(String src, String dest) throws DocumentException, IOException {
    PdfReader reader = new PdfReader(src);
    PdfStamper stamper = new PdfStamper(reader,
            new FileOutputStream(dest));
    AcroFields fields = stamper.getAcroFields();
    fields.setField("name", "CALIFORNIA");
    fields.setField("abbr", "CA");
    fields.setField("capital", "Sacramento");
    fields.setField("city", "Los Angeles");
    fields.setField("population", "36,961,664");
    fields.setField("surface", "163,707");
    fields.setField("timezone1", "PT (UTC-8)");
    fields.setField("timezone2", "-");
    fields.setField("dst", "YES");
    // Flatten the form so the filled-in values become regular page content
    stamper.setFormFlattening(true);
    stamper.close();
    reader.close();
}

public void fillPDF() {
    try {
        // pdfform.pdf is the input file; PDDocument.load(File) is the PDFBox 2.x API
        PDDocument pDDocument = PDDocument.load(new File("D:/pdf/pdfform.pdf"));
        PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();
        // Look up each field by name and set its value
        PDField field = pDAcroForm.getField("Given Name Text Box");
        field.setValue("firstname");
        field = pDAcroForm.getField("Family Name Text Box");
        field.setValue("lastname");
        field = pDAcroForm.getField("Country Combo Box");
        field.setValue("Country");
        // Looked up but left unchanged (the original name had a stray leading space)
        field = pDAcroForm.getField("Driving License Check Box");
        // Fields can also be inspected, for example to see whether they are required
        field = pDAcroForm.getField("Favourite Colour List Box");
        System.out.println("Favourite Colour List Box required: " + field.isRequired());
        pDDocument.save("D:/pdf/pdf-java-output.pdf");
        pDDocument.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

iText copy form fields

Is there a way in iText to copy just the PDF acroform fields from one PDF document to another PDF document? I have the code to copy the entire PDF, but I would like to be able to overlay all my fields to a new/updated PDF document.

public void replaceBackground(String newBackground, String CurrentForm, String newFile) throws Exception
{
    PdfReader reader = new PdfReader(newBackground);
    PdfReader reader2 = new PdfReader(CurrentForm);
    PdfStamper stamp = new PdfStamper(reader2, new FileOutputStream(newFile));
    // Replace page 1 of the current form with page 1 of the new background
    stamp.replacePage(reader, 1, 1);
    stamp.close();
}

I don't quite remember very well if we were able to achieve this since I was not directly working on the implementation but I remember pointing someone in this direction a while ago.
You may use the PdfStamper to extract fields out of the acroForm and then use the PdfWriter to create a new AcroForm with the pre-populated fields. I wish I could give you a better example but I don't quite have the code with me.
