Error while converting PDF to PDF/A - Java

I am trying to create a PDF/A file from a PDF file using iText. Everything runs fine and I get a PDF/A file, but when I validate it at http://www.pdf-tools.com/pdf/validate-pdfa-online.aspx I get errors like:
The width for character 1 in font 'ArialRegular' does not match.
The width for character 2 in font 'ArialRegular' does not match.
The width for character 3 in font 'ArialRegular' does not match.
How can I solve this error?
PdfReader pdfReader = new PdfReader(file);
FontFactory.defaultEmbedding = true;
BaseFont bf = BaseFont.createFont(FONT, BaseFont.CP1252, BaseFont.EMBEDDED);
while (currentpagenumber < pdfReader.getNumberOfPages()) {
document.newPage();
currentpagenumber++;
finalpagenumber++;
page = pdfAWriter.getImportedPage(pdfReader, currentpagenumber);
cb.addTemplate(page, 0, 0);
cb.beginText();
cb.setFontAndSize(bf, 18);
cb.showTextAligned(PdfContentByte.ALIGN_CENTER, finalpagenumber+"", 520, 5, 0);
cb.endText();
ICC_Profile icc = ICC_Profile.getInstance(new FileInputStream(PROFILE));
This is the basic code. I also tried to find the font used on each page via the PdfDictionary and to embed it as a BaseFont, but that did not work.

I have never used iText before, but I am looking at doing a similar conversion now, and this appears to be a bug in the library rather than in the way you are using it.
The best option I can give you is to report a bug at:
http://sourceforge.net/p/itext/bugs/
The iText mailing list would be another place to try.
md_5

Related

Add HTML Markup using java Apache PDFBOX

I have been using PDFBox and EasyTable, which extends PDFBox, to draw data tables. I have hit a problem: I have a Java object with a string of HTML data that I need to add to the PDF using PDFBox. Digging through the documentation has not borne any fruit.
The code below is a hello-world snippet; I want the text in the generated PDF to have H1 formatting.
// Create a document and add a page to it
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage( page );
// Create a new font object selecting one of the PDF base fonts
PDFont font = PDType1Font.HELVETICA_BOLD;
// Start a new content stream which will "hold" the to be created content
PDPageContentStream contentStream = new PDPageContentStream(document, page);
// Define a text content stream using the selected font, moving the cursor and drawing the text "Hello World"
contentStream.beginText();
contentStream.setFont( font, 12 );
contentStream.moveTextPositionByAmount( 100, 700 );
contentStream.drawString( "<h1>HelloWorld</h1>" );
contentStream.endText();
// Make sure that the content stream is closed:
contentStream.close();
// Save the results and ensure that the document is properly closed:
document.save( "Hello World.pdf");
document.close();
}
Use Jericho to render the HTML as plain text while mapping the output of the tags correctly.
Sample:
public String extractAllText(String htmlText){
return new net.htmlparser.jericho
.Source(htmlText)
.getRenderer()
.setMaxLineLength(Integer.MAX_VALUE)
.setNewLine(null)
.toString();
}
Include the dependency in your build (Gradle syntax shown; the same coordinates apply for Maven):
compile group: 'net.htmlparser.jericho', name: 'jericho-html', version: '3.4'
PDFBox does not know HTML, at least not for creating content.
Thus, with plain PDFBox you have to parse the HTML yourself and derive special text drawing characteristics from the tags text is in.
E.g. when you encounter "<h1>HelloWorld</h1>", you have to extract the text "HelloWorld" and use the information that it is in an h1 tag to select an appropriate top-level heading font and font size with which to draw that "HelloWorld".
Alternatively you can look for a library doing that HTML parsing and transforming to PDF text drawing instructions for PDFBox, e.g. Open HTML to PDF.
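The parse-it-yourself approach described above can be sketched in plain Java. This is only a minimal sketch and not PDFBox API: the class HeadingRuns, the nested class Run, and the font sizes (24 for h1, 18 for h2, 12 for body text) are assumptions chosen for illustration, and the regex only handles flat h1/h2 tags without attributes or nesting.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of PDFBox: splits simple HTML into text runs
// and assigns each run a font size based on the enclosing tag.
class HeadingRuns {

    // One piece of text together with the font size it should be drawn in.
    static final class Run {
        final String text;
        final float fontSize;

        Run(String text, float fontSize) {
            this.text = text;
            this.fontSize = fontSize;
        }
    }

    // Assumed sizes, for illustration only.
    static final float H1_SIZE = 24f;
    static final float H2_SIZE = 18f;
    static final float BODY_SIZE = 12f;

    // Matches a flat <h1>...</h1> or <h2>...</h2> element (no attributes, no nesting).
    private static final Pattern HEADING =
            Pattern.compile("<(h1|h2)>(.*?)</\\1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static List<Run> parse(String html) {
        List<Run> runs = new ArrayList<>();
        Matcher m = HEADING.matcher(html);
        int last = 0;
        while (m.find()) {
            // Text between the previous heading and this one is body text.
            addBody(runs, html.substring(last, m.start()));
            float size = m.group(1).equalsIgnoreCase("h1") ? H1_SIZE : H2_SIZE;
            runs.add(new Run(m.group(2).trim(), size));
            last = m.end();
        }
        addBody(runs, html.substring(last));
        return runs;
    }

    // Text outside a heading becomes a body-size run; any other tags are stripped.
    private static void addBody(List<Run> runs, String fragment) {
        String text = fragment.replaceAll("<[^>]*>", "").trim();
        if (!text.isEmpty()) {
            runs.add(new Run(text, BODY_SIZE));
        }
    }
}
```

Each resulting run would then be drawn by setting the font with run.fontSize on the content stream before drawing run.text. For real-world HTML a proper parser or a library such as Open HTML to PDF is the safer choice.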

Setting substitution fonts in AcroForm itext7

I have a PDF with an AcroForm and need to fill it with a string that contains glyphs from different languages (English, Chinese, Korean, Khmer).
In iText5 I've used:
AcroFields form = stamper.getAcroFields();
form.addSubstitutionFont(arialFont);
form.addSubstitutionFont(khmerFont);
That worked fine for Chinese and Korean, but Khmer ligatures were not rendered. I found out that I need the pdfCalligraph add-on to make ligatures work, but it is only available for iText7. I've managed to add paragraphs with proper Khmer ligature rendering (this requires the typography dependency and loading a license key), but in an AcroForm it doesn't happen automatically. I'm struggling to find an iText7 equivalent of addSubstitutionFont and to make it work with pdfCalligraph.
Code I've used with iText7:
PdfReader reader = new PdfReader(templatePath);
PdfDocument pdf = new PdfDocument(reader, new PdfWriter(outputPath));
Document document = new Document(pdf);
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
PdfFont khmerFont = PdfFontFactory.createFont(pathToKhmerFont, PdfEncodings.IDENTITY_H, true);
PdfFont font = PdfFontFactory.createFont(pathToArialUnicodeFont, PdfEncodings.IDENTITY_H, true);
pdf.addFont(khmerFont);
pdf.addFont(font);
FontSet set = new FontSet();
set.addFont(pathToKhmerFont);
set.addFont(pathToArialUnicodeFont);
document.setFontProvider(new FontProvider(set));
document.setProperty(Property.FONT, "Arial");
form.setNeedAppearances(true);
String content = "khmer ថ្ងៃឈប់សម្រាក and chinese 假日 and korean 휴일";
PdfFormField tf = form.getField("Text3");
tf.setValue(content);
// tf.setFont(khmerFont);
tf.regenerateField();
// add a paragraph just to check pdfCalligraph works
document.add(new Paragraph(content));
pdf.close();
String used to test proper rendering: "khmer ថ្ងៃឈប់សម្រាក and chinese 假日 and korean 휴일"
iText5 in form field without pdfCalligraph, but with substitution fonts:
iText7 in form field with pdfCalligraph loaded (set arial font field.setFont(arialFont)):
iText7 in form field with pdfCalligraph loaded (set khmer font field.setFont(khmerFont)):
iText7, same document, but in a paragraph instead of a form field, with pdfCalligraph loaded (this is the expected result, so it does use pdfCalligraph for paragraphs, but not for form fields):
So, as you can see, there are basically 2 issues:
How do I addSubstitutionFont in iText7?
How do I use pdfCalligraph in PdfFormField appearance?
I've also checked whether pdfCalligraph works in a text form field, and it looks like it does not. Here is the code I used to check it:
LicenseKey.loadLicenseFile(path_to_license);
String outputPath = path_to_output_doc;
PdfDocument pdf = new PdfDocument(new PdfWriter(outputPath));
Document document = new Document(pdf);
// prepare fonts for pdfCalligraph to use
FontSet set = new FontSet();
set.addFont("/path_to/Khmer.ttf");
set.addFont("/path_to/ArialUnicodeMS.ttf");
FontProvider fontProvider = new FontProvider(set);
document.setFontProvider(fontProvider);
document.setProperty(Property.FONT, "Arial");
String content = "khmer ថ្ងៃឈប់សម្រាក and chinese 假日 and korean 휴일";
// Add a paragraph to check if pdfCalligraph works
document.add(new Paragraph(content));
// Add a form text field
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
PdfTextFormField field = PdfFormField.createText(pdf, new Rectangle(36, 700, 400, 30), "test");
field.setValue(content);
form.addField(field);
document.close();
Output with the pdfCalligraph dependency loaded (as you can see, the paragraph is rendered properly, but in the form all non-Helvetica characters are just ignored):
Output without pdfCalligraph dependency loaded (as you can see paragraph is not rendered properly which is expected, the form field looks same as with loaded pdfCalligraph):
Am I missing something?

iText PDF add text in absolute position on top of the 1st page

I have a script that creates a PDF file and writes contents to it. After the execution is complete I need to write the status (fail, success) to the PDF, but the status should be on the top of the page. So the solution I came up with is to use absolute positioned text. Below is my code for the same
PdfContentByte cb = writer.DirectContent;
BaseFont bf = BaseFont.CreateFont(BaseFont.HELVETICA, BaseFont.CP1252, BaseFont.NOT_EMBEDDED);
cb.SaveState();
cb.BeginText();
cb.MoveText(700, 30);
cb.SetFontAndSize(bf, 12);
cb.ShowText("My status");
cb.EndText();
cb.RestoreState();
But because the PDF has multiple pages, the text ends up on the last page. How can I add it to the first page?
Also, is there a way to calculate the top coordinates of the page, i.e. the top-left coordinate?
iText was written with internet applications in mind. It was designed to flush content from memory as soon as possible: if a page is finished, that page is sent to the OutputStream and there is no way to return to that page.
That doesn't mean your requirement is impossible. PDF has a concept known as Form XObject. In iText, this concept is implemented under the name PdfTemplate. Such a PdfTemplate is a rectangular canvas with a fixed size that can be added to a page without being part of that page.
An example should clarify what that means. Please take a look at the WriteOnFirstPage example. In this example, we create a PdfTemplate like this:
PdfContentByte cb = writer.getDirectContent();
PdfTemplate message = cb.createTemplate(523, 50);
This message object refers to a Form XObject. It is a piece of content that is external to the page content.
We wrap the PdfTemplate inside an Image object. By doing so, we can add the Form XObject to the document just like any other object:
Image header = Image.getInstance(message);
document.add(header);
Now we can add as much data as we want:
for (int i = 0; i < 100; i++) {
document.add(new Paragraph("test"));
}
Adding 100 "test" lines will cause iText to create 3 pages. Once we're on page 3, we no longer have access to page 1, but we can still write content to the message object:
ColumnText ct = new ColumnText(message);
ct.setSimpleColumn(new Rectangle(0, 0, 523, 50));
ct.addElement(
new Paragraph(
String.format("There are %s pages in this document", writer.getPageNumber())));
ct.go();
If you check the resulting PDF write_on_first_page.pdf, you'll notice that the text we've added last is indeed on the first page.

PDF Annotations using JAVA

I've been trying to add annotations to an existing PDF using the iText API and the older Lowagie version. However, I need alternatives to the API since it does not seem to be able to do what is asked in our requirement.
The requirement is to put an Annotation into an existing PDF with the following details:
Type: plain text
Position: x=0mm && y=0mm
Font: Arial
Text Colour: White
Text Content: "some text"
Using iText, I can put in an annotation but I need to approximate in pixels where in my A4 size page I should put it. The closest approximation is using
PdfReader reader = new PdfReader(headerFilePath.concat(xmlFileName));
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(headerFilePath.concat(xmlFileNameNew)));
PdfAnnotation annotation = PdfAnnotation.createText(stamper.getWriter(), new Rectangle(0, 842, 5, 842), "some text", "some text", true, null);
annotation.setColor(Color.WHITE);
stamper.addAnnotation(annotation, 1);
reader.close();
stamper.close();
This snippet places it at the top left corner but I'm not sure if it's 0mm,0mm. Also it is black and I cannot specify the font.
Any help on the matter is greatly appreciated. Thanks!
You may try rich text annotations as described here:
annotation.put(PdfName.RC, new PdfString( "<font size=\"whatever\">" +
"some text" +
"</font>" ) );
myStamper.setGenerateAppearances( false );
However, I doubt that this is what you need. If you add an appearance, you'll get a rectangular icon in the upper left corner. If you then hover over it with your mouse, you can read the annotation in a pop-up. And even if you get a white font color for the annotation text, you can't hide the rectangular icon of the annotation itself...
You want to have a white font color which indicates that you want to "transport" some hidden (white color on white background) information. Maybe in this case you may use the following mechanism ( which is normally used for adding watermarks etc.):
String yourText = "some text";
int pageNumber = 1;
PdfContentByte overContent = stamper.getOverContent(pageNumber);
overContent.beginText();
overContent.setFontAndSize(yourFont, yourFontSize);
//overContent.setGrayFill(...);
overContent.showTextAligned(PdfContentByte.ALIGN_CENTER, yourText + " Center", 150, 760, 0);
overContent.endText();
Update: To make an annotation invisible, add this to your code:
annotation.setColor(Color.WHITE);
//set opacity to 0 (invisible). All values between 0...1 are possible
//so 0.5 means 50% opacity
annotation.put(PdfName.CA, new PdfNumber(0));
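On the question of whether the rectangle really sits at 0mm, 0mm: PDF coordinates are not pixels. By default one user-space unit is 1/72 inch and the origin is the bottom-left corner of the page, so an A4 page measures roughly 595 x 842 units and its top-left corner is at (0, 842). A minimal sketch of the conversion in plain Java follows; the class and method names are hypothetical, and it assumes an unrotated page whose lower-left corner is at (0, 0) with the default unit.

```java
// Hypothetical helper: converts millimetre offsets measured from the
// top-left corner of a page into default PDF user-space coordinates.
class PageUnits {

    static final float POINTS_PER_INCH = 72f;
    static final float MM_PER_INCH = 25.4f;

    // Converts a length in millimetres to PDF points (1 point = 1/72 inch).
    static float mmToPoints(float mm) {
        return mm * POINTS_PER_INCH / MM_PER_INCH;
    }

    // x grows to the right in both systems, so only the unit changes.
    static float xFromLeft(float mmFromLeft) {
        return mmToPoints(mmFromLeft);
    }

    // PDF y is measured upwards from the bottom edge, so an offset from the
    // top has to be subtracted from the page height (given in points).
    static float yFromTop(float pageHeightPoints, float mmFromTop) {
        return pageHeightPoints - mmToPoints(mmFromTop);
    }
}
```

For an A4 page of height 842, yFromTop(842f, 0f) gives 842 and xFromLeft(0f) gives 0, which matches the corner used by new Rectangle(0, 842, 5, 842) in the question, provided the page really has that size and no offset crop box.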

PDFBox change page sizes and save it again

First of all, sorry for my bad English.
I'm trying to remove the header and footer of a PDF page. I need to search for certain words at the page breaks, but that is impossible while the header and footer are present, so I need to crop them and then convert the page to text before the words can be searched.
I'm doing it like this:
PDDocument pdDoc = PDDocument.load("document.pdf");
PDPage page = (PDPage) pdDoc.getDocumentCatalog().getAllPages().get(0);
PDRectangle rectangle = new PDRectangle();
rectangle.setUpperRightY(page.findCropBox().getUpperRightY() - 100);
rectangle.setLowerLeftY(page.findCropBox().getLowerLeftY());
rectangle.setUpperRightX(page.findCropBox().getUpperRightX());
rectangle.setLowerLeftX(page.findCropBox().getLowerLeftX());
page.setMediaBox(rectangle);
PDDocument document = new PDDocument();
document.addPage(page);
document.save("newDocument.pdf");
document.close();
But when I convert it to HTML, it still keeps the text that was hidden. Is there any way to save it without the cropped area and convert it to HTML correctly?
Thanks.
Best regards.
