Double space not being preserved in PDF - java

I'm using the lowagie.text.pdf library to create a PDF document from within my Java application, however I have realised that strings with double spacing in them i.e. "DOUBLE  SPACE" aren't preserved. Is this a .PDF limitation or have I overlooked something else?
Font font = FontFactory.getFont(FontFactory.HELVETICA, 8);
float[] columnWidths = new float[columnCount];
PdfPCell headerCell = new PdfPCell(new Phrase(gridColumn.getCaption(), font));
PdfPTable table = new PdfPTable(columnWidths);
table.getDefaultCell().setBorderWidth(0.5f);
table.getDefaultCell().setBorderColor(Color.LIGHT_GRAY);
table.setHeaderRows(1);
for (PdfPCell headerCell : headerCells) {
table.addCell(headerCell);
}
String value = "DOUBLE  SPACED";
table.addCell(new Phrase(value, font));
Document pdfDocument = new Document(PageSize.A4.rotate(), 0, 0, 0, 0);
ByteArrayOutputStream pdfStream = new ByteArrayOutputStream();
PdfWriter.getInstance(pdfDocument, pdfStream);
pdfDocument.addTitle(caption);
pdfDocument.open();
pdfDocument.add(table);
pdfDocument.close();
Thanks.

I have realised that strings with double spacing in them i.e. "DOUBLE  SPACE" aren't preserved
That depends on what you mean by being preserved.
I extended your sample a bit to print a single, a double, and a triple space
table.addCell(new Phrase("SINGLE SPACED", new Font(BaseFont.createFont(), 36)));
table.addCell(new Phrase("DOUBLE  SPACED", new Font(BaseFont.createFont(), 36)));
table.addCell(new Phrase("TRIPLE   SPACED", new Font(BaseFont.createFont(), 36)));
and furthermore updated the used classes to current iText 5.5.x variants.
If you look into the generated PDF internal instructions, you'll see
(SINGLE SPACED) Tj
...
(DOUBLE  SPACED) Tj
...
(TRIPLE   SPACED) Tj
Thus, iText does literally preserve the spaces.
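One way to check this yourself is to pull the string operands of the Tj operators out of a page's decompressed content stream and count the spaces in them. The following is a minimal sketch using only the JDK. The class and method names are made up for illustration, and the regular expression assumes simple literal strings without escaped or nested parentheses, which real content streams may well contain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TjStrings {
    // Matches "(text) Tj". Simplification: no escaped or nested parentheses.
    private static final Pattern TJ = Pattern.compile("\\(([^()]*)\\)\\s*Tj");

    /** Returns the literal string operand of every Tj operator found. */
    public static List<String> extract(String contentStream) {
        List<String> result = new ArrayList<>();
        Matcher m = TJ.matcher(contentStream);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    /** Returns the length of the longest run of consecutive spaces in s. */
    public static int longestSpaceRun(String s) {
        int best = 0;
        int current = 0;
        for (char c : s.toCharArray()) {
            current = (c == ' ') ? current + 1 : 0;
            best = Math.max(best, current);
        }
        return best;
    }

    public static void main(String[] args) {
        // Stand-in for a decompressed content stream (hypothetical input).
        String stream = "(SINGLE SPACED) Tj\n(DOUBLE  SPACED) Tj\n(TRIPLE   SPACED) Tj";
        for (String s : extract(stream)) {
            System.out.println(longestSpaceRun(s) + " space(s) in [" + s + "]");
        }
    }
}
```

If the longest run reported for a string matches what you passed to the Phrase, the spaces are in the file, and any collapsing happens in the viewer.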
The visual result:
As you see the gap is growing and growing. Thus, double and triple spaces are preserved in the visual representation of the PDF!
On the other hand, if you copy and paste using Adobe Reader, you get:
SINGLE SPACED
DOUBLE SPACED
TRIPLE SPACED
Thus, the current Adobe Reader does not copy&paste the space characters as they are inside the PDF but collapses multiple ones to a single one.
So:
Is this a .PDF limitation
This is neither a PDF limitation nor an iText limitation; it is a quirk of Adobe Reader (and some other PDF viewers, too).

Related

While parsing pdf with iText7 chars move on fixed interval (with Freeset font)

I'm trying to parse pdf that I have created with iText. In document I have two paragraphs:
"Имя" - ("name" from Russian) - font: Helvetica, size: 20.
"Фамилия" - ("surname" from Russian) - font: Freeset (I downloaded it here), size: 10.
When I finish parsing, I get "Имя" properly decoded but "Ôàìèëèÿ" instead of "Фамилия". These are the Unicode characters of "Фамилия" shifted down by 848 code points (decimal). For instance, instead of "Ф" (U+0424) I get "Ô" (U+00D4), and the difference between them is 848 (0x350 in hex).
I use this example to get text from the PDF (but instead of filtering by font, I filter by equality to one of the strings in the set ("Имя", "Фамилия")).
I know that we are advised to store non-English characters as a sequence of Unicode symbols, but I'm creating the PDF on the fly from incoming data, so I can't manually retype it as separate Unicode symbols (if you know how to do it on the fly, please provide your approach).
Any ideas as to why this shift of characters happens and how to avoid it are welcome. Thank you in advance.
Here is the file I worked with.
Edit
I tried opening file in Acrobat Pro and everything is fine there. Acrobat also shows that all three fonts I've put in pdf are still in the document.
Here is the code I used to create pdf I'm processing:
private static void create() throws IOException {
    PdfDocument pdf = new PdfDocument(new PdfReader(srcPdf), new PdfWriter(targetPdf));
    PdfCanvas pdfCanvas = new PdfCanvas(pdf.getFirstPage());
    PdfFont freeset = getPdfFont(freesetPath);
    PdfFont helvetica = getPdfFont(helveticaPath);
    PdfFont circe = getPdfFont(circePath);
    pdfCanvas.beginText()
            .setFontAndSize(helvetica, 15)
            .setColor(Color.RED, true)
            .moveText(50, 300)
            .showText("Имя")
            .setFontAndSize(freeset, 10)
            .setColor(Color.GREEN, true)
            .moveText(0, -30)
            .showText("Фамилия")
            .setFontAndSize(circe, 20)
            .setColor(Color.BLUE, true)
            .moveText(0, -30)
            .showText("Должность")
            .endText();
    pdf.close();
}

private static PdfFont getPdfFont(String path) throws IOException {
    InputStream fontInputStream = new FileInputStream(path);
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    byte[] buffer = new byte[2048];
    int a;
    while ((a = fontInputStream.read(buffer, 0, buffer.length)) != -1) {
        baos.write(buffer, 0, a);
    }
    baos.flush();
    return PdfFontFactory.createFont(baos.toByteArray(),
            PdfEncodings.IDENTITY_H, true);
}
iText 7 appears to have an issue with embedding the font in question. I don't know whether it's a bug in the font or in iText, though.
The "FreeSet" font is indeed embedded in the OP's sample document with a wrong ToUnicode map
...
6 beginbfrange
<009e> <009e> <00d4>
<00aa> <00aa> <00e0>
<00b2> <00b2> <00e8>
<00b5> <00b5> <00eb>
<00b6> <00b6> <00ec>
<00c9> <00c9> <00ff>
endbfrange
...
which maps the glyphs used for "Фамилия" to 00d4, 00e0, 00e8, 00eb, 00ec, and 00ff.
This in turn explains why both iText and Adobe Reader extract unexpected text.
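The target values in that map are not arbitrary: 00d4, 00e0, 00ec and so on are exactly the Windows-1251 byte values of the Cyrillic letters, read as if they were Latin-1 code points. That is also where the constant offset of 848 (0x350) comes from. Whether this stems from the font's own character tables is an assumption, not something verified here. The relation itself is easy to check with the JDK alone, and it suggests a stopgap for text that has already been extracted from such a file. The class and method names below are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1251Mojibake {
    private static final Charset CP1251 = Charset.forName("windows-1251");

    /** Encodes as Windows-1251 and misreads the bytes as Latin-1 (the observed effect). */
    public static String garble(String cyrillic) {
        return new String(cyrillic.getBytes(CP1251), StandardCharsets.ISO_8859_1);
    }

    /** Reverses garble(). Only meaningful for text known to come from such a font. */
    public static String repair(String extracted) {
        return new String(extracted.getBytes(StandardCharsets.ISO_8859_1), CP1251);
    }

    public static void main(String[] args) {
        // The surname from the question, written as Unicode escapes.
        String surname = "\u0424\u0430\u043c\u0438\u043b\u0438\u044f";
        String garbled = garble(surname);

        // The text reported as extracted, written as Unicode escapes.
        String reported = "\u00d4\u00e0\u00ec\u00e8\u00eb\u00e8\u00ff";

        System.out.println(garbled.equals(reported));          // true
        System.out.println(repair(garbled).equals(surname));   // true
        System.out.println(surname.charAt(0) - garbled.charAt(0)); // 848
    }
}
```

Applying repair() blindly would corrupt text that was extracted correctly, such as the Arial paragraph, so it would have to be limited to chunks rendered with the affected font.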
The issue can be reproduced like this:
PdfFont arial = PdfFontFactory.createFont(BYTES_OF_ARIAL_FONT, PdfEncodings.IDENTITY_H, true);
PdfFont freeSet = PdfFontFactory.createFont(BYTES_OF_FREESET_FONT, PdfEncodings.IDENTITY_H, true);
try (   OutputStream result = new FileOutputStream("cyrillicTextFreeSet.pdf");
        PdfWriter writer = new PdfWriter(result);
        PdfDocument pdfDocument = new PdfDocument(writer);
        Document doc = new Document(pdfDocument)    ) {
    doc.add(new Paragraph("Фамилия").setFont(arial));
    doc.add(new Paragraph("Фамилия").setFont(freeSet));
}
(CreateCyrillicText test testCreateTextWithFreeSet)
The result looks ok:
When extracting / copying&pasting, though:
The embedded Arial subset has a proper ToUnicode map, the text in Arial is extracted as "Фамилия".
The embedded FreeSet subset has an incorrect ToUnicode map, the text in FreeSet is extracted as "Ôàìèëèÿ".
(Tested with the current iText 7.1.1-SNAPSHOT)
Apparently iText 7 does understand the FreeSet font program well enough to select the needed subset and reference the correct glyphs from the content but it has problems building an appropriate ToUnicode map. This is not a general problem, though, as the parallel test with Arial shows.

iText- Appending arabic text in pdf table cell phrase at different positions in a page

I want to make a PDF report with English and Arabic text. I have many tables/phrases across the page, and I want to display Arabic text alongside the English. I have seen the Arabic example in the iText documentation, which uses ColumnText, but I couldn't make it work for my case. My question is how to choose the arguments of canvas.setSimpleColumn(36, 750, 559, 780) for tables/phrases at different positions. I have also looked at the questions below, but I still have issues.
Writing Arabic in pdf using itext,
http://developers.itextpdf.com/examples/font-examples/language-specific-exampleshe
Below is my code..
private static final String ARABIC = "\u0627\u0644\u0633\u0639\u0631 \u0627\u0644\u0627\u062c\u0645\u0627\u0644\u064a";
private static final String FONT = "resources/fonts/ARIALUNI.TTF";
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("test.pdf"));
document.open();
Font f = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
PdfPTable table = new PdfPTable(3);
Phrase phrase = new Phrase();
Chunk chunk = new Chunk("test value", inlineFont);
phrase.add(chunk);
// I want to add Arabic text here too, but the direction is wrong and the letters appear as separate characters
p.add(new Chunk(ARABIC, f));
PdfPCell cell1 = new PdfPCell(phrase);
cell1.setFixedHeight(50f);
table.addCell(cell1);
document.add(table);
document.close();
Your code is kind of sloppy.
For example:
you define a PdfPTable with 3 columns, but you only add a single cell. That table will never be rendered.
you define a Phrase with name phrase, but later in your code you use p.add(...). There is no variable with name p in your code.
...
This lack of respect for the StackOverflow reader can result in not getting an answer, because you are expecting the reader not only to fix the actual problem (not being able to use English and Arabic text in a single PdfPCell), but also to fix all the other (avoidable) errors in your code.
This is a working example:
public static final String FONT = "resources/fonts/NotoNaskhArabic-Regular.ttf";
public static final String ARABIC = "\u0627\u0644\u0633\u0639\u0631 \u0627\u0644\u0627\u062c\u0645\u0627\u0644\u064a";
public void createPdf(String dest) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter.getInstance(document, new FileOutputStream(dest));
    document.open();
    Font f = FontFactory.getFont(FONT, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
    PdfPTable table = new PdfPTable(1);
    Phrase phrase = new Phrase();
    Chunk chunk = new Chunk("test value");
    phrase.add(chunk);
    phrase.add(new Chunk(ARABIC, f));
    PdfPCell cell = new PdfPCell(phrase);
    cell.setUseDescender(true);
    cell.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
    table.addCell(cell);
    document.add(table);
    document.close();
}
The result looks like this:
As you can see, both the English and the Arabic text can be read fine. You may be surprised by the alignment and the order of the text. As we are working in the Right-to-Left writing system, left and right are switched. By default, text is left aligned, but as soon as we introduce the RTL run direction, this changes to right aligned.
In your code, you add the English text first, followed by the Arabic text. Text in Arabic is read from right to left. That's why you see the English text to the right, and why the Arabic text is added to the left of the English text.
All of this has been improved in iText 7. iText 7 has an extra pdfCalligraph module that takes care of other writing systems in a transparent way.

iText 5.5.3 ColumnText doesn't wrap text correctly due to the way font size is measured

Question Setup
I am trying to accurately determine the width of a font (Ubuntu Italic), though iText appears to ignore the portion of the last glyph after the horizontal advance as shown in the image below.
The code I used to generate this example is as follows:
Document document = new Document(PageSize.LETTER);
FileOutputStream out = new FileOutputStream(new File("test.pdf"));
PdfWriter writer = PdfWriter.getInstance(document, out);
document.open();
String text = "ff";
Chunk chunk = new Chunk(text, FontFactory.getFont("Ubuntu-Italic.ttf", 72));
Phrase phrase = new Phrase(chunk);
float width = ColumnText.getWidth(phrase);
System.out.println(width + ", " + chunk.getWidthPoint());
PdfPTable table = new PdfPTable(1);
table.setWidthPercentage(100);
table.setTotalWidth(width);
PdfPCell cell = table.getDefaultCell();
cell.setPadding(0);
cell.setUseDescender(true);
cell.setUseAscender(true);
table.addCell(phrase);
float height = table.calculateHeights();
PdfContentByte canvas = writer.getDirectContent();
ColumnText columnText = new ColumnText(canvas);
columnText.setSimpleColumn(36, 756 - height, 36 + width, 756);
columnText.addElement(table);
columnText.go();
document.close();
out.close();
As demonstrated in the code I tried both ColumnText.getWidth(phrase) as well as chunk.getWidthPoint() which both return the same value, with a little bit of floating point difference.
Question
The code I wrote above simulates a situation in iText where text doesn't wrap correctly to the next line. The problem I'm having is that ColumnText, in the code I am using, is cropped. The issue is that because of the way iText measures text, the ColumnText thinks there is enough room for the f at the right edge when in fact there really isn't, so in my situation it is getting cut off. Is there a way to force ColumnText to measure the width of the font differently so that this doesn't happen?
Your observation
The right border that cuts through the f
corresponds with the definition of the Ubuntu-Italic letter f:
The width you get is the width of the letter on the baseline, i.e. the horizontal advance; it is not the distance from the left-most to the right-most x coordinate.
I ended up solving this by adding padding on the left and right sides of the PdfPCell and by increasing the width of the ColumnText by the amount of padding I added. That way the text flows the same way, while allowing for the fact that iText only takes into account the horizontal advance values of the font.
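The amount of padding needed can be estimated from the glyph metrics: the overhang is the part of the glyph's bounding box that lies outside its horizontal advance, scaled from font units to points. The following is a sketch of that arithmetic only, with no iText calls. The class name, the method names, and the sample metric values are hypothetical; they are not the real Ubuntu-Italic numbers.

```java
public class GlyphOverhang {
    /**
     * Part of the glyph that extends to the right of its advance, in points.
     * advance and xMax are in font units; returns 0 if there is no overhang.
     */
    public static float rightOverhang(int advance, int xMax, int unitsPerEm, float fontSize) {
        int overhangUnits = Math.max(0, xMax - advance);
        return overhangUnits * fontSize / unitsPerEm;
    }

    /**
     * Part of the glyph that extends to the left of its origin, in points.
     * xMin is in font units and is negative when the glyph overhangs.
     */
    public static float leftOverhang(int xMin, int unitsPerEm, float fontSize) {
        int overhangUnits = Math.max(0, -xMin);
        return overhangUnits * fontSize / unitsPerEm;
    }

    public static void main(String[] args) {
        // Hypothetical metrics for an italic "f" in a 1000-units-per-em font.
        int advance = 300;
        int xMin = -150;
        int xMax = 425;
        float fontSize = 72f;

        System.out.println(rightOverhang(advance, xMax, 1000, fontSize)); // 9.0
        System.out.println(leftOverhang(xMin, 1000, fontSize));           // 10.8
    }
}
```

The two results would then serve as the right and left cell padding, with the column widened by their sum so that the line breaks stay where they were.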

How can I preserve accessibility and add PDF/A 2-a conformance with Itext

I have a PDF document which is accessible (tagged), and I want to make it PDF/A-2a conformant with iText 5.4.5.
I can open a PDFAWriter with PDF/A 2-b compliance (note the b), import each of the pages, copy them. The output document complies with PDF/A 2-b compliance (I checked with two validators), but then I'm losing the accessibility (the structure tags).
I then tried to open a PDFAWriter with PDF/A 2-a compliance (note the a), use writer.setTagged(), import each of the pages and copy them like this:
Document document = new Document();
PdfAWriter writer = PdfAWriter.getInstance(document,
new FileOutputStream(output), PdfAConformanceLevel.PDF_A_2A);
PdfReader pdfReader = new PdfReader(input);
writer.setTagged();
writer.setLanguage("en");
writer.setLinearPageMode();
writer.createXmpMetadata();
document.open();
ICC_Profile icc = ICC_Profile.getInstance(new FileInputStream(PROFILE));
writer.setOutputIntents("Custom", "", "http://www.color.org",
"sRGB IEC61966-2.1", icc);
PdfContentByte cb = writer.getDirectContent();
int n1 = pdfReader.getNumberOfPages();
for (int i = 1; i <= n1; i++) {
document.newPage();
PdfImportedPage page = writer.getImportedPage(pdfReader, i);
cb.addTemplate(page, 0, 0);
}
document.close();
But this generates this error
Exception in thread "main" com.itextpdf.text.pdf.PdfAConformanceException: Alt entry should specify alternate description for /Figure element.
at com.itextpdf.text.pdf.internal.PdfA2Checker.checkStructElem(PdfA2Checker.java:822)
at com.itextpdf.text.pdf.internal.PdfAChecker.checkPdfAConformance(PdfAChecker.java:222)
at com.itextpdf.text.pdf.internal.PdfAConformanceImp.checkPdfIsoConformance(PdfAConformanceImp.java:70)
Any workaround? Solution to this problem?
(I know that PDFCopy would preserve tagging, but then how do I specify the PDF/A 2-a bit...?)
I was easily able to do this in 3‑Heights, but I would like an IText Solution to this problem.
(Personally, I am a bit disappointed by the interface offered. For instance, PDFCopy extends PdfWriter, but not PdfAWriter).

Text Positioning with IText for Java

I need to position different text before I generate the pdf using IText. I've been thinking of using Chunks, but I don't know how to position them separately. I also tried using PdfContentByte but it doesn't generate any PDF File.
Why don't you use tables combined with Chunks for your layout? For example:
PdfPTable headerTable = new PdfPTable(2);
float[] headerTableWidths = { 80f, 20f };
headerTable.setWidthPercentage(100f);
headerTable.setWidths(headerTableWidths);
headerTable.getDefaultCell().setBorderWidth(0);
headerTable.getDefaultCell().setPadding(2);
headerTable.getDefaultCell().setBorderColor(BaseColor.BLACK);
headerTable.getDefaultCell().setFixedHeight(90f);
PdfPCell infoCell = new PdfPCell();
infoCell.setHorizontalAlignment(Element.ALIGN_CENTER);
infoCell.setVerticalAlignment(Element.ALIGN_TOP);
infoCell.addElement(new Paragraph("test"));
infoCell.addElement(new Paragraph("text"));
headerTable.addCell(infoCell);
