Can I generate superscipt in iText using input only? - java

I'm involved in a project where we generate pdf-files using iText. I want to generate superscipt text and I know that methods exists for converting chunks of text to superscript however, I am in a situation where it is desirable to decide based on the input whether or not a piece of text should be superscript or not. An optimal solution would be something like: 2<superscript>nd</superscript> to generate the text "2nd".
Is this, or anything like this possible?
It is worth noting that I do not write code directly against iText, there are layers between. This is why I need to decide what is superscript and not based on the input.

If you can't modify the code that's using iText, then no, there's no way. iText doesn't parse the text you put in a Chunk.
In order to get a similar behaviour to what you want to achieve, you would need to do HTML to PDF conversion, with iText's companion XMLWorker for example.

When you find a <superscript> tag you can simply create a Chunk with correct properties:
public Chunk addSuperscript(String text) {
float leading = determineCurrentLeading();
Font f = determineCurrentFont();
Font supFont = new Font(f);
supFont.setSize(f.getSize() / 2f);
Chunk c = new Chunk(text, supFont);
c.setTextRise(leading / 2f);
return c;
}
Then the chunk will be added to a Paragraph, a PdfPCell,... Depending on your implementation, you can determine the current leading and font used.
Hope this will help you.

Related

Do I have to set the offset or cursor on everyline in PDFBox?

I'm trying to learn PDFBox v2.0 but there seems to be few useful examples and tutorials to get started.
I want to create a simple PDF with text only (no images and fancy stuff) and it looks as the following:
1- Introduction
This is an intro to the document
2- Data
2.1- DataPart1
This is some text here....
2.2- DataPart2
This is another text right here!
3- More Information
Some important informational text here...
I have written the following code:
PDPage firstPage = getNewPage();
PDPageContentStream firstContentStream = new PDPageContentStream(document, firstPage);
firstContentStream.setFont(HEADING_FONT, HEADING_FONT_SIZE);
firstContentStream.setNonStrokingColor(Color.BLUE);
firstContentStream.beginText();
firstContentStream.newLineAtOffset(MARGIN, firstPage.getMediaBox().getHeight() - MARGIN);
firstContentStream.showText("1- Introduction");
firstContentStream.endText();
firstContentStream.setFont(TEXT_FONT, TEXT_FONT_SIZE);
firstContentStream.setNonStrokingColor(Color.BLACK);
firstContentStream.beginText();
firstContentStream.showText("This document lists all the impacts that have been observed during the QA validation of the version v3.1 build");
firstContentStream.endText();
firstContentStream.close();
document.addPage(firstPage);
The text "This is the document intro" appears at the end of the page!
My question: do I have to set the cursor or the offset on every line???!
This seems like hard work. Does this mean I have to compute the string width and compare it to the page width and jump to new lines whenever the text fills the entire line?
I really don't understand.
And also how do I make it generate new pages whenever the current page is full?
Could you please give one simple example? Thanks in advance
Can't I just do something like:
contentStream.addTextOnNewLine("Some text here...");
First of all please be aware that PDFBox has a very low-level text drawing API only, i.e. most methods you call directly correspond to a single instruction in the PDF content stream. In particular PDFBox does not come with a (publicly accessible) layout'ing functionality but you essentially have to do the layout'ing in your code. This is both a freedom (you are not bound by the limits of an existing machinery) and a burden (you have to do everything yourself).
In your case that means:
Does this mean I have to compute the string width and compare it to the page width and jump to new lines whenever the text fills the entire line?
Yes. Confer the answers #Tilman linked for you
How to generate multiple lines in PDF using Apache pdfbox
Creating a new PDF document using PDFBOX API
and search some more on stackoverflow, there are further answer on different variations of that theme, e.g. for justified text blocks here.
But there is a detail in your code which in your case makes things possibly harder than necessary:
My question: do I have to set the cursor or the offset on every line???!
While you do have to re-position for every new line, it suffices to re-position by offset from the start of the previous line if you still are in the same text object, i.e. if you have not called endText() and beginText() in-between. E.g.:
PDPage firstPage = getNewPage();
PDPageContentStream firstContentStream = new PDPageContentStream(document, firstPage);
firstContentStream.setFont(HEADING_FONT, HEADING_FONT_SIZE);
firstContentStream.setNonStrokingColor(Color.BLUE);
firstContentStream.beginText();
firstContentStream.newLineAtOffset(MARGIN, firstPage.getMediaBox().getHeight() - MARGIN);
firstContentStream.showText("1- Introduction");
// removed firstContentStream.endText();
firstContentStream.setFont(TEXT_FONT, TEXT_FONT_SIZE);
firstContentStream.setNonStrokingColor(Color.BLACK);
// removed firstContentStream.beginText();
firstContentStream.newLineAtOffset(0, -TEXT_FONT_SIZE); // reposition, tightly packed text lines
firstContentStream.showText("This document lists all the impacts that have been observed during the QA validation of the version v3.1 build");
firstContentStream.endText();
firstContentStream.close();
document.addPage(firstPage);
newLineAtOffset(0, -TEXT_FONT_SIZE) sets the text insertion position at the same x coordinate as the start of the previous line and TEXT_FONT_SIZE units lower.
(Using TEXT_FONT_SIZE here will result in pretty tightly set text lines; you may want to use a higher value, e.g. 1.4 times the font size.)
Considering your description you might want to use a positive x offset value for the line after a heading, though.
The text "This is the document intro" appears at the end of the page!
This happens because the PDF instruction generated by firstContentStream.beginText() resets the text insertion position to the origin (0, 0) which by default is the lower left corner.
And also how do I make it generate new pages whenever the current page is full?
Well, it does not happen automatically, you explicitly have to do that when you want to switch to a new page. And you do that more or less exactly like you created your first page.

how read pdf using itext and java and get table cell height

First I have created a pdf using itext and java and put a table and tableCell
PdfPTable table = new PdfPTable(2);
table.setWidths(new int[]{1, 2});
PdfPCell cell;
table.addCell("Address:");
cell = new PdfPCell(new Phrase(""));
cell.setFixedHeight(60);
table.addCell(cell);
I have another Program Which read this pdf File
PdfReader reader = new PdfReader("path_of_previously_created_pdf");
Now i want to get TableCell cell and want to Change cell height cell.setFixedHeight(new_Fixed_Height);
It is possible... if Yes .
How??
Thanx in advance
If your PDF contains just that simple 1x2 table, it of course would be possible to implement something that gives you the PDF with a cell hight of you choice.
But I assume it eventually is meant to contain more. Already the code you provided via your google drive included more (more table cells plus form elements), and that code, too, does look unfinished concerning the PDF construction. Thus,...
The direct answer
It is not possible.
First of all the table and cell objects you have while creating the PDF are not present as such in the resulting file, they merely are drawn as a number of lines and some text (or whatever you put into the cells).
Thus, you cannot even retrieve the cells you want to change, let alone change it.
The twisted answer
You could, of course, try and parse the page content stream to find the commands for drawing lines, find those ones among them which were drawn for the cell you are interested in, and try to derive the original cell dimension attributes from the line coordinates. Afterwards you can attempt to move everything below the cell down to create the extra space you want.
Depending on the information you have (Do you know the approximate position of the cell? If not, do you at least know some unique content of it?) reading the current cell height will include some guesswork and much coding because unfortunately the iText parser framework does not yet support parsing path operations.
Essentially you have to enhance the classes in the PDF parser package to also process and emit events for PDF path operators (if you know your way around in iText and the PDF specification that should not take more than a week or two) and create an appropriate event listener to find the lines surrounding the cell position you already know (not more than one day of work). Some iText code analysis will show how the fixed cell height and the distance of the surrounding lines relate.
Most likely, though, this is the smaller part of your work. The bigger part is actually manipulating the page content:
If you are lucky, all your page content is located in a single content stream. In that case you merely have to analyse all the page content again but this time to actually change it. The easiest way would be to enhance the classes in the parser package once again (because they already do much of the necessary math and book-keeping) to signal every command from the content stream with normalized coordinates (this might take a week or two). Based on this information signaled to you built an all new content stream in which you leave everything above your cell, move down everything below, and stretch everything crossing the line on which the bottom border of your cell lies (another week maybe).
If you are less lucky you have to fight with multiple included form xobjects crossing the line. As those xobjects may be used from other streams also, you cannot change them but have to either change a copy or include the xobject content in your newly created stream.
Then what about images crossing the line? or interesting patterns? In that case stretching the cell will utterly distort everything.
And then there are annotations, e.g. your form fields. You need to shift and stretch them, too.
Thus, while this approach is possible to follow, please be aware that (depending on how generic the solution has to become) its implementation will take someone knowing iText and PDF some months.
An alternative approach
You say in a comment
I am working on Pdf Form.I have created itext form using TextField(MULTILINE TEXT) once. After read this pdf and fill up the form but when the content increases it shows scroll Bar and content hide. My problem is Once i print the pdf it did't print hide content.
Why don't you simply for each set of data build an individual PDF with all the cells big enough for the form contents of the respective data set and copy the field values into this new PDF. This is a fairly simple approach, yet flexible enough to not waste too much space but at the same time not hide content.

How can I truncate text within a bounding box?

I am writing content to a PdfContentByte object directly using PdfContentByte.showTextAligned, I'd like to know how I can stop the text overflowing a given region when writing.
If possible it would be great if iText could also place an ellipsis character where the text does not fit.
I can't find any method on ColumnText that will help either. I do not wish the content to wrap when writing.
Use this:
int status = ColumnText.START_COLUMN;
ColumnText ct = new ColumnText(cb);
ct.setSimpleColumn(rectangle);
status = ct.go();
Make sure that you define rectangle in a way so that only one line fits, use ColumnText.hasMoreText(status) to find out if you need to add an ellipsis character.

How can I change the font size of annotations in iTextPdf?

I'm programming a system that generates a PDF file containing user-submitted comments. I want to reduce the size of the comments, i.e. the font size of each annotation. I'm using the Java version of iTextPdf.
My code:
PdfAnnotation annotation = PdfAnnotation.createText(pdfStamper.getWriter(),
new Rectangle(x, valor, x+100f, valor+100f), "authors",
comentario.getComentario(), true, "Comment");
annotation.setColor(Color.ORANGE);
Can I reduce the font size like this?
annotation.setFontSize("2px");
I've never used iTextPdf, but its API says that the PdfAnnotation class has no setFontSize() method.
However, it does contain a setDefaultAppearanceString() method, which takes a PdfContentByte as an argument. And PdfContentByte has a setFontAndSize() method.
From what I can see in the API, I would say: no, it's not possible.
Maybe you can use the setAppearance method. On the iText website is an example which creates an PdfAppearance object that owns a setFontAndSize method.
You have two options.
You can set the default font & size, as Lord Torgamus described.
You can use the Rich Text value of the annotation, which you have to directly access using the methods PdfAnnotation inherited from PdfDictionary:
annotation.put(PdfName.RC, new PdfString( "<font size=\"whatever\">" +
comentario.getComentario() +
"</font>" ) );
Note that iText cannot render Rich Text field appearances (at least not yet), so you have to turn off appearance generation so the resulting PDF will ask the viewer to do it:
myStamper.setGenerateAppearances( false );
Yep, have to use a stamper.
It's probably just easier to go with Torg's suggestion... but mind the rabbit. It doesn't play well with others.
PS: Splash's setAppearance() trick would work, but then you'd have to draw the whole thing yourself... background, borders, and text layout (which is much easier thanks to ColumnText).

How to reduce size of RTF with embedded images?

We have some code which produces an RTF document from a RTF template. It is basically doing string search and replaces of special tags within the RTF file. This is accessible via a web page.
Typically, the processing time for this is really quick.
However, we need to embed an image within a template. We've been embedding these as JPEG images using Word's "Insert/Picture/From File..." functionality. But we've found that the resultant RTF file size is massively dependant upon the image.
For example, I've inserted a 20k JPEG logo (which is basically a solid background with some text). The RTF file increased in size from around 390k (without the image) to 510k (with the image).
Then we inserted a JPEG containing a screenshot, i.e. the image contains text, multiple colours, etc. The JPEG is around 150k. Using this image, the RTF file increased in size from 390k to 3.5MB.
So the encoding that Word uses for storing images into an RTF doesn't perform linearly. I'm guessing it is dependant upon what is in the JPEG image.
I need to keep the size of the RTF templates to a minimum to try and keep our file processing times to a minimum.
Does anyone have any ideas on how to minimize the size of the RTF files with embedded images?
Is there any way of controlling the encoding that Word uses? I can't see any options anywhere.
Does anyone know what type of binary encoding Word/RTF uses?
Thanks in advance.
Here is the best solution
http://support.microsoft.com/kb/224663
Excerpt:
SYMPTOMS
When you save a Microsoft Word document that contains an EMF,
PNG, GIF, or JPEG graphic as a different file format (for example,
Word 6.0/95 (.doc) or Rich Text Format (.rtf)), the file size of the
document may dramatically increase.
For example, a Microsoft Word 2000 document that contains a JPEG
graphic that is saved as a Word 2000 document may have a file size of
45,568 bytes (44.5KB). However, when you save this file as Word 6.0/95
(.doc) or as Rich Text Format (.rtf), the file size may grow to
1,289,728 bytes (1.22MB).
CAUSE
This functionality is by design in Microsoft Word. If an
EMF, a PNG, a GIF, or a JPEG graphic is inserted into a Word document,
when the document is saved, two copies of the graphic are saved in the
document. Graphics are saved in the applicable EMF, PNG, GIF, or JPEG
format and are also converted to WMF (Windows Metafile) format.
RESOLUTION
Warning If you use
Registry Editor incorrectly, you may cause serious problems that may
require you to reinstall your operating system. Microsoft cannot
guarantee that you can solve problems that result from using Registry
Editor incorrectly. Use Registry Editor at your own risk.
To prevent Word from saving two copies of the graphic in the document,
and to reduce the file size of the document, add the
ExportPictureWithMetafile=0 string value to the Microsoft Windows
registry.
An image in an RTF file gets stored as a WMF, uncompressed. On mac, it it would be macpict. Your best bet to keep the file size down is to link the image to the document rather than insert a copy in the document. The trade-off is that you have to keep the files together.
EDIT
Is compressing the RTF an option? Using zip/rar, you'll get your file size back, but you'll have to uncompress, first obviously. There are supposed to be tools that will do rtf compression, but I have never used them.
We have done a similar project over at work. Only we're not using that "Insert/Picture/From File..." functionality. Our template has a tag named [photos], as I presume your own does also. When we process the document we replace the tag with the RTF codes needed to display images. We're putting them within a table and we're displaying two images on each row, plus a row on top for the title.
So, you might place a tag [photos] in your template. Then you replace the tag with the RTF Codes. You can find some good references to these codes on the web. For eg. here
.
Now, my code looks something like this:
\par {\rtf1\ansi\deff0{\trowd\cellx8810 {title}\intbl\qc\cell\row}{\trowd\cellx4405\cellx8810{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50\hex
Your image as an array of bytes in hexadecimal }\intbl\cell{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50\hex
Your other image }\intbl\cell\row}
if you get your image into a byte array, you may use BitConverter.ToString(array) to get your hex code. only you'll need to replace dashes "-" by "";
Our files will take up less than 1/10th of the space a "normal" RTF will. If we open the doc's code with an editor such as Notepad++, we can see the RTF codes, but if we open the document and save it as RTF (changing its name), it'll go from 1.5Mb to 50Mb!!
I'm guessing DaveParillo's reply justifies it: I'm only writing each image once.
Hope it helps.
Cheers mate
Initially, keep in mind that each byte is stored using 2 characters (two bytes), this means that the increments at least is the double size of original picture.
Other things that you need is that Word and Word Pad insert different (flavor or format) of the same image plus other fields (that RTF can to be displayed without them).
Here are some scripts used to insert images in RTF (https://joseluisbz.wordpress.com/2011/06/22/script-de-clases-rtf-para-jsp-y-php/), and one example of use (https://joseluisbz.wordpress.com/2011/07/16/subiendo-imagenes-png-y-jpg-y-archivos-a-mysql-con-php-y-jsp-y-mostrarlos-en-rtf-usando-clases/)
Now, maybe you will need replace the original Image with another (http://joseluisbz.wordpress.com/2013/07/26/exploring-a-wmf-file-0x000900/).
The Swartbees answer worked perfectly for me. I first reduced the image quality to "0" using G.I.M.P. Save as jpeg functionality. After following the microsoft solution suggested by Swartbees above I reinserted the picture into the file and the size increase was negligible 229k to 279k (as opposed to 29000kb).
Thanks for your suggestions guys.
Yes, by removing the redundant characters. And to do this you must insert them back into your stream.
For instance if you have over twenty f characters in one line, then you can replace with f[20] in your stream. It is a start.
-Best of luck.

Categories