I know there are many suggestions how to resolve problems with edition existing PDF, but among all of those, I couldn't find a solution for my problem.
I need to add information about file acceptance ("Document accepted by Tom Smith, 2020-01-01" - possible multiple acceptations) to the last page of the PDF. I need to determine if page is filled or is there enough space for my text.
I wanted to find position (y) of the last element on the last page of the pdf to check it against page size. If the page is full I'm going to add a new page and then add new text.
I have no idea how to resolve this. I tried looking for answers with iText and PDFBOX, but there are no satisfying resolutions.
Raster Image based approach:
Render the last page of the pdf to a bitmap image with any library you are comfortable with (Ghostscript?). 72 dpi should be enough for your purpose.
Then you can use any image processing library like openCV and check rectangular areas starting from the bottom up, if pixel exist. openCV is very fast with the CountNonZero() function.
You can also find any large white zone that is anywhere in the Image, not just at the bottom. This link could be your starting point.
https://answers.opencv.org/question/72939/how-to-find-biggest-white-zone-in-an-scanned-image/
I am writing a program in Java using iText 5. A big part of the program involves printing things out to PDF, and a big part of that is a human readable dictionary.
Given a chapter that contains phrases of all the words in my dictionary, is there any way to hand this to iText and have it align this in 2 column per page format? Essentially, I'm looking for pages that read left half top to left half bottom->right half top to right half bottom->next page. Anyone familiar with oldschool paper style dictionaries will be familiar with what I'm talking about here, and that's the look that I'm going for.
Sorry that I don't have any example code to go off of here, but I'm kind of stymied, as I haven't been able to find anything that does what I want.
I'm currently trying to generate PDF with PDFBox for some manual cover and I was wondering if it was possible to take a precise zone of text in my PDF and move it (to the left) depending on my manuel thickness (which will be determined by the number of pages my manual will have)
I manage to create my PDF just fine, but I did not find a way to get only a block of text.
Is it possible to do so with PDFBox?
Note : I tried to search on the web and on other questions, but none of them were useful.
Thanks
Wrap saveGraphicsState() and restoreGraphicsState() around that block. Within that, use moveTextPositionByAmount() (after beginText() !) or setTextTranslation() or (more general) concatenate2CTM(1, 0, 0, 1, tx, ty) to move the position.
Like i said in a comment, we decided to create a new PDF with the informations needed instead of trying to edit an existing one.
We tried to edit the PDF but in vein and when we decided to start from scratch, it was way more easier to do so.
Thanks to the people who tried to help me !
I'm looking to automate filling out a pdf contract by putting variable text in predefined locations(such as a date or dollar value). I've been trying to wrap my head around iText as a solution but am having trouble actually stamping text onto a pdf. I would really appreciate an example snipped that simply stamps a piece of text on a specific (x,y) coordinate of a pdf file.
If there is a better solution than iText to my problem I would also love to hear other possible solutions.
We have some code which produces an RTF document from a RTF template. It is basically doing string search and replaces of special tags within the RTF file. This is accessible via a web page.
Typically, the processing time for this is really quick.
However, we need to embed an image within a template. We've been embedding these as JPEG images using Word's "Insert/Picture/From File..." functionality. But we've found that the resultant RTF file size is massively dependant upon the image.
For example, I've inserted a 20k JPEG logo (which is basically a solid background with some text). The RTF file increased in size from around 390k (without the image) to 510k (with the image).
Then we inserted a JPEG containing a screenshot, i.e. the image contains text, multiple colours, etc. The JPEG is around 150k. Using this image, the RTF file increased in size from 390k to 3.5MB.
So the encoding that Word uses for storing images into an RTF doesn't perform linearly. I'm guessing it is dependant upon what is in the JPEG image.
I need to keep the size of the RTF templates to a minimum to try and keep our file processing times to a minimum.
Does anyone have any ideas on how to minimize the size of the RTF files with embedded images?
Is there any way of controlling the encoding that Word uses? I can't see any options anywhere.
Does anyone know what type of binary encoding Word/RTF uses?
Thanks in advance.
Here is the best solution
http://support.microsoft.com/kb/224663
Excerpt:
SYMPTOMS
When you save a Microsoft Word document that contains an EMF,
PNG, GIF, or JPEG graphic as a different file format (for example,
Word 6.0/95 (.doc) or Rich Text Format (.rtf)), the file size of the
document may dramatically increase.
For example, a Microsoft Word 2000 document that contains a JPEG
graphic that is saved as a Word 2000 document may have a file size of
45,568 bytes (44.5KB). However, when you save this file as Word 6.0/95
(.doc) or as Rich Text Format (.rtf), the file size may grow to
1,289,728 bytes (1.22MB).
CAUSE
This functionality is by design in Microsoft Word. If an
EMF, a PNG, a GIF, or a JPEG graphic is inserted into a Word document,
when the document is saved, two copies of the graphic are saved in the
document. Graphics are saved in the applicable EMF, PNG, GIF, or JPEG
format and are also converted to WMF (Windows Metafile) format.
RESOLUTION
Warning If you use
Registry Editor incorrectly, you may cause serious problems that may
require you to reinstall your operating system. Microsoft cannot
guarantee that you can solve problems that result from using Registry
Editor incorrectly. Use Registry Editor at your own risk.
To prevent Word from saving two copies of the graphic in the document,
and to reduce the file size of the document, add the
ExportPictureWithMetafile=0 string value to the Microsoft Windows
registry.
An image in an RTF file gets stored as a WMF, uncompressed. On mac, it it would be macpict. Your best bet to keep the file size down is to link the image to the document rather than insert a copy in the document. The trade-off is that you have to keep the files together.
EDIT
Is compressing the RTF an option? Using zip/rar, you'll get your file size back, but you'll have to uncompress, first obviously. There are supposed to be tools that will do rtf compression, but I have never used them.
We have done a similar project over at work. Only we're not using that "Insert/Picture/From File..." functionality. Our template has a tag named [photos], as I presume your own does also. When we process the document we replace the tag with the RTF codes needed to display images. We're putting them within a table and we're displaying two images on each row, plus a row on top for the title.
So, you might place a tag [photos] in your template. Then you replace the tag with the RTF Codes. You can find some good references to these codes on the web. For eg. here
.
Now, my code looks something like this:
\par {\rtf1\ansi\deff0{\trowd\cellx8810 {title}\intbl\qc\cell\row}{\trowd\cellx4405\cellx8810{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50\hex
Your image as an array of bytes in hexadecimal }\intbl\cell{\pict\jpegblip\picwgoal4000\pichgoal3000\piccropl-50\piccropr-50\piccropt-50\piccropb-50\hex
Your other image }\intbl\cell\row}
if you get your image into a byte array, you may use BitConverter.ToString(array) to get your hex code. only you'll need to replace dashes "-" by "";
Our files will take up less than 1/10th of the space a "normal" RTF will. If we open the doc's code with an editor such as Notepad++, we can see the RTF codes, but if we open the document and save it as RTF (changing its name), it'll go from 1.5Mb to 50Mb!!
I'm guessing DaveParillo's reply justifies it: I'm only writing each image once.
Hope it helps.
Cheers mate
Initially, keep in mind that each byte is stored using 2 characters (two bytes), this means that the increments at least is the double size of original picture.
Other things that you need is that Word and Word Pad insert different (flavor or format) of the same image plus other fields (that RTF can to be displayed without them).
Here are some scripts used to insert images in RTF (https://joseluisbz.wordpress.com/2011/06/22/script-de-clases-rtf-para-jsp-y-php/), and one example of use (https://joseluisbz.wordpress.com/2011/07/16/subiendo-imagenes-png-y-jpg-y-archivos-a-mysql-con-php-y-jsp-y-mostrarlos-en-rtf-usando-clases/)
Now, maybe you will need replace the original Image with another (http://joseluisbz.wordpress.com/2013/07/26/exploring-a-wmf-file-0x000900/).
The Swartbees answer worked perfectly for me. I first reduced the image quality to "0" using G.I.M.P. Save as jpeg functionality. After following the microsoft solution suggested by Swartbees above I reinserted the picture into the file and the size increase was negligible 229k to 279k (as opposed to 29000kb).
Thanks for your suggestions guys.
Yes, by removing the redundant characters. And to do this you must insert them back into your stream.
For instance if you have over twenty f characters in one line, then you can replace with f[20] in your stream. It is a start.
-Best of luck.