I have written software that generates a PDF as part of its function, using the iTextPDF Java library. For a demo version of my software, I added a text watermark (like "demo software") using the following code:
PdfContentByte under = writer.getDirectContentUnder();
BaseFont baseFont = BaseFont.createFont(BaseFont.HELVETICA, BaseFont.WINANSI, BaseFont.EMBEDDED);
under.beginText();
under.setColorFill(BaseColor.RED);
under.setFontAndSize(baseFont, 25);
under.showTextAligned(PdfContentByte.ALIGN_CENTER," demo software",250, 470,55);
under.endText();
Afterwards I converted the PDF to .docx format using a PDF to Word converter. The resulting .docx file does not contain the watermark, and its contents are easily editable, so the whole purpose of shipping a demo version is defeated.
How can I achieve permanent watermarking, so that a PDF to Word converter won't be able to remove it?
One idea that comes to mind is that, instead of putting the text in the PDF, I could first convert all the text of a page into an image and then build the PDF from those images. But I am unsure how to achieve this using iTextPDF.
You can encrypt your PDF so that it cannot be modified without the owner password. After you have generated your PDF, create a PdfStamper with your PDF as input and encrypt it like this:
final PdfReader reader = new PdfReader(your_input_stream);
final PdfStamper stamper = new PdfStamper(reader, your_output_stream);
stamper.setEncryption(PdfWriter.ENCRYPTION_AES_128 | PdfWriter.DO_NOT_ENCRYPT_METADATA,
"your_user_password", "your_owner_password", PdfWriter.ALLOW_PRINTING);
stamper.close();
As a side note, I would recommend not using a hardcoded owner password. Since you have no need for the owner password after the file has been generated, I would suggest making it a SHA hash of a random string of, say, 20 alphanumeric characters.
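The side note above could be sketched as follows. This is only an illustration using JDK classes: the class and method names (OwnerPasswords, randomAlphanumeric, sha256Hex, randomOwnerPassword) are made up for this example, and SHA-256 is an assumed choice, since the answer only says "a SHA hash".

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

// Hypothetical helper, not part of iText.
final class OwnerPasswords {
    private static final String ALPHANUMERIC =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a random string of the given length drawn from ALPHANUMERIC. */
    static String randomAlphanumeric(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHANUMERIC.charAt(RANDOM.nextInt(ALPHANUMERIC.length())));
        }
        return sb.toString();
    }

    /** Returns the SHA-256 digest of the input as 64 lowercase hex characters. */
    static String sha256Hex(String input) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation
            throw new IllegalStateException(e);
        }
    }

    /** One-off owner password: SHA-256 of 20 random alphanumeric characters. */
    static String randomOwnerPassword() {
        return sha256Hex(randomAlphanumeric(20));
    }
}
```

The result of OwnerPasswords.randomOwnerPassword() would then be passed to setEncryption in place of the "your_owner_password" literal, and never stored.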
PDF2Dom (based on the PDFBox library) is capable of converting PDFs to HTML while preserving characteristics such as font size, boldness etc. An example of this conversion is shown below:
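private void generateHTMLFromPDF(String filename) throws IOException, ParserConfigurationException {
    // try-with-resources closes both the PDF and the writer, even if the conversion fails
    try (PDDocument pdf = PDDocument.load(new File(filename));
         Writer output = new PrintWriter("src/output/pdf.html", "utf-8")) {
        new PDFDomTree().writeText(pdf, output);
    }
}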
I'm trying to parse an existing PDF and extract these characteristics on a line-by-line basis, and I wonder whether there are any existing methods within PDF2Dom/PDFBox to parse these right from the PDF.
Another method would be to just use the HTML output and proceed from there but it seems like an unnecessary detour.
I'm using PDFBox 2.0.8 to extract PDF content, convert it to JSON and then build a new document from the created JSON (to clean out possible vulnerabilities). I've extended the PDFTextStripper class to get font info:
PDFont font = textPosition.getFont(); // it is an embedded font
Now I'm trying to write the same extracted character, with its font, to a new PDF document:
contentStream.setFont(font, 16);
contentStream.showText(text);
and I'm getting java.lang.IllegalArgumentException: No glyph for U+004A in font HLOXAY+Birka-SemiBoldItalic exception on the second line.
The text I want to write is "John Whitington" from the third page of a "PDF Explained" book.
I've already read that this happens because the current font doesn't have a Unicode mapping. But as I understand it, if this text is displayed in all readers, there should be a way to copy it to another PDF.
I just want to fully copy text and font info between documents.
Sorry if this duplicates any question here, but after a few days of searching, I still can't find an acceptable solution. Thanks in advance for any help.
In my Android app I fill the form fields of a PDF file using iTextG like this:
PdfReader reader = new PdfReader(this.templateFile);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(this.targetFile));
AcroFields form = stamper.getAcroFields();
for (String key : values.keySet()) {
form.setField(key, values.get(key));
}
stamper.setFormFlattening(true);
stamper.close();
I can see that the values of the form fields are actually set when I debug and inspect the stamper, but as soon as I open the targetFile all of my fields are empty.
If I do not flatten the form, the values remain in the fields, which makes me believe the values are also present in the flattened PDF but simply not displayed.
By the way, with FormFiller from the iText demos (http://itextpdf.com/itext-android-demos) the same PDF works fine!
This could be caused by different things.
Not the correct iTextG version
See Appearance issues with pdf interactive forms using iText where you'll find this answer:
This seems to be a bug on some versions of iText. I had the same problem with iTextSharp version 5.5.5 and it was solved after I upgraded to version 5.5.9.
The form doesn't know it has to generate the appearances
See Editable .pdf fields disappear (but visible on field focus) after save with evince where the problem is solved by changing the appearance setting:
form.put(PdfName.NEEDAPPEARANCES, PdfBoolean.PDFTRUE);
Or see iText 5.5 fails to fill form where iText is instructed to create the appearances:
af.setGenerateAppearances(true);
I would start with af.setGenerateAppearances(true); first.
I am doing some "pro bono" development for a food pantry near where I live. They are inundated with forms and paperwork, and I would like to develop a system that simply reads data from their MySQL server (which I set up for them on a previous project) and feeds data into PDF versions of all the forms they are required to fill out. This will help them out enormously and save them a lot of time, as well as get rid of a lot of human errors that are made when filling out these forms.
Not knowing anything about the internals of PDF files, I can foresee two avenues here:
Harder Way: It is possible to scan a paper document, turn it into a PDF, and then have software that "fills out" the PDF simply by saying "add text excerpt blah to the following (x,y) coordinates..."; or
Easier Way: PDF specification already allows for the construct of "fields" that can be filled out; this way I just write code that says "add text excerpt blah to the field called *address_value*...", etc.
So my first question is: which of the two avenues am I facing? Does PDF have a concept of "fields" or do I need to "fill out" these documents by telling the PDF library the pixel coordinates of where to place data?
Second, I obviously need an open source (and Java) library to do this. iText seems to be a good start but I've heard it can be difficult to work with. Can anyone lend some ideas or general recommendations here? Thanks in advance!
You can easily merge data into a PDF's fields using FDF (Forms Data Format).
Adobe provides a library for this: the Acrobat Forms Data Format (FDF) Toolkit.
Also Apache PDFBox can be used to do that.
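To give an idea of what FDF data looks like, here is a minimal sketch that writes field name/value pairs as an FDF document using only the JDK. The class FdfSketch and its methods are hypothetical, the field name address_value is borrowed from the question, and the sketch assumes ASCII-only names and values (other text would need a proper PDF string encoding). It only produces the data file; importing it into the form is still done by a PDF tool or library.

```java
import java.util.Map;

// Hypothetical helper; this is not the Adobe FDF Toolkit API.
final class FdfSketch {
    /** Escapes the characters that are special inside a PDF literal string. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)");
    }

    /** Builds a minimal FDF document with one /T (name) and /V (value) pair per field. */
    static String toFdf(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("%FDF-1.2\n1 0 obj\n<< /FDF << /Fields [\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("<< /T (").append(escape(e.getKey()))
              .append(") /V (").append(escape(e.getValue())).append(") >>\n");
        }
        sb.append("] >> >>\nendobj\ntrailer\n<< /Root 1 0 R >>\n%%EOF\n");
        return sb.toString();
    }
}
```

Calling FdfSketch.toFdf with a map built from a database row would yield text that can be saved with an .fdf extension.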
Please take a look at the chapter about interactive forms in the free ebook The Best iText Questions on StackOverflow. It bundles the answers to questions such as:
How to fill out a pdf file programatically?
How can I flatten a XFA PDF Form using iTextSharp?
Checking off pdf checkbox with itextsharp
How to continue field output on a second page?
finding out required fields to fill in pdf file
and so on...
Or you can watch this video where I explain how to use forms for reporting step by step.
See for instance:
public void manipulatePdf(String src, String dest) throws DocumentException, IOException {
PdfReader reader = new PdfReader(src);
PdfStamper stamper = new PdfStamper(reader,
new FileOutputStream(dest));
AcroFields fields = stamper.getAcroFields();
fields.setField("name", "CALIFORNIA");
fields.setField("abbr", "CA");
fields.setField("capital", "Sacramento");
fields.setField("city", "Los Angeles");
fields.setField("population", "36,961,664");
fields.setField("surface", "163,707");
fields.setField("timezone1", "PT (UTC-8)");
fields.setField("timezone2", "-");
fields.setField("dst", "YES");
stamper.setFormFlattening(true);
stamper.close();
reader.close();
}
public void fillPDF()
{
try {
PDDocument pDDocument = PDDocument.load(new File("D:/pdf/pdfform.pdf")); // pdfform.pdf is input file
PDAcroForm pDAcroForm = pDDocument.getDocumentCatalog().getAcroForm();
PDField field = pDAcroForm.getField("Given Name Text Box");
field.setValue("firstname");
field = pDAcroForm.getField("Family Name Text Box");
field.setValue("lastname");
field = pDAcroForm.getField("Country Combo Box");
field.setValue("Country");
System.out.println("country combo" );
field = pDAcroForm.getField("Driving License Check Box"); // looked up but left unchanged here
field = pDAcroForm.getField("Favourite Colour List Box");
System.out.println("favourite colour required: " + field.isRequired());
pDDocument.save("D:/pdf/pdf-java-output.pdf");
pDDocument.close();
} catch (IOException e) {
e.printStackTrace();
}
}
I am trying to read text from a properties file that is encoded in UTF-8 and write it to a PDF file using a Document object in Java.
Document document = new Document();
File file = new File(FILES_PATH + ".pdf");
FileOutputStream fos = new FileOutputStream(file);
PdfWriter.getInstance(document, fos);
.
.
.
PdfPTable table;
document.add(table);
document.close();
When I use the property value as-is, the Chinese characters are dropped from the output.
When I try to re-encode the string, I get strange characters or "?" instead of the Chinese characters.
I have tried UTF-8, ISO-8859-1, GBK and GB2312.
How can I get the Chinese characters to show up in the PDF file?
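One thing worth ruling out first is the loading step. Properties.load(InputStream) decodes the file as ISO-8859-1, so a UTF-8 file has to be loaded through a Reader instead. Whether this is what goes wrong here is an assumption, since the loading code is not shown; the class name Utf8Properties is made up for this sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper for loading a UTF-8 encoded .properties file.
final class Utf8Properties {
    /** Loads properties from the stream, decoding it as UTF-8, and closes the stream. */
    static Properties load(InputStream in) {
        Properties props = new Properties();
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }
}
```

If the strings are already correct in memory, no further re-encoding is needed; the remaining question is whether the PDF font can display them.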
It will not work that way.
In order to display a Unicode character that is not covered by the built-in PDF fonts, you need to specify a custom font for the text fragment, and create a separate fragment for each run of text that is covered by a given font. You also need to embed the fonts you use in the PDF document (so please check whether the licence of those fonts allows distributing them).
So a single String may have to be rendered using several fonts. iText has the class FontSelector, which does that task:
FontSelector selector = new FontSelector();
BaseFont bf1 = BaseFont.createFont(fontPath, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
bf1.setSubset(true);
Font font1 = new Font(bf1, 12, Font.BOLD);
selector.addFont(font1);
// ... do that with all fonts you need
Phrase ph = selector.process(TEXT);
document.add(new Paragraph(ph));
You can find a more complex example in my article: Using dynamic fonts for international texts in iText
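The idea of "one fragment per font" can be illustrated without iText. The sketch below splits a string into runs by Unicode script, using the script as a rough stand-in for "which font covers this character". It is not how FontSelector is implemented, and the class name ScriptRuns is made up for this example.

```java
import java.lang.Character.UnicodeScript;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of splitting text into per-font fragments.
final class ScriptRuns {
    /** Splits text into maximal runs whose characters share a Unicode script. */
    static List<String> split(String text) {
        List<String> runs = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        UnicodeScript currentScript = null;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(cp);
            // Spaces, digits and punctuation (COMMON/INHERITED) stay in the current run
            boolean neutral = script == UnicodeScript.COMMON
                    || script == UnicodeScript.INHERITED;
            if (!neutral && currentScript != null && script != currentScript) {
                runs.add(current.toString());
                current.setLength(0);
            }
            if (!neutral) {
                currentScript = script;
            }
            current.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            runs.add(current.toString());
        }
        return runs;
    }
}
```

For a mixed Latin and Chinese string this yields alternating runs, each of which would get its own font in the PDF.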