I am required to replace a word in an existing PDF AcroForm field with another word. I am using PdfStamper from iTextSharp to do this and it works fine, but it requires creating a new PDF, and I would like the change to be reflected in the existing PDF itself. If I set the destination filename to be the same as the original filename, no change is reflected. I am new to iTextSharp; is there anything I am doing wrong? Please help. Here is the piece of code I am using:
private void ListFieldNames(string s)
{
try
{
string pdfTemplate = @"z:\TEMP\PDF\PassportApplicationForm_Main_English_V1.0.pdf";
string newFile = @"z:\TEMP\PDF\PassportApplicationForm_Main_English_V1.0.pdf";
PdfReader pdfReader = new PdfReader(pdfTemplate);
for (int page = 1; page <= pdfReader.NumberOfPages; page++)
{
PdfReader reader = new PdfReader((string)pdfTemplate);
using (PdfStamper stamper = new PdfStamper(reader, new FileStream(newFile, FileMode.Create, FileAccess.ReadWrite)))
{
AcroFields form = stamper.AcroFields;
var fieldKeys = form.Fields.Keys;
foreach (string fieldKey in fieldKeys)
{
//Replace Address Form field with my custom data
if (fieldKey.Contains("Address"))
{
form.SetField(fieldKey, s);
}
}
stamper.FormFlattening = true;
stamper.Close();
}
}
}
catch (Exception ex)
{
// catch added to complete the snippet; rethrow so errors surface
throw;
}
}
As documented in my book iText in Action, you can't read a file and write to it simultaneously. Think of how Word works: you can't open a Word document and write directly to it. Word always creates a temporary file, writes the changes to it, then replaces the original file with it and then throws away the temporary file.
You can do that too:
read the original file with PdfReader,
create a temporary file for PdfStamper, and when you're done,
replace the original file with the temporary file.
Or:
read the original file into a byte[],
create PdfReader with this byte[], and
use the path to the original file for PdfStamper.
This second option is more dangerous, as you'll lose the original file if you do something that causes an exception in PdfStamper.
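For instance, here is a minimal sketch of both options in Java with iText 5 (the iTextSharp calls are analogous; PdfReader, PdfStamper and AcroFields are in com.itextpdf.text.pdf, Files/Paths in java.nio.file). The "Address" field name and the path are placeholders taken from the question, and exception handling is omitted:
String pdfTemplate = "PassportApplicationForm_Main_English_V1.0.pdf"; // path to the existing PDF

// Option 1: stamp to a temporary file, then replace the original
PdfReader reader = new PdfReader(pdfTemplate);
File temp = File.createTempFile("stamped", ".pdf");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream(temp));
AcroFields form = stamper.getAcroFields();
form.setField("Address", "my custom data"); // placeholder field and value
stamper.setFormFlattening(true);
stamper.close();
reader.close();
// only after the stamper is closed, overwrite the original file
Files.move(temp.toPath(), Paths.get(pdfTemplate), StandardCopyOption.REPLACE_EXISTING);

// Option 2: read the original into memory, then write straight back to the same path
// (riskier: if stamping fails, the original file has already been overwritten)
byte[] original = Files.readAllBytes(Paths.get(pdfTemplate));
PdfReader reader2 = new PdfReader(original);
PdfStamper stamper2 = new PdfStamper(reader2, new FileOutputStream(pdfTemplate));
stamper2.getAcroFields().setField("Address", "my custom data");
stamper2.close();
reader2.close();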
I have a steganography project to hide a docx document inside a JPEG image. Using Apache POI, I can run it and read the docx document, but only the letters can be read, even though there are pictures in it.
Here is the code:
FileInputStream in = null;
try
{
    in = new FileInputStream(directory);
    XWPFDocument datax = new XWPFDocument(in);
    XWPFWordExtractor extract = new XWPFWordExtractor(datax);
    // getText() only returns the text runs, not the pictures
    this.isi_file = extract.getText();
}
catch (IOException x)
{
    x.printStackTrace(); // don't swallow the exception silently
}
System.out.println("isi :" + this.isi_file);
How can I read all the components of the docx document using Java? Please help me, and thank you for your help.
Please check the documentation for the XWPFDocument class. It contains some useful methods, for example:
getAllPictures() returns a list of all pictures in the document;
getTables() returns a list of all tables in the document (a sketch for this follows the picture loop below).
Your code snippet already contains the line XWPFDocument datax = new XWPFDocument(in);, so after that line you can write something like:
// process all pictures in the document
for (XWPFPictureData picture : datax.getAllPictures()) {
    // get each picture as a byte array
    byte[] pictureData = picture.getData();
    // process the picture somehow, e.g. write it out to a file
}
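Similarly, if you also need the tables mentioned above, here is a sketch that continues from the same datax instance (XWPFTable, XWPFTableRow and XWPFTableCell are in org.apache.poi.xwpf.usermodel):
// process all tables in the document
for (XWPFTable table : datax.getTables()) {
    for (XWPFTableRow row : table.getRows()) {
        for (XWPFTableCell cell : row.getTableCells()) {
            // the plain text of each cell
            System.out.println(cell.getText());
        }
    }
}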
I am testing my method with this form https://help.adobe.com/en_US/Acrobat/9.0/Samples/interactiveform_enabled.pdf
It is being called like so:
Pdf.editForm("./src/main/resources/pdfs/interactiveform_enabled.pdf", "./src/main/resources/pdfs/FILLEDOUT.pdf");
where Pdf is just a worker class and editForm is a static method.
The editForm method looks like this:
public static int editForm(String inputPath, String outputPath) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
Map<String, PdfFormField> m = form.getFormFields();
for (String s : m.keySet()) {
if (s.equals("Name_First")) {
m.get(s).setValue("Tristan");
}
if (s.equals("BACHELORS DEGREE")) {
m.get(s).setValue("Off"); // On or Off
}
if (s.equals("Sex")) {
m.get(s).setValue("FEMALE");
}
System.out.println(s);
}
pdf.close();
logger.info("Completed");
} catch (IOException e) {
logger.error("Unable to fill form " + outputPath + "\n\t" + e);
return 1;
}
return 0;
}
Unfortunately the FILLEDOUT.pdf file is no longer a form after calling this method. Am I doing something wrong?
I was using this resource for guidance. Notice that I am not calling form.flattenFields(); if I do call that method, however, I get a java.lang.IllegalArgumentException.
Thank you for your time.
Your form is Reader-enabled, i.e. it contains a usage rights digital signature, made with a key and certificate issued by Adobe, which indicates to a regular Adobe Reader that it shall activate a number of additional features when operating on that very PDF.
If you stamp the file as in your original code, the existing PDF objects will get re-arranged and slightly changed. This breaks the usage rights signature, and Adobe Reader, recognizing that, disclaims "The document has been changed since it was created and use of extended features is no longer available."
If you stamp the file in append mode, though, the changes are appended to the PDF as an incremental update. Thus, the signature still correctly signs its original byte range and Adobe Reader does not complain.
To activate append mode, use StampingProperties when you create your PdfDocument:
PdfDocument pdf = new PdfDocument(new PdfReader(inputPath), new PdfWriter(outputPath), new StampingProperties().useAppendMode());
(Tested with iText 7.1.1-SNAPSHOT and Adobe Acrobat Reader DC version 2018.009.20050)
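Plugged into the editForm method from the question, the only change is how the PdfDocument is constructed. A minimal sketch (PdfDocument is Closeable, so try-with-resources works, but a plain close() as in the question is fine too):
try (PdfDocument pdf = new PdfDocument(
        new PdfReader(inputPath),
        new PdfWriter(outputPath),
        new StampingProperties().useAppendMode())) {
    PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
    form.getField("Name_First").setValue("Tristan");
    // ... set the other fields as before ...
}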
By the way, Adobe Reader does not merely check the signature; it also tries to determine whether the changes in the incremental update stay within the scope of the additional features activated by the usage rights signature.
Otherwise you could simply take a small Reader-enabled PDF and, in append mode, replace all existing pages with your own content of choice. That, of course, would not be in Adobe's interest...
The filled-in PDF is still an AcroForm; otherwise the example below, which fills the output of the first pass a second time, would simply produce the same PDF twice.
public class Main {
public static final String SRC = "src/main/resources/interactiveform_enabled.pdf";
public static final String DEST = "results/filled_form.pdf";
public static final String DEST2 = "results/filled_form_second_time.pdf";
public static void main(String[] args) throws Exception {
File file = new File(DEST);
file.getParentFile().mkdirs();
Main main = new Main();
Map<String, String> data1 = new HashMap<>();
data1.put("Name_First", "Tristan");
data1.put("BACHELORS DEGREE", "Off");
main.fillPdf(SRC, DEST, data1, false);
Map<String, String> data2 = new HashMap<>();
data2.put("Sex", "FEMALE");
main.fillPdf(DEST, DEST2, data2, false);
}
private void fillPdf(String src, String dest, Map<String, String> data, boolean flatten) {
try {
PdfDocument pdf = new PdfDocument(new PdfReader(src), new PdfWriter(dest));
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
// Remove the Print field from the AcroForm because it is defined in the content stream, not in the form fields
form.removeField("Print");
Map<String, PdfFormField> m = form.getFormFields();
for (String d : data.keySet()) {
for (String s : m.keySet()) {
if(s.equals(d)){
m.get(s).setValue(data.get(d));
}
}
}
if(flatten){
form.flattenFields();
}
pdf.close();
System.out.println("Completed");
} catch (IOException e) {
System.out.println("Unable to fill form " + dest + "\n\t" + e);
}
}
}
The issue you are facing has to do with 'Reader-enabled' forms.
What it boils down to is that the PDF file that is initially fed to your program is Reader-enabled: Acrobat users can extend the behaviour of Adobe Reader for a particular document, which is why you can open this PDF in Adobe Reader and fill in the form.
Once the PDF is filled in and closed using iText, it is saved as 'not Reader-extended'.
The AcroForm can still be filled using iText, but when you open the PDF in Adobe Reader the extended functionality you saw in the original PDF is gone. This does not mean the form is flattened, though.
iText cannot make a form Reader-enabled; as a matter of fact, the only way to create a Reader-enabled form is with Acrobat Professional. This is how Acrobat and Adobe Reader interact, and it is not something iText can imitate or work around. You can find some more info and a possible solution at this link.
The IllegalArgumentException you get when you call the form.flattenFields() method is because of the way the PDF document was constructed.
The "Print form" button should have been defined in the AcroForm, yet it is defined in the contentstream of the PDF, meaning the button in the AcroForm has an empty text value, and this is what causes the exception.
You can fix this by removing the print field from the AcroForm before you flatten.
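In code, that fix boils down to something like this sketch (the field name "Print" comes from the PDF in question):
PdfAcroForm form = PdfAcroForm.getAcroForm(pdf, true);
form.removeField("Print"); // the button lives in the content stream, so its field has no usable value
// ... set your field values ...
form.flattenFields();
pdf.close();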
The IllegalArgumentException issue has been fixed in iText 7.1.5.
I am reading PDF documents via the iTextSharp library, but these documents are in Czech, which uses diacritics (ř ě ž š č etc.).
How can I read these characters? Any idea? Or is there some way to replace these characters with plain r e z s c?
This is the code in my method. Thanks.
PdfReader reader = new PdfReader("M:/ShareDirs_KSP/RDM_Debtors/DMS_PROD/" + src);
// we can inspect the syntax of the imported page
String text = new String();
for (int page = 1; page <= 1; page++) {
text += PdfTextExtractor.getTextFromPage(reader, page);
}
reader.close();
I have written a small proof of concept that parses the file czech.pdf. This file contains several characters with diacritics. It was created in answer to the following question: Can't get Czech characters while generating a PDF
The text is stored in the file twice: once using a simple font, once using a composite font. In my proof of concept (named ParseCzech), I parse this PDF to a file encoded using UTF-8 (UNICODE):
public void parse(String filename) throws IOException {
PdfReader reader = new PdfReader(filename);
FileOutputStream fos = new FileOutputStream(DEST);
for (int page = 1; page <= 1; page++) {
fos.write(PdfTextExtractor.getTextFromPage(reader, page).getBytes("UTF-8"));
}
fos.flush();
fos.close();
}
The result is the file czech.txt: the text is extracted correctly, but make sure that the viewer you use knows that the file is encoded as UTF-8, otherwise you may see strange characters instead of the actual text.
Note that some PDFs do not allow text to be extracted correctly. This is explained in the following video: http://www.youtube.com/watch?v=wxGEEv7ibHE
Please share your PDF so that people on Stack Overflow can check whether you fail to extract the text because of an error in your code, or because the PDF itself doesn't allow the text to be extracted.
I am working on the task of merging some input PDF documents using iText 5.4.5. The input documents may or may not contain AcroForms and I want to merge the forms as well.
I am using the example pdf files found here and this is the code example:
public class TestForms {
@Test
public void testNoForms() throws DocumentException, IOException {
test("pdf/hello.pdf", "pdf/hello_memory.pdf");
}
@Test
public void testForms() throws DocumentException, IOException {
test("pdf/subscribe.pdf", "pdf/filled_form_1.pdf");
}
private void test(String first, String second) throws DocumentException, IOException {
OutputStream out = new FileOutputStream("/tmp/out.pdf");
InputStream stream = getClass().getClassLoader().getResourceAsStream(first);
PdfReader reader = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream)), null);
InputStream stream2 = getClass().getClassLoader().getResourceAsStream(second);
PdfReader reader2 = new PdfReader(new RandomAccessFileOrArray(
new RandomAccessSourceFactory().createSource(stream2)), null);
Document pdfDocument = new Document(reader.getPageSizeWithRotation(1));
PdfCopy pdfCopy = new PdfCopy(pdfDocument, out);
pdfCopy.setFullCompression();
pdfCopy.setCompressionLevel(PdfStream.BEST_COMPRESSION);
pdfCopy.setMergeFields();
pdfDocument.open();
pdfCopy.addDocument(reader);
pdfCopy.addDocument(reader2);
pdfCopy.close();
reader.close();
reader2.close();
}
}
With input files containing forms I get a NullPointerException with or without compression enabled.
With standard input docs, the output file is created but when I open it with Acrobat it says there was a problem (14) and no content is displayed.
With standard input docs AND compression disabled the output is created and Acrobat displays it.
Questions
I previously did this using PdfCopyFields but it's now deprecated in favor of the boolean flag mergeFields in the PdfCopy, is this correct? There's no javadoc on that flag and I couldn't find documentation about it.
Assuming the answer to the previous question is Yes, is there anything wrong with my code?
Thanks
We are using PdfCopy to merge different files; some of the files may have form fields. We use version 5.5.3.0. The code is simple and it seems to work fine, BUT sometimes the resulting file is impossible to print!
Our code:
Public Shared Function MergeFiles(ByVal sourceFiles As List(Of Byte())) As Byte()
Dim document As New Document()
Dim output As New MemoryStream()
Dim copy As iTextSharp.text.pdf.PdfCopy = Nothing
Dim readers As New List(Of iTextSharp.text.pdf.PdfReader)
Try
copy = New iTextSharp.text.pdf.PdfCopy(document, output)
copy.SetMergeFields()
document.Open()
For fileCounter As Integer = 0 To sourceFiles.Count - 1
Dim reader As New PdfReader(sourceFiles(fileCounter))
reader.MakeRemoteNamedDestinationsLocal()
readers.Add(reader)
copy.AddDocument(reader)
Next
Catch exception As Exception
Throw exception
Finally
If copy IsNot Nothing Then copy.Close()
document.Close()
For Each reader As PdfReader In readers
reader.Close()
Next
End Try
Return output.ToArray() ' ToArray() instead of GetBuffer() so the unused tail of the MemoryStream buffer is not returned
End Function
Your usage of PdfCopy.setMergeFields() is correct and your merging code is fine.
The issues you described are because of bugs that have crept into 5.4.5. They should be fixed in rev. 6152 and the fixes will be included in the next release.
Thanks for bringing this to our attention.
This is just to say that we have the same problem: iText mergeFields in PdfCopy creates an invalid PDF. So it is still not fixed in version 5.5.3.0.
Please tell me how to append data to a docx file using Java and docx4j.
What I'm doing is using a docx template in which some fields are filled in by Java at run time. My problem is that for every group of data it creates a new file, and I just want to append each new file into one file; simply concatenating the files with Java streams does not do it.
String outputfilepath = "e:\\Practice/DOC/output/generatedLatterOUTPUT.docx";
String outputfilepath1 = "e:\\Practice/DOC/output/generatedLatterOUTPUT1.docx";
WordprocessingMLPackage wordMLPackage;
public void templetsubtitution(String name, String age, String gender, Document document)
throws Exception {
// input file name
String inputfilepath = "e:\\Practice/DOC/profile.docx";
// out put file name
// id of Xml file
String itemId1 = "{A5D3A327-5613-4B97-98A9-FF42A2BA0F74}".toLowerCase();
String itemId2 = "{A5D3A327-5613-4B97-98A9-FF42A2BA0F74}".toLowerCase();
String itemId3 = "{A5D3A327-5613-4B97-98A9-FF42A2BA0F74}".toLowerCase();
// Load the Package
if (inputfilepath.endsWith(".xml")) {
JAXBContext jc = Context.jcXmlPackage;
Unmarshaller u = jc.createUnmarshaller();
u.setEventHandler(new org.docx4j.jaxb.JaxbValidationEventHandler());
org.docx4j.xmlPackage.Package wmlPackageEl = (org.docx4j.xmlPackage.Package) ((JAXBElement) u
.unmarshal(new javax.xml.transform.stream.StreamSource(
new FileInputStream(inputfilepath)))).getValue();
org.docx4j.convert.in.FlatOpcXmlImporter xmlPackage = new org.docx4j.convert.in.FlatOpcXmlImporter(
wmlPackageEl);
wordMLPackage = (WordprocessingMLPackage) xmlPackage.get();
} else {
wordMLPackage = WordprocessingMLPackage
.load(new File(inputfilepath));
}
CustomXmlDataStoragePart customXmlDataStoragePart = wordMLPackage
.getCustomXmlDataStorageParts().get(itemId1);
// Get the contents
CustomXmlDataStorage customXmlDataStorage = customXmlDataStoragePart
.getData();
// Change its contents
((CustomXmlDataStorageImpl) customXmlDataStorage).setNodeValueAtXPath(
"/ns0:orderForm[1]/ns0:record[1]/ns0:name[1]", name,
"xmlns:ns0='EasyForm'");
customXmlDataStoragePart = wordMLPackage.getCustomXmlDataStorageParts()
.get(itemId2);
// Get the contents
customXmlDataStorage = customXmlDataStoragePart.getData();
// Change its contents
((CustomXmlDataStorageImpl) customXmlDataStorage).setNodeValueAtXPath(
"/ns0:orderForm[1]/ns0:record[1]/ns0:age[1]", age,
"xmlns:ns0='EasyForm'");
customXmlDataStoragePart = wordMLPackage.getCustomXmlDataStorageParts()
.get(itemId3);
// Get the contents
customXmlDataStorage = customXmlDataStoragePart.getData();
// Change its contents
((CustomXmlDataStorageImpl) customXmlDataStorage).setNodeValueAtXPath(
"/ns0:orderForm[1]/ns0:record[1]/ns0:gender[1]", gender,
"xmlns:ns0='EasyForm'");
// Apply the bindings
BindingHandler.applyBindings(wordMLPackage.getMainDocumentPart());
File f = new File(outputfilepath);
wordMLPackage.save(f);
FileInputStream fis = new FileInputStream(f);
ByteArrayOutputStream bos = new ByteArrayOutputStream();
byte[] buf = new byte[1024];
try {
for (int readNum; (readNum = fis.read(buf)) != -1;) {
bos.write(buf, 0, readNum);
}
// System.out.println( buf.length);
} catch (IOException ex) {
}
byte[] bytes = bos.toByteArray();
FileOutputStream file = new FileOutputStream(outputfilepath1, true);
DataOutputStream out = new DataOutputStream(file);
out.write(bytes);
out.flush();
out.close();
System.out.println("..done");
}
public static void main(String[] args) {
utility u = new utility();
u.templetsubtitution("aditya", "24", "mohan", null); // arguments quoted to match the method signature; the Document parameter is unused above
}
Thanks in advance.
If I understand you correctly, you're essentially talking about merging documents. There are two very simple approaches that you can use, and their effectiveness really depends on the structure and onward use of your data:
PhilippeAuriach describes one approach in his answer, which entails appending all components within a MainDocumentPart instance to another. In terms of the final docx file, this means the content that appears in document.xml; it won't take into account headers and footers (for example), but that may be fine for you.
You can insert multiple documents into a single docx file by inserting them as AltChunk elements (see the docx4j documentation, and the sketch below). This will bring everything from one Word file into another, headers and all. The downside of this is that your final document won't be a proper flowing Word file until you open it and save it in MS Word itself (the imported components remain as standalone files within the docx bundle). This will cause you issues if you want to generate 'merged' files and then do something with them, like render PDFs; the merged content will simply be ignored.
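A rough sketch of the AltChunk route, assuming docx4j's addAltChunk(AltChunkType.WordprocessingML, byte[]) overload on MainDocumentPart (the file names are placeholders and exception handling is omitted):
// load the document that will receive the other one
WordprocessingMLPackage target = WordprocessingMLPackage.load(new File("target.docx"));
// append another docx as an AltChunk; Word integrates it properly on the next open/save
byte[] toInsert = Files.readAllBytes(Paths.get("toInsert.docx"));
target.getMainDocumentPart().addAltChunk(AltChunkType.WordprocessingML, toInsert);
target.save(new File("merged.docx"));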
The more complete (and complex) approach is to perform a "deep merge". This updates and maintains all references held within a document. Imported content becomes part of the main "flow" of the document (i.e. it is not stored as separate references), so the end result is a properly-merged file which can be rendered to PDF or whatever.
The downside to this is you need a good knowledge of docx structure and the API, and you will be writing a fair amount of code (I would recommend buying a license to Plutext's MergeDocx instead).
I had to deal with similar things, and here is what I did (probably not the most efficient, but it works):
create a finalDoc by loading the template and emptying it (so you keep the styles in this doc),
for each data row, create a new doc by loading the template, then replace your fields with your values,
use the function below to append each doc filled with the data to the finalDoc:
public static void append(WordprocessingMLPackage docDest, WordprocessingMLPackage docSource) {
    // copy every body-level element of the source document into the destination document
    List<Object> objects = docSource.getMainDocumentPart().getContent();
    for (Object o : objects) {
        docDest.getMainDocumentPart().getContent().add(o);
    }
}
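Putting those steps together, a rough sketch of the calling code; fillTemplate is a hypothetical helper that replaces your placeholders with one row's values, rows is your List<Map<String, String>> of data, and the paths are placeholders:
WordprocessingMLPackage finalDoc = WordprocessingMLPackage.load(new File("template.docx"));
// empty the final doc so only the styles and settings remain
finalDoc.getMainDocumentPart().getContent().clear();
for (Map<String, String> row : rows) {
    // fresh copy of the template for each data row
    WordprocessingMLPackage doc = WordprocessingMLPackage.load(new File("template.docx"));
    fillTemplate(doc, row); // hypothetical: fill the fields with this row's values
    append(finalDoc, doc);  // the function above
}
finalDoc.save(new File("final.docx"));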
Hope this helps.