Updating values of custom properties in word doc using java - java

I am not able to update the value of a custom property in a Word document using Java.
I have a Word document that contains a custom property with the value 'stack'. Using Java, I want to change that value to 'overflow'. I tried two approaches.
1) Using Apache POI
I set the org.apache.poi.hpsf.CustomProperties on the org.apache.poi.hpsf.DocumentSummaryInformation and wrote it to the POIFSFileSystem.
This does update the value in Word's custom properties table, but it doesn't update the value shown for that property in the document itself. After the document is opened, I have to refresh it manually to see the new value.
2) Parsing the document character by character and using the field control characters DC3, DC4 and NAK (field begin, separator and end) to locate the custom properties, then replacing the existing value with the new value.
The generated document now contains the new values. The problem is that if the old value and the new value differ in length, the document gets corrupted. I have checked the replacement logic and believe it is correct.
Any help would be appreciated.

An update on the approach I took to solve the issue:
Using an Office 2007 (docx) document as the template made this easier. A docx file is just a zip archive. You can open it with a zip application (WinZip/7-Zip) and find many XML files inside it: document.xml contains the content, styles.xml contains the formatting information, and so on.
At runtime, I unzipped the document, parsed document.xml with a DOM parser, and updated it with the dynamic content. Custom properties are stored in a separate XML file.
Word 2003 users have to prepare the template in Word, save the document as XML, and provide that as input.
Apache POI is no longer used.
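
The unzip, edit and rezip steps described above can be sketched with the JDK alone. This is a minimal sketch, not the original author's code: the class and method names are made up for illustration, it assumes the custom properties live in the docProps/custom.xml part (their usual location in a docx package), and it works on in-memory byte arrays to stay short.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper, named for illustration only.
public class DocxCustomProperty {
    // Assumption: the package keeps custom properties in this part.
    static final String CUSTOM_PART = "docProps/custom.xml";

    /** Returns a copy of the docx bytes with the named custom property set to newValue. */
    public static byte[] setProperty(byte[] docx, String name, String newValue) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(docx));
             ZipOutputStream zout = new ZipOutputStream(out)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                byte[] data = zin.readAllBytes(); // reads the current entry only
                if (CUSTOM_PART.equals(entry.getName())) {
                    data = rewrite(data, name, newValue);
                }
                // Every other part is copied through unchanged.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
        return out.toByteArray();
    }

    /** Sets the text of the value element inside <property name="..."> using DOM. */
    static byte[] rewrite(byte[] xml, String name, String newValue) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml));
        NodeList props = doc.getElementsByTagNameNS("*", "property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if (!name.equals(prop.getAttribute("name"))) {
                continue;
            }
            // The value sits in a typed child element such as <vt:lpwstr>.
            for (Node child = prop.getFirstChild(); child != null; child = child.getNextSibling()) {
                if (child instanceof Element) {
                    child.setTextContent(newValue);
                }
            }
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer()
                .transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }
}
```

Because the value is replaced as XML text, the old and new values can differ in length without corrupting the file. This sketch only changes the stored property; any field in document.xml that displays the property keeps its cached text and would need the same kind of DOM update.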

Related

How can we Edit Docx file and replace blank fields with actual data and save file with modified content?

To sum up, my requirement is that if a field is blank in a docx file, I replace the blank field with actual content and save the document.
The solution can be in Java or Python.
I have tried various approaches, but none of them worked.
For example:
Text-Replace in docx and save the changed file with python-docx
Using the above approach, I tried to edit the document.xml file, but I can't programmatically determine whether a given field is blank.
I have also tried docx, docx4j and the Apache POI library in Java.
I really can't find any solution for this.
This is a sample document:
https://drive.google.com/file/d/1oKzYWP1VCZ1KHkIhksQZ4c_fO349EVZ0/view?usp=sharing
So basically, if the Name field is null, replace it with some value,
and if the Address field or Date field is null, replace it with some value, then save the document back in its original format.

Extracting data from a PDF

I have a system that ultimately creates PDF files from an HTML file. It works much like a mail merge: it grabs data from a database, merges the data into placeholders in the HTML document and then converts the HTML file to a PDF.
When I am unit testing the HTML file I can look at the values in my placeholders. For example, if I had a John Smith and wanted to validate that the name is "John Smith", I would simply look at the value of the div after the merge.
I need to do something similar to validate the data in the PDF. Using pdfbox and itext I was able to extract text from a location as well as text from the document, but I can't find anything that would let me create a "tag/placeholder/..." and extract information from it, similar to what I do with the HTML file.
Is this possible with a pdf?
That's perfectly possible using pdf2Data, which is a solution from the iText suite.
You can find the demo here
http://pdf2data.online/
It does essentially what you described: you are given a viewer and some tools that let you define areas of interest (what you called 'placeholders').
Areas of interest can be defined using:
coordinates
relative to other areas of interest
relative to text or regular expressions
matching a certain regular expression
matching a table
etc
The tool then stores your template as an XML file, and you can use java or .NET code to extract information from a PDF that matches the template.
You are given either a json-like datastructure, or an XML file.
That should make it relatively straightforward to test whether a given area of interest contains a piece of text.

Can I get a selected tag from XML and download its value?

I want to download the value of a selected tag from a site. The only way I know is to download the whole XML and then get the value.
My question is:
Can I get the value without downloading the whole XML?
For example:
I have this site:
http://api.nbp.pl/api/exchangerates/tables/a/?format=xml
And I only want to download the value of the tag "EffectiveDate".
I know I can download the whole XML and then get it, but why should I if I only want one value?
Is there any way to do this in Java?
If you want to convert data given in some XML format to Java objects, you can then use those values as you need.
Java's marshalling and unmarshalling of XML to Java objects (JAXB) would help you do that.
Please refer to the link below for an example:
http://www.javatpoint.com/jaxb-unmarshalling-example
It's not possible to get those values without downloading the whole XML. What if the tags that match your needs are at the end of the XML? On the other hand, if you only need specific values, maybe it's worth updating the backend to return the values specified by some selector?
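
If the real concern is reading and holding the whole document in memory, a streaming parser can stop at the first match. The sketch below uses the JDK's StAX API; the class name is made up for illustration, and it assumes the wanted element contains only text. The request still asks the server for the full XML, so this only stops consuming the response early and does not change what the server sends.

```java
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, named for illustration only.
public class FirstTagValue {
    /** Returns the text of the first element named tag, or null if there is none. */
    public static String find(InputStream in, String tag) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT && tag.equals(reader.getLocalName())) {
                    // Stop here; the rest of the stream is never parsed.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close(); // closes the parser, not the underlying stream
        }
    }
}
```

To use it against the URL from the question, the stream from `new java.net.URL(...).openStream()` could be passed in inside a try-with-resources block, so the connection is closed as soon as the value is found.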

Manipulation of Word Document using OOXML?

We have a requirement to read a Word document and update it with dynamic data from the application; some sections will be edited by the user directly in Word.
Every time the user wants to fetch data from the application, they upload the document and merge it.
To be more specific, I looked into manipulating the Word document by adding meta tags as markers in the OOXML for each section, but I am not able to find any. Is there an option to add meta tags for content that can act like template markers?
Note: We want to implement this in a Java application.
You can use a Java library, Apache POI (http://poi.apache.org/), to manipulate DOC files, but for templating you need a different system such as Velocity or FreeMarker. Or you can just use String replace.
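
The plain string-replace option could look like the sketch below. The `${name}` marker syntax and the class name are assumptions made for illustration; nothing in OOXML defines such markers. Unknown markers are left in place so that missing data is easy to spot.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, named for illustration only.
public class PlaceholderFill {
    // Assumed marker syntax: ${name}, with letters, digits and underscores in the name.
    private static final Pattern MARKER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    /** Replaces each ${name} marker that has a value; markers without a value are kept. */
    public static String fill(String text, Map<String, String> values) {
        Matcher m = MARKER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the data from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

If this is applied to the raw XML of a document rather than to extracted text, the values would also need XML escaping, and a marker that Word has split across several runs will not match.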

How to create a new Word document from a template with docx4j

I have the following scenario and need some advice:
The user will supply a Word document as a template and provide some parameters at runtime, so I can query my database and get data to fill the document.
So, there are two basic things I need to do:
Replace every key in the document with its respective result from the current query row.
"Merge" (copy? duplicate?) the existing document unchanged into itself (append), depending on how many rows I got from the query, replacing the keys in each new copy with the next row's values.
What is the best approach to do this? I've managed to do the replace part for now by using unmarshallFromTemplate and providing it a HashMap.
But this way is a little tricky, because I need to add "${variable_name}" in the document, and sometimes Word splits "${" and "}" into different tags, causing issues.
I've read about custom XML binding, but didn't understand it completely. Do I need to generate a custom XML, inject it into the document (all of this at runtime) and call applyBindings? If so, how would I bind the fields in the document to the XML? By name?
docx4j includes VariablePrepare, which can tidy up your input docx so that your keys are not split across separate runs.
But, you would still be better off switching to content control data binding, particularly if you have repeated data (think for example of line items in an invoice). Disclosure: I champion this approach in docx4j.
To adopt the content control data binding approach:
1. Dream up an XML format which makes sense for your data, and write some code to convert the results of your database query into that format.
2. Modify your template, so that the content controls are bound to elements in your XML document. Ordinarily you'd use an authoring add-in for Word to help with this. (The technology Microsoft uses for binding is XPath, so how you bind depends on your XML structure, but, yes, you'd typically bind to the element name or ID.)
3. Now that you have your XML file and a suitable input docx, ContentControlsMergeXML contains the code you need to create an instance document at run time. There's also a version of this for a servlet environment at https://github.com/plutext/OpenDoPE-WAR
As an alternative to 1 & 2, there is also org.docx4j.model.datastorage.migration.FromVariableReplacement in current nightlies, which can convert your existing "${" document. Only to a standardised target XML format though.
If you have further questions, there is a forum devoted to this topic at http://www.docx4java.org/forums/data-binding-java-f16/
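
The first step of the data binding approach above, turning query results into an XML document of your own design, might be sketched as follows with the JDK's DOM API. The `rows`/`row` element names and the class name are invented for illustration, and the sketch assumes the column names are valid XML element names; it does not touch docx4j itself.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper, named for illustration only.
public class RowsToXml {
    /** Builds <rows><row><col>value</col>...</row>...</rows> from query rows. */
    public static String toXml(List<Map<String, String>> rows) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("rows"); // assumed root name
        doc.appendChild(root);
        for (Map<String, String> row : rows) {
            Element rowEl = doc.createElement("row"); // one element per query row
            root.appendChild(rowEl);
            for (Map.Entry<String, String> column : row.entrySet()) {
                // Assumption: the column name is a valid XML element name.
                Element colEl = doc.createElement(column.getKey());
                colEl.setTextContent(column.getValue()); // escaped on serialization
                rowEl.appendChild(colEl);
            }
        }
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

With a structure like this, a content control in the template would be bound with an XPath such as /rows/row[1]/name, which is the kind of element-name binding the answer describes.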
