Apache POI docx file content control parse

Apache POI docx file content control parse - java

I'm trying to parse docx file that contains content control fields (that are added using window like this, reference image, mine is on another language)
I'm using library APACHE POI. I found this question on how to do it. I used the same code:
import java.io.FileInputStream;
import org.apache.poi.xwpf.usermodel.*;
import java.util.List;
import java.util.ArrayList;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.*;
import org.apache.xmlbeans.XmlCursor;
import javax.xml.namespace.QName;
public class ReadWordForm {
private static List<XWPFSDT> extractSDTsFromBody(XWPFDocument document) {
XWPFSDT sdt;
XmlCursor xmlcursor = document.getDocument().getBody().newCursor();
QName qnameSdt = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "sdt", "w");
List<XWPFSDT> allsdts = new ArrayList<XWPFSDT>();
while (xmlcursor.hasNextToken()) {
XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
if (tokentype.isStart()) {
if (qnameSdt.equals(xmlcursor.getName())) {
if (xmlcursor.getObject() instanceof CTSdtRun) {
sdt = new XWPFSDT((CTSdtRun)xmlcursor.getObject(), document);
//System.out.println("block: " + sdt);
allsdts.add(sdt);
} else if (xmlcursor.getObject() instanceof CTSdtBlock) {
sdt = new XWPFSDT((CTSdtBlock)xmlcursor.getObject(), document);
//System.out.println("inline: " + sdt);
allsdts.add(sdt);
}
}
}
}
return allsdts;
}
public static void main(String[] args) throws Exception {
XWPFDocument document = new XWPFDocument(new FileInputStream("WordDataCollectingForm.docx"));
List<XWPFSDT> allsdts = extractSDTsFromBody(document);
for (XWPFSDT sdt : allsdts) {
//System.out.println(sdt);
String title = sdt.getTitle();
String content = sdt.getContent().getText();
if (!(title == null) && !(title.isEmpty())) {
System.out.println(title + ": " + content);
} else {
System.out.println("====sdt without title====");
}
}
document.close();
}
}
The problem is that this code doesn't see these fields in the my docx file until I open it in LibreOffice and re-save it. So if the file is from Windows being put into this code it doesn't see these content control fields. But if I re-save the file in the LibreOffice (using the same format) it starts to see these fields, even tho it loses some of the data (titles and tags of some fields). Can someone tell me what might be the reason of it, how do I fix that so it will see these fields? Or there's an easier way using docx4j maybe? Unfortunately there's not much info about how to do it using these 2 libs in the internet, at least I didn't find it.
Examle files are located on google disk. The first one doesn't work, the second one works (after it was opened in Libre and field was changed to one of the options).

According to your uploaded sample files your content controls are in a table. The code you had found only gets content controls from document body directly.
Tables are beastly things in Word as table cells may contain whole document bodies each. That's why content controls in table cells are strictly separated from content controls in main document body. Their ooxml class is CTSdtCell instead of CTSdtRun or CTSdtBlock and in apache poi their class is XWPFSDTCell instead of XWPFSDT.
If it is only about reading the content, then one could fall back to XWPFAbstractSDT which is the abstract parent class of XWPFSDTCell as well as of XWPFSDT. So following code should work:
private static List<XWPFAbstractSDT> extractSDTsFromBody(XWPFDocument document) {
XWPFAbstractSDT sdt;
XmlCursor xmlcursor = document.getDocument().getBody().newCursor();
QName qnameSdt = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "sdt", "w");
List<XWPFAbstractSDT> allsdts = new ArrayList<XWPFAbstractSDT>();
while (xmlcursor.hasNextToken()) {
XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
if (tokentype.isStart()) {
if (qnameSdt.equals(xmlcursor.getName())) {
//System.out.println(xmlcursor.getObject().getClass().getName());
if (xmlcursor.getObject() instanceof CTSdtRun) {
sdt = new XWPFSDT((CTSdtRun)xmlcursor.getObject(), document);
//System.out.println("block: " + sdt);
allsdts.add(sdt);
} else if (xmlcursor.getObject() instanceof CTSdtBlock) {
sdt = new XWPFSDT((CTSdtBlock)xmlcursor.getObject(), document);
//System.out.println("inline: " + sdt);
allsdts.add(sdt);
} else if (xmlcursor.getObject() instanceof CTSdtCell) {
sdt = new XWPFSDTCell((CTSdtCell)xmlcursor.getObject(), null, null);
//System.out.println("cell: " + sdt);
allsdts.add(sdt);
}
}
}
}
return allsdts;
}
But as you see in code line sdt = new XWPFSDTCell((CTSdtCell)xmlcursor.getObject(), null, null), the XWPFSDTCell totaly lost its connection to table and tablerow.
There is not a proper method to get the XWPFSDTCell directly from a XWPFTable. So If one would need to get XWPFSDTCell connected to its table, then also parsing the XML is needed. This could look like so:
private static List<XWPFSDTCell> extractSDTsFromTableRow(XWPFTableRow row) {
XWPFSDTCell sdt;
XmlCursor xmlcursor = row.getCtRow().newCursor();
QName qnameSdt = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "sdt", "w");
QName qnameTr = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "tr", "w");
List<XWPFSDTCell> allsdts = new ArrayList<XWPFSDTCell>();
while (xmlcursor.hasNextToken()) {
XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
if (tokentype.isStart()) {
if (qnameSdt.equals(xmlcursor.getName())) {
//System.out.println(xmlcursor.getObject().getClass().getName());
if (xmlcursor.getObject() instanceof CTSdtCell) {
sdt = new XWPFSDTCell((CTSdtCell)xmlcursor.getObject(), row, row.getTable().getBody());
//System.out.println("cell: " + sdt);
allsdts.add(sdt);
}
}
} else if (tokentype.isEnd()) {
//we have to check whether we are at the end of the table row
xmlcursor.push();
xmlcursor.toParent();
if (qnameTr.equals(xmlcursor.getName())) {
break;
}
xmlcursor.pop();
}
}
return allsdts;
}
And called from document like so:
...
for (XWPFTable table : document.getTables()) {
for (XWPFTableRow row : table.getRows()) {
List<XWPFSDTCell> allTrsdts = extractSDTsFromTableRow(row);
for (XWPFSDTCell sdt : allTrsdts) {
//System.out.println(sdt);
String title = sdt.getTitle();
String content = sdt.getContent().getText();
if (!(title == null) && !(title.isEmpty())) {
System.out.println(title + ": " + content);
} else {
System.out.println("====sdt without title====");
System.out.println(content);
}
}
}
}
...
Using curren apache poi 5.2.0 it is possible getting the XWPFSDTCell from XWPFTableRow via XWPFTableRow.getTableICells. This gets al List of ICells which is an interface which XWPFSDTCell also implemets.
So following code will get all XWPFSDTCell from tables without the need of low level XML parsing:
...
for (XWPFTable table : document.getTables()) {
for (XWPFTableRow row : table.getRows()) {
for (ICell iCell : row.getTableICells()) {
if (iCell instanceof XWPFSDTCell) {
XWPFSDTCell sdt = (XWPFSDTCell)iCell;
//System.out.println(sdt);
String title = sdt.getTitle();
String content = sdt.getContent().getText();
if (!(title == null) && !(title.isEmpty())) {
System.out.println(title + ": " + content);
} else {
System.out.println("====sdt without title====");
System.out.println(content);
}
}
}
}
}
...

Related

Apache POI: Remove Chart from Word Template file entirely

I am using a Word template with Excel graphs which I want to programmatically manipulate with the Java Apache POI library. For this I also need to be able to conditionally delete a Chart which is stored in this template.
Based on Axel Richters post (Removing chart from PowerPoint slide with Apache POI) I think I am almost there, but when I want to open the updated Word file it gives an error about unreadable content. This is what I have thus far:
PackagePart packagePartChart = xWPFChart.getPackagePart();
PackagePart packagePartWordDoc = xWPFDocument.getPackagePart();
OPCPackage packageWordDoc = packagePartWordDoc.getPackage();
// iterate over all relations the chart has and remove them
for (PackageRelationship chartrelship : packagePartChart.getRelationships()) {
String partname = chartrelship.getTargetURI().toString();
PackagePart part = packageWordDoc.getPartsByName(Pattern.compile(partname)).get(0);
packageWordDoc.removePart(part);
packagePartChart.removeRelationship(chartrelship.getId());
}
// now remove the chart itself from the word doc
Method removeRelation = POIXMLDocumentPart.class.getDeclaredMethod("removeRelation", POIXMLDocumentPart.class);
removeRelation.setAccessible(true);
removeRelation.invoke(xWPFDocument, xWPFChart);
If I unzip the Word file I correctly see that:
the relation between the WordDoc and the Chart are deleted in '\word\ _rels\document.xml.rels'
the chart itself is deleted in folder '\word\charts'
the relations between the documents supporting the Chart itself are deleted in folder '\word\charts\' _rels
the related chart items themselves are deleted:
StyleN / ColorsN in folder '\word\charts' and
Microsoft_Excel_WorksheetN in folder '\word\embeddings'
Anybody any idea on what could be going wrong here?

Indeed the finding of the right paragraph holding the chart was a challenge. In the end, for simplicity sake, I added a one row/one column placeholder table with one cell with a text, let's say 'targetWordString' in it, DIRECTLY before the chart. With the below function I am location the BodyElementID of this table:
private Integer iBodyElementIterator (XWPFDocument wordDoc,String targetWordString) {
Iterator<IBodyElement> iter = wordDoc.getBodyElementsIterator();
Integer bodyElementID = null;
while (iter.hasNext()) {
IBodyElement elem = iter.next();
bodyElementID = wordDoc.getBodyElements().indexOf(elem);
if (elem instanceof XWPFParagraph) {
XWPFParagraph paragraph = (XWPFParagraph) elem;
for (XWPFRun runText : paragraph.getRuns()) {
String text = runText.getText(0);
Core.getLogger("WordExporter").trace("Body Element ID: " + bodyElementID + " Text: " + text);
if (text != null && text.equals(targetWordString)) {
break;
}
}
} else if (elem instanceof XWPFTable) {
if (((XWPFTable) elem).getRow(0) != null && ((XWPFTable) elem).getRow(0).getCell(0) != null) {
// the first cell holds the name via the template
String tableTitle = ((XWPFTable) elem).getRow(0).getCell(0).getText();
if (tableTitle.equals(targetWordString)) {
break;
}
Core.getLogger("WordExporter").trace("Body Element ID: " + bodyElementID + " Text: " + tableTitle);
} else {
Core.getLogger("WordExporter").trace("Body Element ID: " + bodyElementID + " Table removed!");
}
}
else {
Core.getLogger("WordExporter").trace("Body Element ID: " + bodyElementID + " Text: ?");
}
}
return bodyElementID;
}
In the main part of the code I am calling this function to locate the table, then first delete the chart (ID +1) and then the table (ID)
int elementIDToBeRemoved = iBodyElementIterator(xWPFWordDoc,targetWordString);
xWPFWordDoc.removeBodyElement(elementIDToBeRemoved + 1);
xWPFWordDoc.removeBodyElement(elementIDToBeRemoved);
It needs to be in this order, because the ID's are reordered once you delete a number in between, hence deleting the table first, would mean the chart would get that ID in principle.

I did find a more structural solution for it without the use of dummy tables for the positioning. The solution is build out into 2 parts:
Determine relationship id of chart in word document
Remove chart based on relationship id and remove paragraph + runs
Determine relationship id of chart in word document
First I have determined with a boolean variable 'chartUsed' whether the chart needs to be deleted.
if (!chartUsed) {
String chartPartName = xWPFChart.getPackagePart().getPartName().getName();
String ridChartToBeRemoved = null;
// iterate over all relations the chart has and remove them
for (RelationPart relation : wordDoc.getRelationParts()) {
PackageRelationship relShip = relation.getRelationship();
if (relShip.getTargetURI().toString().equals(chartPartName)) {
ridChartToBeRemoved = relShip.getId();
Core.getLogger(logNode).info("Chart with title " + chartTitle + " has relationship id " + relShip.getId());
break;
}
}
removeChartByRelId(wordDoc, ridChartToBeRemoved);
}
(Core.getLogger is an internal API from the Mendix low code platform I use).
At the end I am calling the function removeChartByRelId which will remove the chart if the relation is found within a run of a paragraph:
Remove chart based on relationship id and remove paragraph + runs
private void removeChartByRelId (XWPFDocument wordDoc, String chartRelId) {
Iterator<IBodyElement> iter = wordDoc.getBodyElementsIterator();
Integer bodyElementID = null;
while (iter.hasNext()) {
IBodyElement elem = iter.next();
bodyElementID = wordDoc.getBodyElements().indexOf(elem);
if (elem instanceof XWPFParagraph) {
Core.getLogger(logNode).trace("XWPFParagraph Body Element ID: " + bodyElementID);
XWPFParagraph paragraph = (XWPFParagraph) elem;
boolean removeParagraph = false;
for (XWPFRun run : paragraph.getRuns()) {
if (run.getCTR().getDrawingList().size()>0) {
String drawingXMLText = run.getCTR().getDrawingList().get(0).xmlText();
boolean isChart = drawingXMLText.contains("http://schemas.openxmlformats.org/drawingml/2006/chart");
String graphicDataXMLText = run.getCTR().getDrawingList().get(0).getInlineArray(0).getGraphic().getGraphicData().xmlText();
boolean contains = graphicDataXMLText.contains(chartRelId);
if (isChart && contains) {
Core.getLogger(logNode).info("Removing chart with relId " + chartRelId + " as it isnt used in data");
run.getCTR().getDrawingList().clear();
removeParagraph = true;
}
}
}
if (removeParagraph) {
for (int i = 0; i < paragraph.getRuns().size(); i++){
paragraph.removeRun(i);
}
}
}
}
}

Java + MongoDB: how get a nested field value using complete path?

I have this path for a MongoDB field main.inner.leaf and every field couldn't be present.
In Java I should write, avoiding null:
String leaf = "";
if (document.get("main") != null &&
document.get("main", Document.class).get("inner") != null) {
leaf = document.get("main", Document.class)
.get("inner", Document.class).getString("leaf");
}
In this simple example I set only 3 levels: main, inner and leaf but my documents are deeper.
So is there a way avoiding me writing all these null checks?
Like this:
String leaf = document.getString("main.inner.leaf", "");
// "" is the deafult value if one of the levels doesn't exist
Or using a third party library:
String leaf = DocumentUtils.getNullCheck("main.inner.leaf", "", document);
Many thanks.

Since the intermediate attributes are optional you really have to access the leaf value in a null safe manner.
You could do this yourself using an approach like ...
if (document.containsKey("main")) {
Document _main = document.get("main", Document.class);
if (_main.containsKey("inner")) {
Document _inner = _main.get("inner", Document.class);
if (_inner.containsKey("leaf")) {
leafValue = _inner.getString("leaf");
}
}
}
Note: this could be wrapped up in a utility to make it more user friendly.
Or use a thirdparty library such as Commons BeanUtils.
But, you cannot avoid null safe checks since the document structure is such that the intermediate levels might be null. All you can do is to ease the burden of handling the null safety.
Here's an example test case showing both approaches:
#Test
public void readNestedDocumentsWithNullSafety() throws IllegalAccessException, NoSuchMethodException, InvocationTargetException {
Document inner = new Document("leaf", "leafValue");
Document main = new Document("inner", inner);
Document fullyPopulatedDoc = new Document("main", main);
assertThat(extractLeafValueManually(fullyPopulatedDoc), is("leafValue"));
assertThat(extractLeafValueUsingThirdPartyLibrary(fullyPopulatedDoc, "main.inner.leaf", ""), is("leafValue"));
Document emptyPopulatedDoc = new Document();
assertThat(extractLeafValueManually(emptyPopulatedDoc), is(""));
assertThat(extractLeafValueUsingThirdPartyLibrary(emptyPopulatedDoc, "main.inner.leaf", ""), is(""));
Document emptyInner = new Document();
Document partiallyPopulatedMain = new Document("inner", emptyInner);
Document partiallyPopulatedDoc = new Document("main", partiallyPopulatedMain);
assertThat(extractLeafValueManually(partiallyPopulatedDoc), is(""));
assertThat(extractLeafValueUsingThirdPartyLibrary(partiallyPopulatedDoc, "main.inner.leaf", ""), is(""));
}
private String extractLeafValueUsingThirdPartyLibrary(Document document, String path, String defaultValue) {
try {
Object value = PropertyUtils.getNestedProperty(document, path);
return value == null ? defaultValue : value.toString();
} catch (Exception ex) {
return defaultValue;
}
}
private String extractLeafValueManually(Document document) {
Document inner = getOrDefault(getOrDefault(document, "main"), "inner");
return inner.get("leaf", "");
}
private Document getOrDefault(Document document, String key) {
if (document.containsKey(key)) {
return document.get(key, Document.class);
} else {
return new Document();
}
}

Does PDFBox allow to remove one field from AcroForm?

I am using Apache PDFBox 2.0.8 and trying to remove one field. But can not find the way to do it, like I can do with iText: PdfStamper.getAcroFields().removeField("signature3").
What I am tying to do. Initially I have template PDF with 3 Digital Signatures. In some cases I need just 2 signatures, so it this case I need to remove 3rd signature from the template. And seems like I can't do it with PDFBox, close thing I found is flattening this field, but that problem is if a flatten particular PDField (not whole form, but just one field) - all other signatures are loosing their functionality, looks like they are getting flattened as well.
Here is code that does it:
PDDocument document = PDDocument.load(file);
PDDocumentCatalog documentCatalog = document.getDocumentCatalog();
PDAcroForm acroForm = documentCatalog.getAcroForm();
List<PDField> flattenList = new ArrayList<>();
for (PDField field : acroForm.getFieldTree()) {
if (field instanceof PDSignatureField && "signature3".equals(field.getFullyQualifiedName())) {
flattenList.add(field);
}
}
acroForm.flatten(flattenList, true);
document.save(dest);
document.close();

As Tilman already mentioned in a comment, PDFBox doesn't have a method to remove a field from the field tree. Nonetheless it has methods to manipulate the underlying PDF structure, so one can write such a method oneself, e.g. like this:
PDField removeField(PDDocument document, String fullFieldName) throws IOException {
PDDocumentCatalog documentCatalog = document.getDocumentCatalog();
PDAcroForm acroForm = documentCatalog.getAcroForm();
if (acroForm == null) {
System.out.println("No form defined.");
return null;
}
PDField targetField = null;
for (PDField field : acroForm.getFieldTree()) {
if (fullFieldName.equals(field.getFullyQualifiedName())) {
targetField = field;
break;
}
}
if (targetField == null) {
System.out.println("Form does not contain field with given name.");
return null;
}
PDNonTerminalField parentField = targetField.getParent();
if (parentField != null) {
List<PDField> childFields = parentField.getChildren();
boolean removed = false;
for (PDField field : childFields)
{
if (field.getCOSObject().equals(targetField.getCOSObject())) {
removed = childFields.remove(field);
parentField.setChildren(childFields);
break;
}
}
if (!removed)
System.out.println("Inconsistent form definition: Parent field does not reference the target field.");
} else {
List<PDField> rootFields = acroForm.getFields();
boolean removed = false;
for (PDField field : rootFields)
{
if (field.getCOSObject().equals(targetField.getCOSObject())) {
removed = rootFields.remove(field);
break;
}
}
if (!removed)
System.out.println("Inconsistent form definition: Root fields do not include the target field.");
}
removeWidgets(targetField);
return targetField;
}
void removeWidgets(PDField targetField) throws IOException {
if (targetField instanceof PDTerminalField) {
List<PDAnnotationWidget> widgets = ((PDTerminalField)targetField).getWidgets();
for (PDAnnotationWidget widget : widgets) {
PDPage page = widget.getPage();
if (page != null) {
List<PDAnnotation> annotations = page.getAnnotations();
boolean removed = false;
for (PDAnnotation annotation : annotations) {
if (annotation.getCOSObject().equals(widget.getCOSObject()))
{
removed = annotations.remove(annotation);
break;
}
}
if (!removed)
System.out.println("Inconsistent annotation definition: Page annotations do not include the target widget.");
} else {
System.out.println("Widget annotation does not have an associated page; cannot remove widget.");
// TODO: In this case iterate all pages and try to find and remove widget in all of them
}
}
} else if (targetField instanceof PDNonTerminalField) {
List<PDField> childFields = ((PDNonTerminalField)targetField).getChildren();
for (PDField field : childFields)
removeWidgets(field);
} else {
System.out.println("Target field is neither terminal nor non-terminal; cannot remove widgets.");
}
}
(RemoveField helper methods removeField and removeWidgets)
One can apply this to a document and field like this:
PDDocument document = PDDocument.load(SOURCE_PDF);
PDField field = removeField(document, "Signature1");
Assert.assertNotNull("Field not found", field);
document.save(TARGET_PDF);
document.close();
(RemoveField test testRemoveInvisibleSignature)
PS: I am not sure how much form related information PDFBox actually caches somewhere. Thus, I would propose not to manipulate the form information any further in the same document manipulation session, at least not without tests.
PPS: You find a TODO in the removeWidgets helper method. If the method outputs "Widget annotation does not have an associated page; cannot remove widget", you'll have to add the missing code.

Thanks to #mkl, I managed to do so with a shorter implementation using version pdfbox-3.0.0-RC1. in this case, to hide a button (check):
var check = (PDPushButton) pdAcroForm.getField(name);
List<PDField> fields = pdAcroForm.getFields();
fields.removeIf(x -> x.getCOSObject().equals(check.getCOSObject()));
pdAcroForm.setFields(fields);
check.getWidgets().forEach(widget -> widget.setNoView(true));

Please pay attention to the FILENAME and how to print out the filename that I choose from my computer?

it is a simple question, how to print out the selected file name, thanks.
public class CSVMAX {
public CSVRecord hottestInManyDays() {
//select many csv files from my computer
DirectoryResource dr = new DirectoryResource();
CSVRecord largestSoFar = null;
//read every row and implement the method we just define
for(File f : dr.selectedFiles()) {
FileResource fr = new FileResource(f);
CSVRecord currentRow = hottestHourInFile(fr.getCSVParser());
if (largestSoFar == null) {
largestSoFar = currentRow;
}
else {
double currentTemp = Double.parseDouble(currentRow.get("TemperatureF"));
double largestTemp = Double.parseDouble(largestSoFar.get("TemperatureF"));
//Check if currentRow’s temperature > largestSoFar’s
if (currentTemp > largestTemp) {
//If so update largestSoFar to currentRow
largestSoFar = currentRow;
}
}
}
return largestSoFar;
}
here I want to print out the file name but I dont know how to do that.
public void testHottestInManyDay () {
CSVRecord largest = hottestInManyDays();
System.out.println("hottest temperature on that day was in file " + ***FILENAME*** + largest.get("TemperatureF") +
" at " + largest.get("TimeEST"));
}
}

Ultimately, it seems that hottestInManyDays() will need to return this information.
Does CSVRecord have a property for that?
Something like this:
CSVRecord currentRow = hottestHourInFile(fr.getCSVParser());
currentRow.setFileName(f.getName());
If not, can such a property be added to it?
Maybe CSVRecord doesn't have that property. But it can be added?:
private String _fileName;
public void setFileName(String fileName) {
this._fileName = fileName;
}
public String getFileName() {
return this._fileName;
}
If not, can you create a wrapper class for both pieces of information?
If you can't modify CSVRecord and it doesn't have a place for the information you want, wrap it in a class which does. Something as simple as this:
class CSVWrapper {
private CSVRecord _csvRecord;
private String _fileName;
// getters and setters for the above
// maybe also a constructor? make them final? your call
}
Then return that from hottestInManyDays() instead of a CSVRecord. Something like this:
CSVWrapper csvWrapper = new csvWrapper();
csvWrapper.setCSVRecord(currentRow);
csvWrapper.setFileName(f.getName());
Changing the method signature and return value as needed, of course.
However you do it, once it's on the return value from hottestInManyDays() you can use it in the method which consumes that:
CSVWrapper largest = hottestInManyDays();
System.out.println("hottest temperature on that day was in file " + largest.getFileName() + largest.getCSVRecord().get("TemperatureF") +
" at " + largest.getCSVRecord().get("TimeEST"));
(Note: If the bits at the very end there don't sit right as a Law Of Demeter violation, then feel free to extend the wrapper to include pass-thru operations as needed. Maybe even have it share a common interface with CSVRecord so it can be used as a drop-in replacement for one as needed elsewhere in the system.)

Referring to the line for(File f : dr.selectedFiles())
f is a [File]. It has a toString() method [from docs],
Returns the pathname string of this abstract pathname. This is just
the string returned by the getPath() method.
So, in the first line inside the loop, you can put System.out.println(f.toString()); to print out the file path.
Hope this helps clear a part of the story.
Now, to update this string, I see you are using some object that is called largest in testHottestInManyDay(). You should add a filepath string in this object and set it inside the else block.

One has to return both the CSVRecord and the File. Either in a newly made class.
As CSVRecord can be converted to a map, add the file name to the map, using a new column name, here "FILENAME."
public Map<String, String> hottestInManyDays() {
//select many csv files from my computer
DirectoryResource dr = new DirectoryResource();
CSVRecord largestSoFar = null;
File fileOfLargestSoFar = null;
//read every row and implement the method we just define
for (File f : dr.selectedFiles()) {
FileResource fr = new FileResource(f);
CSVRecord currentRow = hottestHourInFile(fr.getCSVParser());
if (largestSoFar == null) {
largestSoFar = currentRow;
fileOfLargestSoFar = f;
}
else {
double currentTemp = Double.parseDouble(currentRow.get("TemperatureF"));
double largestTemp = Double.parseDouble(largestSoFar.get("TemperatureF"));
//Check if currentRow’s temperature > largestSoFar’s
if (currentTemp > largestTemp) {
//If so update largestSoFar to currentRow
largestSoFar = currentRow;
fileOfLargestSoFar = f;
}
}
}
Map<String, String> map = new HashMap<>(largestSoFar.toMap());
map.put("FILENAME", fileOfLargestSoFar.getPath());
return map;
}
Map<String, String> largest = hottestInManyDays();
System.out.println("hottest temperature on that day was in file "
+ largest.get("FILENAME") + largest.get("TemperatureF") +
" at " + largest.get("TimeEST"));

Created documents are not versionable

I use OpenCmis in-memory for testing. But when I create a document I am not allowed to set the versioningState to something else then versioningState.NONE.
The doc created is not versionable some way... I used the code from http://chemistry.apache.org/java/examples/example-create-update.html
The test method:
public void test() {
String filename = "test123";
Folder folder = this.session.getRootFolder();
// Create a doc
Map<String, Object> properties = new HashMap<String, Object>();
properties.put(PropertyIds.OBJECT_TYPE_ID, "cmis:document");
properties.put(PropertyIds.NAME, filename);
String docText = "This is a sample document";
byte[] content = docText.getBytes();
InputStream stream = new ByteArrayInputStream(content);
ContentStream contentStream = this.session.getObjectFactory().createContentStream(filename, Long.valueOf(content.length), "text/plain", stream);
Document doc = folder.createDocument(
properties,
contentStream,
VersioningState.MAJOR);
}
The exception I get:
org.apache.chemistry.opencmis.commons.exceptions.CmisConstraintException: The versioning state flag is imcompatible to the type definition.
What am I missing?

I found the reason...
By executing the following code I discovered that the OBJECT_TYPE_ID 'cmis:document' don't allow versioning.
Code to view all available OBJECT_TYPE_ID's (source):
boolean includePropertyDefintions = true;
for (t in session.getTypeDescendants(
null, // start at the top of the tree
-1, // infinite depth recursion
includePropertyDefintions // include prop defs
)) {
printTypes(t, "");
}
static void printTypes(Tree tree, String tab) {
ObjectType objType = tree.getItem();
println(tab + "TYPE:" + objType.getDisplayName() +
" (" + objType.getDescription() + ")");
// Print some of the common attributes for this type
print(tab + " Id:" + objType.getId());
print(" Fileable:" + objType.isFileable());
print(" Queryable:" + objType.isQueryable());
if (objType instanceof DocumentType) {
print(" [DOC Attrs->] Versionable:" +
((DocumentType)objType).isVersionable());
print(" Content:" +
((DocumentType)objType).getContentStreamAllowed());
}
println(""); // end the line
for (t in tree.getChildren()) {
// there are more - call self for next level
printTypes(t, tab + " ");
}
}
This resulted in a list like this:
TYPE:CMIS Folder (Description of CMIS Folder Type) Id:cmis:folder
Fileable:true Queryable:true
TYPE:CMIS Document (Description of CMIS Document Type)
Id:cmis:document Fileable:true Queryable:true [DOC Attrs->]
Versionable:false Content:ALLOWED
TYPE:My Type 1 Level 1 (Description of My Type 1 Level 1 Type)
Id:MyDocType1 Fileable:true Queryable:true [DOC Attrs->]
Versionable:false Content:ALLOWED
TYPE:VersionedType (Description of VersionedType Type)
Id:VersionableType Fileable:true Queryable:true [DOC Attrs->]
Versionable:true Content:ALLOWED
As you can see the last OBJECT_TYPE_ID has versionable: true... and when I use that it does work.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Apache POI docx file content control parse - java

Related

Apache POI: Remove Chart from Word Template file entirely

Java + MongoDB: how get a nested field value using complete path?

Does PDFBox allow to remove one field from AcroForm?

Please pay attention to the FILENAME and how to print out the filename that I choose from my computer?

Created documents are not versionable

Categories

Resources