I am trying to use docx4j to copy part of a document into a new document. I can copy most of it (text, formatting, etc.), but I am not able to copy images. Doing a deep copy of inline images leaves the new document incorrectly formatted, and the same happens for linked images.
Here is the code I use to make the copy:
word = WordprocessingMLPackage.load(new File("C:\\Users\\prerak\\Documents\\Projects\\EME\\Exam System\\Documents\\T.docx"));
//newDoc = WordprocessingMLPackage.createPackage();
newDoc = WordprocessingMLPackage.load(new File("C:\\Users\\prerak\\Documents\\Projects\\EME\\Exam System\\Documents\\T.docx"));
MainDocumentPart mdp = word.getMainDocumentPart();
newDoc.getMainDocumentPart().getContent().clear();
Document contents = mdp.getContents();
Body body = contents.getBody();
List<Object> content = body.getContent();
ArrayList<ArrayList<Object>> allQ = new ArrayList<>();
ArrayList<Object> next = null;
for (Object o : content) {
    if (o instanceof P) {
        P p = (P) o;
        List<Object> rs = DocxUtils.getAllElementFromObject(p, R.class);
        for (Object d : rs) {
            R tt = (R) d;
            List<Object> ds = tt.getContent();
            for (Object dd : ds) {
                System.out.println(dd.getClass().getName());
            }
        }
        PPr ppr = (PPr) p.getPPr();
        if (ppr != null && ppr.getPStyle() != null) {
            System.out.println("Style: " + ppr.getPStyle().getVal());
            if (ppr.getPStyle().getVal().equals("Heading1")) {
                //System.out.println(o.toString());
            }
        }
    }
    if (o.toString().startsWith("##")) {
        next = new ArrayList<>();
        allQ.add(next);
    }
    if (next != null) {
        next.add(o);
    }
}
//System.out.println("Total number of questions " + allQ.size());
for (Object o : allQ.get(0)) {
    newDoc.getMainDocumentPart().getContent().add(XmlUtils.deepCopy(o));
}
//System.out.println(DocxUtils.paraToHtml(allQ.get(0)));
newDoc.save(new File("C:\\Users\\prerak\\Documents\\Projects\\EME\\Exam System\\Documents\\newt90.docx"));
Do I have to do anything more than making a deep copy?
Your help is greatly appreciated.
Thanks
See http://www.docx4java.org/blog/2010/11/merging-word-documents/ regarding referential integrity.
The "brute force" approach is to make a copy of the WordprocessingMLPackage using WordprocessingMLPackage's clone() method, then delete the content you don't want. That'll leave the image reference and its corresponding part, but also other images etc which you aren't using anymore. Depending on your circumstances that may or may not be ok. If it isn't, the commercial MergeDocx code is one solution.
Alternatively, if you know the only references you have are say, images (as opposed to comments, footnotes etc etc), you could handle those specifically in your code.
I need to extract Word document comments and the text they comment on. Below is my current solution, but it is not working as expected:
public class Main {
    public static void main(String[] args) throws Exception {
        var document = new Document("sample.docx");
        NodeCollection<Paragraph> paragraphs = document.getChildNodes(PARAGRAPH, true);
        List<MyComment> myComments = new ArrayList<>();
        for (Paragraph paragraph : paragraphs) {
            var comments = getComments(paragraph);
            int commentIndex = 0;
            if (comments.isEmpty()) continue;
            for (Run run : paragraph.getRuns()) {
                var runText = run.getText();
                for (int i = commentIndex; i < comments.size(); i++) {
                    Comment comment = comments.get(i);
                    String commentText = comment.getText();
                    if (paragraph.getText().contains(runText + commentText)) {
                        myComments.add(new MyComment(runText, commentText));
                        commentIndex++;
                        break;
                    }
                }
            }
        }
        myComments.forEach(System.out::println);
    }

    private static List<Comment> getComments(Paragraph paragraph) {
        @SuppressWarnings("unchecked")
        NodeCollection<Comment> comments = paragraph.getChildNodes(COMMENT, false);
        List<Comment> commentList = new ArrayList<>();
        comments.forEach(commentList::add);
        return commentList;
    }

    static class MyComment {
        String text;
        String commentText;

        public MyComment(String text, String commentText) {
            this.text = text;
            this.commentText = commentText;
        }

        @Override
        public String toString() {
            return text + "-->" + commentText;
        }
    }
}
sample.docx contents are:
And the output is (which is incorrect):
factors-->This is word comment
%–10% of cancers are caused by inherited genetic defects from a person's parents.-->Second paragraph comment
Expected output is:
factors-->This is word comment
These factors act, at least partly, by changing the genes of a cell. Typically, many genetic changes are required before cancer develops. Approximately 5%–10% of cancers are caused by inherited genetic defects from a person's parents.-->Second paragraph comment
These factors act, at least partly, by changing the genes of a cell. Typically, many genetic changes are required before cancer develops. Approximately 5%–10% of cancers are caused by inherited genetic defects from a person's parents.-->First paragraph comment
Please help me find a better way of extracting Word document comments and the text they comment on. If you need additional details, let me know and I will provide them.
The commented text is marked by special nodes, CommentRangeStart and CommentRangeEnd. Both have an Id that corresponds to the Id of the Comment the range is linked to. So you need to extract the content between the corresponding start and end nodes.
By the way, the code example in the Aspose.Words API reference shows how to print the contents of all comments and their comment ranges using a document visitor. It looks like exactly what you are looking for.
EDIT: You can use code like the following to accomplish your task. I did not include the full code for extracting content between nodes; it is available on GitHub.
Document doc = new Document("C:\\Temp\\in.docx");
// Get the comments in the document.
Iterable<Comment> comments = doc.getChildNodes(NodeType.COMMENT, true);
Iterable<CommentRangeStart> commentRangeStarts = doc.getChildNodes(NodeType.COMMENT_RANGE_START, true);
Iterable<CommentRangeEnd> commentRangeEnds = doc.getChildNodes(NodeType.COMMENT_RANGE_END, true);
for (Comment c : comments)
{
    System.out.println(String.format("Comment %d : %s", c.getId(), c.toString(SaveFormat.TEXT)));
    CommentRangeStart start = null;
    CommentRangeEnd end = null;
    // Search for an appropriate start and end.
    for (CommentRangeStart s : commentRangeStarts)
    {
        if (c.getId() == s.getId())
        {
            start = s;
            break;
        }
    }
    for (CommentRangeEnd e : commentRangeEnds)
    {
        if (c.getId() == e.getId())
        {
            end = e;
            break;
        }
    }
    if (start != null && end != null)
    {
        // Extract content between the start and end nodes.
        // Code example how to extract content between nodes is here
        // https://github.com/aspose-words/Aspose.Words-for-Java/blob/master/Examples/src/main/java/com/aspose/words/examples/programming_documents/document/ExtractContentBetweenCommentRange.java
    }
    else
    {
        // The format string has a %d placeholder, so the comment id must be passed in.
        System.out.println(String.format("Comment %d does not have a comment range", c.getId()));
    }
}
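The idea of pairing range markers with comments by Id can be sketched without Aspose.Words at all. The following is a toy model, not the Aspose API: the class CommentRanges, its Node type, and the extract method are all hypothetical names, and the document is assumed to be already flattened into an ordered list of text pieces and start/end markers. It shows how the text between a start marker and the end marker with the same id could be collected, including the case where two ranges overlap.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model (not the Aspose.Words API): a document flattened into text and range-marker nodes. */
final class CommentRanges {

    enum Kind { TEXT, RANGE_START, RANGE_END }

    /** One node in document order: a piece of text, or a marker carrying a comment id. */
    static final class Node {
        final Kind kind;
        final int id;       // comment id for markers, unused (-1) for text
        final String text;  // content for TEXT nodes, null for markers

        private Node(Kind kind, int id, String text) {
            this.kind = kind;
            this.id = id;
            this.text = text;
        }

        static Node text(String s) { return new Node(Kind.TEXT, -1, s); }
        static Node start(int id) { return new Node(Kind.RANGE_START, id, null); }
        static Node end(int id) { return new Node(Kind.RANGE_END, id, null); }
    }

    /** Returns, for each comment id, the text found between its start and end markers. */
    static Map<Integer, String> extract(List<Node> nodes) {
        Map<Integer, StringBuilder> open = new LinkedHashMap<>();
        Map<Integer, String> result = new LinkedHashMap<>();
        for (Node n : nodes) {
            switch (n.kind) {
                case RANGE_START: {
                    open.put(n.id, new StringBuilder());
                    break;
                }
                case RANGE_END: {
                    // Only ranges that were actually opened produce a result.
                    StringBuilder collected = open.remove(n.id);
                    if (collected != null) {
                        result.put(n.id, collected.toString());
                    }
                    break;
                }
                default: {
                    // Text belongs to every range that is currently open, so overlaps work.
                    for (StringBuilder b : open.values()) {
                        b.append(n.text);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Node> nodes = Arrays.asList(
                Node.text("Cancer has many "),
                Node.start(0), Node.text("factors"), Node.end(0),
                Node.text(". "),
                Node.start(1), Node.text("These factors act by changing genes."), Node.end(1));
        extract(nodes).forEach((id, text) -> System.out.println(id + " --> " + text));
    }
}
```

With a real document, the text pieces and markers would come from walking the document's nodes in order, and the resulting map would then be joined with each comment's own text by id.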
I have a small Java app and I have used JInterface to essentially expose it as an OTP process in my Elixir app. I can call it and get a response successfully.
My problem is that the response I get back in Elixir is a binary, and I cannot figure out how to convert that binary to a list of strings, which is what the response actually is.
The code for my OTP node in Java using JInterface is below:
public void performAction(Object requestData, OtpMbox mbox, OtpErlangPid lastPid){
    List<String> sentences = paragraphSplitter.splitParagraphIntoSentences((String) requestData, Locale.JAPAN);
    mbox.send(lastPid, new OtpErlangBinary(getOtpStrings(sentences)));
    System.out.println("OK");
}

private List<OtpErlangString> getOtpStrings(List<String> sentences) {
    List<OtpErlangString> erlangStrings = new ArrayList<>();
    for(int i = 0; i < sentences.size(); i++){
        erlangStrings.add(new OtpErlangString(sentences.get(i)));
    }
    return erlangStrings;
}
It is necessary to wrap the response in an OtpErlangBinary, and I have converted the strings to OtpErlangString. I have also tried without converting the strings to OtpErlangString.
On the Elixir side I can receive the binary response and IO.inspect it.
Does anybody know how to use JInterface to deserialise the results correctly when it's anything other than a single string? Or maybe, if I have made some mistake, how to build the correct response type so that I can deserialise it correctly?
Any help would be really appreciated as I have been trying to figure this out for ages.
Thanks in advance.
I have been playing around with JInterface and Elixir and I think I've got your problem figured out.
So you are trying to send a list of strings from an Elixir/Erlang node to a Java node, but you cannot get it to de-serialize properly.
Elixir has its own types (e.g., atoms, tuples, ..) and Java has its own types (e.g., Object, String, List<String>,..). There needs to be a conversion from the one type to the other if they're supposed to talk to each other. In the end it's just a bunch of 1's and 0's that get sent over the wire anyway.
If an Erlang list is sent to Java, what arrives can always be interpreted as an OtpErlangObject. It's up to you to then try and guess what the actual type is before we can even begin turning it into a Java value.
// We know that everything is at least an OtpErlangObject value!
OtpErlangObject o = mbox.receive();
But given that you know that it's in fact a list, we can turn it into an OtpErlangList value.
// We know o is an Erlang list!
OtpErlangList erlList = (OtpErlangList) o;
The elements of this list, however, are still unknown. So at this point it's still a list of OtpErlangObjects.
But we know that it's a list of strings, so we can interpret the list of OtpErlangObjects as a list of OtpErlangStrings, and convert those to Java strings.
public static List<String> ErlangListToStringList(OtpErlangList estrs) {
    OtpErlangObject[] erlObjs = estrs.elements();
    List<String> strs = new LinkedList<String>();
    for (OtpErlangObject erlO : erlObjs) {
        strs.add(erlO.toString());
    }
    return strs;
}
Note that I used the term list here a lot, because it's in fact an Erlang list, in Java it's all represented as an array!
My entire code is listed below.
The way to run this is to paste it into a Java IDE, and start a REPL with the following parameters:
iex --name bob@127.0.0.1 --cookie "secret"
Java part:
import com.ericsson.otp.erlang.*;
import java.io.IOException;
import java.util.LinkedList;
import java.util.List;

public class Main {
    public static OtpErlangList StringListToErlangList(List<String> strs) {
        OtpErlangObject[] elems = new OtpErlangObject[strs.size()];
        int idx = 0;
        for (String str : strs) {
            elems[idx] = new OtpErlangString(str);
            idx++;
        }
        return new OtpErlangList(elems);
    }

    public static List<String> ErlangListToStringList(OtpErlangList estrs) {
        OtpErlangObject[] erlObjs = estrs.elements();
        List<String> strs = new LinkedList<String>();
        for (OtpErlangObject erlO : erlObjs) {
            strs.add(erlO.toString());
        }
        return strs;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Do some initial setup.
        OtpNode node = new OtpNode("alice", "secret");
        OtpMbox mbox = node.createMbox();
        mbox.registerName("alice");
        // Check that the remote node is actually online.
        if (node.ping("bob@127.0.0.1", 2000)) {
            System.out.println("remote is up");
        } else {
            System.out.println("remote is not up");
        }
        // Create the list of strings that needs to be sent to the other node.
        List<String> strs = new LinkedList<String>();
        strs.add("foo");
        strs.add("bar");
        OtpErlangList erlangStrs = StringListToErlangList(strs);
        // Create a tuple so the other node can reply to us.
        OtpErlangObject[] msg = new OtpErlangObject[2];
        msg[0] = mbox.self();
        msg[1] = erlangStrs;
        OtpErlangTuple tuple = new OtpErlangTuple(msg);
        // Send the tuple to the other node.
        mbox.send("echo", "bob@127.0.0.1", tuple);
        // Await the reply.
        while (true) {
            try {
                System.out.println("Waiting for response!");
                OtpErlangObject o = mbox.receive();
                if (o instanceof OtpErlangList) {
                    OtpErlangList erlList = (OtpErlangList) o;
                    List<String> receivedStrings = ErlangListToStringList(erlList);
                    for (String s : receivedStrings) {
                        System.out.println(s);
                    }
                }
                if (o instanceof OtpErlangTuple) {
                    OtpErlangTuple m = (OtpErlangTuple) o;
                    OtpErlangPid from = (OtpErlangPid) (m.elementAt(0));
                    OtpErlangList value = (OtpErlangList) m.elementAt(1);
                    List<String> receivedStrings = ErlangListToStringList(value);
                    for (String s : receivedStrings) {
                        System.out.println(s);
                    }
                }
            } catch (OtpErlangExit otpErlangExit) {
                otpErlangExit.printStackTrace();
            } catch (OtpErlangDecodeException e) {
                e.printStackTrace();
            }
        }
    }
}
I want to extract some data from a shapefile.
I have this function that reads a shapefile and shows it on a map.
I can read some information, but I am lost on how to extract more of the data in this file.
public class Quickstart {
    public static void main(String[] args) throws Exception {
        // display a data store file chooser dialog for shapefiles
        File file = JFileDataStoreChooser.showOpenFile("shp", null);
        if (file == null) {
            return;
        }
        FileDataStore dataStore = FileDataStoreFinder.getDataStore(file);
        //SimpleFeatureSource featureSource = dataStore.getFeatureSource();
        String t = dataStore.getTypeNames()[0];
        SimpleFeatureSource featureSource = dataStore.getFeatureSource(t);
        SimpleFeatureType schema = featureSource.getSchema();
        //String geomType = schema.getGeometryDescriptor().getType().getBinding().getName();
        GeometryDescriptor geom = schema.getGeometryDescriptor();
        List<AttributeDescriptor> attributes = schema.getAttributeDescriptors();
        GeometryType geomType = null;
        List<AttributeDescriptor> attribs = new ArrayList<AttributeDescriptor>();
        for (AttributeDescriptor attrib : attributes) {
            AttributeType type = attrib.getType();
            if (type instanceof GeometryType) {
                geomType = (GeometryType) type;
            } else {
                attribs.add(attrib);
            }
        }
        GeometryTypeImpl gt = new GeometryTypeImpl(
                new NameImpl("the_geom"), geomType.getBinding(),
                geomType.getCoordinateReferenceSystem(),
                geomType.isIdentified(), geomType.isAbstract(),
                geomType.getRestrictions(), geomType.getSuper(),
                geomType.getDescription());
        GeometryDescriptor geomDesc = new GeometryDescriptorImpl(
                gt, new NameImpl("the_geom"),
                geom.getMinOccurs(), geom.getMaxOccurs(),
                geom.isNillable(), geom.getDefaultValue());
        attribs.add(0, geomDesc);
        SimpleFeatureType shpType = new SimpleFeatureTypeImpl(
                schema.getName(), attribs, geomDesc,
                schema.isAbstract(), schema.getRestrictions(),
                schema.getSuper(), schema.getDescription());
        dataStore.createSchema(shpType);
        CachingFeatureSource cache = new CachingFeatureSource(featureSource);
        // Create a map context and add our shapefile to it
        MapContext map = new DefaultMapContext();
        map.setTitle("Using cached features");
        map.addLayer(cache, null);
        // Now display the map
        JMapFrame.showMap(map);
    }
}
I want to extract this information:
POLYGON((0.6883 49.4666,0.6836 49.4664,0.6836 49.4663,0.6841 49.466,0.6844 49.4658,0.6847 49.4653,0.685 49.465,
0.6852 49.4646,0.6865 49.4624,0.6868 49.4621,0.6869 49.4618,0.6873 49.4617,0.6874 49.4617,0.6878 49.4616,0.6884 49.4615,
0.6898 49.4614,0.6909 49.4613,0.6909 49.4618,0.6913 49.4618,0.6906 49.4667,0.6883 49.4666))
I do not know how to do this.
Please help.
It is hard to see how that snippet is giving you any attributes back.
If you want the geometry of a feature then you need to use
Geometry geom = (Geometry) feature.getDefaultGeometry();
To write it out as Well Known Text (WKT), you can call its .toString() method, or use a WKTWriter for greater control over the formatting.
I have this path for a MongoDB field, main.inner.leaf, and any of these fields may be absent.
In Java, to avoid a NullPointerException, I have to write:
String leaf = "";
if (document.get("main") != null &&
        document.get("main", Document.class).get("inner") != null) {
    leaf = document.get("main", Document.class)
            .get("inner", Document.class).getString("leaf");
}
In this simple example I set only 3 levels: main, inner and leaf but my documents are deeper.
So is there a way to avoid writing all these null checks?
Like this:
String leaf = document.getString("main.inner.leaf", "");
// "" is the default value if one of the levels doesn't exist
Or using a third party library:
String leaf = DocumentUtils.getNullCheck("main.inner.leaf", "", document);
Many thanks.
Since the intermediate attributes are optional you really have to access the leaf value in a null safe manner.
You could do this yourself using an approach like ...
String leafValue = ""; // default when any level is missing
if (document.containsKey("main")) {
    Document _main = document.get("main", Document.class);
    if (_main.containsKey("inner")) {
        Document _inner = _main.get("inner", Document.class);
        if (_inner.containsKey("leaf")) {
            leafValue = _inner.getString("leaf");
        }
    }
}
Note: this could be wrapped up in a utility to make it more user friendly.
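One way such a utility might look is sketched below. This is a minimal sketch and not part of the MongoDB driver: the class name NestedPaths and the method getString are hypothetical, and it is written against plain java.util.Map so that it runs without the driver on the classpath. Since org.bson.Document implements Map<String, Object>, a Document could be passed in directly.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: resolves a dotted path such as "main.inner.leaf" against nested maps. */
final class NestedPaths {

    private NestedPaths() {}

    /**
     * Walks the dotted path one key at a time and returns the leaf as a String,
     * or defaultValue if any level is missing, null, or not of the expected type.
     */
    static String getString(Map<String, ?> document, String path, String defaultValue) {
        Object current = document;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                // An intermediate level is absent (null) or is not a sub-document.
                return defaultValue;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current instanceof String ? (String) current : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new LinkedHashMap<>();
        inner.put("leaf", "leafValue");
        Map<String, Object> main = new LinkedHashMap<>();
        main.put("inner", inner);
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("main", main);

        System.out.println(getString(doc, "main.inner.leaf", ""));
        System.out.println(getString(doc, "main.missing.leaf", "n/a"));
    }
}
```

The null checks are still there; they are just performed once per path segment inside the loop instead of being written out by hand for every level.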
Or use a third-party library such as Commons BeanUtils.
But, you cannot avoid null safe checks since the document structure is such that the intermediate levels might be null. All you can do is to ease the burden of handling the null safety.
Here's an example test case showing both approaches:
@Test
public void readNestedDocumentsWithNullSafety() throws IllegalAccessException, NoSuchMethodException, InvocationTargetException {
    Document inner = new Document("leaf", "leafValue");
    Document main = new Document("inner", inner);
    Document fullyPopulatedDoc = new Document("main", main);
    assertThat(extractLeafValueManually(fullyPopulatedDoc), is("leafValue"));
    assertThat(extractLeafValueUsingThirdPartyLibrary(fullyPopulatedDoc, "main.inner.leaf", ""), is("leafValue"));

    Document emptyPopulatedDoc = new Document();
    assertThat(extractLeafValueManually(emptyPopulatedDoc), is(""));
    assertThat(extractLeafValueUsingThirdPartyLibrary(emptyPopulatedDoc, "main.inner.leaf", ""), is(""));

    Document emptyInner = new Document();
    Document partiallyPopulatedMain = new Document("inner", emptyInner);
    Document partiallyPopulatedDoc = new Document("main", partiallyPopulatedMain);
    assertThat(extractLeafValueManually(partiallyPopulatedDoc), is(""));
    assertThat(extractLeafValueUsingThirdPartyLibrary(partiallyPopulatedDoc, "main.inner.leaf", ""), is(""));
}

private String extractLeafValueUsingThirdPartyLibrary(Document document, String path, String defaultValue) {
    try {
        Object value = PropertyUtils.getNestedProperty(document, path);
        return value == null ? defaultValue : value.toString();
    } catch (Exception ex) {
        return defaultValue;
    }
}

private String extractLeafValueManually(Document document) {
    Document inner = getOrDefault(getOrDefault(document, "main"), "inner");
    return inner.get("leaf", "");
}

private Document getOrDefault(Document document, String key) {
    if (document.containsKey(key)) {
        return document.get(key, Document.class);
    } else {
        return new Document();
    }
}
I'm using Milo and its example server and client. I'm adding nodes to the server but I can't figure out how to add EuInformation, i.e., unit and description. I thought about using the ExtensionObject but since EuInformation does not implement Serializable I don't know how to pass it to the ExtensionObject. I'd also like to know how I can get the namespace ID and URI on client side. So far I just set them statically as I have access to the classes.
I've implemented AddNodes on the server side. I can add nodes, read nodes, and write to nodes.
Here's what I'm doing on client side:
// Should somehow get the namespace ID and namespace dynamically.
// Maybe by iterating through all nodes??
ExpandedNodeId parentNodeId = new ExpandedNodeId(
        new NodeId(2, DatatypeNamespace.NODE_IDENTIFIER),
        DatatypeNamespace.NAMESPACE_URI, 0);
NodeId referenceTypeId = Identifiers.String;
// Define the new node.
ExpandedNodeId requestedNewNodeId = new ExpandedNodeId(new NodeId(2, "NewNode"),
        DatatypeNamespace.NAMESPACE_URI, 0);
QualifiedName browseName = new QualifiedName(2, "NewNode");
// How to get this to the server??
EUInformation euinfo = new EUInformation(null, -1, LocalizedText.english("MyUnit"),
        LocalizedText.english("My Description"));
ExpandedNodeId typeDef = new ExpandedNodeId(Identifiers.BaseVariableType,
        DatatypeNamespace.NAMESPACE_URI, 0);
AddNodesItem newItem = new AddNodesItem(parentNodeId, referenceTypeId,
        requestedNewNodeId, browseName, NodeClass.VariableType, null, typeDef);
List<AddNodesItem> items = new ArrayList<AddNodesItem>();
items.add(newItem);
client.addNodes(items).get();
EDIT
With the help of Kevin Herron's answer I worked something out: I adjusted the write() in my namespace class. I can now modify the display name and description of the node with the values of the EUInformation. Here's my write() method:
@Override
public void write(WriteContext context, List<WriteValue> writeValues) {
    List<StatusCode> results = Lists.newArrayListWithCapacity(writeValues.size());
    for (WriteValue writeValue : writeValues) {
        ServerNode node = server.getNodeMap().get(writeValue.getNodeId());
        if (node != null) {
            // Get the type of the variant that's about to be written to the node
            NodeId variantType = writeValue.getValue().getValue().getDataType().get();
            if (variantType.equals(Identifiers.Structure)) {
                ExtensionObject o = (ExtensionObject) writeValue.getValue().getValue().getValue();
                if (o.getEncodingTypeId().equals(Identifiers.EUInformation_Encoding_DefaultBinary)) {
                    EUInformation euInformation = (EUInformation) o.decode();
                    node.setDescription(euInformation.getDescription());
                    node.setDisplayName(euInformation.getDisplayName());
                    System.out.println("Wrote EUInformation " + euInformation);
                    results.add(StatusCode.GOOD);
                    context.complete(results);
                    return;
                }
            }
            try {
                node.writeAttribute(new AttributeContext(context), writeValue.getAttributeId(),
                        writeValue.getValue(), writeValue.getIndexRange());
                results.add(StatusCode.GOOD);
                System.out.println(String.format("Wrote value %s to %s attribute of %s",
                        writeValue.getValue().getValue(),
                        AttributeId.from(writeValue.getAttributeId()).map(Object::toString).orElse("unknown"),
                        node.getNodeId()));
            } catch (UaException e) {
                System.out.println(String.format("Unable to write %s", writeValue.getValue()));
                results.add(e.getStatusCode());
            }
        } else {
            results.add(new StatusCode(StatusCodes.Bad_NodeIdUnknown));
        }
    }
    context.complete(results);
}
OK, so you would add a new VariableNode with a TypeDefinition of Property (Identifiers.PropertyType).
Then you would write to its Value attribute so it contains the EUInformation object:
EUInformation euInformation = ...
Variant v = new Variant(ExtensionObject.encode(euInformation));
...write the value to the node you created...