I need to remove the XML declaration from a dom4j Document.
I am creating the document with:
doc = (Document) DocumentHelper.parseText(someXMLstringWithoutXMLDeclaration);
The string that DocumentHelper parses into doc contains no XML declaration (it comes from an XML => XSL => XML transformation).
I think DocumentHelper is adding the declaration to the document body?
Is there any way to remove the XML declaration from the body of doc?
The simplest solution I chose is:
doc.getRootElement().asXML();
I'm not sure where exactly the declaration is a problem in your code.
I had this once when I wanted to write an XML file without a declaration (using dom4j).
So if this is your use case, "omit declaration" is what you're looking for (OutputFormat.setSuppressDeclaration(true)):
http://dom4j.sourceforge.net/dom4j-1.6.1/apidocs/org/dom4j/io/OutputFormat.html
Google says this can also be set as a property, though I'm not sure what that does.
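If the property meant here is the standard JAXP output property OutputKeys.OMIT_XML_DECLARATION (an assumption on my part), a minimal sketch using only the JDK, not dom4j, could look like this. The class and method names are made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of dom4j or the JDK.
class OmitDeclarationSketch {

    /** Re-serializes the XML text without the leading <?xml ...?> declaration. */
    static String withoutDeclaration(String xml) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            identity.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize XML", e);
        }
    }
}
```

Calling OmitDeclarationSketch.withoutDeclaration(doc.asXML()) on a dom4j document would be one way to combine the two, at the cost of parsing the text a second time.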
You need to interact with the root element instead of the document.
For example, using the default, compact OutputFormat mentioned by PhilW:
Document doc = (Document) DocumentHelper.parseText(someXMLstringWithoutXMLDeclaration);
final Writer writer = new StringWriter();
new XMLWriter(writer).write(doc.getRootElement());
String out = writer.toString();
Related
I have a file containing an XML fragment. I need to add a child element to this file and save it again. I'm trying to use XOM in Java (1.6).
The problem is that the data in the file uses a namespace prefix that is never declared, so when I construct my Document object I get:
[Fatal Error] tsip:1:33: The prefix "tsip" for attribute "tsip:action" associated with an element type "publications" is not bound.
The file contains, e.g.:
<publications tsip:action="replace">
<publication tsip:dw="000000" tsip:status="dwpi:equivalent" tsip:lang="ja" tsip:drawings="0">
<documentId>
<number tsip:form="dwpi">58071346</number>
<countryCode>JP</countryCode>
<kindCode>A</kindCode>
</documentId>
</publication>
</publications>
My Java code is:
FileInputStream fisTargetFile;
// Read file into a string
fisTargetFile = new FileInputStream(new File("C:\\myFileName")); // backslash must be escaped in a Java string
String pubLuStr = IOUtils.toString(fisTargetFile, "UTF-8");
Document doc = new Builder().build(pubLuStr, ""); // This fails
I suspect I need to make the code namespace-aware, i.e. add something like:
doc.getRootElement().addNamespaceDeclaration("tsip", "http://schemas.thomson.com/ts/20041221/tsip");
but I can't see how to do this BEFORE I create the Document doc.
Any help or suggestions appreciated.
One solution is to read the xml fragment into a string, and then wrap it in a dummy tag containing the namespace declaration. For example:
StringBuilder sb = new StringBuilder();
sb.append("<tag xmlns:tsip=\"http://schemas.thomson.com/ts/20041221/tsip\">");
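// Decode explicitly as UTF-8 (the encoding used in the question) instead of the platform default
String xmlFrag = new String(Files.readAllBytes(Paths.get("C:/myFileName")), StandardCharsets.UTF_8);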
sb.append(xmlFrag);
sb.append("</tag>");
Builder parser = new Builder();
Document doc = parser.build(sb.toString(), null);
You can then happily parse the string into an XOM Document, and add the required changes before saving the revised fragment. To strip out the wrapper you can use XOM to pull out the fragment from the document by searching for the first real tag, e.g.
Element wrapperRoot = doc.getRootElement();
Element realRoot = wrapperRoot.getFirstChildElement("publications");
Then use XOM as normal to write out the revised fragment.
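The same wrapping trick can be sketched with the JDK's own DOM parser instead of XOM, which may make it easier to try out without extra libraries. The wrapper element name and the class and method names below are hypothetical; only the tsip namespace URI comes from the question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper using the JDK DOM API rather than XOM.
class FragmentWrapperSketch {

    static final String TSIP_NS = "http://schemas.thomson.com/ts/20041221/tsip";

    /** Parses a fragment whose tsip: prefix is unbound and returns its real root element. */
    static Element parseFragment(String fragment) {
        // The wrapper element exists only to bind the tsip prefix.
        String wrapped = "<wrapper xmlns:tsip=\"" + TSIP_NS + "\">" + fragment + "</wrapper>";
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wrapped)));

            // Skip any whitespace text nodes and return the first element child.
            for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
                if (n instanceof Element) {
                    return (Element) n;
                }
            }
            throw new IllegalArgumentException("Fragment contains no element");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse fragment", e);
        }
    }
}
```

Note that this only works if the fragment itself carries no XML declaration, since a declaration is not allowed in the middle of a document.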
Is there a way to get the path of an XML document from an XPath or Document object in the XPath API?
This is how the objects are initialized:
FileInputStream file = new FileInputStream(new File("C:\\ExampleFile.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
So the question is:
Could the objects xmlDocument or xPath somehow return "C:\ExampleFile.xml"?
Using the Document object xmlDocument you can return the path of the file with:
xmlDocument.getDocumentURI();
Rather than creating a FileInputStream from the File and passing that to the parse method
FileInputStream file = new FileInputStream(new File("C:\\ExampleFile.xml"));
use the version of parse that takes a File directly.
File file = new File("C:\\ExampleFile.xml");
// rest of your code is unchanged - parse(file) is now the
// java.io.File version rather than the InputStream version
When you pass just a stream, the parser has no way of knowing that the stream was created from a file. As far as the parser is concerned, it could be a stream you received from a web server, a ByteArrayInputStream, or some other non-file source. If you pass the File directly to parse, the parser handles opening and closing the streams itself and can provide a meaningful URI to downstream components, so you get a sensible result from xmlDocument.getDocumentURI().
As an aside, if you want XPath to work reliably then you need to enable namespaces by calling builderFactory.setNamespaceAware(true) before you call newDocumentBuilder(). Even if your XML doesn't actually use any namespaces, you still need to parse with a namespace-aware DOM parser.
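Putting both points together, a minimal sketch could look like the following. The class and method names are hypothetical; the JAXP calls are the ones discussed above.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper illustrating parse(File) plus getDocumentURI().
class DocumentUriSketch {

    /** Parses the file with a namespace-aware parser and returns the document's URI. */
    static String documentUriOf(File xmlFile) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Must be set before newDocumentBuilder() is called.
            factory.setNamespaceAware(true);

            // parse(File) lets the parser derive a system ID from the file itself.
            Document doc = factory.newDocumentBuilder().parse(xmlFile);
            return doc.getDocumentURI();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Could not parse " + xmlFile, e);
        }
    }
}
```

The result is a file: URI rather than a Windows-style path; new File(URI.create(uri)) should turn it back into a File if that is what you need.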
I am constructing an XML DOM Document with a SAX parser. I have written methods to handle the startCDATA and endCDATA events, and in the endCDATA handler I construct a new CDATA section like this:
public void onEndCData() {
xmlStructure.cData = false;
Document document = xmlStructure.xmlResult.document;
Element element = (Element) xmlStructure.xmlResult.stack.peek();
CDATASection section = document.createCDATASection(xmlStructure.stack.peek().characters);
element.appendChild(section);
}
When I serialize this to an XML file I use the following line to configure the transformer:
transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "con:setting");
Nevertheless, no <![CDATA[ tags appear in my XML file; instead all brackets are escaped to &gt; and &lt;. This is no problem for other tools, but it is a problem for humans who need to read the file as well. I am positive that the "con:setting" tag is the right one. So is there maybe a problem with the namespace prefix?
Also this question indicates that it is not possible to omit the CDATA_SECTION_ELEMENTS property and generally serialize all CDATA nodes without escaping the data. Is that information correct, or are there maybe other methods that the author of the answer was not aware of?
Update: It seems I had a mistake in my code. When using document.createCDATASection() and then serializing the document with the Transformer, it DOES output CDATA tags, even without setting the CDATA_SECTION_ELEMENTS property on the transformer.
It looks like you have a namespace-aware DOM. The docs say you need to provide the Qualified Name Representation of the element:
private static String qualifiedNameRepresentation(Element e) {
String ns = e.getNamespaceURI();
String local = e.getLocalName();
return (ns == null) ? local : '{' + ns + '}' + local;
}
So the value of the property will be of the form {http://your.conn.namespace}setting.
In this line
transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "con:setting");
try replacing "con:setting" with "{http://con.namespace/}setting", using the namespace URI that the con prefix is actually bound to in your document.
Instead of using a no-op Transformer to serialize your DOM tree you could try using the DOM-native "load and save" mechanism, which should preserve the CDATASection nodes from the DOM tree and write them as CDATA sections in the resulting XML.
DOMImplementationLS ls = (DOMImplementationLS)document.getImplementation();
LSOutput output = ls.createLSOutput();
LSSerializer ser = ls.createLSSerializer();
try (FileOutputStream outStream = new FileOutputStream(...)) {
output.setByteStream(outStream);
output.setEncoding("UTF-8");
ser.write(document, output);
}
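For a quick check that CDATA sections survive, here is a self-contained sketch that serializes to a String instead of a file. The namespace URI http://example.com/con and the class and method names are placeholders I made up; only the con:setting element name comes from the question.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical demo of DOM Load and Save keeping CDATASection nodes intact.
class CDataLoadSaveSketch {

    /** Builds a one-element document holding the text in a CDATA section and serializes it. */
    static String serializeWithCData(String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            // Placeholder namespace URI; use the one your con prefix is bound to.
            Element setting = doc.createElementNS("http://example.com/con", "con:setting");
            doc.appendChild(setting);
            setting.appendChild(doc.createCDATASection(text));

            DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
            LSSerializer serializer = ls.createLSSerializer();
            return serializer.writeToString(doc);
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("Could not create DOM document", e);
        }
    }
}
```

writeToString produces a Java String, so the declaration it emits names UTF-16; when writing to a file, the LSOutput variant above with setEncoding("UTF-8") is the better fit.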
Is there a way to tell the XML transformer to sort all the attributes of the tags in a given XML document alphabetically? So let's say...
<MyTag paramter1="lol" andTheOtherThing="potato"/>
Would turn into
<MyTag andTheOtherThing="potato" paramter1="lol"/>
I saw how to format it from the examples I found here and here, but sorting the tag attributes would be the last issue I have.
I was hoping there was something like:
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.SORTATT, "yes"); // <-- no such thing
The documentation seems to confirm that there is no such property:
http://docs.oracle.com/javase/1.4.2/docs/api/javax/xml/transform/OutputKeys.html
As forty-two mentioned, you can produce canonical XML from the document, and that will order the attributes alphabetically for you.
In Java we can use something like Apache's Canonicalizer:
org.apache.xml.security.c14n.Canonicalizer
Something like this (assuming that the Document inXMLDoc is already a DOM):
Document retDoc;
byte[] c14nOutputbytes;
DocumentBuilderFactory factory;
DocumentBuilder parser;
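// INITIALIZE APACHE XML SECURITY ONCE BEFORE USING THE CANONICALIZER
org.apache.xml.security.Init.init();
// CANONICALIZE THE ORIGINAL DOM
c14nOutputbytes = Canonicalizer.getInstance(Canonicalizer.ALGO_ID_C14N_WITH_COMMENTS).canonicalizeSubtree(inXMLDoc.getDocumentElement());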
// PARSE THE CANONICALIZED BYTES (IF YOU WANT ANOTHER DOM) OR JUST USE THE BYTES
factory = DocumentBuilderFactory.newInstance();
factory.set ... // SETUP THE FACTORY
parser = factory.newDocumentBuilder();
// REPARSE TO GET ANOTHER DOM WITH THE ATTRIBUTES IN ALPHA ORDER
ByteArrayInputStream bais = new ByteArrayInputStream(c14nOutputbytes);
retDoc = parser.parse(bais);
Other things will change when canonicalizing, of course (the result is Canonical XML, see http://en.wikipedia.org/wiki/Canonical_XML), so expect some changes other than the attribute order.
I get an XML file from a website (http://www.abc.com/).
The URL is: http://www.abc.com/api/api.xml
The content is:
<?xml version="1.0" encoding="utf-8"?>
<root xmlns="http://www.abc.com/">
<name>Hello!</name>
</root>
The XML file declares xmlns="http://www.abc.com/".
Now I am using JDOM XPath to get the text Hello!
XPath xpath = XPath.newInstance("/root/name/text()");
SAXBuilder builder = new SAXBuilder();
Document doc = builder.build(new URL("http://www.abc.com/api/api.xml"));
System.out.println(xpath.valueOf(doc)); //nothing to print...
As a test I removed xmlns="http://www.abc.com/" from the XML file, and then it works!
How do I change my Java code to get Hello! when xmlns="http://www.abc.com/" is present?
(I can't change this XML file.)
Thanks for your help :)
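You'll need to make the query aware of the XML namespace. The answer to "Default Xml Namespace JDOM and XPATH" looks like it will do the trick. With JDOM 1.x that should mean binding a prefix to the namespace with xpath.addNamespace("a", "http://www.abc.com/") and querying "/a:root/a:name/text()" (the prefix a is arbitrary).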
You might also change your query to use local-name to ignore namespaces:
XPath xpath = XPath.newInstance("/*[local-name() = 'root']");
That should return the node named root. That is, if it supports it and I typed it correctly! :) I'm not familiar with the Java APIs for XML and XPath.
Be aware that XML namespaces exist to distinguish the node 'root' from any other node named 'root', just like class/package namespaces. Ignoring them could lead to a name collision. Your mileage may vary.
HTH,
Zach
I have not done this recently, but a quick search found
http://illegalargumentexception.blogspot.com/2009/05/java-using-xpath-with-namespaces-and.html
which points to using the JDK's XPathFactory together with a NamespaceContext (NamespaceContextMap below is not a JDK class):
NamespaceContext context = new NamespaceContextMap("http://www.abc.com/" );
Or you could use Zach's answer and just ignore the given namespace (if I understood him right). This could lead to problems if there are more 'root' nodes at the same hierarchy level.
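For reference, here is a minimal sketch of the NamespaceContext approach using only the JDK's javax.xml.xpath API rather than JDOM. The prefix "a" and the class and method names are arbitrary choices of mine; the namespace URI is the one from the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: queries a document that uses a default namespace.
class DefaultNamespaceXPathSketch {

    static final String ABC_NS = "http://www.abc.com/";

    /** Binds the arbitrary prefix "a" to the document's default namespace. */
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            return "a".equals(prefix) ? ABC_NS : XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) {
            return ABC_NS.equals(uri) ? "a" : null;
        }
        @Override public Iterator<String> getPrefixes(String uri) {
            return ABC_NS.equals(uri)
                    ? Collections.singletonList("a").iterator()
                    : Collections.<String>emptyIterator();
        }
    };

    /** Returns the text of /root/name, where both elements live in ABC_NS. */
    static String nameOf(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-qualified XPath steps
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            // XPath 1.0 has no default namespace, so every step needs the prefix.
            return xpath.evaluate("/a:root/a:name/text()", doc);
        } catch (ParserConfigurationException | SAXException | IOException | XPathExpressionException e) {
            throw new IllegalStateException("XPath lookup failed", e);
        }
    }
}
```

The prefix in the expression does not have to match anything in the document; only the namespace URI it maps to matters.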