Create XOM Document from an XML string fragment containing namespace prefixes - Java

I have a file containing an XML fragment. I need to add a child element to this file and save it again. I'm trying to use XOM in Java (1.6).
The problem is that the data in the file contains a namespace prefix, so when I construct my Document object I get:
[Fatal Error] tsip:1:33: The prefix "tsip" for attribute "tsip:action" associated with an element type "publications" is not bound.
The file contains eg:
<publications tsip:action="replace">
<publication tsip:dw="000000" tsip:status="dwpi:equivalent" tsip:lang="ja" tsip:drawings="0">
<documentId>
<number tsip:form="dwpi">58071346</number>
<countryCode>JP</countryCode>
<kindCode>A</kindCode>
</documentId>
</publication>
</publications>
My Java code is:
// Read file into a string
FileInputStream fisTargetFile = new FileInputStream(new File("C:\\myFileName"));
String pubLuStr = IOUtils.toString(fisTargetFile, "UTF-8");
Document doc = new Builder().build(pubLuStr, ""); // This fails
I suspect I need to make the code namespace-aware, i.e. add something like:
doc.getRootElement().addNamespaceDeclaration("tsip", "http://schemas.thomson.com/ts/20041221/tsip");
but I can't see how to do this before I create the Document doc.
Any help or suggestions appreciated.

One solution is to read the xml fragment into a string, and then wrap it in a dummy tag containing the namespace declaration. For example:
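StringBuilder sb = new StringBuilder();
// Dummy wrapper element that binds the tsip prefix for the whole fragment
sb.append("<tag xmlns:tsip=\"http://schemas.thomson.com/ts/20041221/tsip\">");
// Files/Paths need Java 7+; read the fragment explicitly as UTF-8
String xmlFrag = new String(Files.readAllBytes(Paths.get("C:/myFileName")), StandardCharsets.UTF_8);
sb.append(xmlFrag);
sb.append("</tag>");
Builder parser = new Builder();
Document doc = parser.build(sb.toString(), null);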
You can then happily parse the string into an XOM Document, and add the required changes before saving the revised fragment. To strip out the wrapper you can use XOM to pull out the fragment from the document by searching for the first real tag, e.g.
Element wrapperRoot = doc.getRootElement();
Element realRoot = wrapperRoot.getFirstChildElement("publications");
Then use XOM as normal to write out the revised fragment.
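XOM is not part of the JDK, so as a hedged illustration here is the same wrap, edit and unwrap idea sketched with the JDK's built-in DOM API instead (a swapped-in API, not XOM). The wrapper element name, the class name FragmentWrapper and the added publication child with its tsip:dw value are all hypothetical; only the tsip namespace URI comes from the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class FragmentWrapper {
    // Namespace URI from the question; the wrapper binds the tsip prefix to it
    static final String TSIP_NS = "http://schemas.thomson.com/ts/20041221/tsip";

    /** Wraps the fragment in a dummy element that declares xmlns:tsip, then parses it. */
    static Document parseFragment(String fragment) throws Exception {
        String wrapped = "<wrapper xmlns:tsip=\"" + TSIP_NS + "\">" + fragment + "</wrapper>";
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // so tsip:* names resolve to TSIP_NS
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(wrapped)));
    }

    /** Returns the first element child of the dummy wrapper, i.e. the real fragment root. */
    static Element realRoot(Document doc) {
        for (Node n = doc.getDocumentElement().getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        throw new IllegalArgumentException("fragment contains no element");
    }

    /** Appends a hypothetical <publication tsip:dw="..."/> child to the fragment root. */
    static Element addPublication(Element publications, String dw) {
        Document doc = publications.getOwnerDocument();
        Element pub = doc.createElementNS(null, "publication"); // no-namespace element, like the original
        pub.setAttributeNS(TSIP_NS, "tsip:dw", dw);
        publications.appendChild(pub);
        return pub;
    }

    /** Serializes only the given element: no wrapper, no XML declaration. */
    static String serialize(Element element) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(element), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseFragment(
                "<publications tsip:action=\"replace\"><publication tsip:dw=\"000000\"/></publications>");
        Element publications = realRoot(doc);
        addPublication(publications, "111111");
        System.out.println(serialize(publications));
    }
}
```

Because the tsip prefix has to be bound somewhere for namespace-well-formed output, the serializer may put an xmlns:tsip declaration on the fragment root; if the saved file must look exactly like the original, check the output of your JDK and strip it if needed.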

Related

How to append elements to an existing NodeList while parsing an XSD file with Java DocumentBuilder

Application Background:
Basically, I am building an application in which I parse an XML document using a SAX parser. For every incoming tag I would like to know its datatype and other information, so I use the XSD associated with that XML file to get it. Hence, I parse the XSD file and store all the information in a HashMap, so that whenever a tag comes in I can pass the XML tag name as the key and obtain the associated value (the information gathered during XSD parsing).
Problem I am facing:
As of now, I am able to parse my XSD using DocumentBuilderFactory. But when collecting elements, I can only get one type of element into my NodeList, such as elements with the tag name "xs:element". My XSD also has other element types such as "xs:complexType", "xs:any", etc. I would like to read all of them and store them in a single NodeList, which I can later loop over and push into the HashMap. However, I am unable to add any additional elements to my NodeList after adding one type to it.
The code below gets the tags named xs:element:
NodeList list = doc.getElementsByTagName("xs:element");
How can I add the xs:complexType and xs:any tags to the same NodeList?
Is this a good way to find the datatype and other attributes from the XSD, or is there a better approach? As I may need to hit the HashMap many times for every tag in the XML, will there be a performance issue?
Is DocumentBuilderFactory a good approach to parse XML, or are there better libraries for XSD parsing? I looked into Xerces2 but could not find any good examples, got stuck, and posted the question here.
Following is my code for parsing the XSD using DocumentBuilderFactory:
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class DOMParser {
private static Map<String, Element> xmlTags = new HashMap<String, Element>();
public static void main(String[] args) throws URISyntaxException, SAXException, IOException, ParserConfigurationException {
// Resolve test.xsd on the classpath to a file path
String xsdPath = Paths.get(DOMParser.class.getClassLoader().getResource("test.xsd").toURI()).toFile().getAbsolutePath();
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse(new File(xsdPath));
NodeList list = doc.getElementsByTagName("xs:element");
System.out.println(list.getLength());
// How to add the xs:complexType to same list as above
// list.add(doc.getElementsByTagName("xs:complexType"));
// list = doc.getElementsByTagName("xs:complexType");
// Loop and add data to Map for future lookups
for (int i = 0; i < list.getLength(); i++) {
Element element = (Element) list.item(i);
if (element.hasAttributes()) {
xmlTags.put(element.getAttribute("name"), element);
}
}
}
}
I don't know what you are trying to achieve (you have described the code you are writing, not the problem it is designed to solve), but what you are doing seems misguided. Trying to get useful information out of an XSD schema by parsing it at the XML level is really hard work, and it's clear from the questions you are asking that you haven't appreciated the complexities of what you are attempting.
It's hard to advise you on the low-level detail of maintaining hash maps and node lists when we don't understand what you are trying to achieve. What information are you trying to extract from the schema, and why?
There are a number of ways of getting information out of a schema at a higher level. Xerces has a Java API for accessing a compiled schema. Saxon has an XML representation of compiled schemas called SCM (the difference from raw XSD is that all the work of expanding xs:include and xs:import, expanding attribute groups, model groups, and substitution groups etc has been done for you). Saxon also has an XPath API (a set of extension functions) for accessing compiled schema information.
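Whatever you decide about the overall approach, the narrow NodeList question has a simple answer: a NodeList is a read-only view, so you cannot append to it; copy the nodes you want into a java.util.List instead. A minimal sketch, assuming a namespace-aware parser and the standard XSD namespace (XMLConstants.W3C_XML_SCHEMA_NS_URI), which also makes the lookup independent of whether the schema uses the xs: or xsd: prefix. The class name SchemaNodes is illustrative, and the sketch does nothing about xs:include, xs:import or groups, which is exactly the complexity described above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SchemaNodes {
    /** Returns all XSD elements whose local name is one of {@code kinds}, in document order. */
    static List<Element> collect(Document doc, String... kinds) {
        List<String> wanted = Arrays.asList(kinds);
        // "*" matches every element in the XSD namespace, so a single pass keeps document order
        NodeList all = doc.getElementsByTagNameNS(XMLConstants.W3C_XML_SCHEMA_NS_URI, "*");
        List<Element> result = new ArrayList<>();
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (wanted.contains(e.getLocalName())) {
                result.add(e);
            }
        }
        return result;
    }

    /** Parses schema text namespace-aware (required for getElementsByTagNameNS/getLocalName). */
    static Document parse(String xsd) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xsd)));
    }
}
```

Usage would be something like SchemaNodes.collect(doc, "element", "complexType", "any"), after which the loop that fills the map can run over the returned List.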

Remove DOCTYPE and its contained tags from XML using XmlOptions

I have the request below and I want to remove the DOCTYPE and the ENTITY declarations it contains. I don't have access to the parser, but in my class I can pass XmlOptions. Is there any way to remove the DOCTYPE using XmlOptions, so that the XML entity expansion vulnerability is removed?
The request I am sending:
<!DOCTYPE foo [
<!ENTITY xeebri2n0 "o16ja">
<!ENTITY xeebri2n1 "&xeebri2n0;&xeebri2n0;">
<!ENTITY xeebri2n2 "&xeebri2n1;&xeebri2n1;">
<!ENTITY xeebri2n3 "&xeebri2n2;&xeebri2n2;">]>
<SubmitPaymentRequest xmlns="http://www.qwest.com/XMLSchema" xmlns:bim="http://www.qwest.com/XMLSchema/BIM">
<EPWFHeaderInfo>
<RequestId>IR1BCSRDQBSIRW7745 &xeebri2n3;</RequestId>
<SendTimeStamp>2019-12-23T14:23:01.183-05:00</SendTimeStamp>
<MessageSrcSystem>IPS</MessageSrcSystem>
</EPWFHeaderInfo>
</SubmitPaymentRequest>
EPWFSubmitPaymentEventHandler.java, the class where I use SubmitPaymentRequestWrapper to parse the request:
class EPWFSubmitPaymentEventHandler {
public String handleEventMessage(String inXml, XmlObject xmlBean, Map<String, String> metaInfo) {
SubmitPaymentRequestWrapper request = new SubmitPaymentRequestWrapper(inXml);
}
}
The class where I parse the XML:
class SubmitPaymentRequestWrapper {
public SubmitPaymentRequestWrapper(String reqXml, XmlOptions options) throws XmlException {
this(SubmitPaymentRequestDocument.Factory.parse(reqXml, options));
}
}
In the SubmitPaymentRequestWrapper.java class above I cannot access SubmitPaymentRequestDocument.java, so:
is there any way I can disable or remove the DOCTYPE by passing XmlOptions?
We usually don't remove the DOCTYPE manually, but configure the parser to ignore it. Unfortunately, how you do that depends strongly on which parser it is. With JAXB (which can unmarshal from a StAX XMLStreamReader) you can do it like this:
XMLInputFactory xif = XMLInputFactory.newFactory();
xif.setProperty(XMLInputFactory.SUPPORT_DTD, false);
// xif.setProperty(XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES, false);
XMLStreamReader xsr = xif.createXMLStreamReader(new StreamSource("test.xml"));
The line with XMLInputFactory.SUPPORT_DTD disables the DOCTYPE completely. If I recall correctly, all defined entities will then be replaced with empty strings (don't take my word for it; test it).
The line with XMLInputFactory.IS_SUPPORTING_EXTERNAL_ENTITIES is there because I found out that XML parsers in Java's default configuration are not vulnerable to XML bomb attacks (like the one in your XML): the attack is stopped quite fast, after about 60k iterations (again, don't take my word for it). So after a lot of testing I decided to block only external entities, which are a nuisance and an insecure default in Java.
If you use JDOM instead of JAXB, external entity prevention looks different:
SAXBuilder builder = new SAXBuilder();
File xmlFile = new File("test.xml");
builder.setExpandEntities(false);
Document document = builder.build(xmlFile);
It's different in Dom4J too:
SAXReader reader = new SAXReader();
reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
Document document = reader.read("test.xml");
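Another JDK-only variant of the same "configure the parser" idea uses the Xerces feature http://apache.org/xml/features/disallow-doctype-decl, which the JDK's built-in (Xerces-derived) parser also recognises: any document containing a DOCTYPE is rejected outright, so entity-expansion payloads like the one in the question never get parsed. The class name NoDoctypeReader is illustrative. If your XMLBeans version lets you hand XmlOptions your own XMLReader (I believe setLoadUseXMLReader does this, but check your version's API), a reader configured like this is one option; that wiring is an assumption and is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

class NoDoctypeReader {
    // Xerces feature name; setting it to true makes any DOCTYPE a fatal error
    static final String DISALLOW_DOCTYPE = "http://apache.org/xml/features/disallow-doctype-decl";

    /** Creates a SAX XMLReader that fails on any DOCTYPE declaration. */
    static XMLReader newReader() throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        spf.setFeature(DISALLOW_DOCTYPE, true);
        return spf.newSAXParser().getXMLReader();
    }

    /** Returns true if the XML parses, false if it is rejected (e.g. because it has a DOCTYPE). */
    static boolean accepts(String xml) throws Exception {
        XMLReader reader = newReader();
        DefaultHandler handler = new DefaultHandler(); // ignores content; fatalError() rethrows
        reader.setContentHandler(handler);
        reader.setErrorHandler(handler);
        try {
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXParseException e) {
            return false;
        }
    }
}
```

Rejecting the document is stricter than silently dropping the DOCTYPE, which is usually what you want for requests coming from outside.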

How to get all XML branches

How can I get all XML branches using Java?
For example, if I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<addresses xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation='test.xsd'>
<address>
<name>Joe Tester</name>
<street>Baker street 5</street>
</address>
<person>
<name>Joe Tester</name>
<age>44</age>
</person>
</addresses>
I want to obtain the following branches:
addresses
addresses_address
addresses_address_name
addresses_address_street
addresses_person
addresses_person_name
addresses_person_age
Thanks.
You can easily get the XML root, its nodes, and sub-node names using a template engine, e.g. Velocity or FreeMarker. FreeMarker has powerful facilities for XML processing: you can drop XML documents into the data model, and templates can pull data from them in a variety of ways, such as with XPath expressions. This makes FreeMarker an XML transformation tool comparable to the much better-known XSLT stylesheet approach promulgated by the World Wide Web Consortium (W3C).
FreeMarker's XPath support relies on an XPath engine such as Jaxen. FreeMarker will use Xalan, unless you choose Jaxen by calling freemarker.ext.dom.NodeModel.useJaxenXPathSupport() from Java.
You need just one template, which generates all XML branches for the input XML. Put any XML into the data model at run time and FreeMarker will process the template and generate the branches corresponding to that XML structure. If your XML structure changes, there is no need to change your Java code. Even if you want to change the output, the changes go into the template file, so no Java recompilation is needed.
Just change the template and get the changes on the fly.
FTL file (one template for multiple XML documents, creating XML branch names):
<#list doc ['/*' ] as rootNode>
<#assign rootNodeValue="${rootNode?node_name}">
${rootNodeValue}
<#list doc ['/*/*' ] as childNodes>
<#if childNodes?is_node==true>
${rootNodeValue}-${childNodes?node_name}
<#list doc ['/*/${childNodes?node_name}/*' ] as subNodes>
${rootNodeValue}-${childNodes?node_name}-${subNodes?node_name}
</#list>
</#if>
</#list>
</#list>
XMLTest.java, which processes the template:
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import freemarker.ext.dom.NodeModel;
import freemarker.template.Configuration;
import freemarker.template.ObjectWrapper;
import freemarker.template.Template;
import freemarker.template.TemplateException;
public class XMLTest {
public static void main(String[] args) throws SAXException, IOException,
ParserConfigurationException, TemplateException {
Configuration config = new Configuration();
// Load templates (test.ftl) relative to this class's package
config.setClassForTemplateLoading(XMLTest.class, "");
config.setObjectWrapper(ObjectWrapper.BEANS_WRAPPER);
Map<String, Object> dataModel = new HashMap<String, Object>();
// Load the XML from the classpath; "addresses.xml" is a placeholder resource name
String xmlPath = "addresses.xml";
InputStream stream = XMLTest.class.getClassLoader().getResourceAsStream(xmlPath);
// If you already have the XML as a String, use new InputSource(new StringReader(xml)) instead
InputSource source = new InputSource(stream);
NodeModel xmlNodeModel = NodeModel.parse(source);
dataModel.put("doc", xmlNodeModel);
Template template = config.getTemplate("test.ftl");
StringWriter out = new StringWriter();
template.process(dataModel, out);
System.out.println(out.getBuffer().toString());
}
}
Final output:
addresses
addresses-address
addresses-address-name
addresses-address-street
addresses-person
addresses-person-name
addresses-person-age
See the FreeMarker documentation on the XML Node Model for details.
FreeMarker and Jaxen can be downloaded from their project websites.
There are many ways that you can extract data from XML and use it in Java. The one you choose will depend on how you want to use the data.
Some scenarios are:
You might want to manipulate nodes, order, remove and add others and transform the XML.
You might just want to read (and possibly change) the text contained in elements and attributes.
You might have a very large file and you just want to find some particular data and ignore the rest of the file.
For scenario #3, the best option is a memory-efficient stream-based parser, such as SAX or an XML reader using the StAX API.
You can also use that for scenario #2, if you do mostly reading (and not writing), but DOM-based APIs might be easier to work with. You can use the standard DOM org.w3c.dom API or a more Java-like API such as JDOM or DOM4J. If you wish to synchronize XML files with Java objects you also might want to use a full Java-XML mapping framework such as JAXB.
DOM APIs are also great for scenario #1, but in many cases it might be simpler to use XSLT (via the javax.xml.transform TrAX API in Java). If you use DOM you can also use XPath to select the nodes.
I will show you an example on how to extract the individual nodes of your file using the standard DOM API (org.w3c.dom) and also using XPath (javax.xml.xpath).
1. Setup
Initialize the parser:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Parse file into a Document Object Model:
Document source = builder.parse(new File("src/main/resources/addresses.xml"));
2. Selecting nodes with J2SE DOM
You get the root element using getDocumentElement():
Element addresses = source.getDocumentElement();
From there you can get the child nodes using getChildNodes(), but that returns all child nodes, including text nodes (the whitespace between elements). addresses.getChildNodes().item(0) returns the whitespace after the <addresses> tag and before the <address> tag, so to get the element you would have to take the second item. An easier way is to use getElementsByTagName, which returns a NodeList, and take its first item:
Element addresses_address = (Element)addresses.getElementsByTagName("address").item(0);
Many of the DOM methods return org.w3c.dom.Node objects, which you have to cast. Sometimes they might not be Element objects so you have to check. Node sets are not automatically converted into arrays. They are org.w3c.dom.NodeList so you have to use .item(0) and not [0] (if you use other DOM APIs such as JDOM or DOM4J, it will seem more intuitive).
You could use addresses.getElementsByTagName to get all the elements you need, but you would have to deal with the context for the two <name> elements. So a better way is to call it in the appropriate context:
Element addresses_address = (Element)addresses.getElementsByTagName("address").item(0);
Element addresses_address_name = (Element)addresses_address.getElementsByTagName("name").item(0);
Element addresses_address_street = (Element)addresses_address.getElementsByTagName("street").item(0);
Element addresses_person = (Element)addresses.getElementsByTagName("person").item(0);
Element addresses_person_name = (Element)addresses_person.getElementsByTagName("name").item(0);
Element addresses_person_age = (Element)addresses_person.getElementsByTagName("age").item(0);
That will give you all the Element nodes (or branches as you called them) for your file. If you want the text nodes (as actual Node objects) you need to get it as the first child:
Node textNode = addresses_address_street.getFirstChild();
And if you want the String contents you can use:
String street = addresses_address_street.getTextContent();
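The element-by-element code above hardcodes each path. To list every branch of an arbitrary document, as the question asks, a recursive walk over getChildNodes() works at any depth. A minimal sketch with the standard DOM API; the class and method names are illustrative, "_" is used as the separator as in the question, and a path that occurs more than once (for example two sibling <address> elements) is reported once.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XmlBranches {
    /** Collects "parent_child" paths for every element, in document order, without duplicates. */
    static List<String> branches(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Set<String> paths = new LinkedHashSet<>(); // keeps first-seen order, drops repeats
        walk(doc.getDocumentElement(), "", paths);
        return new ArrayList<>(paths);
    }

    private static void walk(Element element, String prefix, Set<String> paths) {
        String path = prefix.isEmpty() ? element.getTagName() : prefix + "_" + element.getTagName();
        paths.add(path);
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes, comments, etc.; only elements form branches
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, path, paths);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<addresses><address><name>Joe Tester</name><street>Baker street 5</street></address>"
                + "<person><name>Joe Tester</name><age>44</age></person></addresses>";
        branches(xml).forEach(System.out::println);
    }
}
```

For the sample document this prints the seven branches listed in the question, from addresses down to addresses_person_age.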
3. Selecting nodes with XPath
Another way to select nodes is using XPath. You will need the DOM source and you also need to initialize the XPath processor:
XPath xPath = XPathFactory.newInstance().newXPath();
You can extract the root node like this:
Element addresses = (Element)xPath.evaluate("/addresses", source, XPathConstants.NODE);
And all the other nodes using a path-like syntax:
Element addresses_address = (Element)xPath.evaluate("/addresses/address", source, XPathConstants.NODE);
Element addresses_address_name = (Element)xPath.evaluate("/addresses/address/name", source, XPathConstants.NODE);
Element addresses_address_street = (Element)xPath.evaluate("/addresses/address/street", source, XPathConstants.NODE);
You can also use relative paths, choosing a different element as the root:
Element addresses_person = (Element)xPath.evaluate("person", addresses, XPathConstants.NODE);
Element addresses_person_name = (Element)xPath.evaluate("person/name", addresses, XPathConstants.NODE);
Element addresses_person_age = (Element)xPath.evaluate("age", addresses_person, XPathConstants.NODE);
You can get the text contents as before, since you have Element objects:
String addressName = addresses_address_name.getTextContent();
But you can also do it directly using the same methods above without the last argument (which defaults to string). Here I'm using different relative and absolute XPath expressions:
String addressName = xPath.evaluate("name", addresses_address);
String addressStreet = xPath.evaluate("address/street", addresses);
String personName = xPath.evaluate("name", addresses_person);
String personAge = xPath.evaluate("/addresses/person/age", source);
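The same branch list can also be produced with XPath alone: select every element with //* (document order) and build each name by walking up getParentNode() until the document node is reached. A minimal sketch using javax.xml.xpath; the class name is illustrative, and repeated paths are again reported once.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathBranches {
    static List<String> branches(Document doc) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // "//*" selects every element in the document, in document order
        NodeList all = (NodeList) xPath.evaluate("//*", doc, XPathConstants.NODESET);
        Set<String> paths = new LinkedHashSet<>();
        for (int i = 0; i < all.getLength(); i++) {
            StringBuilder path = new StringBuilder();
            // Walk up to the root element, prepending each ancestor's name
            for (Node n = all.item(i); n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
                path.insert(0, path.length() == 0 ? n.getNodeName() : n.getNodeName() + "_");
            }
            paths.add(path.toString());
        }
        return new ArrayList<>(paths);
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }
}
```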

Serializing Java DOM Document to XML: Add CData Elements

I am constructing an XML DOM Document with a SAX parser. I have written methods to handle the startCDATA and endCDATA methods and in the endCDATA method I construct a new CDATA section like this:
public void onEndCData() {
xmlStructure.cData = false;
Document document = xmlStructure.xmlResult.document;
Element element = (Element) xmlStructure.xmlResult.stack.peek();
CDATASection section = document.createCDATASection(xmlStructure.stack.peek().characters);
element.appendChild(section);
}
When I serialize this to an XML file I use the following line to configure the transformer:
transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "con:setting");
Nevertheless, no <![CDATA[ tags appear in my XML file; instead all angle brackets are escaped as &gt; and &lt;. This is no problem for other tools, but it is a problem for humans who need to read the file as well. I am positive that "con:setting" is the right tag. So is there maybe a problem with the namespace prefix?
Also, this question indicates that it is not possible to omit the CDATA_SECTION_ELEMENTS property and generally serialize all CDATA nodes without escaping the data. Is that information correct, or are there other methods that the author of the answer was not aware of?
Update: It seems I had a mistake in my code. When I use the document.createCDATASection() function and then serialize the document with the Transformer, it DOES output CDATA tags, even without the CDATA_SECTION_ELEMENTS property.
It looks like you have a namespace-aware DOM. The docs say you need to provide the Qualified Name Representation of the element:
private static String qualifiedNameRepresentation(Element e) {
String ns = e.getNamespaceURI();
String local = e.getLocalName();
return (ns == null) ? local : '{' + ns + '}' + local;
}
So the value of the property will be of the form {http://your.conn.namespace}setting.
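A minimal sketch of that suggestion with the JDK's identity Transformer. The namespace URI http://example.com/con is a placeholder for whatever con is actually bound to in your document, the class name CdataOutput is illustrative, and the content is an ordinary text node (not a CDATASection), so any CDATA in the output comes from the output property alone.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class CdataOutput {
    // Placeholder namespace URI standing in for the real binding of the "con" prefix
    static final String CON_NS = "http://example.com/con";

    static String serialize(String text) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().newDocument();
        Element setting = doc.createElementNS(CON_NS, "con:setting");
        setting.appendChild(doc.createTextNode(text)); // plain text node, not a CDATASection
        doc.appendChild(setting);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        // {namespaceURI}localName form, as built by qualifiedNameRepresentation(...)
        t.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "{" + CON_NS + "}setting");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Calling CdataOutput.serialize("a < b") should yield the text wrapped in <![CDATA[...]]> instead of escaped; where the xmlns:con declaration ends up is left to the serializer.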
In this line
transformer.setOutputProperty(OutputKeys.CDATA_SECTION_ELEMENTS, "con:setting");
try replacing "con:setting" with "{http://con.namespace/}setting", using the appropriate namespace URI.
Instead of using a no-op Transformer to serialize your DOM tree you could try using the DOM-native "load and save" mechanism, which should preserve the CDATASection nodes from the DOM tree and write them as CDATA sections in the resulting XML.
DOMImplementationLS ls = (DOMImplementationLS)document.getImplementation();
LSOutput output = ls.createLSOutput();
LSSerializer ser = ls.createLSSerializer();
try (FileOutputStream outStream = new FileOutputStream(...)) {
output.setByteStream(outStream);
output.setEncoding("UTF-8");
ser.write(document, output);
}
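To check this behaviour in memory rather than writing a file, LSSerializer.writeToString returns the serialized document as a String. A minimal sketch, where the element name setting, the text, and the class name LsCdata are illustrative; "xml-declaration" is a standard LSSerializer configuration parameter, set to false here only to keep the output short.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSSerializer;

class LsCdata {
    static String serialize(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element setting = doc.createElement("setting");
        setting.appendChild(doc.createCDATASection(text)); // a real CDATASection node
        doc.appendChild(setting);

        DOMImplementationLS ls = (DOMImplementationLS) doc.getImplementation();
        LSSerializer ser = ls.createLSSerializer();
        ser.getDomConfig().setParameter("xml-declaration", false); // drop <?xml ...?>
        return ser.writeToString(doc);
    }
}
```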

dom4j XML declaration in document

I need to remove the XML declaration from a dom4j Document.
I am creating the document with:
doc = (Document) DocumentHelper.parseText(someXMLstringWithoutXMLDeclaration);
The string parsed into Document doc by DocumentHelper contains no XML declaration (it comes from an XML => XSL => XML transformation).
I think DocumentHelper is adding a declaration to the document body?
Is there any way to remove the XML declaration from the body of doc?
The simple solution I chose is:
doc.getRootElement().asXML();
I'm not sure where exactly the declaration is a problem in your code.
I had this once when I wanted to write an XML file without a declaration (using dom4j).
So if this is your use case, the "suppress declaration" setting of OutputFormat (setSuppressDeclaration) is what you're looking for:
http://dom4j.sourceforge.net/dom4j-1.6.1/apidocs/org/dom4j/io/OutputFormat.html
Apparently this can be set as a property as well, though I'm not sure what that does.
You need to interact with the root element instead of the document.
For example, using the default, compact OutputFormat mentioned by PhilW:
Document doc = (Document) DocumentHelper.parseText(someXMLstringWithoutXMLDeclaration);
final Writer writer = new StringWriter();
new XMLWriter(writer).write(doc.getRootElement());
String out = writer.toString();
