Parse XML escaped in CDATA mixed with invalid HTML

Parse XML escaped in CDATA mixed with invalid HTML - java

I have the below element in a web service response. As you can see, it's escaped XML dumped as CDATA, so the XML parser just looks at it as a string and I'm unable to get the data I need from it through the usual means of XSLT and XPath. I need to turn this ugly string back into XML so that I can read it properly.
I have tried to do a search replace and simply converted all < to < and > to > and this works great, but there is a problem: The message.body element can actually contain HTML which is not valid XML. Might not even be valid HTML for all I know. So if I just replace everything, this will probably crash when I try to turn the string back into an XML document.
How can I unescape this safely? Is there a good way to do the replacement in the whole string except between the message.body open and closing tags for example?
<output><item type="object">
<ticket.id type="string">171</ticket.id>
<ticket.title type="string">SoapUI Test</ticket.title>
<ticket.created_at type="string">2013-12-03 12:50:54</ticket.created_at>
<ticket.status type="string">Open</ticket.status>
<updated type="string">false</updated>
<message type="object">
<message.id type="string">520</message.id>
<message.created_at type="string">2013-12-03 12:50:54.000</message.created_at>
<message.author type="string"/>
<message.body type="string">Just a test message...</message.body>
</message>
<message type="object">
<message.id type="string">521</message.id>
<message.created_at type="string">2013-12-03 13:58:32.000</message.created_at>
<message.author type="string"/>
<message.body type="string">Another message!</message.body>
</message>
</item>
</output>

This is actually lifted from the project i'm working on right now.
private Node stringToNode(String textContent) {
Element node = null;
try {
node = DocumentBuilderFactory.newInstance().newDocumentBuilder()
.parse(new ByteArrayInputStream(textContent.getBytes()))
.getDocumentElement();
} catch (SAXException e) {
logger.error(e.getMessage(), e);
} catch (IOException e) {
logger.error(e.getMessage(), e);
} catch (ParserConfigurationException e) {
logger.error(e.getMessage(), e);
}
return node;
}
This will give you a document object representing the string. I use this to get this back into the original document:
if (textContent.contains(XML_HEADER)) {
textContent = textContent.substring(textContent.indexOf(XML_HEADER) + XML_HEADER.length());
}
Node newNode = stringToNode(textContent);
if (newNode != null) {
Node importedNode = soapBody.getOwnerDocument().importNode(newNode, true);
nextChild.setTextContent(null);
nextChild.appendChild(importedNode);
}

This is my current solution. You give it an XPath for the nodes that are messed up and a set of element names that might include messed up HTML and other problems. Works roughly as follows
Pull out text content of nodes matched by XPATH
Run regex to wrap problematic child elements in CDATA
Wrap text in temporary element (otherwise it crashes if there are multiple root nodes)
Parse text back to DOM
Add child nodes of temporary node back in place of previous text content.
The regex solution in step 2 is probably not fool-proof, but don't really see a better solution at the moment. If you do, let me know!
CDataFixer
import java.util.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
public class CDataFixer
{
private final XmlHelper xml = XmlHelper.getInstance();
public Document fix(Document document, String nodesToFix, Set<String> excludes) throws XPathExpressionException, XmlException
{
return fix(document, xml.newXPath().compile(nodesToFix), excludes);
}
private Document fix(Document document, XPathExpression nodesToFix, Set<String> excludes) throws XPathExpressionException, XmlException
{
Document wc = xml.copy(document);
NodeList nodes = (NodeList) nodesToFix.evaluate(wc, XPathConstants.NODESET);
int nodeCount = nodes.getLength();
for(int n=0; n<nodeCount; n++)
parse(nodes.item(n), excludes);
return wc;
}
private void parse(Node node, Set<String> excludes) throws XmlException
{
String text = node.getTextContent();
for(String exclude : excludes)
{
String regex = String.format("(?s)(<%1$s\\b[^>]*>)(.*?)(</%1$s>)", Pattern.quote(exclude));
text = text.replaceAll(regex, "$1<![CDATA[$2]]>$3");
}
String randomNode = "tmp_"+UUID.randomUUID().toString();
text = String.format("<%1$s>%2$s</%1$s>", randomNode, text);
NodeList parsed = xml
.parse(text)
.getFirstChild()
.getChildNodes();
node.setTextContent(null);
for(int n=0; n<parsed.getLength(); n++)
node.appendChild(node.getOwnerDocument().importNode(parsed.item(n), true));
}
}
XmlHelper
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.*;
import javax.xml.transform.sax.*;
import javax.xml.transform.stream.*;
import javax.xml.xpath.*;
import org.w3c.dom.*;
import org.xml.sax.*;
public final class XmlHelper
{
private static final XmlHelper instance = new XmlHelper();
public static XmlHelper getInstance()
{
return instance;
}
private final SAXTransformerFactory transformerFactory;
private final DocumentBuilderFactory documentBuilderFactory;
private final XPathFactory xpathFactory;
private XmlHelper()
{
documentBuilderFactory = DocumentBuilderFactory.newInstance();
documentBuilderFactory.setNamespaceAware(true);
xpathFactory = XPathFactory.newInstance();
TransformerFactory tf = TransformerFactory.newInstance();
if (!tf.getFeature(SAXTransformerFactory.FEATURE))
throw new RuntimeException("Failed to create SAX-compatible TransformerFactory.");
transformerFactory = (SAXTransformerFactory) tf;
}
public DocumentBuilder newDocumentBuilder()
{
try
{
return documentBuilderFactory.newDocumentBuilder();
}
catch (ParserConfigurationException e)
{
throw new RuntimeException("Failed to create new "+DocumentBuilder.class, e);
}
}
public XPath newXPath()
{
return xpathFactory.newXPath();
}
public Transformer newIdentityTransformer(boolean omitXmlDeclaration, boolean indent)
{
try
{
Transformer transformer = transformerFactory.newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, indent ? "yes" : "no");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, omitXmlDeclaration ? "yes" : "no");
return transformer;
}
catch (TransformerConfigurationException e)
{
throw new RuntimeException("Failed to create Transformer instance: "+e.getMessage(), e);
}
}
public Templates newTemplates(String xslt) throws XmlException
{
try
{
return transformerFactory.newTemplates(new DOMSource(parse(xslt)));
}
catch (TransformerConfigurationException e)
{
throw new RuntimeException("Failed to create templates: "+e.getMessage(), e);
}
}
public Document parse(String xml) throws XmlException
{
return parse(new InputSource(new StringReader(xml)));
}
public Document parse(InputSource xml) throws XmlException
{
try
{
return newDocumentBuilder().parse(xml);
}
catch (SAXException e)
{
throw new XmlException("Failed to parse xml: "+e.getMessage(), e);
}
catch (IOException e)
{
throw new XmlException("Failed to read xml: "+e.getMessage(), e);
}
}
public String toString(Node node)
{
return toString(node, true, false);
}
public String toString(Node node, boolean omitXMLDeclaration, boolean indent)
{
try
{
StringWriter writer = new StringWriter();
newIdentityTransformer(omitXMLDeclaration, indent)
.transform(new DOMSource(node), new StreamResult(writer));
return writer.toString();
}
catch (TransformerException e)
{
throw new RuntimeException("Failed to transform XML into string: " + e.getMessage(), e);
}
}
public Document copy(Document document)
{
DOMSource source = new DOMSource(document);
DOMResult result = new DOMResult();
try
{
newIdentityTransformer(true, false)
.transform(source, result);
return (Document) result.getNode();
}
catch (TransformerException e)
{
throw new RuntimeException("Failed to copy XML: " + e.getMessage(), e);
}
}
}

Related

How can I obtain a list of String with the values with XPath? [duplicate]

I want to read XML data using XPath in Java, so for the information I have gathered I am not able to parse XML according to my requirement.
here is what I want to do:
Get XML file from online via its URL, then use XPath to parse it, I want to create two methods in it. One is in which I enter a specific node attribute id, and I get all the child nodes as result, and second is suppose I just want to get a specific child node value only
<?xml version="1.0"?>
<howto>
<topic name="Java">
<url>http://www.rgagnonjavahowto.htm</url>
<car>taxi</car>
</topic>
<topic name="PowerBuilder">
<url>http://www.rgagnon/pbhowto.htm</url>
<url>http://www.rgagnon/pbhowtonew.htm</url>
</topic>
<topic name="Javascript">
<url>http://www.rgagnon/jshowto.htm</url>
</topic>
<topic name="VBScript">
<url>http://www.rgagnon/vbshowto.htm</url>
</topic>
</howto>
In above example I want to read all the elements if I search via #name and also one function in which I just want the url from #name 'Javascript' only return one node element.

You need something along the lines of this:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(<uri_as_string>);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(<xpath_expression>);
Then you call expr.evaluate() passing in the document defined in that code and the return type you are expecting, and cast the result to the object type of the result.
If you need help with a specific XPath expressions, you should probably ask it as separate questions (unless that was your question in the first place here - I understood your question to be how to use the API in Java).
Edit: (Response to comment): This XPath expression will get you the text of the first URL element under PowerBuilder:
/howto/topic[#name='PowerBuilder']/url/text()
This will get you the second:
/howto/topic[#name='PowerBuilder']/url[2]/text()
You get that with this code:
expr.evaluate(doc, XPathConstants.STRING);
If you don't know how many URLs are in a given node, then you should rather do something like this:
XPathExpression expr = xpath.compile("/howto/topic[#name='PowerBuilder']/url");
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
And then loop over the NodeList.

You can try this.
XML Document
Save as employees.xml.
<?xml version="1.0" encoding="UTF-8"?>
<Employees>
<Employee id="1">
<age>29</age>
<name>Pankaj</name>
<gender>Male</gender>
<role>Java Developer</role>
</Employee>
<Employee id="2">
<age>35</age>
<name>Lisa</name>
<gender>Female</gender>
<role>CEO</role>
</Employee>
<Employee id="3">
<age>40</age>
<name>Tom</name>
<gender>Male</gender>
<role>Manager</role>
</Employee>
<Employee id="4">
<age>25</age>
<name>Meghan</name>
<gender>Female</gender>
<role>Manager</role>
</Employee>
</Employees>
Parser class
The class have following methods
List item
A Method that will return the Employee Name for input ID.
A Method that will return list of Employees Name with age greater than the input age.
A Method that will return list of Female Employees Name.
Source Code
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class Parser {
public static void main(String[] args) {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder;
Document doc = null;
try {
builder = factory.newDocumentBuilder();
doc = builder.parse("employees.xml");
// Create XPathFactory object
XPathFactory xpathFactory = XPathFactory.newInstance();
// Create XPath object
XPath xpath = xpathFactory.newXPath();
String name = getEmployeeNameById(doc, xpath, 4);
System.out.println("Employee Name with ID 4: " + name);
List<String> names = getEmployeeNameWithAge(doc, xpath, 30);
System.out.println("Employees with 'age>30' are:" + Arrays.toString(names.toArray()));
List<String> femaleEmps = getFemaleEmployeesName(doc, xpath);
System.out.println("Female Employees names are:" +
Arrays.toString(femaleEmps.toArray()));
} catch (ParserConfigurationException | SAXException | IOException e) {
e.printStackTrace();
}
}
private static List<String> getFemaleEmployeesName(Document doc, XPath xpath) {
List<String> list = new ArrayList<>();
try {
//create XPathExpression object
XPathExpression expr =
xpath.compile("/Employees/Employee[gender='Female']/name/text()");
//evaluate expression result on XML document
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++)
list.add(nodes.item(i).getNodeValue());
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return list;
}
private static List<String> getEmployeeNameWithAge(Document doc, XPath xpath, int age) {
List<String> list = new ArrayList<>();
try {
XPathExpression expr =
xpath.compile("/Employees/Employee[age>" + age + "]/name/text()");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++)
list.add(nodes.item(i).getNodeValue());
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return list;
}
private static String getEmployeeNameById(Document doc, XPath xpath, int id) {
String name = null;
try {
XPathExpression expr =
xpath.compile("/Employees/Employee[#id='" + id + "']/name/text()");
name = (String) expr.evaluate(doc, XPathConstants.STRING);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return name;
}
}

Getting started example:
xml file:
<inventory>
<book year="2000">
<title>Snow Crash</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553380958</isbn>
<price>14.95</price>
</book>
<book year="2005">
<title>Burning Tower</title>
<author>Larry Niven</author>
<author>Jerry Pournelle</author>
<publisher>Pocket</publisher>
<isbn>0743416910</isbn>
<price>5.99</price>
</book>
<book year="1995">
<title>Zodiac</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553573862</isbn>
<price>7.50</price>
</book>
<!-- more books... -->
</inventory>
Java code:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
try {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse (new File("c:\\tmp\\my.xml"));
// normalize text representation
doc.getDocumentElement().normalize();
System.out.println ("Root element of the doc is " + doc.getDocumentElement().getNodeName());
NodeList listOfBooks = doc.getElementsByTagName("book");
int totalBooks = listOfBooks.getLength();
System.out.println("Total no of books : " + totalBooks);
for(int i=0; i<listOfBooks.getLength() ; i++) {
Node firstBookNode = listOfBooks.item(i);
if(firstBookNode.getNodeType() == Node.ELEMENT_NODE) {
Element firstElement = (Element)firstBookNode;
System.out.println("Year :"+firstElement.getAttribute("year"));
//-------
NodeList firstNameList = firstElement.getElementsByTagName("title");
Element firstNameElement = (Element)firstNameList.item(0);
NodeList textFNList = firstNameElement.getChildNodes();
System.out.println("title : " + ((Node)textFNList.item(0)).getNodeValue().trim());
}
}//end of for loop with s var
} catch (SAXParseException err) {
System.out.println ("** Parsing error" + ", line " + err.getLineNumber () + ", uri " + err.getSystemId ());
System.out.println(" " + err.getMessage ());
} catch (SAXException e) {
Exception x = e.getException ();
((x == null) ? e : x).printStackTrace ();
} catch (Throwable t) {
t.printStackTrace ();
}

Here is an example of processing xpath with vtd-xml... for heavy duty XML processing it is second to none. here is the a recent paper on this subject Processing XML with Java – A Performance Benchmark
import com.ximpleware.*;
public class changeAttrVal {
public static void main(String s[]) throws VTDException,java.io.UnsupportedEncodingException,java.io.IOException{
VTDGen vg = new VTDGen();
if (!vg.parseFile("input.xml", false))
return;
VTDNav vn = vg.getNav();
AutoPilot ap = new AutoPilot(vn);
XMLModifier xm = new XMLModifier(vn);
ap.selectXPath("/*/place[#id=\"p14\" and #initialMarking=\"2\"]/#initialMarking");
int i=0;
while((i=ap.evalXPath())!=-1){
xm.updateToken(i+1, "499");// change initial marking from 2 to 499
}
xm.output("new.xml");
}
}

If you have a xml like below
<e:Envelope
xmlns:d = "http://www.w3.org/2001/XMLSchema"
xmlns:e = "http://schemas.xmlsoap.org/soap/envelope/"
xmlns:wn0 = "http://systinet.com/xsd/SchemaTypes/"
xmlns:i = "http://www.w3.org/2001/XMLSchema-instance">
<e:Header>
<Friends>
<friend>
<Name>Testabc</Name>
<Age>12121</Age>
<Phone>Testpqr</Phone>
</friend>
</Friends>
</e:Header>
<e:Body>
<n0:ForAnsiHeaderOperResponse xmlns:n0 = "http://systinet.com/wsdl/com/magicsoftware/ibolt/localhost/ForAnsiHeader/ForAnsiHeaderImpl#ForAnsiHeaderOper?KExqYXZhL2xhbmcvU3RyaW5nOylMamF2YS9sYW5nL1N0cmluZzs=">
<response i:type = "d:string">12--abc--pqr</response>
</n0:ForAnsiHeaderOperResponse>
</e:Body>
</e:Envelope>
and wanted to extract the below xml
<e:Header>
<Friends>
<friend>
<Name>Testabc</Name>
<Age>12121</Age>
<Phone>Testpqr</Phone>
</friend>
</Friends>
</e:Header>
The below code helps to achieve the same
public static void main(String[] args) {
File fXmlFile = new File("C://Users//abhijitb//Desktop//Test.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document document;
Node result = null;
try {
document = dbf.newDocumentBuilder().parse(fXmlFile);
XPath xPath = XPathFactory.newInstance().newXPath();
String xpathStr = "//Envelope//Header";
result = (Node) xPath.evaluate(xpathStr, document, XPathConstants.NODE);
System.out.println(nodeToString(result));
} catch (SAXException | IOException | ParserConfigurationException | XPathExpressionException
| TransformerException e) {
e.printStackTrace();
}
}
private static String nodeToString(Node node) throws TransformerException {
StringWriter buf = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer();
xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
xform.transform(new DOMSource(node), new StreamResult(buf));
return (buf.toString());
}
Now if you want only the xml like below
<Friends>
<friend>
<Name>Testabc</Name>
<Age>12121</Age>
<Phone>Testpqr</Phone>
</friend>
</Friends>
You need to change the
String xpathStr = "//Envelope//Header"; to String xpathStr = "//Envelope//Header/*";

This shows you how to
Read in an XML file to a DOM
Filter out a set of Nodes with XPath
Perform a certain action on each of the extracted Nodes.
We will call the code with the following statement
processFilteredXml(xmlIn, xpathExpr,(node) -> {/*Do something...*/;});
In our case we want to print some creatorNames from a book.xml using "//book/creators/creator/creatorName" as xpath to perform a printNode action on each Node that matches the XPath.
Full code
#Test
public void printXml() {
try (InputStream in = readFile("book.xml")) {
processFilteredXml(in, "//book/creators/creator/creatorName", (node) -> {
printNode(node, System.out);
});
} catch (Exception e) {
throw new RuntimeException(e);
}
}
private InputStream readFile(String yourSampleFile) {
return Thread.currentThread().getContextClassLoader().getResourceAsStream(yourSampleFile);
}
private void processFilteredXml(InputStream in, String xpath, Consumer<Node> process) {
Document doc = readXml(in);
NodeList list = filterNodesByXPath(doc, xpath);
for (int i = 0; i < list.getLength(); i++) {
Node node = list.item(i);
process.accept(node);
}
}
public Document readXml(InputStream xmlin) {
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
return db.parse(xmlin);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
private NodeList filterNodesByXPath(Document doc, String xpathExpr) {
try {
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
XPathExpression expr = xpath.compile(xpathExpr);
Object eval = expr.evaluate(doc, XPathConstants.NODESET);
return (NodeList) eval;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
private void printNode(Node node, PrintStream out) {
try {
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(node);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
out.println(xmlString);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Prints
<creatorName>Fosmire, Michael</creatorName>
<creatorName>Wertz, Ruth</creatorName>
<creatorName>Purzer, Senay</creatorName>
For book.xml
<book>
<creators>
<creator>
<creatorName>Fosmire, Michael</creatorName>
<givenName>Michael</givenName>
<familyName>Fosmire</familyName>
</creator>
<creator>
<creatorName>Wertz, Ruth</creatorName>
<givenName>Ruth</givenName>
<familyName>Wertz</familyName>
</creator>
<creator>
<creatorName>Purzer, Senay</creatorName>
<givenName>Senay</givenName>
<familyName>Purzer</familyName>
</creator>
</creators>
<titles>
<title>Critical Engineering Literacy Test (CELT)</title>
</titles>
</book>

Expanding on the excellent answer by #bluish and #Yishai, here is how you make the NodeLists and node attributes support iterators, i.e. the for(Node n: nodelist) interface.
Use it like:
NodeList nl = ...
for(Node n : XmlUtil.asList(nl))
{...}
and
Node n = ...
for(Node attr : XmlUtil.asList(n.getAttributes())
{...}
The code:
/**
* Converts NodeList to an iterable construct.
* From: https://stackoverflow.com/a/19591302/779521
*/
public final class XmlUtil {
private XmlUtil() {}
public static List<Node> asList(NodeList n) {
return n.getLength() == 0 ? Collections.<Node>emptyList() : new NodeListWrapper(n);
}
static final class NodeListWrapper extends AbstractList<Node> implements RandomAccess {
private final NodeList list;
NodeListWrapper(NodeList l) {
this.list = l;
}
public Node get(int index) {
return this.list.item(index);
}
public int size() {
return this.list.getLength();
}
}
public static List<Node> asList(NamedNodeMap n) {
return n.getLength() == 0 ? Collections.<Node>emptyList() : new NodeMapWrapper(n);
}
static final class NodeMapWrapper extends AbstractList<Node> implements RandomAccess {
private final NamedNodeMap list;
NodeMapWrapper(NamedNodeMap l) {
this.list = l;
}
public Node get(int index) {
return this.list.item(index);
}
public int size() {
return this.list.getLength();
}
}
}

Read XML file using XPathFactory, SAXParserFactory and StAX (JSR-173).
Using XPath get node and its child data.
public static void main(String[] args) {
String xml = "<soapenv:Body xmlns:soapenv='http://schemas.xmlsoap.org/soap/envelope/'>"
+ "<Yash:Data xmlns:Yash='http://Yash.stackoverflow.com/Services/Yash'>"
+ "<Yash:Tags>Java</Yash:Tags><Yash:Tags>Javascript</Yash:Tags><Yash:Tags>Selenium</Yash:Tags>"
+ "<Yash:Top>javascript</Yash:Top><Yash:User>Yash-777</Yash:User>"
+ "</Yash:Data></soapenv:Body>";
String jsonNameSpaces = "{'soapenv':'http://schemas.xmlsoap.org/soap/envelope/',"
+ "'Yash':'http://Yash.stackoverflow.com/Services/Yash'}";
String xpathExpression = "//Yash:Data";
Document doc1 = getDocument(false, "fileName", xml);
getNodesFromXpath(doc1, xpathExpression, jsonNameSpaces);
System.out.println("\n===== ***** =====");
Document doc2 = getDocument(true, "./books.xml", xml);
getNodesFromXpath(doc2, "//person", "{}");
}
static Document getDocument( boolean isFileName, String fileName, String xml ) {
Document doc = null;
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);
factory.setIgnoringComments(true);
factory.setIgnoringElementContentWhitespace(true);
DocumentBuilder builder = factory.newDocumentBuilder();
if( isFileName ) {
File file = new File( fileName );
FileInputStream stream = new FileInputStream( file );
doc = builder.parse( stream );
} else {
doc = builder.parse( string2Source( xml ) );
}
} catch (SAXException | IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
return doc;
}
/**
* ELEMENT_NODE[1],ATTRIBUTE_NODE[2],TEXT_NODE[3],CDATA_SECTION_NODE[4],
* ENTITY_REFERENCE_NODE[5],ENTITY_NODE[6],PROCESSING_INSTRUCTION_NODE[7],
* COMMENT_NODE[8],DOCUMENT_NODE[9],DOCUMENT_TYPE_NODE[10],DOCUMENT_FRAGMENT_NODE[11],NOTATION_NODE[12]
*/
public static void getNodesFromXpath( Document doc, String xpathExpression, String jsonNameSpaces ) {
try {
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
JSONObject namespaces = getJSONObjectNameSpaces(jsonNameSpaces);
if ( namespaces.size() > 0 ) {
NamespaceContextImpl nsContext = new NamespaceContextImpl();
Iterator<?> key = namespaces.keySet().iterator();
while (key.hasNext()) { // Apache WebServices Common Utilities
String pPrefix = key.next().toString();
String pURI = namespaces.get(pPrefix).toString();
nsContext.startPrefixMapping(pPrefix, pURI);
}
xpath.setNamespaceContext(nsContext );
}
XPathExpression compile = xpath.compile(xpathExpression);
NodeList nodeList = (NodeList) compile.evaluate(doc, XPathConstants.NODESET);
displayNodeList(nodeList);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
}
static void displayNodeList( NodeList nodeList ) {
for (int i = 0; i < nodeList.getLength(); i++) {
Node node = nodeList.item(i);
String NodeName = node.getNodeName();
NodeList childNodes = node.getChildNodes();
if ( childNodes.getLength() > 1 ) {
for (int j = 0; j < childNodes.getLength(); j++) {
Node child = childNodes.item(j);
short nodeType = child.getNodeType();
if ( nodeType == 1 ) {
System.out.format( "\n\t Node Name:[%s], Text[%s] ", child.getNodeName(), child.getTextContent() );
}
}
} else {
System.out.format( "\n Node Name:[%s], Text[%s] ", NodeName, node.getTextContent() );
}
}
}
static InputSource string2Source( String str ) {
InputSource inputSource = new InputSource( new StringReader( str ) );
return inputSource;
}
static JSONObject getJSONObjectNameSpaces( String jsonNameSpaces ) {
if(jsonNameSpaces.indexOf("'") > -1) jsonNameSpaces = jsonNameSpaces.replace("'", "\"");
JSONParser parser = new JSONParser();
JSONObject namespaces = null;
try {
namespaces = (JSONObject) parser.parse(jsonNameSpaces);
} catch (ParseException e) {
e.printStackTrace();
}
return namespaces;
}
XML Document
<?xml version="1.0" encoding="UTF-8"?>
<book>
<person>
<first>Yash</first>
<last>M</last>
<age>22</age>
</person>
<person>
<first>Bill</first>
<last>Gates</last>
<age>46</age>
</person>
<person>
<first>Steve</first>
<last>Jobs</last>
<age>40</age>
</person>
</book>
Out put for the given XPathExpression:
String xpathExpression = "//person/first";
/*OutPut:
Node Name:[first], Text[Yash]
Node Name:[first], Text[Bill]
Node Name:[first], Text[Steve] */
String xpathExpression = "//person";
/*OutPut:
Node Name:[first], Text[Yash]
Node Name:[last], Text[M]
Node Name:[age], Text[22]
Node Name:[first], Text[Bill]
Node Name:[last], Text[Gates]
Node Name:[age], Text[46]
Node Name:[first], Text[Steve]
Node Name:[last], Text[Jobs]
Node Name:[age], Text[40] */
String xpathExpression = "//Yash:Data";
/*OutPut:
Node Name:[Yash:Tags], Text[Java]
Node Name:[Yash:Tags], Text[Javascript]
Node Name:[Yash:Tags], Text[Selenium]
Node Name:[Yash:Top], Text[javascript]
Node Name:[Yash:User], Text[Yash-777] */
See this link for our own Implementation of NamespaceContext

Transform XML to declare all namespaces on root element

Is there a simple Java method way of "moving" all XML namespace declarations of an XML document to the root element? Due to a bug in parser implementation of an unnamed huge company, I need to programmatically rewrite our well formed and valid RPC requests in a way that the root element declares all used namespaces.
Not OK:
<document-element xmlns="uri:ns1">
<foo>
<bar xmlns="uri:ns2" xmlns:ns3="uri:ns3">
<ns3:foobar/>
<ns1:sigh xmlns:ns1="uri:ns1"/>
</bar>
</foo>
</document-element>
OK:
<document-element xmlns="uri:ns1" xmlns:ns1="uri:ns1" xmlns:ns2="uri:ns2" xmlns:ns3="uri:ns3">
<foo>
<ns2:bar>
<ns3:foobar/>
<ns1:sigh/>
</ns2:bar>
</foo>
</document-element>
Generic names for missing prefixes are acceptable. Default namespace may stay or be replaced/added as long as it is defined on the root element. I don't really mind which specific XML technology is used to achieve this (I would prefer to avoid DOM though).
To clarify, this answer refers to what I'd like to achieve as redeclaring namespace declarations within root element scope (entire document) on the root element. Essentially the related question is asking why oh why would anyone implement what I now need to work around.

Wrote a two-pass StAX reader/writer, which is simple enough.
import java.io.*;
import java.util.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;
public class NamespacesToRoot {
private static final String GENERATED_PREFIX = "pfx";
private final XMLInputFactory inputFact;
private final XMLOutputFactory outputFact;
private final XMLEventFactory eventFactory;
private NamespacesToRoot() {
inputFact = XMLInputFactory.newInstance();
outputFact = XMLOutputFactory.newInstance();
eventFactory = XMLEventFactory.newInstance();
}
public String transform(String xmlString) throws XMLStreamException {
Map<String, String> pfxToNs = new HashMap<String, String>();
XMLEventReader reader = null;
// first pass - analyze
try {
if (xmlString == null || xmlString.isEmpty()) {
throw new IllegalArgumentException("xmlString is null or empty");
}
StringReader stringReader = new StringReader(xmlString);
XMLStreamReader streamReader = inputFact.createXMLStreamReader(stringReader);
reader = inputFact.createXMLEventReader(streamReader);
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isStartElement()) {
buildNamespaces(event, pfxToNs);
}
}
System.out.println(pfxToNs);
} finally {
try {
if (reader != null) {
reader.close();
}
} catch (XMLStreamException ex) {
}
}
// reverse mapping, also gets rid of duplicates
Map<String, String> nsToPfx = new HashMap<String, String>();
for (Map.Entry<String, String> entry : pfxToNs.entrySet()) {
nsToPfx.put(entry.getValue(), entry.getKey());
}
List<Namespace> namespaces = new ArrayList<Namespace>(nsToPfx.size());
for (Map.Entry<String, String> entry : nsToPfx.entrySet()) {
namespaces.add(eventFactory.createNamespace(entry.getValue(), entry.getKey()));
}
// second pass - rewrite
XMLEventWriter writer = null;
try {
StringWriter stringWriter = new StringWriter();
writer = outputFact.createXMLEventWriter(stringWriter);
StringReader stringReader = new StringReader(xmlString);
XMLStreamReader streamReader = inputFact.createXMLStreamReader(stringReader);
reader = inputFact.createXMLEventReader(streamReader);
boolean rootElement = true;
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isStartElement()) {
StartElement origStartElement = event.asStartElement();
String prefix = nsToPfx.get(origStartElement.getName().getNamespaceURI());
String namespace = origStartElement.getName().getNamespaceURI();
String localName = origStartElement.getName().getLocalPart();
Iterator attributes = origStartElement.getAttributes();
Iterator namespaces_;
if (rootElement) {
namespaces_ = namespaces.iterator();
rootElement = false;
} else {
namespaces_ = null;
}
writer.add(eventFactory.createStartElement(
prefix, namespace, localName, attributes, namespaces_));
} else {
writer.add(event);
}
}
writer.flush();
return stringWriter.toString();
} finally {
try {
if (reader != null) {
reader.close();
}
} catch (XMLStreamException ex) {
}
try {
if (writer != null) {
writer.close();
}
} catch (XMLStreamException ex) {
}
}
}
private void buildNamespaces(XMLEvent event, Map<String, String> pfxToNs) {
System.out.println("el: " + event);
StartElement startElement = event.asStartElement();
Iterator nsIternator = startElement.getNamespaces();
while (nsIternator.hasNext()) {
Namespace nsAttr = (Namespace) nsIternator.next();
if (nsAttr.isDefaultNamespaceDeclaration()) {
System.out.println("need to generate a prefix for " + nsAttr.getNamespaceURI());
generatePrefix(nsAttr.getNamespaceURI(), pfxToNs);
} else {
System.out.println("add prefix binding for " + nsAttr.getPrefix() + " --> " + nsAttr.getNamespaceURI());
addPrefix(nsAttr.getPrefix(), nsAttr.getNamespaceURI(), pfxToNs);
}
}
}
private void generatePrefix(String namespace, Map<String, String> pfxToNs) {
int i = 1;
String prefix = GENERATED_PREFIX + i;
while (pfxToNs.keySet().contains(prefix)) {
i++;
prefix = GENERATED_PREFIX + i;
}
pfxToNs.put(prefix, namespace);
}
private void addPrefix(String prefix, String namespace, Map<String, String> pfxToNs) {
String existingNs = pfxToNs.get(prefix);
if (existingNs != null) {
if (existingNs.equals(namespace)) {
// nothing to do
} else {
// prefix clash, need to rename this prefix or reuse an existing
// one
if (pfxToNs.values().contains(namespace)) {
// reuse matching prefix
} else {
// rename
generatePrefix(namespace, pfxToNs);
}
}
} else {
// need to add this prefix
pfxToNs.put(prefix, namespace);
}
}
public static void main(String[] args) throws XMLStreamException {
String xmlString = "" +
"<document-element xmlns=\"uri:ns1\" attr=\"1\">\n" +
" <foo>\n" +
" <bar xmlns=\"uri:ns2\" xmlns:ns3=\"uri:ns3\">\n" +
" <ns3:foobar ns3:attr1=\"meh\" />\n" +
" <ns1:sigh xmlns:ns1=\"uri:ns1\"/>\n" +
" </bar>\n" +
" </foo>\n" +
"</document-element>";
System.out.println(xmlString);
NamespacesToRoot transformer = new NamespacesToRoot();
System.out.println(transformer.transform(xmlString));
}
}
Note that this is just fast example code which could use some tweaks, but is also a good start for anyone with a similar problem.

Below is a simple app does the namespace re-declaration... based on XPath and VTD-XML.
import com.ximpleware.*;
import java.io.*;
public class moveNSDeclaration {
public static void main(String[] args) throws IOException, VTDException{
// TODO Auto-generated method stub
VTDGen vg = new VTDGen();
String xml="<document-element xmlns=\"uri:ns1\">\n"+
"<foo>\n"+
"<bar xmlns=\"uri:ns2\" xmlns:ns3=\"uri:ns3\">\n"+
"<ns3:foobar/>\n"+
"<ns1:sigh xmlns:ns1=\"uri:ns1\"/>\n"+
"</bar>\n"+
"</foo>\n"+
"</document-element>\n";
vg.setDoc(xml.getBytes());
vg.parse(false); // namespace unaware to all name space nodes addressable using xpath #*
VTDNav vn = vg.getNav();
XMLModifier xm = new XMLModifier(vn);
FastIntBuffer fib = new FastIntBuffer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
// get the index value of xmlns declaration of root element
AutoPilot ap =new AutoPilot (vn);
ap.selectXPath("//#*");
int i=0;
//remove all ns node under root element
//save those nodes to be re-inserted into the root element up on verification of uniqueness
while((i=ap.evalXPath())!=-1){
if (vn.getTokenType(i)==VTDNav.TOKEN_ATTR_NS){
xm.remove(); //remove all ns node
fib.append(i);
}
}
//remove redundant ns nodes
for (int j=0;j<fib.size();j++){
if (fib.intAt(j)!=-1){
for (i=j+1;i<fib.size();i++){
if (fib.intAt(i)!=-1)
if (vn.compareTokens(fib.intAt(j), vn, fib.intAt(i))==0){
fib.modifyEntry(i, -1);
}
}
}
}
// compose a string to insert back into the root element containing all subordinate ns nodes
for (int j=0;j<fib.size();j++){
if (fib.intAt(j)!=-1){
int os = vn.getTokenOffset(fib.intAt(j));
int len = vn.getTokenOffset(fib.intAt(j)+1)+vn.getTokenLength(fib.intAt(j)+1)+1-os;
//System.out.println(" os len "+ os + " "+len);
//System.out.println(vn.toString(os,len));
baos.write(" ".getBytes());
baos.write(vn.getXML().getBytes(),os,len);
}
}
byte[] attrBytes = baos.toByteArray();
vn.toElement(VTDNav.ROOT);
xm.insertAttribute(attrBytes);
//System.out.println(baos.toString());
baos.reset();
xm.output(baos);
System.out.println(baos.toString());
}
}
Output looks like
<document-element xmlns="uri:ns2" xmlns:ns3="uri:ns3" xmlns:ns1="uri:ns1" >
<foo>
<bar >
<ns3:foobar/>
<ns1:sigh />
</bar>
</foo>
</document-element>

How to use variables in XPath?

I am using Xpath and Java.
The XML got plenty of OBJECT_TYPES and every object type has properties and parameters.
And each property and parameter got elements.
How do I do the following from my XML file.
I wanna know how to select with the XPATH string expression all property elements depending on whats the name of the OBJECT_TYPE string. The object type string name depends on what name the user selects from the list.
How can I do that?
Should be something like :
String expression = "/getObjType()/prop/*";
But the getObjectType is a method so I cant use it in a string expression.
XML looks something like this:
<type>
<OBJECT_TYPE>SiteData</OBJECT_TYPE>
<prop>
<DESCRIPTION>Site parameters</DESCRIPTION>
<PARENT>NULL</PARENT>
<VIRTUAL>0</VIRTUAL>
<VISIBLE>1</VISIBLE>
<PICTURE>NULL</PICTURE>
<HELP>10008</HELP>
<MIN_NO>1</MIN_NO>
<MAX_NO>1</MAX_NO>
<NAME_FORMAT>NULL</NAME_FORMAT>
</prop>
<param>
<PARAMETER>blabla</PARAMETER>
<DATA_TYPE>INTEGER</DATA_TYPE>
<DESCRIPTION>blaba</DESCRIPTION>
<MIN_NO>1</MIN_NO>
<MAX_NO>1</MAX_NO>
<ORDER1>1</ORDER1>
<NESTED>0</NESTED>
<DEFAULT1>NULL</DEFAULT1>
<FORMAT>0:16382</FORMAT>
</param>
<OBJECT_TYPE>Data</OBJECT_TYPE>
<prop>
<DESCRIPTION>Site parameters</DESCRIPTION>
<PARENT>NULL</PARENT>
<VIRTUAL>0</VIRTUAL>
<VISIBLE>1</VISIBLE>
<PICTURE>NULL</PICTURE>
<HELP>10008</HELP>
<MIN_NO>1</MIN_NO>
<MAX_NO>1</MAX_NO>
<NAME_FORMAT>NULL</NAME_FORMAT>
</prop>
<param>
<PARAMETER>gmgm</PARAMETER>
<DATA_TYPE>INTEGER</DATA_TYPE>
<DESCRIPTION>babla</DESCRIPTION>
<MIN_NO>1</MIN_NO>
<MAX_NO>1</MAX_NO>
<ORDER1>1</ORDER1>
<NESTED>0</NESTED>
<DEFAULT1>NULL</DEFAULT1>
<FORMAT>0:16382</FORMAT>
</param>
</type>
So depending on whats the name of the Object_type I wanna get thoose properties and I have list 122 object types so I have to use a varible to pick which one the user selects.
public class PropXMLParsing {
static PropXMLParsing instance = null;
private List<String> list = new ArrayList<String>();
ObjType obj = new ObjType();
public static PropXMLParsing getInstance() {
if (instance == null) {
instance = new PropXMLParsing();
try {
instance.ParserForObjectTypes();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
}
}
return instance;
}
public void ParserForObjectTypes() throws SAXException, IOException,
ParserConfigurationException {
try {
FileInputStream file = new FileInputStream(new File(
"xmlFiles/CoreDatamodel.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory
.newInstance();
builderFactory.setNamespaceAware(true);
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xp = XPathFactory.newInstance().newXPath();
final Map<String, Object> vars = new HashMap<String, Object>();
xp.setXPathVariableResolver(new XPathVariableResolver() {
public Object resolveVariable(QName name) {
return vars.get(name.getLocalPart());
}
});
XPathExpression expr = xp
.compile("/type/OBJECT_TYPE[. = $type]/following-sibling::prop[1]");
vars.put("type", obj.getObjectType());
NodeList objectProps = (NodeList) expr.evaluate(xmlDocument,
XPathConstants.NODESET);
System.out.println(objectProps);
for (int i = 0; i < objectProps.getLength(); i++) {
System.out.println(objectProps.item(i).getFirstChild()
.getNodeValue());
list.add(objectProps.item(i).getFirstChild().getNodeValue());
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (XPathExpressionException e) {
e.printStackTrace();
}
}
public String convertListToString() {
StringBuilder sb = new StringBuilder();
if (list.size() > 0) {
sb.append(list.get(0));
for (int i = 1; i < list.size(); i++) {
sb.append(list.get(i));
}
}
return sb.toString();
}
}
Second solution I have tried that aint working neither not printing out anything in the console.
public void ParserForObjectTypes() throws SAXException, IOException,
ParserConfigurationException {
try {
FileInputStream file = new FileInputStream(new File(
"xmlFiles/CoreDatamodel.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory
.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList) xPath.compile(
"//OBJECT_TYPE[text() = '" + obj.getObjectType()
+ "']/following-sibling::prop[1]/*").evaluate(
xmlDocument, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++) {
System.out.println(nodeList.item(i).getNodeName() + " = "
+ nodeList.item(i).getTextContent());
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (XPathExpressionException e) {
e.printStackTrace();
}
}

If you want to extract the prop belonging to a specific OBJECT_TYPE you can do that with
/type/OBJECT_TYPE[. = 'some type']/following-sibling::prop[1]
In Java you could build up this XPath expression dynamically using string concatenation but it would be much safer to use an XPath variable if the library you're using can support that (you don't say in the question what library you're using). For example with javax.xml.xpath
XPath xp = XPathFactory.newInstance().newXPath();
final Map<String, Object> vars = new HashMap<String, Object>();
xp.setXPathVariableResolver(new XPathVariableResolver() {
public Object resolveVariable(QName name) {
return vars.get(name.getLocalPart());
}
});
XPathExpression expr = xp.compile("/type/OBJECT_TYPE[. = $type]/following-sibling::prop[1]");
vars.put("type", "Data");
Node dataProps = (Node)expr.evaluate(doc, XPathConstants.NODE);
vars.put("type", "SiteData");
Node siteProps = (Node)expr.evaluate(doc, XPathConstants.NODE);
// taking the value from a variable
vars.put("type", obj.getObjectType());
Node objectProps = (Node)expr.evaluate(doc, XPathConstants.NODE);

This XPATH will select all the elements within the prop element that follows the OBJECT_TYPE with the text SiteData:
//OBJECT_TYPE[text() = 'SiteData']/following-sibling::prop[1]/*
To change the OBJECT_TYPE being selected just construct the XPATH in the code:
String xpath = "//OBJECT_TYPE[text() = '" + getObjType() + "']/following-sibling::prop[1]/*"
Which results in code like this:
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nodeList = (NodeList)xPath.compile("//OBJECT_TYPE[text() = '" + getObjType() + "']/following-sibling::prop[1]/*").evaluate(document, XPathConstants.NODESET);
for (int i = 0; i < nodeList.getLength(); i++)
{
System.out.println(nodeList.item(i).getNodeName() + " = " + nodeList.item(i).getTextContent());
}
That given the XML from the question and when getObjType() returns SiteData prints:
DESCRIPTION = Site parameters
PARENT = NULL
VIRTUAL = 0
VISIBLE = 1
PICTURE = NULL
HELP = 10008
MIN_NO = 1
MAX_NO = 1
NAME_FORMAT = NULL

Failing to extract xml value element using JDOM & Xpath

I have a method (getSingleNodeValue()) which when passed an xpatch expression will extract the value of the specified element in the xml document refered to in 'doc'. Assume doc at this point has been initialised as shown below and xmlInput is the buffer containing the xml content.
SAXBuilder builder = null;
Document doc = null;
XPath xpathInstance = null;
doc = builder.build(new StringReader(xmlInput));
When i call the method, i pass the following xpath xpression
/TOP4A/PERLODSUMDEC/TINPLD1/text()
Here is the method. It basically just takes an xml buffer and uses xpath to extract the value:
public static String getSingleNodeValue(String xpathExpr) throws Exception{
Text list = null;
try {
xpathInstance = XPath.newInstance(xpathExpr);
list = (Text) xpathInstance.selectSingleNode(doc);
} catch (JDOMException e) {
throw new Exception(e);
}catch (Exception e){
throw new Exception(e);
}
return list==null ? "?" : list.getText();
}
The above method always returns "?" i.e. nothing is found so 'list' is null.
The xml document it looks at is
<TOP4A xmlns="http://www.testurl.co.uk/enment/gqr/3232/1">
<HEAD>
<Doc>ABCDUK1234</Doc>
</HEAD>
<PERLODSUMDEC>
<TINPLD1>10109000000000000</TINPLD1>
</PERLODSUMDEC>
</TOP4A>
The same method works with other xml documents so i am not sure what is special about this one. There is no exception so the xml is valid xml. Its just that the method always sets 'list' to null. Any ideas?
Edit
Ok as suggested, here is a simple running program that demonstrates the above
import org.jdom.*;
import org.jdom.input.*;
import org.jdom.xpath.*;
import java.io.IOException;
import java.io.StringReader;
public class XpathTest {
public static String getSingleNodeValue(String xpathExpr, String xmlInput) throws Exception{
Text list = null;
SAXBuilder builder = null;
Document doc = null;
XPath xpathInstance = null;
try {
builder = new SAXBuilder();
doc = builder.build(new StringReader(xmlInput));
xpathInstance = XPath.newInstance(xpathExpr);
list = (Text) xpathInstance.selectSingleNode(doc);
} catch (JDOMException e) {
throw new Exception(e);
}catch (Exception e){
throw new Exception(e);
}
return list==null ? "Nothing Found" : list.getText();
}
public static void main(String[] args){
String xmlInput1 = "<TOP4A xmlns=\"http://www.testurl.co.uk/enment/gqr/3232/1\"><HEAD><Doc>ABCDUK1234</Doc></HEAD><PERLODSUMDEC><TINPLD1>10109000000000000</TINPLD1></PERLODSUMDEC></TOP4A>";
String xpathExpr = "/TOP4A/PERLODSUMDEC/TINPLD1/text()";
XpathTest xp = new XpathTest();
try {
System.out.println(xp.getSingleNodeValue(xpathExpr, xmlInput1));
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}
When i run the above, the output is
Nothing found
Edit
I have run some further testing and it appears that if i remove the namespace url it does work. Not sure why yet. Is there any way i can tell it to ignore the namespace?
Edit
Please also note that the above is implemented on JDK1.4.1 so i dont have the options for later version of the JDKs. This is the reason why i had to stick with Jdom.

The problem is with XML namespaces: your XPath query starts by selecting a 'TOP4A' element in the default namespace. Your XML file, however, has a 'TOP4A' element in the 'http://www.testurl.co.uk/enment/gqr/3232/1' namespace instead.
Is it an option to remove the xmlns from the XML?

XML validation in Java - why does this fail?

first time dealing with xml, so please be patient. the code below is probably evil in a million ways (I'd be very happy to hear about all of them), but the main problem is of course that it doesn't work :-)
public class Test {
private static final String JSDL_SCHEMA_URL = "http://schemas.ggf.org/jsdl/2005/11/jsdl";
private static final String JSDL_POSIX_APPLICATION_SCHEMA_URL = "http://schemas.ggf.org/jsdl/2005/11/jsdl-posix";
public static void main(String[] args) {
System.out.println(Test.createJSDLDescription("/bin/echo", "hello world"));
}
private static String createJSDLDescription(String execName, String args) {
Document jsdlJobDefinitionDocument = getJSDLJobDefinitionDocument();
String xmlString = null;
// create the elements
Element jobDescription = jsdlJobDefinitionDocument.createElement("JobDescription");
Element application = jsdlJobDefinitionDocument.createElement("Application");
Element posixApplication = jsdlJobDefinitionDocument.createElementNS(JSDL_POSIX_APPLICATION_SCHEMA_URL, "POSIXApplication");
Element executable = jsdlJobDefinitionDocument.createElement("Executable");
executable.setTextContent(execName);
Element argument = jsdlJobDefinitionDocument.createElement("Argument");
argument.setTextContent(args);
//join them into a tree
posixApplication.appendChild(executable);
posixApplication.appendChild(argument);
application.appendChild(posixApplication);
jobDescription.appendChild(application);
jsdlJobDefinitionDocument.getDocumentElement().appendChild(jobDescription);
DOMSource source = new DOMSource(jsdlJobDefinitionDocument);
validateXML(source);
try {
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
StreamResult result = new StreamResult(new StringWriter());
transformer.transform(source, result);
xmlString = result.getWriter().toString();
} catch (Exception e) {
e.printStackTrace();
}
return xmlString;
}
private static Document getJSDLJobDefinitionDocument() {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
try {
builder = factory.newDocumentBuilder();
} catch (Exception e) {
e.printStackTrace();
}
DOMImplementation domImpl = builder.getDOMImplementation();
Document theDocument = domImpl.createDocument(JSDL_SCHEMA_URL, "JobDefinition", null);
return theDocument;
}
private static void validateXML(DOMSource source) {
try {
URL schemaFile = new URL(JSDL_SCHEMA_URL);
Sche maFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
DOMResult result = new DOMResult();
validator.validate(source, result);
System.out.println("is valid");
} catch (Exception e) {
e.printStackTrace();
}
}
}
it spits out a somewhat odd message:
org.xml.sax.SAXParseException: cvc-complex-type.2.4.a: Invalid content was found starting with element 'JobDescription'. One of '{"http://schemas.ggf.org/jsdl/2005/11/jsdl":JobDescription}' is expected.
Where am I going wrong here?
Thanks a lot

I think you are missing the namespace on your elements. Rather than calling createElement(), you can try
document.createElementNS(JSDL_SCHEMA_URL, elementName)
If necessary, you may need to use a prefix, e.g.
document.createElementNS(JSDL_SCHEMA_URL, "jsdl:"+elementName)

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Parse XML escaped in CDATA mixed with invalid HTML - java

Related

How can I obtain a list of String with the values with XPath? [duplicate]

Transform XML to declare all namespaces on root element

How to use variables in XPath?

Failing to extract xml value element using JDOM & Xpath

XML validation in Java - why does this fail?

Categories

Resources