How to read < as < from an XML? [duplicate] - java

I am new to XML. I want to read the following XML on the basis of request name. Please help me on how to read the below XML in Java -
<?xml version="1.0"?>
<config>
<Request name="ValidateEmailRequest">
<requestqueue>emailrequest</requestqueue>
<responsequeue>emailresponse</responsequeue>
</Request>
<Request name="CleanEmail">
<requestqueue>Cleanrequest</requestqueue>
<responsequeue>Cleanresponse</responsequeue>
</Request>
</config>

If your XML is a String, then you can do the following:
String xml = ""; //Populated XML String....
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
Element rootElement = document.getDocumentElement();
If your XML is in a file, then Document document will be instantiated like this:
Document document = builder.parse(new File("file.xml"));
The document.getDocumentElement() returns you the node that is the document element of the document (in your case <config>).
Once you have the rootElement, you can access the element's attributes (by calling the rootElement.getAttribute() method), its child elements, and so on. See the Javadoc for org.w3c.dom.Element for more methods.
For more information, see the Javadoc for DocumentBuilder and DocumentBuilderFactory. Bear in mind that the example above builds an XML DOM tree in memory, so if you have a huge XML document, the tree can be huge.
Update: here's an example to get the value of the <requestqueue> element:
protected String getString(String tagName, Element element) {
NodeList list = element.getElementsByTagName(tagName);
if (list != null && list.getLength() > 0) {
NodeList subList = list.item(0).getChildNodes();
if (subList != null && subList.getLength() > 0) {
return subList.item(0).getNodeValue();
}
}
return null;
}
You can effectively call it as,
String requestQueueName = getString("requestqueue", element);
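Since the question asks for a lookup by request name, here is a minimal sketch of how the DOM calls above could be combined for that. The class name RequestConfigReader and the method findQueue are made up for this illustration; only the element and attribute names come from the XML in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class RequestConfigReader {

    // Returns the text of childTag inside the <Request> whose name attribute
    // equals requestName, or null if no such request or child exists.
    static String findQueue(String xml, String requestName, String childTag) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));
            NodeList requests = document.getElementsByTagName("Request");
            for (int i = 0; i < requests.getLength(); i++) {
                Element request = (Element) requests.item(i);
                if (requestName.equals(request.getAttribute("name"))) {
                    NodeList children = request.getElementsByTagName(childTag);
                    return children.getLength() > 0 ? children.item(0).getTextContent() : null;
                }
            }
            return null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\">"
                + "<requestqueue>emailrequest</requestqueue>"
                + "<responsequeue>emailresponse</responsequeue>"
                + "</Request>"
                + "<Request name=\"CleanEmail\">"
                + "<requestqueue>Cleanrequest</requestqueue>"
                + "<responsequeue>Cleanresponse</responsequeue>"
                + "</Request>"
                + "</config>";
        System.out.println(findQueue(xml, "CleanEmail", "responsequeue"));
    }
}
```

getTextContent() is used here instead of walking the child nodes by hand; for elements that contain only text the two approaches should give the same value.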

In case you just need one (first) value to retrieve from xml:
public static String getTagValue(String xml, String tagName){
return xml.split("<"+tagName+">")[1].split("</"+tagName+">")[0];
}
In case you want to parse whole xml document use JSoup:
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
for (Element e : doc.select("Request")) {
System.out.println(e);
}

If you are just looking to get a single value from the XML you may want to use Java's XPath library. For an example see my answer to a previous question:
How to use XPath on xml docs having default namespace
It would look something like:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public class Demo {
public static void main(String[] args) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
Node node = (Node) xPath.evaluate("/config/Request/@name", dDoc, XPathConstants.NODE);
System.out.println(node.getNodeValue());
} catch (Exception e) {
e.printStackTrace();
}
}
}
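To pick one request by its name with XPath, a predicate on the name attribute can be used. The following is a sketch under the assumption that the document has the <config>/<Request> layout from the question; the class and method names are invented for the example.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
class XPathQueueLookup {

    // Returns the <requestqueue> text of the Request with the given name,
    // or an empty string when nothing matches.
    static String requestQueueFor(String xml, String requestName) {
        XPath xPath = XPathFactory.newInstance().newXPath();
        // Bind $name through a variable resolver instead of concatenating it
        // into the expression, so quotes in the name cannot break the query.
        xPath.setXPathVariableResolver(
                variable -> "name".equals(variable.getLocalPart()) ? requestName : null);
        try {
            return xPath.evaluate(
                    "/config/Request[@name=$name]/requestqueue",
                    new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\">"
                + "<requestqueue>emailrequest</requestqueue>"
                + "</Request>"
                + "<Request name=\"CleanEmail\">"
                + "<requestqueue>Cleanrequest</requestqueue>"
                + "</Request>"
                + "</config>";
        System.out.println(requestQueueFor(xml, "ValidateEmailRequest"));
    }
}
```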

There are a number of different ways to do this. You might want to check out XStream or JAXB. Tutorials and examples are available for both.

If the XML is well formed then you can convert it to Document. By using the XPath you can get the XML Elements.
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
From the XML string, create a Document and find the elements using XPath.
Document doc = getDocument(xml, true);
public static Document getDocument(String xmlData, boolean isXMLData) throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
dbFactory.setIgnoringComments(true);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc;
if (isXMLData) {
InputSource ips = new org.xml.sax.InputSource(new StringReader(xmlData));
doc = dBuilder.parse(ips);
} else {
doc = dBuilder.parse( new File(xmlData) );
}
return doc;
}
Use org.apache.xpath.XPathAPI to get Node or NodeList.
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
public static String getNodeValue(Document doc, String xpathExpression) throws Exception {
Node node = org.apache.xpath.XPathAPI.selectSingleNode(doc, xpathExpression);
String nodeValue = node.getNodeValue();
return nodeValue;
}
public static NodeList getNodeList(Document doc, String xpathExpression) throws Exception {
NodeList result = org.apache.xpath.XPathAPI.selectNodeList(doc, xpathExpression);
return result;
}
Using javax.xml.xpath.XPathFactory
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
static XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
public static String getXPathFactoryValue(Document doc, String xpathExpression) throws XPathExpressionException, TransformerException, IOException {
Node node = (Node) xpath.evaluate(xpathExpression, doc, XPathConstants.NODE);
String nodeStr = getXmlContentAsString(node);
return nodeStr;
}
Using Document Element.
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
public static String getDocumentElementText(Document doc, String elementName) {
return doc.getElementsByTagName(elementName).item(0).getTextContent();
}
Get value in between two strings.
String nodeValue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeValue);
Full Example:
public static void main(String[] args) throws Exception {
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
Document doc = getDocument(xml, true);
String nodeValue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeValue);
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
}
public static String getXmlContentAsString(Node node) throws TransformerException, IOException {
StringBuilder stringBuilder = new StringBuilder();
NodeList childNodes = node.getChildNodes();
int length = childNodes.getLength();
for (int i = 0; i < length; i++) {
stringBuilder.append( toString(childNodes.item(i), true) );
}
return stringBuilder.toString();
}
Output:
StringUtils.substringBetween():30
DocumentElementText:30
javax.xml.xpath.XPathFactory:30
XPathAPI:30
XPathAPI NodeList:<stackusers>
<name>Yash</name>
<age>30</age>
</stackusers>
XPathAPI NodeList:<name>Yash</name><age>30</age>

The following links might help:
http://labe.felk.cvut.cz/~xfaigl/mep/xml/java-xml.htm
http://developerlife.com/tutorials/?p=25
http://www.java-samples.com/showtutorial.php?tutorialid=152

There are two general ways of doing that. The first is to build a Document Object Model (DOM) of the XML file.
The second is event-driven parsing (such as SAX), which is an alternative to the DOM representation of the XML. Of course there is much more to know about processing XML; for instance, if you are given an XML schema definition (XSD), you could use JAXB.

There are various APIs available to read/write XML files through Java.
I would prefer using StAX.
This may also be useful: Java XML APIs
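As a rough idea of what a StAX-based reader could look like for the configuration file in the question, here is a small sketch. It assumes the <Request name="..."> and <requestqueue> names from that XML; the class StaxConfigReader is invented for the illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
class StaxConfigReader {

    // Maps each Request name to the text of its <requestqueue> child.
    static Map<String, String> requestQueues(String xml) {
        Map<String, String> queues = new LinkedHashMap<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            String currentRequest = null;
            while (reader.hasNext()) {
                // Only start tags matter here; other events are skipped.
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    if ("Request".equals(reader.getLocalName())) {
                        currentRequest = reader.getAttributeValue(null, "name");
                    } else if ("requestqueue".equals(reader.getLocalName())
                            && currentRequest != null) {
                        // getElementText() consumes events up to the matching end tag.
                        queues.put(currentRequest, reader.getElementText());
                    }
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return queues;
    }

    public static void main(String[] args) {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\">"
                + "<requestqueue>emailrequest</requestqueue>"
                + "</Request>"
                + "<Request name=\"CleanEmail\">"
                + "<requestqueue>Cleanrequest</requestqueue>"
                + "</Request>"
                + "</config>";
        System.out.println(requestQueues(xml));
    }
}
```

Unlike DOM, this reads the document as a stream and keeps only the values it collects, which is the usual reason to prefer it for large files.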

You can make a class which extends org.xml.sax.helpers.DefaultHandler and call
start_<tag_name>(Attributes attrs);
and
end_<tag_name>();
For example, for a <request_queue> element it would call:
start_request_queue(attrs);
etc.
Then extend that class and implement the XML configuration file parsers you want. Example:
...
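public void startElement(String uri, String name, String qname,
        org.xml.sax.Attributes attrs)
        throws org.xml.sax.SAXException {
    // Parameter types of the handler method to look up: (String, Attributes)
    Class<?>[] args = new Class<?>[2];
    args[0] = String.class;
    args[1] = org.xml.sax.Attributes.class;
    try {
        // name is the local name; it may be empty unless the parser is
        // namespace-aware, in which case qname would have to be used instead.
        // Method names cannot contain '-', so strip it from the tag name.
        String mname = name.replace("-", "");
        // Look up start_<tag_name> on the concrete subclass and invoke it
        java.lang.reflect.Method m =
                getClass().getDeclaredMethod("start_" + mname, args);
        m.invoke(this, new Object[] { uri, attrs });
    }
    catch (IllegalAccessException e) {
        throw new RuntimeException(e);
    }
    catch (NoSuchMethodException e) {
        throw new RuntimeException(e);
    }
    catch (java.lang.reflect.InvocationTargetException e) {
        // Report the handler's own failure as a SAXException
        Throwable target = e.getTargetException();
        org.xml.sax.SAXException se =
                new org.xml.sax.SAXException(target.toString());
        se.setStackTrace(target.getStackTrace());
        throw se;
    }
}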
and in a particular configuration parser:
public void start_Request(String uri, org.xml.sax.Attributes attrs) {
// make sure to read attributes correctly
System.err.println("Request, name=" + attrs.getValue(0));
}
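If the reflection-based dispatch is more than you need, a plain DefaultHandler subclass may be enough. This is a sketch that only collects the name attribute of every <Request> element from the question's XML; the class name RequestNameHandler is made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of any library.
class RequestNameHandler extends DefaultHandler {

    final List<String> names = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // qName is compared because localName may be empty when the parser
        // is not namespace-aware (the factory default).
        if ("Request".equals(qName)) {
            names.add(attrs.getValue("name"));
        }
    }

    // Parses the XML string and returns the request names in document order.
    static List<String> requestNames(String xml) {
        RequestNameHandler handler = new RequestNameHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.names;
    }

    public static void main(String[] args) {
        String xml = "<config>"
                + "<Request name=\"ValidateEmailRequest\"/>"
                + "<Request name=\"CleanEmail\"/>"
                + "</config>";
        System.out.println(requestNames(xml));
    }
}
```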

Since you are using this for configuration, your best bet is apache commons-configuration. For simple files it's way easier to use than "raw" XML parsers.
See the XML how-to


Why won't my xpath work?

I have the following xml:
<?xml version="1.0" encoding="UTF-8"?>
<prefix:someName xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:prefix="someUri" xsi:schemaLocation="someLocation.xsd">
<prefix:someName2>
....
</prefix:someName2>
</prefix:someName>
And my code looks like this:
private Node doXpathThingy(Document doc) {
final DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new NamespaceContext(){
@Override
public String getNamespaceURI(String prefix) {
if (prefix == null) {
throw new NullPointerException("Null prefix");
}
return doc.lookupNamespaceURI(prefix);
}
@Override
public String getPrefix(String namespaceURI) {
return null;
}
@Override
public Iterator getPrefixes(String namespaceURI) {
return null;
}
});
try {
XPathExpression expr = xPath.compile(xpathString);
return (Node)expr.evaluate(doc, XPathConstants.NODESET);
} catch (Exception e) {
.... }
}
I'm trying to get this to work with any valid xpath. It works with these xpaths:
"prefix:someName"
"."
But NOT with: "prefix:someName2". It returns null.
I guess I'm still not getting something about namespaces, but I don't understand what? I've tried leaving out the prefixes from my xpath but then nothing works at all.
I've also checked if the correct uri is returned for the prefix at doc.lookupNamespaceURI(prefix), and it is.
Any help would be greatly appreciated.
The query prefix:XXX means child::prefix:XXX, that is, find an element child of the context node whose name is prefix:XXX. Your context node is the document node at the root of the tree. The document node has a child named prefix:someName, but it doesn't have a child named prefix:someName2. If you want to find a grandchild of the document node, try the query */prefix:someName2.
Can't say I'm familiar with the Java way of doing XPath, but it looks like you are making an XPath query from the root of the document, so what you are seeing is the expected behavior.
Try this to find someName2 anywhere in the doc
//prefix:someName2
or this to find it as the child of someName
/prefix:someName/prefix:someName2
or this to find it as the direct child of any root element
/*/prefix:someName2
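The difference between these expressions can be checked with a small Java sketch. It assumes a document shaped like the one in the question (namespace URI "someUri" bound to "prefix") and evaluates each expression against the document node; the class NamespaceXPathDemo and the method count are names invented for the illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical demo class, not part of any library.
class NamespaceXPathDemo {

    // Counts the nodes an expression selects, with the document node as context.
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            xPath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    // Only the one prefix used in the queries is mapped here.
                    return "prefix".equals(prefix) ? "someUri" : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String namespaceURI) {
                    return null;
                }
                @Override
                public Iterator<String> getPrefixes(String namespaceURI) {
                    return null;
                }
            });
            NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<prefix:someName xmlns:prefix=\"someUri\">"
                + "<prefix:someName2/>"
                + "</prefix:someName>";
        String[] expressions = {
            "prefix:someName", "prefix:someName2",
            "//prefix:someName2", "/prefix:someName/prefix:someName2", "/*/prefix:someName2"
        };
        for (String expression : expressions) {
            System.out.println(expression + " -> " + count(xml, expression));
        }
    }
}
```

Only the relative query prefix:someName2 selects nothing, because the document node has no child with that name.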

Parsing SOAP Response in Java

I do not succeed in parsing a SOAP Response in Java (using Bonita Open Solution BPM).
I have the following SOAP response (searching for a document in the IBM Content Manager; the SOAP Response returns 1 matching document)
<soapenv:Envelope xmlns:soapenv="http://www.w3.org/2003/05/soap-envelope" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<soapenv:Body>
<ns1:RunQueryReply xmlns="http://www.ibm.com/xmlns/db2/cm/beans/1.0/schema" xmlns:ns1="http://www.ibm.com/xmlns/db2/cm/beans/1.0/schema">
<ns1:RequestStatus success="true"></ns1:RequestStatus>
<ns1:ResultSet count="1">
<ns1:Item URI="http://xxxx/CMBSpecificWebService/CMBGetPIDUrl?pid=96 3 ICM8 ICMNLSDB16 ICCSPArchivSuche59 26 A1001001A12D18B30015E9357518 A12D18B30015E935751 14 1087&amp;server=ICMNLSDB&amp;dsType=ICM">
<ns1:ItemXML>
<ICCSPArchivSuche ICCCreatedBy="EBUSINESS\iccadmin" ICCCreatedDate="2012-04-18T10:51:26.000000" ICCFileName="Golem_Artikel.txt" ICCFolderPath="" ICCLastModifiedDate="2012-04-18T10:51:28.000000" ICCLibrary="Dokumente" ICCModifiedBy="EBUSINESS\iccadmin" ICCSharePointGUID="c43f9c93-a228-43f9-8232-06bdea4695d1" ICCSharePointVersion="1.0 " ICCSite="Archiv Suche" cm:PID="96 3 ICM8 ICMNLSDB16 ICCSPArchivSuche59 26 A1001001A12D18B30015E9357518 A12D18B30015E935751 14 1087" xmlns:cm="http://www.ibm.com/xmlns/db2/cm/api/1.0/schema">
<cm:properties type="document">
<cm:lastChangeUserid value="ICCCMADMIN"/>
<cm:lastChangeTime value="2012-04-18T11:00:15.914"/>
<cm:createUserid value="ICCCMADMIN"/>
<cm:createTime value="2012-04-18T11:00:15.914"/>
<cm:semanticType value="1"/>
<cm:ACL name="DocRouteACL"/>
<cm:lastOperation name="RETRIEVE" value="SUCCESS"/>
</cm:properties>
<cm:resourceObject CCSID="0" MIMEType="text/plain" RMName="rmdb" SMSCollName="CBR.CLLCT001" externalObjectName=" " originalFileName="" resourceFlag="2" resourceName=" " size="702" textSearchable="true" xsi:type="cm:TextObjectType">
<cm:URL value="http://cmwin01.ebusiness.local:9080/icmrm/ICMResourceManager/A1001001A12D18B30015E93575.txt?order=retrieve&amp;item-id=A1001001A12D18B30015E93575&amp;version=1&amp;collection=CBR.CLLCT001&amp;libname=icmnlsdb&amp;update-date=2012-04-18+11%3A00%3A15.001593&amp;token=A4E6.IcQyRE6_QbBPESDGxK2;&amp;content-length=0"/>
</cm:resourceObject>
</ICCSPArchivSuche>
</ns1:ItemXML>
</ns1:Item>
</ns1:ResultSet>
</ns1:RunQueryReply>
</soapenv:Body>
</soapenv:Envelope>
I would like to get the filename (ICCFileName="Golem_Artikel.txt") and the url to this file ( <cm:URL value="http://cmwin01.ebusiness.local:9080/icmrm/ICMResourceManager/A10...) in string Variables using Java. I read several articles on how to do this (Can't process SOAP response , How to do the Parsing of SOAP Response) but without success. Here is what I tried:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
// Clean response xml document
responseDocumentBody.normalizeDocument();
// Get result node
NodeList resultList = responseDocumentBody.getElementsByTagName("ICCSPArchivSuche");
Element resultElement = (Element) resultList.item(0);
String XMLData = resultElement.getTextContent();
// Check for empty result
if ("Data Not Found".equalsIgnoreCase(XMLData))
return null;
DocumentBuilder documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
InputSource inputSource = new InputSource();
inputSource.setCharacterStream(new StringReader(XMLData));
Document doc = documentBuilder.parse(inputSource);
Node node = doc.getDocumentElement();
String result = doc.getNodeType();
return result;
From Bonita, I only get responseDocumentBody or responseDocumentEnvelope (org.w3c.dom.Document) as webservice response. Therefore, I need to navigate from the SOAP Body to my variables. I would be pleased if someone could help.
Best regards
If you do a lot of work with this, I would definitely recommend using JAXB as MGoron suggests. If this is a one-shot exercise, XPath could also work well.
/*
* Must use a namespace aware factory
*/
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
Document doc = dbf.newDocumentBuilder().parse(...);
/*
* Create an XPath object
*/
XPath p = XPathFactory.newInstance().newXPath();
/*
* Must use a namespace context
*/
p.setNamespaceContext(new NamespaceContext() {
public Iterator getPrefixes(String namespaceURI) {
return null;
}
public String getPrefix(String namespaceURI) {
return null;
}
public String getNamespaceURI(String prefix) {
if (prefix.equals("ns1"))
return "http://www.ibm.com/xmlns/db2/cm/beans/1.0/schema";
if (prefix.equals("cm"))
return "http://www.ibm.com/xmlns/db2/cm/api/1.0/schema";
return null;
}
});
/*
* Find the ICCFileName attribute
*/
Node iccsFileName = (Node) p.evaluate("//ns1:ICCSPArchivSuche/@ICCFileName", doc, XPathConstants.NODE);
System.out.println(iccsFileName.getNodeValue());
/*
* Find the URL
*/
Node url = (Node) p.evaluate("//ns1:ICCSPArchivSuche/cm:resourceObject/cm:URL/@value", doc, XPathConstants.NODE);
System.out.println(url.getNodeValue());
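If you would rather avoid XPath, the same two values can be read with plain DOM calls. This is a sketch that assumes the org.w3c.dom.Document was built by a namespace-aware parser (whether the Document handed over by Bonita is, I cannot say); SoapResultReader and its methods are hypothetical names, and the sample response is shortened with a placeholder URL.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class SoapResultReader {

    static final String BEANS_NS = "http://www.ibm.com/xmlns/db2/cm/beans/1.0/schema";
    static final String CM_NS = "http://www.ibm.com/xmlns/db2/cm/api/1.0/schema";

    // Returns {fileName, url}, or null when the response has no result element.
    static String[] fileNameAndUrl(Document response) {
        // ICCSPArchivSuche has no prefix but inherits the default namespace.
        Element result = (Element) response
                .getElementsByTagNameNS(BEANS_NS, "ICCSPArchivSuche").item(0);
        if (result == null) {
            return null;
        }
        Element url = (Element) result.getElementsByTagNameNS(CM_NS, "URL").item(0);
        return new String[] {
            result.getAttribute("ICCFileName"),
            url == null ? null : url.getAttribute("value")
        };
    }

    // Demo helper: parses a string with namespace awareness switched on.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Shortened response; the URL is a placeholder, not the real one.
        String xml = "<soapenv:Envelope xmlns:soapenv=\"http://www.w3.org/2003/05/soap-envelope\">"
                + "<soapenv:Body>"
                + "<ns1:RunQueryReply xmlns=\"" + BEANS_NS + "\" xmlns:ns1=\"" + BEANS_NS + "\">"
                + "<ns1:ItemXML>"
                + "<ICCSPArchivSuche ICCFileName=\"Golem_Artikel.txt\" xmlns:cm=\"" + CM_NS + "\">"
                + "<cm:resourceObject><cm:URL value=\"http://example.invalid/file.txt\"/></cm:resourceObject>"
                + "</ICCSPArchivSuche>"
                + "</ns1:ItemXML>"
                + "</ns1:RunQueryReply>"
                + "</soapenv:Body>"
                + "</soapenv:Envelope>";
        String[] values = fileNameAndUrl(parse(xml));
        System.out.println(values[0] + " " + values[1]);
    }
}
```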
1. Get the RunQueryReply schema.
2. Map the XSD to Java classes using JAXB.
3. Unmarshal the response string into a JAXB class object.
Below is the code to do this in VTD-XML. It basically consists of two XPath queries, each returning one result. The code is still robust, however, because it doesn't assume that those queries return a non-empty result.
import com.ximpleware.*;
public class parseSOAP {
public static void main(String[] s) throws VTDException, Exception{
VTDGen vg = new VTDGen();
vg.selectLcDepth(5);// soap has deep nesting so set to 5 to speed up navigation
if (!vg.parseFile("d:\\xml\\soap2.xml", true))
return;
VTDNav vn = vg.getNav();
AutoPilot ap =new AutoPilot(vn);
//declare name space for xpath
ap.declareXPathNameSpace("ns", "http://www.ibm.com/xmlns/db2/cm/beans/1.0/schema");
ap.declareXPathNameSpace("ns1", "http://www.ibm.com/xmlns/db2/cm/beans/1.0/schema");
ap.declareXPathNameSpace("cm", "http://www.ibm.com/xmlns/db2/cm/api/1.0/schema");
ap.declareXPathNameSpace("soapenv", "http://www.w3.org/2003/05/soap-envelope");
ap.selectXPath("/soapenv:Envelope/soapenv:Body/ns1:RunQueryReply/ns1:ResultSet/ns1:Item/ns1:ItemXML//ICCSPArchivSuche/@ICCFileName");
int i=0;
if ((i=ap.evalXPath())!=-1){
System.out.println("file name ==>"+vn.toString(i+1));
}
ap.selectXPath("/soapenv:Envelope/soapenv:Body/ns1:RunQueryReply/ns1:ResultSet/ns1:Item/ns1:ItemXML//ICCSPArchivSuche/cm:resourceObject/cm:URL/@value");
if ((i=ap.evalXPath())!=-1){
System.out.println("url ==>"+vn.toString(i+1));
}
}
}

Cannot return an org.w3c.dom.Document by web-service method

I am trying to return an XML Document Object from a java axis2 web service. When I am trying to get the Document object on the client side, it gives me these exceptions.
org.apache.axis2.AxisFault: org.apache.axis2.AxisFault: Mapping qname not fond for the package: com.sun.org.apache.xerces.internal.dom
at org.apache.axis2.util.Utils.getInboundFaultFromMessageContext(Utils.java:531)
at org.apache.axis2.description.OutInAxisOperationClient.handleResponse(OutInAxisOperation.java:375)
at org.apache.axis2.description.OutInAxisOperationClient.send(OutInAxisOperation.java:421)
at org.apache.axis2.description.OutInAxisOperationClient.executeImpl(OutInAxisOperation.java:229)
at org.apache.axis2.client.OperationClient.execute(OperationClient.java:165)
at com.turnkey.DataCollectorStub.getData(DataCollectorStub.java:194)
at com.turnkey.TestClient.main(TestClient.java:28)
Can I not return the Document object from a webservice ??
This service does return the XML string though.
Below is the pseudo code for the method I am using
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
public Document getData(args)
{
String xmlSource = "/*XML string*/";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document xmlDoc = builder.parse(new InputSource(new StringReader(xmlSource)));
return xmlDoc;
}
BTW, this method works fine on the server side, But on the client side I cannot receive the Document object
Can anybody please help me.
The simple way is not to use Document as the return value, because Axis2 cannot find a suitable import for it in the schema. If you generate the WSDL every time, you would have to add an import for org.w3c.dom.Document to the WSDL schema, which is an inconvenient solution. That's why, in my view, the best way is to return a specific entity:
public Credit[] getCreditList(){
Credit[] credits = null;
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = factory.newDocumentBuilder();
Document xmlDoc = documentBuilder.parse(XML_REAL_PATH);
Element root = xmlDoc.getDocumentElement();
List<Credit> creditsList = new ArrayList<>();
NodeList creditNodes = root.getElementsByTagName(CREDIT);
int countCreditNodes = creditNodes.getLength();
for (int i = 0; i < countCreditNodes; i++) {
Element creditElement = (Element) creditNodes.item(i);
Credit credit = new Credit();
Element child = (Element) creditElement.getElementsByTagName(OWNER).item(0);
String owner = child.getFirstChild().getNodeValue();
credit.setOwner(owner);
//...
creditsList.add(credit);
}
credits = creditsList.toArray(new Credit[creditsList.size()]);
} catch (SAXException | IOException | ParserConfigurationException ex) {
Logger.getLogger(CreditPayService.class.getName()).log(Level.SEVERE, null, ex);
}
return credits;
}
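Since the question mentions that returning the XML as a string already works, another option is to serialize the Document on the server and parse it again on the client. The following is only a sketch of that round trip using the JDK's Transformer API; DocumentStrings and its method names are made up for the illustration and say nothing about how Axis2 itself handles types.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class DocumentStrings {

    // Server side: turn the DOM tree into a String that can be returned.
    static String toXmlString(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    // Client side: rebuild a Document from the returned String.
    static Document fromXmlString(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = fromXmlString("<credits><credit><owner>Ann</owner></credit></credits>");
        String xml = toXmlString(doc);
        Document again = fromXmlString(xml);
        System.out.println(again.getDocumentElement().getTagName());
    }
}
```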

How to skip well-formed for java DOM parser

I know this has been asked multiple times here, but I've a different issue dealing with it. In my case, the app receives a non well-formed dom structure passed as a string. Here's a sample :
<div class='video yt'><div class='yt_url'>http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata</div></div>
As you can see, the content is not well-formed. Now, if I try to parse using a normal SAX or DOM parse it'll throw an exception which is understood.
org.xml.sax.SAXParseException: The reference to entity "feature" must end with the ';' delimiter.
As per the requirement, I need to read this document,add few additional div tags and send the content back as a string. This works great by using a DOM parser as I can read through the input structure and add additional tags at their required position.
I tried using tools like JTidy to do a pre-processing and then parse, but that results in converting the document to a fully-blown html, which I don't want. Here's a sample code :
StringWriter writer = new StringWriter();
Tidy tidy = new Tidy(); // obtain a new Tidy instance
tidy.setXHTML(true);
tidy.parse(new ByteArrayInputStream(content.getBytes()), writer);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new ByteArrayInputStream(writer.toString().getBytes()));
// Traverse thru the content and add new tags
....
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
This completely converts the input to a well-formed html document. It then becomes hard to remove html tags manually. The other option I tried was to use SAX2DOM, which too creates a HTML doc. Here's a sample code .
ByteArrayInputStream is = new ByteArrayInputStream(content.getBytes());
Parser p = new Parser();
p.setFeature(IContentExtractionConstant.SAX_NAMESPACE,true);
SAX2DOM sax2dom = new SAX2DOM();
p.setContentHandler(sax2dom);
p.parse(new InputSource(is));
Document doc = (Document)sax2dom.getDOM();
I'll appreciate if someone can share their ideas.
Thanks
The simplest way is replacing xml reserved characters with the corresponding xml entities. You can do this manually:
content = content.replaceAll("&", "&amp;");
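A plain replacement like this would also rewrite ampersands that are already escaped. One possible refinement, shown here as a sketch, is a regular expression with a negative lookahead that leaves anything shaped like an entity reference untouched. The class name AmpersandEscaper is invented for the example, and the pattern is an assumption about what "already escaped" should mean: named references such as &nbsp; are left alone even though a plain XML parser does not know them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of any library.
class AmpersandEscaper {

    // Matches an ampersand that does not start a named, decimal or
    // hexadecimal reference such as &amp; &#38; or &#x26;
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static String escapeBareAmpersands(String content) {
        return BARE_AMPERSAND.matcher(content).replaceAll("&amp;");
    }

    public static void main(String[] args)
            throws ParserConfigurationException, SAXException, IOException {
        String content = "<div class='video yt'><div class='yt_url'>"
                + "http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata"
                + "</div></div>";
        String fixed = escapeBareAmpersands(content);
        // After escaping, a normal DOM parser accepts the fragment.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(fixed)));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```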
If you don't want to modify your string before parsing it, I could propose you another way using SaxParser, but this solution is more complicated. Basically you have to:
- write a LexicalHandler in combination with ContentHandler
- tell the parser to continue its execution after a fatal error (the ErrorHandler isn't enough)
- treat undeclared entities as simple text
UPDATE
According to your comment, I'm going to add some details regarding the second solution. I've written a class which extends DefaultHandler (the default implementation of EntityResolver, DTDHandler, ContentHandler and ErrorHandler) and implements LexicalHandler. I've overridden ErrorHandler's fatalError method (my implementation does nothing instead of throwing the exception) and ContentHandler's characters method, which works in combination with the startEntity method of LexicalHandler.
public class MyHandler extends DefaultHandler implements LexicalHandler {
    // Name of the entity most recently reported by startEntity
    private String currentEntity = null;
    @Override
    public void fatalError(SAXParseException e) throws SAXException {
        // Intentionally empty: swallow the error instead of rethrowing it
    }
    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        String content = new String(ch, start, length);
        if (currentEntity != null) {
            // Put the undeclared entity back into the text as "&name"
            content = "&" + currentEntity + content;
            currentEntity = null;
        }
        System.out.print(content);
    }
    @Override
    public void startEntity(String name) throws SAXException {
        currentEntity = name;
    }
    // The remaining LexicalHandler callbacks are not needed here
    @Override
    public void endEntity(String name) throws SAXException {
    }
    @Override
    public void startDTD(String name, String publicId, String systemId)
            throws SAXException {
    }
    @Override
    public void endDTD() throws SAXException {
    }
    @Override
    public void startCDATA() throws SAXException {
    }
    @Override
    public void endCDATA() throws SAXException {
    }
    @Override
    public void comment(char[] ch, int start, int length) throws SAXException {
    }
}
This is my main method, which parses your not well-formed XML. The setFeature call is very important: without it the parser throws the SAXParseException despite the empty ErrorHandler implementation.
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException {
String xml = "<div class='video yt'><div class='yt_url'>http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata</div></div>";
SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
MyHandler myHandler = new MyHandler();
xmlReader.setContentHandler(myHandler);
xmlReader.setErrorHandler(myHandler);
xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler",
myHandler);
xmlReader.setFeature(
"http://apache.org/xml/features/continue-after-fatal-error",
true);
xmlReader.parse(new InputSource(new StringReader(xml)));
}
This main prints out the content of your div element which contains the error:
http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata
Keep in mind that this is an example which works with your input, maybe you'll have to complete it...for instance if you have some characters correctly escaped you should add some lines of code to handle this situation etc.
Hope this helps.

convert nu.XOM.Element to org.w3c.dom.Element

Is it possible to convert nu.XOM.Element to org.w3c.dom.Element?
I am trying to construct XML using the XOM APIs, but a few of my legacy APIs expect org.w3c.dom.Element. So I just want to know if I can convert.
Thank You :)
There is the nu.xom.converters.DOMConverter class, which provides a way of translating an entire XOM document into a corresponding DOM document, but you can't do it for individual elements, probably because a W3C Element can't exist without a parent Document.
XOM Document:
final nu.xom.Element root = new nu.xom.Element("root");
root.appendChild("Hello World!");
final nu.xom.Document xomDoc = new nu.xom.Document(root);
using DOMConverter:
final DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
final DocumentBuilder builder = factory.newDocumentBuilder();
final DOMImplementation impl = builder.getDOMImplementation();
final Document w3cDoc= DOMConverter.convert(xomDoc, impl);
public static org.w3c.dom.Document xomToDom(Element elem) {
try {
elem = (Element)elem.copy();
return
DOMConverter.convert(new Document(elem),
DocumentBuilderFactory.newInstance().newDocumentBuilder().getDOMImplementation());
} catch (ParserConfigurationException e) {
throw new RuntimeException(e);
}
}
