convert nu.XOM.Element to org.w3c.dom.Element - java

Is it possible to convert nu.XOM.Element to org.w3c.dom.Element?
Am trying to construct XML using XOM APIs. But few of my legacy APIs expects org.w3c.dom.Element. So, I just want to know if I can convert.
Thank You :)

There is the nu.xom.converters.DOMConverter class, which provides a way of translating an entire XOM document into a corresponding DOM document, but you can't do it for individual elements, probably because a W3C Element can't exist without a parent Document.

XOM Document:
final nu.xom.Element root = new nu.xom.Element("root");
root.appendChild("Hello World!");
final nu.xom.Document xomDoc = new nu.xom.Document(root);
using DOMConverter:
final DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
final DocumentBuilder builder = factory.newDocumentBuilder();
final DOMImplementation impl = builder.getDOMImplementation();
final Document w3cDoc= DOMConverter.convert(xomDoc, impl);

public static org.w3c.dom.Document xomToDom(Element elem) {
try {
elem = (Element)elem.copy();
return
DOMConverter.convert(new Document(elem),
DocumentBuilderFactory.newInstance().newDocumentBuilder().getDOMImplementation());
} catch (ParserConfigurationException e) {
throw new RuntimeException(e);
}
}

Related

How to read < as < from an XML? [duplicate]

I am new to XML. I want to read the following XML on the basis of request name. Please help me on how to read the below XML in Java -
<?xml version="1.0"?>
<config>
<Request name="ValidateEmailRequest">
<requestqueue>emailrequest</requestqueue>
<responsequeue>emailresponse</responsequeue>
</Request>
<Request name="CleanEmail">
<requestqueue>Cleanrequest</requestqueue>
<responsequeue>Cleanresponse</responsequeue>
</Request>
</config>
If your XML is a String, Then you can do the following:
String xml = ""; //Populated XML String....
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(xml)));
Element rootElement = document.getDocumentElement();
If your XML is in a file, then Document document will be instantiated like this:
Document document = builder.parse(new File("file.xml"));
The document.getDocumentElement() returns you the node that is the document element of the document (in your case <config>).
Once you have a rootElement, you can access the element's attribute (by calling rootElement.getAttribute() method), etc. For more methods on java's org.w3c.dom.Element
More info on java DocumentBuilder & DocumentBuilderFactory. Bear in mind, the example provided creates a XML DOM tree so if you have a huge XML data, the tree can be huge.
Related question.
Update Here's an example to get "value" of element <requestqueue>
protected String getString(String tagName, Element element) {
NodeList list = element.getElementsByTagName(tagName);
if (list != null && list.getLength() > 0) {
NodeList subList = list.item(0).getChildNodes();
if (subList != null && subList.getLength() > 0) {
return subList.item(0).getNodeValue();
}
}
return null;
}
You can effectively call it as,
String requestQueueName = getString("requestqueue", element);
In case you just need one (first) value to retrieve from xml:
public static String getTagValue(String xml, String tagName){
return xml.split("<"+tagName+">")[1].split("</"+tagName+">")[0];
}
In case you want to parse whole xml document use JSoup:
Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
for (Element e : doc.select("Request")) {
System.out.println(e);
}
If you are just looking to get a single value from the XML you may want to use Java's XPath library. For an example see my answer to a previous question:
How to use XPath on xml docs having default namespace
It would look something like:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class Demo {
public static void main(String[] args) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
Node node = (Node) xPath.evaluate("/Request/#name", dDoc, XPathConstants.NODE);
System.out.println(node.getNodeValue());
} catch (Exception e) {
e.printStackTrace();
}
}
}
There are a number of different ways to do this. You might want to check out XStream or JAXB. There are tutorials and the examples.
If the XML is well formed then you can convert it to Document. By using the XPath you can get the XML Elements.
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
Form XML-String Create Document and find the elements using its XML-Path.
Document doc = getDocument(xml, true);
public static Document getDocument(String xmlData, boolean isXMLData) throws Exception {
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
dbFactory.setNamespaceAware(true);
dbFactory.setIgnoringComments(true);
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc;
if (isXMLData) {
InputSource ips = new org.xml.sax.InputSource(new StringReader(xmlData));
doc = dBuilder.parse(ips);
} else {
doc = dBuilder.parse( new File(xmlData) );
}
return doc;
}
Use org.apache.xpath.XPathAPI to get Node or NodeList.
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
public static String getNodeValue(Document doc, String xpathExpression) throws Exception {
Node node = org.apache.xpath.XPathAPI.selectSingleNode(doc, xpathExpression);
String nodeValue = node.getNodeValue();
return nodeValue;
}
public static NodeList getNodeList(Document doc, String xpathExpression) throws Exception {
NodeList result = org.apache.xpath.XPathAPI.selectNodeList(doc, xpathExpression);
return result;
}
Using javax.xml.xpath.XPathFactory
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
static XPath xpath = javax.xml.xpath.XPathFactory.newInstance().newXPath();
public static String getXPathFactoryValue(Document doc, String xpathExpression) throws XPathExpressionException, TransformerException, IOException {
Node node = (Node) xpath.evaluate(xpathExpression, doc, XPathConstants.NODE);
String nodeStr = getXmlContentAsString(node);
return nodeStr;
}
Using Document Element.
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
public static String getDocumentElementText(Document doc, String elementName) {
return doc.getElementsByTagName(elementName).item(0).getTextContent();
}
Get value in between two strings.
String nodeVlaue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeVlaue);
Full Example:
public static void main(String[] args) throws Exception {
String xml = "<stackusers><name>Yash</name><age>30</age></stackusers>";
Document doc = getDocument(xml, true);
String nodeVlaue = org.apache.commons.lang.StringUtils.substringBetween(xml, "<age>", "</age>");
System.out.println("StringUtils.substringBetween():"+nodeVlaue);
System.out.println("DocumentElementText:"+getDocumentElementText(doc, "age"));
System.out.println("javax.xml.xpath.XPathFactory:"+getXPathFactoryValue(doc, "/stackusers/age"));
System.out.println("XPathAPI:"+getNodeValue(doc, "/stackusers/age/text()"));
NodeList nodeList = getNodeList(doc, "/stackusers");
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList));
System.out.println("XPathAPI NodeList:"+ getXmlContentAsString(nodeList.item(0)));
}
public static String getXmlContentAsString(Node node) throws TransformerException, IOException {
StringBuilder stringBuilder = new StringBuilder();
NodeList childNodes = node.getChildNodes();
int length = childNodes.getLength();
for (int i = 0; i < length; i++) {
stringBuilder.append( toString(childNodes.item(i), true) );
}
return stringBuilder.toString();
}
OutPut:
StringUtils.substringBetween():30
DocumentElementText:30
javax.xml.xpath.XPathFactory:30
XPathAPI:30
XPathAPI NodeList:<stackusers>
<name>Yash</name>
<age>30</age>
</stackusers>
XPathAPI NodeList:<name>Yash</name><age>30</age>
following links might help
http://labe.felk.cvut.cz/~xfaigl/mep/xml/java-xml.htm
http://developerlife.com/tutorials/?p=25
http://www.java-samples.com/showtutorial.php?tutorialid=152
There are two general ways of doing that. You will either create a Domain Object Model of that XML file, take a look at this
and the second choice is using event driven parsing, which is an alternative to DOM xml representation. Imho you can find the best overall comparison of these two basic techniques here. Of course there are much more to know about processing xml, for instance if you are given XML schema definition (XSD), you could use JAXB.
There are various APIs available to read/write XML files through Java.
I would refer using StaX
Also This can be useful - Java XML APIs
You can make a class which extends org.xml.sax.helpers.DefaultHandler and call
start_<tag_name>(Attributes attrs);
and
end_<tag_name>();
For it is:
start_request_queue(attrs);
etc.
And then extends that class and implement xml configuration file parsers you want. Example:
...
public void startElement(String uri, String name, String qname,
org.xml.sax.Attributes attrs)
throws org.xml.sax.SAXException {
Class[] args = new Class[2];
args[0] = uri.getClass();
args[1] = org.xml.sax.Attributes.class;
try {
String mname = name.replace("-", "");
java.lang.reflect.Method m =
getClass().getDeclaredMethod("start" + mname, args);
m.invoke(this, new Object[] { uri, (org.xml.sax.Attributes)attrs });
}
catch (IllegalAccessException e) {
throw new RuntimeException(e);
}
catch (NoSuchMethodException e) {
throw new RuntimeException(e); }
catch (java.lang.reflect.InvocationTargetException e) {
org.xml.sax.SAXException se =
new org.xml.sax.SAXException(e.getTargetException());
se.setStackTrace(e.getTargetException().getStackTrace());
}
and in a particular configuration parser:
public void start_Request(String uri, org.xml.sax.Attributes attrs) {
// make sure to read attributes correctly
System.err.println("Request, name="+ attrs.getValue(0);
}
Since you are using this for configuration, your best bet is apache commons-configuration. For simple files it's way easier to use than "raw" XML parsers.
See the XML how-to

Unmarshall xml using Document instead of StreamSource

I previously unmarshaled my documents with this Method from "javax.xml.bind.Unmarshaller" (i am using Moxy as jaxb provider):
<Object> JAXBElement<Object> javax.xml.bind.Unmarshaller.unmarshal(Source source, Class<Object> declaredType) throws JAXBException
Now i have to refactor my code to use
<?> JAXBElement<?> javax.xml.bind.Unmarshaller.unmarshal(Node node, Class<?> declaredType) throws JAXBException
The problem is that i get an unmarshal exception:
Caused by: javax.xml.bind.UnmarshalException
- with linked exception:
[Exception [EclipseLink-25004] (Eclipse Persistence Services - 2.4.0.v20120608-r11652): org.eclipse.persistence.exceptions.XMLMarshalException
Exception Description: An error occurred unmarshalling the document
Internal Exception: org.xml.sax.SAXParseExceptionpublicId: ; systemId: ; lineNumber: 0; columnNumber: 0; cvc-elt.1: Cannot find the declaration of element 'invoice:request'.]
i tried unmarshal with Document and with Document.getDocumentElement(). The data in document is the same as in InputStream. Document is created this way:
protected static Document extractDocument(InputStream sourceData) {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
Document doc;
try {
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
doc = docBuilder.parse(new InputSource(sourceData));
} catch (Exception e) {
throw new IllegalArgumentException("Problem on reading xml, cause: ", e);
}
return doc;
}
I need the "intermediate" Document to do the type recognition/ to get the second argument for unmarshal.
so how to use the unmarshal method with Node/Document that the outcome is the same as used with the inputstream?
Edit for rick
I am unmarshalling xml data from here xsds are in this zip and examples are here.
i wrote a simple test (jaxb provider is moxy, it was also used to generate the binding classes):
#Test
public void test() throws Exception {
Unmarshaller um;
JAXBContext jaxbContext = JAXBContext
.newInstance("....generatedClasses.invoice.general430.request");
um = jaxbContext.createUnmarshaller();
InputStream xmlIs = SingleTests.class.getResourceAsStream("/430_requests/dentist_ersred_TG_430.xml");
SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = sf.newSchema(this.getClass().getResource("/xsd/" + "generalInvoiceRequest_430.xsd"));
// start
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
Document doc;
try {
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
doc = docBuilder.parse(new InputSource(xmlIs));
} catch (Exception e) {
throw new Exception("Problem on recognizing invoice type of given xml, cause: ", e);
}
// end
um.setSchema(schema);
um.unmarshal(doc, ....generatedClasses.invoice.general430.request.RequestType.class);
}
This is not working with the error described above. But as soon as deleted the "start end block" and unmarshall the stream xmlIs directly (not using the Document) the code works.
Ok i asked the same Question here the answer is:
simple add
docBuilderFactory.setNamespaceAware(true);
The following code works for me with a simple test model and gives the same result in all three cases:
JAXBContext ctx = JAXBContext.newInstance(new Class[] { TestObject.class }, null);
Source source = new StreamSource("src/stack16038604/instance.xml");
JAXBElement o1 = ctx.createUnmarshaller().unmarshal(source, TestObject.class);
System.out.println(o1.getValue());
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
InputStream inputStream2 = ClassLoader.getSystemResourceAsStream("stack16038604/instance.xml");
Document node = builderFactory.newDocumentBuilder().parse(inputStream2);
JAXBElement o2 = ctx.createUnmarshaller().unmarshal(node, TestObject.class);
System.out.println(o2.getValue());
DocumentBuilderFactory builderFactory2 = DocumentBuilderFactory.newInstance();
InputStream inputStream3 = ClassLoader.getSystemResourceAsStream("stack16038604/instance.xml");
InputSource inputSource = new InputSource(inputStream3);
Document node2 = builderFactory2.newDocumentBuilder().parse(inputSource);
JAXBElement o3 = ctx.createUnmarshaller().unmarshal(node2, TestObject.class);
System.out.println(o3.getValue());
If you provided a systemId in the Source use-case, then you should use the same systemId on the InputSource you create:
inputSource.setSystemId(sameIdAsYouUsedForSource);
If you are still having problems could you post the XML that you are trying to unmarshal? Also if your JAXB objects were generated from an XML Schema that could provide useful information as well.
Hope this helps,
Rick

Cannot return an org.w3c.dom.Document by web-service method

I am trying to return an XML Document Object from a java axis2 web service. When I am trying to get the Document object on the client side, it gives me these exceptions.
org.apache.axis2.AxisFault: org.apache.axis2.AxisFault: Mapping qname not fond for the package: com.sun.org.apache.xerces.internal.dom
at org.apache.axis2.util.Utils.getInboundFaultFromMessageContext(Utils.java:531)
at org.apache.axis2.description.OutInAxisOperationClient.handleResponse(OutInAxisOperation.java:375)
at org.apache.axis2.description.OutInAxisOperationClient.send(OutInAxisOperation.java:421)
at org.apache.axis2.description.OutInAxisOperationClient.executeImpl(OutInAxisOperation.java:229)
at org.apache.axis2.client.OperationClient.execute(OperationClient.java:165)
at com.turnkey.DataCollectorStub.getData(DataCollectorStub.java:194)
at com.turnkey.TestClient.main(TestClient.java:28)
Can I not return the Document object from a webservice ??
This service does return the XML string though.
Below is the pseudo code for the method I am using
import java.io.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.*;
public Document getData(args)
{
String xmlSource = "/*XML string*/";
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document xmlDoc = builder.parse(new InputSource(new StringReader(xmlSource)));
return xmlDoc;
}
BTW, this method works fine on the server side, But on the client side I cannot receive the Document object
Can anybody please help me.
Simple way doesn't use Document as return value, because axis2 cannot find suitable import in schema. If you generate wsdl every time you should add import org.w3c.dom.Document to wsdl schema (it is a inconvenient solution). That's why the best way in my point of view return specific entity
public Credit[] getCreditList(){
Credit[] credits = null;
try {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = factory.newDocumentBuilder();
Document xmlDoc = documentBuilder.parse(XML_REAL_PATH);
Element root = xmlDoc.getDocumentElement();
List<Credit> creditsList = new ArrayList<>();
NodeList creditNodes = root.getElementsByTagName(CREDIT);
int countCreditNodes = creditNodes.getLength();
for (int i = 0; i < countCreditNodes; i++) {
Element creditElement = (Element) creditNodes.item(i);
Credit credit = new Credit();
Element child = (Element) creditElement.getElementsByTagName(OWNER).item(0);
String owner = child.getFirstChild().getNodeValue();
credit.setOwner(owner);
//...
creditsList.add(credit);
}
credits = creditsList.toArray(new Credit[creditsList.size()]);
} catch (SAXException | IOException | ParserConfigurationException ex) {
Logger.getLogger(CreditPayService.class.getName()).log(Level.SEVERE, null, ex);
}
return credits;
}

XML with namespace Validation with XSD throws exception

I have a xml file which is a response from Webservice.It has got various namespaces involved with it. When I try to validate it with appropriate XSD its throwing "org.xml.sax.SAXParseException: cvc-elt.1: Cannot find the declaration of element 'SOAP-ENV:Envelope'." The namespace declaration for all the namespaces are declared in the response.
Following is my code
try {
DocumentBuilderFactory xmlFact = DocumentBuilderFactory.newInstance();
SchemaFactory schemaFactory = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
SAXSource mainInputStream = new SAXSource(new InputSource(new FileInputStream(new File("FIXEDINCOME_v3_0.xsd"))));
SAXSource importInputStream1 =new SAXSource(new InputSource(new FileInputStream(new File("Rating.xsd"))));
SAXSource importInputStream2 = new SAXSource(new InputSource(new FileInputStream(new File("Datatypes.xsd"))));
Source[] sourceSchema = new SAXSource[]{mainInputStream , importInputStream1, importInputStream2};
Schema schema = schemaFactory.newSchema(sourceSchema);
xmlFact.setNamespaceAware(true);
xmlFact.setSchema(schema);
DocumentBuilder builder = xmlFact.newDocumentBuilder();
xmlDOC = builder.parse(new InputSource(new StringReader(inputXML)));
NamespaceContext ctx = new NamespaceContext() {
public String getNamespaceURI(String prefix) {
String uri;
if (prefix.equals("ns0"))
uri = "http://namespace.worldnet.ml.com/EDS/Standards/Common/Service_v1_0/";
else if (prefix.equals("ns1"))
uri = "http://namespace.worldnet.ml.com/EDS/Product/Services/Get_Product_Data_Svc_v3_0/";
else if (prefix.equals("ns2"))
uri = "http://namespace.worldnet.ml.com/DataSOA/Product/Objects/FixedIncome/FixedIncome_v3_0/";
else if (prefix.equals("ns3")) {
uri = "http://namespace.worldnet.ml.com/DataSOA/Product/Objects/Rating/Rating_v2_0/";
} else if (prefix.equals("SOAP-ENV")) {
uri = "http://schemas.xmlsoap.org/soap-envelope/";
} else
uri = null;
return uri;
}
// Dummy implementation - not used!
public Iterator getPrefixes(String val) {
return null;
}
// Dummy implemenation - not used!
public String getPrefix(String uri) {
return null;
}
};
XPathFactory xpathFact = XPathFactory.newInstance();
xPath = xpathFact.newXPath();
xPath.setNamespaceContext(ctx);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
I don't think the problem is with detecting the namespace definition for the SOAP-ENV prefix. The validator needs the XSD file that defines elements used in that namespace in order to validate the SOAP-ENV:Envelope element, so you need to tell the validator where that schema is located.
I think the solution is either to add the following to your XML reponse:
<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://schemas.xmlsoap.org/soap/envelope/"
xsi:schemaLocation="http://schemas.xmlsoap.org/soap/envelope/
http://schemas.xmlsoap.org/soap/envelope/">
Or, go download that schema off the web, save it to your local filesystem as an XSD file, and add it to your sourceSchema array. The first method should be preferred as it leads to more portable code (and XML).
Have you tried using the following URI for the SOAP-ENV?
http://schemas.xmlsoap.org/soap/envelope/
The set of schemas you are providing for validation does not include the soap schema. you can either include the soap schema in the schema collection, or, if you don't care about validating the soap wrapper, just grab the actual body content element and run your validation from there.

How to skip well-formed for java DOM parser

I know this has been asked multiple times here, but I've a different issue dealing with it. In my case, the app receives a non well-formed dom structure passed as a string. Here's a sample :
<div class='video yt'><div class='yt_url'>http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata</div></div>
As you can see, the content is not well-formed. Now, if I try to parse using a normal SAX or DOM parse it'll throw an exception which is understood.
org.xml.sax.SAXParseException: The reference to entity "feature" must end with the ';' delimiter.
As per the requirement, I need to read this document,add few additional div tags and send the content back as a string. This works great by using a DOM parser as I can read through the input structure and add additional tags at their required position.
I tried using tools like JTidy to do a pre-processing and then parse, but that results in converting the document to a fully-blown html, which I don't want. Here's a sample code :
StringWriter writer = new StringWriter();
Tidy tidy = new Tidy(); // obtain a new Tidy instance
tidy.setXHTML(true);
tidy.parse(new ByteArrayInputStream(content.getBytes()), writer);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new ByteArrayInputStream(writer.toString().getBytes()));
// Traverse thru the content and add new tags
....
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
This completely converts the input to a well-formed html document. It then becomes hard to remove html tags manually. The other option I tried was to use SAX2DOM, which too creates a HTML doc. Here's a sample code .
ByteArrayInputStream is = new ByteArrayInputStream(content.getBytes());
Parser p = new Parser();
p.setFeature(IContentExtractionConstant.SAX_NAMESPACE,true);
SAX2DOM sax2dom = new SAX2DOM();
p.setContentHandler(sax2dom);
p.parse(new InputSource(is));
Document doc = (Document)sax2dom.getDOM();
I'll appreciate if someone can share their ideas.
Thanks
The simplest way is replacing xml reserved characters with the corresponding xml entities. You can do this manually:
content.replaceAll("&", "&");
If you don't want to modify your string before parsing it, I could propose you another way using SaxParser, but this solution is more complicated. Basically you have to:
write a LexicalHandler in
combination with ContentHandler
tell the parser to continue its
execution after fatal error (the
ErrorHandler isn't enough)
treat undeclared entities as simple
text
UPDATE
According to your comment, I'm going to add some details regarding the second solution. I've writed a class which extends DefaulHandler (default implementation of EntityResolver, DTDHandler, ContentHandler and ErrorHandler) and implements LexicalHandler. I've extended ErrorHandler's fatalError method (my implementations does nothing instead of throwing the exception) and ContentHandler's characters method which works in combination with startEntity method of LexicalHandler.
public class MyHandler extends DefaultHandler implements LexicalHandler {
private String currentEntity = null;
#Override
public void fatalError(SAXParseException e) throws SAXException {
}
#Override
public void characters(char[] ch, int start, int length)
throws SAXException {
String content = new String(ch, start, length);
if (currentEntity != null) {
content = "&" + currentEntity + content;
currentEntity = null;
}
System.out.print(content);
}
#Override
public void startEntity(String name) throws SAXException {
currentEntity = name;
}
#Override
public void endEntity(String name) throws SAXException {
}
#Override
public void startDTD(String name, String publicId, String systemId)
throws SAXException {
}
#Override
public void endDTD() throws SAXException {
}
#Override
public void startCDATA() throws SAXException {
}
#Override
public void endCDATA() throws SAXException {
}
#Override
public void comment(char[] ch, int start, int length) throws SAXException {
}
}
This is my main which parses your xml not well formed. It's very important the setFeature, because without it the parser throws the SaxParseException despite of the ErrorHandler empty implementation.
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException {
String xml = "<div class='video yt'><div class='yt_url'>http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata</div></div>";
SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
MyHandler myHandler = new MyHandler();
xmlReader.setContentHandler(myHandler);
xmlReader.setErrorHandler(myHandler);
xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler",
myHandler);
xmlReader.setFeature(
"http://apache.org/xml/features/continue-after-fatal-error",
true);
xmlReader.parse(new InputSource(new StringReader(xml)));
}
This main prints out the content of your div element which contains the error:
http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata
Keep in mind that this is an example which works with your input, maybe you'll have to complete it...for instance if you have some characters correctly escaped you should add some lines of code to handle this situation etc.
Hope this helps.

Categories