I have JAXB objects created from a schema. While marshalling, the XML elements are getting prefixed with ns2. I have tried all the options I could find online for this problem, but none of them work. I cannot modify my schema or change package-info.java. Please help.
After much research and tinkering I have finally found a solution to this problem. Please accept my apologies for not posting links to the original references (there are many and I wasn't taking notes), but this one was certainly useful.
My solution uses a filtering XMLStreamWriter which applies an empty namespace context.
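import java.io.Writer;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
public class NoNamesWriter extends DelegatingXMLStreamWriter {
// Namespace context that answers "" to every prefix/URI lookup
private static final NamespaceContext emptyNamespaceContext = new NamespaceContext() {
@Override
public String getNamespaceURI(String prefix) {
return "";
}
@Override
public String getPrefix(String namespaceURI) {
return "";
}
@Override
public Iterator<String> getPrefixes(String namespaceURI) {
// An empty iterator is safer than null for callers that iterate the result
return Collections.emptyIterator();
}
};
public static XMLStreamWriter filter(Writer writer) throws XMLStreamException {
return new NoNamesWriter(XMLOutputFactory.newInstance().createXMLStreamWriter(writer));
}
public NoNamesWriter(XMLStreamWriter writer) {
super(writer);
}
@Override
public NamespaceContext getNamespaceContext() {
return emptyNamespaceContext;
}
}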
You can find a DelegatingXMLStreamWriter here.
You can then filter the marshalled XML with:
// Filter the output to remove namespaces.
m.marshal(it, NoNamesWriter.filter(writer));
I am sure there are more efficient mechanisms but I know this one works.
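DelegatingXMLStreamWriter is not part of the public JDK API, and the link to the class used here is not preserved. If it is not at hand, a minimal stand-in can be sketched against javax.xml.stream alone. This is a hypothetical replacement, not the linked class: it simply forwards every XMLStreamWriter method to a wrapped writer, so a subclass such as NoNamesWriter only needs to override the methods it changes.

```java
import javax.xml.namespace.NamespaceContext;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

/** Hypothetical base class: forwards every XMLStreamWriter call to a wrapped writer. */
public abstract class DelegatingXMLStreamWriter implements XMLStreamWriter {
    protected final XMLStreamWriter delegate;

    protected DelegatingXMLStreamWriter(XMLStreamWriter delegate) { this.delegate = delegate; }

    // Elements
    public void writeStartElement(String localName) throws XMLStreamException { delegate.writeStartElement(localName); }
    public void writeStartElement(String namespaceURI, String localName) throws XMLStreamException { delegate.writeStartElement(namespaceURI, localName); }
    public void writeStartElement(String prefix, String localName, String namespaceURI) throws XMLStreamException { delegate.writeStartElement(prefix, localName, namespaceURI); }
    public void writeEmptyElement(String namespaceURI, String localName) throws XMLStreamException { delegate.writeEmptyElement(namespaceURI, localName); }
    public void writeEmptyElement(String prefix, String localName, String namespaceURI) throws XMLStreamException { delegate.writeEmptyElement(prefix, localName, namespaceURI); }
    public void writeEmptyElement(String localName) throws XMLStreamException { delegate.writeEmptyElement(localName); }
    public void writeEndElement() throws XMLStreamException { delegate.writeEndElement(); }

    // Attributes and namespace declarations
    public void writeAttribute(String localName, String value) throws XMLStreamException { delegate.writeAttribute(localName, value); }
    public void writeAttribute(String prefix, String namespaceURI, String localName, String value) throws XMLStreamException { delegate.writeAttribute(prefix, namespaceURI, localName, value); }
    public void writeAttribute(String namespaceURI, String localName, String value) throws XMLStreamException { delegate.writeAttribute(namespaceURI, localName, value); }
    public void writeNamespace(String prefix, String namespaceURI) throws XMLStreamException { delegate.writeNamespace(prefix, namespaceURI); }
    public void writeDefaultNamespace(String namespaceURI) throws XMLStreamException { delegate.writeDefaultNamespace(namespaceURI); }

    // Other content
    public void writeComment(String data) throws XMLStreamException { delegate.writeComment(data); }
    public void writeProcessingInstruction(String target) throws XMLStreamException { delegate.writeProcessingInstruction(target); }
    public void writeProcessingInstruction(String target, String data) throws XMLStreamException { delegate.writeProcessingInstruction(target, data); }
    public void writeCData(String data) throws XMLStreamException { delegate.writeCData(data); }
    public void writeDTD(String dtd) throws XMLStreamException { delegate.writeDTD(dtd); }
    public void writeEntityRef(String name) throws XMLStreamException { delegate.writeEntityRef(name); }
    public void writeCharacters(String text) throws XMLStreamException { delegate.writeCharacters(text); }
    public void writeCharacters(char[] text, int start, int len) throws XMLStreamException { delegate.writeCharacters(text, start, len); }

    // Document lifecycle
    public void writeStartDocument() throws XMLStreamException { delegate.writeStartDocument(); }
    public void writeStartDocument(String version) throws XMLStreamException { delegate.writeStartDocument(version); }
    public void writeStartDocument(String encoding, String version) throws XMLStreamException { delegate.writeStartDocument(encoding, version); }
    public void writeEndDocument() throws XMLStreamException { delegate.writeEndDocument(); }
    public void close() throws XMLStreamException { delegate.close(); }
    public void flush() throws XMLStreamException { delegate.flush(); }

    // Namespace bookkeeping and properties
    public String getPrefix(String uri) throws XMLStreamException { return delegate.getPrefix(uri); }
    public void setPrefix(String prefix, String uri) throws XMLStreamException { delegate.setPrefix(prefix, uri); }
    public void setDefaultNamespace(String uri) throws XMLStreamException { delegate.setDefaultNamespace(uri); }
    public void setNamespaceContext(NamespaceContext context) throws XMLStreamException { delegate.setNamespaceContext(context); }
    public NamespaceContext getNamespaceContext() { return delegate.getNamespaceContext(); }
    public Object getProperty(String name) throws IllegalArgumentException { return delegate.getProperty(name); }
}
```

Because every method is implemented, an anonymous subclass with an empty body already behaves exactly like the wrapped writer; NoNamesWriter then overrides only getNamespaceContext().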
For me, only changing package-info.java worked like a charm, exactly as zatziky stated:
package-info.java
@javax.xml.bind.annotation.XmlSchema
(namespace = "http://example.com",
xmlns = {@XmlNs(prefix = "", namespaceURI = "http://example.com")},
elementFormDefault = javax.xml.bind.annotation.XmlNsForm.QUALIFIED)
package my.pkg; // your own package; "package" is a reserved word, so my.package would not compile
import javax.xml.bind.annotation.XmlNs;
You can make the namespaces be written only once. You need a proxy class for the XMLStreamWriter and a package-info.java. Then, in your code:
StringWriter stringWriter = new StringWriter();
// Wrap the JDK writer in the proxy class defined below
XMLStreamWriter writer = new WrapperXMLStreamWriter(XMLOutputFactory
.newInstance().createXMLStreamWriter(stringWriter));
// Collection is your own JAXB root class (not java.util.Collection); books is an instance of it
JAXBContext jaxbContext = JAXBContext.newInstance(Collection.class);
Marshaller jaxbMarshaller = jaxbContext.createMarshaller();
jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
jaxbMarshaller.marshal(books, writer);
System.out.println(stringWriter.toString());
Proxy class (the important method is "writeNamespace"):
class WrapperXMLStreamWriter implements XMLStreamWriter {
private final XMLStreamWriter writer;
public WrapperXMLStreamWriter(XMLStreamWriter writer) {
this.writer = writer;
}
//keeps track of what namespaces were used so that not to
//write them more than once
private List<String> namespaces = new ArrayList<String>();
public void init(){
namespaces.clear();
}
public void writeStartElement(String localName) throws XMLStreamException {
init();
writer.writeStartElement(localName);
}
public void writeStartElement(String namespaceURI, String localName) throws XMLStreamException {
init();
writer.writeStartElement(namespaceURI, localName);
}
public void writeStartElement(String prefix, String localName, String namespaceURI) throws XMLStreamException {
init();
writer.writeStartElement(prefix, localName, namespaceURI);
}
public void writeNamespace(String prefix, String namespaceURI) throws XMLStreamException {
if(namespaces.contains(namespaceURI)){
return;
}
namespaces.add(namespaceURI);
writer.writeNamespace(prefix, namespaceURI);
}
// ... other delegating methods, always the same pattern: writer.method(...)
}
package-info.java:
@XmlSchema(elementFormDefault = XmlNsForm.QUALIFIED, attributeFormDefault = XmlNsForm.UNQUALIFIED,
xmlns = {
@XmlNs(namespaceURI = "http://www.w3.org/2001/XMLSchema-instance", prefix = "xsi")})
package your.pkg; // your own package; "package" is a reserved word
import javax.xml.bind.annotation.XmlNs;
import javax.xml.bind.annotation.XmlNsForm;
import javax.xml.bind.annotation.XmlSchema;
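Writing every delegating method of WrapperXMLStreamWriter by hand is tedious. As an alternative technique (not the class above), the same rule, "declare each namespace URI at most once per start tag", can be sketched with the JDK's java.lang.reflect.Proxy. The class name DedupNamespaces is hypothetical; unlike init() in WrapperXMLStreamWriter, this sketch also resets on writeEmptyElement, since that starts a new element too.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.HashSet;
import java.util.Set;
import javax.xml.stream.XMLStreamWriter;

public final class DedupNamespaces {
    private DedupNamespaces() {}

    /** Returns a proxy that forwards every call to target but drops repeated writeNamespace calls on one element. */
    public static XMLStreamWriter wrap(XMLStreamWriter target) {
        Set<String> declared = new HashSet<>(); // URIs declared since the last start tag
        return (XMLStreamWriter) Proxy.newProxyInstance(
                DedupNamespaces.class.getClassLoader(),
                new Class<?>[] {XMLStreamWriter.class},
                (proxy, method, args) -> {
                    String name = method.getName();
                    if (name.equals("writeStartElement") || name.equals("writeEmptyElement")) {
                        declared.clear(); // a new element starts: forget the previous declarations
                    } else if (name.equals("writeNamespace") && !declared.add(String.valueOf(args[1]))) {
                        return null; // URI already declared on this element: skip (void method)
                    }
                    try {
                        return method.invoke(target, args);
                    } catch (InvocationTargetException e) {
                        throw e.getCause(); // rethrow the writer's own XMLStreamException
                    }
                });
    }
}
```

Usage would mirror the snippet above, e.g. `XMLStreamWriter writer = DedupNamespaces.wrap(XMLOutputFactory.newInstance().createXMLStreamWriter(stringWriter));`. The trade-off is reflection on every call in exchange for not maintaining ~30 forwarding methods.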
You can use the NamespacePrefixMapper extension to control the namespace prefixes for your use case. The same extension is supported by both the JAXB reference implementation and EclipseLink JAXB (MOXy).
http://wiki.eclipse.org/EclipseLink/Release/2.4.0/JAXB_RI_Extensions/Namespace_Prefix_Mapper
Every solution requires complex overriding or annotations, which don't seem to work with recent versions. I use a simpler approach: just replacing the annoying namespaces. I wish Google & Co. would use JSON and get rid of XML.
kml.marshal(file);
String kmlContent = FileUtils.readFileToString(file, "UTF-8");
kmlContent = kmlContent.replaceAll("ns2:","").replace("<kml xmlns:ns2=\"http://www.opengis.net/kml/2.2\" xmlns:ns3=\"http://www.w3.org/2005/Atom\" xmlns:ns4=\"urn:oasis:names:tc:ciq:xsdschema:xAL:2.0\" xmlns:ns5=\"http://www.google.com/kml/ext/2.2\">", "<kml>");
FileUtils.write(file, kmlContent, "UTF-8");
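The hard-coded replace above only matches one exact root tag, and stripping the prefix and its declaration leaves the elements in no namespace at all. A slightly more general string-based sketch (hypothetical helper, same hack, same caveats) turns the JAXB prefix into the default namespace instead, so the elements keep their namespace. It assumes the prefix appears only on element tags and in its own xmlns declaration, not in attribute names, text or CDATA, and that no other default namespace is declared.

```java
import java.util.regex.Pattern;

public final class Ns2Rewriter {
    private Ns2Rewriter() {}

    /** Rewrites e.g. <ns2:kml xmlns:ns2="..."> into <kml xmlns="...">. String-based, not XML-aware. */
    public static String prefixToDefaultNamespace(String xml, String prefix) {
        String p = Pattern.quote(prefix);
        return xml
                // xmlns:ns2="..." becomes the default namespace declaration
                .replaceAll("xmlns:" + p + "=", "xmlns=")
                // <ns2:foo and </ns2:foo lose the prefix
                .replaceAll("(</?)" + p + ":", "$1");
    }
}
```

For example, `kmlContent = Ns2Rewriter.prefixToDefaultNamespace(kmlContent, "ns2");` would replace the two replace calls above, leaving the ns3/ns4/ns5 declarations untouched.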
Related
Sorry for the foggy title, I know it does not tell much.
Please consider the following xsd type definition:
<xsd:complexType name="TopicExpressionType" mixed="true">
<xsd:sequence>
<xsd:any processContents="lax" minOccurs="0"/>
</xsd:sequence>
<xsd:attribute name="Dialect" type="xsd:anyURI" use="required"/>
<xsd:anyAttribute/>
</xsd:complexType>
Complete XSD: http://docs.oasis-open.org/wsn/b-2.xsd
Corresponding JAXB generated Java class:
package org.oasis_open.docs.wsn.b_2;
import org.w3c.dom.Element;
import javax.xml.bind.annotation.*;
import javax.xml.namespace.QName;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "TopicExpressionType", propOrder = {
"content"
})
public class TopicExpressionType {
@XmlMixed
@XmlAnyElement(lax = true)
protected List<Object> content;
@XmlAttribute(name = "Dialect", required = true)
@XmlSchemaType(name = "anyURI")
protected String dialect;
@XmlAnyAttribute
private Map<QName, String> otherAttributes = new HashMap<QName, String>();
public List<Object> getContent() {
if (content == null) {
content = new ArrayList<Object>();
}
return this.content;
}
public String getDialect() {
return dialect;
}
public void setDialect(String value) {
this.dialect = value;
}
public Map<QName, String> getOtherAttributes() {
return otherAttributes;
}
}
The first goal is to produce XML like this with JAXB:
<wsnt:TopicExpression Dialect="http://docs.oasis-open.org/wsn/t-1/TopicExpression/Concrete" xmlns:tns="http://my.org/TopicNamespace">
tns:t1/*/t3
</wsnt:TopicExpression>
Please note the following:
The value of the TopicExpression element is basically a query string that refers to QNames. Example: tns:t1/*/t3
The value of the TopicExpression element contains one or more QName-like strings (tns:t1). It must be a string as in the example; it cannot be an element (e.g.: <my-expression>tns:t1/*/t3</my-expression>)
The value of the TopicExpression element is an arbitrary string (at least from the schema's perspective, it follows the rules defined here: https://docs.oasis-open.org/wsn/wsn-ws_topics-1.3-spec-os.pdf page 18)
Even though the value is a string, I need to define the corresponding namespace declarations. So if I have an expression like tns:t1, then xmlns:tns has to be declared. If my expression is tns:t1/*/tns2:t3, then both xmlns:tns and xmlns:tns2 have to be declared.
The second goal is to get the value of TopicExpression on the other side together with the namespace, using JAXB.
I am completely stuck; I don't know how I could implement this. My only idea is to manually build the value for the TopicExpression and somehow tell the marshaller to include the related namespace declarations even though no actual element uses them.
Update
Example of a complete SOAP request that includes the aforementioned TopicExpression:
<env:Envelope xmlns:env="http://www.w3.org/2003/05/soap-envelope">
<env:Header>
<Action xmlns="http://www.w3.org/2005/08/addressing">http://docs.oasis-open.org/wsn/bw-2/NotificationProducer/SubscribeRequest</Action>
<MessageID xmlns="http://www.w3.org/2005/08/addressing">urn:uuid:57182d32-4e07-4f5f-8ab3-24838b3e33ac</MessageID>
</env:Header>
<env:Body>
<ns3:Subscribe xmlns:ns3="http://docs.oasis-open.org/wsn/b-2" xmlns:ns4="http://www.w3.org/2005/08/addressing" >
<ns3:ConsumerReference>
<ns4:Address>http://my-notification-consumer-url</ns4:Address>
</ns3:ConsumerReference>
<ns3:Filter>
<ns3:TopicExpression Dialect="http://docs.oasis-open.org/wsn/t-1/TopicExpression/Simple" xmlns:ns5="http://my.org/TopicNamespace" xmlns:ns6="http://extension.org/TopicNamespace">
ns5:t1/*/ns6:t3
</ns3:TopicExpression>
</ns3:Filter>
</ns3:Subscribe>
</env:Body>
</env:Envelope>
Not sure if I have understood your requirement correctly, but see if this code sample helps. If not, maybe edit your question a bit so I understand exactly what you are looking for, and I will modify and update the code accordingly. Trying with a simple example would be better than providing the complete XSD. Also, look into the beforeMarshal and afterUnmarshal methods.
Following is the XML I am trying to marshal and unmarshal:
<tns:TopicExpression Dialect="http://docs.oasis-open.org/wsn/t-1/TopicExpression/Concrete" xmlns:tns="http://my.org/TopicNamespace">
tns:t1/*/t3
</tns:TopicExpression>
TestPojo.java:
@Data
@XmlAccessorType(XmlAccessType.FIELD)
public class TestPojo {
@XmlValue
private String TopicExpression;
@XmlAnyAttribute
private Map<String, Object> anyAttributes = new HashMap<>();
}
Main.java:
import java.io.InputStream;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.eclipse.persistence.jaxb.MarshallerProperties; // EclipseLink MOXy extension
public class Main {
public static void main(String[] args) throws JAXBException, XMLStreamException {
final InputStream inputStream = Main.class.getClassLoader().getResourceAsStream("topic.xml");
final XMLStreamReader xmlStreamReader = XMLInputFactory.newInstance().createXMLStreamReader(inputStream);
final Unmarshaller unmarshaller = JAXBContext.newInstance(TestPojo.class).createUnmarshaller();
final TestPojo topic = unmarshaller.unmarshal(xmlStreamReader, TestPojo.class).getValue();
System.out.println(topic.toString());
StringWriter stringWriter = new StringWriter();
// namespace URI -> prefix
Map<String, String> namespaces = new HashMap<String, String>();
namespaces.put("http://my.org/TopicNamespace", "tns");
Marshaller marshaller = JAXBContext.newInstance(TestPojo.class).createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FRAGMENT, Boolean.TRUE);
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
// This property requires MOXy as the JAXB provider
marshaller.setProperty(MarshallerProperties.NAMESPACE_PREFIX_MAPPER, namespaces);
QName qName = new QName("http://my.org/TopicNamespace","TopicExpression","tns");
JAXBElement<TestPojo> root = new JAXBElement<TestPojo>(qName, TestPojo.class, topic);
marshaller.marshal(root, stringWriter);
String result = stringWriter.toString();
System.out.println(result);
}
}
As you can see, for now I have populated the namespaces map directly. If the namespaces are dynamic, you can build the map at runtime and pass it to the marshaller in the same way.
This produces the following output when unmarshalling:
TestPojo(TopicExpression=
tns:t1/*/t3
, anyAttributes={Dialect=http://docs.oasis-open.org/wsn/t-1/TopicExpression/Concrete})
When marshalling:
<tns:TopicExpression xmlns:tns="http://my.org/TopicNamespace" Dialect="http://docs.oasis-open.org/wsn/t-1/TopicExpression/Concrete">
tns:t1/*/t3
</tns:TopicExpression>
So the solution I've implemented:
I created a new TopicExpressionType class with fields not only for the expression but also for the namespaces used in it:
import java.util.List;
public class TopicExpressionType {
String dialect;
String expression;
List<Namespace> namespaces;
public TopicExpressionType(String dialect, String expression, List<Namespace> namespaces) {
this.dialect = dialect;
this.expression = expression;
this.namespaces = namespaces;
}
// Getters used by the adapter below
public String getDialect() {
return dialect;
}
public String getExpression() {
return expression;
}
public List<Namespace> getNamespaces() {
return namespaces;
}
public static class Namespace {
String prefix;
String namespace;
public Namespace(String prefix, String namespace) {
this.prefix = prefix;
this.namespace = namespace;
}
public String getPrefix() {
return prefix;
}
public String getNamespace() {
return namespace;
}
}
}
Then I implemented an XmlAdapter that knows the specifics: it can extract namespace prefixes from the expression string and read/write namespace declarations on the TopicExpression XML element:
import java.util.ArrayList;
import javax.xml.XMLConstants;
import javax.xml.bind.annotation.adapters.XmlAdapter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.Text;
public class TopicExpressionTypeAdapter extends XmlAdapter<Element, TopicExpressionType> {
@Override
public Element marshal(TopicExpressionType topicExpression) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.newDocument();
Element element = document.createElementNS("http://docs.oasis-open.org/wsn/b-2", "mns1:TopicExpression");
element.setAttribute("Dialect", topicExpression.getDialect());
element.setTextContent(topicExpression.getExpression());
for (var ns : topicExpression.namespaces) {
// Declare each prefix used in the expression; xmlns attributes belong to the reserved xmlns namespace
element.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:" + ns.prefix, ns.namespace);
}
return element;
}
@Override
public TopicExpressionType unmarshal(Element arg0) throws Exception {
Node firstChild = arg0.getFirstChild();
if (firstChild instanceof Text text) {
var expression = text.getData();
if (expression == null || expression.isBlank())
throw new TopicExpressionAdapterException("Empty content");
// Extract the prefixes from the expression
var namespacePrefixes = new ArrayList<String>();
getNamespacePrefixes(expression, namespacePrefixes);
// Now get the namespaces for the prefixes (getAttribute returns "" when a declaration is missing)
var nsMap = new ArrayList<TopicExpressionType.Namespace>();
for (var prefix : namespacePrefixes) {
var namespace = arg0.getAttribute("xmlns:" + prefix);
if (namespace == null || namespace.isBlank())
throw new TopicExpressionAdapterException("Missing namespace declaration for the following prefix: " + prefix);
nsMap.add(new TopicExpressionType.Namespace(prefix, namespace));
}
var dialect = arg0.getAttribute("Dialect");
if (dialect == null || dialect.isBlank())
throw new TopicExpressionAdapterException("Missing Dialect attribute");
return new TopicExpressionType(dialect, expression, nsMap);
} else {
// firstChild is null when the element is empty
throw new TopicExpressionAdapterException("Unexpected child node: "
+ (firstChild == null ? "none" : firstChild.getClass().getName()));
}
}
public static class TopicExpressionAdapterException extends Exception {
public TopicExpressionAdapterException(String message) {
super(message);
}
}
}
Note: the implementation of getNamespacePrefixes() is intentionally left out of this answer.
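Purely as an illustration, here is one hypothetical way getNamespacePrefixes() could be written. It assumes prefixes are ASCII NCNames immediately followed by ':' (the real NCName grammar also allows non-ASCII letters), keeps the out-parameter style of the call in the adapter, and records each prefix only once.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class TopicPrefixes {
    private TopicPrefixes() {}

    // Approximate NCName followed by ':'; "*" and "." steps never match
    private static final Pattern PREFIX = Pattern.compile("([A-Za-z_][A-Za-z0-9_.-]*):");

    // Hypothetical helper: for "tns:t1/*/tns2:t3" it adds "tns" and "tns2" to out
    public static void getNamespacePrefixes(String expression, List<String> out) {
        Matcher m = PREFIX.matcher(expression);
        while (m.find()) {
            String prefix = m.group(1);
            if (!out.contains(prefix)) {
                out.add(prefix);
            }
        }
    }
}
```

Each collected prefix is then looked up against the element's xmlns declarations, so a prefix used twice in the expression only needs one declaration.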
The last step is to add the following annotation wherever the TopicExpressionType is used in JAXB generated classes:
@XmlJavaTypeAdapter(TopicExpressionTypeAdapter.class)
TopicExpressionType topicExpression;
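The key trick in the adapter's marshal() is a DOM feature: namespace declarations can be attached to an element even when no element or attribute in the markup uses them. The JDK-only sketch below bypasses JAXB entirely and uses hypothetical names. It builds such an element, serializes it with an identity Transformer, then parses it back with a namespace-aware parser and resolves a prefix that occurs only in the text, which mirrors both goals of the question at the DOM level.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public final class TopicExpressionDomDemo {
    private static final String WSN_B2 = "http://docs.oasis-open.org/wsn/b-2";

    private TopicExpressionDomDemo() {}

    /** Serializes a TopicExpression whose xmlns declarations are referenced only from its text. */
    public static String write(String dialect, String expression, Map<String, String> prefixToUri) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element el = doc.createElementNS(WSN_B2, "wsnt:TopicExpression");
        el.setAttribute("Dialect", dialect);
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            // Explicit declaration: nothing in the markup would otherwise trigger it
            el.setAttributeNS(XMLConstants.XMLNS_ATTRIBUTE_NS_URI, "xmlns:" + e.getKey(), e.getValue());
        }
        el.setTextContent(expression);
        doc.appendChild(el);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Parses the element back and resolves a prefix that occurs in its text content. */
    public static String resolvePrefix(String xml, String prefix) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // xmlns attributes are only treated as declarations in namespace-aware mode
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().lookupNamespaceURI(prefix);
    }
}
```

On the receiving side, lookupNamespaceURI also consults ancestor elements, which may be useful if a peer declares the prefixes higher up in the SOAP envelope rather than on TopicExpression itself.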
When I generate my XSD schema, the file is always called schema1.xsd. How can I set my own name, with annotations or otherwise?
@XmlType(namespace = "http://www.xyz.nu/myname")
@XmlRootElement(name = "myname")
public class Myclass {
int id;
...
You can specify your XSD file name by defining your own SchemaOutputResolver, as the following code shows:
public void generate() throws IOException, JAXBException {
JAXBContext jaxbContext = JAXBContext.newInstance(GlobalConfig.class);
SchemaOutputResolver sor = new MySchemaOutputResolver();
jaxbContext.generateSchema(sor);
}
private class MySchemaOutputResolver extends SchemaOutputResolver {
@Override
public Result createOutput(String namespaceUri, String suggestedFileName) throws IOException {
// Called once per namespace: if the context spans several namespaces, derive the name from
// namespaceUri (or keep suggestedFileName) so that each schema gets its own file
File file = new File("your_desired_name.xsd");
StreamResult result = new StreamResult(file);
result.setSystemId(file.toURI().toURL().toString());
return result;
}
}
As it turns out (and I can expand on this with code examples if necessary), I have just realized that my REST API, written in Java, provided by CXF and served by Tomcat 7, is case-sensitive when it comes to posting XML content.
Is there any way to make the XML, which is usually a marshalled representation of the entity a service creates, case-insensitive?
I can certainly post examples of the entity class, service, and their annotations if necessary, but as a bare minimum: if an instance variable in the entity is private String firstName, the XML tag must be <firstName>...</firstName> and not <firstname>...</firstname>, but I would like the latter to be unmarshallable as well.
A complete solution involves a lot of work, but it is perfectly possible. Following the link posted by @matiaselgart, the general solution would be:
1 - Add a CXF Interceptor to manipulate the Message
2 - Read the incoming content, extract the XML, and process it with a StreamReaderDelegate to convert to lowercase
3 - Replace the content in Message with the output
The names in your JAXB mappings should be lowercase, so that the stream reader can convert the incoming names and the JAXB unmarshaller can match them. In your example, for private String firstName the mapped XML tag must be <firstname>...</firstname>, not <firstName>...</firstName>.
CXF Interceptor
public class CaseInsensitiveInterceptor extends AbstractPhaseInterceptor<Message> {
public CaseInsensitiveInterceptor () {
super(Phase.RECEIVE);
}
public void handleMessage(Message message) {
//Get the message body as input stream, process the xml, and set a new non-consumed inputStream into Message
InputStream in = message.getContent(InputStream.class);
InputStream bin = xmlToLowerCase (in);
message.setContent(InputStream.class, bin);
}
public void handleFault(Message messageParam) {
//Invoked when interceptor fails
}
}
Configuration
Add the interceptor in the bus or in the provider
<bean id="caseInsensitiveInterceptor" class="CaseInsensitiveInterceptor" />
<cxf:bus>
<cxf:inInterceptors>
<ref bean="caseInsensitiveInterceptor"/>
</cxf:inInterceptors>
</cxf:bus>
Case Insensitive StreamReaderDelegate
I think you can use the StreamReaderDelegate from here and convert the XMLStreamReader to an InputStream using this link. The method xmlToLowerCase is called from the interceptor.
WARNING: I have not tested this part of the code.
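private static class MyStreamReaderDelegate extends StreamReaderDelegate {
public MyStreamReaderDelegate(XMLStreamReader xsr) {
super(xsr);
}
@Override
public String getAttributeLocalName(int index) {
return super.getAttributeLocalName(index).toLowerCase(Locale.ROOT);
}
@Override
public QName getAttributeName(int index) {
return lower(super.getAttributeName(index));
}
@Override
public String getLocalName() {
return super.getLocalName().toLowerCase(Locale.ROOT);
}
// Some consumers (e.g. a StAX-to-SAX bridge) ask for the QName rather than the local name
@Override
public QName getName() {
return lower(super.getName());
}
private static QName lower(QName name) {
return new QName(name.getNamespaceURI(), name.getLocalPart().toLowerCase(Locale.ROOT), name.getPrefix());
}
}
public InputStream xmlToLowerCase(InputStream in) {
try {
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader xsr = xif.createXMLStreamReader(in);
xsr = new MyStreamReaderDelegate(xsr);
String xml = getOuterXml(xsr);
// Encode explicitly instead of relying on the platform default charset
return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
} catch (XMLStreamException | TransformerException e) {
// Fault is CXF's unchecked exception, so handleMessage does not need a throws clause
throw new Fault(e);
}
}
//https://coderanch.com/t/478588/XMLStreamReader-InputStream
private String getOuterXml(XMLStreamReader xmlr) throws TransformerConfigurationException,
TransformerFactoryConfigurationError, TransformerException
{
Transformer transformer = TransformerFactory.newInstance().newTransformer();
StringWriter stringWriter = new StringWriter();
transformer.transform(new StAXSource(xmlr), new StreamResult(stringWriter));
return stringWriter.toString();
}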
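Since the Transformer route above is untested, here is an alternative sketch that swaps in a different technique: the StAX event API (XMLEventReader, XMLEventFactory, XMLEventWriter), which rebuilds each start/end element with a lowercased name and never depends on which accessor a Transformer calls. The class name is hypothetical; it keeps the xmlToLowerCase(InputStream) shape the interceptor expects, but declares XMLStreamException instead of wrapping it. Namespace URIs and prefixes are left unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.EndElement;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

public final class LowerCaseXml {
    private LowerCaseXml() {}

    private static QName lower(QName name) {
        return new QName(name.getNamespaceURI(), name.getLocalPart().toLowerCase(Locale.ROOT), name.getPrefix());
    }

    /** Copies the document event by event, lowercasing element and attribute local names. */
    public static InputStream xmlToLowerCase(InputStream in) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out, "UTF-8");
        XMLEventFactory events = XMLEventFactory.newInstance();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartDocument()) {
                continue; // skip the declaration so a declared non-UTF-8 encoding cannot clash with the output
            }
            if (event.isStartElement()) {
                StartElement start = event.asStartElement();
                List<Attribute> attrs = new ArrayList<>();
                for (Iterator<?> it = start.getAttributes(); it.hasNext(); ) {
                    Attribute a = (Attribute) it.next();
                    attrs.add(events.createAttribute(lower(a.getName()), a.getValue()));
                }
                event = events.createStartElement(lower(start.getName()), attrs.iterator(), start.getNamespaces());
            } else if (event.isEndElement()) {
                EndElement end = event.asEndElement();
                event = events.createEndElement(lower(end.getName()), end.getNamespaces());
            }
            writer.add(event);
        }
        writer.flush();
        writer.close();
        reader.close();
        return new ByteArrayInputStream(out.toByteArray());
    }
}
```

The whole body is still buffered in memory, as in the original approach, which is usually acceptable for REST payloads but worth keeping in mind for large uploads.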
We are performing some operations on embedded/nested XML. I am using SAXParser to parse the entire XML file. I want to get the entire nested XML, with tags and values. For example, my XML looks like this:
I want the entire XML within the <ANY-ELEMENT>...</ANY-ELEMENT> tag.
<?xml version="1.0" encoding="UTF-8"?>
<x:xMessage xmlns:x="http://www.connecture.com/integration/x" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.connecture.com/integration/x xMessageWrapper.xsd
">
<x:xMessageHeader>
<Version>850</Version>
<Source>Source</Source>
<Target>target</Target>
<Timestamp>2013-12-31T12:00:00</Timestamp>
<RequestID>123456</RequestID>
<ResponseID>54321</ResponseID>
<Priority>3</Priority>
<Username>Deepak</Username>
<Password>Kumar</Password>
</x:xMessageHeader>
<x:xMessageBody>
<ANY-ELEMENT>
<xEnveloped_834A1 xsi:schemaLocation="....." xmlns="......."
..........................
..........................
some Complex XML
..........................
..........................
..........................
</ANY-ELEMENT>
</x:xMessageBody>
</x:xMessage>
Sample handler class:
public class MessageWrapperHandler extends DefaultHandler {
private boolean bActualMessage = false;
private String actualMessage = null;
private long lengthActualMessage=0;
public void startElement(String uri, String localName, String qName, Attributes attributes) {
if (qName.equalsIgnoreCase("ANY-ELEMENT")) {
bActualMessage = true;
//lengthActualMessage=How to know the length of Child XML
}
}
public void characters(char ch[], int start, int length) {
if (bActualMessage) {
actualMessage = new String(ch, start, length);
//trying to get embedded XML
bActualMessage = false;
}
}
}
But since the content after <ANY-ELEMENT> is XML markup rather than text, this gives me nothing. So how can I achieve this?
EDIT: You are free to modify the XML after <ANY-ELEMENT>, for example by wrapping the contents in CDATA.
Instead of SAX, I would recommend using StAX (a StAX implementation is included in the JDK/JRE since Java SE 6). StAX is similar to SAX except instead of having the events pushed to you, you pull (request) them.
In the code below the XMLStreamReader is advanced to the ANY-ELEMENT element. Once it is at the correct position you can interact with it as you wish.
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;
public class Demo {
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newFactory();
StreamSource xmlSource = new StreamSource("src/forum19559825/input.xml");
XMLStreamReader xsr = xif.createXMLStreamReader(xmlSource);
Demo demo = new Demo();
demo.positionXMLStreamReaderAtAnyElement(xsr);
demo.processAnyElement(xsr);
}
private void positionXMLStreamReaderAtAnyElement(XMLStreamReader xsr) throws Exception {
while(xsr.hasNext()) {
if(xsr.getEventType() == XMLStreamReader.START_ELEMENT && "ANY-ELEMENT".equals(xsr.getLocalName())) {
break;
}
xsr.next();
}
}
private void processAnyElement(XMLStreamReader xmlStreamReaderAtAnyElement) {
// TODO: Stuff
System.out.println("FOUND IT");
}
}
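One way to fill in processAnyElement is to hand the positioned reader to an identity Transformer: a StAXSource built from an XMLStreamReader that sits on a START_ELEMENT covers only that element and its descendants. The sketch below (hypothetical class AnyElementExtractor) folds the positioning loop and the copy into one method that returns the subtree as a String. A caveat, as far as I can tell: prefixes declared only on ancestors (such as xsi in the question's sample) may not be redeclared in the extracted fragment.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;

public final class AnyElementExtractor {
    private AnyElementExtractor() {}

    /** Returns the serialized subtree of the first element with the given local name, or null if absent. */
    public static String extract(String xml, String localName) throws XMLStreamException, TransformerException {
        XMLStreamReader xsr = XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        while (xsr.hasNext()) {
            // Same idea as positionXMLStreamReaderAtAnyElement: stop on the wanted START_ELEMENT
            if (xsr.next() == XMLStreamConstants.START_ELEMENT && localName.equals(xsr.getLocalName())) {
                Transformer t = TransformerFactory.newInstance().newTransformer();
                t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
                StringWriter out = new StringWriter();
                // The StAXSource starts at the current element and stops at its matching end tag
                t.transform(new StAXSource(xsr), new StreamResult(out));
                return out.toString();
            }
        }
        return null;
    }
}
```

With the question's file this would be called as `AnyElementExtractor.extract(xmlString, "ANY-ELEMENT")`; unlike the SAX handler, no character buffering or depth tracking is needed.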
I know this has been asked multiple times here, but my issue is different. In my case, the app receives a DOM structure that is not well-formed, passed as a string. Here's a sample:
<div class='video yt'><div class='yt_url'>http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata</div></div>
As you can see, the content is not well-formed. Now, if I try to parse it with a normal SAX or DOM parser, it throws an exception, which is understandable:
org.xml.sax.SAXParseException: The reference to entity "feature" must end with the ';' delimiter.
As per the requirement, I need to read this document, add a few additional div tags and send the content back as a string. This works great with a DOM parser, as I can read through the input structure and add additional tags at the required positions.
I tried using tools like JTidy to pre-process the input before parsing, but that converts the document into full-blown HTML, which I don't want. Here's sample code:
StringWriter writer = new StringWriter();
Tidy tidy = new Tidy(); // obtain a new Tidy instance
tidy.setXHTML(true);
tidy.parse(new ByteArrayInputStream(content.getBytes()), writer);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new ByteArrayInputStream(writer.toString().getBytes()));
// Traverse thru the content and add new tags
....
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
This converts the input into a complete, well-formed HTML document, and it then becomes hard to strip the extra HTML tags manually. The other option I tried was SAX2DOM, which also creates an HTML document. Here's sample code:
ByteArrayInputStream is = new ByteArrayInputStream(content.getBytes());
Parser p = new Parser();
p.setFeature(IContentExtractionConstant.SAX_NAMESPACE,true);
SAX2DOM sax2dom = new SAX2DOM();
p.setContentHandler(sax2dom);
p.parse(new InputSource(is));
Document doc = (Document)sax2dom.getDOM();
I'd appreciate it if someone could share their ideas.
Thanks
The simplest way is to replace XML reserved characters with the corresponding XML entities. You can do this manually (strings are immutable, so keep the result):
content = content.replace("&", "&amp;");
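A plain replacement escapes every ampersand, including ones that already start an entity, so an existing &amp; would become &amp;amp;. A slightly safer sketch (hypothetical helper name) uses a negative lookahead to escape only "bare" ampersands, leaving the five predefined XML entities and numeric character references alone. HTML-only entities such as &nbsp; are undeclared in XML, so they are escaped and will then appear literally in the text.

```java
import java.util.regex.Pattern;

public final class AmpersandEscaper {
    private AmpersandEscaper() {}

    // '&' that does NOT already begin a predefined entity or a numeric character reference
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    /** Escapes stray ampersands so the string can be parsed as XML. */
    public static String escapeBareAmpersands(String content) {
        return BARE_AMP.matcher(content).replaceAll("&amp;");
    }
}
```

After this step the question's sample parses with a standard DocumentBuilder, so the DOM-based "add a few div tags" logic can stay unchanged.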
If you don't want to modify your string before parsing it, I can propose another way using SAXParser, but this solution is more complicated. Basically you have to:
1 - write a LexicalHandler in combination with a ContentHandler
2 - tell the parser to continue its execution after a fatal error (the ErrorHandler isn't enough)
3 - treat undeclared entities as simple text
UPDATE
According to your comment, I'm going to add some details regarding the second solution. I've written a class that extends DefaultHandler (the default implementation of EntityResolver, DTDHandler, ContentHandler and ErrorHandler) and implements LexicalHandler. I've overridden ErrorHandler's fatalError method (my implementation does nothing instead of throwing the exception) and ContentHandler's characters method, which works in combination with LexicalHandler's startEntity method.
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.ext.LexicalHandler;
import org.xml.sax.helpers.DefaultHandler;
public class MyHandler extends DefaultHandler implements LexicalHandler {
private String currentEntity = null;
@Override
public void fatalError(SAXParseException e) throws SAXException {
// Intentionally empty: swallow the error instead of throwing it
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
String content = new String(ch, start, length);
if (currentEntity != null) {
// Re-insert the undeclared entity reference as plain text
content = "&" + currentEntity + content;
currentEntity = null;
}
System.out.print(content);
}
@Override
public void startEntity(String name) throws SAXException {
currentEntity = name;
}
@Override
public void endEntity(String name) throws SAXException {
}
@Override
public void startDTD(String name, String publicId, String systemId)
throws SAXException {
}
@Override
public void endDTD() throws SAXException {
}
@Override
public void startCDATA() throws SAXException {
}
@Override
public void endCDATA() throws SAXException {
}
@Override
public void comment(char[] ch, int start, int length) throws SAXException {
}
}
This is my main method, which parses your malformed XML. The setFeature call is essential: without it the parser throws the SAXParseException despite the empty ErrorHandler implementation. Note that continue-after-fatal-error is an Apache Xerces feature, so other parsers may not support it.
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException {
String xml = "<div class='video yt'><div class='yt_url'>http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata</div></div>";
SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
MyHandler myHandler = new MyHandler();
xmlReader.setContentHandler(myHandler);
xmlReader.setErrorHandler(myHandler);
xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler",
myHandler);
xmlReader.setFeature(
"http://apache.org/xml/features/continue-after-fatal-error",
true);
xmlReader.parse(new InputSource(new StringReader(xml)));
}
This main prints out the content of your div element which contains the error:
http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata
Keep in mind that this is an example that works with your input; you may have to extend it. For instance, if some characters are already correctly escaped, you should add code to handle that situation.
Hope this helps.