How to get XML element information in case of SAXParseException


When validating an XML source against an XSD schema in a standard Java environment, I cannot find a way to get information about the element that failed validation (in many specific cases).
When catching a SAXParseException, the element information is gone. However, when debugging into the xerces.XMLSchemaValidator, I can see the reason: the specific error message is not defined to include information about the element.
For example (and this is also the case in my Java demo) the "cvc-minInclusive-valid" error is defined this way:
cvc-minInclusive-valid: Value ''{0}'' is not facet-valid with respect to minInclusive ''{1}'' for type ''{2}''.
https://wiki.xmldation.com/Support/Validator/cvc-mininclusive-valid
What I would prefer is that this kind of message be produced:
cvc-type.3.1.3: The value ''{1}'' of element ''{0}'' is not valid.
https://wiki.xmldation.com/Support/Validator/cvc-type-3-1-3
When debugging into xerces.XMLSchemaValidator, I can see that there are two consecutive calls to reportSchemaError(...), the second occurring only if the first returned without an exception being thrown.
Is there any way to configure the validator to use the second way of reporting OR to enrich the SAXParseException with the element information?
Please see my copy-paste-runnable example code below for further explanation:
String xsd =
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n" +
"<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" version=\"1.0\">" +
"<xs:element name=\"demo\">" +
"<xs:complexType>" +
"<xs:sequence>" +
// given are two elements that cannot be < 1
"<xs:element name=\"foo\" type=\"xs:positiveInteger\" minOccurs=\"0\" maxOccurs=\"unbounded\" />" +
"<xs:element name=\"bar\" type=\"xs:positiveInteger\" minOccurs=\"0\" maxOccurs=\"unbounded\" />" +
"</xs:sequence>" +
"</xs:complexType>" +
"</xs:element>" +
"</xs:schema>";
String xml =
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
"<demo>" +
"<foo>1</foo>" +
// invalid!
"<foo>0</foo>" +
"<bar>2</bar>" +
"</demo>";
Validator validator = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
.newSchema(new StreamSource(new StringReader(xsd)))
.newValidator();
try {
validator.validate(new StreamSource(new StringReader(xml)));
} catch (SAXParseException e) {
// unfortunately no element or line/column info:
System.err.println(e.getMessage());
// better, but still no element info:
System.err.println(String.format("Line %s - Column %s - %s",
e.getLineNumber(),
e.getColumnNumber(),
e.getMessage()));
}

This isn't well documented but if you have a recent version of Xerces-J (see SVN Rev 380997), you can validate a DOMSource and query the Validator from your ErrorHandler to retrieve the current Element node that the validator was processing when it reported the error.
For example, you could write an ErrorHandler like:
public class ValidatorErrorHandler implements ErrorHandler {
private Validator validator;
public ValidatorErrorHandler(Validator v) {
validator = v;
}
...
public void error(SAXParseException spe) throws SAXException {
Node node = null;
try {
node = (Node)
validator.getProperty(
"http://apache.org/xml/properties/dom/current-element-node");
}
catch (SAXException se) {}
...
}
and then invoke the Validator with this ErrorHandler like:
Validator validator = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
.newSchema(new StreamSource(new StringReader(xsd)))
.newValidator();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(new StringReader(xml)));
ErrorHandler errorHandler = new ValidatorErrorHandler(validator);
validator.setErrorHandler(errorHandler);
validator.validate(new DOMSource(doc));
to obtain the element where an error occurred.
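For reference, here is a minimal self-contained sketch of this approach applied to the schema from the question. The class name CurrentElementDemo and the helper validate are made up for illustration; only the property URI comes from the answer. It assumes the JAXP implementation in use (for example the Xerces copy bundled with the JDK) recognises that property; if it does not, the sketch falls back to "?" as the element name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical demo class; only the property URI is taken from the answer.
class CurrentElementDemo {
    static final String CURRENT_ELEMENT =
            "http://apache.org/xml/properties/dom/current-element-node";

    // Same structure as the schema in the question.
    static final String XSD =
            "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"demo\"><xs:complexType><xs:sequence>"
            + "<xs:element name=\"foo\" type=\"xs:positiveInteger\" minOccurs=\"0\" maxOccurs=\"unbounded\"/>"
            + "<xs:element name=\"bar\" type=\"xs:positiveInteger\" minOccurs=\"0\" maxOccurs=\"unbounded\"/>"
            + "</xs:sequence></xs:complexType></xs:element></xs:schema>";

    /** Returns one "elementName: message" entry per reported validation error. */
    static List<String> validate(String xsd, String xml) {
        final List<String> findings = new ArrayList<>();
        try {
            final Validator validator = SchemaFactory
                    .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            validator.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { record(e); }
                @Override public void error(SAXParseException e) { record(e); }
                @Override public void fatalError(SAXParseException e) throws SAXException {
                    record(e);
                    throw e;
                }
                private void record(SAXParseException e) {
                    String name = "?";
                    try {
                        // Ask the validator which DOM element it is processing right now.
                        Node node = (Node) validator.getProperty(CURRENT_ELEMENT);
                        if (node != null) {
                            name = node.getNodeName();
                        }
                    } catch (SAXException unsupported) {
                        // Property not recognised by this implementation: keep "?".
                    }
                    findings.add(name + ": " + e.getMessage());
                }
            });
            // The property is only populated when validating a DOMSource.
            validator.validate(new DOMSource(doc));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return findings;
    }

    public static void main(String[] args) {
        String xml = "<demo><foo>1</foo><foo>0</foo><bar>2</bar></demo>";
        for (String finding : validate(XSD, xml)) {
            System.out.println(finding);
        }
    }
}
```

Because the handler does not rethrow on error(), validation continues and every violation is collected, each prefixed with the name of the element being processed when it was reported.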

Try using an error handler:
public class LoggingErrorHandler implements ErrorHandler {
private boolean isValid = true;
public boolean isValid() {
return this.isValid;
}
@Override
public void warning(SAXParseException exc) {
System.err.println(exc);
}
@Override
public void error(SAXParseException exc) {
System.err.println(exc);
this.isValid = false;
}
@Override
public void fatalError(SAXParseException exc) throws SAXParseException {
System.err.println(exc);
this.isValid = false;
throw exc;
}
}
and use it in validator:
Validator validator = SchemaFactory
.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
.newSchema(new StreamSource(new StringReader(xsd)))
.newValidator();
LoggingErrorHandler errorHandler = new LoggingErrorHandler();
validator.setErrorHandler(errorHandler);
validator.validate(new StreamSource(new StringReader(xml)));
return errorHandler.isValid();

I know this is old, but the answer from Michael Glavassevich works like a charm! I'm not yet able to upvote or comment, but it reflects his really deep knowledge.

Related

Disallow entity declarations but allow DTDs

I am given an XML document that must be allowed to have a Document Type Declaration (DTD), but we prohibit any ENTITY declarations.
The XML document is parsed with SAXParser.parse(), as follows:
SAXParserFactory factory = SAXParserFactory.newInstance();
factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
factory.setValidating(true);
SAXParser parser = factory.newSAXParser();
The XML is then passed into the parser as an InputSource:
InputSource inputSource= ... ;
parser.parse(inputSource, handler);
And the handler has a resolveEntity method, which SAXParser.parse() calls:
public InputSource resolveEntity(String publicId, String systemId) throws SAXException {
InputSource inputSource = null;
try {
inputSource = entityResolver.resolveEntity(publicId, systemId);
}
catch (IOException e) {
throw new SAXException(e);
}
return inputSource;
}
When I pass in an XML file with an ENTITY reference, nothing seems to happen to the prohibited ENTITY reference: no exception is thrown and nothing is stripped.
Here is an example of the bad XML I am using. The DTD should be allowed, but the !ENTITY line should be disallowed:
<!DOCTYPE foo SYSTEM "foo.dtd" [
<!ENTITY gotcha SYSTEM "file:///gotcha.txt"> <!-- This is disallowed-->
]>
<label>&gotcha;</label>
What do I need to do to make sure that ENTITY references are disallowed in the XML, but that DTDs are still allowed?

Set an org.xml.sax.ext.DeclHandler on the SAXParser.
parser.setProperty("http://xml.org/sax/properties/declaration-handler", myDeclHandler);
The DeclHandler gets notified when an internal entity declaration is parsed. To disallow entity declarations you can simply throw a SAXException:
public class MyDeclHandler extends org.xml.sax.ext.DefaultHandler2 {
public void internalEntityDecl(String name, String value) throws SAXException {
throw new SAXException("not allowed");
}
}
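Note that the example in the question declares an external entity (SYSTEM "file:///gotcha.txt"), which DeclHandler reports through externalEntityDecl rather than internalEntityDecl. A minimal sketch that rejects both kinds, while still accepting a DOCTYPE without entity declarations, might look like the following. The class name EntityDeclGuard and the helper isAccepted are made up for illustration, and the sketch treats any SAXException (including plain syntax errors) as a rejection.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler: DefaultHandler2 already implements DeclHandler with no-op methods.
class EntityDeclGuard extends DefaultHandler2 {

    @Override
    public void internalEntityDecl(String name, String value) throws SAXException {
        throw new SAXException("entity declaration not allowed: " + name);
    }

    @Override
    public void externalEntityDecl(String name, String publicId, String systemId)
            throws SAXException {
        throw new SAXException("external entity declaration not allowed: " + name);
    }

    /** Returns false if parsing fails, e.g. because an ENTITY declaration was found. */
    static boolean isAccepted(String xml) {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            EntityDeclGuard guard = new EntityDeclGuard();
            // Register the guard so the parser reports DTD declarations to it.
            reader.setProperty("http://xml.org/sax/properties/declaration-handler", guard);
            reader.setContentHandler(guard);
            reader.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String withDtdOnly = "<!DOCTYPE label [<!ELEMENT label (#PCDATA)>]><label>ok</label>";
        String withEntity = "<!DOCTYPE label [<!ENTITY gotcha SYSTEM \"file:///gotcha.txt\">]>"
                + "<label>&gotcha;</label>";
        System.out.println("DTD only accepted: " + isAccepted(withDtdOnly));
        System.out.println("ENTITY accepted:   " + isAccepted(withEntity));
    }
}
```

The declaration is reported while the DTD is being read, so the exception aborts the parse before the entity reference in the document body would be resolved.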

JAXB Validator does not detect syntax errors?

I want to validate an XML file against its XSD before unmarshalling it.
The code is as follows :
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(xsdFilePath);
Validator validator = schema.newValidator();
validator.setErrorHandler(new MyValidationErrorHandler());
validator.validate(new StreamSource(xmlFilePath));
I found that when an XML element is not closed, the Validator fails to record it as an error, but the Unmarshaller recognizes this and throws an "Invalid content was found starting with element.." error.
I want the Validation and the Unmarshalling/Marshalling to be different operations.
Are there ways to have the Validator detect such syntax errors in the xml file?

You'll have to distinguish two things:
The elementary syntax of an XML document
The document's compliance with an XML Schema
If the elementary syntax isn't right, there is no document whose element structure, attribute existence, value compliance with facets and so on could be investigated.
I'm afraid you'll have to catch both kinds of exceptions.
You may, however, handle everything in a single unmarshalling operation:
JAXBContext payloadContext = JAXBContext.newInstance("generated");
Unmarshaller unmarshaller = payloadContext.createUnmarshaller();
unmarshaller.setSchema(schemaFactory.newSchema(... ));
unmarshaller.setEventHandler( new ValidationEventHandler(){
public boolean handleEvent(ValidationEvent event) {
System.out.println( "Event! " + event );
return true;
}
} );
Later
To have validation only, you'll still have to parse, but if you don't have JAXB-ish classes, you get by with JAXP:
static class Handler implements ErrorHandler {
public void error(SAXParseException exception){
System.out.println( "error: " + exception.getMessage() );
}
public void fatalError(SAXParseException exception){
System.out.println( "fatal: " + exception.getMessage() );
}
public void warning(SAXParseException exception){
System.out.println( "warning: " + exception.getMessage() );
}
}
Handler handler = new Handler();
DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
parser.setErrorHandler( handler );
try {
Document document = parser.parse(new File("test.xml"));
SchemaFactory factory =
SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Source schemaFile = new StreamSource(new File("test.xsd"));
Schema schema = factory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.setErrorHandler( handler );
try {
validator.validate(new DOMSource(document));
} catch (SAXException e) {
// ...
System.out.println( "Validation error" );
}
} catch (SAXParseException e) {
// syntax error in XML document
System.out.println( "Syntax error" );
}
For validation, once an error handler is set the errors are reported to it and no SAXParseException is thrown (unless the handler throws one itself), so one of these two mechanisms is redundant.
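To make the distinction between syntax errors and schema violations concrete, here is a small sketch using only the JAXP Validator. It relies on the usual convention that well-formedness problems arrive through fatalError while schema violations arrive through error. The class name XmlCheck, the Result enum and the helper check are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper that classifies a document without unmarshalling it.
class XmlCheck {
    enum Result { VALID, INVALID, MALFORMED }

    static Result check(String xsd, String xml) {
        final boolean[] schemaViolation = {false};
        Validator validator;
        try {
            validator = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)))
                    .newValidator();
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { /* ignored in this sketch */ }
            // Schema violations: remember them and let validation continue.
            @Override public void error(SAXParseException e) { schemaViolation[0] = true; }
            // Syntax (well-formedness) errors: abort by rethrowing.
            @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException e) {
            return Result.MALFORMED;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        return schemaViolation[0] ? Result.INVALID : Result.VALID;
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
                + "<xs:element name=\"n\" type=\"xs:positiveInteger\"/></xs:schema>";
        System.out.println(check(xsd, "<n>1</n>"));
        System.out.println(check(xsd, "<n>0</n>"));
        System.out.println(check(xsd, "<n>1"));
    }
}
```

Validating a StreamSource means the validator parses the text itself, so an unclosed element surfaces as a fatal error from the same call.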

How can I get more information on an invalid DOM element through the Validator?

I am validating an in-memory DOM object using the javax.xml.validation.Validator class against an XSD schema. A SAXParseException is thrown during the validation whenever there is some data corruption in the information I populate my DOM from.
An example error:
org.xml.sax.SAXParseException: cvc-datatype-valid.1.2.1: '???"??[?????G?>???p~tn??~0?1]' is not a valid value for 'hexBinary'.
What I am hoping is that there is a way to find the location of this error in my in-memory DOM and print out the offending element and its parent element. My current code is:
public void writeDocumentToFile(Document document) throws XMLWriteException {
try {
// Validate the document against the schema
Validator validator = getSchema(xmlSchema).newValidator();
validator.validate(new DOMSource(document));
// Serialisation logic here.
} catch(SAXException e) {
throw new XMLWriteException(e); // This is being thrown
} // Some other exceptions caught here.
}
private Schema getSchema(URL schema) throws SAXException {
SchemaFactory schemaFactory =
SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// Some logic here to specify a ResourceResolver
return schemaFactory.newSchema(schema);
}
I have looked into the Validator#setErrorHandler(ErrorHandler handler) method but the ErrorHandler interface only gives me exposure to a SAXParseException which only exposes the line number and column number of the error. Because I am using an in-memory DOM this returns -1 for both line and column number.
Is there a better way to do this? I don't really want to have to manually validate the Strings before I add them to the DOM if the libraries provide me the function I'm looking for.
I'm using JDK 6 update 26 and JDK 6 update 7 depending on where this code is running.
EDIT: With this code added -
validator.setErrorHandler(new ErrorHandler() {
@Override
public void warning(SAXParseException exception) throws SAXException {
printException(exception);
throw exception;
}
@Override
public void error(SAXParseException exception) throws SAXException {
printException(exception);
throw exception;
}
@Override
public void fatalError(SAXParseException exception) throws SAXException {
printException(exception);
throw exception;
}
private void printException(SAXParseException exception) {
System.out.println("exception.getPublicId() = " + exception.getPublicId());
System.out.println("exception.getSystemId() = " + exception.getSystemId());
System.out.println("exception.getColumnNumber() = " + exception.getColumnNumber());
System.out.println("exception.getLineNumber() = " + exception.getLineNumber());
}
});
I get the output:
exception.getPublicId() = null
exception.getSystemId() = null
exception.getColumnNumber() = -1
exception.getLineNumber() = -1

If you are using Xerces (the Sun JDK default), you can get the element that failed validation through the http://apache.org/xml/properties/dom/current-element-node property:
...
catch (SAXParseException e)
{
Element curElement = (Element)validator.getProperty("http://apache.org/xml/properties/dom/current-element-node");
System.out.println("Validation error: " + e.getMessage());
System.out.println("Element: " + curElement);
}
Example:
String xml = "<root xmlns=\"http://www.myschema.org\">\n" +
"<text>This is text</text>\n" +
"<number>32</number>\n" +
"<number>abc</number>\n" +
"</root>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
Document doc = dbf.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
Schema schema = getSchema(getClass().getResource("myschema.xsd"));
Validator validator = schema.newValidator();
try
{
validator.validate(new DOMSource(doc));
}
catch (SAXParseException e)
{
Element curElement = (Element)validator.getProperty("http://apache.org/xml/properties/dom/current-element-node");
System.out.println("Validation error: " + e.getMessage());
System.out.println(curElement.getLocalName() + ": " + curElement.getTextContent());
//Use curElement.getParentNode() or whatever you need here
}
If you need to get line/column numbers from the DOM, this answer has a solution to that problem.
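Since the question asks for the offending element together with its parent, one option is to turn the node returned by the property into a simple path string. The following is only a sketch using plain DOM calls; the class name NodePaths and the methods pathOf and parse are made up for illustration, and the path format (element name plus 1-based position among same-named siblings) is an arbitrary choice.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for describing where a DOM element sits in its document.
class NodePaths {

    /** Builds a path such as /root[1]/number[2] by walking up through the ancestors. */
    static String pathOf(Node node) {
        StringBuilder path = new StringBuilder();
        for (Node n = node; n != null && n.getNodeType() == Node.ELEMENT_NODE;
                n = n.getParentNode()) {
            // Count preceding siblings with the same name to get a 1-based index.
            int index = 1;
            for (Node s = n.getPreviousSibling(); s != null; s = s.getPreviousSibling()) {
                if (s.getNodeType() == Node.ELEMENT_NODE
                        && s.getNodeName().equals(n.getNodeName())) {
                    index++;
                }
            }
            path.insert(0, "/" + n.getNodeName() + "[" + index + "]");
        }
        return path.toString();
    }

    /** Convenience parser for the demo; wraps checked exceptions. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            return dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root><text>This is text</text>"
                + "<number>32</number><number>abc</number></root>");
        Node second = doc.getElementsByTagName("number").item(1);
        System.out.println(pathOf(second));
    }
}
```

In the catch block above, pathOf(curElement) could be printed next to the validation message to locate the element even though line and column numbers are -1 for a DOM source.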

SAXParseException exposes the SystemId and PublicId. Does that not give you enough information?

How to skip well-formed for java DOM parser

I know this has been asked multiple times here, but I have a different issue in dealing with it. In my case, the app receives a DOM structure that is not well-formed, passed as a string. Here's a sample:
<div class='video yt'><div class='yt_url'>http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata</div></div>
As you can see, the content is not well-formed. Now, if I try to parse it using a normal SAX or DOM parser, it throws an exception, which is understandable:
org.xml.sax.SAXParseException: The reference to entity "feature" must end with the ';' delimiter.
As per the requirement, I need to read this document, add a few additional div tags and send the content back as a string. This works great with a DOM parser, as I can read through the input structure and add additional tags at their required positions.
I tried using tools like JTidy to pre-process and then parse, but that converts the document to a full-blown HTML document, which I don't want. Here's some sample code:
StringWriter writer = new StringWriter();
Tidy tidy = new Tidy(); // obtain a new Tidy instance
tidy.setXHTML(true);
tidy.parse(new ByteArrayInputStream(content.getBytes()), writer);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new ByteArrayInputStream(writer.toString().getBytes()));
// Traverse thru the content and add new tags
....
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(doc);
transformer.transform(source, result);
This completely converts the input to a well-formed HTML document. It then becomes hard to remove the HTML tags manually. The other option I tried was to use SAX2DOM, which also creates an HTML document. Here's some sample code:
ByteArrayInputStream is = new ByteArrayInputStream(content.getBytes());
Parser p = new Parser();
p.setFeature(IContentExtractionConstant.SAX_NAMESPACE,true);
SAX2DOM sax2dom = new SAX2DOM();
p.setContentHandler(sax2dom);
p.parse(new InputSource(is));
Document doc = (Document)sax2dom.getDOM();
I'll appreciate if someone can share their ideas.
Thanks

The simplest way is replacing xml reserved characters with the corresponding xml entities. You can do this manually:
content = content.replaceAll("&", "&amp;");
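A blanket replacement also rewrites ampersands that already start a valid reference such as &amp;amp; or &amp;#38;. A slightly more careful sketch escapes only ampersands that are not followed by something shaped like an entity or character reference. The class name AmpersandEscaper and the regular expression are illustrative assumptions, not part of any library, and a reference to an undeclared entity that happens to end with ';' would still be left alone and fail to parse.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing helper for almost-XML input.
class AmpersandEscaper {

    // Matches '&' unless it begins a name reference (&name;), a decimal
    // character reference (&#38;) or a hexadecimal one (&#x26;).
    private static final Pattern BARE_AMPERSAND =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static String escapeBareAmpersands(String content) {
        return BARE_AMPERSAND.matcher(content).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String content = "<div class='video yt'><div class='yt_url'>"
                + "http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata"
                + "</div></div>";
        System.out.println(escapeBareAmpersands(content));
    }
}
```

After this step the sample from the question is well-formed and can be handed to a regular DocumentBuilder.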
If you don't want to modify your string before parsing it, I can propose another way using a SAXParser, but this solution is more complicated. Basically you have to:
write a LexicalHandler in combination with a ContentHandler
tell the parser to continue its execution after a fatal error (the ErrorHandler isn't enough)
treat undeclared entities as simple text
UPDATE
According to your comment, I'm going to add some details regarding the second solution. I've written a class which extends DefaultHandler (the default implementation of EntityResolver, DTDHandler, ContentHandler and ErrorHandler) and implements LexicalHandler. I've overridden ErrorHandler's fatalError method (my implementation does nothing instead of throwing the exception) and ContentHandler's characters method, which works in combination with the startEntity method of LexicalHandler.
public class MyHandler extends DefaultHandler implements LexicalHandler {
private String currentEntity = null;
@Override
public void fatalError(SAXParseException e) throws SAXException {
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
String content = new String(ch, start, length);
if (currentEntity != null) {
content = "&" + currentEntity + content;
currentEntity = null;
}
System.out.print(content);
}
@Override
public void startEntity(String name) throws SAXException {
currentEntity = name;
}
@Override
public void endEntity(String name) throws SAXException {
}
@Override
public void startDTD(String name, String publicId, String systemId)
throws SAXException {
}
@Override
public void endDTD() throws SAXException {
}
@Override
public void startCDATA() throws SAXException {
}
@Override
public void endCDATA() throws SAXException {
}
@Override
public void comment(char[] ch, int start, int length) throws SAXException {
}
}
This is my main method, which parses your XML even though it is not well-formed. The setFeature call is very important, because without it the parser throws the SAXParseException despite the empty ErrorHandler implementation.
public static void main(String[] args) throws ParserConfigurationException,
SAXException, IOException {
String xml = "<div class='video yt'><div class='yt_url'>http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata</div></div>";
SAXParser saxParser = SAXParserFactory.newInstance().newSAXParser();
XMLReader xmlReader = saxParser.getXMLReader();
MyHandler myHandler = new MyHandler();
xmlReader.setContentHandler(myHandler);
xmlReader.setErrorHandler(myHandler);
xmlReader.setProperty("http://xml.org/sax/properties/lexical-handler",
myHandler);
xmlReader.setFeature(
"http://apache.org/xml/features/continue-after-fatal-error",
true);
xmlReader.parse(new InputSource(new StringReader(xml)));
}
This main prints out the content of your div element which contains the error:
http://www.youtube.com/watch?v=U_QLu_Twd0g&feature=abcde_gdata
Keep in mind that this is an example which works with your input, maybe you'll have to complete it...for instance if you have some characters correctly escaped you should add some lines of code to handle this situation etc.
Hope this helps.

Validate an XML File Against Multiple Schema Definitions

I'm trying to validate an XML file against a number of different schemas (apologies for the contrived example):
a.xsd
b.xsd
c.xsd
c.xsd in particular includes b.xsd, and b.xsd includes a.xsd, using:
<xs:include schemaLocation="b.xsd"/>
I'm trying to do this via Xerces in the following manner:
XMLSchemaFactory xmlSchemaFactory = new XMLSchemaFactory();
Schema schema = xmlSchemaFactory.newSchema(new StreamSource[] { new StreamSource(this.getClass().getResourceAsStream("a.xsd"), "a.xsd"),
new StreamSource(this.getClass().getResourceAsStream("b.xsd"), "b.xsd"),
new StreamSource(this.getClass().getResourceAsStream("c.xsd"), "c.xsd")});
Validator validator = schema.newValidator();
validator.validate(new StreamSource(new StringReader(xmlContent)));
but this fails to load all three of the schemas correctly, resulting in the error: cannot resolve the name 'blah' to a(n) 'group' component.
I've validated this successfully using Python, but having real problems with Java 6.0 and Xerces 2.8.1. Can anybody suggest what's going wrong here, or an easier approach to validate my XML documents?

So just in case anybody else runs into the same issue here, I needed to load a parent schema (and implicit child schemas) from a unit test, as a resource, to validate an XML String. I used the Xerces XMLSchemaFactory to do this along with the Java 6 validator.
In order to load the child schemas correctly via an include I had to write a custom resource resolver. Code can be found here:
https://code.google.com/p/xmlsanity/source/browse/src/com/arc90/xmlsanity/validation/ResourceResolver.java
To use the resolver specify it on the schema factory:
xmlSchemaFactory.setResourceResolver(new ResourceResolver());
and it will use it to resolve your resources via the classpath (in my case from src/main/resources). Any comments are welcome on this...

http://www.kdgregory.com/index.php?page=xml.parsing
section 'Multiple schemas for a single document'
My solution based on that document:
URL xsdUrlA = this.getClass().getResource("a.xsd");
URL xsdUrlB = this.getClass().getResource("b.xsd");
URL xsdUrlC = this.getClass().getResource("c.xsd");
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
//---
String W3C_XSD_TOP_ELEMENT =
"<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?>\n"
+ "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" elementFormDefault=\"qualified\">\n"
+ "<xs:include schemaLocation=\"" +xsdUrlA.getPath() +"\"/>\n"
+ "<xs:include schemaLocation=\"" +xsdUrlB.getPath() +"\"/>\n"
+ "<xs:include schemaLocation=\"" +xsdUrlC.getPath() +"\"/>\n"
+"</xs:schema>";
Schema schema = schemaFactory.newSchema(new StreamSource(new StringReader(W3C_XSD_TOP_ELEMENT), "xsdTop"));

The schema stuff in Xerces is (a) very, very pedantic, and (b) gives utterly useless error messages when it doesn't like what it finds. It's a frustrating combination.
The schema stuff in python may be a lot more forgiving, and was letting small errors in the schema go past unreported.
Now if, as you say, c.xsd includes b.xsd, and b.xsd includes a.xsd, then there's no need to load all three into the schema factory. Not only is it unnecessary, it will likely confuse Xerces and result in errors, so this may be your problem. Just pass c.xsd to the factory, and let it resolve b.xsd and a.xsd itself, which it should do relative to c.xsd.
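As a rough illustration of that advice, the sketch below writes three tiny chained schemas into a temporary directory and hands only c.xsd to the factory as a URL, so the includes are resolved relative to it. The schema contents, the class name IncludeChainDemo and the helper isValid are invented for this example; only the file names a.xsd, b.xsd and c.xsd come from the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical demo: c.xsd includes b.xsd, which includes a.xsd.
class IncludeChainDemo {
    private static final String XS = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">";

    static boolean isValid(String xml) {
        try {
            Path dir = Files.createTempDirectory("xsd-demo");
            write(dir, "a.xsd", XS
                    + "<xs:simpleType name=\"Qty\"><xs:restriction base=\"xs:positiveInteger\"/>"
                    + "</xs:simpleType></xs:schema>");
            write(dir, "b.xsd", XS
                    + "<xs:include schemaLocation=\"a.xsd\"/>"
                    + "<xs:complexType name=\"Order\"><xs:sequence>"
                    + "<xs:element name=\"qty\" type=\"Qty\"/>"
                    + "</xs:sequence></xs:complexType></xs:schema>");
            write(dir, "c.xsd", XS
                    + "<xs:include schemaLocation=\"b.xsd\"/>"
                    + "<xs:element name=\"order\" type=\"Order\"/></xs:schema>");

            // Only the top-level schema is passed in; its URL is the base for the includes.
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(dir.resolve("c.xsd").toUri().toURL());
            try {
                schema.newValidator().validate(new StreamSource(new StringReader(xml)));
                return true;
            } catch (SAXException e) {
                return false; // the default error handling throws on the first violation
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    private static void write(Path dir, String name, String content) throws IOException {
        Files.write(dir.resolve(name), content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order><qty>3</qty></order>"));
        System.out.println(isValid("<order><qty>0</qty></order>"));
    }
}
```

The temporary files are left in place for simplicity; a real test would clean them up. When the schemas live on the classpath, getClass().getResource("c.xsd") gives a URL that can be used the same way, provided the resource is not read through a plain InputStream without a system ID.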

From the xerces documentation :
http://xerces.apache.org/xerces2-j/faq-xs.html
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
...
StreamSource[] schemaDocuments = /* created by your application */;
Source instanceDocument = /* created by your application */;
SchemaFactory sf = SchemaFactory.newInstance(
"http://www.w3.org/XML/XMLSchema/v1.1");
Schema s = sf.newSchema(schemaDocuments);
Validator v = s.newValidator();
v.validate(instanceDocument);

I faced the same problem and after investigating found this solution. It works for me.
Enum to setup the different XSDs:
public enum XsdFile {
// @formatter:off
A("a.xsd"),
B("b.xsd"),
C("c.xsd");
// @formatter:on
private final String value;
private XsdFile(String value) {
this.value = value;
}
public String getValue() {
return this.value;
}
}
Method to validate:
public static void validateXmlAgainstManyXsds() {
final SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
String xmlFile;
xmlFile = "example.xml";
// Use of Enum class in order to get the different XSDs
Source[] sources = new Source[XsdFile.class.getEnumConstants().length];
for (XsdFile xsdFile : XsdFile.class.getEnumConstants()) {
sources[xsdFile.ordinal()] = new StreamSource(xsdFile.getValue());
}
try {
final Schema schema = schemaFactory.newSchema(sources);
final Validator validator = schema.newValidator();
System.out.println("Validating " + xmlFile + " against XSDs " + Arrays.toString(sources));
validator.validate(new StreamSource(new File(xmlFile)));
} catch (Exception exception) {
System.out.println("ERROR: Unable to validate " + xmlFile + " against XSDs " + Arrays.toString(sources)
+ " - " + exception);
}
System.out.println("Validation process completed.");
}

I ended up using this:
import org.apache.xerces.parsers.SAXParser;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;
import java.io.IOException;
.
.
.
try {
SAXParser parser = new SAXParser();
parser.setFeature("http://xml.org/sax/features/validation", true);
parser.setFeature("http://apache.org/xml/features/validation/schema", true);
parser.setFeature("http://apache.org/xml/features/validation/schema-full-checking", true);
parser.setProperty("http://apache.org/xml/properties/schema/external-noNamespaceSchemaLocation", "http://your_url_schema_location");
Validator handler = new Validator();
parser.setErrorHandler(handler);
parser.parse("file:///" + "/home/user/myfile.xml");
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException ex) {
ex.printStackTrace();
}
class Validator extends DefaultHandler {
public boolean validationError = false;
public SAXParseException saxParseException = null;
public void error(SAXParseException exception)
throws SAXException {
validationError = true;
saxParseException = exception;
}
public void fatalError(SAXParseException exception)
throws SAXException {
validationError = true;
saxParseException = exception;
}
public void warning(SAXParseException exception)
throws SAXException {
}
}
Remember to change:
1) The parameter "http://your_url_schema_location" to your XSD file location.
2) The string "/home/user/myfile.xml" to the path of your XML file.
I didn't have to set the variable: -Djavax.xml.validation.SchemaFactory:http://www.w3.org/2001/XMLSchema=org.apache.xerces.jaxp.validation.XMLSchemaFactory

Just in case anybody still comes here looking for a way to validate XML or an object against multiple XSDs, here is what worked for me:
//Using **URL** is the most important here. With URL, the relative paths are resolved for include, import inside the xsd file. Just get the parent level xsd here (not all included xsds).
URL xsdUrl = getClass().getClassLoader().getResource("my/parent/schema.xsd");
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = schemaFactory.newSchema(xsdUrl);
JAXBContext jaxbContext = JAXBContext.newInstance(MyClass.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
unmarshaller.setSchema(schema);
/* If you need to validate object against xsd, uncomment this
ObjectFactory objectFactory = new ObjectFactory();
JAXBElement<MyClass> wrappedObject = objectFactory.createMyClassObject(myClassObject);
marshaller.marshal(wrappedShipmentMessage, new DefaultHandler());
*/
unmarshaller.unmarshal(getClass().getClassLoader().getResource("your/xml/file.xml"));

If all XSDs belong to the same namespace, then create a new XSD and include the other XSDs in it. Then in Java create the schema from the new XSD.
Schema schema = xmlSchemaFactory.newSchema(
new StreamSource(this.getClass().getResourceAsStream("/path/to/all_in_one.xsd")));
all_in_one.xsd :
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:ex="http://example.org/schema/"
targetNamespace="http://example.org/schema/"
elementFormDefault="unqualified"
attributeFormDefault="unqualified">
<xs:include schemaLocation="relative/path/to/a.xsd"></xs:include>
<xs:include schemaLocation="relative/path/to/b.xsd"></xs:include>
<xs:include schemaLocation="relative/path/to/c.xsd"></xs:include>
</xs:schema>
