I have an XSD file and an XML file, and I am validating the XML against the XSD with the following code:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(true);
factory.setAttribute(
"http://java.sun.com/xml/jaxp/properties/schemaLanguage",
"http://www.w3.org/2001/XMLSchema");
factory.setAttribute(
"http://java.sun.com/xml/jaxp/properties/schemaSource",
new InputSource(new StringReader(xsd)));
Document doc = null;
try {
DocumentBuilder parser = factory.newDocumentBuilder();
MyErrorHandler errorHandler = new MyErrorHandler();
parser.setErrorHandler(errorHandler);
doc = parser.parse(new InputSource(new StringReader(xml)));
return true;
} catch (ParserConfigurationException e) {
System.out.println("Parser not configured: " + e.getMessage());
} catch (SAXException e) {
System.out.print("Parsing XML failed due to a "
+ e.getClass().getName() + ":");
System.out.println(e.getMessage());
} catch (IOException e) {
System.out.println("IOException thrown");
e.printStackTrace();
}
return false;
MyErrorHandler is:
private static class MyErrorHandler implements ErrorHandler {
public void warning(SAXParseException spe) throws SAXException {
System.out.println("Warning: " + spe.getMessage() + " getColumnNumber is " + spe.getColumnNumber() + " getLineNumber " + spe.getLineNumber() + " getPublicId " + spe.getPublicId() + " getSystemId " + spe.getSystemId());
}
public void error(SAXParseException spe) throws SAXException {
System.out.println("Error: " + spe.getMessage() + " getColumnNumber is " + spe.getColumnNumber() + " getLineNumber " + spe.getLineNumber() + " getPublicId " + spe.getPublicId() + " getSystemId " + spe.getSystemId());
throw new SAXException("Error: " + spe.getMessage());
}
public void fatalError(SAXParseException spe) throws SAXException {
System.out.println("Fatal Error: " + spe.getMessage() + " getColumnNumber is " + spe.getColumnNumber() + " getLineNumber " + spe.getLineNumber() + " getPublicId " + spe.getPublicId() + " getSystemId " + spe.getSystemId());
throw new SAXException("Fatal Error: " + spe.getMessage());
}
}
When the XML does not comply with the XSD I get an exception, but the exception does not contain the name of the XSD element that caused the error. The message looks like this:
Parsing XML failed due to a org.xml.sax.SAXException:Error: cvc-minLength-valid: Value '' with length = '0' is not facet-valid with respect to minLength '1' for type 'null'.
Instead of the name of the XSD element, the error message only contains the offending value ''. Because of this I cannot find, and display to the user, the exact element that is causing the error.
My xsd element looks like this
<xs:element name="FullName_FirstName">
<xs:annotation>
<xs:appinfo>
<ie:label>First Name</ie:label>
<ie:html_element>0</ie:html_element>
</xs:appinfo>
</xs:annotation>
<xs:simpleType>
<xs:restriction base="xs:string">
<xs:minLength value="1"/>
</xs:restriction>
</xs:simpleType>
</xs:element>
Thanks in advance
First of all, some advice. You don't need to build a DOM document just to do validation. Doing so causes a large memory overhead and can even run out of memory on large input documents. You could just use a SAXParser. If you're using Java 1.5 or later, even that isn't necessary: from that version on, an XML validation API is included in Java SE. Check the package javax.xml.validation for more information. The idea is that you first build a Schema object and then obtain a Validator from it to do the validation. The Validator accepts a Source as input and can be given an ErrorHandler, so you can simply reuse your class. Of course, it is possible that you actually need a DOM, but in that case it is still better to create a Schema instance and register it with your DocumentBuilderFactory.
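To make the javax.xml.validation route described above more concrete, here is a minimal sketch. The class name ValidatorSketch, the one-element schema and the sample documents are made up for illustration; only the SchemaFactory, Schema, Validator and ErrorHandler calls are standard JDK API. It collects the reported messages in a list instead of throwing on the first error.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

// Hypothetical example class, not part of the question's code.
class ValidatorSketch {
    // Made-up schema: a single <name> element that must not be empty.
    static final String XSD =
        "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xs:element name=\"name\"><xs:simpleType>"
      + "<xs:restriction base=\"xs:string\"><xs:minLength value=\"1\"/></xs:restriction>"
      + "</xs:simpleType></xs:element></xs:schema>";

    /** Validates xml against xsd without building a DOM; returns the reported messages. */
    static List<String> validate(String xsd, String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        Validator validator = schema.newValidator();
        final List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) { problems.add(e.getMessage()); }
            @Override public void error(SAXParseException e) { problems.add(e.getMessage()); }
            // Fatal errors (not well-formed input) still abort validation.
            @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validate(XSD, "<name>Ann</name>"));
        System.out.println(validate(XSD, "<name></name>"));
    }
}
```

The first call should print an empty list and the second a list with the validation messages for the empty element.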
Now, for the actual problem. It isn't entirely straightforward, because the SAXParseException doesn't give you much context. Your best bet is to hook up a ContentHandler and keep track of the element you're in, or some other positional information, and hand that to the error handler when needed. DefaultHandler (package org.xml.sax.helpers) or DefaultHandler2 (package org.xml.sax.ext) is a convenient way of combining error and content handling in one class.
I've put together a test, posted below. I get two lines of output instead of the expected one; I'm not certain whether that is because I'm using a Schema or because I keep processing instead of throwing an exception. The second line does contain the name of the element, so that might be enough. Instead of throwing an exception and ending the parse, you could set a flag when an error occurs.
package jaxb.test;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;
public class ValidationTest {
public static void main(String[] args) throws Exception {
//Test XML and schema
final String xml = "<?xml version=\"1.0\"?><test><test2></test2></test>";
final String schemaString =
"<?xml version=\"1.0\"?>"
+ "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\" elementFormDefault=\"unqualified\" attributeFormDefault=\"unqualified\">"
+ "<xsd:element name=\"test\" type=\"Test\"/>"
+ "<xsd:element name=\"test2\" type=\"Test2\"/>"
+ "<xsd:complexType name=\"Test\">"
+ "<xsd:sequence>"
+ "<xsd:element ref=\"test2\" minOccurs=\"1\" maxOccurs=\"unbounded\"/>"
+ "</xsd:sequence>"
+ "</xsd:complexType>"
+ "<xsd:simpleType name=\"Test2\">"
+ "<xsd:restriction base=\"xsd:string\"><xsd:minLength value=\"1\"/></xsd:restriction>"
+ "</xsd:simpleType>"
+ "</xsd:schema>";
//Building a Schema instance
final Source schemaSource =
new StreamSource(new StringReader(schemaString));
final Schema schema =
SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI).newSchema(schemaSource);
//Creating a SAXParser for our input XML
//First the factory
final SAXParserFactory factory = SAXParserFactory.newInstance();
//Must be namespace aware to receive element names
factory.setNamespaceAware(true);
//Setting the Schema for validation
factory.setSchema(schema);
//Now the parser itself
final SAXParser parser = factory.newSAXParser();
//Creating an instance of our special handler
final MyContentHandler handler = new MyContentHandler();
//Parsing
parser.parse(new InputSource(new StringReader(xml)), handler);
}
private static class MyContentHandler extends DefaultHandler {
private String element = "";
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if(localName != null && !localName.isEmpty())
element = localName;
else
element = qName;
}
@Override
public void warning(SAXParseException exception) throws SAXException {
System.out.println(element + ": " + exception.getMessage());
}
@Override
public void error(SAXParseException exception) throws SAXException {
System.out.println(element + ": " + exception.getMessage());
}
@Override
public void fatalError(SAXParseException exception) throws SAXException {
System.out.println(element + ": " + exception.getMessage());
}
public String getElement() {
return element;
}
}
}
It's a bit rough, but you can build on this to get what you need.
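The test above remembers only the most recently started element. A variation that keeps the whole path of open elements on a stack could look like the sketch below. The class name ElementPathHandler and the "/a/b: message" format are assumptions made for this example, and the schema is a trimmed copy of the test schema above. The sketch assumes the validator reports an error before the matching endElement callback reaches the handler, which is what I would expect from the parser bundled with the JDK, but it is worth checking with your own parser.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks the open elements so each error can be prefixed with a path.
class ElementPathHandler extends DefaultHandler {
    static final String XSD =
        "<xsd:schema xmlns:xsd=\"http://www.w3.org/2001/XMLSchema\">"
      + "<xsd:element name=\"test\"><xsd:complexType><xsd:sequence>"
      + "<xsd:element name=\"test2\" maxOccurs=\"unbounded\"><xsd:simpleType>"
      + "<xsd:restriction base=\"xsd:string\"><xsd:minLength value=\"1\"/></xsd:restriction>"
      + "</xsd:simpleType></xsd:element>"
      + "</xsd:sequence></xsd:complexType></xsd:element></xsd:schema>";

    private final Deque<String> path = new ArrayDeque<>();
    final List<String> errors = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(localName.isEmpty() ? qName : localName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.removeLast();
    }

    @Override
    public void error(SAXParseException e) {
        // Record the error instead of throwing, so that parsing continues.
        errors.add("/" + String.join("/", path) + ": " + e.getMessage());
    }

    static List<String> validate(String xsd, String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setSchema(schema);
        ElementPathHandler handler = new ElementPathHandler();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.errors;
    }

    public static void main(String[] args) throws Exception {
        for (String error : validate(XSD, "<test><test2></test2></test>")) {
            System.out.println(error);
        }
    }
}
```

Because the handler only records errors, the caller decides afterwards whether a non-empty list means the document is rejected.");
if (errs.isEmpty()) throw new AssertionError("expected at least one validation error");
if (!errs.get(0).startsWith("/test/test2:")) throw new AssertionError(errs.toString());
if (!ElementPathHandler.validate(ElementPathHandler.XSD, "<test><test2>x</test2></test>").isEmpty()) throw new AssertionError("valid document reported errors");
</test>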
A shortened version of my XML file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<MzIdentML id="MS-GF+">
<SequenceCollection xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
<DBSequence length="146" id="DBSeq143">
<cvParam cvRef="PSI-MS" accession="MS:1001088"></cvParam>
</DBSequence>
<Peptide id="Pep7">
<PeptideSequence>MFLSFPTTK</PeptideSequence>
<Modification location="1" monoisotopicMassDelta="15.994915">
<cvParam cvRef="UNIMOD" accession="UNIMOD:35" name="Oxidation"></cvParam>
</Modification>
</Peptide>
<PeptideEvidence dBSequence_ref="DBSeq143" id="PepEv_160_1_18"></PeptideEvidence>
<PeptideEvidence dBSequence_ref="DBSeq143" id="PepEv_275_8_133"></PeptideEvidence>
</SequenceCollection>
</MzIdentML>
I want to get the DBSequence, Peptide and PeptideEvidence details separately, each including the attributes of the parent and of its children (and nested children, if there are any). In other words, I want all the attributes as key-value pairs for each of the sections illustrated below:
----------------------------------------------------------------------
<DBSequence length="146" id="DBSeq143">
<cvParam cvRef="PSI-MS" accession="MS:1001088"></cvParam>
</DBSequence>
----------------------------------------------------------------------
<Peptide id="Pep7">
<PeptideSequence>MFLSFPTTK</PeptideSequence>
<Modification location="1" monoisotopicMassDelta="15.994915">
<cvParam cvRef="UNIMOD" accession="UNIMOD:35" name="Oxidation"></cvParam>
</Modification>
</Peptide>
----------------------------------------------------------------------
<PeptideEvidence dBSequence_ref="DBSeq143" id="PepEv_160_1_18"></PeptideEvidence>
<PeptideEvidence dBSequence_ref="DBSeq143" id="PepEv_275_8_133"></PeptideEvidence>
----------------------------------------------------------------------
For example, if we consider <DBSequence> section:
<DBSequence length="146" id="DBSeq143">
<cvParam cvRef="PSI-MS" accession="MS:1001088"></cvParam>
</DBSequence>
should be output as:
DBSequence=>length=146;id=DBSeq143;cvRef=PSI-MS;accession=MS:1001088;
This is the code I wrote in SAX:
package lucene.parse;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class MzIdentMLSAXParser extends DefaultHandler {
private boolean isDBsequence = false;
String DBSequenceSection;
String PeptideEvidenceDocument;
public static void main(String[] argv) throws SAXException, ParserConfigurationException, IOException {
MzIdentMLSAXParser ps = new MzIdentMLSAXParser("file_path_here/sample.xml");
}
public MzIdentMLSAXParser(String dataDir) throws FileNotFoundException, SAXException, ParserConfigurationException, IOException {
FileInputStream fis = new FileInputStream(dataDir);
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
parser.parse(fis, this);
}
@Override
public void startElement(String uri, String localName, String qName, Attributes atts) throws SAXException {
if (qName.equals("DBSequence")) {
// each time we found a new DBSequence, we re-initialize DBSequenceSection
DBSequenceSection = "";
// get attributes of DBSequence
for (int i = 0; i < atts.getLength(); i++) {
DBSequenceSection += atts.getQName(i) + "=" + atts.getValue(i) + ";";
}
isDBsequence = true;
} else if ((qName.equals("cvParam")) && (isDBsequence)) {
// get the attributes of a cvParam that belongs to a DBSequence;
// there can be cvParam elements that do not belong to a DBSequence.
for (int i = 0; i < atts.getLength(); i++) {
DBSequenceSection += atts.getQName(i) + "=" + atts.getValue(i) + ";";
}
} else if (qName.equals("PeptideEvidence")) {
// each time we find a new PeptideEvidence, we re-initialize PeptideEvidenceDocument
PeptideEvidenceDocument = "";
for (int i = 0; i < atts.getLength(); i++) {
PeptideEvidenceDocument += atts.getQName(i) + "=" + atts.getValue(i) + ";";
}
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equals("DBSequence")) {
System.out.println(qName +"=>"+DBSequenceSection);
isDBsequence = false;
} else if (qName.equals("PeptideEvidence")) {
System.out.println(qName +"=>"+PeptideEvidenceDocument);
}
}
}
Is there an easier way of doing this? I have lots of tags like these with nested nodes. The challenge is that <cvParam> appears not only inside <DBSequence> but also inside other tags such as <Modification>. I tried StAX too, but couldn't make it work.
Here is a working example of using StAX. StAX excels when parsing known XML structures, but can be used for dynamic parsing too.
This code relies on knowing the structure: that we want the content of DBSequence, Peptide, and PeptideEvidence, and that PeptideSequence has text content while the others don't.
The methods use recursion to follow the structure of the XML.
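import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
public static void main(String[] args) throws Exception {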
String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" +
"<MzIdentML id=\"MS-GF+\">\n" +
" <SequenceCollection xmlns=\"http://psidev.info/psi/pi/mzIdentML/1.1\">\n" +
" <DBSequence length=\"146\" id=\"DBSeq143\">\n" +
" <cvParam cvRef=\"PSI-MS\" accession=\"MS:1001088\"></cvParam>\n" +
" </DBSequence>\n" +
" <Peptide id=\"Pep7\">\n" +
" <PeptideSequence>MFLSFPTTK</PeptideSequence>\n" +
" <Modification location=\"1\" monoisotopicMassDelta=\"15.994915\">\n" +
" <cvParam cvRef=\"UNIMOD\" accession=\"UNIMOD:35\" name=\"Oxidation\"></cvParam>\n" +
" </Modification>\n" +
" </Peptide>\n" +
" <PeptideEvidence dBSequence_ref=\"DBSeq143\" id=\"PepEv_160_1_18\"></PeptideEvidence>\n" +
" <PeptideEvidence dBSequence_ref=\"DBSeq143\" id=\"PepEv_275_8_133\"></PeptideEvidence>\n" +
" </SequenceCollection>\n" +
"</MzIdentML>";
XMLStreamReader reader = XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
try {
reader.nextTag();
search(reader);
} finally {
reader.close();
}
}
private static void search(XMLStreamReader reader) throws XMLStreamException {
// reader must be on START_ELEMENT upon entry, and will be on matching END_ELEMENT on return
assert reader.getEventType() == XMLStreamConstants.START_ELEMENT;
while (reader.nextTag() == XMLStreamConstants.START_ELEMENT) {
String name = reader.getLocalName();
switch (name) {
case "DBSequence":
case "Peptide":
case "PeptideEvidence": {
Map<String, String> props = new LinkedHashMap<>();
collectProps(reader, props);
System.out.println(name + ": " + props);
break; }
default:
search(reader);
}
}
}
private static void collectProps(XMLStreamReader reader, Map<String, String> props) throws XMLStreamException {
// reader must be on START_ELEMENT upon entry, and will be on matching END_ELEMENT on return
assert reader.getEventType() == XMLStreamConstants.START_ELEMENT;
for (int i = 0; i < reader.getAttributeCount(); i++)
props.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
String name = reader.getLocalName();
switch (name) {
case "PeptideSequence":
props.put(name, reader.getElementText());
break;
default:
while (reader.nextTag() == XMLStreamConstants.START_ELEMENT)
collectProps(reader, props);
}
}
OUTPUT
DBSequence: {length=146, id=DBSeq143, cvRef=PSI-MS, accession=MS:1001088}
Peptide: {id=Pep7, PeptideSequence=MFLSFPTTK, location=1, monoisotopicMassDelta=15.994915, cvRef=UNIMOD, accession=UNIMOD:35, name=Oxidation}
PeptideEvidence: {dBSequence_ref=DBSeq143, id=PepEv_160_1_18}
PeptideEvidence: {dBSequence_ref=DBSeq143, id=PepEv_275_8_133}
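For comparison, the same grouping can also be done in SAX without one flag per section, by counting nesting depth. This is only a sketch under assumptions: the class name SectionCollector is made up, the section names are hard-coded, and it collects attributes only (not the PeptideSequence text), using the "name=>key=value;" format from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers the attributes of a section element and all its descendants.
class SectionCollector extends DefaultHandler {
    private static final Set<String> SECTIONS =
            new HashSet<>(Arrays.asList("DBSequence", "Peptide", "PeptideEvidence"));

    final List<String> lines = new ArrayList<>();
    private StringBuilder current; // non-null while inside a section
    private String sectionName;
    private int depth;             // number of open elements inside the current section

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (current == null) {
            if (!SECTIONS.contains(localName)) {
                return; // outside any section: ignore
            }
            current = new StringBuilder();
            sectionName = localName;
            depth = 0;
        }
        depth++;
        for (int i = 0; i < atts.getLength(); i++) {
            current.append(atts.getLocalName(i)).append('=').append(atts.getValue(i)).append(';');
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current != null && --depth == 0) {
            lines.add(sectionName + "=>" + current);
            current = null; // section closed
        }
    }

    static List<String> collect(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed so that localName is populated
        SectionCollector handler = new SectionCollector();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<MzIdentML><SequenceCollection xmlns=\"http://psidev.info/psi/pi/mzIdentML/1.1\">"
                + "<DBSequence length=\"146\" id=\"DBSeq143\">"
                + "<cvParam cvRef=\"PSI-MS\" accession=\"MS:1001088\"></cvParam></DBSequence>"
                + "<PeptideEvidence dBSequence_ref=\"DBSeq143\" id=\"PepEv_160_1_18\"></PeptideEvidence>"
                + "</SequenceCollection></MzIdentML>";
        for (String line : collect(xml)) {
            System.out.println(line);
        }
    }
}
```

A cvParam is attributed to whichever section is open when it is encountered, so it no longer matters that the same tag appears under several parents.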
I want to validate an XML file while it's being parsed. Stand-alone validation with Validator.validate() says it's OK, no exception is thrown while parsing, but the [overridden] error() method from the parsing handler gets called. Why? Is there some state I need to initialize? If I deliberately make the XML file incorrect, stand-alone validation will fail and if I comment out validation, parsing also fails -- both with SAXParseException, and with the error I expected.
[code below edited down from actual stuff with lots of println's to illustrate the issue]
short.xsd:
<?xml version="1.0" encoding="UTF-8" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://www.myapp.com/sample"
xmlns:smp="http://www.myapp.com/sample"
elementFormDefault="qualified"
attributeFormDefault="unqualified">
<xs:element name="Sample" type="xs:string"/>
</xs:schema>
short.xml:
<?xml version="1.0"?>
<Sample
xmlns="http://www.myapp.com/sample"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.myapp.com/sample shortxsd.xsd"
>
hello
</Sample>
Source:
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;
public static void main(String... args)
throws IOException
{
String xsdFileName = "shortxsd.xsd";
URL xsdURL = Thread.currentThread().getContextClassLoader().getResource(xsdFileName);
String xsdPath = xsdURL.getPath();
System.out.println("xsdFileName: " + xsdFileName);
System.out.println("xsdURL: " + xsdURL);
System.out.println("xsdPath: " + xsdPath);
String xmlFileName = "shortxml.xml";
URL xmlURL = Thread.currentThread().getContextClassLoader().getResource(xmlFileName);
String xmlPath = xmlURL.getPath();
System.out.println("xmlFileName: " + xmlFileName);
System.out.println("xmlURL: " + xmlURL);
System.out.println("xmlPath: " + xmlPath);
/* Schema creation: */
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = null;
try {
schema = schemaFactory.newSchema(new File(xsdURL.getFile()));
}
catch (SAXException ex) {
System.out.println("Schema creation exception: " + ex);
return;
}
System.out.println("+ Schema creation OK");
/* Stand-alone Validation: */
Validator validator = schema.newValidator();
Source xmlFile = new StreamSource(xmlURL.openStream());
try {
validator.validate(xmlFile);
}
catch (SAXException ex) {
System.out.println("Validation exception: " + ex);
return;
}
System.out.println("+ Stand-alone Validation OK");
/* Parsing with validation: */
SAXParserFactory parserFactory = SAXParserFactory.newInstance();
parserFactory.setSchema(schema);
SAXParser saxParser = null;
try {
saxParser = parserFactory.newSAXParser();
}
catch (ParserConfigurationException |
SAXException ex) {
System.out.println("Parser creation exception: " + ex);
return;
}
System.out.println("+ Parser creation OK");
try (InputStream xmlInput = xmlURL.openStream()) {
saxParser.parse(xmlInput, new DefaultHandler()
{
@Override
public void error(SAXParseException e)
{
System.out.println("# Error: " + e);
}
});
}
catch (SAXException ex) {
System.out.println("Parsing exception: " + ex);
return;
}
System.out.println("+ Parsing OK");
}
Output:
xsdFileName: shortxsd.xsd
xsdURL: file:/.../target/classes/shortxsd.xsd
xsdPath: /.../target/classes/shortxsd.xsd
xmlFileName: shortxml.xml
xmlURL: file:/.../target/classes/shortxml.xml
xmlPath: /.../target/classes/shortxml.xml
+ Schema creation OK
+ Stand-alone Validation OK
+ Parser creation OK
# Error: org.xml.sax.SAXParseException; lineNumber: 6; columnNumber: 2; cvc-elt.1: Cannot find the declaration of element 'Sample'.
+ Parsing OK
You need to configure your SAX parser to be namespace-aware.
If you remove the xmlns="..." attribute from your XML document, and the xmlns:smp="..." and targetNamespace="..." attributes from your XML schema, your code does not produce any error messages. So the problem has to be something to do with namespaces.
To make the SAX parser namespace-aware, set an option in the factory:
/* Parsing with validation: */
SAXParserFactory parserFactory = SAXParserFactory.newInstance();
parserFactory.setNamespaceAware(true); // add this line
parserFactory.setSchema(schema);
I made this change to your code and it printed out four OK messages at the end.
I have an XML file with repeated tags like this:
<properties>
<definition>
<name>IP</name>
<description></description>
<defaultValue>10.1.1.1</defaultValue>
</definition>
<definition>
<name>Name</name>
<description></description>
<defaultValue>MyName</defaultValue>
</definition>
<definition>
<name>Environment</name>
<description></description>
<defaultValue>Production</defaultValue>
</definition>
</properties>
I want to update the default value of the definition whose name is Environment.
Is it possible to do that using a SAX parser?
Can you please point me to the proper documentation?
So far I have parsed the document, but when I update defaultValue it updates all the defaultValue elements. I don't know how to target the exact defaultValue tag.
Anything is possible with SAX, it's just far harder than it has to be. It's pretty old school and there are many easier ways to do this (JAXB, XQuery, XPath, DOM, etc.).
That said, let's do it with SAX.
It sounds like the problem you are having is that you are not tracking your progress through the document. SAX simply makes callbacks as it encounters events in the document.
This is a fairly crude way of parsing the document and updating the relevant node with SAX. Basically, I check for the name element whose value is the one you want (Environment) and set a flag, so that when we reach the contents of the defaultValue node that follows, the characters callback can remove the existing value and replace it with the new one. Be aware that it rebuilds the document from the char array the parser passes to characters(), so it only works if that array happens to hold the whole document, which SAX does not guarantee.
import java.io.StringReader;
import java.util.Arrays;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
public class Q26897496 extends DefaultHandler {
public static String xmlDoc = "<?xml version='1.0'?>"
+ "<properties>"
+ " <definition>"
+ " <name>IP</name>"
+ " <description></description>"
+ " <defaultValue>10.1.1.1</defaultValue>"
+ " </definition>"
+ " <definition>"
+ " <name>Name</name>"
+ " <description></description>"
+ " <defaultValue>MyName</defaultValue>"
+ " </definition>"
+ " <definition>"
+ " <name>Environment</name>"
+ " <description></description>"
+ " <defaultValue>Production</defaultValue>"
+ " </definition>"
+ "</properties>";
String elementName;
boolean mark = false;
char[] updatedDoc;
public static void main(String[] args) {
Q26897496 q = new Q26897496();
try {
q.parse();
} catch (Exception e) {
e.printStackTrace();
}
}
public Q26897496() {
}
public void parse() throws Exception {
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
SAXParser saxParser = spf.newSAXParser();
XMLReader xml = saxParser.getXMLReader();
xml.setContentHandler(this);
xml.parse(new InputSource(new StringReader(xmlDoc)));
System.out.println("new xml: \n" + new String(updatedDoc));
}
@Override
public void startDocument() throws SAXException {
System.out.println("starting");
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
this.elementName = localName;
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException {
String value = new String(ch, start, length);
if (elementName.equals("name")) {
if (value.equals("Environment")) {
this.mark = true;
}
}
if (elementName.equals("defaultValue") && mark == true) {
// update
String tmpDoc = new String(ch);
String leading = tmpDoc.substring(0, start);
String trailing = tmpDoc.substring(start + length, tmpDoc.length());
this.updatedDoc = (leading + "NewValueForDefaulValue" + trailing).toCharArray();
mark = false;
}
}
}
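A less fragile way to get the same effect is to put a SAX filter between the parser and a serializer, so the document is re-emitted event by event rather than rebuilt from a buffer. The sketch below is one possible shape, not the approach used in the code above: the class name DefaultValueRewriter is hypothetical, it assumes name always precedes defaultValue inside a definition (as in the sample), and it uses the JDK's identity Transformer to write the result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: replaces the text of <defaultValue> in the <definition> with a given <name>.
class DefaultValueRewriter extends XMLFilterImpl {
    static final String SAMPLE = "<properties>"
            + "<definition><name>IP</name><defaultValue>10.1.1.1</defaultValue></definition>"
            + "<definition><name>Environment</name><defaultValue>Production</defaultValue></definition>"
            + "</properties>";

    private final String targetName;
    private final String newValue;
    private final StringBuilder nameText = new StringBuilder();
    private boolean inName;    // currently inside <name>
    private boolean matched;   // the current <definition> has the target name
    private boolean replacing; // currently inside the <defaultValue> to rewrite

    DefaultValueRewriter(String targetName, String newValue) {
        this.targetName = targetName;
        this.newValue = newValue;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (qName.equals("definition")) {
            matched = false;
        } else if (qName.equals("name")) {
            inName = true;
            nameText.setLength(0);
        } else if (qName.equals("defaultValue") && matched) {
            replacing = true;
        }
        super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (inName) {
            nameText.append(ch, start, length); // text may arrive in several chunks
        }
        if (!replacing) {
            super.characters(ch, start, length); // pass through; old value is dropped
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("name")) {
            inName = false;
            matched = nameText.toString().trim().equals(targetName);
        } else if (qName.equals("defaultValue") && replacing) {
            super.characters(newValue.toCharArray(), 0, newValue.length());
            replacing = false;
        }
        super.endElement(uri, localName, qName);
    }

    static String rewrite(String xml, String name, String value) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        DefaultValueRewriter filter = new DefaultValueRewriter(name, value);
        filter.setParent(factory.newSAXParser().getXMLReader());
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rewrite(SAMPLE, "Environment", "Staging"));
    }
}
```

Only the defaultValue inside the matching definition is touched; every other event is forwarded unchanged.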
Why does SAXParseException return null for getSystemId()? What is a system identifier?
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
public class MainClass {
static public void main(String[] arg) throws Exception{
boolean validate = false;
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setValidating(validate);
XMLReader reader = null;
SAXParser parser = spf.newSAXParser();
reader = parser.getXMLReader();
reader.setErrorHandler(new MyErrorHandler());
reader.parse(new InputSource(new StringReader(xmlString)));
}
static String xmlString = "<PHONEBOOK>" +
" <PERSON>" +
" <NAME>Joe Wang</NAME>" +
" <EMAIL>joe@yourserver.com</EMAIL>" +
" <TELEPHONE>202-999-9999</TELEPHONE>" +
" <WEB>www.java2s.com</WEB>" +
" </PERSON>" +
" <PERSON> " +
"<NAME>Karol</NAE>" + // error here
" <EMAIL>karol@yourserver.com</EMAIL>" +
" <TELEPHONE>306-999-9999</TELEPHONE>" +
" <WEB>www.java2s.com</WEB>" +
" </PERSON>" +
" <PERSON>" +
" <NAME>Green</NAME>" +
" <EMAIL>green@yourserver.com</EMAIL>" +
" <TELEPHONE>202-414-9999</TELEPHONE>" +
" <WEB>www.java2s.com</WEB>" +
" </PERSON>" +
" </PHONEBOOK>";
}
class MyErrorHandler implements ErrorHandler {
public void warning(SAXParseException e) throws SAXException {
show("Warning", e);
throw (e);
}
public void error(SAXParseException e) throws SAXException {
show("Error", e);
throw (e);
}
public void fatalError(SAXParseException e) throws SAXException {
show("Fatal Error", e);
throw (e);
}
private void show(String type, SAXParseException e) {
System.out.println(type + ": " + e.getMessage());
System.out.println("Line " + e.getLineNumber() + " Column "
+ e.getColumnNumber());
System.out.println("System ID: " + e.getSystemId());
System.out.println(e);
}
}
The 'system identifier' in XML is the physical location you got something from. When you just parse a string in memory, it has no system identifier at all unless you make an extra call to give it one.
You can, in this case, call InputSource.setSystemId.
The system identifier is a URI that you can specify. It is there so that relative URIs can be resolved during XML parsing, and it is passed to the EntityResolver. Whether it is a physical location or just a label is up to you. Of course, in your example there is nothing to resolve, so it's not needed.
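A small sketch of the InputSource.setSystemId call mentioned above. The class name SystemIdSketch and the identifier file:///virtual/phonebook.xml are made up for the example; nothing is read from that location, because the InputSource already carries a character stream.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class.
class SystemIdSketch {
    /** Parses malformed XML and returns the system id attached to the resulting error. */
    static String systemIdOfError(String malformedXml, String systemId) throws Exception {
        InputSource source = new InputSource(new StringReader(malformedXml));
        if (systemId != null) {
            source.setSystemId(systemId); // a label for the in-memory document
        }
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(source, new DefaultHandler());
        } catch (SAXParseException e) {
            return e.getSystemId();
        }
        throw new IllegalStateException("expected the input to be malformed");
    }

    public static void main(String[] args) throws Exception {
        String broken = "<NAME>Karol</NAE>";
        System.out.println(systemIdOfError(broken, null));
        System.out.println(systemIdOfError(broken, "file:///virtual/phonebook.xml"));
    }
}
```

Without the call the exception should carry no system id; with it, the identifier you supplied should come back in the error report.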
I'm trying to validate an XML document against a W3C XML Schema.
The following code does the job and reports when an error occurs, but I'm unable to get the line number of the error. It always returns -1.
Is there an easy way to get the line number?
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
import org.xml.sax.SAXParseException;
public class XMLValidation {
public static void main(String[] args) {
try {
DocumentBuilder parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document = parser.parse(new File("myxml.xml"));
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Source schemaFile = new StreamSource(new File("myschema.xsd"));
Schema schema = factory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(new DOMSource(document));
} catch (SAXParseException e) {
System.out.println(e.getLineNumber());
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
}
}
I found this
http://www.herongyang.com/XML-Schema/Xerces2-XSD-Validation-with-XMLReader.html
which appears to report the following details (including line numbers):
Error:
Public ID: null
System ID: file:///D:/herong/dictionary_invalid_xsd.xml
Line number: 7
Column number: 22
Message: cvc-datatype-valid.1.2.1: 'yes' is not a valid 'boolean' value.
using this code:
/**
* XMLReaderValidator.java
* Copyright (c) 2002 by Dr. Herong Yang. All rights reserved.
*/
import java.io.IOException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
class XMLReaderValidator {
public static void main(String[] args) {
String parserClass = "org.apache.xerces.parsers.SAXParser";
String validationFeature
= "http://xml.org/sax/features/validation";
String schemaFeature
= "http://apache.org/xml/features/validation/schema";
try {
String x = args[0];
XMLReader r = XMLReaderFactory.createXMLReader(parserClass);
r.setFeature(validationFeature,true);
r.setFeature(schemaFeature,true);
r.setErrorHandler(new MyErrorHandler());
r.parse(x);
} catch (SAXException e) {
System.out.println(e.toString());
} catch (IOException e) {
System.out.println(e.toString());
}
}
private static class MyErrorHandler extends DefaultHandler {
public void warning(SAXParseException e) throws SAXException {
System.out.println("Warning: ");
printInfo(e);
}
public void error(SAXParseException e) throws SAXException {
System.out.println("Error: ");
printInfo(e);
}
public void fatalError(SAXParseException e) throws SAXException {
System.out.println("Fatal error: ");
printInfo(e);
}
private void printInfo(SAXParseException e) {
System.out.println(" Public ID: "+e.getPublicId());
System.out.println(" System ID: "+e.getSystemId());
System.out.println(" Line number: "+e.getLineNumber());
System.out.println(" Column number: "+e.getColumnNumber());
System.out.println(" Message: "+e.getMessage());
}
}
}
Replace this line:
validator.validate(new DOMSource(document));
with
validator.validate(new StreamSource(new File("myxml.xml")));
This lets the SAXParseException carry the line and column numbers, because the validator then reads the file itself; a DOM tree does not retain source positions.
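The difference between the two sources can be checked with a short experiment. This is a sketch with made-up names (LineNumberSketch, a one-element schema, a three-line document); it relies on the documented default behaviour of Validator, which throws the SAXParseException when no ErrorHandler is set.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical example class.
class LineNumberSketch {
    static final String XSD = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
            + "<xs:element name=\"root\"><xs:complexType><xs:sequence>"
            + "<xs:element name=\"flag\" type=\"xs:boolean\"/>"
            + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
    // The invalid value sits on the second line.
    static final String XML = "<root>\n  <flag>yes</flag>\n</root>";

    /** Returns the line number of the first validation error, or 0 if the input is valid. */
    static int errorLine(Source xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
        try {
            schema.newValidator().validate(xml);
            return 0;
        } catch (SAXParseException e) {
            return e.getLineNumber();
        }
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document dom = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
        System.out.println("DOMSource line:    " + errorLine(new DOMSource(dom)));
        System.out.println("StreamSource line: " + errorLine(new StreamSource(new StringReader(XML))));
    }
}
```

With the DOMSource the reported line should be the -1 described in the question, while the StreamSource run should report a real position.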
Try using a SAX Locator:
http://download.oracle.com/javase/1.5.0/docs/api/org/xml/sax/Locator.html
Parsers are not required to supply one, but if they do, it should report line numbers.
I think your code should include:
// this will be called when XML-parser starts reading
// XML-data; here we save reference to current position in XML:
public void setDocumentLocator(Locator locator) {
this.locator = locator;
}
(see http://www.java-tips.org/java-se-tips/org.xml.sax/using-xml-locator-to-indicate-current-parser-pos.html)
The parser will give you a locator, which you can then use to get the line number. It's probably worth printing or debugging when this happens, to see whether you have a valid locator.
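Put together, a handler that keeps the locator and reads it in a callback might look like the following sketch. The class name LocatorSketch and the map of start lines are illustrative assumptions; setDocumentLocator and Locator.getLineNumber are the standard SAX API.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: remembers on which line each element's start tag ended.
class LocatorSketch extends DefaultHandler {
    private Locator locator; // stays null if the parser does not supply one
    final Map<String, Integer> startLines = new LinkedHashMap<>();

    @Override
    public void setDocumentLocator(Locator locator) {
        this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (locator != null) {
            startLines.put(qName, locator.getLineNumber());
        }
    }

    static Map<String, Integer> lines(String xml) throws Exception {
        LocatorSketch handler = new LocatorSketch();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.startLines;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(lines("<a>\n  <b/>\n</a>"));
    }
}
```

The locator is only meaningful during a callback, so read the position inside the event method rather than after parsing has finished.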
Assuming the final objective is to have a validated DOM instance, the previous answers would require the XML document to be read twice: first for validation, and then again to build the object tree. That's fine if the document is given as a file path, but it would require some sort of workaround if it were provided as an input stream, which in principle can only be read once.
A more efficient alternative is to use a validating parser that checks the XML document against the schema as the object tree is built. The code below shows how to set up a schema-validating DOM parser:
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class XML {
public static Document load(String xml, String xsd) {
// The default error handler just prints errors to the standard error output. In
// order to make the parser interrupt its work once a validation error is found,
// we need to use a custom handler that throws an exception in response to any
// reported issues.
ErrorHandler errorHandler = new ErrorHandler() {
@Override
public void error(SAXParseException exception) throws SAXException {
throw exception;
}
@Override
public void fatalError(SAXParseException exception) throws SAXException {
throw exception;
}
@Override
public void warning(SAXParseException exception) throws SAXException {
throw exception;
}
};
try {
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = factory.newSchema(new File(xsd));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
builderFactory.setNamespaceAware(true);
builderFactory.setSchema(schema);
DocumentBuilder builder = builderFactory.newDocumentBuilder();
builder.setErrorHandler(errorHandler);
// try-with-resources closes the stream even if parsing fails
try (InputStream input = new FileInputStream(xml)) {
return builder.parse(input);
}
}
catch (SAXParseException e) {
int row = e.getLineNumber();
int col = e.getColumnNumber();
String message = e.getMessage();
System.out.println("Validation error at line " + row + ", column " + col + ": \"" + message + '"');
}
catch (Exception e) {
e.printStackTrace();
}
return null;
}
public static void main(String[] args) {
String xml = args[0];
String xsd = args[1];
Document document = load(xml, xsd);
boolean valid = (document != null);
System.out.println("Document \"" + xml + "\" is " + (valid ? "" : "not ") + "valid against schema \"" + xsd + '"');
}
}