Java XML validation against XSD Schema


private void validateXML(DOMSource source) throws Exception {
URL schemaFile = new URL("http://www.csc.liv.ac.uk/~valli/modules.xsd");
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI);
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
DOMResult result = new DOMResult();
try {
validator.validate(source, result);
System.out.println("is valid");
} catch (SAXException e) {
System.out.println("not valid because " + e.getLocalizedMessage());
}
}
But this returns an error saying:
Exception in thread "main" java.lang.IllegalArgumentException: No SchemaFactory that implements the schema language specified by: http://www.w3.org/2001/XMLSchema-instance could be loaded
Is this a problem with my code or with the actual xsd file?

That error means that your installed Java doesn't have any classes that can parse XMLSchema files, so it's not a problem with the schema or your code.
I'm pretty sure recent JREs have the appropriate classes by default, so can you get us the output of java -version?
Update:
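You're using the wrong XMLConstants string. W3C_XML_SCHEMA_INSTANCE_NS_URI is the namespace of the xsi: attributes used inside instance documents, not a schema language. You want: XMLConstants.W3C_XML_SCHEMA_NS_URI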
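As a minimal sketch of the corrected call, the following validates an XML string against an XSD string using only the JDK. The class name XsdCheck and the one-element schema are made up for illustration; they are not the asker's modules.xsd.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class XsdCheck {
    // Hypothetical one-element schema, used only for illustration.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='module' type='xs:string'/>"
      + "</xs:schema>";

    /** Returns true if the XML text validates against the XSD text. */
    static boolean isValid(String xml, String xsd) throws IOException {
        try {
            // W3C_XML_SCHEMA_NS_URI names the schema language itself.
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown for an invalid document and also for a broken schema.
            return false;
        }
    }
}
```

Calling XsdCheck.isValid("<module>COMP101</module>", XsdCheck.XSD) should return true, while a document with a different root element should return false.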

The available classes depend on the underlying platform. I had the same issue when I was working on a project for Android, and I found that I had to use Xerces-for-Android to solve it.
The following worked for me for validation on Android. If your code targets Android, it may help directly; if not, the approach may still help with your platform:
Create a validation utility.
Get both the XML and the XSD into files on the Android OS and run the validation utility against them.
Use Xerces-for-Android to do the validation.
Android does support some packages that we can use. I based my XML validation utility on: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html
My initial sandbox testing went smoothly with desktop Java, but when I ported the code to Dalvik it did not work. Some things just aren't supported the same way on Dalvik, so I made some modifications.
I found a reference to Xerces for Android, so I modified my sandbox test. This is the original version, which does not work on Android (the example after it does):
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
/**
* A Utility to help with xml communication validation.
*/
public class XmlUtil {
/**
* Validation method.
* Base code/example from: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html
*
* @param xmlFilePath The xml file we are trying to validate.
* @param xmlSchemaFilePath The schema file we are using for the validation. This method assumes the schema file is valid.
* @return True if valid, false if not valid or bad parse.
*/
public static boolean validate(String xmlFilePath, String xmlSchemaFilePath) {
// parse an XML document into a DOM tree
DocumentBuilder parser = null;
Document document;
// Try the validation, we assume that if there are any issues with the validation
// process that the input is invalid.
try {
// parse the XML file into a DOM tree
parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
document = parser.parse(new File(xmlFilePath));
// create a SchemaFactory capable of understanding WXS schemas
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// load a WXS schema, represented by a Schema instance
Source schemaFile = new StreamSource(new File(xmlSchemaFilePath));
Schema schema = factory.newSchema(schemaFile);
// create a Validator instance, which can be used to validate an instance document
Validator validator = schema.newValidator();
validator.validate(new DOMSource(document));
} catch (Exception e) {
// Catches: SAXException, ParserConfigurationException, and IOException.
return false;
}
return true;
}
}
The above code had to be modified somewhat to work with Xerces for Android (http://gc.codehum.com/p/xerces-for-android/). You need SVN to get the project; the following are some crib notes:
download xerces-for-android
download Slik SVN (for Windows users) from http://www.sliksvn.com/en/download
install Slik SVN (I did the complete install)
Once the install is complete, you should have svn in your system path.
Test by typing "svn" from the command line.
I went to my desktop then downloaded the xerces project by:
svn checkout http://xerces-for-android.googlecode.com/svn/trunk/ xerces-for-android-read-only
You should then have a new folder on your desktop called xerces-for-android-read-only
I haven't built it into a jar yet; for quick testing I just copied the sources directly into my project. (If you want a jar, you can build one quickly with Ant: http://ant.apache.org/manual/using.html.) With that in place, I was able to get the following to work for my XML validation:
import java.io.File;
import java.io.IOException;
import mf.javax.xml.transform.Source;
import mf.javax.xml.transform.stream.StreamSource;
import mf.javax.xml.validation.Schema;
import mf.javax.xml.validation.SchemaFactory;
import mf.javax.xml.validation.Validator;
import mf.org.apache.xerces.jaxp.validation.XMLSchemaFactory;
import org.xml.sax.SAXException;
/**
* A Utility to help with xml communication validation.
*/
public class XmlUtil {
/**
* Validation method.
*
* @param xmlFilePath The xml file we are trying to validate.
* @param xmlSchemaFilePath The schema file we are using for the validation. This method assumes the schema file is valid.
* @return True if valid, false if not valid or bad parse or exception/error during parse.
*/
public static boolean validate(String xmlFilePath, String xmlSchemaFilePath) {
// Try the validation, we assume that if there are any issues with the validation
// process that the input is invalid.
try {
SchemaFactory factory = new XMLSchemaFactory();
Source schemaFile = new StreamSource(new File(xmlSchemaFilePath));
Source xmlSource = new StreamSource(new File(xmlFilePath));
Schema schema = factory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlSource);
} catch (SAXException e) {
return false;
} catch (IOException e) {
return false;
} catch (Exception e) {
// Catches everything beyond: SAXException, and IOException.
e.printStackTrace();
return false;
} catch (Error e) {
// Needed this for debugging when I was having issues with my 1st set of code.
e.printStackTrace();
return false;
}
return true;
}
}
Some Side Notes:
For creating the files, I made a simple file utility to write a string to a file:
public static void createFileFromString(String fileText, String fileName) {
// try-with-resources closes the writer even if write() throws
try (BufferedWriter output = new BufferedWriter(new FileWriter(new File(fileName)))) {
output.write(fileText);
} catch (IOException e) {
e.printStackTrace();
}
}
I also needed to write to an area that I had access to, so I made use of:
String path = this.getActivity().getPackageManager().getPackageInfo(getPackageName(), 0).applicationInfo.dataDir;
A little hackish, but it works. I'm sure there is a more succinct way of doing this; however, I figured I'd share my success, as I couldn't find any good examples.

Related

Java using Saxon (s9api) to transform XML: How to add input files in resources?

I use Saxon (HE 9.9.1-6) to transform an XML file to an HTML file. Saxon is used because the XSLT is version 2 and the default Java classes failed.
The XSLT contains two statements to copy in the content of other files:
<xsl:value-of select="unparsed-text('file.ext')"/>
This works fine as long as the XSLT and those files are in the same directory and the XSLT is given as a file source:
Source xslt = new StreamSource(new File("c:/somedir/file.xsl"));
But my XSLT is inside a resource directory (later on it's supposed to be packed in a jar). If I use it in that context, Saxon fails to find the included files because it looks in the root directory of my project:
Source xslt = new StreamSource(getClass().getClassLoader().getResourceAsStream("file.xsl"));
results in:
Error evaluating (fn:unparsed-text(...)) in xsl:value-of/@select on line 22 column 66
FOUT1170: Failed to read input file: <project root directory>\included_file.css (File not found)
Is there any way that I could supply saxon with additional StreamSources for the files it needs to include? I was unable to find anything.
Ideally, I'd like something like this:
transformer.addInput(new StreamSource(getClass().getClassLoader().getResourceAsStream("inputfile.css")));
The only solution I've come up with is pretty ugly: copy the XSLT and the files it needs from the resources to a temporary directory and then do the conversion using that as a source.
Example code
I'm not knowledgeable in writing XSLT so I can only offer non-minimal example files.
The XSLT and its two required files (CSS and JS) can be found here. The ones you need are the three "xrechnung" ones. Direct links: xrechnung-html.xsl, xrechnung-viewer.css, xrechnung-viewer.js.
Please put them in a resource directory (just in case, in Eclipse: make a resources folder and add it as a source directory in properties->build path).
The XML was generated by the first step of the above project using its own example files; I put it on pastebin here
(originally included directly but got character limit error)
Finally, the Java-Code including the ugly workaround:
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.Xslt30Transformer;
import net.sf.saxon.s9api.XsltCompiler;
import net.sf.saxon.s9api.XsltExecutable;
public class SaxonProblem {
public static void main(String[] args) throws IOException, SaxonApiException, SAXException {
Path xml = Paths.get("path/to/the.xml");
//working(xml);
notWorking(xml);
}
public static void working(Path xmlFile) throws IOException, SaxonApiException, SAXException {
Path dir = Files.createTempDirectory("saxon");
System.out.println("Temp dir: " + dir.toString());
Path xsltFile = dir.resolve("xrechnung-html.xsl");
Files.copy(SaxonProblem.class.getClassLoader().getResourceAsStream("xrechnung-html.xsl"),
xsltFile, StandardCopyOption.REPLACE_EXISTING);
Files.copy(SaxonProblem.class.getClassLoader().getResourceAsStream("xrechnung-viewer.css"),
dir.resolve("xrechnung-viewer.css"), StandardCopyOption.REPLACE_EXISTING);
Files.copy(SaxonProblem.class.getClassLoader().getResourceAsStream("xrechnung-viewer.js"),
dir.resolve("xrechnung-viewer.js"), StandardCopyOption.REPLACE_EXISTING);
// for the sake of brevity, the html is made where the xml was
Path html = xmlFile.resolveSibling(xmlFile.getFileName().toString() + ".html");
Source xslt = new StreamSource(xsltFile.toFile());
Source xml = new StreamSource(xmlFile.toFile());
transformXml(xml, xslt, html);
// cleanup
Files.walk(dir).sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
}
public static void notWorking(Path xmlFile) throws SaxonApiException, SAXException, IOException {
// for the sake of brevity, the html is made where the xml was
Path html = xmlFile.resolveSibling(xmlFile.getFileName().toString() + ".html");
Source xslt = new StreamSource(SaxonProblem.class.getClassLoader().getResourceAsStream("xrechnung-html.xsl"));
Source xml = new StreamSource(xmlFile.toFile());
transformXml(xml, xslt, html);
}
public static void transformXml(Source xml, Source xslt, Path output) throws SaxonApiException, SAXException, IOException {
Processor processor = new Processor(false);
XsltCompiler compiler = processor.newXsltCompiler();
XsltExecutable stylesheet = compiler.compile(xslt);
Serializer out = processor.newSerializer(output.toFile());
out.setOutputProperty(Serializer.Property.METHOD, "html");
out.setOutputProperty(Serializer.Property.INDENT, "yes");
Xslt30Transformer transformer = stylesheet.load30();
transformer.transform(xml, out);
}
}
Solution
Thanks to the comment by Martin Honnen and the answer by Michael Kay, I have a solution using UnparsedTextURIResolver. It still feels like a hack, but it works and is better than my previous workaround:
Processor processor = new Processor(false);
UnparsedTextURIResolver defaultUtur = processor.getUnderlyingConfiguration().getUnparsedTextURIResolver();
processor.getUnderlyingConfiguration().setUnparsedTextURIResolver(new UnparsedTextURIResolver() {
@Override
public Reader resolve(URI arg0, String arg1, Configuration arg2) throws XPathException {
if (arg0.toString().endsWith("myfilename.css")) {
InputStream css = SaxonProblem.class.getClassLoader().getResourceAsStream("myfilename.css");
return new InputStreamReader(css);
}
return defaultUtur.resolve(arg0, arg1, arg2);
}
});
//[...]

Some suggestions:
Use a URI with the classpath: scheme (fairly recent addition and may not be supported on all paths where URIs are used)
Register an UnparsedTextURIResolver with the configuration; Saxon will delegate the task of finding the resource to this resolver
Supply the name of the containing directory as a parameter to the stylesheet, and use the resolve-uri() function to get the absolute URI
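The third suggestion can be sketched from the Java side as follows. This is only an illustration under some assumptions: the class name BaseDirParam, the parameter name baseDir and the inline stylesheet are hypothetical, and the stylesheet is XSLT 1.0 so that it runs on the JDK's built-in processor without Saxon. It uses concat() where an XSLT 2.0 stylesheet under Saxon would use something like unparsed-text(resolve-uri('included_file.css', $baseDir)).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class BaseDirParam {
    // Hypothetical stylesheet: it only prints the URI it would read from.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='baseDir'/>"
      + "<xsl:template match='/'>"
      + "<xsl:value-of select=\"concat($baseDir, 'included_file.css')\"/>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Runs the stylesheet with baseDir set and returns the text it produces. */
    static String run(String baseDir) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(XSLT)));
        // The caller decides where the included files live.
        transformer.setParameter("baseDir", baseDir);
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader("<doc/>")),
                              new StreamResult(out));
        return out.toString();
    }
}
```

The point of the design is that the stylesheet no longer depends on its own base URI, which is missing when it is loaded from an InputStream; the directory is passed in explicitly instead.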

parsing an XML file with DOM error [duplicate]

I have to parse a bunch of XML files in Java that sometimes -- and invalidly -- contain HTML entities such as &mdash;, &gt; and so forth. I understand the correct way of dealing with this is to add suitable entity declarations to the XML file before parsing. However, I can't do that as I have no control over those XML files.
Is there some kind of callback I can override that is invoked whenever the Java XML parser encounters such an entity? I haven't been able to find one in the API.
I'd like to use:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
Document doc = parser.parse( stream );
I found that I can override resolveEntity in org.xml.sax.helpers.DefaultHandler, but how do I use this with the higher-level API?
Here's a full example:
public class Main {
public static void main( String [] args ) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
Document doc = parser.parse( new FileInputStream( "test.xml" ));
}
}
with test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>Some text &nbsp;&mdash; invalid!</bar>
</foo>
Produces:
[Fatal Error] :3:20: The entity "nbsp" was referenced, but not declared.
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 20; The entity "nbsp" was referenced, but not declared.
Update: I have been poking around in the JDK source code with a debugger, and boy, what an amount of spaghetti. I have no idea what the design is there, or whether there is one. Just how many layers of an onion can one layer on top of each other?
The key class seems to be com.sun.org.apache.xerces.internal.impl.XMLEntityManager, but I cannot find any code that either lets me add stuff into it before it gets used, or that attempts to resolve entities without going through that class.

I would use a library like Jsoup for this purpose. I tested the code below and it works. Jsoup can be downloaded here: http://jsoup.org/download
public static void main(String args[]){
String html = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo>" +
"<bar>Some text &nbsp;&mdash; invalid!</bar></foo>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
for (Element e : doc.select("bar")) {
System.out.println(e);
}
}
Result:
<bar>
Some text — invalid!
</bar>
Loading from a file can be found here:
http://jsoup.org/cookbook/input/load-document-from-file

Issue - 1: I have to parse a bunch of XML files in Java that sometimes -- and
invalidly -- contain HTML entities such as &mdash;
XML has only five predefined entities, and &mdash; is not among them. It works only in plain HTML or in legacy JSP, so SAX will not help. It can be done using StAX, which has a high-level iterator-based API. (Collected from this link)
Issue - 2: I found that I can override resolveEntity in
org.xml.sax.helpers.DefaultHandler, but how do I use this with the
higher-level API?
Streaming API for XML, called StAX, is an API for reading and writing XML documents.
StAX is a pull-parsing model: the application takes control of parsing by pulling events from the parser.
The core StAX API falls into two categories:
Cursor-based API: the low-level API, which lets the application process XML as a stream of tokens (events).
Iterator-based API: the higher-level API, which lets the application process XML as a series of event objects, each of which communicates a piece of the XML structure to the application.
The StAX API supports not replacing entity references, by way of the IS_REPLACING_ENTITY_REFERENCES property:
Requires the parser to replace internal entity references with their
replacement text and report them as characters
This can be set on an XMLInputFactory, which is then used to construct an XMLEventReader or XMLStreamReader.
However, the API is careful to say that this property is only intended to force the implementation to perform the replacement, rather than forcing it to not replace them.
You may try it; I hope it solves your issue. For your case:
Main.java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;
public class Main {
public static void main(String[] args) {
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(
XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
XMLEventReader reader;
try {
reader = inputFactory
.createXMLEventReader(new FileInputStream("F://test.xml"));
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isEntityReference()) {
EntityReference ref = (EntityReference) event;
System.out.println("Entity Reference: " + ref.getName());
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (XMLStreamException e) {
e.printStackTrace();
}
}
}
test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>Some text &nbsp;&mdash; invalid!</bar>
</foo>
Output:
Entity Reference: nbsp
Entity Reference: mdash
Credit goes to @skaffman.
Related Link:
http://www.journaldev.com/1191/how-to-read-xml-file-in-java-using-java-stax-api
http://www.journaldev.com/1226/java-stax-cursor-based-api-read-xml-example
http://www.vogella.com/tutorials/JavaXML/article.html
Is there a Java XML API that can parse a document without resolving character entities?
UPDATE:
Issue - 3: Is there a way to use StaX to "filter" the entities (replacing them
with something else, for example) and still produce a Document at the
end of the process?
To create a new document using the StAX API, it is required to create an XMLStreamWriter that provides methods to produce XML opening and closing tags, attributes and character content.
XMLStreamWriter has five main methods for producing a document:
xmlsw.writeStartDocument() - initialises an empty document to which elements can be added
xmlsw.writeStartElement(String s) - creates a new element named s
xmlsw.writeAttribute(String name, String value) - adds the attribute name with the corresponding value to the last element produced by a call to writeStartElement. Attributes can be added as long as no call to writeStartElement, writeCharacters or writeEndElement has been made since.
xmlsw.writeEndElement() - closes the last started element
xmlsw.writeCharacters(String s) - creates a new text node with content s as content of the last started element
Here is a sample:
StAXExpand.java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.util.Arrays;
public class StAXExpand {
static XMLStreamWriter xmlsw = null;
public static void main(String[] argv) {
try {
xmlsw = XMLOutputFactory.newInstance()
.createXMLStreamWriter(System.out);
CompactTokenizer tok = new CompactTokenizer(
new FileReader(argv[0]));
String rootName = "dummyRoot";
// ignore everything preceding the word before the first "["
while(!tok.nextToken().equals("[")){
rootName=tok.getToken();
}
// start creating new document
xmlsw.writeStartDocument();
ignorableSpacing(0);
xmlsw.writeStartElement(rootName);
expand(tok,3);
ignorableSpacing(0);
xmlsw.writeEndDocument();
xmlsw.flush();
xmlsw.close();
} catch (XMLStreamException e){
System.out.println(e.getMessage());
} catch (IOException ex) {
System.out.println("IOException"+ex);
ex.printStackTrace();
}
}
public static void expand(CompactTokenizer tok, int indent)
throws IOException,XMLStreamException {
tok.skip("[");
while(tok.getToken().equals("#")) {// add attributes
String attName = tok.nextToken();
tok.nextToken();
xmlsw.writeAttribute(attName,tok.skip("["));
tok.nextToken();
tok.skip("]");
}
boolean lastWasElement=true; // for controlling the output of newlines
while(!tok.getToken().equals("]")){ // process content
String s = tok.getToken().trim();
tok.nextToken();
if(tok.getToken().equals("[")){
if(lastWasElement)ignorableSpacing(indent);
xmlsw.writeStartElement(s);
expand(tok,indent+3);
lastWasElement=true;
} else {
xmlsw.writeCharacters(s);
lastWasElement=false;
}
}
tok.skip("]");
if(lastWasElement)ignorableSpacing(indent-3);
xmlsw.writeEndElement();
}
private static char[] blanks = "\n".toCharArray();
private static void ignorableSpacing(int nb)
throws XMLStreamException {
if(nb>blanks.length){// extend the length of space array
blanks = new char[nb+1];
blanks[0]='\n';
Arrays.fill(blanks,1,blanks.length,' ');
}
xmlsw.writeCharacters(blanks, 0, nb+1);
}
}
CompactTokenizer.java
import java.io.Reader;
import java.io.IOException;
import java.io.StreamTokenizer;
public class CompactTokenizer {
private StreamTokenizer st;
CompactTokenizer(Reader r){
st = new StreamTokenizer(r);
st.resetSyntax(); // remove parsing of numbers...
st.wordChars('\u0000','\u00FF'); // everything is part of a word
// except the following...
st.ordinaryChar('\n');
st.ordinaryChar('[');
st.ordinaryChar(']');
st.ordinaryChar('#');
}
public String nextToken() throws IOException{
st.nextToken();
while(st.ttype=='\n'||
(st.ttype==StreamTokenizer.TT_WORD &&
st.sval.trim().length()==0))
st.nextToken();
return getToken();
}
public String getToken(){
return (st.ttype == StreamTokenizer.TT_WORD) ? st.sval : (""+(char)st.ttype);
}
public String skip(String sym) throws IOException {
if(getToken().equals(sym))
return nextToken();
else
throw new IllegalArgumentException("skip: " + sym + " expected but " +
getToken() + " found");
}
}
For more, see these tutorials:
https://docs.oracle.com/javase/tutorial/jaxp/stax/example.html
http://www.ibm.com/developerworks/library/x-tipstx2/index.html
http://www.iro.umontreal.ca/~lapalme/ForestInsteadOfTheTrees/HTML/ch09s03.html
http://staf.sourceforge.net/current/STAXDoc.pdf
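For the filtering question in Issue 3, a more direct sketch is to copy StAX events from a reader to a writer, swap each unresolved entity reference for plain characters, and parse the result into a DOM Document. The class name EntityFilter and the two-entry replacement table are assumptions made for illustration, and the behaviour for undeclared entities is the JDK's built-in StAX implementation as shown in the output above; other implementations may differ.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityFilter {
    // Illustrative table only; extend it with the entities your input uses.
    static final Map<String, String> REPLACEMENTS =
        Map.of("nbsp", "\u00A0", "mdash", "\u2014");

    /** Parses XML that may contain undeclared HTML entities into a DOM Document. */
    static Document parse(String xml) throws Exception {
        XMLInputFactory inputFactory = XMLInputFactory.newInstance();
        // Report entity references as events instead of expanding them.
        inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
        XMLEventReader reader = inputFactory.createXMLEventReader(new StringReader(xml));

        StringWriter buffer = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(buffer);
        XMLEventFactory events = XMLEventFactory.newInstance();

        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isEntityReference()) {
                String name = ((EntityReference) event).getName();
                // Unknown entities are dropped in this sketch.
                writer.add(events.createCharacters(REPLACEMENTS.getOrDefault(name, "")));
            } else {
                writer.add(event);
            }
        }
        writer.close();
        reader.close();

        // The cleaned text is now ordinary XML that any DOM parser accepts.
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(buffer.toString())));
    }
}
```

Buffering the cleaned XML in a string keeps the sketch short; for large inputs the events could instead be written to a temporary file or directly into a DOMResult.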

Another approach, since you're not using a rigid OXM approach anyway.
You might want to try using a less rigid parser such as JSoup?
This will stop immediate problems with invalid XML schemas etc, but it will just devolve the problem into your code.

Just to throw in a different approach to a solution:
You might wrap your input stream in a stream implementation that replaces the entities with something legal.
While this is certainly a hack, it should be a quick and easy solution (or, better put, workaround).
It is not as elegant and clean as a solution inside the XML framework, though.
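A rough sketch of that workaround, simplified to operate on a String rather than a wrapping stream: named entities from a lookup table are rewritten as numeric character references, which XML accepts without any declaration. The class name EntityPreprocessor and the two-entry table are hypothetical. Note that a plain regex like this would also rewrite matches inside comments and CDATA sections.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityPreprocessor {
    // Illustrative table only: entity name -> Unicode code point.
    static final Map<String, Integer> CODE_POINTS = Map.of("nbsp", 0xA0, "mdash", 0x2014);
    static final Pattern NAMED_ENTITY = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Rewrites known named entities as numeric references, e.g. &nbsp; becomes &#160;. */
    static String replaceNamedEntities(String xml) {
        Matcher matcher = NAMED_ENTITY.matcher(xml);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            Integer codePoint = CODE_POINTS.get(matcher.group(1));
            // Names not in the table (including amp, lt, gt) are left untouched.
            String replacement = (codePoint == null)
                ? matcher.group()
                : "&#" + codePoint + ";";
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    /** Cleans the text and then parses it with the standard DOM parser. */
    static Document parse(String xml) throws Exception {
        String cleaned = replaceNamedEntities(xml);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(cleaned)));
    }
}
```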

I did something similar yesterday: I needed to add values from an unzipped XML stream to a database.
// imports (I'm not sure all of them are necessary)
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
// I haven't tested this code just now because I'm at work, but it should work; you may need to make small changes
InputSource is = new InputSource(new FileInputStream("test.xml"));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(is);
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
String words= xpath.evaluate("/foo/bar", doc.getDocumentElement());
ParsingHexToChar.parseToChar(words);
// library I use: commons-lang3.jar
// method to parse
public static String parseToChar( String words){
String decode= org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(words);
return decode;
}

Try this using org.apache.commons package :
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
InputStream in = new FileInputStream(xmlfile);
String unescapeHtml4 = IOUtils.toString(in);
CharSequenceTranslator obj = new AggregateTranslator(new LookupTranslator(EntityArrays.ISO8859_1_UNESCAPE()),
new LookupTranslator(EntityArrays.HTML40_EXTENDED_UNESCAPE())
);
unescapeHtml4 = obj.translate(unescapeHtml4);
StringReader readerInput= new StringReader(unescapeHtml4);
InputSource is = new InputSource(readerInput);
Document doc = parser.parse(is);

Serialize DOM to FileOutputStream using Xerces

I am using this link to generate an XML file using DOM. It says that "Xerces parser is bundled with the JDK 1.5 distribution. So you need not download the parser separately."
However, when I write the following line in Eclipse Helios, it gives a compile-time error even though I have Java 1.6 on my system.
import org.apache.xml.serialize.XMLSerializer;
Why is it so?

Xerces is indeed bundled with the JDK but you should use it with the JAXP API under javax.xml.parsers. Check the output of the program below.
Also, to serialize an XML Document, you should use DOM Level 3 Load and Save (present in the JDK) or an XSLT transformation with no stylesheet (identity transformation). The rest is dependent on a specific implementation. The Xerces XMLSerializer is deprecated:
Deprecated. This class was deprecated in Xerces 2.9.0. It is recommended that new applications use the DOM Level 3 LSSerializer or JAXP's Transformation API for XML (TrAX) for serializing XML. See the Xerces documentation for more information.
Here is an example of serialization with DOM level 3:
import org.w3c.dom.*;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.*;
public class DOMExample3 {
public static void main(String[] args) throws Exception {
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("XML 3.0 LS 3.0");
if (impl == null) {
System.out.println("No DOMImplementation found !");
System.exit(0);
}
System.out.printf("DOMImplementationLS: %s\n", impl.getClass().getName());
LSParser parser = impl.createLSParser(
DOMImplementationLS.MODE_SYNCHRONOUS,
"http://www.w3.org/TR/REC-xml");
// http://www.w3.org/2001/XMLSchema
System.out.printf("LSParser: %s\n", parser.getClass().getName());
if (args.length == 0) {
System.exit(0);
}
Document doc = parser.parseURI(args[0]);
LSSerializer serializer = impl.createLSSerializer();
LSOutput output = impl.createLSOutput();
output.setEncoding("UTF-8");
output.setByteStream(System.out);
serializer.write(doc, output);
System.out.println();
}
}
Here is an example with an identity transformation:
import org.w3c.dom.Document;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
public class DOMExample2 {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
System.out.println("Parsing XML document...");
Document doc;
doc = parser.parse(args[0]);
// Xerces Java 2
/* Deprecated. This class was deprecated in Xerces 2.9.0.
* It is recommended that new applications use the DOM Level 3
* LSSerializer or JAXP's Transformation API for XML (TrAX)
* for serializing XML and HTML.
* See the Xerces documentation for more information.
*/
/*
System.out.println("XERCES: Displaying XML document...");
OutputFormat of = new OutputFormat(doc, "ISO-8859-1", true);
PrintWriter pw = new PrintWriter(System.out);
BaseMarkupSerializer bms = new XMLSerializer(pw, of);
bms.serialize(doc);
*/
// JAXP
System.out.println("JAXP: Displaying XML document...");
TransformerFactory transFactory = TransformerFactory.newInstance();
System.out.println(transFactory.getClass().getName());
//transFactory.setAttribute("indent-number", 2);
Transformer idTransform = transFactory.newTransformer();
idTransform.setOutputProperty(OutputKeys.METHOD, "xml");
idTransform.setOutputProperty(OutputKeys.INDENT,"yes");
// Apache default indentation is 0
idTransform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
Source input = new DOMSource(doc);
Result output = new StreamResult(System.out);
idTransform.transform(input, output);
}
}

It will be in, IIRC, com.sun.org.apache.xml.serialize.XMLSerializer. However, those are private classes and likely to change at any time. I suggest using the standard public APIs (javax.* and friends) instead. (Use the transform API without any XSLT.)
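A minimal sketch of that suggestion, using only public JAXP classes: a Transformer created without a stylesheet performs an identity transformation, so it can serialize a DOM Document to any OutputStream. The class name DomWriter is made up for illustration.

```java
import java.io.OutputStream;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

final class DomWriter {
    /** Serializes the document to the stream as UTF-8 XML. */
    static void write(Document doc, OutputStream out) throws TransformerException {
        // No stylesheet argument: the transformer copies input to output unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        identity.setOutputProperty(OutputKeys.INDENT, "yes");
        identity.transform(new DOMSource(doc), new StreamResult(out));
    }
}
```

To write to a file as in the question title, the stream can be opened with try (OutputStream out = new FileOutputStream("out.xml")) { DomWriter.write(doc, out); } so that it is closed even if the transformation fails.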

Validating an xml file using RELAX NG Schema in Java (IDE - Eclipse)

I have been trying to validate an XML file named bookNew.xml against an .rnc file named bookNewRelax.rnc.
The error that I constantly face is:
Exception in thread "main" java.lang.IllegalArgumentException: No SchemaFactory that implements the schema language specified by: http://relaxng.org/ns/structure/1.0 could be loaded
at javax.xml.validation.SchemaFactory.newInstance(Unknown Source)
at testRelax.main(testRelax.java:38)
To prevent this, I added a line of code before instantiating the SchemaFactory object, which I believed would solve the issue. The piece of code is as follows:
System.setProperty(SchemaFactory.class.getName() + ":" + XMLConstants.RELAXNG_NS_URI, "com.thaiopensource.relaxng.jaxp.CompactSyntaxSchemaFactory");
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
I have included the external jar, jing.jar, in my project, and still the same exception is thrown.
I have also imported com.thaiopensource.*; and it is underlined in yellow, showing that it is never used. Personally, I think the jar file is playing spoilsport here; otherwise, why would the thaiopensource library never come into use?
I am pasting the java file underneath.
import java.io.*;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.validation.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import com.thaiopensource.*;
public class testRelax {
/** Get CPU time in nanoseconds. */
public static long getCpuTime( ) {
ThreadMXBean bean = ManagementFactory.getThreadMXBean( );
return bean.isCurrentThreadCpuTimeSupported( ) ?
bean.getCurrentThreadCpuTime( ) : 0L;
}
/** Get user time in nanoseconds. */
public static long getUserTime( ) {
ThreadMXBean bean = ManagementFactory.getThreadMXBean( );
return bean.isCurrentThreadCpuTimeSupported( ) ?
bean.getCurrentThreadUserTime( ) : 0L;
}
public static void main(String args[]) throws SAXException, IOException, ParserConfigurationException {
System.setProperty(SchemaFactory.class.getName() + ":" + XMLConstants.RELAXNG_NS_URI, "com.thaiopensource.relaxng.jaxp.CompactSyntaxSchemaFactory");
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
File schemaLocation = new File("C:/Users/gp85943/workspace/TestBookRelax/src/bookNewRelax.rnc");
Schema schema = factory.newSchema(schemaLocation);
Validator validator = schema.newValidator();
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
File file=new File("C:/Users/gp85943/workspace/TestBookRelax/src/bookNew.xml");
try{
long startTime = System.currentTimeMillis();
System.out.println("Milli"+startTime);
long startUserTimeNano = getUserTime( );
System.out.println("Nano"+startUserTimeNano);
long startCPUTimeNano = getCpuTime( );
System.out.println("Nano"+startCPUTimeNano);
Document doc = builder.parse(new File("C:/Users/gp85943/workspace/TestBookRelax/src/bookNew.xml"));
DOMSource source = new DOMSource(doc);
validator.validate(source);
long stopTime = System.currentTimeMillis();
System.out.println("MilliStop"+stopTime);
long elapsedTime = stopTime - startTime;
System.out.println("Elapsed time"+elapsedTime);
//System.out.println("getUserTime--->"+getUserTime());
//System.out.println("getCpuTime--->"+getCpuTime());
//System.out.println("startUserTimeNano--->"+startUserTimeNano);
//System.out.println("startCPUTimeNano--->"+startCPUTimeNano);
long taskUserTimeNano = getUserTime( ) - startUserTimeNano;
System.out.println("User"+taskUserTimeNano);
long taskCpuTimeNano = getCpuTime( ) - startCPUTimeNano;
System.out.println("CPU"+taskCpuTimeNano);
System.out.println(file + " The document is valid");
}
catch(SAXException ex)
{
System.out.println("the document is not valid because--");
System.out.println(ex.getMessage());
}
}
}
Kindly advise me how to make my java program "accept" the RELAX NG Compact Schema (or else simply .rng will also do) so that the proper validation may be done. Thanks in anticipation.

Java implementations are not required to implement RELAX NG validation via SchemaFactory. So even if it works in one environment, it is not portable. From your error message, it appears your particular Java implementation doesn't support it.
Since you have the Jing libraries, you can validate using them - see the documentation here to get started.

I had the same problem and it turned out that I was missing jing-20091111.jar from the classpath.
I've been using some class loader mechanisms, so all the jing classes were available if I used them in my code. The problem was that SchemaFactory didn't know about my classloaders, so I had to put the jar directly in the classpath.
So I think alexbrn's response about particular Java implementations' support is wrong. When System.setProperty() is used to provide implementation for RELAX NG, it should work in every JVM.
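One way to see which of these cases applies on a given setup is to probe the lookup directly. The following is only a minimal sketch, not taken from either answer: the class name SchemaFactoryProbe and its method names are made up for illustration. It uses the standard one-argument SchemaFactory.newInstance and the three-argument overload (available since Java 6), which takes an explicit factory class name and class loader and so may help when the default lookup cannot see a custom class loader. Whether the Jing line prints true depends entirely on jing.jar being visible to the class loader passed in.

```java
import javax.xml.XMLConstants;
import javax.xml.validation.SchemaFactory;

final class SchemaFactoryProbe {

    /** True if the default JAXP lookup finds a factory for the schema language. */
    static boolean isAvailable(String schemaLanguage) {
        try {
            SchemaFactory.newInstance(schemaLanguage);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown when no factory for this schema language could be loaded.
            return false;
        }
    }

    /** True if the named factory class loads via the given loader and supports the language. */
    static boolean isAvailable(String schemaLanguage, String factoryClassName, ClassLoader loader) {
        try {
            SchemaFactory.newInstance(schemaLanguage, factoryClassName, loader);
            return true;
        } catch (IllegalArgumentException e) {
            // Thrown when the class cannot be loaded or does not support the language.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("XSD, default lookup: "
                + isAvailable(XMLConstants.W3C_XML_SCHEMA_NS_URI));
        System.out.println("RELAX NG, default lookup: "
                + isAvailable(XMLConstants.RELAXNG_NS_URI));
        // Factory class name taken from the question; needs jing.jar on this loader's path.
        System.out.println("RELAX NG, explicit Jing factory: "
                + isAvailable(XMLConstants.RELAXNG_NS_URI,
                        "com.thaiopensource.relaxng.jaxp.CompactSyntaxSchemaFactory",
                        SchemaFactoryProbe.class.getClassLoader()));
    }
}
```

If the explicit variant reports true while the default lookup reports false, the jar is reachable from that class loader but not from the one the default lookup uses, which would match the class loader problem described above.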

XML Schema Validation in Android

I have created an XML file and I want to validate it against a schema,
i.e. an XSD file, but, if I am not wrong, Android provides no classes
for this directly. There is an external jar named jaxp1.3, which has the
SchemaFactory and Validator classes that do the validation, but it
doesn't let me compile the code. Is that because the bytecode of desktop
Java and Android are different? Is there another option available?
Any help would be appreciated; I have been desperately searching for
an answer.

It's a known issue posted by Google here
The solution is to use Apache Xerces ported to Android.
There is a project here
You have to do an svn checkout and export the project to a jar file to use as a library in your Android project.
The code to instantiate SchemaFactory changes a little.
Here is an example:
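import java.io.File;
import mf.javax.xml.validation.Schema;
import mf.javax.xml.validation.SchemaFactory;
import mf.javax.xml.validation.Validator;
import mf.org.apache.xerces.jaxp.validation.XMLSchemaFactory;
// Instantiate the ported Xerces factory directly instead of using SchemaFactory.newInstance(...)
SchemaFactory factory = new XMLSchemaFactory();
// newSchema takes a File, URL or Source rather than a String path
Schema esquema = factory.newSchema(new File(".../file.xsd"));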

@iOSDev, I had to use Xerces-for-Android for my validation. The following is a summary of what I did to get it working with my program:
Create a validation utility.
Get both the xml and the xsd into files on the Android OS and run the validation utility against them.
Use Xerces-For-Android to do the validation.
Android does support some packages which we can use, I created my xml validation utility based on: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html
My initial sandbox testing went pretty smoothly with desktop Java, then I tried to port it over to Dalvik and found that my code did not work. Some things just aren't supported the same way on Dalvik, so I made some modifications.
I found a reference to Xerces for Android, so I modified my sandbox test, shown here (this version doesn't work on Android; the example after it does):
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
/**
* A Utility to help with xml communication validation.
*/
public class XmlUtil {
/**
* Validation method.
* Base code/example from: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html
*
* @param xmlFilePath The xml file we are trying to validate.
* @param xmlSchemaFilePath The schema file we are using for the validation. This method assumes the schema file is valid.
* @return True if valid, false if not valid or bad parse.
*/
public static boolean validate(String xmlFilePath, String xmlSchemaFilePath) {
// parse an XML document into a DOM tree
DocumentBuilder parser = null;
Document document;
// Try the validation, we assume that if there are any issues with the validation
// process that the input is invalid.
try {
// validate the DOM tree
parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
document = parser.parse(new File(xmlFilePath));
// create a SchemaFactory capable of understanding WXS schemas
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// load a WXS schema, represented by a Schema instance
Source schemaFile = new StreamSource(new File(xmlSchemaFilePath));
Schema schema = factory.newSchema(schemaFile);
// create a Validator instance, which can be used to validate an instance document
Validator validator = schema.newValidator();
validator.validate(new DOMSource(document));
} catch (Exception e) {
// Catches: SAXException, ParserConfigurationException, and IOException.
return false;
}
return true;
}
}
The above code had to be modified somewhat to work with Xerces for Android (http://gc.codehum.com/p/xerces-for-android/). You need SVN to get the project; the following are some crib notes:
Download xerces-for-android.
Download Slik SVN (for Windows users) from http://www.sliksvn.com/en/download
Install Slik SVN (I did a complete install).
Once the install is complete, you should have svn in your system path.
Test by typing "svn" from the command line.
I went to my desktop, then downloaded the xerces project with:
svn checkout http://xerces-for-android.googlecode.com/svn/trunk/ xerces-for-android-read-only
You should then have a new folder on your desktop called xerces-for-android-read-only
With the above source (eventually I'll make it into a jar; for quick testing I just copied it directly into my source tree. If you prefer a jar, you can build one quickly with Ant (http://ant.apache.org/manual/using.html)), I was able to get the following to work for my xml validation:
import java.io.File;
import java.io.IOException;
import mf.javax.xml.transform.Source;
import mf.javax.xml.transform.stream.StreamSource;
import mf.javax.xml.validation.Schema;
import mf.javax.xml.validation.SchemaFactory;
import mf.javax.xml.validation.Validator;
import mf.org.apache.xerces.jaxp.validation.XMLSchemaFactory;
import org.xml.sax.SAXException;
/**
* A Utility to help with xml communication validation.
*/
public class XmlUtil {
/**
* Validation method.
*
* @param xmlFilePath The xml file we are trying to validate.
* @param xmlSchemaFilePath The schema file we are using for the validation. This method assumes the schema file is valid.
* @return True if valid, false if not valid or bad parse or exception/error during parse.
*/
public static boolean validate(String xmlFilePath, String xmlSchemaFilePath) {
// Try the validation, we assume that if there are any issues with the validation
// process that the input is invalid.
try {
SchemaFactory factory = new XMLSchemaFactory();
Source schemaFile = new StreamSource(new File(xmlSchemaFilePath));
Source xmlSource = new StreamSource(new File(xmlFilePath));
Schema schema = factory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlSource);
} catch (SAXException e) {
return false;
} catch (IOException e) {
return false;
} catch (Exception e) {
// Catches everything beyond: SAXException, and IOException.
e.printStackTrace();
return false;
} catch (Error e) {
// Needed this for debugging when I was having issues with my 1st set of code.
e.printStackTrace();
return false;
}
return true;
}
}
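Both versions of validate() above return only true or false, so the reason for a failure is mostly lost. The following is a minimal sketch of how a Validator can hand back the messages instead. It assumes desktop JAXP (javax.*) rather than the Xerces-for-Android mf.* packages; the class name XmlValidationReport and the tiny "count" schema are made up for illustration. On Android the imports would presumably need to be swapped for their mf.* counterparts, which I have not tested.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

final class XmlValidationReport {

    /** Returns the validation problems found; an empty list means the document is valid. */
    static List<String> validate(String xml, String xsd) {
        final List<String> problems = new ArrayList<>();
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
            Validator validator = schema.newValidator();
            validator.setErrorHandler(new ErrorHandler() {
                @Override
                public void warning(SAXParseException e) {
                    // Warnings do not make the document invalid, so they are not recorded.
                }

                @Override
                public void error(SAXParseException e) {
                    // Recoverable validity error: record it and keep validating.
                    problems.add("error: " + e.getMessage());
                }

                @Override
                public void fatalError(SAXParseException e) throws SAXException {
                    // Not well-formed: validation cannot continue.
                    throw e;
                }
            });
            validator.validate(new StreamSource(new StringReader(xml)));
        } catch (SAXException | IOException e) {
            // Covers a broken schema, a fatal parse error, or a read failure.
            problems.add("fatal: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='count' type='xs:int'/></xs:schema>";
        System.out.println(validate("<count>3</count>", xsd));     // valid document
        System.out.println(validate("<count>three</count>", xsd)); // wrong type
        System.out.println(validate("<count>3", xsd));             // not well-formed
    }
}
```

Collecting the messages in a list keeps the boolean check trivial (problems.isEmpty()) while still leaving something to log when a document is rejected.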
Some Side Notes:
For creating the files, I made a simple file utility to write string to files:
public static void createFileFromString(String fileText, String fileName) {
try {
File file = new File(fileName);
BufferedWriter output = new BufferedWriter(new FileWriter(file));
output.write(fileText);
output.close();
} catch ( IOException e ) {
e.printStackTrace();
}
}
I also needed to write to an area that I had access to, so I made use of:
String path = this.getActivity().getPackageManager().getPackageInfo(getPackageName(), 0).applicationInfo.dataDir;
A little hackish, but it works. I'm sure there is a more succinct way of doing this; however, I figured I'd share my success, as I couldn't find any good examples.
