I have been trying to validate an xml file name bookNew.xml against an .rnc file named bookNewRelax.rnc.
The error that I constantly face is --
Exception in thread "main" java.lang.IllegalArgumentException: No SchemaFactory that implements the schema language specified by: http://relaxng.org/ns/structure/1.0 could be loaded
at javax.xml.validation.SchemaFactory.newInstance(Unknown Source)
at testRelax.main(testRelax.java:38)
In order to prevent this, I used a line of code before instantiating an object of the SchemaFactory class, which I believed would help solve this issue. the ptece of code is as under:-
System.setProperty(SchemaFactory.class.getName() + ":" + XMLConstants.RELAXNG_NS_URI, "com.thaiopensource.relaxng.jaxp.CompactSyntaxSchemaFactory");
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
I have included the external jar - jing.jar in my project and still, the same exception is being thrown.
I have also imported the library com.thaiopensource.*; and it is underlined in yellow showing that it is never used at all. Personally, I think, it is the jar file playing spoilsport here, else why would the thaiopensource library be never come into use.
I am pasting the java file underneath.
import java.io.*;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.validation.*;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import com.thaiopensource.*;
public class testRelax {
/** Get CPU time in nanoseconds. */
public static long getCpuTime( ) {
ThreadMXBean bean = ManagementFactory.getThreadMXBean( );
return bean.isCurrentThreadCpuTimeSupported( ) ?
bean.getCurrentThreadCpuTime( ) : 0L;
}
/** Get user time in nanoseconds. */
public static long getUserTime( ) {
ThreadMXBean bean = ManagementFactory.getThreadMXBean( );
return bean.isCurrentThreadCpuTimeSupported( ) ?
bean.getCurrentThreadUserTime( ) : 0L;
}
public static void main(String args[]) throws SAXException, IOException, ParserConfigurationException {
System.setProperty(SchemaFactory.class.getName() + ":" + XMLConstants.RELAXNG_NS_URI, "com.thaiopensource.relaxng.jaxp.CompactSyntaxSchemaFactory");
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.RELAXNG_NS_URI);
File schemaLocation = new File("C:/Users/gp85943/workspace/TestBookRelax/src/bookNewRelax.rnc");
Schema schema = factory.newSchema(schemaLocation);
Validator validator = schema.newValidator();
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
DocumentBuilder builder = domFactory.newDocumentBuilder();
File file=new File("C:/Users/gp85943/workspace/TestBookRelax/src/bookNew.xml");
try{
long startTime = System.currentTimeMillis();
System.out.println("Milli"+startTime);
long startUserTimeNano = getUserTime( );
System.out.println("Nano"+startUserTimeNano);
long startCPUTimeNano = getCpuTime( );
System.out.println("Nano"+startCPUTimeNano);
Document doc = builder.parse(new File("C:/Users/gp85943/workspace/TestBookRelax/src/bookNew.xml"));
DOMSource source = new DOMSource(doc);
validator.validate(source);
long stopTime = System.currentTimeMillis();
System.out.println("MilliStop"+stopTime);
long elapsedTime = stopTime - startTime;
System.out.println("Elapsed time"+elapsedTime);
//System.out.println("getUserTime--->"+getUserTime());
//System.out.println("getCpuTime--->"+getCpuTime());
//System.out.println("startUserTimeNano--->"+startUserTimeNano);
//System.out.println("startCPUTimeNano--->"+startCPUTimeNano);
long taskUserTimeNano = getUserTime( ) - startUserTimeNano;
System.out.println("User"+taskUserTimeNano);
long taskCpuTimeNano = getCpuTime( ) - startCPUTimeNano;
System.out.println("CPU"+taskCpuTimeNano);
System.out.println(file + " The document is valid");
}
catch(SAXException ex)
{
System.out.println("the document is not valid because--");
System.out.println(ex.getMessage());
}
}
}
Kindly advise me how to make my java program "accept" the RELAX NG Compact Schema (or else simply .rng will also do) so that the proper validation may be done. Thanks in anticipation.
Java implementations are not required to implement RELAX NG validation via SchemaFactory. So even if it works in one environment, it is not portable. From your error message, it appears your particular Java implementation doesn't support it.
Since you have the Jing libraries, you can validate using them - see the documentation here to get started.
I had the same problem and it turned out that I was missing jing-20091111.jar from the classpath.
I've been using some class loader mechanisms, so all the jing classes were available if I used them in my code. The problem was that SchemaFactory didn't know about my classloaders, so I had to put the jar directly in the classpath.
So I think alexbrn's response about particular Java implementations' support is wrong. When System.setProperty() is used to provide implementation for RELAX NG, it should work in every JVM.
Related
I have to parse a bunch of XML files in Java that sometimes -- and invalidly -- contain HTML entities such as —, > and so forth. I understand the correct way of dealing with this is to add suitable entity declarations to the XML file before parsing. However, I can't do that as I have no control over those XML files.
Is there some kind of callback I can override that is invoked whenever the Java XML parser encounters such an entity? I haven't been able to find one in the API.
I'd like to use:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
Document doc = parser.parse( stream );
I found that I can override resolveEntity in org.xml.sax.helpers.DefaultHandler, but how do I use this with the higher-level API?
Here's a full example:
public class Main {
public static void main( String [] args ) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
Document doc = parser.parse( new FileInputStream( "test.xml" ));
}
}
with test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>Some text — invalid!</bar>
</foo>
Produces:
[Fatal Error] :3:20: The entity "nbsp" was referenced, but not declared.
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 20; The entity "nbsp" was referenced, but not declared.
Update: I have been poking around in the JDK source code with a debugger, and boy, what an amount of spaghetti. I have no idea what the design is there, or whether there is one. Just how many layers of an onion can one layer on top of each other?
They key class seems to be com.sun.org.apache.xerces.internal.impl.XMLEntityManager, but I cannot find any code that either lets me add stuff into it before it gets used, or that attempts to resolve entities without going through that class.
I would use a library like Jsoup for this purpose. I tested the following below and it works. I don't know if this helps. It can be located here: http://jsoup.org/download
public static void main(String args[]){
String html = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo>" +
"<bar>Some text — invalid!</bar></foo>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
for (Element e : doc.select("bar")) {
System.out.println(e);
}
}
Result:
<bar>
Some text — invalid!
</bar>
Loading from a file can be found here:
http://jsoup.org/cookbook/input/load-document-from-file
Issue - 1: I have to parse a bunch of XML files in Java that sometimes -- and
invalidly -- contain HTML entities such as —
XML has only five predefined entities. The —, is not among them. It works only when used in plain HTML or in legacy JSP. So, SAX will not help. It can be done using StaX which has high level iterator based API. (Collected from this link)
Issue - 2: I found that I can override resolveEntity in
org.xml.sax.helpers.DefaultHandler, but how do I use this with the
higher-level API?
Streaming API for XML, called StaX, is an API for reading and writing XML Documents.
StaX is a Pull-Parsing model. Application can take the control over parsing the XML documents by pulling (taking) the events from the parser.
The core StaX API falls into two categories and they are listed below. They are
Cursor based API: It is low-level API. cursor-based API allows the application to process XML as a stream of tokens aka events
Iterator based API: The higher-level iterator-based API allows the application to process XML as a series of event objects, each of which communicates a piece of the XML structure to the application.
STaX API has support for the notion of not replacing character entity references, by way of the IS_REPLACING_ENTITY_REFERENCES property:
Requires the parser to replace internal entity references with their
replacement text and report them as characters
This can be set into an XmlInputFactory, which is then in turn used to construct an XmlEventReader or XmlStreamReader.
However, the API is careful to say that this property is only intended to force the implementation to perform the replacement, rather than forcing it to notreplace them.
You may try it. Hope it will solve your issue. For your case,
Main.java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;
public class Main {
public static void main(String[] args) {
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(
XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
XMLEventReader reader;
try {
reader = inputFactory
.createXMLEventReader(new FileInputStream("F://test.xml"));
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isEntityReference()) {
EntityReference ref = (EntityReference) event;
System.out.println("Entity Reference: " + ref.getName());
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (XMLStreamException e) {
e.printStackTrace();
}
}
}
test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>Some text — invalid!</bar>
</foo>
Output:
Entity Reference: nbsp
Entity Reference: mdash
Credit goes to #skaffman.
Related Link:
http://www.journaldev.com/1191/how-to-read-xml-file-in-java-using-java-stax-api
http://www.journaldev.com/1226/java-stax-cursor-based-api-read-xml-example
http://www.vogella.com/tutorials/JavaXML/article.html
Is there a Java XML API that can parse a document without resolving character entities?
UPDATE:
Issue - 3: Is there a way to use StaX to "filter" the entities (replacing them
with something else, for example) and still produce a Document at the
end of the process?
To create a new document using the StAX API, it is required to create an XMLStreamWriter that provides methods to produce XML opening and closing tags, attributes and character content.
There are 5 methods of XMLStreamWriter for document.
xmlsw.writeStartDocument(); - initialises an empty document to which
elements can be added
xmlsw.writeStartElement(String s) -creates a new element named s
xmlsw.writeAttribute(String name, String value)- adds the attribute
name with the corresponding value to the last element produced by a
call to writeStartElement. It is possible to add attributes as long
as no call to writeElementStart,writeCharacters or writeEndElement
has been done.
xmlsw.writeEndElement - close the last started element
xmlsw.writeCharacters(String s) - creates a new text node with
content s as content of the last started element.
A sample example is attached with it:
StAXExpand.java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.util.Arrays;
public class StAXExpand {
static XMLStreamWriter xmlsw = null;
public static void main(String[] argv) {
try {
xmlsw = XMLOutputFactory.newInstance()
.createXMLStreamWriter(System.out);
CompactTokenizer tok = new CompactTokenizer(
new FileReader(argv[0]));
String rootName = "dummyRoot";
// ignore everything preceding the word before the first "["
while(!tok.nextToken().equals("[")){
rootName=tok.getToken();
}
// start creating new document
xmlsw.writeStartDocument();
ignorableSpacing(0);
xmlsw.writeStartElement(rootName);
expand(tok,3);
ignorableSpacing(0);
xmlsw.writeEndDocument();
xmlsw.flush();
xmlsw.close();
} catch (XMLStreamException e){
System.out.println(e.getMessage());
} catch (IOException ex) {
System.out.println("IOException"+ex);
ex.printStackTrace();
}
}
public static void expand(CompactTokenizer tok, int indent)
throws IOException,XMLStreamException {
tok.skip("[");
while(tok.getToken().equals("#")) {// add attributes
String attName = tok.nextToken();
tok.nextToken();
xmlsw.writeAttribute(attName,tok.skip("["));
tok.nextToken();
tok.skip("]");
}
boolean lastWasElement=true; // for controlling the output of newlines
while(!tok.getToken().equals("]")){ // process content
String s = tok.getToken().trim();
tok.nextToken();
if(tok.getToken().equals("[")){
if(lastWasElement)ignorableSpacing(indent);
xmlsw.writeStartElement(s);
expand(tok,indent+3);
lastWasElement=true;
} else {
xmlsw.writeCharacters(s);
lastWasElement=false;
}
}
tok.skip("]");
if(lastWasElement)ignorableSpacing(indent-3);
xmlsw.writeEndElement();
}
private static char[] blanks = "\n".toCharArray();
private static void ignorableSpacing(int nb)
throws XMLStreamException {
if(nb>blanks.length){// extend the length of space array
blanks = new char[nb+1];
blanks[0]='\n';
Arrays.fill(blanks,1,blanks.length,' ');
}
xmlsw.writeCharacters(blanks, 0, nb+1);
}
}
CompactTokenizer.java
import java.io.Reader;
import java.io.IOException;
import java.io.StreamTokenizer;
public class CompactTokenizer {
private StreamTokenizer st;
CompactTokenizer(Reader r){
st = new StreamTokenizer(r);
st.resetSyntax(); // remove parsing of numbers...
st.wordChars('\u0000','\u00FF'); // everything is part of a word
// except the following...
st.ordinaryChar('\n');
st.ordinaryChar('[');
st.ordinaryChar(']');
st.ordinaryChar('#');
}
public String nextToken() throws IOException{
st.nextToken();
while(st.ttype=='\n'||
(st.ttype==StreamTokenizer.TT_WORD &&
st.sval.trim().length()==0))
st.nextToken();
return getToken();
}
public String getToken(){
return (st.ttype == StreamTokenizer.TT_WORD) ? st.sval : (""+(char)st.ttype);
}
public String skip(String sym) throws IOException {
if(getToken().equals(sym))
return nextToken();
else
throw new IllegalArgumentException("skip: "+sym+" expected but"+
sym +" found ");
}
}
For more, you can follow the tutorial
https://docs.oracle.com/javase/tutorial/jaxp/stax/example.html
http://www.ibm.com/developerworks/library/x-tipstx2/index.html
http://www.iro.umontreal.ca/~lapalme/ForestInsteadOfTheTrees/HTML/ch09s03.html
http://staf.sourceforge.net/current/STAXDoc.pdf
Another approach, since you're not using a rigid OXM approach anyway.
You might want to try using a less rigid parser such as JSoup?
This will stop immediate problems with invalid XML schemas etc, but it will just devolve the problem into your code.
Just to throw in a different approach to a solution:
You might envelope your input stream with a stream inplementation that replaces the entities by something legal.
While this is a hack for sure, it should be a quick and easy solution (or better say: workaround).
Not as elegant and clean as a xml framework internal solution, though.
I made yesterday something similar i need to add value from unziped XML in stream to database.
//import I'm not sure if all are necessary :)
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
//I didnt checked this code now because i'm in work for sure its work maybe
you will need to do little changes
InputSource is = new InputSource(new FileInputStream("test.xml"));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(is);
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
String words= xpath.evaluate("/foo/bar", doc.getDocumentElement());
ParsingHexToChar.parseToChar(words);
// lib which i use common-lang3.jar
//metod to parse
public static String parseToChar( String words){
String decode= org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(words);
return decode;
}
Try this using org.apache.commons package :
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
InputStream in = new FileInputStream(xmlfile);
String unescapeHtml4 = IOUtils.toString(in);
CharSequenceTranslator obj = new AggregateTranslator(new LookupTranslator(EntityArrays.ISO8859_1_UNESCAPE()),
new LookupTranslator(EntityArrays.HTML40_EXTENDED_UNESCAPE())
);
unescapeHtml4 = obj.translate(unescapeHtml4);
StringReader readerInput= new StringReader(unescapeHtml4);
InputSource is = new InputSource(readerInput);
Document doc = parser.parse(is);
I am using this link to generate XML file using DOM. It says that "Xerces parser is bundled with the JDK 1.5 distribution.So you need not download the parser separately."
However, when I write the following line in my Eclipse Helios it gives compile-time error even though I have Java 1.6 in my system.
import org.apache.xml.serialize.XMLSerializer;
Why is it so?
Xerces is indeed bundled with the JDK but you should use it with the JAXP API under javax.xml.parsers. Check the output of the program below.
Also, to serialize an XML Document, you should use DOM Level 3 Load and Save (present in the JDK) or an XSLT transformation with no stylesheet (identity transformation). The rest is dependent on a specific implementation. The Xerces XMLSerializer is deprecated:
Deprecated. This class was deprecated in Xerces 2.9.0. It is recommended that new applications use the DOM Level 3 LSSerializer or JAXP's Transformation API for XML (TrAX) for serializing XML. See the Xerces documentation for more information.
Here is an example of serialization with DOM level 3:
import org.w3c.dom.*;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.*;
public class DOMExample3 {
public static void main(String[] args) throws Exception {
DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
DOMImplementationLS impl = (DOMImplementationLS) registry.getDOMImplementation("XML 3.0 LS 3.0");
if (impl == null) {
System.out.println("No DOMImplementation found !");
System.exit(0);
}
System.out.printf("DOMImplementationLS: %s\n", impl.getClass().getName());
LSParser parser = impl.createLSParser(
DOMImplementationLS.MODE_SYNCHRONOUS,
"http://www.w3.org/TR/REC-xml");
// http://www.w3.org/2001/XMLSchema
System.out.printf("LSParser: %s\n", parser.getClass().getName());
if (args.length == 0) {
System.exit(0);
}
Document doc = parser.parseURI(args[0]);
LSSerializer serializer = impl.createLSSerializer();
LSOutput output = impl.createLSOutput();
output.setEncoding("UTF-8");
output.setByteStream(System.out);
serializer.write(doc, output);
System.out.println();
}
}
Here is an example with an identity transformation:
import org.w3c.dom.Document;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
public class DOMExample2 {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = factory.newDocumentBuilder();
System.out.println("Parsing XML document...");
Document doc;
doc = parser.parse(args[0]);
// Xerces Java 2
/* Deprecated. This class was deprecated in Xerces 2.9.0.
* It is recommended that new applications use the DOM Level 3
* LSSerializer or JAXP's Transformation API for XML (TrAX)
* for serializing XML and HTML.
* See the Xerces documentation for more information.
*/
/*
System.out.println("XERCES: Displaying XML document...");
OutputFormat of = new OutputFormat(doc, "ISO-8859-1", true);
PrintWriter pw = new PrintWriter(System.out);
BaseMarkupSerializer bms = new XMLSerializer(pw, of);
bms.serialize(doc);
*/
// JAXP
System.out.println("JAXP: Displaying XML document...");
TransformerFactory transFactory = TransformerFactory.newInstance();
System.out.println(transFactory.getClass().getName());
//transFactory.setAttribute("indent-number", 2);
Transformer idTransform = transFactory.newTransformer();
idTransform.setOutputProperty(OutputKeys.METHOD, "xml");
idTransform.setOutputProperty(OutputKeys.INDENT,"yes");
// Apache default indentation is 0
idTransform.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
Source input = new DOMSource(doc);
Result output = new StreamResult(System.out);
idTransform.transform(input, output);
}
}
It will be in, IIRC, com.sun.org.apache.xml.serialize.XMLSerializer. However, those are private classes and likely to change at any time. I suggest using the standard public APIs (javax.* and friends) instead. (Use the transform API without any XSLT.)
I've an xml file that I would avoid having to load all in memory.
As everyone know, for such a file I better have to use a SAX parser (which will go along the file and call for events if something relevant is found.)
My current problem is that I would like to process the file "by chunk" which means:
Parse the file and find a relevant tag (node)
Load this tag entirely in memory (like we would do it in DOM)
Do the process of this entity (that local chunk)
When I'm done with the chunk, release it and continue to 1. (until "end of file")
In a perfect world I'm searching some something like this:
// 1. Create a parser and set the file to load
IdealParser p = new IdealParser("BigFile.xml");
// 2. Set an XPath to define the interesting nodes
p.setRelevantNodesPath("/path/to/relevant/nodes");
// 3. Add a handler to callback the right method once a node is found
p.setHandler(new Handler(){
// 4. The method callback by the parser when a relevant node is found
void aNodeIsFound(saxNode aNode)
{
// 5. Inflate the current node i.e. load it (and all its content) in memory
DomNode d = aNode.expand();
// 6. Do something with the inflated node (method to be defined somewhere)
doThingWithNode(d);
}
});
// 7. Start the parser
p.start();
I'm currently stuck on how to expand a "sax node" (understand me…) efficiently.
Is there any Java framework or library relevant to this kind of task?
UPDATE
You could also just use the javax.xml.xpath APIs:
package forum7998733;
import java.io.FileReader;
import javax.xml.xpath.*;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
public class XPathDemo {
public static void main(String[] args) throws Exception {
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
InputSource xml = new InputSource(new FileReader("BigFile.xml"));
Node result = (Node) xpath.evaluate("/path/to/relevant/nodes", xml, XPathConstants.NODE);
System.out.println(result);
}
}
Below is a sample of how it could be done with StAX.
input.xml
Below is some sample XML:
<statements>
<statement account="123">
...stuff...
</statement>
<statement account="456">
...stuff...
</statement>
</statements>
Demo
In this example a StAX XMLStreamReader is used to find the node that will be converted to a DOM. In this example we convert each statement fragment to a DOM, but your navigation algorithm could be more advanced.
package forum7998733;
import java.io.FileReader;
import javax.xml.stream.*;
import javax.xml.transform.*;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.dom.*;
public class Demo {
public static void main(String[] args) throws Exception {
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader xsr = xif.createXMLStreamReader(new FileReader("src/forum7998733/input.xml"));
xsr.nextTag(); // Advance to statements element
TransformerFactory tf = TransformerFactory.newInstance();
Transformer t = tf.newTransformer();
while(xsr.nextTag() == XMLStreamConstants.START_ELEMENT) {
DOMResult domResult = new DOMResult();
t.transform(new StAXSource(xsr), domResult);
DOMSource domSource = new DOMSource(domResult.getNode());
StreamResult streamResult = new StreamResult(System.out);
t.transform(domSource, streamResult);
}
}
}
Output
<?xml version="1.0" encoding="UTF-8" standalone="no"?><statement account="123">
...stuff...
</statement><?xml version="1.0" encoding="UTF-8" standalone="no"?><statement account="456">
...stuff...
</statement>
It could be done with SAX... But I think the newer StAX (Streaming API for XML) will serve your purpose better. You could create an XMLEventReader and use that to parse your file, detecting which nodes adhere to one of your criteria. For simple path-based selection (not really XPath, but some simple / delimited path) you'd need to maintain a path to your current node by adding entries to a String on new elements or cutting of entries on an end tag. A boolean flag can suffice to maintain whether you're currently in "relevant mode" or not.
As you obtain XMLEvents from your reader, you could copy the relevant ones over to an XMLEventWriter that you've created on some suitable placeholder, like a StringWriter or ByteArrayOutputStream. Once you've completed the copying for some XML extract that forms a "subdocument" of what you wish to build a DOM for, simply supply your placeholder to a DocumentBuilder in a suitable form.
The limitation here is that you're not harnessing all the power of the XPath language. If you wish to take stuff like node position into account, you'd have to foresee that in your own path. Perhaps someone knows of a good way of integrating a true XPath implementation into this.
StAX is really nice in that it gives you control over the parsing, rather than using some callback interface through a handler like SAX.
There's yet another alternative: using XSLT. An XSLT stylesheet is the ideal way to filter out only relevant stuff. You could transform your input once to obtain the required fragments and process those. Or run multiple stylesheets over the same input to get the desired extract each time. An even nicer (and more efficient) solution, however, would be the use of extension functions and/or extension elements.
Extension functions can be implemented in a way that's independent from the XSLT processor being used. They're fairly straightforward to use in Java and I know for a fact that you can use them to pass complete XML extracts to a method, because I've done so already. Might take some experimentation, but it's a powerful mechanism. A DOM extract (or node) is probably one of the accepted parameter types for such a method. That'd leave the document building up to the XSLT processor which is even easier.
Extension elements are also very useful, but I think they need to be used in an implementation-specific manner. If you're okay with tying yourself to a specific JAXP setup like Xerces + Xalan, they might be the answer.
When going for XSLT, you'll have all the advantages of a full XPath 1.0 implementation, plus the peace of mind that comes from knowing XSLT is in really good shape in Java. It limits the building of the input tree to those nodes that are needed at any time and is blazing fast because the processors tend to compile stylesheets into Java bytecode rather than interpreting them. It is possible that using compilation instead of interpretation loses the possibility of using extension elements, though. Not certain about that. Extension functions are still possible.
Whatever way you choose, there's so much out there for XML processing in Java that you'll find plenty of help in implementing this, should you have no luck in finding a ready-made solution. That'd be the most obvious thing, of course... No need to reinvent the wheel when someone did the hard work.
Good luck!
EDIT: because I'm actually not feeling depressed for once, here's a demo using the StAX solution I whipped up. It's certainly not the cleanest code, but it'll give you the basic idea:
package staxdom;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.Stack;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class DOMExtractor {
private final Set<String> paths;
private final XMLInputFactory inputFactory;
private final XMLOutputFactory outputFactory;
private final DocumentBuilderFactory docBuilderFactory;
private final Stack<QName> activeStack = new Stack<QName>();
private boolean active = false;
private String currentPath = "";
public DOMExtractor(final Set<String> paths) {
this.paths = Collections.unmodifiableSet(new HashSet<String>(paths));
inputFactory = XMLInputFactory.newFactory();
outputFactory = XMLOutputFactory.newFactory();
docBuilderFactory = DocumentBuilderFactory.newInstance();
}
public void parse(final InputStream input) throws XMLStreamException, ParserConfigurationException, SAXException, IOException {
final XMLEventReader reader = inputFactory.createXMLEventReader(input);
XMLEventWriter writer = null;
StringWriter buffer = null;
final DocumentBuilder builder = docBuilderFactory.newDocumentBuilder();
XMLEvent currentEvent = reader.nextEvent();
do {
if(active)
writer.add(currentEvent);
if(currentEvent.isEndElement()) {
if(active) {
activeStack.pop();
if(activeStack.isEmpty()) {
writer.flush();
writer.close();
final Document doc;
final StringReader docReader = new StringReader(buffer.toString());
try {
doc = builder.parse(new InputSource(docReader));
} finally {
docReader.close();
}
//TODO: use doc
//Next bit is only for demo...
outputDoc(doc);
active = false;
writer = null;
buffer = null;
}
}
int index;
if((index = currentPath.lastIndexOf('/')) >= 0)
currentPath = currentPath.substring(0, index);
} else if(currentEvent.isStartElement()) {
final StartElement start = (StartElement)currentEvent;
final QName qName = start.getName();
final String local = qName.getLocalPart();
currentPath += "/" + local;
if(!active && paths.contains(currentPath)) {
active = true;
buffer = new StringWriter();
writer = outputFactory.createXMLEventWriter(buffer);
writer.add(currentEvent);
}
if(active)
activeStack.push(qName);
}
currentEvent = reader.nextEvent();
} while(!currentEvent.isEndDocument());
}
private void outputDoc(final Document doc) {
try {
final Transformer t = TransformerFactory.newInstance().newTransformer();
t.transform(new DOMSource(doc), new StreamResult(System.out));
System.out.println("");
System.out.println("");
} catch(TransformerException ex) {
ex.printStackTrace();
}
}
public static void main(String[] args) {
final Set<String> paths = new HashSet<String>();
paths.add("/root/one");
paths.add("/root/three/embedded");
final DOMExtractor me = new DOMExtractor(paths);
InputStream stream = null;
try {
stream = DOMExtractor.class.getResourceAsStream("sample.xml");
me.parse(stream);
} catch(final Exception e) {
e.printStackTrace();
} finally {
if(stream != null)
try {
stream.close();
} catch(IOException ex) {
ex.printStackTrace();
}
}
}
}
And the sample.xml file (should be in the same package):
<?xml version="1.0" encoding="UTF-8"?>
<root>
<one>
<two>this is text</two>
look, I can even handle mixed!
</one>
... not sure what to do with this, though
<two>
<willbeignored/>
</two>
<three>
<embedded>
<and><here><we><go>
Creative Commons Legal Code
Attribution 3.0 Unported
CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
LEGAL SERVICES. DISTRIBUTION OF THIS LICENSE DOES NOT CREATE AN
ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
REGARDING THE INFORMATION PROVIDED, AND DISCLAIMS LIABILITY FOR
DAMAGES RESULTING FROM ITS USE.
License
THE WORK (AS DEFINED BELOW) IS PROVIDED UNDER THE TERMS OF THIS CREATIVE
COMMONS PUBLIC LICENSE ("CCPL" OR "LICENSE"). THE WORK IS PROTECTED BY
COPYRIGHT AND/OR OTHER APPLICABLE LAW. ANY USE OF THE WORK OTHER THAN AS
AUTHORIZED UNDER THIS LICENSE OR COPYRIGHT LAW IS PROHIBITED.
BY EXERCISING ANY RIGHTS TO THE WORK PROVIDED HERE, YOU ACCEPT AND AGREE
TO BE BOUND BY THE TERMS OF THIS LICENSE. TO THE EXTENT THIS LICENSE MAY
BE CONSIDERED TO BE A CONTRACT, THE LICENSOR GRANTS YOU THE RIGHTS
CONTAINED HERE IN CONSIDERATION OF YOUR ACCEPTANCE OF SUCH TERMS AND
CONDITIONS.
</go></we></here></and>
</embedded>
</three>
</root>
EDIT 2: Just noticed in Blaise Doughan's answer that there's a StAXSource. That'll be even more efficient. Use that if you're going with StAX. Will eliminate the need to keep some buffer. StAX allows you to "peek" at the next event, so you can check if it's a start element with the right path without consuming it before passing it into the transformer .
ok thanks to your pieces of code, I finally end up with my solution:
Usage is quite intuitive:
try
{
/* CREATE THE PARSER */
XMLParser parser = new XMLParser();
/* CREATE THE FILTER (THIS IS A REGEX (X)PATH FILTER) */
XMLRegexFilter filter = new XMLRegexFilter("statements/statement");
/* CREATE THE HANDLER WHICH WILL BE CALLED WHEN A NODE IS FOUND */
XMLHandler handler = new XMLHandler()
{
public void nodeFound(StringBuilder node, XMLStackFilter withFilter)
{
// DO SOMETHING WITH THE FOUND XML NODE
System.out.println("Node found");
System.out.println(node.toString());
}
};
/* ATTACH THE FILTER WITH THE HANDLER */
parser.addFilterWithHandler(filter, handler);
/* SET THE FILE TO PARSE */
parser.setFilePath("/path/to/bigfile.xml");
/* RUN THE PARSER */
parser.parse();
}
catch (Exception ex)
{
ex.printStackTrace();
}
Note:
I made a XMLNodeFoundNotifier and a XMLStackFilter interface to easily integrate or build your own handler / filter.
Normally you should be able to parse very large files with this class. Only the returned nodes are actually loaded into memory.
You can enable attributes support in uncommenting the right part in the code, I disabled it for simplicity reasons.
You can use as many filters per handler as you need and conversely
All the of the code is here:
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Stack;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.stream.*;
/* IMPLEMENT THIS TO YOUR CLASS IN ORDER TO TO BE NOTIFIED WHEN A NODE IS FOUND*/
interface XMLNodeFoundNotifier {
abstract void nodeFound(StringBuilder node, XMLStackFilter withFilter);
}
/* A SMALL HANDER USEFULL FOR EXPLICIT CLASS DECLARATION */
abstract class XMLHandler implements XMLNodeFoundNotifier {
}
/* INTERFACE TO WRITE YOUR OWN FILTER BASED ON THE CURRENT NODES STACK (PATH)*/
interface XMLStackFilter {
abstract boolean isRelevant(Stack fullPath);
}
/* A VERY USEFULL FILTER USING REGEX AS THE PATH FILTER */
class XMLRegexFilter implements XMLStackFilter {
Pattern relevantExpression;
XMLRegexFilter(String filterRules) {
relevantExpression = Pattern.compile(filterRules);
}
/* HERE WE ARE ARE ASK TO TELL IF THE CURRENT STACK (LIST OF NODES) IS RELEVANT
* OR NOT ACCORDING TO WHAT WE WANT. RETURN TRUE IF THIS IS THE CASE */
#Override
public boolean isRelevant(Stack fullPath) {
/* A POSSIBLE CLEVER WAY COULD BE TO SERIALIZE THE WHOLE PATH (INCLUDING
* ATTRIBUTES) TO A STRING AND TO MATCH IT WITH A REGEX BEING THE FILTER
* FOR NOW StackToString DOES NOT SERIALIZE ATTRIBUTES */
String stackPath = XMLParser.StackToString(fullPath);
Matcher m = relevantExpression.matcher(stackPath);
return m.matches();
}
}
/* THE MAIN PARSER'S CLASS */
public class XMLParser {
HashMap<XMLStackFilter, XMLNodeFoundNotifier> filterHandler;
HashMap<Integer, Integer> feedingStreams;
Stack<HashMap> currentStack;
String filePath;
XMLParser() {
currentStack = new <HashMap>Stack();
filterHandler = new <XMLStackFilter, XMLNodeFoundNotifier> HashMap();
feedingStreams = new <Integer, Integer>HashMap();
}
public void addFilterWithHandler(XMLStackFilter f, XMLNodeFoundNotifier h) {
filterHandler.put(f, h);
}
public void setFilePath(String filePath) {
this.filePath = filePath;
}
/* CONVERT A STACK OF NODES TO A REGULAR PATH STRING. NOTE THAT PER DEFAULT
* I DID NOT ADDED THE ATTRIBUTES INTO THE PATH. UNCOMENT THE LINKS ABOVE TO
* DO SO
*/
public static String StackToString(Stack<HashMap> s) {
int k = s.size();
if (k == 0) {
return null;
}
StringBuilder out = new StringBuilder();
out.append(s.get(0).get("tag"));
for (int x = 1; x < k; ++x) {
HashMap node = s.get(x);
out.append('/').append(node.get("tag"));
/*
// UNCOMMENT THIS TO ADD THE ATTRIBUTES SUPPORT TO THE PATH
ArrayList <String[]>attributes = (ArrayList)node.get("attr");
if (attributes.size()>0)
{
out.append("[");
for (int i = 0 ; i<attributes.size(); i++)
{
String[]keyValuePair = attributes.get(i);
if (i>0) out.append(",");
out.append(keyValuePair[0]);
out.append("=\"");
out.append(keyValuePair[1]);
out.append("\"");
}
out.append("]");
}*/
}
return out.toString();
}
/*
* ONCE A NODE HAS BEEN SUCCESSFULLY FOUND, WE GET THE DELIMITERS OF THE FILE
* WE THEN RETRIEVE THE DATA FROM IT.
*/
private StringBuilder getChunk(int from, int to) throws Exception {
int length = to - from;
FileReader f = new FileReader(filePath);
BufferedReader br = new BufferedReader(f);
br.skip(from);
char[] readb = new char[length];
br.read(readb, 0, length);
StringBuilder b = new StringBuilder();
b.append(readb);
return b;
}
/* TRANSFORMS AN XSR NODE TO A HASHMAP NODE'S REPRESENTATION */
public HashMap XSRNode2HashMap(XMLStreamReader xsr) {
HashMap h = new HashMap();
ArrayList attributes = new ArrayList();
for (int i = 0; i < xsr.getAttributeCount(); i++) {
String[] s = new String[2];
s[0] = xsr.getAttributeName(i).toString();
s[1] = xsr.getAttributeValue(i);
attributes.add(s);
}
h.put("tag", xsr.getName());
h.put("attr", attributes);
return h;
}
public void parse() throws Exception {
FileReader f = new FileReader(filePath);
XMLInputFactory xif = XMLInputFactory.newInstance();
XMLStreamReader xsr = xif.createXMLStreamReader(f);
Location previousLoc = xsr.getLocation();
while (xsr.hasNext()) {
switch (xsr.next()) {
case XMLStreamConstants.START_ELEMENT:
currentStack.add(XSRNode2HashMap(xsr));
for (XMLStackFilter filter : filterHandler.keySet()) {
if (filter.isRelevant(currentStack)) {
feedingStreams.put(currentStack.hashCode(), new Integer(previousLoc.getCharacterOffset()));
}
}
previousLoc = xsr.getLocation();
break;
case XMLStreamConstants.END_ELEMENT:
Integer stream = null;
if ((stream = feedingStreams.get(currentStack.hashCode())) != null) {
// FIND ALL THE FILTERS RELATED TO THIS FeedingStreem AND CALL THEIR HANDLER.
for (XMLStackFilter filter : filterHandler.keySet()) {
if (filter.isRelevant(currentStack)) {
XMLNodeFoundNotifier h = filterHandler.get(filter);
StringBuilder aChunk = getChunk(stream.intValue(), xsr.getLocation().getCharacterOffset());
h.nodeFound(aChunk, filter);
}
}
feedingStreams.remove(currentStack.hashCode());
}
previousLoc = xsr.getLocation();
currentStack.pop();
break;
default:
break;
}
}
}
}
A little while since i did SAX, but what you want to do is process each of the tags until you find the end tag for the group you want to process, then run your process, clear it out and look for the next start tag.
private void validateXML(DOMSource source) throws Exception {
URL schemaFile = new URL("http://www.csc.liv.ac.uk/~valli/modules.xsd");
SchemaFactory schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI);
Schema schema = schemaFactory.newSchema(schemaFile);
Validator validator = schema.newValidator();
DOMResult result = new DOMResult();
try {
validator.validate(source, result);
System.out.println("is valid");
} catch (SAXException e) {
System.out.println("not valid because " + e.getLocalizedMessage());
}
}
But this returns an error saying:
Exception in thread "main" java.lang.IllegalArgumentException: No SchemaFactory that implements the schema language specified by: http://www.w3.org/2001/XMLSchema -instance could be loaded
Is this a problem with my code or with the actual xsd file?
That error means that your installed Java doesn't have any classes that can parse XMLSchema files, so it's not a problem with the schema or your code.
I'm pretty sure recent JREs have the appropriate classes by default, so can you get us the output of java -version?
Update:
You're using the wrong XMLContants string. You want: XMLConstants.W3C_XML_SCHEMA_NS_URI
Those files are based on the underlying system. I had the same issue when I was programming a project for Android. I found that I had to use Xerces-for-Android to solve my problem.
The following worked for me for validation on Android, if your code relates to Android perhaps it will help, if it doesn't then perhaps the approach will help you with your underlying system:
Create a validation utility.
Get both the xml and xsd into file on the android OS and use the validation utility against it.
Use Xerces-For-Android to do the validation.
Android does support some packages which we can use, I created my xml validation utility based on: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html
My initial sandbox testing was pretty smooth with java, then I tried to port it over to Dalvik and found that my code did not work. Some things just aren't supported the same with Dalvik, so I made some modifications.
I found a reference to xerces for android, so I modified my sandbox test of (the following doesn't work with android, the example after this does):
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
/**
* A Utility to help with xml communication validation.
*/
public class XmlUtil {
/**
* Validation method.
* Base code/example from: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html
*
* #param xmlFilePath The xml file we are trying to validate.
* #param xmlSchemaFilePath The schema file we are using for the validation. This method assumes the schema file is valid.
* #return True if valid, false if not valid or bad parse.
*/
public static boolean validate(String xmlFilePath, String xmlSchemaFilePath) {
// parse an XML document into a DOM tree
DocumentBuilder parser = null;
Document document;
// Try the validation, we assume that if there are any issues with the validation
// process that the input is invalid.
try {
// validate the DOM tree
parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
document = parser.parse(new File(xmlFilePath));
// create a SchemaFactory capable of understanding WXS schemas
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// load a WXS schema, represented by a Schema instance
Source schemaFile = new StreamSource(new File(xmlSchemaFilePath));
Schema schema = factory.newSchema(schemaFile);
// create a Validator instance, which can be used to validate an instance document
Validator validator = schema.newValidator();
validator.validate(new DOMSource(document));
} catch (Exception e) {
// Catches: SAXException, ParserConfigurationException, and IOException.
return false;
}
return true;
}
}
The above code had to be modified some to work with xerces for android (http://gc.codehum.com/p/xerces-for-android/). You need SVN to get the project, the following are some crib notes:
download xerces-for-android
download silk svn (for windows users) from http://www.sliksvn.com/en/download
install silk svn (I did complete install)
Once the install is complete, you should have svn in your system path.
Test by typing "svn" from the command line.
I went to my desktop then downloaded the xerces project by:
svn checkout http://xerces-for-android.googlecode.com/svn/trunk/ xerces-for-android-read-only
You should then have a new folder on your desktop called xerces-for-android-read-only
With the above jar (Eventually I'll make it into a jar, just copied it directly into my source for quick testing. If you wish to do the same, you can making the jar quickly with Ant (http://ant.apache.org/manual/using.html)), I was able to get the following to work for my xml validation:
import java.io.File;
import java.io.IOException;
import mf.javax.xml.transform.Source;
import mf.javax.xml.transform.stream.StreamSource;
import mf.javax.xml.validation.Schema;
import mf.javax.xml.validation.SchemaFactory;
import mf.javax.xml.validation.Validator;
import mf.org.apache.xerces.jaxp.validation.XMLSchemaFactory;
import org.xml.sax.SAXException;
/**
* A Utility to help with xml communication validation.
*/public class XmlUtil {
/**
* Validation method.
*
* #param xmlFilePath The xml file we are trying to validate.
* #param xmlSchemaFilePath The schema file we are using for the validation. This method assumes the schema file is valid.
* #return True if valid, false if not valid or bad parse or exception/error during parse.
*/
public static boolean validate(String xmlFilePath, String xmlSchemaFilePath) {
// Try the validation, we assume that if there are any issues with the validation
// process that the input is invalid.
try {
SchemaFactory factory = new XMLSchemaFactory();
Source schemaFile = new StreamSource(new File(xmlSchemaFilePath));
Source xmlSource = new StreamSource(new File(xmlFilePath));
Schema schema = factory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlSource);
} catch (SAXException e) {
return false;
} catch (IOException e) {
return false;
} catch (Exception e) {
// Catches everything beyond: SAXException, and IOException.
e.printStackTrace();
return false;
} catch (Error e) {
// Needed this for debugging when I was having issues with my 1st set of code.
e.printStackTrace();
return false;
}
return true;
}
}
Some Side Notes:
For creating the files, I made a simple file utility to write string to files:
public static void createFileFromString(String fileText, String fileName) {
try {
File file = new File(fileName);
BufferedWriter output = new BufferedWriter(new FileWriter(file));
output.write(fileText);
output.close();
} catch ( IOException e ) {
e.printStackTrace();
}
}
I also needed to write to an area that I had access to, so I made use of:
String path = this.getActivity().getPackageManager().getPackageInfo(getPackageName(), 0).applicationInfo.dataDir;
A little hackish, it works. I'm sure there is a more succinct way of doing this, however I figured I'd share my success, as there weren't any good examples that I found.
I have created an XML and I want to validate with schema i.e,
XSD file but there are no direct classes provided by android for the
same if I am not wrong ......... and there is an external jar named
jaxp1.3 which doesn't allow me to compile the code is it because the
bytecode of desktop and android are different? Which has the classes
schema factory and validator which does the validation stuff ...... Is
there an other option available. Any help would be appreciated .....
desperately searching for the ans..........
It's a known issue posted by Google here
The solution is to use Apache Xerces ported to Android.
There is a project here
You have to do a svn chekout and export the proyect to a jar file to use as a library in your android proyect.
The code to instance SchemaFactory change a little.
I show you an example:
import mf.javax.xml.validation.Schema;
import mf.javax.xml.validation.SchemaFactory;
import mf.javax.xml.validation.Validator;
import mf.org.apache.xerces.jaxp.validation.XMLSchemaFactory;
SchemaFactory factory = new XMLSchemaFactory();
Schema esquema = factory.newSchema(".../file.xsd");
#iOSDev, I had to use Xerces-for-Android for my validation. The following is a summary of what I did to get it working with my program:
Create a validation utility.
Get both the xml and xsd into file on the android OS and use the validation utility against it.
Use Xerces-For-Android to do the validation.
Android does support some packages which we can use, I created my xml validation utility based on: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html
My initial sandbox testing was pretty smooth with java, then I tried to port it over to Dalvik and found that my code did not work. Some things just aren't supported the same with Dalvik, so I made some modifications.
I found a reference to xerces for android, so I modified my sandbox test of (the following doesn't work with android, the example after this does):
import java.io.File;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Source;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.w3c.dom.Document;
/**
* A Utility to help with xml communication validation.
*/
public class XmlUtil {
/**
* Validation method.
* Base code/example from: http://docs.oracle.com/javase/1.5.0/docs/api/javax/xml/validation/package-summary.html
*
* #param xmlFilePath The xml file we are trying to validate.
* #param xmlSchemaFilePath The schema file we are using for the validation. This method assumes the schema file is valid.
* #return True if valid, false if not valid or bad parse.
*/
public static boolean validate(String xmlFilePath, String xmlSchemaFilePath) {
// parse an XML document into a DOM tree
DocumentBuilder parser = null;
Document document;
// Try the validation, we assume that if there are any issues with the validation
// process that the input is invalid.
try {
// validate the DOM tree
parser = DocumentBuilderFactory.newInstance().newDocumentBuilder();
document = parser.parse(new File(xmlFilePath));
// create a SchemaFactory capable of understanding WXS schemas
SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
// load a WXS schema, represented by a Schema instance
Source schemaFile = new StreamSource(new File(xmlSchemaFilePath));
Schema schema = factory.newSchema(schemaFile);
// create a Validator instance, which can be used to validate an instance document
Validator validator = schema.newValidator();
validator.validate(new DOMSource(document));
} catch (Exception e) {
// Catches: SAXException, ParserConfigurationException, and IOException.
return false;
}
return true;
}
}
The above code had to be modified some to work with xerces for android (http://gc.codehum.com/p/xerces-for-android/). You need SVN to get the project, the following are some crib notes:
download xerces-for-android
download silk svn (for windows users) from http://www.sliksvn.com/en/download
install silk svn (I did complete install)
Once the install is complete, you should have svn in your system path.
Test by typing "svn" from the command line.
I went to my desktop then downloaded the xerces project by:
svn checkout http://xerces-for-android.googlecode.com/svn/trunk/ xerces-for-android-read-only
You should then have a new folder on your desktop called xerces-for-android-read-only
With the above jar (Eventually I'll make it into a jar, just copied it directly into my source for quick testing. If you wish to do the same, you can making the jar quickly with Ant (http://ant.apache.org/manual/using.html)), I was able to get the following to work for my xml validation:
import java.io.File;
import java.io.IOException;
import mf.javax.xml.transform.Source;
import mf.javax.xml.transform.stream.StreamSource;
import mf.javax.xml.validation.Schema;
import mf.javax.xml.validation.SchemaFactory;
import mf.javax.xml.validation.Validator;
import mf.org.apache.xerces.jaxp.validation.XMLSchemaFactory;
import org.xml.sax.SAXException;
/**
* A Utility to help with xml communication validation.
*/public class XmlUtil {
/**
* Validation method.
*
* #param xmlFilePath The xml file we are trying to validate.
* #param xmlSchemaFilePath The schema file we are using for the validation. This method assumes the schema file is valid.
* #return True if valid, false if not valid or bad parse or exception/error during parse.
*/
public static boolean validate(String xmlFilePath, String xmlSchemaFilePath) {
// Try the validation, we assume that if there are any issues with the validation
// process that the input is invalid.
try {
SchemaFactory factory = new XMLSchemaFactory();
Source schemaFile = new StreamSource(new File(xmlSchemaFilePath));
Source xmlSource = new StreamSource(new File(xmlFilePath));
Schema schema = factory.newSchema(schemaFile);
Validator validator = schema.newValidator();
validator.validate(xmlSource);
} catch (SAXException e) {
return false;
} catch (IOException e) {
return false;
} catch (Exception e) {
// Catches everything beyond: SAXException, and IOException.
e.printStackTrace();
return false;
} catch (Error e) {
// Needed this for debugging when I was having issues with my 1st set of code.
e.printStackTrace();
return false;
}
return true;
}
}
Some Side Notes:
For creating the files, I made a simple file utility to write string to files:
public static void createFileFromString(String fileText, String fileName) {
try {
File file = new File(fileName);
BufferedWriter output = new BufferedWriter(new FileWriter(file));
output.write(fileText);
output.close();
} catch ( IOException e ) {
e.printStackTrace();
}
}
I also needed to write to an area that I had access to, so I made use of:
String path = this.getActivity().getPackageManager().getPackageInfo(getPackageName(), 0).applicationInfo.dataDir;
A little hackish, it works. I'm sure there is a more succinct way of doing this, however I figured I'd share my success, as there weren't any good examples that I found.