I have a situation where I'd like to start using an XML Schema to validate documents that, until now, have never had a schema definition. As such, the existing documents I'd like to validate do not have any xmlns declaration in them.
I have no problem successfully validating a document which does include the xmlns declaration, but I'd also like to be able to validate those documents without such a declaration. I was hoping for something like this:
DocumentBuilderFactory dbf = ...;
dbf.setSchema(... my schema for namespace "foo:bar"...);
dbf.setValidating(false);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
db.setDefaultNamespace("foo:bar");
Document doc = db.parse(input);
There is no such method DocumentBuilder.setDefaultNamespace and so the schema validation is not performed when loading documents of this type.
Is there any way to force the namespace for a document if one is not set? Or does that require essentially parsing the XML without regard to schema, checking for an existing namespace, adjusting it, then re-validating the document with the schema?
I'm currently expecting the parser to perform validation during parsing, but I have no problem parsing first and then validating afterward.
UPDATE 2021-01-13
Here is a concrete example of what I'm trying to do, as a JUnit test case.
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.junit.Assert;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class XMLSchemaTest
{
private static final String XMLNS = "http://www.example.com/schema";
private static final String schemaDocument = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" targetNamespace=\"" + XMLNS + "\" xmlns:e=\"" + XMLNS + "\" elementFormDefault=\"qualified\"><xs:element name=\"example\" type=\"e:exampleType\" /><xs:complexType name=\"exampleType\"><xs:sequence><xs:element name=\"test\" type=\"e:testType\" /></xs:sequence></xs:complexType><xs:complexType name=\"testType\" /></xs:schema>";
private static Document parse(String document) throws SAXException, ParserConfigurationException, IOException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
SchemaFactory sf = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Source[] sources = new Source[] {
new StreamSource(new StringReader(schemaDocument))
};
Schema schema = sf.newSchema(sources);
dbf.setSchema(schema);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
db.setErrorHandler(new MyErrorHandler());
return db.parse(new InputSource(new StringReader(document)));
}
@Test
public void testConformingDocumentWithSchema() throws Exception {
String testDocument = "<example xmlns=\"" + XMLNS + "\"><test/></example>";
Document doc = parse(testDocument);
//Assert.assertEquals("Wrong document XML namespace", XMLNS, doc.getNamespaceURI());
Element root = doc.getDocumentElement();
Assert.assertEquals("Wrong root element XML namespace", XMLNS, root.getNamespaceURI());
Assert.assertEquals("Wrong element name", "example", root.getLocalName());
Assert.assertEquals("Wrong element name", "example", root.getTagName());
}
@Test
public void testConformingDocumentWithoutSchema() throws Exception {
String testDocument = "<example><test/></example>";
Document doc = parse(testDocument);
//Assert.assertEquals("Wrong document XML namespace", XMLNS, doc.getNamespaceURI());
Element root = doc.getDocumentElement();
Assert.assertEquals("Wrong root element XML namespace", XMLNS, root.getNamespaceURI());
Assert.assertEquals("Wrong element name", "example", root.getLocalName());
Assert.assertEquals("Wrong element name", "example", root.getTagName());
}
@Test
public void testNonconformingDocumentWithSchema() throws Exception {
String testDocument = "<example xmlns=\"" + XMLNS + "\"><random/></example>";
try {
parse(testDocument);
Assert.fail("Document should not have parsed properly");
} catch (Exception e) {
System.out.println(e);
// Expected
}
}
@Test
public void testNonconformingDocumentWithoutSchema() throws Exception {
String testDocument = "<example><random/></example>";
try {
parse(testDocument);
Assert.fail("Document should not have parsed properly");
} catch (Exception e) {
System.out.println(e);
// Expected
}
}
public static class MyErrorHandler implements ErrorHandler {
@Override
public void warning(SAXParseException exception) throws SAXException {
System.err.println("WARNING: " + exception);
}
@Override
public void error(SAXParseException exception) throws SAXException {
throw exception;
}
@Override
public void fatalError(SAXParseException exception) throws SAXException {
System.err.println("FATAL: " + exception);
}
}
}
All of the tests pass except for testConformingDocumentWithoutSchema. I think this is kind of expected, as the document declares no namespace.
I'm asking how the test can be changed (but not the document itself!) so that I can validate the document against a schema that was not actually declared by the document.
I pounded on this for a while, and I was able to come up with a hack that works. It may be possible to do this more elegantly (which was my original question), and it also may be possible to do this with less code, but this was what I was able to come up with.
If you look at the JUnit test case in the question, changing the "parse" method to the following (and adding XMLNS as the second argument to all calls to parse) will allow all tests to complete:
import java.io.StringWriter;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
...
private static Document parse(String document, String namespace) throws SAXException, ParserConfigurationException, IOException {
SchemaFactory sf = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Source[] sources = new Source[] {
new StreamSource(new StringReader(schemaDocument))
};
Schema schema = sf.newSchema(sources);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setSchema(schema);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
ErrorHandler errorHandler = new MyErrorHandler();
db.setErrorHandler(errorHandler);
try {
return db.parse(new InputSource(new StringReader(document)));
} catch (SAXParseException spe) {
// Just in case this was a problem with a missing namespace
// System.out.println("Possibly recovering from SPE " + spe);
// New DocumentBuilder without the schema
dbf.setSchema(null);
db = dbf.newDocumentBuilder();
db.setErrorHandler(errorHandler);
Document doc = db.parse(new InputSource(new StringReader(document)));
if(null != doc.getDocumentElement().getNamespaceURI()) {
// Namespace URI was set; this is a fatal error
throw spe;
}
// Override the namespace on the Document + root element
doc.getDocumentElement().setAttribute("xmlns", namespace);
// Serialize the document -> String to start over again
DOMImplementationLS domImplementation = (DOMImplementationLS) doc.getImplementation();
LSSerializer lsSerializer = domImplementation.createLSSerializer();
LSOutput lsOutput = domImplementation.createLSOutput();
lsOutput.setEncoding("UTF-8");
StringWriter out = new StringWriter();
lsOutput.setCharacterStream(out);
lsSerializer.write(doc, lsOutput);
String converted = out.toString();
// Re-enable the schema
dbf.setSchema(schema);
db = dbf.newDocumentBuilder();
db.setErrorHandler(errorHandler);
return db.parse(new InputSource(new StringReader(converted)));
}
}
This works by catching SAXParseException and, because SAXParseException doesn't give up any of its details, assuming that the problem might be due to a missing XML namespace declaration. I then re-parse the document without the schema validation, add a namespace declaration to the in-memory Document, then serialize the Document to String and re-parse the document with the schema validation re-enabled.
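If only validation is needed, the same idea might be expressed without the parse, serialize, and re-parse round trip by rewriting the namespace in the SAX event stream. The following is a minimal sketch, not a tested drop-in for the test case above: the class names DefaultNamespaceValidation and DefaultNamespaceFilter are hypothetical, and it assumes the JAXP Validator accepts a SAXSource whose XMLReader is a filter.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

final class DefaultNamespaceValidation {

    /** Hypothetical filter: reports elements without a namespace as belonging to defaultNamespace. */
    static final class DefaultNamespaceFilter extends XMLFilterImpl {
        private final String defaultNamespace;

        DefaultNamespaceFilter(XMLReader parent, String defaultNamespace) {
            super(parent);
            // Interned defensively (assumption): some validators compare names by identity.
            this.defaultNamespace = defaultNamespace.intern();
        }

        private String effective(String uri) {
            return (uri == null || uri.isEmpty()) ? defaultNamespace : uri;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            super.startElement(effective(uri), localName, qName, atts);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            super.endElement(effective(uri), localName, qName);
        }
    }

    /** Throws SAXException if the document does not conform to the schema. */
    static void validate(String schemaText, String document, String defaultNamespace)
            throws SAXException, IOException, ParserConfigurationException {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));

        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // The validator sees the filtered events, not the raw document.
        XMLReader filtered = new DefaultNamespaceFilter(reader, defaultNamespace);
        schema.newValidator().validate(
                new SAXSource(filtered, new InputSource(new StringReader(document))));
    }
}
```

Note that this only validates; it does not hand back a Document whose elements carry the namespace, so it would not by itself satisfy the getNamespaceURI() assertions in the test.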
I tried to do this just by setting the XML namespace and then using Schema.newValidator().validate(new DOMSource(doc)), but this failed validation every time for me. Running through the serializer got around that problem.
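A likely reason the DOMSource attempt failed is that setAttribute("xmlns", ...) only adds an attribute; it does not change what getNamespaceURI() returns for nodes that already exist. DOM Level 3 offers Document.renameNode for that. Below is a rough sketch of that variant, with a hypothetical class name NamespaceRewriter, assuming every element without a namespace should move into the schema's namespace:

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class NamespaceRewriter {

    /** Moves node and all descendant elements that have no namespace into the given namespace. */
    static void moveToNamespace(Document doc, Node node, String namespace) {
        if (node.getNodeType() == Node.ELEMENT_NODE && node.getNamespaceURI() == null) {
            // renameNode may return a replacement node, so continue with its result.
            node = doc.renameNode(node, namespace, node.getNodeName());
        }
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling first, in case the child is replaced during renaming.
            Node next = child.getNextSibling();
            moveToNamespace(doc, child, namespace);
            child = next;
        }
    }

    /** Parses without a schema, fixes up the namespace, then validates the in-memory tree. */
    static Document parseAndValidate(String schemaText, String xml, String namespace)
            throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        moveToNamespace(doc, doc.getDocumentElement(), namespace);

        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(schemaText)));
        // With no ErrorHandler set, validation errors are thrown as SAXException.
        schema.newValidator().validate(new DOMSource(doc));
        return doc;
    }
}
```

Unlike the serialize-and-reparse hack, this treats every document the same way (already-namespaced elements are left alone), so it does not need to guess why a SAXParseException was thrown.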
Related
I have a requirement to create a sort of 'skeleton' xml based on an XSD schema.
The documents defined by these schemas have no namespace. They are authored by other developers, not in an automated way.
There is no mixed content allowed. That is, elements can contain elements only, or text only.
The rules for this sample xml are:
elements that can contain only text content should not be created in the sample xml
all other optional and mandatory elements should be included in the sample xml
elements should be created only once even if they can occur multiple times
any other nodes such as attributes, comments, processing instructions, etc. should be omitted - the sample xml would be an 'element tree'
Are there APIs or tools in Java that can generate such sample xml? I'm looking for pointers where to get started.
This needs to be done programmatically in a reliable way, as the sample xml is used by other XSLT transformations.
Hopefully the code below will serve your purpose:
package com.example.demo;
import java.io.File;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import jlibs.xml.sax.XMLDocument;
import jlibs.xml.xsd.XSInstance;
import jlibs.xml.xsd.XSParser;
public class xsdtoxml {
public static void main(String[] pArgs) {
try {
String filename = "out.xsd";
// instance.
final Document doc = loadXsdDocument(filename);
//Find the docs root element and use it to find the targetNamespace
final Element rootElem = doc.getDocumentElement();
String targetNamespace = null;
if (rootElem != null && rootElem.getNodeName().equals("xs:schema"))
{
targetNamespace = rootElem.getAttribute("targetNamespace");
}
//Parse the file into an XSModel object
org.apache.xerces.xs.XSModel xsModel = new XSParser().parse(filename);
//Define defaults for the XML generation
XSInstance instance = new XSInstance();
instance.minimumElementsGenerated = 1;
instance.maximumElementsGenerated = 1;
instance.generateDefaultAttributes = true;
instance.generateOptionalAttributes = true;
instance.maximumRecursionDepth = 0;
instance.generateAllChoices = true;
instance.showContentModel = true;
instance.generateOptionalElements = true;
//Build the sample xml doc
//Replace first param to XMLDoc with a file input stream to write to file
QName rootElement = new QName(targetNamespace, "out");
XMLDocument sampleXml = new XMLDocument(new StreamResult(System.out), true, 4, null);
instance.generate(xsModel, rootElement, sampleXml);
} catch (TransformerConfigurationException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static Document loadXsdDocument(String inputName) {
final String filename = inputName;
final DocumentBuilderFactory factory = DocumentBuilderFactory
.newInstance();
factory.setValidating(false);
factory.setIgnoringElementContentWhitespace(true);
factory.setIgnoringComments(true);
Document doc = null;
try {
final DocumentBuilder builder = factory.newDocumentBuilder();
final File inputFile = new File(filename);
doc = builder.parse(inputFile);
} catch (final Exception e) {
e.printStackTrace();
// throw new ContentLoadException(msg);
}
return doc;
}
}
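A generated sample will typically still contain attributes and text values, which the question wants left out. One possible post-processing pass, sketched here with only the JDK DOM API, prunes a parsed sample down to an element tree. The class name ElementTreePruner is hypothetical, and treating "contains only non-blank text" as "text-only element" is an assumption about how the sample was generated:

```java
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

final class ElementTreePruner {

    /** Removes attributes, text, comments, PIs and text-only child elements, recursively. */
    static void prune(Element element) {
        // NamedNodeMap is live, so keep removing the first entry until none remain.
        NamedNodeMap attributes = element.getAttributes();
        while (attributes.getLength() > 0) {
            element.removeAttributeNode((Attr) attributes.item(0));
        }

        Node child = element.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.ELEMENT_NODE && !isTextOnly((Element) child)) {
                prune((Element) child);
            } else {
                // Text, comments, processing instructions, and text-only elements go away.
                element.removeChild(child);
            }
            child = next;
        }
    }

    /** True if the element has no element children and at least one non-blank text node. */
    static boolean isTextOnly(Element element) {
        boolean sawText = false;
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            short type = n.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                return false;
            }
            if ((type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE)
                    && !n.getNodeValue().trim().isEmpty()) {
                sawText = true;
            }
        }
        return sawText;
    }
}
```

Calling prune(doc.getDocumentElement()) on the parsed sample leaves the root in place and strips everything else that is not part of the element tree.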
XSD to XML:
1: You can use Eclipse (right-click and select Generate)
2: Sun/Oracle Multi-Schema Validator
3: xmlgen
See:
How to generate sample XML documents from their DTD or XSD?
For subtle requirements, you should program it yourself.
For homework, I need to build a simple program that retrieves data from an XML attribute based on user input to a web service. To that end, I started by building a class that can parse my XML string, and I also built a simple Java service that does nothing but respond with a simple message. How do I put these together to get my program to work? Is this a good way to start? Please advise.
Also, to make things a little easier, the XML string contains key words in both English and Serbian, so that the web service can look up one from the other:
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Recnik {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
String xmlString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE language [<!ATTLIST phrase id ID #IMPLIED>]><language id=\"sr\"><phrase key=\"house\" value=\"kuca\"/><phrase key=\"dog\" value=\"pas\"/><phrase key=\"cat\" value=\"macka\"/></language>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
//FileInputStream fis = new FileInputStream("myBooks.xml");
InputSource is = new InputSource(new StringReader(xmlString));
Document doc = db.parse(is);
Element r = doc.getDocumentElement();
NodeList language = r.getElementsByTagName("phrase");
System.out.println(language.item(1).getAttributes().item(0).getTextContent());
}
}
package Prevodilac;
import javax.jws.WebService;
import javax.jws.WebMethod;
import javax.jws.WebParam;
@WebService(serviceName = "Prevodilac")
public class Prevodilac {
@WebMethod(operationName = "pretraga")
public String pretraga(int a, int b) {
Integer res = a+b;
return res.toString();
}
}
@WebService(serviceName = "Prevodilac")
public class Prevodilac {
Document doc;
public Prevodilac() throws ParserConfigurationException, SAXException, IOException{
// Fill the document just once, not for each method call
String xmlString = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><!DOCTYPE language [<!ATTLIST phrase id ID #IMPLIED>]><language id=\"sr\"><phrase key=\"house\" value=\"kuca\"/><phrase key=\"dog\" value=\"pas\"/><phrase key=\"cat\" value=\"macka\"/></language>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(xmlString));
doc = db.parse(is);
}
@WebMethod(operationName = "pretraga")
public String pretraga(String key) {
Element r = doc.getDocumentElement();
NodeList language = r.getElementsByTagName("phrase");
String result = "Not found";
for( int index = 0; index < language.getLength(); index++ ) {
Node attribute = language.item(index).getAttributes().getNamedItem("key");
// TODO (It's homework after all):
// check if the attribute corresponds to key parameter
if( attribute..... ){
// fill result with attribute value
result = ...;
}
}
return result;
}
}
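A different design from the loop above, offered only as a sketch: since the phrases never change, they could be read once into a map that works in both directions (English to Serbian and back). The class name PhraseIndex is hypothetical, and it assumes no word appears as both a key and a value:

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class PhraseIndex {

    /** Builds a two-way lookup table from the key/value attributes of each phrase element. */
    static Map<String, String> load(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = db.parse(new InputSource(new StringReader(xml)));
        NodeList phrases = doc.getDocumentElement().getElementsByTagName("phrase");

        Map<String, String> index = new HashMap<>();
        for (int i = 0; i < phrases.getLength(); i++) {
            Element phrase = (Element) phrases.item(i);
            String key = phrase.getAttribute("key");
            String value = phrase.getAttribute("value");
            index.put(key, value);   // English -> Serbian
            index.put(value, key);   // Serbian -> English
        }
        return index;
    }
}
```

The web method would then reduce to a single map lookup with a fallback such as "Not found" when the word is missing.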
In my XML file I have some entities such as &rsquo;
So I have created a DTD tag for my XML document to define these entities. Below is the Java code used to read the XML file.
SAXBuilder builder = new SAXBuilder();
URL url = new URL("http://127.0.0.1:8080/sample/subject.xml");
InputStream stream = url.openStream();
org.jdom.Document document = builder.build(stream);
Element root = document.getRootElement();
Element name = root.getChild("name");
result = name.getText();
System.err.println(result);
How can I change the Java code to retrieve a DTD over HTTP to allow the parsing of my XML document to be error free?
Simplified example of the xml document.
<main>
<name>hello &lsquo; world &rsquo; foo &amp; bar </name>
</main>
One way to do this would be to read the document and then validate it with the transformer:
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class ValidateWithExternalDTD {
private static final String URL = "http://127.0.0.1:8080/sample/subject.xml";
private static final String DTD = "http://127.0.0.1/YourDTD.dtd";
public static void main(String args[]) {
try {
DocumentBuilderFactory factory= DocumentBuilderFactory.newInstance();
factory.setValidating(true);
DocumentBuilder builder = factory.newDocumentBuilder();
// Set the error handler
builder.setErrorHandler(new org.xml.sax.ErrorHandler() {
public void fatalError(SAXParseException spex)
throws SAXException {
// output error and exit
spex.printStackTrace();
System.exit(0);
}
public void error(SAXParseException spex)
throws SAXParseException {
// output error and continue
spex.printStackTrace();
}
public void warning(SAXParseException spex)
throws SAXParseException {
// output warning and continue
spex.printStackTrace();
}
});
// Read the document
URL url = new URL(ValidateWithExternalDTD.URL);
Document xmlDocument = builder.parse(url.openStream());
DOMSource source = new DOMSource(xmlDocument);
// Use the transformer to validate the document
StreamResult result = new StreamResult(System.out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, ValidateWithExternalDTD.DTD);
transformer.transform(source, result);
// Process your document if everything is OK
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
Another way would be to replace the XML declaration with the XML declaration plus the DTD reference.
Replace this:
<?xml version = "1.0"?>
with this:
<?xml version = "1.0"?><!DOCTYPE ...>
Of course, you would replace the first occurrence only and not try to go through the whole XML document.
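The declaration-replacement idea could be sketched roughly as follows. To keep the example self-contained, the entities are declared in an internal subset instead of being fetched over HTTP; that is an assumption made for illustration, and the class name DoctypeInjector and the DOCTYPE constant are hypothetical. For the HTTP case the inserted text would instead be something like a DOCTYPE with a SYSTEM identifier pointing at the DTD URL, subject to whatever external-access restrictions the parser is configured with.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DoctypeInjector {

    // Hypothetical internal subset declaring the two entities used in the example document.
    static final String DOCTYPE =
            "<!DOCTYPE main [<!ENTITY lsquo \"&#8216;\"><!ENTITY rsquo \"&#8217;\">]>";

    /** Inserts the DOCTYPE right after the XML declaration, or at the start if there is none. */
    static String withDoctype(String xml, String doctype) {
        if (xml.startsWith("<?xml")) {
            int end = xml.indexOf("?>") + 2;
            return xml.substring(0, end) + doctype + xml.substring(end);
        }
        return doctype + xml;
    }

    /** Parses the document after injecting the DOCTYPE so the entity references resolve. */
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(withDoctype(xml, DOCTYPE))));
    }
}
```

Only the first declaration is touched, and the rest of the document is passed through unchanged.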
You have to instantiate the SAXBuilder by passing true (validate) to its constructor:
SAXBuilder builder = new SAXBuilder(true);
or call:
builder.setValidation(true);
I am looking for example Java code that can construct an XML document that uses namespaces. I cannot seem to find anything using my normal favourite tool so was hoping someone may be able to help me out.
There are a number of ways of doing this. Just a couple of examples:
Using XOM
import nu.xom.Document;
import nu.xom.Element;
public class XomTest {
public static void main(String[] args) {
XomTest xomTest = new XomTest();
xomTest.testXmlDocumentWithNamespaces();
}
private void testXmlDocumentWithNamespaces() {
Element root = new Element("my:example", "urn:example.namespace");
Document document = new Document(root);
Element element = new Element("element", "http://another.namespace");
root.appendChild(element);
System.out.print(document.toXML());
}
}
Using Java Implementation of W3C DOM
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMImplementation;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
public class DomTest {
private static DocumentBuilderFactory dbf = DocumentBuilderFactory
.newInstance();
public static void main(String[] args) throws Exception {
DomTest domTest = new DomTest();
domTest.testXmlDocumentWithNamespaces();
}
public void testXmlDocumentWithNamespaces() throws Exception {
DocumentBuilder db = dbf.newDocumentBuilder();
DOMImplementation domImpl = db.getDOMImplementation();
Document document = buildExampleDocumentWithNamespaces(domImpl);
serialize(domImpl, document);
}
private Document buildExampleDocumentWithNamespaces(
DOMImplementation domImpl) {
Document document = domImpl.createDocument("urn:example.namespace",
"my:example", null);
Element element = document.createElementNS("http://another.namespace",
"element");
document.getDocumentElement().appendChild(element);
return document;
}
private void serialize(DOMImplementation domImpl, Document document) {
DOMImplementationLS ls = (DOMImplementationLS) domImpl;
LSSerializer lss = ls.createLSSerializer();
LSOutput lso = ls.createLSOutput();
lso.setByteStream(System.out);
lss.write(document, lso);
}
}
I am not sure what you are trying to do, but I use JDOM for most of my XML issues and it supports namespaces (of course).
The code:
Document doc = new Document();
Namespace sNS = Namespace.getNamespace("someNS", "someNamespace");
Element element = new Element("SomeElement", sNS);
element.setAttribute("someKey", "someValue", Namespace.getNamespace("someONS", "someOtherNamespace"));
Element element2 = new Element("SomeElement", Namespace.getNamespace("someNS", "someNamespace"));
element2.setAttribute("someKey", "someValue", sNS);
element.addContent(element2);
doc.addContent(element);
produces the following xml:
<?xml version="1.0" encoding="UTF-8"?>
<someNS:SomeElement xmlns:someNS="someNamespace" xmlns:someONS="someOtherNamespace" someONS:someKey="someValue">
<someNS:SomeElement someNS:someKey="someValue" />
</someNS:SomeElement>
Which should contain everything you need. Hope that helps.
I have a complete XML document in a string and would like a Document object. Google turns up all sorts of garbage. What is the simplest solution? (In Java 1.5)
Solution Thanks to Matt McMinn, I have settled on this implementation. It has the right level of input flexibility and exception granularity for me. (It's good to know if the error came from malformed XML - SAXException - or just bad IO - IOException.)
public static org.w3c.dom.Document loadXMLFrom(String xml)
throws org.xml.sax.SAXException, java.io.IOException {
return loadXMLFrom(new java.io.ByteArrayInputStream(xml.getBytes()));
}
public static org.w3c.dom.Document loadXMLFrom(java.io.InputStream is)
throws org.xml.sax.SAXException, java.io.IOException {
javax.xml.parsers.DocumentBuilderFactory factory =
javax.xml.parsers.DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
javax.xml.parsers.DocumentBuilder builder = null;
try {
builder = factory.newDocumentBuilder();
}
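catch (javax.xml.parsers.ParserConfigurationException ex) {
// Fail loudly instead of continuing with a null builder
throw new IllegalStateException("Could not create a DocumentBuilder", ex);
}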
org.w3c.dom.Document doc = builder.parse(is);
is.close();
return doc;
}
Whoa there!
There's a potentially serious problem with this code, because it ignores the character encoding specified in the String (which is UTF-8 by default). When you call String.getBytes() the platform default encoding is used to encode Unicode characters to bytes. So, the parser may think it's getting UTF-8 data when in fact it's getting EBCDIC or something… not pretty!
Instead, use the parse method that takes an InputSource, which can be constructed with a Reader, like this:
import java.io.StringReader;
import org.xml.sax.InputSource;
…
return builder.parse(new InputSource(new StringReader(xml)));
It may not seem like a big deal, but ignorance of character encoding issues leads to insidious code rot akin to y2k.
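To make the difference concrete, here is a small demonstration sketch. The class name EncodingDemo is hypothetical, and it uses java.nio.charset.StandardCharsets, which needs Java 7 or later. The same document is parsed once from a Reader and once from bytes produced with a chosen charset; when the bytes do not match the declared UTF-8 encoding, parsing is expected to fail.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EncodingDemo {

    // Declares UTF-8 and contains one non-ASCII character (e with acute accent).
    static final String XML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a>\u00e9</a>";

    static DocumentBuilder newBuilder() throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder();
    }

    /** Character-based input: the parser never has to decode bytes. */
    static Document fromReader(String xml) throws Exception {
        return newBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Byte-based input: succeeds only if the bytes really are in the declared encoding. */
    static boolean parsesFromBytes(String xml, Charset charset) {
        try {
            newBuilder().parse(new ByteArrayInputStream(xml.getBytes(charset)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }
}
```

Calling xml.getBytes() with no argument behaves like the second method with whatever the platform default charset happens to be, which is why the Reader-based version is the safer choice.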
This works for me in Java 1.5 - I stripped out specific exceptions for readability.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
public Document loadXMLFromString(String xml) throws Exception
{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new ByteArrayInputStream(xml.getBytes()));
}
Just had a similar problem, except I needed a NodeList and not a Document. Here's what I came up with. It's mostly the same solution as before, augmented to get the root element down as a NodeList and using erickson's suggestion of an InputSource to avoid character encoding issues.
private String DOC_ROOT="root";
String xml=getXmlString();
Document xmlDoc=loadXMLFrom(xml);
Element template=xmlDoc.getDocumentElement();
NodeList nodes=xmlDoc.getElementsByTagName(DOC_ROOT);
public static Document loadXMLFrom(String xml) throws Exception {
InputSource is= new InputSource(new StringReader(xml));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = null;
builder = factory.newDocumentBuilder();
Document doc = builder.parse(is);
return doc;
}
To manipulate XML in Java, I always tend to use the Transformer API:
import java.io.StringReader;
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
public static Document loadXMLFrom(String xml) throws TransformerException {
Source source = new StreamSource(new StringReader(xml));
DOMResult result = new DOMResult();
TransformerFactory.newInstance().newTransformer().transform(source , result);
return (Document) result.getNode();
}