I got dtd in file and I cant remove it. When i try to parse it in Java I get "Caused by: java.net.SocketException: Network is unreachable: connect", because its remote dtd. can I disable somehow dtd checking?
You should be able to specify your own EntityResolver, or use specific features of your parser? See here for some approaches.
A more complete example:
<?xml version="1.0"?>
<!DOCTYPE foo PUBLIC "//FOO//" "foo.dtd">
<foo>
<bar>Value</bar>
</foo>
And xpath usage:
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Main {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
builder.setEntityResolver(new EntityResolver() {
#Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
System.out.println("Ignoring " + publicId + ", " + systemId);
return new InputSource(new StringReader(""));
}
});
Document document = builder.parse(new File("src/foo.xml"));
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
String content = xpath.evaluate("/foo/bar/text()", document
.getDocumentElement());
System.out.println(content);
}
}
Hope this helps...
This worked for me:
SAXParserFactory saxfac = SAXParserFactory.newInstance();
saxfac.setValidating(false);
try {
saxfac.setFeature("http://xml.org/sax/features/validation", false);
saxfac.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
saxfac.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
saxfac.setFeature("http://xml.org/sax/features/external-general-entities", false);
saxfac.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
}
catch (Exception e1) {
e1.printStackTrace();
}
I had this problem before. I solved it by downloading and storing a local copy of the DTD and then validating against the local copy. You need to edit the XML file to point to the local copy.
<!DOCTYPE root-element SYSTEM "filename">
Little more info here: http://www.w3schools.com/dtd/dtd_intro.asp
I think you can also manually set some sort of validateOnParse property to "false" in your parser. Depends on what library you're using to parse the XML.
More info here: http://www.w3schools.com/dtd/dtd_validation.asp
Related
I have a situation where I'd like to start using an XML Schema to validate documents that, until now, have never had a schema definition. As such, the existing documents I'd like to validate do not have any xmlns declaration in them.
I have no problem successfully validating a document which does include the xmlns declaration, but I'd also like to be able to validate those documents without such a declaration. I was hoping for something like this:
DocumentBuilderFactory dbf = ...;
dbf.setSchema(... my schema for namespace "foo:bar"...);
dbf.setValidating(false);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
db.setDefaultNamespace("foo:bar");
Document doc = db.parse(input);
There is no such method DocumentBuilder.setDefaultNamespace and so the schema validation is not performed when loading documents of this type.
Is there any way to force the namespace for a document if one is not set? Or does that require essentially parsing the XML without regard to schema, checking for an existing namespace, adjusting it, then re-validating the document with the schema?
I'm currently expecting the parser to perform validation during parsing, but I have no problem parsing first and then validating afterward.
UPDATE 2021-01-13
Here is a concrete example of what I'm trying to do, as a JUnit test case.
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.junit.Assert;
import org.junit.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class XMLSchemaTest
{
private static final String XMLNS = "http://www.example.com/schema";
private static final String schemaDocument = "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" targetNamespace=\"" + XMLNS + "\" xmlns:e=\"" + XMLNS + "\" elementFormDefault=\"qualified\"><xs:element name=\"example\" type=\"e:exampleType\" /><xs:complexType name=\"exampleType\"><xs:sequence><xs:element name=\"test\" type=\"e:testType\" /></xs:sequence></xs:complexType><xs:complexType name=\"testType\" /></xs:schema>";
private static Document parse(String document) throws SAXException, ParserConfigurationException, IOException {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
SchemaFactory sf = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Source[] sources = new Source[] {
new StreamSource(new StringReader(schemaDocument))
};
Schema schema = sf.newSchema(sources);
dbf.setSchema(schema);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
db.setErrorHandler(new MyErrorHandler());
return db.parse(new InputSource(new StringReader(document)));
}
#Test
public void testConformingDocumentWithSchema() throws Exception {
String testDocument = "<example xmlns=\"" + XMLNS + "\"><test/></example>";
Document doc = parse(testDocument);
//Assert.assertEquals("Wrong document XML namespace", XMLNS, doc.getNamespaceURI());
Element root = doc.getDocumentElement();
Assert.assertEquals("Wrong root element XML namespace", XMLNS, root.getNamespaceURI());
Assert.assertEquals("Wrong element name", "example", root.getLocalName());
Assert.assertEquals("Wrong element name", "example", root.getTagName());
}
#Test
public void testConformingDocumentWithoutSchema() throws Exception {
String testDocument = "<example><test/></example>";
Document doc = parse(testDocument);
//Assert.assertEquals("Wrong document XML namespace", XMLNS, doc.getNamespaceURI());
Element root = doc.getDocumentElement();
Assert.assertEquals("Wrong root element XML namespace", XMLNS, root.getNamespaceURI());
Assert.assertEquals("Wrong element name", "example", root.getLocalName());
Assert.assertEquals("Wrong element name", "example", root.getTagName());
}
#Test
public void testNononformingDocumentWithSchema() throws Exception {
String testDocument = "<example xmlns=\"" + XMLNS + "\"><random/></example>";
try {
parse(testDocument);
Assert.fail("Document should not have parsed properly");
} catch (Exception e) {
System.out.println(e);
// Expected
}
}
#Test
public void testNononformingDocumentWithoutSchema() throws Exception {
String testDocument = "<example><random/></example>";
try {
parse(testDocument);
Assert.fail("Document should not have parsed properly");
} catch (Exception e) {
System.out.println(e);
// Expected
}
}
public static class MyErrorHandler implements ErrorHandler {
#Override
public void warning(SAXParseException exception) throws SAXException {
System.err.println("WARNING: " + exception);
}
#Override
public void error(SAXParseException exception) throws SAXException {
throw exception;
}
#Override
public void fatalError(SAXParseException exception) throws SAXException {
System.err.println("FATAL: " + exception);
}
}
}
All of the tests pass except for testConformingDocumentWithoutSchema. I think this is kind of expected, as the document declares no namespace.
I'm asking how the test can e changed (but not the document itself!) so that I can validate the document against a schema that was not actually declared by the document.
I pounded on this for a while, and I was able to come up with a hack that works. It may be possible to do this more elegantly (which was my original question), and it also may be possible to do this with less code, but this was what I was able to come up with.
If you look at the JUnit test case in the question, changing the "parse" method to the following (and adding XMLNS as the second argument to all calls to parse) will allow all tests to complete:
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
...
private static Document parse(String document, String namespace) throws SAXException, ParserConfigurationException, IOException {
SchemaFactory sf = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
Source[] sources = new Source[] {
new StreamSource(new StringReader(schemaDocument))
};
Schema schema = sf.newSchema(sources);
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setSchema(schema);
dbf.setNamespaceAware(true);
DocumentBuilder db = dbf.newDocumentBuilder();
ErrorHandler errorHandler = new MyErrorHandler();
db.setErrorHandler(errorHandler);
try {
return db.parse(new InputSource(new StringReader(document)));
} catch (SAXParseException spe) {
// Just in case this was a problem with a missing namespace
// System.out.println("Possibly recovering from SPE " + spe);
// New DocumentBuilder without the schema
dbf.setSchema(null);
db = dbf.newDocumentBuilder();
db.setErrorHandler(errorHandler);
Document doc = db.parse(new InputSource(new StringReader(document)));
if(null != doc.getDocumentElement().getNamespaceURI()) {
// Namespace URI was set; this is a fatal error
throw spe;
}
// Override the namespace on the Document + root element
doc.getDocumentElement().setAttribute("xmlns", namespace);
// Serialize the document -> String to start over again
DOMImplementationLS domImplementation = (DOMImplementationLS) doc.getImplementation();
LSSerializer lsSerializer = domImplementation.createLSSerializer();
LSOutput lsOutput = domImplementation.createLSOutput();
lsOutput.setEncoding("UTF-8");
StringWriter out = new StringWriter();
lsOutput.setCharacterStream(out);
lsSerializer.write(doc, lsOutput);
String converted = out.toString();
// Re-enable the schema
dbf.setSchema(schema);
db = dbf.newDocumentBuilder();
db.setErrorHandler(errorHandler);
return db.parse(new InputSource(new StringReader(converted)));
}
}
This works by catching SAXParseException and, because SAXParseException doesn't give up any of its details, assuming that the problem might be due to a missing XML namespace declaration. I then re-parse the document without the schema validation, add a namespace declaration to the in-memory Document, then serialize the Document to String and re-parse the document with the schema validation re-enabled.
I tried to do this just by setting the XML namespace and then using Schema.newValidator().validate(new DOMSource(doc)), but this failed validation every time for me. Running through the serializer got around that problem.
I want to manipulate xml doc having default namespace but no prefix. Is there a way to use xpath without namespace uri just as if there is no namespace?
I believe it should be possible if we set namespaceAware property of documentBuilderFactory to false. But in my case it is not working.
Is my understanding is incorrect or I am doing some mistake in code?
Here is my code:
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(false);
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nl = (NodeList) xPath.evaluate("//author", dDoc, XPathConstants.NODESET);
System.out.println(nl.getLength());
} catch (Exception e) {
e.printStackTrace();
}
Here is my xml:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="http://www.mydomain.com/schema">
<author>
<book title="t1"/>
<book title="t2"/>
</author>
</root>
The XPath processing for a document that uses the default namespace (no prefix) is the same as the XPath processing for a document that uses prefixes:
For namespace qualified documents you can use a NamespaceContext when you execute the XPath. You will need to prefix the fragments in the XPath to match the NamespaceContext. The prefixes you use do not need to match the prefixes used in the document.
http://download.oracle.com/javase/6/docs/api/javax/xml/namespace/NamespaceContext.html
Here is how it looks with your code:
import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class Demo {
public static void main(String[] args) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new MyNamespaceContext());
NodeList nl = (NodeList) xPath.evaluate("/ns:root/ns:author", dDoc, XPathConstants.NODESET);
System.out.println(nl.getLength());
} catch (Exception e) {
e.printStackTrace();
}
}
private static class MyNamespaceContext implements NamespaceContext {
public String getNamespaceURI(String prefix) {
if("ns".equals(prefix)) {
return "http://www.mydomain.com/schema";
}
return null;
}
public String getPrefix(String namespaceURI) {
return null;
}
public Iterator getPrefixes(String namespaceURI) {
return null;
}
}
}
Note:
I also used the corrected XPath suggested by Dennis.
The following also appears to work, and is closer to your original question:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class Demo {
public static void main(String[] args) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nl = (NodeList) xPath.evaluate("/root/author", dDoc, XPathConstants.NODESET);
System.out.println(nl.getLength());
} catch (Exception e) {
e.printStackTrace();
}
}
}
Blaise Doughan is right, attached code is correct.
Problem was somewhere elese. I was running all my tests through Application launcher in Eclipse IDE and nothing was working. Then I discovered Eclipse project was cause of all grief. I ran my class from command prompt, it worked. Created a new eclipse project and pasted same code there, it worked there too.
Thank you all guys for your time and efforts.
I've written a simple NamespaceContext implementation (here), that might be of help. It takes a Map<String, String> as input, where the key is a prefix, and the value is a namespace.
It follows the NamespaceContext spesification, and you can see how it works in the unit tests.
Map<String, String> mappings = new HashMap<>();
mappings.put("foo", "http://foo");
mappings.put("foo2", "http://foo");
mappings.put("bar", "http://bar");
context = new SimpleNamespaceContext(mappings);
context.getNamespaceURI("foo"); // "http://foo"
context.getPrefix("http://foo"); // "foo" or "foo2"
context.getPrefixes("http://foo"); // ["foo", "foo2"]
Note that it has a dependency on Google Guava
My goal is executing an XQuery using XPath.
My XML file is:
<?xml version="1.0" encoding="UTF-8"?>
<postes>
<poste>
<gouvernourat>Kairouan</gouvernourat>
<ville>Kairouan sud</ville>
<cp>3100</cp>
</poste>
<poste>
<gouvernourat>Tunis</gouvernourat>
<ville>Ghazela</ville>
<cp>1002</cp>
</poste>
</postes>
My Java code is:
package xmlparse;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class QueryXML {
public void query() throws ParserConfigurationException, SAXException,
IOException, XPathExpressionException {
// Standard of reading a XML file
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder;
Document doc = null;
XPathExpression expr = null;
builder = factory.newDocumentBuilder();
doc = builder.parse("a.xml"); //C:\\Users\\aymen\\Desktop\\
// Create a XPathFactory
XPathFactory xFactory = XPathFactory.newInstance();
// Create a XPath object
XPath xpath = xFactory.newXPath();
// Compile the XPath expression
expr = xpath.compile("/postes/poste[gouvernourat='Tunis']/ville/text()");
// Run the query and get a nodeset
Object result = expr.evaluate(doc, XPathConstants.NODESET);
// Cast the result to a DOM NodeList
NodeList nodes = (NodeList) result;
for (int i=0; i<nodes.getLength();i++){
System.out.println(nodes.item(i).getNodeValue());
}
}
public static void main(String[] args) throws XPathExpressionException, ParserConfigurationException, SAXException, IOException {
QueryXML process = new QueryXML();
process.query();
}
}
When I launch this Java code the result is displayed on the console correctly (System.out.println).
But if I copy this code to my Android application and change System.out.println(nodes.item(i).getNodeValue()); to Text2.setText(nodes.item(i).getNodeValue()); (I have a TextView named Text2)
When I execute the code and I click the button the TextView stays empty (No error for Force Close)
Thank you in advance
Attribute names needs to start with '#' while using XPath in Android.
So change
[gouvernourat='Tunis']
To
[#gouvernourat='Tunis']
Refer http://developer.android.com/reference/javax/xml/xpath/package-summary.html for details.
I have the following java code:
DocumentBuilder db=DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc=db.parse(new File("/opt/myfile"));
And /opt/myfile contains something like:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE archive SYSTEM "../../schema/xml/schema.dtd">
...
I get the following error:
java.io.FileNotFoundException: /../schema/xml/schema.dtd (No such file or directory)
This is a large java framework that consumes an XML file produced elsewhere. I think the relative path is the problem. I don't think it will be acceptable to change the cwd before the JVM starts (the path comes from a config file that is read by the JVM itself) and I have not found a way to change the cwd while the JVM is running. How do I parse this XML file with the appropriate DTD?
You need to use a custom EntityResolver to tweak the path of the DTD so that it can be found. For example:
db.setEntityResolver(new EntityResolver() {
#Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
if (systemId.contains("schema.dtd")) {
return new InputSource(new FileReader("/path/to/schema.dtd"));
} else {
return null;
}
}
});
If schema.dtd is on your classpath, you can just use getResourceAsStream to load it, without specifying the full path:
return new InputSource(Foo.class.getResourceAsStream("schema.dtd"));
You can also ignore the DTD altogether:
http://marcels-javanotes.blogspot.com/2005/11/parsing-xml-file-without-having-access.html
Below code work for me, It ignore DTD
Imports:
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
Code :
File fileName = new File("XML File Path");
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
EntityResolver resolver = new EntityResolver () {
public InputSource resolveEntity (String publicId, String systemId) {
String empty = "";
ByteArrayInputStream bais = new ByteArrayInputStream(empty.getBytes());
System.out.println("resolveEntity:" + publicId + "|" + systemId);
return new InputSource(bais);
}
};
documentBuilder.setEntityResolver(resolver);
Document document = documentBuilder.parse(fileName);
I used the custom EntityResolver like the example above but it still searched the DTD file in another base directory. So I debuged it and then found out I need to change user.dir system property. So I added this line to my application initialization method and it works now.
System.setProperty("user.dir")
In my XML file I have some entities such as ’
So I have created a DTD tag for my XML document to define these entities. Below is the Java code used to read the XML file.
SAXBuilder builder = new SAXBuilder();
URL url = new URL("http://127.0.0.1:8080/sample/subject.xml");
InputStream stream = url.openStream();
org.jdom.Document document = builder.build(stream);
Element root = document.getRootElement();
Element name = root.getChild("name");
result = name.getText();
System.err.println(result);
How can I change the Java code to retrieve a DTD over HTTP to allow the parsing of my XML document to be error free?
Simplified example of the xml document.
<main>
<name>hello ‘ world ’ foo & bar </name>
</main>
One way to do this would be to read the document and then validate it with the transformer:
import java.net.URL;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
public class ValidateWithExternalDTD {
private static final String URL = "http://127.0.0.1:8080/sample/subject.xml";
private static final String DTD = "http://127.0.0.1/YourDTD.dtd";
public static void main(String args[]) {
try {
DocumentBuilderFactory factory= DocumentBuilderFactory.newInstance();
factory.setValidating(true);
DocumentBuilder builder = factory.newDocumentBuilder();
// Set the error handler
builder.setErrorHandler(new org.xml.sax.ErrorHandler() {
public void fatalError(SAXParseException spex)
throws SAXException {
// output error and exit
spex.printStackTrace();
System.exit(0);
}
public void error(SAXParseException spex)
throws SAXParseException {
// output error and continue
spex.printStackTrace();
}
public void warning(SAXParseException spex)
throws SAXParseException {
// output warning and continue
spex.printStackTrace();
}
});
// Read the document
URL url = new URL(ValidateWithExternalDTD.URL);
Document xmlDocument = builder.parse(url.openStream());
DOMSource source = new DOMSource(xmlDocument);
// Use the tranformer to validate the document
StreamResult result = new StreamResult(System.out);
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
transformer.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM, ValidateWithExternalDTD.DTD);
transformer.transform(source, result);
// Process your document if everything is OK
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
Another way would be to replace the XML title with the XML title plus the DTD reference
Replace this:
<?xml version = "1.0"?>
with this:
<?xml version = "1.0"?><!DOCTYPE ...>
Of course you would replace the first occurance only and not try to go through the whole xml document
You have to instantiate the SAXBuilder by passing true(validate) to its constructor:
SAXBuilder builder = new SAXBuilder(true);
or call:
builder.setValidation(true)