Parsing an XML file with a DTD schema on a relative path

Parsing an XML file with a DTD schema on a relative path - java

I have the following java code:
DocumentBuilder db=DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc=db.parse(new File("/opt/myfile"));
And /opt/myfile contains something like:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE archive SYSTEM "../../schema/xml/schema.dtd">
...
I get the following error:
java.io.FileNotFoundException: /../schema/xml/schema.dtd (No such file or directory)
This is a large java framework that consumes an XML file produced elsewhere. I think the relative path is the problem. I don't think it will be acceptable to change the cwd before the JVM starts (the path comes from a config file that is read by the JVM itself) and I have not found a way to change the cwd while the JVM is running. How do I parse this XML file with the appropriate DTD?

You need to use a custom EntityResolver to tweak the path of the DTD so that it can be found. For example:
db.setEntityResolver(new EntityResolver() {
#Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
if (systemId.contains("schema.dtd")) {
return new InputSource(new FileReader("/path/to/schema.dtd"));
} else {
return null;
}
}
});
If schema.dtd is on your classpath, you can just use getResourceAsStream to load it, without specifying the full path:
return new InputSource(Foo.class.getResourceAsStream("schema.dtd"));

You can also ignore the DTD altogether:
http://marcels-javanotes.blogspot.com/2005/11/parsing-xml-file-without-having-access.html

Below code work for me, It ignore DTD
Imports:
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
Code :
File fileName = new File("XML File Path");
DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
EntityResolver resolver = new EntityResolver () {
public InputSource resolveEntity (String publicId, String systemId) {
String empty = "";
ByteArrayInputStream bais = new ByteArrayInputStream(empty.getBytes());
System.out.println("resolveEntity:" + publicId + "|" + systemId);
return new InputSource(bais);
}
};
documentBuilder.setEntityResolver(resolver);
Document document = documentBuilder.parse(fileName);

I used the custom EntityResolver like the example above but it still searched the DTD file in another base directory. So I debuged it and then found out I need to change user.dir system property. So I added this line to my application initialization method and it works now.
System.setProperty("user.dir")

Related

Java using Saxon (s9api) to transform XML: How to add input files in resources?

I use saxon (HE 9.9.1-6) to transform a XML to a HTML file. Saxon is used because the XSLT is version 2 and the default java classes failed.
The XSLT contains two statements to copy in the content of other files:
<xsl:value-of select="unparsed-text('file.ext')"/>
This works fine as long as the Xslt and those files are in the same directory and the xslt is given as a file source
Source xslt = new StreamSource(new File("c:/somedir/file.xsl"));
But my xslt is inside a resource directory (later on it's supposed to be packed in a jar). If I use it in that context, saxon fails to find the included files becuase it looks in the root directory of my project:
Source xslt = new StreamSource(getClass().getClassLoader().getResourceAsStream("file.xsl"));
results in:
Error evaluating (fn:unparsed-text(...)) in xsl:value-of/#select on line 22 column 66
FOUT1170: Failed to read input file: <project root directory>\included_file.css (File not found)
Is there any way that I could supply saxon with additional StreamSources for the files it needs to include? I was unable to find anything.
Ideally, I'd like something like this:
transformer.addInput(new StreamSource(getClass().getClassLoader().getResourceAsStream("inputfile.css")));
The only solution I've came up with was pretty ugly: Copy the xslt and the files it needs from the resources to a temporary directory and then do the conversion using that as a source.
Example code
I'm not knowledgeable in writing XSLT so I can only offer non-minimal example files.
The xslt and its two reuqired files (css and js) can be found here. The ones you need are the three "xrechnung" ones. Direct links: xrechnung-html.xsl, xrechnung-viewer.css, xrechnung-viewer.js.
Please put them in a resource directory (just in case, in eclipse: make a resources-folder and add it as a source directory in properties->build path).
The xml was generated by the first step of above project using it's own example files, I put it on pastebin here
(originally included directly but got character limit error)
Finally, the Java-Code including the ugly workaround:
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Comparator;
import javax.xml.transform.Source;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import net.sf.saxon.s9api.Processor;
import net.sf.saxon.s9api.SaxonApiException;
import net.sf.saxon.s9api.Serializer;
import net.sf.saxon.s9api.Xslt30Transformer;
import net.sf.saxon.s9api.XsltCompiler;
import net.sf.saxon.s9api.XsltExecutable;
public class SaxonProblem {
public static void main(String[] args) throws IOException, SaxonApiException, SAXException {
Path xml = Paths.get("path/to/the.xml");
//working(xml);
notWorking(xml);
}
public static void working(Path xmlFile) throws IOException, SaxonApiException, SAXException {
Path dir = Files.createTempDirectory("saxon");
System.out.println("Temp dir: " + dir.toString());
Path xsltFile = dir.resolve("xrechnung-html.xsl");
Files.copy(SaxonProblem.class.getClassLoader().getResourceAsStream("xrechnung-html.xsl"),
xsltFile, StandardCopyOption.REPLACE_EXISTING);
Files.copy(SaxonProblem.class.getClassLoader().getResourceAsStream("xrechnung-viewer.css"),
dir.resolve("xrechnung-viewer.css"), StandardCopyOption.REPLACE_EXISTING);
Files.copy(SaxonProblem.class.getClassLoader().getResourceAsStream("xrechnung-viewer.js"),
dir.resolve("xrechnung-viewer.js"), StandardCopyOption.REPLACE_EXISTING);
// for the sake of brevity, the html is made where the xml was
Path html = xmlFile.resolveSibling(xmlFile.getFileName().toString() + ".html");
Source xslt = new StreamSource(xsltFile.toFile());
Source xml = new StreamSource(xmlFile.toFile());
transformXml(xml, xslt, html);
// cleanup
Files.walk(dir).sorted(Comparator.reverseOrder()).map(Path::toFile).forEach(File::delete);
}
public static void notWorking(Path xmlFile) throws SaxonApiException, SAXException, IOException {
// for the sake of brevity, the html is made where the xml was
Path html = xmlFile.resolveSibling(xmlFile.getFileName().toString() + ".html");
Source xslt = new StreamSource(SaxonProblem.class.getClassLoader().getResourceAsStream("xrechnung-html.xsl"));
Source xml = new StreamSource(xmlFile.toFile());
transformXml(xml, xslt, html);
}
public static void transformXml(Source xml, Source xslt, Path output) throws SaxonApiException, SAXException, IOException {
Processor processor = new Processor(false);
XsltCompiler compiler = processor.newXsltCompiler();
XsltExecutable stylesheet = compiler.compile(xslt);
Serializer out = processor.newSerializer(output.toFile());
out.setOutputProperty(Serializer.Property.METHOD, "html");
out.setOutputProperty(Serializer.Property.INDENT, "yes");
Xslt30Transformer transformer = stylesheet.load30();
transformer.transform(xml, out);
}
}
Solution
Thanks to the comment by Martin Honnen and the answer by Michael Kay, I have a solution using UnparsedTextURIResolver. Does feel more like a hack, but it works and is better than my previous workaround:
Processor processor = new Processor(false);
UnparsedTextURIResolver defaultUtur = processor.getUnderlyingConfiguration().getUnparsedTextURIResolver();
processor.getUnderlyingConfiguration().setUnparsedTextURIResolver(new UnparsedTextURIResolver() {
#Override
public Reader resolve(URI arg0, String arg1, Configuration arg2) throws XPathException {
if (arg0.toString().endsWith("myfilename.css")) {
InputStream css = SaxonProblem.class.getClassLoader().getResourceAsStream("myfilename.css");
return new InputStreamReader(css);
}
return defaultUtur.resolve(arg0, arg1, arg2);
}
});
//[...]

Some suggestions:
Use a URI with the classpath: scheme (fairly recent addition and may not be supported on all paths where URIs are used)
Register an UnparsedTextResolver with the configuration; Saxon will delegate the task of finding the resource to this resolver
Supply the name of the containing directory as a parameter to the stylesheet, and use the resolve-uri() function to get the absolute URI

parsing an XML file with DOM error [duplicate]

I have to parse a bunch of XML files in Java that sometimes -- and invalidly -- contain HTML entities such as —, > and so forth. I understand the correct way of dealing with this is to add suitable entity declarations to the XML file before parsing. However, I can't do that as I have no control over those XML files.
Is there some kind of callback I can override that is invoked whenever the Java XML parser encounters such an entity? I haven't been able to find one in the API.
I'd like to use:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
Document doc = parser.parse( stream );
I found that I can override resolveEntity in org.xml.sax.helpers.DefaultHandler, but how do I use this with the higher-level API?
Here's a full example:
public class Main {
public static void main( String [] args ) throws Exception {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
Document doc = parser.parse( new FileInputStream( "test.xml" ));
}
}
with test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>Some text — invalid!</bar>
</foo>
Produces:
[Fatal Error] :3:20: The entity "nbsp" was referenced, but not declared.
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 3; columnNumber: 20; The entity "nbsp" was referenced, but not declared.
Update: I have been poking around in the JDK source code with a debugger, and boy, what an amount of spaghetti. I have no idea what the design is there, or whether there is one. Just how many layers of an onion can one layer on top of each other?
They key class seems to be com.sun.org.apache.xerces.internal.impl.XMLEntityManager, but I cannot find any code that either lets me add stuff into it before it gets used, or that attempts to resolve entities without going through that class.

I would use a library like Jsoup for this purpose. I tested the following below and it works. I don't know if this helps. It can be located here: http://jsoup.org/download
public static void main(String args[]){
String html = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><foo>" +
"<bar>Some text — invalid!</bar></foo>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
for (Element e : doc.select("bar")) {
System.out.println(e);
}
}
Result:
<bar>
Some text — invalid!
</bar>
Loading from a file can be found here:
http://jsoup.org/cookbook/input/load-document-from-file

Issue - 1: I have to parse a bunch of XML files in Java that sometimes -- and
invalidly -- contain HTML entities such as —
XML has only five predefined entities. The —, is not among them. It works only when used in plain HTML or in legacy JSP. So, SAX will not help. It can be done using StaX which has high level iterator based API. (Collected from this link)
Issue - 2: I found that I can override resolveEntity in
org.xml.sax.helpers.DefaultHandler, but how do I use this with the
higher-level API?
Streaming API for XML, called StaX, is an API for reading and writing XML Documents.
StaX is a Pull-Parsing model. Application can take the control over parsing the XML documents by pulling (taking) the events from the parser.
The core StaX API falls into two categories and they are listed below. They are
Cursor based API: It is low-level API. cursor-based API allows the application to process XML as a stream of tokens aka events
Iterator based API: The higher-level iterator-based API allows the application to process XML as a series of event objects, each of which communicates a piece of the XML structure to the application.
STaX API has support for the notion of not replacing character entity references, by way of the IS_REPLACING_ENTITY_REFERENCES property:
Requires the parser to replace internal entity references with their
replacement text and report them as characters
This can be set into an XmlInputFactory, which is then in turn used to construct an XmlEventReader or XmlStreamReader.
However, the API is careful to say that this property is only intended to force the implementation to perform the replacement, rather than forcing it to notreplace them.
You may try it. Hope it will solve your issue. For your case,
Main.java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;
public class Main {
public static void main(String[] args) {
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(
XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
XMLEventReader reader;
try {
reader = inputFactory
.createXMLEventReader(new FileInputStream("F://test.xml"));
while (reader.hasNext()) {
XMLEvent event = reader.nextEvent();
if (event.isEntityReference()) {
EntityReference ref = (EntityReference) event;
System.out.println("Entity Reference: " + ref.getName());
}
}
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (XMLStreamException e) {
e.printStackTrace();
}
}
}
test.xml:
<?xml version="1.0" encoding="UTF-8"?>
<foo>
<bar>Some text — invalid!</bar>
</foo>
Output:
Entity Reference: nbsp
Entity Reference: mdash
Credit goes to #skaffman.
Related Link:
http://www.journaldev.com/1191/how-to-read-xml-file-in-java-using-java-stax-api
http://www.journaldev.com/1226/java-stax-cursor-based-api-read-xml-example
http://www.vogella.com/tutorials/JavaXML/article.html
Is there a Java XML API that can parse a document without resolving character entities?
UPDATE:
Issue - 3: Is there a way to use StaX to "filter" the entities (replacing them
with something else, for example) and still produce a Document at the
end of the process?
To create a new document using the StAX API, it is required to create an XMLStreamWriter that provides methods to produce XML opening and closing tags, attributes and character content.
There are 5 methods of XMLStreamWriter for document.
xmlsw.writeStartDocument(); - initialises an empty document to which
elements can be added
xmlsw.writeStartElement(String s) -creates a new element named s
xmlsw.writeAttribute(String name, String value)- adds the attribute
name with the corresponding value to the last element produced by a
call to writeStartElement. It is possible to add attributes as long
as no call to writeElementStart,writeCharacters or writeEndElement
has been done.
xmlsw.writeEndElement - close the last started element
xmlsw.writeCharacters(String s) - creates a new text node with
content s as content of the last started element.
A sample example is attached with it:
StAXExpand.java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;
import java.util.Arrays;
public class StAXExpand {
static XMLStreamWriter xmlsw = null;
public static void main(String[] argv) {
try {
xmlsw = XMLOutputFactory.newInstance()
.createXMLStreamWriter(System.out);
CompactTokenizer tok = new CompactTokenizer(
new FileReader(argv[0]));
String rootName = "dummyRoot";
// ignore everything preceding the word before the first "["
while(!tok.nextToken().equals("[")){
rootName=tok.getToken();
}
// start creating new document
xmlsw.writeStartDocument();
ignorableSpacing(0);
xmlsw.writeStartElement(rootName);
expand(tok,3);
ignorableSpacing(0);
xmlsw.writeEndDocument();
xmlsw.flush();
xmlsw.close();
} catch (XMLStreamException e){
System.out.println(e.getMessage());
} catch (IOException ex) {
System.out.println("IOException"+ex);
ex.printStackTrace();
}
}
public static void expand(CompactTokenizer tok, int indent)
throws IOException,XMLStreamException {
tok.skip("[");
while(tok.getToken().equals("#")) {// add attributes
String attName = tok.nextToken();
tok.nextToken();
xmlsw.writeAttribute(attName,tok.skip("["));
tok.nextToken();
tok.skip("]");
}
boolean lastWasElement=true; // for controlling the output of newlines
while(!tok.getToken().equals("]")){ // process content
String s = tok.getToken().trim();
tok.nextToken();
if(tok.getToken().equals("[")){
if(lastWasElement)ignorableSpacing(indent);
xmlsw.writeStartElement(s);
expand(tok,indent+3);
lastWasElement=true;
} else {
xmlsw.writeCharacters(s);
lastWasElement=false;
}
}
tok.skip("]");
if(lastWasElement)ignorableSpacing(indent-3);
xmlsw.writeEndElement();
}
private static char[] blanks = "\n".toCharArray();
private static void ignorableSpacing(int nb)
throws XMLStreamException {
if(nb>blanks.length){// extend the length of space array
blanks = new char[nb+1];
blanks[0]='\n';
Arrays.fill(blanks,1,blanks.length,' ');
}
xmlsw.writeCharacters(blanks, 0, nb+1);
}
}
CompactTokenizer.java
import java.io.Reader;
import java.io.IOException;
import java.io.StreamTokenizer;
public class CompactTokenizer {
private StreamTokenizer st;
CompactTokenizer(Reader r){
st = new StreamTokenizer(r);
st.resetSyntax(); // remove parsing of numbers...
st.wordChars('\u0000','\u00FF'); // everything is part of a word
// except the following...
st.ordinaryChar('\n');
st.ordinaryChar('[');
st.ordinaryChar(']');
st.ordinaryChar('#');
}
public String nextToken() throws IOException{
st.nextToken();
while(st.ttype=='\n'||
(st.ttype==StreamTokenizer.TT_WORD &&
st.sval.trim().length()==0))
st.nextToken();
return getToken();
}
public String getToken(){
return (st.ttype == StreamTokenizer.TT_WORD) ? st.sval : (""+(char)st.ttype);
}
public String skip(String sym) throws IOException {
if(getToken().equals(sym))
return nextToken();
else
throw new IllegalArgumentException("skip: "+sym+" expected but"+
sym +" found ");
}
}
For more, you can follow the tutorial
https://docs.oracle.com/javase/tutorial/jaxp/stax/example.html
http://www.ibm.com/developerworks/library/x-tipstx2/index.html
http://www.iro.umontreal.ca/~lapalme/ForestInsteadOfTheTrees/HTML/ch09s03.html
http://staf.sourceforge.net/current/STAXDoc.pdf

Another approach, since you're not using a rigid OXM approach anyway.
You might want to try using a less rigid parser such as JSoup?
This will stop immediate problems with invalid XML schemas etc, but it will just devolve the problem into your code.

Just to throw in a different approach to a solution:
You might envelope your input stream with a stream inplementation that replaces the entities by something legal.
While this is a hack for sure, it should be a quick and easy solution (or better say: workaround).
Not as elegant and clean as a xml framework internal solution, though.

I made yesterday something similar i need to add value from unziped XML in stream to database.
//import I'm not sure if all are necessary :)
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
//I didnt checked this code now because i'm in work for sure its work maybe
you will need to do little changes
InputSource is = new InputSource(new FileInputStream("test.xml"));
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(is);
XPathFactory xpf = XPathFactory.newInstance();
XPath xpath = xpf.newXPath();
String words= xpath.evaluate("/foo/bar", doc.getDocumentElement());
ParsingHexToChar.parseToChar(words);
// lib which i use common-lang3.jar
//metod to parse
public static String parseToChar( String words){
String decode= org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4(words);
return decode;
}

Try this using org.apache.commons package :
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder parser = dbf.newDocumentBuilder();
InputStream in = new FileInputStream(xmlfile);
String unescapeHtml4 = IOUtils.toString(in);
CharSequenceTranslator obj = new AggregateTranslator(new LookupTranslator(EntityArrays.ISO8859_1_UNESCAPE()),
new LookupTranslator(EntityArrays.HTML40_EXTENDED_UNESCAPE())
);
unescapeHtml4 = obj.translate(unescapeHtml4);
StringReader readerInput= new StringReader(unescapeHtml4);
InputSource is = new InputSource(readerInput);
Document doc = parser.parse(is);

Can read files while executing from Netbeans, but not when executing JAR-file

I have a program that needs to get information from a XML-file, and it's working as planned when I'am executing in NetBeans. When I'm on the other hand using the JAR-file and trying to do the same, I'm catching an IOException. Why is that? I'm using a String with the absolute path:
InputStream in = new FileInputStream(strPath);
doc = db.parse(in);
java.io.FileNotFoundException: kmk (Det går inte att hitta filen)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at java.io.FileInputStream.<init>(FileInputStream.java:101)
at xmlDom.XMLTolk.skapaTolk(XMLTolk.java:82)
at xmlDom.XMLTolk.<init>(XMLTolk.java:61)
at xmlDom.Kloak.<init>(Kloak.java:46)
at xmlDom.Kloak.main(Kloak.java:123)
This small program can parse the XML-file, but when I create a JAR-file of it - it doesn't work.
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class XMLTest {
Document d;
public XMLTest() throws ParserConfigurationException, FileNotFoundException, IOException, SAXException {
String docString = "C:\\someXML.xml";
DocumentBuilderFactory dbf;
dbf = DocumentBuilderFactory.newInstance();
dbf.setValidating(true);
dbf.setIgnoringElementContentWhitespace(true);
DocumentBuilder db = dbf.newDocumentBuilder();
InputStream in = new FileInputStream(docString);
d = db.parse(in);
}
public static void main(String[] args) throws ParserConfigurationException, FileNotFoundException, IOException, SAXException {
XMLTest t = new XMLTest();
}
}

Most likely because you're trying to read a file inside the application's classpath, and there's no filesystem file for the item when it's packaged in a jar. Use ClassLoader#getResourceAsStream instead.

When running with "java -jar foo.jar" the current working directory is not changed.
So when you are using relative paths they are found relative to your current working directory as set already, and not relative to the jar file location which is what you intuitively could have been expecting.
For most cases you can ask the JVM where it found the bytes used to define a given class and use that to derive the location of your jar file, which you can then use to locate the kmk file.

Reading XML online and Storing It (Using Java)

I found and followed an example from Stackoverflow (http://stackoverflow.com/questions/2310139/how-to-read-xml-response-from-a-url-in-java) of how to read an XML file from a URL (as you can see in my code pasted below). My only trouble is that now that I got the program to read the XML, how do I get it to store it? For example, could I make it save the information to a XML file built into the project (this would be the best solution for me, if it's possible)? Such as, take for example, I have a blank XML file built into the project. The program runs, reads the XML code off of the URL, and stores it all into the pre-built blank XML file. Could I do this?
If I sound confusing or un-clear about anything, just ask me to clarify what I'm looking for.
And here is my code, if you'd like to look at what I have so far:
package xml.parsing.example;
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
public class XmlParser {
public static void main (String[] args) throws IOException, ParserConfigurationException, SAXException, TransformerException {
URL url = new URL("http://totheriver.com/learn/xml/code/employees.xml");
URLConnection conn = url.openConnection();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(conn.getInputStream());
TransformerFactory tfactory = TransformerFactory.newInstance();
Transformer xform = tfactory.newTransformer();
// that’s the default xform; use a stylesheet to get a real one
xform.transform(new DOMSource(doc), new StreamResult(System.out));
}
}

Very simply:
File myOutput = new File("c:\\myDirectory\\myOutput.xml");
xform.transform(new DOMSource(doc), new StreamResult(myOutput));

This page has some great examples of how to serialize the DOM object to a neatly formatted XML file.

how to disable dtd at runtime in java's xpath?

I got dtd in file and I cant remove it. When i try to parse it in Java I get "Caused by: java.net.SocketException: Network is unreachable: connect", because its remote dtd. can I disable somehow dtd checking?

You should be able to specify your own EntityResolver, or use specific features of your parser? See here for some approaches.
A more complete example:
<?xml version="1.0"?>
<!DOCTYPE foo PUBLIC "//FOO//" "foo.dtd">
<foo>
<bar>Value</bar>
</foo>
And xpath usage:
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Main {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
builder.setEntityResolver(new EntityResolver() {
#Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
System.out.println("Ignoring " + publicId + ", " + systemId);
return new InputSource(new StringReader(""));
}
});
Document document = builder.parse(new File("src/foo.xml"));
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
String content = xpath.evaluate("/foo/bar/text()", document
.getDocumentElement());
System.out.println(content);
}
}
Hope this helps...

This worked for me:
SAXParserFactory saxfac = SAXParserFactory.newInstance();
saxfac.setValidating(false);
try {
saxfac.setFeature("http://xml.org/sax/features/validation", false);
saxfac.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
saxfac.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
saxfac.setFeature("http://xml.org/sax/features/external-general-entities", false);
saxfac.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
}
catch (Exception e1) {
e1.printStackTrace();
}

I had this problem before. I solved it by downloading and storing a local copy of the DTD and then validating against the local copy. You need to edit the XML file to point to the local copy.
<!DOCTYPE root-element SYSTEM "filename">
Little more info here: http://www.w3schools.com/dtd/dtd_intro.asp
I think you can also manually set some sort of validateOnParse property to "false" in your parser. Depends on what library you're using to parse the XML.
More info here: http://www.w3schools.com/dtd/dtd_validation.asp

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Parsing an XML file with a DTD schema on a relative path - java

You can also ignore the DTD altogether: http://marcels-javanotes.blogspot.com/2005/11/parsing-xml-file-without-having-access.html

Related

Java using Saxon (s9api) to transform XML: How to add input files in resources?

parsing an XML file with DOM error [duplicate]

Can read files while executing from Netbeans, but not when executing JAR-file

Reading XML online and Storing It (Using Java)

how to disable dtd at runtime in java's xpath?

Categories

Resources