I currently have the following XML file.
http://www.cse.unsw.edu.au/~cs9321/14s1/assignments/musicDb.xml
My XMLParser.java class.
package edu.unsw.comp9321.assignment1;
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
public class XMLParser {
public void search () {
try{
File fXmlFile = new File("/COMP9321Assignment1/xml/musicDb.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
} catch (Exception e) {
e.printStackTrace();
}
}
}
I create an object in another class and call the search but I keep receiving the above stated error.
Would anyone know what the problem might be?
Thanks for your help.
Normally "org.xml.sax.SAXParseException: Premature end file" occurrs due to several reasons
1.Check your xml were all the tags are closed properly at same level
2.check if any name space issues.
3.check for welformness of your xml document
Related
I want to manipulate xml doc having default namespace but no prefix. Is there a way to use xpath without namespace uri just as if there is no namespace?
I believe it should be possible if we set namespaceAware property of documentBuilderFactory to false. But in my case it is not working.
Is my understanding is incorrect or I am doing some mistake in code?
Here is my code:
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(false);
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nl = (NodeList) xPath.evaluate("//author", dDoc, XPathConstants.NODESET);
System.out.println(nl.getLength());
} catch (Exception e) {
e.printStackTrace();
}
Here is my xml:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns="http://www.mydomain.com/schema">
<author>
<book title="t1"/>
<book title="t2"/>
</author>
</root>
The XPath processing for a document that uses the default namespace (no prefix) is the same as the XPath processing for a document that uses prefixes:
For namespace qualified documents you can use a NamespaceContext when you execute the XPath. You will need to prefix the fragments in the XPath to match the NamespaceContext. The prefixes you use do not need to match the prefixes used in the document.
http://download.oracle.com/javase/6/docs/api/javax/xml/namespace/NamespaceContext.html
Here is how it looks with your code:
import java.util.Iterator;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class Demo {
public static void main(String[] args) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
xPath.setNamespaceContext(new MyNamespaceContext());
NodeList nl = (NodeList) xPath.evaluate("/ns:root/ns:author", dDoc, XPathConstants.NODESET);
System.out.println(nl.getLength());
} catch (Exception e) {
e.printStackTrace();
}
}
private static class MyNamespaceContext implements NamespaceContext {
public String getNamespaceURI(String prefix) {
if("ns".equals(prefix)) {
return "http://www.mydomain.com/schema";
}
return null;
}
public String getPrefix(String namespaceURI) {
return null;
}
public Iterator getPrefixes(String namespaceURI) {
return null;
}
}
}
Note:
I also used the corrected XPath suggested by Dennis.
The following also appears to work, and is closer to your original question:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class Demo {
public static void main(String[] args) {
DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
try {
DocumentBuilder builder = domFactory.newDocumentBuilder();
Document dDoc = builder.parse("E:/test.xml");
XPath xPath = XPathFactory.newInstance().newXPath();
NodeList nl = (NodeList) xPath.evaluate("/root/author", dDoc, XPathConstants.NODESET);
System.out.println(nl.getLength());
} catch (Exception e) {
e.printStackTrace();
}
}
}
Blaise Doughan is right, attached code is correct.
Problem was somewhere elese. I was running all my tests through Application launcher in Eclipse IDE and nothing was working. Then I discovered Eclipse project was cause of all grief. I ran my class from command prompt, it worked. Created a new eclipse project and pasted same code there, it worked there too.
Thank you all guys for your time and efforts.
I've written a simple NamespaceContext implementation (here), that might be of help. It takes a Map<String, String> as input, where the key is a prefix, and the value is a namespace.
It follows the NamespaceContext spesification, and you can see how it works in the unit tests.
Map<String, String> mappings = new HashMap<>();
mappings.put("foo", "http://foo");
mappings.put("foo2", "http://foo");
mappings.put("bar", "http://bar");
context = new SimpleNamespaceContext(mappings);
context.getNamespaceURI("foo"); // "http://foo"
context.getPrefix("http://foo"); // "foo" or "foo2"
context.getPrefixes("http://foo"); // ["foo", "foo2"]
Note that it has a dependency on Google Guava
I try to do Y! Weather XML parsing for the sake of learning using BlueJ. When I compiled the class, BlueJ giving me error: package org.w3c does not exist
Here is the code:
import java.net.URL;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.*;
public class WeatherApp
{
public static void main(String[] args)
{
URL xmlURL = new URL("http://weather.yahooapis.com/forecastrss?w=22722924&u=c");
InputStream xml = xmlURL.openStream();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(xml);
xml.close();
System.out.println(doc);
}
}
What could go possibly wrong here? And how to fix it? Thanks
I'm having a problem correctly calling getAttributeNS() (and other NS methods) from Java DOM. First, here is my sample XML doc:
<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
<book xmlns:c="http://www.w3schools.com/children/" xmlns:foo="http://foo.org/foo" category="CHILDREN">
<title foo:lang="en">Harry Potter</title>
<author>J K. Rowling</author>
<year>2005</year>
<price>29.99</price>
</book>
</bookstore>
And here is my little Java class that uses DOM and calls getAttributeNS:
package com.mycompany.proj;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Element;
import java.io.File;
public class AttributeNSProblem
{
public static void main(String[] args)
{
try
{
File fXmlFile = new File("bookstore_ns.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("title");
Element elem = (Element)nList.item(0);
String lang = elem.getAttributeNS("http://foo.org/foo", "lang");
System.out.println("title lang: " + lang);
lang = elem.getAttribute("foo:lang");
System.out.println("title lang: " + lang);
}
catch (Exception e)
{
e.printStackTrace();
}
}
}
When I call getAttributeNS("http://foo.org/foo", "lang"), it returns an empty String. I've also tried getAttributeNS("foo", "lang"), same result.
What's the proper way to retrieve the value of an attribute qualified by a namespace?
Thanks.
Immediately after DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();, add dbFactory.setNamespaceAware(true);
I got dtd in file and I cant remove it. When i try to parse it in Java I get "Caused by: java.net.SocketException: Network is unreachable: connect", because its remote dtd. can I disable somehow dtd checking?
You should be able to specify your own EntityResolver, or use specific features of your parser? See here for some approaches.
A more complete example:
<?xml version="1.0"?>
<!DOCTYPE foo PUBLIC "//FOO//" "foo.dtd">
<foo>
<bar>Value</bar>
</foo>
And xpath usage:
import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
public class Main {
public static void main(String[] args) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
builder.setEntityResolver(new EntityResolver() {
#Override
public InputSource resolveEntity(String publicId, String systemId)
throws SAXException, IOException {
System.out.println("Ignoring " + publicId + ", " + systemId);
return new InputSource(new StringReader(""));
}
});
Document document = builder.parse(new File("src/foo.xml"));
XPathFactory xpathFactory = XPathFactory.newInstance();
XPath xpath = xpathFactory.newXPath();
String content = xpath.evaluate("/foo/bar/text()", document
.getDocumentElement());
System.out.println(content);
}
}
Hope this helps...
This worked for me:
SAXParserFactory saxfac = SAXParserFactory.newInstance();
saxfac.setValidating(false);
try {
saxfac.setFeature("http://xml.org/sax/features/validation", false);
saxfac.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
saxfac.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
saxfac.setFeature("http://xml.org/sax/features/external-general-entities", false);
saxfac.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
}
catch (Exception e1) {
e1.printStackTrace();
}
I had this problem before. I solved it by downloading and storing a local copy of the DTD and then validating against the local copy. You need to edit the XML file to point to the local copy.
<!DOCTYPE root-element SYSTEM "filename">
Little more info here: http://www.w3schools.com/dtd/dtd_intro.asp
I think you can also manually set some sort of validateOnParse property to "false" in your parser. Depends on what library you're using to parse the XML.
More info here: http://www.w3schools.com/dtd/dtd_validation.asp
I have a complete XML document in a string and would like a Document object. Google turns up all sorts of garbage. What is the simplest solution? (In Java 1.5)
Solution Thanks to Matt McMinn, I have settled on this implementation. It has the right level of input flexibility and exception granularity for me. (It's good to know if the error came from malformed XML - SAXException - or just bad IO - IOException.)
public static org.w3c.dom.Document loadXMLFrom(String xml)
throws org.xml.sax.SAXException, java.io.IOException {
return loadXMLFrom(new java.io.ByteArrayInputStream(xml.getBytes()));
}
public static org.w3c.dom.Document loadXMLFrom(java.io.InputStream is)
throws org.xml.sax.SAXException, java.io.IOException {
javax.xml.parsers.DocumentBuilderFactory factory =
javax.xml.parsers.DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
javax.xml.parsers.DocumentBuilder builder = null;
try {
builder = factory.newDocumentBuilder();
}
catch (javax.xml.parsers.ParserConfigurationException ex) {
}
org.w3c.dom.Document doc = builder.parse(is);
is.close();
return doc;
}
Whoa there!
There's a potentially serious problem with this code, because it ignores the character encoding specified in the String (which is UTF-8 by default). When you call String.getBytes() the platform default encoding is used to encode Unicode characters to bytes. So, the parser may think it's getting UTF-8 data when in fact it's getting EBCDIC or something… not pretty!
Instead, use the parse method that takes an InputSource, which can be constructed with a Reader, like this:
import java.io.StringReader;
import org.xml.sax.InputSource;
…
return builder.parse(new InputSource(new StringReader(xml)));
It may not seem like a big deal, but ignorance of character encoding issues leads to insidious code rot akin to y2k.
This works for me in Java 1.5 - I stripped out specific exceptions for readability.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import java.io.ByteArrayInputStream;
public Document loadXMLFromString(String xml) throws Exception
{
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new ByteArrayInputStream(xml.getBytes()));
}
Just had a similar problem, except i needed a NodeList and not a Document, here's what I came up with. It's mostly the same solution as before, augmented to get the root element down as a NodeList and using erickson's suggestion of using an InputSource instead for character encoding issues.
private String DOC_ROOT="root";
String xml=getXmlString();
Document xmlDoc=loadXMLFrom(xml);
Element template=xmlDoc.getDocumentElement();
NodeList nodes=xmlDoc.getElementsByTagName(DOC_ROOT);
public static Document loadXMLFrom(String xml) throws Exception {
InputSource is= new InputSource(new StringReader(xml));
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = null;
builder = factory.newDocumentBuilder();
Document doc = builder.parse(is);
return doc;
}
To manipulate XML in Java, I always tend to use the Transformer API:
import javax.xml.transform.Source;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
public static Document loadXMLFrom(String xml) throws TransformerException {
Source source = new StreamSource(new StringReader(xml));
DOMResult result = new DOMResult();
TransformerFactory.newInstance().newTransformer().transform(source , result);
return (Document) result.getNode();
}