Parsing xml file error - java

I'm trying to transform an XML file into a document like this:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse("C:/xml/41111208890622000144550010000000011000003066-nfe.xml");
Document document = db.parse(new InputSource(new StringReader("C:/xml/41111208890622000144550010000000011000003066-nfe.xml")));
but it is giving the error message:
Exception in thread "main" org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 1;
Does anyone know what to do?

You're currently creating a reader containing the string
"C:/xml/41111208890622000144550010000000011000003066-nfe.xml"
and asking the DocumentBuilder to parse that as if it were XML, when it's clearly not. (I'm referring to the second parse call, which I suspect is the one in your actual code. The code you've provided wouldn't compile as you've declared document twice.)
You can create a FileInputStream or perhaps an InputStreamReader wrapped around it:
String filename = "C:/xml/41111208890622000144550010000000011000003066-nfe.xml";
try (FileInputStream input = new FileInputStream(filename))
{
    Document document = db.parse(new InputSource(input));
}
(I prefer to use a stream directly, and let the parser detect the encoding.)
Now this call:
Document document = db.parse("C:/xml/...");
would nearly work and may actually work, using DocumentBuilder.parse(String) - it depends on whether parse is happy to handle a filename as a URI. (I've seen some XML APIs that are fine with that, and some that aren't.) If it doesn't work, try using a file: URI, which for a Windows drive path takes three slashes:
Document document = db.parse("file:///C:/xml/...");
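The file-based options above can be sketched as follows. This is only an illustration under assumptions: the class name, the method names and the temporary file are hypothetical, and the temp file stands in for the real invoice XML. It parses once through parse(File) and once through a URI string built with File.toURI(), which avoids writing the file: prefix by hand.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; a temp file stands in for the real XML file.
class ParseFromFileDemo {

    // Let the parser open the file itself via parse(File).
    static String rootViaFile(File file) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = db.parse(file);
        return document.getDocumentElement().getNodeName();
    }

    // Build the URI with File.toURI() instead of concatenating a "file:" prefix.
    static String rootViaUri(File file) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = db.parse(file.toURI().toString());
        return document.getDocumentElement().getNodeName();
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("sample", ".xml");
        file.deleteOnExit();
        Files.write(file.toPath(), "<nfe><id>1</id></nfe>".getBytes(StandardCharsets.UTF_8));

        System.out.println(rootViaFile(file));
        System.out.println(rootViaUri(file));
    }
}
```

Both calls hand the parser a location rather than the XML text itself, which is the distinction the failing StringReader call missed.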

Related

File-path from xPath-Object or Document-Object in Java

Is there a way to get the path of an XML document from an XPath or Document object in the XPath API?
This is how the objects are initialized:
FileInputStream file = new FileInputStream(new File("C:\\ExampleFile.xml"));
DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = builderFactory.newDocumentBuilder();
Document xmlDocument = builder.parse(file);
XPath xPath = XPathFactory.newInstance().newXPath();
So the question is:
Could the objects xmlDocument or xPath somehow return "C:\ExampleFile.xml"?
Using the Document object xmlDocument you can return the path of the file with:
xmlDocument.getDocumentURI();
Rather than creating a FileInputStream from the File and passing that to the parse method
FileInputStream file = new FileInputStream(new File("C:\\ExampleFile.xml"));
use the version of parse that takes a File directly.
File file = new File("C:\\ExampleFile.xml");
// rest of your code is unchanged - parse(file) is now the
// java.io.File version rather than the InputStream version
When you pass just a stream, the parser has no way of knowing that the stream was created from a file. As far as the parser is concerned, it could be a stream you received from a web server, a ByteArrayInputStream, or some other non-file source. If you pass the File directly to parse, the parser handles opening and closing the stream itself and can provide a meaningful URI to downstream components, so you get a sensible result from xmlDocument.getDocumentURI().
As an aside, if you want XPath to work reliably then you need to enable namespaces by calling builderFactory.setNamespaceAware(true) before you call newDocumentBuilder(). Even if your XML doesn't actually use any namespaces, you still need to parse with a namespace-aware DOM parser.
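A minimal sketch of the advice above, assuming a temporary file in place of C:\ExampleFile.xml (the class name, the parseFile helper and the sample XML are hypothetical). It parses with the java.io.File overload on a namespace-aware factory, then reads the document URI and runs a simple XPath query.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; the temp file stands in for the real document.
class DocumentUriDemo {

    // Parse with parse(File) so the parser knows where the document came from.
    static Document parseFile(File file) throws Exception {
        DocumentBuilderFactory builderFactory = DocumentBuilderFactory.newInstance();
        builderFactory.setNamespaceAware(true); // must be set before newDocumentBuilder()
        DocumentBuilder builder = builderFactory.newDocumentBuilder();
        return builder.parse(file);
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("ExampleFile", ".xml");
        file.deleteOnExit();
        Files.write(file.toPath(), "<root><item>42</item></root>".getBytes(StandardCharsets.UTF_8));

        Document xmlDocument = parseFile(file);
        // A file: URI that points at the temp file.
        System.out.println(xmlDocument.getDocumentURI());

        XPath xPath = XPathFactory.newInstance().newXPath();
        System.out.println(xPath.evaluate("/root/item", xmlDocument));
    }
}
```

Note that getDocumentURI() returns a URI string, not a Windows-style path; if a path is needed, converting it with new File(new URI(uri)) is one option.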

MalformedByteSequenceException when parsing XML from URL in Java

I'm trying to parse an XML document with the following code:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new URL("http://www.cinemark.com.br/mobile/xml/films/").openStream());
But I get the following error:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:687)
at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:557)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1753)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.arrangeCapacity(XMLEntityScanner.java:1629)
at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipString(XMLEntityScanner.java:1667)
at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(XMLVersionDetector.java:196)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:812)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:347)
at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
at Programacao.main(Programacao.java:53)
If you access the URL, you can see there are some Portuguese characters. Looking at the response, I could see the first line of the XML file:
<?xml version="1.0" encoding="iso-8859-1"?>
So I tried doing this:
URL url = new URL("http://www.cinemark.com.br/mobile/xml/films/");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputStream ism = url.openStream();
InputSource is = new InputSource(ism);
is.setEncoding("iso-8859-1");
Document doc = db.parse(is.getByteStream());
But I still got the EXACT same error.
How can I parse the XML using a different encoding?
Also, how can I tell whether the XML really is in the encoding declared in the file?
I'm using JDK 1.7.0_51 on Fedora Linux 20
Thanks
SOLUTION
What I did to solve the problem, based on Seelenvirtuose's answer:
URL url = new URL("http://www.cinemark.com.br/mobile/xml/films/");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
InputStream ism = url.openStream();
GZIPInputStream gis = new GZIPInputStream(ism);
Reader decoder = new InputStreamReader(gis, "ISO-8859-1"); // charset from the XML declaration; without it the platform default is used
InputSource is = new InputSource(decoder);
Document doc = db.parse(is);
The difference in behavior is as follows:
When accessing the URL in a browser, after some time it displays:
<?xml version="1.0" encoding="iso-8859-1"?>
<cinemark>
<films>
<film ...>...</film>
...
</films>
</cinemark>
However, when simply running curl (for example), then you get an output similar to:
‹ ¬YMsÛ6½ûW`xôT¨Oªc) [...]
So, what is actually happening? Easy: this is called HTTP compression. So when running the following command
curl -o films.zip http://www.cinemark.com.br/mobile/xml/films/
you will get a file called films.zip that contains a single file called films, which in turn contains the expected XML document.
So, what you should do is: treat the response stream as a compressed stream, decompress it, and parse the result.
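The decompress-then-parse step can be sketched without a network call. Assumptions: the class and helper names are hypothetical, the XML is a made-up stand-in for the real feed, and the data is gzip-compressed in memory to imitate the server response. The sketch checks for the gzip magic bytes (0x1f 0x8b) before wrapping the stream, and hands the parser raw bytes so it can pick up the encoding from the XML declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; in-memory bytes imitate the compressed HTTP response.
class GzipXmlDemo {

    // Wrap the stream in GZIPInputStream only if it starts with the gzip magic bytes.
    static InputStream maybeGunzip(InputStream raw) throws Exception {
        PushbackInputStream in = new PushbackInputStream(raw, 2);
        byte[] magic = new byte[2];
        int n = 0;
        while (n < 2) {
            int r = in.read(magic, n, 2 - n);
            if (r < 0) break;
            n += r;
        }
        if (n > 0) in.unread(magic, 0, n); // put the peeked bytes back
        boolean gzipped = n == 2 && (magic[0] & 0xff) == 0x1f && (magic[1] & 0xff) == 0x8b;
        return gzipped ? new GZIPInputStream(in) : in;
    }

    // Parse from bytes so the parser reads the encoding from the XML declaration.
    static Document parse(InputStream raw) throws Exception {
        try (InputStream in = maybeGunzip(raw)) {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        }
    }

    // Test helper: gzip-compress a byte array in memory.
    static byte[] gzip(byte[] data) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"iso-8859-1\"?>"
                + "<cinemark><films><film title=\"A\u00e7\u00e3o\"/></films></cinemark>";
        byte[] compressed = gzip(xml.getBytes(StandardCharsets.ISO_8859_1));
        Document doc = parse(new ByteArrayInputStream(compressed));
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

With a real URL, the stream from url.openStream() would be passed to parse in place of the ByteArrayInputStream. Checking the Content-Encoding response header through URLConnection is an alternative to sniffing the magic bytes.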

Parse a single Line of XML into a HashMap

I'm building an Android app that communicates with a web server and am struggling with the following scenario:
Given ONE line of XML in a String, e.g.:
"<test one="1" two="2" />"
I would like to extract the values into a HashMap so that:
map.get("one") = "1"
map.get("two") = "2"
I can already do this with a full XML document using the SAX parser, but when I give the code below just the string above, it complains with a MalformedURLException: Protocol not found
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
Document doc = null;
builder = factory.newDocumentBuilder();
doc = builder.parse("<test one=\"1\" two=\"2\" />"); // the exception is thrown here
I realize some regex could do this, but I'd really rather do it properly.
The same behaviour can be found at http://metacpan.org/pod/XML::Simple#XMLin which is what the web server uses.
Can anyone help? Thanks :D
DocumentBuilder.parse(String) treats the string as a URL. Try this instead:
Document doc = builder.parse(new InputSource(new StringReader(text)));
(where text contains the XML, of course).
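Putting the answer together with the HashMap the question asks for, a minimal sketch might look like this (the class and method names are hypothetical). It parses the string through a StringReader and copies the root element's attributes into a map.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical demo class.
class AttributesToMapDemo {

    // Parse XML text (not a URI) and return the root element's attributes.
    static Map<String, String> attributesOf(String text) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(text)));

        NamedNodeMap attrs = doc.getDocumentElement().getAttributes();
        Map<String, String> map = new HashMap<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            map.put(attr.getNodeName(), attr.getNodeValue());
        }
        return map;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> map = attributesOf("<test one=\"1\" two=\"2\" />");
        System.out.println(map.get("one")); // prints 1
        System.out.println(map.get("two")); // prints 2
    }
}
```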

Parsing xml response

I have a Java application that sends XML requests and receives XML responses. I first receive each response as a String, then write it to a file on the file system. Later, when parsing the response, I read that file back from the file system and use some of the data for further business logic.
File file = new File("log\\XMLMessage\\LastXMLResponse.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(file);
I am now thinking of distributing this Java application via Java Web Start (JWS). As far as I know, I cannot keep this file inside the jar, because it is modified regularly.
What do you suggest? Can I parse the String directly, without storing the response in a file?
Document doc = db.parse(xmlMessage);
Or, if not, where can I keep this file? I don't want the file to be visible to the users of my application.
Take the String, make a StringReader from the String, make an InputSource from the StringReader, then call parse on your DocumentBuilder.
Yes, you can parse the string directly; there is no need to store it in a file.
Try this:
String xml = "<xml></xml>";
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8))); // explicit charset; plain getBytes() uses the platform default
System.out.println(doc);
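The two suggestions above (a StringReader, or a byte stream) differ in one respect that matters for non-ASCII content. A short sketch, with hypothetical class and method names and a made-up message: the Reader route hands the parser characters, so no byte encoding is involved, while the byte route is only safe when the bytes match what the parser expects (UTF-8 when the XML has no encoding declaration).

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class.
class ParseStringDemo {

    static DocumentBuilder newBuilder() throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder();
    }

    // Characters in, no byte encoding involved.
    static Document viaReader(String xml) throws Exception {
        return newBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Bytes in; encode as UTF-8 because the XML carries no encoding declaration.
    static Document viaBytes(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        return newBuilder().parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        String xmlMessage = "<msg>caf\u00e9</msg>";
        System.out.println(viaReader(xmlMessage).getDocumentElement().getTextContent());
        System.out.println(viaBytes(xmlMessage).getDocumentElement().getTextContent());
    }
}
```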

org.w3c.dom.Document to String without javax.xml.transform

I've spent a while looking around on Google for a way to convert an org.w3c.dom.Document to a string representation of the whole DOM tree, so I can save the object to the file system.
However all the solutions I've found use javax.xml.transform.Transformer which isn't supported as part of the Android 2.1 API. How can I do this without using this class/containing package?
Please try this code:
DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
Document doc = docBuilder.parse("/path/to/file.xml");
// DOM Level 3 Load and Save (org.w3c.dom.ls) serializes without javax.xml.transform
DOMImplementation domImpl = doc.getImplementation();
DOMImplementationLS domImplLS = (DOMImplementationLS) domImpl.getFeature("LS", "3.0");
LSSerializer serializer = domImplLS.createLSSerializer();
serializer.getDomConfig().setParameter("xml-declaration", Boolean.FALSE);
LSOutput lsOutput = domImplLS.createLSOutput();
StringWriter output = new StringWriter(); // java.io.StringWriter collects the serialized text
lsOutput.setCharacterStream(output);
serializer.write(doc, lsOutput);
String xml = output.toString();
To avoid using Transformer you should manually iterate over your xml tree, otherwise you can rely on some external libraries. You should take a look here.
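The manual iteration mentioned above could look roughly like the following. This is a deliberately small sketch, not a complete serializer: the class and method names are hypothetical, and it only handles elements, attributes, text and CDATA (written as escaped text). Comments, processing instructions, the doctype and the XML declaration are skipped, and no indentation is added.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical minimal serializer that walks the DOM tree by hand.
class ManualDomSerializer {

    static String toXml(Node node) {
        StringBuilder sb = new StringBuilder();
        write(node, sb);
        return sb.toString();
    }

    private static void write(Node node, StringBuilder sb) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE: {
                writeChildren(node, sb);
                break;
            }
            case Node.ELEMENT_NODE: {
                sb.append('<').append(node.getNodeName());
                NamedNodeMap attrs = node.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Node a = attrs.item(i);
                    sb.append(' ').append(a.getNodeName()).append("=\"")
                      .append(escape(a.getNodeValue(), true)).append('"');
                }
                if (!node.hasChildNodes()) {
                    sb.append("/>"); // empty element
                } else {
                    sb.append('>');
                    writeChildren(node, sb);
                    sb.append("</").append(node.getNodeName()).append('>');
                }
                break;
            }
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE: {
                sb.append(escape(node.getNodeValue(), false));
                break;
            }
            default:
                break; // comments, processing instructions, doctype: not handled here
        }
    }

    private static void writeChildren(Node node, StringBuilder sb) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            write(children.item(i), sb);
        }
    }

    // Escape the characters that are not allowed literally in text or attribute values.
    private static String escape(String s, boolean inAttribute) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '&') out.append("&amp;");
            else if (c == '<') out.append("&lt;");
            else if (c == '>') out.append("&gt;");
            else if (c == '"' && inAttribute) out.append("&quot;");
            else out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<a x=\"1\"><b>t &amp; u</b><c/></a>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println(toXml(doc));
    }
}
```

The resulting String can then be written to a file with an ordinary Writer, with no dependency on javax.xml.transform.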
