I'm using DocumentBuilder and NodeList in Android Studio to parse an XML document. I previously found that the XML was malformed, with unescaped ampersands in the text. After fixing that and double-checking with the W3C XML validator, I still get an unexpected token error:
e: "org.xml.sax.SAXParseException: Unexpected token (position:TEXT \n \n 601\n ...#5262:1 in java.io.StringReader#cd0db4a)"
However, when I open the xml and look at the line referred to, I don't see anything that would be considered troublesome:
... ...
5257 <WebSvcLocation>
5258 <Id>1521981</Id>
5259 <Name>Warehouse: Row 3</Name>
5260 <SiteName>Warehouse</SiteName>
5261 </WebSvcLocation>
5262 </ArrayOfWebSvcLocation>
I have also checked the XML for non-printing characters and found none. Below is the code I have been using:
public List<Location> SpinnerXML(String xml){
List<Location> list = new ArrayList<Location>();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
InputSource is;
String s = xml.replaceAll("[&]"," and ");
try {
builder = factory.newDocumentBuilder();
is = new InputSource(new StringReader(s));
Document doc = builder.parse(is);
NodeList lt = doc.getElementsByTagName("WebSvcLocation");
int id;
String name,siteName;
for (int i = 0; i < lt.getLength(); i++) {
Element el = (Element) lt.item(i);
id = Integer.parseInt(getValue(el, "Id"));
name = getValue(el, "Name");
siteName = getValue(el, "SiteName");
list.add(new Location(id, name, siteName));
}
} catch (ParserConfigurationException e){
} catch (SAXException e){
e.printStackTrace();
} catch (IOException e){
}
return list;
}
The XML I have been trying to read is hosted here.
Thanks in advance for the help!
InputSource seems to do some guessing as to the encoding, so here are some things to try.
One reference says:
"Android note: The Android platform default (encoding) is always UTF-8."
Another adds:
"Java stores strings as UTF-16 internally, but the encoding used externally, the "system default encoding", varies."
(1) I would initially recommend:
is.setEncoding("UTF-8");
(2) But it should do no harm to replace this:
Document doc = builder.parse(is);
With this:
Document doc = builder.parse(new ByteArrayInputStream(s.getBytes("UTF-8")));
(3) OR try this:
String s1 = URLDecoder.decode(s, "UTF-8");
Document doc = builder.parse(new ByteArrayInputStream(s1.getBytes("UTF-8")));
NOTE: if you try (2) or (3), comment out:
is = new InputSource(new StringReader(s));
As it may mess up String s.
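Options (2) and (3) hand the parser bytes rather than characters, so the charset used to encode the string has to match what the parser assumes. Regarding (1), the InputSource Javadoc states that setEncoding has no effect when the source wraps a character stream such as a StringReader, so it may not change anything here. Below is a minimal sketch of option (2) with the charset named explicitly. It assumes the XML either has no encoding declaration or declares UTF-8; the class name `Utf8XmlParse`, its methods, and the sample XML are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

class Utf8XmlParse {
    // Hypothetical helper: encode the string as UTF-8 and let the parser read bytes.
    // Checked exceptions are wrapped only to keep the sketch short.
    static Document parseUtf8(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
            return builder.parse(new ByteArrayInputStream(bytes));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    // Counts <WebSvcLocation> elements, mirroring the loop in the question.
    static int countLocations(String xml) {
        return parseUtf8(xml).getElementsByTagName("WebSvcLocation").getLength();
    }

    public static void main(String[] args) {
        // Made-up sample shaped like the XML in the question.
        String sample = "<ArrayOfWebSvcLocation><WebSvcLocation><Id>1521981</Id>"
                + "<Name>Warehouse: Row 3</Name><SiteName>Warehouse</SiteName>"
                + "</WebSvcLocation></ArrayOfWebSvcLocation>";
        System.out.println("Locations found: " + countLocations(sample));
    }
}
```

If the parse still fails with bytes encoded this way, the encoding is probably not the cause, and the content after the closing root tag is worth inspecting.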
I'm trying to parse an XML string using DOMParser, but when I try to get the document it shows [#document: null] and doesn't contain the XML data I passed in.
The code is something like this:
Document doc = null;
DOMParser parser = new DOMParser();
logger.debug("Parsing");
InputSource IS = new InputSource(new StringReader(nameFile));
parser.parse(IS);
doc = parser.getDocument();
NodeList NL = doc.getElementsByTagName("element");
The problem starts at doc = parser.getDocument().
It returns [#document: null], so the NodeList can't find the element that I'm looking for.
My XML is quite big. It contains around 50K characters.
My question is: what could be causing this problem?
For your information, the same code works in OAS with JDK 1.4; I'm now transferring the application to WebLogic 12c with JDK 1.6.
Thanks in advance.
UPDATED:
Sorry for not mentioning the data type of nameFile. nameFile is XML data in string format.
UPDATED2:
I've tried with simple XML, but no luck.
Example:
1st Example: this string has no whitespace between elements ->
nameFile = "<?xml version='1.0'?><company><staff id='1001'><firstname>yong</firstname><lastname>mook kim</lastname><nickname>mkyong</nickname><salary>100000</salary></staff><staff id='2001'><firstname>low</firstname><lastname>yin fong</lastname><nickname>fong fong</nickname><salary>200000</salary></staff></company>";
2nd Example:
nameFile = "<message>Hello</message>";
Neither of these works. It always returns [#document: null].
I assume 'nameFile' in your code snippet is a string! The following works perfectly for me.
String nameFile= "<message>HELLO World</message>";
DOMParser parser = new DOMParser();
try {
parser.parse(new InputSource(new java.io.StringReader(nameFile)));
Document doc = parser.getDocument();
String message = doc.getDocumentElement().getTextContent();
System.out.println(message);
} catch (SAXException e) {
// handle SAXException
} catch (IOException e) {
// handle IOException
}
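For context on the `[#document: null]` output: in the DOM API a Document node's name is `#document` and its node value is always null, and the Xerces-style toString() appears to print that pair. That output alone therefore does not show that parsing failed. Here is a small sketch that checks this, using the JDK's DocumentBuilder instead of Xerces' DOMParser (an assumption made to keep it dependency-free; `DocumentNullDemo` and its methods are hypothetical names).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class DocumentNullDemo {
    // Parses XML held in a String; checked exceptions are wrapped to keep the sketch short.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    // Builds "name: value" for the Document node itself.
    static String describe(String xml) {
        Document doc = parse(xml);
        return doc.getNodeName() + ": " + doc.getNodeValue();
    }

    // Reads the text of the root element, which is where the data actually lives.
    static String rootText(String xml) {
        return parse(xml).getDocumentElement().getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<message>HELLO World</message>";
        System.out.println(describe(xml)); // the Document node's own name and value
        System.out.println(rootText(xml)); // the parsed content
    }
}
```

To judge whether parsing worked, inspect `getDocumentElement()` or the result of `getElementsByTagName(...)` rather than the printed Document.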
How do I convert a String containing XML into a JDOM Document?
I am trying the code below:
String docString = txtEditor.getDocumentProvider().getDocument(
txtEditor.getEditorInput()).get();
SAXBuilder sb= new SAXBuilder();
doc = sb.build(new StringReader(docString));
Can anyone help me resolve this problem?
Thanks in advance!!
This is how you generally parse an XML file into a Document:
try {
SAXBuilder builder = new SAXBuilder();
Document anotherDocument = builder.build(new File("/some/directory/sample.xml"));
} catch (JDOMException e) {
// the XML could not be parsed
e.printStackTrace();
} catch (IOException e) {
// the file could not be read
e.printStackTrace();
}
This is taken from the IBM JDOM reference.
If you have a String, you can convert it to an InputStream and pass that in:
String exampleXML = "<your-xml-string>";
InputStream stream = new ByteArrayInputStream(exampleXML.getBytes("UTF-8"));
Document anotherDocument = builder.build(stream);
For the various arguments builder.build() supports, see the API docs.
This is a FAQ that should have an answer more accessible than the actual FAQ: How do I build a document from a String?
So, I have created issue #111.
For what it's worth, I have previously improved the error messages for this situation (see the previous issue #63), and now you should get an error that says:
MalformedURLException mx = new MalformedURLException(
"SAXBuilder.build(String) expects the String to be " +
"a systemID, but in this instance it appears to be " +
"actual XML data.");
Bottom line is that you should be using:
Document parseddoc = new SAXBuilder().build(new StringReader(myxmlstring));
rolfl
I use the Wikimedia API Sandbox for Japanese.
Japanese Version
English Version
I send an HTTP request to Wikimedia and get a result in XML.
When I send the request on the API Sandbox web page, the result has no character corruption.
But when I get the result in Java, it contains corrupted characters.
I cannot specify a character encoding in the XML file.
How can I specify a character encoding for the result?
How can I resolve my problem?
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db
.parse(new URL(
"http://ja.wikipedia.org/w/api.php?action=query&prop=categories&format=xml&cllimit=10&titles="
+ key).openStream());
Element root = doc.getDocumentElement();
NodeList queryList = root.getChildNodes();
Node query = queryList.item(0);
if (query instanceof Element) {
Element queryEle = (Element) query;
NodeList pagesList = queryEle.getChildNodes();
Node pgs = pagesList.item(0);
if (pgs instanceof Element) {
Element pagesElement = (Element) pgs;
NodeList pageList = pagesElement.getChildNodes();
Node page = pageList.item(0);
if (page instanceof Element) {
Element pageElement = (Element) page;
String title = pageElement.getAttribute("title");
title = new String(title.getBytes("UTF-8"), "UTF-8");
}
}
}
} catch (ParserConfigurationException e) {
} catch (SAXException e) {
} catch (IOException e) {
}
When I send a request, I get a result whose page title is "大学", but in Java it shows "??".
I use the above code in an Android application.
title = new String(title.getBytes("UTF-8"), "UTF-8"); can be left out, since encoding and decoding with the same charset changes nothing.
It worked for me with key=1 (receiving UTF-8). I have a UTF-8 Linux PC though. Maybe you are not outputting in a UTF-8 context. Try writing the Document to a file.
You could do more inspection:
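URLConnection connection = new URL("...").openConnection();
// Content-Encoding header (e.g. gzip); may be null
String contentEncoding = connection.getContentEncoding();
// Content-Type header; may carry a charset parameter
String contentType = connection.getContentType();
InputStream in = connection.getInputStream();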
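To narrow down where the "??" comes from, it can help to separate parsing from output. The sketch below parses UTF-8 bytes directly, prints the title's code points (which survive any console), and then prints the title through a PrintStream explicitly set to UTF-8. The XML snippet is a made-up stand-in for the API response, and `TitleEncodingCheck` is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.SAXException;

class TitleEncodingCheck {
    // Returns the title attribute of the first <page> element in the given XML bytes.
    static String firstPageTitle(byte[] xmlBytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xmlBytes));
            Element page = (Element) doc.getElementsByTagName("page").item(0);
            return page.getAttribute("title");
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the API response; \u5927\u5b66 is the title from the question.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<api><query><pages><page title=\"\u5927\u5b66\"/></pages></query></api>";
        String title = firstPageTitle(xml.getBytes(StandardCharsets.UTF_8));

        // Code points show whether the parsed string itself is intact.
        for (char c : title.toCharArray()) {
            System.out.printf("U+%04X ", (int) c);
        }
        System.out.println();

        // Printing through an explicit UTF-8 stream rules out a default-charset console.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println(title);
    }
}
```

If the code points are correct but the printed text is not, the corruption happens at output time rather than in the parser.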
I am starting an Android application that will parse XML from the web. I've created a few Android apps but they've never involved parsing XML and I was wondering if anyone had any tips on the best way to go about it?
Here's an example:
try {
URL url = new URL(/*your xml url*/);
URLConnection conn = url.openConnection();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(conn.getInputStream());
NodeList nodes = doc.getElementsByTagName(/*tag from xml file*/);
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
NodeList title = element.getElementsByTagName(/*item within the tag*/);
Element line = (Element) title.item(0);
phoneNumberList.add(line.getTextContent());
}
}
catch (Exception e) {
e.printStackTrace();
}
In my example, my XML file looks a little like:
<numbers>
<phone>
<string name = "phonenumber1">555-555-5555</string>
</phone>
<phone>
<string name = "phonenumber2">555-555-5555</string>
</phone>
</numbers>
and I would replace /*tag from xml file*/ with "phone" and /*item within the tag*/ with "string".
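For reference, here is a self-contained sketch of the same approach that can be run without a network connection. It reads the XML from a String instead of a URL, which is an assumption made purely so the example runs anywhere; `PhoneListDemo` and `phoneNumbers` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class PhoneListDemo {
    // Collects the text of the first <string> child inside every <phone> element.
    static List<String> phoneNumbers(String xml) {
        List<String> result = new ArrayList<>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList phones = doc.getElementsByTagName("phone");
            for (int i = 0; i < phones.getLength(); i++) {
                Element phone = (Element) phones.item(i);
                NodeList strings = phone.getElementsByTagName("string");
                if (strings.getLength() > 0) { // skip <phone> elements without a <string>
                    result.add(strings.item(0).getTextContent());
                }
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<numbers>"
                + "<phone><string name=\"phonenumber1\">555-555-5555</string></phone>"
                + "<phone><string name=\"phonenumber2\">555-555-5555</string></phone>"
                + "</numbers>";
        System.out.println(phoneNumbers(xml));
    }
}
```

On Android the network read would also need to happen off the main thread; that part is left out of this sketch.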
I always use the W3C DOM classes. I have a static helper method that parses the XML data from a string and returns a Document object. Where you get the XML data can vary (web, file, etc.), but eventually you load it as a string.
Something like this:
Document document = null;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try
{
builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(data));
document = builder.parse(is);
}
catch (SAXException e) { }
catch (IOException e) { }
catch (ParserConfigurationException e) { }
There are different types of parsing mechanisms available: one is SAX, another is DOM. From your question it is not clear what you want, but either may be a good starting point.
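As a rough illustration of the SAX style, the sketch below collects the text of every element with a given tag name using the JDK's SAXParser and a DefaultHandler. It is a minimal sketch under assumptions: the names `SaxTextCollector` and `textOf` are hypothetical, the input is a String rather than a network stream, and nested elements with the same tag name are not handled.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxTextCollector {
    // Streams through the XML once and returns the text of each <tagName> element.
    static List<String> textOf(String xml, final String tagName) {
        final List<String> values = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a matching element

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if (qName.equals(tagName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append rather than assign.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(tagName) && current != null) {
                    values.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<numbers><phone><string>555-555-5555</string></phone></numbers>";
        System.out.println(textOf(xml, "string"));
    }
}
```

Unlike DOM, nothing is kept in memory beyond what the handler chooses to store, which is why SAX tends to suit large documents.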
There are three types of parsing I know of: DOM, SAX, and XmlPullParser.
In my example here you need the URL and the parent node of the XML element.
try {
URL url = new URL("http://www.something.com/something.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList1 = doc.getElementsByTagName("parent node here");
for (int i = 0; i < nodeList1.getLength(); i++) {
Node node = nodeList1.item(i);
}
} catch(Exception e) {
}
Also try this.
I would use the DOM parser. It is not as efficient as SAX, but if the XML file is not too large it is easier to work with.
I have made just one Android app that involved XML parsing, with the XML received from a SOAP web service. I used XmlPullParser. The implementation returned by Xml.newPullParser() had a bug where calls to nextText() did not always advance to the END_TAG as the documentation promised. There is a workaround for this.
I cannot make TagSoup work. I'm using the code below, but when I print the Node returned by the parser (the line with System.err.println(doc);), I always get "[#document: null]".
I don't know how to find the bug in this code or, whatever it is, the origin of the problem. Please help!
public final Document parseDOM(final File fileToParse) {
Parser p = new Parser();
SAX2DOM sax2dom = null;
org.w3c.dom.Node doc = null;
try {
URL url = new URL("http://stackoverflow.com/");
p.setFeature(Parser.namespacesFeature, false);
p.setFeature(Parser.namespacePrefixesFeature, false);
sax2dom = new SAX2DOM();
p.setContentHandler(sax2dom);
p.parse(new InputSource(new InputStreamReader(url.openStream())));
doc = sax2dom.getDOM();
System.err.println(doc);
} catch (Exception e) {
// TODO handle exception
e.printStackTrace();
}
return doc.getOwnerDocument();
}
From the documentation on getOwnerDocument:
When this node is a Document or a DocumentType which is not used with any Document yet, this is null.
Since getDOM in your case should return a Document, you could simply cast the return value or change the type of doc to Document.
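The getOwnerDocument behaviour can be checked in isolation. The sketch below uses the JDK's DocumentBuilder instead of TagSoup and SAX2DOM (an assumption made to keep it dependency-free; `OwnerDocumentDemo` and its methods are hypothetical names). It shows that a Document has no owner document, while its elements do.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class OwnerDocumentDemo {
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    // True when the Document node reports no owner document.
    static boolean documentHasNoOwner(String xml) {
        Node node = parse(xml); // held as a plain Node, as in the question
        return node.getOwnerDocument() == null;
    }

    // True when the root element's owner is the Document it was parsed into.
    static boolean rootIsOwnedByDocument(String xml) {
        Document doc = parse(xml);
        return doc.getDocumentElement().getOwnerDocument() == doc;
    }

    public static void main(String[] args) {
        String xml = "<html><body/></html>";
        System.out.println("Document has no owner: " + documentHasNoOwner(xml));
        System.out.println("Root owned by document: " + rootIsOwnedByDocument(xml));

        // When the Node is known to be a Document, a cast recovers it directly.
        Node node = parse(xml);
        Document doc = (Document) node;
        System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
    }
}
```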
Your parser is working, but you just can't print out a node like that. The easiest way to print out a node and all its children is to use an XML Serializer like this:
Writer out = new StringWriter();
XMLSerializer serializer = new XMLSerializer(out, new OutputFormat());
serializer.serialize(doc);
System.out.println(out.toString());
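XMLSerializer comes from Xerces rather than the standard JDK API, so it may not be on every classpath. An alternative that should work with the JDK alone is an identity Transformer from javax.xml.transform. This is a swapped-in technique, not what the answer above uses, and `NodePrinter` with its methods is a hypothetical name.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class NodePrinter {
    // Serializes a node and all its children to an XML string.
    static String toXml(Node node) {
        try {
            // A Transformer created without a stylesheet copies its input unchanged.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(node), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Serialization failed", e);
        }
    }

    // Parses a string and serializes it again, to show the round trip.
    static String roundTrip(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return toXml(doc);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("<message>Hello</message>"));
    }
}
```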