Parsing xml in Android without xml declaration using SAX - java

Here's the XML I'm trying to parse: http://realtime.catabus.com/InfoPoint/rest/routes/get/51
@Override
protected Void doInBackground(String... Url) {
try {
URL url = new URL(Url[0]);
DocumentBuilderFactory dbf = DocumentBuilderFactory
.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
// Download the XML file
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
// Locate the Tag Name
nodelist = doc.getElementsByTagName("VehicleLocation");
} catch (Exception e) {
Log.e("Error", e.getMessage());
e.printStackTrace();
}
return null;
}
During runtime, when it reaches this line: DocumentBuilder db = dbf.newDocumentBuilder(); I get the following error:
Unexpected token (position:TEXT {"RouteId":51,"R...#1:1298 in java.io.InputStreamReader#850b9be)
It seems to have something to do with the encoding. My guess is that it's because the XML doesn't specify the encoding, but maybe not.
Is there a way to specify the encoding in the code (I can't change the XML itself)?
Thanks!
EDIT: This seems to only happen when parsing the XML from the url. Storing the file locally seems to work fine.

Is there a way to specify the encoding in the code (I can't change the
XML itself)?
You can call InputSource.setEncoding() to set the encoding.
I would also suggest taking a look at XmlPullParser for parsing XML on Android.
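As a minimal sketch of the InputSource.setEncoding() suggestion, the following parses a byte stream that has no XML declaration while telling the parser which charset to use. The class name EncodingHintExample, the method parseWithEncoding, and the sample ISO-8859-1 payload are illustrative assumptions, not part of the original question.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EncodingHintExample {

    // Parses a byte stream, telling the parser which charset to use
    // because the document itself carries no XML declaration.
    static Document parseWithEncoding(InputStream in, String encoding) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        InputSource source = new InputSource(in);
        source.setEncoding(encoding);
        return db.parse(source);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical payload: Latin-1 bytes with no XML declaration.
        byte[] xml = "<name>caf\u00e9</name>".getBytes(StandardCharsets.ISO_8859_1);
        Document doc = parseWithEncoding(new ByteArrayInputStream(xml), "ISO-8859-1");
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Note that the encoding hint only matters for byte streams. If the InputSource wraps a Reader, the characters are already decoded and the hint has no effect.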

Related

Unexpected token (<) using Document builder in Android Studio

I'm using DocumentBuilder and NodeList in Android Studio to parse an XML document. I previously found that the XML was incorrect and had unescaped ampersands within the text. After taking care of this and double-checking with the W3 XML validator, I still get an unexpected token error:
e: "org.xml.sax.SAXParseException: Unexpected token (position:TEXT \n \n 601\n ...#5262:1 in java.io.StringReader#cd0db4a)"
However, when I open the xml and look at the line referred to, I don't see anything that would be considered troublesome:
... ...
5257 <WebSvcLocation>
5258 <Id>1521981</Id>
5259 <Name>Warehouse: Row 3</Name>
5260 <SiteName>Warehouse</SiteName>
5261 </WebSvcLocation>
5262 </ArrayOfWebSvcLocation>
I have checked the xml as well for non printing characters and I have not found any. Below is the code I have been using:
public List<Location> SpinnerXML(String xml){
List<Location> list = new ArrayList<Location>();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
InputSource is;
String s = xml.replaceAll("[&]"," and ");
try {
builder = factory.newDocumentBuilder();
is = new InputSource(new StringReader(s));
Document doc = builder.parse(is);
NodeList lt = doc.getElementsByTagName("WebSvcLocation");
int id;
String name,siteName;
for (int i = 0; i < lt.getLength(); i++) {
Element el = (Element) lt.item(i);
id = Integer.parseInt(getValue(el, "Id"));
name = getValue(el, "Name");
siteName = getValue(el, "SiteName");
list.add(new Location(id, name, siteName));
}
} catch (ParserConfigurationException e){
} catch (SAXException e){
e.printStackTrace();
} catch (IOException e){
}
return list;
}
The XML I have been trying to read is hosted here.
Thanks in advance for the help!
InputSource seems to do some guessing as to the encoding, so here are some things to try.
From here:
Android note: The Android platform default (encoding) is always UTF-8.
And from here:
"Java stores strings as UTF-16 internally, but the encoding used externally, the 'system default encoding', varies."
(1) I would initially recommend:
is.setEncoding("UTF-8");
(2) But it should do no harm to replace this:
Document doc = builder.parse(is);
With this:
Document doc = builder.parse(new ByteArrayInputStream(s.getBytes("UTF-8"))); // explicit charset instead of the platform default
(3) OR try this:
String s1 = URLDecoder.decode(s, "UTF-8");
Document doc = builder.parse(new ByteArrayInputStream(s1.getBytes("UTF-8"))); // explicit charset instead of the platform default
NOTE:
If you try (2) or (3), comment out:
is = new InputSource(new StringReader(s));
As it may mess up String s.

DomParser from string XML getting null document

I'm trying to parse an XML string using DOMParser, but when I try to get the document it shows [#document: null] and doesn't contain the data from the XML I passed in.
The code is something like that:
Document doc = null;
DOMParser parser = new DOMParser();
logger.debug("Parsing");
InputSource IS = new InputSource(new StringReader(nameFile));
parser.parse(IS);
doc = parser.getDocument();
NodeList NL = doc.getElementsByTagName("element");
The problem starts at doc = parser.getDocument().
It returns [#document: null], so the NodeList can't find the element that I'm looking for.
My XML is quite big. It contains around 50K characters.
My question is: what could be causing this problem?
For your information, the same code works in OAS with JDK 1.4; I'm now transferring the application to WebLogic 12c with JDK 1.6.
Thanks in advance.
UPDATED:
Sorry for not mentioning the data type of nameFile. nameFile is XML data in string format.
UPDATED2:
I've tried with a simple xml but no luck.
Example:
1st Example: this string is without any space ->
nameFile = "<?xml version='1.0'?><company><staff id='1001'><firstname>yong</firstname><lastname>mook kim</lastname><nickname>mkyong</nickname><salary>100000</salary></staff><staff id='2001'><firstname>low</firstname><lastname>yin fong</lastname><nickname>fong fong</nickname><salary>200000</salary></staff></company>";
2nd Example:
nameFile = "<message>Hello</message>";
Neither of these works. It always returns [#document: null].
I assume 'nameFile' in your code snippet is a string! The following works perfectly for me.
String nameFile= "<message>HELLO World</message>";
DOMParser parser = new DOMParser();
try {
parser.parse(new InputSource(new java.io.StringReader(nameFile)));
Document doc = parser.getDocument();
String message = doc.getDocumentElement().getTextContent();
System.out.println(message);
} catch (SAXException e) {
// handle SAXException
} catch (IOException e) {
// handle IOException
}
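One possible explanation for the symptom, offered as a hedged note rather than a diagnosis: Xerces-based DOM implementations typically print a node as its name and value, and for a Document node the name is always "#document" and the value is always null. So seeing [#document: null] in a log or debugger does not by itself mean that parsing failed. The sketch below uses the standard JAXP DocumentBuilder rather than the DOMParser class from the question, and the class name DocumentToStringExample is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DocumentToStringExample {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<message>Hello</message>");
        // toString() output is implementation-specific and says nothing
        // about whether the document has content.
        System.out.println(doc);
        // These two are fixed by the DOM specification for Document nodes.
        System.out.println(doc.getNodeName());
        System.out.println(doc.getNodeValue());
        // The parsed data is reachable through the document element.
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If the document element holds the expected data, the parse succeeded, and any later failure to find "element" is more likely a tag-name or namespace issue.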

Access xml from ISBNDB

I have a Java client and I am trying to get the XML DOM object from ISBNDB by passing the ISBN of a book, but I get null as the result. I am able to get the XML when I hit the URL from my browser. Am I doing this the right way, or am I completely on the wrong track?
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
try {
System.setProperty("http.proxyHost", "myproxy");
System.setProperty("http.proxyPort", "80");
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new URL("http://isbndb.com/api/books.xml?access_key=MYKEY&index1=isbn&value1="+isbnValue).openStream());
System.out.println(doc.getTextContent());
} catch (Exception e) {
e.printStackTrace();
}
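A likely cause of the null, assuming the request itself succeeds: the DOM specification defines getTextContent() as null for a Document node, so doc.getTextContent() prints null even when parsing worked. Reading from the document element avoids this. The sketch below parses an in-memory string instead of calling ISBNDB; the sample XML and the class name TextContentExample are made up for illustration and do not reflect the real response format.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TextContentExample {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Returns the concatenated text of the root element and its descendants.
    static String rootText(Document doc) {
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical payload standing in for the web service response.
        Document doc = parse("<books><book><title>Example Title</title></book></books>");
        System.out.println(doc.getTextContent()); // null by definition for a Document node
        System.out.println(rootText(doc));        // text taken from the root element
    }
}
```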

Parsing XML from website to an Android device

I am starting an Android application that will parse XML from the web. I've created a few Android apps but they've never involved parsing XML and I was wondering if anyone had any tips on the best way to go about it?
Here's an example:
try {
URL url = new URL(/*your xml url*/);
URLConnection conn = url.openConnection();
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(conn.getInputStream());
NodeList nodes = doc.getElementsByTagName(/*tag from xml file*/);
for (int i = 0; i < nodes.getLength(); i++) {
Element element = (Element) nodes.item(i);
NodeList title = element.getElementsByTagName(/*item within the tag*/);
Element line = (Element) title.item(0);
phoneNumberList.add(line.getTextContent());
}
}
catch (Exception e) {
e.printStackTrace();
}
In my example, my XML file looks a little like:
<numbers>
<phone>
<string name = "phonenumber1">555-555-5555</string>
</phone>
<phone>
<string name = "phonenumber2">555-555-5555</string>
</phone>
</numbers>
and I would replace /*tag from xml file*/ with "phone" and /*item within the tag*/ with "string".
I always use the W3C DOM classes. I have a static helper method that parses the XML data from a string and returns a Document object. Where you get the XML data can vary (web, file, etc.), but eventually you load it as a string.
Something like this:
Document document = null;
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder;
try
{
builder = factory.newDocumentBuilder();
InputSource is = new InputSource(new StringReader(data));
document = builder.parse(is);
}
catch (SAXException e) { }
catch (IOException e) { }
catch (ParserConfigurationException e) { }
There are different parsing mechanisms available: one is SAX (here is a SAX parsing example), another is DOM (here is a DOM parsing example). It is not clear from your question what you want, but these may be good starting points.
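For comparison with the DOM snippets above, here is a minimal SAX sketch that collects the text of every string element from the phone-number XML shown earlier. The class name SaxPhoneNumbers and the method extract are hypothetical, and the sketch reads from a string instead of a URL so that it stays self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxPhoneNumbers {

    // Streams through the XML and returns the text of each <string> element.
    static List<String> extract(String xml) throws Exception {
        final List<String> numbers = new ArrayList<>();
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            // Non-null only while the parser is inside a <string> element.
            private StringBuilder text;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes attrs) {
                if ("string".equals(qName)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append.
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("string".equals(qName)) {
                    numbers.add(text.toString());
                    text = null;
                }
            }
        });
        return numbers;
    }
}
```

Unlike DOM, nothing is kept in memory beyond what the handler stores, which is why SAX tends to suit larger documents.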
I know of three parsing approaches: DOM, SAX and XmlPullParser.
In my example here you need the URL and the parent node of the XML element.
try {
URL url = new URL("http://www.something.com/something.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(new InputSource(url.openStream()));
doc.getDocumentElement().normalize();
NodeList nodeList1 = doc.getElementsByTagName("parent node here");
for (int i = 0; i < nodeList1.getLength(); i++) {
Node node = nodeList1.item(i);
}
} catch(Exception e) {
}
Also try this.
I would use the DOM parser if the XML file is not too large. It is not as efficient as SAX, but it is easier to work with.
I have made just one Android app that involved XML parsing; the XML was received from a SOAP web service. I used XmlPullParser. The implementation from Xml.newPullParser() had a bug where calls to nextText() did not always advance to the END_TAG as the documentation promised. There is a workaround for this.

Cannot run Xpath queries from JAVA in XML files having <DOCTYPE> tag

I have written the following method, which runs hard-coded XPath queries against a hard-coded XML file. The method works perfectly with one exception. Some XML files contain the following tag:
<!DOCTYPE WorkFlowDefinition SYSTEM "wfdef4.dtd">
When I try to run a query against such a file, I get the following exception:
java.io.FileNotFoundException:
C:\ProgramFiles\code\other\xPath\wfdef4.dtd(The system cannot find the file specified).
The question is: how can I instruct my program to ignore this DTD file?
I have also noticed that the path C:\ProgramFiles\code\other\xPath\wfdef4.dtd is the directory I run my application from, not the one where the actual XML file is located.
Thank you in advance.
Here is my method:
public String evaluate(String expression,File file){
XPathFactory factory = XPathFactory.newInstance();
xPath = XPathFactory.newInstance().newXPath();
StringBuffer strBuffer = new StringBuffer();
try{
InputSource inputSource = new InputSource(new FileInputStream(file));
//evaluates the expression
NodeList nodeList = (NodeList)xPath.evaluate(expression,
inputSource,XPathConstants.NODESET);
//does other stuff, irrelevant with my question.
for (int i = 0 ; i <nodeList.getLength(); i++){
strBuffer.append(nodeList.item(i).getTextContent());
}
}catch (Exception e) {
e.printStackTrace();
}
return strBuffer.toString();
}
And the answer is:
xPath = XPathFactory.newInstance().newXPath();
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
// add this line so the parser does not load the external DTD
dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
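The snippet above stops after configuring the factory. A fuller sketch, assuming the default parser is Xerces-based (the feature URI is Xerces-specific and other parsers may reject it): parse the input into a Document with that factory, then evaluate the XPath expression against the Document instead of the raw InputSource. The class name DtdFreeXPath, the method signature, and the step elements in the sample XML are illustrative assumptions.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DtdFreeXPath {

    // Evaluates an XPath expression without fetching the external DTD
    // named in the document's DOCTYPE declaration.
    static String evaluate(String expression, InputSource source) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Xerces-specific feature: skip loading the external DTD.
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = dbf.newDocumentBuilder().parse(source);

        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);

        StringBuilder result = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.append(nodes.item(i).getTextContent());
        }
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical document; wfdef4.dtd does not need to exist.
        String xml = "<!DOCTYPE WorkFlowDefinition SYSTEM \"wfdef4.dtd\">"
                + "<WorkFlowDefinition><step>a</step><step>b</step></WorkFlowDefinition>";
        System.out.println(evaluate("//step", new InputSource(new StringReader(xml))));
    }
}
```

For a file on disk, the same method can be called with new InputSource(new FileInputStream(file)), as in the question.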
