SAX parsing problem in Android... empty elements? - java

I am using SAX to parse an XML file I'm pulling from the web. I've extended DefaultHandler with code similar to:
public class ArrivalHandler extends DefaultHandler {
@Override
public void startElement(String namespaceUri, String localName, String qualifiedName, Attributes attributes) throws SAXException {
if (qualifiedName.equalsIgnoreCase("resultSet")) {
System.out.println("got a resultset");
} else if (qualifiedName.equalsIgnoreCase("location")) {
System.out.println("got a location");
} else if (qualifiedName.equalsIgnoreCase("arrival")) {
System.out.println("got an arrival");
} else {
System.out.println("There was an unknown XML element encountered: '" + qualifiedName + "'");
}
}
@Override
public void endElement(String namespaceUri, String localName, String qualifiedName) throws SAXException {
// we'll just ignore this for now
}
@Override
public void characters(char[] chars, int startIndex, int length) throws SAXException {
// ignore this too
}
}
The problem I'm having is that I'm just getting a series of empty elements. The log reads:
There was an unknown XML element encountered: ''
There was an unknown XML element encountered: ''
There was an unknown XML element encountered: ''
etc
This worked fine when I was just passing parser.parse a local file, but now I'm pulling it from the web with:
HttpClient httpClient = new DefaultHttpClient();
HttpResponse resp = httpClient.execute(new HttpGet("http://example.com/whatever"));
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
ArrivalHandler handler = new ArrivalHandler();
SAXParser parser = saxFactory.newSAXParser();
parser.parse(resp.getEntity().getContent(), handler);
and I get the (apparently) empty results described above.
What I've looked into so far:
I converted the InputStream from resp.getEntity().getContent() to a string and dumped it out and it looks like I'm getting the XML from the server correctly.
There are no exceptions thrown but there is a warning that reads "W/ExpatReader(232): DTD handlers aren't supported.".
Any other ideas for what I'm doing incorrectly or how to debug this?

From the docs for ContentHandler.startElement:
"the qualified name is required when the namespace-prefixes property is true, and is optional when the namespace-prefixes property is false (the default)."
So, do you have the namespace-prefixes property set to true?
Can you just cope with the uri and localName instead?
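One way to cope with either setting is to fall back from localName to the qualified name, whichever the parser actually fills in. This is only a minimal sketch: the class ElementNames, its methods, and the sample XML used to try it are made up for illustration and are not part of the original code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of the original handler.
class ElementNames {
    // Prefer localName; fall back to qualifiedName when the parser leaves localName empty.
    static String nameOf(String localName, String qualifiedName) {
        if (localName != null && !localName.isEmpty()) {
            return localName;
        }
        return qualifiedName == null ? "" : qualifiedName;
    }

    // Collects the element names seen while parsing, with or without namespace awareness.
    static List<String> elementNames(String xml, boolean namespaceAware) {
        final List<String> names = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName, String qName, Attributes attrs) {
                    names.add(nameOf(localName, qName));
                }
            });
        } catch (Exception e) {
            // Wrapped so that callers do not have to deal with checked exceptions in this sketch.
            throw new IllegalStateException(e);
        }
        return names;
    }
}
```

In the ArrivalHandler above, the same idea would mean comparing nameOf(localName, qualifiedName) against "resultSet", "location" and "arrival" instead of comparing qualifiedName directly.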

Related

Create hierarchy in Sax Parser JAVA

I have written a SAX parser that parses an XML file and prints the tags to the console.
The problem is that the output doesn't reflect the hierarchy.
Look at this:
-------------------<GOT>
-------------------<character>
-------------------<id>
-------------------<name>
----------------------->Arya Stark
-------------------<gender>
----------------------->Female
-------------------<culture>
----------------------->Northmen
-------------------<born>
----------------------->In 289 AC, at Winterfell
-------------------<died>
-------------------<alive>
----------------------->TRUE
-------------------<titles>
-------------------<title>
----------------------->Princess
For example, character and id are printed at the same level. Any idea how to detect whether a tag is a child of another?
Thanks!
public class Sax extends DefaultHandler {
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
System.out.println("-------------------<" + qName + ">");
}
public void characters(char ch[], int start, int length)
throws SAXException {
if( new String(ch,start,length).matches(".*[a-zA-Z0-9]+.*")){
System.out.println("----------------------->" + new String(ch, start, length));
} else {
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
System.out.println("</" + qName + ">");
}
}
This is the code of the SAX handler. I need a way to detect whether a tag has a child.
I am currently reading about SAX parsers, so if I find out I will post it!
package sax;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
public class ParseXMLFileSax {
private static final String xmlFilePath = "got.xml";
public static void main(String argv[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
saxParser.parse(xmlFilePath, new Sax());
} catch (Exception e) {
e.printStackTrace();
}
}
}
This class creates the parser with newSAXParser() and runs it with the Sax handler.
SAX is just a stream of events, so you have to maintain state in your handler to implement the logic you want. For example, this question uses a bunch of boolean flags:
How can I parse nested elements in SAX Parser in java?
It's not clear from your question what exactly your goal is.
If you just want to indent the tags in the output, keep an integer variable for the indentation level: increment it on element start and decrement it on element end.
Try to find a tutorial and follow it, e.g. https://www.informit.com/articles/article.aspx?p=26351&seqNum=5
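The indentation counter suggested above could look roughly like this. It is a sketch only: the class name IndentingHandler and the render helper are made up, the output goes into a StringBuilder instead of System.out so it is easy to inspect, and each characters() chunk is trimmed separately, which is fine for short text but would need buffering for long text nodes.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks nesting depth and indents each line by it.
class IndentingHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0; // number of currently open elements

    // Appends one output line, indented two spaces per nesting level.
    private void line(String s) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(s).append('\n');
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        line("<" + qName + ">");
        depth++; // everything until the matching end tag is a child
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            line(text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--; // back to the parent's level before printing the end tag
        line("</" + qName + ">");
    }

    // Parses the XML string and returns the indented outline.
    static String render(String xml) {
        IndentingHandler handler = new IndentingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.out.toString();
    }
}
```

A tag is a child of another exactly when its startElement arrives while depth is greater than zero; if you need to know which element is the parent, a stack of names (push on start, pop on end) works the same way as the counter.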

How do I Ignore Data Contained within XML Comments Using a JAXP SAX Parser?

I have a large XML file containing many key-value pairs. The file contains both multi-line comments and actual data. Within the comments section, there are examples of how the data/key-value pairs should be arranged. The SAX parser that I have written successfully retrieves the keys and values from the file, but it also seems to read the example keys/values contained within the comments, which I do not want. How can I make my SAX parser ignore everything within the comments section? I am not allowed to edit the file, and I must use Java.
Below is an example of the file that I am working with. Notice how there are data tags within the comment section. I do not want to read the sample data within these tags, but my parser records them anyway.
<?xml version="1.0" encoding="utf-8"?>
<root>
<!--
Microsoft ResX Schema
Version 2.0
The primary goals of this format is to allow a simple XML format
that is mostly human readable. The generation and parsing of the
various data types are done through the TypeConverter classes
associated with the data types.
Example:
... ado.net/XML headers & schema ...
<resheader name="resmimetype">text/microsoft-resx</resheader>
<resheader name="version">2.0</resheader>
<resheader name="reader">System.Resources.ResXResourceReader, System.Windows.Forms, ...</resheader>
<resheader name="writer">System.Resources.ResXResourceWriter, System.Windows.Forms, ...</resheader>
<data name="Name1"><value>this is my long string</value><comment>this is a comment</comment></data>
<data name="Color1" type="System.Drawing.Color, System.Drawing">Blue</data> **I DO NOT WANT TO READ THIS**
<data name="Bitmap1" mimetype="application/x-microsoft.net.object.binary.base64">
<value>[base64 mime encoded serialized .NET Framework object]</value>
</data>
<data name="Icon1" type="System.Drawing.Icon, System.Drawing" mimetype="application/x-microsoft.net.object.bytearray.base64">
<value>[base64 mime encoded string representing a byte array form of the .NET Framework object]</value>
<comment>This is a comment</comment>
</data>
-->
<resheader name="reader">
<value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<data name="AmountUnits" xml:space="preserve">
<value>Amount/Units</value>
</data>
</root>
Here is the code that I am using:
public class xmlPropertiesBuilder extends DefaultHandler {
private boolean valueFound;
public void readXMLFile(File xmlFile) throws SAXException, IOException, ParserConfigurationException {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
parser.parse(xmlFile, this);
valueFound = false;
}
@Override
public void startDocument() throws SAXException {
System.out.println("Start Document");
}
@Override
public void endDocument() throws SAXException {
System.out.println("End Document");
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if(qName.equals("data")){
System.out.println("Start Element: " + qName);
System.out.println("Key: " + attributes.getValue("name"));
} else if(qName.equals("value")){
valueFound = true;
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if(qName.equals("data")){
System.out.println("End Element: " + qName + "\n");
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if(valueFound){
System.out.println("Value: " + new String(ch, start, length));
valueFound = false;
}
}
}
Well, it looks like the JAXP SAX parser does in fact ignore data contained within comments. I was just interpreting my tests wrong. In my example XML file, I did not include some tags, one of which is called <resheader>. These resheader tags also contain a <value> tag, which my parser was picking up (I assumed it came from the comments, but it turned out to come from the resheaders).
I was able to fix my problem by adding a boolean variable called 'dataFound' that is only set to true when a <data> tag is found. Then, within my characters method, I simply changed the if condition from if(valueFound){...} to if(dataFound && valueFound){...}. Finally, within the endElement() method, I set the 'dataFound' variable to false whenever a </data> tag is found.
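For anyone who wants to see the fix in one place, here is a rough sketch of a handler with both flags. The class name DataValueHandler and the parse helper are made up for this example, and it collects the pairs into a map instead of printing them; the dataFound && valueFound check is the part described above.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: only <value> text that sits inside a <data> element is recorded.
class DataValueHandler extends DefaultHandler {
    private final Map<String, String> entries = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private boolean dataFound = false;  // true while inside <data>
    private boolean valueFound = false; // true while inside <value>
    private String key;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("data")) {
            dataFound = true;
            key = attributes.getValue("name");
        } else if (qName.equals("value")) {
            valueFound = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // <value> inside <resheader> is skipped because dataFound is false there.
        if (dataFound && valueFound) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("value")) {
            if (dataFound) {
                entries.put(key, text.toString());
            }
            valueFound = false;
        } else if (qName.equals("data")) {
            dataFound = false;
        }
    }

    // Parses the XML string and returns the collected key/value pairs.
    static Map<String, String> parse(String xml) {
        DataValueHandler handler = new DataValueHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.entries;
    }
}
```

Comments never reach a DefaultHandler at all, so nothing extra is needed to skip the sample data inside them.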

SAX - Read HTML content without CDATA

I'm using a SAX parser in Java, and it's mandatory. I need to parse an XML file containing HTML tags that I must read as content, and I can't use CDATA because I can't change the XML file. The XML file looks something like this:
<start id="123">
<tag1>text1</tag1>
<tag2>
This is an example
<span>
text inside an HTML tag
</span>
<p>
ABCDEFG<b>HIJK</b>LMNOP
</p>
</tag2>
</start>
What I need is that when I get the content of tag2, the content must be:
This is an example
<span>text inside an HTML tag</span>
<p>ABCDEFG<b>HIJK</b>LMNOP</p>
This is a test that I did, and the content doesn't include the HTML tags:
boolean istag2 = false;
StringBuilder text = new StringBuilder();
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equals("tag2")) {
istag2 = true;
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equals("tag2")) {
istag2 = false;
String fullText = text.toString();
System.out.println("tag2 full_text: " + fullText);
}
}
public void characters(char ch[], int start, int length) throws SAXException {
if (istag2) {
text.append(new String(ch, start, length));
}
}
Thanks in advance
OK, I think I might understand where your expectations are wrong. I think you might be expecting that the strings "<span>" and "<p>" are passed to your application by calls on the characters() method. But that's not what happens: they are passed by calls on startElement() and endElement(). If you want to build up a string containing these tags in lexical form, you will need to do something like:
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equals("tag2")) {
inTag2 = true;
} else if (inTag2) {
text.append("<" + qName);
// TODO: serialize any attributes
text.append(">");
}
}
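To round that off, here is a fuller sketch that also handles the end tags, the attributes left as a TODO above, and re-escaping of special characters. The class name InnerMarkupHandler and its helper methods are invented for the example, and nested tag2 elements are not handled.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rebuilds the markup found inside <tag2> as a string.
class InnerMarkupHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inTag2 = false;

    // The parser hands over unescaped text, so special characters are escaped again.
    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("tag2")) {
            inTag2 = true;
        } else if (inTag2) {
            text.append('<').append(qName);
            for (int i = 0; i < attributes.getLength(); i++) {
                text.append(' ').append(attributes.getQName(i))
                    .append("=\"").append(escape(attributes.getValue(i))).append('"');
            }
            text.append('>');
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("tag2")) {
            inTag2 = false;
        } else if (inTag2) {
            text.append("</").append(qName).append('>');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inTag2) {
            text.append(escape(new String(ch, start, length)));
        }
    }

    // Parses the XML string and returns the serialized content of <tag2>.
    static String innerMarkup(String xml) {
        InnerMarkupHandler handler = new InnerMarkupHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.text.toString();
    }
}
```

Note that this reproduces the whitespace of the source as is, so with the indented file from the question you would still have to trim or normalize the line breaks yourself.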

Why are some characters missing when I parse an XML tag using SAXParser?

I am parsing an XML response of almost 90,000 characters in my Android application using SAXParser. The XML looks like the following:
<Registration>
<Client>
<Name>John</Name>
<ID>1</ID>
<Date>2013:08:22T03:43:44</Date>
</Client>
<Client>
<Name>James</Name>
<ID>2</ID>
<Date>2013:08:23T16:28:00</Date>
</Client>
<Client>
<Name>Eric</Name>
<ID>3</ID>
<Date>2013:08:23T19:04:15</Date>
</Client>
.....
</Registration>
Sometimes the parser misses some characters from the Date tag. Instead of returning 2013:08:23T19:04:15 it returns 2013:08:23T. I tried stripping all whitespace from the response XML string using the following line of code:
responseStr = responseStr.replaceAll("\\s","");
But then I get the following exception:
Parsing exception: org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 16: not well-formed (invalid token)
This is the code I am using for parsing:
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
public void startElement(String uri, String localName,String qName, Attributes attributes) throws SAXException {
tagName = qName;
}
public void endElement(String uri, String localName, String qName) throws SAXException {
}
public void characters(char ch[], int start, int length) throws SAXException {
if(tagName.equals("Name")){
obj = new RegisteredUser();
String str = new String(ch, start, length);
obj.setName(str);
}else if(tagName.equals("ID")){
String str = new String(ch, start, length);
obj.setId(str);
}else if(tagName.equals("Date")){
String str = new String(ch, start, length);
obj.setDate(str);
users.add(obj);
}
}
public void startDocument() throws SAXException {
System.out.println("document started");
}
public void endDocument() throws SAXException {
System.out.println("document ended");
}
};
saxParser.parse(new InputSource(new StringReader(resp)), handler);
}catch(Exception e){
System.out.println("Parsing exception: "+e);
System.out.println("exception");
}
Any idea why the parser is skipping characters from a tag, and how I can solve this problem? Thanks in advance.
It's possible that characters is called more than once for any given text node.
In that case you'll have to concatenate the result yourself!
This happens when the parser's internal buffer runs out while there is still content left in the text node. Instead of enlarging the buffer (which could require a lot of memory when the text node is large), the parser lets the client code handle it.
You want something like this:
StringBuilder textContent = new StringBuilder();
public void startElement(String uri, String localName,String qName, Attributes attributes) throws SAXException {
tagName = qName;
textContent.setLength(0);
}
public void characters(char ch[], int start, int length) throws SAXException {
textContent.append(ch, start, length);
}
public void endElement(String uri, String localName, String qName) throws SAXException {
String text = textContent.toString();
// handle text here
}
Of course this code can be improved to only track the text content for nodes you actually care about.
As others mentioned, the characters method may be called multiple times. It's up to the SAX parser implementation whether to return all contiguous character data in a single chunk or to split it into several chunks.
See the docs for the SAX parser's characters method.
You're incorrectly assuming that all the characters in a text node will be read at once and sent to the characters() method. It's not the case. The characters() method can be called multiple times for a single text node.
You should append all the chars to a StringBuilder and then only convert to a String or Date when endElement() is called.
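Applied to the XML from the question, that might look roughly like the sketch below. RegisteredUser here is a simplified stand-in with plain fields (the real class from the question has setters), and the ClientHandler name and parse helper are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Simplified stand-in for the RegisteredUser class from the question.
class RegisteredUser {
    String name;
    String id;
    String date;
}

// Hypothetical handler: buffers text in characters() and only uses it in endElement().
class ClientHandler extends DefaultHandler {
    final List<RegisteredUser> users = new ArrayList<>();
    private final StringBuilder textContent = new StringBuilder();
    private RegisteredUser current;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        textContent.setLength(0); // start a fresh buffer for this element
        if (qName.equals("Client")) {
            current = new RegisteredUser();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node, so always append.
        textContent.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (current == null) {
            return; // text outside of any <Client> is ignored
        }
        String text = textContent.toString().trim();
        if (qName.equals("Name")) {
            current.name = text;
        } else if (qName.equals("ID")) {
            current.id = text;
        } else if (qName.equals("Date")) {
            current.date = text;
        } else if (qName.equals("Client")) {
            users.add(current); // the user is complete once </Client> is reached
            current = null;
        }
    }

    // Parses the XML string and returns the users found in it.
    static List<RegisteredUser> parse(String xml) {
        ClientHandler handler = new ClientHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.users;
    }
}
```

Adding the user at </Client> instead of at the Date text also means a user is never added twice if the date arrives in two chunks.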

xml parsing using SAXParser

I am working on an application that uses SAX parsing. To get the city and state name from a latitude and longitude, I'm using the Google API.
I want to get the long_name, short_name and type values of the address_component element.
I am getting all the other information from this XML successfully, but there is a problem with the type tag: there are two type tags in this element, and I always get only the second one's value.
Sample XML:
<address_component>
<long_name>Gujarat</long_name>
<short_name>Gujarat</short_name>
<type>administrative_area_level_1</type>
<type>political</type>
</address_component>
How can I get both type values, administrative_area_level_1 as well as political?
I came across the following link, which makes it really easy to get started:
http://javarevisited.blogspot.com/2011/12/parse-read-xml-file-java-sax-parser.html
I put the data into a file named location.xml (if you get it from the web, use your own logic to fetch it, convert it into an InputStream, and pass it to the following code). I wrote this method to parse it:
public void ReadAndWriteXMLFileUsingSAXParser(){
try
{
DefaultHandler handler = new MyHandler();
// parseXmlFile("infilename.xml", handler, true);
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
InputStream rStream = null;
rStream = getClass().getResourceAsStream("location.xml");
saxParser.parse(rStream, handler);
}catch (Exception e)
{
System.out.println(e.getMessage());
}
}
This is the MyHandler class. Your final data is stored in a Vector called "data".
class MyHandler extends DefaultHandler {
String rootname;Attributes atr;
private boolean flag=false;private Vector data;
public void startElement(String namespaceURI, String localName,
String qName, Attributes atts) {
rootname=localName;
atr=atts;
if(rootname.equalsIgnoreCase("address_component")){
data=new Vector();
flag=true;
}
}
public void characters(char[] ch, int start, int length){
String value=new String(ch,start,length);
if(flag)
{
if(rootname.equalsIgnoreCase("type")){
data.addElement(value) ;
System.out.println("++++++++++++++"+value);
}
if(rootname.equalsIgnoreCase("long_name")){
data.addElement(value) ;
System.out.println("++++++++++++++"+value);
}
if(rootname.equalsIgnoreCase("short_name")){
data.addElement(value) ;
System.out.println("++++++++++++++"+value);
}
}
}
public void endElement(String uri, String localName, String qName){
rootname=localName;
if(rootname.equalsIgnoreCase("address_component")){
flag=false;
}
}
}
You can find all the data in the data vector, and it is also printed on the console as:
++++++++++++++Gujarat
++++++++++++++Gujarat
++++++++++++++administrative_area_level_1
++++++++++++++political
Read this tutorial. It will help you parse an XML file using a SAX parser.
