xml parsing using SAXParser - java

I am working on an application that uses SAX parsing. To get the city and state name from a latitude and longitude, I'm using the Google API.
I want to get the long_name, short_name and type values from the address_component element.
I am getting all the other information from this XML successfully, but the problem is with the type tag value. There are two type tags in this element, and I always get only the second one.
Sample XML:
<address_component>
<long_name>Gujarat</long_name>
<short_name>Gujarat</short_name>
<type>administrative_area_level_1</type>
<type>political</type>
</address_component>
How can I get both type tag values, administrative_area_level_1 as well as political?

I came across the following link, which is an easy way to get started:
http://javarevisited.blogspot.com/2011/12/parse-read-xml-file-java-sax-parser.html

I saved the data in a file named location.xml. (If you fetch it from the web, download the response with your own logic, wrap it in an InputStream, and pass that to the code below.) I wrote the following method to parse it:
public void ReadAndWriteXMLFileUsingSAXParser(){
try
{
DefaultHandler handler = new MyHandler();
// parseXmlFile("infilename.xml", handler, true);
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
InputStream rStream = null;
rStream = getClass().getResourceAsStream("location.xml");
saxParser.parse(rStream, handler);
}catch (Exception e)
{
System.out.println(e.getMessage());
}
}
This is the MyHandler class. The final data is stored in a Vector called "data".
class MyHandler extends DefaultHandler {
    // Name of the element currently open, or null between elements.
    private String current;
    // True while inside an <address_component> element.
    private boolean flag = false;
    // Collected long_name, short_name and type values.
    private Vector<String> data;

    public void startElement(String namespaceURI, String localName,
            String qName, Attributes atts) {
        // localName is empty when the parser is not namespace-aware, so fall back to qName.
        current = localName.isEmpty() ? qName : localName;
        if (current.equalsIgnoreCase("address_component")) {
            data = new Vector<String>();
            flag = true;
        }
    }

    public void characters(char[] ch, int start, int length) {
        if (!flag || current == null) {
            return;
        }
        String value = new String(ch, start, length);
        // Each <type> element triggers its own call, so both values are added.
        if (current.equalsIgnoreCase("type")
                || current.equalsIgnoreCase("long_name")
                || current.equalsIgnoreCase("short_name")) {
            data.addElement(value);
            System.out.println("++++++++++++++" + value);
        }
    }

    public void endElement(String uri, String localName, String qName) {
        String name = localName.isEmpty() ? qName : localName;
        if (name.equalsIgnoreCase("address_component")) {
            flag = false;
        }
        // Reset so that whitespace between elements is not recorded as a value.
        current = null;
    }
}
All the data ends up in the data vector, and it is also printed to the console as:
++++++++++++++Gujarat
++++++++++++++Gujarat
++++++++++++++administrative_area_level_1
++++++++++++++political

Read this tutorial. This will help you to parse xml file using sax parser.

Related

Create hierarchy in Sax Parser JAVA

I have done a sax parser that parses a xml file and prints the tags on the console.
The problem is that they don't follow a hierarchy.
Look at this:
-------------------<GOT>
-------------------<character>
-------------------<id>
-------------------<name>
----------------------->Arya Stark
-------------------<gender>
----------------------->Female
-------------------<culture>
----------------------->Northmen
-------------------<born>
----------------------->In 289 AC, at Winterfell
-------------------<died>
-------------------<alive>
----------------------->TRUE
-------------------<titles>
-------------------<title>
----------------------->Princess
For example, character and id are on the same level. Any idea on how to detect if a tag is a child of another?
Thanks!
public class Sax extends DefaultHandler {
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
System.out.println("-------------------<" + qName + ">");
}
public void characters(char ch[], int start, int length)
throws SAXException {
if( new String(ch,start,length).matches(".*[a-zA-Z0-9]+.*")){
System.out.println("----------------------->" + new String(ch, start, length));
} else {
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
System.out.println("</" + qName + ">");
}
}
This is the code of the sax parser, I need to know a way to detect if a tag has a child.
I am currently reading about sax parser, so if I find out I will post it!
package sax;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
public class ParseXMLFileSax {
private static final String xmlFilePath = "got.xml";
public static void main(String argv[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
saxParser.parse(xmlFilePath, new Sax());
} catch (Exception e) {
e.printStackTrace();
}
}
}
This class creates the parser and runs it with the Sax handler class.
SAX is just a stream of events, so you have to maintain state in your handler to implement the logic you want. E.g. the answers to this question use a bunch of boolean flags:
How can I parse nested elements in SAX Parser in java?
It is not clear from your question what exactly your goal is.
If you just want to indent the tags in the output, you could keep an integer variable for the indentation level, incrementing it on element start and decrementing it on element end.
Try to find some tutorial and follow it, e.g. here https://www.informit.com/articles/article.aspx?p=26351&seqNum=5
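The depth counter described above can be sketched roughly as follows. This is only a minimal illustration, not the asker's code: the class name IndentingHandler and the render helper are made up for the example, and it assumes the text of an element arrives in a single characters() call, which a parser does not guarantee for larger documents.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tracks the nesting depth and indents the output accordingly.
class IndentingHandler extends DefaultHandler {
    private final StringBuilder out = new StringBuilder();
    private int depth = 0; // number of elements currently open

    private void indent() {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        indent();
        out.append('<').append(qName).append(">\n");
        depth++; // children of this element are one level deeper
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) { // skip whitespace-only chunks between tags
            indent();
            out.append(text).append('\n');
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--; // back to the parent's level
        indent();
        out.append("</").append(qName).append(">\n");
    }

    // Parses the given XML string and returns the indented rendering.
    static String render(String xml) throws Exception {
        IndentingHandler handler = new IndentingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.print(render("<GOT><character><name>Arya Stark</name></character></GOT>"));
    }
}
```

If you need to know which element is the parent rather than just how deep you are, the same idea works with a stack of element names instead of a counter: push in startElement, pop in endElement.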

How do I Ignore Data Contained within XML Comments Using a JAXP SAX Parser?

I have a large XML file containing many key value pairs. The file contains both multi-line comments and actual data. Within the comments section, there are examples of how the data/key-value pairs should be arranged. The SAX parser that I have made successfully retrieves the keys and values from the file, but it also reads the example keys/values contained within the comments, which I do not want to happen. How can I make it so my SAX parser ignores everything within the comments section? I am not allowed to edit the file and I must use java.
Below is an example of the file that I am working with. Notice how there are data tags within the comment section. I do not want to read the sample data within these tags, but my parser records them anyways.
<?xml version="1.0" encoding="utf-8"?>
<root>
<!--
Microsoft ResX Schema
Version 2.0
The primary goals of this format is to allow a simple XML format
that is mostly human readable. The generation and parsing of the
various data types are done through the TypeConverter classes
associated with the data types.
Example:
... ado.net/XML headers & schema ...
<resheader name="resmimetype">text/microsoft-resx</resheader>
<resheader name="version">2.0</resheader>
<resheader name="reader">System.Resources.ResXResourceReader, System.Windows.Forms, ...</resheader>
<resheader name="writer">System.Resources.ResXResourceWriter, System.Windows.Forms, ...</resheader>
<data name="Name1"><value>this is my long string</value><comment>this is a comment</comment></data>
<data name="Color1" type="System.Drawing.Color, System.Drawing">Blue</data> **I DO NOT WANT TO READ THIS**
<data name="Bitmap1" mimetype="application/x-microsoft.net.object.binary.base64">
<value>[base64 mime encoded serialized .NET Framework object]</value>
</data>
<data name="Icon1" type="System.Drawing.Icon, System.Drawing" mimetype="application/x-microsoft.net.object.bytearray.base64">
<value>[base64 mime encoded string representing a byte array form of the .NET Framework object]</value>
<comment>This is a comment</comment>
</data>
-->
<resheader name="reader">
<value>System.Resources.ResXResourceReader, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<data name="AmountUnits" xml:space="preserve">
<value>Amount/Units</value>
</data>
</root>
Here is the code that I am using:
public class xmlPropertiesBuilder extends DefaultHandler {
private boolean valueFound;
public void readXMLFile(File xmlFile) throws SAXException, IOException, ParserConfigurationException {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
parser.parse(xmlFile, this);
valueFound = false;
}
@Override
public void startDocument() throws SAXException {
System.out.println("Start Document");
}
@Override
public void endDocument() throws SAXException {
System.out.println("End Document");
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if(qName.equals("data")){
System.out.println("Start Element: " + qName);
System.out.println("Key: " + attributes.getValue("name"));
} else if(qName.equals("value")){
valueFound = true;
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if(qName.equals("data")){
System.out.println("End Element: " + qName + "\n");
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
if(valueFound){
System.out.println("Value: " + new String(ch, start, length));
valueFound = false;
}
}
}
Well, it looks like the JAXP SAX parser does in fact ignore data contained within comments. I was just interpreting my tests wrong. My real XML file has more tags than the simplified example I first tested with, one of which is <resheader>. These resheader tags also contain a <value> tag, which my parser was picking up (I assumed those values came from the comments, but they turned out to be from the resheaders).
I was able to fix my problem by adding a boolean variable called 'dataFound' that is only set to true when a <data> tag is found. Then, within my characters method, I simply changed the if condition from if(valueFound){...} to if(dataFound && valueFound){...}. Finally, within the endElement() method, I set the 'dataFound' variable to false whenever a </data> tag is found.
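A minimal sketch of that fix is shown below. The class name DataOnlyHandler, the read helper and the map of results are assumptions made for the example; only the dataFound/valueFound logic comes from the description above. Unlike the original handler, it buffers text in a StringBuilder, because characters() may be called more than once per element.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: records <value> text only when it sits inside a <data> element.
class DataOnlyHandler extends DefaultHandler {
    private final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private boolean dataFound;  // true between <data> and </data>
    private boolean valueFound; // true between <value> and </value>
    private String currentKey;  // name attribute of the enclosing <data>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("data")) {
            dataFound = true;
            currentKey = attributes.getValue("name");
        } else if (qName.equals("value")) {
            valueFound = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // <value> inside <resheader> is skipped because dataFound is false there.
        if (dataFound && valueFound) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("value")) {
            if (dataFound) {
                values.put(currentKey, text.toString());
            }
            valueFound = false;
        } else if (qName.equals("data")) {
            dataFound = false;
        }
    }

    // Parses the given XML string and returns key/value pairs found in <data> elements.
    static Map<String, String> read(String xml) throws Exception {
        DataOnlyHandler handler = new DataOnlyHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.values;
    }
}
```

Markup inside an XML comment never reaches startElement or characters, so the commented-out sample data needs no special handling.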

How to improve performance of querying xml file with VTD-XML and XPath?

I am querying XML files with size of around 1 MB(20k+ lines). I am using XPath to describe what I want to get and VTD-XML library to get it. I think that I have some problems with performance.
The problem is, I am making about 5k+ queries to XML file. It takes approximately 16-17 seconds to retrieve all values. I want to ask you, if this is normal performance for such task? How I can improve it?
I am using VTD-XML library with AutoPilot navigation approach which give me opportunity to use XPath. Implementation is as following:
private VTDGen vg = new VTDGen();
private VTDNav vn;
private AutoPilot ap = new AutoPilot();
public void init(String xml) {
log.info("Creating document");
xml = xml.replace("<?xml version=\"1.0\"?>", "<?xml version=\"1.0\" encoding=\"UTF-8\"?>");
byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
vg.setDoc(bytes);
try {
vg.parse(true);
vn = vg.getNav();
} catch (ParseException e) {
e.printStackTrace();
}
log.info("Document created");
}
public String parseXmlOrReturnNull(String query) {
String xPathStringVal = null;
try {
ap.selectXPath(query);
ap.bind(vn);
int i = -1;
while ((i = ap.evalXPath()) != -1) {
xPathStringVal = vn.getXPathStringVal();
}
}catch (XPathEvalException e) {
e.printStackTrace();
} catch (NavException e) {
e.printStackTrace();
} catch (XPathParseException e) {
e.printStackTrace();
}
return xPathStringVal;
}
My xml files have specific format, they are divided into lot of parts - segments, and my queries are same for all segments(I am querying it in a loop). For example part of xml:
<segment>
<a>
<b>value1</b>
<c>
<d>value2</d>
<e>value3</e>
</c>
</a>
</segment>
<segment>
<a>
<b>value4</b>
<c>
<d>value5</d>
<e>value6</e>
<f>value6</f>
</c>
</a>
</segment>
...
If I want to get value1 in first segment I am using query:
//segment[1]/a/b
for value 4 in second segment
//segment[2]/a/b
etc.
My intuition says a few things: in my approach every query is independent (it knows nothing about the other queries), which means that AutoPilot, my iterator, always starts at the beginning of the file when I query it.
My question is: is there any way to position AutoPilot at the beginning of the segment being processed, and to move it to the next segment when I finish querying? I think that if my method started searching from a specified point rather than from the beginning, it would be much faster.
Another way is to divide xml file into small xml files (one xml file = one segment) and querying those small xml files.
What do you think guys? Thanks in advance
Minor: the replace is not needed, as UTF-8 is the default encoding; you would only need to patch the declaration when it names a different encoding.
The XPath should be evaluated only once, so that each query does not start again from the first segment to reach the next index.
If you need a List representation, you could use JAXB with annotations.
Event-based parsing without building a DOM object (SAXParser) is probably best.
DefaultHandler handler = new org.xml.sax.helpers.DefaultHandler() {
    @Override
    public void startElement(String uri,
            String localName, String qName, Attributes attributes) throws SAXException {
    }
    @Override
    public void endElement(String uri,
            String localName, String qName) throws SAXException {
    }
    @Override
    public void characters(char ch[], int start, int length) throws SAXException {
    }
};
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
InputStream in = new ByteArrayInputStream(bytes);
parser.parse(in, handler);
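The empty handler above could be filled in along these lines to collect every segment/a/b value in a single pass. This is a sketch under assumptions: the class name SegmentHandler and the collect helper are invented for the example, and the segments are assumed to sit under a single root element, which the question's excerpt does not show.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: gathers the text of every segment/a/b element in one pass.
class SegmentHandler extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>(); // names of the open elements
    private final List<String> bValues = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        path.addLast(qName);
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The path is checked before popping, so it still includes the closing element.
        if (String.join("/", path).endsWith("segment/a/b")) {
            bValues.add(text.toString().trim());
        }
        path.removeLast();
    }

    // Parses the XML string once and returns the b values, one per segment, in document order.
    static List<String> collect(String xml) throws Exception {
        SegmentHandler handler = new SegmentHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.bValues;
    }
}
```

The values come back in document order, so the entry at index 0 corresponds to //segment[1]/a/b, index 1 to //segment[2]/a/b, and so on, without re-scanning the file for each query.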

Get schema location from XML file (noNamespaceSchemaLocation)

We are parsing an XML file with the SAX parser. Is it possible to get the schema location from the XML?
<view id="..." title="..."
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="{schema}">
I want to retrieve the {schema} value from the XML. Is this possible? And how do I access the value of noNamespaceSchemaLocation? I'm using the default SAX parser.
@Override
public void startElement(String uri, String localName,
String name, Attributes attributes)
{ .... }
Thank you.
It all depends on what kind of tool/library you are working with (a basic SAXParser? Xerces? JDom? ...). But what you want is the value of the attribute "noNamespaceSchemaLocation" in the namespace defined by the URI "http://www.w3.org/2001/XMLSchema-instance".
In JDom, it would be something like:
Element view = ...; // get the view element
String value = view.getAttributeValue("noNamespaceSchemaLocation", Namespace.getNamespace("http://www.w3.org/2001/XMLSchema-instance"));
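With the plain JAXP SAX parser, the same lookup can be done inside startElement, provided the factory is made namespace-aware so that attributes carry their namespace URI. The following is a minimal sketch; the class name SchemaLocationHandler and the extract helper are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: reads xsi:noNamespaceSchemaLocation from the root element.
class SchemaLocationHandler extends DefaultHandler {
    private String schemaLocation; // stays null if the attribute is absent
    private boolean rootSeen;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (!rootSeen) {
            rootSeen = true;
            // Look the attribute up by namespace URI and local name, not by the "xsi" prefix.
            schemaLocation = attributes.getValue(
                    XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI, "noNamespaceSchemaLocation");
        }
    }

    // Parses the XML string and returns the schema location, or null if none is declared.
    static String extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // required for the URI-based attribute lookup
        SchemaLocationHandler handler = new SchemaLocationHandler();
        factory.newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.schemaLocation;
    }
}
```

Note that this still parses the whole document; it only stops looking after the root element.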
Here is how I get the XSD's name using XMLStreamReader:
public static String extractXsdValueOrNull(@NonNull final InputStream xmlInput)
{
final XMLInputFactory f = XMLInputFactory.newInstance();
try
{
final XMLStreamReader r = f.createXMLStreamReader(xmlInput);
while (r.hasNext())
{
final int eventType = r.next();
if (XMLStreamReader.START_ELEMENT == eventType)
{
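// Valid attribute indexes run from 0 to getAttributeCount() - 1.
for (int i = 0; i < r.getAttributeCount(); i++)
{
final boolean foundSchemaNameSpace = XMLConstants.W3C_XML_SCHEMA_INSTANCE_NS_URI.equals(r.getAttributeNamespace(i));
// SCHEMA_LOCATION is a String constant defined elsewhere (not shown), presumably "noNamespaceSchemaLocation".
final boolean foundLocationAttributeName = SCHEMA_LOCATION.equals(r.getAttributeLocalName(i));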
if (foundSchemaNameSpace && foundLocationAttributeName)
{
return r.getAttributeValue(i);
}
}
return null; // only checked the first element
}
}
return null;
}
catch (final XMLStreamException e)
{
throw new RuntimeException(e);
}
}
Actually XMLStreamReader does all the magic, namely:
only parses the XML's beginning (not the whole XML)
does not assume a particular namespace alias (i.e. xsi)

SAX parsing problem in Android... empty elements?

I am using SAX to parse an XML file I'm pulling from the web. I've extended DefaultHandler with code similar to:
public class ArrivalHandler extends DefaultHandler {
@Override
public void startElement(String namespaceUri, String localName, String qualifiedName, Attributes attributes) throws SAXException {
if (qualifiedName.equalsIgnoreCase("resultSet")) {
System.out.println("got a resultset");
} else if (qualifiedName.equalsIgnoreCase("location")) {
System.out.println("got a location");
} else if (qualifiedName.equalsIgnoreCase("arrival")) {
System.out.println("got an arrival");
} else {
System.out.println("There was an unknown XML element encountered: '" + qualifiedName + "'");
}
}
@Override
public void endElement(String namespaceUri, String localName, String qualifiedName) throws SAXException {
// we'll just ignore this for now
}
@Override
public void characters(char[] chars, int startIndex, int length) throws SAXException {
// ignore this too
}
}
The problem I'm having is that I'm just getting a series of empty elements. The log reads:
There was an unknown XML element encountered: ''
There was an unknown XML element encountered: ''
There was an unknown XML element encountered: ''
etc
This worked fine when I was just passing parser.parse a local file, but now I'm pulling it from the web with:
HttpClient httpClient = new DefaultHttpClient();
HttpResponse resp = httpClient.execute(new HttpGet("http://example.com/whatever"));
SAXParserFactory saxFactory = SAXParserFactory.newInstance();
ArrivalHandler handler = new ArrivalHandler();
SAXParser parser = saxFactory.newSAXParser();
parser.parse(resp.getEntity().getContent(), handler);
and I get the (apparently) empty results described above.
What I've looked into so far:
I converted the InputStream from resp.getEntity().getContent() to a string and dumped it out and it looks like I'm getting the XML from the server correctly.
There are no exceptions thrown but there is a warning that reads "W/ExpatReader(232): DTD handlers aren't supported.".
Any other ideas for what I'm doing incorrectly or how to debug this?
From the docs for ContentHandler.startElement:
the qualified name is required when
the namespace-prefixes property is
true, and is optional when the
namespace-prefixes property is false
(the default).
So, do you have the namespace-prefixes property set to true?
Can you just cope with the uri and localName instead?
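One way to cope with either setup is to prefer localName and fall back to qName when it is empty, so the handler works whichever of the two the parser fills in. The sketch below is an illustration only; the class name NameFallbackHandler and its helper methods are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: resolves the element name from whichever argument the parser supplies.
class NameFallbackHandler extends DefaultHandler {
    final List<String> names = new ArrayList<>();

    // Prefer localName; fall back to qName when the parser leaves localName empty.
    static String elementName(String localName, String qName) {
        return (localName != null && !localName.isEmpty()) ? localName : qName;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        names.add(elementName(localName, qName));
    }

    // Parses the XML string with or without namespace awareness and returns the element names seen.
    static List<String> parse(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        NameFallbackHandler handler = new NameFallbackHandler();
        factory.newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.names;
    }
}
```

With a prefixed element such as ns:arrival the two values differ (localName is "arrival", qName is "ns:arrival"), so decide which form your comparisons expect.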
