Jumping between XML tags - java

This is a question about SAX.
I want to process the child tags in an XML file only if they are under a matching parent tag.
For example:
<version>
<parent tag-1>
<tag 1>
<tag 2>
</parent tag-1 >
<parent tag-2>
<tag 1>
<tag 2>
</parent tag-2>
</version>
In the above code, I want to match the parent tag first (i.e. parent tag-1 or parent tag-2, based on user input) and only then process the children tags under it.
Can this be done in SAX parser, keeping in mind that SAX has limited control over DOM and that I am a novice in both SAX and Java? If so, could you please quote the corresponding method?
TIA

Surely, it can be done easily by remembering the parent tag.
In general, when parsing XML tags, people use a stack to keep track of the ancestry of those tags. Your case can be solved easily with the following code:
Stack<Tag> tagStack = new Stack<Tag>();
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    if (localName.toLowerCase().equals("parent")) {
        tagStack.push(new ParentTag());
    } else if (localName.toLowerCase().equals("tag")) {
        // isEmpty() guards against peek() throwing EmptyStackException
        if (!tagStack.isEmpty() && tagStack.peek() instanceof ParentTag) {
            // do your things here only when the parent tag is "parent"
        }
    }
}
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    if (localName.toLowerCase().equals("parent")) {
        tagStack.pop();
    }
}
Or you can simply remember which tag you are in by updating tagName:
String tagName = null;
public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {
    if (localName.toLowerCase().equals("parent")) {
        tagName = "parent";
    } else if (localName.toLowerCase().equals("tag")) {
        if (tagName != null && tagName.equals("parent")) {
            // do your things here only when the parent tag is "parent"
        }
    }
}
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    // reset only when the parent closes, so later sibling tags are still processed
    if (localName.toLowerCase().equals("parent")) {
        tagName = null;
    }
}
But I prefer the stack way, because it keeps track of all your ancestor tags.
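As a self-contained illustration of the stack idea, here is a minimal sketch. The element names (version, parent1, parent2, tag1, ...) are hypothetical stand-ins, because the names in the question contain spaces and would not be legal XML. The handler name and the childrenOf helper are also made up for this example. It compares qName, which works with the default (non-namespace-aware) parser.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Collects the names of elements that are direct children of one chosen parent. */
class ChildrenOfParentHandler extends DefaultHandler {
    private final String wantedParent;                      // e.g. taken from user input
    private final Deque<String> open = new ArrayDeque<>();  // names of currently open elements
    final List<String> matched = new ArrayList<>();

    ChildrenOfParentHandler(String wantedParent) {
        this.wantedParent = wantedParent;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // peek() returns null on an empty deque, so the root element is handled safely
        if (wantedParent.equals(open.peek())) {
            matched.add(qName);   // "process" the child here
        }
        open.push(qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
    }

    /** Hypothetical helper: parses the XML string and returns the matched child names. */
    static List<String> childrenOf(String xml, String parent) throws Exception {
        ChildrenOfParentHandler handler = new ChildrenOfParentHandler(parent);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.matched;
    }
}
```

Pushing every element (not only the parents) keeps the stack in step with the document, so the top of the stack is always the immediate parent of the element being started.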

SAX is going to spool through the entire document anyway, if you're looking at doing this for performance reasons.
However, from a code niceness perspective, you could have the SAX parser not return the non-matching children, by wiring it up with an XMLFilter. You'd probably still have to write the logic yourself - something like that provided in Wing C. Chen's post - but instead of putting it in your application logic you could abstract it out into a filter implementation.
This would let you reuse the filtering logic more easily, and it would probably make your application code cleaner and easier to follow.
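A rough sketch of such a filter, built on the standard org.xml.sax.helpers.XMLFilterImpl, might look like the following. The class name InsideParentFilter, the elementsSeen helper and the element names in the usage are assumptions made for illustration. The filter forwards only the events that occur inside the chosen parent, so the application handler never sees anything else.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

/** Forwards only the events that occur inside one chosen parent element. */
class InsideParentFilter extends XMLFilterImpl {
    private final String wantedParent;
    private int depthInside = 0;   // 1 = on the chosen parent itself, > 1 = inside a descendant

    InsideParentFilter(XMLReader parent, String wantedParent) {
        super(parent);
        this.wantedParent = wantedParent;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (depthInside > 0) {
            depthInside++;
            super.startElement(uri, localName, qName, atts);  // pass descendants through
        } else if (qName.equals(wantedParent)) {
            depthInside = 1;                                  // entered the chosen parent
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (depthInside > 1) {
            super.endElement(uri, localName, qName);
        }
        if (depthInside > 0) {
            depthInside--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (depthInside > 1) {
            super.characters(ch, start, length);
        }
    }

    /** Hypothetical helper: returns the element names the application handler gets to see. */
    static List<String> elementsSeen(String xml, String parent) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        InsideParentFilter filter = new InsideParentFilter(reader, parent);
        List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes a) {
                seen.add(qName);   // application logic stays free of filtering code
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }
}
```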

The solution proposed by @Wing C. Chen is more than decent, but in your case, I wouldn't use a stack.
A use case for a stack when parsing XML
A common use case for a stack with XML is, for example, verifying that XML tags are balanced when using your own lexer (i.e. a hand-made XML parser with error tolerance).
A concrete example of it would be building the outline of an XML document for the Eclipse IDE.
When to use SAX, Pull parsers and alike
Memory efficiency when parsing a huge XML file
You don't need to navigate back and forth in the document.
However, using SAX to parse complex documents can become tedious, especially if you want to apply operations to nodes based on some conditions.
When to use DOM-like APIs
You want easy access to the nodes
You want to navigate back and forth in the document at any time
Speed is not the main requirement vs development time/readability/maintenance
My recommendation
If you don't have a huge XML, use a DOM like API and select the nodes with XPath.
I prefer Dom4J personally, but I don't mind other APIs such as JDom or even Xpp3, which has XPath support.
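For comparison, here is a small sketch of the DOM plus XPath route. It uses the DOM and XPath APIs bundled with the JDK rather than Dom4J, to stay free of third-party dependencies. The class name, the helper and the element names (version, parent1, ...) are assumptions, not taken from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathChildren {
    /** Returns the names of all element children of /version/<parent>. */
    static List<String> childrenOf(String xml, String parent) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // The parent name is concatenated into the expression for brevity;
        // validate it first if it really comes from user input.
        String expr = "/version/" + parent + "/*";
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getNodeName());
        }
        return names;
    }
}
```

The whole document is held in memory here, which is the trade-off described above: less code, more memory.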

The SAX parser will call a method in your implementation every time it hits a tag. If you want different behavior depending on the parent, you have to save the parent in a variable.

If you want to jump to particular tags then you would need to use a DOM parser. This will read the entire document into memory and then provide various ways of accessing particular nodes of the tree, such as requesting a tag by name then asking for the children of that tag.
So if you are not restricted to SAX then I would recommend DOM. I think the main reason for using SAX over DOM is that DOM requires more memory since the entire document is loaded at once.

Related

Most efficient way to read and edit an xml file

I have an xml template file, some fields are blank and need to be filled by my application.
This has to result in an xml string representation of that file which will be given to another routine.
So, let's take this simple xml as example:
<root>
<name anAttr=""></name>
<age></age>
</root>
As you can see I'd have to read the xml and, in the parsing process, add some contents to it.
I thought about using a SAX parser, and in the handler I would do something like this:
StringBuilder finalXml = new StringBuilder();
DefaultHandler handler = new DefaultHandler() {
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        finalXml.append("<" + qName + ">");
        if (qName.equals("name")) {
            finalXml.append("donald");
        }
    }
};
Would it be correct/efficient this way? Or is there a better way?
I've used dom4j when I have wanted to parse XML in Java, and it's quite efficient.
If you have a choice of technology, then I would suggest using JAXB.
It will unmarshal the XML into a Java object; make the modifications on that object, and then marshal the modified object into a new XML file.
It has a bit of a learning curve, but the code will be readable and maintainable.
For a basic tutorial of JAXB please refer to URL
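For a small template like the one in the question, another option is to load it into a DOM, fill the blanks and serialize the result to a string. The sketch below uses only JDK classes (not dom4j or JAXB). The class name TemplateFiller and its fill method are hypothetical, and the code assumes the template contains exactly the name and age elements shown in the question.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class TemplateFiller {
    /** Fills <name>, its anAttr attribute and <age>, then returns the XML as a string. */
    static String fill(String templateXml, String name, String anAttr, String age)
            throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(templateXml)));

        Element nameEl = (Element) doc.getElementsByTagName("name").item(0);
        nameEl.setAttribute("anAttr", anAttr);
        nameEl.setTextContent(name);
        doc.getElementsByTagName("age").item(0).setTextContent(age);

        // An identity transform serializes the DOM and takes care of escaping.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

Unlike appending to a StringBuilder by hand, the serializer escapes characters such as & and < in the inserted values and writes the closing tags for you.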

How to ignore similar tag in XML SAX PARSING

I have XML like this one
<OuterTag>
<Name>JAVA
</Name>
<InnerTag>
<Name> PHP
</Name>
</InnerTag>
</OuterTag>
I just want the value that contains "JAVA". But when I parse, it also brings "PHP" because the local names are the same. Is it possible to filter multiple local names and select my desired one? How can I do that?
The idea is to save the state you are in: just use a boolean value, set it to true when you find an open tag for 'OuterTag', and set it to false when you find an open tag for 'InnerTag' (and back to true when 'InnerTag' closes, if more 'Name' tags can follow).
This way, when you find the 'Name' tag you know where you are.
Another more flexible way is to push/pop the tag names when you find them. This way you can check who is your parent tag when you find a 'Name' tag and then get the right value.
If I understand correctly, you want the Name tag under OuterTag and not those under InnerTag. So, this is how I would do it with dom4j:
SAXReader saxReader = new SAXReader();
saxReader.addHandler("OuterTag/Name", new ElementHandler() {
    @Override
    public void onStart(ElementPath arg0) {
        // TODO Auto-generated method stub
    }
    @Override
    public void onEnd(ElementPath arg0) {
        // TODO Auto-generated method stub
    }
});
File inputFile = new File(filename);
saxReader.read(inputFile);
I hope this helps.
SAX parsers typically have hooks where you can write code, specifically startElement, endElement and characters.
Moss has the right answer -
startElement: Push the element name onto a stack.
characters: If the element name is 'Name' and its parent on the stack (the entry just below it) is 'OuterTag', then you found your value. Otherwise, ignore it.
endElement: Pop the element off the stack.
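The three hooks above can be put together roughly as follows. This is a sketch with a made-up class name and helper. It checks the immediate parent rather than any ancestor, because OuterTag is also an ancestor of the Name inside InnerTag. It also buffers the text, since a parser may deliver it in several characters() calls.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Captures the text of a <Name> whose immediate parent is <OuterTag>. */
class DirectNameHandler extends DefaultHandler {
    private final Deque<String> open = new ArrayDeque<>();   // names of open elements
    private final StringBuilder text = new StringBuilder();
    private boolean capturing;
    String value;   // stays null if no matching element was found

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // open.peek() is the parent of the element that is starting now
        capturing = "Name".equals(qName) && "OuterTag".equals(open.peek());
        if (capturing) {
            text.setLength(0);
        }
        open.push(qName);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (capturing) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        open.pop();
        if (capturing && "Name".equals(qName)) {
            value = text.toString().trim();
            capturing = false;
        }
    }

    /** Hypothetical helper: parses the XML string and returns the captured value. */
    static String outerName(String xml) throws Exception {
        DirectNameHandler handler = new DirectNameHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.value;
    }
}
```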
Note that sax parsers are very powerful but sometimes overkill. Very fast, good for parsing malformed xml, or very large XML files, reacting to elements as the parser encounters them.
I would cautiously suggest considering an XPath solution, which does the parsing work for you and lets you easily reference any element. Create an XPath object and query it with something like '/OuterTag/Name[1]'. If you've used jQuery before, you'll be right at home.
However, if your XML is malformed or really large and complicated, this can be very slow. You've been warned.
Just know that XPath is available as a possible solution. http://www.javabeat.net/tips/182-how-to-query-xml-using-xpath.html

Using XPath in XMLObject to query by namespace

I have a simple XML document
<abc:MyForm xmlns:abc='http://myform.com'>
<abc:Forms>
<def:Form1 xmlns:def='http://decform.com'>
....
</def:Form1>
<ghi:Form2 xmlns:ghi='http://ghiform.com'>
....
</ghi:Form2>
</abc:Forms>
</abc:MyForm>
I'm using XmlObject from Apache XMLBeans, and when I try the following XPath expression it works perfectly:
object.selectPath("declare namespace abc='http://myform.com' "
        + "abc:Form/abc:Forms/*");
This gives me the 2 Form nodes (def and ghi). However, I want to be able to query by specifying a namespace, so let's say I only want Form2. I've tried this and it fails:
object.selectPath("declare namespace abc='http://myform.com' "
        + "abc:Form/abc:Forms/*"
        + "[namespace-uri() = 'http://ghiform.com']");
The selectPath returns 0 nodes. Does anyone know what is going on?
Update:
If I do the following in 2 steps, then I can get the result that I want.
XmlObject forms = object.selectPath("declare namespace abc='http://myform.com' "
        + "abc:Form/abc:Forms")[0];
forms.selectPath("*[namespace-uri() = 'http://ghiform.com']");
This gives me the ghi:Form2 node, just like it should. I don't understand why it doesn't work as a single XPath expression, though.
Thanks
The simple answer is that you can't. The namespace prefix is just a shorthand for the namespace URI, which is all that matters.
For a namespace-aware parser, your two tags are identical.
If you really want to differentiate using the prefix (although you really, really shouldn't be doing it), you can use a non namespace-aware parser and just treat the prefix as if it was part of the element name.
But ideally you should read a tutorial on how namespaces work and try to use them as they were designed to be used.
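To illustrate matching on the namespace URI rather than the prefix, here is a sketch that runs a namespace-uri() predicate with the XPath 1.0 engine bundled with the JDK. It does not use XMLBeans, so it says nothing about why selectPath behaved as described in the question. The class and method names are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NamespaceUriQuery {
    /** Returns the local names of the children of MyForm/Forms that live in nsUri. */
    static List<String> formsInNamespace(String xml, String nsUri) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);   // without this, namespace-uri() has nothing to report
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // local-name() avoids having to register a NamespaceContext for the abc prefix.
        String expr = "/*[local-name()='MyForm']/*[local-name()='Forms']"
                + "/*[namespace-uri()='" + nsUri + "']";
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getLocalName());
        }
        return names;
    }
}
```

Because the predicate compares URIs, the result is the same whichever prefix the document happens to bind to that namespace.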

How to change values of some elements and attributes in an XML file [Java]?

I'm reading an XML file with a SAX parser (this part can be changed if there's a good reason for it).
When I find necessary properties I need to change their values and save the resulting XML-file as a new file.
How can I do that?
AFAIK, SAX is a parser only. You must choose a different library to write XML.
If you are only changing attributes or element names and NOT changing the structure of the XML, then this should be a relatively easy task. Use StAX as a writer:
// Start StAX
OutputStream out = new FileOutputStream("data.xml");
XMLOutputFactory factory = XMLOutputFactory.newInstance();
XMLStreamWriter writer = factory.createXMLStreamWriter(out);
Now, extend the SAX DefaultHandler:
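public void startDocument() throws SAXException {
    try {
        writer.writeStartDocument("UTF-8", "1.0");
    } catch (XMLStreamException e) {
        throw new SAXException(e);
    }
}
public void startElement(String namespaceURI, String localName,
        String qName, Attributes atts) throws SAXException {
    try {
        // qName is used because localName can be empty without namespace processing
        writer.writeStartElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            writer.writeAttribute(atts.getQName(i), atts.getValue(i));
        }
    } catch (XMLStreamException e) {
        throw new SAXException(e);
    }
}
public void endElement(String uri, String localName, String qName)
        throws SAXException {
    try {
        writer.writeEndElement();
    } catch (XMLStreamException e) {
        throw new SAXException(e);
    }
}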
If your document is relatively small, I'd recommend using JDOM. You can instantiate a SAXBuilder to create the Document from an InputStream, then use XPath to find the nodes/attributes you want to change, make your modifications, and then use XMLOutputter to write the modified document back out.
On the other hand, if your document is too large to effectively hold in memory (or you'd prefer not to use a third-party library), you'll want to stick with the SAX parser, streaming the nodes out to disk as you read them and making any changes on the way.
You may also want to take a look at XSLT.
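As one more streaming option, the StAX event API in the JDK can copy a document event by event and change values on the way, without holding the whole file in memory. The sketch below swaps in XMLEventReader/XMLEventWriter instead of a SAX handler. The class name, the replaceText helper and the element names in the usage are assumptions, and it only replaces element text (attributes would need a rebuilt StartElement event).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.events.XMLEvent;

class StreamingEditor {
    /** Copies the XML, replacing the content of every <elementName> with newText. */
    static String replaceText(String xml, String elementName, String newText)
            throws Exception {
        XMLEventReader reader = XMLInputFactory.newInstance()
                .createXMLEventReader(new StringReader(xml));
        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newInstance();

        boolean inside = false;   // true while we are inside the element being rewritten
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                inside = true;
                writer.add(event);
                writer.add(events.createCharacters(newText));   // the new value
            } else if (event.isEndElement()
                    && event.asEndElement().getName().getLocalPart().equals(elementName)) {
                inside = false;
                writer.add(event);
            } else if (!inside) {
                writer.add(event);   // everything else is copied unchanged
            }
            // events inside the rewritten element (its old content) are dropped
        }
        writer.close();
        return out.toString();
    }
}
```

For real files, the StringReader and StringWriter would be replaced by file streams so the output goes to a new file as it is read.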

How can I parse a namespace using the SAX parser?

Using a Twitter search URL, i.e. http://search.twitter.com/search.rss?q=android, returns RSS that has an item that looks like:
<item>
<title>#UberTwiter still waiting for #ubertwitter android app!!!</title>
<link>http://twitter.com/meals69/statuses/21158076391</link>
<description>still waiting for an app!!!</description>
<pubDate>Sat, 14 Aug 2010 15:33:44 +0000</pubDate>
<guid>http://twitter.com/meals69/statuses/21158076391</guid>
<author>Some Twitter User</author>
<media:content type="image/jpg" height="48" width="48" url="http://a1.twimg.com/profile_images/756343289/me2_normal.jpg"/>
<google:image_link>http://a1.twimg.com/profile_images/756343289/me2_normal.jpg</google:image_link>
<twitter:metadata>
<twitter:result_type>recent</twitter:result_type>
</twitter:metadata>
</item>
Pretty simple. My code parses out everything (title, link, description, pubDate, etc.) without any problems. However, I'm getting null on:
<google:image_link>
I'm using Java to parse the RSS feed. Do I have to handle compound localnames differently than I would a more simple localname?
This is the bit of code that parses out Link, Description, pubDate, etc:
@Override
public void endElement(String uri, String localName, String name)
        throws SAXException {
    super.endElement(uri, localName, name);
    if (this.currentMessage != null) {
        if (localName.equalsIgnoreCase(TITLE)) {
            currentMessage.setTitle(builder.toString());
        } else if (localName.equalsIgnoreCase(LINK)) {
            currentMessage.setLink(builder.toString());
        } else if (localName.equalsIgnoreCase(DESCRIPTION)) {
            currentMessage.setDescription(builder.toString());
        } else if (localName.equalsIgnoreCase(PUB_DATE)) {
            currentMessage.setDate(builder.toString());
        } else if (localName.equalsIgnoreCase(GUID)) {
            currentMessage.setGuid(builder.toString());
        } else if (uri.equalsIgnoreCase(AVATAR)) {
            currentMessage.setAvatar(builder.toString());
        } else if (localName.equalsIgnoreCase(ITEM)) {
            messages.add(currentMessage);
        }
        builder.setLength(0);
    }
}
startDocument looks like:
@Override
public void startDocument() throws SAXException {
    super.startDocument();
    messages = new ArrayList<Message>();
    builder = new StringBuilder();
}
startElement looks like:
@Override
public void startElement(String uri, String localName, String name,
        Attributes attributes) throws SAXException {
    super.startElement(uri, localName, name, attributes);
    if (localName.equalsIgnoreCase(ITEM)) {
        this.currentMessage = new Message();
    }
}
Tony
An element like <google:image_link> has the local name image_link belonging to the google namespace. You need to ensure that the XML parsing framework is aware of namespaces, and you'd then need to find this element using the appropriate namespace.
For example, a few SAX1 interfaces in package org.xml.sax have been deprecated and replaced by SAX2 counterparts that include namespace support (e.g. the SAX1 Parser is deprecated and replaced by the SAX2 XMLReader). Consult the documentation on how to specify the namespace URI or the qualified (prefixed) qName.
See also
Wikipedia/XML namespace
package org.xml.sax
saxproject.org - Namespaces
From the sample it is not actually clear what namespace the 'google' prefix binds to. The previous answer is slightly incorrect in that the element is NOT in a "google" namespace; rather, it is in whatever namespace the prefix "google" is bound to. As such, you have to match the namespace (identified by its URI), not the prefix. SAX does have a confusing way of reporting local name / namespace-prefix combinations, and it depends on whether namespace processing is even enabled.
You could also consider alternative XML processing libraries / APIs; while SAX implementations are performant, there are equally fast and more convenient alternatives. StAX (javax.xml.stream.*) implementations like Woodstox (and even the default one that JDK 1.6 comes with) are fast and a bit more convenient. The StaxMate library, which builds on top of StAX, is much simpler to use for both reading and writing, and speed-wise as fast as SAX implementations like Xerces. Plus, the StAX API has less baggage with respect to namespace handling, so it is easier to see the actual namespace of an element.
Like user polygenelubricants said: generally the parser needs to be namespace aware to handle elements which belong to some prefixed namespace (like that <google:image_link> element).
This needs to be set as a "parser feature", which AFAIK can be done in a few different ways: the XMLReader interface itself has a setFeature() method that can be used to set features for a certain parser, but you can also use the same method on the SAXParserFactory class so that the factory generates parsers with those features already on or off. The SAX2 standard feature flags should be on the SAX project's website, but at least some of them are also listed in the Java API documentation of package org.xml.sax.
For simple documents you can try to take a shortcut. If you don't actually care about namespaces and element names as a URI + local-name combination, and you can trust that the elements you are looking for (and only these) always have a certain prefix and that there aren't elements from other namespaces with the same local name, then you might just solve your problem by using the qName parameter of the startElement() method instead of localName (or vice versa), or by adding/dropping the prefix from the tag name string you compare to.
The contents of the parameters namespaceUri, qName and localName are, according to the Java API docs, actually optional, and a value that is not available is reported as an empty string. Which of them are empty depends on those aforementioned "parser features" that affect namespaces. I don't know whether the parameter that is empty can vary between elements in a namespace and elements without a namespace - I haven't investigated that behaviour.
PS. XML is case sensitive, so ideally you don't need to ignore case when comparing tag names. First post, yay!
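To make the namespace-aware setup concrete, here is a small sketch that switches on namespace processing through SAXParserFactory and then matches an element by namespace URI plus local name. The class name, the textOf helper and the URI http://example.com/ns/google in the usage are placeholders, since the real URI bound to the google prefix is not shown in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class NamespaceAwareDemo {
    /** Returns the text of the first-level match for {nsUri}local, or "" if none. */
    static String textOf(String xml, String nsUri, String local) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);   // makes uri and localName meaningful

        StringBuilder text = new StringBuilder();
        boolean[] inside = {false};        // array so the anonymous class can update it
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName, String qName,
                            Attributes atts) {
                        // match on the namespace URI, never on the prefix
                        inside[0] = nsUri.equals(uri) && local.equals(localName);
                    }

                    @Override
                    public void characters(char[] ch, int start, int length) {
                        if (inside[0]) {
                            text.append(ch, start, length);
                        }
                    }

                    @Override
                    public void endElement(String uri, String localName, String qName) {
                        inside[0] = false;
                    }
                });
        return text.toString();
    }
}
```

With this setup, <google:image_link> is reported with localName "image_link" and the URI that the document binds to the google prefix, so a comparison against the prefix itself is not needed.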
This might help someone using the Android SAX util. I was trying geo:lat to get the lat element from the geo namespace.
Sample XML:
<item>
<title>My Item title</title>
<geo:lat>40.720741</geo:lat>
</item>
First attempt returned null:
item.getChild("geo:lat");
As suggested above, I found passing the namespace URI to the getChild method worked.
item.getChild("http://www.w3.org/2003/01/geo/wgs84_pos#", "lat");
Using startPrefixMapping method of my xml handler I was able to parse out text of a namespace.
I placed several calls to this method beneath my handler instantiation.
GoogleReader xmlhandler = new GoogleReader();
xmlhandler.startPrefixMapping("dc", "http://purl.org/dc/elements/1.1/");
where dc is the namespace prefix, as in <dc:author>some text</dc:author>
