Getting SAX Parser attributes - java

<Details><propname key="workorderid">799</propname>
How do I get 799 from workorderid using SAX parsing?
When I use this code I get "workorderid", but not the value of workorderid:
if(localName.equals("propname")){
String workid = attributes.getValue("key");

if(localName.equals("propname")){
//set one flag here and in endElement() get the value associated with your localname(propname)
String workid = attributes.getValue("key");
Here is some code; try to understand it and customize it for your needs.
public class ExampleHandler extends DefaultHandler {
private String key; // value of the "key" attribute, e.g. "workorderid"
private String item; // text content of <propname>, e.g. "799"
private boolean inItem = false;
private StringBuilder content = new StringBuilder();
public void startElement(String uri, String localName, String qName,
Attributes atts) throws SAXException {
// start a fresh buffer for the text of this element
content = new StringBuilder();
if(localName.equalsIgnoreCase("propname")) {
inItem = true;
key = atts.getValue("key");
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if(localName.equalsIgnoreCase("propname")) {
if(inItem) {
// the text collected by characters() is the element value
item = content.toString();
inItem = false;
}
}
}
public void characters(char[] ch, int start, int length)
throws SAXException {
content.append(ch, start, length);
}
public void endDocument() throws SAXException {
// you can do something here, for example send
// the collected values somewhere.
}
}
I'm in a hurry, so there may be a mistake somewhere. I hope it helps.
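For reference, the idea above (read the attribute in startElement, collect text in characters, read the text in endElement) can be sketched as a small self-contained program. The class name PropnameHandler and the parse helper are made up for illustration, and a closing </Details> tag is assumed so that the sample is well-formed. It compares qName because a default SAXParserFactory is not namespace-aware.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps the "key" attribute and the text of <propname>.
class PropnameHandler extends DefaultHandler {
    String key;    // attribute value, e.g. "workorderid"
    String value;  // element text, e.g. "799"
    private final StringBuilder content = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        content.setLength(0); // start collecting text for this element
        if (qName.equals("propname")) {
            key = atts.getValue("key");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("propname")) {
            value = content.toString().trim();
        }
    }

    // Parses an XML string and returns the populated handler.
    static PropnameHandler parse(String xml) throws Exception {
        PropnameHandler handler = new PropnameHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        PropnameHandler h =
                parse("<Details><propname key=\"workorderid\">799</propname></Details>");
        System.out.println(h.key + " = " + h.value); // prints "workorderid = 799"
    }
}
```

If the document contains several propname elements, this sketch keeps only the last one; collecting them into a map would be the natural extension.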

The following will hold the value of the node.
public void characters(char[] ch, int start, int length) throws SAXException {
tempVal = new String(ch,start,length);
}
The attribute is available in startElement:
if(qName.equals("propname")) {
String attr = attributes.getValue("key") ; // will return attribute value for the propname node.
}
The node value is available once endElement is reached, after characters has been called:
if(qName.equals("propname")) {
System.out.println(" node value " + tempVal); // node value
}

In propname, the attribute key has the value workorderid, so what you get is correct.
What you need is the text value of the propname element. With DOM:
//Provide your tag name, which is propname
NodeList nl = ele.getElementsByTagName(tagName);
if(nl != null && nl.getLength() > 0) {
Element el = (Element)nl.item(0);
textVal = el.getFirstChild().getNodeValue();
}
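To make the DOM fragment above concrete, here is a minimal sketch that parses the sample from a string and looks up the text by its key attribute. The class and method names (DomPropnameExample, valueForKey) are invented for illustration, and the sample is assumed to be closed with </Details>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomPropnameExample {
    // Returns the text of the first <propname> whose "key" attribute equals key,
    // or null when no such element exists.
    static String valueForKey(String xml, String key) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nl = doc.getElementsByTagName("propname");
        for (int i = 0; i < nl.getLength(); i++) {
            Element el = (Element) nl.item(i);
            if (key.equals(el.getAttribute("key"))) {
                return el.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Details><propname key=\"workorderid\">799</propname></Details>";
        System.out.println(valueForKey(xml, "workorderid")); // prints "799"
    }
}
```

DOM loads the whole document into memory, so for large files the SAX approach from the other answers is usually the better fit.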

Related

sax parsing - mapping nested tags into main tag

I want to use sax parser for a large xml file. Handler looks like this:
DefaultHandler handler = new DefaultHandler() {
String temp;
HashSet < String > xml_Elements = new LinkedHashSet < String > ();
HashMap < String, Boolean > xml_Tags = new LinkedHashMap < String, Boolean > ();
HashMap < String, ArrayList < String >> tags_Value = new LinkedHashMap < String, ArrayList < String >> ();
//###startElement#######
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
xml_Elements.add(qName);
for (String tag: xml_Elements) {
if (qName == tag) {
xml_Tags.put(qName, true);
}
}
}
//###########characters###########
public void characters(char ch[], int start, int length) throws SAXException {
temp = new String(ch, start, length);
}
//###########endElement############
public void endElement(String uri, String localName,
String qName) throws SAXException {
if (xml_Tags.get(qName) == true) {
if (tags_Value.containsKey(qName)) {
tags_Value.get(qName).add(temp);
tags_Value.put(qName, tags_Value.get(qName));
} else {
ArrayList < String > tempList = new ArrayList < String > ();
tempList.add(temp);
//tags_Value.put(qName, new ArrayList<String>());
tags_Value.put(qName, tempList);
}
//documentWriter.write(qName+":"+temp+"\t");
for (String a: tags_Value.keySet()) {
try {
documentWriter.write(tags_Value.get(a) + "\t");
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
xml_Tags.put(qName, false);
}
tags_Value.clear();
}
};
My xml is like :
<TermInfo>
<A>1/f noise</A>
<B>Random noise</B>
<C>Accepted</C>
<D>Flicker noise</D>
<F>Pink noise</F>
<I>1-f</I>
<I>1/f</I>
<I>1/f noise</I>
<I>1:f</I>
<I>flicker noise</I>
<I>noise</I>
<I>pink noise</I>
<ID>1</ID>
</TermInfo>
<TermInfo>
<A>3D printing</A>
<B>Materials fabrication</B>
<C>Accepted</C>
<D>3d printing</D>
<F>2</F>
<I>three dimension*</I>
<I>three-dimension*</I>
<I>3d</I>
<I>3-d</I>
<I>3d*</I>
</TermInfo>
I wanted to cluster all nested tags under Tag A.
That is, for each A, I want its B, C, D, I, etc. grouped together. But with the above handler the output is flat, like A-B-C-D-I-I-etc. Can I make one object for each A and add the other elements to it? How can I do this?
I think this is along the lines of what you are asking for. It creates a List of HashMap objects. Every time it starts a TermInfo, it creates a new HashMap. Each endElement inside TermInfo puts a value into the Map. When endElement is TermInfo, it sets fieldMap to null so no intermediate tags are added. "TermInfo" represents A from your description.
public class TestHandler extends DefaultHandler
{
Map<String, String> fieldMap = null;
List<Map<String, String>> tags_Value = new ArrayList<Map<String, String>>();
String temp;
// ###startElement#######
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException
{
// qName is used, as in the question; localName is empty unless the parser is namespace-aware
if (qName.equals("TermInfo")) // A
{
fieldMap = new HashMap<String, String>();
tags_Value.add(fieldMap);
}
}
// ###########characters###########
public void characters(char ch[], int start, int length)
throws SAXException
{
temp = new String(ch, start, length);
}
// ###########endElement############
public void endElement(String uri, String localName, String qName)
throws SAXException
{
if (fieldMap != null)
{
if (!qName.equals("TermInfo")) // A
{
fieldMap.put(qName, temp);
}
else
{
//END of TermInfo
fieldMap = null;
}
}
}
}
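One thing to watch with a Map<String, String> per TermInfo is that repeated tags such as I overwrite each other. A possible variation, sketched below, keeps a list of values per tag name. The class name TermInfoHandler, the parse helper, and the Terms root element wrapping the records are assumptions made for the example, not part of the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: one map per <TermInfo>, each tag name mapped to all of its values.
class TermInfoHandler extends DefaultHandler {
    final List<Map<String, List<String>>> records = new ArrayList<>();
    private Map<String, List<String>> current; // null while outside <TermInfo>
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0);
        if (qName.equals("TermInfo")) {
            current = new LinkedHashMap<>();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("TermInfo")) {
            records.add(current);
            current = null;
        } else if (current != null) {
            // repeated tags (for example <I>) accumulate instead of overwriting
            current.computeIfAbsent(qName, k -> new ArrayList<>())
                   .add(text.toString().trim());
        }
    }

    static TermInfoHandler parse(String xml) throws Exception {
        TermInfoHandler handler = new TermInfoHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

Calling TermInfoHandler.parse(xml).records then gives one entry per TermInfo, and records.get(0).get("I") holds every I value of the first record in document order.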

Java SAX is not parsing properly

I would appreciate any help on this.
This is my first handler I wrote.
I have a REST web service that returns XML containing links. It has quite a simple structure and is not deep.
I wrote a handler for this:
public class SAXHandlerLnk extends DefaultHandler {
public List<Link> lnkList = new ArrayList();
Link lnk = null;
private StringBuilder content = new StringBuilder();
@Override
//Triggered when the start of tag is found.
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equals("link")) {
lnk = new Link();
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equals("link")) {
lnkList.add(lnk);
}
else if (qName.equals("applicationCode")) {
lnk.applicationCode = content.toString();
}
else if (qName.equals("moduleCode")) {
lnk.moduleCode = content.toString();
}
else if (qName.equals("linkCode")) {
lnk.linkCode = content.toString();
}
else if (qName.equals("languageCode")) {
lnk.languageCode = content.toString();
}
else if (qName.equals("value")) {
lnk.value = content.toString();
}
else if (qName.equals("illustrationUrl")) {
lnk.illustrationUrl = content.toString();
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
content.append(ch, start, length);
}
}
Some elements in the returned XML can be empty. When this happens, my handler unfortunately adds the previous value to the lnk object. So when illustrationUrl is empty in the XML, lnk.illustrationUrl ends up equal to lnk.value.
Link{applicationCode='onedownload', moduleCode='onedownload',...}
In the above example, I would like moduleCode to be empty or null, because in XML it is an empty tag.
Here is the calling class:
public class XMLRepositoryRestLinksFilterSAXParser {
public static void main(String[] args) throws Exception {
SAXParserFactory parserFactor = SAXParserFactory.newInstance();
SAXParser parser = parserFactor.newSAXParser();
SAXHandlerLnk handler = new SAXHandlerLnk();
parser.parse({URL}, handler);
for ( Link lnk : handler.lnkList){
System.out.println(lnk);
}
}
}
As stated in my comment, you'd do the following. The callbacks are usually called in the order startElement, characters, (nested?), characters, endElement, where (nested?) represents an optional repeat of the entire sequence.
@Override
//Triggered when the start of tag is found.
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
content.setLength(0); // reset the buffer; a null content would make characters() throw a NullPointerException
if (qName.equals("link")) {
lnk = new Link();
}
}
Note that characters may be called multiple times for a single XML element in your document, so code that keeps only the latest chunk might fail to capture all content. You'd be better off using a StringBuilder instead of a String object to hold your character content and append to it. See this answer for an example.
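Putting the two points together (reset at startElement, append in characters), a cut-down version of the handler could look like the sketch below. LinkSketch is a stand-in for the asker's Link class with only four of its fields, and ResettingLinkHandler and parse are names invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Stand-in for the question's Link class (hypothetical, reduced to four fields).
class LinkSketch {
    String applicationCode;
    String moduleCode;
    String value;
    String illustrationUrl;
}

class ResettingLinkHandler extends DefaultHandler {
    final List<LinkSketch> links = new ArrayList<>();
    private LinkSketch current;
    private final StringBuilder content = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        content.setLength(0); // forget text left over from the previous element
        if (qName.equals("link")) {
            current = new LinkSketch();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("link")) {
            links.add(current);
            current = null;
            return;
        }
        if (current == null) {
            return; // element outside of <link>, nothing to fill in
        }
        String text = content.toString();
        switch (qName) {
            case "applicationCode": current.applicationCode = text; break;
            case "moduleCode":      current.moduleCode = text;      break;
            case "value":           current.value = text;           break;
            case "illustrationUrl": current.illustrationUrl = text; break;
            default: break;
        }
    }

    static ResettingLinkHandler parse(String xml) throws Exception {
        ResettingLinkHandler handler = new ResettingLinkHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

With this reset in place, an empty element produces an empty string for its field rather than the text of whichever element came before it. Mapping empty strings to null afterwards is a one-line change if null is preferred.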

How to get content of <tagname> that contains other embedded XML tag in Java?

I have an XML document that has HTML tags included:
<chapter>
<h1>title of content</h1>
<p> my paragraph ... </p>
</chapter>
I need to get the content of <chapter> tag and my output will be:
<h1>title of content</h1>
<p> my paragraph ... </p>
My question is similar to this post: How parse XML to get one tag and save another tag inside
But I need to implement it in Java using SAX or DOM or ...?
I found a solution using SAX in this post: SAX Parser : Retrieving HTML tags from XML but it's very buggy and doesn't work with large amounts of XML data.
Updated:
My SAX implementation is below.
In some situations it throws an exception: java.lang.StringIndexOutOfBoundsException: String index out of range: -4029
public class MyXMLHandler extends DefaultHandler {
private boolean tagFlag = false;
private char[] temp;
String insideTag;
private int startPosition;
private int endPosition;
private String tag;
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase(tag)) {
tagFlag = true;
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (qName.equalsIgnoreCase(tag)) {
insideTag = new String(temp, startPosition, endPosition - startPosition);
tagFlag = false;
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
temp = ch;
if (tagFlag) {
startPosition = start;
tagFlag = false;
}
endPosition = start + length;
}
public String getInsideTag(String tag) {
this.tag = tag;
return insideTag;
}
}
Update 2: (Using StringBuilder)
I have accumulated characters by StringBuilder in this way:
public class MyXMLHandler extends DefaultHandler {
private boolean tagFlag = false;
private char[] temp;
String insideTag;
private String tag;
private StringBuilder builder;
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase(tag)) {
builder = new StringBuilder();
tagFlag = true;
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (qName.equalsIgnoreCase(tag)) {
insideTag = builder.toString();
tagFlag = false;
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
if (tagFlag) {
builder.append(ch, start, length);
}
}
public String getInsideTag(String tag) {
this.tag = tag;
return insideTag;
}
}
But builder.append(ch, start, length); doesn't append the start and end tags, such as <EmbeddedTag atr="..."> and </EmbeddedTag>, to the buffer. This code prints:
title of content
my paragraph ...
Instead of expected output:
<h1>title of content</h1>
<p> my paragraph ... </p>
Update 3:
Finally I have implemented the parser handler:
public class MyXMLHandler extends DefaultHandler {
private boolean tagFlag = false;
private String insideTag;
private String tag;
private StringBuilder builder;
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase(tag)) {
builder = new StringBuilder();
tagFlag = true;
}
if (tagFlag) {
builder.append("<" + qName);
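for (int i = 0; i < attributes.getLength(); i++) {
// getQName is used because getLocalName may return an empty string when the parser is not namespace-aware
builder.append(" " + attributes.getQName(i) + "=\"" +
attributes.getValue(i) + "\"");
}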
builder.append(">");
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (tagFlag) {
builder.append("</" + qName + ">");
}
if (qName.equalsIgnoreCase(tag)) {
insideTag = builder.toString();
tagFlag = false;
}
System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length)
throws SAXException {
if (tagFlag) {
builder.append(ch, start, length);
}
}
public String getInsideTag(String tag) {
this.tag = tag;
return insideTag;
}
}
The problem with your code is that you try to remember the start and end positions of the string passed to you via the characters method. What you see in the exception thrown is the result of an inside tag that starts near the end of a character buffer and ends near the beginning of the next character buffer.
With SAX, you need to copy the characters when they are offered, because the temporary buffer they occupy may be reused or cleared by the time you need them.
Your best bet is not to remember the positions in the buffers, but to create a new StringBuilder in startElement and add the characters to that, then get the complete string out the builder in endElement.
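As a rough illustration of that advice applied to the original goal (the content of chapter including nested tags, but without the chapter tags themselves), here is a sketch that tracks nesting depth and rebuilds the inner markup. InnerXmlHandler and innerXml are hypothetical names. Note the limits: text and attribute values are appended as the parser reports them, so characters such as & or < that were escaped in the source are not re-escaped, and if the target element occurs more than once the contents are simply concatenated.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: rebuilds the markup found inside the target element.
class InnerXmlHandler extends DefaultHandler {
    private final String target;
    private final StringBuilder inner = new StringBuilder();
    private int depth = 0; // 0 = outside target, 1 = directly inside it, 2+ = nested

    InnerXmlHandler(String target) {
        this.target = target;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (depth > 0) {
            // nested element: write its start tag back out
            inner.append('<').append(qName);
            for (int i = 0; i < attributes.getLength(); i++) {
                inner.append(' ').append(attributes.getQName(i))
                     .append("=\"").append(attributes.getValue(i)).append('"');
            }
            inner.append('>');
            depth++;
        } else if (qName.equals(target)) {
            depth = 1; // entering the target; its own tag is not recorded
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth > 1) {
            inner.append("</").append(qName).append('>');
            depth--;
        } else if (depth == 1) {
            depth = 0; // leaving the target element
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            inner.append(ch, start, length); // copy immediately, the buffer is reused
        }
    }

    static String innerXml(String xml, String target) throws Exception {
        InnerXmlHandler handler = new InnerXmlHandler(target);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.inner.toString();
    }
}
```

For example, InnerXmlHandler.innerXml(xml, "chapter") returns the h1 and p elements with their tags but leaves out the surrounding chapter tags.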
Try Digester. I used it years ago (version 1.5), and it was simple to create a mapping for XML like yours. There is a simple article on how to use Digester, but it covers version 1.5; the current version is 3.0, I think, and it contains a lot of new features.

Reading nested tags with sax parser

I am trying to read an XML file with the following tags, but the SAX parser is unable to read nested tags like:
<active-prod-ownership>
<ActiveProdOwnership>
<Product code="3N3" component="TRI_SCORE" orderNumber="1-77305469" />
</ActiveProdOwnership>
</active-prod-ownership>
Here is the code I am using:
public class LoginConsumerResponseParser extends DefaultHandler {
// ===========================================================
// Fields
// ===========================================================
static String str="default";
private boolean in_errorCode=false;
private boolean in_Ack=false;
private boolean in_activeProdOwnership= false;
private boolean in_consumerId= false;
private boolean in_consumerAccToken=false;
public void startDocument() throws SAXException {
Log.e("i am ","in start document");
}
public void endDocument() throws SAXException {
// Nothing to do
Log.e("doc read", " ends here");
}
/** Gets be called on opening tags like:
* <tag>
* Can provide attribute(s), when xml was like:
* <tag attribute="attributeValue">*/
public void startElement(String namespaceURI, String localName,
String qName, Attributes atts) throws SAXException {
if(localName.equals("ack")){
in_Ack=true;
}
if(localName.equals("error-code")){
in_errorCode=true;
}
if(localName.equals("active-prod-ownership")){
Log.e("in", "active product ownership");
in_activeProdOwnership=true;
}
if(localName.equals("consumer-id")){
in_consumerId= true;
}
if(localName.equals("consumer-access-token"))
{
in_consumerAccToken= true;
}
}
/** Gets be called on closing tags like:
* </tag> */
public void endElement(String namespaceURI, String localName, String qName)
throws SAXException {
if(localName.equals("ack")){
in_Ack=false;
}
if(localName.equals("error-code")){
in_errorCode=false;
}
if(localName.equals("active-prod-ownership")){
in_activeProdOwnership=false;
}
if(localName.equals("consumer-id")){
in_consumerId= false;
}
if(localName.equals("consumer-access-token"))
{
in_consumerAccToken= false;
}
}
/** Gets be called on the following structure:
* <tag>characters</tag> */
public void characters(char ch[], int start, int length) {
if(in_Ack){
str= new String(ch,start,length);
}
if(str.equalsIgnoreCase("success")){
if(in_consumerId){
}
if(in_consumerAccToken){
}
if(in_activeProdOwnership){
str= new String(ch,start,length);
Log.e("active prod",str);
}
}
}
}
But when in_activeProdOwnership is set, only "<" is read as the contents of the tag.
Please help; I need the whole data to be read.
The tags in your XML file and in your parser do not match. I think you are mixing up tags with attribute names. Here is code that correctly parses your sample XML:
public class LoginConsumerResponseParser extends DefaultHandler {
public void startDocument() throws SAXException {
System.out.println("startDocument()");
}
public void endDocument() throws SAXException {
System.out.println("endDocument()");
}
public void startElement(String namespaceURI, String localName,
String qName, Attributes attrs)
throws SAXException {
if (qName.equals("ActiveProdOwnership")) {
inActiveProdOwnership = true;
} else if (qName.equals("Product")) {
if (!inActiveProdOwnership) {
throw new SAXException("Product tag not expected here.");
}
int length = attrs.getLength();
for (int i=0; i<length; i++) {
String name = attrs.getQName(i);
System.out.print(name + ": ");
String value = attrs.getValue(i);
System.out.println(value);
}
}
}
public void endElement(String namespaceURI, String localName, String qName)
throws SAXException {
if (qName.equals("ActiveProdOwnership")) // qName, to match the checks in startElement
inActiveProdOwnership = false;
}
public void characters(char ch[], int start, int length) {
}
public static void main(String args[]) throws Exception {
String xmlFile = args[0];
File file = new File(xmlFile);
if (file.exists()) {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser();
DefaultHandler handler = new LoginConsumerResponseParser();
parser.parse(xmlFile, handler);
}
else {
System.out.println("File not found!");
}
}
private boolean inActiveProdOwnership = false;
}
A sample run will produce the following output:
startDocument()
code: 3N3
component: TRI_SCORE
orderNumber: 1-77305469
endDocument()
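If the attribute values are needed later rather than just printed, the same loop can store them. The sketch below is one way to do that; ProductAttributesHandler and parse are illustrative names, and it keeps only the attributes of the last Product element it sees.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: stores the attributes of <Product> in document order.
class ProductAttributesHandler extends DefaultHandler {
    final Map<String, String> productAttributes = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (qName.equals("Product")) {
            productAttributes.clear(); // keep only the most recent <Product>
            for (int i = 0; i < attrs.getLength(); i++) {
                productAttributes.put(attrs.getQName(i), attrs.getValue(i));
            }
        }
    }

    static ProductAttributesHandler parse(String xml) throws Exception {
        ProductAttributesHandler handler = new ProductAttributesHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

After parsing the sample from the question, productAttributes.get("code") yields 3N3 and productAttributes.get("orderNumber") yields 1-77305469.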
I suspect this is what's going wrong:
new String(ch,start,length);
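Here, each call to characters() overwrites str with whatever chunk the parser happens to deliver. The String(char[], int, int) constructor itself is fine, but the parser may split an element's text across several calls, so you can end up with only a fragment.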
I suggest instead that you make the str field a StringBuilder, not a String, and then use this:
builder.append(ch,start,length);
You then need to clear the StringBuilder each time startElement() is called.

Problem parsing XML document with Java SAX

I am parsing an XML document. I have done this thousands of times before, but I can't see why I am getting the following issue:
Here is the relevant part of the XML document that I am parsing:
XML: <?xml version="1.0" standalone="yes"?>
<ratings>
<url_template>http://api.netflix.com/users/T1BlCJtdcWMuF6gJEfue96_W.kZ_gW81h59KqLEfT1AzE-/ratings/title?{-join|&|title_refs}</url_template>
<ratings_item>
<user_rating value="not_interested"></user_rating>
<predicted_rating>4.8</predicted_rating>
<id>http://api.netflix.com/users/T1BlCJtdcWMuF6gJEfue96_W.kZ_gW81h59KqLEfT1AzE-/ratings/title/70112530</id>
<link href="http://api.netflix.com/catalog/titles/series/70112530/seasons/70112530" rel="http://schemas.netflix.com/catalog/title" title="Castle: Season 1">
</link>
.
.
.
So, I am trying to parse out the user_rating, the predicted_rating, and the id, and this mostly works. However, I am noticing that when user_rating contains no value, the predicted_rating ends up with the wrong value rather than its own value of 4.8. When user_rating does have a value, however, the predicted_rating has the correct value. Here is my parsing code:
public class RatingsHandler extends DefaultHandler {
Vector vector;
Ratings ratings;
boolean inUserRating;
boolean inPredictedRating;
boolean inAverageRating;
boolean inID;
public void startDocument() throws SAXException {
vector = new Vector();
ratings = new Ratings();
}
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (localName.equals("user_rating")) {
inUserRating = true;
} else if (localName.equals("predicted_rating")) {
inPredictedRating = true;
} else if (localName.equals("average_rating")) {
inAverageRating = true;
} else if (localName.equals("id")) {
inID = true;
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
if (inUserRating) {
ratings.setUserRating(new String(ch, start, length));
inUserRating = false;
} else if (inPredictedRating) {
ratings.setPredRating(new String(ch, start, length));
inPredictedRating = false;
} else if (inAverageRating) {
ratings.setAvgRating(new String(ch, start, length));
inAverageRating = false;
} else if (inID) {
Const.rating_id = new String(ch, start, length);
inID = false;
}
}
public void endDocument() throws SAXException {
if (ratings != null) {
vector.addElement(ratings);
}
}
public Vector getRatings() {
return vector;
}
}
Does it have something to do with the fact that user_rating has an attribute "value"? I would appreciate any help. Thanks!
I would suggest you wait for
endElement(String uri, String localName, String qName)
before you mark the element as passed with:
inSomething = false
I can imagine that when the element is empty,
public void characters(char[] ch, int start, int length)
won't be called, your flag won't be cleared, and you will run into an inconsistent state with two inSomething flags set to true.
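A sketch of how this could look in practice is below. It is a variation rather than the asker's code: instead of one boolean per element it uses a single text buffer that is reset in startElement and read in endElement, so an empty element can never leave stale state behind. It also reads the value attribute of user_rating, on the assumption that this is where the rating lives in the sample. RatingsSketchHandler, its fields, and parse are made-up names, and with several ratings_item elements only the last values are kept.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: values are assigned in endElement, never in characters.
class RatingsSketchHandler extends DefaultHandler {
    String userRating;       // taken from the "value" attribute of <user_rating>
    String predictedRating;  // text of <predicted_rating>
    String id;               // text of <id>
    private final StringBuilder text = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // every element starts with an empty buffer
        if (qName.equals("user_rating")) {
            userRating = attributes.getValue("value");
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("predicted_rating")) {
            predictedRating = text.toString().trim();
        } else if (qName.equals("id")) {
            id = text.toString().trim();
        }
    }

    static RatingsSketchHandler parse(String xml) throws Exception {
        RatingsSketchHandler handler = new RatingsSketchHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

Because nothing is decided inside characters, it no longer matters whether that callback fires zero, one, or several times for an element.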
