Parsing Mixed-Content XML with SAX

I have a sample mixed-content XML document (structure cannot be modified):
<items>
<item> ABC123 <status>UPDATE</status>
<units>
<unit Description="Each ">EA <saleprice>2.99</saleprice>
<saleprice2/>
</unit>
</units>
<warehouses>
<warehouse>100<availability>2987.000</availability>
</warehouse>
</warehouses>
</item>
</items>
I am attempting to use a SAX parser on this XML document, but the mixed-content elements are causing some issues. Namely, I get an empty String when attempting to handle the <item/> node.
My handler:
@Override
public void startElement(final String uri,
final String localName, final String qName, final Attributes attributes) throws SAXException {
final String fixedQName = qName.toLowerCase();
switch (fixedQName) {
case "item":
prod = new Product();
//prod.setItem(content); <-- doesn't work, content is empty since element just started
break;
}
}
@Override
public void endElement(final String uri, final String localName, final String qName) throws SAXException {
final String fixedQName = qName.toLowerCase();
switch (fixedQName) {
case "item":
prod.setItem(content); // <-- doesn't work either, only returns an empty string
// end element, set item
productList.add(prod);
break;
case "status":
prod.setStatus(content);
break;
// ... etc....
}
}
@Override
public void characters(final char[] ch, final int start, final int length) throws SAXException {
content = "";
content = String.copyValueOf(ch, start, length).trim();
}
This handler works correctly for everything of interest, except the <item/> element. It always returns an empty string.
If I add a println() to the characters() method to print out the content, I can see that the parser eventually does print the contents of <item/>, but it arrives later than expected (on a subsequent characters() invocation by the parser).
Referencing http://docs.oracle.com/javase/tutorial/jaxp/sax/parsing.html, I know I should attempt to aggregate the strings returned from characters(). However, I don't see how this can work, since I also need to retrieve the other elements' data, and hard-coding an exception for the first element into the characters() method seems like the wrong approach.
How can I use SAX to retrieve the mixed-content <item/>'s data 'ABC123'?

If the item content consists only of the text before the opening tag of the status element, you can capture it in startElement, at the moment the status element starts:
public void startElement(final String uri,
final String localName, final String qName, final Attributes attributes) throws SAXException {
final String fixedQName = qName.toLowerCase();
switch (fixedQName) {
case "item":
prod = new Product();
break;
case "status":
prod.setItem(content);
break;
}
}
To understand why, consider the flow of events:
startElement item
characters "ABC123"
startElement status
characters "UPDATE"
endElement status
characters ""
endElement item
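The same event flow suggests a more general approach, sketched here under some assumptions: keep one StringBuilder per open element on a stack, so that text is attributed to the element that directly contains it, even when characters() is called several times. The class name MixedContentDemo and the map-based result are illustrative and not part of the original handler.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MixedContentDemo {

    /** Returns element name -> trimmed direct text (the last occurrence of a name wins). */
    public static Map<String, String> directText(String xml) {
        final Map<String, String> result = new LinkedHashMap<>();
        final Deque<StringBuilder> stack = new ArrayDeque<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // Each open element gets its own buffer
                stack.push(new StringBuilder());
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Append to the innermost open element; may be called many times
                if (!stack.isEmpty()) {
                    stack.peek().append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // The buffer on top of the stack belongs to the element being closed
                result.put(qName.toLowerCase(), stack.pop().toString().trim());
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<items><item> ABC123 <status>UPDATE</status>\n</item></items>";
        Map<String, String> text = directText(xml);
        System.out.println(text.get("item"));
        System.out.println(text.get("status"));
    }
}
```

With this layout the value for item is available in endElement for item, because the child elements wrote into their own buffers.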

Related

Is it possible to merge XML-Elements with SAX (coremedia CAE filter)

Given is:
a XML structure like
<span class="abbreviation">AGB<span class="explanation">Allgemeine Geschäftsbedingungen</span></span>
and the result after the transformation should be:
<abbr title="Allgemeine Geschäftsbedingungen">AGB</abbr>
I know that SAX is an event-based XML-parser, and with methods like
#startElement(...)
#endElement(...)
I can capture events (like open-a-tag, close-a-tag) and with
#characters
I can extract the text between the tags.
My question is:
Can I create the transformation described above (is it possible)?
My problem is:
I can extract the abbreviation text and the explanation text
I can call #startElement on the last span tag
but I can't create the content of the tag (in this case the text 'AGB')

The answer is: yes, it's possible!
The main hint comes from this StackOverflow link.
Here is what has to be done:
you have to remember the state, i.e. which span tag the SAX parser is currently in ("class=abbreviation" or "class=explanation")
you have to extract the content of the tags (this can be done with the #characters method)
once you know the state of the SAX parser and the content, you can create a new abbr tag
all other tags have to be passed through without any modification
For completeness, here is the source code of the CoreMedia CAE filter:
import com.coremedia.blueprint.cae.richtext.filter.FilterFactory;
import com.coremedia.xml.Filter;
import org.apache.commons.lang3.StringUtils;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
public class GlossaryFilter extends Filter implements FilterFactory {
private static final String SPAN = "span";
private static final String CLASS = "class";
private boolean isAbbreviation = false;
private boolean isExplanation = false;
private String abbreviation;
private String currentUri;
private boolean spanExplanationClose = false;
private boolean spanAbbreviationClose = false;
@Override
public Filter getInstance(final HttpServletRequest request, final HttpServletResponse response) {
return new GlossaryFilter();
}
@Override
public void startElement(final String uri, final String localName, final String qName,
final Attributes attributes) throws SAXException {
if (isSpanAbbreviationTag(qName, attributes)) {
isAbbreviation = true;
} else if (isSpanExplanationTag(qName, attributes)) {
isExplanation = true;
currentUri = uri;
} else {
super.startElement(uri, localName, qName, attributes);
}
}
private boolean isSpanExplanationTag(final String qName, final Attributes attributes) {
//noinspection OverlyComplexBooleanExpression
return StringUtils.isNotEmpty(qName) && qName.equalsIgnoreCase(SPAN) && (
attributes.getLength() > 0) && "explanation".equals(attributes.getValue(CLASS)); // constant first: no NullPointerException if class is missing
}
private boolean isSpanAbbreviationTag(final String qName, final Attributes attributes) {
//noinspection OverlyComplexBooleanExpression
return StringUtils.isNotEmpty(qName) && qName.equalsIgnoreCase(SPAN) && (
attributes.getLength() > 0) && "abbreviation".equals(attributes.getValue(CLASS)); // constant first: no NullPointerException if class is missing
}
@Override
public void endElement(final String uri, final String localName, final String qName)
throws SAXException {
if (spanExplanationClose) {
spanExplanationClose = false;
} else if (spanAbbreviationClose) {
spanAbbreviationClose = false;
} else {
super.endElement(uri, localName, qName);
}
}
@Override
public void characters(final char[] ch, final int start, final int length) throws SAXException {
if (isAbbreviation && isExplanation) {
final String explanation = new String(ch, start, length);
final AttributesImpl newAttributes = createAttributes(explanation);
writeAbbrTag(newAttributes);
changeState();
} else if (isAbbreviation && !isExplanation) {
abbreviation = new String(ch, start, length);
} else {
super.characters(ch, start, length);
}
}
private void changeState() {
isExplanation = false;
isAbbreviation = false;
spanExplanationClose = true;
spanAbbreviationClose = true;
}
@SuppressWarnings("TypeMayBeWeakened")
private void writeAbbrTag(final AttributesImpl newAttributes) throws SAXException {
super.startElement(currentUri, "abbr", "abbr", newAttributes);
super.characters(abbreviation.toCharArray(), 0, abbreviation.length());
super.endElement(currentUri, "abbr", "abbr");
}
private AttributesImpl createAttributes(final String explanation) {
final AttributesImpl newAttributes = new AttributesImpl();
newAttributes.addAttribute(currentUri, "title", "abbr:title", "CDATA", explanation);
return newAttributes;
}
}
The interesting stuff is in the methods:
startElement(...)
endElement(...)
characters(...)
startElement(...)
Here you store the state of the SAX parser, i.e. which span tag (the one with "class=abbreviation" or the one with "class=explanation") was opened:
isAbbreviation for an opened span-tag with "class=abbreviation"
isExplanation for an opened span-tag with "class=explanation"
You only store states. The mentioned span tags themselves are not passed on (as a result, they are removed from the output). Every other tag is passed on without modification (that's the else block).
endElement(...)
Here you want to pass on every tag except the mentioned span tags; all other tags are applied without modification (the else block). If the SAX parser is at a closing span tag (with "class=abbreviation" or "class=explanation"), you do nothing except reset the state.
characters(...)
In this method the magic (creating a tag with the parser) happens. Depending on the state:
1. The SAX parser is inside the span tag with "class=explanation" (which means an opening span tag with "class=abbreviation" was passed before) --> branch (isAbbreviation && isExplanation)
2. The SAX parser is inside the first span tag (the one with "class=abbreviation") --> branch (isAbbreviation && !isExplanation)
3. Any other characters found in any other tag --> branch else
For state 3:
simply copy the text you find
For state 2:
extract the content of the span tag with "class=abbreviation" for later use
For state 1:
extract the content of the span tag with "class=explanation"
create the attributes for the abbr tag (title=....)
write the new abbr tag (instead of the two span tags)
set the state
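For readers without the CoreMedia classes, the same state-machine idea can be sketched with the standard org.xml.sax.helpers.XMLFilterImpl. This is only an illustration, not the CoreMedia filter: the class name AbbrFilterDemo is made up, the abbr tag is written when the outer span closes rather than inside characters(), and it assumes that the two spans contain only text (no further nested elements).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class AbbrFilterDemo extends XMLFilterImpl {
    private boolean inAbbreviation;
    private boolean inExplanation;
    private final StringBuilder abbreviation = new StringBuilder();
    private final StringBuilder explanation = new StringBuilder();

    private static boolean isSpan(String qName, Attributes atts, String cssClass) {
        return "span".equalsIgnoreCase(qName) && cssClass.equals(atts.getValue("class"));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        if (!inAbbreviation && isSpan(qName, atts, "abbreviation")) {
            inAbbreviation = true;            // swallow the outer span, start collecting
            abbreviation.setLength(0);
            explanation.setLength(0);
        } else if (inAbbreviation && isSpan(qName, atts, "explanation")) {
            inExplanation = true;             // swallow the inner span
        } else {
            super.startElement(uri, localName, qName, atts);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (inExplanation) {
            explanation.append(ch, start, length);
        } else if (inAbbreviation) {
            abbreviation.append(ch, start, length);
        } else {
            super.characters(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (inExplanation) {
            inExplanation = false;            // inner span closed, nothing to emit yet
        } else if (inAbbreviation) {
            inAbbreviation = false;           // outer span closed: emit the merged abbr element
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "title", "title", "CDATA", explanation.toString());
            super.startElement("", "abbr", "abbr", atts);
            char[] text = abbreviation.toString().toCharArray();
            super.characters(text, 0, text.length);
            super.endElement("", "abbr", "abbr");
        } else {
            super.endElement(uri, localName, qName);
        }
    }

    /** Runs the filter over the given XML and serializes the filtered events. */
    public static String transform(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            AbbrFilterDemo filter = new AbbrFilterDemo();
            filter.setParent(factory.newSAXParser().getXMLReader());
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Collecting both texts in StringBuilders also covers the case where the parser delivers a text node in several characters() calls.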

SAX - Read HTML content without CDATA

I'm using a SAX parser in Java, and that is mandatory. I need to parse an XML file containing HTML tags that I must read as content, and I can't use CDATA because I can't change the XML file. The XML file looks something like this:
<start id="123">
<tag1>text1</tag1>
<tag2>
This is an example
<span>
text inside an HTML tag
</span>
<p>
ABCDEFG<b>HIJK</b>LMNOP
</p>
</tag2>
</start>
What I need is that when I get the content of tag2, the content must be:
This is an example
<span>text inside an HTML tag</span>
<p>ABCDEFG<b>HIJK</b>LMNOP</p>
This is a test that I did; the resulting content doesn't include the HTML tags:
boolean istag2 = false;
StringBuilder text = new StringBuilder();
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equals("tag2")) {
istag2 = true;
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equals("tag2")) {
istag2 = false;
String fullText = text.toString();
System.out.println("tag2 full_text: " + fullText);
}
}
public void characters(char ch[], int start, int length) throws SAXException {
if (istag2) {
text.append(new String(ch, start, length));
}
}
Thanks in advance

OK, I think I might understand where your expectations are wrong. I think you might be expecting that the strings "<span>" and "<p>" are passed to your application by calls on the characters() method. But that's not what happens: they are passed by calls on startElement() and endElement(). If you want to build up a string containing these tags in lexical form, you will need to do something like:
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equals("tag2")) {
inTag2 = true;
} else if (inTag2) {
text.append("<" + qName);
// TODO: serialize any attributes
text.append(">");
}
}
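A fuller sketch of this idea, including attributes, end tags and escaping, might look as follows. The class name InnerMarkupDemo, the depth counter and the escape helper are assumptions made for this illustration. Note that the result is a re-serialization, so empty elements come out as a start tag followed by an end tag and the original quoting style is not preserved.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InnerMarkupDemo {

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    /** Returns the content of the first-level `target` element with nested tags re-serialized. */
    public static String innerXml(String xml, final String target) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            // 0 = outside target, 1 = directly inside target, >1 = inside a nested element
            private int depth;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (depth > 0) {
                    text.append('<').append(qName);
                    for (int i = 0; i < atts.getLength(); i++) {
                        text.append(' ').append(atts.getQName(i)).append("=\"")
                            .append(escape(atts.getValue(i))).append('"');
                    }
                    text.append('>');
                    depth++;
                } else if (qName.equals(target)) {
                    depth = 1;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (depth > 1) {
                    text.append("</").append(qName).append('>');
                    depth--;
                } else if (depth == 1) {
                    depth = 0; // closing tag of the target element itself
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (depth > 0) {
                    text.append(escape(new String(ch, start, length)));
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<start id=\"123\"><tag1>text1</tag1><tag2>This is an example "
                + "<span>text inside an HTML tag</span><p>ABCDEFG<b>HIJK</b>LMNOP</p></tag2></start>";
        System.out.println(innerXml(xml, "tag2"));
    }
}
```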

Why some characters are missing when i parse a xml tag using SaxParser?

I am parsing an XML response of almost 90,000 characters in my Android application using a SAX parser. The XML looks like the following:
<Registration>
<Client>
<Name>John</Name>
<ID>1</ID>
<Date>2013:08:22T03:43:44</Date>
</Client>
<Client>
<Name>James</Name>
<ID>2</ID>
<Date>2013:08:23T16:28:00</Date>
</Client>
<Client>
<Name>Eric</Name>
<ID>3</ID>
<Date>2013:08:23T19:04:15</Date>
</Client>
.....
</Registration>
Sometimes the parser misses some characters from the Date tag. Instead of returning 2013:08:23T19:04:15 it returns 2013:08:23T. I tried to strip all whitespace from the response XML string using the following line of code:
responseStr = responseStr.replaceAll("\\s","");
But then I get the following exception:
Parsing exception: org.apache.harmony.xml.ExpatParser$ParseException: At line 1, column 16: not well-formed (invalid token)
The following is the code I am using for parsing:
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
public void startElement(String uri, String localName,String qName, Attributes attributes) throws SAXException {
tagName = qName;
}
public void endElement(String uri, String localName, String qName) throws SAXException {
}
public void characters(char ch[], int start, int length) throws SAXException {
if(tagName.equals("Name")){
obj = new RegisteredUser();
String str = new String(ch, start, length);
obj.setName(str);
}else if(tagName.equals("ID")){
String str = new String(ch, start, length);
obj.setId(str);
}else if(tagName.equals("Date")){
String str = new String(ch, start, length);
obj.setDate(str);
users.add(obj);
}
}
public void startDocument() throws SAXException {
System.out.println("document started");
}
public void endDocument() throws SAXException {
System.out.println("document ended");
}
};
saxParser.parse(new InputSource(new StringReader(resp)), handler);
}catch(Exception e){
System.out.println("Parsing exception: "+e);
System.out.println("exception");
}
Any idea why the parser is skipping characters from a tag, and how I can solve this problem? Thanks in advance.

It's possible that characters() is called more than once for any given text node.
In that case you'll have to concatenate the results yourself!
This happens when an internal buffer of the parser fills up while there is still content left in the text node. Instead of enlarging the buffer (which could require a lot of memory when the text node is large), the parser lets the client code handle it.
You want something like this:
StringBuilder textContent = new StringBuilder();
public void startElement(String uri, String localName,String qName, Attributes attributes) throws SAXException {
tagName = qName;
textContent.setLength(0);
}
public void characters(char ch[], int start, int length) throws SAXException {
textContent.append(ch, start, length);
}
public void endElement(String uri, String localName, String qName) throws SAXException {
String text = textContent.toString();
// handle text here
}
Of course this code can be improved to only track the text content for nodes you actually care about.

As others mentioned, the characters() method may be called multiple times. It is up to the SAX parser implementation whether to return all contiguous character data in a single chunk or to split it into several chunks.
See the documentation of the SAX ContentHandler.characters() method.

You're incorrectly assuming that all the characters in a text node will be read at once and sent to the characters() method. It's not the case. The characters() method can be called multiple times for a single text node.
You should append all the chars to a StringBuilder and then only convert to a String or Date when endElement() is called.
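Applied to the Registration document from the question, that advice could look like the sketch below. The names ClientListDemo and Client are made up for the example (the question uses a RegisteredUser class whose definition is not shown), and the sketch assumes that Name, ID and Date only occur inside a Client element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ClientListDemo {

    /** Hypothetical stand-in for the RegisteredUser class from the question. */
    public static final class Client {
        public String name;
        public String id;
        public String date;
    }

    public static List<Client> parse(String xml) {
        final List<Client> clients = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private Client current;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // a new element starts: forget earlier text
                if ("Client".equals(qName)) {
                    current = new Client();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // may be one of several chunks
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (current == null) {
                    return; // not inside a Client element
                }
                // Only here is the text of the element known to be complete
                switch (qName) {
                    case "Name": current.name = text.toString().trim(); break;
                    case "ID":   current.id = text.toString().trim();   break;
                    case "Date": current.date = text.toString().trim(); break;
                    case "Client":
                        clients.add(current);
                        current = null;
                        break;
                    default:
                        break;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return clients;
    }
}
```

Creating the object on the Client start tag and adding it to the list on the Client end tag also removes the dependency on the order of the Name, ID and Date elements.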

Is there a way to use the Visitor pattern using a SAX Parser?

I'm curious about this: I need to use a SAX parser for efficiency (it's a big file). Usually I use something like this:
public class Example extends DefaultHandler
{
private Stack stack = new Stack ();
public void startElement (String uri, String local, String qName, Attributes atts) throws SAXException
{
stack.push (qName);
}
public void endElement (String uri, String local, String qName) throws SAXException
{
if ("line".equals (qName))
System.out.println ();
stack.pop ();
}
public void characters (char buf [], int offset, int length) throws SAXException
{
if (!"line".equals (stack.peek ()))
return;
System.out.print (new String (buf, offset, length));
}
}
example taken from here.
SAX is already an implementation of the Visitor pattern, but in my case I just need to take the content of every element and do something with it according to the nature of the element itself.
My typical XML file is something like:
<?xml version="1.0" encoding="utf-8"?>
<labs xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<auth>
<uid> </uid>
<gid> </gid>
<key> </key>
</auth>
<campaign>
<sms>
<newsletter>206</newsletter>
<message>
<from>Da Definire</from>
<subject>Da definire</subject>
<body><![CDATA[Testo Da Definire]]></body>
</message>
<delivery method="manual"></delivery>
<recipients>
<db>276</db>
<filter>
<test>1538</test>
</filter>
<new_recipients>
<csv_file>Corso2012_SMS.csv</csv_file>
</new_recipients>
</recipients>
</sms>
</campaign>
</labs>
When I'm in the csv_file node I need to take the filename and upload users from that file, if I'm in the filter/test I need to check if the filter exists and so on.
Is there a way to apply the Visitor Pattern with SAX?

You could simply have a Map<String, ElementHandler> in your SAX parser, and allow registering ElementHandlers for element names. Supposing that you're only interested in leaf elements:
each time an element starts, you look if there is a handler for this element name in the map, and you clear a buffer.
each time characters() is called, you append the characters to the buffer (if there was a handler for the previous element start)
each time an element is ended, if there was a handler for the previous element start, you call the handler with the content of the buffer
Here's an example:
private ElementHandler currentHandler;
private StringBuilder buffer = new StringBuilder();
private Map<String, ElementHandler> handlers = new HashMap<String, ElementHandler>();
public void registerHandler(String qName, ElementHandler handler) {
handlers.put(qName, handler);
}
public void startElement (String uri, String local, String qName, Attributes atts) throws SAXException {
currentHandler = handlers.get(qName);
buffer.delete(0, buffer.length());
}
public void characters (char buf [], int offset, int length) throws SAXException {
if (currentHandler != null) {
buffer.append(buf, offset, length);
}
}
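public void endElement (String uri, String local, String qName) throws SAXException {
if (currentHandler != null) {
currentHandler.handle(buffer.toString());
// reset so the enclosing element's end tag does not trigger this handler again
currentHandler = null;
}
}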
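Since the question distinguishes elements by their position (for example filter/test), a variant keyed by the element path rather than the bare name may be useful. The following is a self-contained sketch: the ElementHandler interface is assumed to have a single handle(String) method (its definition is not shown above), and the class name PathDispatchDemo and the run helper are invented for the example. As above, it only targets leaf elements.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PathDispatchDemo extends DefaultHandler {

    /** Assumed shape of the ElementHandler used in the answer. */
    public interface ElementHandler {
        void handle(String content);
    }

    private final Map<String, ElementHandler> handlers = new HashMap<>();
    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder buffer = new StringBuilder();

    public void registerHandler(String elementPath, ElementHandler handler) {
        handlers.put(elementPath, handler);
    }

    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        // push() adds at the head, so walk the deque backwards to get root-first order
        Iterator<String> it = path.descendingIterator();
        while (it.hasNext()) {
            sb.append('/').append(it.next());
        }
        return sb.toString();
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        path.push(qName);
        buffer.setLength(0);
    }

    @Override
    public void characters(char[] buf, int offset, int length) {
        buffer.append(buf, offset, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        ElementHandler handler = handlers.get(currentPath());
        if (handler != null) {
            handler.handle(buffer.toString().trim());
        }
        path.pop();
        buffer.setLength(0);
    }

    /** Demo: collects the filter id and the CSV file name from a labs document. */
    public static List<String> run(String xml) {
        final List<String> seen = new ArrayList<>();
        PathDispatchDemo dispatcher = new PathDispatchDemo();
        dispatcher.registerHandler("/labs/campaign/sms/recipients/filter/test",
                content -> seen.add("test:" + content));
        dispatcher.registerHandler("/labs/campaign/sms/recipients/new_recipients/csv_file",
                content -> seen.add("csv:" + content));
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), dispatcher);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }
}
```</filter><new_recipients><csv_file>Corso2012_SMS.csv</csv_file></new_recipients></recipients></sms></campaign></labs>");
if (!java.util.Arrays.asList("test:1538", "csv:Corso2012_SMS.csv").equals(seen)) throw new AssertionError(seen);
</test>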

Don't forget StAX. It probably won't make the Visitor pattern any easier, but if your documents are relatively simple and you're already planning on streaming them, it does have a simpler programming model than SAX. You just iterate over the events in the parsed stream, one at a time, ignoring or acting on them as you choose.
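As a rough illustration of that pull model, the sketch below uses the standard javax.xml.stream API to collect the text of every element with a given name. The class name StaxDemo and the leafTexts helper are made up for the example, and getElementText() assumes the matched elements contain only text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxDemo {

    /** Returns the trimmed text of every element named elementName, in document order. */
    public static List<String> leafTexts(String xml, String elementName) {
        List<String> out = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                // The application pulls the next event instead of being called back
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && elementName.equals(reader.getLocalName())) {
                    // Reads up to the matching end tag; only valid for text-only elements
                    out.add(reader.getElementText().trim());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<labs><new_recipients><csv_file>Corso2012_SMS.csv</csv_file>"
                + "</new_recipients></labs>";
        System.out.println(leafTexts(xml, "csv_file"));
    }
}
```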

How to get an attribute from an XMLReader

I have some HTML that I'm converting to a Spanned using Html.fromHtml(...), and I have a custom tag that I'm using in it:
<customtag id="1234">
So I've implemented a TagHandler to handle this custom tag, like so:
public void handleTag( boolean opening, String tag, Editable output, XMLReader xmlReader ) {
if ( tag.equalsIgnoreCase( "customtag" ) ) {
String id = xmlReader.getProperty( "id" ).toString();
}
}
In this case I get a SAX exception, as I believe the "id" field is actually an attribute, not a property. However, there isn't a getAttribute() method for XMLReader. So my question is, how do I get the value of the "id" field using this XMLReader? Thanks.

Here is my code to get the private attributes of the xmlReader by reflection:
Field elementField = xmlReader.getClass().getDeclaredField("theNewElement");
elementField.setAccessible(true);
Object element = elementField.get(xmlReader);
Field attsField = element.getClass().getDeclaredField("theAtts");
attsField.setAccessible(true);
Object atts = attsField.get(element);
Field dataField = atts.getClass().getDeclaredField("data");
dataField.setAccessible(true);
String[] data = (String[])dataField.get(atts);
Field lengthField = atts.getClass().getDeclaredField("length");
lengthField.setAccessible(true);
int len = (Integer)lengthField.get(atts);
String myAttributeA = null;
String myAttributeB = null;
for(int i = 0; i < len; i++) {
if("attrA".equals(data[i * 5 + 1])) {
myAttributeA = data[i * 5 + 4];
} else if("attrB".equals(data[i * 5 + 1])) {
myAttributeB = data[i * 5 + 4];
}
}
Note you could put the values into a map but for my usage that's too much overhead.

Based on the answer by rekire I made this slightly more robust solution that will handle any tag.
private TagHandler tagHandler = new TagHandler() {
final HashMap<String, String> attributes = new HashMap<String, String>();
private void processAttributes(final XMLReader xmlReader) {
try {
Field elementField = xmlReader.getClass().getDeclaredField("theNewElement");
elementField.setAccessible(true);
Object element = elementField.get(xmlReader);
Field attsField = element.getClass().getDeclaredField("theAtts");
attsField.setAccessible(true);
Object atts = attsField.get(element);
Field dataField = atts.getClass().getDeclaredField("data");
dataField.setAccessible(true);
String[] data = (String[])dataField.get(atts);
Field lengthField = atts.getClass().getDeclaredField("length");
lengthField.setAccessible(true);
int len = (Integer)lengthField.get(atts);
/**
* MSH: Look for supported attributes and add to hash map.
* This is as tight as things can get :)
* The data index is "just" where the keys and values are stored.
*/
for(int i = 0; i < len; i++)
attributes.put(data[i * 5 + 1], data[i * 5 + 4]);
}
catch (Exception e) {
Log.d(TAG, "Exception: " + e);
}
}
...
And inside handleTag do:
@Override
public void handleTag(boolean opening, String tag, Editable output, XMLReader xmlReader) {
processAttributes(xmlReader);
...
And then the attributes will be accessible as so:
attributes.get("my attribute name");

It is possible to use the XMLReader provided to the TagHandler and get access to tag attribute values without reflection, but that method is even less straightforward than reflection. The trick is to replace the ContentHandler used by the XMLReader with a custom object. Replacing the ContentHandler can only be done in the call to handleTag(). That presents a problem getting attribute values for the first tag, which can be solved by adding a custom tag at the start of the HTML.
import android.text.Editable;
import android.text.Html;
import android.text.Spanned;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import java.util.ArrayDeque;
public class HtmlParser implements Html.TagHandler, ContentHandler
{
public interface TagHandler
{
boolean handleTag(boolean opening, String tag, Editable output, Attributes attributes);
}
public static Spanned buildSpannedText(String html, TagHandler handler)
{
// add a tag at the start that is not handled by default,
// allowing custom tag handler to replace xmlReader contentHandler
return Html.fromHtml("<inject/>" + html, null, new HtmlParser(handler));
}
public static String getValue(Attributes attributes, String name)
{
for (int i = 0, n = attributes.getLength(); i < n; i++)
{
if (name.equals(attributes.getLocalName(i)))
return attributes.getValue(i);
}
return null;
}
private final TagHandler handler;
private ContentHandler wrapped;
private Editable text;
private ArrayDeque<Boolean> tagStatus = new ArrayDeque<>();
private HtmlParser(TagHandler handler)
{
this.handler = handler;
}
@Override
public void handleTag(boolean opening, String tag, Editable output, XMLReader xmlReader)
{
if (wrapped == null)
{
// record result object
text = output;
// record current content handler
wrapped = xmlReader.getContentHandler();
// replace content handler with our own, which forwards calls to the original when needed
xmlReader.setContentHandler(this);
// handle endElement() callback for <inject/> tag
tagStatus.addLast(Boolean.FALSE);
}
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException
{
boolean isHandled = handler.handleTag(true, localName, text, attributes);
tagStatus.addLast(isHandled);
if (!isHandled)
wrapped.startElement(uri, localName, qName, attributes);
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException
{
if (!tagStatus.removeLast())
wrapped.endElement(uri, localName, qName);
handler.handleTag(false, localName, text, null);
}
@Override
public void setDocumentLocator(Locator locator)
{
wrapped.setDocumentLocator(locator);
}
@Override
public void startDocument() throws SAXException
{
wrapped.startDocument();
}
@Override
public void endDocument() throws SAXException
{
wrapped.endDocument();
}
@Override
public void startPrefixMapping(String prefix, String uri) throws SAXException
{
wrapped.startPrefixMapping(prefix, uri);
}
@Override
public void endPrefixMapping(String prefix) throws SAXException
{
wrapped.endPrefixMapping(prefix);
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException
{
wrapped.characters(ch, start, length);
}
@Override
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
{
wrapped.ignorableWhitespace(ch, start, length);
}
@Override
public void processingInstruction(String target, String data) throws SAXException
{
wrapped.processingInstruction(target, data);
}
@Override
public void skippedEntity(String name) throws SAXException
{
wrapped.skippedEntity(name);
}
}
With this class reading attributes is easy:
HtmlParser.buildSpannedText("<x id=1 value=a>test<x id=2 value=b>", new HtmlParser.TagHandler()
{
@Override
public boolean handleTag(boolean opening, String tag, Editable output, Attributes attributes)
{
if (opening && tag.equals("x"))
{
String id = HtmlParser.getValue(attributes, "id");
String value = HtmlParser.getValue(attributes, "value");
}
return false;
}
});
This approach has the advantage that it allows you to disable processing of some tags while using default processing for others, e.g. you can make sure that ImageSpan objects are not created:
Spanned result = HtmlParser.buildSpannedText("<b><img src=nothing>test</b><img src=zilch>",
new HtmlParser.TagHandler()
{
@Override
public boolean handleTag(boolean opening, String tag, Editable output, Attributes attributes)
{
// return true here to indicate that this tag was handled and
// should not be processed further
return tag.equals("img");
}
});

There's an alternative to the other solutions, that doesn't allow you to use custom tags, but has the same effect:
<string name="foobar">blah <annotation customTag="1234">inside blah</annotation> more blah</string>
Then read it like this:
CharSequence annotatedText = context.getText(R.string.foobar);
// wrap, because getText returns a SpannedString, which is not mutable
CharSequence processedText = replaceCustomTags(new SpannableStringBuilder(annotatedText));
public static <T extends Spannable> T replaceCustomTags(T text) {
Annotation[] annotations = text.getSpans(0, text.length(), Annotation.class);
for (Annotation a : annotations) {
String attrName = a.getKey();
if ("customTag".equals(attrName)) {
String attrValue = a.getValue();
int contentStart = text.getSpanStart(a);
int contentEnd = text.getSpanEnd(a);
int contentFlags = text.getSpanFlags(a);
Object newFormat1 = new StyleSpan(Typeface.BOLD);
Object newFormat2 = new ForegroundColorSpan(Color.RED);
text.setSpan(newFormat1, contentStart, contentEnd, contentFlags);
text.setSpan(newFormat2, contentStart, contentEnd, contentFlags);
text.removeSpan(a);
}
}
return text;
}
Depending on what you wanted to do with your custom tags, the above may help you. If you just want to read them, you don't need a SpannableStringBuilder, just cast getText to Spanned interface to investigate.
Note that Annotation representing <annotation foo="bar">...</annotation> is an Android built-in since API level 1! It's one of those hidden gems again. It has the limitation of one attribute per <annotation> tag, but nothing prevents you from nesting multiple annotations to achieve multiple attributes:
<string name="gold_admin_user"><annotation user="admin"><annotation rank="gold">$$username$$</annotation></annotation></string>
If you use the Editable interface instead of Spannable you can also modify the content around each annotation. For example changing the above code:
String attrValue = a.getValue();
text.insert(text.getSpanStart(a), attrValue);
text.insert(text.getSpanStart(a) + attrValue.length(), " ");
int contentStart = text.getSpanStart(a);
will result as if you had this in the XML:
blah <b><font color="#ff0000">1234 inside blah</font></b> more blah
One caveat to look out for is when you make modifications that affect the length of the text, the spans move around. Make sure you read the span start/end indices at the correct times, best if you inline them to the method calls.
Editable also allows you to do simple search and replace substitution:
index = TextUtils.indexOf(text, needle); // for example $$username$$ above
text.replace(index, index + needle.length(), replacement);

If all you need is just one attribute, the suggestion by vorrtex is actually pretty solid. To give you an example of just how simple it would be to handle, have a look here:
<xml>Click on <user1>Johnni</user1> or <user2>Jenny</user2> to see...</xml>
And in your custom TagHandler you don't use equals but indexOf:
final static String USER = "user";
if(tag.indexOf(USER) == 0) {
// Extract tag postfix.
String postfix = tag.substring(USER.length());
Log.d(TAG, "postfix: " + postfix);
}
And you can then pass the postfix value in your onClick view parameter as a tag to keep it generic.
