I'm using a SAX parser to parse XML and it is working fine.
I have the following tag in my XML:
<value>•CERTASS >> Certass</value>
Here I expect '•CERTASS >> Certass' as output, but the code below returns only 'Certass'. Is there an issue with the special characters in the value tag?
public void characters(char[] buffer, int start, int length) {
temp = new String(buffer, start, length);
}
It is not guaranteed that the characters() method will run only once inside an element.
If you are storing the content in a String, and the characters() method happens to run twice, you will only get the content from the second run. The second time that the characters method runs it will overwrite the contents of your temp variable that was stored from the first time.
To remedy this, use a StringBuilder and append() the contents in characters() and then process the contents in endElement(). For example:
DefaultHandler handler = new DefaultHandler() {
private StringBuilder stringBuilder;
@Override
public void startElement(String uri, String localName,String qName, Attributes attributes) throws SAXException {
stringBuilder = new StringBuilder();
}
public void characters(char[] buffer, int start, int length) {
stringBuilder.append(new String(buffer, start, length));
}
public void endElement(String uri, String localName, String qName) throws SAXException {
System.out.println(stringBuilder.toString());
}
};
Parsing the string "<value>•CERTASS >> Certass</value>" with the handler above gives the output below (the ? in place of • is presumably just the console failing to display that character):
?CERTASS >> Certass
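For reference, here is a self-contained version of the same accumulate-then-read approach that can be run as is. It is a sketch: the class name ValueTextDemo and the parseValue helper are made up for illustration, and the buffer is cleared in startElement rather than replaced with a new one.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: returns the complete text of the last element that ended.
class ValueTextDemo {
    static String parseValue(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                text.setLength(0); // fresh buffer for every element
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // append: may be called several times per element
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                result[0] = text.toString(); // the text is only complete here
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        // The escape below is the bullet character, so the source file encoding does not matter.
        System.out.println(parseValue("<value>\u2022CERTASS >> Certass</value>"));
    }
}
```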
I hope this helps.
I ran into this problem the other day. It turns out the reason is that the characters() method is called multiple times when the value contains any of these entity references:
&quot; (")
&apos; (')
&lt; (<)
&gt; (>)
&amp; (&)
Also be careful about line breaks within the value!
If the XML is line-wrapped outside your control, the characters method will also be called for each line of the value, and it will include the line break itself (which you then need to strip out manually).
A sample handler taking care of all these problems is this one:
DefaultHandler handler = new DefaultHandler() {
    private boolean isInMyTag = false;
    private String localname;
    private StringBuilder elementContent;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if (qName.equalsIgnoreCase("myfield")) {
            isInMyTag = true;
            this.localname = localName;
            this.elementContent = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] buffer, int start, int length) {
        if (isInMyTag) {
            String content = new String(buffer, start, length);
            // startsWith avoids the Apache Commons StringUtils dependency and empty-chunk errors
            if (content.startsWith("\n")) {
                // remove leading newline
                elementContent.append(content.substring(1));
            } else {
                elementContent.append(content);
            }
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equalsIgnoreCase("myfield")) {
            isInMyTag = false;
            // do something with elementContent.toString()
            System.out.println(elementContent.toString());
            this.localname = "";
        }
    }
};
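A variation worth considering, assuming leading and trailing whitespace in the field is never significant: keep every chunk, line breaks included, and trim once in endElement instead of inspecting each chunk, which depends on how the parser happens to split the text. MyFieldCollector and its parse helper are hypothetical names; "myfield" is the element name from the handler above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of <myfield>, however many chunks it arrives in.
class MyFieldCollector extends DefaultHandler {
    private final StringBuilder elementContent = new StringBuilder();
    private boolean inMyField = false;
    private String lastValue;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equalsIgnoreCase("myfield")) {
            inMyField = true;
            elementContent.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inMyField) {
            elementContent.append(ch, start, length); // keep every chunk, line breaks included
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("myfield")) {
            inMyField = false;
            lastValue = elementContent.toString().trim(); // strip surrounding whitespace once
        }
    }

    static String parse(String xml) throws Exception {
        MyFieldCollector collector = new MyFieldCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), collector);
        return collector.lastValue;
    }
}
```

Line breaks inside the value are kept; drop the trim() if surrounding whitespace matters in your data.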
Related
I am trying to update an XML document that was read from the database and saved as an XMLType. I then parse it with SAX, saving in variables all the information I need to construct further queries to the database. Based on the values I've read, I check some conditions, and then I want to update the values of 3 nodes. How can I update those values? Below is the code I use to parse the document, but I have no idea how to update an XML file in Java using SAX.
public void parseXML(int i) throws XMLParseException, SAXException, IOException, SQLException {
String xml = printXML(i);
saxParser.parse(new InputSource(new StringReader(xml)), handler);
}
And in the handler I have various conditions to save the things I'm interested in, like:
public class UserHandler extends DefaultHandler {
StringBuilder builder = new StringBuilder();
private Data data = new Data();
boolean idOrder = false;
boolean idReader = false;
@Override
public void startElement(String uri,
String localName, String qName, Attributes attributes)
throws SAXException {
if (qName.equalsIgnoreCase("order")) {
data.setIdOrder(attributes.getValue("ID_ORDER"));
} else if (qName.equalsIgnoreCase("id_reader")) {
idReader = true;
}
builder.setLength(0);
}
@Override
public void endElement(String uri,
String localName, String qName) throws SAXException {
if (qName.equalsIgnoreCase("id_reader")) {
data.setIdReader(builder.toString());
}
}
@Override
public void characters(char ch[],
int start, int length) throws SAXException {
if (idReader) {
builder.append(new String(ch, start, length));
}
}
}
Please give me some hints.
I would appreciate any help on this.
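SAX only reads; it cannot change a document in place. One common workaround is to put an XMLFilterImpl between the parser and an identity Transformer: the filter passes every event through unchanged except the text of the element you want to update, and the transformer serializes the result back to a string, which you could then store again. Below is a minimal sketch under those assumptions: ReplaceTextFilter, update() and the replacement value are hypothetical, only id_reader comes from your handler, the target element is assumed to hold only text, and writing the string back to the XMLType column is not shown. For three nodes, run it once per node or extend the filter with a map of element names to new values.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: passes all SAX events through, but replaces the text of one element.
class ReplaceTextFilter extends XMLFilterImpl {
    private final String target;      // element to update, e.g. "id_reader"
    private final String replacement; // new text content for that element
    private boolean inTarget = false;

    ReplaceTextFilter(XMLReader parent, String target, String replacement) {
        super(parent);
        this.target = target;
        this.replacement = replacement;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, localName, qName, atts);
        if (qName.equalsIgnoreCase(target)) {
            inTarget = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (!inTarget) {
            super.characters(ch, start, length); // only the target's old text is dropped
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equalsIgnoreCase(target)) {
            super.characters(replacement.toCharArray(), 0, replacement.length());
            inTarget = false;
        }
        super.endElement(uri, localName, qName);
    }

    // Parses xml, rewrites the target's text and serializes the events with an identity Transformer.
    static String update(String xml, String target, String replacement) throws Exception {
        XMLReader parser = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        ReplaceTextFilter filter = new ReplaceTextFilter(parser, target, replacement);
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new SAXSource(filter, new InputSource(new StringReader(xml))),
                new StreamResult(out));
        return out.toString();
    }
}
```

If the documents are small, loading them into a DOM, changing the nodes there and serializing may be simpler than filtering SAX events.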
This is the first handler I have written.
I have a REST web service that returns XML of links. It has quite a simple structure and is not deeply nested.
I wrote a handler for this:
public class SAXHandlerLnk extends DefaultHandler {
public List<Link> lnkList = new ArrayList();
Link lnk = null;
private StringBuilder content = new StringBuilder();
@Override
//Triggered when the start of tag is found.
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equals("link")) {
lnk = new Link();
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equals("link")) {
lnkList.add(lnk);
}
else if (qName.equals("applicationCode")) {
lnk.applicationCode = content.toString();
}
else if (qName.equals("moduleCode")) {
lnk.moduleCode = content.toString();
}
else if (qName.equals("linkCode")) {
lnk.linkCode = content.toString();
}
else if (qName.equals("languageCode")) {
lnk.languageCode = content.toString();
}
else if (qName.equals("value")) {
lnk.value = content.toString();
}
else if (qName.equals("illustrationUrl")) {
lnk.illustrationUrl = content.toString();
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
content.append(ch, start, length);
}
}
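Some elements in the returned XML can be empty, e.g. a self-closing tag or an opening tag immediately followed by its closing tag. When this happens, my handler unfortunately assigns the previous value to the Link object. So when illustrationUrl is empty in the XML, lnk.illustrationUrl = content.toString(); gives me the same value as lnk.value.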
Link{applicationCode='onedownload', moduleCode='onedownload',...}
In the above example, I would like moduleCode to be empty or null, because in XML it is an empty tag.
Here is the calling class:
public class XMLRepositoryRestLinksFilterSAXParser {
public static void main(String[] args) throws Exception {
SAXParserFactory parserFactor = SAXParserFactory.newInstance();
SAXParser parser = parserFactor.newSAXParser();
SAXHandlerLnk handler = new SAXHandlerLnk();
parser.parse({URL}, handler);
for ( Link lnk : handler.lnkList){
System.out.println(lnk);
}
}
}
As stated in my comment, you would do the following. The callbacks are usually called in the order startElement, characters, (nested?), characters, endElement, where (nested?) represents an optional repeat of the entire sequence. Clearing the buffer in startElement means an empty element ends up with an empty string instead of the previous element's text:
@Override
//Triggered when the start of tag is found.
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
    content.setLength(0); // discard the text collected for the previous element
    if (qName.equals("link")) {
        lnk = new Link();
    }
}
Note that characters may be called multiple times for a single XML element, so the text must be accumulated rather than overwritten. A StringBuilder that you append to does that, provided it is cleared at the start of each element. See this answer for an example.
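To see the effect of the reset, here is a small self-contained handler. It is a sketch that stores each element's text in a Map instead of your Link class (LinkFieldsDemo is a hypothetical name), and it shows that an empty moduleCode element now yields an empty string rather than the previous element's text.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo handler: collects element name -> text, using a Map instead of the Link class.
class LinkFieldsDemo extends DefaultHandler {
    final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder content = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        content.setLength(0); // forget the text of the previous element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length); // may run several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!qName.equals("link")) {
            fields.put(qName, content.toString()); // "" for an empty element
        }
    }

    static Map<String, String> parse(String xml) throws Exception {
        LinkFieldsDemo handler = new LinkFieldsDemo();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.fields;
    }
}
```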
<Details><propname key="workorderid">799</propname>
How do I get 799 for workorderid using SAX parsing?
When I use this code I get "workorderid", but not the value of workorderid:
if(localName.equals("propname")){
String workid = attributes.getValue("key");
}
if(localName.equals("propname")){
//set a flag here, and in endElement() read the text collected for your localName (propname)
String workid = attributes.getValue("key");
}
Here is some code; try to understand it and customize it for your needs.
public class ExampleHandler extends DefaultHandler {
    private String item;     // text of <propname>, e.g. 799
    private String key;      // its key attribute, e.g. workorderid
    private boolean inItem = false;
    private StringBuilder content;

    public ExampleHandler() {
        content = new StringBuilder();
    }

    // localName is only filled in by a namespace-aware parser; use qName otherwise
    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) throws SAXException {
        content = new StringBuilder();
        if (localName.equalsIgnoreCase("propname")) {
            inItem = true;
            key = atts.getValue("key"); // attributes are only available here
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        if (localName.equalsIgnoreCase("propname") && inItem) {
            item = content.toString();
            inItem = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {
        content.append(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        // you can do something here, for example send
        // the collected key and item somewhere.
    }
}
There may be mistakes somewhere; I'm in a hurry. If it helps, I'd appreciate it.
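The following will hold the value of the node (it keeps only the last chunk if characters() is called more than once for the element):
public void characters(char[] ch, int start, int length) throws SAXException {
tempVal = new String(ch,start,length);
}
In endElement() you can then read it. The key attribute, however, is only available in startElement(), because that is the only callback that receives the Attributes:
// in endElement():
if(qName.equals("propname")) {
System.out.println(" node value " + tempVal); // node value, e.g. 799
}
// in startElement():
if(qName.equals("propname")) {
String attr = attributes.getValue("key"); // attribute value of the propname node, e.g. workorderid
}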
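Putting the two halves together, a complete handler could look like this. It is a sketch: PropnameHandler and parse() are hypothetical names, the key/value pairs go into a Map, and it matches on qName because the default SAXParserFactory is not namespace-aware, so localName may be empty.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: maps each <propname key="..."> to its text content.
class PropnameHandler extends DefaultHandler {
    final Map<String, String> props = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private String currentKey; // key of the open <propname>, null when outside one

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("propname")) {
            currentKey = attributes.getValue("key"); // attributes are only available here
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (currentKey != null) {
            text.append(ch, start, length); // accumulate: may be called more than once
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("propname") && currentKey != null) {
            props.put(currentKey, text.toString()); // e.g. workorderid -> 799
            currentKey = null;
        }
    }

    static Map<String, String> parse(String xml) throws Exception {
        PropnameHandler handler = new PropnameHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.props;
    }
}
```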
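In propname, the attribute key has the value workorderid, so what you are getting is correct.
What you need is the text content of propname. Using DOM instead of SAX, you can get it like this:
// tagName is "propname"; ele is the element that contains it (e.g. <Details>)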
NodeList nl = ele.getElementsByTagName(tagName);
if(nl != null && nl.getLength() > 0) {
Element el = (Element)nl.item(0);
textVal = el.getFirstChild().getNodeValue();
}
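A self-contained version of the DOM approach, as a sketch (PropnameDom and valueForKey are hypothetical names). It also checks the key attribute, and it uses getTextContent(), which returns an empty string for an empty element, where getFirstChild() would return null.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PropnameDom {
    // Returns the text of the first <propname> whose key attribute equals key, or null if none matches.
    static String valueForKey(String xml, String key) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nl = doc.getElementsByTagName("propname");
        for (int i = 0; i < nl.getLength(); i++) {
            Element el = (Element) nl.item(i);
            if (key.equals(el.getAttribute("key"))) {
                return el.getTextContent(); // "" rather than a NullPointerException for empty elements
            }
        }
        return null;
    }
}
```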
I have an XML document that has HTML tags included:
<chapter>
<h1>title of content</h1>
<p> my paragraph ... </p>
</chapter>
I need to get the content of the <chapter> tag, so my output should be:
<h1>title of content</h1>
<p> my paragraph ... </p>
My question is similar to this post: How parse XML to get one tag and save another tag inside
But I need to implement it in Java, using SAX, DOM, or something else.
I found a solution using SAX in this post: SAX Parser : Retrieving HTML tags from XML, but it's very buggy and doesn't work with large amounts of XML data.
Update:
My SAX implementation:
In some situations it throws an exception: java.lang.StringIndexOutOfBoundsException: String index out of range: -4029
public class MyXMLHandler extends DefaultHandler {
private boolean tagFlag = false;
private char[] temp;
String insideTag;
private int startPosition;
private int endPosition;
private String tag;
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase(tag)) {
tagFlag = true;
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (qName.equalsIgnoreCase(tag)) {
insideTag = new String(temp, startPosition, endPosition - startPosition);
tagFlag = false;
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
temp = ch;
if (tagFlag) {
startPosition = start;
tagFlag = false;
}
endPosition = start + length;
}
public String getInsideTag(String tag) {
this.tag = tag;
return insideTag;
}
}
Update 2: (Using StringBuilder)
I now accumulate the characters in a StringBuilder:
public class MyXMLHandler extends DefaultHandler {
private boolean tagFlag = false;
private char[] temp;
String insideTag;
private String tag;
private StringBuilder builder;
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase(tag)) {
builder = new StringBuilder();
tagFlag = true;
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (qName.equalsIgnoreCase(tag)) {
insideTag = builder.toString();
tagFlag = false;
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
if (tagFlag) {
builder.append(ch, start, length);
}
}
public String getInsideTag(String tag) {
this.tag = tag;
return insideTag;
}
}
But builder.append(ch, start, length); doesn't append start tags like <EmbeddedTag atr="..."> or end tags like </EmbeddedTag> to the buffer. This code prints:
title of content
my paragraph ...
Instead of expected output:
<h1>title of content</h1>
<p> my paragraph ... </p>
Update 3:
Finally I have implemented the parser handler:
public class MyXMLHandler extends DefaultHandler {
private boolean tagFlag = false;
private String insideTag;
private String tag;
private StringBuilder builder;
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase(tag)) {
builder = new StringBuilder();
tagFlag = true;
}
if (tagFlag) {
builder.append("<" + qName);
for (int i = 0; i < attributes.getLength(); i++) {
builder.append(" " + attributes.getLocalName(i) + "=\"" +
attributes.getValue(i) + "\"");
}
builder.append(">");
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (tagFlag) {
builder.append("</" + qName + ">");
}
if (qName.equalsIgnoreCase(tag)) {
insideTag = builder.toString();
tagFlag = false;
}
System.out.println("End Element :" + qName);
}
public void characters(char ch[], int start, int length)
throws SAXException {
if (tagFlag) {
builder.append(ch, start, length);
}
}
public String getInsideTag(String tag) {
this.tag = tag;
return insideTag;
}
}
The problem with your code is that you try to remember the start and end positions of the string passed to you via the characters method. What you see in the exception thrown is the result of an inside tag that starts near the end of a character buffer and ends near the beginning of the next character buffer.
With SAX you need to copy the characters when they are offered; otherwise the temporary buffer they occupy may already have been reused by the time you need them.
Your best bet is not to remember positions in the buffers, but to create a new StringBuilder in startElement, append the characters to it, and then get the complete string out of the builder in endElement.
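A sketch of that idea applied to the inner markup, assuming you want everything between <chapter> and </chapter> but not the wrapper itself. Because characters() delivers decoded text (&amp; arrives as &), the sketch re-escapes text and attribute values so the result stays well-formed; comments are dropped and CDATA sections come out as escaped text. InnerXmlHandler and innerXml() are hypothetical names.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class InnerXmlHandler extends DefaultHandler {
    private final String tag;                          // element whose inner markup is wanted
    private final StringBuilder builder = new StringBuilder();
    private int depth = 0;                             // > 0 while inside the target element
    private String insideTag;

    InnerXmlHandler(String tag) {
        this.tag = tag;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (depth > 0) {
            // Re-create the start tag of a nested element, e.g. <h1> or <p class="...">
            builder.append('<').append(qName);
            for (int i = 0; i < attributes.getLength(); i++) {
                builder.append(' ').append(attributes.getQName(i)).append("=\"");
                escape(attributes.getValue(i), true);
                builder.append('"');
            }
            builder.append('>');
            depth++;
        } else if (qName.equalsIgnoreCase(tag)) {
            builder.setLength(0); // entering the target: start collecting
            depth = 1;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (depth > 0) {
            escape(new String(ch, start, length), false); // copy now; the parser reuses ch
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == 0) {
            return;
        }
        depth--;
        if (depth == 0) {
            insideTag = builder.toString(); // the target's own end tag is not copied
        } else {
            builder.append("</").append(qName).append('>');
        }
    }

    // Escapes &, < and > (and " inside attribute values) so the output stays well-formed.
    private void escape(String s, boolean inAttribute) {
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': builder.append("&amp;"); break;
                case '<': builder.append("&lt;"); break;
                case '>': builder.append("&gt;"); break;
                case '"': builder.append(inAttribute ? "&quot;" : "\""); break;
                default: builder.append(c);
            }
        }
    }

    static String innerXml(String xml, String tag) throws Exception {
        InnerXmlHandler handler = new InnerXmlHandler(tag);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.insideTag;
    }
}
```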
Try Apache Commons Digester. I used it years ago (version 1.5), and it was simple to create a mapping for XML like yours. There is a simple article on how to use Digester, but it is for version 1.5; I think the current version is 3.0, and it contains a lot of new features.
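If DOM is an option, another approach is to let an identity Transformer serialize each child node of <chapter>, which takes care of escaping for you. This is a sketch (InnerXmlDom is a hypothetical name), and it loads the whole document into memory, so it may not suit very large files.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InnerXmlDom {
    // Serializes every child node of the first <tag> element; returns null if there is none.
    static String innerXml(String xml, String tag) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node target = doc.getElementsByTagName(tag).item(0);
        if (target == null) {
            return null;
        }
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        NodeList children = target.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            // Each child (element or text) is written out with its markup
            identity.transform(new DOMSource(children.item(i)), new StreamResult(out));
        }
        return out.toString();
    }
}
```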
So I am currently using SAX to try to extract some information from a number of XML documents I am working with. So far, it is really easy to extract the attribute values. However, I have no clue how to go about extracting the actual values from a text node.
For example, in the given XML document:
<w:rStyle w:val="Highlight" />
</w:rPr>
</w:pPr>
<w:r>
<w:t>Text to Extract</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00B41602" w:rsidRDefault="00B41602" w:rsidP="007C3A42">
<w:pPr>
<w:pStyle w:val="Copy" />
I can extract "Highlight" without a problem by getting the value of w:val. But I have no idea how to get into that text node and get out "Text to Extract".
Here is my Java code thus far to pull out the attribute values...
private static final class SaxHandler extends DefaultHandler
{
// invoked when document-parsing is started:
public void startDocument() throws SAXException
{
System.out.println("Document processing starting:");
}
// notifies about finish of parsing:
public void endDocument() throws SAXException
{
System.out.println("Document processing finished. \n");
}
// we enter to element 'qName':
public void startElement(String uri, String localName,
String qName, Attributes attrs) throws SAXException
{
if(qName.equalsIgnoreCase("Relationships"))
{
// do nothing
}
else if(qName.equalsIgnoreCase("Relationship"))
{
// goes into the element and if the attribute is equal to "Target"...
String val = attrs.getValue("Target");
// ...and the value is not null
if(val != null)
{
// ...and if the value contains "image" in it...
if (val.contains("image"))
{
// ...then get the id value
String id = attrs.getValue("Id");
// ...and use the substring method to isolate and print out only the image & number
int begIndex = val.lastIndexOf("/");
int endIndex = val.lastIndexOf(".");
System.out.println("Id: " + id + " & Target: " + val.substring(begIndex+1, endIndex));
}
}
}
else
{
throw new IllegalArgumentException("Element '" +
qName + "' is not allowed here");
}
}
// we leave element 'qName' without any actions:
public void endElement(String uri, String localName, String qName) throws SAXException
{
// do nothing;
}
}
But I have no clue where to start to get into that text node and pull out the values inside. Anyone have some ideas?
Here's a sketch of the handler members you need; merge the startElement and endElement logic into your existing methods (and note that your else branch, which throws IllegalArgumentException, would reject w:t):
private boolean insideElementContainingTextNode;
private StringBuilder textBuilder;

@Override
public void startElement(String uri, String localName, String qName, Attributes attrs) {
    // qName is "w:t" with a non-namespace-aware parser (the default);
    // a namespace-aware parser reports localName "t" plus the namespace URI
    if ("w:t".equals(qName)) {
        insideElementContainingTextNode = true;
        textBuilder = new StringBuilder();
    }
}

@Override
public void characters(char[] ch, int start, int length) {
    if (insideElementContainingTextNode) {
        textBuilder.append(ch, start, length);
    }
}

@Override
public void endElement(String uri, String localName, String qName) {
    if ("w:t".equals(qName)) {
        insideElementContainingTextNode = false;
        String theCompleteText = this.textBuilder.toString(); // use the text here
        this.textBuilder = null;
    }
}
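A runnable variant of the above, as a sketch: it enables namespace awareness and matches on the namespace URI plus localName "t" instead of the prefixed qName, assuming the standard WordprocessingML namespace URI. WordTextExtractor and extract() are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text of every <w:t> element, in document order.
class WordTextExtractor extends DefaultHandler {
    static final String W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";
    final List<String> texts = new ArrayList<>();
    private StringBuilder textBuilder; // non-null only while inside <w:t>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        if (W_NS.equals(uri) && "t".equals(localName)) {
            textBuilder = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (textBuilder != null) {
            textBuilder.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (W_NS.equals(uri) && "t".equals(localName)) {
            texts.add(textBuilder.toString());
            textBuilder = null;
        }
    }

    static List<String> extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true); // needed for uri and localName to be filled in
        WordTextExtractor handler = new WordTextExtractor();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.texts;
    }
}
```

Matching on the namespace URI keeps working even if a document binds that namespace to a prefix other than w.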