Java SAXParser trims the string after & - java

I want to parse this xml:
<sparql xmlns="http://www.w3.org/2005/sparql-results#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.w3.org/2001/sw/DataAccess/rf1/result2.xsd">
<head>
<variable name="uri"/>
<variable name="id"/>
<variable name="label"/>
</head>
<results distinct="false" ordered="true">
<result>
<binding name="uri"><uri>http://dbpedia.org/resource/Davis_&amp;_Weight_Motorsports</uri></binding>
<binding name="label"><literal xml:lang="en">Davis &amp; Weight Motorsports</literal></binding>
<binding name="id"><literal datatype="http://www.w3.org/2001/XMLSchema#integer">5918444</literal></binding>
<binding name="label"><literal xml:lang="en">Davis &amp; Weight Motorsports</literal></binding>
</result></results></sparql>
This is my handler:
public class DBpediaLookupClient extends DefaultHandler{
public DBpediaLookupClient(String query) throws Exception {
this.query = query;
HttpMethod method = new GetMethod("some_uri&query=" + query2);
try {
client.executeMethod(method);
InputStream ins = method.getResponseBodyAsStream();
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser sax = factory.newSAXParser();
sax.parse(ins, this);
} catch (HttpException he) {
System.err.println("Http error connecting to lookup.dbpedia.org");
} catch (IOException ioe) {
System.err.println("Unable to connect to lookup.dbpedia.org");
}
method.releaseConnection();
}
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("td") || qName.equalsIgnoreCase("uri") || qName.equalsIgnoreCase("literal")) {
tempBinding = new HashMap<String, String>();
}
lastElementName = qName;
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equalsIgnoreCase("uri") || qName.equalsIgnoreCase("literal") || qName.equalsIgnoreCase("td")) {
if (!variableBindings.contains(tempBinding))
variableBindings.add(tempBinding);
}
}
public void characters(char[] ch, int start, int length) throws SAXException {
String s = new String(ch, start, length).trim();
if (s.length() > 0) {
if ("td".equals(lastElementName)) {
if (tempBinding.get("td") == null) {
tempBinding.put("td", s);
}
}
else if ("uri".equals(lastElementName)) {
if (tempBinding.get("uri") == null) {
tempBinding.put("uri", s);
}
}
else if ("literal".equals(lastElementName)) {
if (tempBinding.get("literal") == null) {
tempBinding.put("literal", s);
}
}
//if ("URI".equals(lastElementName)) tempBinding.put("URI", s);
if ("URI".equals(lastElementName) && s.indexOf("Category")==-1 && tempBinding.get("URI") == null) {
tempBinding.put("URI", s);
}
if ("Label".equals(lastElementName)) tempBinding.put("Label", s);
}
}
}
And this is the result:
key: uri, value: http://dbpedia.org/resource/Davis_
key: literal, value: 5918444
key: literal, value: Davis
As you can see, the value gets cut off at the &.
When I trace through the characters() function, I see that the length is wrong: it only runs up to the & instead of to the end of the string that I want as the result.
I copied this part of the code and I don't know much about parsers and handlers; I only know what I learned from tracing the code. Everywhere I searched, it was said that there should be &amp; instead of & in an XML document, which is the case here.
What should I change in this code to get the complete string, rather than one truncated at the & character?

This is a lesson everyone has to learn when using SAX: the parser can break up text nodes and report the content in multiple calls to characters(), and it's the application's job to reassemble it (e.g. by using a StringBuilder). It's very common for parsers to break the text at any point where it would otherwise have to shunt characters around in memory, e.g. where entity references occur or where it hits an I/O buffer boundary.
It was designed this way to make SAX parsers super-efficient by minimizing text copying, but I suspect there's no real benefit, because the text copying just has to be done by the application instead.
Don't try to second-guess the parser as @DavidWallace suggests. The parser is allowed to break the text up any way it likes, and your application should cater for that.
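As a minimal sketch of the reassembly described above (not the asker's actual handler): the class name TextCollectingHandler and its parse helper are made up for illustration, and it assumes only the uri and literal elements are of interest. The buffer is cleared when an element starts, appended to on every characters() call, and read only in endElement.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of every <uri> and <literal> element.
class TextCollectingHandler extends DefaultHandler {
    private final List<String> values = new ArrayList<>();
    private final StringBuilder buffer = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        buffer.setLength(0); // start a fresh buffer for each element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times for one text node (e.g. around &amp;), so only append here.
        buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // The text is complete only once the end tag has been reached.
        if ("uri".equals(qName) || "literal".equals(qName)) {
            values.add(buffer.toString().trim());
        }
    }

    // Convenience helper for the example; wraps checked exceptions.
    static List<String> parse(String xml) {
        try {
            TextCollectingHandler handler = new TextCollectingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<result>"
                + "<binding name=\"uri\"><uri>http://dbpedia.org/resource/Davis_&amp;_Weight_Motorsports</uri></binding>"
                + "<binding name=\"label\"><literal>Davis &amp; Weight Motorsports</literal></binding>"
                + "</result>";
        System.out.println(parse(xml));
    }
}
```

Trimming happens once in endElement, on the reassembled string, so whitespace next to the & inside the value is preserved.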

Related

SAXParseException; systemId: cumulative size of entities exceeds bound

Morning,
I have to parse a huge XML file (2GB) in Java. It has many tags, but I only need to write the content of two tags, <title> and <subtext>, to a common file each time, so I use SAXParser.
So far, I have managed to write 1M95 text in the output file, but then this exception occurs:
org.xml.sax.SAXParseException; systemId: filePath; lineNumber: x; columnNumber: y; JAXP00010004 : La taille cumulée des entités est "50 000 001" et dépasse la limite de "50 000 000" définie par "FEATURE_SECURE_PROCESSING".
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:203)
at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:177)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:400)
at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:327)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.reportFatalError(XMLScanner.java:1465)
at com.sun.org.apache.xerces.internal.impl.XMLScanner.checkEntityLimit(XMLScanner.java:1544)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.handleCharacter(XMLDocumentFragmentScannerImpl.java:1940)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEntityReference(XMLDocumentFragmentScannerImpl.java:1866)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:3058)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:504)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:643)
at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.parse(SAXParserImpl.java:327)
at javax.xml.parsers.SAXParser.parse(SAXParser.java:328)
at Parsing.main(Class.java:38)
The translation of the exception is like:
The cumulative size of the entities is "50 000 001" which exceeds the boundary of "50 000 000" defined by "FEATURE_SECURE_PROCESSING".
This is the code I've written:
public class Parsing {
public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
try {
File inputFile = new File(System.getProperty("user.dir") + "/input.xml");
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
UserHandler userhandler = new UserHandler();
saxParser.parse(inputFile, userhandler);
} catch (Exception e) {
e.printStackTrace();
}
}
public static void doThingOne(String text, String title) throws IOException {
// Write the text and the title on a file
}
public static void doThingTwo(String text, String title) throws IOException {
//Write the text and the title on another file
}
class UserHandler extends DefaultHandler {
boolean bText = false;
boolean bTitle = false;
StringBuffer tagTextBuffer;
StringBuffer tagTitleBuffer;
String text = null;
String title = null;
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equals("title")) {
tagTitleBuffer = new StringBuffer();
bTitle = true;
} else if (qName.equalsIgnoreCase("text")) {
tagTextBuffer = new StringBuffer();
bText = true;
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equals("title")) {
bTitle = false;
title = tagTitleBuffer.toString();
} else if (qName.equals("text")) {
text = tagTextBuffer.toString();
bText = false;
if (text != null && "One".equals(title)) {
try {
Parsing.doThingOne(text, title);
} catch (IOException e) {
e.printStackTrace();
}
} else if (text != null) {
try {
Parsing.doThingTwo(text, title);
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
public void characters(char ch[], int start, int length) throws SAXException {
if (bTitle) {
tagTitleBuffer.append(new String(ch, start, length));
} else if (bText) {
tagTextBuffer.append(new String(ch, start, length));
}
}
}
Thank you for your time.
Switching off FEATURE_SECURE_PROCESSING has no effect (Java 8).
To increase the limit, use:
System.setProperty("jdk.xml.totalEntitySizeLimit", String.valueOf(Integer.MAX_VALUE));
before calling SAXParserFactory.newInstance().
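To illustrate the ordering, here is a small sketch assuming a JDK 8 or later bundled parser. The class EntityLimitDemo and its counting handler are hypothetical names used only for this example; the one real setting is the jdk.xml.totalEntitySizeLimit system property quoted above.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EntityLimitDemo {
    // Hypothetical handler that only counts the characters reported by the parser.
    static final class CountingHandler extends DefaultHandler {
        long count;

        @Override
        public void characters(char[] ch, int start, int length) {
            count += length;
        }
    }

    static long countCharacters(String xml) {
        // Raise the cumulative entity size limit BEFORE the factory is created.
        System.setProperty("jdk.xml.totalEntitySizeLimit", String.valueOf(Integer.MAX_VALUE));
        try {
            CountingHandler handler = new CountingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.count;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countCharacters("<doc>Tom &amp; Jerry</doc>"));
    }
}
```

Only do this for input you trust, since the limit exists as a protection against entity-expansion attacks.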
The limit is there to prevent the "billion laughs" attack. If you trust the XML source you could switch off the SECURE_PROCESSING feature which imposes this limit.
I would generally recommend using Apache Xerces in preference to the version bundled with the JDK.
Your code for the characters() method is wrong: both the text and the title element content can be delivered split into multiple calls so you need to accumulate a buffer for both cases.
It would be nice to know something about why the entity expansion limit is being hit. Does your document include lots of entity references to tiny entities, or a few references to big ones, or what? Do the entity references occur in the parts of the document you are interested in?

FLAT XML of any type using SAX Parser in Java

I am a novice in Java, and I have written code that fails to fetch the attribute values inside a tag. For example, in the XML below, id = bk001 doesn't appear in the output:
<book id="bk001">
<author>Hightower, Kim</author>
<title>The First Book</title>
<genre>Fiction</genre>
<price>44.95</price>
<pub_date>2000-10-01</pub_date>
<date>
<auth_date>
2000-10-01
</auth_date>
<auth_date>
2000-10-05
</auth_date>
</date>
<review>An amazing story of nothing.</review>
</book>
We can expect XML of any type, we have to convert into a flat structure e.g. CSV
Code written
public class SAX
{
Map<String, String> list = new HashMap<String,String>();
public static void main(String[] args) throws IOException {
new SAX().printElementNames("input/books_1.xml");
}
public void printElementNames(String fileName) throws IOException
{
try {
SAXParserFactory parserFact = SAXParserFactory.newInstance();
SAXParser parser = parserFact.newSAXParser();
DefaultHandler handler = new DefaultHandler()
{
public void startElement(String uri, String lName, String ele, Attributes attributes) throws SAXException {
System.out.print(ele + " ");
if((attributes.getValue("TagValue"))==null)
{
return;
}
else
{
System.out.println(attributes.getValue("TagValue"));
}
}
public void characters(char ch[], int start, int length) throws SAXException {
String value = new String(ch, start, length).trim();
if(value.length() == 0) return;
System.out.println(value);
}
};
parser.parse(new File(fileName), handler);
}catch(Exception e){
e.printStackTrace();
}
}
}
Kindly help me with this. I have searched Stack Overflow but couldn't find anything concrete.
The goal is that the code should work for any valid XML.
Note - We are not allowed to use external libraries like gson etc.
The only attribute that your code is attempting to read is "TagValue", so why would you expect your code to display the value of an "id" attribute?
Replace your startElement with:
public void startElement(String uri, String localName,String qName, Attributes attributes) throws SAXException {
System.out.print(qName + " ");
for(int i=0; i<attributes.getLength();i++) {
System.out.println(attributes.getQName(i) + " " + attributes.getValue(i));
}
}
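For the "flat structure" part of the question, one possible approach is sketched below, using only the JDK. The names FlatteningHandler and flatten are hypothetical, and the path=value row format is an assumption (real CSV output would also need quoting of commas and quotes). Each attribute and each non-blank text value becomes one row keyed by its element path.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: flattens any XML document into "path=value" rows.
class FlatteningHandler extends DefaultHandler {
    private final Deque<String> path = new ArrayDeque<>();        // open elements, innermost first
    private final Deque<StringBuilder> text = new ArrayDeque<>(); // one text buffer per open element
    private final List<String> rows = new ArrayList<>();

    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        Iterator<String> it = path.descendingIterator(); // root element first
        while (it.hasNext()) {
            sb.append('/').append(it.next());
        }
        return sb.toString();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        path.push(qName);
        text.push(new StringBuilder());
        for (int i = 0; i < attributes.getLength(); i++) {
            rows.add(currentPath() + "/@" + attributes.getQName(i) + "=" + attributes.getValue(i));
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.peek().append(ch, start, length); // text may arrive in several chunks
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.pop().toString().trim();
        if (!value.isEmpty()) {
            rows.add(currentPath() + "=" + value);
        }
        path.pop();
    }

    static List<String> flatten(String xml) {
        try {
            FlatteningHandler handler = new FlatteningHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book id=\"bk001\"><author>Hightower, Kim</author><price>44.95</price></book>";
        for (String row : flatten(xml)) {
            System.out.println(row);
        }
    }
}
```

Repeated elements such as the two auth_date entries in the question produce two rows with the same path; whether to index or merge them depends on the target layout.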

Java SAX is not parsing properly

I would appreciate any help on this.
This is the first handler I have written.
I have a REST web service returning XML of links. It has quite a simple structure and is not deep.
I wrote a handler for this:
public class SAXHandlerLnk extends DefaultHandler {
public List<Link> lnkList = new ArrayList<>();
Link lnk = null;
private StringBuilder content = new StringBuilder();
@Override
//Triggered when the start of tag is found.
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equals("link")) {
lnk = new Link();
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (qName.equals("link")) {
lnkList.add(lnk);
}
else if (qName.equals("applicationCode")) {
lnk.applicationCode = content.toString();
}
else if (qName.equals("moduleCode")) {
lnk.moduleCode = content.toString();
}
else if (qName.equals("linkCode")) {
lnk.linkCode = content.toString();
}
else if (qName.equals("languageCode")) {
lnk.languageCode = content.toString();
}
else if (qName.equals("value")) {
lnk.value = content.toString();
}
else if (qName.equals("illustrationUrl")) {
lnk.illustrationUrl = content.toString();
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
content.append(ch, start, length);
}
}
Some elements in the returned XML can be empty. When this happens, my handler unfortunately adds the previous value to the lnk object. So when illustrationUrl is empty in the XML, lnk.illustrationUrl ends up equal to lnk.value.
Link{applicationCode='onedownload', moduleCode='onedownload',...}
In the above example, I would like moduleCode to be empty or null, because in XML it is an empty tag.
Here is the calling class:
public class XMLRepositoryRestLinksFilterSAXParser {
public static void main(String[] args) throws Exception {
SAXParserFactory parserFactor = SAXParserFactory.newInstance();
SAXParser parser = parserFactor.newSAXParser();
SAXHandlerLnk handler = new SAXHandlerLnk();
parser.parse({URL}, handler);
for ( Link lnk : handler.lnkList){
System.out.println(lnk);
}
}
}
Like stated in my comment, you'd do the following. The callbacks are usually called in startElement, characters, (nested?), characters, endElement order, where (nested?) represents an optional repeat of the entire sequence.
@Override
//Triggered when the start of tag is found.
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
content.setLength(0); // clear the buffer so an empty element does not inherit the previous text
if (qName.equals("link")) {
lnk = new Link();
}
}
Note that characters may be called multiple times per a single XML element in your document, so your current code might fail to capture all content. You'd be better off using a StringBuilder instead of a String object to hold your character content and append to it. See this answer for an example.
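A rough sketch of the buffer-reset idea, under the assumption that every child of link is a simple text element: the class ResettingHandler and the sample XML are invented for the example, and a map stands in for the Link class from the question.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: stores the text of each child element of <link> in a map.
class ResettingHandler extends DefaultHandler {
    private final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder content = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        content.setLength(0); // forget whatever the previous element contained
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        content.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (!"link".equals(qName)) {
            // For an empty element no characters() call happened, so this stores "".
            fields.put(qName, content.toString());
        }
    }

    static Map<String, String> parse(String xml) {
        try {
            ResettingHandler handler = new ResettingHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.fields;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("<link><value>abc</value><illustrationUrl/></link>"));
    }
}
```

Without the setLength(0) call, illustrationUrl would still hold "abc" from the preceding value element, which is the symptom described in the question.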

Better way to parse xml

I've been parsing XML like this for years, and I have to admit that when the number of different elements grows, I find it a bit boring and exhausting to do. Here is what I mean, with a sample dummy XML:
<?xml version="1.0"?>
<Order>
<Date>2003/07/04</Date>
<CustomerId>123</CustomerId>
<CustomerName>Acme Alpha</CustomerName>
<Item>
<ItemId> 987</ItemId>
<ItemName>Coupler</ItemName>
<Quantity>5</Quantity>
</Item>
<Item>
<ItemId>654</ItemId>
<ItemName>Connector</ItemName>
<Quantity unit="12">3</Quantity>
</Item>
<Item>
<ItemId>579</ItemId>
<ItemName>Clasp</ItemName>
<Quantity>1</Quantity>
</Item>
</Order>
This is the relevant part (using SAX):
public class SaxParser extends DefaultHandler {
boolean isItem = false;
boolean isOrder = false;
boolean isDate = false;
boolean isCustomerId = false;
private Order order;
private Item item;
@Override
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
if (localName.equalsIgnoreCase("ORDER")) {
order = new Order();
}
if (localName.equalsIgnoreCase("DATE")) {
isDate = true;
}
if (localName.equalsIgnoreCase("CUSTOMERID")) {
isCustomerId = true;
}
if (localName.equalsIgnoreCase("ITEM")) {
isItem = true;
}
}
public void characters(char ch[], int start, int length) throws SAXException {
if (isDate){
SimpleDateFormat formatter = new SimpleDateFormat("yyyy/MM/dd");
String value = new String(ch, start, length);
try {
order.setDate(formatter.parse(value));
} catch (ParseException e) {
e.printStackTrace();
}
}
if(isCustomerId){
order.setCustomerId(Integer.valueOf(new String(ch, start, length)));
}
if (isItem) {
item = new Item();
isItem = false;
}
}
}
I'm wondering whether there is a way to get rid of these hideous booleans, which keep growing with the number of elements. There must be a better way to parse this relatively simple XML. The number of lines of code needed for this task alone looks ugly.
Currently I'm using a SAX parser, but I'm open to other suggestions (other than DOM; I can't afford in-memory parsers because I have huge XML files).
If you control the definition of the XML, you could use an XML binding tool, for example JAXB (Java Architecture for XML Binding). In JAXB you can define a schema for the XML structure (XSD and others are supported) or annotate your Java classes in order to define the serialization rules. Once you have a clear declarative mapping between XML and Java, marshalling and unmarshalling to/from XML becomes trivial.
Using JAXB does require more memory than SAX handlers, but there exist methods to process the XML documents by parts: Dealing with large documents.
JAXB page from Oracle
Here's an example of using JAXB with StAX.
Input document:
<?xml version="1.0" encoding="UTF-8"?>
<Personlist xmlns="http://example.org">
<Person>
<Name>Name 1</Name>
<Address>
<StreetAddress>Somestreet</StreetAddress>
<PostalCode>00001</PostalCode>
<CountryName>Finland</CountryName>
</Address>
</Person>
<Person>
<Name>Name 2</Name>
<Address>
<StreetAddress>Someotherstreet</StreetAddress>
<PostalCode>43400</PostalCode>
<CountryName>Sweden</CountryName>
</Address>
</Person>
</Personlist>
Person.java:
@XmlRootElement(name = "Person", namespace = "http://example.org")
public class Person {
@XmlElement(name = "Name", namespace = "http://example.org")
private String name;
@XmlElement(name = "Address", namespace = "http://example.org")
private Address address;
public String getName() {
return name;
}
public Address getAddress() {
return address;
}
}
Address.java:
public class Address {
@XmlElement(name = "StreetAddress", namespace = "http://example.org")
private String streetAddress;
@XmlElement(name = "PostalCode", namespace = "http://example.org")
private String postalCode;
@XmlElement(name = "CountryName", namespace = "http://example.org")
private String countryName;
public String getStreetAddress() {
return streetAddress;
}
public String getPostalCode() {
return postalCode;
}
public String getCountryName() {
return countryName;
}
}
PersonlistProcessor.java:
public class PersonlistProcessor {
public static void main(String[] args) throws Exception {
new PersonlistProcessor().processPersonlist(PersonlistProcessor.class
.getResourceAsStream("personlist.xml"));
}
// TODO: Instead of throws Exception, all exceptions should be wrapped
// inside runtime exception
public void processPersonlist(InputStream inputStream) throws Exception {
JAXBContext jaxbContext = JAXBContext.newInstance(Person.class);
XMLStreamReader xss = XMLInputFactory.newFactory().createXMLStreamReader(inputStream);
// Create unmarshaller
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
// Go to next tag
xss.nextTag();
// Require Personlist
xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Personlist");
// Go to next tag
while (xss.nextTag() == XMLStreamReader.START_ELEMENT) {
// Require Person
xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Person");
// Unmarshall person
Person person = (Person)unmarshaller.unmarshal(xss);
// Process person
processPerson(person);
}
// Require Personlist
xss.require(XMLStreamReader.END_ELEMENT, "http://example.org", "Personlist");
}
private void processPerson(Person person) {
System.out.println(person.getName());
System.out.println(person.getAddress().getCountryName());
}
}
I've been using XStream to serialize my own objects to XML and then load them back as Java objects. If you can represent everything as POJOs and you properly annotate the POJOs to match the types in your XML file, you might find it much easier to use.
When a String represents an object in XML, you can just write:
Order theOrder = (Order)xstream.fromXML(xmlString);
I have always used it to load an object into memory in a single line, but if you need to stream it and process as you go, you should be able to use a HierarchicalStreamReader to iterate through the document. This might be very similar to Simple, suggested by @Dave.
In SAX the parser "pushes" events at your handler, so you have to do all the housekeeping as you are used to here. An alternative would be StAX (the javax.xml.stream package), which is still streaming but your code is responsible for "pulling" events from the parser. This way the logic of what elements are expected in what order is encoded in the control flow of your program rather than having to be explicitly represented in booleans.
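As an illustration of the pull style, here is a sketch written against the Order document from the question. The class name StaxOrderReader is hypothetical, and the code assumes that every child of Item contains plain text only (getElementText would fail on nested elements).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical reader: pulls "ItemId: ItemName" pairs out of an Order document.
class StaxOrderReader {
    static List<String> readItems(String xml) {
        try {
            XMLStreamReader reader =
                    XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
            List<String> items = new ArrayList<>();
            while (reader.hasNext()) {
                // The application asks for the next event instead of being called back.
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Item".equals(reader.getLocalName())) {
                    items.add(readItem(reader));
                }
            }
            reader.close();
            return items;
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    // Called with the reader positioned on <Item>; returns with it positioned on </Item>.
    private static String readItem(XMLStreamReader reader) throws XMLStreamException {
        String id = null;
        String name = null;
        while (reader.nextTag() == XMLStreamConstants.START_ELEMENT) {
            String element = reader.getLocalName();
            String value = reader.getElementText().trim(); // reads the text up to the end tag
            if ("ItemId".equals(element)) {
                id = value;
            } else if ("ItemName".equals(element)) {
                name = value;
            }
        }
        return id + ": " + name;
    }

    public static void main(String[] args) {
        String xml = "<Order><CustomerId>123</CustomerId>"
                + "<Item><ItemId> 987</ItemId><ItemName>Coupler</ItemName><Quantity>5</Quantity></Item>"
                + "<Item><ItemId>654</ItemId><ItemName>Connector</ItemName><Quantity unit=\"12\">3</Quantity></Item>"
                + "</Order>";
        System.out.println(readItems(xml));
    }
}
```

The "where am I in the document" state lives in the call stack (readItems calling readItem) rather than in boolean fields.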
Depending on the precise structure of the XML there may be a "middle way" using a toolkit like XOM, which has a mode of operation where you parse a subtree of the document into a DOM-like object model, process that twig, then throw it away and parse the next one. This is good for repetitive documents with many similar elements that can each be processed in isolation - you get the ease of programming to a tree-based API within each twig but still have the streaming behaviour that lets you parse huge documents efficiently.
public class ItemProcessor extends NodeFactory {
private Nodes emptyNodes = new Nodes();
public Nodes finishMakingElement(Element elt) {
if("Item".equals(elt.getLocalName())) {
// process the Item element here
System.out.println(elt.getFirstChildElement("ItemId").getValue()
+ ": " + elt.getFirstChildElement("ItemName").getValue());
// then throw it away
return emptyNodes;
} else {
return super.finishMakingElement(elt);
}
}
}
You can achieve a similar thing with a combination of StAX and JAXB - define JAXB annotated classes that represent your repeating element (Item in this example) and then create a StAX parser, navigate to the first Item start tag, and then you can unmarshal one complete Item at a time from the XMLStreamReader.
As others suggested, a StAX model would be a better approach to minimize the memory footprint, since it is a pull-based model. I have personally used Axiom (which is used in Apache Axis2) and parsed elements using XPath expressions, which is less verbose than going through node elements as you have done in the code snippet provided.
I've been using this library. It sits on top of the standard Java library and makes things easier for me. In particular, you can ask for a specific element or attribute by name, rather than using the big "if" statement you've described.
http://marketmovers.blogspot.com/2014/02/the-easy-way-to-read-xml-in-java.html
There is another library which supports more compact XML parsing, RTXML. The library and its documentation are on rasmustorkel.com. I implemented the parsing of the file in the original question and I am including the complete program here:
package for_so;
import java.io.File;
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import rasmus_torkel.xml_basic.read.TagNode;
import rasmus_torkel.xml_basic.read.XmlReadOptions;
import rasmus_torkel.xml_basic.read.impl.XmlReader;
public class Q15626686_ReadOrder
{
public static class Order
{
public final Date _date;
public final int _customerId;
public final String _customerName;
public final ArrayList<Item> _itemAl;
public
Order(TagNode node)
{
_date = (Date)node.nextStringMappedFieldE("Date", Date.class);
_customerId = (int)node.nextIntFieldE("CustomerId");
_customerName = node.nextTextFieldE("CustomerName");
_itemAl = new ArrayList<Item>();
boolean finished = false;
while (!finished)
{
TagNode itemNode = node.nextChildN("Item");
if (itemNode != null)
{
Item item = new Item(itemNode);
_itemAl.add(item);
}
else
{
finished = true;
}
}
node.verifyNoMoreChildren();
}
}
public static final Pattern DATE_PATTERN = Pattern.compile("^(\\d\\d\\d\\d)\\/(\\d\\d)\\/(\\d\\d)$");
public static class Date
{
public final String _dateString;
public final int _year;
public final int _month;
public final int _day;
public
Date(String dateString)
{
_dateString = dateString;
Matcher matcher = DATE_PATTERN.matcher(dateString);
if (!matcher.matches())
{
throw new RuntimeException(dateString + " does not match pattern " + DATE_PATTERN.pattern());
}
_year = Integer.parseInt(matcher.group(1));
_month = Integer.parseInt(matcher.group(2));
_day = Integer.parseInt(matcher.group(3));
}
}
public static class Item
{
public final int _itemId;
public final String _itemName;
public final Quantity _quantity;
public
Item(TagNode node)
{
_itemId = node.nextIntFieldE("ItemId");
_itemName = node.nextTextFieldE("ItemName");
_quantity = new Quantity(node.nextChildE("Quantity"));
node.verifyNoMoreChildren();
}
}
public static class Quantity
{
public final int _unitSize;
public final int _unitQuantity;
public
Quantity(TagNode node)
{
_unitSize = node.attributeIntD("unit", 1);
_unitQuantity = node.onlyInt();
}
}
public static void
main(String[] args)
{
File xmlFile = new File(args[0]);
TagNode orderNode = XmlReader.xmlFileToRoot(xmlFile, "Order", XmlReadOptions.DEFAULT);
Order order = new Order(orderNode);
System.out.println("Read order for " + order._customerName + " which has " + order._itemAl.size() + " items");
}
}
You will notice that the retrieval functions end in N, E or D. They refer to what to do when the desired data item is not there. N stands for return Null, E stands for throw Exception and D stands for use Default.
Solution without using outside package, or even XPath: use an enum "PARSE_MODE", probably in combination with a Stack<PARSE_MODE>:
1) The basic solution:
a) fields
private PARSE_MODE parseMode = PARSE_MODE.__UNDEFINED__;
// NB: essential that all these enum values are upper case, but this is the convention anyway
private enum PARSE_MODE {
__UNDEFINED__, ORDER, DATE, CUSTOMERID, ITEM };
private List<String> parseModeStrings = new ArrayList<String>();
private Stack<PARSE_MODE> modeBreadcrumbs = new Stack<PARSE_MODE>();
b) make your List<String>, maybe in the constructor:
for( PARSE_MODE pm : PARSE_MODE.values() ){
// might want to check here that these are indeed upper case
parseModeStrings.add( pm.name() );
}
c) startElement and endElement:
@Override
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
String localNameUC = localName.toUpperCase();
// pushing "__UNDEFINED__" would mess things up! But unlikely name for an XML element
assert ! localNameUC.equals( "__UNDEFINED__" );
if( parseModeStrings.contains( localNameUC )){
parseMode = PARSE_MODE.valueOf( localNameUC );
// any "policing" to do with which modes are allowed to switch into
// other modes could be put here...
// in your case, go `new Order()` here when parseMode == ORDER
modeBreadcrumbs.push( parseMode );
}
else {
// typically ignore the start of this element...
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
String localNameUC = localName.toUpperCase();
if( parseModeStrings.contains( localNameUC )){
// will not fail unless XML structure which is malformed in some way
// or coding error in use of the Stack, etc.
// (pop outside the assert so it still runs when assertions are disabled):
PARSE_MODE popped = modeBreadcrumbs.pop();
assert popped == parseMode;
if( modeBreadcrumbs.empty() ){
parseMode = PARSE_MODE.__UNDEFINED__;
}
else {
parseMode = modeBreadcrumbs.peek();
}
}
else {
// typically ignore the end of this element...
}
}
... so what does this all mean? At any one time you have knowledge of the "parse mode" you're in ... and you can also look at the Stack<PARSE_MODE> modeBreadcrumbs if you need to find out what other parse modes you passed through to get here...
Your characters method then becomes substantially cleaner:
public void characters(char[] ch, int start, int length) throws SAXException {
switch( parseMode ){
case DATE:
// PS - this SimpleDateFormat object can be a field: it doesn't need to be created hundreds of times
SimpleDateFormat formatter. ...
String value = ...
...
break;
case CUSTOMERID:
order.setCustomerId( ...
break;
case ITEM:
item = new Item();
// this next line probably won't be needed: when you get to endElement, if
// parseMode is ITEM, the previous mode will be restored automatically
// isItem = false ;
}
}
2) The more "professional" solution:
abstract class which concrete classes have to extend and which then have no ability to modify the Stack, etc. NB this examines qName rather than localName. Thus:
public abstract class AbstractSAXHandler extends DefaultHandler {
protected enum PARSE_MODE implements SAXHandlerParseMode {
__UNDEFINED__
};
// abstract: the concrete subclasses must populate...
abstract protected Collection<Enum<?>> getPossibleModes();
//
private Stack<SAXHandlerParseMode> modeBreadcrumbs = new Stack<SAXHandlerParseMode>();
private Collection<Enum<?>> possibleModes;
private Map<String, Enum<?>> nameToEnumMap;
private Map<String, Enum<?>> getNameToEnumMap(){
// lazy creation and population of map
if( nameToEnumMap == null ){
if( possibleModes == null ){
possibleModes = getPossibleModes();
}
nameToEnumMap = new HashMap<String, Enum<?>>();
for( Enum<?> possibleMode : possibleModes ){
nameToEnumMap.put( possibleMode.name(), possibleMode );
}
}
return nameToEnumMap;
}
protected boolean isLegitimateModeName( String name ){
return getNameToEnumMap().containsKey( name );
}
protected SAXHandlerParseMode getParseMode() {
return modeBreadcrumbs.isEmpty()? PARSE_MODE.__UNDEFINED__ : modeBreadcrumbs.peek();
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
try {
_startElement(uri, localName, qName, attributes);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
// override in subclasses (NB I think checked Exceptions are not a brilliant design choice in Java)
protected void _startElement(String uri, String localName, String qName, Attributes attributes)
throws Exception {
String qNameUC = qName.toUpperCase();
// very undesirable ever to push "UNDEFINED"! But unlikely name for an XML element
assert !qNameUC.equals("__UNDEFINED__") : "Encountered XML element with qName \"__UNDEFINED__\"!";
if( getNameToEnumMap().containsKey( qNameUC )){
Enum<?> newMode = getNameToEnumMap().get( qNameUC );
modeBreadcrumbs.push( (SAXHandlerParseMode)newMode );
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
try {
_endElement(uri, localName, qName);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
// override in subclasses
protected void _endElement(String uri, String localName, String qName) throws Exception {
String qNameUC = qName.toUpperCase();
if( getNameToEnumMap().containsKey( qNameUC )){
modeBreadcrumbs.pop();
}
}
public List<?> showModeBreadcrumbs(){
return org.apache.commons.collections4.ListUtils.unmodifiableList( modeBreadcrumbs );
}
}
interface SAXHandlerParseMode {
}
Then, salient part of concrete subclass:
private enum PARSE_MODE implements SAXHandlerParseMode {
ORDER, DATE, CUSTOMERID, ITEM
};
private Collection<Enum<?>> possibleModes;
@Override
protected Collection<Enum<?>> getPossibleModes() {
// lazy initialisation: copy this subclass's enum constants into the collection
if (possibleModes == null) {
possibleModes = new ArrayList<Enum<?>>( Arrays.asList(PARSE_MODE.values()) );
// __UNDEFINED__ mode (from abstract superclass) must be added afterwards
possibleModes.add( AbstractSAXHandler.PARSE_MODE.__UNDEFINED__ );
}
return possibleModes;
}
PS: this is a starting point for more sophisticated handling. For example, you might set up a List<Object> that is kept synchronised with the Stack<PARSE_MODE>; the Objects could then be anything you want, enabling you to "reach back" into the ancestor "XML nodes" of the one you're dealing with. Don't use a Map, though: the Stack can contain the same PARSE_MODE object more than once. This illustrates a fundamental characteristic of all tree-like structures: no individual node (here, a parse mode) exists in isolation; its identity is always defined by the entire path leading to it.
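To make the PS a bit more concrete, here is a minimal sketch of the "reach back" idea. It is not the code from the answer above: the class `PathTrackingHandler`, the `Frame` holder and the `describe` helper are hypothetical names, and instead of two parallel collections it keeps a single stack of (name, payload) pairs, which is one simple way of keeping them in sync. The payload here is just the element's `id` attribute, an assumption made purely for illustration.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PathTrackingHandler extends DefaultHandler {
    /** Hypothetical holder: one frame per currently open element. */
    static final class Frame {
        final String name;
        final Object payload; // could be anything; here the "id" attribute or null
        Frame(String name, Object payload) { this.name = name; this.payload = payload; }
    }

    // A stack rather than a Map: the same element name may occur twice on one path.
    private final Deque<Frame> path = new ArrayDeque<>();
    private final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Frame parent = path.peek(); // null when we are at the root element
        path.push(new Frame(qName, atts.getValue("id")));
        seen.add(currentPath() + " parentId=" + (parent == null ? null : parent.payload));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        path.pop();
    }

    /** Builds the path from the root down to the current element, e.g. "/a/b/a". */
    private String currentPath() {
        StringBuilder sb = new StringBuilder();
        Iterator<Frame> rootFirst = path.descendingIterator();
        while (rootFirst.hasNext()) {
            sb.append('/').append(rootFirst.next().name);
        }
        return sb.toString();
    }

    /** Parses the given XML text and returns one description per start tag. */
    public static List<String> describe(String xml) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.seen;
    }

    public static void main(String[] args) throws Exception {
        for (String line : describe("<a id='1'><b id='2'><a/></b></a>")) {
            System.out.println(line);
        }
    }
}
```

Note that the inner `<a/>` and the root `<a>` share a name but are told apart by their paths, which is the point made above about identity being defined by the whole path.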
import java.io.File;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
public class JXML {
private DocumentBuilder builder;
private Document doc = null;
private DocumentBuilderFactory factory ;
private XPathExpression expr = null;
private XPathFactory xFactory;
private XPath xpath;
private String xmlFile;
public static ArrayList<String> XMLVALUE ;
public JXML(String xmlFile){
this.xmlFile = xmlFile;
}
private void xmlFileSettings(){
try {
factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
xFactory = XPathFactory.newInstance();
xpath = xFactory.newXPath();
builder = factory.newDocumentBuilder();
doc = builder.parse(xmlFile);
}
catch (Exception e){
System.out.println(e);
}
}
public String[] selectQuery(String query){
xmlFileSettings();
ArrayList<String> records = new ArrayList<String>();
try {
expr = xpath.compile(query);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i=0; i<nodes.getLength();i++){
records.add(nodes.item(i).getNodeValue());
}
return records.toArray(new String[records.size()]);
}
catch (Exception e) {
System.out.println("There is error in query string");
return records.toArray(new String[records.size()]);
}
}
public boolean updateQuery(String query,String value){
xmlFileSettings();
try{
NodeList nodes = (NodeList) xpath.evaluate(query, doc, XPathConstants.NODESET);
for (int idx = 0; idx < nodes.getLength(); idx++) {
nodes.item(idx).setTextContent(value);
}
Transformer xformer = TransformerFactory.newInstance().newTransformer();
xformer.transform(new DOMSource(doc), new StreamResult(new File(this.xmlFile)));
return true;
}catch(Exception e){
System.out.println(e);
return false;
}
}
public static void main(String args[]){
JXML jxml = new JXML("c://user.xml");
jxml.updateQuery("//Order/CustomerId/text()","222");
String result[]=jxml.selectQuery("//Order/Item/*/text()");
for(int i=0;i<result.length;i++){
System.out.println(result[i]);
}
}
}

Parsing Xml with SAX Parser

I am trying to parse an XML file with a SAX parser.
I need to get the attributes of a start element, along with their values:
<?xml version="1.0" encoding="ISO-8859-1" ?>
<API type="Connection">
<INFO server="com.com" function="getAccount2" />
<RESULT code="0">Operation Succeeded</RESULT>
<RESPONSE numaccounts="1">
<ACCOUNT login="fa051981" skynum="111111" maxaliases="1" creationdate="Fri Nov 16 00:59:59 2001" password="pass" type="2222" status="open" mnemonic="32051981" ratelimit="0">
<CHECKATTR />
<REPLYATTR>Service-Type = Frames-User, Framed-Protocol = PPP, Framed-Routing = None</REPLYATTR>
<SETTINGS bitval="4" status="open" />
<SETTINGS bitval="8192" status="open" session_timeout="10800" />
<SETTINGS bitval="32768" status="open" cisco_address_pool="thepool" />
<ALIASES numaliases="0" />
</ACCOUNT>
</RESPONSE>
</API>
In this XML, I need to get the attributes of the SETTINGS start element along with their values.
These attributes are dynamic, so I am trying to build a map of them. I am new to SAX parsing.
My Java code so far:
public void startElement(String s, String s1, String elementName, Attributes attributes) throws SAXException {
if (elementName.equalsIgnoreCase(GenericConstants.INFO)) {
this.searchRaidusBean.setServer(attributes.getValue(GenericConstants.SERVER));
this.searchRaidusBean.setFunction(attributes.getValue(GenericConstants.FUNCTION));
}
if (elementName.equalsIgnoreCase(GenericConstants.RESULT)) {
this.searchRaidusBean.setResultCode(attributes.getValue(GenericConstants.CODE));
}
if (elementName.equalsIgnoreCase(GenericConstants.ACCOUNT)) {
this.searchRaidusBean.setLoginId(attributes.getValue(GenericConstants.LOGIN));
}
if (elementName.equalsIgnoreCase(GenericConstants.ACCOUNT)) {
this.searchRaidusBean.setSkyNum(attributes.getValue(GenericConstants.SKYNUM));
}
if (elementName.equalsIgnoreCase(GenericConstants.ACCOUNT)) {
this.searchRaidusBean.setMaxAliases(attributes.getValue(GenericConstants.MAXALIASES));
}
if (elementName.equalsIgnoreCase(GenericConstants.ACCOUNT)) {
this.searchRaidusBean.setCreationDate(attributes.getValue(GenericConstants.CREATION_DATE));
}
if (elementName.equalsIgnoreCase(GenericConstants.ACCOUNT)) {
this.searchRaidusBean.setType(attributes.getValue(GenericConstants.TYPE));
}
if (elementName.equalsIgnoreCase(GenericConstants.ACCOUNT)) {
this.searchRaidusBean.setStatus(attributes.getValue(GenericConstants.STATUS));
}
if (elementName.equalsIgnoreCase(GenericConstants.ACCOUNT)) {
this.searchRaidusBean.setMnemonic(attributes.getValue(GenericConstants.MNEMONIC));
}
if (elementName.equalsIgnoreCase(GenericConstants.ACCOUNT)) {
this.searchRaidusBean.setRateLimit(attributes.getValue(GenericConstants.RATELIMIT));
}
if (elementName.equalsIgnoreCase(GenericConstants.SETTINGS)) {
//this.searchRaidusBean.getBitval().add(attributes.getValue(GenericConstants.BITVAL));
System.out.println(attributes);
//stuck here
}
if (elementName.equalsIgnoreCase(GenericConstants.ALIASES)) {
this.tempKey = attributes.getValue(GenericConstants.MNEMONIC);
}
}
public void endElement(String str1, String str2, String element) throws SAXException {
if (element.equalsIgnoreCase(GenericConstants.RESULT)) {
this.searchRaidusBean.setResultMessage(this.tempValue);
}
if (element.equalsIgnoreCase(GenericConstants.ALIASES)) {
if (!StringUtils.isBlank(this.tempKey)) {
this.searchRaidusBean.getAlias().put(this.tempKey, this.tempValue);
}
}
}
public void characters(char[] charArray, int i, int j) throws SAXException {
this.tempValue = new String(charArray, i, j);
}
If you are using the DefaultHandler, then you will be receiving a startElement event.
This method carries the Attributes as one of its parameters.
You can use getIndex(String) to get the index of a named attribute and getValue(int) to get its value, or getValue(String) to look the value up by name directly. For attributes whose names you don't know in advance, loop from 0 to getLength().
As Nambari has pointed out, there are hundreds of tutorials on the internet and more than a few posts on the subject on SO (I answered one over the weekend).
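As a rough illustration of those Attributes calls outside of a parser, the sketch below fills an `org.xml.sax.helpers.AttributesImpl` by hand (the attribute names are taken from the question's SETTINGS elements) and reads it back. The class name `AttributeLookupDemo` and the `toMap` helper are made up for this example. It uses getQName(int) for the keys because, per the SAX documentation, getLocalName(int) may return an empty string when namespace processing is not being performed.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;

public class AttributeLookupDemo {
    /** Copies every attribute into an insertion-ordered map keyed by qualified name. */
    public static Map<String, String> toMap(Attributes attributes) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            map.put(attributes.getQName(i), attributes.getValue(i));
        }
        return map;
    }

    public static void main(String[] args) {
        // Hand-built stand-in for what the parser passes to startElement.
        AttributesImpl atts = new AttributesImpl();
        atts.addAttribute("", "bitval", "bitval", "CDATA", "8192");
        atts.addAttribute("", "status", "status", "CDATA", "open");
        atts.addAttribute("", "session_timeout", "session_timeout", "CDATA", "10800");

        // Two-step lookup: name -> index -> value.
        int index = atts.getIndex("status");
        System.out.println("status (via index) = " + atts.getValue(index));

        // One-step lookup by qualified name.
        System.out.println("session_timeout = " + atts.getValue("session_timeout"));

        // getIndex returns -1 for a missing attribute; getValue(String) returns null.
        System.out.println("missing index = " + atts.getIndex("no_such_attribute"));

        // Dynamic attributes: iterate instead of naming them.
        System.out.println(toMap(atts));
    }
}
```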
UPDATED
I'd suggest it should look something like this (I've not tested it)
public void startElement(String uri, String localName, String elementName, Attributes attributes) throws SAXException {
if (elementName.equalsIgnoreCase(GenericConstants.INFO)) {
this.searchRaidusBean.setServer(attributes.getValue(GenericConstants.SERVER));
this.searchRaidusBean.setFunction(attributes.getValue(GenericConstants.FUNCTION));
}
if (elementName.equalsIgnoreCase(GenericConstants.RESULT)) {
this.searchRaidusBean.setResultCode(attributes.getValue(GenericConstants.CODE));
}
if (elementName.equalsIgnoreCase(GenericConstants.ACCOUNT)) {
this.searchRaidusBean.setLoginId(attributes.getValue(GenericConstants.LOGIN));
this.searchRaidusBean.setSkyNum(attributes.getValue(GenericConstants.SKYNUM));
this.searchRaidusBean.setMaxAliases(attributes.getValue(GenericConstants.MAXALIASES));
this.searchRaidusBean.setCreationDate(attributes.getValue(GenericConstants.CREATION_DATE));
this.searchRaidusBean.setType(attributes.getValue(GenericConstants.TYPE));
this.searchRaidusBean.setStatus(attributes.getValue(GenericConstants.STATUS));
this.searchRaidusBean.setMnemonic(attributes.getValue(GenericConstants.MNEMONIC));
this.searchRaidusBean.setRateLimit(attributes.getValue(GenericConstants.RATELIMIT));
}
if (elementName.equalsIgnoreCase(GenericConstants.SETTINGS)) {
for (int index = 0; index < attributes.getLength(); index++) {
String attName = attributes.getLocalName(index);
String value = attributes.getValue(index);
map.put(attName, value);
}
}
if (elementName.equalsIgnoreCase(GenericConstants.ALIASES)) {
this.tempKey = attributes.getValue(GenericConstants.MNEMONIC);
}
}
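One thing to watch in the snippet above: `map` is not declared there, and if it is a single shared field, each later SETTINGS element will overwrite keys such as bitval and status from the earlier ones. A possible way around that, sketched here under my own assumptions (the class name `SettingsCollector` and the `collect` helper are hypothetical, and the bean from the question is left out), is to build one map per SETTINGS element and keep them in a list.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SettingsCollector extends DefaultHandler {
    // One map per SETTINGS element, in document order.
    private final List<Map<String, String>> settings = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (!qName.equalsIgnoreCase("SETTINGS")) {
            return;
        }
        Map<String, String> current = new LinkedHashMap<>();
        for (int i = 0; i < attributes.getLength(); i++) {
            // getQName is used since getLocalName can be empty without namespace processing
            current.put(attributes.getQName(i), attributes.getValue(i));
        }
        settings.add(current);
    }

    /** Parses the given XML text and returns the attribute map of each SETTINGS element. */
    public static List<Map<String, String>> collect(String xml) throws Exception {
        SettingsCollector handler = new SettingsCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.settings;
    }

    public static void main(String[] args) throws Exception {
        // Trimmed-down version of the XML from the question.
        String xml = "<ACCOUNT>"
                + "<SETTINGS bitval=\"4\" status=\"open\" />"
                + "<SETTINGS bitval=\"8192\" status=\"open\" session_timeout=\"10800\" />"
                + "<SETTINGS bitval=\"32768\" status=\"open\" cisco_address_pool=\"thepool\" />"
                + "</ACCOUNT>";
        for (Map<String, String> one : collect(xml)) {
            System.out.println(one);
        }
    }
}
```

In the handler from the question, the list could live on the bean instead of on the handler; that part is left to you.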
UPDATED with tested example
I took your data (from the OP) and ran it through the following handler
DefaultHandler handler = new DefaultHandler() {
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("settings")) {
System.out.println("Parse settings attributes...");
for (int index = 0; index < attributes.getLength(); index++) {
String aln = attributes.getLocalName(index);
String value = attributes.getValue(index);
System.out.println(" " + aln + " = " + value);
}
}
}
};
And I got the following output
Parse settings attributes...
bitval = 4
status = open
Parse settings attributes...
bitval = 8192
status = open
session_timeout = 10800
Parse settings attributes...
bitval = 32768
status = open
cisco_address_pool = thepool
So the approach works on your data; I don't know what you're doing differently.
