How to Parse Big (50 GB) XML Files in Java

Currently I'm trying to use a SAX parser, but about 3/4 of the way through the file it completely freezes up. I have tried allocating more memory, etc., but I'm not getting any improvement.
Is there any way to speed this up? A better method?
I stripped it down to the bare bones, so I now have the following code, and when running it from the command line it still doesn't go as fast as I would like.
Running it with "java -Xms4096m -Xmx8192m -jar reader.jar", I get a "GC overhead limit exceeded" error around article 700000.
Main:
public class Read {
public static void main(String[] args) {
ArrayList<Page> pages = XMLManager.getPages();
}
}
XMLManager
public class XMLManager {
public static ArrayList<Page> getPages() {
ArrayList<Page> pages = null;
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
File file = new File("..\\enwiki-20140811-pages-articles.xml");
PageHandler pageHandler = new PageHandler();
parser.parse(file, pageHandler);
pages = pageHandler.getPages();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return pages;
}
}
PageHandler
public class PageHandler extends DefaultHandler{
private ArrayList<Page> pages = new ArrayList<>();
private Page page;
private StringBuilder stringBuilder;
private boolean idSet = false;
public PageHandler(){
super();
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
stringBuilder = new StringBuilder();
if (qName.equals("page")){
page = new Page();
idSet = false;
} else if (qName.equals("redirect")){
if (page != null){
page.setRedirecting(true);
}
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (page != null && !page.isRedirecting()){
if (qName.equals("title")){
page.setTitle(stringBuilder.toString());
} else if (qName.equals("id")){
if (!idSet){
page.setId(Integer.parseInt(stringBuilder.toString()));
idSet = true;
}
} else if (qName.equals("text")){
String articleText = stringBuilder.toString();
articleText = articleText.replaceAll("(?s)<ref(.+?)</ref>", " "); //remove references
articleText = articleText.replaceAll("(?s)\\{\\{(.+?)\\}\\}", " "); //remove links underneath headings
articleText = articleText.replaceAll("(?s)==See also==.+", " "); //remove everything after see also
articleText = articleText.replaceAll("\\|", " "); //Separate multiple links
articleText = articleText.replaceAll("\\n", " "); //remove new lines
articleText = articleText.replaceAll("[^a-zA-Z0-9- \\s]", " "); //remove all non alphanumeric except dashes and spaces
articleText = articleText.trim().replaceAll(" +", " "); //convert all multiple spaces to 1 space
Pattern pattern = Pattern.compile("([\\S]+\\s*){1,75}"); //get first 75 words of text
Matcher matcher = pattern.matcher(articleText);
matcher.find();
try {
page.setSummaryText(matcher.group());
} catch (IllegalStateException se){
page.setSummaryText("None");
}
page.setText(articleText);
} else if (qName.equals("page")){
pages.add(page);
page = null;
}
} else {
page = null;
}
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
stringBuilder.append(ch,start, length);
}
public ArrayList<Page> getPages() {
return pages;
}
}

Your parsing code is likely working fine, but the volume of data you're loading is probably just too large to hold in memory in that ArrayList.
You need some sort of pipeline to pass the data on to its actual destination without ever storing it all in memory at once.
What I've sometimes done for this sort of situation is similar to the following.
Create an interface for processing a single element:
public interface PageProcessor {
void process(Page page);
}
Supply an implementation of this to the PageHandler through a constructor:
public class Read {
public static void main(String[] args) {
XMLManager.load(new PageProcessor() {
@Override
public void process(Page page) {
// Obviously you want to do something other than just printing,
// but I don't know what that is...
System.out.println(page);
}
}) ;
}
}
public class XMLManager {
public static void load(PageProcessor processor) {
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
File file = new File("pages-articles.xml");
PageHandler pageHandler = new PageHandler(processor);
parser.parse(file, pageHandler);
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Send data to this processor instead of putting it in the list:
public class PageHandler extends DefaultHandler {
private final PageProcessor processor;
private Page page;
private StringBuilder stringBuilder;
private boolean idSet = false;
public PageHandler(PageProcessor processor) {
this.processor = processor;
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
//Unchanged from your implementation
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
//Unchanged from your implementation
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
// Elide code not needing change
} else if (qName.equals("page")){
processor.process(page);
page = null;
}
} else {
page = null;
}
}
}
Of course, you can make your interface handle chunks of multiple records rather than just one and have the PageHandler collect pages locally in a smaller list and periodically send the list off for processing and clear the list.
Or (perhaps better) you could implement the PageProcessor interface as defined here and build in logic there that buffers the data and sends it on for further handling in chunks.
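The buffering variant described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `BatchingPageProcessor`, `batchSize` and `sink` are made-up names, and the tiny `Page` class is a stand-in for the real one so the example compiles on its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal stand-in so the sketch compiles on its own; the real Page has more fields.
class Page {
    final String title;
    Page(String title) { this.title = title; }
}

// Same interface as defined in the answer above.
interface PageProcessor {
    void process(Page page);
}

// Hypothetical helper: buffers pages and hands them on in chunks of batchSize.
class BatchingPageProcessor implements PageProcessor {
    private final int batchSize;
    private final Consumer<List<Page>> sink; // e.g. a bulk database insert
    private final List<Page> buffer = new ArrayList<>();

    BatchingPageProcessor(int batchSize, Consumer<List<Page>> sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    @Override
    public void process(Page page) {
        buffer.add(page);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Call once after parsing finishes so the last partial chunk is not lost.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy, so the sink may keep the list
        buffer.clear();
    }
}
```

With this shape, at most `batchSize` pages are held at any time, and the caller is responsible for a final `flush()` after `parser.parse(...)` returns.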

Don Roby's approach is somewhat reminiscent of the approach I followed when creating a code generator designed to solve this particular problem (an early version was conceived in 2008). Basically, each complexType has its Java POJO equivalent, and handlers for a particular type are activated when the context changes to that element. I have used this approach for SEPA, transaction banking and, for instance, Discogs (30 GB). You can specify which elements you want to process at runtime, declaratively, using a properties file.
XML2J uses mapping of complexTypes to Java POJOs on the one hand, but lets you specify events you want to listen on.
E.g.
account/#process = true
account/accounts/#process = true
account/accounts/#detach = true
The essence is in the third line. The detach makes sure individual accounts are not added to the accounts list. So it won't overflow.
class AccountType {
private List<AccountType> accounts = new ArrayList<>();
public void addAccount(AccountType tAccount) {
accounts.add(tAccount);
}
// etc.
};
In your code you need to implement the process method (by default the code generator generates an empty method):
class AccountsProcessor implements MessageProcessor {
static private Logger logger = LoggerFactory.getLogger(AccountsProcessor.class);
// assuming Spring data persistency here
final String path = new ClassPathResource("spring-config.xml").getPath();
ClassPathXmlApplicationContext context = new ClassPathXmlApplicationContext(path);
AccountsTypeRepo repo = context.getBean(AccountsTypeRepo.class);
@Override
public void process(XMLEvent evt, ComplexDataType data)
throws ProcessorException {
if (evt == XMLEvent.END) {
if( data instanceof AccountType) {
process((AccountType)data);
}
}
}
private void process(AccountType data) {
if (logger.isInfoEnabled()) {
// do some logging
}
repo.save(data);
}
}
Note that XMLEvent.END marks the closing tag of an element. So, when you are processing it, it is complete. If you have to relate it (using a FK) to its parent object in the database, you could process the XMLEvent.BEGIN for the parent, create a placeholder in the database and use its key to store with each of its children. In the final XMLEvent.END you would then update the parent.
Note that the code generator generates everything you need. You just have to implement that method and of course the DB glue code.
There are samples to get you started. The code generator even generates your POM files, so you can immediately after generation build your project.
The default process method is like this:
@Override
public void process(XMLEvent evt, ComplexDataType data)
throws ProcessorException {
/*
* TODO Auto-generated method stub implement your own handling here.
* Use the runtime configuration file to determine which events are to be sent to the processor.
*/
if (evt == XMLEvent.END) {
data.print( ConsoleWriter.out );
}
}
Downloads:
https://github.com/lolkedijkstra/xml2j-core
https://github.com/lolkedijkstra/xml2j-gen
https://sourceforge.net/projects/xml2j/
First run mvn clean install on the core (it has to be in the local Maven repo), then on the generator. And don't forget to set up the environment variable XML2J_HOME as per the directions in the user manual.

Related

Xstream stops writing to file when not finished

I have written some code to generate passwords for users that were written to sql before. Then I wanted to write each user with username and password to xml. The code seems to work fine except at around 200th user it suddenly stops halfway through xml tag and ends, which is pretty weird. I'm using Xstream as my library. The Arraylist has like 215 users.
I tried StaxDriver and DomDriver. The Stax Driver result was same as empty Xstream constructor, but Dom was even worse.
XStream xstream = new XStream();
xstream.alias("Zakaznici", ListZakazniku.class);
try {
PrintWriter out = new PrintWriter("Zakaznici.xml");
out.write(xstream.toXML(ListZakazniku.zakaznici));
}catch (Exception e){
e.printStackTrace();
}
public class ListZakazniku {
public static ArrayList<Zakaznik> zakaznici = new ArrayList<>();
public ListZakazniku(){
zakaznici= new ArrayList<Zakaznik>();
}
public void setZakaznici(ArrayList<Zakaznik> zakaznik){
this.zakaznici.clear();
this.zakaznici = zakaznik;
}
public static ArrayList<Zakaznik> getZakaznici() {
return zakaznici;
}
public void add(Zakaznik elbow){
zakaznici.add(elbow);
}
and Zakaznik is pretty basic object with username, password, id....
the cut was like
</Zakaznik>
<Zaka
I don't know what's wrong with it. I'm looking forward to any suggestions :)

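Your list should not be static. I also slightly modified your printing code so that the PrintWriter is opened in a try-with-resources block and is therefore flushed and closed; a PrintWriter that is never closed can leave the end of its buffered output unwritten, which matches the cut-off file you describe. An approach like this will work fine: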
@XStreamAlias("listZakazniku")
public class ListZakazniku {
private List<Zakaznik> zakaznicis;
public ListZakazniku() {
zakaznicis = new ArrayList<Zakaznik>();
}
public void add(Zakaznik user) {
zakaznicis.add(user);
}
@XStreamAlias("zakaznik")
private static class Zakaznik {
private String user;
private String pwd;
public Zakaznik(String user, String pwd) {
this.user = user;
this.pwd = pwd;
}
}
public static void main(String[] args){
XStream xstream = new XStream();
xstream.processAnnotations(ListZakazniku.class);
ListZakazniku ll = new ListZakazniku();
ll.add(new Zakaznik("user1", "pwd1"));
ll.add(new Zakaznik("user2", "pwd2"));
try {
try (PrintWriter out = new PrintWriter("Zakaznici.xml")) {
out.println(xstream.toXML(ll));
}
}catch (Exception e){
e.printStackTrace();
}
}
}
Output:
<listZakazniku>
<zakaznicis>
<zakaznik>
<user>user1</user>
<pwd>pwd1</pwd>
</zakaznik>
<zakaznik>
<user>user2</user>
<pwd>pwd2</pwd>
</zakaznik>
</zakaznicis>
</listZakazniku>
Don't forget the processAnnotations-call for each annotated class! (also, your Zakaznik is not an internal static class I guess like in my example above, this was just to squeeze in the complete code..)
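Independently of XStream, the writing step on its own can be sketched like this. It is a minimal illustration, not the library's recommended way of writing: `XmlFileWriter` and `write` are made-up names, and the XML is assumed to already be available as a String (for example from `xstream.toXML(...)`).

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper that writes a complete string to a file.
class XmlFileWriter {
    static void write(Path file, String xml) throws IOException {
        // try-with-resources closes the writer even if write() fails;
        // close() also flushes whatever is still sitting in the buffer.
        try (PrintWriter out = new PrintWriter(
                Files.newBufferedWriter(file, StandardCharsets.UTF_8))) {
            out.write(xml);
        }
    }
}
```

The explicit charset is a design choice for this sketch: it keeps the file encoding independent of the platform default.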

How do I determine the line number where invalid XML occurs?

I need to support the situation where a user submits an invalid XML file to me and I report back to them information about the error. Ideally the location of the error (line number and column number) and the nature of the error.
My sample code (see below) works well enough when there is a missing tag or similar error. In that case, I get an approximate location and a useful explanation. However my code fails spectacularly when the XML file contains non-UTF-8 characters. In this case, I get a useless error:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
I cannot find a way to determine the line number where the invalid character might be, nor the character itself. Is there a way to do this?
If, as one comment suggests, it may not be possible as we don't get to the parsing step, is there a way to process the XML file, not with a parser, but simply line-by-line, looking for and reporting non-UTF-8 characters?
Sample code follows. First a basic error handler:
public class XmlErrorHandler implements ErrorHandler {
@Override
public void warning(SAXParseException e) throws SAXException {
show("Warning", e); throw e;
}
@Override
public void error(SAXParseException e) throws SAXException {
show("Error", e); throw e;
}
@Override
public void fatalError(SAXParseException e) throws SAXException {
show("Fatal", e); throw e;
}
private void show(String type, SAXParseException e) {
System.out.println("Line " + e.getLineNumber() + " Column " + e.getColumnNumber());
System.out.println(type + ": " + e.getMessage());
}
}
And a trivial test program:
public class XmlTest {
public static void main(String[] args) {
try {
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(new DefaultHandler());
reader.setErrorHandler(new XmlErrorHandler());
InputSource is = new InputSource(args[0]);
reader.parse(is);
}
catch (SAXException e) { // Useful error case
System.err.println(e);
e.printStackTrace(System.err);
}
catch (Exception e) { // Useless error case arrives here
System.err.println(e);
e.printStackTrace();
}
}
}
Sample XML File (with non-UTF-8 smart quotes from (say) a Word document):
<?xml version="1.0" encoding="UTF-8"?>
<example>
<![CDATA[Text with <91>smart quotes<92>.]]>
</example>

I had some success with identifying where the issue in the XML file is using a couple of approaches.
Adapting the code from my question to use a home-grown ContentHandler with a Locator (see below) demonstrated that the XML was being processed up until the invalid character is encountered. In particular, the line number is being tracked. Preserving the line number allowed it to be retrieved from the ContentHandler when the problematic exception occurs.
At this point, I came up with two possibilities. The first is to re-run the processing with a different encoding on the InputStream, eg. Windows-1252. Parsing completed without error in this instance and I was able to retrieve the characters on the line with the known issue. This allows for a reasonably useful error message to the user, ie. line number and the characters.
My second approach was to adapt the code from the top-rated answer to this SO question. This code allows you to find the first non-UTF-8 character in a byte stream. If you assume that 0x0A (linefeed) represents a new line in the XML (and this seems to work pretty well in practice), then the line number, column number and the invalid characters can be extracted easily enough for a precise error message.
// Modified test program
public class XmlTest {
public static void main(String[] args) {
ErrorFinder errorFinder = new ErrorFinder(0); // Create our own content handler
try {
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(errorFinder); // Use instead of the default handler
reader.setErrorHandler(new XmlErrorHandler());
InputSource is = new InputSource(args[0]);
reader.parse(is);
}
catch (SAXException e) { // Useful error case
System.err.println(e);
e.printStackTrace(System.err);
}
catch (Exception e) { // Useless error case arrives here
System.err.println(e);
e.printStackTrace();
// Option 1: repeat parsing (see above) with a new ErrorFinder initialised thus:
ErrorFinder ef2 = new ErrorFinder(errorFinder.getCurrentLineNumber()); // and
is.setEncoding("Windows-1252");
}
}
}
// Content handler with irrelevant method implementations elided.
public class ErrorFinder implements ContentHandler {
private int lineNumber; // If non-zero, the line number to retrieve characters for.
private int currentLineNumber;
private char[] chars;
private Locator locator;
public ErrorFinder(int lineNumber) {
super();
this.lineNumber = lineNumber;
}
public void setDocumentLocator(Locator locator) {
this.locator = locator;
}
@Override
public void startDocument() throws SAXException {
currentLineNumber = locator.getLineNumber();
}
... // Skip other over-ridden methods as they have same code as startDocument().
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
currentLineNumber = locator.getLineNumber();
if (currentLineNumber == lineNumber) {
char[] c = new char[length];
System.arraycopy(ch, start, c, 0, length);
chars = c;
}
}
public int getCurrentLineNumber() {
return currentLineNumber;
}
public char[] getChars() {
return chars;
}
}
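The second approach (scanning the raw bytes for the first invalid UTF-8 sequence) is not shown above. A rough sketch of one way to do it with the JDK's CharsetDecoder follows; this is not the code from the linked answer, the class and method names are made up, and it assumes the whole input fits in a byte array and that 0x0A marks a new line.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: locates the first byte sequence that is not valid UTF-8.
class Utf8ErrorLocator {
    /**
     * Returns {line, column, byteOffset} of the first invalid byte, or null if
     * the data is valid UTF-8. Line and column are 1-based; the column is
     * counted in bytes, not characters.
     */
    static int[] firstInvalid(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(data);
        CharBuffer out = CharBuffer.allocate(1024);
        while (true) {
            CoderResult result = decoder.decode(in, out, true);
            if (result.isError()) {
                // On an error the input position is at the start of the bad bytes.
                int offset = in.position();
                int line = 1;
                int lastNewline = -1;
                for (int i = 0; i < offset; i++) {
                    if (data[i] == 0x0A) {
                        line++;
                        lastNewline = i;
                    }
                }
                return new int[] { line, offset - lastNewline, offset };
            }
            if (result.isUnderflow()) {
                return null; // all input consumed without errors
            }
            out.clear(); // overflow: discard decoded chars and keep going
        }
    }
}
```

The byte at the returned offset (for example `data[offset] & 0xFF`) can then be included in the error message shown to the user.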

Whats the Best Practice to call a method out of a Callback-Response?

I'm using an asynchronous XML-RPC client (https://github.com/gturri/aXMLRPC) in my project and wrote some methods using the asynchronous callback methods of this client, like this:
public void xmlRpcMethod(final Object callbackSync) {
XMLRPCCallback listener = new XMLRPCCallback() {
public void onResponse(long id, final Object result) {
// Do something
if (callbackSync != null) {
synchronized (callbackSync) {
callbackSync.notify();
}
}
}
public void onError(long id, final XMLRPCException error) {
// Do something
if (callbackSync != null) {
synchronized (callbackSync) {
callbackSync.notify();
}
}
}
public void onServerError(long id, final XMLRPCServerException error) {
Log.e(TAG, error.getMessage());
if (callbackSync != null) {
synchronized (callbackSync) {
callbackSync.notifyAll();
}
}
}
};
XMLRPCClient client = new XMLRPCClient("<url>");
long id = client.callAsync(listener, "<method>");
}
In other methods I like to call this method (here "xmlRpcMethod") and wait until it finished. I wrote methods like this:
public void testMethod(){
Object sync = new Object();
xmlRpcMethod(sync);
synchronized (sync){
try{
sync.wait();
}catch(InterruptedException e){
e.printStackTrace();
}
}
// Do something after xmlRpcMethod has finished
}
But this way of waiting and synchronizing gets ugly when the project grows larger and I need to wait for many requests to finish.
So is this the only possible / best way? Or does someone know a better solution?

My first shot to create blocking RPC calls would be:
// Little helper class:
class RPCResult<T>{
private final T result;
private final Exception ex;
private final long id;
public RPCResult( long id, T result, Exception ex ){
// TODO set fields
}
// TODO getters
public boolean hasError(){ return null != this.ex; }
}
public Object xmlRpcMethod() {
final BlockingQueue<RPCResult> pipe = new ArrayBlockingQueue<RPCResult>(1);
XMLRPCCallback listener = new XMLRPCCallback() {
public void onResponse(long id, final Object result) {
// Do something
pipe.put( new RPCResult<Object>(id, result, null) );
}
public void onError(long id, final XMLRPCException error) {
// Do something
pipe.put( new RPCResult<Object>(id, null, error) );
}
public void onServerError(long id, final XMLRPCServerException error) {
Log.e(TAG, error.getMessage());
pipe.put(new RPCResult<Object>(id, null, error));
}
};
XMLRPCClient client = new XMLRPCClient("<url>");
long id = client.callAsync(listener, "<method>");
RPCResult result = pipe.take(); // blocks until there is an element available
// TODO: catch and handle InterruptedException!
if( result.hasError() ) throw result.getError(); // Relay Exceptions - do not swallow them!
return result.getResult();
}
Client:
public void testMethod(){
Object result = xmlRpcMethod(); // blocks until result is available or throws exception
}
The next step would be to make a strongly typed version, public <T> T xmlRpcMethod().
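A strongly typed variant of the same idea might look like the sketch below. To keep it self-contained it does not use aXMLRPC at all: `Callback` and `AsyncCall` are hypothetical stand-ins for the client's listener and its asynchronous call, and `Blocking.await` is an illustrative name.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical stand-in for the client's listener; not aXMLRPC's actual interface.
interface Callback<T> {
    void onResponse(T result);
    void onError(Exception error);
}

// Hypothetical stand-in for "start an asynchronous call and report to the callback".
interface AsyncCall<T> {
    void start(Callback<T> callback);
}

// Carries either a value or an error through the queue (the queue rejects null).
final class RpcResult<T> {
    final T value;
    final Exception error;

    RpcResult(T value, Exception error) {
        this.value = value;
        this.error = error;
    }
}

final class Blocking {
    // Starts the call and blocks the current thread until a callback fires.
    static <T> T await(AsyncCall<T> call) throws Exception {
        final BlockingQueue<RpcResult<T>> pipe = new ArrayBlockingQueue<>(1);
        call.start(new Callback<T>() {
            @Override
            public void onResponse(T result) {
                pipe.offer(new RpcResult<>(result, null));
            }

            @Override
            public void onError(Exception error) {
                pipe.offer(new RpcResult<>(null, error));
            }
        });
        RpcResult<T> result = pipe.take(); // may throw InterruptedException
        if (result.error != null) {
            throw result.error; // relay the error to the caller
        }
        return result.value;
    }
}
```

A caller would wrap the real client call in an `AsyncCall` and write something like `String answer = Blocking.await(call);`. Using `offer` instead of `put` in the callbacks avoids having to handle InterruptedException inside them; with a capacity of one and a single callback per call it cannot fail.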

Java: SAX Parsing a huge XML file

I have a 35 GB XML file (yes, some organizations do that and I have no control over it) that I would like to SAX parse. I found an example here:
http://www.java2s.com/Code/Java/XML/SAXDemo.htm
of how to run a SAX parser and avoid loading everything. However, I get an out-of-memory error immediately. Why does this happen, and how can I make this code scale to any XML file size?
Here my code:
import org.apache.log4j.Logger;
import org.xml.sax.AttributeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class XMLSAXTools extends org.xml.sax.helpers.DefaultHandler {
/**
* Logging facility
*/
static Logger logger = Logger.getLogger(XMLSAXTools.class);
private String fileName = "C:/Data/hugefile.xml";
private int counter = 0;
/** The main method sets things up for parsing */
public void test() throws IOException, SAXException,
ParserConfigurationException {
// Create a JAXP "parser factory" for creating SAX parsers
javax.xml.parsers.SAXParserFactory spf = SAXParserFactory.newInstance();
// Configure the parser factory for the type of parsers we require
spf.setValidating(false); // No validation required
// Now use the parser factory to create a SAXParser object
// Note that SAXParser is a JAXP class, not a SAX class
javax.xml.parsers.SAXParser sp = spf.newSAXParser();
// Create a SAX input source for the file argument
org.xml.sax.InputSource input = new InputSource(new FileReader(fileName));
// Give the InputSource an absolute URL for the file, so that
// it can resolve relative URLs in a <!DOCTYPE> declaration, e.g.
input.setSystemId("file://" + new File(fileName).getAbsolutePath());
// Create an instance of this class; it defines all the handler methods
XMLSAXTools handler = new XMLSAXTools();
// Finally, tell the parser to parse the input and notify the handler
sp.parse(input, handler);
// Instead of using the SAXParser.parse() method, which is part of the
// JAXP API, we could also use the SAX1 API directly. Note the
// difference between the JAXP class javax.xml.parsers.SAXParser and
// the SAX1 class org.xml.sax.Parser
//
// org.xml.sax.Parser parser = sp.getParser(); // Get the SAX parser
// parser.setDocumentHandler(handler); // Set main handler
// parser.setErrorHandler(handler); // Set error handler
// parser.parse(input); // Parse!
}
StringBuffer accumulator = new StringBuffer(); // Accumulate parsed text
String servletName; // The name of the servlet
String servletClass; // The class name of the servlet
String servletId; // Value of id attribute of <servlet> tag
// When the parser encounters plain text (not XML elements), it calls
// this method, which accumulates them in a string buffer
public void characters(char[] buffer, int start, int length) {
accumulator.append(buffer, start, length);
}
// Every time the parser encounters the beginning of a new element, it
// calls this method, which resets the string buffer
public void startElement(String name, AttributeList attributes) {
accumulator.setLength(0); // Ready to accumulate new text
if (name.equals("item")) {
logger.info("item tag opened");
counter++;
}
}
// When the parser encounters the end of an element, it calls this method
public void endElement(String name) {
if (name.equals("item")) {
logger.info("item tag closed. Counter: " + counter);
}
}
/** This method is called when warnings occur */
public void warning(SAXParseException exception) {
System.err.println("WARNING: line " + exception.getLineNumber() + ": "
+ exception.getMessage());
}
/** This method is called when errors occur */
public void error(SAXParseException exception) {
System.err.println("ERROR: line " + exception.getLineNumber() + ": "
+ exception.getMessage());
}
/** This method is called when non-recoverable errors occur. */
public void fatalError(SAXParseException exception) throws SAXException {
System.err.println("FATAL: line " + exception.getLineNumber() + ": "
+ exception.getMessage());
throw (exception);
}
public static void main(String[] args){
XMLSAXTools t = new XMLSAXTools();
try {
t.test();
} catch (Exception e){
logger.error("Exception in XMLSAXTools: " + e.getMessage());
e.printStackTrace();
}
}
}

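You are filling up your accumulator without ever emptying it, which is unlikely to be what you want. Your startElement(String, AttributeList) and endElement(String) use the old SAX1 signatures, so they do not override DefaultHandler's SAX2 methods and are never called; only characters() runs, and the buffer keeps growing.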
Just using SAX is not sufficient to ensure you do not run out of memory - you still need to implement the code that finds, selects and processes what you do need from the xml and discards the rest.
Here's a fairly simple parser that is designed to be run in a separate thread. It communicates with the calling thread via an ArrayBlockingQueue<String> queue which is defined in an enclosing class.
The huge data files I have to deal with are essentially <Batch> ... a few thousand items ... </Batch>. This parser pulls each item out and presents them one at a time through the blocking queue. One day I will turn them into XOM Elements, but at the moment it uses Strings.
Notice how it clears down its temporary data fields when enqueue is called to ensure we don't run out of memory:
private class Parser extends DefaultHandler {
// Track the depth of the xml - whenever we hit level 1 we add the accumulated xml to the queue.
private int level = 0;
// The current xml fragment.
private final StringBuilder xml = new StringBuilder();
// We've had a start tag but no data yet.
private boolean tagWithNoData = false;
/*
* Called when the starting of the Element is reached. For Example if we have Tag
* called <Title> ... </Title>, then this method is called when <Title> tag is
* Encountered while parsing the Current XML File. The AttributeList Parameter has
* the list of all Attributes declared for the Current Element in the XML File.
*/
@Override
public void startElement(final String uri, final String localName, final String name, final Attributes atrbts) throws SAXException {
checkForAbort();
// Have we got back to level 1 yet?
if (level == 1) {
// Emit any built ones.
try {
enqueue();
} catch (InterruptedException ex) {
Throwables.rethrow(ex);
}
}
// Add it on.
if (level > 0) {
// The name.
xml.append("<").append(name);
// The attributes.
for (int i = 0; i < atrbts.getLength(); i++) {
final String att = atrbts.getValue(i);
xml.append(" ").append(atrbts.getQName(i)).append("=\"").append(XML.to(att)).append("\"");
}
// Done.
xml.append(">");
// Remember we've not had any data yet.
tagWithNoData = true;
}
// Next element is a sub-element.
level += 1;
}
/*
* Called when the Ending of the current Element is reached. For example in the
* above explanation, this method is called when </Title> tag is reached
*/
@Override
public void endElement(final String uri, final String localName, final String name) throws SAXException {
checkForAbort();
if (level > 1) {
if (tagWithNoData) {
// No data. Make the > into a />
xml.insert(xml.length() - 1, "/");
// I've closed this one but the enclosing one has data (i.e. this one).
tagWithNoData = false;
} else {
// Had data, finish properly.
xml.append("</").append(name).append(">");
}
}
// Done with that level.
level -= 1;
if (level == 1) {
// Finished and at level 1.
try {
// Enqueue the results.
enqueue();
} catch (InterruptedException ex) {
Throwables.rethrow(ex);
}
}
}
/*
* Called when the data part is encountered.
*/
@Override
public void characters(final char buf[], final int offset, final int len) throws SAXException {
checkForAbort();
// I want it trimmed.
final String chs = new String(buf, offset, len).trim();
if (chs.length() > 0) {
// Grab that data.
xml.append(XML.to(chs));
tagWithNoData = false;
}
}
/*
* Called when the Parser starts parsing the Current XML File.
*/
@Override
public void startDocument() throws SAXException {
checkForAbort();
tagWithNoData = false;
}
/*
* Called when the Parser Completes parsing the Current XML File.
*/
@Override
public void endDocument() throws SAXException {
checkForAbort();
try {
// Enqueue the results.
enqueue();
} catch (InterruptedException ex) {
Throwables.rethrow(ex);
}
}
private void enqueue() throws InterruptedException, SAXException {
// We may have been closed while blocking on the queue.
checkForAbort();
final String x = xml.toString().trim();
if (x.length() > 0) {
// Add it to the queue.
queue.put(x);
// Clear out.
xml.setLength(0);
tagWithNoData = false;
}
// We may have been closed while blocking on the queue.
checkForAbort();
}
private void checkForAbort() throws XMLInnerDocumentIteratorAbortedException {
if (iteratorFinished) {
LOGGER.debug("Aborting!!!");
throw new XMLInnerDocumentIterator.XMLInnerDocumentIteratorAbortedException("Aborted!");
}
}
}
}
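For the simpler counting case from the question, a handler that uses the SAX2 signatures and resets its text buffer on every element could be sketched as follows. `ItemCounter` and `countItems` are illustrative names, and the element name "item" is taken from the question's code.

```java
import java.io.InputStream;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: counts <item> elements while holding only one element's text.
class ItemCounter extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private int count = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        text.setLength(0); // start each element with an empty buffer
        if ("item".equals(qName)) {
            count++;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // text.toString() would be used here; afterwards the buffer is dropped.
        text.setLength(0);
    }

    int getCount() {
        return count;
    }

    // Parses the stream and returns the number of <item> elements seen.
    static int countItems(InputStream in) throws Exception {
        ItemCounter handler = new ItemCounter();
        SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        return handler.getCount();
    }
}
```

Because of the @Override annotations, the compiler rejects any method whose signature does not match DefaultHandler, which would catch the SAX1-style signatures at compile time.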

Call a web service and parse xml response in blackberry

Currently I have a ready design for blackberry application.
Now, I need to call the web service in my app, and that web service will give me some xml response.
So, I need to parse that response from xml to some POJO.
So, for parsing the XML response, should I go with the basic DOM parser, or should I use some other J2ME-specific parsing approach?
If anybody has a sample tutorial link for this, it would be very useful to me.
Thanks in advance....

It depends on what your web service serves.
If it is REST-based, you're likely responsible to parse the XML yourself, with a library. I've only ever used kXml 2, a J2ME library that can be used on BlackBerry devices. To use it, it's best to link to the source (otherwise, you have to preverify the jar and export it and that never seems to work for me). It's a forward-only pull parser, similar to XmlReader in .NET, if you're familiar with that.
If your web service is WS*-based (i.e. it uses SOAP), you can use a stub generator to generate a client class that you can use. BlackBerry supports JSR 172, the web services API for J2ME. The WTK has a stub generator that works well. Just point the generator to your web service's wsdl file. A web search should clarify how to use it.

Put your XML data into strXML:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
InputStream inputStream = new ByteArrayInputStream(strXML.getBytes("UTF-8"));
Document document = builder.parse( inputStream );
Element rootElement = document.getDocumentElement();
rootElement.normalize();
blnViewReport=false;
listNodes(rootElement); // use this function to parse the xml
inputStream.close();
void listNodes(Node node)
{
Node tNode;
String strData;
String nodeName = node.getNodeName();
if( nodeName.equals("Tagname"))
{
tNode=node.getFirstChild();
if(tNode.getNodeType() == Node.TEXT_NODE)
{
// here you get the specified tag value
}
}
else if(nodeName.equals("Tag name 2"))
.....
.....
NodeList list = node.getChildNodes();
if(list.getLength() > 0)
{
for(int i = 0 ; i<list.getLength() ; i++)
{
listNodes(list.item(i));
}
}
}

I believe that you have received the request object.
I will give the code I used to parse the request object from XML.
_value is the object
System.out.println("value="+_value);
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = null; // create a parser
try {
parser = factory.newSAXParser();
}
catch (ParserConfigurationException e1)
{
System.out.println("ParserConfigurationException"+e1.getMessage());
}
catch (SAXException e1)
{
System.out.println("SAXException"+e1.getMessage());
}
// instantiate our handler
PharmacyDataXMLHandler pharmacydataXMLHandler= new PharmacyDataXMLHandler();
ByteArrayInputStream objBAInputStream = new java.io.ByteArrayInputStream(_value.getBytes());
InputSource inputSource = new InputSource(objBAInputStream);
// perform the synchronous parse
try {
parser.parse(inputSource, pharmacydataXMLHandler);
} catch (SAXException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
_pharmacydataList = pharmacydataXMLHandler.getpharmacydataList();
}
public class PharmacyDataXMLHandler extends DefaultHandler
{
private Vector _pharmacyDataList = new Vector();
PharmacyData _pharmacydata;
StringBuffer _sb = null;
public void warning(SAXParseException e) {
System.err.println("warning: " + e.getMessage());
}
public void error(SAXParseException e) {
System.err.println("error: " + e.getMessage());
}
public void fatalError(SAXParseException e) {
System.err.println("fatalError: " + e.getMessage());
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
try{
_sb = new StringBuffer("");
if(localName.equals("Table"))
{
_pharmacydata= new PharmacyData();
}
}catch (Exception e) {
System.out.println(""+e.getMessage());
}
}
public void endElement(String namespaceURI, String localName, String qName) throws SAXException
{
try{
if(localName.equals("ID"))
{
// System.out.println("Id :"+sb.toString());
this._pharmacydata.setId(_sb.toString());
}
else if(localName.equals("Name"))
{
//System.out.println("name :"+sb.toString());
this._pharmacydata.setName(_sb.toString());
}
else if(localName.equals("PharmacyID"))
{
// System.out.println("pharmacyId :"+sb.toString());
this._pharmacydata.setPharmacyId(_sb.toString());
}
else if(localName.equals("Password"))
{
// System.out.println("password :"+sb.toString());
this._pharmacydata.setPassword(_sb.toString());
}
else if(localName.equals("Phone"))
{
// System.out.println("phone:"+sb.toString());
this._pharmacydata.setPhone(_sb.toString());
}
else if(localName.equals("Transmit"))
{
//System.out.println("transmit"+sb.toString());
this._pharmacydata.setTransmit(_sb.toString());
}
else if(localName.equals("TimeZone"))
{
// System.out.println("timeZone"+sb.toString());
this._pharmacydata.setTimeZone(_sb.toString());
}
else if(localName.equals("FaxModem"))
{
// System.out.println("faxModem"+sb.toString());
this._pharmacydata.setFaxModem(_sb.toString());
}
else if(localName.equals("VoicePhone"))
{
// System.out.println("voicePhone"+sb.toString());
this._pharmacydata.setVoicePhone(_sb.toString());
}
else if(localName.equals("ZipCode"))
{
// System.out.println("zipCode"+sb.toString());
this._pharmacydata.setZipCode(_sb.toString());
}
else if(localName.equals("Address"))
{
// System.out.println("address"+sb.toString());
this._pharmacydata.setAddress(_sb.toString());
}
else if(localName.equals("City"))
{
// System.out.println("city"+sb.toString());
this._pharmacydata.setCity(_sb.toString());
}
else if(localName.equals("State"))
{
// System.out.println("state"+sb.toString());
this._pharmacydata.setState(_sb.toString());
}
else if(localName.equals("WebInterface"))
{
// System.out.println("webInterface"+sb.toString());
this._pharmacydata.setWebInterface(_sb.toString());
}
else if(localName.equals("NABPnumber"))
{
// System.out.println("nabPnumber"+sb.toString());
this._pharmacydata.setNabPnumber(_sb.toString());
}
else if(localName.equals("ServiceType"))
{
// System.out.println("serviceType:"+sb.toString());
this._pharmacydata.setServiceType(_sb.toString());
}
else if(localName.equals("Mobile"))
{
// System.out.println("mobile:"+sb.toString());
this._pharmacydata.setMobile(_sb.toString());
}
else if(localName.equals("Table"))
{
// System.out.println("end table:"+sb.toString());
_pharmacyDataList.addElement(_pharmacydata);
}
}catch (Exception e) {
System.out.println(""+e.getMessage());
}
}
public void characters(char ch[], int start, int length) {
// append straight into the buffer; no temporary String is needed
_sb.append(ch, start, length);
}
/**
* @return the PharmacyDataList
*/
public Vector getpharmacydataList()
{
return _pharmacyDataList;
}
}
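The long if/else chain in endElement can be avoided if you do not need a typed PharmacyData object. The sketch below is one possible alternative, not the code from the answer above: it stores each record as a map from element name to text. It assumes Java SE, and the names RecordHandlerSketch and parse and the sample XML are made up for illustration. It compares against qName because the default Java SE SAXParserFactory is not namespace-aware, in which case localName can be empty.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example class: collects each <recordTag> element as a name-to-text map.
public class RecordHandlerSketch extends DefaultHandler {
    private final String recordTag;
    private final List<Map<String, String>> records = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private Map<String, String> current; // null while outside a record

    public RecordHandlerSketch(String recordTag) {
        this.recordTag = recordTag;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting text for the new element
        if (qName.equals(recordTag)) {
            current = new LinkedHashMap<>();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so always append.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals(recordTag)) {
            records.add(current);
            current = null;
        } else if (current != null) {
            current.put(qName, text.toString().trim());
        }
        text.setLength(0);
    }

    public List<Map<String, String>> getRecords() {
        return records;
    }

    /** Convenience wrapper: parses xml and returns one map per recordTag element. */
    public static List<Map<String, String>> parse(String xml, String recordTag) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        RecordHandlerSketch handler = new RecordHandlerSketch(recordTag);
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getRecords();
    }
}
```

For example, RecordHandlerSketch.parse(xml, "Table") on a document whose Table elements contain ID and Name children would return one map per Table with the keys ID and Name. The trade-off is that you lose the compile-time checking that the setters on PharmacyData give you.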
