parsing XML in Java using SAX: value cut in 2 halves - java

I am trying to read an XML-based file format called mzXML using SAX in Java. It carries partially encoded mass spectrometric data (signals with intensities).
This is what the entry of interest looks like (there is more information around that):
<peaks ... >eJwBgAN//EByACzkZJkHP/NlAceAXLJAckeQ4CIUJz/203q2...</peaks>
A complete file that forces the Error in my case can be downloaded here.
The string in one of these entries holds about 500 compressed and base64-encoded pairs of doubles (signals and intensities). I decompress and decode it to get the values (decoding not shown in the example below). That all works fine on a small dataset. Now I used a bigger one and ran into a problem that I don't understand:
The procedure characters(ch,start,length) does not read the complete entry in the line shown before. The length-value seems to be too small.
I did not see that problem when I just printed the peaks entry to the console, as there are a lot of letters and I didn't notice that some were missing. But the decompression fails when information is missing. When I repeatedly run this program, it always breaks the same line at the same point without throwing any exception. If I change the mzXML file, e.g. by deleting a scan, it breaks at a different position. I found this out using breakpoints in the characters() procedure, looking at the content of currentValue.
Here is the piece of code necessary to recapitulate the problem:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import javax.xml.bind.DatatypeConverter;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXMLFile {
public static byte[] decompress(byte[] data) throws IOException, DataFormatException {
Inflater inflater = new Inflater();
inflater.setInput(data);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
byte[] buffer = new byte[data.length*2];
while (!inflater.finished()) {
int count = inflater.inflate(buffer);
outputStream.write(buffer, 0, count);
}
outputStream.close();
byte[] output = outputStream.toByteArray();
return output;
}
public static void main(String args[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean peaks = false;
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("PEAKS")) {
peaks = true;
}
}
public void endElement(String uri, String localName,
String qName) throws SAXException {
if (peaks) {peaks = false;}
}
public void characters(char ch[], int start, int length) throws SAXException {
if (peaks) {
String currentValue = new String(ch, start, length);
System.out.println(currentValue);
try {
byte[] array = decompress(DatatypeConverter.parseBase64Binary(currentValue));
System.out.println(array[1]);
} catch (IOException | DataFormatException e) {e.printStackTrace();}
peaks = false;
}
}
};
saxParser.parse("file1_zlib.mzxml", handler);
} catch (Exception e) {e.printStackTrace();}
}
}
Is there a safer way to read large XML files? Can you tell me where the error comes from or how to avoid it?
Thanks, Michael

The procedure characters(ch,start,length) does not read the complete entry in the line shown before. The length-value seems to be too small.
That is precisely the way it is designed to work. From the documentation of ContentHandler:
SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks.
Therefore, you should not try calling decompress inside the characters implementation. Instead, you should append the characters that you get to an expandable buffer, and call decompress only when you get the corresponding endElement:
StringBuilder sb = null;
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("PEAKS")) {
sb = new StringBuilder();
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if (sb == null) return;
try {
byte[] array = decompress(DatatypeConverter.parseBase64Binary(sb.toString()));
System.out.println(array[1]);
} catch (IOException | DataFormatException e) {e.printStackTrace();}
sb = null;
}
public void characters(char ch[], int start, int length) throws SAXException {
if (sb == null) return;
// Append the chunk directly; no intermediate String is needed
sb.append(ch, start, length);
}
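As a self-contained illustration of the buffering approach, here is a minimal sketch. It assumes Java 8+ and uses java.util.Base64 in place of DatatypeConverter; the names PeaksBufferSketch, PeaksHandler, readPeaks, deflate and inflate are made up for this example, and the test document is synthetic rather than real mzXML.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;
import java.util.List;
import java.util.Random;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class PeaksBufferSketch {
    /** Collects the full text of every <peaks> element, however many chunks it arrives in. */
    static class PeaksHandler extends DefaultHandler {
        final List<String> peaks = new ArrayList<>();
        private StringBuilder sb; // non-null only while inside <peaks>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("peaks".equals(qName)) sb = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (sb != null) sb.append(ch, start, length); // may be called several times per element
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (sb != null && "peaks".equals(qName)) {
                peaks.add(sb.toString()); // the text is complete only here
                sb = null;
            }
        }
    }

    static List<String> readPeaks(String xml) throws Exception {
        PeaksHandler handler = new PeaksHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.peaks;
    }

    /** zlib-compresses test data (only needed to build the synthetic input). */
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!deflater.finished()) out.write(buf, 0, deflater.deflate(buf));
        deflater.end();
        return out.toByteArray();
    }

    /** zlib-decompresses; fails fast on truncated input instead of looping forever. */
    static byte[] inflate(byte[] packed) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new IllegalStateException("incomplete compressed data");
            }
            out.write(buf, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] raw = new byte[200_000];
        new Random(42).nextBytes(raw); // incompressible, so the element text is long
        String b64 = Base64.getEncoder().encodeToString(deflate(raw));
        String text = readPeaks("<scan><peaks>" + b64 + "</peaks></scan>").get(0);
        byte[] back = inflate(Base64.getDecoder().decode(text));
        System.out.println("round trip ok: " + Arrays.equals(raw, back));
    }
}
```

The guard in inflate is a design choice for this sketch: with a truncated payload, a plain while (!inflater.finished()) loop can spin without making progress, which may be why the original program appeared to break without an exception.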

Try this: use a LinkedList to store the tag names, adding one at every startElement() and removing the last one with pollLast() at every endElement(). Use String.trim() on the data passed to characters(), so that every time characters() delivers actual data (String.length() != 0) you can associate it with the last element in the LinkedList (peekLast()).
You can then choose to append() it or handle it some other way.
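A minimal sketch of this tag-stack idea follows, assuming an ArrayDeque in place of the LinkedList and one text buffer per open element so that split chunks are still joined before trimming. The class and method names (TagStackSketch, textByElement) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class TagStackSketch {
    /** Returns "tag=text" for every element that directly contains non-blank text. */
    static List<String> textByElement(String xml) throws Exception {
        final List<String> result = new ArrayList<>();
        final Deque<String> openTags = new ArrayDeque<>();        // names of currently open elements
        final Deque<StringBuilder> buffers = new ArrayDeque<>();  // one text buffer per open element

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                openTags.addLast(qName);
                buffers.addLast(new StringBuilder());
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                StringBuilder current = buffers.peekLast();
                if (current != null) current.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String tag = openTags.pollLast();
                String text = buffers.pollLast().toString().trim(); // trim once the text is complete
                if (!text.isEmpty()) result.add(tag + "=" + text);
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<staff><firstname>yong</firstname><firstname>jin</firstname></staff>";
        System.out.println(textByElement(xml)); // prints [firstname=yong, firstname=jin]
    }
}
```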

Related

Not able to Catch Element using SAX Parser

I'm reading XML file using SAX parser utility.
Here is my sample XML
<?xml version="1.0"?><company><Account AccountNumber="100"><staff><firstname>yong</firstname><firstname>jin</firstname></staff></Account></company>
Here is the code
import java.util.Arrays;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXML {
public static void main(String argv[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean bAccount = false;
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("ACCOUNT")) {
bAccount = true;
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
System.out.println("End Element :" + qName);
}
public void characters(char[] ch, int start, int length) throws SAXException {
System.out.println("Im here:" + bAccount);
if (bAccount) {
System.out.println("Account First Name : " + new String(ch, start, length));
bAccount = false;
StringBuilder Account = new StringBuilder();
for (int i = start; i < ch.length - 1; i--) {
if (String.valueOf(ch[i]).equals("<")) {
System.out.println("Account:" +Account);
break;
} else {
Account.append(ch[i]);
}
}
}
}
};
saxParser.parse("C:\\Lenny\\Work\\XML\\Out_SaxParsing_01.xml", handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
As you can see in the XML, the Account tag looks like this: Account AccountNumber="100". I want to capture the tag itself as well.
To achieve that, I'm trying to read the array from right to left in the characters method, so that I can get Account AccountNumber="100" when the Account event is encountered.
But I can't get there: the event is generated, but the characters method is not called. I thought it would be called once the Account tag is encountered, but it isn't.
What am I missing or doing wrong?
Any help, please!
AccountNumber="100" is an attribute of the Account element, so you can read that value from the attributes parameter inside your startElement handler.
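A minimal sketch of reading the attribute in startElement, using the sample XML from the question. The class and method names (AccountAttrSketch, accountNumbers) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AccountAttrSketch {
    /** Collects the AccountNumber attribute of every Account element. */
    static List<String> accountNumbers(String xml) throws Exception {
        final List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                if ("Account".equalsIgnoreCase(qName)) {
                    // getValue returns null if the attribute is absent
                    String number = attributes.getValue("AccountNumber");
                    if (number != null) found.add(number);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return found;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version=\"1.0\"?><company><Account AccountNumber=\"100\"><staff>"
                + "<firstname>yong</firstname><firstname>jin</firstname></staff></Account></company>";
        System.out.println(accountNumbers(xml)); // prints [100]
    }
}
```

The characters method is only called for text between tags, which is why the attribute never shows up there.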

Is there a way to have tika stop parsing a file once a match is found?

I have a Java 8 program that walks the directory tree from a user-supplied node, searching for files that match a list of user-supplied filename patterns.
The list of matched files can be filtered with an optional user-supplied String to find. The code checks for this string using the end result of parsing. This is really bad when huge files are found along the tree walk.
But it's bad anyway. As soon as the string to find is found, we're wasting time parsing the rest of the file.
Is there a way to have tika stop parsing a file once a match is found?
EDIT
The code that the program is based on:
package org.apache.tika.example;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.MalformedURLException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.mime.MimeTypeException;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
public class ParsingExample {
public static boolean contains(File file, String s) throws MalformedURLException,
IOException, MimeTypeException, SAXException, TikaException
{
InputStream stream = new FileInputStream(file);
AutoDetectParser parser = new AutoDetectParser();
BodyContentHandler handler = new BodyContentHandler(-1);
Metadata metadata = new Metadata();
try
{
parser.parse(stream, handler, metadata);
return handler.toString().toLowerCase().contains(s.toLowerCase());
}
catch (IOException | SAXException | TikaException e)
{
System.out.println(file + ": " + e + "\n");
return false;
}
}
public static void main(String[] args)
{
try
{
System.out.println("File " + filename + " contains <" + searchString + "> : " + contains(new File(filename), searchString));
}
catch (IOException | SAXException | TikaException ex)
{
System.out.println("Error: " + ex);
}
}
static String parseExample = ":(";
static String searchString = "test";
static String filename = "test.doc";
}
parser.parse returns all the text in the file to the BodyContentHandler handler. There's no loop available to the implementer of a parser, at least none that I'm aware of; hence the question.
EDIT 2
What I really want to know, I guess, is whether there's a tika method that only reads n characters from a file instead of all. Then I could maybe stick a loop around it and exit if search string is found.
You can move the query matching into your own ContentHandler implementation (you can take DefaultHandler as a base), reassemble the text from the parts passed to ContentHandler#characters(char[],int,int), and abort parsing by throwing an exception there once a match is found.
It's definitely not a pretty solution, but it should stop parsing.
UPD code sample:
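import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import org.apache.tika.Tika;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.Parser;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class InterruptableParsingExample {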
private Tika tika = new Tika(); // for default autodetect parser
public boolean findInFile(String query, File file) {
Metadata metadata = new Metadata();
InterruptingContentHandler handler = new InterruptingContentHandler(query);
ParseContext context = new ParseContext();
context.set(Parser.class, tika.getParser());
try (InputStream is = new BufferedInputStream(new FileInputStream(file))) {
tika.getParser().parse(is, handler, metadata, context);
} catch (QueryMatchedException e) {
return true;
} catch (SAXException | TikaException | IOException e) {
// something went wrong with parsing...
e.printStackTrace();
}
return false;
}
}
class QueryMatchedException extends SAXException {}
class InterruptingContentHandler extends DefaultHandler {
private String query;
private StringBuilder sb = new StringBuilder();
InterruptingContentHandler(String query) {
this.query = query.toLowerCase(); // the buffer below is lowercased, so lowercase the query too
}
@Override
public void characters(char[] ch, int start, int length) throws SAXException {
sb.append(new String(ch, start, length).toLowerCase());
if (sb.toString().contains(query))
throw new QueryMatchedException(); // interrupt parsing by throwing SaxException
if (sb.length() > 2 * query.length())
sb.delete(0, sb.length() - query.length()); // keep tail with query.length() chars
}
}
UPD2 Added to tika-example package: https://github.com/apache/tika/blob/trunk/tika-example/src/main/java/org/apache/tika/example/InterruptableParsingExample.java

Reading XML for getting entities

I'm using SAX (Simple API for XML) to parse an XML document. My goal is to separate the entities from the XML and create an ER diagram from them (which I will draw manually once I have all the entities the file contains).
I'm at a very early stage of coding everything described above, but I'm stuck on this particular problem right now.
Here is my code:
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class Parser extends DefaultHandler {
public void getXml() {
try {
SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
SAXParser saxParser = saxParserFactory.newSAXParser();
final MySet openingTagList = new MySet();
final MySet closingTagList = new MySet();
DefaultHandler defaultHandler = new DefaultHandler() {
public void startDocument() throws SAXException {
System.out.println("Starting Parsing...\n");
}
public void endDocument() throws SAXException {
System.out.print("\n\nDone Parsing!");
}
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (!openingTagList.contains(qName)) {
openingTagList.add(qName);
System.out.print("<" + qName + ">");
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
for (int i = start; i < (start + length); i++) {
System.out.print(ch[i]);
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (!closingTagList.contains(qName)) {
closingTagList.add(qName);
System.out.print("</" + qName + ">");
}
}
};
saxParser.parse("student.xml", defaultHandler);
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String args[]) {
Parser readXml = new Parser();
readXml.getXml();
}
}
What I'm trying to achieve is that when the startElement method detects a tag that was already traversed, it skips the tag as well as all the other entities inside it, but I'm confused about how to implement that part.
Note: the purpose is to read the tags; I don't care about the records in between them. MySet is just an abstraction with methods like contains (whether the set has the passed data), nothing much.
Any help would be appreciated. Thanks
Due to the nature of XML, it's not possible to know which tags will appear later in the file, so there is no 'skip the next x bytes' trick.
Just ask for reasonably sized files; maybe the data can be split.
In my opinion, reading an XML file of more than 1 GB is no fun, regardless of the library used.
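Since the parser has to deliver every event anyway, one way to get just the distinct tags is to record each name once and ignore everything else. The following is a minimal sketch, assuming a LinkedHashSet in place of the question's MySet; the names DistinctTagsSketch and distinctTags are hypothetical, and the sample document is invented because student.xml is not shown.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DistinctTagsSketch {
    /** Returns each element name once, in order of first appearance. */
    static Set<String> distinctTags(String xml) throws Exception {
        final Set<String> names = new LinkedHashSet<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                names.add(qName); // a Set ignores names it has already seen
            }
            // characters() and endElement() are left at their no-op defaults,
            // so the text between tags is read by the parser but never stored
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<students><student><name>a</name></student>"
                + "<student><name>b</name></student></students>";
        System.out.println(distinctTags(xml)); // prints [students, student, name]
    }
}
```

Memory use stays proportional to the number of distinct tag names rather than to the file size, although the whole file is still scanned.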

What is best approach for Storing data into MySQL after Parsing XML file using SAX Parser?

I have a student.xml file that I am parsing with a SAX parser, and now I need to store the data in a MySQL database. What approach is recommended?
Code:
package sax;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXML extends DefaultHandler{
public void characters(char[] ch, int start, int length) throws SAXException {
String s =new String(ch, start, length);
if(s.trim().length()>0) {
System.out.println(" Value: "+s);
}
}
public void startDocument() throws SAXException {
System.out.println("Start document");
}
public void endDocument() throws SAXException {
System.out.println("End document");
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
System.out.println("start element : "+name);
}
public void endElement(String uri, String localName, String name) throws SAXException {
System.out.println("end element");
}
public static void main(String[] args) {
ReadXML handler = new ReadXML();
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
saxParser.parse("student.xml", handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
I would create a set of tables that represent the data contained in student.xml and then populate them as you parse the data.
You might be able to store the XML directly in the DB. Many DB packages have functionality that allows XML-formatted data to be inserted into the appropriate spots in the database. I believe that PostgreSQL and MS SQL Server can do it.
There is existing functionality to do this in MySQL. See here
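The "populate tables as you parse" approach could look roughly like the sketch below. Everything specific in it is an assumption, since student.xml is not shown: the layout student/name/grade, the table student(name, grade), and the names StudentLoaderSketch, parseStudents and insertStudents are all hypothetical. The Connection would come from the MySQL JDBC driver via DriverManager, which is not part of this sketch.

```java
import java.io.StringReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class StudentLoaderSketch {
    /** Parses <student><name>..</name><grade>..</grade></student> records (assumed layout). */
    static List<String[]> parseStudents(String xml) throws Exception {
        final List<String[]> rows = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder text = new StringBuilder();
            private String name;
            private String grade;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                text.setLength(0); // start collecting text for this element
                if ("student".equals(qName)) {
                    name = null;
                    grade = null;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // text may arrive in several chunks
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("name".equals(qName)) {
                    name = text.toString().trim();
                } else if ("grade".equals(qName)) {
                    grade = text.toString().trim();
                } else if ("student".equals(qName)) {
                    rows.add(new String[] {name, grade}); // one complete record
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return rows;
    }

    /** Inserts the parsed rows in one JDBC batch; table and columns are hypothetical. */
    static void insertStudents(Connection con, List<String[]> rows) throws SQLException {
        String sql = "INSERT INTO student (name, grade) VALUES (?, ?)";
        try (PreparedStatement ps = con.prepareStatement(sql)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<students><student><name>Ann</name><grade>A</grade></student></students>";
        for (String[] row : parseStudents(xml)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

For very large files, the rows could instead be flushed to the database every few hundred records from inside endElement, so that the list does not grow with the input.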

What is the most memory-efficient way to emit XML from a JAXP SAX ContentHandler?

I have a situation similar to an earlier question about emitting XML. I am analyzing data in a SAX ContentHandler while serializing it to a stream. I am suspicious that the solution in the linked question -- though it is exactly what I am looking for in terms of the API -- is not memory-efficient, since it involves an identity transform with the XSLT processor. I want the memory consumption of the program to be bounded, rather than it growing with the input size.
How can I easily forward the parameters to my ContentHandler methods to a serializer without doing acrobatics to adapt e.g. StAX to SAX, or worse yet, copying the SAX event contents to the output stream?
Edit: here's a minimal example of what I am after. thingIWant should just write to the OutputStream given to it. Like I said, the earlier question has a TransformerHandler that gives me the right API, but it uses the XSLT processor instead of just a simple serialization.
public class MyHandler implements ContentHandler {
ContentHandler thingIWant;
MyHandler(OutputStream outputStream) {
thingIWant = setup(outputStream);
}
public void startDocument() throws SAXException {
// parsing logic
thingIWant.startDocument();
}
public void startElement(String uri, String localName, String qName,
Attributes atts) throws SAXException {
// parsing logic
thingIWant.startElement(uri, localName, qName, atts);
}
public void characters(char[] ch, int start, int length) throws SAXException {
// parsing logic
thingIWant.characters(ch, start, length);
}
// etc...
}
I recently had a similar problem. Here is the class I wrote to get you thingIWant:
import java.io.OutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.*;
public class XMLSerializer implements ContentHandler {
static final private TransformerFactory tf = TransformerFactory.newInstance();
private ContentHandler ch;
public XMLSerializer(OutputStream os) throws SAXException {
try {
final Transformer t = tf.newTransformer();
t.transform(new SAXSource(
new XMLReader() {
public ContentHandler getContentHandler() { return ch; }
public DTDHandler getDTDHandler() { return null; }
public EntityResolver getEntityResolver() { return null; }
public ErrorHandler getErrorHandler() { return null; }
public boolean getFeature(String name) { return false; }
public Object getProperty(String name) { return null; }
public void parse(InputSource input) { }
public void parse(String systemId) { }
public void setContentHandler(ContentHandler handler) { ch = handler; }
public void setDTDHandler(DTDHandler handler) { }
public void setEntityResolver(EntityResolver resolver) { }
public void setErrorHandler(ErrorHandler handler) { }
public void setFeature(String name, boolean value) { }
public void setProperty(String name, Object value) { }
}, new InputSource()),
new StreamResult(os));
}
catch (TransformerException e) {
throw new SAXException(e);
}
if (ch == null)
throw new SAXException("Transformer didn't set ContentHandler");
}
public void setDocumentLocator(Locator locator) {
ch.setDocumentLocator(locator);
}
public void startDocument() throws SAXException {
ch.startDocument();
}
public void endDocument() throws SAXException {
ch.endDocument();
}
public void startPrefixMapping(String prefix, String uri) throws SAXException {
ch.startPrefixMapping(prefix, uri);
}
public void endPrefixMapping(String prefix) throws SAXException {
ch.endPrefixMapping(prefix);
}
public void startElement(String uri, String localName, String qName, Attributes atts)
throws SAXException {
ch.startElement(uri, localName, qName, atts);
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
ch.endElement(uri, localName, qName);
}
public void characters(char[] ch, int start, int length)
throws SAXException {
this.ch.characters(ch, start, length);
}
public void ignorableWhitespace(char[] ch, int start, int length)
throws SAXException {
this.ch.ignorableWhitespace(ch, start, length);
}
public void processingInstruction(String target, String data)
throws SAXException {
ch.processingInstruction(target, data);
}
public void skippedEntity(String name) throws SAXException {
ch.skippedEntity(name);
}
}
Basically, it intercepts the Transformer's call to parse(), and grabs a reference to its internal ContentHandler. After that, the class acts as a proxy to the snagged ContentHandler.
Not very clean, but it works.
First: don't worry about the identity transform; it does not build an in-memory representation of the data.
To implement your "tee" functionality, you have to create a content handler that listens to the stream of events produced by the parser, and passes them on to the handler provided for you by the transformer. Unfortunately, this is not as easy as it sounds: the parser wants to send events to a DefaultHandler, while the transformer wants to read events from an XMLReader. The former is a concrete class, the latter is an interface. The JDK also provides the class XMLFilterImpl, which implements all of the interfaces of DefaultHandler, but does not extend from it ... that's what you get for incorporating two different projects as your "reference implementations."
So, you need to write a bridge class between the two:
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;
/**
* Uses a decorator ContentHandler to insert a "tee" into a SAX parse/serialize
* stream.
*/
public class SaxTeeExample
{
public static void main(String[] argv)
throws Exception
{
StringReader src = new StringReader("<root><child>text</child></root>");
StringWriter dst = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer();
XMLReader reader = new MyReader(SAXParserFactory.newInstance().newSAXParser());
xform.transform(new SAXSource(reader, new InputSource(src)),
new StreamResult(dst));
System.out.println(dst.toString());
}
private static class MyReader
extends XMLFilterImpl
{
private SAXParser _parser;
public MyReader(SAXParser parser)
{
_parser = parser;
}
@Override
public void parse(InputSource input)
throws SAXException, IOException
{
_parser.parse(input, new XMLFilterBridge(this));
}
// this is an example of a "tee" function
@Override
public void startElement(String uri, String localName, String name, Attributes atts) throws SAXException
{
System.out.println("startElement: " + name);
super.startElement(uri, localName, name, atts);
}
}
private static class XMLFilterBridge
extends DefaultHandler
{
private XMLFilterImpl _filter;
public XMLFilterBridge(XMLFilterImpl myFilter)
{
_filter = myFilter;
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException
{
_filter.characters(ch, start, length);
}
// override all other methods of DefaultHandler
// ...
}
}
The main method sets up the transformer. The interesting part is that the SAXSource is constructed around MyReader. When the transformer is ready for events, it will call the parse() method of that object, passing it the specified InputSource.
The next part is not obvious: XMLFilterImpl follows the Decorator pattern. The transformer will call various setter methods on this object before starting the transform, passing its own handlers. Any methods that I don't override (eg, startDocument()) will simply call the delegate. As an example override, I'm doing "analysis" (just a println) in startElement(). You'll probably override other ContentHandler methods.
And finally, XMLFilterBridge is the bridge between DefaultHandler and XMLReader; it's also a decorator, and every method simply calls the delegate. I show one override, but you'll have to do them all.
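A shorter variant that may avoid the hand-written bridge is sketched below. It is a different technique from the code above: it takes the XMLReader from SAXParser.getXMLReader(), chains an XMLFilterImpl subclass onto it, and forwards the events to a TransformerHandler obtained from SAXTransformerFactory. It assumes the TransformerFactory in use supports the SAXTransformerFactory feature (checked at runtime here); the names TeeFilterSketch, CountingFilter and pipe are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

class TeeFilterSketch {
    /** The "analysis" step: counts elements, then passes the event downstream. */
    static class CountingFilter extends XMLFilterImpl {
        int elements = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts)
                throws SAXException {
            elements++;
            super.startElement(uri, localName, qName, atts);
        }
    }

    /** Parses xml through the filter and serializes the same events to a string. */
    static String pipe(String xml, CountingFilter filter) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        filter.setParent(reader); // parser -> filter

        TransformerFactory tf = TransformerFactory.newInstance();
        if (!tf.getFeature(SAXTransformerFactory.FEATURE)) {
            throw new IllegalStateException("TransformerFactory does not support SAX");
        }
        TransformerHandler serializer = ((SAXTransformerFactory) tf).newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));

        filter.setContentHandler(serializer); // filter -> serializer
        filter.parse(new InputSource(new StringReader(xml)));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        CountingFilter filter = new CountingFilter();
        String copy = pipe("<root><child>text</child></root>", filter);
        System.out.println(copy);
        System.out.println("elements: " + filter.elements);
    }
}
```

Events are forwarded one at a time, so nothing in this sketch itself accumulates the document; a StreamResult around an OutputStream would replace the StringWriter for large inputs.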
Edit: Includes default JDK version
The most efficient would be an XMLWriter that implements ContentHandler. In a nutshell, you are reading from and writing to IO buffers. There is an XMLWriter in DOM4J, which is used below. You can either subclass XMLWriter or use XMLFilter to do the analysis. I am using XMLFilter in this example. Note that XMLFilter is also a ContentHandler. Here is the complete code.
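import org.dom4j.io.XMLWriter;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;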
public class XMLPipeline {
public static void main(String[] args) throws Exception {
String inputFile = "build.xml";
PrintStream outputStream = System.out;
new XMLPipeline().pipe(inputFile, outputStream);
}
//dom4j
public void pipe(String inputFile, OutputStream outputStream) throws
SAXException, ParserConfigurationException, IOException {
XMLWriter xwriter = new XMLWriter(outputStream);
XMLReader xreader = XMLReaderFactory.createXMLReader();
XMLAnalyzer analyzer = new XMLAnalyzer(xreader);
analyzer.setContentHandler(xwriter);
analyzer.parse(inputFile);
//do what you want with analyzer
System.err.println(analyzer.elementCount);
}
//default JDK
public void pipeTrax(String inputFile, OutputStream outputStream) throws
SAXException, ParserConfigurationException,
IOException, TransformerException {
StreamResult xwriter = new StreamResult(outputStream);
XMLReader xreader = XMLReaderFactory.createXMLReader();
XMLAnalyzer analyzer = new XMLAnalyzer(xreader);
TransformerFactory stf = SAXTransformerFactory.newInstance();
SAXSource ss = new SAXSource(analyzer, new InputSource(inputFile));
stf.newTransformer().transform(ss, xwriter);
System.out.println(analyzer.elementCount);
}
//This method simply reads from a file, runs it through SAX parser and dumps it
//to dom4j writer
public void dom4jNoop(String inputFile, OutputStream outputStream) throws
IOException, SAXException {
XMLWriter xwriter = new XMLWriter(outputStream);
XMLReader xreader = XMLReaderFactory.createXMLReader();
xreader.setContentHandler(xwriter);
xreader.parse(inputFile);
}
//Simplest way to read a file and write it back to an output stream
public void traxNoop(String inputFile, OutputStream outputStream)
throws TransformerException {
TransformerFactory stf = SAXTransformerFactory.newInstance();
stf.newTransformer().transform(new StreamSource(inputFile),
new StreamResult(outputStream));
}
//this analyzer counts the number of elements in sax stream
public static class XMLAnalyzer extends XMLFilterImpl {
int elementCount = 0;
public XMLAnalyzer(XMLReader xmlReader) {
super(xmlReader);
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes atts) throws SAXException {
super.startElement(uri, localName, qName, atts);
elementCount++;
}
}
}
