Reading XML for getting entities - java

I'm using SAX (Simple API for XML) to parse an XML document. My goal is to extract the entities from the XML and build an ER diagram from them (I will draw the diagram manually once I have all the entities the file contains).
I'm at a very early stage of coding all of this, but I'm stuck on one particular problem.
Here is my code:
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class Parser extends DefaultHandler {
public void getXml() {
try {
SAXParserFactory saxParserFactory = SAXParserFactory.newInstance();
SAXParser saxParser = saxParserFactory.newSAXParser();
final MySet openingTagList = new MySet();
final MySet closingTagList = new MySet();
DefaultHandler defaultHandler = new DefaultHandler() {
public void startDocument() throws SAXException {
System.out.println("Starting Parsing...\n");
}
public void endDocument() throws SAXException {
System.out.print("\n\nDone Parsing!");
}
public void startElement(String uri, String localName, String qName,
Attributes attributes) throws SAXException {
if (!openingTagList.contains(qName)) {
openingTagList.add(qName);
System.out.print("<" + qName + ">");
}
}
public void characters(char ch[], int start, int length)
throws SAXException {
for (int i = start; i < (start + length); i++) {
System.out.print(ch[i]);
}
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
if (!closingTagList.contains(qName)) {
closingTagList.add(qName);
System.out.print("</" + qName + ">");
}
}
};
saxParser.parse("student.xml", defaultHandler);
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String args[]) {
Parser readXml = new Parser();
readXml.getXml();
}
}
What I'm trying to achieve: when the startElement method detects that a tag has already been traversed, it should skip that tag as well as all the other entities inside it. I'm confused about how to implement that part.
Note: the purpose is to read the tags; I don't care about the records in between them. MySet is just an abstraction with methods like contains (whether the set holds the passed data) and so on, nothing more.
Any help would be appreciated. Thanks

Because of the nature of XML, it's not possible to know which tags will appear later in the file, so there is no 'skip the next x bytes' trick.
Just ask for reasonably sized files; maybe the data can be split.
In my opinion, reading an XML file larger than 1 GB is no fun, regardless of the library used.
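Although the parser cannot skip bytes, the handler can ignore the events it receives while it is inside an element it has already seen. The following is a minimal sketch of that idea using a depth counter. It assumes a plain java.util.LinkedHashSet in place of the question's MySet, and the class and method names (DistinctTags, TagCollector, distinctTags) are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class DistinctTags {

    /** Records each element name once and ignores the subtree of repeated elements. */
    static class TagCollector extends DefaultHandler {
        final Set<String> seen = new LinkedHashSet<>();
        private int skipDepth = 0; // greater than 0 while inside an ignored subtree

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (skipDepth > 0) {
                skipDepth++;        // nested element inside an ignored subtree
            } else if (!seen.add(qName)) {
                skipDepth = 1;      // repeated element: start ignoring its subtree
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (skipDepth > 0) {
                skipDepth--;        // leaving an ignored element
            }
        }
    }

    /** Parses the given XML text and returns the element names in first-seen order. */
    public static Set<String> distinctTags(String xml) {
        try {
            TagCollector handler = new TagCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<students><student><name>A</name></student>"
                + "<student><name>B</name><age>20</age></student></students>";
        System.out.println(distinctTags(xml)); // prints [students, student, name]
    }
}
```

Note that `age` is missing from the result: it only occurs inside the second `student`, whose subtree is ignored. If child tags can differ between occurrences of the same parent, dropping the skip logic and simply adding every qName to the set is the safer choice. The parser still reads the whole file in both cases.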

Related

Not able to Catch Element using SAX Parser

I'm reading an XML file using the SAX parser.
Here is my sample XML:
<?xml version="1.0"?><company><Account AccountNumber="100"><staff><firstname>yong</firstname><firstname>jin</firstname></staff></Account></company>
Here is the code
import java.util.Arrays;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXML {
public static void main(String argv[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean bAccount = false;
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
System.out.println("Start Element :" + qName);
if (qName.equalsIgnoreCase("ACCOUNT")) {
bAccount = true;
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
System.out.println("End Element :" + qName);
}
public void characters(char[] ch, int start, int length) throws SAXException {
System.out.println("Im here:" + bAccount);
if (bAccount) {
System.out.println("Account First Name : " + new String(ch, start, length));
bAccount = false;
StringBuilder Account = new StringBuilder();
for (int i = start; i < ch.length - 1; i--) {
if (String.valueOf(ch[i]).equals("<")) {
System.out.println("Account:" +Account);
break;
} else {
Account.append(ch[i]);
}
}
}
}
};
saxParser.parse("C:\\Lenny\\Work\\XML\\Out_SaxParsing_01.xml", handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
As you can see in the XML, the Account tag looks like Account AccountNumber="100". I want to capture the tag itself as well, including that attribute.
To achieve that, in the characters method I'm trying to read the array from right to left, so that I can get Account AccountNumber="100" when the Account event fires.
But I can't get there: the event is generated, but control does not reach the characters method. I thought characters would be called once the Account tag is encountered, but it isn't.
What am I missing or doing wrong?
Any help, please!
AccountNumber="100" is an attribute of the Account element, so inside your startElement handler you can read the attributes parameter to access that value.
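A minimal sketch of that approach, using the sample XML from the question. The class and method names (AccountAttributes, accountNumbers) are invented for the example; Attributes.getValue(String) is the standard SAX call for looking up an attribute by its qualified name.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AccountAttributes {

    /** Returns the AccountNumber attribute of every Account element in the XML text. */
    public static List<String> accountNumbers(String xml) {
        List<String> numbers = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                if (qName.equalsIgnoreCase("Account")) {
                    // getValue returns null when the attribute is absent
                    String number = attributes.getValue("AccountNumber");
                    if (number != null) {
                        numbers.add(number);
                    }
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return numbers;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><company><Account AccountNumber=\"100\"><staff>"
                + "<firstname>yong</firstname><firstname>jin</firstname></staff></Account></company>";
        System.out.println(accountNumbers(xml)); // prints [100]
    }
}
```

The characters method is only called for text content between tags, which is why the attribute never shows up there.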

traversing xml document using SAX parser and printing output in desired format

I am trying to parse the attached XML file.
The XML document is given in attachments 1 and 2 (sample data from the XML file).
To parse this XML, I used a SAX parser. The program is as follows.
package com.dom;
import java.io.File;
import java.io.IOException;
import java.util.Enumeration;
import java.util.Hashtable;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
public class DemoXML {
File file;
SAXParserFactory factory;
SAXParser saxParser;
UserHandler handler;
public void loadXML()
{
file = new File("E:/fifthWorkbenchProjects/XMLUtility/src/input/FIXBOND.xml");
System.out.println(file.exists());
}
public void readXML()
{
factory = SAXParserFactory.newInstance();
try {
saxParser = factory.newSAXParser();
handler = new UserHandler();
try {
saxParser.parse(file,handler);
} catch (IOException e) {
e.printStackTrace();
}
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
}
}
public static void main(String args[])
{
DemoXML ob = new DemoXML();
ob.loadXML();
ob.readXML();
}
}
class UserHandler extends DefaultHandler
{
Hashtable tags;
@Override
public void startDocument()
{
System.out.println("Document started");
tags = new Hashtable();
}
@Override
public void endDocument()
{
System.out.println("Documents ended");
}
@Override
public void startElement(String namespaceURI,String localName,String qname,Attributes atts) throws SAXException
{
// System.out.println("Element started");
// if(qname.equals("Currency"))
System.out.print(qname+"-->");
}
@Override
public void endElement(String uri,String localName, String qname)
{
}
@Override
public void characters(char[] ch, int start, int length)
{
String str = new String(ch,start,length);
System.out.println(str);
System.out.println();
}
}
I get output in following manner.
true
Document started
FIgovcorpagncy-->InstrumentDescription-->InstrumentType-->FI GOVCORPAGNCY
InstrumentSubType-->FIXDBOND
InstrumentName-->QUEENSNR 0% 07/06/2016
InstrumentDescription-->QUEENSNR 0% 07/06/2016
Currency-->GBP
InstrumentStatus-->ACTIVE
AmountOutstanding-->48384375
AmtOutstandingDate-->2012-06-27T00:00:00.000
PrincipalExchange-->N
CountryOfRisk-->GB
InstrumentCompleteness-->50
CapitalRanking-->1
AtIssuance-->IssueDate-->2012-06-27T00:00:00.000
OriginalIssueAmount-->48384375
PrivatePlacementFlag-->Y
MinimumDenomination-->1000
MinimumIncrement-->0.01
and so on. I am able to access all nodes, but notice that for the first element in the tree the complete element path is printed:
FIgovcorpagncy-->InstrumentDescription-->InstrumentType-->FI GOVCORPAGNCY
For the rest of the elements in the tree, only the tag name and its value are printed:
InstrumentSubType-->FIXDBOND
InstrumentName-->QUEENSNR 0% 07/06/2016
InstrumentDescription-->QUEENSNR 0% 07/06/2016
Currency-->GBP
InstrumentStatus-->ACTIVE
AmountOutstanding-->48384375
and so on.
My requirement is to print these elements with their full hierarchy as well, like the first element.
How should I go about it?
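import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
class UserHandler extends DefaultHandler
{
// li_elements holds the names of the currently open elements (the path);
// li_values holds the text chunks seen since the last start or end tag
List<String> li_elements,li_values;
// maps each element path to the values found at that path
LinkedHashMap<List<String>,List<String>> hm;
boolean endElementFlag;
@Override
public void startDocument()
{
System.out.println("Document started");
li_elements = new ArrayList<String>();
li_values=new ArrayList<String>();
// must be initialized here, otherwise endDocument() throws a NullPointerException
hm = new LinkedHashMap<List<String>,List<String>>();
}
@Override
public void endDocument()
{
System.out.println("Documents ended"+hm.size());
for(Map.Entry<List<String>,List<String>> m:hm.entrySet())
{
System.out.println(m.getKey()+""+m.getValue());
}
}
@Override
public void startElement(String namespaceURI,String localName,String qname,Attributes atts) throws SAXException
{
li_elements.add(qname);
//System.out.println("Element Started");
//System.out.println(qname+" added in element list");
}
@Override
public void endElement(String uri,String localName, String qname)
{
if(!li_values.isEmpty())
{
System.out.println("Element address list:-"+li_elements+"and Corresponding Value:-"+li_values);
System.out.println();
// store copies, because both lists are modified right below
hm.put(new ArrayList<String>(li_elements), new ArrayList<String>(li_values));
}
li_elements.remove(li_elements.size()-1);
li_values.clear();
}
@Override
public void characters(char[] ch, int start, int length)
{
String str = new String(ch,start,length);
li_values.add(str);
}
}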
I was expecting something like this; it prints the output in the format I was hoping for.

parsing XML in Java using SAX: value cut in 2 halves

I am trying to read an XML-based file format called mzXML using SAX in Java. It carries partially encoded mass spectrometric data (signals with intensities).
This is what the entry of interest looks like (there is more information around it):
<peaks ... >eJwBgAN//EByACzkZJkHP/NlAceAXLJAckeQ4CIUJz/203q2...</peaks>
A complete file that triggers the error in my case can be downloaded here.
The string in one of these entries holds about 500 compressed and base64-encoded pairs of doubles (signals and intensities). I decompress and decode it to get the values (decoding is not shown in the example below). That all works fine on a small dataset. Now I used a bigger one and ran into a problem that I don't understand:
The method characters(ch,start,length) does not read the complete entry in the line shown above. The length value seems to be too small.
I did not see the problem when I just printed the peaks entry to the console, because there are a lot of letters and I didn't notice that some were missing. But the decompression fails when information is missing. When I run this program repeatedly, it always breaks the same line at the same point without throwing any exception. If I change the mzXML file, e.g. by deleting a scan, it breaks at a different position. I found this out using breakpoints in the characters() method and looking at the content of currentValue.
Here is the code needed to reproduce the problem:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;
import javax.xml.bind.DatatypeConverter;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXMLFile {
public static byte[] decompress(byte[] data) throws IOException, DataFormatException {
Inflater inflater = new Inflater();
inflater.setInput(data);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream(data.length);
byte[] buffer = new byte[data.length*2];
while (!inflater.finished()) {
int count = inflater.inflate(buffer);
outputStream.write(buffer, 0, count);
}
outputStream.close();
byte[] output = outputStream.toByteArray();
return output;
}
public static void main(String args[]) {
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
DefaultHandler handler = new DefaultHandler() {
boolean peaks = false;
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("PEAKS")) {
peaks = true;
}
}
public void endElement(String uri, String localName,
String qName) throws SAXException {
if (peaks) {peaks = false;}
}
public void characters(char ch[], int start, int length) throws SAXException {
if (peaks) {
String currentValue = new String(ch, start, length);
System.out.println(currentValue);
try {
byte[] array = decompress(DatatypeConverter.parseBase64Binary(currentValue));
System.out.println(array[1]);
} catch (IOException | DataFormatException e) {e.printStackTrace();}
peaks = false;
}
}
};
saxParser.parse("file1_zlib.mzxml", handler);
} catch (Exception e) {e.printStackTrace();}
}
}
Is there a safer way to read large xml files? Can you tell me where the error comes from or how to avoid it?
Thanks, Michael
The procedure characters(ch,start,length) does not read the complete entry in the line shown before. The length-value seems to be too small.
That is precisely the way it is designed to work. From the documentation of ContentHandler:
SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks.
Therefore, you should not try calling decompress inside the characters implementation. Instead, you should append the characters that you get to an expandable buffer, and call decompress only when you get the corresponding endElement:
StringBuilder sb = null;
public void startElement(String uri, String localName,String qName,
Attributes attributes) throws SAXException {
if (qName.equalsIgnoreCase("PEAKS")) {
sb = new StringBuilder();
}
}
public void endElement(String uri, String localName, String qName) throws SAXException {
if (sb == null) return;
try {
byte[] array = decompress(DatatypeConverter.parseBase64Binary(sb.toString()));
System.out.println(array[1]);
} catch (IOException | DataFormatException e) {e.printStackTrace();}
sb = null;
}
public void characters(char ch[], int start, int length) throws SAXException {
if (sb == null) return;
String currentValue = new String(ch, start, length);
sb.append(currentValue);
}
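A self-contained round trip that exercises this buffering pattern is sketched below. It is an illustration rather than the poster's code: it swaps in java.util.Base64 for DatatypeConverter (the javax.xml.bind package is no longer bundled with the JDK from Java 11 on), and the class name, helper names, and the tiny `<scan><peaks>` document are assumptions made up for the test.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Base64;
import java.util.Random;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class PeaksRoundTrip {

    /** Buffers the text of each peaks element and decodes it only in endElement. */
    static class PeaksHandler extends DefaultHandler {
        byte[] decoded;             // payload of the last peaks element
        private StringBuilder text; // non-null only while inside a peaks element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (qName.equalsIgnoreCase("peaks")) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length); // may be called several times per element
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            if (text == null || !qName.equalsIgnoreCase("peaks")) {
                return;
            }
            try {
                // The MIME decoder ignores line breaks inside the Base64 text.
                decoded = inflate(Base64.getMimeDecoder().decode(text.toString()));
            } catch (DataFormatException e) {
                throw new SAXException(e);
            }
            text = null;
        }
    }

    static byte[] inflate(byte[] data) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!inflater.finished()) {
            int count = inflater.inflate(buffer);
            if (count == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                throw new DataFormatException("incomplete zlib stream");
            }
            out.write(buffer, 0, count);
        }
        inflater.end();
        return out.toByteArray();
    }

    static byte[] deflate(byte[] data) {
        Deflater deflater = new Deflater();
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Parses the XML text and returns the decoded payload of its peaks element. */
    public static byte[] readPeaks(String xml) {
        try {
            PeaksHandler handler = new PeaksHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.decoded;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Compresses and encodes random bytes, parses them back, and compares the result. */
    public static boolean roundTripWorks() {
        byte[] payload = new byte[200_000];
        new Random(42).nextBytes(payload);
        String xml = "<scan><peaks>" + Base64.getEncoder().encodeToString(deflate(payload))
                + "</peaks></scan>";
        return Arrays.equals(payload, readPeaks(xml));
    }

    public static void main(String[] args) {
        System.out.println("round trip ok: " + roundTripWorks());
    }
}
```

The payload is deliberately large so that a parser which splits character data into chunks still produces the complete string by the time endElement runs.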
Try this: use a LinkedList to store the tag name at every startElement() and remove the last element with pollLast() at every endElement(). Use String.trim() on the data from characters(), so that every time characters() delivers actual data (String.length() != 0) you can associate it with the last element (peekLast()) in the LinkedList.
Then you can choose to append() it, or handle it some other way.
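The LinkedList idea above can be sketched roughly as follows. The class and method names (ElementPaths, paths) and the "-->" separator are choices made for this example; the text is buffered and trimmed in endElement rather than in characters(), because characters() may deliver one value in several chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ElementPaths {

    /** Returns one "a-->b-->value" line for every element that contains non-blank text. */
    public static List<String> paths(String xml) {
        List<String> lines = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private final LinkedList<String> open = new LinkedList<>(); // currently open tags
            private final StringBuilder text = new StringBuilder();    // text since last tag

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes atts) {
                open.addLast(qName);
                text.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                String value = text.toString().trim();
                if (!value.isEmpty()) {
                    // the list still contains the element that is being closed
                    lines.add(String.join("-->", open) + "-->" + value);
                }
                open.pollLast();
                text.setLength(0);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // prints [a-->b-->1, a-->c-->2]
        System.out.println(paths("<a><b>1</b> <c>2</c></a>"));
    }
}
```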

What is best approach for Storing data into MySQL after Parsing XML file using SAX Parser?

I have a student.xml file that I am parsing with a SAX parser, and now I need to store the data in a MySQL database. What approach is recommended?
Code:
package sax;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
public class ReadXML extends DefaultHandler{
public void characters(char[] ch, int start, int length) throws SAXException {
String s =new String(ch, start, length);
if(s.trim().length()>0) {
System.out.println(" Value: "+s);
}
}
public void startDocument() throws SAXException {
System.out.println("Start document");
}
public void endDocument() throws SAXException {
System.out.println("End document");
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
System.out.println("start element : "+name);
}
public void endElement(String uri, String localName, String name) throws SAXException {
System.out.println("end element");
}
public static void main(String[] args) {
ReadXML handler = new ReadXML();
try {
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser saxParser = factory.newSAXParser();
saxParser.parse("student.xml", handler);
} catch (Exception e) {
e.printStackTrace();
}
}
}
I would create a set of tables that represent the data contained in student.xml and then populate them as you parse the data.
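One way this could look is sketched below. Everything about the data shape is an assumption, since the question does not show student.xml: the sketch supposes `<student id="...">` elements with a `<name>` child and a table `student (id, name)`. The handler collects rows while parsing, and a separate method writes them with a JDBC batch insert through any java.sql.Connection (for MySQL, one obtained from the MySQL JDBC driver).

```java
import java.io.StringReader;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StudentLoader {

    /** Collects one {id, name} row per student element (hypothetical document layout). */
    static class StudentHandler extends DefaultHandler {
        final List<String[]> rows = new ArrayList<>();
        private String id;
        private String name;
        private StringBuilder text; // non-null only while inside a name element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("student".equals(qName)) {
                id = atts.getValue("id");
                name = null;
            } else if ("name".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                name = text.toString().trim();
                text = null;
            } else if ("student".equals(qName)) {
                rows.add(new String[] {id, name});
            }
        }
    }

    /** Parses the XML text into rows without touching the database. */
    public static List<String[]> parseStudents(String xml) {
        try {
            StudentHandler handler = new StudentHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Inserts the rows in one batch; the table and column names are assumptions. */
    public static void insertAll(Connection conn, List<String[]> rows) throws SQLException {
        String sql = "INSERT INTO student (id, name) VALUES (?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String[] row : rows) {
                ps.setString(1, row[0]);
                ps.setString(2, row[1]);
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }

    public static void main(String[] args) {
        String xml = "<students><student id=\"1\"><name>Ann</name></student>"
                + "<student id=\"2\"><name>Bob</name></student></students>";
        for (String[] row : parseStudents(xml)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

For very large files, flushing the batch every few thousand rows from inside endElement would keep memory bounded instead of collecting every row first.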
You might be able to store the XML directly in the DB. Many database packages have functionality that allows XML-formatted data to be inserted into the appropriate places in the database. I believe that PostgreSQL and MS SQL Server can do it.
There is existing functionality to do this in MySQL. See here

What is the most memory-efficient way to emit XML from a JAXP SAX ContentHandler?

I have a situation similar to an earlier question about emitting XML. I am analyzing data in a SAX ContentHandler while serializing it to a stream. I am suspicious that the solution in the linked question -- though it is exactly what I am looking for in terms of the API -- is not memory-efficient, since it involves an identity transform with the XSLT processor. I want the memory consumption of the program to be bounded, rather than it growing with the input size.
How can I easily forward the parameters to my ContentHandler methods to a serializer without doing acrobatics to adapt e.g. StAX to SAX, or worse yet, copying the SAX event contents to the output stream?
Edit: here's a minimal example of what I am after. thingIWant should just write to the OutputStream given to it. Like I said, the earlier question has a TransformerHandler that gives me the right API, but it uses the XSLT processor instead of just a simple serialization.
public class MyHandler implements ContentHandler {
ContentHandler thingIWant;
MyHandler(OutputStream outputStream) {
thingIWant = setup(outputStream);
}
public void startDocument() throws SAXException {
// parsing logic
thingIWant.startDocument();
}
public void startElement(String uri, String localName, String qName,
Attributes atts) throws SAXException {
// parsing logic
thingIWant.startElement(uri, localName, qName, atts);
}
public void characters(char[] ch, int start, int length) throws SAXException {
// parsing logic
thingIWant.characters(ch, start, length);
}
// etc...
}
I recently had a similar problem. Here is the class I wrote to get you thingIWant:
import java.io.OutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.*;
public class XMLSerializer implements ContentHandler {
static final private TransformerFactory tf = TransformerFactory.newInstance();
private ContentHandler ch;
public XMLSerializer(OutputStream os) throws SAXException {
try {
final Transformer t = tf.newTransformer();
t.transform(new SAXSource(
new XMLReader() {
public ContentHandler getContentHandler() { return ch; }
public DTDHandler getDTDHandler() { return null; }
public EntityResolver getEntityResolver() { return null; }
public ErrorHandler getErrorHandler() { return null; }
public boolean getFeature(String name) { return false; }
public Object getProperty(String name) { return null; }
public void parse(InputSource input) { }
public void parse(String systemId) { }
public void setContentHandler(ContentHandler handler) { ch = handler; }
public void setDTDHandler(DTDHandler handler) { }
public void setEntityResolver(EntityResolver resolver) { }
public void setErrorHandler(ErrorHandler handler) { }
public void setFeature(String name, boolean value) { }
public void setProperty(String name, Object value) { }
}, new InputSource()),
new StreamResult(os));
}
catch (TransformerException e) {
throw new SAXException(e);
}
if (ch == null)
throw new SAXException("Transformer didn't set ContentHandler");
}
public void setDocumentLocator(Locator locator) {
ch.setDocumentLocator(locator);
}
public void startDocument() throws SAXException {
ch.startDocument();
}
public void endDocument() throws SAXException {
ch.endDocument();
}
public void startPrefixMapping(String prefix, String uri) throws SAXException {
ch.startPrefixMapping(prefix, uri);
}
public void endPrefixMapping(String prefix) throws SAXException {
ch.endPrefixMapping(prefix);
}
public void startElement(String uri, String localName, String qName, Attributes atts)
throws SAXException {
ch.startElement(uri, localName, qName, atts);
}
public void endElement(String uri, String localName, String qName)
throws SAXException {
ch.endElement(uri, localName, qName);
}
public void characters(char[] ch, int start, int length)
throws SAXException {
this.ch.characters(ch, start, length);
}
public void ignorableWhitespace(char[] ch, int start, int length)
throws SAXException {
this.ch.ignorableWhitespace(ch, start, length);
}
public void processingInstruction(String target, String data)
throws SAXException {
ch.processingInstruction(target, data);
}
public void skippedEntity(String name) throws SAXException {
ch.skippedEntity(name);
}
}
Basically, it intercepts the Transformer's call to parse(), and grabs a reference to its internal ContentHandler. After that, the class acts as a proxy to the snagged ContentHandler.
Not very clean, but it works.
First: don't worry about the identity transform; it does not build an in-memory representation of the data.
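For reference, the identity-transform route can be set up in a few lines with the JDK's own TransformerHandler, which is itself a ContentHandler. This is a sketch under the assumption that the default TransformerFactory supports the SAX extensions (the JDK's built-in one does); the class name SerializerSetup and the demo method are invented here, while setup mirrors the placeholder used in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.helpers.AttributesImpl;

public class SerializerSetup {

    /** Returns a ContentHandler that serializes the SAX events it receives to the stream. */
    public static ContentHandler setup(OutputStream out) {
        try {
            // The cast assumes the factory implements the SAX extensions.
            SAXTransformerFactory factory =
                    (SAXTransformerFactory) TransformerFactory.newInstance();
            TransformerHandler handler = factory.newTransformerHandler(); // identity transform
            handler.getTransformer().setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            handler.setResult(new StreamResult(out));
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Feeds a tiny hand-written event sequence through the handler and returns the text. */
    public static String demo() {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            ContentHandler handler = setup(out);
            char[] text = "text".toCharArray();
            handler.startDocument();
            handler.startElement("", "root", "root", new AttributesImpl());
            handler.characters(text, 0, text.length);
            handler.endElement("", "root", "root");
            handler.endDocument();
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A handler like MyHandler from the question can then forward each callback to the object returned by setup.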
To implement your "tee" functionality, you have to create a content handler that listens to the stream of events produced by the parser, and passes them on to the handler provided for you by the transformer. Unfortunately, this is not as easy as it sounds: the parser wants to send events to a DefaultHandler, while the transformer wants to read events from an XMLReader. The former is a class, the latter is an interface. The JDK also provides the class XMLFilterImpl, which implements all of the interfaces of DefaultHandler, but does not extend from it ... that's what you get for incorporating two different projects as your "reference implementations."
So, you need to write a bridge class between the two:
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;
/**
* Uses a decorator ContentHandler to insert a "tee" into a SAX parse/serialize
* stream.
*/
public class SaxTeeExample
{
public static void main(String[] argv)
throws Exception
{
StringReader src = new StringReader("<root><child>text</child></root>");
StringWriter dst = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer();
XMLReader reader = new MyReader(SAXParserFactory.newInstance().newSAXParser());
xform.transform(new SAXSource(reader, new InputSource(src)),
new StreamResult(dst));
System.out.println(dst.toString());
}
private static class MyReader
extends XMLFilterImpl
{
private SAXParser _parser;
public MyReader(SAXParser parser)
{
_parser = parser;
}
@Override
public void parse(InputSource input)
throws SAXException, IOException
{
_parser.parse(input, new XMLFilterBridge(this));
}
// this is an example of a "tee" function
@Override
public void startElement(String uri, String localName, String name, Attributes atts) throws SAXException
{
System.out.println("startElement: " + name);
super.startElement(uri, localName, name, atts);
}
}
private static class XMLFilterBridge
extends DefaultHandler
{
private XMLFilterImpl _filter;
public XMLFilterBridge(XMLFilterImpl myFilter)
{
_filter = myFilter;
}
@Override
public void characters(char[] ch, int start, int length)
throws SAXException
{
_filter.characters(ch, start, length);
}
// override all other methods of DefaultHandler
// ...
}
}
The main method sets up the transformer. The interesting part is that the SAXSource is constructed around MyReader. When the transformer is ready for events, it will call the parse() method of that object, passing it the specified InputSource.
The next part is not obvious: XMLFilterImpl follows the Decorator pattern. The transformer will call various setter methods on this object before starting the transform, passing its own handlers. Any methods that I don't override (eg, startDocument()) will simply call the delegate. As an example override, I'm doing "analysis" (just a println) in startElement(). You'll probably override other ContentHandler methods.
And finally, XMLFilterBridge is the bridge between DefaultHandler and XMLReader; it's also a decorator, and every method simply calls the delegate. I show one override, but you'll have to do them all.
Edit: Includes default JDK version
The most efficient would be an XMLWriter which implements ContentHandler. In a nutshell, you are reading from and writing to IO buffers. There is an XMLWriter in DOM4J, which is used below. You can either subclass XMLWriter or use XMLFilter to do the analysis. I am using XMLFilter in this example. Note that XMLFilter is also a ContentHandler. Here is the complete code.
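import org.dom4j.io.XMLWriter;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;
import org.xml.sax.helpers.XMLReaderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintStream;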
public class XMLPipeline {
public static void main(String[] args) throws Exception {
String inputFile = "build.xml";
PrintStream outputStream = System.out;
new XMLPipeline().pipe(inputFile, outputStream);
}
//dom4j
public void pipe(String inputFile, OutputStream outputStream) throws
SAXException, ParserConfigurationException, IOException {
XMLWriter xwriter = new XMLWriter(outputStream);
XMLReader xreader = XMLReaderFactory.createXMLReader();
XMLAnalyzer analyzer = new XMLAnalyzer(xreader);
analyzer.setContentHandler(xwriter);
analyzer.parse(inputFile);
//do what you want with analyzer
System.err.println(analyzer.elementCount);
}
//default JDK
public void pipeTrax(String inputFile, OutputStream outputStream) throws
SAXException, ParserConfigurationException,
IOException, TransformerException {
StreamResult xwriter = new StreamResult(outputStream);
XMLReader xreader = XMLReaderFactory.createXMLReader();
XMLAnalyzer analyzer = new XMLAnalyzer(xreader);
TransformerFactory stf = SAXTransformerFactory.newInstance();
SAXSource ss = new SAXSource(analyzer, new InputSource(inputFile));
stf.newTransformer().transform(ss, xwriter);
System.out.println(analyzer.elementCount);
}
//This method simply reads from a file, runs it through SAX parser and dumps it
//to dom4j writer
public void dom4jNoop(String inputFile, OutputStream outputStream) throws
IOException, SAXException {
XMLWriter xwriter = new XMLWriter(outputStream);
XMLReader xreader = XMLReaderFactory.createXMLReader();
xreader.setContentHandler(xwriter);
xreader.parse(inputFile);
}
//Simplest way to read a file and write it back to an output stream
public void traxNoop(String inputFile, OutputStream outputStream)
throws TransformerException {
TransformerFactory stf = SAXTransformerFactory.newInstance();
stf.newTransformer().transform(new StreamSource(inputFile),
new StreamResult(outputStream));
}
//this analyzer counts the number of elements in sax stream
public static class XMLAnalyzer extends XMLFilterImpl {
int elementCount = 0;
public XMLAnalyzer(XMLReader xmlReader) {
super(xmlReader);
}
@Override
public void startElement(String uri, String localName, String qName,
Attributes atts) throws SAXException {
super.startElement(uri, localName, qName, atts);
elementCount++;
}
}
}
