How do I determine the line number where invalid XML occurs?

How do I determine the line number where invalid XML occurs? - java

I need to support the situation where a user submits an invalid XML file to me and I report back to them information about the error. Ideally the location of the error (line number and column number) and the nature of the error.
My sample code (see below) works well enough when there is a missing tag or similar error. In that case, I get an approximate location and a useful explanation. However my code fails spectacularly when the XML file contains non-UTF-8 characters. In this case, I get a useless error:
com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 1 of 1-byte UTF-8 sequence.
I cannot find a way to determine the line number where the invalid character might be, nor the character itself. Is there a way to do this?
If, as one comment suggests, it may not be possible as we don't get to the parsing step, is there a way to process the XML file, not with a parser, but simply line-by-line, looking for and reporting non-UTF-8 characters?
Sample code follows. First a basic error handler:
public class XmlErrorHandler implements ErrorHandler {
#Override
public void warning(SAXParseException e) throws SAXException {
show("Warning", e); throw e;
}
#Override
public void error(SAXParseException e) throws SAXException {
show("Error", e); throw e;
}
#Override
public void fatalError(SAXParseException e) throws SAXException {
show("Fatal", e); throw e;
}
private void show(String type, SAXParseException e) {
System.out.println("Line " + e.getLineNumber() + " Column " + e.getColumnNumber());
System.out.println(type + ": " + e.getMessage());
}
}
And a trivial test program:
public class XmlTest {
public static void main(String[] args) {
try {
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(new DefaultHandler());
reader.setErrorHandler(new XmlErrorHandler());
InputSource is = new InputSource(args[0]);
reader.parse(is);
}
catch (SAXException e) { // Useful error case
System.err.println(e);
e.printStackTrace(System.err);
}
catch (Exception e) { // Useless error case arrives here
System.err.println(e);
e.printStackTrace();
}
}
}
Sample XML File (with non-UTF-8 smart quotes from (say) a Word document):
<?xml version="1.0" encoding="UTF-8"?>
<example>
<![CDATA[Text with <91>smart quotes<92>.]]>
</example>

I had some success with identifying where the issue in the XML file is using a couple of approaches.
Adapting the code from my question to use a home-grown ContentHandler with a Locator (see below) demonstrated that the XML was being processed up until the invalid character is encountered. In particular, the line number is being tracked. Preserving the line number allowed it to be retrieved from the ContentHandler when the problematic exception occurs.
At this point, I came up with two possibilities. The first is to re-run the processing with a different encoding on the InputStream, eg. Windows-1252. Parsing completed without error in this instance and I was able to retrieve the characters on the line with the known issue. This allows for a reasonably useful error message to the user, ie. line number and the characters.
My second approach was to adapt the code from the top-rated answer to this SO question. This code allows you to find the first non-UTF-8 character in a byte stream. If you assume that 0x0A (linefeed) represents a new line in the XML (and this seems to work pretty well in practice), then the line number, column number and the invalid characters can be extracted easily enough for a precise error message.
// Modified test program
public class XmlTest {
public static void main(String[] args) {
ErrorFinder errorFinder = new ErrorFinder(0); // Create our own content handler
try {
SAXParserFactory spf = SAXParserFactory.newInstance();
SAXParser parser = spf.newSAXParser();
XMLReader reader = parser.getXMLReader();
reader.setContentHandler(errorFinder); // Use instead of the default handler
reader.setErrorHandler(new XmlErrorHandler());
InputSource is = new InputSource(args[0]);
reader.parse(is);
}
catch (SAXException e) { // Useful error case
System.err.println(e);
e.printStackTrace(System.err);
}
catch (Exception e) { // Useless error case arrives here
System.err.println(e);
e.printStackTrace();
// Option 1: repeat parsing (see above) with a new ErrorFinder initialised thus:
ErrorFinder ef2 = new ErrorFinder(errorFinder.getCurrentLineNumber()); // and
is.setEncoding("Windows-1252");
}
}
}
// Content handler with irrelevant method implementations elided.
public class ErrorFinder implements ContentHandler {
private int lineNumber; // If non-zero, the line number to retrieve characters for.
private int currentLineNumber;
private char[] chars;
private Locator locator;
public ErrorFinder(int lineNumber) {
super();
this.lineNumber = lineNumber;
}
public void setDocumentLocator(Locator locator) {
this.locator = locator;
}
#Override
public void startDocument() throws SAXException {
currentLineNumber = locator.getLineNumber();
}
... // Skip other over-ridden methods as they have same code as startDocument().
#Override
public void characters(char[] ch, int start, int length) throws SAXException {
currentLineNumber = locator.getLineNumber();
if (currentLineNumber == lineNumber) {
char[] c = new char[length];
System.arraycopy(ch, start, c, 0, length);
chars = c;
}
}
public int getCurrentLineNumber() {
return currentLineNumber;
}
public char[] getChars() {
return chars;
}
}

Related

How can I write an XML string in Stax without duplicating namespaces

Part way through creating an XML file with Stax I have some XML in the form of a String. I write this to the Stax output using:
public void addInnerXml(String xml) throws TinyException {
try {
parent.adjustStack(this);
XMLStreamReader2 sr = (XMLStreamReader2) ifact.createXMLStreamReader(new ByteArrayInputStream(xml.getBytes("UTF8")));
for (int type = sr.getEventType(); sr.hasNext(); type = sr.next()) {
switch (type) {
case XMLStreamConstants.COMMENT:
case XMLStreamConstants.DTD:
case XMLStreamConstants.START_DOCUMENT:
case XMLStreamConstants.END_DOCUMENT:
continue;
}
parent.getWriter().copyEventFromReader(sr, false);
}
sr.close();
} catch (XMLStreamException e) {
throw new TinyException("addInnerXml", e);
} catch (UnsupportedEncodingException e) {
throw new TinyException("addInnerXml", e);
}
}
It all works great except namespaces used in the passed in XML, that are defined in the root element, are duplicated again in the inner nodes. Note that the p prefix is repeated
<p:sld xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
<p:cSld> <p:spTree> <p:pic
xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
Is there a way to turn this off?
notes: XMLStreamReader2 implements org.codehaus.stax2.typed.TypedXMLStreamReader and is actually XMLStreamReader.
parent.getWriter() also returns an XMLStreamReader2.
thanks - dave

Jaxb generated corrupted XML(Intermittent) file during marshalling eventhough validated against XSD, but rerunning the job suceeded

We are having some issue with XML file generation using JAXB.
Even though the marshalling is successfully completed the generated XML file is corrupted sometimes(Everyday we generate around 200 xml files each is 150MB in size, so far only 2 files are corrupted in two months).
There were no errors reported in the log file(catalina.out) even though we log each exceptions.
But when we rerun the job, the files are generated successfully.
The application uses the following code segment to marshall the Java object to XML.
public static String marshall(final Marshaller marshaller, final Object obj, final String transformFileName)
throws ServiceException {
File file1 = null;
try {
file1 = new File(transformFileName);
marshaller.marshal(obj, new StreamResult( file1 ) );
return file1.getAbsolutePath();
} catch (XmlMappingException e) {
throw new ServiceException(e);
} catch (IOException e) {
throw new ServiceException(e);
}
}
The following is the bean definition of marshaller
<bean id="bankStatementMarshaller" class="org.springframework.oxm.jaxb.Jaxb2Marshaller">
<property name="contextPath" value="*.bankstatement" />
<property name="schema" value="classpath:Statement.xsd" />
</bean>
The marshaller is used by multiple threads at the same time, we have already checked the spring code that for every call of marshall, spring create new marshaller object. So we ruled out the concurrency issue(As per our understanding).
In addition to this, we are using the NFS file system to create all the XML files.
And when we are going through the JAXB Marshaller implementation, we found the following code segment that ignores the IOException when flushing the streams(The cleanup method below).
private void write(Object obj, XmlOutput out, Runnable postInitAction) throws JAXBException {
try {
if( obj == null )
throw new IllegalArgumentException(Messages.NOT_MARSHALLABLE.format());
if( schema!=null ) {
// send the output to the validator as well
ValidatorHandler validator = schema.newValidatorHandler();
validator.setErrorHandler(new FatalAdapter(serializer));
// work around a bug in JAXP validator in Tiger
XMLFilterImpl f = new XMLFilterImpl() {
#Override
public void startPrefixMapping(String prefix, String uri) throws SAXException {
super.startPrefixMapping(prefix.intern(), uri.intern());
}
};
f.setContentHandler(validator);
out = new ForkXmlOutput( new SAXOutput(f) {
#Override
public void startDocument(XMLSerializer serializer, boolean fragment, int[] nsUriIndex2prefixIndex, NamespaceContextImpl nsContext) throws SAXException, IOException, XMLStreamException {
super.startDocument(serializer, false, nsUriIndex2prefixIndex, nsContext);
}
#Override
public void endDocument(boolean fragment) throws SAXException, IOException, XMLStreamException {
super.endDocument(false);
}
}, out );
}
try {
prewrite(out,isFragment(),postInitAction);
serializer.childAsRoot(obj);
postwrite();
} catch( SAXException e ) {
throw new MarshalException(e);
} catch (IOException e) {
throw new MarshalException(e);
} catch (XMLStreamException e) {
throw new MarshalException(e);
} finally {
serializer.close();
}
} finally {
cleanUp();
}
}
private void cleanUp() {
if(toBeFlushed!=null)
try {
toBeFlushed.flush();
} catch (IOException e) {
// ignore
}
if(toBeClosed!=null)
try {
toBeClosed.close();
} catch (IOException e) {
// ignore
}
toBeFlushed = null;
toBeClosed = null;
}
Can anyone suggest what could be the potential issue?
Something like
Since we use multiple thread, concurrency issue can cause this corrupted file generation?
Since we use NFS, the nonavailability of NFS during the generation can cause the corrupted file generation?
Since we generate lot of big size xml, memory usage during the generation is up to 80%. Can this cause the corrupted xml file generation?
Regards,
Mayuran

How to Parse Big (50 GB) XML Files in Java

Currently im trying to use a SAX Parser but about 3/4 through the file it just completely freezes up, i have tried allocating more memory etc but not getting any improvements.
Is there any way to speed this up? A better method?
Stripped it to bare bones, so i now have the following code and when running in command line it still doesn't go as fast as i would like.
Running it with "java -Xms-4096m -Xmx8192m -jar reader.jar" i get a GC overhead limit exceeded around article 700000
Main:
public class Read {
public static void main(String[] args) {
pages = XMLManager.getPages();
}
}
XMLManager
public class XMLManager {
public static ArrayList<Page> getPages() {
ArrayList<Page> pages = null;
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
File file = new File("..\\enwiki-20140811-pages-articles.xml");
PageHandler pageHandler = new PageHandler();
parser.parse(file, pageHandler);
pages = pageHandler.getPages();
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return pages;
}
}
PageHandler
public class PageHandler extends DefaultHandler{
private ArrayList<Page> pages = new ArrayList<>();
private Page page;
private StringBuilder stringBuilder;
private boolean idSet = false;
public PageHandler(){
super();
}
#Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
stringBuilder = new StringBuilder();
if (qName.equals("page")){
page = new Page();
idSet = false;
} else if (qName.equals("redirect")){
if (page != null){
page.setRedirecting(true);
}
}
}
#Override
public void endElement(String uri, String localName, String qName) throws SAXException {
if (page != null && !page.isRedirecting()){
if (qName.equals("title")){
page.setTitle(stringBuilder.toString());
} else if (qName.equals("id")){
if (!idSet){
page.setId(Integer.parseInt(stringBuilder.toString()));
idSet = true;
}
} else if (qName.equals("text")){
String articleText = stringBuilder.toString();
articleText = articleText.replaceAll("(?s)<ref(.+?)</ref>", " "); //remove references
articleText = articleText.replaceAll("(?s)\\{\\{(.+?)\\}\\}", " "); //remove links underneath headings
articleText = articleText.replaceAll("(?s)==See also==.+", " "); //remove everything after see also
articleText = articleText.replaceAll("\\|", " "); //Separate multiple links
articleText = articleText.replaceAll("\\n", " "); //remove new lines
articleText = articleText.replaceAll("[^a-zA-Z0-9- \\s]", " "); //remove all non alphanumeric except dashes and spaces
articleText = articleText.trim().replaceAll(" +", " "); //convert all multiple spaces to 1 space
Pattern pattern = Pattern.compile("([\\S]+\\s*){1,75}"); //get first 75 words of text
Matcher matcher = pattern.matcher(articleText);
matcher.find();
try {
page.setSummaryText(matcher.group());
} catch (IllegalStateException se){
page.setSummaryText("None");
}
page.setText(articleText);
} else if (qName.equals("page")){
pages.add(page);
page = null;
}
} else {
page = null;
}
}
#Override
public void characters(char[] ch, int start, int length) throws SAXException {
stringBuilder.append(ch,start, length);
}
public ArrayList<Page> getPages() {
return pages;
}
}

Your parsing code is likely working fine, but the volume of data you're loading is probably just too large to hold in memory in that ArrayList.
You need some sort of pipeline to pass the data on to its actual destination without ever
store it all in memory at once.
What I've sometimes done for this sort of situation is similar to the following.
Create an interface for processing a single element:
public interface PageProcessor {
void process(Page page);
}
Supply an implementation of this to the PageHandler through a constructor:
public class Read {
public static void main(String[] args) {
XMLManager.load(new PageProcessor() {
#Override
public void process(Page page) {
// Obviously you want to do something other than just printing,
// but I don't know what that is...
System.out.println(page);
}
}) ;
}
}
public class XMLManager {
public static void load(PageProcessor processor) {
SAXParserFactory factory = SAXParserFactory.newInstance();
try {
SAXParser parser = factory.newSAXParser();
File file = new File("pages-articles.xml");
PageHandler pageHandler = new PageHandler(processor);
parser.parse(file, pageHandler);
} catch (ParserConfigurationException e) {
e.printStackTrace();
} catch (SAXException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Send data to this processor instead of putting it in the list:
public class PageHandler extends DefaultHandler {
private final PageProcessor processor;
private Page page;
private StringBuilder stringBuilder;
private boolean idSet = false;
public PageHandler(PageProcessor processor) {
this.processor = processor;
}
#Override
public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
//Unchanged from your implementation
}
#Override
public void characters(char[] ch, int start, int length) throws SAXException {
//Unchanged from your implementation
}
#Override
public void endElement(String uri, String localName, String qName) throws SAXException {
// Elide code not needing change
} else if (qName.equals("page")){
processor.process(page);
page = null;
}
} else {
page = null;
}
}
}
Of course, you can make your interface handle chunks of multiple records rather than just one and have the PageHandler collect pages locally in a smaller list and periodically send the list off for processing and clear the list.
Or (perhaps better) you could implement the PageProcessor interface as defined here and build in logic there that buffers the data and sends it on for further handling in chunks.

Don Roby's approach is somewhat reminiscent to the approach I followed creating a code generator designed to solve this particular problem (an early version was conceived in 2008). Basically each complexType has its Java POJO equivalent and handlers for the particular type are activated when the context changes to that element. I used this approach for SEPA, transaction banking and for instance discogs (30GB). You can specify what elements you want to process at runtime, declaratively using a propeties file.
XML2J uses mapping of complexTypes to Java POJOs on the one hand, but lets you specify events you want to listen on.
E.g.
account/#process = true
account/accounts/#process = true
account/accounts/#detach = true
The essence is in the third line. The detach makes sure individual accounts are not added to the accounts list. So it won't overflow.
class AccountType {
private List<AccountType> accounts = new ArrayList<>();
public void addAccount(AccountType tAccount) {
accounts.add(tAccount);
}
// etc.
};
In your code you need to implement the process method (by default the code generator generates an empty method:
class AccountsProcessor implements MessageProcessor {
static private Logger logger = LoggerFactory.getLogger(AccountsProcessor.class);
// assuming Spring data persistency here
final String path = new ClassPathResource("spring-config.xml").getPath();
ClassPathXmlApplicationContext context = new ClassPathXmlApplicationContext(path);
AccountsTypeRepo repo = context.getBean(AccountsTypeRepo.class);
#Override
public void process(XMLEvent evt, ComplexDataType data)
throws ProcessorException {
if (evt == XMLEvent.END) {
if( data instanceof AccountType) {
process((AccountType)data);
}
}
}
private void process(AccountType data) {
if (logger.isInfoEnabled()) {
// do some logging
}
repo.save(data);
}
}
Note that XMLEvent.END marks the closing tag of an element. So, when you are processing it, it is complete. If you have to relate it (using a FK) to its parent object in the database, you could process the XMLEvent.BEGIN for the parent, create a placeholder in the database and use its key to store with each of its children. In the final XMLEvent.END you would then update the parent.
Note that the code generator generates everything you need. You just have to implement that method and of course the DB glue code.
There are samples to get you started. The code generator even generates your POM files, so you can immediately after generation build your project.
The default process method is like this:
#Override
public void process(XMLEvent evt, ComplexDataType data)
throws ProcessorException {
/*
* TODO Auto-generated method stub implement your own handling here.
* Use the runtime configuration file to determine which events are to be sent to the processor.
*/
if (evt == XMLEvent.END) {
data.print( ConsoleWriter.out );
}
}
Downloads:
https://github.com/lolkedijkstra/xml2j-core
https://github.com/lolkedijkstra/xml2j-gen
https://sourceforge.net/projects/xml2j/
First mvn clean install the core (it has to be in the local maven repo), then the generator. And don't forget to set up the environment variable XML2J_HOME as per directions in the usermanual.

Java: SAX Parsing a huge XML file

I have a 35 GB XML file (yes, some organizations do that and I have no control over it) that I would like to SAX parse. I found an example here:
http://www.java2s.com/Code/Java/XML/SAXDemo.htm
of how to run a SAX parser and avoid loading everything. However, I get an out of memory error immediatly. Why does this happens and how I can make this code perfectly scalable for any XML file size?
Here my code:
import org.apache.log4j.Logger;
import org.xml.sax.AttributeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
public class XMLSAXTools extends org.xml.sax.helpers.DefaultHandler {
/**
* Logging facility
*/
static Logger logger = Logger.getLogger(XMLSAXTools.class);
private String fileName = "C:/Data/hugefile.xml";
private int counter = 0;
/** The main method sets things up for parsing */
public void test() throws IOException, SAXException,
ParserConfigurationException {
// Create a JAXP "parser factory" for creating SAX parsers
javax.xml.parsers.SAXParserFactory spf = SAXParserFactory.newInstance();
// Configure the parser factory for the type of parsers we require
spf.setValidating(false); // No validation required
// Now use the parser factory to create a SAXParser object
// Note that SAXParser is a JAXP class, not a SAX class
javax.xml.parsers.SAXParser sp = spf.newSAXParser();
// Create a SAX input source for the file argument
org.xml.sax.InputSource input = new InputSource(new FileReader(fileName));
// Give the InputSource an absolute URL for the file, so that
// it can resolve relative URLs in a <!DOCTYPE> declaration, e.g.
input.setSystemId("file://" + new File(fileName).getAbsolutePath());
// Create an instance of this class; it defines all the handler methods
XMLSAXTools handler = new XMLSAXTools();
// Finally, tell the parser to parse the input and notify the handler
sp.parse(input, handler);
// Instead of using the SAXParser.parse() method, which is part of the
// JAXP API, we could also use the SAX1 API directly. Note the
// difference between the JAXP class javax.xml.parsers.SAXParser and
// the SAX1 class org.xml.sax.Parser
//
// org.xml.sax.Parser parser = sp.getParser(); // Get the SAX parser
// parser.setDocumentHandler(handler); // Set main handler
// parser.setErrorHandler(handler); // Set error handler
// parser.parse(input); // Parse!
}
StringBuffer accumulator = new StringBuffer(); // Accumulate parsed text
String servletName; // The name of the servlet
String servletClass; // The class name of the servlet
String servletId; // Value of id attribute of <servlet> tag
// When the parser encounters plain text (not XML elements), it calls
// this method, which accumulates them in a string buffer
public void characters(char[] buffer, int start, int length) {
accumulator.append(buffer, start, length);
}
// Every time the parser encounters the beginning of a new element, it
// calls this method, which resets the string buffer
public void startElement(String name, AttributeList attributes) {
accumulator.setLength(0); // Ready to accumulate new text
if (name.equals("item")) {
logger.info("item tag opened");
counter++;
}
}
// When the parser encounters the end of an element, it calls this method
public void endElement(String name) {
if (name.equals("item")) {
logger.info("item tag closed. Counter: " + counter);
}
}
/** This method is called when warnings occur */
public void warning(SAXParseException exception) {
System.err.println("WARNING: line " + exception.getLineNumber() + ": "
+ exception.getMessage());
}
/** This method is called when errors occur */
public void error(SAXParseException exception) {
System.err.println("ERROR: line " + exception.getLineNumber() + ": "
+ exception.getMessage());
}
/** This method is called when non-recoverable errors occur. */
public void fatalError(SAXParseException exception) throws SAXException {
System.err.println("FATAL: line " + exception.getLineNumber() + ": "
+ exception.getMessage());
throw (exception);
}
public static void main(String[] args){
XMLSAXTools t = new XMLSAXTools();
try {
t.test();
} catch (Exception e){
logger.error("Exception in XMLSAXTools: " + e.getMessage());
e.printStackTrace();
}
}
}

You are filling up your accumulator without ever emptying it - this is unlikely to be what you want.
Just using SAX is not sufficient to ensure you do not run out of memory - you still need to implement the code that finds, selects and processes what you do need from the xml and discards the rest.
Here's a fairly simple parser that is designed to be run in a separate thread. It communicates with the calling thread via n ArrayBlockingQueue<String> queue which is defined in an enclosing class.
The huge data files I have to deal with are essentially <Batch> ... a few thousand items ... </Batch>. This parser pulls each item out and presents them one-at-a-time through the blocking queue. One day I will turn them into XOM Elements but atm it uses Strings.
Notice how it clears down its temporary data fields when enque is called to ensure we don't run out of memory:
private class Parser extends DefaultHandler {
// Track the depth of the xml - whenever we hit level 1 we add the accumulated xml to the queue.
private int level = 0;
// The current xml fragment.
private final StringBuilder xml = new StringBuilder();
// We've had a start tag but no data yet.
private boolean tagWithNoData = false;
/*
* Called when the starting of the Element is reached. For Example if we have Tag
* called <Title> ... </Title>, then this method is called when <Title> tag is
* Encountered while parsing the Current XML File. The AttributeList Parameter has
* the list of all Attributes declared for the Current Element in the XML File.
*/
#Override
public void startElement(final String uri, final String localName, final String name, final Attributes atrbts) throws SAXException {
checkForAbort();
// Have we got back to level 1 yet?
if (level == 1) {
// Emit any built ones.
try {
enqueue();
} catch (InterruptedException ex) {
Throwables.rethrow(ex);
}
}
// Add it on.
if (level > 0) {
// The name.
xml.append("<").append(name);
// The attributes.
for (int i = 0; i < atrbts.getLength(); i++) {
final String att = atrbts.getValue(i);
xml.append(" ").append(atrbts.getQName(i)).append("=\"").append(XML.to(att)).append("\"");
}
// Done.
xml.append(">");
// Remember we've not had any data yet.
tagWithNoData = true;
}
// Next element is a sub-element.
level += 1;
}
/*
* Called when the Ending of the current Element is reached. For example in the
* above explanation, this method is called when </Title> tag is reached
*/
#Override
public void endElement(final String uri, final String localName, final String name) throws SAXException {
checkForAbort();
if (level > 1) {
if (tagWithNoData) {
// No data. Make the > into a />
xml.insert(xml.length() - 1, "/");
// I've closed this one but the enclosing one has data (i.e. this one).
tagWithNoData = false;
} else {
// Had data, finish properly.
xml.append("</").append(name).append(">");
}
}
// Done with that level.
level -= 1;
if (level == 1) {
// Finished and at level 1.
try {
// Enqueue the results.
enqueue();
} catch (InterruptedException ex) {
Throwables.rethrow(ex);
}
}
}
/*
* Called when the data part is encountered.
*/
#Override
public void characters(final char buf[], final int offset, final int len) throws SAXException {
checkForAbort();
// I want it trimmed.
final String chs = new String(buf, offset, len).trim();
if (chs.length() > 0) {
// Grab that data.
xml.append(XML.to(chs));
tagWithNoData = false;
}
}
/*
* Called when the Parser starts parsing the Current XML File.
*/
#Override
public void startDocument() throws SAXException {
checkForAbort();
tagWithNoData = false;
}
/*
* Called when the Parser Completes parsing the Current XML File.
*/
#Override
public void endDocument() throws SAXException {
checkForAbort();
try {
// Enqueue the results.
enqueue();
} catch (InterruptedException ex) {
Throwables.rethrow(ex);
}
}
private void enqueue() throws InterruptedException, SAXException {
// We may have been closed while blocking on the queue.
checkForAbort();
final String x = xml.toString().trim();
if (x.length() > 0) {
// Add it to the queue.
queue.put(x);
// Clear out.
xml.setLength(0);
tagWithNoData = false;
}
// We may have been closed while blocking on the queue.
checkForAbort();
}
private void checkForAbort() throws XMLInnerDocumentIteratorAbortedException {
if (iteratorFinished) {
LOGGER.debug("Aborting!!!");
throw new XMLInnerDocumentIterator.XMLInnerDocumentIteratorAbortedException("Aborted!");
}
}
}
}

Why is the main method not covered?

main method:
public static void main(String[] args) throws Exception
{
if (args.length != EXPECTED_NUMBER_OF_ARGUMENTS)
{
System.err.println("Usage - java XFRCompiler ConfigXML PackageXML XFR");
}
String configXML = args[0];
String packageXML = args[1];
String xfr = args[2];
AutoConfigCompiler compiler = new AutoConfigCompiler();
compiler.setConfigDocument(loadDocument(configXML));
compiler.setPackageInfoDoc(loadDocument(packageXML));
// compiler.setVisiblityDoc(loadDocument("VisibilityFilter.xml"));
compiler.compileModel(xfr);
}
private static Document loadDocument(String fileName) throws Exception
{
TXDOMParser parser = (TXDOMParser) ParserFactory.makeParser(TXDOMParser.class.getName());
InputSource source = new InputSource(new FileInputStream(fileName));
parser.parse(source);
return parser.getDocument();
}
testcase:
#Test
public void testCompileModel() throws Exception
{
// construct parameters
URL configFile = Thread.currentThread().getContextClassLoader().getResource("Ford_2008_Mustang_Config.xml");
URL packageFile = Thread.currentThread().getContextClassLoader().getResource("Ford_2008_Mustang_Package.xml");
File tmpFile = new File("Ford_2008_Mustang_tmp.xfr");
if(!tmpFile.exists()) {
tmpFile.createNewFile();
}
String[] args = new String[]{configFile.getPath(),packageFile.getPath(),tmpFile.getPath()};
try {
// test main method
XFRCompiler.main(args);
} catch (Exception e) {
assertTrue(true);
}
try {
// test args length is less than 3
XFRCompiler.main(new String[]{"",""});
} catch (Exception e) {
//ignore
}
tmpFile.delete();
}
Coverage outputs displayed as the lines from String configXML = args[0]; in main method
are not covered.

assertTrue(true); is a pointless no-op
Remove the try/catch around the call to XFRCompiler.main(args);, since all it does is swallow excpetions and make debugging harder; most likely you will then see an exception that tells you where the problem is.
There should be a call to fail() after the call to XFRCompiler.main(new String[]{"",""}); since you expect it to throw an exception
Put the two calls in separate test methods.

I'm worried about all those assertTrue(true). If there can't be an exception, then the assert is not necessary. If there is an unexpected exception, then this code will swallow it and you will get the behavior you see right now.
Then, if you expect an exception, you should code like this:
try {
... code that will throw an exception ...
fail("No exception was thrown");
} catch (SpecficTypeOfException e) {
assertEquals("message", e.getMessage());
}
That way, wrong types of exception and the exception message will be checked.
PS: Don't post questions with "urgent". We already help as fast as we can.

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How do I determine the line number where invalid XML occurs? - java

Related

How can I write an XML string in Stax without duplicating namespaces

Jaxb generated corrupted XML(Intermittent) file during marshalling eventhough validated against XSD, but rerunning the job suceeded

How to Parse Big (50 GB) XML Files in Java

Java: SAX Parsing a huge XML file

Why is the main method not covered?

Categories

Resources