XSD validation for multiple entries in a XML - java

I am using the Saxon parser for XSD validation in Java. Validation works fine for an XML document with a single element, and it also works when there are multiple elements, but we are unable to identify which element failed and which passed. To be more precise, we have an XSD to validate a simple XML file with a root element and the other elements within it: <person> <employment></employment></person>. The <employment> element is repeatable. I have an XML document like the one below, with errors.
<person>
<employment>correct elements inside</employment>
<employment>wrong elements inside </employment>
</person>
I am performing an XSD validation on the above XML. It fails overall because of the error in the second <employment> entry. But what I need is to identify that the first employment passed and the second one failed.
How can I achieve this using SAXON?

You haven't said how you are running the validation: from the command line? From the JAXP validation API? From XSLT or XQuery? From the s9api API?
If you are running from the command line then all validation errors are output to System.err with location information about where they were found.
If you are running from an application (via any of the APIs) then errors are notified to an ErrorListener. The default ErrorListener behaves like the command line - it writes details to System.err. If it's a GUI application then you probably won't see this unless you redirect it to some window. Given what you say about your requirements, you would probably be advised to write your own ErrorListener that formats the output in the way you want it. Some of the APIs provide an option to supply a List object into which objects are written representing the validation errors found.
In the next release (9.7) we will have an option to produce all the validation errors in an XML report format.
MORE INFORMATION BASED ON YOUR RESPONSE:
I would recommend using SchemaValidator.setErrorListener() to set your own ErrorListener. The exception passed to the ErrorListener will typically be an instance of net.sf.saxon.type.ValidationException.
If you are validating an in-memory tree, then ValidationException.getNode() on this exception object should give you the node that's invalid, or perhaps the node where the invalidity was detected, which is not quite the same thing.
If you are validating a stream of events, e.g. a SAXSource, then ValidationException.getPath() should give you a path to the node in the form of a string, while ValidationException.getAbsolutePath() should give you a path in structured form.
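The idea of collecting errors in your own listener instead of letting them go to System.err can be sketched as follows. This is only an illustration under stated assumptions: it uses the JDK's built-in JAXP validation API with an org.xml.sax.ErrorHandler rather than Saxon's SchemaValidator and ErrorListener, and the schema (an integer <years> child inside each <employment>) and the class name CollectingValidation are made up for the example. Each collected message carries the line number where the problem was detected, which is enough to tell which <employment> entry failed when the entries are on separate lines.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXParseException;

class CollectingValidation {

    // Hypothetical schema: <person> holds repeatable <employment> elements,
    // each of which must contain an integer <years> child.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='person'><xs:complexType><xs:sequence>"
      + "<xs:element name='employment' maxOccurs='unbounded'>"
      + "<xs:complexType><xs:sequence>"
      + "<xs:element name='years' type='xs:int'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    // Validates the document and returns one entry per reported problem,
    // instead of stopping at the first one.
    static List<String> validate(String xml) throws Exception {
        SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = factory.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        List<String> problems = new ArrayList<>();
        validator.setErrorHandler(new ErrorHandler() {
            @Override public void warning(SAXParseException e) {
                problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override public void error(SAXParseException e) {
                // Not rethrowing lets validation continue with the rest of the document.
                problems.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
            @Override public void fatalError(SAXParseException e) throws SAXParseException {
                throw e; // the document is not well-formed; nothing sensible to continue with
            }
        });
        validator.validate(new StreamSource(new StringReader(xml)));
        return problems;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<person>\n"
                   + "<employment><years>3</years></employment>\n"
                   + "<employment><years>abc</years></employment>\n"
                   + "</person>";
        validate(xml).forEach(System.out::println);
    }
}
```

With Saxon the same pattern would apply to the listener set through SchemaValidator.setErrorListener(), using the location or path information from the ValidationException in place of the line number.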

Related

JAXB skip invalid elements and continue processing

I have an application that needs to process XML files in the following format:
<records>
<record/>
<record/>
<record/>
...
</records>
I am using JAXB to parse these files. However, I am trying to prepare my application for the inevitable case where it is unable to parse one of the records due to some invalid data (for example, a character where an int should be).
The problem is that if JAXB is unable to parse an individual record, it halts processing on the entire file. This is not good - I need it to only skip the problematic record, report it, and move on. However I can't discover any way to do this. The only thing I've found is the ValidationEventHandler which lets me return true telling JAXB to continue processing the file in the event of an error, but the problem with that is that it doesn't actually SKIP the problematic record - it tries to parse it even though it's known to be invalid, which causes NumberFormatException and halts processing.
I found this answer How to skip a single jaxb element validation contained in a jaxb collection in Spring Batch Job? but it doesn't actually answer the question, just suggests to use ValidationEventHandler even though that functionality is not sufficient.
How can I skip the invalid records and continue processing? How can I solve this problem?
Typically I wouldn't use JAXB if I knew that the input data would likely contain errors and I needed to recover gracefully... StAX might be better suited. But JAXB does have a "catch all" you can use: https://docs.oracle.com/javase/7/docs/api/javax/xml/bind/annotation/XmlAnyElement.html
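To illustrate the StAX suggestion, here is a minimal sketch that walks the <record> elements one at a time and wraps the handling of each record in its own try/catch, so a bad record is reported and skipped while the rest of the file is still processed. The record content is an assumption made for the example (each record holds a single integer as text), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class SkippingRecordReader {

    // Returns the values of all records that could be parsed; descriptions of
    // the records that could not are appended to 'skipped'.
    static List<Integer> readValid(String xml, List<String> skipped) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Integer> values = new ArrayList<>();
        int index = 0;
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "record".equals(reader.getLocalName())) {
                index++;
                // Consumes the text up to the matching end tag of this record.
                String text = reader.getElementText().trim();
                try {
                    values.add(Integer.parseInt(text));
                } catch (NumberFormatException e) {
                    // Report the problematic record and move on to the next one.
                    skipped.add("record " + index + ": '" + text + "'");
                }
            }
        }
        reader.close();
        return values;
    }

    public static void main(String[] args) throws Exception {
        List<String> skipped = new ArrayList<>();
        String xml = "<records><record>1</record><record>x</record><record>3</record></records>";
        System.out.println(readValid(xml, skipped));
        System.out.println(skipped);
    }
}
```

The design point is that the unit of failure becomes a single record rather than the whole document; real records with nested content would need more parsing logic inside the loop.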

JAXB - getting input Element after unmarshalling

I would like to work with unmarshalled document for comfort reasons, but I also need to have access to the original source XML Elements (to access empty text nodes, because some cryptography is involved). Is there a way to achieve this with JAXB2 (preferably using a maven plugin) or do I need to unmarshall the contents manually?
JAXB isn't going to differentiate between empty and missing nodes; the unmarshalled object is going to have nulls in either case. So if there is a semantic difference between the two, I think you'll have to parse manually or use a different parser that can give you the insight you need (I'm not sure, but a SAX parser might).
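As an illustration of the "parse manually" route, the sketch below uses the JDK's DOM parser (one possible choice, not something the answer prescribes) to tell a missing element apart from one that is present but empty. The class name, method name, and sample documents are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EmptyVersusMissing {

    // Returns "missing", "empty", or "text" for the first element with the given name.
    static String classify(String xml, String elementName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList matches = doc.getElementsByTagName(elementName);
        if (matches.getLength() == 0) {
            return "missing"; // the element does not occur in the document at all
        }
        // The element exists; check whether it carries any text content.
        return matches.item(0).getTextContent().isEmpty() ? "empty" : "text";
    }

    public static void main(String[] args) throws Exception {
        System.out.println(classify("<doc><a/></doc>", "a"));
        System.out.println(classify("<doc/>", "a"));
        System.out.println(classify("<doc><a>x</a></doc>", "a"));
    }
}
```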

How to load XML tags order from XSD with Java?

I have a question:
I. QUESTION
Is there a way/Java-based library by which I can retrieve the order of the XML elements by reading/loading its XSD (in advance)?
II. BACKGROUND
The app I am working on should generate various types of XML documents (feeds), each of which is based on a given schema (XSD).
The point is that I can't use the standard approach for serialization - JAXB, as I should generate/stream the XML gradually via Apache Abdera. Thus, I should "serialize" my Java domain objects into the XML (feed), creating from the information in them values for the different tags and writing these tags into the output stream one by one, following the order defined by the XSD.
III. NEEDED FUNCTIONALITY
At the moment, I have a serializer which is converting my domain objects into xml in the way described above, but soon it will need to support several types of schemas and it won't be easily maintainable (not to mention that it'd be very error prone).
IV. POTENTIAL SOLUTION
So, I want to make an XML schema-agnostic serializer and delegate the creation of the values for the different XML elements to some dedicated builders or factories. The order in which they are invoked, though, should be defined by the order of the XML elements in the schema.
And here comes the question I started with:
Is there a way/Java-based library by which I can retrieve the order of the XML elements by reading/loading its XSD (in advance)?
V. IF THERE IS SUCH LIBRARY...
Schematically, what the serializer needs to do is:
1. load the types of the XML elements (tags), with their restrictions of course, in the order they are defined in the XSD;
2. iterate over the loaded types of the XML elements in the order they were loaded; and
3. for each XML element type recognized, delegate building the content of the corresponding element to an associated builder or factory;
4. having the value built by the builder/factory, the serializer just wraps it in the element's tag and flushes it to the output stream.
Thanks in advance!
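One low-tech way to approach the first step is to treat the XSD as an ordinary XML document and read the element declarations in document order. The sketch below does only that, using the JDK's DOM parser: it returns the names of the xs:element children of the first xs:sequence it finds. It is an illustration with clear limits, not a general solution: it ignores xs:choice, xs:all, groups, type inheritance, and nested or imported schemas, and the class name and sample schema are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SchemaElementOrder {

    // Returns the names (or refs) of the xs:element children of the first
    // xs:sequence in the schema, in the order they are declared.
    static List<String> childOrder(String xsd) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed to match the XSD namespace
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xsd)));
        String xs = XMLConstants.W3C_XML_SCHEMA_NS_URI;
        NodeList sequences = doc.getElementsByTagNameNS(xs, "sequence");
        List<String> names = new ArrayList<>();
        if (sequences.getLength() == 0) {
            return names;
        }
        for (Node child = sequences.item(0).getFirstChild();
                child != null; child = child.getNextSibling()) {
            if (child instanceof Element
                    && xs.equals(child.getNamespaceURI())
                    && "element".equals(child.getLocalName())) {
                Element e = (Element) child;
                names.add(e.hasAttribute("name") ? e.getAttribute("name") : e.getAttribute("ref"));
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xsd =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='feed'><xs:complexType><xs:sequence>"
          + "<xs:element name='title' type='xs:string'/>"
          + "<xs:element name='updated' type='xs:dateTime'/>"
          + "<xs:element name='entry' type='xs:string' maxOccurs='unbounded'/>"
          + "</xs:sequence></xs:complexType></xs:element>"
          + "</xs:schema>";
        System.out.println(childOrder(xsd));
    }
}
```

A serializer could iterate over the returned names and look up a builder for each one, for example in a Map keyed by element name.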

Schema validation, how to display user friendly validation messages?

Is there a way to avoid or set up a schema to display better user friendly messages?
I am parsing the string and using reg ex to interpret them, but there might be a better way.
Ex.
"cvc-complex-type.2.4.b: The content of element 'node' is not complete. One of '{\"\":offer,\"\":links}' is expected."
Instead I want:
"The element 'node' is not complete. The child elements 'offer' and 'links' are expected."
Again, I've solved the problem by creating an extra layer that validates it. But when I have to use an XML tool with schema validation, the cryptic messages are the ones displayed.
Thanks
Not that I know of. You will probably have to create some custom code to adapt your error messages. One way might be to define a set of regular expressions that can pull out the relevant pieces of the validator's error messages and then plug them back into your own error messages. Something like this comes to mind (not optimized, doesn't handle general case, etc. but I think you'll get the idea):
import java.util.regex.Pattern;

String uglyMessage = "cvc-complex-type.2.4.b: The content of element 'node' is not complete. One of '{\"\":offer,\"\":links}' is expected.";
// Captures the element name ($1) and the two expected child names ($2, $3).
String findRegex = "cvc-complex-type\\.2\\.4\\.b: The content of element '(\\w+)' is not complete\\. One of '\\{\"\":(\\w+),\"\":(\\w+)}' is expected\\.";
String replacement = "The element '$1' is not complete. The child elements '$2' and '$3' are expected.";
String userFriendlyMessage = Pattern.compile(findRegex).matcher(uglyMessage).replaceAll(replacement);
System.out.println(userFriendlyMessage);
// OUTPUT:
// The element 'node' is not complete. The child elements 'offer' and 'links' are expected.
I suspect those validator error messages are vendor-specific so if you don't have control over the XML validator in your deployed app, this may not work for you.
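The snippet above only matches messages that list exactly two expected elements. A slightly more general variant, still tied to this one message code and to the message format shown in the question, could capture the whole brace-delimited list and pull the names out in a second pass. This is a sketch only; the class and method names are hypothetical, and other validators or message codes would need their own patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FriendlyMessages {

    // Group 1: the incomplete element; group 2: the raw list between the braces.
    private static final Pattern INCOMPLETE = Pattern.compile(
        "cvc-complex-type\\.2\\.4\\.b: The content of element '([^']+)' is not complete\\. "
      + "One of '\\{([^}]*)\\}' is expected\\.");

    // Each expected name appears as "namespace":localName inside the braces.
    private static final Pattern EXPECTED_NAME = Pattern.compile("\"[^\"]*\":([^,\\s}]+)");

    // Rewrites a recognized message; returns unrecognized messages unchanged.
    static String toFriendly(String raw) {
        Matcher m = INCOMPLETE.matcher(raw);
        if (!m.matches()) {
            return raw;
        }
        List<String> names = new ArrayList<>();
        Matcher n = EXPECTED_NAME.matcher(m.group(2));
        while (n.find()) {
            names.add(n.group(1));
        }
        return "The element '" + m.group(1) + "' is not complete. Expected child elements: "
                + String.join(", ", names) + ".";
    }

    public static void main(String[] args) {
        System.out.println(toFriendly(
            "cvc-complex-type.2.4.b: The content of element 'node' is not complete. "
          + "One of '{\"\":offer,\"\":links}' is expected."));
        // prints: The element 'node' is not complete. Expected child elements: offer, links.
    }
}
```

Returning the original text for unrecognized messages keeps the adapter safe to apply to every validator message, at the cost of some messages staying cryptic.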
We're using Schematron to display user-friendly error messages when the XML a user sends us is wrong. Our current implementation is a bit simplistic, notably in the following points:
The error message text is hardcoded into the Schematron rules
For each new XML type (i.e. new XSD schema) the Schematron rules have to be added manually
This, however, can easily be fixed by the following rework:
Schematron rules should contain unique error message codes, while the selection of the actual message text (including I18n issues) should be done outside the scope of the validation framework
Basic rules can be generated from the XSD schema using an XSD to Schematron converter (available at http://www.schematron.com/resources.html)
I asked a similar question a while ago.
My conclusion was there is no provided way of mapping the errors, and that it's something you need to do yourself.
Hope someone out there can do better!

What does the org.apache.xmlbeans.XmlException with a message of "Unexpected element: CDATA" mean?

I'm trying to parse and load an XML document, however I'm getting this exception when I call the parse method on the class that extends XmlObject. Unfortunately, it gives me no ideas of what element is unexpected, which is my problem.
I am not able to share the code for this, but I can try to provide more information if necessary.
Since you can't share code or input data, you may consider the following approach. It's a very common bisection approach to diagnostics, I'm afraid, and indeed you may already have started it...
Try to reduce the size of the input XML by removing parts of it, ensuring that the underlying XML document remains well formed and possibly valid (if validity is required in your parser's setup). If you maintain validity, this may require altering [a copy of] the schema (DTD or other), as mandatory elements might be removed during the cut-and-try approach... BTW, the error message seems to hint more at a validation issue than a basic well-formedness issue.
Unless one has a particular hunch as to the area that triggers the parser's complaint, we typically remove (or re-add, when things start working) about half of what was previously cut or re-added.
You may also start by trying a mostly empty file, to check that the parser works at all... Here again the idea is to "divide and conquer": is the issue in the XML input or in the parser? (Remember that there could be two issues, one in the input and one in the parser, and that such issues could even be unrelated...)
Sorry to belabor basic diagnostics techniques which you may well be fluent with...
You should check the argument you are passing to the parse() method, and whether you are passing a String, a File, or an InputStream, so that the matching overload (File/InputStream/String) is used.
The exception is caused by the length of the XML file. If you add or remove one character from the file, the parser will succeed.
The problem occurs within the 3rd party PiccoloLexer library that XMLBeans relies on. It has been fixed in revision 959082 but has not been applied to xbean 2.5 jar.
XMLBeans - Problem with XML files if length is exactly 8193bytes
Issue reported on XMLBean Jira
