Schema validation, how to display user friendly validation messages? - java

Is there a way to set up a schema (or the validator) so that it displays more user-friendly messages?
I am parsing the message strings and using regular expressions to interpret them, but there might be a better way.
Example:
"cvc-complex-type.2.4.b: The content of element 'node' is not complete. One of '{\"\":offer,\"\":links}' is expected."
Instead I want:
"The element 'node' is not complete. The child elements 'offer' and 'links' are expected."
Again, I've solved the problem by creating an extra layer that validates the document. But when I have to use an XML tool with schema validation, the cryptic messages are the ones displayed.
Thanks

Not that I know of. You will probably have to create some custom code to adapt your error messages. One way might be to define a set of regular expressions that can pull out the relevant pieces of the validator's error messages and then plug them back into your own error messages. Something like this comes to mind (not optimized, doesn't handle general case, etc. but I think you'll get the idea):
// Requires: import java.util.regex.Pattern;
String uglyMessage = "cvc-complex-type.2.4.b: The content of element 'node' is not complete. One of '{\"\":offer,\"\":links}' is expected.";
// Capture the element name ($1) and the two expected child names ($2, $3).
String findRegex = "cvc-complex-type\\.2\\.4\\.b: The content of element '(\\w+)' is not complete\\. One of '\\{\"\":(\\w+),\"\":(\\w+)\\}' is expected\\.";
String replaceRegex = "The element '$1' is not complete. The child elements '$2' and '$3' are expected.";
String userFriendlyMessage = Pattern.compile(findRegex).matcher(uglyMessage).replaceAll(replaceRegex);
System.out.println(userFriendlyMessage);
// OUTPUT:
// The element 'node' is not complete. The child elements 'offer' and 'links' are expected.
I suspect those validator error messages are vendor-specific so if you don't have control over the XML validator in your deployed app, this may not work for you.
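The snippet above only handles exactly two expected children. A slightly more general sketch might look like the following, assuming the validator keeps the same "cvc-complex-type.2.4.b" wording and the same '{"namespace":name,...}' list format seen in the question. The class and method names (FriendlyMessages, rewrite) are made up for illustration, and any message that does not match is passed through unchanged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FriendlyMessages {
    // Assumed message shape, taken from the example in the question (vendor-specific).
    private static final Pattern INCOMPLETE = Pattern.compile(
        "cvc-complex-type\\.2\\.4\\.b: The content of element '([^']+)' is not complete\\. "
        + "One of '\\{([^}]*)\\}' is expected\\.");
    // One entry of the expected list, e.g. "":offer (quoted namespace, colon, local name).
    private static final Pattern EXPECTED = Pattern.compile("\"[^\"]*\":([^,\\s]+)");

    static String rewrite(String message) {
        Matcher m = INCOMPLETE.matcher(message);
        if (!m.matches()) {
            return message; // unknown message: leave it as it is
        }
        List<String> names = new ArrayList<>();
        Matcher e = EXPECTED.matcher(m.group(2));
        while (e.find()) {
            names.add("'" + e.group(1) + "'");
        }
        if (names.isEmpty()) {
            return message; // list format not recognised
        }
        boolean single = names.size() == 1;
        String list = single
            ? names.get(0)
            : String.join(", ", names.subList(0, names.size() - 1))
                + " and " + names.get(names.size() - 1);
        return "The element '" + m.group(1) + "' is not complete. The child element"
            + (single ? " " : "s ") + list + (single ? " is" : " are") + " expected.";
    }
}
```

Calling FriendlyMessages.rewrite(...) on the message from the question should give the wording asked for; other error codes would each need their own pattern.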

We use Schematron to display user-friendly error messages when the XML a user sends us is invalid. Our current implementation is a bit simplistic, notably in the following points:
The error message text is hardcoded into the Schematron rules.
For each new XML type (i.e. each new XSD schema), Schematron rules have to be added manually.
This, however, can easily be fixed by the following rework:
Schematron rules should contain unique error message codes, while the selection of the actual message text (including I18n issues) should be done outside the validation framework.
Basic rules can be generated from the XSD schema using an XSD to Schematron converter (available at http://www.schematron.com/resources.html).
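The "codes in the rules, text outside the validator" idea could be sketched roughly like this. The error codes, message texts and class name are invented for illustration; a real setup would more likely load the templates from per-locale resource bundles than from a hardcoded map.

```java
import java.text.MessageFormat;
import java.util.Locale;
import java.util.Map;

final class ErrorCodeMessages {
    // Hypothetical codes and English texts; the Schematron rules would emit only the codes.
    private static final Map<String, String> ENGLISH = Map.of(
        "ORDER-001", "The order {0} has no line items.",
        "ORDER-002", "The delivery date {0} is before the order date {1}.");

    // Turns a code reported by the validator into display text, filling in the arguments.
    static String resolve(String code, Object... args) {
        String template = ENGLISH.get(code);
        if (template == null) {
            return "Validation failed (" + code + ")."; // fallback for unmapped codes
        }
        return new MessageFormat(template, Locale.ENGLISH).format(args);
    }
}
```

With this split, adding a language means adding another map (or bundle) and leaves the validation rules untouched.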

I asked a similar question a while ago.
My conclusion was there is no provided way of mapping the errors, and that it's something you need to do yourself.
Hope someone out there can do better!

Related

XSD validation for multiple entries in a XML

I am using the SAXON parser for XSD validation in Java. If we use an XML document with a single element it works fine, and it also works fine with multiple elements. But we are unable to identify which element failed and which passed. To be more clear, we have an XSD to validate a simple XML file with a root element and the other elements within it: <person> <employment></employment></person>. The <employment> element is repeatable. I have an XML document like the one below, with errors.
<person>
<employment>correct elements inside</employment>
<employment>wrong elements inside </employment>
</person>
I am performing a XSD validation for above xml. It fails overall due to error in second <employment> entry. But what I need is to identify that first employment passed and second one failed.
How can I achieve this using SAXON?
You haven't said how you are running the validation: From the command line? from the JAXP validation API? From XSLT or XQuery? From the s9api API?
If you are running from the command line then all validation errors are output to System.err with location information about where they were found.
If you are running from an application (via any of the APIs) then errors are notified to an ErrorListener. The default ErrorListener behaves like the command line - it writes details to System.err. If it's a GUI application then you probably won't see this unless you redirect it to some window. Given what you say about your requirements, you would probably be advised to write your own ErrorListener that formats the output in the way you want it. Some of the APIs provide an option to supply a List object into which objects are written representing the validation errors found.
In the next release (9.7) we will have an option to produce all the validation errors in an XML report format.
MORE INFORMATION BASED ON YOUR RESPONSE:
I would recommend using SchemaValidator.setErrorListener() to set your own ErrorListener. The exception passed to the ErrorListener will typically be an instance of net.sf.saxon.type.ValidationException.
If you are validating an in-memory tree, then ValidationException.getNode() on this exception object should give you the node that's invalid, or perhaps the node where the invalidity was detected, which is not quite the same thing.
If you are validating a stream of events, e.g. a SAXSource, then ValidationException.getPath() should give you a path to the node in the form of a string, while ValidationException.getAbsolutePath() should give you a path in structured form.
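As a rough illustration of the "write your own ErrorListener" advice, here is a minimal sketch that collects every reported problem in a list instead of printing to System.err. It uses only the standard javax.xml.transform.ErrorListener interface; the class name is made up, and the Saxon-specific calls mentioned above (such as ValidationException.getPath()) are deliberately left out so that the sketch does not depend on a particular Saxon version.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.ErrorListener;
import javax.xml.transform.SourceLocator;
import javax.xml.transform.TransformerException;

final class CollectingErrorListener implements ErrorListener {
    private final List<String> problems = new ArrayList<>();

    @Override
    public void warning(TransformerException e) {
        record("WARNING", e);
    }

    @Override
    public void error(TransformerException e) {
        record("ERROR", e); // recoverable: keep going so later errors are reported too
    }

    @Override
    public void fatalError(TransformerException e) throws TransformerException {
        record("FATAL", e);
        throw e; // processing cannot sensibly continue after a fatal error
    }

    private void record(String level, TransformerException e) {
        SourceLocator loc = e.getLocator(); // may be null if no location is known
        String where = loc == null
            ? "unknown location"
            : "line " + loc.getLineNumber() + ", column " + loc.getColumnNumber();
        problems.add(level + " at " + where + ": " + e.getMessage());
    }

    List<String> problems() {
        return problems;
    }
}
```

An instance would be handed to the validator's setErrorListener() before validating; afterwards problems() holds one entry per reported error, which is what lets you tell the first <employment> from the second.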

how to create a new word from template with docx4j

I have the following scenario, and need some advice:
The user will supply a Word document as a template and provide some parameters at runtime, so I can query my database and get data to fill the document.
So, there are two basic things I need to do:
Replace every key in the document with its respective result from the current query row.
"Merge" (copy? duplicate?) the existing document unchanged into itself (append), depending on how many rows I got from the query, and replace the keys in this new copy with the next row's values.
What is the best approach to do this? I've managed to do the replace part for now, by using unmarshallFromTemplate and providing it a HashMap.
But this way is a little tricky, because I need to add "${variable_name}" to the document, and sometimes Word splits "${" and "}" across different tags, causing issues.
I've read about custom XML binding, but didn't understand it completely. Do I need to generate a custom XML, inject it into the document (all of this at runtime) and call applyBindings? If so, how would I bind the fields in the document to the XML? By name?
docx4j includes VariablePrepare, which can tidy up your input docx so that your keys are not split across separate runs.
But, you would still be better off switching to content control data binding, particularly if you have repeated data (think for example of line items in an invoice). Disclosure: I champion this approach in docx4j.
To adopt the content control data binding approach:
1. Dream up an XML format which makes sense for your data, and write some code to convert the results of your database query into that format.
2. Modify your template, so that the content controls are bound to elements in your XML document. Ordinarily you'd use an authoring add-in for Word to help with this. (The technology Microsoft uses for binding is XPath, so how you bind depends on your XML structure, but, yes, you'd typically bind to the element name or ID).
3. Now that you have your XML file and a suitable input docx, ContentControlsMergeXML contains the code you need to create an instance document at run time. There's also a version of this for a servlet environment at https://github.com/plutext/OpenDoPE-WAR
As an alternative to 1 & 2, there is also org.docx4j.model.datastorage.migration.FromVariableReplacement in current nightlies, which can convert your existing "${" document. Only to a standardised target XML format though.
If you have further questions, there is a forum devoted to this topic at http://www.docx4java.org/forums/data-binding-java-f16/
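The first step (turning query results into an XML document of your own design) does not need docx4j at all. A minimal sketch with the JDK's DOM and transformer APIs might look like this; the element names "rows" and "row" and the class name are invented for illustration, and each row is assumed to arrive as a map from column name to value, with column names that are valid XML element names.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class RowsToXml {
    // Builds <rows><row><col>value</col>...</row>...</rows> from the query results.
    static String toXml(List<Map<String, String>> rows) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = doc.createElement("rows"); // hypothetical root element name
            doc.appendChild(root);
            for (Map<String, String> row : rows) {
                Element rowEl = doc.createElement("row");
                root.appendChild(rowEl);
                for (Map.Entry<String, String> col : row.entrySet()) {
                    // The column name becomes the element name, so it must be a valid XML name.
                    Element colEl = doc.createElement(col.getKey());
                    colEl.setTextContent(col.getValue()); // special characters are escaped on output
                    rowEl.appendChild(colEl);
                }
            }
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build the data XML", e);
        }
    }
}
```

Content controls in the template would then be bound with XPaths into this structure, and a repeated row maps naturally onto a repeating content control.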

Can I use ANTLR for both two-way parsing/generating?

I need to both parse incoming messages and generate outgoing messages in EDIFACT format (basically a structured delimited format).
I would like to have a Java model that will be generated by parsing a message. Then I would like to use the same model to create an instance and generate a message.
The first half is fine, I've used ANTLR before to go from raw -> Java objects. But I've never done the reverse, or if I have it's been custom.
Does ANTLR support generating using a grammar or is it really just a parse-only tool?
EDIT:
Expansion - I want to define two things ideally. A grammar that describes the raw message (EDIFACT in this case but pretend it's CSV if you like). And a Java object model.
I know I can write an ANTLR grammar to get from the raw -> Java model. e.g. Parsing a SQL string -> Java model which I've done before. But I need to go the other way as well ideally without changing the grammar.
If you liken it to JAXB (XML world), I really want JAXB for EDIFACT (rather than XML).
Can ANTLR do what you are asking? Yes, although it might require multiple grammars.
To me, this sounds like you want to create an AST from your parser. Have one tree walker do all the Java object creation required (possibly a second grammar), and then have a second tree walker create the output messages (a third grammar); you can even use StringTemplate if you want. Maybe you can get away with two grammars.
But at this point, actual details would have to be given for any more help: what the AST will look like for a specific input, and what the output message should be.
I have never done it myself (I also used ANTLR for parsing only), but I know for sure that ANTLR can be used as a generator as well.
In fact, it uses a library called StringTemplate (by the same author) for its own code generation.
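To make the "one model, two directions" idea concrete without committing to a grammar, here is a hand-written sketch in plain Java rather than ANTLR or StringTemplate. It assumes a heavily simplified EDIFACT-like format where segments end with ' and data elements are separated by +, and it ignores release characters and composite elements. All names are invented; in an ANTLR setup, parse would be replaced by the generated parser plus a walker, and generate by a second walker or a template.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SegmentCodec {
    // The shared model: one segment with a tag and its data elements.
    static final class Segment {
        final String tag;
        final List<String> elements;

        Segment(String tag, List<String> elements) {
            this.tag = tag;
            this.elements = elements;
        }
    }

    // Raw message -> model (the direction a parser normally covers).
    static List<Segment> parse(String message) {
        List<Segment> segments = new ArrayList<>();
        for (String raw : message.split("'")) {
            if (raw.isEmpty()) {
                continue;
            }
            String[] parts = raw.split("\\+", -1); // -1 keeps empty trailing elements
            segments.add(new Segment(parts[0],
                new ArrayList<>(Arrays.asList(parts).subList(1, parts.length))));
        }
        return segments;
    }

    // Model -> raw message (the direction that has to be written separately).
    static String generate(List<Segment> segments) {
        StringBuilder out = new StringBuilder();
        for (Segment s : segments) {
            out.append(s.tag);
            for (String element : s.elements) {
                out.append('+').append(element);
            }
            out.append('\'');
        }
        return out.toString();
    }
}
```

A round trip such as generate(parse(message)) returning the original message is a cheap regression test that the two directions agree on the model.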

What does the org.apache.xmlbeans.XmlException with a message of "Unexpected element: CDATA" mean?

I'm trying to parse and load an XML document, however I'm getting this exception when I call the parse method on the class that extends XmlObject. Unfortunately, it gives me no ideas of what element is unexpected, which is my problem.
I am not able to share the code for this, but I can try to provide more information if necessary.
Since you can't share the code or input data, you may consider the following approach. It's a very common bisection approach to diagnosis, I'm afraid, and you may well have started on it already...
Try to reduce the size of the input XML by removing parts of it, ensuring that the underlying XML document remains well-formed and possibly valid (if validity is required in your parser's setup). If you maintain validity, this may require altering [a copy of] the schema (DTD or other), as mandatory elements might be removed during the cut-and-try approach... BTW, the error message seems to hint more at a validation issue than at a basic well-formedness issue.
Unless one has a particular hunch as to the area that triggers the parser's complaint, one typically removes (or re-adds, when things start working) about half of what was previously cut or re-added.
You may also start by trying a mostly empty file, to check that the parser works at all... Again, the idea is to "divide and conquer": is the issue in the XML input or in the parser? (Remember that there could be two issues, one in the input and one in the parser, and that such issues could even be unrelated...)
Sorry to belabor basic diagnostic techniques which you may well be fluent in...
You should check the arguments you are passing to the parse() method.
Check whether you are passing a String, a File or an InputStream directly, and that it matches the corresponding overload (File/InputStream/String, etc.).
The exception is caused by the length of the XML file. If you add or remove one character from the file, the parser will succeed.
The problem occurs within the 3rd party PiccoloLexer library that XMLBeans relies on. It has been fixed in revision 959082 but has not been applied to xbean 2.5 jar.
XMLBeans - Problem with XML files if length is exactly 8193bytes
Issue reported on XMLBean Jira

JAXB Compiler and Attribute Order [duplicate]

This question already has answers here:
Using XSL to sort attributes
(2 answers)
Closed 2 years ago.
I would like to control the attribute order in .java files generated by the JAXB compiler.
I'm aware that attribute order is not important for xml validation. The order is important for textual comparison of marshalled xml in a regression test environment. The order of attributes in a file directly affects the order of the attributes in marshalled xml tags.
Every time the JAXB compiler is run attribute groups appear in a different order, even with no changes to the schema. There is no apparent option available on the compiler to prevent this behavior.
I would like to avoid running a post-compilation script to alphabetically reorder attributes in the generated .java files since this breaks up the attribute groups, but I'm not sure there is another option.
Any suggestions are much appreciated.
Thanks,
Dave
Apparently, in JAXB 2.0 you can use the annotation @XmlAccessorOrder or @XmlType(propOrder=)
I'd recommend using an XML parser to validate the output instead of doing textual comparisons. If you're going to be parsing the xml to re-order it anyway, you may as well just do the comparison using XML tools.
Edit:
Attempting to control the generated XML by manipulating the Java source code order seems like a fragile way of doing things. Granted, this is for testing only, so if something breaks the code might still work properly. People change source code order all the time, sometimes by accident, and it will be annoying or a subtle source of problems if you have to rely on a certain ordering.
As for ways of comparing the XML data using XML tools, I've never personally done this on a large scale, but this link mentions a few free tools. For me the extension to JUnit that provides XML-related assertions would be my first step, as that could integrate well with my existing tests. Otherwise, since you're mainly looking for exact equivalence, you could just parse the two XML files, then iterate over the nodes in the 'expected' file and see if those nodes are present in the 'actual' file. Then just check for any other nodes that you don't expect to see.
If you need to perform textual comparison of XML documents, there are better ways of doing it than by trying to control the output of an XML framework that does not treat attribute order as significant.
For example, there's XMLUnit, which is a JUnit extension specifically for XML assertions, and it handles whitespace and ordering quite nicely.
A more general solution is XOM's Canonicalizer, which outputs XML DOMs such that the attribute ordering and whitespace is predictable. Very handy.
So... let JAXB (or whatever) generate the XML as it sees fit, then run the outputs through XMLUnit or XOM, and compare. This has the added advantage of not depending on JAXB, it'll work with any generated XML.
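If pulling in XMLUnit or XOM is not an option, a rough JDK-only approximation of the same idea is to parse both documents and compare the DOM trees with Node.isEqualNode, which compares attributes by name rather than by position. This is a swapped-in technique, not what XMLUnit or XOM do internally; the class name is made up, and unlike XMLUnit this sketch still treats whitespace text nodes as significant and gives no diff on failure.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class XmlEquality {
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setIgnoringComments(true); // comments should not affect the comparison
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not parseable as XML", e);
        }
    }

    // True if both documents have equal element trees, regardless of attribute order.
    static boolean sameXml(String expected, String actual) {
        return parse(expected).getDocumentElement()
            .isEqualNode(parse(actual).getDocumentElement());
    }
}
```

In a regression test this becomes a single boolean assertion on sameXml(expectedXml, marshalledXml), with no dependence on the order in which JAXB happens to emit attributes.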
