Issues with JDK 1.8 and javax.xml.transform.Transformer

I'm using javax.xml.transform.Transformer to add an authentication header to a SOAP request. My application is running on JDK 1.8.
When the line of code below is executed, the following warning is logged to the console.
Code snippet:
transformer.transform(authenticationHeader, header.getResult());
Logged warning:
XML Parser does not recognize the feature http://xml.org/sax/features/validation
This is only a warning and does not prevent the request from completing successfully, but I would like to remove it from the log files.
I'm using Maven and do not explicitly specify Xerces, Saxon, Xalan, etc. My application uses code that was generated by the Maven cxf-codegen-plugin.
When I debug this issue I can see that the warning is generated in net.sf.saxon.event.Sender.
Can anyone suggest either:
A solution whereby my code uses a valid feature name, or
A means to suppress the warning message in my log output?

We may need to attack this one from both ends: (a) why is the validation feature being requested, and (b) why is the parser not recognizing it.
(a) Why is it being requested? Saxon will request this feature if the application requests DTD-based validation, for example by doing Configuration.setValidation(true). There are probably various other places this request can be made.
(b) Why is it not recognized? The name http://xml.org/sax/features/validation is documented as a feature of Apache Xerces, but I've no idea if the built-in JDK parser supports it or not: it's not easy to find that documentation. In the XMLReader javadoc, it's not described as a feature that every parser must recognize, but it is used as an example feature name. I always have my environment configured to use Apache Xerces by default, so it would require some effort to run tests to see what features are supported if Apache Xerces isn't present.
Perhaps the parser isn't actually the JDK default but some other parser - perhaps a user-written filter that isn't the "real" parser, but filters the output from the real parser (this is quite common, and it's a common mistake for such filters not to pass on configuration settings to the underlying parser). Unfortunately there are some paths where Saxon reports which parser has rejected the request, but that's not the case for this particular path. I'll fix that.
It's not as simple as ignoring the warning. If the application is requesting DTD validation and it isn't happening, that could have serious consequences. Children could die.
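Question (b) can also be checked empirically by asking a parser directly whether it knows the feature name. The following is a minimal sketch, assuming the JAXP default parser is the one Saxon ends up talking to (which, as noted above, may not be true if a filter sits in between). The class FeatureProbe and its method are illustrative names, not part of Saxon or the JDK.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;

// Hypothetical helper: reports whether an XMLReader recognizes a SAX feature name.
public class FeatureProbe {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    /** Returns true if the reader knows the feature name at all. */
    static boolean recognizes(XMLReader reader, String feature) {
        try {
            reader.getFeature(feature);
            return true;
        } catch (SAXNotRecognizedException e) {
            // The reader has never heard of this feature name.
            return false;
        } catch (SAXNotSupportedException e) {
            // The name is known, but its value cannot be reported right now.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        // Whatever parser JAXP selects in the current environment.
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        System.out.println(reader.getClass().getName()
                + " recognizes the validation feature: " + recognizes(reader, VALIDATION));
    }
}
```

An org.xml.sax.helpers.XMLFilterImpl created without a parent parser reproduces the "not recognized" outcome, since it has no underlying parser to delegate to: recognizes(new XMLFilterImpl(), VALIDATION) returns false. That is a small-scale version of the filter scenario described above.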

Related

Best Practice for large XML file builder

I have to build an XML file for an input to a SOAP service in Java. The input xml can consist of at least 1000 tags. What is the best way to build the XML? I have the XSD files but it is a bit complicated to use JAXB. Is XMLStreamWriter a good option for that?

XMLStreamWriter is one of the better APIs to use for writing XML from a Java application, but it has a few quirks (e.g. its namespace handling is a bit bizarre) and you may find it worthwhile to wrap it in a convenience API that knows about the kind of document you are writing, e.g. what namespaces it uses.
One of the advantages of the XMLStreamWriter interface is that there are plenty of implementations to choose from. For example Saxon has an implementation that gives you full control over all the XSLT/XQuery serialization options plus Saxon extensions (for example, you can even control the output order of attributes!)
One of the problems I hit with all event-based APIs is that sooner or later you find yourself forgetting to write an end tag, and that can be quite tricky to debug. Using a wrapper API that forces you to include the element name in a call on endElement() can be useful for debugging; if debugging is switched on you can keep a stack of element names and check that endElement() is writing the right tag; with debugging switched off you just drop this check.
Serializing using JAXB is higher-level, of course, but the downside is that it gives you less control.
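The wrapper idea with named end tags might look roughly like this. It is a sketch only: CheckedXmlWriter and its methods are hypothetical names, it ignores namespaces and attributes, and it uses whatever XMLStreamWriter implementation the JDK's XMLOutputFactory returns.

```java
import java.io.StringWriter;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical convenience wrapper: endElement() must name the element it closes.
public class CheckedXmlWriter {
    private final XMLStreamWriter out;
    private final boolean debug;
    private final Deque<String> open = new ArrayDeque<>(); // stack of open element names

    CheckedXmlWriter(XMLStreamWriter out, boolean debug) {
        this.out = out;
        this.debug = debug;
    }

    CheckedXmlWriter startElement(String name) throws XMLStreamException {
        out.writeStartElement(name);
        if (debug) {
            open.push(name);
        }
        return this;
    }

    CheckedXmlWriter text(String value) throws XMLStreamException {
        out.writeCharacters(value);
        return this;
    }

    CheckedXmlWriter endElement(String name) throws XMLStreamException {
        if (debug) {
            String expected = open.poll(); // null when nothing is open
            if (!name.equals(expected)) {
                throw new IllegalStateException(
                        "endElement(" + name + ") but the open element is " + expected);
            }
        }
        out.writeEndElement();
        return this;
    }

    /** Small demonstration: writes a two-level document and returns it as a string. */
    static String demo() {
        try {
            StringWriter buffer = new StringWriter();
            XMLStreamWriter xsw = XMLOutputFactory.newInstance().createXMLStreamWriter(buffer);
            new CheckedXmlWriter(xsw, true)
                    .startElement("order")
                    .startElement("item").text("widget").endElement("item")
                    .endElement("order");
            xsw.close();
            return buffer.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With debug set to false the stack is never touched, so the check costs nothing in production, which is the trade-off described above.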

deserialization of java object message read through subscriber

My application reads an XML request from WebSphere MQ and responds with one or more Java objects. I can use the JMS Point-to-Point sampler to post the XML request and the Subscriber sampler to catch the Java objects posted back by my application. Now I want to deserialize those Java objects so that I can assert on them. I have the required jar(s) for deserialization, but I don't know how to do this in JMeter. Can someone please provide directions on how to proceed?

You will need to have all the necessary dependencies in your JMeter's /lib folder.
You can then just add a JSR-223 sampler/post-processor that executes the Java code that you want using those dependencies. You can choose any of the scripting languages there, but be aware of the performance problems that some of them have (BeanShell caused GC lag for me).

Add JSR223 PostProcessor as a child of the JMS P2P Sampler and put the deserializing code into it. When you convert binary response to String you will be able to assign the value to a JMeter Variable as:
vars.put("variableName", variableValue);
and use it in Assertion (JMeter Assertions can target JMeter Variables).
Groovy is the recommended language for JSR223 elements: the JavaScript, BeanShell, etc. interpreters have performance issues and are quite out of date, whereas Groovy scripts can be compiled into bytecode (assuming the test element is properly configured), which gives the best performance.
See Beanshell vs JSR223 vs Java JMeter Scripting: The Performance-Off You've Been Waiting For! guide for instructions on how to setup groovy scripting engine support, best practices in regards to caching/using variables/etc. and some form of different scripting engines benchmark.
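The deserializing code itself could be as small as the sketch below. It assumes the response body is a plain Java-serialized object (as produced by ObjectOutputStream) and that the object's class is available from the jars in JMeter's /lib folder; whether the sampler actually hands over the bytes in that form depends on the message type and is not confirmed here. PayloadDecoder is a hypothetical name, and serialize() exists only so the example can be run outside JMeter.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper for turning Java-serialized bytes back into an object.
public class PayloadDecoder {

    /** Deserializes the bytes; the object's class must be on the classpath. */
    static Object deserialize(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("Class not on the classpath (missing jar in /lib?)", e);
        }
    }

    /** Demo only: produces the kind of bytes deserialize() expects. */
    static byte[] serialize(Serializable value) {
        try (ByteArrayOutputStream bytes = new ByteArrayOutputStream();
             ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] wire = serialize(new ArrayList<>(Arrays.asList("a", "b")));
        // In a JSR223 PostProcessor the bytes would presumably come from
        // prev.getResponseData() instead of serialize() (an assumption).
        System.out.println(String.valueOf(deserialize(wire)));
    }
}
```

Inside the PostProcessor, the string form of the result (for example String.valueOf(deserialize(...))) is what would be passed to vars.put, since JMeter Variables set that way hold strings.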

Random error with OpenSAML XML Parser Configuration

I'm running a webapp in Tomcat 8 that uses OpenSAML. I've endorsed Xerces within Tomcat, I've checked that the endorsed dir path is set right, it appears that everything is working fine:
[ajp-apr-8009-exec-22] DEBUG org.opensaml.xml.Configuration - VM using JAXP parser org.apache.xerces.jaxp.DocumentBuilderFactoryImpl
I get several requests that work just fine, everything seems great, I can run through that section of code without error, then all of a sudden, I start getting this error:
OpenSAML requires an xml parser that supports JAXP 1.3 and DOM3.
The JVM is currently configured to use the Sun XML parser, which is known
to be buggy and can not be used with OpenSAML. Please endorse a functional
JAXP library(ies) such as Xerces and Xalan. For instructions on how to endorse
a new parser see http://java.sun.com/j2se/1.5.0/docs/guide/standards/index.html
at org.opensaml.xml.Configuration.validateNonSunJAXP(Configuration.java:278)
at org.opensaml.xml.parse.BasicParserPool.<init>(BasicParserPool.java:126)
Once I start getting the error, I will get an error every time but I haven't been able to isolate what it takes to trigger the problem. (Edit: it appears that this may be related in some way to docx4j usage, the errors start after a request that uses docx4j to generate a file as a word document. Since docx4j is so reliant on XML, this maybe makes some sense.)
Basically, what validateNonSunJAXP() does is pretty simple. All it does is check the class name for the DocumentBuilderFactory and if it starts with "com.sun", it throws the error.
Any ideas on what could be going on that would cause the VM to stop using the endorsed library?
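To narrow this down, it may help to log which factory JAXP resolves before and after a suspect request. The sketch below is a diagnostic under assumptions: JaxpProbe is a hypothetical class, and looksLikeSunParser() only imitates the "starts with com.sun" check described above rather than reproducing OpenSAML's actual code.

```java
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical diagnostic: shows which DocumentBuilderFactory JAXP currently resolves.
public class JaxpProbe {
    static final String DBF_PROPERTY = "javax.xml.parsers.DocumentBuilderFactory";

    /** Imitates the check described above: a "com.sun" class name counts as the Sun parser. */
    static boolean looksLikeSunParser(String factoryClassName) {
        return factoryClassName.startsWith("com.sun");
    }

    /** One line of state, suitable for logging before and after a suspect request. */
    static String describe() {
        String active = DocumentBuilderFactory.newInstance().getClass().getName();
        return "system property=" + System.getProperty(DBF_PROPERTY)
                + ", active factory=" + active
                + ", sun=" + looksLikeSunParser(active);
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the system property changes between two calls to describe(), some code in the webapp has called System.setProperty for that key. JAXP consults that property before its other lookup mechanisms.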

docx4j manipulates:
javax.xml.parsers.SAXParserFactory
javax.xml.parsers.DocumentBuilderFactory
javax.xml.transform.TransformerFactory
You can see what it does, at https://github.com/plutext/docx4j/blob/master/src/main/java/org/docx4j/XmlUtils.java
javax.xml.parsers.SAXParserFactory
In summary, you can prevent docx4j from touching this value via a docx4j properties setting.
We found Crimson fails to parse docx4j XSLT files, which is why docx4j by default tries to use Xerces, where it is included in the JDK. (Things may be better in more recent JDKs)
If you don't want this, you can specify different behaviour via docx4j.properties:
docx4j.javax.xml.parsers.SAXParserFactory.donotset=true stops docx4j from changing the setting, or
javax.xml.parsers.SAXParserFactory allows you to specify what you want
Note that we don't restore the value to its original setting since we want to avoid Crimson being used for the life of the application.
javax.xml.parsers.DocumentBuilderFactory
This works similarly to SAXParserFactory
The relevant docx4j properties are as follows:
docx4j.javax.xml.parsers.DocumentBuilderFactory.donotset
javax.xml.parsers.DocumentBuilderFactory
We don't restore the value to its original setting (though maybe we could; would need to review whether docx4j always uses XmlUtils.getNewDocumentBuilder() )

Configuration file with custom settings?

I'm having difficulty attempting to explain what I mean but here I go..
I am looking for a configuration parser that does the following..
Allows me to configure REQUIRED settings and if they are not set then the program should not start.
Allow users to define as many 'custom settings' as they need. So for example, say you want to add some sort of redirect to the configuration, I'd like it to look like this.
redirect-1: 100->200
And that would hopefully 'redirect' 100 to 200. I would obviously build this logic into my program, but is there a library that will read sequential settings like redirect-1, redirect-2, etc. until it reaches the EOF? Hopefully I'm making sense.

I think what you are looking for is a configuration file parser that also provides a "schema validation" engine. Within the syntax of the schema validation language, you would specify which configuration variables are required (and their types), so the validation engine could verify that those variables are present and have values that comply with their types (for example, "42" is a valid integer, but "hello, world" is not).
XML has several competing schema validation languages, including XML Schema and RELAX NG. If you look on Amazon, you can find books on those.
JSON also has its own schema validation language.
The only other configuration language with its own schema validation language that I can think of is Config4*, which is one that I developed. If you want to read more about Config4*, then I suggest you read Chapters 2 and 3 of the Config4* Getting Started Guide (PDF and HTML) to get an overview of its syntax and API. Then skip to Chapter 9 to find full details of its schema validation language. The ignore rules (discussed in Section 9.2.6) can be used to specify that the schema validation engine should ignore some configuration variables/scopes.
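For comparison, the two requirements from the question (required settings, plus sequentially numbered redirect-N entries) can also be hand-rolled without any schema language. The sketch below uses plain java.util.Properties, which is not one of the libraries mentioned above. RedirectConfig and the setting name used in the usage note are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical hand-rolled check, as a stand-in for a schema validation engine.
public class RedirectConfig {

    /** Parses "key: value" or "key=value" lines using java.util.Properties rules. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Refuses to continue if any required setting is missing. */
    static void requireKeys(Properties props, String... keys) {
        for (String key : keys) {
            if (props.getProperty(key) == null) {
                throw new IllegalStateException("Missing required setting: " + key);
            }
        }
    }

    /** Reads redirect-1, redirect-2, ... and stops at the first missing number. */
    static Map<Integer, Integer> redirects(Properties props) {
        Map<Integer, Integer> result = new LinkedHashMap<>();
        for (int i = 1; ; i++) {
            String value = props.getProperty("redirect-" + i);
            if (value == null) {
                break;
            }
            String[] parts = value.split("->");
            if (parts.length != 2) {
                throw new IllegalArgumentException("Bad redirect-" + i + ": " + value);
            }
            result.put(Integer.parseInt(parts[0].trim()), Integer.parseInt(parts[1].trim()));
        }
        return result;
    }
}
```

A caller would run requireKeys(props, "name") at startup and abort on the exception, then feed the map from redirects(props) into its own redirect logic. Note that a gap in the numbering (redirect-1, redirect-3) silently ends the scan, which a real schema validator could report instead.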

Best XML format for log events in terms of tool support for data mining and visualization?

We want to be able to create log files from our Java application that are suited for later processing by tools, to help investigate bugs and gather performance statistics.
Currently we use the traditional "log stuff which may or may not be flattened into text form and appended to a log file", but this works best only for small amounts of information read by a human.
After careful consideration the best bet has been to store the log events as XML snippets in text files (which is then treated like any other log file), and then download them to the machine with the appropriate tool for post processing.
I'd like to use as widely supported an XML format as possible, and right now I am in the "research-then-make-decision" phase. I'd appreciate any help both in terms of XML format and tools and I'd be happy to write glue code to get what I need.
What I've found so far:
log4j XML format: Supported by chainsaw and Vigilog.
Lilith XML format: Supported by Lilith
Uninvestigated tools:
Microsoft Log Parser: Apparently supports XML.
OS X log viewer:
plus there are a lot of tools on http://www.loganalysis.org/sections/parsing/generic-log-parsers/
Any suggestions?

Unfortunately, I can't give you the answer you are looking for, but I would like to warn you of something to consider when logging to XML. For example:
<log>
<msg level="info">I'm a log message</msg>
<msg level="info">I'm another message</msg>
<!-- maybe you won't even get here -->
<msg level="fatal">My server just ate a flaming death
In the above snippet of a potential XML log you can see the biggest drawback of logging to XML. When a catastrophic failure happens, your log format becomes broken because it requires closing tags. However, if you are using a program that parses your primary log output, this shouldn't be too much of a problem.

If you are defining your own XML log file writing, you do not need to worry about having a closing and opening tag in order to produce valid XML. Elijah's answer is right in that you do have the issue if you want to create an XML document, but that is not necessary straight off. The W3 standard also defines XML Entities (see section 4.3 of the W3's XML 1.0 spec, second edition, which unfortunately I cannot link to for you because I do not have enough points), which would be more suitable for log-style continual appending to a file without rewriting parts of it. You can then create a referencing XML wrapper document if you need to work with an actual XML document rather than an XML entity (see http://www.perlmonks.org/?node_id=217788#217797 for an example)
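A related trick, when the log is read from Java, is to supply the missing root element at read time instead of through an external entity. The sketch below is that plainer substitute, not the entity mechanism described above. It assumes the log file holds a sequence of complete msg elements with no XML declaration, and LogFragmentReader is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Collections;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical reader: wraps a root-less log fragment in <log>...</log> while parsing.
public class LogFragmentReader {

    /** Surrounds the fragment with a synthetic root element without copying it. */
    static InputStream wrap(InputStream fragment) {
        InputStream open = new ByteArrayInputStream("<log>".getBytes(StandardCharsets.UTF_8));
        InputStream close = new ByteArrayInputStream("</log>".getBytes(StandardCharsets.UTF_8));
        return new SequenceInputStream(
                Collections.enumeration(Arrays.asList(open, fragment, close)));
    }

    /** Parses the wrapped fragment and counts its msg elements. */
    static int countEvents(InputStream fragment) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(wrap(fragment));
            return doc.getElementsByTagName("msg").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("Log fragment is not well-formed", e);
        }
    }
}
```

This only removes the need for an enclosing root element in the file. An entry that was cut off mid-element by a crash still makes the parse fail, so the writer should emit each event as one complete element.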

One of the nice things in log4j is that it offers nice possibilities for customizing the log formats and where those are written to.
So instead of choosing a log file format, I'd choose a logging library that lets you change the format and also lets you send the log directly to another program.

I'd advise you consider logback-access for events.
Other than that, anything using JMX, as it was made to match the feature set of SNMP.

It appears that the Lilith log viewer contains an XML-format which is well suited for dealing with the extra facilities available in logback and not only the log4j things.
It is - for now - the best bet so far :)
I adapted the log4j xmllayout class to logback, which works with chainsaw.
As I have not been able to find a suitable log viewer capable of visualizing event information (instead of just presenting all events in a table) I have for now decided to create a very terse xml layout containing machine parsable information based on the above which can then be postprocessed by the Microsoft LogParser to any format I need.
