Java XSLT transformer with default namespace without xmlns - java

I'm working on some Java code that takes XML in DOM, with no namespace prefixes declared, yet each element has a namespace of http://www.w3.org/1999/xhtml. (This is equivalent to the HTML DOM a browser gets.) The code uses the following to serialize the DOM to a string:
TransformerFactory tf = TransformerFactory.newInstance();
Transformer transformer = tf.newTransformer();
The resulting string looks like this:
…
<html xmlns="http://www.w3.org/1999/xhtml">
…
Note the presence of xmlns="http://www.w3.org/1999/xhtml", which the DOM did not have. In terms of XML, this is entirely correct: if the element uses a namespace (even without a prefix), the namespace must be declared on that element or an ancestor element; and this being the document element, the namespace declaration must go here.
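For reference, here is a minimal sketch that reproduces the behavior using only JDK classes. The class and method names are illustrative and do not come from jsoup; it assumes the JDK's built-in TransformerFactory is the one picked up by newInstance().

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class XhtmlSerializationDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Builds a tiny DOM whose elements are in the XHTML namespace but carry no xmlns attribute. */
    static Document buildDocument() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            Element html = doc.createElementNS(XHTML_NS, "html");
            html.appendChild(doc.createElementNS(XHTML_NS, "body"));
            doc.appendChild(html);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the DOM with an identity Transformer, as in the question. */
    static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildDocument();
        // The DOM itself has no xmlns attribute on the document element...
        System.out.println("has xmlns attribute: " + doc.getDocumentElement().hasAttribute("xmlns"));
        // ...but the serializer declares the namespace in its output.
        System.out.println(serialize(doc));
    }
}
```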
However HTML is a little different story. The WHATWG HTML5 Specification § 2.1.3 XML compatibility says:
To ease migration from HTML to XML, user agents conforming to this specification will place elements in HTML in the http://www.w3.org/1999/xhtml namespace, at least for the purposes of the DOM and CSS.
In other words, HTML browsers will assume the http://www.w3.org/1999/xhtml namespace even without a namespace declaration. Typical clean HTML will not have a namespace declaration, and for this particular use case one is not required.
How can I tell a transformer not to add a default namespace declaration for the document? Alternatively, how can I remove it later without resorting to brute force such as regular expression matching?
Internally the com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImpl creates a com.sun.org.apache.xalan.internal.xsltc.trax.DOM2TO instance, which eventually calls com.sun.org.apache.xml.internal.serializer.ToStream.startPrefixMapping(String prefix, String uri, boolean shouldFlush). Here is the "offending" code that adds the xmlns="http://www.w3.org/1999/xhtml" on the document element:
if (EMPTYSTRING.equals(prefix))
{
name = "xmlns";
addAttributeAlways(XMLNS_URI, name, name, "CDATA", uri, false);
}
But to be more precise, I see that the actual adding of the attribute is done by com.sun.org.apache.xml.internal.serializer.AttributesImplSerializer.addAttribute(String uri, String local, String qname, String type, String val). This class extends org.xml.sax.helpers.AttributesImpl and implements org.xml.sax.Attributes.
Is there some way I can splice my own customized Attributes implementation into a Transformer, so that I can check this special case and forgo adding the xmlns="http://www.w3.org/1999/xhtml" attribute in the appropriate context?
I suppose as a last resort, is there a way to tell the Transformer to be namespace aware, but never to add xmlns declarations that weren't already in the DOM?
(For those who insist on asking where I got a DOM with an HTML namespace without an xmlns declaration, it's irrelevant. Let's assume that I constructed an XML DOM instance programmatically but want to output it as "clean" HTML5, so I remove the default xmlns attribute, but the Transformer is putting it back.)

(Full disclosure: I'm actually fixing a bug in jsoup, which is an HTML parser that accepts dirty HTML as in the wild, and presents it to the application in DOM as a browser would. I fixed a bug that didn't assign the HTML namespace even without a namespace declaration. Now the existing W3CDom.asString(Document doc) serializer method tries to add the xmlns namespace declaration, but users are accustomed to it returning an HTML serialization without the xmlns (which for HTML5 isn't wrong). So I'm trying to keep from breaking code that relies on the original "clean" HTML serialization without rewriting the serializer.)
The following is an ugly kludge, but given the constraints I don't see an alternative. I welcome a better approach!
/**
* Pattern to detect the <code>xmlns="http://www.w3.org/1999/xhtml"</code> default namespace
* declaration when serializing the DOM to HTML. This pattern is "good enough", relying in part
* on the output of the {@link Transformer} used in the implementation, but is not a complete
* solution for all the serializations possible; that is, if one constructed an XML string
* manually, it might be possible to find an obscure variation that this pattern would not
* match.
*/
static final Pattern HTML_DEFAULT_NAMESPACE_PATTERN =
Pattern.compile("<html[^>]*(\\sxmlns=['\"]http://www.w3.org/1999/xhtml['\"])");
/**
* Removes the default <code>xmlns="http://www.w3.org/1999/xhtml"</code> HTML namespace
* declaration if present in the string.
*
* @param html The serialized HTML.
* @return A string without the default <code>xmlns="http://www.w3.org/1999/xhtml"</code> HTML
* namespace declaration.
* @see <a href="https://github.com/jhy/jsoup/issues/1837">Issue #1837: Bug: DOM elements not
* being placed in (X)HTML namespace.</a>
*/
static String removeDefaultHtmlNamespaceDeclaration(String html) {
Matcher matcher = HTML_DEFAULT_NAMESPACE_PATTERN.matcher(html);
if (matcher.find()) {
html = html.substring(0, matcher.start(1)) + html.substring(matcher.end(1));
}
return html;
}

It looks to me as if the DOM was created by an application that put the nodes in the XHTML namespace, and therefore the serializer is entirely correct to serialize them in that namespace. From your description, the application did that because it was parsing HTML5 and that's what the HTML5 specification says it should do.
Part of the problem is that you're using an XSLT 1.0 serializer, and XSLT 1.0 predates XHTML and certainly predates HTML5. Unfortunately, just because W3C or WHATWG issues a proclamation doesn't mean that everyone changes their software. You may have better luck using an XSLT 3.0 serializer (Saxon) with the HTML5 output method, but I don't know what your project constraints are.

It seems to me that since you're relying on some other code to perform the serialization, you will actually need to modify all the elements so that their names are not in the XHTML namespace (if you want a non-kludge solution).
You can recursively traverse all the elements in the document starting from the root element, and use the renameNode method of the DOM Document object to rename them all to have a name which is the same as their old local name, but with no namespace URI.
document.renameNode(element, null, element.getLocalName());
You could do the same renaming in XSLT, but I'm guessing you're probably more comfortable doing it in Java.
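The recursive renaming described above could be sketched as follows. This is an illustration only: the class and method names are made up, and it assumes the DOM implementation supports renameNode (the JDK's built-in one does). Because renameNode is allowed to return a replacement node, the sketch continues from the returned node rather than the original.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class NamespaceStripper {

    /** Moves every element of the document into no namespace, keeping local names. */
    static void stripNamespaces(Document doc) {
        strip(doc, doc.getDocumentElement());
    }

    private static void strip(Document doc, Node node) {
        if (node == null) {
            return;
        }
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            // getLocalName() is null for nodes created without namespace support.
            String local = node.getLocalName() != null ? node.getLocalName() : node.getNodeName();
            node = doc.renameNode(node, null, local);
        }
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // remember before a possible replacement
            strip(doc, child);
            child = next;
        }
    }

    /** Hypothetical sample input: html/body in the XHTML namespace. */
    static Document sample() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();
            Element html = doc.createElementNS("http://www.w3.org/1999/xhtml", "html");
            Element body = doc.createElementNS("http://www.w3.org/1999/xhtml", "body");
            body.setTextContent("Hello");
            html.appendChild(body);
            doc.appendChild(html);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = sample();
        stripNamespaces(doc);
        System.out.println(doc.getDocumentElement().getNamespaceURI());
    }
}
```

Note that this changes the DOM itself, so it is probably best applied to a copy if the namespaced document is still needed elsewhere.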

Related

Can't register namespace with XMLUnit

I can't seem to work out how to set namespace when comparing XML's using xmlunit-2
Tried like:
@Test
public void testDiff_withIgnoreWhitespaces_shouldSucceed() {
// prepare testData
String controlXml = "<a><text:b>Test Value</text:b></a>";
String testXml = "<a>\n <text:b>\n Test Value\n </text:b>\n</a>";
Map<String, String> namespaces = new HashMap<String, String>();
namespaces.put("text","urn:oasis:names:tc:opendocument:xmlns:text:1.0");
// run test
Diff myDiff = DiffBuilder.compare(Input.fromString(controlXml).build())
.withTest(Input.fromString(testXml).build())
.withNamespaceContext(namespaces)
.ignoreWhitespace()
.build();
// validate result
Assert.assertFalse("XML similar " + myDiff.toString(), myDiff.hasDifferences());
}
but always get
org.xmlunit.XMLUnitException: The prefix "text" for element "text:b"
is not bound.
Stripping the namespace prefix from the elements makes it work, but I would like to learn how to register the namespace properly with DiffBuilder.
I have the same problem with xmlunit-1.x, so hints for that library would be appreciated as well.
EDIT, based on the answer
By adding the namespace attribute to the root node I managed to bind the namespace
<a xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
thanks Stefan
NamespaceContext is only used for the XPath associated with the "targets" of comparisons. It is not intended to be used to provide mappings for the XML documents you compare.
There is no way of binding XML namespaces to prefixes outside of the documents themselves in XMLUnit. This means you either must use xmlns attributes or not use prefixes at all.
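The "prefix is not bound" message in the question looks like the one a namespace-aware JAXP parser produces, so the point can be illustrated without XMLUnit at all. The sketch below uses only JDK classes and a made-up class name: the same document is rejected when the prefix is undeclared and accepted once the xmlns:text attribute is present.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class PrefixBindingDemo {

    /** Returns true if a namespace-aware parser accepts the given XML string. */
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // e.g. an unbound prefix
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String unbound = "<a><text:b>Test Value</text:b></a>";
        String bound = "<a xmlns:text=\"urn:oasis:names:tc:opendocument:xmlns:text:1.0\">"
                + "<text:b>Test Value</text:b></a>";
        System.out.println("unbound prefix parses: " + parses(unbound));
        System.out.println("declared prefix parses: " + parses(bound));
    }
}
```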

Using Saxon's DocumentBuilder with JAXB emits a warning about validation

I have an application that uses JAXB (Moxy) and Saxon for running XPath expressions. Everything works as expected, but Saxon's DocumentBuilder emits this warning:
XML Parser does not recognize the feature http://xml.org/sax/features/validation
Code:
Processor proc = new Processor(false);
DocumentBuilder builder = proc.newDocumentBuilder();
XdmNode doc = builder.build(new JAXBSource(jaxbContext, jaxbObject));//The warning occurs here
...
I think what's going on is that JAXB is using a StAX parser and Saxon uses SAX. So when Saxon attempts to set the above property on the StAX parser, it fails.
Is there a way to prevent Saxon from setting that property when building the document or, at the very least, suppress that warning? I've tried setting a bunch of different properties on the Processor, but none of them worked. I don't need validation anyway since the document has already been validated and read into a JAXB object.
EDIT: I've been trying to override the errorListener on the Processor, DocumentBuilder, and the JAXBSource, but that message is not going through any of them.
I'll look into this a bit more closely, but the first thing to say is that JAXBSource extends SAXSource, and Saxon treats it exactly like it treats any other SAXSource. The warning in the JAXB documentation "Thus in general applications are strongly discouraged from accessing methods defined on SAXSource" is vacuous - applications (like Saxon) when they are defined to accept an object of class X, don't in general take any notice of advice given in the specifications of a subclass of X that they have never heard of.
The status of the property "http://xml.org/sax/features/validation" is a little unclear. IIRC the SAX specifications don't say that every XMLReader must recognize this property, but they do define it as the only way of requesting DTD expansion, and an XSLT processor needs DTD expansion to take place; if the XMLReader doesn't do DTD expansion then the transformation may fail in unpredictable ways, so a warning is justified.
The cleanest way of suppressing the warning is probably to supply an XMLReader that does recognise this property, and delegate from there to an XMLReader that doesn't recognise it.
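A delegating XMLReader of that kind might look like the sketch below. It is built on the JDK's XMLFilterImpl; the class name and the accepts() helper are invented for illustration, and whether Saxon stays quiet with it in place has not been verified here. The filter only pretends to understand the feature: it records the requested value and forwards everything else to the wrapped reader.

```java
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

final class ValidationTolerantReader extends XMLFilterImpl {
    static final String VALIDATION = "http://xml.org/sax/features/validation";

    private boolean validation; // remembered only; never forwarded to the parent

    ValidationTolerantReader(XMLReader parent) {
        super(parent);
    }

    @Override
    public void setFeature(String name, boolean value)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (VALIDATION.equals(name)) {
            validation = value; // accept silently instead of asking the parent
            return;
        }
        super.setFeature(name, value);
    }

    @Override
    public boolean getFeature(String name)
            throws SAXNotRecognizedException, SAXNotSupportedException {
        if (VALIDATION.equals(name)) {
            return validation;
        }
        return super.getFeature(name);
    }

    /** Helper for experiments: does the reader accept setting this feature to false? */
    static boolean accepts(XMLReader reader, String feature) {
        try {
            reader.setFeature(feature, false);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }
}
```

Since JAXBSource extends SAXSource, one plausible way to wire it in is new SAXSource(new ValidationTolerantReader(jaxbSource.getXMLReader()), jaxbSource.getInputSource()), passing that to builder.build() instead of the JAXBSource itself.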
private class SaxonLogger extends StandardLogger {
@Override
public void warning(String message) {
if(!message.contains("http://xml.org/sax/features/validation")) {
System.err.println(message);
}
}
}
...
proc.getUnderlyingConfiguration().setLogger(logger);
This will at least suppress those pesky messages. However, I would still like to find a better solution.

JAXB Ignore 'extra' elements from Response XML

I am getting an XML response that changes very frequently (nodes keep being added or removed). After each change to the response XML my code breaks, because my mapped Java class does not have all the fields.
Is there any way to keep my code from breaking when the response XML changes?
Any help will be appreciated.
Thanks.
Use JAXB.unmarshal() to simply create Java objects from XML.
By default it is very liberal.
Quoting from the javadoc:
In addition, the unmarshal methods have the following characteristic:
Schema validation is not performed on the input XML. The processing will try to continue even if there are errors in the XML, as much as possible. Only as the last resort, this method fails with DataBindingException.
So what JAXB.unmarshal() does is it tries to "transfer" as much data from XML to Java as possible, and it doesn't care if there is no Java field for an XML element or attribute, and it also doesn't care if there is a Java field for which there is no XML element or attribute.
Example
Let's try to unmarshal the following XML to an instance of java.awt.Point:
<p hi="Yo">
<y>123</y>
<peach>weor</peach>
</p>
The Java code:
String s = "<p hi=\"Yo\"><y>123</y><peach>weor</peach></p>";
Point p = JAXB.unmarshal(new StringReader(s), Point.class);
System.out.println(p); // Prints "java.awt.Point[x=0,y=123]"
We told JAXB.unmarshal() to parse a java.awt.Point instance. The input XML contains an element <y> which can be matched with Point.y so an int was parsed and set to Point.y. No XML data was found for Point.x so it was not touched. There was no match for the attribute hi and the XML element <peach>, so they were simply not used for anything.
We got absolutely no Exception here, and the most that was possible was parsed and transferred from XML to Java.
To cope with unknown fields, you can add a List<Object> property annotated @XmlAnyElement(lax=true):
@XmlAnyElement(lax = true)
private List<Object> anything;
Any elements in the input that do not correspond to explicit properties of the class will be swept up into this list. If the element is known to the JAXBContext you'll get the unmarshalled form (the @XmlRootElement annotated class or a JAXBElement<Foo>); if the element is not known to the context you'll get an org.w3c.dom.Element.
Full details in Blaise's blog.
For nodes that get removed you should be fine as long as you use types that can be null (Integer rather than int, Boolean rather than boolean, etc).

How to create multiple xml request by only changing one field?

I need help finding an approach for creating the same XML request multiple times, changing only two fields each time while the rest of the fields stay as they are now. How can I do this in Java?
This is the sample xml below:
I would be changing the value of <Id> and <Originator>
<TransactionBlk>
<Id>NIK</Id>
<CorrelationId />
<Originator>NIK</Originator>
<Service>GetIns</Service>
<VersionNbr>1</VersionNbr>
<VersionNbrMin>0</VersionNbrMin>
<MsgNm>Req</MsgNm>
<MsgFormatCd>XML</MsgFormatCd>
</TransactionBlk>
You can create a class that holds all of these parameters as fields, with getter and setter methods. Create an object of that class and set its values through the setters.
You can use JAXB API's class to convert your java object into XML format.
The JAXBContext class provides the client's entry point to the JAXB API.
It provides an abstraction for managing the XML/Java binding information
necessary to implement the JAXB binding framework operations: unmarshal,
marshal and validate.
See the JAXB documentation and tutorials for converting a Java object into XML.
Sample Code :
@XmlRootElement(name="TransactionBlk_REQ",namespace="http://TransactionBlk.com")
@XmlAccessorType(XmlAccessType.FIELD)
public class TransactionBlk
{
    @XmlElement(name = "Id")
    private String id;
    @XmlElement(name = "Originator")
    private String Originator;
    //Your getter and setter method.
}
TransactionBlk bean = new TransactionBlk();
//Set your parameter value here
StringWriter responseWriter = new StringWriter();
JAXBContext jaxbContext = JAXBContext.newInstance(TransactionBlk.class);
Marshaller jaxbMarshaller = jaxbContext.createMarshaller();
jaxbMarshaller.marshal(bean, responseWriter);
String xmlStr = responseWriter!=null?responseWriter.toString():null;
You can use XSLT to transform XML.
If all you're doing is printing a "boilerplate" document with changes in those two values, you could use the DPH (Desperate Perl Hacker) approach: simply assemble it as text, pausing to print the values at the appropriate places. To be safe, you should pre-scan the values to make sure they don't contain the <, >, or & characters and escape those if you find them.
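A rough sketch of that text-assembly approach in Java, using the element names from the question. The class and method names are made up for illustration, and the escaping covers only the three characters mentioned above, which is enough for element content.

```java
final class TransactionBlkText {

    /** Escapes the characters that are unsafe in XML element content. */
    static String escape(String value) {
        // '&' must be replaced first so the other entities are not escaped twice.
        return value.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Assembles the request as plain text, varying only Id and Originator. */
    static String build(String id, String originator) {
        return "<TransactionBlk>\n"
                + "  <Id>" + escape(id) + "</Id>\n"
                + "  <CorrelationId />\n"
                + "  <Originator>" + escape(originator) + "</Originator>\n"
                + "  <Service>GetIns</Service>\n"
                + "  <VersionNbr>1</VersionNbr>\n"
                + "  <VersionNbrMin>0</VersionNbrMin>\n"
                + "  <MsgNm>Req</MsgNm>\n"
                + "  <MsgFormatCd>XML</MsgFormatCd>\n"
                + "</TransactionBlk>\n";
    }

    public static void main(String[] args) {
        System.out.print(build("NIK", "NIK"));
        System.out.print(build("A&B", "X<Y"));
    }
}
```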
For something more complex, or if you want to start learning how to do it "properly", look at the standard XML APIs for Java: DOM (the Document Object Model, an in-memory tree model of a document), SAX (an event-stream view of a document), and JAXP (tools to take an XML document and parse it into DOM or SAX so you can read it, and to take DOM or SAX and write those out as XML syntax). JAXP also provides standard APIs for invoking XPath to search a document and XSLT to apply a stylesheet to a document, so taken together these cover a huge percentage of the basic operations on XML.
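As one possible illustration of the DOM and JAXP route, the sketch below parses a template once per request, overwrites the two values, and writes the result back out. The class name is invented and the template is shortened; the serializer takes care of escaping special characters in the values.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class TransactionBlkDom {
    // Shortened template; the remaining fixed elements would be added the same way.
    static final String TEMPLATE =
            "<TransactionBlk><Id>NIK</Id><CorrelationId/><Originator>NIK</Originator>"
            + "<Service>GetIns</Service></TransactionBlk>";

    /** Parses the template, replaces Id and Originator, and serializes the result. */
    static String build(String id, String originator) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(TEMPLATE)));
            doc.getElementsByTagName("Id").item(0).setTextContent(id);
            doc.getElementsByTagName("Originator").item(0).setTextContent(originator);

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("A1", "B&C"));
    }
}
```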
You might want to look at some tutorials on using Java to manipulate XML. I'm certainly biased, not least because they published one of my articles, but in my experience IBM's DeveloperWorks website (https://www.ibm.com/developerworks/xml/) has had better-than-average material for learning about XML and other standards.

Setting Namespace Attributes on an Element

I'm trying to create an XML document in Java that contains the following Element:
<project xmlns="http://www.imsglobal.org/xsd/ims_qtiasiv1p2"
xmlns:acme="http://www.acme.com/schemas"
color="blue">
I know how to create the project Node. I also know how to set the color attribute using
element.setAttribute("color",
"blue")
Do I set the xmlns and xmlns:acme attributes the same way using setAttribute() or do I do it in some special way since they are namespace attributes?
I believe that you have to use:
element.setAttributeNS("http://www.w3.org/2000/xmlns/", "xmlns:acme", "http://www.acme.com/schemas");
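Put together for the element in the question, that might look like the following sketch (the class and method names are illustrative). The element is created in its namespace with createElementNS, the two declarations are added as attributes in the http://www.w3.org/2000/xmlns/ namespace, and color is a plain attribute.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

final class NamespaceAttributeDemo {
    static final String XMLNS_NS = "http://www.w3.org/2000/xmlns/";
    static final String QTI_NS = "http://www.imsglobal.org/xsd/ims_qtiasiv1p2";
    static final String ACME_NS = "http://www.acme.com/schemas";

    /** Builds the project element with explicit namespace declarations. */
    static Element buildProject() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().newDocument();

            Element project = doc.createElementNS(QTI_NS, "project");
            // Namespace declarations are attributes in the reserved xmlns namespace.
            project.setAttributeNS(XMLNS_NS, "xmlns", QTI_NS);
            project.setAttributeNS(XMLNS_NS, "xmlns:acme", ACME_NS);
            // Ordinary attributes need no namespace.
            project.setAttribute("color", "blue");
            doc.appendChild(project);
            return project;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element project = buildProject();
        System.out.println(project.getAttribute("xmlns:acme"));
        System.out.println(project.getAttribute("color"));
    }
}
```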
I do not think below code will serve the question!
myDocument.createElementNS("http://www.imsglobal.org/xsd/ims_qtiasiv1p2","project");
This will create an element as below (using DOM)
<http://www.imsglobal.org/xsd/ims_qtiasiv1p2:project>
So this will not add a namespace attribute to the element. Using DOM we can do something like
Element request = doc.createElement("project");
Attr attr = doc.createAttribute("xmlns");
attr.setValue("http://www.imsglobal.org/xsd/ims_qtiasiv1p2");
request.setAttributeNode(attr);
So it will set the first attribute like below, you can set multiple attributes in the same way
<project xmlns="http://www.imsglobal.org/xsd/ims_qtiasiv1p2">
The short answer is: you do not create xmlns attributes yourself. The Java XML class library automatically creates those. By default, it will auto-create namespace mappings and will choose prefixes based on some internal algorithm.
If you don't like the default prefixes assigned by the Java XML serializer, you can control them by creating your own namespace resolver, as explained in this article:
https://www.intertech.com/Blog/jaxb-tutorial-customized-namespace-prefixes-example-using-namespaceprefixmapper/
You can simply specify the namespace when you create the elements. For example:
myDocument.createElementNS("http://www.imsglobal.org/xsd/ims_qtiasiv1p2","project");
Then the java DOM libraries will handle your namespace declarations for you.
The only way that worked for me, in 2019, was using the attr() method:
Element element = doc.createElement("project");
element.attr("xmlns","http://www.imsglobal.org/xsd/ims_qtiasiv1p2");
