XML UnMarshalling based on child nodes

I recently got an old XML-over-HTTP API. It has a few response types, and none of the responses carries a namespace or a type attribute. They all have the same root element and then a different set of child elements.
Is there a way in Java to unmarshal such XML? It would amount to using the child elements as discriminator fields. Two sample responses are given below.
<Response>
<A1/>
<A2/>
</Response>
<Response>
<B1/>
<B2/>
</Response>

The best approach really depends on what you want to do. If you just want to unmarshal the data, you could define a model (using JAXB, for instance) that includes all the potential child elements. Then, when you unmarshal an instance document, only the child elements actually present in that document would have values.
If you instead want separate models for the different response variations, your best approach would be to wrap the input in a BufferedInputStream and call mark() at the start, then read just enough of the document with a pull parser such as XMLStreamReader to determine the actual response type. You can then reset() the stream to the start of the document and start over, unmarshalling it with JAXB and the appropriate data model.
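A minimal sketch of the mark/reset idea, using only the JDK's StAX API. It assumes the first child of <Response> is enough to tell the response types apart and that everything up to that child fits within the hypothetical READ_LIMIT; the JAXB step is left as a comment because the response classes are not part of the question.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

final class ResponseSniffer {

    // Hypothetical limit: how many bytes may be read before reset() must still work.
    // It has to cover everything up to (and a little past) the first child element.
    static final int READ_LIMIT = 64 * 1024;

    /** Returns the local name of the root's first child element, or null if the root is empty. */
    static String firstChildName(InputStream in) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            int depth = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    depth++;
                    if (depth == 2) {
                        return reader.getLocalName(); // e.g. "A1" or "B1"
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    depth--;
                }
            }
            return null;
        } finally {
            reader.close(); // frees the reader but leaves the underlying stream open
        }
    }

    /** Peeks at the response type, then rewinds the stream to the start of the document. */
    static String detectType(BufferedInputStream in) throws Exception {
        in.mark(READ_LIMIT);
        String discriminator = firstChildName(in);
        in.reset();
        return discriminator;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Response><B1/><B2/></Response>";
        BufferedInputStream in = new BufferedInputStream(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        String type = detectType(in);
        System.out.println("response type: " + type); // response type: B1
        // Here one would choose the matching JAXB model (e.g. a hypothetical ResponseB class)
        // and unmarshal from `in`, which is positioned at the start of the document again.
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Peeking only at the first child keeps the read short, so the mark is unlikely to be invalidated; if two response types shared the same first child, you would have to read further before deciding.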

Related

Parse xml with same properties but different parent elements using JAXB

I'm using JAXB to parse some XML provided by a separate service. They provide this XML as different elements, but I treat them the same, e.g.:
<ElementA>
<name>foo</name>
<type>bar</type>
</ElementA>
<ElementB>
<name>foo1</name>
<type>bar1</type>
</ElementB>
This is how it appears in the data, but when I parse/unmarshal it with JAXB I want it to know that ElementA and ElementB are both just instances of the Element class and unmarshal them as such. Currently I am doing this by using the XmlRegistry to say that ElementA is an Element and ElementB is an Element.
However, at some point we might receive ElementC and ElementD, so I would like a way for JAXB to know those are also just Element instances without having to add an entry to the XmlRegistry every time.

Querying XSD-valid XML for the original XML schema

Given a schema document (XSD format) such as the MODS 3.5 schema (US Library of Congress, LoC), and a document (XML) known to be valid according to that schema, such as the metadata for the Antitrust & Competition Policy Blog archives 2007 (HTML view) from the LoC Law Blawgs Web Archive, is there a Java API that would allow a Java program to query the XML document for the XML Schema data types that the document's elements are instances of?
It may seem as though I have XML schemas and UML models confused. I'm thinking of an XML schema as representing something like a UML model (M1), and of an XML document as something like user data (M0) representing instances of UML model elements. If it is similarly possible to query an XML element to determine the XML Schema data type or element definition that it derives from or conforms to in the parse tree, I think that could make a nice feature for a sequencer for ModeShape.
I think the idea is essentially this: in a ModeShape JCR repository, each JCR node representing an element of a sequenced XML document could reference a JCR node representing an XML Schema data type, where the type's node would have been created when the schema used by the document was sequenced, for example by the ModeShape XSD sequencer.
I'm just not certain whether there is a Java API for determining which XML Schema element definition a valid XML document element conforms to in the parse tree, when the document is validated against an XML schema. My impression is that such a computation is possible; I simply wonder whether there is already an API for it.
Alternately, there is UML...

The answer is yes.
In terms of standards, validating an XML document against a schema produces a PSVI (post-schema-validation infoset), which decorates nodes in the parse tree with information about the types they were validated against.
In terms of concrete implementation, if you use the JAXP Validation API you can either generate a DOM augmented with TypeInfo that tells you the type of each node, or use a SAX-based validation pipeline in which type information is made available through a TypeInfoProvider.
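The SAX-based pipeline can be sketched with the JDK's built-in W3C XML Schema support: a ValidatorHandler sits between a namespace-aware XMLReader and your own handler, and its TypeInfoProvider reports the type of the current element. The schema, the sample document and the SaxTypeInfoDemo class are made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.TypeInfoProvider;
import javax.xml.validation.ValidatorHandler;
import org.w3c.dom.TypeInfo;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

final class SaxTypeInfoDemo {

    // Hypothetical schema: <person> containing a string <name> and an int <age>.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='person'><xs:complexType><xs:sequence>"
            + "<xs:element name='name' type='xs:string'/>"
            + "<xs:element name='age' type='xs:int'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    /** Maps each element's local name to the name of the schema type it is validated against. */
    static Map<String, String> elementTypes(String xsd, String xml) throws Exception {
        Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd)));
        ValidatorHandler validator = schema.newValidatorHandler();
        TypeInfoProvider provider = validator.getTypeInfoProvider();
        Map<String, String> types = new LinkedHashMap<>();

        // Downstream handler: receives the SAX events after they pass through the validator.
        validator.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                // getElementTypeInfo() may only be called from startElement/endElement here.
                TypeInfo info = provider.getElementTypeInfo();
                types.put(localName, info == null ? null : info.getTypeName());
            }
        });

        // Upstream: a namespace-aware SAX parser feeding events into the validator.
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();
        reader.setContentHandler(validator);
        reader.parse(new InputSource(new StringReader(xml)));
        return types;
    }

    public static void main(String[] args) throws Exception {
        Map<String, String> types =
                elementTypes(XSD, "<person><name>Ann</name><age>42</age></person>");
        System.out.println("name -> " + types.get("name") + ", age -> " + types.get("age"));
    }
}
```

Because no ErrorHandler is set on the ValidatorHandler, a validation error aborts the parse with an exception; an application that wants type information for invalid documents as well would install its own ErrorHandler that records errors instead of throwing.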
You can also do this using schema-aware XSLT and XQuery; after a validation operation, nodes are augmented with a "type annotation", which you can interrogate using the "instance of" test. If you use Saxon, you can use the extension functions saxon:type() or saxon:type-annotation() to explore further:
http://www.saxonica.com/documentation/#!functions/saxon/type
http://www.saxonica.com/documentation/#!functions/saxon/type-annotation
A limitation of the XSLT/XQuery approach is that it only works if validation succeeds. The DOM/SAX interfaces also provide information in cases where validation fails.

XML formatted Java string manipulation

I'm developing a plugin that has node (computer) objects with attributes like:
String name
String description
String labels
Launcher computerLauncher
...
I can convert the node (computer) object to an XML-formatted String like this:
String xml = jenkins.instance.toXML(node);
Which gives me a string:
<name>Computer1</name>
<description>This is a description</description>
<labels>label1 label2</labels>
<launcher>windows.object.launcher.12da1</launcher>
And I can go back the other way:
Node node = jenkins.instance.fromXML(xml);
I have no methods for changing attributes of a Node, so I want to convert it to XML, change some attributes and then make it a Node again.
I see two options:
Manipulate the XML with String methods to replace the text between the tags.
Try to convert the XML string into something like a real object and manipulate it that way.
I'm not sure which would be the best approach.

Why invent something new when there already is support for all that using Java's DOM (Document Object Model) API?
Use a DocumentBuilderFactory to get a DocumentBuilder and create a Document instance. With it you can create the 'Node' objects in your toXML method (note that the example you posted is actually not well-formed XML: it's missing a root element). Serializing the Document to a String can be done with a Transformer.
With the DOM API you can also modify the attributes of your existing elements.
Parsing a Document instance from an XML string is again done with the DocumentBuilder, using DocumentBuilder#parse.
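A rough sketch of that parse, modify, serialize round trip using the JDK's DOM and Transformer APIs. The <node> root element is an assumption added only to make the question's fragment well-formed, and replaceText is a hypothetical helper; note that it edits element text, since the "attributes" in the question's XML are really child elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class DomEditDemo {

    /** Parses xml, replaces the text of the first element named tag, and serializes the result. */
    static String replaceText(String xml, String tag, String newText) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        Element el = (Element) doc.getElementsByTagName(tag).item(0);
        if (el == null) {
            throw new IllegalArgumentException("no <" + tag + "> element");
        }
        el.setTextContent(newText); // replaces all existing children with a single text node

        // Identity transform: writes the DOM back out as text.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // The question's fragment, wrapped in a hypothetical <node> root so it is well-formed.
        String xml = "<node>"
                + "<name>Computer1</name>"
                + "<description>This is a description</description>"
                + "<labels>label1 label2</labels>"
                + "</node>";
        System.out.println(replaceText(xml, "description", "Updated description"));
    }
}
```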
If your DOM operations are not too complex this should be a nice, quick way to accomplish your goal.

It makes sense to me to use a DOM-like approach. But don't use DOM itself: there are much better alternatives like JDOM and XOM that have much friendlier APIs.

Recently I had to manipulate large XML files (my software had to create some XML files dynamically and get input data from other XML files). To do this I used JAXB, a very neat API that automatically unmarshals XML files into Java objects and marshals Java objects into XML files.
However, to do this I had to create an XSD file describing the XML documents I needed to read and write.
So JAXB requires more setup work than DOM: if your needs are simple I suggest DOM, but if they are more complex I would suggest JAXB.

XML without root element in JAXB

I was wondering whether there is a way to create an object such that a list of such objects does not need a root element. For example, if I wanted to create XML like
<Dogs>
<Dog>A</Dog>
<Dog>B</Dog>
<Dog>C</Dog>
</Dogs>
I could have a class Dogs that is the root element and has a List<Dog>. Now suppose I want to get rid of the enclosing <Dogs> element, so that the list of dogs would look like
<Dog>A</Dog>
<Dog>B</Dog>
<Dog>C</Dog>
How should I construct my classes?

In XML this is not possible. The specification at http://www.w3.org/TR/xml/#NT-document clearly says that a document has exactly one root element.
Your second XML-like snippet is therefore not an XML document but a concatenation of three XML documents, and a conforming XML parser will reject it as not well-formed.
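This is easy to check with the JDK's DOM parser. The sketch below, assuming a plain DocumentBuilder, shows the rootless input being rejected and then a common workaround: wrapping the sequence in a synthetic root element (called wrapper here, an arbitrary name) before parsing. The wrapping trick only works if the fragment has no XML declaration or DOCTYPE of its own.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class RootlessDemo {

    static Document parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // DefaultHandler rethrows fatal errors instead of printing them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    /** True if xml parses as a single well-formed document. */
    static boolean isWellFormed(String xml) throws Exception {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false; // e.g. markup after the root element
        }
    }

    /** Workaround: wrap the sequence of elements in a synthetic root before parsing. */
    static int countDogs(String fragment) throws Exception {
        Document doc = parse("<wrapper>" + fragment + "</wrapper>");
        return doc.getElementsByTagName("Dog").getLength();
    }

    public static void main(String[] args) throws Exception {
        String dogs = "<Dog>A</Dog><Dog>B</Dog><Dog>C</Dog>";
        System.out.println(isWellFormed(dogs)); // false
        System.out.println(countDogs(dogs));    // 3
    }
}
```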

How can I get the xml Node type based on schema definition in Java?

Let's say I have a doc.xml and a corresponding doc.xsd. I use XPath to retrieve some nodes, so I get a list of org.w3c.dom.Node. How can I get the type of each node from the schema, e.g. xs:integer, xs:string, etc.?
One solution would be to query the schema with the XPath expression "//NodeName[@type]", using node.getNodeName() as NodeName, but that's not perfect. I can't be sure the schema is well designed: what if NodeName appears in many places in the schema and has not been extracted as a separate type?
So in general I am looking for a reliable way to get the node type for ANY valid XML and XSD.

You should consider using JAXB. It will generate Java classes for you based on the schema types. Your XML documents are then read into those classes, which are typed according to how you defined your XSD; for example, xsd:int maps to Java int (or to the Integer wrapper class when the element is optional or nillable), and so on.

Cast your DOM Elements to TypeInfo: from there, you can access the type information you're looking for.

Unfortunately, types as defined in an XML Schema (XSD) or Document Type Definition (DTD) are not directly tied to the XML documents they validate. The elements and attributes in an XML document do not inherently have a type; they are just text. Think of an XSD as a script that validates an XML document rather than as a set of type annotations for elements and attributes.
The XML specification does not define types as you are thinking of them here. Even Document Type Definitions (DTDs), which can be embedded inside XML documents, are more about the structure of the document than about the type of the data contained in elements and attributes.
The type system described in XML Schema is an optional layer of validation that can be applied to XML documents. Since this validation is optional, the standard XML APIs do not provide a way to bind the validation rules in an XSD to the actual attributes and elements.
I think it would be possible for an XML API to provide a mechanism to bind an XSD to a specific XML document, but I am not aware of an XML parser that does this. One reason this is not so easy is that the type system defined in XML Schema is much richer than what most mainstream programming languages support. In your example you may only be interested in xs:integer, xs:string and the like, but in XML Schema you can create types that specify ranges, patterns and other constraints that are just not possible with data types in most programming languages. Representing this type system in Java or any other programming language would require a fairly complex API. The question then becomes: is it really worth it? I would say probably not.

As per David D's answer, but slightly cleaner: call getSchemaTypeInfo() on an element or attribute.
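A small sketch of that approach, assuming the JDK's built-in parser and a made-up schema: validation happens during parsing via DocumentBuilderFactory.setSchema, after which the nodes selected with XPath (as in the question) carry their schema type, which getSchemaTypeInfo() exposes. Because the type comes from the actual validation, it does not matter where or how the declaration is written in the schema.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.TypeInfo;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class DomTypeInfoDemo {

    // Hypothetical schema used only for this sketch.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
            + "<xs:element name='person'><xs:complexType><xs:sequence>"
            + "<xs:element name='name' type='xs:string'/>"
            + "<xs:element name='age' type='xs:int'/>"
            + "</xs:sequence></xs:complexType></xs:element>"
            + "</xs:schema>";

    /** Validates xml against xsd while parsing, then maps each node selected by xpath to its type name. */
    static Map<String, String> typesOf(String xsd, String xml, String xpath) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // the JDK parser records schema types on namespace-aware nodes
        dbf.setSchema(SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(xsd))));
        DocumentBuilder builder = dbf.newDocumentBuilder();
        builder.setErrorHandler(new DefaultHandler() {
            @Override
            public void error(SAXParseException e) throws SAXException {
                throw e; // treat validation errors as failures instead of ignoring them
            }
        });
        Document doc = builder.parse(new InputSource(new StringReader(xml)));

        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
        Map<String, String> types = new LinkedHashMap<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            TypeInfo info = null;
            if (n instanceof Element) {
                info = ((Element) n).getSchemaTypeInfo();
            } else if (n instanceof Attr) {
                info = ((Attr) n).getSchemaTypeInfo();
            }
            types.put(n.getNodeName(), info == null ? null : info.getTypeName());
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<person><name>Ann</name><age>42</age></person>";
        System.out.println(typesOf(XSD, xml, "/person/*"));
    }
}
```

TypeInfo.getTypeNamespace() on the same object should give the type's namespace (http://www.w3.org/2001/XMLSchema for built-in types like xs:int). For elements whose type is declared anonymously inline, the type name may be null or implementation-specific, so code should not rely on it.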
