to split a String from an XML file - java

I'm reading an XML file that contains the attributes of many objects of a class, using a DOM XML parser. The class also has an array field (NEARBOXID), so I would like to know whether it is reasonable to read it from the XML file as a single String and then split its content, or whether there is a better way to do this.
The file looks like this:
<CONFIGURATION>
<CONFIG>
<BOXID>1</BOXID>
<LENGTH>100</LENGTH>
<NEARBOXID>2,3,4,5</NEARBOXID>
</CONFIG>
<CONFIG>
<BOXID>2</BOXID>
<LENGTH>200</LENGTH>
<NEARBOXID>1,8</NEARBOXID>
</CONFIG>
</CONFIGURATION>

You should read it as a string and split it. Then, in a loop, convert each piece to an integer using Integer.parseInt.
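A minimal sketch of that split-and-convert step, assuming the element text has already been read into a String (the class and method names here are made up for illustration):

```java
import java.util.Arrays;

public class NearBoxIdParser {
    // Splits a comma-separated list such as "2,3,4,5" into an int array.
    // Whitespace around the commas is tolerated; an empty string yields an empty array.
    public static int[] parse(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] parts = trimmed.split("\\s*,\\s*");
        int[] ids = new int[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Throws NumberFormatException if a piece is not a valid integer
            ids[i] = Integer.parseInt(parts[i]);
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("2,3,4,5"))); // prints [2, 3, 4, 5]
    }
}
```

With the DOM parser from the question, the input would typically come from getTextContent() on the NEARBOXID node.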

No, it's up to you to split that field. The String.split method will do it just fine.

Extract the complete text of the required tag using XPath, then use String.split() to get the individual values out of that string.
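As a rough sketch of the XPath route, using only the javax.xml.xpath API that ships with the JDK: the class name, the method name and the way the box is selected by its BOXID are assumptions for illustration, and the boxId argument is assumed to be a trusted numeric string.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class XPathNearBoxIds {
    // Returns the NEARBOXID values of the CONFIG whose BOXID equals boxId.
    public static String[] nearBoxIds(String xml, String boxId) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expression = "/CONFIGURATION/CONFIG[BOXID='" + boxId + "']/NEARBOXID";
        try {
            // evaluate(...) returns the text content of the first matching node
            String value = xpath.evaluate(expression, new InputSource(new StringReader(xml)));
            return value.trim().split("\\s*,\\s*");
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<CONFIGURATION>"
                + "<CONFIG><BOXID>1</BOXID><LENGTH>100</LENGTH><NEARBOXID>2,3,4,5</NEARBOXID></CONFIG>"
                + "<CONFIG><BOXID>2</BOXID><LENGTH>200</LENGTH><NEARBOXID>1,8</NEARBOXID></CONFIG>"
                + "</CONFIGURATION>";
        System.out.println(String.join(" ", nearBoxIds(xml, "2"))); // prints 1 8
    }
}
```

Note that when nothing matches, evaluate returns an empty string, so the result is an array holding one empty string; real code would probably want to check for that.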

Since you are converting your XML to Java objects, I will demonstrate how this could be done using a JAXB (JSR-222) implementation. A JAXB implementation is included in the JDK/JRE from Java SE 6 through Java SE 10; from Java SE 11 onward it has to be added as a separate dependency.
I would recommend changing the contents of the NEARBOXID element to be space separated.
<NEARBOXID>2 3 4 5</NEARBOXID>
This corresponds to the following entry in an XML schema, which means you could validate that the element contains space-separated int values rather than arbitrary space-separated strings.
<xs:element name="NEARBOXID" minOccurs="0">
<xs:simpleType>
<xs:list itemType="xs:int"/>
</xs:simpleType>
</xs:element>
Config
Then you could map the element using JAXB's @XmlList annotation (see: http://blog.bdoughan.com/2010/09/jaxb-collection-properties.html).
import javax.xml.bind.annotation.*;
@XmlAccessorType(XmlAccessType.FIELD)
public class Config {
    @XmlElement(name="BOXID")
    private int boxId;
    @XmlElement(name="LENGTH")
    private int length;
    @XmlElement(name="NEARBOXID")
    @XmlList
    private int[] nearBoxIds;
}
Configuration
The object below would map to the root of your XML document.
import java.util.List;
import javax.xml.bind.annotation.*;
@XmlRootElement(name="CONFIGURATION")
@XmlAccessorType(XmlAccessType.FIELD)
public class Configuration {
    @XmlElement(name="CONFIG")
    private List<Config> configs;
}
Demo
Below is some demo code to prove that everything works.
import java.io.File;
import javax.xml.bind.*;
public class Demo {
    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Configuration.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        File xml = new File("src/forum14305301/input.xml");
        Configuration configuration = (Configuration) unmarshaller.unmarshal(xml);
        Marshaller marshaller = jc.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
        marshaller.marshal(configuration, System.out);
    }
}
input.xml/Output
Below is the input to and output from running the demo code.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<CONFIGURATION>
<CONFIG>
<BOXID>1</BOXID>
<LENGTH>100</LENGTH>
<NEARBOXID>2 3 4 5</NEARBOXID>
</CONFIG>
<CONFIG>
<BOXID>2</BOXID>
<LENGTH>200</LENGTH>
<NEARBOXID>1 8</NEARBOXID>
</CONFIG>
</CONFIGURATION>

I think it is a matter of choice. You can write it this way:
<NEARBOXID>2,3,4,5</NEARBOXID>
or this way:
<NEARBOXID>2</NEARBOXID>
<NEARBOXID>3</NEARBOXID>
<NEARBOXID>4</NEARBOXID>
<NEARBOXID>5</NEARBOXID>
The second way requires different parsing code than the first, but it saves you the code for splitting and handling the string.
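A minimal sketch of reading the repeated-element layout with the DOM API, assuming one CONFIG element has been isolated as a String (the class and method names are hypothetical):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class RepeatedNearBoxIds {
    // Collects the value of every NEARBOXID element below the root element.
    public static List<Integer> read(String configXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getDocumentElement().getElementsByTagName("NEARBOXID");
            List<Integer> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // One element per value, so no splitting is needed
                ids.add(Integer.parseInt(nodes.item(i).getTextContent().trim()));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse CONFIG element", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<CONFIG><BOXID>1</BOXID><LENGTH>100</LENGTH>"
                + "<NEARBOXID>2</NEARBOXID><NEARBOXID>3</NEARBOXID>"
                + "<NEARBOXID>4</NEARBOXID><NEARBOXID>5</NEARBOXID></CONFIG>";
        System.out.println(read(xml)); // prints [2, 3, 4, 5]
    }
}
```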

Extract the data as a String and call split() on it.
A better approach is to keep each value in its own tag, like this:
<NEARBOXID>2</NEARBOXID>
<NEARBOXID>3</NEARBOXID>
<NEARBOXID>4</NEARBOXID>
<NEARBOXID>5</NEARBOXID>
To query the XML you can use XPath, XQuery or JAXB, depending on your requirements.

Related

JAXB java class generation from schema: how to get a custom XML element name (keep class name)

I have an XSD with an element - it is the XMLRootElement if that makes a difference - like this:
<xsd:element name= "SomeElement">
I need to have the generated Java class have a custom XML element name while keeping the default Java class name, so the generated class needs to look like this:
@XmlRootElement(name = "fo:SomeElement")
public class SomeElement
So that marshal/unmarshalled xml elements will show as
<fo:SomeElement>
Can someone help me out with what I need to change to either the XSD file or the binding file?
First of all, with your question you opened a big can of worms.
Things are more complicated than you thought they would be.
To fully understand the rest of this answer
you will surely need to learn more about the namespace concept in XML,
for example at w3schools.com - XML Namespaces.
Having said that, the following stuff should give a quick entry into that topic.
Note that fo:SomeElement is not directly an XML element name.
The fo: is a so-called namespace-prefix.
The namespace-prefix needs to be mapped to a namespace-URI by xmlns:fo="...".
By convention fo: is the namespace-prefix used for XSL Formatting Objects.
Therefore most probably your XML file will look like this:
<fo:SomeElement xmlns:fo="http://www.w3.org/1999/XSL/Format" ...>
...
</fo:SomeElement>
Note that "http://www.w3.org/1999/XSL/Format" is the namespace-URI
as specified in the XSL Formatting Objects specification.
Note also, that namespace-prefixes (here fo) by themselves are irrelevant
and were only invented to make the XML content easier to read for humans.
So instead of fo you might as well have used bla in all places as the namespace-prefix,
and the XML content still would have the exact same meaning.
The only relevant things are the namespace-URIs (here "http://www.w3.org/1999/XSL/Format").
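To make the point about prefixes concrete, here is a small sketch using the JDK's namespace-aware DOM parser. The class and method names are invented for illustration; the two documents differ only in their prefix.

```java
import java.io.StringReader;
import java.util.Objects;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class PrefixDemo {
    // Parses the XML with namespace awareness switched on and returns the root element.
    public static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Two elements have the same name if namespace-URI and local name match; the prefix is ignored.
    public static boolean sameName(Element a, Element b) {
        return Objects.equals(a.getNamespaceURI(), b.getNamespaceURI())
                && Objects.equals(a.getLocalName(), b.getLocalName());
    }

    public static void main(String[] args) {
        Element fo = root("<fo:SomeElement xmlns:fo=\"http://www.w3.org/1999/XSL/Format\"/>");
        Element bla = root("<bla:SomeElement xmlns:bla=\"http://www.w3.org/1999/XSL/Format\"/>");
        System.out.println(sameName(fo, bla));                    // prints true
        System.out.println(fo.getPrefix() + " vs " + bla.getPrefix()); // prints fo vs bla
    }
}
```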
With JAXB the correct Java root class would look like this.
Note the namespace given in the @XmlRootElement annotation.
@XmlRootElement(name="SomeElement", namespace="http://www.w3.org/1999/XSL/Format")
public class SomeElement {
...
}
You would need to specify this namespace-URI not only in @XmlRootElement,
but also for nested Java properties corresponding to any <fo:something> XML content.
For this purpose most JAXB annotations (@XmlElement, @XmlAttribute, ...)
can accept a namespace parameter as well.
The XML schema definition (XSD) consistent with the XML example and
the Java class above would look like this.
Note the targetNamespace given in the <xs:schema> element.
<xs:schema version="1.0" targetNamespace="http://www.w3.org/1999/XSL/Format"
xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="SomeElement">
...
</xs:element>
</xs:schema>

JAXB marshalling dateTime to blank value when time not specified

I've got an XMLGregorianCalendar object that I'm trying to marshal into an xml string. I received this object by unmarshalling another xml object. Both are of type "dateTime", so they should be exactly the same...
And yet, when I marshal it, it shows up blank in the xml.
To illustrate this issue, I stripped everything down to the bare bones and made it generic in this example here. 2 java files, copy, paste, run as-is. The output you should receive would be:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<TheObject>
<DOB>2016-09-16</DOB>
</TheObject>
But, alas, it returns:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<TheObject>
<DOB></DOB>
</TheObject>
Note: in the pastebin example, I create a xmlGregorianCalendar on the fly rather than grab one from another object like the code below, so technically it's not the same thing, but I think ultimately it illustrates the exact same issue... correct me if I'm wrong...
To add more context to my specific issue:
//Here are the objects themselves (names changed to protect the innocent)
//complete with annotations...
public class Object1{
    ...
    @XmlElement(name = "DOB")
    @XmlSchemaType(name = "dateTime")
    protected XMLGregorianCalendar dob;
    ...
}
public class Object2{
    ...
    @XmlElement(name = "DOB")
    @XmlSchemaType(name = "dateTime")
    protected XMLGregorianCalendar dob;
    ...
}
//and here's the snippet where the date object(date of birth) gets set
//from one object to another.
object2.setDOB(object1.getDOB());
//and finally, marshalling it to an xml string
private String marshallTheObject(Object2 theObject) throws JAXBException{
    JAXBContext jaxbContext = JAXBContext.newInstance(Object2.class);
    Marshaller jaxbMarshaller = jaxbContext.createMarshaller();
    jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
    StringWriter sw = new StringWriter();
    jaxbMarshaller.marshal(theObject, sw);
    String output = sw.toString();
    return output;
}
//the xml output shows: <DOB></DOB> instead of the date
I'm using the jaxb version that java 8 comes bundled with...
So my question is: is this some kind of bug? And if not, what am I doing wrong? How can I get around this issue without having to modify the generated java code? I cannot edit the xsd file used to generate it either...
Edit: for reference, the xsd file lists the DOB as:
<xs:element name="DOB" type="xs:dateTime" />
While I wasn't allowed to modify the xsd file directly, what I didn't realize was that I could change how the classes themselves were generated through a bindings file. I wasn't using maven in this case, so it was a bit difficult to determine the solution. I was generating the classes by using a feature built into eclipse that generates jaxb classes from an xsd file.
I learned more about bindings files here
I'm not exactly sure where that file needs to be placed in respect to maven, but in doing it the eclipse way it doesn't matter - you get to specify where the bindings file is during the jaxb class generation wizard.
Once generated, you'll need to code your own xml adapter.
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import javax.xml.bind.annotation.adapters.XmlAdapter;
public class CalendarDateTimeAdapter extends XmlAdapter<String, Date> {
    //Sadly my specific situation requires me to strip the time off of all dateTime objects
    //It's bad, but I didn't get to design the system, so this is the best compromise...
    private final SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
    @Override public Date unmarshal(String value) throws ParseException {
        synchronized (sdf){
            return sdf.parse(value);
        }
    }
    @Override public String marshal(Date value) {
        if(value == null) { return null; }
        synchronized(sdf){
            return sdf.format(value);
        }
    }
}
Make sure your classes match what you specified in your bindings file...
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE jaxb:bindings>
<jaxb:bindings version="2.1"
xmlns:jaxb="http://java.sun.com/xml/ns/jaxb"
xmlns:xs="http://www.w3.org/2001/XMLSchema"
xmlns:xjc="http://java.sun.com/xml/ns/jaxb/xjc">
<!-- Prevent wrapping data types into JAXBElements. -->
<jaxb:globalBindings generateElementProperty="false">
<!-- Use java.util.Date instead of XMLGregorianCalendar. -->
<xjc:javaType name="java.util.Date" xmlType="xs:dateTime"
adapter="com.package.location.adapter.CalendarDateTimeAdapter"/>
<xjc:javaType name="java.util.Date" xmlType="xs:date"
adapter="com.package.location.adapter.CalendarDateAdapter"/>
<xjc:javaType name="java.util.Date" xmlType="xs:time"
adapter="com.package.location.adapter.CalendarTimeAdapter"/>
</jaxb:globalBindings>
</jaxb:bindings>
Then, when doing your marshalling, use the setAdapter function to use it.
private String marshallObject(MyObject myObject) throws JAXBException{
    JAXBContext jaxbContext = JAXBContext.newInstance(MyObject.class);
    Marshaller jaxbMarshaller = jaxbContext.createMarshaller();
    jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
    jaxbMarshaller.setAdapter(new CalendarDateTimeAdapter());
    StringWriter sw = new StringWriter();
    jaxbMarshaller.marshal(myObject, sw);
    String output = sw.toString();
    return output;
}
Now they'll resolve to Date objects instead of XMLGregorianCalendar objects. And apparently, Date objects work better than XMLGregorianCalendars...
I'm still perturbed that the unmarshalled xml that went in doesn't marshal to become the same xml going out, but at any rate, this is precisely what I did to make it all start working. I'm sure I've done something against convention here, and if I have, please let me know.
Again I remind readers that I'm not using maven, nor any kind of framework (e.g. SpringMVC).
JAXB doesn't like that you're setting the time to DatatypeConstants.FIELD_UNDEFINED for a dateTime. If you comment out that line:
// calendar.setTime(DatatypeConstants.FIELD_UNDEFINED, DatatypeConstants.FIELD_UNDEFINED, DatatypeConstants.FIELD_UNDEFINED);
it will marshal something like:
<DOB>2016-09-16T00:00:00.000</DOB>
or you can change the generated annotation (or use customized bindings) to change dateTime to date, like:
@XmlSchemaType(name = "date")
protected XMLGregorianCalendar dob;
if you want to see:
<DOB>2016-09-16</DOB>
If you want to go deeper, something like this may be a starting point, or maybe that's all you need.
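The difference between the two calendars can be inspected without JAXB, using only javax.xml.datatype from the JDK. This is a sketch with made-up class and method names; it only shows how the calendar itself classifies its value, not what any particular JAXB implementation does with it.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

public class DobSchemaTypeDemo {
    private static DatatypeFactory factory() {
        try {
            return DatatypeFactory.newInstance();
        } catch (DatatypeConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Date with all time fields left undefined, as in the failing case.
    public static XMLGregorianCalendar dateOnly() {
        return factory().newXMLGregorianCalendarDate(2016, 9, 16, DatatypeConstants.FIELD_UNDEFINED);
    }

    // Same date with the time fields explicitly set to midnight.
    public static XMLGregorianCalendar withMidnight() {
        return factory().newXMLGregorianCalendar(2016, 9, 16, 0, 0, 0, 0, DatatypeConstants.FIELD_UNDEFINED);
    }

    public static void main(String[] args) {
        // The calendar reports its own schema type based on which fields are set
        System.out.println(dateOnly().getXMLSchemaType().getLocalPart());     // prints date
        System.out.println(dateOnly().toXMLFormat());                         // prints 2016-09-16
        System.out.println(withMidnight().getXMLSchemaType().getLocalPart()); // prints dateTime
        System.out.println(withMidnight().toXMLFormat());
    }
}
```

A calendar without time fields describes itself as xs:date, which does not match a property declared with @XmlSchemaType(name = "dateTime"). That mismatch is consistent with the explanation in the answer above.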

Annotation of XML Schema in Java

I'm working on an xml standard that requires that the following root element must be defined:
<ClinicalDocument xsi:schemaLocation="urn:hl7-org:v3 CDA.xsd" xmlns="urn:hl7-org:v3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
Now, I'm using javax.xml.bind. Usually I annotate each class and then use Marshallers and Unmarshallers to write and read valid XML files.
My idea was to annotate package-info.java to specify the xsi:schemaLocation, xmlns and xmlns:xsi attributes of ClinicalDocument. However, I can only produce the last one (xmlns:xsi); I have no idea how to render the first, and the second is rendered as xmlns:ns3.
Here is my code in package-info.java:
@javax.xml.bind.annotation.XmlSchema (
    xmlns = {
        @javax.xml.bind.annotation.XmlNs(prefix="",
            namespaceURI="urn:hl7-org:v3"),
        @javax.xml.bind.annotation.XmlNs(prefix="xsi",
            namespaceURI="http://www.w3.org/2001/XMLSchema-instance")
    }
)
package foo;
Here is my class ClinicalDocument.java in package foo:
package foo;
@XmlRootElement(name="ClinicalDocument")
public class ClinicalDocument {....}
And finally, here is what I get from the Marshaller:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ClinicalDocument xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns3="urn:hl7-org:v3">
...
</ClinicalDocument>
So, I have to create and read valid XML files that carry the three attributes shown above. Any ideas?
The only working solution that I found is to add:
@XmlAttribute(name="xsi:schemaLocation")
protected final String xsi_schemaLocation="urn:hl7-org:v3 CDA.xsd";
@XmlAttribute(name="xmlns")
protected final String xmlns="urn:hl7-org:v3";
@XmlAttribute(name="xmlns:xsi")
protected final String xmlns_xsi="http://www.w3.org/2001/XMLSchema-instance";
in class ClinicalDocument.
It works, but I don't like it! I would like to use annotation at package level.
Supporting the annotations is only the beginning of the requirements for reading and writing CDA documents. I would recommend using MDHT, an open source project with an API to create, consume and validate CDA documents.
You can find the project here
https://www.projects.openhealthtools.org/sf/projects/mdht/

Jettison or Kryo

I currently use JAXB in a project I'm working on, and I'm looking to convert my library's archived XML to archived JSON for use in that project. I figured I would use Jettison, since it seems easier to implement because it works with JAXB; however, looking at older benchmarks in which Jettison was not included, I found that Kryo produces smaller files and serializes and deserializes more quickly than some alternatives.
Can anyone tell me the key differences, or otherwise how Jettison stacks up against Kryo, especially for future projects such as Android applications?
EDIT:
I guess I'm looking for whichever produces smaller files and operates faster. Human readability can be sacrificed, since I don't plan on reading the files, only processing them.
They are for somewhat different purposes:
Jettison is for reading/writing JSON. Use it if you need to interoperate with a JSON (human-readable) data format.
Kryo is for efficient binary serialisation. Use it if you need high performance and small encoded object sizes (e.g. communication of messages in a realtime game).
Since it sounds like you are using the format to archive data, human-readability and the use of a standard long-lived format is probably more important than efficiency, so I suspect you will want to choose the JSON route.
Note: I'm the EclipseLink JAXB (MOXy) lead and a member of the JAXB (JSR-222) expert group.
Since you already have established JAXB mappings and are converting XML to JSON, you may be interested in EclipseLink JAXB (MOXy) which offers both object-to-XML and object-to-JSON mapping using the same JAXB metadata.
Customer
Below is a sample model with JAXB annotations.
package forum11599191;
import java.util.List;
import javax.xml.bind.annotation.*;
@XmlRootElement
@XmlAccessorType(XmlAccessType.FIELD)
public class Customer {
    @XmlAttribute
    private int id;
    private String firstName;
    @XmlElement(nillable=true)
    private String lastName;
    private List<String> email;
}
jaxb.properties
To use MOXy as your JAXB provider you need to include a file called jaxb.properties in the same package as your domain model with the following entry (see: http://blog.bdoughan.com/2011/05/specifying-eclipselink-moxy-as-your.html).
javax.xml.bind.context.factory=org.eclipse.persistence.jaxb.JAXBContextFactory
input.xml
<?xml version="1.0" encoding="UTF-8"?>
<customer id="123" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<firstName>Jane</firstName>
<lastName xsi:nil="true"/>
<email>jdoe@example.com</email>
</customer>
Demo
The following demo code will populate the objects from XML and then output JSON. Note how there are no compile time dependencies on MOXy.
package forum11599191;
import java.io.File;
import javax.xml.bind.*;
public class Demo {
    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Customer.class);
        // Unmarshal from XML
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        File xml = new File("src/forum11599191/input.xml");
        Customer customer = (Customer) unmarshaller.unmarshal(xml);
        // Marshal to JSON
        Marshaller marshaller = jc.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
        marshaller.setProperty("eclipselink.media-type", "application/json");
        marshaller.marshal(customer, System.out);
    }
}
JSON Output
Below is the output from running the demo code.
{
   "customer" : {
      "id" : 123,
      "firstName" : "Jane",
      "lastName" : null,
      "email" : [ "jdoe@example.com" ]
   }
}
A few things to note about the output:
Since the id field is a numeric type it was marshalled to JSON without quotes.
Even though the id field was mapped with @XmlAttribute, there is no special indication of this in the JSON message.
The email property had a List of size one, this is properly represented in the JSON output.
The xsi:nil mechanism was used to specify that the lastName field had a null value, this has been translated to the proper null representation in the JSON output.
For More Information
http://blog.bdoughan.com/2011/08/json-binding-with-eclipselink-moxy.html
http://blog.bdoughan.com/2012/04/binding-to-json-xml-handling-null.html
http://blog.bdoughan.com/2011/04/jaxb-and-json-via-jettison.html

JAXB - Ignore element

Is there any way to just ignore an element from Jaxb parsing?
I have a large XML file, and if I could ignore one of the large, complex elements, then it would probably parse a lot quicker.
It would be even better if it could not even validate the element contents at all and parse the rest of the document even if that element is not correct.
Example: this should only generate Foo.element1 and Foo.element2
<foo>
<element1>I want this</element1>
<element2>And this</element2>
<bar>
<a>ALL of bar should be ignored</a>
<b>this also should be ignored</b>
<c>
<x>a lot of C that take time to process</x>
</c>
<c>
<x>a lot of C that take time to process</x>
</c>
<c>
<x>a lot of C that take time to process</x>
</c>
<c>
<x>a lot of C that take time to process</x>
</c>
</bar>
</foo>
Assuming your JAXB model looks like this:
@XmlRootElement(name="foo")
public class Foo {
    @XmlElement(name="element1")
    String element1;
    @XmlElement(name="element2")
    String element2;
    @XmlElement(name="bar")
    Bar bar;
}
then simply removing the bar field from Foo will skip the <bar/> element in the input document.
Alternatively, annotate the field with @XmlTransient instead of @XmlElement, and it will also be skipped.
JAXB will ignore any unmapped properties.
Implementation-wise (at least in EclipseLink JAXB (MOXy), which I lead): when we are processing the contents via a SAX parser (i.e. the input was a SAXSource), we swap out the ContentHandler that is responsible for building objects for one that does no processing for that section (org.eclipse.persistence.oxm.unmapped.UnmappedContentHandler). When we are processing the contents via a StAX parser, we just advance to the next mapped event.
If you do have a property that corresponds to that node, you can annotate it with @XmlTransient to make it an unmapped property.
All you need to do is mark the field as @XmlTransient (the @XmlTransient annotation hides fields that are not required). Example below:
JavaEE:
@XmlAccessorType(XmlAccessType.FIELD)
@XmlRootElement(name = "DeletedIds")
public class DeletedIds {
    @XmlElement(name = "DeletedId")
    private List<DeletedId> id;
    // @XmlTransient cannot be combined with other JAXB annotations such as @XmlElement
    @XmlTransient
    private String success;
    //getters&setters
}
@XmlAccessorType(XmlAccessType.FIELD)
public class DeletedId {
    private int id;
    //getters&setters
}
Xml:
<DeletedIds>
<DeletedId>
<id>1</id>
</DeletedId>
<DeletedId>
<id>2</id>
</DeletedId>
</DeletedIds>
You have to use a SAX parser and a document handler that effectively skips the node you are not interested in. You can't get around reading the bytes, but at least you can avoid wasting extra resources on them.
If your code requires a DOM tree, you basically use a SAX document handler that generates DOM nodes but skips the nodes you are not interested in. This is definitely less convenient than using the supplied DOM tree generators, but it is a decent trade-off if you need the DOM tree and can't live with the extra memory overhead of the unwanted nodes.
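A rough sketch of such a skipping handler, built on the JDK's SAX API. The class name, the depth counter and the map of collected values are illustrative choices, not part of any library. The parser still has to read the skipped subtree and check that it is well-formed, but the handler does no work for it.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SkippingHandler extends DefaultHandler {
    private final String skippedElement;
    private final Map<String, String> values = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();
    private int skipDepth = 0; // greater than 0 while inside the skipped subtree

    public SkippingHandler(String skippedElement) {
        this.skippedElement = skippedElement;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (skipDepth > 0 || qName.equals(skippedElement)) {
            skipDepth++; // track nesting so we know when the subtree ends
            return;
        }
        text.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (skipDepth == 0) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (skipDepth > 0) {
            skipDepth--;
            return;
        }
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            values.put(qName, value); // keep only elements that carry text
        }
        text.setLength(0);
    }

    // Parses the XML and returns the text of every element outside the skipped subtree.
    public static Map<String, String> parse(String xml, String skippedElement) {
        try {
            SkippingHandler handler = new SkippingHandler(skippedElement);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<foo><element1>I want this</element1><element2>And this</element2>"
                + "<bar><a>ignored</a><c><x>ignored too</x></c></bar></foo>";
        System.out.println(parse(xml, "bar")); // prints {element1=I want this, element2=And this}
    }
}
```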
