I have some very simple XML that I wish to unmarshal. I'm only interested in the values of one element, which repeats. Here's some sample XML:
<Document>
<HeaderGuff>Whatever</HeaderGuff>
<Foos>
<FooId>1</FooId>
<FooId>2</FooId>
</Foos>
</Document>
I would like to use JAXB to iterate over the FooId values as Longs.
The usual examples require creating a data class with setFooId and getFooId methods. Is there a way to unmarshal directly to Long so that I can do this:
for ( Long fooId : <something JAXB> )
I do not want to load all the identifiers into memory at once as there are potentially many of them, and they are only needed one at a time for individual processing.
Since you are only interested in one of the elements and don't want to bring all the data into memory at once I would use a StAX parser (included in the JDK/JRE since Java SE 6) instead of JAXB for this use case.
You would then advance your XMLStreamReader to the FooId element, process it and then advance it to the next element.
import javax.xml.stream.*;
import javax.xml.transform.stream.StreamSource;

public class Demo {
    public static void main(String[] args) throws Exception {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        StreamSource source = new StreamSource("input.xml");
        XMLStreamReader xsr = xif.createXMLStreamReader(source);
        while(xsr.hasNext()) {
            if(xsr.isStartElement() && "FooId".equals(xsr.getLocalName())) {
                // getElementText() leaves the reader on the matching end tag
                long value = Long.parseLong(xsr.getElementText());
                System.out.println(value);
            }
            xsr.next();
        }
        xsr.close();
    }
}
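To get the `for ( Long fooId : ... )` loop asked for in the question, the same StAX cursor can be wrapped in an Iterable. This is only a minimal sketch: the class name FooIds is made up for illustration, and it assumes each FooId holds a plain integer. Only one value is held in memory at a time.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.NoSuchElementException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical helper: lazily yields each FooId as a Long, one at a time. */
public class FooIds implements Iterable<Long> {
    private final XMLStreamReader xsr;

    public FooIds(XMLStreamReader xsr) {
        this.xsr = xsr;
    }

    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private Long next; // value read ahead, or null if nothing is buffered

            @Override
            public boolean hasNext() {
                try {
                    // Advance until the next FooId start tag or the end of the document.
                    while (next == null && xsr.hasNext()) {
                        if (xsr.next() == XMLStreamReader.START_ELEMENT
                                && "FooId".equals(xsr.getLocalName())) {
                            next = Long.valueOf(xsr.getElementText().trim());
                        }
                    }
                } catch (XMLStreamException e) {
                    throw new IllegalStateException(e);
                }
                return next != null;
            }

            @Override
            public Long next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                Long value = next;
                next = null;
                return value;
            }
        };
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Document><HeaderGuff>Whatever</HeaderGuff>"
                + "<Foos><FooId>1</FooId><FooId>2</FooId></Foos></Document>";
        XMLStreamReader xsr = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        for (Long fooId : new FooIds(xsr)) {
            System.out.println(fooId);
        }
        xsr.close();
    }
}
```

Because the iterator shares the underlying reader, it can only be traversed once; that is the trade-off for not loading all identifiers up front.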
I am generating a custom XML export in DSpace 5.2. The item that is to be exported as an XML file has an array of metadata values. The values must appear in the XML file as the given XSD file defines their hierarchy. I add the values based on the XSD order into the XML, but some XML tags are in an order different from the insertion order.
More details
My approach is first to move the array of metadata values into a map whose keys are the metadata field names. Then, following the XSD, I get the appropriate value from the map and generate an XML element like this:
import java.io.Serializable;
import java.util.List;
import org.dspace.content.Item;
import org.dspace.content.Metadatum;
import org.w3c.dom.Element;
import org.w3c.dom.Document;

public class DSpaceXML implements Serializable {
    // A member variable
    private Document doc;

    // A DSpace built-in function used to export an item to an XML
    public final void addItem(Item item) throws Exception {
        // Initialize this.doc
        Element rootElement = doc.createElement("root");
        Element higherElement = doc.createElement("higher-element");
        Element lowerElement = doc.createElement("lower-element");
        insertMetadataAsChildOfElement(higherElement, "child-of-higher", "dc.childOfHigher");
        rootElement.appendChild(higherElement);
        insertMetadataAsChildOfElement(lowerElement, "child-of-lower", "dc.childOfLower");
        rootElement.appendChild(lowerElement);
        // stuff to generate other elements of the XML
    }

    private void insertMetadataAsChildOfElement(Element parentElement, String childElementName,
            String key) {
        // metadataMap (metadata field name -> values) is a member filled in elsewhere.
        // Assumption: the map holds a List<Metadatum> per key and one child
        // element is written per value.
        List<Metadatum> metadatumList = (List<Metadatum>) metadataMap.get(key);
        for (Metadatum metadatum : metadatumList) {
            Element childElement = createElement(childElementName, metadatum.value);
            parentElement.appendChild(childElement);
        }
    }

    private Element createElement(String name, String value) {
        Element el = doc.createElement(name);
        el.appendChild(doc.createTextNode(value));
        return el;
    }
}
I expect an XML like this:
<root>
<higher-element>
<child-of-higher>Value1</child-of-higher>
</higher-element>
<lower-element>
<child-of-lower>Value2</child-of-lower>
</lower-element>
<another-element-1/>
....
<another-element-n/>
</root>
What I get is like this (<lower-element> is before <higher-element>):
<root>
<lower-element>
<child-of-lower>Value2</child-of-lower>
</lower-element>
<another-element-1/>
....
<another-element-k/>
<higher-element>
<child-of-higher>Value1</child-of-higher>
</higher-element>
<another-element-k-plus-1/>
....
<another-element-n/>
</root>
I cannot figure out why this happens when rootElement.appendChild(higherElement) is called before rootElement.appendChild(lowerElement). Also, I would appreciate it if someone could tell me whether my approach is the best one for generating XML from an XSD.
I figured out that I had a bug in my code. Because so many metadata values are checked, many lines after rootElement.appendChild(lowerElement) there was a second call to rootElement.appendChild(higherElement). Appending a node that is already in the tree moves it to the end of its parent's children, so <higher-element> ended up after <lower-element>. As for the second part of my question, I would still be happy if someone could tell me about best practices for generating XML from an XSD, given the limitations of DSpace 5.
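The effect of the duplicate call can be reproduced in isolation with the JDK's DOM implementation. This is a small sketch, not the DSpace code itself; the class name AppendChildMoves is made up for the demonstration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class AppendChildMoves {
    /** Builds a small tree and returns the child names of root in document order. */
    static String childOrder() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        doc.appendChild(root);
        Element higher = doc.createElement("higher-element");
        Element lower = doc.createElement("lower-element");
        root.appendChild(higher);
        root.appendChild(lower);
        // Second append of the same node: it is removed from its old
        // position and re-inserted as the last child, not duplicated.
        root.appendChild(higher);

        StringBuilder order = new StringBuilder();
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (order.length() > 0) {
                order.append(',');
            }
            order.append(n.getNodeName());
        }
        return order.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(childOrder()); // prints "lower-element,higher-element"
    }
}
```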
I am trying to parse XML using SAX. I want to get all the tags and their values from the XML in a nested way. Is that possible with a SAX parser? Can anyone provide an example? (I think SAX is more efficient than the W3C DocumentBuilder, so I chose it, and I want to know whether I'm on the right path.)
I'm attaching my java program
class MySAXApp extends DefaultHandler
{
public MySAXApp ()
{
super();
}
public void startDocument ()
{
System.out.println("Start document");
}
public void endDocument ()
{
System.out.println("End document");
}
public void startElement (String uri, String name,
String qName, Attributes atts)
{
System.out.println(atts.getLength());
if ("".equals (uri))
System.out.println("Start element: " + qName);
else
System.out.println("Start element: {" + uri + "}" + name);
}
}
Here is my XML.
Is this valid XML? Are there any errors in writing XML like this?
<?xml version="1.0" encoding="utf-8"?>
<CustomerReport xsi:schemaLocation="Customer.xsd">
<Customer>
<CustomerName>str1234</CustomerName>
<CustomerStatus>str1234</CustomerStatus>
<PurchaceOrders>
<PurchaceOrder>
<PurchaceOrderName>str1234</PurchaceOrderName>
</PurchaceOrder>
</PurchaceOrders>
</Customer>
</CustomerReport>
I'm new to XML. Can someone help me on this
When you say SAX is "more efficient", what you actually mean is that a SAX parser does the minimum amount of work, leaving most of the work to the application. That means you (the application writer) have more code to write, and it's quite tricky code as you are discovering. Because the people who write XML parsers are much more experienced Java coders than you are, it's likely that the more work you do in your code, and the less you do within the parser, the less efficient your overall application will be. So given your level of experience, my advice would be to use a parsing approach where the library does as much as possible of the work. I would suggest using JDOM2.
The only attribute in the XML you posted is the one with the xsi prefix (xsi:schemaLocation on CustomerReport). For every other element, atts.getLength() should be 0.
Attributes are key-value pairs inside a tag. Most of your XML content is inside elements instead.
The efficiency advantage of SAX (or StAX) over something like JDOM is due to the SAX parser not keeping all the data it reads in memory. If you use the ContentHandler to retrieve data and save it as it gets read, then your program doesn't have to consume that much memory.
Read this tutorial or this Javaworld article. You need to implement a characters method in order to get any element text. Both linked articles have good examples of how to implement your characters method so that you can retrieve element text.
There are a lot of bad examples for this that you are likely to find if you google around (bad example) or search on stackoverflow (bad example here), but the example implementations in the linked articles are correct, because they buffer the output from the characters method until all characters have been found:
Parsers are not required to return any particular number of characters at one time. A parser can return anything from a single character at a time up to several thousand and still be a standard-conforming implementation. So if your application needs to process the characters it sees, it is wise to have the characters() method accumulate the characters in a java.lang.StringBuffer and operate on them only when you are sure that all of them have been found.
Here is the ContentHandler from the JavaWorld article's hello world example changed to use your xml:
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.*;

public class Example2 extends DefaultHandler {
    // Local variables to store data
    // found in the XML document
    public String name = "";
    public String status = "";
    public String orderName = "";

    // Buffer for collecting data from
    // the "characters" SAX event.
    private CharArrayWriter contents = new CharArrayWriter();

    // Override methods of the DefaultHandler class
    // to gain notification of SAX Events.
    //
    // See org.xml.sax.ContentHandler for all available events.
    //
    public void startElement( String namespaceURI,
                              String localName,
                              String qName,
                              Attributes attr ) throws SAXException {
        contents.reset();
    }

    public void endElement( String namespaceURI,
                            String localName,
                            String qName ) throws SAXException {
        // Note: localName is only filled in when the parser is namespace-aware;
        // with a non-namespace-aware parser, compare against qName instead.
        if ( localName.equals( "CustomerName" ) ) {
            name = contents.toString();
        }
        if ( localName.equals( "CustomerStatus" ) ) {
            status = contents.toString();
        }
        if ( localName.equals( "PurchaceOrderName" ) ) {
            orderName = contents.toString();
        }
    }

    public void characters( char[] ch, int start, int length )
            throws SAXException {
        contents.write( ch, start, length );
    }
}
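The question also asked for the tags and their values "in a nested way". A rough sketch of that, using the same buffering idea plus a depth counter, might look like the following. The class name NestedPrinter and the output layout are invented for illustration, and it assumes elements contain either text or child elements (mixed content is not handled). It uses qName so that it works with the default, non-namespace-aware SAXParserFactory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NestedPrinter extends DefaultHandler {
    final List<String> lines = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    private int depth = 0;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        lines.add(indent() + qName);
        depth++;
        text.setLength(0); // discard whitespace seen before this tag
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per text node, so accumulate.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            lines.add(indent() + "= " + value);
        }
        text.setLength(0);
        depth--;
    }

    private String indent() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
        return sb.toString();
    }

    /** Parses the given XML and returns one indented line per tag or value. */
    static List<String> outline(String xml) throws Exception {
        NestedPrinter handler = new NestedPrinter();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.lines;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<CustomerReport><Customer>"
                + "<CustomerName>str1234</CustomerName>"
                + "<PurchaceOrders><PurchaceOrder>"
                + "<PurchaceOrderName>str1234</PurchaceOrderName>"
                + "</PurchaceOrder></PurchaceOrders>"
                + "</Customer></CustomerReport>";
        for (String line : outline(xml)) {
            System.out.println(line);
        }
    }
}
```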
I have the following sample XML file:
<a xmlns="http://www.foo.com">
<b>
</b>
</a>
Using the XPath expression /foo:a/foo:b (with 'foo' properly configured in the NamespaceContext) I can correctly count the number of b nodes and the code works both when Saxon-HE-9.4.jar is on the CLASSPATH and when it's not.
When, however, I parse the same file with a namespace-unaware DocumentBuilderFactory, the XPath expression "/a/b" correctly counts the number of b nodes only when Saxon-HE-9.4.jar is not on the CLASSPATH.
Code below:
import java.io.*;
import java.util.*;
import javax.xml.xpath.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import javax.xml.namespace.NamespaceContext;
public class FooMain {
public static void main(String args[]) throws Exception {
String xmlSample = "<a xmlns=\"http://www.foo.com\"><b></b></a>";
{
XPath xpath = namespaceUnawareXpath();
System.out.printf("[NS-unaware] Number of 'b' nodes is: %d\n",
((NodeList) xpath.compile("/a/b").evaluate(stringToXML(xmlSample, false),
XPathConstants.NODESET)).getLength());
}
{
XPath xpath = namespaceAwareXpath("foo", "http://www.foo.com");
System.out.printf("[NS-aware ] Number of 'b' nodes is: %d\n",
((NodeList) xpath.compile("/foo:a/foo:b").evaluate(stringToXML(xmlSample, true),
XPathConstants.NODESET)).getLength());
}
}
public static XPath namespaceUnawareXpath() {
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
return xpath;
}
public static XPath namespaceAwareXpath(final String prefix, final String nsURI) {
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
NamespaceContext ctx = new NamespaceContext() {
@Override
public String getNamespaceURI(String aPrefix) {
if (aPrefix.equals(prefix))
return nsURI;
else
return null;
}
@Override
public Iterator getPrefixes(String val) {
throw new UnsupportedOperationException();
}
@Override
public String getPrefix(String uri) {
throw new UnsupportedOperationException();
}
};
xpath.setNamespaceContext(ctx);
return xpath;
}
private static Document stringToXML(String s, boolean nsAware) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(nsAware);
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new ByteArrayInputStream(s.getBytes("UTF-8")));
}
}
Running the above with:
java -classpath dist/foo.jar FooMain
.. produces:
[NS-unaware] Number of 'b' nodes is: 1
[NS-aware ] Number of 'b' nodes is: 1
Running with:
java -classpath Saxon-HE-9.4.jar:dist/foo.jar FooMain
... produces:
[NS-unaware] Number of 'b' nodes is: 0
[NS-aware ] Number of 'b' nodes is: 1
Correct observation. Saxon doesn't work with a namespace-unaware DOM. There's no reason why it should. If you can find an XSLT/XPath processor that works with a namespace-unaware DOM, then go ahead and use it if you want, but its behaviour isn't defined by any standard.
If it were possible for Saxon to detect that the DOM is namespace-unaware, then it would throw an error rather than giving spurious results. Sadly, one of DOM's many design failings is that if you didn't create the DOM yourself, you can't tell whether it's namespace-aware or not.
Your comment "I need to be lenient on namespaces since I have to handle 3rd-party XML instances that are not always XSD valid." is a complete non-sequitur. It's true that a document can't be XSD-valid unless it is namespace-valid, but the converse is not true; loads of documents are namespace-valid without being XSD-valid.
Finally, as your experience shows, relying on the JAXP mechanism to load whatever XPath processor happens to be lying around on the classpath is very error-prone. You can't even control whether you get an XPath 1.0 or 2.0 processor by this mechanism (and again, you can't find out easily which you have got). If your code is dependent on the quirks of a particular XPath implementation then you need to load that implementation explicitly rather than relying on the JAXP search.
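As a first diagnostic, it can help to print which factory class the JAXP search actually returned. The snippet below is only a sketch of that check (the class name WhichXPath is made up); it does not tell you the XPath version, only the implementation class. To pin a specific implementation, you can instantiate its factory class directly or use the three-argument XPathFactory.newInstance(uri, factoryClassName, classLoader) overload.

```java
import javax.xml.xpath.XPathFactory;

public class WhichXPath {
    /** Returns the class name of the XPathFactory that JAXP selected. */
    static String factoryClassName() {
        return XPathFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        // The value depends on the JDK and on what is on the classpath.
        System.out.println("XPathFactory in use: " + factoryClassName());
    }
}
```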
UPDATE (Sep 2015): Saxon 9.6 no longer includes the META-INF services file that advertises it as a JAXP XPath provider. This means you will never pick up Saxon as your XPath processor simply because it is on the classpath: you have to ask for it explicitly.
The XPath language is only defined on namespace-well-formed XML, so the behaviour of different processors on a non-namespace-aware DOM tree (even one like <a><b/></a> that, had it been parsed in a namespace-aware manner, would not actually use any namespaces) is at best implementation-specific and at worst completely undefined.
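If the goal is to be lenient about which namespace the elements are in, one approach that stays within defined XPath 1.0 behaviour is to parse namespace-aware and match on local-name(). This is a sketch of that alternative, not something proposed in the question; the class name LocalNameMatch is made up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class LocalNameMatch {
    /** Counts /a/b elements regardless of the namespace they are in. */
    static int countB(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // keep the DOM namespace-aware
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "/*[local-name()='a']/*[local-name()='b']", doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countB("<a xmlns=\"http://www.foo.com\"><b></b></a>")); // prints 1
        System.out.println(countB("<a><b></b></a>"));                             // prints 1
    }
}
```

No NamespaceContext is needed here because the expression contains no prefixes.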
Saxon 10 now supports XPath expressions without namespaces. You can configure it like this:
XPath xPath = new net.sf.saxon.xpath.XPathFactoryImpl().newXPath();
((XPathEvaluator)xPath).getStaticContext().setUnprefixedElementMatchingPolicy(UnprefixedElementMatchingPolicy.ANY_NAMESPACE);
Please suggest how to build the following utility in Java.
I want to create a generic Java class that generates XML with JDOM.
The class should be able to generate any XML structure at runtime, depending on the parameters passed to it. How can this be done?
For example, in my module I need to create XML that has one root with 3 different children, i.e.
<Child>
<A> This is normal text </A>
<B> This is normal text </B>
<C> This is normal text </C>
</Child>
But in another module we require a different XML file, which has 10 children with some attributes.
So we decided to go for a generic XML utility that generates the XML file at runtime in a specific folder.
The utility will help us avoid redundant code in the application and will be easier to manage as well...
Please help your friend...
Thanks
Gladiator
You can do this with XStream, like this:
public static String getXMLFromObject(Object toBeConverted, String classNameAlias, Map<String, String> fieldAlias,
        List<String> fieldsToBeOmitted) {
    StringBuilder objectAsXML = new StringBuilder();
    if(toBeConverted != null){
        XStream xStream = new XStream(new DomDriver());
        // Strings must not be compared with != ""; check for blank content instead
        if(classNameAlias != null && !classNameAlias.trim().isEmpty()) {
            xStream.alias(classNameAlias, toBeConverted.getClass());
        }
        if(fieldAlias != null && !fieldAlias.isEmpty()){
            for (Entry<String, String> entry : fieldAlias.entrySet()) {
                xStream.aliasField(entry.getKey(), toBeConverted.getClass(), entry.getValue());
            }
        }
        if(fieldsToBeOmitted != null && !fieldsToBeOmitted.isEmpty()){
            for (String fieldToBeOmitted : fieldsToBeOmitted) {
                xStream.omitField(toBeConverted.getClass(), fieldToBeOmitted);
            }
        }
        objectAsXML.append(xStream.toXML(toBeConverted));
    }
    return objectAsXML.toString();
}
If you have control over the classes that you are going to convert into XML, then I would suggest having an interface, something like XMLConvertable, with a structure like
public interface XMLConvertable {
public String getClassAlias();
public List<String> getFieldToBeOmitted();
public Map<String, String> getFieldAliases();
}
In that case you don't need to pass the last three parameters to the method above; just get them from toBeConverted. It also makes more sense, as every object in the system can declare for itself whether it can be converted to XML.
I'm using the XMLStreamReader interface from javax.xml to parse an XML file. The file contains huge data amounts and single text nodes of several KB.
The validating and reading generally work very well, but I'm having trouble with text nodes that are larger than 15k characters. The problem occurs in this function
String foo = "";
if (xsr.getEventType() == XMLStreamConstants.CHARACTERS) {
foo = xsr.getText();
xsr.next(); // read next tag
}
return foo;
xsr being the stream reader. The text in the text node is 53'337 characters long in this particular case (but varies), however the xsr.getText() method only returns the first 15'537 of them. Of course I could loop over the function and concatenate the strings, but somehow I don't think that's the idea...
I did not find anything in the documentation or anywhere else about this. Is it intended behavior or can someone confirm/deny it? Am I using it the wrong way somehow?
Thanks
Of course I could loop over the function and concatenate the strings, but somehow I don't think that's the idea...
Actually, that is the idea :)
The parser is permitted to break up the event stream however it wishes, as long as it's consistent with the original document. That means it can, and often will, break up your text data into multiple events. How and when it chooses to do so is an implementation detail internal to the parser, and is essentially unpredictable.
So yes, if you receive multiple sequential CHARACTERS events, you need to append them manually. This is the price you pay for a low-level API.
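A minimal sketch of that manual appending, assuming the reader is already positioned on the first text event (the class and method names ReadAllText and readText are made up here):

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ReadAllText {
    /**
     * Joins all consecutive text events starting at the current event.
     * On return the reader is on the first non-text event (e.g. the end tag).
     */
    static String readText(XMLStreamReader xsr) throws XMLStreamException {
        StringBuilder sb = new StringBuilder();
        while (xsr.getEventType() == XMLStreamConstants.CHARACTERS
                || xsr.getEventType() == XMLStreamConstants.CDATA
                || xsr.getEventType() == XMLStreamConstants.SPACE) {
            sb.append(xsr.getText());
            xsr.next();
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        StringBuilder b = new StringBuilder("<root>");
        for (int i = 0; i < 65536; i++) {
            b.append("hello\n");
        }
        b.append("</root>");
        XMLStreamReader xsr = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(b.toString()));
        xsr.nextTag(); // move to <root>
        xsr.next();    // move to the first text event
        System.out.println(readText(xsr).length());
        xsr.close();
    }
}
```

When an element contains only text, XMLStreamReader.getElementText() called on the start tag does the same accumulation for you.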
Another option is the javax.xml.stream.isCoalescing property (documented in XMLStreamReader.next() or Using StAX), which automatically concatenates long text into a single string. The following JUnit 3 test passes.
Warning: isCoalescing probably shouldn't be used in production because if the document has lots of character references (such as &#10;) or entity references (such as &lt;), it will cause a StackOverflowError!
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import junit.framework.TestCase;
public class XmlStreamTest extends TestCase {
public void testLengthInXMlStreamReader() throws XMLStreamException {
StringBuilder b = new StringBuilder();
b.append("<root>");
for (int i = 0; i < 65536; i++)
b.append("hello\n");
b.append("</root>");
InputStream is = new ByteArrayInputStream(b.toString().getBytes());
XMLInputFactory inputFactory = XMLInputFactory.newFactory();
inputFactory.setProperty("javax.xml.stream.isCoalescing", true);
XMLStreamReader reader = inputFactory.createXMLStreamReader(is);
reader.nextTag();
reader.next();
assertEquals(6 * 65536, reader.getTextLength());
}
}