I'm parsing XML documents with Java. Every document has a root tag and an unknown number of child tags containing text, for example (see the code below). The <AnyStrYouwant> tags each have a string of characters in their body.
<anyRoot>
<AnyStrYouwant1>anyTextYouWant1</AnyStrYouwant1>
<AnyStrYouwant2>anyTextYouWant2</AnyStrYouwant2>
...
</anyRoot>
How can I programmatically (in Java) check whether a file fits this structure? I can parse XML, and I know that a DTD (for example) can check an XML file against a known format (tag names and content). What should I use in this case?
PS: Some people advised me to use XSD, but to validate elements that way I need to know the root element name, and I don't (every file has its own root element).
I can't comment with my new account, but yes, you can use DTD or Schematron.
Schematron is much more flexible and is an industry standard, whereas DTD is really a legacy technology, although still widely used. A DTD will (in short) check for allowed tags, whereas Schematron can check the structure of the file, for example that certain special tags must appear within the first 10 lines of the XML, etc.
I would use DTD if you are only checking for existing tags and allowed attribute values.
If you need something more complex, I would recommend Schematron with its rule-based validation.
You can use DTD or XSD to validate XML; take a look at:
http://www.w3schools.com/xml/xml_dtd.asp
http://www.journaldev.com/895/how-to-validate-xml-against-xsd-in-java
XSD is the more advanced technique for validating XML and is more flexible than DTD, but you can use either technology to solve your problem.
You can check XML against an XSD using this sample code:
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.InputSource;

public boolean isValidXML(InputStream is) {
    try {
        // Build a Schema from the XSD, then validate the supplied document against it
        InputSource isrc = new InputSource(new FileInputStream("path/your-xsd-file.xsd"));
        SAXSource sourceXSD = new SAXSource(isrc);
        SchemaFactory
                .newInstance("http://www.w3.org/2001/XMLSchema")
                .newSchema(sourceXSD).newValidator()
                .validate(new StreamSource(is));
    } catch (Exception e) {
        return false;
    }
    return true;
}
How can I get all XML branches using Java?
For example, if I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<addresses xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation='test.xsd'>
<address>
<name>Joe Tester</name>
<street>Baker street 5</street>
</address>
<person>
<name>Joe Tester</name>
<age>44</age>
</person>
</addresses>
I want to obtain the following branches:
addresses
addresses_address
addresses_address_name
addresses_address_street
addresses_person
addresses_person_name
addresses_person_age
Thanks.
You can get the XML root, node, and sub-node names easily using any template engine, e.g. Velocity, FreeMarker, and others. FreeMarker has powerful facilities for XML processing: you can drop XML documents into the data model, and templates can pull data from them in a variety of ways, such as with XPath expressions. FreeMarker can serve as an XML transformation tool, comparable to the much better-known XSLT stylesheet approach promulgated by the World Wide Web Consortium (W3C).
FreeMarker supports XPath through either Xalan or Jaxen; XPath expressions need one of them on the classpath.
FreeMarker will use Xalan unless you choose Jaxen by calling freemarker.ext.dom.NodeModel.useJaxenXPathSupport() from Java.
You only need one template, and it will generate all XML branches for the input XML. Put any XML into the data model at run time and FreeMarker will process the template and produce the branch names corresponding to that XML structure. If your XML structure changes, there is no need to change your Java code; even if you want to change the output, the changes go into the template file, so no Java recompilation is needed.
Just change the template and get the changes on the fly.
FTL file (one template, usable for multiple XML documents, that generates the XML branch names):
<#list doc['/*'] as rootNode>
  <#assign rootNodeValue="${rootNode?node_name}">
  ${rootNodeValue}
  <#list doc['/*/*'] as childNodes>
    <#if childNodes?is_node==true>
      ${rootNodeValue}-${childNodes?node_name}
      <#list doc['/*/${childNodes?node_name}/*'] as subNodes>
        ${rootNodeValue}-${childNodes?node_name}-${subNodes?node_name}
      </#list>
    </#if>
  </#list>
</#list>
XMLTest.java to process the template:
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import freemarker.ext.dom.NodeModel;
import freemarker.template.Configuration;
import freemarker.template.DefaultObjectWrapper;
import freemarker.template.Template;
import freemarker.template.TemplateException;
public class XMLTest {

    public static void main(String[] args) throws SAXException, IOException,
            ParserConfigurationException, TemplateException {
        Configuration config = new Configuration();
        config.setClassForTemplateLoading(XMLTest.class, "");
        config.setObjectWrapper(new DefaultObjectWrapper());

        Map<String, Object> dataModel = new HashMap<String, Object>();

        // Load the XML from the classpath
        String xmlPath = "addresses.xml"; // path to your XML resource
        InputStream stream = XMLTest.class.getClassLoader().getResourceAsStream(xmlPath);
        // If you already have the XML as a string, pass it to the InputSource
        // constructor instead of loading it from a file
        InputSource source = new InputSource(stream);
        NodeModel xmlNodeModel = NodeModel.parse(source);
        dataModel.put("doc", xmlNodeModel);

        Template template = config.getTemplate("test.ftl");
        StringWriter out = new StringWriter();
        template.process(dataModel, out);
        System.out.println(out.getBuffer().toString());
    }
}
Final output:
addresses
addresses-address
addresses-address-name
addresses-address-street
addresses-person
addresses-person-name
addresses-person-age
See the FreeMarker documentation on the XML node model for more details.
Download FreeMarker from here.
Download Jaxen from here.
There are many ways that you can extract data from XML and use it in Java. The one you choose will depend on how you want to use the data.
Some scenarios are:
You might want to manipulate nodes, order, remove and add others and transform the XML.
You might just want to read (and possibly change) the text contained in elements and attributes.
You might have a very large file and you just want to find some particular data and ignore the rest of the file.
For scenario #3, the best option is a memory-efficient stream-based parser, such as SAX or a pull reader using the StAX API.
You can also use that for scenario #2, if you do mostly reading (and not writing), but DOM-based APIs might be easier to work with. You can use the standard DOM org.w3c.dom API or a more Java-like API such as JDOM or DOM4J. If you wish to synchronize XML files with Java objects you also might want to use a full Java-XML mapping framework such as JAXB.
DOM APIs are also great for scenario #1, but in many cases it might be simpler to use XSLT (via the javax.xml.transform TrAX API in Java). If you use DOM you can also use XPath to select the nodes.
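For scenario #3, a minimal StAX sketch might look like this (illustrative only; it reuses the addresses.xml example above and simply prints the <street> text while skipping everything else):

import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StreetFinder {
    public static void main(String[] args) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(
                new FileInputStream("src/main/resources/addresses.xml"));
        // Walk the stream and print only the <street> text, ignoring the rest
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "street".equals(reader.getLocalName())) {
                System.out.println(reader.getElementText());
            }
        }
        reader.close();
    }
}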
I will show you an example of how to extract the individual nodes of your file using the standard DOM API (org.w3c.dom) and also using XPath (javax.xml.xpath).
1. Setup
Initialize the parser:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Parse file into a Document Object Model:
Document source = builder.parse(new File("src/main/resources/addresses.xml"));
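(For reference, the snippets below only need standard JDK imports, roughly these:)

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;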
2. Selecting nodes with J2SE DOM
You get the root element using getDocumentElement():
Element addresses = source.getDocumentElement();
From there you can get the child nodes using getChildNodes(), but that will return all child nodes, including text nodes (the whitespace between elements). addresses.getChildNodes().item(0) returns the whitespace after the <addresses> tag and before the <address> tag, so to get the element you would have to go for the second item. An easier way is to use getElementsByTagName, which returns a node set, and take its first item:
Element addresses_address = (Element)addresses.getElementsByTagName("address").item(0);
Many of the DOM methods return org.w3c.dom.Node objects, which you have to cast. Sometimes they might not be Element objects so you have to check. Node sets are not automatically converted into arrays. They are org.w3c.dom.NodeList so you have to use .item(0) and not [0] (if you use other DOM APIs such as JDOM or DOM4J, it will seem more intuitive).
You could use addresses.getElementsByTagName to get all the elements you need, but you would have to deal with the context for the two <name> elements. So a better way is to call it in the appropriate context:
Element addresses_address = (Element)addresses.getElementsByTagName("address").item(0);
Element addresses_address_name = (Element)addresses_address.getElementsByTagName("name").item(0);
Element addresses_address_street = (Element)addresses_address.getElementsByTagName("street").item(0);
Element addresses_person = (Element)addresses.getElementsByTagName("person").item(0);
Element addresses_person_name = (Element)addresses_person.getElementsByTagName("name").item(0);
Element addresses_person_age = (Element)addresses_person.getElementsByTagName("age").item(0);
That will give you all the Element nodes (or branches as you called them) for your file. If you want the text nodes (as actual Node objects) you need to get it as the first child:
Node textNode = addresses_address_street.getFirstChild();
And if you want the String contents you can use:
String street = addresses_address_street.getTextContent();
3. Selecting nodes with XPath
Another way to select nodes is using XPath. You will need the DOM source and you also need to initialize the XPath processor:
XPath xPath = XPathFactory.newInstance().newXPath();
You can extract the root node like this:
Element addresses = (Element)xPath.evaluate("/addresses", source, XPathConstants.NODE);
And all the other nodes using a path-like syntax:
Element addresses_address = (Element)xPath.evaluate("/addresses/address", source, XPathConstants.NODE);
Element addresses_address_name = (Element)xPath.evaluate("/addresses/address/name", source, XPathConstants.NODE);
Element addresses_address_street = (Element)xPath.evaluate("/addresses/address/street", source, XPathConstants.NODE);
You can also use relative paths, choosing a different element as the root:
Element addresses_person = (Element)xPath.evaluate("person", addresses, XPathConstants.NODE);
Element addresses_person_name = (Element)xPath.evaluate("person/name", addresses, XPathConstants.NODE);
Element addresses_person_age = (Element)xPath.evaluate("age", addresses_person, XPathConstants.NODE);
You can get the text contents as before, since you have Element objects:
String addressName = addresses_address_name.getTextContent();
But you can also do it directly using the same methods above without the last argument (which defaults to string). Here I'm using different relative and absolute XPath expressions:
String addressName = xPath.evaluate("name", addresses_address);
String addressStreet = xPath.evaluate("address/street", addresses);
String personName = xPath.evaluate("name", addresses_person);
String personAge = xPath.evaluate("/addresses/person/age", source);
Hi, can anyone help me with HTML to XML conversion using XSLT in Java? I converted XML to HTML using XSLT in Java; this is the code I used for that conversion:
import javax.xml.transform.*;
import java.net.*;
import java.io.*;

public class HowToXSLT {
    public static void main(String[] args) {
        try {
            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer(
                    new javax.xml.transform.stream.StreamSource("howto.xsl"));
            transformer.transform(
                    new javax.xml.transform.stream.StreamSource("howto.xml"),
                    new javax.xml.transform.stream.StreamResult(
                            new FileOutputStream("howto.html")));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
But I don't know the reverse process, i.e. how to convert HTML to XML. Are there any JAR files available to do that? Please help me.
Generally, it isn't possible to "reverse" a transformation, because a transformation in the general case isn't a 1:1 mapping.
For example, if the transformation does this:
<xsl:value-of select="/x * /x"/>
and we get as result: 16
(and we know that the source XML document had only one element),
it isn't possible to determine from the value 16 whether the source XML document was:
<x>4</x>
or whether it was:
<x>-4</x>
And the above was only a simple example! :)
This will depend on what you wish to do exactly.
Apparently, howto.xsl contains the rules to be applied on the xml to get the html.
You will have to write another xsl file to do the reverse.
I believe it is not possible. XSLT input must be well-formed XML, and HTML is not (unless you are talking about XHTML).
Maybe you need to first make your HTML XHTML-compliant, then use an XSL stylesheet (the reverse of the original one) with instructions to convert the XHTML file to XML.
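One way to do that first step is to let an HTML cleaner re-serialize the page as well-formed XML and then run your reverse stylesheet on the result. This is just a sketch using the jsoup library (not mentioned above; file names are placeholders):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class HtmlToXhtml {
    public static void main(String[] args) throws IOException {
        String html = new String(Files.readAllBytes(Paths.get("howto.html")), StandardCharsets.UTF_8);
        Document doc = Jsoup.parse(html);
        // Ask jsoup to emit well-formed XML instead of loose HTML
        doc.outputSettings().syntax(Document.OutputSettings.Syntax.xml);
        Files.write(Paths.get("howto-clean.xml"), doc.html().getBytes(StandardCharsets.UTF_8));
        // The cleaned file can then be fed to a Transformer with your "reverse" XSLT
    }
}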
Its not possible, you can use Microsoft.XMLDOM for converting from HTML to XML.
I am currently writing a tool, using Java 1.6, that brings together a number of XML files. All of the files validate to the DocBook 4.5 DTD (I have checked this using xmllint and specifying the DocBook 4.5 DTD as the --dtdvalid parameter), but not all of them include the DOCTYPE declaration.
I load each XML file into the DOM to perform the required manipulation like so:
private Document fileToDocument( File input ) throws ParserConfigurationException, IOException, SAXException {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
factory.setIgnoringElementContentWhitespace(false);
factory.setIgnoringComments(false);
factory.setValidating(false);
factory.setExpandEntityReferences(false);
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse( input );
}
For the most part this has worked quite well; I can use the returned object to navigate the tree and perform the required manipulations and then write the document back out. Where I am encountering problems is with files which:
Do not include the DOCTYPE declaration, and
Do include entities defined in the DTD (for example &mdash;).
Where this is the case an exception is thrown from the builder.parse(...) call with the message:
[Fatal Error] :5:15: The entity "mdash" was referenced, but not declared.
Fair enough, it isn't declared. What I would ideally do in this instance is set the DocumentBuilderFactory to always use the DocBook 4.5 DTD regardless of whether one is specified in the file.
I did try validation using the DocBook 4.5 schema but found that this produced a number of unrelated errors with the XML. It seems like the schema might not be functionally equivalent to the DTD, at least for this version of the DocBook specification.
The other option I can think of is to read the file in, try and detect whether a doctype was set or not, and then set one if none was found prior to actually parsing the XML into the DOM.
So, my question is: is there a smarter way that I have not seen to tell the parser to use a specific DTD, or to ensure that parsing proceeds despite the entities not resolving (not just the &mdash; example but any entities in the XML; there are a large number of potential ones)?
Could using an EntityResolver2 and implementing EntityResolver2.getExternalSubset() help?
... This method can also be used with documents that have no DOCTYPE declaration. When the root element is encountered, but no DOCTYPE declaration has been seen, this method is invoked. If it returns a value for the external subset, that root element is declared to be the root element, giving the effect of splicing a DOCTYPE declaration at the end of the prolog of a document that could not otherwise be valid. ...
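A rough sketch of how that might be wired up (untested; the DTD path is a placeholder, and the underlying parser has to honour EntityResolver2, which Xerces does by default via its use-entity-resolver2 feature):

import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.ext.DefaultHandler2;

// DefaultHandler2 already implements EntityResolver2; we only override getExternalSubset
class DocBookSubsetResolver extends DefaultHandler2 {
    @Override
    public InputSource getExternalSubset(String name, String baseURI) throws SAXException, IOException {
        // Splice in the DocBook 4.5 DTD when the document has no DOCTYPE of its own
        return new InputSource(new FileReader("docbook-4.5/docbookx.dtd"));
    }
}

public class DocBookLoader {
    public static Document load(File input) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setEntityResolver(new DocBookSubsetResolver());
        return builder.parse(input);
    }
}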
I'm writing the xsd and the code to validate, so I have great control here.
I would like to have an upload facility that adds stuff to my application based on an xml file. One part of the xml file should be validated against different schemas based on one of the values in the other part of it. Here's an example to illustrate:
<foo>
<name>Harold</name>
<bar>Alpha</bar>
<baz>Mercury</baz>
<!-- ... more general info that applies to all foos ... -->
<bar-config>
<!-- the content here is specific to the bar named "Alpha" -->
</bar-config>
<baz-config>
<!-- the content here is specific to the baz named "Mercury" -->
</baz-config>
</foo>
In this case, there is some controlled vocabulary for the content of <bar>, and I can handle that part just fine. Then, based on the bar value, the appropriate xml schema should be used to validate the content of bar-config. Similarly for baz and baz-config.
The code doing the parsing/validation is written in Java. Not sure how language-dependent the solution will be.
Ideally, the solution would permit the xml author to declare the appropriate schema locations and what-not so that s/he could get the xml validated on the fly in a sufficiently smart editor.
Also, the possible values for <bar> and <baz> are orthogonal, so I don't want to do this by extension for every possible bar/baz combo. What I mean is, if there are 24 possible bar values/schemas and 8 possible baz values/schemas, I want to be able to write 1 + 24 + 8 = 33 total schemas, instead of 1 * 24 * 8 = 192 total schemas.
Also, I'd prefer to NOT break out the bar-config and baz-config into separate xml files if possible. I realize that might make all the problems much easier, as each xml file would have a single schema, but I'm trying to see if there is a good single-xml-file solution.
I finally figured this out.
First of all, in the foo schema, the bar-config and baz-config elements have a type which includes an any element, like this:
<sequence>
<any minOccurs="0" maxOccurs="1"
processContents="lax" namespace="##any" />
</sequence>
In the xml, then, you must specify the proper namespace using the xmlns attribute on the child element of bar-config or baz-config, like this:
<bar-config>
<config xmlns="http://www.example.org/bar/Alpha">
... config xml here ...
</config>
</bar-config>
Then, your XML schema file for bar Alpha will have a target namespace of http://www.example.org/bar/Alpha and will define the root element config.
If your XML file has namespace declarations and schema locations for both of the schema files, this is sufficient for the editor to do all of the validating (at least good enough for Eclipse).
So far, we have satisfied the requirement that the xml author may write the xml in such a way that it is validated in the editor.
Now, we need the consumer to be able to validate. In my case, I'm using Java.
If by some chance, you know the schema files that you will need to use to validate ahead of time, then you simply create a single Schema object and validate as usual, like this:
Schema schema = factory().newSchema(new Source[] {
new StreamSource(stream("foo.xsd")),
new StreamSource(stream("Alpha.xsd")),
new StreamSource(stream("Mercury.xsd")),
});
In this case, however, we don't know which xsd files to use until we have parsed the main document. So, the general procedure is to:
Validate the xml using only the main (foo) schema
Determine the schema to use to validate the portion of the document
Find the node that is the root of the portion to validate using a separate schema
Import that node into a brand new document
Validate the brand new document using the other schema file
Caveat: it appears that the document must be built namespace-aware in order for this to work.
Here's some code (this was ripped from various places of my code, so there might be some errors introduced by the copy-and-paste):
// Contains the filename of the xml file
String filename;
// Load the xml data using a namespace-aware builder (the method
// 'stream' simply opens an input stream on a file)
Document document;
DocumentBuilderFactory docBuilderFactory =
DocumentBuilderFactory.newInstance();
docBuilderFactory.setNamespaceAware(true);
document = docBuilderFactory.newDocumentBuilder().parse(stream(filename));
// Create the schema factory
SchemaFactory sFactory = SchemaFactory.newInstance(
XMLConstants.W3C_XML_SCHEMA_NS_URI);
// Load the main schema
Schema schema = sFactory.newSchema(
new StreamSource(stream("foo.xsd")));
// Validate using main schema
schema.newValidator().validate(new DOMSource(document));
// Get the node that is the root for the portion you want to validate
// using another schema
Node node = getSpecialNode(document);
// Build a Document from that node
Document subDocument = docBuilderFactory.newDocumentBuilder().newDocument();
subDocument.appendChild(subDocument.importNode(node, true));
// Determine the schema to use using your own logic
Schema subSchema = parseAndDetermineSchema(document);
// Validate using other schema
subSchema.newValidator().validate(new DOMSource(subDocument));
Take a look at NVDL (Namespace-based Validation Dispatching Language) - http://www.nvdl.org/
It is designed to do what you want to do (validate parts of an XML document that have their own namespaces and schemas).
There is a tutorial here - http://www.dpawson.co.uk/nvdl/ - and a Java implementation here - http://jnvdl.sourceforge.net/
Hope that helps!
Kevin
You need to define a target namespace for each separately-validated portion of the instance document. Then you define a master schema that uses <xsd:include> to reference the schema documents for these components.
The limitation with this approach is that you can't let the individual components define the schemas that should be used to validate them. But it's a bad idea in general to let a document tell you how to validate it (i.e., validation should be something that your application controls).
You can also use a "resource resolver" to allow XML authors to specify their own schema file, at least to some extent, e.g.: https://stackoverflow.com/a/41225329/32453 At the end of the day, you want a fully compliant XML file that can be validated with normal tools anyway :)
Currently our Java application uses the values held within a tab delimited *.cfg file. We need to change this application so that it now uses an XML file.
What is the best/simplest library to use in order to read in values from this file?
There are of course a lot of good solutions based on what you need. If it is just configuration, you should have a look at Jakarta commons-configuration and commons-digester.
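For instance, reading values with Commons Configuration looks roughly like this (a sketch against the 1.x API; the file name and keys are invented):

import org.apache.commons.configuration.ConfigurationException;
import org.apache.commons.configuration.XMLConfiguration;

public class ConfigExample {
    public static void main(String[] args) throws ConfigurationException {
        XMLConfiguration config = new XMLConfiguration("application-config.xml");
        // Nested elements are addressed with dotted keys
        String host = config.getString("database.host");
        int port = config.getInt("database.port", 5432);
        System.out.println(host + ":" + port);
    }
}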
You could always use the standard JDK method of getting a document:
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
[...]
File file = new File("some/path");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document document = db.parse(file);
XML Code:
<?xml version="1.0"?>
<company>
<staff id="1001">
<firstname>yong</firstname>
<lastname>mook kim</lastname>
<nickname>mkyong</nickname>
<salary>100000</salary>
</staff>
<staff id="2001">
<firstname>low</firstname>
<lastname>yin fong</lastname>
<nickname>fong fong</nickname>
<salary>200000</salary>
</staff>
</company>
Java Code:
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
public class ReadXMLFile {
public static void main(String argv[]) {
try {
File fXmlFile = new File("/Users/mkyong/staff.xml");
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(fXmlFile);
doc.getDocumentElement().normalize();
System.out.println("Root element :" + doc.getDocumentElement().getNodeName());
NodeList nList = doc.getElementsByTagName("staff");
System.out.println("----------------------------");
for (int temp = 0; temp < nList.getLength(); temp++) {
Node nNode = nList.item(temp);
System.out.println("\nCurrent Element :" + nNode.getNodeName());
if (nNode.getNodeType() == Node.ELEMENT_NODE) {
Element eElement = (Element) nNode;
System.out.println("Staff id : "
+ eElement.getAttribute("id"));
System.out.println("First Name : "
+ eElement.getElementsByTagName("firstname")
.item(0).getTextContent());
System.out.println("Last Name : "
+ eElement.getElementsByTagName("lastname")
.item(0).getTextContent());
System.out.println("Nick Name : "
+ eElement.getElementsByTagName("nickname")
.item(0).getTextContent());
System.out.println("Salary : "
+ eElement.getElementsByTagName("salary")
.item(0).getTextContent());
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
}
Output:
----------------
Root element :company
----------------------------
Current Element :staff
Staff id : 1001
First Name : yong
Last Name : mook kim
Nick Name : mkyong
Salary : 100000
Current Element :staff
Staff id : 2001
First Name : low
Last Name : yin fong
Nick Name : fong fong
Salary : 200000
I recommend reading this: Normalization in DOM parsing with java - how does it work?
Example source.
What is the best/simplest library to use in order to read in values from this file?
As you're asking for the simplest library, I feel obliged to add an approach quite different to that in Guillaume's top-voted answer. (Of the other answers, sjbotha's JDOM mention is closest to what I suggest).
I've come to think that for XML handling in Java, using the standard JDK tools is certainly not the simplest way, and that it is the best way only in some circumstances (such as not being able to use 3rd-party libraries for some reason).
Instead, consider using a good XML library, such as XOM. Here's how to read an XML file into a nu.xom.Document object:
import nu.xom.Builder;
import nu.xom.Document;
import java.io.File;
[...]
File file = new File("some/path");
Document document = new Builder().build(file);
So, this was just a little bit simpler, as reading the file into org.w3c.dom.Document wasn't very complicated either, in the "pure JDK" approach. But the advantages of using a good library only start here! Whatever you're doing with your XML, you'll often get away with much simpler solutions, and less of your own code to maintain, when using a library like XOM. As examples, consider this vs. this, or this vs. this, or this post containing both XOM and W3C DOM examples.
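Continuing the sketch above, pulling values back out with XOM is similarly compact (element names here are just placeholders for illustration):

import java.io.File;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Elements;

public class XomExample {
    public static void main(String[] args) throws Exception {
        Document document = new Builder().build(new File("some/path"));
        Element root = document.getRootElement();
        // Iterate over all <entry> children and print the text of each <name> child
        Elements entries = root.getChildElements("entry");
        for (int i = 0; i < entries.size(); i++) {
            System.out.println(entries.get(i).getFirstChildElement("name").getValue());
        }
    }
}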
Others will provide counter-arguments (like these) for why sticking to Java's standard XML APIs may be worth it - these probably have merit, at least in some cases, although personally I don't subscribe to all of them. In any case, when choosing one way or the other, it's good to be aware of both sides of the story.
(This answer is part of my evaluation of XOM, which is a strong contender in my quest for finding the best Java XML library to replace dom4j.)
Is there a particular reason you have chosen XML config files? I have done XML configs in the past, and they have often turned out to be more of a headache than anything else.
I guess the real question is whether using something like the Preferences API might work better in your situation.
Reasons to use the Preferences API over a roll-your-own XML solution:
Avoids typical XML ugliness (DocumentFactory, etc), along with avoiding 3rd party libraries to provide the XML backend
Built in support for default values (no special handling required for missing/corrupt/invalid entries)
No need to sanitize values for XML storage (CDATA wrapping, etc)
Guaranteed status of the backing store (no need to constantly write XML out to disk)
Backing store is configurable (file on disk, LDAP, etc.)
Multi-threaded access to all preferences for free
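A minimal sketch of the Preferences approach (key names are invented for illustration):

import java.util.prefs.Preferences;

public class PreferencesExample {
    public static void main(String[] args) {
        // Preferences are stored per-user in a platform-specific backing store
        Preferences prefs = Preferences.userNodeForPackage(PreferencesExample.class);
        prefs.put("server.host", "localhost");
        prefs.putInt("server.port", 8080);
        // Defaults are built in: the second argument is returned if the key is absent
        String host = prefs.get("server.host", "localhost");
        int port = prefs.getInt("server.port", 8080);
        System.out.println(host + ":" + port);
        // Preferences can also be exported to / imported from XML
        // via exportSubtree(OutputStream) and Preferences.importPreferences(InputStream)
    }
}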
JAXB is simple to use and is included in Java 6 SE. With JAXB, or another XML data-binding library such as Simple, you don't have to handle the XML yourself; most of the work is done by the library. The basic usage is to add annotations to your existing POJOs. These annotations are then used to generate an XML Schema for your data and are also used when reading/writing your data from/to a file.
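A rough JAXB sketch (class, field, and file names are invented; assumes Java 6+ where JAXB is bundled):

import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "config")
class AppConfig {
    @XmlElement
    public String host;
    @XmlElement
    public int port;
}

public class JaxbExample {
    public static void main(String[] args) throws Exception {
        // Unmarshal <config><host>...</host><port>...</port></config> into the POJO
        JAXBContext context = JAXBContext.newInstance(AppConfig.class);
        AppConfig config = (AppConfig) context.createUnmarshaller()
                .unmarshal(new File("config.xml"));
        System.out.println(config.host + ":" + config.port);
    }
}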
I've only used JDOM. It's pretty easy.
Go here for documentation and to download it: http://www.jdom.org/
If you have a very very large document then it's better not to read it all into memory, but use a SAX parser which calls your methods as it hits certain tags and attributes. You have to then create a state machine to deal with the incoming calls.
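For the simple (non-SAX) case, a minimal JDOM read looks something like this (a sketch using the JDOM 2 package names; file and element names are invented):

import java.io.File;
import org.jdom2.Document;
import org.jdom2.Element;
import org.jdom2.input.SAXBuilder;

public class JdomExample {
    public static void main(String[] args) throws Exception {
        // Parse the whole file into memory
        Document doc = new SAXBuilder().build(new File("config.xml"));
        Element root = doc.getRootElement();
        // Child elements are plain getters, no NodeList casting needed
        for (Element entry : root.getChildren("entry")) {
            System.out.println(entry.getChildText("name"));
        }
    }
}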
Look into JAXB.
The simplest by far will be Simple (http://simple.sourceforge.net); you only need to annotate your objects like so:
@Root
public class Entry {

    @Attribute
    private String a;

    @Attribute
    private int b;

    @Element
    private Date c;

    public String getSomething() {
        return a;
    }
}

@Root
public class Configuration {

    @ElementList(inline=true)
    private List<Entry> entries;

    public List<Entry> getEntries() {
        return entries;
    }
}
Then all you have to do to read the whole file is specify the location, and it will parse and populate the annotated POJOs. This will do all the type conversions and validation. You can also annotate for persister callbacks if required. Reading can be done like so:
Serializer serializer = new Persister();
Configuration configuration = serializer.read(Configuration.class, fileLocation);
Depending on your application and the scope of the cfg file, a properties file might be the easiest option. Sure, it isn't as elegant as XML, but it's certainly easier.
Use java.beans.XMLDecoder, part of core Java SE since 1.4.
XMLDecoder input = new XMLDecoder(new FileInputStream("some/path.xml"));
MyConfig config = (MyConfig) input.readObject();
input.close();
It's easy to write the configuration files by hand, or use the corresponding XMLEncoder with some setup to write new objects at run-time.
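For completeness, the writing side might look roughly like this (a sketch; it assumes MyConfig is a standard JavaBean with a public no-arg constructor and getters/setters):

import java.beans.XMLEncoder;
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;

MyConfig config = new MyConfig();   // assumed bean; populate it as needed
XMLEncoder output = new XMLEncoder(
        new BufferedOutputStream(new FileOutputStream("some/path.xml")));
output.writeObject(config);
output.close();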
This is what I use. http://marketmovers.blogspot.com/2014/02/the-easy-way-to-read-xml-in-java.html It sits on top of the standard JDK tools, so if it's missing some feature you can always use the JDK version.
This really makes things easier for me. It's especially nice when I'm reading a config file that was saved by an older version of the software, or was manually edited by a user. It's very robust and won't throw an exception if some data is not exactly in the format you expect.
Here's a really simple API that I created for reading simple XML files in Java. It's incredibly simple and easy to use. Hope it's useful for you.
http://argonrain.wordpress.com/2009/10/27/000/