I am writing a Java program that will create an XML file from various pieces of data.
There are attribute strings containing URLs that are reused throughout the XML. I wanted to reuse these attributes (i.e., copy them from one element to another). I currently have something like this:
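import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
public class CopyAttributes {
    public static final String GOOGLE_URL = "http://www.google.com";
    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docFactory.newDocumentBuilder();
        Document doc = docBuilder.newDocument();
        Attr googleAttr = doc.createAttribute("ref:GoogleMainSite");
        googleAttr.setValue(GOOGLE_URL);
        Element rootElement = doc.createElement("Root_Element");
        rootElement.setAttributeNode(googleAttr);
        doc.appendChild(rootElement);
    }
}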
...this is fine so far if I don't have any other elements.
Now, I want to have multiple elements containing that same Google URL attribute node. I know that's redundant, but I'm following an XSD that specifically says attributes have to be reused. I know you can't just do this:
Element childElement = doc.createElement("Child_Element");
childElement.setAttributeNode(googleAttr);
rootElement.appendChild(childElement);
...because I know you will get an INUSE_ATTRIBUTE_ERR (I tried, that's how I know). But I want to reuse this attribute, as it will occur many times throughout the XML.
I did find some sample code for copying attributes from one element to another, but when I included that sample "Utils" class in my package and called it this way:
Utils utils = new Utils();
utils.copyAttributes(rootElement, childElement);
...I receive a "NAMESPACE_ERR: An attempt is made to create or change an object in a way which is incorrect with regard to namespaces."
There isn't much information out there about this NAMESPACE_ERR message.
The other solution I found is to simply clone an element. But this doesn't solve my problem either, since in some cases I won't want to reuse all of the attributes of another element; I will only want to use a couple of them.
Basically my question is: How do you go about reusing attribute nodes on multiple elements in an XML document created via a Java program?
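You can try this. It attaches a clone of each attribute node rather than the node itself, because a single Attr node can belong to only one element at a time: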
public void copyAttributes(Element from, Element to)
{
NamedNodeMap attributes = from.getAttributes();
for (int i = 0; i < attributes.getLength(); i++)
{
Attr node = (Attr) attributes.item(i);
to.setAttributeNode((Attr) node.cloneNode(false));
}
}
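Since the question also asks for copying only a couple of attributes, here is a minimal sketch of that variant using the same clone-before-attach idea. It assumes the attributes are picked by their qualified names; the class name, the helper copySelectedAttributes and the "other" attribute are hypothetical, chosen only for illustration. cloneNode(true) is used so the copy carries its value and is not owned by any element, which is what avoids INUSE_ATTRIBUTE_ERR.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class SelectiveAttributeCopy {

    /** Copies only the named attributes from one element to another (hypothetical helper). */
    public static void copySelectedAttributes(Element from, Element to, String... names) {
        for (String name : names) {
            Attr attr = from.getAttributeNode(name);
            if (attr != null) {
                // A fresh clone is not owned by any element, so setAttributeNode accepts it
                to.setAttributeNode((Attr) attr.cloneNode(true));
            }
        }
    }

    /** Builds a small document where Child_Element receives only ref:GoogleMainSite. */
    public static Document buildDemo() {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("Root_Element");
        doc.appendChild(root);
        root.setAttribute("ref:GoogleMainSite", "http://www.google.com");
        root.setAttribute("other", "not copied"); // hypothetical attribute that stays on the root

        Element child = doc.createElement("Child_Element");
        root.appendChild(child);
        copySelectedAttributes(root, child, "ref:GoogleMainSite");
        return doc;
    }

    public static void main(String[] args) {
        Element child = (Element) buildDemo().getDocumentElement().getFirstChild();
        System.out.println(child.getAttribute("ref:GoogleMainSite")); // http://www.google.com
        System.out.println(child.hasAttribute("other"));              // false
    }
}
```

The same helper can be called once per target element, so the URL attribute can appear on as many elements as the XSD requires.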
Application Background:
Basically, I am building an application that parses an XML document with a SAX parser. For every incoming tag I would like to know its datatype and other information, so I use the XSD associated with that XML file. Hence, I parse the XSD file and store all the information in a HashMap, so that whenever a tag comes in I can pass the tag name as the key and obtain the information collected for it during XSD parsing.
Problem I am facing:
As of now, I am able to parse my XSD using DocumentBuilderFactory. But when collecting elements, I can only get one type of element into my NodeList, such as the elements with tag name "xs:element". My XSD also has other element types, such as "xs:complexType", "xs:any", etc. I would like to read all of them and store them in a single NodeList, which I can later loop over and push into the HashMap. However, I am unable to add any additional elements to my NodeList after filling it with one type.
The code below collects the tags named xs:element:
NodeList list = doc.getElementsByTagName("xs:element");
How can I add the tags with xs:complexType and xs:any to the same NODELIST?
Is this a good way to find the datatype and other attributes from the XSD, or is there a better approach? Since I may need to hit the HashMap many times, once for every tag in the XML, will there be a performance issue?
Is DocumentBuilderFactory a good approach to parse XML, or are there better libraries for XSD parsing? I looked into Xerces2 but could not find any good example, got stuck, and posted the question here.
Following is my code for parsing the XSD using DocumentBuilderFactory:
import java.io.File;
import java.io.IOException;
import java.net.URISyntaxException;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class DOMParser {
    private static Map<String, Element> xmlTags = new HashMap<String, Element>();
    public static void main(String[] args) throws URISyntaxException, SAXException, IOException, ParserConfigurationException {
        // Resolve test.xsd from the classpath
        File xsdFile = Paths.get(DOMParser.class.getClassLoader().getResource("test.xsd").toURI()).toFile();
        DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
        Document doc = docBuilder.parse(xsdFile);
        NodeList list = doc.getElementsByTagName("xs:element");
        System.out.println(list.getLength());
        // How to add the xs:complexType to same list as above
        // list.add(doc.getElementsByTagName("xs:complexType"));
        // list = doc.getElementsByTagName("xs:complexType");
        // Loop and add data to Map for future lookups
        for (int i = 0; i < list.getLength(); i++) {
            Element element = (Element) list.item(i);
            if (element.hasAttribute("name")) {
                xmlTags.put(element.getAttribute("name"), element);
            }
        }
    }
}
I don't know what you are trying to achieve (you have described the code you are writing, not the problem it is designed to solve) but what you are doing seems misguided. Trying to get useful information out of an XSD schema by parsing it at the XML level is really hard work, and it's clear from the questions you are asking that you haven't appreciated the complexities of what you are attempting.
It's hard to advise you on the low-level detail of maintaining hash maps and node lists when we don't understand what you are trying to achieve. What information are you trying to extract from the schema, and why?
There are a number of ways of getting information out of a schema at a higher level. Xerces has a Java API for accessing a compiled schema. Saxon has an XML representation of compiled schemas called SCM (the difference from raw XSD is that all the work of expanding xs:include and xs:import, expanding attribute groups, model groups, and substitution groups, etc., has been done for you). Saxon also has an XPath API (a set of extension functions) for accessing compiled schema information.
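On the narrower question of putting xs:element, xs:complexType and xs:any into one collection: the NodeList returned by getElementsByTagName is a read-only view with no add method, so the usual workaround is to copy the matches into a java.util.List. A minimal sketch, assuming a namespace-aware parse and the standard XML Schema namespace; the class and method names are hypothetical. As the answer above points out, this still only sees the raw XSD syntax (no resolved includes, imports or types).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class XsdComponentCollector {

    /** Returns all XSD elements whose local name is one of localNames, in document order. */
    public static List<Element> collect(Document xsd, String... localNames) {
        List<String> wanted = Arrays.asList(localNames);
        List<Element> result = new ArrayList<>();
        // "*" matches every element in the XML Schema namespace, whatever prefix is used
        NodeList all = xsd.getElementsByTagNameNS(XMLConstants.W3C_XML_SCHEMA_NS_URI, "*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (wanted.contains(e.getLocalName())) {
                result.add(e);
            }
        }
        return result;
    }

    /** Parses a schema from a string with namespace awareness switched on. */
    public static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for getLocalName() and the *NS methods
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse schema", e);
        }
    }

    public static void main(String[] args) {
        String xsd = "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
                + "<xs:element name='person'><xs:complexType><xs:sequence>"
                + "<xs:element name='age' type='xs:int'/><xs:any/>"
                + "</xs:sequence></xs:complexType></xs:element></xs:schema>";
        for (Element e : collect(parse(xsd), "element", "complexType", "any")) {
            System.out.println(e.getLocalName() + " " + e.getAttribute("name"));
        }
    }
}
```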
Is there any way to take an XML file with an unknown schema and convert its values, tags, etc. into POJOs? I looked at JAXB, and it seems the best way to use it is when you already know the XML schema. So basically, I want to be able to parse the tags from the XML and put each into its own object, maybe using an ArrayList. Did I just not fully understand what JAXB can do, or is there a better tool for this, or is it just too hard to implement?
In the hope that this helps you to understand your situation!
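import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
public static void dumpAllNodes( String path ) throws Exception {
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    // getElementsByTagNameNS is only well-defined on a namespace-aware DOM
    factory.setNamespaceAware( true );
    DocumentBuilder parser = factory.newDocumentBuilder();
    Document doc = parser.parse(new File(path));
    // "*", "*" matches every element in any namespace, in document order
    NodeList nodes = doc.getElementsByTagNameNS( "*", "*" );
    for( int i = 0; i < nodes.getLength(); i++ ){
        Node node = nodes.item( i );
        System.out.println( node.getNodeType() + " " + node.getNodeName() );
    }
}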
The NodeList nodes contains all element nodes, in document order (the order of their opening tags). Thus, elements nested inside other elements appear in the same flat list as their ancestors. To obtain the attributes of a node, call
NamedNodeMap map = node.getAttributes();
The text content of a node is available by
String text = node.getTextContent();
but be aware that this returns the concatenated text of every text node in the element's subtree, not just the element's own text.
OTOH, you may call
Element root = doc.getDocumentElement();
to obtain the root element and then descend the tree recursively by calling getChildNodes() and processing the child nodes (Element, Text, Comment, ...) one by one; attributes are not child nodes and come from getAttributes(). Also, note that Node.getParentNode() returns the parent node, so you could construct the XPath for each node even from the flat list by calling it repeatedly until you reach the root.
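The parent-walking idea can be sketched like this. It is a minimal illustration, assuming a simplified path without positional predicates (so repeated siblings such as two <phone> elements share one path); the class and method names are hypothetical, and it uses getElementsByTagName("*") for the flat list of all elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ElementPaths {

    /** Builds "/a/b/c" for an element by following getParentNode() up to the document. */
    public static String pathOf(Node element) {
        StringBuilder path = new StringBuilder();
        for (Node n = element; n != null && n.getNodeType() == Node.ELEMENT_NODE; n = n.getParentNode()) {
            path.insert(0, "/" + n.getNodeName());
        }
        return path.toString();
    }

    /** Returns the simplified path of every element, in document order. */
    public static List<String> allPaths(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList nodes = doc.getElementsByTagName("*"); // flat list of all elements
            List<String> paths = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                paths.add(pathOf(nodes.item(i)));
            }
            return paths;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<addresses><address><name>Joe</name></address><person><age>44</age></person></addresses>";
        allPaths(xml).forEach(System.out::println);
        // /addresses
        // /addresses/address
        // /addresses/address/name
        // /addresses/person
        // /addresses/person/age
    }
}
```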
It simply depends what you expect from the resulting data structure (what you call ArrayList). If, for instance, you create a generic element type (MyElement) containing one map for attributes and another one for child elements, the second map would have to be
Map<String,List<MyElement>> name2elements
to provide for repeated elements - which makes access to elements occurring only once a little awkward.
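A minimal sketch of such a generic element type, assuming element names, attributes and direct text are all you need (namespaces, comments and mixed-content ordering are ignored). MyElement follows the name used above; fromDom and parse are hypothetical helpers. Note how even a child that occurs only once has to be reached through get(name).get(0), which is the awkwardness just mentioned.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class MyElement {
    public final String name;
    public final Map<String, String> attributes = new LinkedHashMap<>();
    public final Map<String, List<MyElement>> name2elements = new LinkedHashMap<>();
    public String text = "";

    private MyElement(String name) {
        this.name = name;
    }

    /** Recursively converts a DOM element into the generic structure. */
    public static MyElement fromDom(Element e) {
        MyElement result = new MyElement(e.getTagName());
        NamedNodeMap attrs = e.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node a = attrs.item(i);
            result.attributes.put(a.getNodeName(), a.getNodeValue());
        }
        StringBuilder ownText = new StringBuilder();
        NodeList children = e.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                // Repeated names end up in the same list
                result.name2elements
                        .computeIfAbsent(child.getNodeName(), k -> new ArrayList<>())
                        .add(fromDom((Element) child));
            } else if (child.getNodeType() == Node.TEXT_NODE) {
                ownText.append(child.getNodeValue()); // direct text only, not descendants
            }
        }
        result.text = ownText.toString().trim();
        return result;
    }

    /** Parses an XML string and converts its root element. */
    public static MyElement parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            return fromDom(root);
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        MyElement root = parse("<person id='7'><name>Joe</name><phone>1</phone><phone>2</phone></person>");
        System.out.println(root.name2elements.get("name").get(0).text); // Joe
        System.out.println(root.name2elements.get("phone").size());     // 2
        System.out.println(root.attributes.get("id"));                   // 7
    }
}
```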
I hope that I have illustrated the problems of generic XML parsing, which is not a task where JAXB can help you.
How can I get all XML branches using Java?
For example, if I have the following XML:
<?xml version="1.0" encoding="UTF-8"?>
<addresses xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation='test.xsd'>
<address>
<name>Joe Tester</name>
<street>Baker street 5</street>
</address>
<person>
<name>Joe Tester</name>
<age>44</age>
</person>
</addresses>
I want to obtain the following branches:
addresses
addresses_address
addresses_address_name
addresses_address_street
addresses_person
addresses_person_name
addresses_person_age
Thanks.
You can get the XML root, child and sub-node names easily using a template engine such as Velocity or FreeMarker. FreeMarker has powerful facilities for XML processing: you can drop XML documents into the data model, and templates can pull data from them in a variety of ways, such as with XPath expressions. FreeMarker can thus be used as an XML transformation tool, comparable to the much better-known XSLT stylesheet approach promulgated by the World Wide Web Consortium (W3C).
FreeMarker's XPath support relies on an external XPath engine, such as Jaxen.
FreeMarker will use Xalan, unless you choose Jaxen by calling freemarker.ext.dom.NodeModel.useJaxenXPathSupport() from Java.
You need just one template, which generates the branch names for whatever XML you put into the data model at run time; FreeMarker processes the template and generates the branches corresponding to that XML's structure. If your XML structure changes, there is no need to change your Java code, and if you want to change the output, the change goes into the template file, so the Java code does not need to be recompiled. Note that the template below walks three levels deep (the root, its children and their children); deeper documents need additional nested #list directives.
Just change the template and get the changes on the fly.
FTL file (one template for multiple XML documents, producing the branch names):
<#list doc ['/*' ] as rootNode>
<#assign rootNodeValue="${rootNode?node_name}">
${rootNodeValue}
<#list doc ['/*/*' ] as childNodes>
<#if childNodes?is_node==true>
${rootNodeValue}-${childNodes?node_name}
<#list doc ['/*/${childNodes?node_name}/*' ] as subNodes>
${rootNodeValue}-${childNodes?node_name}-${subNodes?node_name}
</#list>
</#if>
</#list>
</#list>
XMLTest.java, which processes the template:
import java.io.IOException;
import java.io.InputStream;
import java.io.StringWriter;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import freemarker.ext.dom.NodeModel;
import freemarker.template.Configuration;
import freemarker.template.ObjectWrapper;
import freemarker.template.Template;
import freemarker.template.TemplateException;
public class XMLTest {
    // Hypothetical resource name: point this at your own XML file on the classpath
    private static final String XML_PATH = "addresses.xml";
    public static void main(String[] args) throws SAXException, IOException,
            ParserConfigurationException, TemplateException {
        Configuration config = new Configuration();
        // test.ftl is loaded relative to the package of XMLTest
        config.setClassForTemplateLoading(XMLTest.class, "");
        config.setObjectWrapper(ObjectWrapper.BEANS_WRAPPER);
        Map<String, Object> dataModel = new HashMap<String, Object>();
        // Load the XML from the classpath
        InputStream stream = XMLTest.class.getClassLoader().getResourceAsStream(XML_PATH);
        // If you already have the XML as a string, pass new StringReader(xml) to the InputSource constructor instead
        InputSource source = new InputSource(stream);
        NodeModel xmlNodeModel = NodeModel.parse(source);
        dataModel.put("doc", xmlNodeModel);
        Template template = config.getTemplate("test.ftl");
        StringWriter out = new StringWriter();
        template.process(dataModel, out);
        System.out.println(out.toString());
    }
}
Final output:
addresses
addresses-address
addresses-address-name
addresses-address-street
addresses-person
addresses-person-name
addresses-person-age
See the XML processing guide (the XML node model) in the FreeMarker manual for details. FreeMarker and Jaxen can both be downloaded from their project websites.
There are many ways that you can extract data from XML and use it in Java. The one you choose will depend on how you want to use the data.
Some scenarios are:
You might want to manipulate nodes, order, remove and add others and transform the XML.
You might just want to read (and possibly change) the text contained in elements and attributes.
You might have a very large file and you just want to find some particular data and ignore the rest of the file.
For scenario #3, the best option is a memory-efficient stream-based parser, such as SAX or an XML reader using the StAX API.
You can also use that for scenario #2, if you do mostly reading (and not writing), but DOM-based APIs might be easier to work with. You can use the standard DOM org.w3c.dom API or a more Java-like API such as JDOM or DOM4J. If you wish to synchronize XML files with Java objects you also might want to use a full Java-XML mapping framework such as JAXB.
DOM APIs are also great for scenario #1, but in many cases it might be simpler to use XSLT (via the javax.xml.transform TrAX API in Java). If you use DOM you can also use XPath to select the nodes.
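For scenario #3, a minimal StAX sketch. The task is an assumption for illustration: collect the text of every element with a given name while streaming, without building a tree. The class and method names are hypothetical, and getElementText() requires the matched elements to contain text only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxNameFinder {

    /** Streams through the document and returns the text of each element named elementName. */
    public static List<String> textOf(String xml, String elementName) {
        List<String> found = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && elementName.equals(reader.getLocalName())) {
                        // Reads up to the matching end tag; assumes text-only content
                        found.add(reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("could not read XML", e);
        }
        return found;
    }

    public static void main(String[] args) {
        String xml = "<addresses><address><name>Joe Tester</name><street>Baker street 5</street></address>"
                + "<person><name>Joe Tester</name><age>44</age></person></addresses>";
        System.out.println(textOf(xml, "name")); // [Joe Tester, Joe Tester]
    }
}
```

Only the current event is held by the reader, so memory use does not grow with file size; for a real file, pass a FileReader or InputStream instead of the StringReader.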
I will show you an example on how to extract the individual nodes of your file using the standard DOM API (org.w3c.dom) and also using XPath (javax.xml.xpath).
1. Setup
Initialize the parser:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Parse file into a Document Object Model:
Document source = builder.parse(new File("src/main/resources/addresses.xml"));
2. Selecting nodes with J2SE DOM
You get the root element using getDocumentElement():
Element addresses = source.getDocumentElement();
From there you can get the child nodes using getChildNodes(), but that returns all child nodes, including text nodes (the whitespace between elements). addresses.getChildNodes().item(0) returns the whitespace after the <addresses> tag and before the <address> tag, so to get the element you would have to take the second item. An easier way is to use getElementsByTagName, which returns a NodeList of all matching descendants, and take the first item:
Element addresses_address = (Element)addresses.getElementsByTagName("address").item(0);
Many of the DOM methods return org.w3c.dom.Node objects, which you have to cast. Sometimes they might not be Element objects so you have to check. Node sets are not automatically converted into arrays. They are org.w3c.dom.NodeList so you have to use .item(0) and not [0] (if you use other DOM APIs such as JDOM or DOM4J, it will seem more intuitive).
You could use addresses.getElementsByTagName to get all the elements you need, but you would have to deal with the context for the two <name> elements. So a better way is to call it in the appropriate context:
Element addresses_address = (Element)addresses.getElementsByTagName("address").item(0);
Element addresses_address_name = (Element)addresses_address.getElementsByTagName("name").item(0);
Element addresses_address_street = (Element)addresses_address.getElementsByTagName("street").item(0);
Element addresses_person = (Element)addresses.getElementsByTagName("person").item(0);
Element addresses_person_name = (Element)addresses_person.getElementsByTagName("name").item(0);
Element addresses_person_age = (Element)addresses_person.getElementsByTagName("age").item(0);
That will give you all the Element nodes (or branches, as you called them) for your file. If you want the text nodes (as actual Node objects), get each one as the element's first child:
Node textNode = addresses_address_street.getFirstChild();
And if you want the String contents you can use:
String street = addresses_address_street.getTextContent();
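The lookups above hard-code the element names. For the branch list asked for in the question, here is a recursive sketch that works for any input: it visits element children only (skipping the whitespace text nodes), joins names with "_", and collects them in a LinkedHashSet so repeated sibling names appear once, in document order. The class and method names are hypothetical, and the XML is passed as a string rather than read from src/main/resources/addresses.xml to keep the example self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class XmlBranches {

    /** Adds prefix_name for this element, then recurses into its element children. */
    static void collect(Element element, String prefix, Set<String> branches) {
        String branch = prefix.isEmpty() ? element.getTagName() : prefix + "_" + element.getTagName();
        branches.add(branch);
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip the whitespace text nodes between elements
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                collect((Element) child, branch, branches);
            }
        }
    }

    /** Returns every distinct branch name of the document, in document order. */
    public static Set<String> branches(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Set<String> result = new LinkedHashSet<>(); // keeps order, drops repeats
            collect(root, "", result);
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<addresses>\n <address>\n  <name>Joe Tester</name>\n  <street>Baker street 5</street>\n </address>\n"
                + " <person>\n  <name>Joe Tester</name>\n  <age>44</age>\n </person>\n</addresses>";
        branches(xml).forEach(System.out::println);
    }
}
```

Run on the XML from the question, this prints the seven branches listed there, from addresses to addresses_person_age.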
3. Selecting nodes with XPath
Another way to select nodes is using XPath. You will need the DOM source and you also need to initialize the XPath processor:
XPath xPath = XPathFactory.newInstance().newXPath();
You can extract the root node like this:
Element addresses = (Element)xPath.evaluate("/addresses", source, XPathConstants.NODE);
And all the other nodes using a path-like syntax:
Element addresses_address = (Element)xPath.evaluate("/addresses/address", source, XPathConstants.NODE);
Element addresses_address_name = (Element)xPath.evaluate("/addresses/address/name", source, XPathConstants.NODE);
Element addresses_address_street = (Element)xPath.evaluate("/addresses/address/street", source, XPathConstants.NODE);
You can also use relative paths, choosing a different element as the root:
Element addresses_person = (Element)xPath.evaluate("person", addresses, XPathConstants.NODE);
Element addresses_person_name = (Element)xPath.evaluate("person/name", addresses, XPathConstants.NODE);
Element addresses_person_age = (Element)xPath.evaluate("age", addresses_person, XPathConstants.NODE);
You can get the text contents as before, since you have Element objects:
String addressName = addresses_address_name.getTextContent();
But you can also do it directly using the same methods above without the last argument (which defaults to string). Here I'm using different relative and absolute XPath expressions:
String addressName = xPath.evaluate("name", addresses_address);
String addressStreet = xPath.evaluate("address/street", addresses);
String personName = xPath.evaluate("name", addresses_person);
String personAge = xPath.evaluate("/addresses/person/age", source);
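The XPath snippets above can be assembled into one runnable sketch. It parses the document from an inline string instead of src/main/resources/addresses.xml (an assumption for self-containment), wraps the checked exceptions, and also shows XPathConstants.NODESET, which returns every match at once (here, both <name> elements). The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class AddressXPath {
    private final Document source;
    private final XPath xPath = XPathFactory.newInstance().newXPath();

    public AddressXPath(String xml) {
        try {
            source = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Two-argument evaluate(): the string value of the first match. */
    public String text(String expression) {
        try {
            return xPath.evaluate(expression, source);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expression, e);
        }
    }

    /** NODESET: every node the expression selects. */
    public NodeList nodes(String expression) {
        try {
            return (NodeList) xPath.evaluate(expression, source, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(expression, e);
        }
    }

    public static void main(String[] args) {
        AddressXPath q = new AddressXPath(
                "<addresses><address><name>Joe Tester</name><street>Baker street 5</street></address>"
                + "<person><name>Joe Tester</name><age>44</age></person></addresses>");
        System.out.println(q.text("/addresses/person/age"));             // 44
        System.out.println(q.nodes("/addresses/*/name").getLength());    // 2
    }
}
```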
My XML file looks like this:
<Messages>
<Contact Name="Robin" Number="8775454554">
<Message Date="24 Jan 2012" Time="04:04">this is report1</Message>
</Contact>
<Contact Name="Tobin" Number="546456456">
<Message Date="24 Jan 2012" Time="04:04">this is report2</Message>
</Contact>
</Messages>
I need to check whether the 'Number' attribute of Contact element is equal to 'somenumber' and if it is, I'm required to insert one more Message element inside Contact element.
How can it be achieved using DOM? And what are the drawbacks of using DOM?
The main drawback of using a DOM is that the whole model has to be loaded into memory at once, whereas if you simply parse the document (for example with SAX or StAX), you can limit how much data you keep in memory at any one point. This of course isn't really an issue until you're processing very large XML documents.
As for the processing side of things, something like the following should work:
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document dom = db.parse(is);
NodeList contacts = dom.getElementsByTagName("Contact");
for(int i = 0; i < contacts.getLength(); i++) {
Element contact = (Element) contacts.item(i);
String contactNumber = contact.getAttribute("Number");
if(contactNumber.equals(somenumber)) {
Element newMessage = dom.createElement("Message");
// Configure the message element
contact.appendChild(newMessage);
}
}
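The loop above creates the element but leaves it empty and does not save the result. A fuller sketch, assuming the new Message should get Date and Time attributes and text like its siblings, and that the modified tree is written back out with the standard javax.xml.transform (TrAX) API. The class name, the addMessage helper and the sample values are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class AddMessage {

    /** Appends a Message to every Contact whose Number matches, then serializes the document. */
    public static String addMessage(String xml, String number, String date, String time, String text) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList contacts = dom.getElementsByTagName("Contact");
            for (int i = 0; i < contacts.getLength(); i++) {
                Element contact = (Element) contacts.item(i);
                if (contact.getAttribute("Number").equals(number)) {
                    Element message = dom.createElement("Message");
                    message.setAttribute("Date", date);
                    message.setAttribute("Time", time);
                    message.setTextContent(text);
                    contact.appendChild(message);
                }
            }
            // Serialize the modified tree; use new StreamResult(file) to write to a file instead
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(dom), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not update XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Messages><Contact Name=\"Robin\" Number=\"8775454554\">"
                + "<Message Date=\"24 Jan 2012\" Time=\"04:04\">this is report1</Message></Contact></Messages>";
        System.out.println(addMessage(xml, "8775454554", "25 Jan 2012", "23:44", "this is report3"));
    }
}
```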
DOM has two main disadvantages:
It requires reading of the complete XML into a Java representation in memory. That can be both time and memory consuming
It is a pretty verbose API, so you need to write a lot of code to achieve simple things like you're asking for.
If time and memory consumption is OK for you, but verbosity is not, you could still use jOOX, a library that I have created to wrap standard Java DOM objects to simplify manipulation of XML. These are some examples of how you would implement your requirement with jOOX:
// Requires: import static org.joox.JOOX.$;
// With css-style selectors
String result1 = $(file).find("Contact[Number=somenumber]").append(
$("<Message Date=\"25 Jan 2012\" Time=\"23:44\">this is report2</Message>")
).toString();
// With XPath
String result2 = $(file).find("//Contact[@Number = 'somenumber']").append(
$("<Message Date=\"25 Jan 2012\" Time=\"23:44\">this is report2</Message>")
).toString();
// Instead of file, you can also provide your source XML in various other forms
Note that jOOX only wraps standard Java DOM: the underlying operations (find(), append() and $()) actually perform ordinary DOM operations.
You will do something to this effect.
Get the NodeList of Contact elements.
Iterate through the NodeList and get each Contact element.
Get Number through contact.getAttribute("Number"), where contact is of type Element.
If your number equals someNumber, add a Message by calling contact.appendChild(). Message must be an element.
Use the Document to create a new element:
Element message = doc.createElement("Message");
message.setAttribute("message", strMessage);
Now add this element after whatever element you want using
elem.getParentNode().insertBefore(message, elem.getNextSibling());
You might also want to work through a tutorial on modifying XML with DOM; that covers exactly what you want to do.
I've seen numerous examples of how to read XML files in Java, but they only show simple XML files; for example, they show how to extract first and last names from an XML file. However, I need to extract data from a COLLADA XML file, like this:
<library_visual_scenes>
<visual_scene id="ID1">
<node name="SketchUp">
<instance_geometry url="#ID2">
<bind_material>
<technique_common>
<instance_material symbol="Material2" target="#ID3">
<bind_vertex_input semantic="UVSET0" input_semantic="TEXCOORD" input_set="0" />
</instance_material>
</technique_common>
</bind_material>
</instance_geometry>
</node>
</visual_scene>
</library_visual_scenes>
This is only a small part of a COLLADA file. Here I need to extract the id of visual_scene, then the url of instance_geometry, and finally the target of instance_material. Of course I need to extract much more, but I don't really understand how to do it, and this is a place to start.
I have this code so far:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = null;
try {
builder = factory.newDocumentBuilder();
}
catch( ParserConfigurationException error ) {
Log.e( "Collada", error.getMessage() ); return;
}
Document document = null;
try {
document = builder.parse( string );
}
catch( IOException error ) {
Log.e( "Collada", error.getMessage() ); return;
}
catch( SAXException error ) {
Log.e( "Collada", error.getMessage() ); return;
}
NodeList library_visual_scenes = document.getElementsByTagName( "library_visual_scenes" );
It seems like most examples on the web are similar to this one: http://www.easywayserver.com/blog/java-how-to-read-xml-file/
I need help figuring out what to do when I want to extract deeper tags or find a good tutorial on reading/parsing XML files.
Really, your parsing per se is already done when you call builder.parse(string). What you need to know now is how to select/query information from the parsed XML document.
I would agree with @khachik regarding how to do that. Elaborating a little (since no one else has posted an answer):
XPath is the most convenient way to extract information, and if your input document is not huge, XPath is fast enough. An introductory tutorial on XPath in Java is a good place to start. XPath is also recommended if you need random access to the XML data (i.e. if you have to go back and forth extracting data from the tree in a different order than it appears in the source document), since SAX is designed for linear access.
Some sample XPath expressions:
extract the id of visual_scene: /*/visual_scene/@id
the url of instance_geometry: /*/visual_scene/node/instance_geometry/@url
the url of instance_geometry for the node whose name is SketchUp: /*/visual_scene/node[@name = 'SketchUp']/instance_geometry/@url
the target of instance_material: /*/visual_scene/node/instance_geometry/bind_material/technique_common/instance_material/@target
Since COLLADA models can be really large, you might need to do a SAX-based filter, which will allow you to process the document in stream mode without having to keep it all in memory at once. But if your existing code to parse the XML is already performing well enough, you may not need SAX. SAX is more complicated to use for extracting specific data than XPath.
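The expressions above can be tried with the standard javax.xml.xpath API. A minimal sketch, assuming the fragment from the question as an inline string; the class and method names are hypothetical. It relies on the fragment having no namespace declaration; real COLLADA files usually declare a default namespace, which may require a NamespaceContext (or a non-namespace-aware parser, as here) for unprefixed paths to match. XPath name and value tests are case-sensitive, so 'Sketchup' would not match name="SketchUp".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class ColladaXPath {

    public static final String SAMPLE =
            "<library_visual_scenes><visual_scene id=\"ID1\"><node name=\"SketchUp\">"
            + "<instance_geometry url=\"#ID2\"><bind_material><technique_common>"
            + "<instance_material symbol=\"Material2\" target=\"#ID3\">"
            + "<bind_vertex_input semantic=\"UVSET0\" input_semantic=\"TEXCOORD\" input_set=\"0\"/>"
            + "</instance_material></technique_common></bind_material></instance_geometry>"
            + "</node></visual_scene></library_visual_scenes>";

    /** Evaluates an XPath expression against the XML and returns its string value. */
    public static String query(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            return xPath.evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(query(SAMPLE, "/*/visual_scene/@id")); // ID1
        System.out.println(query(SAMPLE,
                "/*/visual_scene/node[@name = 'SketchUp']/instance_geometry/@url")); // #ID2
        System.out.println(query(SAMPLE, "/*/visual_scene/node/instance_geometry/bind_material"
                + "/technique_common/instance_material/@target")); // #ID3
    }
}
```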
You are using DOM in your code.
DOM creates a tree structure of the xml file it parsed, and you have to traverse the tree to get the information in various nodes.
In your code all you did is create the tree representation. I.e.
document = builder.parse( string );//document is loaded in memory as tree
Now you should reference the DOM apis to see how to get the information you need.
NodeList library_visual_scenes = document.getElementsByTagName( "library_visual_scenes" );
For instance this method returns a NodeList of all elements with the specified name.
Now you should loop over the NodeList
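for (int i = 0; i < library_visual_scenes.getLength(); i++) {
    Element element = (Element) library_visual_scenes.item(i);
    // getFirstChild() is usually the whitespace text node, so check every child
    NodeList children = element.getChildNodes();
    for (int j = 0; j < children.getLength(); j++) {
        Node visual_scene = children.item(j);
        if (visual_scene.getNodeType() == Node.ELEMENT_NODE
                && visual_scene.getNodeName().equals("visual_scene")) {
            String id = ((Element) visual_scene).getAttribute("id");
            System.out.println("id=" + id);
        }
    }
}
DISCLAIMER: This is sample code meant to show the concept, not a compiled and tested program. You should look into the DOM APIs.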
EclipseLink JAXB (MOXy) has a useful @XmlPath extension for leveraging XPath to populate an object. It may be what you are looking for. Note: I am the MOXy tech lead.
The following example maps a simple address object to Google's representation of geocode information:
package blog.geocode;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.bind.annotation.XmlType;
import org.eclipse.persistence.oxm.annotations.XmlPath;
@XmlRootElement(name="kml")
@XmlType(propOrder={"country", "state", "city", "street", "postalCode"})
public class Address {
@XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:Thoroughfare/ns:ThoroughfareName/text()")
private String street;
@XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:LocalityName/text()")
private String city;
@XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:AdministrativeAreaName/text()")
private String state;
@XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:CountryNameCode/text()")
private String country;
@XmlPath("Response/Placemark/ns:AddressDetails/ns:Country/ns:AdministrativeArea/ns:SubAdministrativeArea/ns:Locality/ns:PostalCode/ns:PostalCodeNumber/text()")
private String postalCode;
}
For the rest of the example see:
http://bdoughan.blogspot.com/2010/09/xpath-based-mapping-geocode-example.html
Nowadays, several Java RAD tools can generate Java code from a given DTD, so you could use one of those.