I have a requirement to unmarshal a subset of unknown XML content. With the unmarshalled object, I need to modify some contents and then re-bind that XML subset into the original XML.
Sample Input XML:
<Message>
<x>
</x>
<y>
</y>
<z>
</z>
<!-- Need to unmarshall this content to "Content" - java Object -->
<Content>
<Name>Robin</Name>
<Role>SM</Role>
<Status>Active</Status>
</Content>
.....
</Message>
I need to unmarshal the <Content> tag alone, keeping the rest of the XML unchanged. I then need to modify the elements in the <Content> tag and bind the modified XML part back into the original, as shown below:
Expected Output XML:
<Message>
<x>
</x>
<y>
</y>
<z>
</z>
<!-- Need to unmarshall this content to "Content" - java Object -->
<Content>
<Name>Robin_123</Name>
<Role>Senior Member</Role>
<Status>1</Status>
</Content>
.....
</Message>
My Questions:
What is a possible solution for this requirement (other than DOM parsing, as the XML content is very large)?
Is there any option to do this in JAXB 2.0?
Please provide your suggestions on this.
Consider cutting your source document down to size using the StAX API.
For the given sample, this code creates a DOM document with a root element of the Content element:
class ContentFinder implements StreamFilter {
private boolean capture = false;
@Override public boolean accept(XMLStreamReader xml) {
if (xml.isStartElement() && "Content".equals(xml.getLocalName())) {
capture = true;
} else if (xml.isEndElement() && "Content".equals(xml.getLocalName())) {
capture = false;
return true;
}
return capture;
}
}
XMLInputFactory inFactory = XMLInputFactory.newFactory();
XMLStreamReader reader = inFactory.createXMLStreamReader(inputStream);
reader = inFactory.createFilteredReader(reader, new ContentFinder());
Source src = new StAXSource(reader);
DOMResult res = new DOMResult();
TransformerFactory.newInstance().newTransformer().transform(src, res);
Document doc = (Document) res.getNode();
This can then be passed to JAXB as a DOMSource.
Similar techniques can be used when rewriting the XML on output.
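As a rough sketch of the output side, the StAX event API can copy a document through unchanged while rewriting only the text inside <Content>. The class name ContentRewriter, the method rewrite and the choice to append a suffix to <Name> are assumptions made for illustration; they are not from the answer above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.stream.XMLEventFactory;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper: streams the XML through and edits only Content/Name.
public class ContentRewriter {

    public static String rewrite(String xml, String suffix) throws XMLStreamException {
        XMLInputFactory inFactory = XMLInputFactory.newFactory();
        // Coalescing delivers each text node as a single Characters event.
        inFactory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader in = inFactory.createXMLEventReader(new StringReader(xml));

        StringWriter out = new StringWriter();
        XMLEventWriter writer = XMLOutputFactory.newFactory().createXMLEventWriter(out);
        XMLEventFactory events = XMLEventFactory.newFactory();

        boolean inContent = false;
        boolean inName = false;
        while (in.hasNext()) {
            XMLEvent event = in.nextEvent();
            if (event.isStartElement()) {
                String name = event.asStartElement().getName().getLocalPart();
                if ("Content".equals(name)) {
                    inContent = true;
                } else if (inContent && "Name".equals(name)) {
                    inName = true;
                }
            } else if (event.isEndElement()) {
                String name = event.asEndElement().getName().getLocalPart();
                if ("Content".equals(name)) {
                    inContent = false;
                } else if ("Name".equals(name)) {
                    inName = false;
                }
            } else if (inName && event.isCharacters()) {
                // Replace the text event; every other event is copied as-is.
                event = events.createCharacters(event.asCharacters().getData() + suffix);
            }
            writer.add(event);
        }
        writer.close();
        return out.toString();
    }
}
```

Only one event is held in memory at a time, so this avoids building a DOM for the whole message. For a very large input you would pass streams rather than Strings.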
JAXB doesn't seem to accept a StAXSource directly, at least in the Oracle 1.7 implementation.
You can annotate an Object property on your class with @XmlAnyElement, and by default the unmapped content will be captured as DOM nodes. If you specify a DomHandler on the @XmlAnyElement, you can control the format. Here is a link to an example where the content is kept as a String.
JAXB use String as it is
Here is a real example of the XML data I have to parse, showing how the file is presented to me.
<?xml version="1.0"?>
<session>
<values>
<value id="FILE_CREATE_DATE">
<timestamp>2012-04-16T21:33:31Z</timestamp>
</value>
<value id="LAST_ACCESSED">
<timestamp>2012-09-17T17:15:23Z</timestamp>
</value>
<value id="VERSION_TIMESTAMP">
<timestamp>2012-04-16T21:33:31Z</timestamp>
</value>
</values>
</session>
I need to go into this file and retrieve the FILE_CREATE_DATE data.
My code so far:
File xmlFile = new File(XMLFileData[i].getPath());
FileInputStream myXMLStream = new FileInputStream(xmlFile);
XMLInputFactory XMLFactory = XMLInputFactory.newInstance();
XMLStreamReader XMLReader = XMLFactory.createXMLStreamReader(myXMLStream);
while(XMLReader.hasNext())
{
if (XMLReader.getEventType() == XMLStreamReader.START_ELEMENT)
{
String XMLTag = XMLReader.getLocalName();
if(XMLReader.hasText())
{
System.out.println(XMLReader.getText());
break;
}
}
XMLReader.next();
}
The getLocalName() function returns 'Sessions', then 'value', then 'values', but never the actual name of the element. I need to test whether I am at the right element and then retrieve the data from that element.
I use Jsoup, which is a library for parsing HTML, but it can be used for XML too. You would first have to load the XML file into a Document object and then simply call
doc.getElementById("FILE_CREATE_DATE");
This will return an Element object that will have the timestamp as a child. Here's a link to the library: https://jsoup.org/
This is my first StackOverflow answer so let me know if it helps !
Your id is not an element; it is an attribute of the element.
You should read the attribute of your value node. See the javadoc for the getAttributeValue method:
http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html#getAttributeValue(java.lang.String,%20java.lang.String)
Returns the normalized attribute value of the attribute with the
namespace and localName. If the namespaceURI is null the namespace is
not checked for equality.
So it will be:
String XMLTag = XMLReader.getLocalName();
if(XMLTag.equals("value")) {
String idValue = XMLReader.getAttributeValue(null, "id");
//here idValue will be equal to FILE_CREATE_DATE, LAST_ACCESSED or VERSION_TIMESTAMP
}
Try something like
if ("FILE_CREATE_DATE".equalsIgnoreCase(XMLReader.getAttributeValue(0)))
getAttributeValue(int index) returns the value of the attribute at the given index. For
<value id="FILE_CREATE_DATE">
id is the first attribute, so use XMLReader.getAttributeValue(0).
But before calling this, you have to check that the element actually has a first attribute, because not every tag has at least one attribute.
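A minimal sketch of that guard, using getAttributeCount() before reading index 0. The class name FirstAttribute and the method findTimestamp are made up for this example, and it assumes the wanted text sits in the first child element, as in the sample file.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper illustrating the attribute-count check.
public class FirstAttribute {

    /** Returns the text of the first child of the element whose first attribute equals wantedId, or null. */
    public static String findTimestamp(String xml, String wantedId) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamReader.START_ELEMENT
                        && reader.getAttributeCount() > 0   // guard: many tags have no attributes
                        && wantedId.equalsIgnoreCase(reader.getAttributeValue(0))) {
                    reader.nextTag();                // move to the child element, e.g. <timestamp>
                    return reader.getElementText();  // text content of that child
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }
}
```

Looking the attribute up by name with getAttributeValue(null, "id") is less fragile than relying on position, since attribute order is not significant in XML.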
In jsoup you can query like this:
public static void main(String[] args) {
Document doc;
try {
doc = Jsoup.connect("http://www.dropbox.com/public/xml/yourfile.xml").userAgent("Mozilla").get();
//<value id="FILE_CREATE_DATE">
Elements links = doc.select("value[id=FILE_CREATE_DATE]");
for (Element link : links) {
if(link.attr("id").contains("FILE_CREATE_DATE"))//find the link with some texts
{
System.out.println("here is the element you need");
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
XMLInputFactory XMLFactory = XMLInputFactory.newInstance();
XMLStreamReader XMLReader = XMLFactory.createXMLStreamReader(myXMLStream);
while(XMLReader.hasNext())
{
if (XMLReader.getEventType() == XMLStreamReader.START_ELEMENT)
{
String XMLTag = XMLReader.getLocalName();
if(XMLTag.equals("value"))
{
String idValue = XMLReader.getAttributeValue(null, "id");
if ("FILE_CREATE_DATE".equals(idValue)) // null-safe: idValue is null when there is no id attribute
{
System.out.println(idValue);
XMLReader.nextTag();
System.out.println(XMLReader.getElementText());
}
}
}
XMLReader.next();
}
So this code is the final result of all my anguish over recovering specific data from an XML file. I want to thank everyone who helped me out with answers. Whether or not they were what I was looking for, they got me thinking, and that led to the solution.
I'm parsing an XML string with dom4j and using XPath to select an element from it. The code is:
String test = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><epp xmlns=\"urn:ietf:params:xml:ns:epp-1.0\"><response><result code=\"1000\"><msg lang=\"en-US\">Command completed successfully</msg></result><trID><clTRID>87285586-99412370</clTRID><svTRID>52639BB8-1-ARNES</svTRID></trID></response></epp>";
SAXReader reader = new SAXReader();
reader.setIncludeExternalDTDDeclarations(false);
reader.setIncludeInternalDTDDeclarations(false);
reader.setValidation(false);
Document xmlDoc;
try {
xmlDoc = reader.read(new StringReader(test));
xmlDoc.getRootElement();
Node nodeStatus = xmlDoc.selectSingleNode("//epp/response/result");
System.out.print(nodeStatus.getText());
} catch (DocumentException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
I always get null for the nodeStatus variable. I actually need to read the code attribute from the result node of the XML:
<result code="1000">
This is the XML that I am reading from the String test:
<?xml version="1.0" encoding="UTF-8"?>
<epp xmlns="urn:ietf:params:xml:ns:epp-1.0">
<response>
<result code="1000">
<msg lang="en-US">Command completed successfully</msg>
</result>
<trID>
<clTRID>87285586-99412370</clTRID>
<svTRID>52639BB8-1-ARNES</svTRID>
</trID>
</response>
</epp>
Any hints?
Your XML has a namespace. DOM4J returns null because your XPath does not name that namespace, so it doesn't match your nodes.
To make it work, you first have to register the namespaces you are using. You will need a prefix (any prefix will do), and you will have to use that prefix in your XPath.
You could use tns for "target namespace". Then you have to create an XPath object with it like this:
XPath xpath = new DefaultXPath("/tns:epp/tns:response/tns:result");
To register the namespaces you will need to create a Map, add the namespace with the prefix you used in the xpath expression, and pass it to the setNamespaceURIs() method.
Map<String, String> namespaces = new HashMap<String, String>();
namespaces.put("tns", "urn:ietf:params:xml:ns:epp-1.0");
xpath.setNamespaceURIs(namespaces);
Now you can call selectSingleNode, but you will call it on your XPath object passing the document as the argument:
Node nodeStatus = xpath.selectSingleNode(xmlDoc);
From there you can extract the data you need. getText() won't give you the data you want. If you want the contents of the result node as XML, you can use:
nodeStatus.asXML()
Edit: to retrieve just the code, change your XPath to:
/tns:epp/tns:response/tns:result/@code
And retrieve the result with
nodeStatus.getText();
I replaced the double slash // (which means descendant-or-self) with / since the expression contains the full path and / is more efficient. But if you only have one result node in your whole file, you can use:
//tns:result/@code
to extract the data. It will match all descendants. If there is more than one result, it will return a node-set.
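For comparison, the same namespace-prefix idea can be sketched with the XPath API that ships with the JDK instead of dom4j. This is a swapped-in technique, not the dom4j code above; the class name EppResultCode and the method resultCode are made up for the example.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: reads result/@code from a namespaced EPP response.
public class EppResultCode {

    private static final String EPP_NS = "urn:ietf:params:xml:ns:epp-1.0";

    public static String resultCode(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // without this, the prefixed XPath matches nothing
        Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the arbitrary prefix "tns" to the document's default namespace.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                return "tns".equals(prefix) ? EPP_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String uri) {
                return EPP_NS.equals(uri) ? "tns" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String uri) {
                return EPP_NS.equals(uri)
                        ? Collections.singletonList("tns").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        return xpath.evaluate("/tns:epp/tns:response/tns:result/@code", doc);
    }
}
```

The prefix in the expression does not have to match anything in the document; only the namespace URI it is bound to matters.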
I have a Maven and Spring based Java web application.
I have a config.xml file in src/main/resources.
<?xml version="1.0" encoding="UTF-8"?>
<sourceConfigs>
<areas>
<area name="Defects">
<fileHeaders>ID,Issue Key,Fields
</fileHeaders>
</area>
<area name="Organization">
<fileHeaders>ID,Org Key,Fields
</fileHeaders>
</area>
</areas>
<sourceTypes>
<source name="source1">
<adapterObject>source1Adapter</adapterObject>
<resultObject>JsonObject</resultObject>
</source>
<source name="source2">
<adapterObject>source2Adapter</adapterObject>
<resultObject>sourceObject</resultObject>
</source>
</sourceTypes>
</sourceConfigs>
I want to parse the above XML into Java objects based on the attributes.
I have created two classes.
Area.java
@XmlRootElement(name="area")
public class Area {
String name;
String fileHeaders;
@XmlAttribute
//get Name
//set Name
@XmlElement
//get fileHeaders
//set FileHeaders
}
Source.java
@XmlRootElement(name="source")
public class Source {
String name;
String adapterObject;
String resultObject;
@XmlAttribute
//get Name
//set Name
@XmlElement
//get adapterObject
//set adapterObject
@XmlElement
//get resultObject
//set resultObject
}
I want to parse the XML based on the attribute value.
That is, if the name attribute of area is Defects, the parsed Area object should hold the values from that element; if it is Organization, it should hold the values from that one instead. The same applies to the Source type.
How can I do that?
Earlier, when it was only a simple XML file like the following:
<?xml version="1.0" encoding="UTF-8"?>
<sourceConfig area="Defects">
<adapterObject>jAdapter</adapterObject>
<resultObject>jsonObject</resultObject>
</sourceConfig>
My POJO was based on that and the code I used to parse is
public SourceConfig getConfigObject() throws JAXBException, IOException {
JAXBContext jaxbContext = JAXBContext.newInstance(SourceConfig.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
Resource resource=new ClassPathResource("config.xml");
File file=resource.getFile();
SourceConfig sourceConfig = (SourceConfig) jaxbUnmarshaller.unmarshal(file);
return sourceConfig;
}
But for this more complex file, I don't know how to parse based on attribute values, or how to handle the multiple lists of data.
How can I parse based on the attribute values?
I have created two POJOs for parsing the different kinds of element. If it can be done with a single class, that is fine too.
UPDATE 1
My expected output.
When I pass the value "Defects" while unmarshalling the Area object, the values should be
name= Defects
fileHeaders=ID,Issue Key,Fields
If I pass "Organization", the Area object's values should be based on that element. The same goes for unmarshalling Source.
At the moment I get a List of Areas and a List of SourceTypes, and then pick the corresponding object by checking the value.
Is there any way to parse only the selected element based on the attribute value, instead of getting the list, checking the value and returning the object?
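One possible direction, sketched here with the JDK's own XPath API rather than JAXB, is to select the single element whose name attribute matches and read only that one. The class name AreaLookup and the method fileHeaders are hypothetical, and the element names are taken from the config.xml above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: picks one <area> by its name attribute.
public class AreaLookup {

    /** Returns the trimmed fileHeaders text of the area with the given name ("" if none matches). */
    public static String fileHeaders(String xml, String areaName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Supply the attribute value as an XPath variable instead of
        // concatenating it into the expression string.
        xpath.setXPathVariableResolver(
                qname -> "name".equals(qname.getLocalPart()) ? areaName : null);

        String headers = xpath.evaluate(
                "/sourceConfigs/areas/area[@name=$name]/fileHeaders", doc);
        return headers.trim(); // the sample has a line break inside <fileHeaders>
    }
}
```

If you want a bound object rather than a String, the matching <area> node could instead be evaluated as a Node and handed to JAXB's Unmarshaller.unmarshal(Node, Class), so that only that element is bound. This still loads the whole config into a DOM, which should be acceptable for a small configuration file.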
I have to parse an XML file with the following structure:
<root>
<object_1>
<pro1> abc </pro1>
<pro2> pqr </pro2>
<pro3> xyz </pro3>
<children>
<object_a>
<pro1> abc </pro1>
<pro2> pqr </pro2>
<pro3> xyz </pro3>
<children>
.
.
.
</children>
</object_a>
</children>
</object_1>
<object_2>
.
.
.
</object_n>
</root>
The aim is to parse this multilevel nesting. A few classes are defined in Java:
Class Object_1
Class Object_2
.
.
.
Class Object_N
with their respective properties.
The following code works for me, but it is not the best way of doing things.
File file = new File(fileName);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(file);
doc.getDocumentElement().normalize();
if(doc ==null) return;
Node node = doc.getFirstChild();
NodeList lst = node.getChildNodes();
Node children = null ;
int len = lst.getLength();
for(int index=0;index<len;index++)
{
Node child = lst.item(index);
String name = child.getNodeName();
// compare String contents with equals(), not ==
if("Name".equals(name))
name = child.getNodeValue();
else if("Comment".equals(name))
comment = child.getNodeValue();
else if("children".equals(name))
children = child;
}
if(children==null) return;
lst = children.getChildNodes();
len = lst.getLength();
Class<?> obj=null;
AbsModel model = null;
for(int index=0;index<len;index++)
{
Node childNode = lst.item(index);
String modelName = childNode.getNodeName();
try {
obj = Class.forName(modelName);
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
if(obj!=null)
model = (AbsModel) obj.newInstance();
else
model = new GenericModel();
model.restoreDefaultPropFromXML(childNode);
addChild(model);
}
}
Is there a better way of parsing this XML?
Consider using JAXB, which has been part of Java since version 6. You should be able to parse ("unmarshal") your XML file into your own classes with almost no code, just by adding a few annotations that make explicit the mapping between your object structure and your XML structure.
StAX and/or JAXB is almost always the way to go.
If the XML is really dynamic (for example, attributes specify the property name, as in <prop name="property" value="" />), then you will need to either use StAX only, or live with what JAXB maps it to (a POJO with name and value properties) and post-process.
Personally I find combining StAX and JAXB works best. I parse to the elements I want and then use JAXB to turn each element into a POJO.
See Also:
My own utility library that will turn an XML Stream into an iterator of objects.
Parsing very large XML files and marshalling to Java Objects
http://tedone.typepad.com/blog/2011/06/unmarshalling-benchmark-in-java-jaxb-vs-stax-vs-woodstox.html
While JAXB may be the best choice, I'd also like to mention jOOX, which provides a jQuery-like API and makes working with XML documents really pleasant.
I have some user-defined tags (for example, tags wrapping text such as "data here" or "jssj"). I have a file (not XML) which contains data embedded in these tags. I need a parser that will identify my tags and extract the data in the proper format.
Eg
<newpage> thix text </newpage>
<tagD>
<tagA> kk</tagA>
</tagD>
Tags can also have attributes, similar to HTML tags. Eg
<mytag height="f" width ="d" > bla bla bla </mytag>
<mytag attribute="val"> bla bla bla</mytag>
You could look at a parser generator like antlr.
Unless your tag syntax can be represented with a (simple) regular grammar (in which case you could try to scan the file with regexes), you will need a proper parser. It is actually not very hard to do at all - just the first time tastes like biting bullets...
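To illustrate the regex route for the simple case only, here is a small sketch that scans for flat, non-nested tags with optional attributes. The class name TagScanner, the method scan and the exact pattern are assumptions for this example; nested or malformed tags are exactly where this approach stops working and a real parser is needed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical scanner for flat <tag attr="v"> body </tag> runs.
public class TagScanner {

    // group 1: tag name, group 2: raw attribute text, group 3: body
    private static final Pattern TAG = Pattern.compile(
            "<(\\w+)((?:\\s+\\w+\\s*=\\s*\"[^\"]*\")*)\\s*>(.*?)</\\1>",
            Pattern.DOTALL);

    /** Returns one {name, attributes, body} triple per top-level match. */
    public static List<String[]> scan(String text) {
        List<String[]> found = new ArrayList<String[]>();
        Matcher m = TAG.matcher(text);
        while (m.find()) {
            found.add(new String[] {
                m.group(1), m.group(2).trim(), m.group(3).trim()
            });
        }
        return found;
    }
}
```

For a nested input such as <tagD> containing <tagA>, this reports only the outer tag, with the inner markup left unparsed inside the body.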
You can use JAXB, already included in Java. It's quite simple.
First you need to create a binding to your XML code. The binding provides a map between Java objects and the XML code.
An example would be:
@XmlRootElement(name = "YourRootElement", namespace ="http://someurl.org")
@XmlAccessorType(XmlAccessType.FIELD)
@XmlType(name = "", propOrder = {
"intValue",
"stringArray",
"stringValue"}
)
public class YourBindingClass {
protected int intValue;
@XmlElement(nillable = false)
protected List<String> stringArray;
@XmlElement(name = "stringValue", required = true)
protected String stringValue;
public int getIntValue() {
return intValue;
}
public void setIntValue(int value) {
this.intValue = value;
}
public List<String> getStringArray() {
if (stringArray == null) {
stringArray = new ArrayList<String>();
}
return this.stringArray;
}
public String getStringValue() {
return stringValue;
}
public void setStringValue(String value) {
this.stringValue = value;
}
}
Then, to encode your Java objects into XML, you can use:
YourBindingClass yourBindingClass = ...;
JAXBContext jaxbContext = JAXBContext.newInstance(YourBindingClass.class);
Marshaller marshaller = jaxbContext.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.setProperty(Marshaller.JAXB_FRAGMENT, false);
/** If you need to specify a schema */
SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
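Schema schema = sf.newSchema(new URL("http://www.someurl.org"));
marshaller.setSchema(schema);
// JAXB_SCHEMA_LOCATION takes a String: the namespace URI followed by the schema location
marshaller.setProperty(Marshaller.JAXB_SCHEMA_LOCATION, "http://someurl.org http://www.someurl.org");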
ByteArrayOutputStream stream = new ByteArrayOutputStream();
marshaller.marshal(yourBindingClass, stream);
System.out.println(stream);
To parse your XML back to objects:
InputStream resourceAsStream = ... // Your XML, File, etc.
JAXBContext jaxbContext = JAXBContext.newInstance(YourBindingClass.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
Object r = unmarshaller.unmarshal(resourceAsStream);
if (r instanceof YourBindingClass) ...
Example starting from a Java object:
YourBindingClass s = new YourBindingClass();
s.setIntValue(1);
s.setStringValue("a");
s.getStringArray().add("b1");
s.getStringArray().add("b2");
// marshal ...
Result:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:YourRootElement xmlns:ns2="http://someurl.org">
<intValue>1</intValue>
<stringArray>b1</stringArray>
<stringArray>b2</stringArray>
<stringValue>a</stringValue>
</ns2:YourRootElement>
If you don't know the input format, you probably don't have an XML schema. Without a schema you lose some of its benefits, such as:
It is easier to describe allowable document content
It is easier to validate the correctness of data
It is easier to define data facets (restrictions on data)
It is easier to define data patterns (data formats)
It is easier to convert data between different data types
Anyway, the previous code also works with XML that contains 'unknown' tags. However, your XML still has to include the required fields and follow the declared patterns.
So the following XML is also valid. The only restriction is that the tag 'stringValue' must be present. Note that 'stringArrayQ' was not previously declared.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:YourRootElement xmlns:ns2="http://someurl.org">
<stringValue>a</stringValue>
<stringArrayQ>b1</stringArrayQ>
</ns2:YourRootElement>
Are these XML tags? If so, look into one of the many Java XML libraries already available. If they're some kind of custom tagging format, then you're just going to have to write it yourself.
For XML tags, use a DOM parser or a SAX parser.
Your example is XML with this modification:
<root>
<newpage> thix text </newpage>
<tagD>
<tagA> kk</tagA>
</tagD>
</root>
You can use any XML parser you want to parse it.
Edit:
Attributes are a normal part of XML.
<root>
<newpage> thix text </newpage>
<tagD>
<tagA> kk</tagA>
</tagD>
<mytag height="f" width ="d" > bla bla bla </mytag>
<mytag attribute="val"> bla bla bla</mytag>
</root>
Every XML parser can deal with them.
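As a small illustration in Java, the DOM parser bundled with the JDK can read those attributes directly. The class name MyTagReader and the method firstWidth are made up for this sketch.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads an attribute of the first <mytag> element.
public class MyTagReader {

    /** Returns the width attribute of the first <mytag>, or null if there is no <mytag>. */
    public static String firstWidth(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList tags = doc.getElementsByTagName("mytag");
        if (tags.getLength() == 0) {
            return null;
        }
        Element first = (Element) tags.item(0);
        return first.getAttribute("width"); // "" when the attribute is absent
    }
}
```

Whitespace around the equals sign, as in width ="d", is legal XML, so the parser accepts the sample as written.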
Edit:
If you were able to use Python, you could do something like this:
import lxml.etree
doc = lxml.etree.parse("foo.xml")
print(doc.xpath("//mytag[1]/@width"))
# => ['d']
That's what I call simple.