i have some user defined tag. for example data here , jssj .I have a file(not xml) which contains some data embeded in tags.I need a parser for this which will identify my tags and will extract the data in proper format.
Eg
<newpage> thix text </newpage>
<tagD>
<tagA> kk</tagA>
</tagD>
tags can also have some attributes as simlar to html tags. Eg
<mytag height="f" width ="d" > bla bla bla </mytag>
<mytag attribute="val"> bla bla bla</mytag>
You could look at a parser generator like antlr.
Unless your tag syntax can be represented with a (simple) regular grammar (in which case you could try to scan the file with regexes), you will need a proper parser. It is actually not very hard to do at all - just the first time tastes like biting bullets...
You can use JAXB, already included in Java. It's quite simple.
First you need to create a binding to your XML code. The binding provides a map between Java objects and the XML code.
An example would be:
#XmlRootElement(name = "YourRootElement", namespace ="http://someurl.org")
#XmlAccessorType(XmlAccessType.FIELD)
#XmlType(name = "", propOrder = {
"intValue",
"stringArray",
"stringValue"}
)
public class YourBindingClass {
protected int intValue;
#XmlElement(nillable = false)
protected List<String> stringArray;
#XmlElement(name = "stringValue", required = true)
protected String stringValue;
public int getIntValue() {
return intValue;
}
public void setIntValue(int value) {
this.intValue = value;
}
public List<String> getStringArray() {
if (stringArray == null) {
stringArray = new ArrayList<String>();
}
return this.stringArray;
}
public String getStringValue() {
return stringValue;
}
public void setStringValue(String value) {
this.stringValue = value;
}
}
Then, to encode your Java objects into XML, you can use:
YourBindingClass yourBindingClass = ...;
JAXBContext jaxbContext = JAXBContext.newInstance(YourBindingClass.class);
Marshaller marshaller = jaxbContext.createMarshaller();
marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
marshaller.setProperty(Marshaller.JAXB_FRAGMENT, false);
/** If you need to specify a schema */
SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
Schema schema = sf.newSchema(new URL("http:\\www.someurl.org"));
marshaller.setSchema(schema);
marshaller.setProperty(Marshaller.JAXB_SCHEMA_LOCATION, true);
ByteArrayOutputStream stream = new ByteArrayOutputStream();
marshaller.marshal(yourBindingClass, stream);
System.out.println(stream);
To parse your XML back to objects:
InputStream resourceAsStream = ... // Your XML, File, etc.
JAXBContext jaxbContext = JAXBContext.newInstance(YourBindingClass.class);
Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
Object r = unmarshaller.unmarshal(resourceAsStream);
if (r instanceof YourBindingClass) ...
Example starting from a Java object:
YourBindingClass s = new YourBindingClass();
s.setIntValue(1);
s.setStringValue("a");
s.getStringArray().add("b1");
s.getStringArray().add("b2");
// marshal ...
Result:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:YourRootElement xmlns:ns2="http://someurl.org">
<intValue>1</intValue>
<stringArray>b1</stringArray>
<stringArray>b2</stringArray>
<stringValue>a</stringValue>
</ns2:YourRootElement>
If you don't know the input format, that means you probably don't have a XML schema. If you don't have a schema you don't have some it's benefits such as:
It is easier to describe allowable document content
It is easier to validate the correctness of data
It is easier to define data facets (restrictions on data)
It is easier to define data patterns (data formats)
It is easier to convert data between different data types
Anyway, the previous code also works with XML code that contains 'unknown' tags. However your XML code still have to present the required fields and follow the declared patterns.
So the following XML code is also valid. The only restriction is: the tag 'stringValue' should be there. Note that 'stringArrayQ' was not previously declared.
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<ns2:YourRootElement xmlns:ns2="http://someurl.org">
<stringValue>a</stringValue>
<stringArrayQ>b1</stringArrayQ>
</ns2:YourRootElement>
Are these XML tags? If so, look into one of the many Java XML libraries already available. If they're some kind of custom tagging format, then you're just going to have to write it yourself.
For xml tags - use DOM parser or SAX parser.
You example is XML with this modification:
<root>
<newpage> thix text </newpage>
<tagD>
<tagA> kk</tagA>
</tagD>
</root>
You can use any XML parser you want to parse it.
Edit:
Attributes are a normal part of XML.
<root>
<newpage> thix text </newpage>
<tagD>
<tagA> kk</tagA>
</tagD>
<mytag height="f" width ="d" > bla bla bla </mytag>
<mytag attribute="val"> bla bla bla</mytag>
</root>
Every XML parser can deal with them.
Edit:
If you were able to use Python, you could do something like this:
import lxml.etree
doc = lxml.etree.parse("foo.xml")
print doc.xpath("//mytag[1]/#width")
# => ['d']
That's what i call simple.
Related
A real example of the XML data I have to parse through and how the file is configured. this is how the file is presented to me.
<?xml version="1.0"?>
<session>
<values>
<value id="FILE_CREATE_DATE">
<timestamp>2012-04-16T21:33:31Z</timestamp>
</value>
<value id="LAST_ACCESSED">
<timestamp>2012-09-17T17:15:23Z</timestamp>
</value>
<value id="VERSION_TIMESTAMP">
<timestamp>2012-04-16T21:33:31Z</timestamp>
</value>
</values>
</session>
I need to go into this file and retrieve the FILE_CREATE_DATE data.
My code so far:
File xmlFile = new File(XMLFileData[i].getPath());
FileInputStream myXMLStream = new FileInputStream(xmlFile);
XMLInputFactory XMLFactory = XMLInputFactory.newInstance();
XMLStreamReader XMLReader = XMLFactory.createXMLStreamReader(myXMLStream);
while(XMLReader.hasNext())
{
if (XMLReader.getEventType() == XMLStreamReader.START_ELEMENT)
{
String XMLTag = XMLReader.getLocalName();
if(XMLReader.hasText())
{
System.out.println(XMLReader.getText());
break;
}
}
XMLReader.next();
}
the 'getLocalName()' function returns 'Sessions' then 'value' then 'values' but never returns the actual name of the element. I need to test to see if I am at the right element then retrieve the data from that element...
I use Jsoup which is a library for parsing HTML. But it can be used for xml too. you would first have to load the XML file into a Document object then simply call
doc.getElementById("FILE_CREATE_DATE");
This will return an Element object that will have the timestamp as a child. Here's a link to the library: https://jsoup.org/
This is my first StackOverflow answer so let me know if it helps !
Your id is not an element - it's element attribute.
You should read attribute of your value node, see the javadoc for getAttributeValue method:
http://docs.oracle.com/javase/6/docs/api/javax/xml/stream/XMLStreamReader.html#getAttributeValue(java.lang.String,%20java.lang.String)
Returns the normalized attribute value of the attribute with the
namespace and localName If the namespaceURI is null the namespace is
not checked for equality
So it will be:
String XMLTag = XMLReader.getLocalName();
if(XMLTag.equals("value")) {
String idValue = XMLReader.getAttributeValue(null, "id");
//here idValue will be equal to FILE_CREATE_DATE, LAST_ACCESSED or VERSION_TIMESTAMP
}
try something like
if(XMLReader.getAttributeValue(0).equalIgnorecase("FILE_CREATE_DATE"))
getAttributeValue : Return value of the given index of the attribute. for
<value id="FILE_CREATE_DATE">
id is the first attribute. So XMLReader.getAttributeValue(0)
but before calling this you have to validate whether element has the first attribute. Because all the tags does not have at least 1 attribute.
in jsoup you can query like this
public static void main(String[] args) {
Document doc;
try {
doc = Jsoup.connect("http://www.dropbox.com/public/xml/yourfile.xml").userAgent("Mozilla").get();
//<value id="FILE_CREATE_DATE">
Elements links = doc.select("value[id=FILE_CREATE_DATE]");
for (Element link : links) {
if(link.attr("id").contains("FILE_CREATE_DATE"))//find the link with some texts
{
System.out.println("here is the element you need");
}
}
} catch (IOException e) {
e.printStackTrace();
}
}
XMLInputFactory XMLFactory = XMLInputFactory.newInstance();
XMLStreamReader XMLReader = XMLFactory.createXMLStreamReader(myXMLStream);
while(XMLReader.hasNext())
{
if (XMLReader.getEventType() == XMLStreamReader.START_ELEMENT)
{
String XMLTag = XMLReader.getLocalName();
if(XMLTag.equals("value"))
{
String idValue = XMLReader.getAttributeValue(null, "id");
if (idValue.equals("FILE_CREATE_DATE"))
{
System.out.println(idValue);
XMLReader.nextTag();
System.out.println(XMLReader.getElementText());
}
}
}
XMLReader.next();
}
So this code is the final result of all my anguish on the topic of recovering specific data from a XML data file. I want to thank everyone who helped me out with answers - regardless on if it was what I was looking for they got me thinking and that led to the solution...
I have a Maven and Spring based Java web application.
I have a config.xml file in src/main/resources.
<?xml version="1.0" encoding="UTF-8"?>
<sourceConfigs>
<areas>
<area name="Defects">
<fileHeaders>ID,Issue Key,Fields
</fileHeaders>
</area>
<area name="Organization">
<fileHeaders>ID,Org Key,Fields
</fileHeaders>
</area>
</areas>
<sourceTypes>
<source name="source1">
<adapterObject>source1Adapter</adapterObject>
<resultObject>JsonObject</resultObject>
</source>
<source name="source2">
<adapterObject>source2Adapter</adapterObject>
<resultObject>sourceObject</resultObject>
</source>
</sourceTypes>
</sourceConfigs>
I want to parse the above XML to Java object based on the attributes.
I have created two classes.
Area.java
#XmlRootElement(name="area")
public class Area {
String name;
String fileHeaders;
#XmlAttribute
//get Name
//set Name
#XmlElement
//get fileHeaders
//set FileHeaders
}
Source.java
#XmlRootElement(name="source")
public class Source {
String name;
String adapterObject;
String resultObject;
#XmlAttribute
//get Name
//set Name
#XmlElement
//get adapterObject
//set adapterObject
#XmlElement
//get resultObject
//set resultObject
}
I want to parse the XML based on the attribute value.
ie; if the attribute value of area is Defects, the parsed object should have the values based on that, else if its Organization, then the values based on that and object type to Area object. Similarly for Source type also.
How can I do that?
When it was only a simple XML file like following
<?xml version="1.0" encoding="UTF-8"?>
<sourceConfig area="Defects">
<adapterObject>jAdapter</adapterObject>
<resultObject>jsonObject</resultObject>
</sourceConfig>
My POJO was based on that and the code I used to parse is
public SourceConfig getConfigObject() throws JAXBException, IOException {
JAXBContext jaxbContext = JAXBContext.newInstance(SourceConfig.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
Resource resource=new ClassPathResource("config.xml");
File file=resource.getFile();
SourceConfig sourceConfig = (SourceConfig) jaxbUnmarshaller.unmarshal(file);
return sourceConfig;
}
But for this complex I don't know how to parse based on attribute values, and also multiple list of data.
How to parse based on the attribute values?
I have created two POJOs for parsing different kind. If it's a single class also it's fine.
UPDATE 1
My expected output.
When I pass the value "Defects" when unmarshalling the Area object, The values should be
name= Defects
fileHeaders=ID,Issue Key,Fields
And If I pass "Organization", The area object values should be based on that. Similarly when unmarshalling source.
Now I am getting as List of Areas and List of SourceTypes, then I take the corresponding object by checking the value.
May be is there any way to parse only selected one based on attribute value instead of getting list and then checking value and returning the object?
I have a requirement to unmarshall a subset of Unknown XML content, with that unmarshalled object, I need modify some contents and re-bind the same XML content(subset) with the Original XML.
Sample Input XML:
<Message>
<x>
</x>
<y>
</y>
<z>
</z>
<!-- Need to unmarshall this content to "Content" - java Object -->
<Content>
<Name>Robin</Name>
<Role>SM</Role>
<Status>Active</Status>
</Content>
.....
</Message>
Need to unmarshall the <Content> tag alone, by keeping the other XML part as same. Need to modify the elements in <Content> tag and bind the modified XML part with the original as shown below:
Expected Output XML:
<Message>
<x>
</x>
<y>
</y>
<z>
</z>
<!-- Need to unmarshall this content to "Content" - java Object -->
<Content>
<Name>Robin_123</Name>
<Role>Senior Member</Role>
<Status>1</Status>
</Content>
.....
</Message>
My Questions:
What is the possible solution for this Requirement ? (Except DOM parsing - as XML contnet is very huge)
Is there any option to do this in JAXB2.0 ?
Please provide your suggestions on this.
Consider cutting your source document down to size using the StAX API.
For the given sample, this code creates a DOM document with a root element of the Content element:
class ContentFinder implements StreamFilter {
private boolean capture = false;
#Override public boolean accept(XMLStreamReader xml) {
if (xml.isStartElement() && "Content".equals(xml.getLocalName())) {
capture = true;
} else if (xml.isEndElement() && "Content".equals(xml.getLocalName())) {
capture = false;
return true;
}
return capture;
}
}
XMLInputFactory inFactory = XMLInputFactory.newFactory();
XMLStreamReader reader = inFactory.createXMLStreamReader(inputStream);
reader = inFactory.createFilteredReader(reader, new ContentFinder());
Source src = new StAXSource(reader);
DOMResult res = new DOMResult();
TransformerFactory.newInstance().newTransformer().transform(src, res);
Document doc = (Document) res.getNode();
This can then be passed to JAXB as a DOMSource.
Similar techniques can be used when rewriting the XML on output.
JAXB doesn't seem to accept a StreamSource directly, at least in the Oracle 1.7 implementation.
You can annotate an Object property on your class with #XmlAnyElement and by default the unmapped content will be captured as a DOM nodes. If you specify a DomHandler on the #XmlAnyElement then you can control the format. Here is a link to an example where the content is kept as a String.
JAXB use String as it is
I have to parse an XML file with following structure:
<root>
<object_1>
<pro1> abc </pro1>
<pro2> pqr </pro2>
<pro3> xyz </pro3>
<children>
<object_a>
<pro1> abc </pro1>
<pro2> pqr </pro2>
<pro3> xyz </pro3>
<children>
.
.
.
</children>
</object_a>
</children>
</object_1>
<object_2>
.
.
.
</object_n>
</root>
Aim is to parse this multilevel nesting. A few classes are defined in Java.
Class Object_1
Class Object_2
.
.
.
Class Object_N
with their respective properties.
The following code is working for me, but then this is not the best way of doing things.
File file = new File(fileName);
DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
Document doc = dBuilder.parse(file);
doc.getDocumentElement().normalize();
if(doc ==null) return;
Node node = doc.getFirstChild();
NodeList lst = node.getChildNodes();
Node children = null ;
int len = lst.getLength();
for(int index=0;index<len;index++)
{
Node child = lst.item(index);
String name = child.getNodeName();
if(name=="Name")
name = child.getNodeValue();
else if(name=="Comment")
comment = child.getNodeValue());
else if(name=="children")
children = child;
}
if(children==null) return;
lst = children.getChildNodes();
len = lst.getLength();
Class<?> obj=null;
AbsModel model = null;
for(int index=0;index<len;index++)
{
Node childNode = lst.item(index);
String modelName = childNode.getNodeName();
try {
obj = Class.forName(modelName);
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
if(obj!=null)
model = (AbsModel) obj.newInstance();
else
model = new GenericModel();
model.restoreDefaultPropFromXML(childNode);
addChild(model);
}
}
Is there a better way of parsing this XML.
Consider using JAXB, which is part of Java since version 6. You should be able to parse (“unmarshall”) your XML file into your own classes with almost no code, just adding a few annotations expliciting the mapping between your object structure and your XML structure.
StAX and or JAXB is almost always the way to go.
If the XML is really dynamic (like attributes specify the property name) ie <prop name="property" value="" /> then you will need to use StAX only or live with what JAXB will map it to (a POJO with name and value properties) and post process.
Personally I find combining StAX and JAXB the best. I parse to the elements I want and then use JAXB to turn the element into a POJO.
See Also:
My own utility library that will turn an XML Stream into an iterator of objects.
Parsing very large XML files and marshalling to Java Objects
http://tedone.typepad.com/blog/2011/06/unmarshalling-benchmark-in-java-jaxb-vs-stax-vs-woodstox.html
While JAXB may be the best choice I'd also like to mention jOOX which provides a JQuery-like API and makes working with XML documents really pleasant.
I have a XmlDocument in java, created with the Weblogic XmlDocument parser.
I want to replace the content of a tag in this XMLDocument with my own data, or insert the tag if it isn't there.
<customdata>
<tag1 />
<tag2>mfkdslmlfkm</tag2>
<location />
<tag3 />
</customdata>
For example I want to insert a URL in the location tag:
<location>http://something</location>
but otherwise leave the XML as is.
Currently I use a XMLCursor:
XmlObject xmlobj = XmlObject.Factory.parse(a.getCustomData(), options);
XmlCursor xmlcur = xmlobj.newCursor();
while (xmlcur.hasNextToken()) {
boolean found = false;
if (xmlcur.isStart() && "schema-location".equals(xmlcur.getName().toString())) {
xmlcur.setTextValue("http://replaced");
System.out.println("replaced");
found = true;
} else if (xmlcur.isStart() && "customdata".equals(xmlcur.getName().toString())) {
xmlcur.push();
} else if (xmlcur.isEnddoc()) {
if (!found) {
xmlcur.pop();
xmlcur.toEndToken();
xmlcur.insertElementWithText("schema-location", "http://inserted");
System.out.println("inserted");
}
}
xmlcur.toNextToken();
}
I tried to find a "quick" xquery way to do this since the XmlDocument has an execQuery method, but didn't find it very easy.
Do anyone have a better way than this? It seems a bit elaborate.
How about an XPath based approach? I like this approach as the logic is super-easy to understand. The code is pretty much self-documenting.
If your xml document is available to you as an org.w3c.dom.Document object (as most parsers return), then you could do something like the following:
// get the list of customdata nodes
NodeList customDataNodeSet = findNodes(document, "//customdata" );
for (int i=0 ; i < customDataNodeSet.getLength() ; i++) {
Node customDataNode = customDataNodeSet.item( i );
// get the location nodes (if any) within this one customdata node
NodeList locationNodeSet = findNodes(customDataNode, "location" );
if (locationNodeSet.getLength() > 0) {
// replace
locationNodeSet.item( 0 ).setTextContent( "http://stackoverflow.com/" );
}
else {
// insert
Element newLocationNode = document.createElement( "location" );
newLocationNode.setTextContent("http://stackoverflow.com/" );
customDataNode.appendChild( newLocationNode );
}
}
And here's the helper method findNodes that does the XPath search.
private NodeList findNodes( Object obj, String xPathString )
throws XPathExpressionException {
XPath xPath = XPathFactory.newInstance().newXPath();
XPathExpression expression = xPath.compile( xPathString );
return (NodeList) expression.evaluate( obj, XPathConstants.NODESET );
}
How about an object oriented approach? You could deserialise the XML to an object, set the location value on the object, then serialise back to XML.
XStream makes this really easy.
For example, you would define the main object, which in your case is CustomData (I'm using public fields to keep the example simple):
public class CustomData {
public String tag1;
public String tag2;
public String location;
public String tag3;
}
Then you initialize XStream:
XStream xstream = new XStream();
// if you need to output the main tag in lowercase, use the following line
xstream.alias("customdata", CustomData.class);
Now you can construct an object from XML, set the location field on the object and regenerate the XML:
CustomData d = (CustomData)xstream.fromXML(xml);
d.location = "http://stackoverflow.com";
xml = xstream.toXML(d);
How does that sound?
If you don't know the schema the XStream solution probably isn't the way to go. At least XStream is on your radar now, might come in handy in the future!
You should be able to do this with query
try
fn:replace(string,pattern,replace)
I am new to xquery myself and I have found it to be a painful query language to work with, but it does work quiet well once you get over the initial learning curve.
I do still wish there was an easier way which was as efficient?