I have an XML file that is valid against an XSD schema. I would like to sort it alphabetically by applying the following criteria (in order of priority):
- by element name
- by attribute names
- by attribute values
Furthermore, I would like the sorted XML file to be valid against the same XSD schema. Is there an existing XML sorting algorithm that would comply to my requirements? If not, what would be the best technical approach to write such an algorithm (eg: use XSLT)?
Based on my previous analysis, I tried to figure out a correct approach for the "sequence", "choice" and "all" elements in the XSD but failed to succeed. I am using dom4j 1.6.1 for my current processing tasks. Looking forward to your suggestions.
How to read data from a XML file in a generic way.generic way means in the sense idf I change the XML file at a later time no impact will be there to the out put format.
It should read the whole content of the XML file perfectly in key value pair.
You should try with a SAX Parser and put the key/value pairs in a Map.
http://docs.oracle.com/javase/7/docs/api/org/xml/sax/helpers/DefaultHandler.html
Using this Handler you can simply parse it from start to end.
See also here for an example.
<item>
<RelatedPersons>
<RelatedPerson>
<Name>xy</Name>
<Title>asd</Title>
<Address>abc</Address>
</RelatedPerson>
<RelatedPerson>
<Name>xy</Name>
<Title>asd</Title>
<Address>abc</Address>
</RelatedPerson>
</RelatedPersons>
</item>
I d like to parse this data with a SAXParser. How can i do this?
I know the tutorials about SAX, and i can parsing any normal RSS, but i can't parsing this datas only.
Define your Problem: What you can probably do is create a Value Object(POJO) called Person which has the properties: name, title and address. You aim of parsing this XML would then be to create an ArrayList<Person> object. Defining a definite data structure helps you build logic around it.
Choose a Parser : You can then use a SAX Parser or an XML Pull Parser to browse through the tags: see this lin for a tutorial on DOM, SAX and XML Pull Parser in Android.
Data Population Logic: Then while Parsing, whenever you encounter a <RelatedPersons> tag, instantiate a new Person object. When you encounter the respective Properties tag, read the value and populate it in this object. When you encounter a closing </RelatedPersons> dump this Person Object in the ArrayList. Depending on the Parser you use, you will have to use appropriate methods to browse to the child node/nested nodes.(Refer the link for details)
By the time you are done parsing the last item node you will have all the values in your ArrayList.
Note that this is more of a theoretical answer; I hope it helps.
So, I wish to parse an xml schema and list all the elements along with their annotation and type. I looked at some java possibilities - the closest was XSOM. It seems like driving a truck trailer to get some milk from the neighborhood store.
I looked at JAXB, but there's no parse and list all elements against schemata.
I don't want to validate- only want to list the elements/type/annotation.
Groovy's xmlsurper is a decent parser, but can't parse XSD. Anything you know in Java,Groovy (or python)?
thank you for your time.
The SAX parser is very simple.
I need to implement a Routing Table where there are a number of paramters.
For eg, i am stating five attributes in the incoming message below
Customer Txn Group Txn Type Sender Priority Target
UTI CORP ONEOFF ABC LOW TRG1
UTI GOV ONEOFF ABC LOW TRG2
What is the best way to represent this data in XML so that it can be queried efficiently.
I want to store this data in XML and using Java i would load this up in memory and when a message comes in i want to identify the target based on the attributes.
Appreciate any inputs.
Thanks,
Manglu
Here is a pure XML representation that can be processed very efficiently as is, without the need to be converted into any other internal data structure:
<table>
<record Customer="UTI" Txn-Group="CORP"
Txn-Type="ONEOFF" Sender="ABC1"
Priority="LOW" Target="TRG1"/>
<record Customer="UTI" Txn-Group="Gov"
Txn-Type="ONEOFF" Sender="ABC2"
Priority="LOW" Target="TRG2"/>
</table>
There is an extremely efficient way to query data in this format using the <xsl:key> instruction and the XSLT key() function:
This transformation:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:key name="kRec" match="record"
use="concat(#Customer,'+',#Sender)"/>
<xsl:template match="/">
<xsl:copy-of select="key('kRec', 'UTI+ABC2')"/>
</xsl:template>
</xsl:stylesheet>
when applied on the above XML document produces the desired result:
<record Customer="UTI"
Txn-Group="Gov" Txn-Type="ONEOFF"
Sender="ABC2" Priority="LOW"
Target="TRG2"/>
Do note the following:
There can be multiple <xsl:key>s defined that identify a record using different combinations of values to be concatenated together (whatever will be considered "keys" and/or "primary keys").
If an <xsl:key> is defined to use the concatenation of "primary keys" then a unique record (or no record) will be found when the key() function is evaluated.
If an <xsl:key> is defined to use the concatenation of "non-primary keys", then more than one record may be found when the key() function is evaluated.
The <xsl:key> instruction is the equivalent of defining an index in a database. This makes using the key() function extremely efficient.
In many cases it is not necessary to convert the above XML form to an intermediary data structure, due neither to reasons of understandability nor of efficiency.
If you're loading it into memory, it doesn't really matter what form the XML takes - make it the easiest to read or write by hand, I would suggest. When you load it into memory, then you should transform it into an appropriate data structure. (The exact nature of the data structure would depend on the exact nature of the requirements.)
EDIT: This is to counter the arguments made in comments by Dimitre:
I'm not sure whether you thought I was suggesting that people implement their own hashtable - I certainly wasn't. Just keep a straight hashtable or perhaps a MultiMap for each column which you want to use as a key. Developers know how to use hashtables.
As for the runtime efficiency, which do you think is going to be more efficient:
You build some XSLT (and bear in mind this is foreign territory, at least relatively speaking, for most developers)
XSLT engine parses it. This step may be avoidable if you're using an XSLT library which lets you just parameterise an existing query. Even so, you've got some extra work to do.
XSLT engine hits hashtables (you hope, at least) and returns a node
You convert the node into a more useful data structure
Or:
You look up appropriate entries in your hashtable based on the keys you've been given, getting straight to a useful data structure
I think I'd trust the second one, personally. Using XSLT here feels like using a screwdriver to bash in a nail...
That depends on what is repeating and what could be empty. XML is not known for its efficient queryability, as it is neither fixed-length nor compact.
I agree with the previous two posters - you should definitely not keep the internal representation of this data in XML when querying as messages come in.
The XML representation can be anything, you could do something like this:
<routes>
<route customer="UTI" txn-group="CORP" txn-type="ONEOFF" .../>
...
</routes>
My internal representation would depend on the format of the message coming in, and the language. A simple representation would be a map, mapping a structure of data (i.e. the key fields from which the routing decision is made) to the info on the target route.
Depending on your performance requirements, you could keep the key/target information as strings, though in any high performing system you'd probably want to do a straight memory comparison (in C/C++) or some form integer comparison.
Yeah, your basic problem is that you're using "XML" and "efficient" in the same sentence.
Edit: No, seriously, yer killin' me. The fact that several people in this thread are using "highly efficient" to describe anything to do with operations on a data format that require string parsing just to find out where your fields are shows that several people in this thread do not even know what the word "efficient" means. Downvote me as much as you like for saying it. I can take it, coach.