Sort XML alphabetically while maintaining XSD schema validity - java

I have an XML file that is valid against an XSD schema. I would like to sort it alphabetically by applying the following criteria (in order of priority):
- by element name
- by attribute names
- by attribute values
Furthermore, I would like the sorted XML file to be valid against the same XSD schema. Is there an existing XML sorting algorithm that would comply with my requirements? If not, what would be the best technical approach to writing such an algorithm (e.g. XSLT)?
Based on my analysis so far, I tried to work out a correct approach for the "sequence", "choice" and "all" elements in the XSD, but did not succeed. I am using dom4j 1.6.1 for my current processing tasks. Looking forward to your suggestions.

Related

Renaming JCR nodes with custom names (in CQ/AEM)

Authors make some comments once a month.
They are stored in the JCR under "content", beneath the node "remarks". Each comment
is stored in a child node named "remarks_xxxx", where
xxxx is a random string of letters and digits.
I need to rename all the current nodes to "remarks_mmddyy"
and also name future nodes in the same fashion.
Thanks
The best approach is to write the date of the remark into a property (of type Date) instead of into the node name. This eliminates the need to rename nodes and also makes it easier to use JCR queries to your advantage.
To retrieve remarks for a certain date and time, use the JCR query API, which lets you search on properties (including Date properties, of course). Since AEM 6 and Jackrabbit Oak, you can define a custom index so that a given property query runs fast. Note that "order by" is supported as well, in case ordering matters.
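As a rough illustration of the query side, the sketch below builds a JCR-SQL2 statement that selects remarks whose date property falls on a given day. The property name "remarkDate", the path "/content/remarks" and the class name RemarkQueries are assumptions made for this example, not names taken from the question. The resulting string would then be passed to QueryManager.createQuery(statement, Query.JCR_SQL2).

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class RemarkQueries {
    // JCR-SQL2 date literals are ISO 8601 timestamps, e.g. 2012-01-10T00:00:00.000Z
    private static final DateTimeFormatter ISO =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXXX");

    /** Builds a JCR-SQL2 statement for all remarks dated on the given day (UTC). */
    static String remarksOn(LocalDate day) {
        String from = day.atStartOfDay(ZoneOffset.UTC).format(ISO);
        String to = day.plusDays(1).atStartOfDay(ZoneOffset.UTC).format(ISO);
        return "SELECT * FROM [nt:base] AS r"
                + " WHERE ISDESCENDANTNODE(r, '/content/remarks')"        // assumed location
                + " AND r.[remarkDate] >= CAST('" + from + "' AS DATE)"  // assumed property name
                + " AND r.[remarkDate] < CAST('" + to + "' AS DATE)"
                + " ORDER BY r.[remarkDate]";
    }
}
```

A half-open range (from the start of the day up to, but not including, the start of the next day) avoids having to guess the last representable instant of the day.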
If you absolutely must stick with the detrimental data model of renaming nodes and putting dates into node names, see the following article on how to do it: How can you change the name of a JCR node?
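If renaming cannot be avoided, the target name from the question can at least be generated consistently. This is a minimal sketch in plain Java; the class and method names are hypothetical, and where the date of an existing remark comes from is left open. The actual rename in JCR would typically be a Session.move(oldPath, newPath) followed by Session.save().

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

final class RemarkNames {
    // Two-digit month, day and year, as in "remarks_mmddyy"
    private static final DateTimeFormatter MMDDYY = DateTimeFormatter.ofPattern("MMddyy");

    /** Returns the node name for a remark made on the given date. */
    static String nodeName(LocalDate date) {
        return "remarks_" + date.format(MMDDYY);
    }
}
```

Note that two remarks made on the same day would get the same name, so this scheme only works as long as there is at most one remark per day.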

Built-in libraries to perform efficient searching on 100GB files

Is there any built-in library in Java for searching strings in large files of about 100GB? I am currently using binary search, but it is not that efficient.
As far as I know Java does not contain any file search engine, with or without an index. There is a very good reason for that too: search engine implementations are intrinsically tied to both the input data set and the search pattern format. A minor variation in either could result in massive changes in the search engine.
For us to be able to provide a more concrete answer you need to:
- Describe exactly the data set: the number, path structure and average size of files, the format of each entry and the format of each contained token.
- Describe exactly your search patterns: are those fixed strings, glob patterns or, say, regular expressions? Do you expect the pattern to match a full line or a specific token in each line?
- Describe exactly your desired search results: do you want exact or approximate matches? Do you want to get a position in a file, or extract specific tokens?
- Describe exactly your requirements: are you able to build an index beforehand? Is the data set expected to be modified in real time?
- Explain why you can't use third-party libraries such as Lucene that are designed exactly for this kind of work.
- Explain why your current binary search, which should have a complexity of O(log n), is not efficient enough. The only thing that might be faster, with constant complexity, would involve a hash table.
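For reference, a binary search over a file can look roughly like the sketch below. It assumes the file consists of ASCII lines sorted in String.compareTo order and that the key is the whole line; none of that is known from the question, and the class name is made up for the example. Each probe costs one seek plus a short backwards scan, so about log2(n) probes are needed for n lines.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class SortedFileSearch {
    /** Returns the byte offset of the line equal to key, or -1 if no line matches. */
    static long find(RandomAccessFile file, String key) throws IOException {
        long lo = 0, hi = file.length();               // both always sit on a line boundary
        while (lo < hi) {
            long start = lineStart(file, lo + (hi - lo) / 2);
            file.seek(start);
            String line = file.readLine();             // bytes read as Latin-1, fine for ASCII
            int cmp = line.compareTo(key);
            if (cmp == 0) return start;
            if (cmp < 0) lo = file.getFilePointer();   // continue after this line
            else hi = start;                           // continue before this line
        }
        return -1;
    }

    /** Walks backwards from pos to the first byte of the line containing it. */
    private static long lineStart(RandomAccessFile file, long pos) throws IOException {
        while (pos > 0) {
            file.seek(pos - 1);
            if (file.read() == '\n') break;
            pos--;
        }
        return pos;
    }

    /** Self-contained demo on a temporary three-line file. */
    static long demo(String key) {
        try {
            Path tmp = Files.createTempFile("sorted", ".txt");
            try {
                Files.write(tmp, "apple\nbanana\ncherry\n".getBytes(StandardCharsets.US_ASCII));
                try (RandomAccessFile file = new RandomAccessFile(tmp.toFile(), "r")) {
                    return find(file, key);
                }
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

RandomAccessFile reads byte by byte here, which keeps the sketch short; a real implementation would read a buffered block around each probe position.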
It might be best if you described your problem in broader terms. For example, one might assume from your sample data set that what you have is a set of words and associated offset or document identifier lists. A simple way to approach searching in such a set would be to store a word/file-position index in a hash table, so that each associated list can be accessed in constant time.
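The hash-table idea could be sketched as follows, assuming each line starts with the word, followed by whitespace and its associated list. That line format, like the class and method names, is an assumption for illustration only. The index maps each word to the byte offset of its line, so a lookup is one hash probe plus one seek.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

final class OffsetIndex {
    /** Maps the first whitespace-delimited token of each line to the offset where the line starts. */
    static Map<String, Long> build(RandomAccessFile file) throws IOException {
        Map<String, Long> index = new HashMap<>();
        file.seek(0);
        long offset = 0;
        String line;
        while ((line = file.readLine()) != null) {
            index.putIfAbsent(line.split("\\s+", 2)[0], offset);  // keep the first occurrence
            offset = file.getFilePointer();                      // start of the next line
        }
        return index;
    }

    /** Returns the full line for key, or null if the key is not indexed. */
    static String lookup(RandomAccessFile file, Map<String, Long> index, String key)
            throws IOException {
        Long offset = index.get(key);
        if (offset == null) return null;
        file.seek(offset);
        return file.readLine();
    }

    /** Self-contained demo on a temporary two-line file. */
    static String demo(String key) {
        try {
            Path tmp = Files.createTempFile("words", ".txt");
            try {
                Files.write(tmp, "alpha 1 2 3\nbeta 4 5\n".getBytes(StandardCharsets.US_ASCII));
                try (RandomAccessFile file = new RandomAccessFile(tmp.toFile(), "r")) {
                    return lookup(file, build(file), key);
                }
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Building the index requires one full pass over the data, and for a 100GB file the map itself may not fit in memory, in which case the index would have to live on disk or in a dedicated store.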
If you don't want to use the tools built for search, then store the data in a database and use SQL.

Inserting element in XML under a sequence in an order specified by schema

We have a large workflow of programs that parse data into XML files. We have about 14 schemas, each with a different root and each made up of about 60 XSD files. Some of the schemas share similar elements, but the schemas are currently being modified on a weekly basis.
I have a stage (written in Java) that accepts an XML file (which might correspond to any of the 14 schemas) and reads a list of (xpath, message) tuples. For each xpath, a flag element containing the message is inserted under the element that the xpath identifies.
<default:flag issueDateTime="2012-01-10T21:00:09" recipient="lablabla" resolvedIndicator="false" sender="SS" xmlns:default="default">
  <default:flagSubject/>
  <default:message>
    <default:p>This element should be non empty</default:p>
  </default:message>
</default:flag>
My current approach is to insert the flag element as the last child of the node referenced by the xpath, and that has been causing an issue. In some schemas the referenced node accepts the flag element under a sequence, in an order defined by the XSD (it could be in the middle, first or last), so adding it as the last element renders the XML invalid when the element already has subelements from that sequence.
My question is: how do I add an element under a sequence in a way that respects the XSD-defined order?
This is what I'm currently doing:
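Element flag = rawXmlDoc.createElementNS("default", "default:flag");
xpath = factory.newXPath();
xpath.setNamespaceContext(nsContext);
XPathExpression expr = xpath.compile(xpathText);
// Evaluate to a Node: casting straight to Element would fail before the type check below
Node refNode = (Node) expr.evaluate(rawXmlDoc, XPathConstants.NODE);
if (refNode.getNodeType() == Node.ELEMENT_NODE) {
    // The xpath points at an element: add the flag as its last child
    refNode.appendChild(flag);
} else {
    // The xpath points at something else (e.g. a text node): add the flag to its parent
    refNode.getParentNode().appendChild(flag);
}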
I am hoping to get an answer using the standard DOM interface without relying on MOXy.
Why don't you use JAXB? When you use JAXB along with XJC, you can
- generate Java classes from your XSD files
- use JAXB to unmarshal the XML into Java objects
- manipulate the data in Java
- use JAXB again to marshal the Java objects back into valid XML
That uses standard Java XML APIs, though not DOM directly. JAXB was bundled with the JDK from Java 6 through Java 10, so on those versions you have no additional dependencies; from Java 11 onward it has to be added as a separate library.
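For readers who do want to stay with plain DOM, here is a rough sketch of a different technique from the JAXB route above: insert the new child before the first sibling that must come after it. It assumes you already know the child order of the parent's sequence as a list of local names, for example copied by hand from the XSD; the class name, the method name and the sample element names are all hypothetical. It does not read the schema itself and does not handle choice groups or nested sequences.

```java
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

final class OrderedInsert {
    /** Inserts child under parent at the position implied by order (local names in sequence order). */
    static void insertInOrder(Element parent, Element child, List<String> order) {
        int rank = order.indexOf(child.getLocalName());
        if (rank < 0) {
            throw new IllegalArgumentException("not in sequence: " + child.getLocalName());
        }
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            // The first existing element that belongs later in the sequence marks the spot
            if (n.getNodeType() == Node.ELEMENT_NODE && order.indexOf(n.getLocalName()) > rank) {
                parent.insertBefore(child, n);
                return;
            }
        }
        parent.appendChild(child);  // nothing has to follow it, so it goes last
    }

    /** Self-contained demo: returns the child names after inserting a flag between two siblings. */
    static String demo() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element parent = doc.createElementNS("default", "default:record");
            doc.appendChild(parent);
            parent.appendChild(doc.createElementNS("default", "default:title"));
            parent.appendChild(doc.createElementNS("default", "default:body"));
            insertInOrder(parent, doc.createElementNS("default", "default:flag"),
                    Arrays.asList("title", "flag", "body"));
            StringBuilder names = new StringBuilder();
            for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
                if (names.length() > 0) names.append(',');
                names.append(n.getLocalName());
            }
            return names.toString();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

getLocalName() only returns a value for nodes created or parsed in a namespace-aware way, so documents loaded from files need DocumentBuilderFactory.setNamespaceAware(true). With schemas that change weekly, the order lists would have to be kept in sync with the XSDs, which is the maintenance cost that the JAXB approach avoids.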

How to compare document objects in java with .xsd files?

I am trying to compare Document objects to find out whether they are well-formed or not. I did some research and heard that XSD files are used to make this comparison. Can you please give me some basic examples of comparing a document using XSD objects?
For example, what do I have to write in the XSD file, and how can I compare it with a Document object?
Thank you all
You don't need an XSD schema to determine if a document is well-formed. You only need it to determine if the document is valid against the schema.
I'm not sure what you mean by "comparing XML documents". What are you comparing them with?
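The difference between the two checks can be shown with the standard JAXP APIs. In this minimal sketch, parsing succeeds only if the text is well-formed, and validation against an XSD is a separate step performed with javax.xml.validation. The class name, the method names and the sample schema in the usage note are made up for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class XmlChecks {
    /** Returns the parsed Document, or null if the text is not well-formed XML. */
    static Document parseIfWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);  // needed for schema validation later
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (SAXException e) {
            return null;                      // the parser rejected it: not well-formed
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if doc is valid against the given XSD text (false also if the XSD is broken). */
    static boolean isValid(Document doc, String xsd) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
            schema.newValidator().validate(new DOMSource(doc));  // throws on the first error
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With a schema that declares a single element "note" of type xs:string, the text <note>hi</note> is well-formed and valid, <memo>hi</memo> is well-formed but not valid, and <note>hi is not even well-formed, so no Document is produced for it.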

parsing an xml schema to read/list all elements

So, I wish to parse an XML schema and list all the elements along with their annotation and type. I looked at some Java possibilities; the closest was XSOM. It seems like driving a truck trailer to get some milk from the neighborhood store.
I looked at JAXB, but it offers no way to parse a schema and list all of its elements.
I don't want to validate; I only want to list the elements, types and annotations.
Groovy's XmlSlurper is a decent parser, but it can't parse XSD. Is there anything you know of in Java, Groovy (or Python)?
Thank you for your time.
The SAX parser is very simple.
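Since an XSD is itself an XML document, a SAX handler can walk it directly. The following is a sketch of that idea, not a complete schema reader: it records the name and type attributes of every xs:element and attaches any xs:documentation text to the nearest enclosing element declaration. The class name and the output format are made up, and it does not resolve imports, includes or ref attributes.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class XsdElementLister extends DefaultHandler {
    private static final String XS = "http://www.w3.org/2001/XMLSchema";
    private final List<String[]> elements = new ArrayList<>();  // {name, type, annotation}
    private final Deque<String[]> open = new ArrayDeque<>();    // element declarations in progress
    private StringBuilder text;                                 // non-null inside xs:documentation

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (!XS.equals(uri)) return;
        if ("element".equals(local)) {
            // name is null for ref="..." declarations, type is null for anonymous types
            String[] entry = { atts.getValue("name"), atts.getValue("type"), "" };
            elements.add(entry);
            open.push(entry);
        } else if ("documentation".equals(local)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (text != null) text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (!XS.equals(uri)) return;
        if ("documentation".equals(local)) {
            if (!open.isEmpty()) open.peek()[2] += text.toString().trim();
            text = null;
        } else if ("element".equals(local)) {
            open.pop();
        }
    }

    /** Parses the XSD text and returns one "name : type : annotation" line per xs:element. */
    static List<String> list(String xsd) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);  // so the xs: prefix resolves to the schema namespace
            XsdElementLister handler = new XsdElementLister();
            factory.newSAXParser().parse(new InputSource(new StringReader(xsd)), handler);
            List<String> out = new ArrayList<>();
            for (String[] e : handler.elements) out.add(e[0] + " : " + e[1] + " : " + e[2]);
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Matching on the namespace URI rather than on the prefix means the handler works whether the schema uses xs:, xsd: or a default namespace.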
