How to invoke XML <?xml-stylesheet ...?> directives in Java?

I have an XML file which references an associated XSL file, like this:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="my-transform.xsl"?>
<my-root> .....
and I want to read it in as an org.w3c.dom.Document, applying the transform.
I'm considering reading it in, extracting the stylesheet processing instruction using the XPath expression /processing-instruction('xml-stylesheet'), then loading the XSL file by hand and applying it with a Transformer.
But it seems odd that I need to do this manually - is there a neat way to read the file and apply the embedded transform automatically?
UPDATE: thanks to raphaëλ for observing that TransformerFactory.getAssociatedStylesheet(...) will identify the xml-stylesheet value as a Source, which is pretty close. Is there anything more automatic than that?

OK, nobody else answered, and I know the answer now: stylesheets are not applied automatically. But you can get hold of the stylesheet using TransformerFactory.getAssociatedStylesheet(...), which returns the stylesheet referenced by the xml-stylesheet processing instruction as a Source. You can then compile it with newTransformer(...) and apply it manually.
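A minimal sketch of that manual step, producing the transformed result as a DOM Document. The class and method names are hypothetical. It assumes the XML is read from a file (so a relative href is resolved against the file's location) and that the processing instruction declares an XSLT type such as type="text/xsl": the JDK's built-in TransformerFactory appears to ignore xml-stylesheet instructions without a recognised type, so a bare href like the one in the question may not be found.

```java
import java.io.File;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;

public class AssociatedStylesheetDemo {

    /** Finds the xml-stylesheet PI in xmlFile, applies that stylesheet, and returns the result as a DOM. */
    public static Document transformWithAssociatedStylesheet(File xmlFile) throws Exception {
        TransformerFactory factory = TransformerFactory.newInstance();

        // media, title and charset are null: accept any matching xml-stylesheet PI.
        // StreamSource(File) sets the system ID, so a relative href resolves against the file's folder.
        Source stylesheet = factory.getAssociatedStylesheet(new StreamSource(xmlFile), null, null, null);
        if (stylesheet == null) {
            throw new IllegalStateException("No usable xml-stylesheet instruction in " + xmlFile);
        }

        // Compile the stylesheet and run it over a fresh source for the same file.
        Transformer transformer = factory.newTransformer(stylesheet);
        DOMResult result = new DOMResult();
        transformer.transform(new StreamSource(xmlFile), result);
        return (Document) result.getNode();
    }
}
```

The PI is still only a hint here: the application decides when (and whether) to apply it.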
Thanks to raphaëλ for pointing this out.

Related

While reading and rewriting XML in Java, is there a systematic way of preserving the processing instruction?

I want to update the XML but preserve the original processing instruction; most of the time it's just:
<?xml version="1.0" encoding="UTF-8"?>
However, I can't find a way to extract that line from com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl.JAXPSAXParser (or other XML readers), or to carry it over automatically to the writer. Is there any other way than manually reading the line, keeping it, and writing it first before flushing the new XML?
Its proper name is an XML declaration; it looks like a processing instruction, but technically it isn't one.
Parsing invariably involves decoding the file (that is, converting the octets into characters); once that has been done, the theory goes, the application doesn't need to know how they were originally encoded. Similarly, when serializing the file, the application has to tell the serializer what encoding to use, and the serializer then takes responsibility for writing an XML declaration that reflects that encoding.
Allowing the application control over the XML declaration would break proper architectural layering, and would create the possibility of writing an XML declaration that is wrong. This bit of the content belongs to the parser layer, not to the application layer.
Of course in practice it's possible to get an XML declaration that doesn't match the actual encoding anyway, because there's nothing to stop you writing an XML declaration using software that knows nothing about XML. People do that, and they create broken content, and then they ask us on StackOverflow how to fix it. I'm not going to encourage you down that route.
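A minimal sketch of that layering in Java (class and method names are hypothetical): the parser reports the encoding named in the original declaration through DOM Level 3's Document.getXmlEncoding(), and the application passes it to the serializer via OutputKeys.ENCODING. The serializer then writes a declaration that matches the bytes it actually produces; the application never writes the declaration text itself.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

public class RoundTripDeclaration {

    /** Parses the bytes, edits the DOM, and re-serializes using the encoding the parser reported. */
    public static byte[] roundTrip(byte[] input) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(input));

        // Encoding from the original XML declaration, or null if there was no declaration.
        String declared = doc.getXmlEncoding();
        String encoding = (declared != null) ? declared : "UTF-8";

        // Example edit (hypothetical): mark the root element as updated.
        doc.getDocumentElement().setAttribute("updated", "true");

        // The serializer encodes the output and writes a declaration naming that same encoding.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, encoding);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }
}
```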

Parsing a false XML using JAXB

I have a situation where XML (but it's not really XML data; it's a tag-based custom data format) is sent from a third-party server. Because of that I can't change the format, and coordinating with the third party is pretty difficult. The markup looks as follows:
<?xml version="1.0" encoding="UTF-8"?>
<result>SUCCESS</result>
<req>
<?xml version="1.0" encoding="UTF-8"?>
<Secure>
<Message id="dfgdfdkjfghldkjfgh88934589345">
<VEReq>
<version>1.0.2</version><pan>3453243453453</pan>
<Merchant><acqBIN>433274</acqBIN>
<merID>3453453245</merID>
<password>342534534</password>
</Merchant>
<Browser></Browser>
</VEReq>
</Message>
</Secure>
</req>
<id>1906547421350020</id>
<trackid>f68fb35c-cbc2-468b-aaf8-7b3f399b709d</trackid>
<ci>6</ci>
Here I want only the values of the result, req, id, trackid and ci tags as the parse output. That means after parsing, req needs to contain all the content inside its tags. One more point: the req tag embeds another XML document directly, not as CDATA, so I can't parse it using JAXB.
Does anybody know a library that can parse all the content if I configure the available tags in a file, or any other way? I don't really need to convert them to objects; even a HashMap with the tag as key and its content as value is fine. But I would prefer the POJO model (generating a class from this kind of XML).
Let me know if somebody can help me.
Make it well-formed XML first and then pass it to whatever tool you find suitable. JAXB is not bad, as it will ignore elements it does not know (apart from the root element).
And since most (if not all) tools expect well-formed XML anyway, you'll have to take care of turning your "false" XML into "true" XML first. I'd first try something like JTidy or JSoup and see if they help make your non-well-formed XML well-formed.
If it does not work I'd try to hack it on the lower-level SAX or StAX parsing. The XML you posted seems to suffer from two problems: no single root element and XML declaration in the body. I think both problems can be addressed with some minimal parser hacking.
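A minimal sketch of the "make it well-formed first" step, done with plain text preprocessing before an ordinary DOM parse rather than by hacking SAX or StAX. It assumes, as in the sample, that the only problems are the extra XML declarations and the missing single root; the synthetic <wrapper> root and the class and method names are hypothetical. It returns the tag-to-content map the question mentions, with req holding the serialized markup of its child elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class FalseXmlReader {

    /** Strips every XML declaration and wraps the rest in a synthetic root so it becomes well-formed. */
    static String makeWellFormed(String raw) {
        // An XML declaration never contains '?', so [^?]* cannot run past its closing "?>".
        String withoutDecls = raw.replaceAll("<\\?xml\\s[^?]*\\?>", "");
        return "<wrapper>" + withoutDecls + "</wrapper>";
    }

    /** Maps each top-level tag to its text, or to its serialized child markup if it has child elements. */
    public static Map<String, String> parse(String raw) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(makeWellFormed(raw))));
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        Map<String, String> values = new LinkedHashMap<>();
        NodeList top = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < top.getLength(); i++) {
            if (top.item(i).getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text
            Element e = (Element) top.item(i);
            values.put(e.getTagName(),
                    hasChildElements(e) ? innerXml(e, serializer) : e.getTextContent().trim());
        }
        return values;
    }

    static boolean hasChildElements(Element e) {
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) return true;
        }
        return false;
    }

    /** Serializes the element children of e; whitespace-only text between them is dropped. */
    static String innerXml(Element e, Transformer serializer) throws Exception {
        StringWriter out = new StringWriter();
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                serializer.transform(new DOMSource(c), new StreamResult(out));
            }
        }
        return out.toString();
    }
}
```

The regex would also strip declaration-like text inside comments or CDATA sections; the sample has neither, but real feeds might.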
And I think there is a special place in hell for people who invent this type of non-well-formed XML: damned to sit there and correct all the HTML documents on the Internet into valid XHTML by hand.

How to deal with [xX][mM][lL] in java

So I've got a program that reads in large XML files, each containing multiple data entries. The database I'm using it for originally contained 40,000 separate entries, each written as its own XML file, but you can download a single XML file that contains all the entries. Because of this, the XML declaration:
<?xml version="1.0" encoding="UTF-8"?>
appears multiple times throughout the document, and I was wondering whether there is some way of dealing with this using a StAX parser.
Edit: I should have said that I can't properly parse my document and read everything, as it keeps returning the error:
Exception in thread "main" javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1062,6]
Message: The processing instruction target matching "[xX][mM][lL]" is not allowed.
because the XML declaration appears multiple times.
Thanks
Until you eliminate the spurious <?xml ?> declaration(s), you cannot treat the file as XML because it is not well-formed. First treat it as text, either manually or programmatically, to eliminate the extra XML declarations before trying to parse it as XML.
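A minimal sketch of that text-first approach for the StAX case: remove every XML declaration, wrap the remaining entries in one synthetic root, then stream through the result with an XMLStreamReader. The root name <all>, the <name> element being read, and the class and method names are hypothetical placeholders for whatever the real entries contain. This holds the whole cleaned text in memory; a streaming filter would avoid that, at the cost of more code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class ConcatenatedXml {

    /** Removes all XML declarations (wherever they appear) and wraps the remainder in one root. */
    static String clean(String text) {
        return "<all>" + text.replaceAll("<\\?xml\\s[^?]*\\?>", "") + "</all>";
    }

    /** Streams through the cleaned text and collects the content of every <name> element. */
    public static List<String> readNames(String text) throws Exception {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(clean(text)));
        List<String> names = new ArrayList<>();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT && r.getLocalName().equals("name")) {
                names.add(r.getElementText()); // leaves the reader on the matching END_ELEMENT
            }
        }
        r.close();
        return names;
    }
}
```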
For general information on all the ways the error
The processing instruction target matching "[xX][mM][lL]" is not allowed.
arises, and remedies for addressing each way, see this answer (as suggested by Stefan).
This line is called the XML prolog (more precisely, the XML declaration):
<?xml version="1.0" encoding="UTF-8"?>
The XML declaration is optional. If it exists, it must come first in the document.
It must not be repeated anywhere else in the document.
Source: XML Prolog - W3Schools

Transforming XML with Java

I was learning how to convert an XML file into HTML using just Java; later I decided to learn how to use the XSLT language to do the same.
By "just Java" I mean using only the Java language itself, not XSLT.
To clarify:
Loading XML into a DOM (using a DocumentBuilder).
Parsing it (just doing things like doc.getFirstChild()).
Writing it to an HTML file (just using a character stream, not XML serialization).
What happened?
After I include the following line in my XML:
<?xml-stylesheet type="text/xsl" href="mystylesheet.xsl"?>
My Java application couldn't write the HTML correctly...
If I remove that, everything is right, but I want to keep it.
Any ideas how to ignore this "instruction"?
XSLT will ignore processing instructions (that is, remove them) by default. If you want to retain this one, just add a template rule to do so:
<xsl:template match="processing-instruction('xml-stylesheet')">
<xsl:copy/>
</xsl:template>
This assumes that your stylesheet is written in the classic recursive-descent style using apply-templates; if you're self-taught in XSLT then you might not have yet learnt this style. As always, it's much easier to help people when they show us the code.
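A minimal sketch of that rule in use from Java: a small recursive-descent stylesheet (its items/item templates and the class name are hypothetical, modelled on the question's sample) run through the standard TransformerFactory. The built-in rule for the root applies templates to the PI and to <items>; the PI rule copies the instruction to the output alongside the transformed elements.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class KeepStylesheetPi {

    // The first template is the PI-copying rule; the other two are hypothetical examples.
    static final String XSL =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match=\"processing-instruction('xml-stylesheet')\"><xsl:copy/></xsl:template>"
        + "<xsl:template match='items'><ul><xsl:apply-templates/></ul></xsl:template>"
        + "<xsl:template match='item'><li><xsl:value-of select='.'/></li></xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms xml with the stylesheet above and returns the serialized output. */
    public static String transform(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }
}
```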
It depends on how you are reading the XML from your Java application. But if your XML has an embedded Processing Instruction like
<?xml-stylesheet type="text/xsl" href="mystylesheet.xsl"?>
then it means that the stylesheet is an integral part of the data, and must be applied to the XML for it to be of any use. This is very similar to a CSS stylesheet processing instruction like, for example
<?xml-stylesheet type="text/css" href="standard.css"?>
which, in the same way, is an integral part of the XHTML, just as if it was an internal style within <style> tags.
It is clearly possible to read and use the XML without applying the stylesheet, but that is to ignore the directive of the data itself.
If you want to treat the XML as raw data and apply an optional transform to it in different ways then you must omit the processing instruction from the XML.
Sorry guys, I thought that the XML with the stylesheet.xsl was being "transformed" in the DOM object that I was using to parse the XML.
I made assumptions that:
The XML was being transformed before being put in the DOM.
The <?xml-stylesheet type="text/xsl" href="mystylesheet.xsl"?> was invisible in the DOM.
Basically I had a simple XML to start learning how to do the transformation. Something like the following:
<?xml version="1.0" encoding="UTF-8"?>
<items><item>...</item></items>
For simplicity (I was learning...) I decided to start my parsing with:
parse(doc.getFirstChild().getFirstChild()); //Expecting the first "item".
But after introducing the stylesheet to the XML the document became:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="mystylesheet.xsl"?>
<items><item>...</item></items>
Because of this, doc.getFirstChild().getFirstChild() no longer returned an "item": doc.getFirstChild() now returned the processing-instruction node, which has no children.
Then I realized that I had forgotten to skip the node for this instruction (I really thought it was "invisible" in the DOM tree).
Learning guys, learning...
P.S. That was my first attempt to transform an XML file with XSLT!
Thank you for your help.
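A minimal sketch of navigation that does not depend on what precedes the root element: Document.getDocumentElement() always returns the root, and a small helper (hypothetical, not part of the DOM API) skips processing instructions, comments and whitespace text when looking for the first child element. The class and method names are illustrative.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstItem {

    /** Returns the first element child of parent, skipping PIs, comments and text nodes. */
    static Element firstChildElement(Node parent) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) return (Element) n;
        }
        return null;
    }

    /** Returns the text of the first element under the root, or null if there is none. */
    public static String firstItemText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // getDocumentElement() is <items> whether or not an xml-stylesheet PI precedes it.
        Element items = doc.getDocumentElement();
        Element firstItem = firstChildElement(items);
        return firstItem == null ? null : firstItem.getTextContent();
    }
}
```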

How to generate XSD from elements of XML

I have an XML input:
<field>
<name>id</name>
<dataType>string</dataType>
<maxlength>42</maxlength>
<required>false</required>
</field>
I am looking for a library or a tool which will take an XML instance document and output a corresponding XSD schema.
Specifically, I am looking for a Java library with which I can generate an XSD for the above XML structure.
If all you want is an XSD so that the XML you gave conforms to it, you'd be much better off by crafting it yourself rather than using a tool.
No one knows better than you the particularities of the schema, such as which values are valid (for instance, is the <maxlength> element required? Are true and false the only valid values for <required>?).
If you really want to use a tool (I'd only advise using one if you haven't designed the XML and really can't get the real XSD, or, if you did design it, double-check the generated XSD), you could try Trang. It can infer an XSD schema from a number of example XML files.
Take into account that the XSD a tool infers might be incomplete or inaccurate if the XML samples aren't representative enough.
java -jar trang.jar sampleXML.xml inferredXSD.xsd
You can find a usage example of Trang here.
You can try with online tool called XMLGrid: http://xmlgrid.net/xml2xsd.html
You could write an XSLT to do something like that. But the problem is, a single document alone is not enough information to generate a schema. Are any of those elements optional? Is there anything missing from that document, that might appear in other instances? How many of a particular element can there be? Do they have to be in that order? There are loads of things that can be expressed in a schema, that are not immediately obvious from one instance of a document that conforms to that schema.
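To make that point concrete, here is a deliberately naive inference sketch in Java rather than XSLT (not Trang's or XMLBeans' algorithm; class and method names are hypothetical). It bakes in exactly the guesses a single instance forces: every element occurs once, in document order, every leaf is xs:string, attributes are ignored, and repeated siblings simply produce repeated declarations.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class NaiveXsd {

    /** Infers a schema from one instance document, using only that instance's structure. */
    public static String infer(String xml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml))).getDocumentElement();
        StringBuilder sb = new StringBuilder();
        sb.append("<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">\n");
        element(root, sb, "  ");
        sb.append("</xs:schema>\n");
        return sb.toString();
    }

    // Guesses: each element occurs exactly once, in this order; each leaf is xs:string.
    private static void element(Element e, StringBuilder sb, String indent) {
        boolean hasChildren = false;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) { hasChildren = true; break; }
        }
        if (!hasChildren) {
            sb.append(indent).append("<xs:element name=\"").append(e.getTagName())
              .append("\" type=\"xs:string\"/>\n");
            return;
        }
        sb.append(indent).append("<xs:element name=\"").append(e.getTagName()).append("\">\n")
          .append(indent).append("  <xs:complexType><xs:sequence>\n");
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) element((Element) c, sb, indent + "    ");
        }
        sb.append(indent).append("  </xs:sequence></xs:complexType>\n")
          .append(indent).append("</xs:element>\n");
    }
}
```

Run on the <field> sample, this yields a schema the sample validates against, but one that would reject a <field> without <maxlength>, which may or may not be what you intended.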
For the people who really want to include it in their Java code to generate an XSD and understand the perils, check out Generate XSD from XML programatically in Java
Try XMLBeans: it has several tools, one of which is inst2xsd. You can find specifics here:
http://xmlbeans.apache.org/docs/2.0.0/guide/tools.html
Good luck
