Multiple HTML outputs from one XSL with Java

I want to know how I can generate multiple outputs (HTML) from one XML file using Java and XSL.
For example, given this XML:
<ARTICLE>
<SECT>
<PARA>The First 1st Major Section</PARA>
</SECT>
<SECT>
<PARA>The Second 2nd Major Section</PARA>
</SECT>
</ARTICLE>
For each "SECT" child element of "ARTICLE", I would like to have one ".html" file as output. Example of the output:
sect1.html
<html>
<body>
<div>
<h1>The First 1st Major Section</h1>
</div>
</body>
</html>
sect2.html
<html>
<body>
<div>
<h1>The Second 2nd Major Section</h1>
</div>
</body>
</html>
I've been working in Java to transform the .xml document with the following code:
File stylesheet = new File(argv[0]);
File datafile = new File(argv[1]);
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(datafile);
// Use a Transformer for output
TransformerFactory tFactory = TransformerFactory.newInstance();
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = tFactory.newTransformer(stylesource);
DOMSource source = new DOMSource(document);
OutputStream result=new FileOutputStream("sections.html");
transformer.transform(source, new StreamResult(result));
The problem is that I get only one output. Could you please help me write the .xslt document, and tell me how to get more than one output?

To create more than one result document, you need an XSLT processor that supports multiple result documents. This feature was introduced in XSLT 2.0. Some XSLT processors that do not yet implement XSLT 2.0 or newer offer multiple result documents as a proprietary extension.
Unlike the primary result document, additional result documents are not controlled directly from the Java source code. Instead, the XSLT code needs to contain the XSLT elements that create them.
In XSLT 2.0 and newer, the <xsl:result-document/> element is used to create multiple result documents. See the XSLT 2.0 specification of <xsl:result-document/> for more information and examples.
As far as I am aware, the XSLT processor shipped with Java is Xalan-J, and Xalan-J does not yet support XSLT 2.0 or newer (according to their website http://xml.apache.org/xalan-j/). You might want to use Saxon instead, which supports XSLT 3.0. Alternatively, as described in the previous question "Xalan XSLT multiple output files?", you could use the Redirect extension.
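If switching processors is not an option, a different route (not the xsl:result-document approach described above) is to keep an XSLT 1.0 stylesheet and drive the splitting from Java: run the transformation once per SECT and pass the section's position as a parameter. A minimal sketch, assuming a flat ARTICLE/SECT structure; the stylesheet and the names SplitSections, split and index are made up for illustration:

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class SplitSections {

    // Hypothetical XSLT 1.0 stylesheet: renders only the SECT whose position equals $index.
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='html' indent='no'/>"
      + "<xsl:param name='index' select='1'/>"
      + "<xsl:template match='/'>"
      + "<html><body>"
      + "<xsl:for-each select='ARTICLE/SECT[position() = number($index)]'>"
      + "<div><h1><xsl:value-of select='PARA'/></h1></div>"
      + "</xsl:for-each>"
      + "</body></html>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    /** Returns one HTML page per SECT element, in document order. */
    static List<String> split(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Compile the stylesheet once and reuse it for every section.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(XSL)));
        // Assumption: SECT elements only occur as direct children of ARTICLE.
        int count = doc.getElementsByTagName("SECT").getLength();
        List<String> pages = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            Transformer transformer = templates.newTransformer();
            transformer.setParameter("index", i); // 1-based position of the SECT to render
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            pages.add(out.toString());
        }
        return pages;
    }
}
```

Each returned string could then be written to sect1.html, sect2.html and so on, for example with Files.writeString on Java 11 or newer. The trade-off is one transformation run per section, which is simple but slower than a single run with a processor that supports multiple result documents.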

Related

Problem with XSLT transformation from Maven to create multiple documents with result-document

I want to transform and split an XML document, so I use "result-document", and it works. But when I try to run the XSLT with Maven, I get an output XML document with just the XML declaration.
XSL:
<xsl:result-document method="xml" href="{$filename}_{$Number}.html">
<html>
<head>
<style>
body {
font-family: "Times New Roman", Times, serif;
font-size: 17pt;
line-height: 19pt;
}
</style>
</head>
<body>
<xsl:apply-templates/>
</body>
</html>
</xsl:result-document>
JAVA:
TransformerFactory factory = TransformerFactory.newInstance();
InputStream inputStream= accessFile(xslpath);
TransformerFactoryImpl f = new net.sf.saxon.TransformerFactoryImpl();
f.setAttribute("http://saxon.sf.net/feature/version-warning", Boolean.FALSE);
f.setAttribute("http://saxon.sf.net/feature/linenumbering", Boolean.TRUE);
StreamSource schemaSource = new StreamSource(inputStream);
Transformer t = f.newTransformer(schemaSource);
StreamSource src = new StreamSource(new FileInputStream(inputpath));
StreamResult res = new StreamResult(new ByteArrayOutputStream());
t.transform(src, res);
String a= res.getOutputStream().toString();
What is wrong?
Thanks in advance
As a wild guess, I assume that you are getting an error with the relative URIs in xsl:result-document method="xml" href="{$filename}_{$Number}.html". Because you transform to a StreamResult over a ByteArrayOutputStream, I think the processor has no absolute base output URI against which to resolve the relative URI constructed in the href attribute.
Assuming the XSLT 2 processor is some version of Saxon 9 or 10, how to set the base output URI when using the JAXP Transformer API might depend on the exact version. I think with Saxon 10 you can use e.g. ((TransformerImpl)t).getUnderlyingXsltTransformer().setBaseOutputURI("file:///dir/subfolder/subsubfolder/"); to write to a certain folder.
All such things are easier and more straightforward if you change to Saxon's own API, the s9api (http://saxonica.com/html/documentation/using-xsl/embedding/s9api-transformation.html) where you deal with Processor, XsltCompiler, XsltExecutable and XsltTransformer or Xslt30Transformer.
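To see why the base output URI matters, it may help to look at plain URI resolution. The sketch below uses only java.net.URI and is not a description of what Saxon does internally; the class name BaseOutputUri and the file name report_1.html (standing in for {$filename}_{$Number}.html) are assumptions for illustration:

```java
import java.net.URI;

public class BaseOutputUri {

    /** Resolves a relative href against a base URI using standard relative-reference resolution. */
    static URI resolve(String baseOutputUri, String href) {
        return URI.create(baseOutputUri).resolve(href);
    }

    public static void main(String[] args) {
        // With a trailing slash, the file lands inside the folder.
        System.out.println(resolve("file:///dir/subfolder/subsubfolder/", "report_1.html").getPath());
        // Without it, the last path segment is replaced instead.
        System.out.println(resolve("file:///dir/subfolder/subsubfolder", "report_1.html").getPath());
    }
}
```

The trailing slash on the base URI is what makes the result land inside subsubfolder rather than next to it. With no base URI at all, as with an in-memory stream, a relative href has nothing to resolve against.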

Edit HTML Document with Java

I have an HTML document stored in memory (set on a Flying Saucer XHTMLPanel) in my java application.
xhtmlPanel.setDocument(Main.class.getResource("/mailtemplate/DefaultMail.html").toString());
The HTML file is below:
<html>
<head>
</head>
<body>
<p id="first"></p>
<p id="second"></p>
</body>
</html>
I want to set the contents of the p elements. I don't want to set a schema for it in order to use getElementById(), so what alternatives do I have?
XHTML is XML, so any XML parser would be my recommendation. I maintain the JDOM library, so I would naturally recommend using that, but other libraries, including the DOM model embedded in Java, will work. I would use something like:
Document doc = new SAXBuilder().build(Main.class.getResource("/mailtemplate/DefaultMail.html"));
// XPath that finds the `p` element with id="first"
XPathExpression<Element> xpe = XPathFactory.instance().compile(
"//p[@id='first']", Filters.element());
Element p = xpe.evaluateFirst(doc);
p.setText("This is my text");
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
xout.output(doc, System.out);
Produces the following:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<head />
<body>
<p id="first">This is my text</p>
<p id="second" />
</body>
</html>
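For comparison, the same idea with only the DOM and XPath APIs bundled with the JDK might look like the sketch below. It assumes the template is well-formed XHTML and that the id value contains no quote characters; the class and method names (SetParagraphText, setText) are made up for illustration:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class SetParagraphText {

    /** Parses the XHTML, sets the text of the p element with the given id, and returns the DOM. */
    static Document setText(String xhtml, String id, String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xhtml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Selecting on the id attribute by XPath needs no schema or DTD, unlike getElementById().
        // Assumption: id contains no single quote, so it can be embedded in the expression.
        Element p = (Element) xpath.evaluate("//p[@id='" + id + "']", doc, XPathConstants.NODE);
        if (p != null) {
            p.setTextContent(text);
        }
        return doc;
    }
}
```

The returned Document can then be handed to whatever consumes the DOM, or serialized with a Transformer.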
Use a fine-grained HTML parsing and manipulation library like jsoup. You can easily create a Document by passing the HTML to the Jsoup.parse(String html) function. The library supports DOM manipulation, including CSS or jQuery-like selector syntax: doc.select(String selector), where doc is an instance of Document.
For example you can select the first p using doc.select("p").first(). A minimal working solution would be:
Document doc = Jsoup.parse(htmlContent);
Element p = doc.select("p").first();
p.text("My Example Text");
Reference:
Use selector-syntax to find elements

java dom xml parser get html tags(<p color="something">some text</p>) from xml

I have an xml file with html tags like:
<?xml version="1.0" encoding="utf-8" ?>
<blog>
<blogid>49</blogid>
<title>[FIXED] Job requests page broken</title>
<fulltext>
<img title="page broken" src="images/west/blog/site-broken.jpg" alt="page broken" />
<p><span style="background-color: #ccffcc;">Update 28/05/2011</span>: Job requests page seems to be working OK now. If you find any issues please use the contact page to notify us. Thank you for your patience!</p>
<p> </p>
<p>Well, what can I say? Why does it always have to be that way? You are trying to create something new and something else gets broken on the way...</p>
</fulltext>
Now I want the whole HTML part between the <fulltext> tags as it is.
What I get right now is blank; I think the DOM parser is parsing the HTML tags as well.
I tried XPath, but it is not working on Android.
I don't think you can get this not well-formed XML into a DOM as-is. (EDIT: or is it well-formed?)
You would need to either a) escape the characters, making the XML well-formed and parseable (but probably not into the DOM you want; I guess you want to display the HTML in a different system), b) parse it using a stream processor, or c) fix it using string manipulation (add <![CDATA[ ... ]]>) and then parse it into a DOM.
HTH
HTML is a sub-language of XML (without getting into details related to XHTML). Therefore, there is no reason for the DOM parser not to treat those inner tags as XML tags.
Maybe what you're looking for is a way to flatten what's inside <fulltext>?
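One way to do that flattening, assuming the input is well-formed, is to serialize each child node of the container element back to a string with an identity Transformer. This is a sketch using only JDK APIs, with made-up names (InnerMarkup, innerXml):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InnerMarkup {

    /** Returns the markup inside the first element named tagName, or "" if there is none. */
    static String innerXml(String xml, String tagName) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node container = doc.getElementsByTagName(tagName).item(0);
        if (container == null) {
            return "";
        }
        // A Transformer created without a stylesheet copies its input unchanged.
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        NodeList children = container.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            // Serialize each child (element or text) and append it to the same writer.
            identity.transform(new DOMSource(children.item(i)), new StreamResult(out));
        }
        return out.toString();
    }
}
```

Calling innerXml(xml, "fulltext") would then give the nested markup as one string. The serializer may normalize details such as attribute quoting or empty-element syntax, so the result is equivalent markup rather than a byte-for-byte copy.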
Use a library like Jsoup for this purpose.
public static void main(String args[]){
String html = "<?xml version=\"1.0\"?><foo>" +
"<bar>Some text — invalid!</bar></foo>";
Document doc = Jsoup.parse(html, "", Parser.xmlParser());
for (Element e : doc.select("bar")) {
System.out.println(e);
}
}

How to put String text without converting content to xml file in Java?

I need to put String content to xml in Java. I use this kind of code to insert information in xml:
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(new File ("file.xml"));
DOMSource source = new DOMSource (doc);
Node cards = doc.getElementsByTagName ("cards").item (0);
Element card = doc.createElement ("card");
cards.appendChild(card);
Element question = doc.createElement("question");
question.appendChild(doc.createTextNode("This <b>is</b> a test."));
card.appendChild (question);
StreamResult result = new StreamResult (new File (file));
Transformer tf = TransformerFactory.newInstance().newTransformer();
tf.setOutputProperty(OutputKeys.INDENT, "yes");
tf.transform(source, result);
But the string is converted in the XML like this:
<cards>
<card>
<question>This &lt;b&gt;is&lt;/b&gt; a test.</question>
</card>
</cards>
It should be like this:
<cards>
<card>
<question>This <b>is</b> a test.</question>
</card>
</cards>
I tried to use the CDATA method, changing the code like this:
// I changed this code
question.appendChild(doc.createTextNode("This <b>is</b> a test."));
// to this
question.appendChild(doc.createCDATASection("This <b>is</b> a test."));
With this code the XML file looks like:
<cards>
<card>
<question><![CDATA[This <b>is</b> a test.]]></question>
</card>
</cards>
I hope that somebody can help me to put String content in the xml file exactly with same content.
Thanks in advance!
This would be expected behaviour.
Consider if the brackets were kept as you put them, the end result would essentially be:
<cards>
<card>
<question>
This
<b>
is
</b>
a test.
</question>
</card>
</cards>
Basically, it would result in the <b> being an additional node in the XML tree. Encoding the brackets as &lt; and &gt; ensures that when displayed by any XML parser, the brackets are displayed and not mistaken for an additional node.
If you really wanted them to display as you say you do, you will need to create elements named b. This will not only be awkward, it will also not display quite as you've written above - it would display as additional nested nodes as I've shown above. So you would need to amend the xml writer to output inline for those tags.
Nasty.
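The escaping is also lossless: a parser that reads the file back returns the original string. A small sketch with JDK APIs only (the class and method names EscapeRoundTrip, toXml and readBack are made up for illustration):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class EscapeRoundTrip {

    /** Builds <question> with the given text node and serializes it; markup characters get escaped. */
    static String toXml(String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element question = doc.createElement("question");
        question.appendChild(doc.createTextNode(text));
        doc.appendChild(question);
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    /** Parses the serialized XML and returns the text of the root element, unescaped by the parser. */
    static String readBack(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }
}
```

So if the file is only ever read by XML parsers, readBack(toXml(s)) gives back s, and the escaped form on disk is usually not a problem in itself.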
Check this solution: how to unescape XML in java
Maybe you could solve it in this way (code only for <question> tag part):
Element question = doc.createElement("question");
question.appendChild(doc.createTextNode("This "));
Element b = doc.createElement("b");
b.appendChild(doc.createTextNode("is"));
question.appendChild(b);
question.appendChild(doc.createTextNode(" a test."));
card.appendChild(question);
What you are effectively trying to do is to insert XML into the middle of a DOM without parsing it. You can't do this since the DOM APIs don't support it.
You have three choices:
You could serialize the DOM and then insert the String at the appropriate point. The end result may or may not be well-formed XML ... depending on what is in the String that you inserted.
You could create and insert DOM nodes representing the text and the <b>...</b> element. This requires you to know the XML structure of the stuff that you are inserting. @bluish's answer gives an example.
You could wrap the String in some container element, parse it using an XML parser to give a second DOM, find the nodes of interest, and add them to the original DOM. This requires that the String is well-formed XML when wrapped in the container element.
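The third choice can be sketched with the JDK's DOM API roughly as follows. The container element name wrapper and the names InsertFragment and appendMarkup are made up for illustration, and the string must be well-formed once wrapped:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InsertFragment {

    /** Parses markup inside a throwaway container and appends the resulting nodes to target. */
    static void appendMarkup(Element target, String markup) throws Exception {
        // Wrapping lets mixed content (text plus elements) parse as a single document.
        Document fragmentDoc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<wrapper>" + markup + "</wrapper>")));
        NodeList nodes = fragmentDoc.getDocumentElement().getChildNodes();
        Document owner = target.getOwnerDocument();
        for (int i = 0; i < nodes.getLength(); i++) {
            // importNode makes a deep copy owned by the target document; the source list is untouched.
            target.appendChild(owner.importNode(nodes.item(i), true));
        }
    }
}
```

In the question's code this would replace the createTextNode call, as in appendMarkup(question, "This <b>is</b> a test."), which leaves question with a text node, a b element, and another text node.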
Or, since you're already using a Transformation, why not go all the way?
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="cards">
<card>
<question>This <b>is</b> a test</question>
</card>
</xsl:template>
</xsl:stylesheet>

I need to parse non well-formed xml data (HTML)

I have some non-well-formed XML (HTML) data in Java. I used the JAXP DOM parser, but it complains.
The question is: is there any way to use JAXP to parse such documents?
I have a file containing data such as :
<employee>
<name value="ahmed" > <!-- note, this element is not closed, So it is not well-formed xml-->
</employee>
You could try running your document through the jtidy API first - that has the ability to convert html into valid xhtml: http://jtidy.sourceforge.net/howto.html
Tidy tidy = new Tidy();
tidy.setXHTML(true);
tidy.parse(......)...
You could use TagSoup. I have used it with great success. It is completely compatible with the Java XML APIs, including SAX, DOM, XSLT, and StAX. For example, here is how I used it to apply XSLT transforms to particularly poor HTML:
public static void transform(InputStream style, InputStream data)
throws SAXException, TransformerException {
XMLReader reader =
XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
Source input = new SAXSource(reader, new InputSource(data));
Source xsl = new StreamSource(style);
Transformer transformer =
TransformerFactory.newInstance().newTransformer(xsl);
transformer.transform(input, new StreamResult(System.out));
}
Not really. JAXP wants well-formed markup. Have you considered the Cyberneko HTML Parser? We've been very successful with it at our shop.
EDIT: I see you are wanting to parse XML too. Hrmm.... Cyberneko works well for HTML but I don't know about others. It has a tag balancer that would close some tags off, but I don't know if you can train it to recognize tags that are not HTML.
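If only some of the inputs are broken, one option is to try the strict JAXP parser first and fall back to a tidying parser when it fails. A minimal sketch of the check, using only JDK APIs; the class and method names (WellFormedCheck, isWellFormed) are made up for illustration:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class WellFormedCheck {

    /** Returns true if the JAXP DOM parser accepts the input, false if it reports a parse error. */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Thrown for markup errors such as an element that is never closed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the employee example from the question this returns false, which would be the signal to hand the input to JTidy, TagSoup or a similar library instead.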
