Sum values in HTML page using XQuery / XPath 2.0

I have an HTML page like:
<html>
<head><title>Hello</title></head>
<body>
<div id="foo">
<h6>9</h6>
<h6>3</h6>
<h6>5</h6>
</div>
</body>
</html>
I'd like to use XQuery (or XPath 2.0) to sum the values in the <h6> elements. I'm using XMLBeans (with Saxon as the engine) and tried the following, which just gives me a NullPointerException:
XmlObject xml = XmlObject.Factory.parse(xmlFile);
XmlCursor htmlCursor = xml.newCursor();
XmlCursor result = htmlCursor.execQuery("sum(for $val in $this//h6 return number($val))");
System.out.println(result.getObject());
Any ideas?

Use the XPath Sum Function:
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xmlFile);
XPath path = XPathFactory.newInstance().newXPath();
System.out.println(path.evaluate("sum(//h6)", doc));
prints:
17

I'm guessing here but the $this in your query looks a bit odd. In standard XQuery there is no variable in scope called $this. I assume you want the context item, so your query would look like:
sum(for $val in .//h6 return number($val))
or more simply:
sum(.//h6/number(.))
or just:
sum(.//h6)
Omitting the dot would mean that the XPath starts at the root of the document, not at the context item.
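To make the difference between .//h6 and //h6 concrete, here is a minimal sketch using the JDK's built-in XPath 1.0 API (not XMLBeans or Saxon). The class name ContextSum and the extra div#bar in the sample page are my own assumptions, added only so that the two expressions give different results.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ContextSum {
    // Hypothetical sample: the page from the question plus one <h6> outside div#foo.
    static final String PAGE =
        "<html><head><title>Hello</title></head><body>"
        + "<div id=\"foo\"><h6>9</h6><h6>3</h6><h6>5</h6></div>"
        + "<div id=\"bar\"><h6>100</h6></div>"
        + "</body></html>";

    /** Evaluates a numeric XPath 1.0 expression with div#foo as the context node. */
    static double sumFromFoo(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(PAGE)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Node foo = (Node) xpath.evaluate("//div[@id='foo']", doc, XPathConstants.NODE);
            return (Double) xpath.evaluate(expression, foo, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sumFromFoo("sum(.//h6)")); // relative to div#foo: 17.0
        System.out.println(sumFromFoo("sum(//h6)"));  // whole document: 117.0
    }
}
```

With the leading dot only the three <h6> elements below the context node are summed; without it the search starts at the document root and picks up the fourth one as well.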

Related

multiple html as output from 1 xsl with java

I want to know how I can generate multiple outputs (HTML) from one XML file using Java and XSL.
For example, having this xml:
<ARTICLE>
<SECT>
<PARA>The First 1st Major Section</PARA>
</SECT>
<SECT>
<PARA>The Second 2nd Major Section</PARA>
</SECT>
</ARTICLE>
For each child element "SECT" from "ARTICLE" I would like to have one ".html" as an output, example of the output:
sect1.html
<html>
<body>
<div>
<h1>The First 1st Major Section</h1>
</div>
</body>
</html>
sect2.html
<html>
<body>
<div>
<h1>The Second 2nd Major Section</h1>
</div>
</body>
</html>
I've been working in Java to transform the .xml document with the following code:
File stylesheet = new File(argv[0]);
File datafile = new File(argv[1]);
DocumentBuilder builder = factory.newDocumentBuilder();
document = builder.parse(datafile);
// Use a Transformer for output
TransformerFactory tFactory = TransformerFactory.newInstance();
StreamSource stylesource = new StreamSource(stylesheet);
Transformer transformer = tFactory.newTransformer(stylesource);
DOMSource source = new DOMSource(document);
OutputStream result=new FileOutputStream("sections.html");
transformer.transform(source, new StreamResult(result));
The problem is that I get only one output. Could you help me write the .xslt document and tell me how to get more than one output?

To create more than one result document, you need an XSLT Processor which supports multiple result documents. The feature of multiple result documents was introduced in XSLT 2.0. Some XSLT Processors which do not yet implement XSLT 2.0 or newer feature multiple result documents as a proprietary extension.
Creating multiple result documents is, unlike the primary result document, not controlled directly from the Java source code. Instead, the XSLT code needs to contain the XSLT elements that create the multiple result documents.
In XSLT 2.0 and newer, the <xsl:result-document/> element is used to create multiple result documents. See XSLT 2.0, <xsl:result-document/> for more information and examples.
As far as I am aware, the XSLT Processor shipped with Java is Xalan-J, and Xalan-J does not yet support XSLT 2.0 or newer (according to their website http://xml.apache.org/xalan-j/). You might want to use Saxon instead, which supports XSLT 3.0. Or as described in this previous question Xalan XSLT multiple output files? you could use the Redirect extension.
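If switching processors is not an option, one workaround that stays within XSLT 1.0 is to drive the split from Java: run the same stylesheet once per SECT element and write each result separately. Note that this is not xsl:result-document and not the Redirect extension, just a sketch of an alternative. The class name SectionSplitter and the inline stylesheet are hypothetical, and the results are collected as strings here so the example is self-contained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SectionSplitter {
    static final String XML = "<ARTICLE>"
        + "<SECT><PARA>The First 1st Major Section</PARA></SECT>"
        + "<SECT><PARA>The Second 2nd Major Section</PARA></SECT>"
        + "</ARTICLE>";

    // Hypothetical XSLT 1.0 stylesheet: renders a single SECT as one HTML page.
    static final String XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\" indent=\"no\"/>"
        + "<xsl:template match=\"SECT\">"
        + "<html><body><div><h1><xsl:value-of select=\"PARA\"/></h1></div></body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet once per SECT element and returns one HTML string per section. */
    static List<String> renderSections() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            NodeList sections = doc.getElementsByTagName("SECT");
            List<String> pages = new ArrayList<>();
            for (int i = 0; i < sections.getLength(); i++) {
                // To write files instead, use new StreamResult(new File("sect" + (i + 1) + ".html")).
                StringWriter out = new StringWriter();
                transformer.transform(new DOMSource(sections.item(i)), new StreamResult(out));
                pages.add(out.toString());
            }
            return pages;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        renderSections().forEach(System.out::println);
    }
}
```

The trade-off is that the splitting logic lives in Java rather than in the stylesheet, so it only suits cases where the output boundaries are easy to select from the DOM.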

Reading XML tag from MediaWiki using Java

I need to read the output of the 'search' tag from the following URL using Java.
First I need to read XML into some string from following URL:
http://en.wikipedia.org/w/api.php?format=xml&action=query&list=search&srlimit=1&srsearch=big+brother
I should end up having this:
<api>
<query-continue>
<search sroffset="1"/>
</query-continue>
<query>
<searchinfo totalhits="55180"/>
<search>
<p ns="0" title="Big Brothers Big Sisters of America" snippet="<span class='searchmatch'>Big</span> <span class='searchmatch'>Brothers</span> <span class='searchmatch'>Big</span> Sisters of America is a 501(c)(3) non-profit organization whose goal is to help all children reach their potential through <b>...</b> " size="13008" wordcount="1906" timestamp="2014-04-15T06:46:01Z"/>
</search>
</query>
</api>
Then once I have the XML, I need to get content of the search tag:
Output of 'search' tag looks like this and I need to get two parts from the code in the middle:
<search>
<p ns="0" title="Big Brothers Big Sisters of America" snippet="<span class='searchmatch'>Big</span> <span class='searchmatch'>Brothers</span> <span class='searchmatch'>Big</span> Sisters of America is a 501(c)(3) non-profit organization whose goal is to help all children reach their potential through <b>...</b> " size="13008" wordcount="1906" timestamp="2014-04-15T06:46:01Z"/>
</search>
At the end, all I need is to have two strings, which would equal to this:
String title = Big Brothers Big Sisters of America
String snippet = "<span class='searchmatch'>Big..."
Can someone please help me amend this code? I am not sure what I am doing wrong. I don't think it's even retrieving the XML from the URL, much less the tags inside the XML.
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse("http://en.wikipedia.org/w/api.php?format=xml&action=query&list=search&srlimit=1&srsearch=big+brother");
doc.getDocumentElement().normalize();
XPathFactory xFactory = XPathFactory.newInstance();
XPath xpath = xFactory.newXPath();
XPathExpression expr = xpath.compile("//query/search/text()");
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;
for (int i=0; i<nodes.getLength();i++){
System.out.println(nodes.item(i).getNodeValue());
}
Sorry, I am a newbie and can't find the answer to this anywhere.

The main problem here is that you're asking for text nodes that are children of <search>, but in fact the <p ..> that you want is not a text node: it's an element. (In fact, the <search> element has no text node children, as you can tell when you view the response from that URL using "View Source".)
So what you want to do is change your XPath expression to
//query/search/p
which will give you the p element node. Then ask for the value of this node's two attributes title and snippet in your Java code:
Element e = (Element)(nodes.item(i));
String title = e.getAttribute("title");
String snippet = e.getAttribute("snippet");
Or, you could do two XPath queries, one for each attribute:
//query/search/p/@title
and
//query/search/p/@snippet
assuming there will only be one <p> element. If you were doing this over multiple <p> elements, you'd probably want to keep each pair of attributes together instead of having two separate lists of results.
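Putting the first approach together, a minimal sketch could look like the following. It parses a trimmed-down stand-in for the API response from a string rather than fetching the URL, so SAMPLE and the class name SearchHits are assumptions of mine. Note that in well-formed XML the markup inside the snippet attribute is escaped (&lt;span ...&gt;), and the parser hands it back unescaped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SearchHits {
    // Hypothetical, shortened stand-in for the API response.
    static final String SAMPLE = "<api><query><search>"
        + "<p ns=\"0\" title=\"Big Brothers Big Sisters of America\""
        + " snippet=\"&lt;span class='searchmatch'&gt;Big&lt;/span&gt; Brothers ...\"/>"
        + "</search></query></api>";

    /** Returns one {title, snippet} pair per <p> element under query/search. */
    static List<String[]> titlesAndSnippets(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate(
                    "//query/search/p", doc, XPathConstants.NODESET);
            List<String[]> pairs = new ArrayList<>();
            for (int i = 0; i < hits.getLength(); i++) {
                Element p = (Element) hits.item(i);
                // Both attributes are read from the same element, so they stay paired.
                pairs.add(new String[] { p.getAttribute("title"), p.getAttribute("snippet") });
            }
            return pairs;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String[] pair : titlesAndSnippets(SAMPLE)) {
            System.out.println("title = " + pair[0]);
            System.out.println("snippet = " + pair[1]);
        }
    }
}
```

Reading both attributes from the same Element keeps each title with its snippet, which matters as soon as srlimit is raised above 1.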

Edit HTML Document with Java

I have an HTML document stored in memory (set on a Flying Saucer XHTMLPanel) in my java application.
xhtmlPanel.setDocument(Main.class.getResource("/mailtemplate/DefaultMail.html").toString());
html file below;
<html>
<head>
</head>
<body>
<p id="first"></p>
<p id="second"></p>
</body>
</html>
I want to set the contents of the p elements. I don't want to set a schema for it just to use getElementById(), so what alternatives do I have?

XHTML is XML, so any XML parser would be my recommendation. I maintain the JDOM library, so I would naturally recommend using that, but other libraries, including the DOM model built into Java, will work. I would use something like:
Document doc = new SAXBuilder().build(Main.class.getResource("/mailtemplate/DefaultMail.html"));
// XPath that finds the `p` element with id="first"
XPathExpression<Element> xpe = XPathFactory.instance().compile(
"//p[@id='first']", Filters.element());
Element p = xpe.evaluateFirst(doc);
p.setText("This is my text");
XMLOutputter xout = new XMLOutputter(Format.getPrettyFormat());
xout.output(doc, System.out);
Produces the following:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<head />
<body>
<p id="first">This is my text</p>
<p id="second" />
</body>
</html>
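For comparison, the same edit can be sketched with only the DOM and XPath classes built into the JDK, which the answer mentions as an alternative to JDOM. The class name FillParagraph and the inline HTML string are assumptions for illustration, and the id is assumed not to contain a single quote.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class FillParagraph {
    static final String HTML = "<html><head></head><body>"
        + "<p id=\"first\"></p><p id=\"second\"></p></body></html>";

    /** Sets the text of the <p> with the given id and returns the serialized document. */
    static String setParagraphText(String id, String text) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(HTML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Assumes id contains no single quote; no schema or DTD is needed for this lookup.
            Element p = (Element) xpath.evaluate(
                    "//p[@id='" + id + "']", doc, XPathConstants.NODE);
            p.setTextContent(text);

            // Serialize with an identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "xml");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(setParagraphText("first", "This is my text"));
    }
}
```

The XPath predicate compares the id attribute as a plain string, which is why it works without declaring id as an ID-typed attribute.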

Use a fine-grained HTML parser and manipulation library like jsoup. You can easily create a Document by passing the HTML to the Jsoup.parse(String htmlContent) function. The library supports the usual DOM manipulation functions, including CSS or jQuery-like selector syntax: doc.select(String selector), where doc is an instance of Document.
For example you can select the first p using doc.select("p").first(). A minimal working solution would be:
Document doc = Jsoup.parse(htmlContent);
Element p = doc.select("p").first();
p.text("My Example Text");
Reference:
Use selector-syntax to find elements

Java equivalent of System.Xml.XmlNode.InnerXml

Is there a java equivalent of .NET's System.Xml.XmlNode.InnerXml?
I need to replace some words in an XML document.
I cannot use Java's org.w3c.dom.Node.setTextContent() because this removes the XML nodes.
Thanks!
Source:
<body>
<title>Home Owners Agreement</title>
<p>The <b>good</b> thing about a Home Owners Agreement is that...</p>
</body>
Desired output:
<body>
<title>Home Owners Agreement</title>
<p>The <b>good</b> thing about a HOA is that...</p>
</body>
I only want text in <p> tags to be replaced. I tried the following:
void replaceText(String term, String replaceWith, org.w3c.dom.Node p){
p.setTextContent(p.getTextContent().replace(term, replaceWith));
}
The problem with the above code is that all the child nodes of p get lost.

You could have a look at jdom.
Something like document.getRootElement().getChild("ELEMENT1").setText("replacement text");
You have a little bit of work to do in converting the document into a JDOM document, but there are adapters that make that fairly easy for you. Or, if the XML is in a file, you can just use a JDOM Builder class to create the DOM that you want to manipulate.

Okay, I figured out the solution.
The key is that you don't want to replace the text of the element node itself. Each run of text is actually represented by its own child text node. I was able to accomplish what I needed with this code:
private static void replace(Node root){
// Only text nodes are rewritten, so element children such as <b> survive.
if (root.getNodeType() == Node.TEXT_NODE){
root.setTextContent(root.getTextContent().replace("Home Owners Agreement", "HOA"));
}
// Recurse into every child node.
for (int i = 0; i < root.getChildNodes().getLength(); i++){
replace(root.getChildNodes().item(i));
}
}
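Here is a self-contained sketch of the same idea, restricted to <p> elements as the question asks. The class name TextReplacer, the parameterised method, and the identity-transform serialization are my own additions and not part of the original solution.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TextReplacer {
    static final String XML = "<body><title>Home Owners Agreement</title>"
        + "<p>The <b>good</b> thing about a Home Owners Agreement is that...</p></body>";

    /** Replaces term in every text node at or below node, leaving elements untouched. */
    static void replaceInText(Node node, String term, String replacement) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(term, replacement));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInText(children.item(i), term, replacement);
        }
    }

    /** Applies the replacement inside <p> elements only and returns the serialized XML. */
    static String run() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            NodeList paragraphs = doc.getElementsByTagName("p");
            for (int i = 0; i < paragraphs.getLength(); i++) {
                replaceInText(paragraphs.item(i), "Home Owners Agreement", "HOA");
            }
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.METHOD, "xml");
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

One limitation to keep in mind: a term that is split across several text nodes (for example by an inline <b>) will not be matched, because each text node is searched on its own.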

fn: functions in XPathExpressions cause Exceptions in JDK1.6

I would like to select the value of the first "href" attribute from documents like this using XPath:
<div>
<a href="#a">
<span>foo</span>
</a>
<a href="#b">
<span>bar</span>
</a>
<a href="#c">
<span>baz</span>
</a>
</div>
However, I am only interested in those a elements that govern spans with text content "bar" or "baz". I was hoping that I could achieve that with the following Java code:
Document document = getDocument(); // returns non-null Document
XPath xpath = XPathFactory.newInstance().newXPath();
String href = xpath.evaluate("//a[fn:matches(span, '^ba.$')]/attribute::href", document);
but whenever I'm using one of the fn: functions in an XPathExpression, I get
javax.xml.transform.TransformerException: Unknown error in XPath.
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:363)
at com.sun.org.apache.xpath.internal.XPath.execute(XPath.java:301)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.eval(XPathImpl.java:210)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:275)
at com.sun.org.apache.xpath.internal.jaxp.XPathImpl.evaluate(XPathImpl.java:365)
at MyCode(MyCode.java:71)
Caused by: java.lang.NullPointerException
at com.sun.org.apache.xpath.internal.functions.FuncExtFunction.execute(FuncExtFunction.java:206)
at com.sun.org.apache.xpath.internal.axes.PredicatedNodeTest.executePredicates(PredicatedNodeTest.java:340)
[...]
Similar Exceptions are thrown when I use fn:starts-with. I'm using JDK 1.6 on GNU/Linux.
Any ideas what I'm doing wrong? Thanks!

These string functions are available in XPath 2.0, which is not supported by the Java XPath API. You will have to use another library, such as Saxon, to evaluate XPath 2.0 expressions.

Only the XPath 1.0 core functions (http://www.w3.org/TR/xpath/#corelib) are supported by default (as stated here: http://download.oracle.com/javase/6/docs/api/javax/xml/xpath/XPathFunctionResolver.html).
Therefore, instead of matches you should use contains (http://www.w3.org/TR/xpath/#function-contains).

First, if you use a prefix (fn) you should bind that to a namespace URI.
Second, XPath 1.0 functions don't use a prefix binding. A prefixed name would be interpreted as an extension function call.
Third, matches() is an XPath 2.0 function.
In XPath 1.0, this expression should work:
/div/a[span[starts-with(.,'ba')]][1]/@href
If you use // step operator, you should use:
(//a[span[starts-with(.,'ba')]])[1]/@href
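As a quick check, both expressions can be run against the document from the question with the plain JDK XPath API. The class name FirstHref is an assumption for illustration. Keep in mind that starts-with(., 'ba') is looser than the regular expression ^ba.$: it would also accept a span such as "banana".

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class FirstHref {
    static final String XML = "<div>"
        + "<a href=\"#a\"><span>foo</span></a>"
        + "<a href=\"#b\"><span>bar</span></a>"
        + "<a href=\"#c\"><span>baz</span></a>"
        + "</div>";

    /** Evaluates an XPath 1.0 expression against the sample and returns its string value. */
    static String evaluate(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Unprefixed function names resolve to the XPath 1.0 core library.
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(evaluate("/div/a[span[starts-with(.,'ba')]][1]/@href"));  // #b
        System.out.println(evaluate("(//a[span[starts-with(.,'ba')]])[1]/@href"));   // #b
    }
}
```

The parentheses in the second form matter: without them, [1] would apply per parent element rather than to the combined node set selected by //a.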
