java xpath parsing - java

Is there a way to retrieve all the non-empty nodes from an XML file using XPath? The XML looks like this:
<workspace>
<light>
<activeFlag>true</activeFlag>
<ambientLight>0.0:0.0:0.0:0.0</ambientLight>
<diffuseLight>1.0:1.0;1.0:1.0</diffuseLight>
<specularLight>2.0:2.0:2.0:2.0</specularLight>
<position>0.1:0.1:0.1:0.1</position>
<spotDirection>0.2:0.2:0.2:0.2</spotDirection>
<spotExponent>1.0</spotExponent>
<spotCutoff>2.0</spotCutoff>
<constantAttenuation>3.0</constantAttenuation>
<linearAtenuation>4.0</linearAtenuation>
<quadricAttenuation>5.0</quadricAttenuation>
</light>
<camera>
<activeFlag>true</activeFlag>
<position>2:2:2</position>
<normal>1:1:1</normal>
<direction>0:0:0</direction>
</camera>
<object>
<material>lemn</material>
<Lu>1</Lu>
<Lv>2</Lv>
<unit>metric</unit>
<tip>tip</tip>
<origin>1:1:1</origin>
<normal>2:2:2</normal>
<parent>
<object>null</object>
</parent>
<leafs>
<object>null</object>
</leafs>
</object>
</workspace>
After each tag, the parser "sees" another empty node that I don't need.

I guess what you want is all element nodes that have an immediate text node child that does not consist solely of white space:
//*[string-length(normalize-space(text())) > 0]

If you're using XSLT, use <xsl:strip-space elements="*"/>. If you're not, it depends on what technology you are using (you haven't told us), e.g. DOM, JDOM, etc.
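If the technology is plain DOM in Java (an assumption, since the question doesn't say), one common way to get the effect of xsl:strip-space is to select every whitespace-only text node with XPath after parsing and remove it. DocumentBuilderFactory.setIgnoringElementContentWhitespace(true) only takes effect when the parser validates against a DTD, which this file doesn't have. A minimal sketch; the class and method names are illustrative:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class StripWhitespace {
    /** Parses xml and removes every text node that consists only of whitespace. */
    static Document parseStripped(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        // Whitespace-only text nodes are the "empty nodes" that appear between tags.
        NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//text()[normalize-space(.) = '']", doc, XPathConstants.NODESET);
        // Copy the matches first so that removing nodes cannot disturb the iteration.
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < blanks.getLength(); i++) {
            toRemove.add(blanks.item(i));
        }
        for (Node blank : toRemove) {
            blank.getParentNode().removeChild(blank);
        }
        return doc;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseStripped(
                "<camera>\n  <activeFlag>true</activeFlag>\n  <position>2:2:2</position>\n</camera>");
        // Only the two element children of <camera> are left.
        System.out.println(doc.getDocumentElement().getChildNodes().getLength()); // 2
    }
}
```

Note that this also drops text that is deliberately made of spaces only, such as <x> </x>, which may or may not matter for your data.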

You want:
//*[normalize-space()]
The expression:
//*[string-length(normalize-space(text())) > 0]
is wrong. It only tests the first text-node child of each element, so it selects exactly those elements whose first text child isn't whitespace-only.
Therefore, it wouldn't select the p element in:
<p><b>Hello </b><i>World!</i></p>
although this paragraph contains quite a lot of text...
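To see the difference, the sketch below runs both expressions against that <p> example using the JDK's built-in javax.xml.xpath (an XPath 1.0 engine). The class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NonEmptyElements {
    /** Returns the names of the elements selected by expr, in document order, comma-separated. */
    static String select(String xml, String expr) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList hits = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
        StringBuilder names = new StringBuilder();
        for (int i = 0; i < hits.getLength(); i++) {
            if (i > 0) names.append(',');
            names.append(hits.item(i).getNodeName());
        }
        return names.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<p><b>Hello </b><i>World!</i></p>";
        // Tests the element's whole string value, i.e. all descendant text.
        System.out.println(select(xml, "//*[normalize-space()]"));                          // p,b,i
        // Tests only the first text-node child; <p> has none, so it is skipped.
        System.out.println(select(xml, "//*[string-length(normalize-space(text())) > 0]")); // b,i
    }
}
```

On the workspace file from the question, //*[normalize-space()] also returns container elements such as workspace and light, because their string value includes their descendants' text; if only leaf elements are wanted, an extra predicate such as //*[normalize-space() and not(*)] restricts it.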

Related

Why is DOM doing this? (Wrong nodeName XML)

I have this XML (just a small part; the complete XML is big):
<Root>
<Products>
<Product ID="307488">
<ClassificationReference ClassificationID="AR" Type="AgencyLink"/>
<ClassificationReference ClassificationID="AM" Type="AgencyLink">
<MetaData>
<Value AttributeID="tipoDeCompra" ID="C">Compra Centralizada</Value>
</MetaData>
</ClassificationReference>
</Product>
</Products>
</Root>
Well... I want to get the data from the line
<Value AttributeID="tipoDeCompra" ID="C">Compra Centralizada</Value>
I'm using DOM, and when I call nodoValue.getTextContent() I get "Compra Centralizada", which is fine...
But when I call nodoValue.getNodeName() I get "MetaData", whereas I was expecting "Value".
What is the explanation for this behaviour?
Thanks!
Your nodoValue variable most likely points to the MetaData node, so the returned name is correct.
Note that for an element node, Node.getTextContent() returns the concatenation of the text content of all child nodes. Therefore, in your example, the text content of the MetaData element is the text content of the Value element, namely Compra Centralizada, plus any indentation whitespace around it.
I guess you are getting the Node object using getElementsByTagName("MetaData"). In this case nodoValue.getTextContent() will return the text content correctly, but to get the node name you need to get the child node.
Your current node must be MetaData, and getTextContent() gives all the text within its opening and closing tags. That is why you are getting
Compra Centralizada
as the value. To reach the Value tag, go through getChildNodes() and take the first element child; when the XML is indented, the very first child node is a whitespace text node, not Value.
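A minimal sketch of the situation, using a trimmed fragment of the question's XML and assuming nodoValue comes from getElementsByTagName("MetaData"). The helper firstChildElement is hypothetical; it skips non-element nodes because the first child of an indented MetaData is a text node.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MetaDataValue {
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    /** Hypothetical helper: first element child of parent, skipping text/comment nodes; null if none. */
    static Element firstChildElement(Node parent) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                return (Element) n;
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<MetaData>\n  <Value AttributeID=\"tipoDeCompra\" ID=\"C\">Compra Centralizada</Value>\n</MetaData>");
        Node nodoValue = doc.getElementsByTagName("MetaData").item(0);

        System.out.println(nodoValue.getNodeName());                 // MetaData
        System.out.println(nodoValue.getTextContent().trim());       // Compra Centralizada (from the Value child)
        System.out.println(nodoValue.getFirstChild().getNodeName()); // #text (the indentation)

        Element value = firstChildElement(nodoValue);
        System.out.println(value.getNodeName());                     // Value
        System.out.println(value.getAttribute("ID"));                // C
    }
}
```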

localizing xsl for different languages

I have some static XSL that transforms dynamic XML into HTML as the response to the browser. The rest of the web pages use Spring MVC for the view, so they can be localized with Spring's messages.properties files written in my language. But I don't know how to localize the text nodes in the static XSL the same way. More specifically:
In a Spring web page, I can write
<title><spring:message code="title.MyTitle"/></title>
In my static XSL, I have
<xsl:stylesheet ........
<xsl:output method="html"/>
<xsl:template match="/">
.....
<title>My Title</title>
and I want something like this
<xsl:stylesheet ........
<xsl:output method="html"/>
<xsl:template match="/">
.....
<title><spring:message code="title.MyTitle"/></title>
Of course, the above doesn't work. But I'd like to keep all titles and labels in messages.properties so I can easily switch between languages. How can I do this? Please help.
Jirka Kosek has a technique for doing l10n lookups at http://www.xml.com/pub/a/2003/11/05/xslt.html. I thought he'd made a whole system for doing l10n with XSLT, but I can't find that now.
Separately, if your properties files are text rather than the XML property file format that Java also understands, the general technique would be:
Use unparsed-text() to get the text of the properties file
Tokenize on line-ends (only those not preceded by \, which marks a continuation line in a properties file)
Make a variable that contains an element for each of those strings, where the key is in an attribute value and the text is the text content of the element
All that's done so far is mimic the XML property file format.
Make a key that matches on the elements and uses the attribute value as the lookup
Make a function that takes the string as a parameter and does a lookup on the key, using the string as the second parameter and the variable as the third
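The steps above do the text-to-XML conversion inside XSLT with unparsed-text(). If Java can run before the transformation (the Spring application presumably can), a different route, swapped in here rather than taken from the answer, is to let java.util.Properties do the parsing and write the XML property format directly; Properties.load already handles comments, escapes and backslash line continuations. A minimal sketch; the class name is illustrative:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesToXml {
    /**
     * Reads text-format properties (e.g. the contents of messages.properties) and
     * returns the same entries in Java's XML property format.
     */
    static String toXml(String propertiesText) throws IOException {
        Properties props = new Properties();
        // load(Reader) handles comments, escapes and backslash line continuations.
        props.load(new StringReader(propertiesText));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Writes a <properties> root with one <entry key="...">value</entry> per property.
        props.storeToXML(out, null, "UTF-8");
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(toXml("title.MyTitle=My Title\n"));
    }
}
```

The resulting entry elements, with the key in an attribute and the text as content, are what the key and lookup-function steps above would match on. One caveat: storeToXML writes a DOCTYPE that refers to Sun's properties DTD, so an XSLT processor may try to fetch it when loading the file; an entity resolver, or removing that line, may be needed.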

Android java XML junk after document element

I'm using SAX to read/parse XML documents, and it works fine except for this particular site, where Eclipse tells me "junk after document element" and I get no data back:
http://www.zachblume.com/apis/rhyme.php?format=xml&word=example
The site is not mine; I'm just trying to get some data from it.
Yes, that's not an XML document. It's trying to include more than one root element:
<?xml version="1.0"?>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
The parser regards everything after <word>ampal</word> as junk, because by that point it has already read a complete document; hence the complaint about "junk after document element".
An XML document can only have one root, but several children within the root. For example:
<?xml version="1.0"?>
<words>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
</words>
The page does not contain XML. It contains an XML snippet at best:
<?xml version="1.0"?>
<word>ampal</word>
<word>ample</word>
<word>hampel</word>
<word>hample</word>
<word>lampl</word>
<word>pampel</word>
<word>sample</word>
This is incorrect since there is no document element. SAX interprets the first <word> as the document element and correctly reports "junk after document element", since for all it knows the document element ends at the first </word>.
To get around the error, do not treat this document as XML. Download it as text, remove the XML declaration (<?xml version="1.0"?>), and then wrap the rest in a fake document element before you try to process it.
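That workaround could look like this with the standard javax.xml.parsers SAX API (also available on Android). Downloading is left out: body is assumed to already hold the response text, and the element name fakeroot is an arbitrary choice. A minimal sketch:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WrapFragment {
    /** Drops a leading XML declaration, wraps the rest in a fake root, and collects <word> texts. */
    static List<String> parseWords(String body) throws Exception {
        // The declaration is only allowed at the very start, so remove it before wrapping.
        String withoutDecl = body.replaceFirst("^\\s*<\\?xml[^>]*\\?>", "");
        // "fakeroot" is an arbitrary, hypothetical element name.
        String wrapped = "<fakeroot>" + withoutDecl + "</fakeroot>";

        List<String> words = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder text; // non-null only while inside a <word>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (qName.equals("word")) text = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several chunks, so append.
                if (text != null) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals("word")) {
                    words.add(text.toString());
                    text = null;
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(wrapped)), handler);
        return words;
    }

    public static void main(String[] args) throws Exception {
        String body = "<?xml version=\"1.0\"?>\n<word>ampal</word>\n<word>ample</word>\n";
        System.out.println(parseWords(body)); // [ampal, ample]
    }
}
```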

Check whether XML tag exist in Java

I am parsing an XML document with javax.xml, but I want to know whether a tag exists in a child node. For example:
<tag>
<child>
<special_tag>Special</special_tag>
<normal_tag>Normal</normal_tag>
</child>
<child>
<normal_tag>Normal</normal_tag>
</child>
</tag>
I am trying to find out which child has special_tag and which doesn't.
Using XPath:
//child[special_tag]
Build the DOM for your XML and use XPath to query for the existence of special_tag using //special_tag.
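Combining those two answers: //child[special_tag] selects only the child elements that contain special_tag, while evaluating the relative path special_tag with each child as the context node gives a yes/no answer per child. A minimal DOM plus javax.xml.xpath sketch; the class and method names are illustrative:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SpecialTagCheck {
    /** For each <child> under <tag>, in document order, reports whether it has a <special_tag> child. */
    static List<Boolean> hasSpecialTag(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList children = (NodeList) xpath.evaluate("/tag/child", doc, XPathConstants.NODESET);
        List<Boolean> result = new ArrayList<>();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Relative path, evaluated with this <child> as the context node.
            result.add((Boolean) xpath.evaluate("boolean(special_tag)", child, XPathConstants.BOOLEAN));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<tag><child><special_tag>Special</special_tag><normal_tag>Normal</normal_tag></child>"
                + "<child><normal_tag>Normal</normal_tag></child></tag>";
        System.out.println(hasSpecialTag(xml)); // [true, false]
    }
}
```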
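It seems that this is a classic use case for SAX.
Register a handler and check the element name in startElement(). When it equals special_tag, note which child element you are currently inside; SAX keeps no parent pointers, so you have to track the enclosing element yourself.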

Getting The XML Data Inside Custom XPath function

Is there a way to get the current XML data when we write our own custom XPath function?
I know you have access to an XPathContext, but is that enough?
Example:
Our XML:
<foo>
<bar>smang</bar>
<fizz>buzz</fizz>
</foo>
Our XSL:
<xsl:template match="/">
<xsl:value-of select="ourFunction()" />
</xsl:template>
How do we get the entire XML tree?
Edit: To clarify, I'm creating a custom function that ends up executing static Java code (it's a Saxon feature). So, in this Java code, I want to be able to get elements from the XML tree, such as bar and fizz, and their text content, such as smang and buzz.
Try changing your XSL so you call 'ourFunction(/)'. That should pass the root node to the function. You could also try . or ..
You'll presumably need to change the signature of the implementing function; I'll let someone else help with that.
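The question uses Saxon's extension mechanism, whose Java-side signatures aren't shown here. As a stand-in, the sketch below uses the JDK's standard javax.xml.xpath custom-function API (XPathFunctionResolver and XPathFunction) to illustrate the same idea: call the function with / and walk the tree on the Java side. The namespace URI, the ex prefix and the name=value output format are hypothetical. In the JDK's built-in implementation a node-set argument arrives as an org.w3c.dom.NodeList; the code checks for that rather than relying on it silently.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import javax.xml.xpath.XPathFunction;
import javax.xml.xpath.XPathFunctionException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class RootNodeFunction {
    // Hypothetical namespace: XPath requires extension functions to be namespaced.
    static final String NS = "urn:example:functions";

    /** The function body: lists the element children of the root element as name=text pairs. */
    static final XPathFunction OUR_FUNCTION = args -> {
        Object arg = args.get(0);
        if (!(arg instanceof NodeList) || ((NodeList) arg).getLength() == 0) {
            throw new XPathFunctionException("expected a non-empty node-set");
        }
        Node node = ((NodeList) arg).item(0);
        // "/" selects the document node; step down to the root element (<foo>).
        Node root = node.getNodeType() == Node.DOCUMENT_NODE
                ? ((Document) node).getDocumentElement() : node;
        StringBuilder out = new StringBuilder();
        for (Node c = root.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() == Node.ELEMENT_NODE) {
                out.append(c.getNodeName()).append('=').append(c.getTextContent()).append(';');
            }
        }
        return out.toString();
    };

    static String run(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "ex".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) {
                return NS.equals(uri) ? "ex" : null;
            }
            @Override public Iterator<String> getPrefixes(String uri) {
                return NS.equals(uri) ? Collections.singletonList("ex").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        xpath.setXPathFunctionResolver((name, arity) ->
                NS.equals(name.getNamespaceURI()) && "ourFunction".equals(name.getLocalPart()) && arity == 1
                        ? OUR_FUNCTION : null);
        // Counterpart of calling ourFunction(/) from the stylesheet.
        return xpath.evaluate("ex:ourFunction(/)", doc);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("<foo><bar>smang</bar><fizz>buzz</fizz></foo>")); // bar=smang;fizz=buzz;
    }
}
```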
What about selecting the relevant data from the current node into an XSL variable, and passing that variable to the function? Like:
<xsl:variable name="data" select="/foo"/>
<xsl:value-of select="ourFunction($data)" />
