xpath expression to match text ending with a variable - java

I have an XML with entities like this :
<Entity>
<Name>Lwresd_Dns_Server|LwresdDnsServer</Name>
</Entity>
<Entity>
<Name>Lwresd_Dns_Server_Data|LwresdDnsServerData</Name>
</Entity>
My xpath expression is
XPathExpression expr = xpath1.compile("//Entity[matches(Name,'" +line+ "')]");
where line is a variable with value LwresdDnsServer.
The above XPath expression matches both entities, but I need it to match only the first one, i.e.
Lwresd_Dns_Server|LwresdDnsServer
How should I frame the expression to do that?

I believe this should do the trick:
XPathExpression expr =
xpath1.compile("//Entity[contains(concat('|', Name, '|'),'|" +line+ "|')]");
This compares the entity Name enclosed in |s with the variable name enclosed in |s, so you get something like:
contains('|Lwresd_Dns_Server|LwresdDnsServer|', '|LwresdDnsServer|') => Yes
contains('|Lwresd_Dns_Server_Data|LwresdDnsServerData|', '|LwresdDnsServer|') => No
As a result, only the first of the two Entities is selected.
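The delimiter trick above can be tried out with the standard javax.xml.xpath API. This is only a minimal sketch: the <Entities> root element, the class name, and the method name are made up for the example, and it assumes line contains no single quote (which would break the XPath string literal).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DelimitedNameMatch {
    // The two entities from the question, wrapped in a hypothetical root element.
    static final String XML =
        "<Entities>"
        + "<Entity><Name>Lwresd_Dns_Server|LwresdDnsServer</Name></Entity>"
        + "<Entity><Name>Lwresd_Dns_Server_Data|LwresdDnsServerData</Name></Entity>"
        + "</Entities>";

    // Returns the Name text of every Entity that has `line` as a complete |-delimited segment.
    static List<String> matchingNames(String line) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Assumption: `line` contains no single quote.
        String query =
            "//Entity[contains(concat('|', Name, '|'), '|" + line + "|')]/Name";
        NodeList nodes = (NodeList) xpath.evaluate(
            query, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(matchingNames("LwresdDnsServer"));
    }
}
```

With the sample data, only the first entity's name should come back, because in the second one LwresdDnsServer is followed by Data rather than by the closing delimiter.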
If you only want to find entities whose Name ends with line (rather than those that contain it as a complete |-delimited segment), you can do the following. This assumes the values are guaranteed not to contain the character $; if they might, choose a different delimiter that they definitely won't contain, or use Dimitre Novatchev's answer to this question:
XPathExpression expr =
xpath1.compile("//Entity[contains(concat(Name, '$'),'" +line+ "$')]");
I haven't used the matches() function in XPath (it's not supported in XPath 1.0), but I suspect the following would also work for finding a value at the end of an Entity name, if your XPath evaluator supports matches():
XPathExpression expr =
xpath1.compile("//Entity[matches(Name,'" +line+ "$')]");
Here, $ is the RegEx symbol for the end of a string.

Here is an XPath 1.0 expression that implements what the XPath 2.0 function ends-with($s, $t) does:
substring($s, string-length($s) - string-length($t) +1) = $t
You can substitute $s and $t above with specific strings.
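As a rough illustration of the substitution, here is a sketch that plugs Name in for $s and binds $t through an XPathVariableResolver, so the suffix never has to be spliced into the expression text. The <Entities> root, the class name, and the method name are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EndsWithXPath1 {
    // The two entities from the question, wrapped in a hypothetical root element.
    static final String XML =
        "<Entities>"
        + "<Entity><Name>Lwresd_Dns_Server|LwresdDnsServer</Name></Entity>"
        + "<Entity><Name>Lwresd_Dns_Server_Data|LwresdDnsServerData</Name></Entity>"
        + "</Entities>";

    // Returns the Name text of every Entity whose Name ends with `suffix`.
    static List<String> namesEndingWith(String suffix) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the XPath variable $t to the Java value; any other variable is unresolved.
        xpath.setXPathVariableResolver(
            name -> "t".equals(name.getLocalPart()) ? suffix : null);
        // XPath 1.0 emulation of ends-with(Name, $t).
        String query =
            "//Entity[substring(Name, string-length(Name) - string-length($t) + 1) = $t]/Name";
        NodeList nodes = (NodeList) xpath.evaluate(
            query, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(namesEndingWith("LwresdDnsServer"));
    }
}
```

Binding the value as a variable also sidesteps quoting problems when the suffix itself contains quote characters.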


ANTLR: parse NULL as a function name and a parameter

I would like to be able to use 'NULL' as both a parameter (the value null) and a function name in my grammar. See this reduced example :
grammar test;
expr
: value # valueExpr
| FUNCTION_NAME '(' (expr (',' expr)* )* ')' # functionExpr
;
value
: INT
| 'NULL'
;
FUNCTION_NAME
: [a-zA-Z] [a-zA-Z0-9]*
;
INT: [0-9]+;
Now, trying to parse:
NULL( 1 )
Results in the parse tree failing because it parses NULL as a value, and not a function name.
Ideally, I should even be able to parse NULL(NULL)..
Can you tell me if this is possible, and if yes, how to make this happen?
That 'NULL' string in your grammar defines an implicit token type; it's equivalent to adding something like this:
NULL: 'NULL';
at the start of the lexer rules. When a token matches several lexer rules, the first one is used, so in your grammar the implicit rule gets priority, and you get a token of type 'NULL'.
A simple solution would be to introduce a parser rule for function names, something like this:
function_name: FUNCTION_NAME | 'NULL';
and then use that in your expr rule. But that seems brittle if NULL is not intended to be a keyword in your grammar. There are other solutions to this, but I'm not quite sure what to advise since I don't know how you expect your grammar to expand.
Another solution could be to rename FUNCTION_NAME to NAME, get rid of the 'NULL' token type, and rewrite expr like this:
expr
: value # valueExpr
| NAME '(' (expr (',' expr)* )* ')' # functionExpr
| {_input.LT(1).getText().equals("NULL")}? NAME # nullExpr
;
A semantic predicate takes care of the name comparison here.

XPath normalize-space() to return a sequence of normalized strings

I need to use the XPath function normalize-space() to normalize the text I want to extract from an XHTML document: http://test.anahnarciso.com/clean_bigbook_0.html
I'm using the following expression:
//*[@slot="address"]/normalize-space(.)
Which works perfectly in Qizx Studio, the tool I use to test XPath expressions.
let $doc := doc('http://test.anahnarciso.com/clean_bigbook_0.html')
return $doc//*[@slot="address"]/normalize-space(.)
This simple query returns a sequence of xs:string.
144 Hempstead Tpke
403 West St
880 Old Country Rd
8412 164th St
8412 164th St
1 Irving Pl
1622 McDonald Ave
255 Conklin Ave
22011 Hempstead Ave
7909 Queens Blvd
11820 Queens Blvd
1027 Atlantic Ave
1068 Utica Ave
1002 Clintonville St
1002 Clintonville St
1156 Hempstead Tpke
Route 49
10007 Rockaway Blvd
12694 Willets Point Blvd
343 James St
Now, I want to use the previous expression in my Java code.
String exp = "//*[@slot=\"address\"]/normalize-space(.)";
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr = xpath.compile(exp);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
But the last line throws an Exception:
Cannot convert XPath value to Java object: required class is org.w3c.dom.NodeList; supplied value has type xs:string
Obviously, I should change XPathConstants.NODESET to something else; I tried XPathConstants.STRING, but it only returns the first element of the sequence.
How can I obtain something like an array of Strings?
Thanks in advance.
Your expression works in XPath 2.0, but is illegal in XPath 1.0 (which is what Java uses). In XPath 1.0 it would have to be normalize-space(//*[@slot='address']).
Even then, in XPath 1.0, when normalize-space() is called on a node-set, only the first node (in document order) is used.
To do what you want, you'll need to use an XPath 2.0 compatible parser, or traverse the resulting node-set and call normalize-space() on every node:
XPath xpath = XPathFactory.newInstance().newXPath();
XPathExpression expr;
String select = "//*[@slot='address']";
expr = xpath.compile(select);
NodeList result = (NodeList)expr.evaluate(input, XPathConstants.NODESET);
String normalize = "normalize-space(.)";
expr = xpath.compile(normalize);
int length = result.getLength();
for (int i = 0; i < length; i++) {
System.out.println(expr.evaluate(result.item(i), XPathConstants.STRING));
}
...outputs exactly your given output.
It depends on what version of XPath you're using. Check out this post, hopefully it'll answer your question: Is it possible to apply normalize-space to all nodes XPath expression finds? Good luck.
The expression:
//*[@slot="address"]/normalize-space(.)
is a syntactically legal (and practically useful) XPath 2.0 expression.
The same expression is not syntactically legal in XPath 1.0: a location step is not allowed to be a function call.
In fact, it isn't possible to write a single XPath 1.0 expression the result of whose evaluation is the wanted set of strings.
You need to use in your program a product that implements XPath 2.0 -- such as Saxon 9.x.
As you noted, the XPath 2.0 expression //*[@slot="address"]/normalize-space(.) returns a sequence of strings. This return type is not supported by the JAXP XPathConstants class, because the JAXP interfaces were not designed to support XPath 2.0.
This leaves you with two choices:
Use an XPath 2.0 processor that has native interfaces for XPath 2.0 or that can convert sequences to a return type supported by JAXP
Use only XPath 1.0 expressions. For example, in your case you could simply select the target nodes:
//*[@slot="address"]
And then iterate the resulting nodeset, collecting the results into an array or List.
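The second choice could look roughly like the sketch below, which selects the nodes with an XPath 1.0 expression and then applies normalize-space(.) to each node in turn. The class name, the method name, and the tiny sample document in main are invented for illustration; they stand in for the real page.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class NormalizedAddresses {
    // Collects the whitespace-normalized text of every element with slot="address".
    static List<String> addresses(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Step 1: plain XPath 1.0 selection of the target nodes.
        NodeList nodes = (NodeList) xpath.evaluate(
            "//*[@slot='address']",
            new InputSource(new StringReader(xml)),
            XPathConstants.NODESET);
        // Step 2: evaluate normalize-space(.) with each node as the context node.
        XPathExpression normalize = xpath.compile("normalize-space(.)");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(normalize.evaluate(nodes.item(i)));
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical stand-in for the XHTML page from the question.
        String sample =
            "<html><body>"
            + "<span slot=\"address\">  144   Hempstead\n Tpke </span>"
            + "<span slot=\"address\">403 West St</span>"
            + "</body></html>";
        System.out.println(addresses(sample));
    }
}
```

The returned List can be turned into an array with toArray(new String[0]) if an array of Strings is really needed.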
Note that it's important to distinguish between the processor you're using to evaluate the expression and the interface you're using to initiate the evaluation.

select using XPath without considering namespace and prefix

How to select an XML using XPath without considering namespace and prefix
tried
/*:OrderPerson/OrderUser/
But returns error
org.jdom.JDOMException: Invalid XPath expression: ....
Unexpected ':'
You can try this expression
/*[local-name()='OrderPerson']/OrderUser
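For what it's worth, here is a small sketch of the local-name() approach using the standard JAXP API instead of JDOM (an assumption; the question uses JDOM). It applies local-name() to both steps so that a prefixed OrderUser is matched too. The class name, the method name, and the sample namespace URI are invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LocalNameSelect {
    // Returns the text of OrderPerson/OrderUser, whatever namespace or prefix the elements carry.
    static String orderUser(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is needed for local-name() to strip the prefix.
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(
            "/*[local-name()='OrderPerson']/*[local-name()='OrderUser']", doc);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document with a made-up namespace URI.
        String sample =
            "<p:OrderPerson xmlns:p=\"urn:example:orders\">"
            + "<p:OrderUser>alice</p:OrderUser>"
            + "</p:OrderPerson>";
        System.out.println(orderUser(sample));
    }
}
```

The same expression also matches a document that uses no namespaces at all, since local-name() then simply equals the element name.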
Your XPath query does not have valid syntax.
Try:
/OrderPerson/OrderUser/*

Find duplicated XML Element Names (xPath with variable)

I'm using XPath 1.0 parsers alongside CLiXML in my Java project, and I'm trying to set up a CLiXML constraint rules file.
I would like to show an error if there are duplicate element names under a specific child.
For example
<parentNode version="1">
<childA version="1">
<ignoredChild/>
</childA>
<childB version="1">
<ignoredChild/>
</childB>
<childC version="4">
<ignoredChild/>
</childC>
<childA version="2">
<ignoredChild/>
</childA>
<childD version="6">
<ignoredChild/>
</childD>
</parentNode>
childA appears more than once, so I would show an error about this.
NOTE: I only want to 'check/count' the Element name, not the attributes inside or the children of the element.
The code inside my .clx rules file that I've tried is:
<forall var="elem1" in=".//parentNode/*">
<equal op1="count(.//parentNode/$elem1)" op2="1"/>
</forall>
But that doesn't work, I get the error:
Caused by: class org.jaxen.saxpath.XPathSyntaxException: count(.//PLC-Mapping/*/$classCount: 23: Expected one of '.', '..', '@', '*', <QName>
I want the rule to take each child's name and run another XPath query with that name; if the count is above 1, it should give an error.
Any ideas?
Get the list of child nodes with an appropriate path expression and check for duplicates in that list:
XPathExpression xPathExpression = xPath.compile("//parentNode/*");
NodeList children = (NodeList) xPathExpression.evaluate(config, XPathConstants.NODESET);
for (int i = 0; i < children.getLength(); i++) {
// maintain a HashSet of element names here and check whether the current name is already in it
}
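One way to fill in that loop is sketched below. It runs outside CLiXML, in plain Java with the standard javax.xml.xpath API; the class name and the method name are made up for the example, and only element names are compared, not attributes or children.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DuplicateChildNames {
    // Returns the element names that occur more than once directly under parentNode.
    static Set<String> duplicates(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList children = (NodeList) xpath.evaluate(
            "//parentNode/*",
            new InputSource(new StringReader(xml)),
            XPathConstants.NODESET);
        Set<String> seen = new HashSet<>();
        Set<String> repeated = new LinkedHashSet<>(); // keeps first-seen order
        for (int i = 0; i < children.getLength(); i++) {
            String name = children.item(i).getNodeName();
            // add() returns false when the name was already present.
            if (!seen.add(name)) {
                repeated.add(name);
            }
        }
        return repeated;
    }

    public static void main(String[] args) throws Exception {
        // Shortened version of the document from the question.
        String sample =
            "<parentNode version=\"1\">"
            + "<childA version=\"1\"/><childB version=\"1\"/><childC version=\"4\"/>"
            + "<childA version=\"2\"/><childD version=\"6\"/>"
            + "</parentNode>";
        System.out.println(duplicates(sample));
    }
}
```

An empty result means every child name is unique; anything else can be reported as an error.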
This cannot be done with a single XPath 1.0 expression (see this similar question I answered today).
Here is a single XPath 2.0 expression (in case you can use XPath 2.0):
/*/*[(for $n in name()
return count(/*/*[name()=$n])
)
>1
]
This selects all elements that are children of the top element of the XML document and that occur more than once.

Iterate and concat using XPath Expression

I have the following xml file:
<author>
<firstname>Akhilesh</firstname>
<lastname>Singh</lastname>
</author>
<author>
<firstname>Prassana</firstname>
<lastname>Nagaraj</lastname>
</author>
And I am using the following JXPath expression,
concat(author/firstName," ",author/lastName)
To get the value Akhilesh Singh ,Prassana Nagaraj but
I am getting only Akhilesh Singh.
My requirement is that I should get the value of both author by executing only one JXPath expression.
XPath 2.0 solution:
/*/author/concat(firstname, ' ', lastname, following-sibling::author/string(', '))
In XPath 1.0, when an argument of a type other than node-set is expected, the first node in the node-set is selected and the type conversion is then applied to it (boolean conversion works somewhat differently).
So, your expression (note: no capital letters):
concat(author/firstname," ",author/lastname)
It's the same as:
concat( string( (author/firstname)[1] ), " ", string( (author/lastname)[1] ) )
Depending on the host language you could use:
author/firstname|author/lastname
This evaluates to a node-set with the firstname and lastname elements in document order, so you could then iterate over the node-set, extracting each string value.
In XPath 2.0 you could use:
string-join(author/concat(firstname,' ', lastname),' ,')
Output:
Akhilesh Singh ,Prassana Nagaraj
Note: Now, with the sequence data type and function calls as steps, XPath resembles the functional language it claims to be. Higher-order functions and partial application must wait for XPath 2.1 ...
Edit: Thanks to Dimitre's comments, I've corrected the string separator.
concat() returns a single string. If you want both results, you need to iterate over the "author" elements and evaluate concat(firstname," ",lastname) for each one.
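The iteration could be sketched as follows. Note the assumptions: this uses the standard javax.xml.xpath API rather than JXPath, wraps the two author elements in a hypothetical <authors> root so the document is well-formed, and uses invented class and method names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class AuthorNames {
    // Joins "firstname lastname" of every author element with " ," as the separator.
    static String fullNames(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList authors = (NodeList) xpath.evaluate(
            "//author", new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        // Evaluated once per author, with that author as the context node.
        XPathExpression fullName = xpath.compile("concat(firstname, ' ', lastname)");
        List<String> names = new ArrayList<>();
        for (int i = 0; i < authors.getLength(); i++) {
            names.add(fullName.evaluate(authors.item(i)));
        }
        return String.join(" ,", names);
    }

    public static void main(String[] args) throws Exception {
        String sample =
            "<authors>"
            + "<author><firstname>Akhilesh</firstname><lastname>Singh</lastname></author>"
            + "<author><firstname>Prassana</firstname><lastname>Nagaraj</lastname></author>"
            + "</authors>";
        System.out.println(fullNames(sample)); // Akhilesh Singh ,Prassana Nagaraj
    }
}
```

This takes two XPath evaluations per document rather than one expression, which is the trade-off for staying within XPath 1.0.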
