How to match string that ends with a number using XPath

I want to construct an XPath expression that selects nodes whose attribute XXX has a value like TP*, where the star stands for a number.
Suppose I have this XML file:
<tagA attA="VAL1">text</tagA>
<tagB attB="VAL333">text</tagB>
<tagA attA="VAL2">text</tagA>
<tagA attA="V2">text</tagA>
So the XPath expression should return every tagA whose attA attribute has a value matching the pattern VAL*.
This is not working:
//tagA[@attA[matches('VAL\d')]]

If you need an XPath 1.0 solution, try the following:
//tagA[boolean(number(substring-after(@attA, "VAL"))) or number(substring-after(@attA, "VAL")) = 0]
If @attA cannot be "VAL0", then just
//tagA[boolean(number(substring-after(@attA, "VAL")))]
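For readers on plain Java, here is a minimal sketch of running the first expression above with the JDK's built-in javax.xml.xpath (XPath 1.0). The class name, the method name and the <root> wrapper element are assumptions made for illustration; the wrapper is only there to make the question's sample well-formed.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPath10NumberSuffix {

    // The question's sample elements, wrapped in a hypothetical <root> element
    // so that the input is a well-formed document.
    static final String XML =
            "<root>"
            + "<tagA attA=\"VAL1\">text</tagA>"
            + "<tagB attB=\"VAL333\">text</tagB>"
            + "<tagA attA=\"VAL2\">text</tagA>"
            + "<tagA attA=\"V2\">text</tagA>"
            + "</root>";

    // XPath 1.0 only: number() yields NaN for non-numeric text and boolean(NaN) is false;
    // the second clause keeps a suffix of 0, because boolean(0) is false as well.
    static final String EXPR =
            "//tagA[boolean(number(substring-after(@attA, 'VAL')))"
            + " or number(substring-after(@attA, 'VAL')) = 0]";

    // Evaluates the expression against the XML text and returns the selected nodes.
    static NodeList select(String xml, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            InputSource source = new InputSource(new StringReader(xml));
            return (NodeList) xpath.evaluate(expr, source, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        NodeList nodes = select(XML, EXPR);
        for (int i = 0; i < nodes.getLength(); i++) {
            // Print the attA value of each selected tagA element.
            System.out.println(nodes.item(i).getAttributes().getNamedItem("attA").getNodeValue());
        }
    }
}
```

Note that substring-after does not require the value to start with "VAL", so a value such as "xVAL1" would be selected too; adding starts-with(@attA, "VAL") to the predicate would tighten that if needed.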

matches() requires XPath 2.0, but javax.xml.xpath in Java 8 supports only XPath 1.0.
Furthermore, the first argument of matches() is the string to match. So, you'd want:
//tagA[@attA[matches(., 'VAL\d')]]
This looks for "VAL" plus a single digit anywhere in the value of @attA. See the regex in @jschnasse's answer if you wish to match the entire string with multiple/optional digit suffixes (XPath 2.0), or Andersson's answer for an XPath 1.0 solution.
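The difference between matching anywhere in the value and matching the entire value can be tried out with java.util.regex, whose Matcher.find() also succeeds on a match anywhere in the input. This is only a sketch: the class and method names are made up, and the anchored pattern uses \d+ (at least one digit), which is an assumption about what "ends with a number" should mean.

```java
import java.util.regex.Pattern;

class MatchAnywhereVsWhole {

    // Unanchored, like 'VAL\d' above: succeeds if "VAL" plus one digit occurs anywhere.
    static final Pattern ANYWHERE = Pattern.compile("VAL\\d");

    // Anchored, with a quantifier: the whole value must be "VAL" followed by one or more digits.
    static final Pattern WHOLE = Pattern.compile("^VAL\\d+$");

    static boolean containsMatch(String value) {
        return ANYWHERE.matcher(value).find();
    }

    static boolean wholeMatch(String value) {
        return WHOLE.matcher(value).find();
    }

    public static void main(String[] args) {
        for (String value : new String[] {"VAL1", "VAL333", "xVAL1y", "VAL", "V2"}) {
            System.out.println(value + " anywhere=" + containsMatch(value) + " whole=" + wholeMatch(value));
        }
    }
}
```

Only the unanchored pattern accepts "xVAL1y"; both reject "VAL" and "V2".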

Add a quantifier (*, +, ...) to your \d. Try
'^VAL\d*$'
As @kjhughes has pointed out, this will not work with standard Java, because even Java 11 does not support XPath 2.0.
You can, however, use Saxon if you need XPath 2.0 support.
Saxon example (a variant of this answer using javax.xml):
Processor processor = new Processor(false);
@Test
public void xpathWithSaxon() {
    String xml = "<root><tagA attA=\"VAL1\">text</tagA>\n" + "<tagB attB=\"VAL333\">text</tagB>\n"
            + "<tagA attA=\"VAL2\">text</tagA>\n" + "<tagA attA=\"V2\">text</tagA>\n" + "</root>";
    try (InputStream in = new ByteArrayInputStream(xml.getBytes("utf-8"))) {
        processFilteredXmlWith(in, "//root/tagA[matches(@attA,'^VAL\\d*$')]", (node) -> {
            printItem(node, System.out);
        });
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
private void printItem(XdmItem node, PrintStream out) {
    out.println(node);
}
public void processFilteredXmlWith(InputStream in, String xpath, Consumer<XdmItem> process) {
    XdmNode doc = readXmlWith(in);
    XdmValue list = filterNodesByXPathWith(doc, xpath);
    list.forEach((node) -> {
        process.accept(node);
    });
}
private XdmNode readXmlWith(InputStream xmlin) {
    try {
        return processor.newDocumentBuilder().build(new StreamSource(xmlin));
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
private XdmValue filterNodesByXPathWith(XdmNode doc, String xpathExpr) {
    try {
        return processor.newXPathCompiler().evaluate(xpathExpr, doc);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}
Prints
<tagA attA="VAL1">text</tagA>
<tagA attA="VAL2">text</tagA>

Related

How can I detect if a user enters a string which does not follow my ANTLR grammar rules?

I am making a Computer Algebra System which will take an algebraic expression and simplify or differentiate it.
As you can see in the following code, the user input is taken, but if it is a string that does not conform to my grammar rules, the error
line 1:6 mismatched input '<EOF>' expecting {'(', INT, VAR}
occurs and the program continues running.
How would I catch the error and stop the program from running? Thank you in advance for any help.
Controller class:
    public static void main(String[] args) throws IOException {
        String userInput = "x*x*x+";
        getAST(userInput);
    }
    public static AST getAST(String userInput) {
        ParseTree tree = null;
        ExpressionLexer lexer = null;
        ANTLRInputStream input = new ANTLRInputStream(userInput);
        try {
            lexer = new ExpressionLexer(input);
        } catch (Exception e) {
            System.out.println("Incorrect grammar");
        }
        System.out.println("Lexer created");
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        System.out.println("Tokens created");
        ExpressionParser parser = new ExpressionParser(tokens);
        System.out.println("Tokens parsed");
        tree = parser.expr();
        System.out.println("Tree created");
        System.out.println(tree.toStringTree(parser)); // print LISP-style tree
        Trees.inspect(tree, parser);
        ParseTreeWalker walker = new ParseTreeWalker();
        ExpressionListener listener = new buildAST();
        walker.walk(listener, tree);
        listener.printAST();
        listener.extractExpression();
        return new AST();
    }
}
My Grammar:
grammar Expression;
@header {
    package exprs;
}
@members {
    // This method makes the parser stop running if it encounters
    // invalid input and throw a RuntimeException.
    public void reportErrorsAsExceptions() {
        //removeErrorListeners();
        addErrorListener(new ExceptionThrowingErrorListener());
    }
    private static class ExceptionThrowingErrorListener extends BaseErrorListener {
        @Override
        public void syntaxError(Recognizer<?, ?> recognizer,
                Object offendingSymbol, int line, int charPositionInLine,
                String msg, RecognitionException e) {
            throw new RuntimeException(msg);
        }
    }
}
@rulecatch {
    // ANTLR does not generate its normal rule try/catch
    catch(RecognitionException e) {
        throw e;
    }
}
expr : left=expr op=('*'|'/'|'^') right=expr
     | left=expr op=('+'|'-') right=expr
     | '(' expr ')'
     | atom
     ;
atom : INT|VAR;
INT : ('0'..'9')+ ;
VAR : ('a' .. 'z') | ('A' .. 'Z') | '_';
WS : [ \t\r\n]+ -> skip ;

A typical parse run with ANTLR4 consists of two stages:
1. A quick-and-dirty run with SLL prediction mode that bails out on the first syntax error found.
2. A normal run using LL prediction mode, which tries to recover from parser errors. This second step only needs to be executed if there was an error in the first step.
The first step is a looser parse run that doesn't resolve certain ambiguities and hence can report an error which doesn't really exist (once resolved in LL mode). But the first step is faster and so delivers a quicker result for syntactically correct input. This (JS) code shows the setup:
this.parser.removeErrorListeners();
this.parser.addErrorListener(this.errorListener);
this.parser.errorHandler = new BailErrorStrategy();
this.parser.interpreter.setPredictionMode(PredictionMode.SLL);
try {
    this.tree = this.parser.grammarSpec();
} catch (e) {
    if (e instanceof ParseCancellationException) {
        this.tokenStream.seek(0);
        this.parser.reset();
        this.parser.errorHandler = new DefaultErrorStrategy();
        this.parser.interpreter.setPredictionMode(PredictionMode.LL);
        this.tree = this.parser.grammarSpec();
    } else {
        throw e;
    }
}
To avoid any recovery attempt for syntax errors in the first step, you also have to set the BailErrorStrategy. This strategy simply throws a ParseCancellationException on a syntax error (similar to what you do in your code). You could add your own handling in the catch clause to ask the user for correct input and rerun the parse step.

Java + MongoDB: how get a nested field value using complete path?

I have this path for a MongoDB field: main.inner.leaf. Any of the fields along the path might be absent.
In Java, to avoid a NullPointerException, I have to write:
String leaf = "";
if (document.get("main") != null &&
        document.get("main", Document.class).get("inner") != null) {
    leaf = document.get("main", Document.class)
            .get("inner", Document.class).getString("leaf");
}
In this simple example I use only 3 levels (main, inner and leaf), but my documents are deeper.
So is there a way to avoid writing all these null checks?
Like this:
String leaf = document.getString("main.inner.leaf", "");
// "" is the default value if one of the levels doesn't exist
Or using a third-party library:
String leaf = DocumentUtils.getNullCheck("main.inner.leaf", "", document);
Many thanks.

Since the intermediate attributes are optional, you really have to access the leaf value in a null-safe manner.
You could do this yourself using an approach like:
if (document.containsKey("main")) {
    Document _main = document.get("main", Document.class);
    if (_main.containsKey("inner")) {
        Document _inner = _main.get("inner", Document.class);
        if (_inner.containsKey("leaf")) {
            leafValue = _inner.getString("leaf");
        }
    }
}
Note: this could be wrapped up in a utility to make it more user-friendly.
Or use a third-party library such as Commons BeanUtils.
But you cannot avoid null-safe checks, since the document structure is such that the intermediate levels might be null. All you can do is ease the burden of handling the null safety.
Here's an example test case showing both approaches:
@Test
public void readNestedDocumentsWithNullSafety() throws IllegalAccessException, NoSuchMethodException, InvocationTargetException {
    Document inner = new Document("leaf", "leafValue");
    Document main = new Document("inner", inner);
    Document fullyPopulatedDoc = new Document("main", main);
    assertThat(extractLeafValueManually(fullyPopulatedDoc), is("leafValue"));
    assertThat(extractLeafValueUsingThirdPartyLibrary(fullyPopulatedDoc, "main.inner.leaf", ""), is("leafValue"));
    Document emptyPopulatedDoc = new Document();
    assertThat(extractLeafValueManually(emptyPopulatedDoc), is(""));
    assertThat(extractLeafValueUsingThirdPartyLibrary(emptyPopulatedDoc, "main.inner.leaf", ""), is(""));
    Document emptyInner = new Document();
    Document partiallyPopulatedMain = new Document("inner", emptyInner);
    Document partiallyPopulatedDoc = new Document("main", partiallyPopulatedMain);
    assertThat(extractLeafValueManually(partiallyPopulatedDoc), is(""));
    assertThat(extractLeafValueUsingThirdPartyLibrary(partiallyPopulatedDoc, "main.inner.leaf", ""), is(""));
}
private String extractLeafValueUsingThirdPartyLibrary(Document document, String path, String defaultValue) {
    try {
        Object value = PropertyUtils.getNestedProperty(document, path);
        return value == null ? defaultValue : value.toString();
    } catch (Exception ex) {
        return defaultValue;
    }
}
private String extractLeafValueManually(Document document) {
    Document inner = getOrDefault(getOrDefault(document, "main"), "inner");
    return inner.get("leaf", "");
}
private Document getOrDefault(Document document, String key) {
    if (document.containsKey(key)) {
        return document.get(key, Document.class);
    } else {
        return new Document();
    }
}
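The utility mentioned in the note above could be sketched roughly as follows, for an arbitrary dotted path. This is not part of the MongoDB driver: the class NestedPath and the method getByPath are hypothetical names, and the sketch works on plain java.util.Map so that it runs without the driver on the classpath. Since org.bson.Document implements Map<String, Object>, a Document could be passed in directly.

```java
import java.util.HashMap;
import java.util.Map;

class NestedPath {

    // Walks a dotted path such as "main.inner.leaf" through nested maps.
    // Returns defaultValue when a level is missing, an intermediate level is not a map,
    // or the leaf is null or not an instance of the requested type.
    static <T> T getByPath(Map<String, Object> root, String path, Class<T> type, T defaultValue) {
        Object current = root;
        for (String key : path.split("\\.")) {
            if (!(current instanceof Map)) {
                return defaultValue;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return type.isInstance(current) ? type.cast(current) : defaultValue;
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<>();
        inner.put("leaf", "leafValue");
        Map<String, Object> main = new HashMap<>();
        main.put("inner", inner);
        Map<String, Object> document = new HashMap<>();
        document.put("main", main);

        System.out.println(getByPath(document, "main.inner.leaf", String.class, ""));
        System.out.println(getByPath(document, "main.missing.leaf", String.class, "(default)"));
    }
}
```

One limitation of this design: a key that itself contains a dot cannot be addressed, because the path is split on every dot.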

XPath evaluation in for loop always returns value for node from first Iteration

I am trying to extract values from a node using a method which is invoked within a for loop.
Every time the method is invoked, xpath.evaluate only evaluates the original node that was first passed into the method.
Sample node:
<doc>
<str name="subcategory">xyzabc</str>
<str name="type">QAs</str>
<str name="id">1234</str>
</doc>
The for loop where the method is invoked:
ArrayList<Node> responseNodeList = parseResponse(*document as string*);
for (int k = 0; k < responseNodeList.size(); k++) {
    Result resultNode = new Result();
    resultNode.setUrl(new URL(getAttributeValue(responseNodeList.get(k), resultUrl)));
    resultNode.setId(new String(getAttributeValue(responseNodeList.get(k), resultId)));
}
The method where I extract the value:
private String getAttributeValue(final Node responseNodeUrl, String attribute) throws SearchException {
    try {
        Node evalNode = null;
        evalNode = responseNodeUrl;
        try {
            nodelisttostr(responseNodeUrl); // This is where I print out the node; no problem here
        } catch (TransformerFactoryConfigurationError e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (TransformerException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        XPath xPath = XPathFactory.newInstance().newXPath();
        XPathExpression expr = null;
        expr = xPath.compile("/doc/str[@name='" + attribute + "']");
        String result;
        result = (String) expr.evaluate(evalNode, XPathConstants.STRING);
        xPath.reset();
        System.out.println("++++++++" + result); // But when I print this out, it always prints the same value
        return result;
    } catch (XPathExpressionException e) {
        log.error("Error in SearchEngine:getAttributeValue:: " + e.getMessage());
        throw new SearchException(e);
    }
}

Solved.
The XPathExpression was always picking up the first node passed to the method, and for some reason it seemed to be cached. I changed my evaluation string to expr = xPath.compile(".//str[@name='"+attribute+"']"); and the evaluate method now picks up the current node.
Thank you all.
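A likely explanation for the behaviour described above: in XPath, a path that starts with / or // is absolute, so it is evaluated from the root of the document that contains the context node, whichever node is passed to evaluate. A path starting with . is relative to the context node. The sketch below shows the difference with the JDK's javax.xml.xpath; the <response> wrapper, the class name and the method name are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ContextNodeXPath {

    // Two <doc> elements inside a hypothetical <response> wrapper.
    static final String XML =
            "<response>"
            + "<doc><str name=\"id\">1234</str></doc>"
            + "<doc><str name=\"id\">5678</str></doc>"
            + "</response>";

    // Evaluates strPath once per <doc> element, using that element as the context node,
    // and returns the string result of each evaluation.
    static String[] idsPerDoc(String strPath) {
        try {
            Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList docs = (NodeList) xpath.evaluate("/response/doc", document, XPathConstants.NODESET);
            String[] ids = new String[docs.getLength()];
            for (int i = 0; i < docs.getLength(); i++) {
                ids[i] = xpath.evaluate(strPath, docs.item(i));
            }
            return ids;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Absolute path: starts at the document root, ignoring the context node.
        System.out.println(String.join(",", idsPerDoc("//str[@name='id']")));
        // Relative path: starts at the <doc> element passed in.
        System.out.println(String.join(",", idsPerDoc(".//str[@name='id']")));
    }
}
```

The absolute form yields the first id for every doc element, while the relative form yields each element's own id.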

If, on your XML above, you use:
//doc/str/@name
you will get:
Attribute='name="subcategory"'
Attribute='name="type"'
Attribute='name="id"'
If you use contains in the XPath expression on your XML above, i.e.
//doc/str/@name[contains(.,'e')]
then you will get:
Attribute='name="subcategory"'
Attribute='name="type"'
But I think you just want to do the following with your code.
Change this:
expr = xPath.compile("/doc/str[@name='"+attribute+"']");
to this, i.e. add an extra /:
expr = xPath.compile("//doc/str[@name='"+attribute+"']");
to evaluate all nodes instead of just the first.
I hope this helps!

Java/Android regex test if in a string is a link

Pattern.compile("((http\\://|https\\://|ftp\\://|sftp\\://)|(www.))+((\\S+):(\\S+)@)?+(([a-zA-Z0-9\\.-]+\\.[a-zA-Z]{2,4})|([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}))(/[a-zA-Z0-9%:/-_\\?\\.'~]*)?");
I have this pattern, and I'd like to test whether there is a link in my string.
I'd like to linkify such text in a TextView.
The code does not work when the link contains a & character.
Full code:
Pattern httpMatcher = Pattern.compile("((http\\://|https\\://|ftp\\://|sftp\\://)|(www.))+((\\S+):(\\S+)@)?+(([a-zA-Z0-9\\.-]+\\.[a-zA-Z]{2,4})|([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}))(/[a-zA-Z0-9%:/-_\\?\\.'~]*)?");
String httpViewURL = "myhttp://";
Linkify.addLinks(label, httpMatcher, httpViewURL);

I think this is cleaner than using a regex:
boolean isLink(String s) {
    try {
        new URL(s);
        return true;
    } catch (MalformedURLException e) {
        return false;
    }
}

You can use Patterns.WEB_URL:
public boolean isLink(String string) {
    return Patterns.WEB_URL.matcher(string).matches();
}
Note that Patterns class is available only since API level 8, but you can get its source code here https://github.com/android/platform_frameworks_base/blob/master/core/java/android/util/Patterns.java

Pattern httpMatcher = Pattern.compile("((http\\://|https\\://)|(www.))+((\\S+):(\\S+)@)?+(([a-zA-Z0-9\\.-]+\\.[a-zA-Z]{2,4})|([0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}))(/[a-zA-Z0-9%&#-:/-_\\?\\.'~]*)?");
This is working now, thanks.
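Why the updated pattern helps can be checked in isolation. In the original path part, the character class has no & in it: the range /-_ runs from '/' to '_' and does not reach '&'. The sketch below compares just the two path fragments of the patterns above; the class name, the method name and the sample path are made up for illustration.

```java
import java.util.regex.Pattern;

class PathCharClass {

    // Path part of the original pattern: '&' is not in the character class.
    static final Pattern WITHOUT_AMP = Pattern.compile("(/[a-zA-Z0-9%:/-_\\?\\.'~]*)?");

    // Path part of the updated pattern: '&' is listed explicitly
    // (and is also covered by the range from '#' to ':').
    static final Pattern WITH_AMP = Pattern.compile("(/[a-zA-Z0-9%&#-:/-_\\?\\.'~]*)?");

    // True only if the pattern consumes the entire path.
    static boolean matchesWholePath(Pattern pattern, String path) {
        return pattern.matcher(path).matches();
    }

    public static void main(String[] args) {
        String path = "/search?q=a&page=2";
        System.out.println(matchesWholePath(WITHOUT_AMP, path));
        System.out.println(matchesWholePath(WITH_AMP, path));
    }
}
```

With the original class, the match stops right before the &, so only part of the URL would be linkified.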

MongoDB regex, I get a different answer from the Java API compared with the console

I must be doing my regex wrong.
In the console I do
db.triples.find({sub_uri: /.*pdf.*/ }); and get the desired result.
My Java class looks like this, (I have set input="pdf"):
public static List<Triple> search(String input){
    DB db=null;
    try {
        db = Dao.getDB();
    }
    catch (UnknownHostException e1) { e1.printStackTrace(); }
    catch (MongoException e1) { e1.printStackTrace(); }
    String pattern = "/.*"+input+".*/";
    System.out.println(input);
    List<Triple> triples = new ArrayList<Triple>();
    DBCollection triplesColl = null;
    try {
        triplesColl = db.getCollection("triples");
    } catch (MongoException e) { e.printStackTrace(); }
    {
        Pattern match = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
        BasicDBObject query = new BasicDBObject("sub_uri", match);
        // finds all people with "name" matching /joh?n/i
        DBCursor cursor = triplesColl.find(query);
        if(cursor.hasNext()){
            DBObject tripleAsBSON = cursor.next();
            Triple t = new Triple();
            t.setSubject(new Resource((String)tripleAsBSON.get("sub_uri")));
            System.out.println(t.getSubject().getUri());
            triples.add(t);
        }
    }
    return triples;
}
From the console I get 12 results as I should, from the Java code I get no results.

Java doesn't need/understand regex delimiters (/ around the regex). You need to remove them:
String pattern = ".*"+input+".*";
I'm also not sure if that regex is really what you want. At least you should anchor it:
String pattern = "^.*"+input+".*$";
and compile it using the Pattern.MULTILINE option. This avoids a severe performance penalty if a line doesn't contain your sub-regex input. You are aware that input is a regex, not a verbatim string, right?
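Both points above (the slashes are taken literally, and input is interpreted as a regex) can be checked with java.util.regex alone, without a database. This is only a sketch: the class and method names and the sample strings are invented, and Pattern.quote is shown as one way to treat the input as a verbatim string if that is what is intended.

```java
import java.util.regex.Pattern;

class MongoRegexDelimiters {

    // Mirrors the question's code: the slashes become literal characters of the pattern.
    static Pattern withDelimiters(String input) {
        return Pattern.compile("/.*" + input + ".*/", Pattern.CASE_INSENSITIVE);
    }

    // Treats the input as a verbatim string; find() already matches anywhere,
    // so no leading or trailing .* is needed here.
    static Pattern literalContains(String input) {
        return Pattern.compile(Pattern.quote(input), Pattern.CASE_INSENSITIVE);
    }

    public static void main(String[] args) {
        String uri = "http://example.org/paper.PDF";
        System.out.println(withDelimiters("pdf").matcher(uri).find());
        System.out.println(literalContains("pdf").matcher(uri).find());
        // Without quoting, the dot in "a.pdf" would match any character.
        System.out.println(literalContains("a.pdf").matcher("axpdf").find());
    }
}
```

With the delimiters in place, the pattern needs a literal / after "pdf", which the sample URI does not have, so nothing is found.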
