Catch-all second alternative for my start rule - java

I'm trying to write an ANTLR grammar for a little query language. Queries are a list of search terms restricted to specific fields:
field1:a field2:b field3:c
That's supposed to return a list of entities where field1 matches a, field2 matches b, and so on. Queries can also be completely unrestricted:
abc
That's supposed to return entities with any field that matches abc. Here's the ANTLR grammar:
@members {
String unrestrictedQuery;
}
FIELD1_OPERATOR: 'field1:';
FIELD2_OPERATOR: 'field2:';
FIELD3_OPERATOR: 'field3:';
DIGIT: '0'..'9';
LETTER: 'A'..'Z' | 'a'..'z';
query: subquery (' ' subquery)*
| UNRESTRICTED_QUERY=.* {unrestrictedQuery = $UNRESTRICTED_QUERY.text;}
;
I want unrestricted queries to be any text that doesn't match the query rule's first alternative.
1) Is there a better way to grab the text that the second alternative matched?
2) When I plug this into my web server, the unrestrictedQuery parser field resolves to the last character of the query. It seems like the action gets called for every character of the query when I really want the whole string.
Thanks for reading!

"I want unrestricted queries to be any text that doesn't match the query rule's first alternative".
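This is a bad design decision. What if, in the future, you want to add field4? Then you get an incompatibility: a query such as field4:x, which used to be treated as unrestricted, would suddenly be parsed as a restricted one. It is better to change the grammar so that unrestricted queries are easy to recognize: surround field values (a, b, c) with quotes, or start an unrestricted query with a colon: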
field1:a :abc field2:b
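To make the colon-prefix suggestion concrete, here is a minimal plain-Java sketch (no ANTLR) of how such a query could be split: a term starting with ':' is unrestricted, and anything else must be field:value. The class, record and method names are hypothetical, and splitting on whitespace assumes values never contain spaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QuerySketch {
    // Restricted terms keyed by field name, plus the free-text (unrestricted) terms.
    record ParsedQuery(Map<String, String> restricted, List<String> unrestricted) {}

    static ParsedQuery parse(String query) {
        Map<String, String> restricted = new LinkedHashMap<>();
        List<String> unrestricted = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // an empty query splits into a single empty string
            }
            if (term.startsWith(":")) {
                // A leading colon marks an unrestricted term, e.g. ":abc".
                unrestricted.add(term.substring(1));
            } else {
                int colon = term.indexOf(':');
                if (colon <= 0) {
                    throw new IllegalArgumentException("expected field:value or :value, got " + term);
                }
                restricted.put(term.substring(0, colon), term.substring(colon + 1));
            }
        }
        return new ParsedQuery(restricted, unrestricted);
    }

    public static void main(String[] args) {
        ParsedQuery q = parse("field1:a :abc field2:b");
        System.out.println(q.restricted());   // {field1=a, field2=b}
        System.out.println(q.unrestricted()); // [abc]
    }
}
```

Because unrestricted terms are marked explicitly, adding a field4 later does not change how any existing query is read.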

Related

Accent insensitive in CriteriaBuilder query

I'm using CriteriaBuilder and a predicate list with JPA to extract information from the database. The data contains accented characters, and I need searches typed without accents to find the accented words too.
For example, the database contains the following data:
'técnico a' and
'tecnico b'.
When I enter 'tec' in the where clause, I need both rows to appear. How can I make the query ignore the accents? I need the result to be:
tecnico a,
tecnico b
I have this code:
predicatesList.add(builder.like(builder.lower(root.<String>get("descripcion")),
        "%" + descripcion.toLowerCase().trim() + "%"));
Thank you in advance.
The point is that the characters 'é' and 'e' are different, so you cannot find both by searching for one of them. Here is an idea: give your table two description columns. Call one "description" and the other "normalized_description". In "description" store the original value; in "normalized_description" store the same value with every accented character replaced by its non-accented equivalent. In your case the records would look like this:
Record a:
description:'técnico a'
normalized_description: 'tecnico a'
Record b:
description:'tecnico b'
normalized_description: 'tecnico b'
Then, if you need to match both spellings, search on "normalized_description"; if you need a specific value, search on "description".
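One way to fill normalized_description (and to normalize the search term the same way) is java.text.Normalizer: decomposing to NFD splits 'é' into 'e' plus a combining accent, and the combining marks can then be deleted. This is a minimal sketch; the class and method names are hypothetical, and folding to lower case is an extra choice so the result can be compared with the lower-cased LIKE pattern from the question.

```java
import java.text.Normalizer;
import java.util.Locale;

public class AccentNormalizer {
    // NFD turns "é" into "e" followed by a combining acute accent;
    // removing all combining marks (\p{M}) then leaves the base letters.
    static String normalize(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(normalize("T\u00e9cnico a")); // tecnico a
        System.out.println(normalize("tecnico b"));      // tecnico b
    }
}
```

On insert or update you would store normalize(description) in normalized_description, and when searching you would pass descripcion through normalize() before wrapping it in '%...%', so both sides are compared in the same accent-free form.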

StringUtil indexOf() equivalent postgreSQL query

I need to implement the equivalent of the StringUtils indexOf() method in PostgreSQL.
Let's say I have a table in which url is one of the columns.
url : "http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit"
My requirement is to find the index of the 3rd occurrence of '/' in the URL above, take the substring, and extract only the host name paypal-info.com in a PostgreSQL query.
Any ideas on how to implement this would be appreciated.
Thanks
Have you tried the split_part function?
SELECT split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 3)
Result:
split_part
paypal-info.com
For other string functions try this doc:
http://www.postgresql.org/docs/9.1/static/functions-string.html
Edit: as for indexOf itself, I don't know of any built-in Postgres solution. But by combining two string functions you can get the 1-based position of the 3rd '/' (the start of the 4th '/'-separated part, minus one) like this:
SELECT strpos('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 4)) - 1 as index_of;
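The position function, documented in the string functions and operators section of the manual, is the equivalent of String.indexOf (except that it returns a 1-based position, or 0 when nothing is found), e.g.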
select position('/' in 'http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');
However, it doesn't offer an option to get the nth occurrence.
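For comparison, the Java behaviour the question wants to reproduce is "0-based index of the nth occurrence". Below is a plain-JDK sketch of it (Apache Commons Lang's StringUtils.ordinalIndexOf serves the same purpose); nthIndexOf is a hypothetical helper name. Keep in mind that position and strpos in PostgreSQL are 1-based, so for the same character they return one more than this method.

```java
public class NthIndexOf {
    // Returns the 0-based index of the nth occurrence (n >= 1) of target in s, or -1 if there is none.
    static int nthIndexOf(String s, String target, int n) {
        if (n < 1 || target.isEmpty()) {
            throw new IllegalArgumentException("n must be >= 1 and target must be non-empty");
        }
        int from = 0;
        int idx = -1;
        for (int i = 0; i < n; i++) {
            idx = s.indexOf(target, from);
            if (idx < 0) {
                return -1; // fewer than n occurrences
            }
            from = idx + target.length(); // continue searching after this occurrence
        }
        return idx;
    }

    public static void main(String[] args) {
        String url = "http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit";
        int third = nthIndexOf(url, "/", 3);
        System.out.println(third);                                              // 22
        System.out.println(url.substring(nthIndexOf(url, "/", 2) + 1, third)); // paypal-info.com
    }
}
```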
You're really approaching this all wrong. You should use proper URL parsing code to extract the host portion, not attempt to roll your own or use regex / splitting / string mangling.
PostgreSQL doesn't have a native URL/URI type, but its procedural languages do and it's trivial to wrap suitable functions. e.g. with PL/Python:
-- Python 3 variant; PostgreSQL 15 and later no longer support Python 2 (plpythonu)
create extension plpython3u;
create or replace function urlhost(url text) returns text
language plpython3u
immutable strict
as $$
from urllib.parse import urlparse
return urlparse(url).netloc
$$;
then:
regress=# select urlhost('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');
urlhost
-----------------
paypal-info.com
(1 row)
If you'd prefer to use PL/Perl, PL/V8, or whatever, that's fine.
For best performance, you could write a simple C function and expose that as an extension.
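The same "use a real URL parser" advice applies if the host can be extracted in the Java application instead of in SQL. A minimal sketch using java.net.URI; the class and method names are hypothetical. Note that getHost() leaves out any port, whereas netloc in the PL/Python function keeps it.

```java
import java.net.URI;

public class UrlHost {
    // URI.create throws IllegalArgumentException (unchecked) for malformed input.
    static String host(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        System.out.println(host("http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit")); // paypal-info.com
        System.out.println(host("http://asd.com:234/qwe")); // asd.com (port dropped)
    }
}
```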
Just replace 3 with N to get the index of the Nth '/' in a given string (0-based, like Java's indexOf):
SELECT length(substring('http://asd/asd', '(([^/]*/){3})')) - 1
To extract the host name from url you can use
SELECT substring('http://asd.com:234/qwe', 'http://([^:]+).*/')
Tested here: SQLFiddle

How to nest a query with Neo4j?

I am trying to do a sort of nested Neo4j query in Java, which first labels a subset of nodes and then tries to match certain patterns among them. More specifically it is like combining 2 queries of this type:
1 - MATCH (n)-[r:RELATIONSHIP*1..3]->(m) set m:LABEL
2 - MATCH (p:LABEL)-[r2:RELATIONSHIP]->(q:OTHERLABEL) where r2.time<100 return p,r2,q
Is there a way I can merge these two queries into one using the Java method engine.execute()?
'p' in query #2 will, in general, correspond to a superset of 'm' in query #1. If that is your intention, then the following should work. Notice that the 2 MATCH statements have no common variables, but a WITH is required by the Cypher syntax. Passing 'm' itself through the WITH would run the second MATCH once per labeled node and repeat every result row, so the WITH below aggregates everything into a single row with count(); the count itself is ignored.
MATCH (n)-[r:RELATIONSHIP*1..3]->(m)
SET m:LABEL
WITH count(m) AS labeled
MATCH (p:LABEL)-[r2:RELATIONSHIP]->(q:OTHERLABEL)
WHERE r2.time<100
RETURN p,r2,q;
If you intend 'm' and 'p' to be exactly the same, then just replace '(p:LABEL)' with '(m)' (DISTINCT keeps a node reached by several paths from being matched more than once):
MATCH (n)-[r:RELATIONSHIP*1..3]->(m)
SET m:LABEL
WITH DISTINCT m
MATCH (m)-[r2:RELATIONSHIP]->(q:OTHERLABEL)
WHERE r2.time<100
RETURN m,r2,q;

How to build a regular expression to get the value between two single quotes and, if there are no single quotes, extract it between commas

Problem that I face:
-I have an input string, a SQL statement that I need to parse
-I need to extract the value to be inserted, based on the specified column name
-I can extract a value that is wrapped in two single quotes, but:
--what about a value that has no single quotes around it? (like: integer or double)
--what if the value inside already contains single quotes? (like: 'James''s dictionary')
Below is the sample input string:
INSERT INTO LJS1_DX (base, doc, key1, key2, no, sq, eq, ln, en, date, line)
VALUES ('GET','','#000210','',' 0',' 1','5',1,0,'20100706','Street''James''s dictionary')
The Java code I have below only matches values between two single quotes:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern p = Pattern.compile("'.*?'");
String columnValues = "'GET0','','#000210','',' 0',' 1','5',1,0,'20100706','Street''James''s dictionary'";
Matcher m = p.matcher(columnValues); // get a matcher object
StringBuffer output = new StringBuffer();
while (m.find()) {
    logger.trace(m.group()); // log each quoted value found
}
I'd appreciate it if anyone could provide some guidance or a sample for this question.
Thank you!!
I agree with gnibbler that this is a job for a CSV parser.
A regex that works on your example would be
'(?:''|[^'])*'|[^',\s]+
which looks challenging to debug and maintain, doesn't it?
Explanation:
' # First alternative: match an "opening" '
(?: # followed by either...
'' # two ' in a row (escaped ')
| # or...
[^'] # any character that is not a '
)* # zero or more times,
' # then match a "closing" '
| # or (second alternative):
[^',\s]+ # match any run of characters except ', comma or whitespace
It also works if there is whitespace around the values/commas (and will leave that out of the match).
Regexes are not really suitable for this. You will always find cases where they fail.
A CSV parser such as opencsv is probably a better option.
In general, when you need to parse complex languages, regexps are not the best tool: there's too much context to make sense of. So, if reading XML, use an XML parser; if reading C code, use a C language parser; and if reading SQL ...
There's a Java SQL parser here; I would use something like this.
For other languages it may be best to use a "YACC"-like parser, for example JACK.
Instead, you can get all the values by taking the substring after the VALUES keyword, and get the column names the same way. Then you will have two comma-separated strings, which can be converted into an array of names and an array of values. You can then check which column has which value.
Hope this helps.
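A minimal sketch of this substring-and-split approach, assuming the statement has exactly one column list and one VALUES list, and that no quoted value contains a comma, a parenthesis or the word VALUES (a plain comma split breaks on such values). The class and method names are hypothetical; the sample statement is the one from the question.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InsertSplitter {
    private static final Pattern VALUES_KEYWORD =
            Pattern.compile("\\bVALUES\\b", Pattern.CASE_INSENSITIVE);

    // Naive: pairs the i-th column name with the i-th comma-separated value.
    static Map<String, String> columnsToValues(String sql) {
        Matcher m = VALUES_KEYWORD.matcher(sql);
        if (!m.find()) {
            throw new IllegalArgumentException("no VALUES keyword");
        }
        String head = sql.substring(0, m.start()); // INSERT INTO ... (names)
        String tail = sql.substring(m.end());      // (values)
        String names = head.substring(head.indexOf('(') + 1, head.lastIndexOf(')'));
        String values = tail.substring(tail.indexOf('(') + 1, tail.lastIndexOf(')'));

        String[] nameArray = names.split(",");
        String[] valueArray = values.split(",");
        if (nameArray.length != valueArray.length) {
            throw new IllegalArgumentException("column and value counts differ");
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < nameArray.length; i++) {
            result.put(nameArray[i].trim(), valueArray[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO LJS1_DX (base, doc, key1, key2, no, sq, eq, ln, en, date, line) "
                + "VALUES ('GET','','#000210','',' 0',' 1','5',1,0,'20100706','Street''James''s dictionary')";
        System.out.println(columnsToValues(sql).get("date")); // '20100706'
    }
}
```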
I think Tim had the right idea; it just needs to be implemented more efficiently. Here's a much more efficient version:
'[^']*+(?:''[^']*+)*+'|[^',\s]++
It uses Friedl's "unrolled loop" technique to avoid excessive reliance on alternations that match one or two characters at a time (I think that's what did you in, Tim), plus possessive quantifiers throughout.
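A sketch of plugging the unrolled pattern into the question's matcher loop, plus a helper that strips the surrounding quotes and turns each '' back into a single quote. The class and helper names are hypothetical; the pattern is the one from this answer, written as a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ValuesSplitter {
    // First alternative: a quoted value in which '' is an escaped quote.
    // Second alternative: an unquoted run (numbers etc.) without quotes, commas or whitespace.
    private static final Pattern VALUE =
            Pattern.compile("'[^']*+(?:''[^']*+)*+'|[^',\\s]++");

    // Returns each value as written in the SQL, quotes included.
    static List<String> split(String values) {
        List<String> result = new ArrayList<>();
        Matcher m = VALUE.matcher(values);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    // Removes the surrounding quotes and un-escapes '' to '.
    static String unquote(String value) {
        if (value.length() >= 2 && value.startsWith("'") && value.endsWith("'")) {
            return value.substring(1, value.length() - 1).replace("''", "'");
        }
        return value; // unquoted values such as 1 or 0 are returned unchanged
    }

    public static void main(String[] args) {
        String columnValues = "'GET0','','#000210','',' 0',' 1','5',1,0,'20100706','Street''James''s dictionary'";
        for (String v : split(columnValues)) {
            System.out.println(unquote(v));
        }
    }
}
```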
Regular expressions are not easy to use with this (but everything is possible).
I would suggest parsing it yourself, or use a library to do the parsing. By writing the parser yourself you are certain that it works exactly as you need it to.

Exact match with sql like and the bind

I have a bind in the SQL query
SELECT * FROM users WHERE name LIKE '%?%'
The bind sets the ?.
Now, if I want to search with LIKE, everything works, but if I want to search for an exact match without changing the SQL, I don't know how to do it.
I tried some patterns in the text box, e.g.:
_jon \jon\ [jon] and some others, but nothing works properly.
Any ideas?
Change your query to
select * from users where name like '?'
If you want to do a wildcard match, put the wildcards as part of the string that you're binding to the variable. If you don't want to do a wildcard match, then don't.
Note that like and = have the same performance except when your wildcard character is first in the string (for example, '%bob') as in that case the query optimizer can't use indexes as well to find the row(s) that you're looking for.
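A minimal JDBC sketch of this suggestion: keep LIKE ? in the SQL and decide at bind time whether to add wildcards. The users table and name column come from the question; the ESCAPE '!' clause and the escapeLike and likePattern helpers are illustrative additions so that % or _ typed by the user are matched literally in exact mode (ESCAPE is standard SQL, but check that your database supports it). Note that in JDBC the placeholder is written without quotes: a ? inside quotes is just a literal question mark.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

public class UserSearch {
    // Escapes the escape character first, then the LIKE wildcards.
    static String escapeLike(String s) {
        return s.replace("!", "!!").replace("%", "!%").replace("_", "!_");
    }

    // Exact match binds the escaped input as-is; partial match wraps it in %.
    static String likePattern(String input, boolean exact) {
        String escaped = escapeLike(input);
        return exact ? escaped : "%" + escaped + "%";
    }

    static List<String> findUsers(Connection conn, String input, boolean exact) throws SQLException {
        String sql = "SELECT name FROM users WHERE name LIKE ? ESCAPE '!'";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, likePattern(input, exact));
            try (ResultSet rs = ps.executeQuery()) {
                List<String> names = new ArrayList<>();
                while (rs.next()) {
                    names.add(rs.getString("name"));
                }
                return names;
            }
        }
    }
}
```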
You can't search for an exact match if the SQL contains % symbols, as they are wildcards. You'll need to change the SQL to
select * from users where name = '?'
for an exact match
(you can also use select * from users where name like '?', but that's less efficient)
What is keeping you from changing the SQL?
The Like condition is for 'similar' matches, while the '=' is for exact matches.
