PreparedStatement IN clause Regexp alternative? - java

I came across the same issue as the author of this question (PreparedStatement IN clause alternatives?), and wondered whether MySQL's REGEXP would be an elegant way to get the same functionality as IN while using only one PreparedStatement for a varying number of values to match. Some example SQL to show what I am talking about:
SELECT first_name, last_name
FROM people
WHERE first_name REGEXP ?
Multiple values could be supplied using a string like "Robert|Janice|Michael". I did not see REGEXP mentioned anywhere in that post.

Technically, yes, it is an alternative.
Note, however, that matching with a regex is less efficient than the IN operator: the database has to initialize the regex engine and run it against each and every value, and it cannot take advantage of an index. You might not notice it on small volumes, but as your data grows larger this might become an issue. So I would not recommend it as a general solution: instead, write a few more lines of code in your application to use the IN operator properly, and use regexes only where they are truly needed.
Aside: if you want to match the entire string, as IN does, you need to surround the list of values with ^ and $, so the equivalent of:
first_name in ('Robert', 'Janice', 'Michael')
Would be:
first_name regexp '^(Robert|Janice|Michael)$'
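The "few more lines of code" needed to keep using IN could look roughly like the sketch below: build one ? placeholder per value, then bind each value by index. The class and method names (InClauseBuilder, buildQuery, bindValues) are made up for illustration and are not from the answer above; the table and column names are the ones from the question.

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, for illustration only.
final class InClauseBuilder {

    // Builds the SELECT from the question with one "?" per value to match.
    static String buildQuery(int valueCount) {
        if (valueCount < 1) {
            throw new IllegalArgumentException("IN needs at least one value");
        }
        String placeholders = String.join(", ", Collections.nCopies(valueCount, "?"));
        return "SELECT first_name, last_name FROM people WHERE first_name IN ("
                + placeholders + ")";
    }

    // Binds each value to its placeholder; JDBC parameter indexes start at 1.
    static void bindValues(PreparedStatement statement, List<String> values)
            throws SQLException {
        for (int i = 0; i < values.size(); i++) {
            statement.setString(i + 1, values.get(i));
        }
    }
}
```

The trade-off is that each distinct number of values produces a different SQL string, so the statement is prepared once per list size rather than once overall.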

Another approach:
FIND_IN_SET(name, 'Robert,Janice,Michael')
Yes, that could be substituted in, but the second argument must be a comma-separated list of the desired values. This also works for numbers, e.g. FIND_IN_SET(foo, '1,123,45'). Note that 12 will not match, because only whole list items are compared.
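From Java, the comma-separated list can be bound as a single parameter, which keeps one PreparedStatement for any number of values. A minimal sketch, assuming the people table from the question; the class name FindInSetParam and the method toCommaList are hypothetical. Because FIND_IN_SET splits on commas, the helper rejects values that contain one.

```java
import java.util.List;

// Hypothetical helper, for illustration only.
final class FindInSetParam {

    // FIND_IN_SET returns the 1-based position of the item, or 0 if it is absent.
    static final String QUERY =
            "SELECT first_name, last_name FROM people WHERE FIND_IN_SET(first_name, ?) > 0";

    // Joins the values into the comma-separated list expected by FIND_IN_SET.
    static String toCommaList(List<String> values) {
        for (String value : values) {
            if (value.indexOf(',') >= 0) {
                throw new IllegalArgumentException("Value contains a comma: " + value);
            }
        }
        return String.join(",", values);
    }
}
```

The result of toCommaList would then be passed to setString(1, ...) on a statement prepared from QUERY. Like REGEXP, this form cannot use an index on first_name.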

Related

Query with IN condition - problem with listing attributes which have comma in attributes name/value

To query specific objects in our system we can use filters. Currently we are working on filter manager in which you can enter your conditions in text mode to query objects in the system. We can write there our own custom filter, in the form of RSQL queries ( https://github.com/jirutka/rsql-parser ) and our own query language. The backend logic of the filter manager itself is written in Java, and JS is used for the front-end.
As a rule, the expression queries written in the Filter Manager work fine, the problem starts with the "IN condition". For example, we have such a condition:
Users IN ("value1,value2", "value3", "value4")
Unfortunately, "value1,value2" will not be treated as one attribute, but broken into two:
Users IN (""value1", "value2"", "value3", "value4")
As a result, the condition becomes invalid, because two quotation marks appear next to each other and a single value is broken into two values.
The problem is that some objects and attributes have a comma in their name, and as far as I can see you cannot use the IN operator when a value contains a comma. I would like to avoid a rule that forbids the IN operator in such cases and allows only the OR operator, even though the OR operator works fine.
I am wondering what approach can be used (besides the workaround of using the OR operator). I was thinking about the Jira approach, where values in filters can be written with or without quotes. For example:
User IN ("value1,value2", value3, value4)
And then "value1,value2" is treated as one attribute. I also thought about some clever escape characters, or maybe some regexp, but I am not sure.
Does anyone have any suggestions on how to approach this problem or any possible solutions? I'm quite lost. Thanks in advance for any answers and comments!
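One possible direction, sketched under assumptions: split the contents of the IN list with a small quote-aware scanner instead of a plain split on commas, so that a comma inside double quotes stays part of the value. The class InListSplitter is hypothetical, it is not part of rsql-parser, and it does not handle escaped quotes inside a value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical quote-aware splitter, for illustration only.
final class InListSplitter {

    // Splits the text between the parentheses of an IN list.
    // Commas inside double quotes do not separate values.
    static List<String> split(String list) {
        List<String> values = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < list.length(); i++) {
            char c = list.charAt(i);
            if (c == '"') {
                inQuotes = !inQuotes;              // quotes delimit a value, they are not kept
            } else if (c == ',' && !inQuotes) {
                values.add(current.toString().trim());
                current.setLength(0);              // start the next value
            } else {
                current.append(c);
            }
        }
        values.add(current.toString().trim());     // last value has no trailing comma
        return values;
    }
}
```

With the input "value1,value2", value3, value4 this yields three values, the first being value1,value2. Note that trim() also removes leading and trailing spaces from quoted values, which may or may not be wanted.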

understanding regex if then statements

So I'm not sure I understand how these work, and would like a simple explanation of how they work, is all. I probably have it way off. A pure regex solution is required, and I don't know if this is possible. If it is, a solution would be awesome too, but a shove in the right direction would be good for my learning process ^_^
This is how I thought the if/then/else option built into my regex engines was formatted:
?(condition)if regex|else regex
I want it to capture a string from a very specific location only when this string exists within a certain section of JavaScript. Since this is how I thought it worked after a decent amount of research, I tried out a few variations of this code, but they all ended up something like this:
((?^view_large$)Tables-137(.*?)search.htm)
Also of relevance: I'm using a Java-based app that has regex searches which pull the data I need, so I cannot write an if statement in Java, which would be my preferred method. It's a pain to have to do it this way, but at the moment I have no other choice. I'm trying really hard to get them to allow Java code functionality instead of pure regex, for more versatile options.
So to summarize: is there even an if/then option in regex, and if so, how is it formatted for what I'm trying to accomplish?
EDIT: The string that I want to be the "if condition" is like this: if view_large string exists and is not null then capture the exact string 500/ which is captured within the catch all group I used: (.*?)
There are no conditionals in Java regex, but you can simulate them by writing two expressions that include mutually exclusive look-behind constructs, like this:
((?<=if )then)|((?<!if )end)
This expression will match "then" when it is preceded by an "if "; it will match "end" when it is not preceded by an "if ".
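To see the look-behind alternation in action, here is a small sketch using java.util.regex. The class name LookbehindDemo and the method firstMatch are made up for the demonstration; the pattern is the one from the answer, with backslash-free syntax so it needs no extra escaping in a Java string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class LookbehindDemo {

    // "then" only directly after "if ", "end" only when NOT directly after "if ".
    static final Pattern PATTERN = Pattern.compile("((?<=if )then)|((?<!if )end)");

    // Returns the first matched text, or null when nothing matches.
    static String firstMatch(String input) {
        Matcher matcher = PATTERN.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }
}
```

For example, firstMatch("if then") finds "then" and firstMatch("the end") finds "end", while "if end" and "so then" produce no match because the look-behind conditions fail.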
The Javadoc for java.util.regex.Pattern mentions, in its list of "Perl constructs not supported by this class":
The conditional constructs (?(condition)X) and (?(condition)X|Y).
So, no dice. But you should look through the Javadoc to see if you can achieve what you need by using regex features that it does support. (Or, if you post some more detailed examples, we can try to help.)
Try lookaround assertions.
For example, say you want to capture FOOBAR only if there is a 4+ digit number somewhere:
(?=.*\d{4}).*(FOOBAR)
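A quick way to try this lookahead from Java, as a sketch: the class LookaheadDemo and the method capture are hypothetical names, and the backslash in \d has to be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, for illustration only.
final class LookaheadDemo {

    // The lookahead requires four consecutive digits somewhere ahead on the line
    // before FOOBAR is captured into group 1.
    static final Pattern PATTERN = Pattern.compile("(?=.*\\d{4}).*(FOOBAR)");

    // Returns the captured FOOBAR, or null when the condition is not met.
    static String capture(String input) {
        Matcher matcher = PATTERN.matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }
}
```

Here the lookahead plays the role of the "if" part: capture("id 12345 FOOBAR") returns the group, while capture("id 12 FOOBAR") returns null because no run of four digits exists.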

Java SQL Escape without using setString

Is there a built-in method to escape a string for SQL? I would use setString, but it happens I am using setString multiple times in the same combined SQL statement and it would be better performance (I think) if the escape happened only once instead of each time I say setString. If I had the escaped string in a variable, I could re-use it.
Is there no way to do this in Java?
Current method, multi-source search. In reality they are three entirely different where statements including joins, but for this example I will just show the same where for each table.
String q = '%' + request.getParameter("search") + '%';
PreparedStatement s = s("SELECT a,b,c FROM table1 where a = ? UNION select a,b,c from table2 where a = ? UNION select a,b,c FROM table3 where a = ?");
s.setString(1, q);
s.setString(2, q);
s.setString(3, q);
ResultSet r = s.executeQuery();
I know this is not a big deal, but I like to make things efficient and also there are situations where it is more readable to use " + quote(s) + " instead of ? and then somewhere down the line you find setString.
If you use setString for a parameter (e.g. PreparedStatement.setString), there may well be no actual escaping required - it's likely that the data will be passed separately from the SQL itself, in a way that doesn't require escaping.
Do you have any concrete indication that this really is a performance bottleneck? It seems very unlikely that within a database query, the expensive part is setting the parameters locally...
Short answer: I wouldn't bother. It's best to do escaping at the last possible moment. When you try to escape a string early and keep it around, it becomes much more difficult to verify that all strings have been escaped exactly once. (Escaping a string twice is almost as bad as not escaping it at all!) I've seen plenty of programs that try to escape strings early and then run into trouble because they need to update the string and then the programmer forgets to re-do the escape, or they update the escaped version of the string, or they have four strings and they escape three of them, etc. (I was just working on a bug where a programmer did HTML escapes on a string early, then decided he had to truncate the string to fit on a form, and ended up trying to output a string that ended with "&am". That is, he truncated his escape sequence so it was no longer valid.)
The CPU time to escape a string should be trivial. Unless you have a very large number of records or very big strings that are re-used, I doubt the savings would be worth worrying about. You'd probably be better off spending your time optimizing queries: saving a read of one record would probably be worth far more than eliminating 1000 trips through the string escape logic.
Longer answer: There's no built-in function. You could write one easily enough: most flavors of SQL just need you to double any single quotes. You may also need to double backslashes or one or two other special characters. The fact that this can differ between SQL engines is one of the big arguments for using PreparedStatements and letting JDBC worry about it. (Personally I think there should be a JDBC function to do escaping that could then know any requirements specific to the DB engine. But there isn't, so that's how it is.)
In any case, it's not clear how it would work with a PreparedStatement. There'd have to be some way to tell the PreparedStatement not to escape this string because it's already been escaped. And who really knows what's happening under the table in the conversation between JDBC and the DB engine: Maybe it never really escapes it at all, but passes it separately from the query. I suppose there could be an extra parameter on the setString that says "this string was pre-escaped", but that would add complexity and potential errors for very little gain.
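For illustration only, the quote-doubling mentioned above could be sketched like this. NaiveSqlEscaper is a hypothetical class, not a JDBC or library API, and it deliberately handles nothing but single quotes, so it is not a safe replacement for setString.

```java
// Hypothetical, deliberately incomplete escaper; for illustration only.
final class NaiveSqlEscaper {

    // Doubles every single quote, which is what most SQL dialects expect
    // inside a quoted string literal. Backslashes and any engine-specific
    // special characters are NOT handled here.
    static String doubleSingleQuotes(String raw) {
        return raw.replace("'", "''");
    }
}
```

It also shows why escaping early is fragile: running the same value through the method twice turns O'Brien into O''''Brien, which is the "escaped twice" problem described above.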
Do not use org.apache.commons.lang.StringEscapeUtils.escapeSql(yourUnscapedSQL);
It does not escape characters like \
You can use StringEscapeUtils from Apache commons:
org.apache.commons.lang.StringEscapeUtils.escapeSql(yourUnscapedSQL);

parsing string according to oracle operators with regex

Basically, I am trying to replace the part of a string that comes immediately after an Oracle operator with its actual value. I can do this for a limited list of operators like {=,>,<}, but I wonder whether there is a way to cover all the operators rather than listing them by hand. For instance, given the string "a = xyz", I would replace xyz with, let's say, 3. But as you know, there are many more operators, such as "like, in, exists, etc.", so my string could also be "a like xyz".
So what do you suggest me?
Thanks.
So what do you suggest me?
I suggest not doing this with regex (regexes are not able to do so); use an existing, proven SQL parser instead.
Have a look at this question on SO: SQL parser library for Java
Make your job simpler - require a very strict syntax on behalf of the caller.
For example, require that the string be in the form "target operator #variable#", e.g. "a = #xyz#".
Then, all you need to do is use REPLACE(input, '#xyz#', 3).
As noted above, you probably don't want to reinvent the Oracle SQL statement parser.
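If the substitution is done in Java rather than with SQL's REPLACE, the strict "#variable#" convention could be handled along these lines. This is a sketch under assumptions: the class PlaceholderSubstitution and its method are hypothetical, placeholder names are assumed to consist of word characters, and unknown placeholders are left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class PlaceholderSubstitution {

    // Matches #name# where name is made of letters, digits or underscores.
    private static final Pattern PLACEHOLDER = Pattern.compile("#(\\w+)#");

    // Replaces every known #name# with its value; unknown names stay as they are.
    static String substitute(String template, Map<String, String> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String replacement = values.getOrDefault(matcher.group(1), matcher.group());
            // quoteReplacement keeps "$" and "\" in values from being interpreted.
            matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

Because the operator is never inspected, "a = #xyz#" and "a like #xyz#" are handled identically, which is the point of requiring the strict syntax.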

Keyword (OR, AND) search in Lucene

I am using Lucene in my portal (J2EE based) for indexing and search services.
The problem is about the keywords of Lucene. When you use one of them in the search query, you'll get an error.
For example:
searchTerms = "ik OR jij"
This works fine, because it will search for "ik" or "jij"
searchTerms = "ik AND jij"
This works fine, it searches for "ik" and "jij"
But when you search:
searchTerms = "OR"
searchTerms = "AND"
searchTerms = "ik OR"
searchTerms = "OR ik"
Etc., it will fail with an error:
Component Name: STSE_RESULTS Class: org.apache.lucene.queryParser.ParseException Message: Cannot parse 'OR jij': Encountered "OR" at line 1, column 0.
Was expecting one of:
...
It makes sense, because these words are probably reserved by Lucene and act as keywords.
In Dutch, the word "OR" is important because it stands for "Ondernemings Raad". It is used in many texts, and it needs to be found. For example, "or" does work, but does not return texts matching the term "OR". How can I make it searchable?
How can I escape the keyword "or"? Or how can I tell Lucene to treat "or" as a search term, NOT as a keyword?
I suppose you have tried putting the "OR" into double quotes?
If that doesn't work I think you might have to go so far as to change the Lucene source and then recompile the whole thing, as the operator "OR" is buried deep inside the code. Actually, compiling probably isn't even enough: you'll have to change the file QueryParser.jj in the source package that serves as input for JavaCC, then run JavaCC, then recompile the whole thing.
The good news, however, is that there's only one line to change:
| <OR: ("OR" | "||") >
becomes
| <OR: ("||") >
That way, you'll have only "||" as logical OR operator. There is a build.xml that also contains the invocation of JavaCC, but you have to download that tool yourself. I can't try it myself right now, I'm afraid.
This is perhaps a good question for the Lucene developer mailing list, but please let us know if you do that and they come up with a simpler solution ;-)
OR, NOT and AND are reserved keywords. I solved this problem just 2 days ago by lower-casing those 3 words in the user's search term before feeding it into the lucene query parser. Note that if you search and replace for these keywords make sure you use word boundaries (\b) so you don't end up changing words such as ANDROID and ORDER.
I then let the user specify NOT and AND by using - and +, just like Google does.
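The lower-casing step with word boundaries might look like the following sketch. It uses only java.util.regex and no Lucene classes; the class LuceneKeywordNeutralizer and its method name are made up for illustration.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing step, for illustration only.
final class LuceneKeywordNeutralizer {

    // \b keeps words such as ANDROID and ORDER from being touched.
    private static final Pattern RESERVED = Pattern.compile("\\b(AND|OR|NOT)\\b");

    // Lower-cases the stand-alone words AND, OR and NOT in the user's query.
    static String lowerCaseReservedWords(String userQuery) {
        Matcher matcher = RESERVED.matcher(userQuery);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            matcher.appendReplacement(out, matcher.group().toLowerCase(Locale.ROOT));
        }
        matcher.appendTail(out);
        return out.toString();
    }
}
```

The result would be passed to the query parser in place of the raw input. Whether the lower-cased "or" then matches documents containing "OR" depends on how the field was analyzed at index time.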
Escaping OR and AND with double quotes works for me. So try with a Java string like
String query = "field:\"AND\"";
I have read your question many times! =[
Please look at these suggestions.
How is your index stored? The fields in a document can be:
1) Stored
2) Tokenized
3) Indexed
4) Vector
It can make a significant difference.
Please use Luke; it can tell you how your indexes are actually stored.
Luke is a must-have if you are working with Lucene, as it gives you a real idea of how indexes are stored. It also offers search. Try it and let us know with your update!
You're probably doing something wrong when you're building the query. I'll second Narayan's suggestion on getting Luke (as posted in the comments) and try running your queries with that. It has been a little while since I used Lucene, but I don't remember ever having issues with OR and AND.
Other than that, you can try escaping the input strings using QueryParser.escape(userQuery)
More On Escaping
You can escape the "OR" when it's a search term, or write your own query parser for a different syntax. Lucene offers an extensive query API in addition to the parser, with which you can support your own query syntax quite easily.
