StringUtil indexOf() equivalent postgreSQL query - java

I need to implement the equivalent of the StringUtils indexOf() method in PostgreSQL.
Let's say I have a table in which url is one of the columns.
url : "http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit"
My requirement is to find the index of the 3rd occurrence of '/' in the above URL, take a substring, and keep only the host name paypal-info.com, all within a PostgreSQL query.
Any ideas on implementing this would be appreciated.
Thanks

Have you tried split_part method?
SELECT split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 3)
Result:
split_part
paypal-info.com
For other string functions try this doc:
http://www.postgresql.org/docs/9.1/static/functions-string.html
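Edit: as for indexOf itself, I don't know of any built-in Postgres solution, but using two string functions you can achieve it like this (strpos returns the first match, so this is only correct when the text of the fourth segment does not also occur earlier in the URL):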
SELECT strpos('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', split_part('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit', '/', 4)) - 1 as index_of;

The position function, documented in the string functions and operators section of the manual, is the equivalent of String.indexOf, e.g.
select position('/' in 'http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');
However, it doesn't offer an option to get the n'th occurrence.
You're really approaching this all wrong. You should use proper URL parsing code to extract the host portion, not attempt to roll your own or use regex / splitting / string mangling.
PostgreSQL doesn't have a native URL/URI type, but its procedural languages do and it's trivial to wrap suitable functions. e.g. with PL/Python:
create language plpythonu;
create or replace function urlhost(url text) returns text
language plpythonu
immutable strict
as $$
import urlparse
return urlparse.urlparse(url).netloc
$$;
then:
regress=# select urlhost('http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit');
urlhost
-----------------
paypal-info.com
(1 row)
If you'd prefer to use PL/Perl, PL/V8, or whatever, that's fine.
For best performance, you could write a simple C function and expose that as an extension.
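Since the question is tagged java, the same "use a real URL parser" idea can be sketched on the application side, assuming the URL is available in Java code rather than only inside the query. The class and method names below are hypothetical; java.net.URI is the standard JDK class.

```java
import java.net.URI;

final class UrlHost {
    // Returns the host component of an absolute URL, or null if the URI has no host.
    // URI.create throws IllegalArgumentException for syntactically invalid input.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    public static void main(String[] args) {
        String url = "http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit";
        System.out.println(hostOf(url)); // prints paypal-info.com
    }
}
```

A parser also copes with ports and paths (for example "http://asd.com:234/qwe") without any slash counting.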

Just replace 3 with N to get the index of the Nth '/' in a given string
SELECT length(substring('http://asd/asd', '(([^/]*/){3})')) - 1
To extract the host name from url you can use
SELECT substring('http://asd.com:234/qwe', 'http://([^:]+).*/')
Tested here: SQLFiddle
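For comparison, the "index of the Nth '/'" logic can be written in plain Java, in case the string is handled in application code. This is only a sketch; nthIndexOf is a hypothetical helper, not a JDK or StringUtils method, and it returns a 0-based index like String.indexOf.

```java
final class NthIndexOf {
    // 0-based index of the n-th occurrence (n >= 1) of ch in s,
    // or -1 if s contains fewer than n occurrences.
    static int nthIndexOf(String s, char ch, int n) {
        int index = -1;
        for (int i = 0; i < n; i++) {
            index = s.indexOf(ch, index + 1);
            if (index < 0) {
                return -1;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String url = "http://paypal-info.com/home.webapps.cgi-bin-limit/webscr.cmd-login-submit";
        int second = nthIndexOf(url, '/', 2);
        int third = nthIndexOf(url, '/', 3);
        // The host sits between the 2nd and 3rd slash.
        System.out.println(url.substring(second + 1, third)); // prints paypal-info.com
    }
}
```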

Related

PreparedStatement IN clause Regexp alternative?

I came across the same issue as the author of this question (PreparedStatement IN clause alternatives?), and wondered if using mysql's REGEXP would be an elegant way of getting the same functionality of IN while using only one PreparedStatement for varying number of values to match? Some example SQL here to show what I am talking about:
SELECT first_name, last_name
FROM people
WHERE first_name REGEXP ?
Multiple values could be supplied using a string like "Robert|Janice|Michael". I did not see REGEXP mentioned anywhere in that post.
Technically, yes, it is an alternative.
Note, however, that using a regex for matching is less efficient than the in operator: it incurs more work for the database, which needs to initialize the regex engine and run it against each and every value (it cannot take advantage of an index). You might not notice it on small volumes, but as your data grows larger this might become an issue. So I would not recommend that as a general solution: instead, just write a few more lines of code in your application to properly use the in operator, and use regexes only where they are truly needed.
Aside: if you want to match the entire string, as in does, you need to surround the list of values with ^ and $, so the equivalent for:
first_name in ('Robert', 'Janice', 'Michael')
Would be:
first_name regexp '^(Robert|Janice|Michael)$'
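The "few more lines of code" for the in operator could look roughly like the sketch below: generate one ? per value, then bind each value in turn. The class and method names are hypothetical; the table and column names are taken from the question, and the Connection is assumed to come from the application.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

final class InClause {
    // Builds "?, ?, ?" for count = 3; an empty IN () list is not valid SQL, so reject it.
    static String placeholders(int count) {
        if (count < 1) {
            throw new IllegalArgumentException("count must be >= 1");
        }
        return String.join(", ", Collections.nCopies(count, "?"));
    }

    static String buildQuery(int count) {
        return "SELECT first_name, last_name FROM people WHERE first_name IN ("
                + placeholders(count) + ")";
    }

    // Prepares the statement and binds one value per placeholder (JDBC indexes start at 1).
    static PreparedStatement prepare(Connection conn, List<String> names) throws SQLException {
        PreparedStatement ps = conn.prepareStatement(buildQuery(names.size()));
        for (int i = 0; i < names.size(); i++) {
            ps.setString(i + 1, names.get(i));
        }
        return ps;
    }
}
```

The trade-off is one distinct statement text per list size, in exchange for a plain in comparison that can use an index.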
Another approach:
FIND_IN_SET(name, 'Robert,Janice,Michael')
Yes, that could be substituted in. But it must be a commalist of the desired values. This also works for FIND_IN_SET(foo, '1,123,45'). Note that 12 will not match.

search string inside a word in java

I want to search for a word inside a string.
For example, let the string be "ThisIsFile.java",
and say I want to search for "File.java" or "IsFile".
This is something like an SQL 'like' query, but unfortunately I am not
getting that string from a database.
Please suggest a good solution for this.
Thanks
There's a variety of ways of achieving this and the method you choose will depend on the complexity of your queries. For a simple plain symbol/word match the String class provides a contains method that takes a single parameter, the String to search for - and returns true if it occurs within the search String.
boolean containsFile = myString.contains("File");
Do you mean:
if (haystack.contains(needle))
? Note that this won't respect word boundaries or anything like that - it just finds if one string (needle) is a substring in another (haystack).
Try using String.contains(). Also, check out the documentation for more information about the method
You should be able to use
myString.contains("some string");
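A small sketch using the string from the question shows how contains and indexOf behave, including the fact that both are case-sensitive. The containsIgnoreCase helper is a hypothetical convenience, not a String method.

```java
import java.util.Locale;

final class SubstringSearch {
    // Case-insensitive variant: lower-case both sides before calling contains.
    static boolean containsIgnoreCase(String haystack, String needle) {
        return haystack.toLowerCase(Locale.ROOT).contains(needle.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String name = "ThisIsFile.java";
        System.out.println(name.contains("File.java"));       // true
        System.out.println(name.contains("IsFile"));          // true
        System.out.println(name.contains("file"));            // false, contains is case-sensitive
        System.out.println(name.indexOf("IsFile"));           // 4, position of the match
        System.out.println(containsIgnoreCase(name, "file")); // true
    }
}
```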

Exact match with sql like and the bind

I have a bind in the SQL query
SELECT * FROM users WHERE name LIKE '%?%'
The bind sets the ?.
Now, if I want to search with LIKE, everything works, but if I want to search for an exact match without changing the SQL, I don't know how to do it.
I tried some regexp-like patterns in the text box, e.g.:
_jon \jon\ [jon] and some others, but nothing works properly.
Any ideas?
Change your query to
select * from users where name like '?'
If you want to do a wildcard match, put the wildcards as part of the string that you're binding to the variable. If you don't want to do a wildcard match, then don't.
Note that like and = have the same performance except when your wildcard character is first in the string (for example, '%bob') as in that case the query optimizer can't use indexes as well to find the row(s) that you're looking for.
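One way to put the wildcards into the bound value is sketched below in Java. The question does not say which bind mechanism is used, so this assumes a JDBC-style statement such as SELECT * FROM users WHERE name LIKE ? (placeholder without quotes) and assumes backslash is the LIKE escape character, which is the default in MySQL and PostgreSQL; other databases may need an explicit ESCAPE clause. All names are hypothetical.

```java
final class LikePatterns {
    // Escapes the escape character itself first, then the LIKE wildcards % and _.
    static String escapeLike(String term) {
        return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_");
    }

    // Value to bind for a "contains" search: wildcards are added around the escaped term.
    static String containsPattern(String term) {
        return "%" + escapeLike(term) + "%";
    }

    // Value to bind for an exact match through the same LIKE query: no wildcards at all.
    static String exactPattern(String term) {
        return escapeLike(term);
    }

    public static void main(String[] args) {
        System.out.println(containsPattern("jon")); // %jon%
        System.out.println(exactPattern("jon"));    // jon
    }
}
```

The SQL text stays the same for both searches; only the bound string decides whether the match is exact or a substring match.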
You can't search for an exact match if the SQL contains % symbols, as they are wildcards. You'll need to change the SQL to
select * from users where name = '?'
for an exact match
(you can also use select * from users where name like '?' but that's less efficient)
What is keeping you from changing the SQL?
The Like condition is for 'similar' matches, while the '=' is for exact matches.

parsing string according to oracle operators with regex

Basically, I am trying to replace the part of a string that comes immediately after an Oracle operator with its actual value. I can do this for a limited list of operators such as {=,>,<}, but I wonder whether there is a way to cover all the operators rather than listing them by hand. For instance, given the string "a = xyz", I would replace xyz with, let's say, 3. But there are many more operators, such as like, in, exists, etc., so my string could also be "a like xyz".
So what do you suggest me?
Thanks.
So what do you suggest me?
I suggest not to do this with regex (regex are not able to do so), but use an existing, proven SQL parser.
Have a look at this question on SO: SQL parser library for Java
Make your job simpler - require a very strict syntax on behalf of the caller.
For example, require that the string be in the form "target operator #variable#", e.g. "a = #xyz#".
Then, all you need to do is use REPLACE(input, '#xyz#', 3).
As noted above, you probably don't want to reinvent the Oracle SQL statement parser.
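The strict "#variable#" convention can be sketched in Java as a plain string substitution that never needs to know which operator precedes the placeholder. The class name, method name, and the map of values are hypothetical.

```java
import java.util.Map;

final class PlaceholderFill {
    // Replaces every #name# marker with its value; the operator in front of it is irrelevant.
    // Replacements run one after another, so values should not themselves contain #name# markers.
    static String fill(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("#" + entry.getKey() + "#", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = Map.of("xyz", "3");
        System.out.println(fill("a = #xyz#", values));    // a = 3
        System.out.println(fill("a like #xyz#", values)); // a like 3
    }
}
```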

Keyword (OR, AND) search in Lucene

I am using Lucene in my portal (J2EE based) for indexing and search services.
The problem is about the keywords of Lucene. When you use one of them in the search query, you'll get an error.
For example:
searchTerms = "ik OR jij"
This works fine, because it will search for "ik" or "jij"
searchTerms = "ik AND jij"
This works fine, it searches for "ik" and "jij"
But when you search:
searchTerms = "OR"
searchTerms = "AND"
searchTerms = "ik OR"
searchTerms = "OR ik"
Etc., it will fail with an error:
Component Name: STSE_RESULTS Class: org.apache.lucene.queryParser.ParseException Message: Cannot parse 'OR jij': Encountered "OR" at line 1, column 0.
Was expecting one of:
...
It makes sense, because these words are probably reserved by Lucene and are treated as operators.
In Dutch, the word "OR" is important because it is an abbreviation for "Ondernemings Raad". It is used in many texts, and it needs to be found. Searching for "or", for example, does work, but does not return texts matching the term "OR". How can I make it searchable?
How can I escape the keyword "or"? Or how can I tell Lucene to treat "or" as a search term, not as a keyword?
I suppose you have tried putting the "OR" into double quotes?
If that doesn't work I think you might have to go so far as to change the Lucene source and then recompile the whole thing, as the operator "OR" is buried deep inside the code. Actually, compiling probably isn't even enough: you'll have to change the file QueryParser.jj in the source package that serves as input for JavaCC, then run JavaCC, then recompile the whole thing.
The good news, however, is that there's only one line to change:
| <OR: ("OR" | "||") >
becomes
| <OR: ("||") >
That way, you'll have only "||" as logical OR operator. There is a build.xml that also contains the invocation of JavaCC, but you have to download that tool yourself. I can't try it myself right now, I'm afraid.
This is perhaps a good question for the Lucene developer mailing list, but please let us know if you do that and they come up with a simpler solution ;-)
OR, NOT and AND are reserved keywords. I solved this problem just 2 days ago by lower-casing those 3 words in the user's search term before feeding it into the lucene query parser. Note that if you search and replace for these keywords make sure you use word boundaries (\b) so you don't end up changing words such as ANDROID and ORDER.
I then let the user specify NOT and AND by using - and +, just like Google does.
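The lower-casing step with word boundaries might look like the sketch below. It uses only java.util.regex, not the Lucene API, and the class and method names are hypothetical. Whether the lower-cased term then matches "OR" in the index depends on the analyzer, which is assumed here to lower-case indexed terms.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LuceneKeywords {
    // \b keeps words such as ANDROID and ORDER from being touched.
    private static final Pattern RESERVED = Pattern.compile("\\b(AND|OR|NOT)\\b");

    // Lower-cases stand-alone AND, OR and NOT so the query parser sees ordinary terms.
    static String lowerCaseReserved(String userQuery) {
        Matcher matcher = RESERVED.matcher(userQuery);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            matcher.appendReplacement(out, matcher.group().toLowerCase(Locale.ROOT));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(lowerCaseReserved("OR ik"));         // or ik
        System.out.println(lowerCaseReserved("ANDROID ORDER")); // ANDROID ORDER
    }
}
```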
Escaping OR and AND with double quotes works for me. So try with a Java string like
String query = "field:\"AND\"";
I have read your question many times! =[
Please look at these suggestions.
How is your index stored?
A Document's Fields can be stored as
1) Stored 2) Tokenized 3) Indexed 4) Vector
and this can make a significant difference.
Please use Luke; it can tell you how your indexes are actually stored.
Luke is a must-have if you are working with Lucene, as it gives you a real idea of how indexes are stored. It also offers search. Try it and let us know with an update!
You're probably doing something wrong when you're building the query. I'll second Narayan's suggestion on getting Luke (as posted in the comments) and try running your queries with that. It has been a little while since I used Lucene, but I don't remember ever having issues with OR and AND.
Other than that, you can try escaping the input strings using QueryParser.escape(userQuery)
More On Escaping
You can escape the "OR" when it's a search term, or write your own query parser for a different syntax. Lucene offers an extensive query API in addition to the parser, with which you can support your own query syntax quite easily.
