What is the MySQL SQL REGEX for this regex - java

Regular regex:
foo(\((\d{1}|\d{2}|\d{3})\))?
This regex works in Java:
foo(\\((\\d{1}|\\d{2}|\\d{3})\\))?
Examples:
fooa //no match
foo(1)a //no match
foo(a) //no match
foo(1) //match
foo(999) //match
foo //match
MySQL 5.5 documentation (https://dev.mysql.com/doc/refman/5.5/en/regexp.html) says
Note:
Because MySQL uses the C escape syntax in strings (for example, “\n” to
represent the newline character), you must double any “\” that you use
in your REGEXP strings.
I tried as a test running the following on MySQL 5.x
select 'foo' REGEXP 'foo(\\((\\d{1}|\\d{2}|\\d{3})\\))?'
Here is the error message I get:
Error: You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the right syntax to
use near ''foo(\\([(]\\d{1}' at line 1
I looked at Adapting a Regex to work with MySQL and tried the suggestion of replacing \d{1} etc.. with [0-9] which gave me:
select 'foo' REGEXP 'foo(\\(([0-9]|[0-9]|[0-9])\\))?'
But still getting MySQL death.

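I don't have a MySQL console at hand to verify this, but it should work:
'^foo(\\([[:digit:]]{1,3}\\))?$'
MySQL 5.x's REGEXP does not understand \d, so use the POSIX class [[:digit:]] (or [0-9]) instead, and add the ^ and $ anchors because REGEXP otherwise matches anywhere in the string. Your other regexes wrap both the whole (123) part and the digits inside it in groups. MySQL's REGEXP only reports match or no match, so the inner group serves no purpose there and the three alternatives can be folded into {1,3}.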
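Before porting the pattern to MySQL, it can help to confirm in Java that the simplified form behaves like the original on the examples from the question. The following is a minimal sketch; the class and method names are made up for illustration, and \d{1,3} is used as a shorthand for the three digit alternatives.

```java
import java.util.regex.Pattern;

class FooPatternCheck {
    // Same language as the question's regex, with \d{1}|\d{2}|\d{3} folded into \d{1,3}
    static final Pattern FOO = Pattern.compile("foo(\\(\\d{1,3}\\))?");

    static boolean isMatch(String input) {
        // matches() requires the whole input to match the pattern
        return FOO.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"fooa", "foo(1)a", "foo(a)", "foo(1)", "foo(999)", "foo"};
        for (String s : samples) {
            System.out.println(s + " -> " + isMatch(s));
        }
    }
}
```

Note that matches() checks the entire input. A substring search such as Matcher.find() would also accept fooa, because the optional group can simply match nothing.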

Popping in because I ran into this and found the problem/solution.
Go to Global Preferences -> MySQL tab. Under "Use Custom Query Tokenizer" there is a "Procedure/Function Separator." If that is "|", change it to something else (like "/"). This is what's causing SQuirreL to fail when parsing the regex.

Related

Invalid logback pattern

I was using this working pattern (logback.groovy):
{'((?:password(=|:|>))|(?:secret(=|:))|(?:salt(=|:)))','\$1*******\$3'}
to mask sensitive data. One day I needed to surround it with double quotes, like
was: password=smth
became: "password"="smth"
So I turned regexp into this (just added \" before and after keywords, and also I've tried \\"):
{'(\"?(?:password\"?(=|:|>))|(?:secret\"?(=|:))|(?:salt\"?(=|:)))','\$1*******\$3'}
But I get this error on app startup:
Failed to parse pattern
Unexpected character ('?' (code 63)): was expecting comma to separate Object entries
Can someone please explain what I am doing wrong?
If anyone is wondering, here is the correct version:
{'(\\\"?(?:password\\\"?(=|:|>))|(?:secret\\\"?(=|:))|(?:salt\\\"?(=|:)))','\$1*******\$3'}

Unable to capture next line character in Java

I need to parse a Python file that contains multiple SQL queries and, using Java, find the start and end positions of each query so that I can extract only the query text.
I am using the .contains function to check for sql(''' as the opening marker of a query. The closing marker is ''') but in some cases ''') also appears inside the query, where a variable is concatenated in, and that occurrence should not be treated as the end of the query.
Something like this :
spark.sql(''' SELECT .......
FROM.....
WHERE xxx IN ('''+ Variable +''')
''')
Here the second-to-last line is also detected as the end of the query if I use line.contains(" ''') "), which is wrong.
All I can think of is to check for the newline character at the end of the query, since each query is separated by two empty lines. So I tried if (line.contains(" ''')\n")) and if (line.contains(" ''')\r\n")), but neither works for me.
Please let me know of any other way to do this.
Note that I do not have the privilege to change the query file.
Thanks
I believe simple contains won't solve this problem.
You will have to use Pattern if you are looking to match \n.
String query = "spark.sql(''' SELECT .......\n" +
"FROM..... \n" +
"WHERE xxx IN ('''+ Variable +''')\n" +
"''')";
Pattern pattern = Pattern.compile("^spark\\.sql\\('''(.*)'''\\)$", Pattern.DOTALL);
System.out.println(pattern.matcher(query).find());
Output:
true
Pattern.DOTALL tells Java to allow the dot to match newline characters, too.
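When the file holds several queries, a greedy .* with DOTALL would run from the first opening marker to the last closing one. A possible variation is sketched below. It assumes, as in the sample, that the real closing ''') sits alone on its own line, while the ''') produced by variable concatenation always follows other text on the same line. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SqlBlockFinder {
    // Assumption: the terminating ''') is the only content on its line.
    // DOTALL lets '.' cross line breaks; MULTILINE makes ^ and $ work per line.
    private static final Pattern QUERY = Pattern.compile(
            "spark\\.sql\\('''(.*?)^[ \\t]*'''\\)[ \\t]*$",
            Pattern.DOTALL | Pattern.MULTILINE);

    // Returns {start, end} offsets of each query body within the source text
    static List<int[]> findQueries(String source) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = QUERY.matcher(source);
        while (m.find()) {
            spans.add(new int[] {m.start(1), m.end(1)});
        }
        return spans;
    }

    public static void main(String[] args) {
        String source =
                "spark.sql(''' SELECT a\n"
              + "FROM t\n"
              + "WHERE x IN ('''+ v +''')\n"
              + "''')\n"
              + "\n\n"
              + "spark.sql(''' SELECT b\n"
              + "''')\n";
        for (int[] span : findQueries(source)) {
            System.out.println("[" + span[0] + ", " + span[1] + ")");
            System.out.println(source.substring(span[0], span[1]));
        }
    }
}
```

The whole file has to be read into one string first, because a line-by-line contains check never sees the line terminator.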

Java Special character encoding issue

I tried to insert a special character into an Oracle table via Java and then retrieve it again, assuming my encoding would work.
Below is the code I tried.
String s=new String("yesterday"+"\u2019"+"s");
...
statement.executeUpdate("INSERT into test1 values ('"+s+"')");
ResultSet rs=statement.executeQuery("select * from test1");
while (rs.next()) {
System.out.println(new String(rs.getString(1).getBytes("UTF-8"),"UTF-8"));
}
...
Now, when I run it from the command line, the special character is always displayed garbled: yesterday’s
My question is: why, even after specifying the encoding, does it not show the expected result, i.e. yesterday’s? Is the above code incorrect, or is some modification required?
P.S.: In Eclipse, the code may print yesterday’s, but when executed from the command line it shows yesterday’s
I am using :
-- JDK1.6
-- Oracle : 11.1.0.6.0
-- NLS_Database_Parameters: NLS_CHARACTERSET WE8MSWIN1252
--Windows
Edit:
\u2019 : this is RIGHT SINGLE QUOTATION MARK & I am looking for this character only.
Check the Java system property "file.encoding" when you run on the command line; it may be set to something other than "UTF-8", causing the text to display incorrectly in the console.
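The garbled text in the question is what appears when the UTF-8 bytes of U+2019 are interpreted as windows-1252. The sketch below reproduces that effect in isolation, without a database. The class and method names are invented for illustration, and StandardCharsets needs Java 7 or later, which is newer than the JDK 1.6 mentioned in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes text as UTF-8, then decodes the same bytes with a different charset
    static String decodeUtf8BytesAs(String text, Charset wrong) {
        return new String(text.getBytes(StandardCharsets.UTF_8), wrong);
    }

    public static void main(String[] args) {
        String s = "yesterday\u2019s";
        Charset cp1252 = Charset.forName("windows-1252");
        // U+2019 is three bytes in UTF-8 (E2 80 99); cp1252 turns each byte into its own character
        System.out.println(decodeUtf8BytesAs(s, cp1252));
        // The charset the JVM uses by default on this machine
        System.out.println("default charset: " + Charset.defaultCharset());
    }
}
```

What the console finally shows still depends on the terminal's code page, so the printed result can differ between Eclipse and a Windows command prompt.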
Here is an illustration of what I suggested in a comment (change the character set of your client). Straight from my SQL*Plus:
SQL> select unistr('\2019') from dual;
U
-
Æ
SQL> $chcp 1252
Active code page: 1252
SQL> select unistr('\2019') from dual;
U
-
’
If this works for you, you may want to add $chcp 1252 to your [g]login.sql.
The problem is that the character encoding for the apostrophe is \u0027
I ran this in the command line:
public class Yesterday{
public static void main(String[] args) {
String s = new String("yesterday" + "\u0027" +"s");
System.out.println(s);
}
}
it resulted in:
yesterday's

Regex to find hostname from Jdbc url

I am new to regex. I would like to retrieve the Hostname from postgreSQL jdbc URL using regex.
Assume the postgreSQL url will be jdbc:postgresql://production:5432/dbname. I need to retrieve "production", which is the hostname. I want to try with regex and not with Java split function. I tried with
Pattern PortFinderPattern = Pattern.compile("[//](.*):*");
final Matcher match = PortFinderPattern.matcher(url);
if (match.find()) {
System.out.println(match.group(1));
}
But it's matching all the string from hostname till the end.
Pattern PortFinderPattern = Pattern.compile(".*://([^:]+).*");
regex without grouping :
"(?<=//)[^:]*"
[//]([\\w\\d\\-\\.]+):
Should be enough to find it reliably. Though this is probably a better regex:
The Hostname Regex
There are some errors in your regex:
[//] - This is only one character, because the [] marks a character class, so it will not fully match //. To match it, you need to write it like this: [/][/] or \/\/.
(.*) - This will match all characters to the end of line. You need to be more specific if you want to go till a certain character. For example you could go to the colon by fetching all characters, which are not colons, like this: ([^:]*).
:* - This makes the colon optional. I guess you forgot to put a dot( every character ) after the colon, like this: :.*.
So here is your regex corrected: \/\/([^:]*):.*.
Hope this helps.
BTW, if the port number (:5432) is optional after production, then I suggest the following regex, which stops the hostname at either a colon or a slash:
\/\/([^/:]+)(?::\d+)?
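The corrections listed above can be put together in Java roughly as follows. This is a sketch rather than a complete JDBC URL parser: it assumes the URL contains // before the hostname, and the class and method names are hypothetical. In a Java string literal the slashes need no escaping, while \d has to be written as \\d.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JdbcHostFinder {
    // "//", then the host (everything up to ':' or '/'), then an optional, uncaptured port
    private static final Pattern HOST = Pattern.compile("//([^/:]+)(?::\\d+)?");

    // Returns the hostname, or null when the URL has no "//host" part
    static String hostOf(String url) {
        Matcher m = HOST.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(hostOf("jdbc:postgresql://production:5432/dbname"));
        System.out.println(hostOf("jdbc:postgresql://production/dbname"));
    }
}
```

Using a negated character class instead of .* is what keeps the capture from running to the end of the string.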
To also capture Oracle and MySQL JDBC URL variants with their quirks (e.g. Oracle allowing @ instead of // or even @//), I use this regexp to get the hostname: [/@]+([^:/@]+)([:/]+|$) The hostname is then in group 1.
Code e.g.
String jdbcURL = "jdbc:oracle:thin:@//hostname:1521/service.domain.local";
Pattern hostFinderPattern = Pattern.compile("[/@]+([^:/@]+)([:/]+|$)");
final Matcher match = hostFinderPattern.matcher(jdbcURL);
if (match.find()) {
System.out.println(match.group(1));
}
This works for all these URLs (and other variants):
jdbc:oracle:thin:@//hostname:1521/service.domain.local
jdbc:oracle:thin:@hostname:1521/service.domain.local
jdbc:oracle:thin:@hostname/service.domain.local
jdbc:mysql://localhost:3306/sakila?profileSQL=true
jdbc:postgresql://production:5432/dbname
jdbc:postgresql://production/
jdbc:postgresql://production
This assumes that
The hostname is after // or @ or a combination thereof (single / would also work, but I don't think JDBC allows that).
After the hostname either : or / or the end of the string follows.
Note that the + quantifiers are greedy; this is especially important for the middle one.

Regex backreference when string section excluded

I have a regular expression I am trying to use to rewrite an incoming REST url and am getting stuck on one use case when one section of the URL is excluded.
Here is the regex I'm currently using:
^(/[^/]+/(?:books))/([^/]+?)(?:/(?:(?!page).+?))?(?:/page/(\\d+))?$
As example I'm using "$1 - $2 - $3" as parts to use in writing new URL.
Here are the examples that are working correctly...
"/mySite/books/topic1/page/2" results in "/mySite/books - topic1 - 2"
"/mySite/books/topic1/subtopic1/page/2" results in "/mySite/books - topic1 - 2"
All the above work as intended. The problem is when the URL excludes the "topic1" part of the URL then the results are not what I need. Example:
"/mySite/books/page/2" results in "/mySite/books - page - "
What I need is the $2 to be blank, because there is no topic, and the page number still as $3. What I need as output...
"/mySite/books/page/2" results in "/mySite/books - - 2"
What can I change in my regex to satisfy that scenario without disrupting the existing ones that work correctly? This is being done in Java.
You might try to use regex pattern
^(/[^/]+/books)/(?:(?!page/)([^/]+)/)?page/(\\d+)$
It should suffice to make your second group ungreedy. Then the engine will first try to find a match without using it (trying only /page/\\d+ instead). And if that fails it tries to include the second group:
^(/[^/]+/(?:books))/([^/]+?)(?:/(?:(?!page).+?))??(?:/page/(\\d+))?$
Appending ? to any quantifier (+, *, ? and {..}) makes it ungreedy.
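One way to cover all three example URLs from the question is to use a negative lookahead so that the topic segment can never be the literal page/ segment, and to skip any subtopics with a separate non-capturing group. The sketch below is one possible formulation, not the pattern from either answer. The class and method names are invented, and it assumes the same "$1 - $2 - $3" replacement used in the question.

```java
import java.util.regex.Pattern;

class UrlRewriteSketch {
    private static final Pattern URL = Pattern.compile(
            "^(/[^/]+/books)"           // $1: site and section
          + "(?:/(?!page/)([^/]+))?"    // $2: optional topic, refused if the segment is "page/"
          + "(?:/(?!page/)[^/]+)*"      // any further subtopics, not captured
          + "(?:/page/(\\d+))?$");      // $3: optional page number

    static String rewrite(String url) {
        // In Java, a group that did not take part in the match expands to an empty string
        return URL.matcher(url).replaceAll("$1 - $2 - $3");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/mySite/books/topic1/page/2"));
        System.out.println(rewrite("/mySite/books/topic1/subtopic1/page/2"));
        System.out.println(rewrite("/mySite/books/page/2"));
    }
}
```

With this formulation a topic literally named page can only be captured when it is the last segment of the URL; whether that restriction is acceptable depends on the site's URL scheme.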
