Unable to capture next line character in Java - java

I need to parse a Python file that contains multiple SQL queries and find the start and end position of each query, so that I can extract only the query part, using Java.
I am using the .contains function to check for sql(''' as the opening marker of a query. The closing marker is ''') but in some cases ''') also appears in the middle of the query when a variable is involved, and that occurrence should not be detected as the end of the query.
Something like this:
spark.sql(''' SELECT .......
FROM.....
WHERE xxx IN ('''+ Variable +''')
''')
Here the second-to-last line is also detected as the end of the query if I use line.contains(" ''') "), which is wrong.
All I can think of is to check for a newline character at the end of the query, since the queries are separated by two empty lines. So I tried if (line.contains(" ''')\n")) and if (line.contains(" ''')\r\n")), but neither works for me.
Kindly let me know of any other way to do this.
Note that I do not have the privilege to change the query file.
Thanks

I believe a simple contains won't solve this problem.
You will have to use a Pattern if you want to match \n.
String query = "spark.sql(''' SELECT .......\n" +
"FROM..... \n" +
"WHERE xxx IN ('''+ Variable +''')\n" +
"''')";
Pattern pattern = Pattern.compile("^spark\\.sql\\('''(.*)'''\\)$", Pattern.DOTALL);
System.out.println(pattern.matcher(query).find());
Output:
true
Pattern.DOTALL tells Java to allow the dot to match newline characters, too.
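If the lines come from BufferedReader.readLine() or a similar line reader, the line terminator has already been stripped, which would explain why contains with "\n" never matches. When the file holds several queries, another option is to scan the whole text and treat each ''' as toggling "inside a string", so that a ) only ends the call when it is outside a string. The following is a minimal sketch under stated assumptions: every query starts with sql(''', only '''-quoted strings are used, and there are no escaped quotes. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SparkSqlExtractor {
    /** Returns the argument text of every sql(''' ... ''') call found in source. */
    static List<String> extract(String source) {
        List<String> queries = new ArrayList<>();
        final String opener = "sql('''";
        int from = 0;
        while (true) {
            int start = source.indexOf(opener, from);
            if (start < 0) {
                break;
            }
            int argStart = start + "sql(".length(); // index of the first '''
            int i = argStart;
            boolean inString = false; // toggled by every '''
            int depth = 0;            // parentheses opened outside strings
            int end = -1;
            while (i < source.length()) {
                if (source.startsWith("'''", i)) {
                    inString = !inString;
                    i += 3;
                    continue;
                }
                char c = source.charAt(i);
                if (!inString) {
                    if (c == '(') {
                        depth++;
                    } else if (c == ')') {
                        if (depth == 0) { // this ) closes the sql( call
                            end = i;
                            break;
                        }
                        depth--;
                    }
                }
                i++;
            }
            if (end < 0) {
                break; // unterminated call: stop scanning
            }
            queries.add(source.substring(argStart, end));
            from = end + 1;
        }
        return queries;
    }

    public static void main(String[] args) {
        String source = "spark.sql(''' SELECT a\nFROM t\nWHERE x IN ('''+ v +''')\n''')\n\n\n"
                + "spark.sql(''' SELECT b FROM u ''')\n";
        for (String q : extract(source)) {
            System.out.println("--- query ---");
            System.out.println(q);
        }
    }
}
```

The ''') on the WHERE line is skipped because its ) sits inside the string that was reopened after the variable.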


How to print unmatched parameters in Drools

In the following Drools file, I join two queries in the when expression, and print matched results.
import com.demo.drools.*;
rule "demo"
when
$book: BlockTrade()
$buys : Trade(type=="buy") from $book.trades
$sells : Trade(type=="sell", $buys.id==id,
$buys.price==price,
$buys.trader==trader) from $book.trades
then
System.out.println("buys: " + $buys);
System.out.println("sells: " + $sells);
end
It works, but I want to log all unmatched trades together with the reason they did not match.
For example:
Trade id=1 doesn't match because $buys.type="both" doesn't match any trades in $buys or $sells
// or
Trade id=2 doesn't match because $buys.price=50, and $buys.trader="John" doesn't match any $sells
How can it be implemented?
See this other answer. If you want to log the unmatched trades, you will need to create the rules for that.
Hope it helps,

JPA Select query not returning results with one letter word

I have a query that returns no results when the search term starts with a one-letter word followed by a space and then another word (e.g. "T Distribution"). Searching for "Distribution" alone returns results, including those for "T Distribution". The behavior is the same for every search term that begins with a one-letter word followed by a space and another word.
The problem appears when the search term has this pattern:
"[one-letter][space][letter/word]". Example: "o ring".
Why would the LIKE operator not work correctly in this case?
Here is my query:
@Cacheable(value = "filteredConcept")
@Query("SELECT NEW sina.backend.data.model.ConceptSummaryVer04(s.id, s.arabicGloss, s.englishGloss, s.example, s.dataSourceId,
s.synsetFrequnecy, s.arabicWordsCache, s.englishWordsCache, s.superId, s.categoryId, s.dataSourceCacheAr, s.dataSourceCacheEn,
s.superTypeCasheAr, s.superTypeCasheEn, s.area, s.era, s.rank, s.undiacritizedArabicWordsCache, s.normalizedEnglishWordsCache,
s.isTranslation, s.isGloss, s.arabicSynonymsCount, s.englishSynonymsCount) FROM Concept s
where s.undiacritizedArabicWordsCache LIKE %:searchTerm% AND data_source_id != 200 AND data_source_id != 31")
List<ConceptSummaryVer04> findByArabicWordsCacheAndNotConcept(@Param("searchTerm") String searchTerm, Sort sort);
The result of the query on the database itself:
link to screenshot
Results on the database are returned regardless of letter case:
link to screenshot
I solved this problem.
It was caused by the minimum word length configured for the full-text index on the MySQL database, which was set to 2 (ft_min_word_len = 2).
I changed that setting and rebuilt the index. After that, one-letter words were returned by the query.
See the MySQL manual, section 12.9.6, "Fine-Tuning MySQL Full-Text Search".
Use quotes:
LIKE '%:searchTerm%';
Set searchTerm="%your_word%" and use it in the query like this:
... s.undiacritizedArabicWordsCache LIKE :searchTerm ...
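The last suggestion moves the wildcards out of the query and into the parameter value. A minimal sketch of building that value on the Java side; the class and method names are hypothetical, and the sketch assumes the search term itself contains no % or _ characters that would need escaping.

```java
final class LikePatterns {
    /** Hypothetical helper: wraps a raw search term in % wildcards for a "contains" match. */
    static String contains(String term) {
        return "%" + term + "%";
    }

    public static void main(String[] args) {
        // The result would be passed as the :searchTerm parameter of a query using "LIKE :searchTerm".
        System.out.println(contains("o ring"));
    }
}
```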

How to distinguish in quotes delimiter vs out of quotes delimiter

I have a txt file that contains the following
SELECT TOP 20 personid AS "testQu;otes"
FROM myTable
WHERE lname LIKE '%pi%' OR lname LIKE '%m;i%';
SELECT TOP 10 personid AS "testQu;otes"
FROM myTable2
WHERE lname LIKE '%ti%' OR lname LIKE '%h;i%';
............
The queries above can be any legitimate SQL statements, on one line or several (that is, however the user wishes to type them in).
I need to split this txt and put into an array
File file ... blah blah blah
..........................
String myArray [] = text.split(";");
But this does not work properly because it takes ALL ; characters into account. I need to ignore any ; that appears inside "..." or '...'. For example, the ; in '%h;i%' does not count because it is inside ''. How can I split correctly?
Assuming that each ; you want to split on is at the end of a line, you can split on every ; that is followed by a line separator:
text.split(";"+System.lineSeparator())
If your file uses line separators other than the platform default, you can try
text.split(";\n")
text.split(";\r\n")
text.split(";\r")
BTW, if you want to keep the ; in the split result, you can use a look-behind:
text.split("(?<=;)"+System.lineSeparator())
If you are reading the file line by line, just check whether line.endsWith(";").
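The separate split calls above can be folded into one by using \R, the linebreak matcher available in Java 8 and later, which matches \r\n, \n and \r. A minimal sketch, still under the same assumption that every statement-ending ; is the last character on its line; the class and method names are hypothetical.

```java
final class LineEndSplit {
    /** Splits on a ';' that is immediately followed by any kind of line break. */
    static String[] split(String text) {
        return text.split(";\\R");
    }

    public static void main(String[] args) {
        String text = "SELECT 1 AS \"a;b\";\r\nSELECT 2\nFROM t WHERE x LIKE '%h;i%';\n";
        for (String statement : split(text)) {
            System.out.println("[" + statement + "]");
        }
    }
}
```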
I see a new line after each of your ';'. Does that hold for the whole text file?
If you must (or want to) use a regular expression, you could split with a regex of the form
;$
The $ means "end of line". In Java it matches at the end of every line only when the MULTILINE flag is set, for example (?m);$ (without that flag it matches only at the very end of the input).
I would not use a regex for this kind of task. Parsing the text and tracking whether you are inside ' or " quotes is sufficient to recognize the real ; delimiters.
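The quote-tracking idea can be sketched as a single pass over the text. This is only an illustration, assuming that quotes are not backslash-escaped and that the SQL contains no comments; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class SqlStatementSplitter {
    /** Splits on ';' characters that are outside single and double quotes. */
    static List<String> split(String sql) {
        List<String> statements = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        char quote = 0; // 0 means outside quotes, otherwise the quote character that is open
        for (int i = 0; i < sql.length(); i++) {
            char c = sql.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0; // closing quote (a doubled '' simply closes and reopens)
                }
            } else if (c == '\'' || c == '"') {
                quote = c; // opening quote
            } else if (c == ';') {
                addIfNotBlank(statements, current); // real delimiter: end of statement
                current.setLength(0);
                continue;
            }
            current.append(c);
        }
        addIfNotBlank(statements, current); // trailing statement without a final ';'
        return statements;
    }

    private static void addIfNotBlank(List<String> out, StringBuilder sb) {
        String s = sb.toString().trim();
        if (!s.isEmpty()) {
            out.add(s);
        }
    }

    public static void main(String[] args) {
        String text = "SELECT TOP 20 personid AS \"testQu;otes\"\nFROM myTable\n"
                + "WHERE lname LIKE '%pi%' OR lname LIKE '%m;i%';\n"
                + "SELECT TOP 10 personid AS \"testQu;otes\"\nFROM myTable2\n"
                + "WHERE lname LIKE '%ti%' OR lname LIKE '%h;i%';\n";
        for (String statement : split(text)) {
            System.out.println("[" + statement + "]");
        }
    }
}
```

Unlike the line-based splits, this does not depend on where the line breaks fall.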

What is the MySQL SQL REGEX for this regex

Regular regex:
foo(\((\d{1}|\d{2}|\d{3})\))?
This regex works in Java:
foo(\\((\\d{1}|\\d{2}|\\d{3})\\))?
Examples:
fooa //no match
foo(1)a //no match
foo(a) //no match
foo(1) //match
foo(999) //match
foo //match
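For reference, the expected results above correspond to a whole-string match, which in Java is what String.matches performs. A small sketch that reproduces them; the class name is hypothetical. Note that the alternation \d{1}|\d{2}|\d{3} is equivalent to \d{1,3}.

```java
final class FooRegexDemo {
    // The Java form of the regex from the question.
    static final String REGEX = "foo(\\((\\d{1}|\\d{2}|\\d{3})\\))?";

    /** True only if the entire input matches the regex. */
    static boolean matches(String input) {
        return input.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {"fooa", "foo(1)a", "foo(a)", "foo(1)", "foo(999)", "foo"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + matches(sample));
        }
    }
}
```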
MySQL 5.5 documentation (https://dev.mysql.com/doc/refman/5.5/en/regexp.html) says
Note:
Because MySQL uses the C escape syntax in strings (for example, “\n” to
represent the newline character), you must double any “\” that you use
in your REGEXP strings.
I tried as a test running the following on MySQL 5.x
select 'foo' REGEXP 'foo(\\((\\d{1}|\\d{2}|\\d{3})\\))?'
Here is the error message I get:
Error: You have an error in your SQL syntax; check the manual
that corresponds to your MySQL server version for the right syntax to
use near ''foo(\\([(]\\d{1}' at line 1
I looked at Adapting a Regex to work with MySQL and tried the suggestion of replacing \d{1} etc.. with [0-9] which gave me:
select 'foo' REGEXP 'foo(\\(([0-9]|[0-9]|[0-9])\\))?'
But still getting MySQL death.
I don't have a MySQL console at hand to verify, but this should work:
'foo(\\([[:digit:]]{1,3}\\))?'
Your other regexes have nested groups around both (123) and 123. I wasn't sure how well MySQL handles the nested groups (does it even support capturing?), so the inner alternation is replaced with a {1,3} quantifier and only the one group that makes the parenthesized part optional is kept.
Popping in because I ran into this and found the problem/solution.
Go to Global Preferences -> MySQL tab. Under "Use Custom Query Tokenizer" there is a "Procedure/Function Separator." If that is "|", change it to something else (like "/"). This is what causes SQuirreL to fail when parsing the REGEX.

Java Regular Expressions - Matching the First Occurrence of a Pattern

I'm matching URLs against a regular expression, testing if they reflect a "shutdown" command.
Here's a URL that performs a shutdown:
/exec?debug=true&command=shutdown&f=0
Here's another, legitimate but confusing URL that performs shutdown:
/exec?commando=yes&zcommand=34&command=shutdown&p
Now, I must ensure there's only one command=... parameter and it is command=shutdown. Alternatively, I can live with ensuring the first command=... parameter is command=shutdown.
Here's my test for the requested regular expression:
/exec?version=0.4&command=shutdown&out=JSON&zcommand=1
Should match
/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown
Should fail to match
/exec?command=shutdown&out=JSON
Should match
/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown
Should fail to match
Here's my baseline - a regular expression that passes the above tests - all but the last one:
^/exec?(.*\&)*command=shutdown(\&.*)*$
The problem is with the occurrence of more than one command=..., where the first one is not shutdown.
I tried using lookbehind:
^/exec?(.*\&)*(?<!(\&|\?)command=.*)command=shutdown(\&.*)*$
But I'm getting:
Look-behind group does not have an obvious maximum length near index 31
I even tried atomic grouping. To no avail. I can't make the following expression NOT match:
/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown
Can anyone help with a regular expression that passes all the tests?
Clarifications
I see I owe you some context.
My task is to configure a Filter that guards the entrance of all our system’s servlets, and verifies there’s an open HTTP session (in other words: that a successful Login has occurred). The filter also allows configuring which URLs do not require login.
Some exceptions are easy: /login does not need login. Calls to localhost do not need login.
But sometimes it gets complicated. Like the shutdown command that cannot require login while other commands can and should (the strange reason for that is out of the scope of my question).
Since it’s a security matter, I can’t allow users to merely append &command=shutdown to a URL and bypass the filter.
So I really need a regular expression, or otherwise I’ll need to redefine the configuration specs.
You would need to do it in multiple steps:
(1) Find match of ^(?=\/exec\?).*?(?<=[?&])command=([^&]+)
(2) Check if match is shutdown
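In Java, the two steps might look like the following sketch. It uses the pattern from step (1) as a Java string literal; the class and method names are hypothetical, and it only checks the first command=... parameter, which the question says is acceptable.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstCommandCheck {
    // Step 1: lazily skip to the first "command=" that directly follows '?' or '&'.
    private static final Pattern FIRST_COMMAND =
            Pattern.compile("^(?=/exec\\?).*?(?<=[?&])command=([^&]+)");

    /** Step 2: the first command parameter must have the value "shutdown". */
    static boolean firstCommandIsShutdown(String url) {
        Matcher m = FIRST_COMMAND.matcher(url);
        return m.find() && "shutdown".equals(m.group(1));
    }

    public static void main(String[] args) {
        String[] urls = {
            "/exec?version=0.4&command=shutdown&out=JSON&zcommand=1",
            "/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown",
            "/exec?command=shutdown&out=JSON",
            "/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown"
        };
        for (String url : urls) {
            System.out.println(url + " => " + firstCommandIsShutdown(url));
        }
    }
}
```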
Ok. I thank you all for your great answers! I tried some of the suggestions, struggled with others, and all in all I have to agree that even if the right regex exists, it looks terrible, non maintainable, and can serve well as a nasty university exercise, but not in a real system configuration.
I also realize that since a Filter is involved here, and the Filter already parses its own URI, it is absolutely ridiculous to glue back all the URI parts into a string and match it against a regular expression. What was I thinking??
I'll therefore redesign the Filter and its configuration.
Thanks a lot, people! I appreciate the help :)
Noam Rotem.
P.S. - why was I getting a userXXXX nick? Very strange...
This tested (and fully commented) regex solution meets all your requirements:
import java.util.regex.*;
public class TEST {
    public static void main(String[] args) {
        Pattern re = Pattern.compile(
            " # Match URI having command=shutdown query variable value. \n" +
            " ^ # Anchor to start of string. \n" +
            " (?:[^:/?\\#\\s]+:)? # URI scheme (Optional). \n" +
            " (?://[^/?\\#\\s]*)? # URI authority (Optional). \n" +
            " [^?\\#\\s]* # URI path. \n" +
            " \\? # Literal start of URI query. \n" +
            " # Match var=value pairs preceding 'command=xxx'. \n" +
            " (?: # Zero or more 'var=values' \n" +
            " (?!command=) # only if not-'command=xxx'. \n" +
            " [^&\\#\\s]* # Next var=value. \n" +
            " & # var=value separator. \n" +
            " )* # Zero or more 'var=values' \n" +
            " command=shutdown # variable and value to match. \n" +
            " # Match var=value pairs following 'command=shutdown'. \n" +
            " (?: # Zero or more 'var=values' \n" +
            " & # var=value separator. \n" +
            " (?!command=) # only if not-'command=xxx'. \n" +
            " [^&\\#\\s]* # Next var=value. \n" +
            " )* # Zero or more 'var=values' \n" +
            " (?:\\#\\S*)? # URI fragment (Optional). \n" +
            " $ # Anchor to end of string.",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.COMMENTS);
        String s = "/exec?version=0.4&command=shutdown&out=JSON&zcommand=1";
        // Should match
        // String s = "/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown";
        // Should fail to match
        // String s = "/exec?command=shutdown&out=JSON";
        // Should match
        // String s = "/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown";
        // Should fail to match
        Matcher m = re.matcher(s);
        if (m.find()) {
            // Successful match
            System.out.print("Match found.\n");
        } else {
            // Match attempt failed
            System.out.print("No match found.\n");
        }
    }
}
The above regex matches any RFC3986 valid URI having any scheme, authority, path, query or fragment components, but it must have one (and only one) query "command" variable whose value must be exactly, but case insensitively: "shutdown".
A carefully crafted complex regex is perfectly fine (and maintainable) to use when written with proper indentation and commented steps (like shown above). (For more information on using regex to validate a URI, see my article: Regular Expression URI Validation)
If you can live with just accepting the first match, you could use \\Wcommand=([^&]+) and fetch the first group.
Otherwise, you could call Matcher.find twice to test for subsequent matches and then use the first match. Why do you want to do this with a single complex regex?
I am not a Java coder, but try this one (works in Perl) >>
^(?=\/exec\?)(?:[^&]+(?<![?&]command)=[^&]+&)*(?<=[?&])command=shutdown(?:&|$)
To match the first occurrence of command=shutdown use this:
Pattern.compile("^((?!command=).)+command=shutdown.*$");
The results will look like this:
"/exec?version=0.4&command=shutdown&out=JSON&zcommand=1" => true
"/exec?command=shutdown&out=JSON" => true
"/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown" => false
"/exec?commando=yes&zcommand=34&command=shutdown&p" => false
If you want to match strings that ONLY contain one 'command=' use this:
Pattern.compile("^((?!command=).)+command=shutdown((?!command=).)+$");
Please note that regular expressions are not really designed for this kind of negative ("not") matching, so performance might not be the best.
If this can be done with a single regular expression, and it may well be possible, it will be so complex as to be unreadable, and thus unmaintainable, because the intent of the logic will be lost. Even if it is "documented", it will still be much less obvious to someone who just knows Java.
A much better approach would be to use the URI object to parse the entire thing, domain and all, pull off the query parameters, and then write a simple loop that walks through them and decides, based on your business logic, what is a shutdown and what isn't. Then it will be simple, self-documenting and probably more efficient (not that that should be a concern).
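A minimal sketch of that approach with java.net.URI follows. The class and method names are hypothetical, and the rule it implements (exactly one command parameter, with the value shutdown) is taken from the question. The sketch compares raw, undecoded names and values, which is an assumption; inside a servlet filter it would probably be safer to rely on the container's decoded parameters, for example ServletRequest.getParameterValues("command").

```java
import java.net.URI;

final class ShutdownCheck {
    /** True if the query has exactly one "command" parameter and its value is "shutdown". */
    static boolean isShutdownOnly(String url) {
        String query = URI.create(url).getRawQuery();
        if (query == null) {
            return false; // no query string at all
        }
        int commandCount = 0;
        boolean shutdown = false;
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            if (name.equals("command")) {
                commandCount++;
                shutdown = value.equals("shutdown");
            }
        }
        return commandCount == 1 && shutdown;
    }

    public static void main(String[] args) {
        String[] urls = {
            "/exec?version=0.4&command=shutdown&out=JSON&zcommand=1",
            "/exec?version=0.4&command=startup&out=JSON&zcommand=1&commando=shutdown",
            "/exec?command=shutdown&out=JSON",
            "/exec?version=0.4&command=admin&out=JSON&zcommand=1&command=shutdown"
        };
        for (String url : urls) {
            System.out.println(url + " => " + isShutdownOnly(url));
        }
    }
}
```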
Try this:
Pattern p = Pattern.compile(
"^/exec\\?(?:(?:(?!\\1)command=shutdown()|(?!command=)\\w+(?:=[^&]+)?)(?:&|$))+$\\1");
Or a little more readably:
^/exec\?
(?:
(?:
(?!\1)command=shutdown()
|
(?!command=)\w+(?:=[^&]+)?
)
(?:&|$)
)+$
\1
The main body of the regex is an alternation that matches either a shutdown command or a parameter whose name is not command. If it does match a shutdown command, the empty group in that branch "captures" an empty string. It doesn't need to consume anything, because we're only using it as a checkbox, confirming en passant that one of the parameters was a shutdown command.
The negative lookahead - (?!\1) - prevents it from matching two or more shutdown commands. I don't know if that's really necessary, but it's a good opportunity to demonstrate (1) how to negate a "back-assertion", and (2) that a backreference can appear before the group it refers to in certain circumstances (what's known as a forward reference).
When the whole URL has been consumed, the backreference (\1) acts like a zero-width assertion. If one of the parameters was command=shutdown, the backreference will succeed. Otherwise it will fail even though it's only trying to match an empty string, because the group it refers to didn't participate in the match.
But I have to concur with the other responders: when your regexes get this complicated, you should be thinking seriously about switching to a different approach.
EDIT: It works for me. Here's the demo.
