Mongo Query Parser Regex - java

I want to extract the parts of a Mongo query (collection name, find, sort, limit) by splitting it on dots (.) with a regex.
Input:
db.metrics.find(
{
"brand_name":"Apple",
"job_status.status":"SUCCESS",
'host.user':'root',
"current_time":{$gt:new Date(Date.now() - 3*60*60 * 1000)}
}
).sort({"current_time" : -1}).limit(10)
With the help of 2-3 Stack Overflow answers I have built the regex below:
regex = `\.(?=(([^']*'){2})*[^']*$)(?=(([^\"]*\"){2})*[^\"]*$)(?![^()]*\\)`
It solves my use case to a certain extent, but I am not able to ignore a dot (.) that sits inside the curly braces, as in Date.now().
I need a regex that ignores the .now() part of the query above.
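A single regex has trouble here because it cannot track how deeply the dot is nested. One possible alternative, which is a small scanner rather than a regex, is to walk the string once and split only on dots that are outside quotes and outside any (), {} or []. This is only a sketch: the class and method names are made up for illustration, and it assumes quotes and brackets in the query are balanced.

```java
import java.util.ArrayList;
import java.util.List;

class MongoQuerySplitter {
    /** Splits on dots that are outside quotes and outside (), {} and []. */
    static List<String> splitTopLevel(String query) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int depth = 0;   // nesting level of (), {} and []
        char quote = 0;  // active quote character, or 0 when outside a string
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (quote != 0) {
                if (c == '\\' && i + 1 < query.length()) {
                    current.append(c);          // keep the backslash
                    c = query.charAt(++i);      // and take the escaped char as-is
                } else if (c == quote) {
                    quote = 0;                  // closing quote
                }
            } else if (c == '"' || c == '\'') {
                quote = c;                      // opening quote
            } else if (c == '(' || c == '{' || c == '[') {
                depth++;
            } else if (c == ')' || c == '}' || c == ']') {
                depth--;
            } else if (c == '.' && depth == 0) {
                parts.add(current.toString().trim());
                current.setLength(0);
                continue;                       // drop the separator itself
            }
            current.append(c);
        }
        parts.add(current.toString().trim());
        return parts;
    }

    public static void main(String[] args) {
        String query = "db.metrics.find({'host.user':'root',"
                + "\"current_time\":{$gt:new Date(Date.now() - 3*60*60 * 1000)}})"
                + ".sort({\"current_time\" : -1}).limit(10)";
        for (String part : splitTopLevel(query)) {
            System.out.println(part);
        }
    }
}
```

With the query from the question this should yield five parts (db, metrics, the whole find(...) call, the sort(...) call and limit(10)), with Date.now() left untouched inside find(...).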

Related

Unable to capture next line character in Java

I need to parse a Python file that contains multiple SQL queries and get the start and end positions of each query, so that I can extract only the query part, using Java.
I am using the .contains function to check for sql(''' as the opening of a query and ''') as its closing, but in some cases ''') appears in the middle of the query when a variable is involved, and that should not be detected as the end of the query.
Something like this :
spark.sql(''' SELECT .......
FROM.....
WHERE xxx IN ('''+ Variable +''')
''')
Here the second-to-last line is also detected as the end of the query if I use line.contains(" ''') "), which is wrong.
All I can think of is to check for a newline character as the end of the query, since queries are separated by two empty lines. So I tried if (line.contains(" ''')\n")) and if (line.contains(" ''')\r\n")), but neither works for me.
Kindly let me know if there is another way to do this.
Note that I do not have the privilege to change the query file.
Thanks
I believe simple contains won't solve this problem.
You will have to use Pattern if you are looking to match \n.
String query = "spark.sql(''' SELECT .......\n" +
"FROM..... \n" +
"WHERE xxx IN ('''+ Variable +''')\n" +
"''')";
Pattern pattern = Pattern.compile("^spark\\.sql\\('''(.*)'''\\)$", Pattern.DOTALL);
System.out.println(pattern.matcher(query).find());
Output:
true
Pattern.DOTALL tells Java to allow the dot to match newline characters, too.
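To get the start and end positions the question asks for, a capturing group plus Matcher.start and Matcher.end can be used. The sketch below reads the whole file content as one string and relies on an assumption taken from the question: the real closing ''') is followed by a blank line or by the end of the input, while a ''') inside a query is followed directly by more query text. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SparkSqlExtractor {
    // Assumption: a query ends at the first ''') that is followed by a blank
    // line or by the end of the input. The body is matched lazily so that
    // each query stops at its own closing delimiter.
    private static final Pattern QUERY = Pattern.compile(
            "spark\\.sql\\('''(.*?)'''\\)(?=[ \\t]*(?:\\R[ \\t]*\\R|\\R?\\z))",
            Pattern.DOTALL);

    /** Returns {start, end} offsets of each query body (text between the triple quotes). */
    static List<int[]> findQueries(String source) {
        List<int[]> spans = new ArrayList<>();
        Matcher m = QUERY.matcher(source);
        while (m.find()) {
            spans.add(new int[] {m.start(1), m.end(1)});
        }
        return spans;
    }

    public static void main(String[] args) {
        String source = "spark.sql(''' SELECT a\n"
                + "FROM t\n"
                + "WHERE x IN ('''+ Variable +''')\n"
                + "''')\n\n\n"
                + "spark.sql(''' SELECT b ''')\n";
        for (int[] span : findQueries(source)) {
            System.out.println(span[0] + ".." + span[1] + ": "
                    + source.substring(span[0], span[1]).trim());
        }
    }
}
```

If queries in the real file are not separated by blank lines, the lookahead would need to be adapted.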

Java regex: need one regex to match all the formats specified

A log file has these patterns appearing more than once in a line.
For example, the file may look like this:
dsads utc-hour_of_year:2013-07-30T17 jdshkdsjhf utc-week_of_year:2013-W31 dskjdskf
utc-week_of_year:2013-W31 dskdsld fdsfd
dshdskhkds utc-month_of_year:2013-07 gfdkjlkdf
I want to replace all date specific info with "Y"
I tried :
replaceAll("_year:.*\\s", "_year:Y ");
but it removes everything that occurs after the first replacement, due to the greedy match of .*:
dsads utc-hour_of_year:Y
utc-week_of_year:Y
dshdskhkds utc-month_of_year:Y
but the expected result is:
dsads utc-hour_of_year:Y jdshkdsjhf utc-week_of_year:Y dskjdskf
utc-week_of_year:Y dskdsld fdsfd
dshdskhkds utc-month_of_year:Y gfdkjlkdf
Try using a reluctant quantifier: _year:.*?\s.
.replaceAll("_year:.*?\\s", "_year:Y ")
System.out
.println("utc-hour_of_year:2013-07-30T17 dsfsdgfsgf utc-week_of_year:2013-W31 dsfsdgfsdgf"
.replaceAll("_year:.*?\\s", "_year:Y "));
utc-hour_of_year:Y dsfsdgfsgf utc-week_of_year:Y dsfsdgfsdgf
I am not sure what you are really trying to do, so this answer is based only on your example. If you want something else, leave a comment below or edit your question with more specific information or examples.
It removes everything after _year: because you are using .*\\s, which means
.* zero or more of any character (except line terminators),
\\s followed by a whitespace character,
so in the sentence
utc-hour_of_year:2013-07-30T17 dsfsdgfsgf utc-week_of_year:2013-W31 dsfsdgfsdgf
it will match the span from the first _year: through the last space, that is
_year:2013-07-30T17 dsfsdgfsgf utc-week_of_year:2013-W31
plus the space that follows it,
because by default the * quantifier is greedy. To make it reluctant you need to add ? after *, so try
"_year:.*?\\s"
or, even better, instead of .*? match only non-whitespace characters using \\S, which is the negation of \\s and can also be written as [^\\s]. Also, if your data can appear at the end of your input, you probably shouldn't add \\s at the end of your regex or a space in your replacement, so try one of these:
.replaceAll("_year:\\S*", "_year:Y")
.replaceAll("_year:\\S*\\s", "_year:Y ")
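As a quick check of the \\S* variant, here is a minimal sketch that applies it to lines from the question, including one where the date is the last token on the line. The class and method names are only for illustration.

```java
class YearMasker {
    /** Replaces the non-whitespace run after "_year:" with Y. */
    static String mask(String line) {
        return line.replaceAll("_year:\\S*", "_year:Y");
    }

    public static void main(String[] args) {
        System.out.println(mask(
                "dsads utc-hour_of_year:2013-07-30T17 jdshkdsjhf utc-week_of_year:2013-W31 dskjdskf"));
        // The date is at the very end here, so a trailing \\s would not match.
        System.out.println(mask("dshdskhkds utc-month_of_year:2013-07"));
    }
}
```

Because \\S* stops at the first whitespace character, each date is replaced on its own and the rest of the line is kept.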

Regex to find hostname from Jdbc url

I am new to regex. I would like to retrieve the hostname from a PostgreSQL JDBC URL using regex.
Assume the postgreSQL url will be jdbc:postgresql://production:5432/dbname. I need to retrieve "production", which is the hostname. I want to try with regex and not with Java split function. I tried with
Pattern PortFinderPattern = Pattern.compile("[//](.*):*");
final Matcher match = PortFinderPattern.matcher(url);
if (match.find()) {
System.out.println(match.group(1));
}
But it's matching all the string from hostname till the end.
Pattern PortFinderPattern = Pattern.compile(".*://([^:]+).*");
regex without grouping :
"(?<=//)[^:]*"
[//]([\\w\\d\\-\\.]+):
Should be enough to find it reliably. Though this is probably a better regex:
The Hostname Regex
There are some errors in your regex:
[//] - This is only one character, because the [] marks a character class, so it will not fully match //. To match it, you need to write it like this: [/][/] or \/\/.
(.*) - This will match all characters to the end of line. You need to be more specific if you want to go till a certain character. For example you could go to the colon by fetching all characters, which are not colons, like this: ([^:]*).
:* - This matches zero or more colons, which makes the colon optional. I guess you forgot to put a dot (any character) after the colon, like this: :.*.
So here is your regex corrected: \/\/([^:]*):.*.
Hope this helps.
BTW, if the port number (:5432) after production is optional, then I suggest the following regex, which keeps both / and : out of the hostname group so that the port is not captured with it:
\/\/([^/:]*)(?::\d+)?\/
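A minimal Java sketch of the optional-port idea follows. It uses a variant that excludes both / and : from the hostname group and also accepts a URL that ends right after the host; the class and method names are hypothetical, and it only targets URLs of the jdbc:vendor://host form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class JdbcHostFinder {
    // Host: everything after "//" up to the first ':' or '/'.
    // The port is optional, and the URL may end right after host or port.
    private static final Pattern HOST =
            Pattern.compile("//([^/:]+)(?::\\d+)?(?:/|$)");

    /** Returns the hostname, or null when the URL has no //host part. */
    static String host(String url) {
        Matcher m = HOST.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(host("jdbc:postgresql://production:5432/dbname"));
        System.out.println(host("jdbc:postgresql://production/dbname"));
        System.out.println(host("jdbc:postgresql://production"));
    }
}
```

Note that in a Java string literal the slashes need no escaping, and the backslash in \d has to be doubled.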
To also capture Oracle and MySQL JDBC URL variants with their quirks (e.g. Oracle allowing @ instead of // or even @//), I use this regexp to get the hostname: [/@]+([^:/@]+)([:/]+|$). The hostname is then in group 1.
Code e.g.
String jdbcURL = "jdbc:oracle:thin:@//hostname:1521/service.domain.local";
Pattern hostFinderPattern = Pattern.compile("[/@]+([^:/@]+)([:/]+|$)");
final Matcher match = hostFinderPattern.matcher(jdbcURL);
if (match.find()) {
System.out.println(match.group(1));
}
This works for all these URLs (and other variants):
jdbc:oracle:thin:@//hostname:1521/service.domain.local
jdbc:oracle:thin:@hostname:1521/service.domain.local
jdbc:oracle:thin:@hostname/service.domain.local
jdbc:mysql://localhost:3306/sakila?profileSQL=true
jdbc:postgresql://production:5432/dbname
jdbc:postgresql://production/
jdbc:postgresql://production
This assumes that
The hostname is after // or @ or a combination thereof (single / would also work, but I don't think JDBC allows that).
After the hostname either : or / or the end of the string follows.
Note that the + quantifiers are greedy; this is especially important for the middle one.

Regex backreference when string section excluded

I have a regular expression I am trying to use to rewrite an incoming REST url and am getting stuck on one use case when one section of the URL is excluded.
Here is the regex I'm currently using:
^(/[^/]+/(?:books))/([^/]+?)(?:/(?:(?!page).+?))?(?:/page/(\\d+))?$
As an example, I'm using "$1 - $2 - $3" as the parts to use in writing the new URL.
Here are the examples that are working correctly...
"/mySite/books/topic1/page/2" results in "/mySite/books - topic1 - 2"
"/mySite/books/topic1/subtopic1/page/2" results in "/mySite/books - topic1 - 2"
All the above work as intended. The problem is when the URL excludes the "topic1" part of the URL then the results are not what I need. Example:
"/mySite/books/page/2" results in "/mySite/books - page - "
What I need is the $2 to be blank, because there is no topic, and the page number still as $3. What I need as output...
"/mySite/books/page/2" results in "/mySite/books - - 2"
What can I change in my regex to satisfy that scenario without disrupting the existing ones that work correctly? This is being done in Java.
You might try to use regex pattern
^(/[^/]+/books)/(?:(?!page/)([^/]+)/)?page/(\\d+)$
It should suffice to make your second group ungreedy. Then the engine will first try to find a match without using it (trying only /page/\\d+ instead). And if that fails it tries to include the second group:
^(/[^/]+/(?:books))/([^/]+?)(?:/(?:(?!page).+?))??(?:/page/(\\d+))?$
Appending ? to any kind of quantifier (+, *, ? or {..}) makes it ungreedy.
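For comparison, here is one more possible pattern, offered only as a sketch and not taken from either answer above. It makes the topic segment optional, refuses to treat page/ as a topic through a negative lookahead, and skips any further subtopic segments before the optional /page/N tail. The class and method names are made up. In Java, a group that did not take part in the match is replaced by an empty string in the replacement.

```java
import java.util.regex.Pattern;

class BookUrlRewriter {
    private static final Pattern URL = Pattern.compile(
            "^(/[^/]+/books)"            // $1: site and fixed "books" segment
            + "(?:/(?!page/)([^/]+))?"   // $2: optional topic, but never "page/"
            + "(?:/(?!page/)[^/]+)*"     // any subtopics, not captured
            + "(?:/page/(\\d+))?$");     // $3: optional page number

    static String rewrite(String url) {
        return URL.matcher(url).replaceAll("$1 - $2 - $3");
    }

    public static void main(String[] args) {
        System.out.println(rewrite("/mySite/books/topic1/page/2"));
        System.out.println(rewrite("/mySite/books/topic1/subtopic1/page/2"));
        System.out.println(rewrite("/mySite/books/page/2"));
    }
}
```

For /mySite/books/page/2 the second group stays empty while the page number still lands in $3, so the literal separators around the empty $2 leave two consecutive spaces in the output.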

match a string of characters between tags:

I have the following strings:
<PAUL SAINT-KARL 1997-05-07>
<BOB DEAN 2001-05-07>
<GUY JEDDY 2007-05-07>
I want a Java regex that matches this "name and date" pattern and then extracts the name and the date separately.
I am able to match them separately with the following Java regexes:
1) (\d{4}-\d{2}-\d{2})>
2) <([ A-Z&#;0-9-]*+)
What I'm looking for is one regex that would identify the full text pattern as provided, and then extract the subsections, such as the actual name, and the date.
I'm looking to use Matcher.group() to retrieve the complete match from the target string.
Thanks
Try this:
"<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>"
I changed the *+ to *? to make the * match lazily.
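The reason the possessive *+ cannot be reused in a combined pattern is that the character class also contains the space, digits and hyphen, so it swallows the date as well and never gives it back. A small sketch of the lazy version with Matcher.group follows; the class and method names are only illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagParser {
    // Group 1: name (lazy, so it stops before the final space and date).
    // Group 2: date in yyyy-MM-dd form.
    private static final Pattern TAG =
            Pattern.compile("<([ A-Z&#;0-9-]*?) (\\d{4}-\\d{2}-\\d{2})>");

    /** Returns {name, date}, or null when the input contains no such tag. */
    static String[] parse(String input) {
        Matcher m = TAG.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] inputs = {
            "<PAUL SAINT-KARL 1997-05-07>", "<BOB DEAN 2001-05-07>", "<GUY JEDDY 2007-05-07>"
        };
        for (String input : inputs) {
            String[] parts = parse(input);
            System.out.println(parts[0] + " | " + parts[1]);
        }
    }
}
```

Matcher.group() without an argument (or group(0)) returns the complete match, including the angle brackets.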
