Searching Strings containing a regex in createCriteria Method - java

I'm using Grails for my web app project. I know the createCriteria method can search existing entries in the database. Let's say I have a domain "some_domain" which includes a string variable "domain_string". I want to find all "domain_string" values that contain either a 7-digit or 10-digit number starting with "1" or "7" (e.g. domain_string1 = ".........1234567.......", domain_string2 = ".......7192839265......", etc.).
In my code:
some_domain.createCriteria().list() {
rlike("domain_string", "%/^(1|7){7,10}/%")
}
I've used a Java regex here, and the Grails docs tell me that rlike takes a regex. But I can't get the expected output from this code because I'm not familiar with the Groovy syntax. Any suggestions? Thanks a lot in advance.

You can use
rlike("domain_string", /([^0-9]|^)[17][0-9]{6}([0-9]{3})?([^0-9]|$)/)
See the regex demo.
Details:
([^0-9]|^) - either a non-digit char or start of string
[17] - 1 or 7
[0-9]{6} - any six digits
([0-9]{3})? - an optional occurrence of three digits
([^0-9]|$) - either a non-digit char or end of string.
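To sanity-check the pattern outside the database, here is a minimal Java sketch. Java is used instead of Groovy, but Groovy's slashy /.../ strings are compiled by the same java.util.regex engine. It calls Matcher.find(), which looks for the pattern anywhere in the value; whether rlike behaves the same way depends on your database's regex dialect. The class name, method name and the 8-digit sample are hypothetical.

```java
import java.util.regex.Pattern;

class DigitRunCheck {
    // The answer's pattern: a run of exactly 7 or 10 digits starting with 1 or 7,
    // bounded on each side by a non-digit or by the start/end of the string.
    static final Pattern SEVEN_OR_TEN =
            Pattern.compile("([^0-9]|^)[17][0-9]{6}([0-9]{3})?([^0-9]|$)");

    // find() searches for the pattern anywhere in the string (substring search).
    static boolean containsRun(String s) {
        return SEVEN_OR_TEN.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            ".........1234567.......",  // 7 digits starting with 1
            ".......7192839265......",  // 10 digits starting with 7
            ".......3192839265......",  // starts with 3
            ".......12345678......."    // 8 digits: neither 7 nor 10
        };
        for (String s : samples) {
            System.out.println(s + " -> " + containsRun(s));
        }
    }
}
```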

In Groovy, using native Java regex rules, it would look like this:
def RE = /\D*[17](\d{6}|\d{9})\D*/
def domain_strings = [ ".........1234567.......", ".......7192839265......", ".......3192839265......", ".......4192839265......" ]
domain_strings.each{
boolean match = it ==~ RE
println "$it matches? -> $match"
}
prints:
.........1234567....... matches? -> true
.......7192839265...... matches? -> true
.......3192839265...... matches? -> false
.......4192839265...... matches? -> false
You should check whether your database's SQL dialect can consume such expressions as-is (not every database regex flavor supports \d and \D).

Related

Regular expression: Replace everything before first occurrence

I have the following regular expression that I'm using to remove the dev. part of my URL.
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll(".*\\.(?=.*\\.)", ""));
This outputs mydomain.com, but it causes problems when the domains look like dev.mydomain.com.pe or dev.mydomain.com.uk: in those cases I get only the com.pe and com.uk parts.
Is there a modifier I can use on my regex to make sure it only removes what is before the first . (dot included)?
Desired output:
dev.mydomain.com -> mydomain.com
stage.mydomain.com.pe -> mydomain.com.pe
test.mydomain.com.uk -> mydomain.com.uk
You may use
^[^.]+\.(?=.*\.)
See the regex demo.
Details
^ - start of string
[^.]+ - 1 or more chars other than dots
\. - a dot
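(?=.*\.) - a positive lookahead requiring that the dot be followed by any 0 or more chars other than line break chars, as many as possible, and then a literal . (so at least one more dot must remain after the removed part)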
Java usage example:
String result = domain.replaceFirst("^[^.]+\\.(?=.*\\.)", "");
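A minimal, self-contained Java sketch that applies this replaceFirst call to the sample domains from the question, plus a bare mydomain.com (an extra input added here for illustration). The class and method names are hypothetical.

```java
import java.util.List;

class StripSubdomain {
    // Remove the first label and its dot, but only if another dot follows later,
    // so a bare "mydomain.com" is left untouched.
    static String stripFirstLabel(String domain) {
        return domain.replaceFirst("^[^.]+\\.(?=.*\\.)", "");
    }

    public static void main(String[] args) {
        for (String d : List.of("dev.mydomain.com", "stage.mydomain.com.pe",
                                "test.mydomain.com.uk", "mydomain.com")) {
            System.out.println(d + " -> " + stripFirstLabel(d));
        }
    }
}
```

Because the regex only counts dots, it cannot tell a subdomain from a registrable domain: an input such as mydomain.com.uk (with no subdomain) would also lose its first label and become com.uk.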
The following regex will work for you. It matches the first part (if it exists), captures the rest of the string as the 2nd group, and replaces the whole string with the 2nd group. .*? is a non-greedy (lazy) quantifier, so it matches only up to the first dot character.
(.*?\.)?(.*\..*)
Regex Demo
sample code:
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "stage.mydomain.com.pe";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "test.mydomain.com.uk";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
output:
mydomain.com
mydomain.com.pe
mydomain.com.uk
mydomain.com

How to check whether a regex match starts and ends with certain characters

I have a regex that captures strings between double quotes that do not start or end with /.
But it does not give the result I want.
The regex should satisfy these conditions:
Condition 1. Capture text between two double or single quotes.
Condition 2. But don't capture it if it starts with [ and ends with ].
Condition 3. But a quote preceded by / (that is, /" or /') must not be treated as the start or end of the captured text.
Example:
REGEX: \"(\/?.)*?\"
Input: Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])
Captured output:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
5. "test"
6. "in"
Expected result:
1. "test"
2. "m2m:cin.as"
3. "payloads_ul.test"
4. [/"Dimming Value/"]
Condition 1 explanation:
Capture the text between double or single quotes.
example:
input : "test","m2m:cin.as"
output: "test","m2m:cin.as"
Condition 2 explanation:
If the quoted text is enclosed in [ and ], i.e. the opening quote is preceded by [ and the closing quote is followed by ], it should not be captured, even though it is between double or single quotes.
example:
input: ["test"]
output: it should not capture
Condition 3 explanation:
In the expected result above, the input "[/"Dimming Value/"]" contains two more double quotes inside it, but each is preceded by /, so it is not treated as the start or end of a match and the whole string is captured once. So, the output is [/"Dimming Value/"]. I want the same for /' (a single quote preceded by /).
Note:
For the input "[/"Dimming Value/"]" or '[/'Dimming Value/']', although the text is between double or single quotes and contains [ and ], it should not be ignored, because the [ and ] are inside the quotes. The output should be [/"Dimming Value/"].
As I understand it, you want to capture text between double quotes, except:
if the opening double quote is preceded by [ or the closing double quote is followed by ]
double quotes preceded by / should not be the beginning or end of the matched text
I don't know whether you also want to capture text between single quotes, because your text is not completely clear.
To exclude matches based on the characters before them, you need a negative lookbehind, with the syntax (?<!prefix that you don't want). Java supports lookbehind as long as its maximum length is bounded, and JavaScript supports it since ES2018, but older JavaScript engines do not.
The best regex that I built to return what you want for your example (you can check it with the PHP or Python flavor on regex101.com or a similar site) is:
(?<![\[/])\"(?!\])(\/?.)*?\"(?![\]/])
I added the restriction that the opening double quote must not be followed by ], to prevent matching "][" in the text ["test"]["in"].
Anyway, this will not help if your regex engine lacks lookbehind support (for example, an older JavaScript engine).
Do you have any way to post-process the results and exclude the bad matches?
If so, you can also match the bad prefix and suffix and then exclude those matches from the results:
[\[]?\"(\/?.)*?\"[\]]?
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
["test"]
["in"]
Full JavaScript code, including post-processing:
'Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson("test"), "m2m:cin.as"),"payloads_ul.test"),"[/"Dimming Value/"]",input["test"]["in"])'
.match(/[\[]?\"(\/?.)*?\"[\]]?/g).filter(s => !s.startsWith('[') && !s.endsWith(']'))
this will return:
"test"
"m2m:cin.as"
"payloads_ul.test"
"[/"Dimming Value/"]"
EDIT:
Equivalent Java code:
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
CharSequence yourStringHere = "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath(Functions.unescapeJson(\"test\"), \"m2m:cin.as\"),\"payloads_ul.test\"),\"[/\"Dimming Value/\"]\",input[\"test\"][\"in\"])";
List<String> allMatches = new ArrayList<>();
Matcher m = Pattern.compile("[\\[]?\\\"(\\/?.)*?\\\"[\\]]?")
.matcher(yourStringHere);
while (m.find()) {
String s = m.group();
// drop the bracketed matches such as ["test"] and ["in"]
if (!s.startsWith("[") && !s.endsWith("]")) {
allMatches.add(s);
}
}
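For what it's worth, java.util.regex accepts lookbehind whose maximum length is bounded, and the single-character lookbehind (?<![\[/]) qualifies, so the lookbehind pattern from this answer can also be tried in Java without any post-filtering. A minimal sketch; the class and method names are hypothetical, and the expected matches in the comment were traced for this one input only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedTextDemo {
    // (?<![\[/])  the opening quote is not preceded by [ or /
    // "(?!\])     the opening quote is not immediately followed by ]
    // (/?.)*?     lazily consume chars, preferring to take /" or /' as one unit
    // "(?![\]/])  the closing quote is not followed by ] or /
    static final Pattern QUOTED =
            Pattern.compile("(?<![\\[/])\"(?!\\])(/?.)*?\"(?![\\]/])");

    static final String INPUT =
            "Functions.getJsonPath(Functions.getJsonPath(Functions.getJsonPath("
            + "Functions.unescapeJson(\"test\"), \"m2m:cin.as\"),\"payloads_ul.test\"),"
            + "\"[/\"Dimming Value/\"]\",input[\"test\"][\"in\"])";

    static List<String> findQuoted(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Expected: "test", "m2m:cin.as", "payloads_ul.test", "[/"Dimming Value/"]"
        findQuoted(INPUT).forEach(System.out::println);
    }
}
```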

Matching groups with lookahead expression

I have a problem with matching groups that contain lookahead expressions. I don't know why this expression doesn't work:
"""((?<=^)(.*)(?=\s\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s%))((?<=[\w:]\s)(\w+)(?=\s[cr]))"""
When I compile the parts separately, for example:
"""(?<=^)(.*)(?=\s\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s%)"""
I get the correct result.
My sample text:
May 5 23:00:01 10.14.3.10 %ASA-6-302015: Built inbound UDP connection
Expressions have been checked with this tool: http://regex-testdrive.com/en/dotest
My Scala code:
import scala.util.matching.Regex
val text = "May 5 23:00:01 10.14.3.10 %ASA-6-302015: Built inbound UDP connection"
val regex = new Regex("""((?<=^)(.*)(?=\s\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s%))((?<=[\w:]\s)(\w+)(?=\s[cr]))""")
val result = regex.findAllIn(text)
Does anyone know a solution to this problem?
Multiple matching
You may fix the pattern as
^.*?(?=\s\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s%)|(?<=[\w:]\s)\w+(?=\s[cr])
See the regex demo. The main point is to introduce the | alternation operator to match either of the 2 subpatterns. Note that you do not need to put the ^ start-of-string anchor into a lookbehind, as ^ is already a zero-width assertion. Also, there are many groups that you do not seem to use anyway. Also, to match a literal dot you need to escape it (. -> \.).
To obtain the multiple matches, you may use the following code snippet:
val text = "May 5 23:00:01 10.14.3.10 %ASA-6-302015: Built inbound UDP connection"
val regex = """^.*?(?=\s\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s%)|(?<=[\w:]\s)\w+(?=\s[cr])""".r
val result = regex.findAllIn(text)
result.foreach { x => println(x) }
// => May 5 23:00:01
// UDP
See the Scala online demo.
Note that when a pattern is used with .findAllIn, it is not anchored by default, so you will get all the matches in the input string.
Capturing groups
Another approach you may use is matching the whole line while capturing the necessary bits with capturing groups:
val text = "May 5 23:00:01 10.14.3.10 %ASA-6-302015: Built inbound UDP connection"
val regex = """^(.*?)\s+\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s%.*[\w:]\s+(\w+)\s+[cr].*""".r
val results = text match {
case regex(date, protocol) => Array(date, protocol)
case _ => Array[String]()
}
// Demo printing
results.foreach { m =>
println(m)
}
See another Scala demo. Since a match expression requires a full-string match, .* is added at the end of the pattern, and only the groups you need are kept as capturing (...) pairs. See the regex demo here.
Your matches are not next to each other,
so try this:
"""((?<=^)(.*)(?=\s\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}\s%)).*((?<=[\w:]\s)(\w+)(?=\s[cr]))"""
I just added .* between them; it works on the link you sent :)
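Both fixes come down to lookarounds being zero-width: in the original pattern the second group has to begin exactly where the first one ends, and that position is preceded by "01" rather than by a word character plus whitespace, so (?<=[\w:]\s) fails there. scala.util.matching.Regex is built on java.util.regex.Pattern, so a minimal Java sketch behaves the same way; it compares the original concatenated pattern with the alternation fix. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    static final String TEXT =
            "May 5 23:00:01 10.14.3.10 %ASA-6-302015: Built inbound UDP connection";

    // Original: the two lookaround-guarded groups are concatenated, so the
    // second one must start exactly where the first one ends.
    static final String CONCATENATED =
            "((?<=^)(.*)(?=\\s\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\s%))"
            + "((?<=[\\w:]\\s)(\\w+)(?=\\s[cr]))";

    // Fix: the same two subpatterns joined with the | alternation operator.
    static final String ALTERNATION =
            "^.*?(?=\\s\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\s%)"
            + "|(?<=[\\w:]\\s)\\w+(?=\\s[cr])";

    // Collect every non-overlapping match, like Scala's findAllIn.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll(CONCATENATED, TEXT)); // []
        System.out.println(findAll(ALTERNATION, TEXT));  // [May 5 23:00:01, UDP]
    }
}
```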

Java Regular expression not of some character followed by a word or starting with word

I would like to find a pattern where [^#$%_-]COMMENT should be true and string "COMMENT" should also be true.
Case 1 : #COMMENT true
Case 2 : #COMMENT false
Case 3 : COMMENT true
For Case 3 I am getting false.
My regular expression is [^#\'\"$%_-|]COMMENT
Try a lookbehind assertion: (?<![#$%'_"-])COMMENT
As a Java string literal: "(?<![#$%'_\"-])COMMENT"
If you actually want to match the preceding character as well as COMMENT,
it would be this: \S?(?<![#$%'_"-])COMMENT
As a Java string literal: "\\S?(?<![#$%'_\"-])COMMENT"
Load up the class with the characters that are not allowed.
When you use the class inside a negative assertion, you no longer need the negated class [^...], because the negative assertion already expresses "not allowed".
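A minimal Java sketch of the lookbehind version, using find() so COMMENT may appear anywhere in the string. The class name and the extra sample inputs xCOMMENT and _COMMENT are made up for illustration. Because the character before COMMENT is checked by the lookbehind instead of being consumed, a string that begins with COMMENT also matches, which is what Case 3 needs.

```java
import java.util.regex.Pattern;

class CommentLookbehindDemo {
    // COMMENT must not be immediately preceded by # $ % ' _ " or -.
    // At the very start of the string there is no preceding char, so the
    // negative lookbehind succeeds and a bare "COMMENT" is found.
    static final Pattern NOT_PREFIXED = Pattern.compile("(?<![#$%'_\"-])COMMENT");

    static boolean hasComment(String s) {
        return NOT_PREFIXED.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"#COMMENT", "COMMENT", "xCOMMENT", "_COMMENT"}) {
            System.out.println(s + " -> " + hasComment(s));
        }
    }
}
```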
You can use this.
^[^#$%-]?COMMENT$
[^#$%-]? -> an optional single character that is none of these characters.
String c1 = "#COMMENT";
String c2 = "#COMMENT";
String c3 = "COMMENT";
System.out.println(c1.matches("^[^#$%-]?COMMENT$"));
System.out.println(c2.matches("^[^#$%-]?COMMENT$"));
System.out.println(c3.matches("^[^#$%-]?COMMENT$"));
The correct Java regex for our case would be
String regex = "[^#'\"$%\\-|]COMMENT";
You can test the output at regex101

IPV6 address into compressed form in Java

I have used the Inet6Address.getByName("2001:db8:0:0:0:0:2:1").toString() method to compress an IPv6 address, and the output is 2001:db8:0:0:0:0:2:1, but I need 2001:db8::2:1. Basically, the compressed output should be based on the RFC 5952 standard, that is:
Shorten as Much as Possible: For example, 2001:db8:0:0:0:0:2:1 must be shortened to 2001:db8::2:1. Likewise, 2001:db8::0:1 is not acceptable, because the symbol "::" could have been used to produce a shorter representation 2001:db8::1.
Handling One 16-Bit 0 Field: The symbol "::" MUST NOT be used to shorten just one 16-bit 0 field. For example, the representation 2001:db8:0:1:1:1:1:1 is correct, but 2001:db8::1:1:1:1:1 is not correct.
Choice in Placement of "::": When there is an alternative choice in the placement of a "::", the longest run of consecutive 16-bit 0 fields MUST be shortened (i.e., the sequence with three consecutive zero fields is shortened in 2001:0:0:1:0:0:0:1). When the length of the consecutive 16-bit 0 fields are equal (i.e., 2001:db8:0:0:1:0:0:1), the first sequence of zero bits MUST be shortened. For example, 2001:db8::1:0:0:1 is correct representation.
I have also checked another post on Stack Overflow, but it did not handle these conditions (for example, the choice in placement of ::).
Is there any Java library to handle this? Could anyone please help me?
Thanks in advance.
How about this?
String resultString = subjectString.replaceAll("((?::0\\b){2,}):?(?!\\S*\\b\\1:0\\b)(\\S*)", "::$2").replaceFirst("^0::","::");
Explanation without Java double-backslash hell:
( # Match and capture in backreference 1:
(?: # Match this group:
:0 # :0
\b # word boundary
){2,} # twice or more
) # End of capturing group 1
:? # Match a : if present (not at the end of the address)
(?! # Now assert that we can't match the following here:
\S* # Any non-space character sequence
\b # word boundary
\1 # the previous match
:0 # followed by another :0
\b # word boundary
) # End of lookahead. This ensures that there is not a longer
# sequence of ":0"s in this address.
(\S*) # Capture the rest of the address in backreference 2.
# This is necessary to jump over any sequences of ":0"s
# that are of the same length as the first one.
Input:
2001:db8:0:0:0:0:2:1
2001:db8:0:1:1:1:1:1
2001:0:0:1:0:0:0:1
2001:db8:0:0:1:0:0:1
2001:db8:0:0:1:0:0:0
Output:
2001:db8::2:1
2001:db8:0:1:1:1:1:1
2001:0:0:1::1
2001:db8::1:0:0:1
2001:db8:0:0:1::
(I hope the last example is correct - or is there another rule if the address ends in 0?)
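To reproduce the table above, here is a minimal Java harness around the same two-step replacement; only the class and method names are new.

```java
class Ipv6CompressRegex {
    // Tim's pattern: collapse the longest run of two or more ":0" fields into "::";
    // (\S*) swallows the rest of the address, so a later run of equal length is kept.
    // The second step turns a leftover leading "0::" into "::".
    static String compress(String addr) {
        return addr
                .replaceAll("((?::0\\b){2,}):?(?!\\S*\\b\\1:0\\b)(\\S*)", "::$2")
                .replaceFirst("^0::", "::");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "2001:db8:0:0:0:0:2:1",
            "2001:db8:0:1:1:1:1:1",
            "2001:0:0:1:0:0:0:1",
            "2001:db8:0:0:1:0:0:1",
            "2001:db8:0:0:1:0:0:0"
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + compress(in));
        }
    }
}
```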
I recently ran into the same problem and would like to (very slightly) improve on Tim's answer.
The following regular expression offers two advantages:
((?:(?:^|:)0+\\b){2,}):?(?!\\S*\\b\\1:0+\\b)(\\S*)
Firstly, it also matches zero fields written with more than one zero (such as 0000). Secondly, it also correctly matches addresses where the longest chain of zeroes is at the beginning of the address (such as 0:0:0:0:0:0:0:1).
Guava's InetAddresses class has toAddrString() which formats according to RFC 5952.
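If adding Guava is not an option, the three RFC 5952 rules quoted in the question can also be applied without a regex. The following is a minimal sketch, not a full RFC 5952 implementation: it assumes a valid, fully expanded address with exactly eight hex groups (no existing :: and no embedded IPv4 dotted quad), and the class and method names are hypothetical. It drops leading zeros and lowercases each group, then replaces the longest run of two or more zero groups with ::, keeping the first run on a tie.

```java
class Rfc5952Sketch {
    // Assumes exactly eight colon-separated hex groups with no "::" already present.
    static String compress(String full) {
        String[] parts = full.split(":");
        if (parts.length != 8) {
            throw new IllegalArgumentException("expected 8 groups: " + full);
        }
        int[] groups = new int[8];
        for (int i = 0; i < 8; i++) {
            groups[i] = Integer.parseInt(parts[i], 16);
        }

        // Find the longest run of zero groups; strict '>' keeps the first run on ties.
        int bestStart = -1, bestLen = 0;
        for (int i = 0; i < 8; ) {
            if (groups[i] != 0) { i++; continue; }
            int j = i;
            while (j < 8 && groups[j] == 0) j++;
            if (j - i > bestLen) { bestStart = i; bestLen = j - i; }
            i = j;
        }
        if (bestLen < 2) bestStart = -1; // "::" must not replace a single zero group

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 8; i++) {
            if (i == bestStart) {
                sb.append("::");
                i += bestLen - 1; // the loop's i++ then moves past the zero run
                continue;
            }
            // Add a separator unless at the start or directly after "::".
            if (i > 0 && !(bestStart >= 0 && i == bestStart + bestLen)) sb.append(':');
            sb.append(Integer.toHexString(groups[i])); // lowercase, no leading zeros
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(compress("2001:db8:0:0:0:0:2:1")); // 2001:db8::2:1
        System.out.println(compress("2001:db8:0:0:1:0:0:1")); // 2001:db8::1:0:0:1
        System.out.println(compress("0000:1200:0000:0000:0000:0000:0000:0000")); // 0:1200::
    }
}
```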
java-ipv6 is almost what you want. As of version 0.10 it does not check for the longest run of zeroes to shorten with :: - for instance 0:0:1:: is shortened to ::1:0:0:0:0:0. It is a very decent library for the handling of IPv6 addresses, though, and this problem should be fixed with version 0.11, such that the library is RFC 5952 compliant.
The open-source IPAddress Java library can do this. It provides numerous ways of producing strings for IPv4 and/or IPv6, including the canonical string, which for IPv6 matches RFC 5952. Disclaimer: I am the project manager of that library.
Using the examples you list, sample code is:
import inet.ipaddr.IPAddress;
import inet.ipaddr.IPAddressString;
IPAddress addr = new IPAddressString("2001:db8:0:0:0:0:2:1").getAddress();
System.out.println(addr.toCanonicalString());
// 2001:db8::2:1
addr = new IPAddressString("2001:db8:0:1:1:1:1:1").getAddress();
System.out.println(addr.toCanonicalString());
// 2001:db8:0:1:1:1:1:1
addr = new IPAddressString("2001:0:0:1:0:0:0:1").getAddress();
System.out.println(addr.toCanonicalString());
// 2001:0:0:1::1
addr = new IPAddressString("2001:db8:0:0:1:0:0:1").getAddress();
System.out.println(addr.toCanonicalString());
//2001:db8::1:0:0:1
After performing some tests, I think the following captures all the different IPv6 scenarios:
"((?:(?::0|0:0?)\\b){2,}):?(?!\\S*\\b\\1:0\\b)(\\S*)" -> "::$2"
It's not quite elegant, but this is my proposal (based on chrixm's work):
public static String shortIpv6Form(String fullIP) {
fullIP = fullIP.replaceAll("^0{1,3}", "");
fullIP = fullIP.replaceAll("(:0{1,3})", ":");
fullIP = fullIP.replaceAll("(0{4}:)", "0:");
//now we have the full form without unnecessary zeros
//Ex:
//0000:1200:0000:0000:0000:0000:0000:0000 -> 0:1200:0:0:0:0:0:0
//0000:0000:0000:1200:0000:0000:0000:8351 -> 0:0:0:1200:0:0:0:8351
//0000:125f:0000:94dd:e53f:0000:61a9:0000 -> 0:125f:0:94dd:e53f:0:61a9:0
//0000:005f:0000:94dd:0000:cfe7:0000:8351 -> 0:5f:0:94dd:0:cfe7:0:8351
//compress to short notation
fullIP = fullIP.replaceAll("((?:(?:^|:)0+\\b){2,}):?(?!\\S*\\b\\1:0+\\b)(\\S*)", "::$2");
return fullIP;
}
Results:
7469:125f:8eb6:94dd:e53f:cfe7:61a9:8351 -> 7469:125f:8eb6:94dd:e53f:cfe7:61a9:8351
7469:125f:0000:0000:e53f:cfe7:0000:0000 -> 7469:125f::e53f:cfe7:0:0
7469:125f:0000:0000:000f:c000:0000:0000 -> 7469:125f::f:c000:0:0
7469:0000:0000:94dd:0000:0000:0000:8351 -> 7469:0:0:94dd::8351
0469:125f:8eb6:94dd:0000:cfe7:61a9:8351 -> 469:125f:8eb6:94dd:0:cfe7:61a9:8351
0069:125f:8eb6:94dd:0000:cfe7:61a9:8351 -> 69:125f:8eb6:94dd:0:cfe7:61a9:8351
0009:125f:8eb6:94dd:0000:cfe7:61a9:8351 -> 9:125f:8eb6:94dd:0:cfe7:61a9:8351
0000:0000:8eb6:94dd:e53f:0007:6009:8350 -> ::8eb6:94dd:e53f:7:6009:8350
0000:0000:8eb6:94dd:e53f:0007:6009:8300 -> ::8eb6:94dd:e53f:7:6009:8300
0000:0000:8eb6:94dd:e53f:0007:6009:8000 -> ::8eb6:94dd:e53f:7:6009:8000
7469:0000:0000:0000:e53f:0000:0000:8300 -> 7469::e53f:0:0:8300
7009:100f:8eb6:94dd:e000:cfe7:6009:8351 -> 7009:100f:8eb6:94dd:e000:cfe7:6009:8351
7469:100f:8006:900d:e53f:cfe7:61a9:8351 -> 7469:100f:8006:900d:e53f:cfe7:61a9:8351
7000:1200:8e00:94dd:e53f:cfe7:0000:0001 -> 7000:1200:8e00:94dd:e53f:cfe7:0:1
0000:0000:0000:0000:0000:0000:0000:0000 -> ::
0000:0000:0000:94dd:0000:0000:0000:0000 -> 0:0:0:94dd::
0000:1200:0000:0000:0000:0000:0000:0000 -> 0:1200::
0000:0000:0000:1200:0000:0000:0000:8351 -> ::1200:0:0:0:8351
0000:125f:0000:94dd:e53f:0000:61a9:0000 -> 0:125f:0:94dd:e53f:0:61a9:0
7469:0000:8eb6:0000:e53f:0000:61a9:0000 -> 7469:0:8eb6:0:e53f:0:61a9:0
0000:125f:0000:94dd:0000:cfe7:0000:8351 -> 0:125f:0:94dd:0:cfe7:0:8351
0000:025f:0000:94dd:0000:cfe7:0000:8351 -> 0:25f:0:94dd:0:cfe7:0:8351
0000:005f:0000:94dd:0000:cfe7:0000:8351 -> 0:5f:0:94dd:0:cfe7:0:8351
0000:000f:0000:94dd:0000:cfe7:0000:8351 -> 0:f:0:94dd:0:cfe7:0:8351
0000:0000:0000:0000:0000:0000:0000:0001 -> ::1
