Modification of java regex


I was using the below regex to substitute file names
Regex -> .*\/([A-Z0-9_]{1,9})_(O).*.cmd
Substitution -> $1
The file names were like below:
File Name                         | Substituted Name
--------------------------------- | ----------------
/V3/OGM_REC_Offline_Level0_4D.cmd | OGM_REC
/V2/PIE_PROD_Online_Level1_6D.cmd | PIE_PROD
/V3/BR2_OnDemand.cmd              | BR2
/opt/STING_Online_Inc0_1W.cmd     | STING
Then the files changed and I modified the regex
Regex -> .*\/([A-Z0-9_]{1,9})(_O|Full).*.cmd
Substitution -> $1
Additional new file names
File Name           | Substituted Name
------------------- | ----------------
/opt/RSU10Full.cmd  | RSU10
/V4/REZ40_1Full.cmd | REZ40_1
Now it seems that new files are arriving with the name formats below:
/app/OMGIT_FullOnDemand_4W.cmd
/admin/FOC_STG_Full_6D.cmd
I've modified the regex again, but it does not work:
Regex -> .*\/([A-Z0-9_]{1,9})(_O|Full|_Full).*.cmd
Substitution -> $1

I suggest using a version with a lazy limiting quantifier {1,9}? and optional _:
.*/([A-Z0-9_]{1,9}?)(_O|_?Full).*[.]cmd
This way, we match as few characters with [A-Z0-9_]{1,9}? as possible to return a valid captured subtext, and _?Full part can hold the optional underscore.
See the regex demo
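As a rough sketch, the pattern above can be checked against the file names from the question with String.replaceAll. The class name FileNameDemo and the helper baseName are made up for this illustration; only the regex comes from the answer.

```java
class FileNameDemo {
    // The lazy {1,9}? grows the capture one character at a time and stops at
    // the first position where "_O", "Full" or "_Full" follows.
    static final String REGEX = ".*/([A-Z0-9_]{1,9}?)(_O|_?Full).*[.]cmd";

    // Hypothetical helper: returns group 1, or the input unchanged when the
    // pattern does not match the whole string.
    static String baseName(String path) {
        return path.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        String[] paths = {
            "/V3/OGM_REC_Offline_Level0_4D.cmd",
            "/V2/PIE_PROD_Online_Level1_6D.cmd",
            "/V3/BR2_OnDemand.cmd",
            "/opt/STING_Online_Inc0_1W.cmd",
            "/opt/RSU10Full.cmd",
            "/V4/REZ40_1Full.cmd",
            "/app/OMGIT_FullOnDemand_4W.cmd",
            "/admin/FOC_STG_Full_6D.cmd"
        };
        for (String path : paths) {
            System.out.println(path + " -> " + baseName(path));
        }
    }
}
```

With a greedy {1,9} the capture for /admin/FOC_STG_Full_6D.cmd would try to swallow the trailing underscore before Full; the lazy form avoids that by testing the alternation as early as possible.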

I've noticed that the unnecessary tail always starts with an optional _, followed by an uppercase letter and then a lowercase letter.
So, a universal solution is:
.*\/([^a-z]*?)[_]?[A-Z][a-z].*
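A small sketch of this second pattern in Java, assuming (as the answer does) that every tail begins with an uppercase letter followed by a lowercase one. The class name TailStripDemo and the helper stripTail are hypothetical names used only here.

```java
class TailStripDemo {
    // Group 1 lazily collects non-lowercase characters after the last slash
    // until an optional "_" plus an uppercase/lowercase pair follows.
    static final String REGEX = ".*\\/([^a-z]*?)[_]?[A-Z][a-z].*";

    // Hypothetical helper: returns group 1, or the input unchanged on no match.
    static String stripTail(String path) {
        return path.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        String[] paths = {
            "/V3/OGM_REC_Offline_Level0_4D.cmd",
            "/V4/REZ40_1Full.cmd",
            "/app/OMGIT_FullOnDemand_4W.cmd",
            "/admin/FOC_STG_Full_6D.cmd"
        };
        for (String path : paths) {
            System.out.println(path + " -> " + stripTail(path));
        }
    }
}
```

Unlike the first answer, this version does not check the .cmd extension or the nine-character limit, so it is looser about what it accepts.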

Related

Searching Strings containing a regex in createCriteria Method

I'm using Grails for my web app project. I know the createCriteria method can perform search on existing entries in database. Let's say I have a domain "some_domain" which includes a string variable "domain_string". I want to find out all "domain_strings" that contain either a 7-digit or 10-digit number starting with "1" or "7". (e.g. domain_string1 = ".........1234567.......", domain_string2 = ".......7192839265......", etc)
In my code:
some_domain.createCriteria().list() {
rlike("domain_string", "%/^(1|7){7,10}/%")
}
I've used a Java regex here, and the Grails docs tell me that rlike accepts regex input. But the code doesn't return the expected results, probably because I'm not familiar with the Groovy syntax. Any suggestions? Thanks a lot in advance.

You can use
rlike("domain_string", /([^0-9]|^)[17][0-9]{6}([0-9]{3})?([^0-9]|$)/)
See the regex demo.
Details:
([^0-9]|^) - either a non-digit char or start of string
[17] - 1 or 7
[0-9]{6} - any six digits
([0-9]{3})? - an optional occurrence of three digits
([^0-9]|$) - either a non-digit char or end of string.
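To sanity-check the pattern outside the database, here is a minimal plain-Java sketch. It only exercises java.util.regex; how rlike evaluates the expression depends on the underlying database, so treat this as an approximation. The class name DigitRunDemo and the helper containsNumber are made up for the example.

```java
import java.util.regex.Pattern;

class DigitRunDemo {
    // A 7- or 10-digit run starting with 1 or 7, not embedded in a longer digit run.
    static final Pattern NUMBER =
        Pattern.compile("([^0-9]|^)[17][0-9]{6}([0-9]{3})?([^0-9]|$)");

    // Hypothetical helper: true when the text contains such a number anywhere.
    static boolean containsNumber(String text) {
        return NUMBER.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            ".........1234567.......",   // 7 digits, starts with 1
            ".......7192839265......",   // 10 digits, starts with 7
            ".......3192839265......",   // starts with 3
            "....12345678...."           // 8 digits, wrong length
        };
        for (String s : samples) {
            System.out.println(s + " -> " + containsNumber(s));
        }
    }
}
```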

A Groovy regex that follows Java's native rules would look like:
def RE = /\D*[17]\d+\D*/
def domain_strings = [ ".........1234567.......", ".......7192839265......", ".......3192839265......", ".......4192839265......" ]
domain_strings.each{
boolean match = it ==~ RE
println "$it matches? -> $match"
}
prints:
.........1234567....... matches? -> true
.......7192839265...... matches? -> true
.......3192839265...... matches? -> false
.......4192839265...... matches? -> false
You should check whether your database's SQL dialect can consume such expressions as-is.

Regular expression: Replace everything before first occurrence

I have the following regular expression that I'm using to remove the dev. part of my URL.
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll(".*\\.(?=.*\\.)", ""));
This outputs mydomain.com, but it gives me issues when the domains are in the vein of dev.mydomain.com.pe or dev.mydomain.com.uk. In those cases I get only the com.pe and com.uk parts.
Is there a modifier I can use on my regex to make sure it only takes what is before the first . (dot included)?
Desired output:
dev.mydomain.com -> mydomain.com
stage.mydomain.com.pe -> mydomain.com.pe
test.mydomain.com.uk -> mydomain.com.uk

You may use
^[^.]+\.(?=.*\.)
See the regex demo.
Details
^ - start of string
[^.]+ - 1 or more chars other than dots
\. - a dot
(?=.*\.) - a positive lookahead: followed by any 0 or more chars other than line break chars, as many as possible, and then a . (dot).
Java usage example:
String result = domain.replaceFirst("^[^.]+\\.(?=.*\\.)", "");
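The one-liner above can be wrapped in a small runnable sketch that covers the desired outputs from the question. The class name DomainDemo and the helper stripFirstLabel are illustrative names, not part of the original answer.

```java
class DomainDemo {
    // Removes the first label and its dot, but only if another dot follows,
    // so a bare "mydomain.com" is left untouched.
    static String stripFirstLabel(String domain) {
        return domain.replaceFirst("^[^.]+\\.(?=.*\\.)", "");
    }

    public static void main(String[] args) {
        String[] domains = {
            "dev.mydomain.com",
            "stage.mydomain.com.pe",
            "test.mydomain.com.uk",
            "mydomain.com"
        };
        for (String domain : domains) {
            System.out.println(domain + " -> " + stripFirstLabel(domain));
        }
    }
}
```

Note that the pattern has no way of knowing which labels belong to a public suffix: an input such as mydomain.com.pe (with no dev. prefix) would also lose its first label.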

The following regex will work for you. It finds the first part (if it exists), captures the rest of the string as the second group, and replaces the whole string with that second group. .*? is a non-greedy match that stops at the first dot character.
(.*?\.)?(.*\..*)
Regex Demo
sample code:
String domain = "dev.mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "stage.mydomain.com.pe";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "test.mydomain.com.uk";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
domain = "mydomain.com";
System.out.println(domain.replaceAll("(.*?\\.)?(.*\\..*)", "$2"));
output:
mydomain.com
mydomain.com.pe
mydomain.com.uk
mydomain.com

Java regex for replacement with delimiter as : or $

I have jobs that follow non-standard naming conventions. Some examples are below:
===========================================================
Job Name                     | New Name
---------------------------- | ----------------------------
JOB:/Level0_APP1_12345_0/    | JOB
JOB:Level1_DBASW_t323dk23_p1 | JOB
JOB$SAV:                     | JOB
backup:SYNC1                 | backup
QUERY:logs                   | QUERY
QUERY$maps:                  | QUERY
QUERY:                       | QUERY
FS1:\                        | FS1:\     -- No change in name
PS:\MXMI                     | PS:\MXMI  -- No change in name
===========================================================
The delimiter is either (:) or ($), whichever comes first. Also, the regex should not change jobs that have (:\) in the name, as shown in the last 2 examples.
I've used the below, but without success
Regex:
(:|\$[a-zA-Z\/0-9]+)|(\$[a-zA-Z\/0-9]+)|(:$)
(.*)((\:|\$)([a-zA-Z\/0-9]+|$))
(.*)((\:|\$)(.*|$))
Substitution -> $1

I would use a simple regex here:
^(.*?)(?::(?!\\)|\$).*
It matches:
^ - start of string
(.*?) - capture into Group 1 as few symbols (other than a newline) as possible before the first...
(?::(?!\\)|\$) - either : that is not followed by \ (with (?::(?!\\)) or a literal $ (with \$)
.* - match the rest of the line
See IDEONE demo:
List<String> strs = Arrays.asList("JOB:/Level0_APP1_12345_0/", "JOB:Level1_DBASW_t323dk23_p1",
"JOB$SAV:", "backup:SYNC1","QUERY:logs","QUERY$maps:","QUERY:","FS1:\\","PS:\\MXMI");
for (String str : strs)
System.out.println(str.replaceAll("^(.*?)(?::(?!\\\\)|\\$).*", "$1"));
Output:
JOB
JOB
JOB
backup
QUERY
QUERY
QUERY
FS1:\
PS:\MXMI

Try this:
^(\w+(?=:\\.+):\\.+|[^:$]+)
The first capturing group ($1) is what you are looking for

Match a single scenario with ANTLR and skip everything else as noise

I defined a simple grammar using an ANTLR V4 Eclipse Plugin. I want to parse a file that contains Coldfusion cfscript code, and find every instance of a property definition. For example:
property name="productTypeID" ormtype="string" length="32" fieldtype="id" generator="uuid" unsavedvalue="" default="";
That is, a property keyword followed by any number of attributes, line terminated with a semicolon.
.g4 file
grammar CFProperty;
property : 'property ' (ATR'='STRING)+EOL; // match keyword property followed by an attribute definition
ATR : [a-zA-Z]+; // match lower and upper-case identifiers name
STRING: '"' .*? '"'; // match any string
WS : [ \t\r\n]+ -> skip; // skip spaces, tabs, newlines
EOL : ';'; // end of the property line
I put together a simple Java project that uses the generated parser, tree walker, etc. to print out the occurrences of those matches.
The input I'm testing this with is:
"property id=\"actionID\" name=\"actionName\" attr=\"actionAttr\" hbMethod=\"HBMethod\"; public function some funtion {//some text} property name=\"actionID\" name=\"actionName\" attr=\"actionAttr\" hbMethod=\"HBMethod\"; \n more noise "
My issue is that this is only matching:
property id="actionID" name="actionName" attr="actionAttr" hbMethod="HBMethod";
And because it doesn't treat everything else as noise, it doesn't match the second instance of the property definition.
How can I match on multiple instances of the property definition and match on everything else in-between as noise to be skipped?

You can use lexer modes to do what you want: one mode for properties and their attributes, and one mode for noise. The idea behind modes is to switch from one mode (a state) to another depending on the tokens found during lexing.
To do this, you have to split your grammar into two files, with the parser in one file and the lexer in the other.
Here is the lexer part (named TestLexer.g4 in my case):
lexer grammar TestLexer;
// Normal mode
PROPERTY : 'property';
EQUALS : '=';
ATR : [a-zA-Z]+; // match lower and upper-case identifiers name
STRING: '"' .*? '"'; // match any string
WS : [ \t\r\n]+ -> skip; // skip spaces, tabs, newlines
EOL : ';' -> pushMode(NOISE); // when ';' is found, switch to NOISE mode, where everything is skipped
mode NOISE;
NOISE_PROPERTY : 'property' -> type(PROPERTY), popMode; // when 'property' is found, emit it as a PROPERTY token and go back to normal mode
ALL : .+? -> skip; // skip everything else
Here is the parser part (named Test.g4 in my case):
grammar Test;
options { tokenVocab=TestLexer; }
root : property+;
property : PROPERTY (ATR EQUALS STRING)+ EOL; // match keyword property followed by an attribute definition
This should do the work :)

IPV6 address into compressed form in Java

I have used Inet6Address.getByName("2001:db8:0:0:0:0:2:1").toString() to compress an IPv6 address, and the output is 2001:db8:0:0:0:0:2:1, but I need 2001:db8::2:1. Basically, the compressed output should follow the RFC 5952 standard, that is:
Shorten as Much as Possible: For example, 2001:db8:0:0:0:0:2:1 must be shortened to 2001:db8::2:1. Likewise, 2001:db8::0:1 is not acceptable, because the symbol "::" could have been used to produce a shorter representation 2001:db8::1.
Handling One 16-Bit 0 Field: The symbol "::" MUST NOT be used to shorten just one 16-bit 0 field. For example, the representation 2001:db8:0:1:1:1:1:1 is correct, but 2001:db8::1:1:1:1:1 is not correct.
Choice in Placement of "::": When there is an alternative choice in the placement of a "::", the longest run of consecutive 16-bit 0 fields MUST be shortened (i.e., the sequence with three consecutive zero fields is shortened in 2001:0:0:1:0:0:0:1). When the length of the consecutive 16-bit 0 fields are equal (i.e., 2001:db8:0:0:1:0:0:1), the first sequence of zero bits MUST be shortened. For example, 2001:db8::1:0:0:1 is correct representation.
I have also checked another post on Stack Overflow, but it did not cover these conditions (for example, the choice in placement of ::).
Is there any Java library to handle this? Could anyone please help me?
Thanks in advance.

How about this?
String resultString = subjectString.replaceAll("((?::0\\b){2,}):?(?!\\S*\\b\\1:0\\b)(\\S*)", "::$2").replaceFirst("^0::","::");
Explanation without Java double-backslash hell:
( # Match and capture in backreference 1:
(?: # Match this group:
:0 # :0
\b # word boundary
){2,} # twice or more
) # End of capturing group 1
:? # Match a : if present (not at the end of the address)
(?! # Now assert that we can't match the following here:
\S* # Any non-space character sequence
\b # word boundary
\1 # the previous match
:0 # followed by another :0
\b # word boundary
) # End of lookahead. This ensures that there is not a longer
# sequence of ":0"s in this address.
(\S*) # Capture the rest of the address in backreference 2.
# This is necessary to jump over any sequences of ":0"s
# that are of the same length as the first one.
Input:
2001:db8:0:0:0:0:2:1
2001:db8:0:1:1:1:1:1
2001:0:0:1:0:0:0:1
2001:db8:0:0:1:0:0:1
2001:db8:0:0:1:0:0:0
Output:
2001:db8::2:1
2001:db8:0:1:1:1:1:1
2001:0:0:1::1
2001:db8::1:0:0:1
2001:db8:0:0:1::
(I hope the last example is correct - or is there another rule if the address ends in 0?)
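The input/output pairs listed in this answer can be reproduced with a short harness. This is only a sketch built around the two chained replacements from the answer; the class name Ipv6CompressDemo and the helper compress are invented for the example, and the input is assumed to be a full eight-group address with leading zeros already stripped from each group.

```java
class Ipv6CompressDemo {
    // Collapses the longest (first, on ties) run of two or more ":0" groups
    // into "::", then fixes up a leftover leading "0::".
    static String compress(String address) {
        return address
            .replaceAll("((?::0\\b){2,}):?(?!\\S*\\b\\1:0\\b)(\\S*)", "::$2")
            .replaceFirst("^0::", "::");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "2001:db8:0:0:0:0:2:1",
            "2001:db8:0:1:1:1:1:1",
            "2001:0:0:1:0:0:0:1",
            "2001:db8:0:0:1:0:0:1",
            "2001:db8:0:0:1:0:0:0"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + compress(input));
        }
    }
}
```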

I recently ran into the same problem and would like to (very slightly) improve on Tim's answer.
The following regular expression offers two advantages:
((?:(?:^|:)0+\\b){2,}):?(?!\\S*\\b\\1:0+\\b)(\\S*)
Firstly, it incorporates the change to match multiple zeroes. Secondly, it also correctly matches addresses where the longest chain of zeroes is at the beginning of the address (such as 0:0:0:0:0:0:0:1).
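A quick way to try this variant, assuming it is applied with the same "::$2" replacement as in Tim's answer (the answer itself does not spell out the replacement string). The class name Ipv6LeadingRunDemo and the helper compress are hypothetical.

```java
class Ipv6LeadingRunDemo {
    // (?:^|:)0+ lets a zero group start at the beginning of the address and
    // lets each zero group be written with one or more zeroes.
    static final String REGEX =
        "((?:(?:^|:)0+\\b){2,}):?(?!\\S*\\b\\1:0+\\b)(\\S*)";

    // Assumption: the replacement "::$2" is carried over from the earlier answer.
    static String compress(String address) {
        return address.replaceAll(REGEX, "::$2");
    }

    public static void main(String[] args) {
        String[] inputs = {
            "0:0:0:0:0:0:0:1",
            "2001:db8:0:0:0:0:2:1",
            "2001:db8:0:0:1:0:0:1"
        };
        for (String input : inputs) {
            System.out.println(input + " -> " + compress(input));
        }
    }
}
```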

Guava's InetAddresses class has toAddrString() which formats according to RFC 5952.

java-ipv6 is almost what you want. As of version 0.10 it does not check for the longest run of zeroes to shorten with :: - for instance 0:0:1:: is shortened to ::1:0:0:0:0:0. It is a very decent library for the handling of IPv6 addresses, though, and this problem should be fixed with version 0.11, such that the library is RFC 5952 compliant.

The open-source IPAddress Java library can do this. It provides numerous ways of producing strings for IPv4 and/or IPv6, including the canonical string, which for IPv6 matches RFC 5952. Disclaimer: I am the project manager of that library.
Using the examples you list, sample code is:
IPAddress addr = new IPAddressString("2001:db8:0:0:0:0:2:1").getAddress();
System.out.println(addr.toCanonicalString());
// 2001:db8::2:1
addr = new IPAddressString("2001:db8:0:1:1:1:1:1").getAddress();
System.out.println(addr.toCanonicalString());
// 2001:db8:0:1:1:1:1:1
addr = new IPAddressString("2001:0:0:1:0:0:0:1").getAddress();
System.out.println(addr.toCanonicalString());
// 2001:0:0:1::1
addr = new IPAddressString("2001:db8:0:0:1:0:0:1").getAddress();
System.out.println(addr.toCanonicalString());
// 2001:db8::1:0:0:1

After performing some tests, I think the following captures all the different IPv6 scenarios:
"((?:(?::0|0:0?)\\b){2,}):?(?!\\S*\\b\\1:0\\b)(\\S*)" -> "::$2"

Not very elegant, but this is my proposal (based on chrixm's work):
public static String shortIpv6Form(String fullIP) {
    // strip leading zeros from each group
    fullIP = fullIP.replaceAll("^0{1,3}", "");
    fullIP = fullIP.replaceAll("(:0{1,3})", ":");
    fullIP = fullIP.replaceAll("(0{4}:)", "0:");
    // now we have the full form without unnecessary zeros
    // Ex:
    // 0000:1200:0000:0000:0000:0000:0000:0000 -> 0:1200:0:0:0:0:0:0
    // 0000:0000:0000:1200:0000:0000:0000:8351 -> 0:0:0:1200:0:0:0:8351
    // 0000:125f:0000:94dd:e53f:0000:61a9:0000 -> 0:125f:0:94dd:e53f:0:61a9:0
    // 0000:005f:0000:94dd:0000:cfe7:0000:8351 -> 0:5f:0:94dd:0:cfe7:0:8351
    // compress to short notation
    fullIP = fullIP.replaceAll("((?:(?:^|:)0+\\b){2,}):?(?!\\S*\\b\\1:0+\\b)(\\S*)", "::$2");
    return fullIP;
}
results:
7469:125f:8eb6:94dd:e53f:cfe7:61a9:8351 -> 7469:125f:8eb6:94dd:e53f:cfe7:61a9:8351
7469:125f:0000:0000:e53f:cfe7:0000:0000 -> 7469:125f::e53f:cfe7:0:0
7469:125f:0000:0000:000f:c000:0000:0000 -> 7469:125f::f:c000:0:0
7469:125f:0000:0000:000f:c000:0000:0000 -> 7469:125f::f:c000:0:0
7469:0000:0000:94dd:0000:0000:0000:8351 -> 7469:0:0:94dd::8351
0469:125f:8eb6:94dd:0000:cfe7:61a9:8351 -> 469:125f:8eb6:94dd:0:cfe7:61a9:8351
0069:125f:8eb6:94dd:0000:cfe7:61a9:8351 -> 69:125f:8eb6:94dd:0:cfe7:61a9:8351
0009:125f:8eb6:94dd:0000:cfe7:61a9:8351 -> 9:125f:8eb6:94dd:0:cfe7:61a9:8351
0000:0000:8eb6:94dd:e53f:0007:6009:8350 -> ::8eb6:94dd:e53f:7:6009:8350
0000:0000:8eb6:94dd:e53f:0007:6009:8300 -> ::8eb6:94dd:e53f:7:6009:8300
0000:0000:8eb6:94dd:e53f:0007:6009:8000 -> ::8eb6:94dd:e53f:7:6009:8000
7469:0000:0000:0000:e53f:0000:0000:8300 -> 7469::e53f:0:0:8300
7009:100f:8eb6:94dd:e000:cfe7:6009:8351 -> 7009:100f:8eb6:94dd:e000:cfe7:6009:8351
7469:100f:8006:900d:e53f:cfe7:61a9:8351 -> 7469:100f:8006:900d:e53f:cfe7:61a9:8351
7000:1200:8e00:94dd:e53f:cfe7:0000:0001 -> 7000:1200:8e00:94dd:e53f:cfe7:0:1
0000:0000:0000:0000:0000:0000:0000:0000 -> ::
0000:0000:0000:94dd:0000:0000:0000:0000 -> 0:0:0:94dd::
0000:1200:0000:0000:0000:0000:0000:0000 -> 0:1200::
0000:0000:0000:1200:0000:0000:0000:8351 -> ::1200:0:0:0:8351
0000:125f:0000:94dd:e53f:0000:61a9:0000 -> 0:125f:0:94dd:e53f:0:61a9:0
7469:0000:8eb6:0000:e53f:0000:61a9:0000 -> 7469:0:8eb6:0:e53f:0:61a9:0
0000:125f:0000:94dd:0000:cfe7:0000:8351 -> 0:125f:0:94dd:0:cfe7:0:8351
0000:025f:0000:94dd:0000:cfe7:0000:8351 -> 0:25f:0:94dd:0:cfe7:0:8351
0000:005f:0000:94dd:0000:cfe7:0000:8351 -> 0:5f:0:94dd:0:cfe7:0:8351
0000:000f:0000:94dd:0000:cfe7:0000:8351 -> 0:f:0:94dd:0:cfe7:0:8351
0000:0000:0000:0000:0000:0000:0000:0001 -> ::1

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
