Using a special email regular expression - java

I have some emails in the form:
staticN123#sub1.mydomain.com
staticN456#sub2.mydomain.com
staticN789#sub3-sub.mydomain.com
The variable parts are the number after the letter (N, M or F) and the subdomain between the # and mydomain.com.
I want to make a regular expression that matches this form in a string, and if it's a match, get the number after the N character.

staticN([0-9]+)#.+\.mydomain\.com
Instead of [0-9]+ you can also use \d+, which is equivalent here.
The .+ after the # could match too much. You may want to replace it with [^\.]+ to exclude nested subdomains such as sub.sub.
update:
^staticN(\d+)#[a-z0-9_-]+\.mydomain\.com$
Adding ^ and $ anchors the match to the start and end of the searched string, which avoids false matches such as somethingwrong_staticN123#sub.mydomain.com.xyz
You can test this regexp on Rubular.
--
applying changes discussed in comments below:
^(?:.+<)?static[NMF](\d+)#[a-z0-9_-]+\.mydomain\.com>?$
code example to answer the question in one of the comments:
// input
String str = "reply <staticN123#sub1.mydomain.com";
// example 1
String nr0 = str.replaceAll( "^(?:.+<)?static[NMF](\\d+)#[a-z0-9_-]+\\.mydomain\\.com>?$", "$1" );
System.out.println( nr0 );
// example 2 (precompile regex is faster if it's used more than once afterwards)
Pattern p = Pattern.compile( "^(?:.+<)?static[NMF](\\d+)#[a-z0-9_-]+\\.mydomain\\.com>?$" );
Matcher m = p.matcher( str );
if ( m.matches() ) { // m.group is only available after a successful m.matches call
    String nr1 = m.group( 1 );
    System.out.println( nr1 );
}
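As a minimal sketch, the precompiled pattern from the answer above can be wrapped in a small helper that returns the number only when the whole input matches. The class name StaticAddressDemo and the method extractNumber are made up for illustration, and the sketch assumes the number fits into an int.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaticAddressDemo {
    // Same pattern as in the answer above, compiled once and reused.
    private static final Pattern ADDRESS = Pattern.compile(
            "^(?:.+<)?static[NMF](\\d+)#[a-z0-9_-]+\\.mydomain\\.com>?$");

    // Hypothetical helper: the captured number, or empty if the input does not match.
    static OptionalInt extractNumber(String input) {
        Matcher m = ADDRESS.matcher(input);
        if (!m.matches()) {
            return OptionalInt.empty();
        }
        // Assumption: the digits fit into an int; longer numbers would need long or BigInteger.
        return OptionalInt.of(Integer.parseInt(m.group(1)));
    }

    public static void main(String[] args) {
        System.out.println(extractNumber("staticN123#sub1.mydomain.com"));
        System.out.println(extractNumber("reply <staticM456#sub2.mydomain.com>"));
        System.out.println(extractNumber("somethingwrong_staticN123#sub.mydomain.com.xyz"));
    }
}
```

Returning an OptionalInt keeps the "no match" case explicit instead of relying on an exception from m.group.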

Related

Regular Expression in Java. Splitting a string using pattern and matcher

I am trying to get all the matching groups in my string.
My regular expression is "(?<!')/|/(?!')". I am trying to split the string using a regular expression with Pattern and Matcher. The string needs to be split on /, but a / surrounded by single quotes ('/') must be skipped. For example, "One/Two/Three'/'3/Four" needs to be split as ["One", "Two", "Three'/'3", "Four"], but without using the .split method.
I am currently using the code below:
// String to be scanned to find the pattern.
String line = "Test1/Test2/Tt";
String pattern = "(?<!')/|/(?!')";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.matches()) {
System.out.println("Found value: " + m.group(0) );
} else {
System.out.println("NO MATCH");
}
But it always prints "NO MATCH". What am I doing wrong, and how do I fix it?
Thanks in advance
To get the matches without using split, you might use
[^'/]+(?:'/'[^'/]*)*
Explanation
[^'/]+ Match 1+ times any char except ' or /
(?: Non capture group
'/'[^'/]* Match '/' followed by optionally matching any char except ' or /
)* Close group and optionally repeat it
Regex demo | Java demo
String regex = "[^'/]+(?:'/'[^'/]*)*";
String string = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
One
Two
Three'/'3
Four
Edit
If you do not want to split, you might also use a pattern that does not match /, except when it is surrounded by single quotes:
[^/]+(?:(?<=')/(?=')[^/]*)*
Regex demo
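A small sketch that runs both patterns from this answer against the sample input and collects the matches into a list. The class name QuoteAwareTokens and the helper tokens are hypothetical names used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteAwareTokens {
    // Hypothetical helper: every match of regex in input, in order of appearance.
    static List<String> tokens(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "One/Two/Three'/'3/Four";
        // First pattern: runs without ' or /, optionally continued by a literal '/'
        System.out.println(tokens("[^'/]+(?:'/'[^'/]*)*", input));
        // Second pattern: a / is allowed only between two single quotes
        System.out.println(tokens("[^/]+(?:(?<=')/(?=')[^/]*)*", input));
    }
}
```

Both patterns require at least one character per token, so consecutive slashes such as a//b do not produce an empty token.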
Try this.
String line = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile("('/'|[^/])+");
Matcher m = pattern.matcher(line);
while (m.find())
System.out.println(m.group());
output:
One
Two
Three'/'3
Four
Here is a simple pattern matching all desired /, so you can split by them:
(?<=[^'])\/(?=')|(?<=')\/(?=[^'])|(?<=[^'])\/(?=[^'])
The logic is as follows: we have 4 cases:
1. / is surrounded by ', i.e. '/'
2. / is preceded by ', i.e. '/
3. / is followed by ', i.e. /'
4. / is surrounded by characters other than '
You want to exclude only case 1. So we need a regex for the other three cases; I have written three similar regexes and combined them with alternation.
Explanation of the first part (the other two are analogous):
(?<=[^']) - positive lookbehind, asserts that what precedes is different from ' (negated character class [^'])
\/ - match / literally
(?=') - positive lookahead, asserts that what follows is '
Demo with some more edge cases
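Since the question rules out .split, here is a minimal sketch that uses the delimiter pattern above with Matcher.find() and cuts the pieces out by hand. It also shows why the code in the question prints NO MATCH: Matcher.matches() succeeds only when the entire input matches the pattern, while find() looks for the next occurrence. The class name ManualSplit and the method splitOnUnquotedSlash are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ManualSplit {
    // The three-way alternation from the answer above (the \/ escape is not needed in Java).
    private static final Pattern DELIMITER = Pattern.compile(
            "(?<=[^'])/(?=')|(?<=')/(?=[^'])|(?<=[^'])/(?=[^'])");

    // Hypothetical helper: cuts the input at every delimiter match.
    static List<String> splitOnUnquotedSlash(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = DELIMITER.matcher(input);
        int start = 0;
        while (m.find()) {               // find() scans forward; matches() would need the whole input to match
            parts.add(input.substring(start, m.start()));
            start = m.end();
        }
        parts.add(input.substring(start)); // remainder after the last delimiter
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitOnUnquotedSlash("One/Two/Three'/'3/Four"));
        System.out.println(splitOnUnquotedSlash("Test1/Test2/Tt"));
    }
}
```

One caveat of this pattern: every alternative needs a character on both sides of the slash, so a / at the very start or end of the input is never treated as a delimiter.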
Try something like this:
String line = "One/Two/Three'/'3/Four";
String pattern = "([^/]+'/'\\d)|[^/]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
boolean found = false;
while(m.find()) {
System.out.println("Found value: " + m.group() );
found = true;
}
if(!found) {
System.out.println("NO MATCH");
}
Output:
Found value: One
Found value: Two
Found value: Three'/'3
Found value: Four

How do I get the value between () in a Java string just before the dot?

I have to rename the file abc(1).jpg to abc(2).jpg. Here is the code:
String example = "my attachements with some name (56).jpg";
Matcher m = Pattern.compile("\\((\\d{1,}).\\)").matcher(example);
int a = 0;
while(m.find()) {
a=Integer.parseInt(m.group(1));
String p = example.replace(String.valueOf(a), String.valueOf(a+1));
}
It works fine for the given use case, but it fails for abc(ab)(1)(ab).jpg: that name is changed to abc(ab)(2)(ab).jpg, which is not wanted. How can I verify that the bracketed number comes directly before the dot?
You may use
String example = "my attachements with some name (56).jpg";
Matcher m = Pattern.compile("(?<=\\()\\d+(?=\\)\\.)").matcher(example);
// Matcher.replaceAll(Function) requires Java 9+; r is the MatchResult of the current match
example = m.replaceAll(r -> String.valueOf(Integer.parseInt(r.group())+1) );
System.out.println( example );
// => my attachements with some name (57).jpg
See the Java demo. The regex used is
(?<=\()\d+(?=\)\.)
See the regex demo. It matches
(?<=\() - a location immediately preceded with (
\d+ - then consumes 1+ digits
(?=\)\.) - immediately followed with ). char sequence.
If you need to tell the regex to match the dot that is the last dot in the string (where it is most likely the extension delimiter) replace (?=\)\.) with (?=\)\.[^.]*$). See this regex demo.
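A minimal sketch of the last-dot variant described above, assuming Java 9 or later for Matcher.replaceAll with a function argument. The class name AttachmentCounter and the method increment are hypothetical.

```java
import java.util.regex.Pattern;

class AttachmentCounter {
    // Digits inside (...) that sit directly before the last dot in the string.
    private static final Pattern LAST_NUMBER =
            Pattern.compile("(?<=\\()\\d+(?=\\)\\.[^.]*$)");

    // Hypothetical helper: increments that number, or returns the name unchanged.
    static String increment(String fileName) {
        return LAST_NUMBER.matcher(fileName)
                .replaceAll(r -> String.valueOf(Integer.parseInt(r.group()) + 1));
    }

    public static void main(String[] args) {
        System.out.println(increment("my attachements with some name (56).jpg"));
        System.out.println(increment("abc(ab)(1)(ab).jpg")); // no number directly before the dot
        System.out.println(increment("a(1).b(2).jpg"));      // only the number before the last dot changes
    }
}
```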
You can use a lookahead regex for this:
"\\((\\d+)\\)(?=\\.)"
(?=\.) is a lookahead condition that asserts presence of dot right after closing )
RegEx Demo

Nested regexps and replace

I have strings like this <p0=v0 p1=v1 p2=v2 ....> and I want to swap pX with vX to have something like <v0=p0 v1=p1 v2=p2 ....> using regexps.
I want only pairs in <> to be swapped.
I wrote:
Pattern pattern = Pattern.compile("<(\\w*)=(\\w*)>");
Matcher matcher = pattern.matcher("<p1=v1>");
System.out.println(matcher.replaceAll("$2=$1"));
But it works only with a single pair pX=vX
Could someone explain to me how to write a regexp that works for multiple pairs?
Simple, use groups:
String input = "<p0=v0 p1=v1 p2=v2>";
// (p[0-9]) - group 1: matches "p" followed by one digit
// =        - ... followed by "="
// (v[0-9]) - group 2: ... followed by "v", followed by one digit
// "$2=$1"  - replaces the match with group 2, re-writes "=" in the middle, then group 1
System.out.println(input.replaceAll("(p[0-9])=(v[0-9])", "$2=$1"));
Output:
<v0=p0 v1=p1 v2=p2>
You can use this pattern:
"((?:<|\\G(?<!\\A))\\s*)(p[0-9]+)(\\s*=\\s*)(v[0-9]+)"
To ensure that the pairs come after an opening angle bracket, the pattern starts with:
(?:<|\\G(?<!\\A))
that means: an opening angle bracket OR the position at the end of the last match
\\G is an anchor for the position immediately after the last match, or the beginning of the string when there is no previous match (in other words, it is the last position of the regex engine in the string, which is zero at the start). To avoid a match at the start of the string I added a negative lookbehind (?<!\\A), meaning "not at the start of the string".
This trick forces each pair to be preceded by another pair or by a <.
example:
String subject = "p5=v5 <p0=v0 p1=v1 p2=v2 p3=v3> p4=v4";
String pattern = "((?:<|\\G(?<!\\A))\\s*)(p[0-9]+)(\\s*=\\s*)(v[0-9]+)";
String result = subject.replaceAll(pattern, "$1$4$3$2");
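To make the effect of the example visible, here is a small sketch that wraps the same pattern and replacement in a method and prints the result. The class name PairSwapDemo and the method swapAfterBracket are hypothetical. Only the contiguous pairs that follow the < are expected to be swapped.

```java
class PairSwapDemo {
    // Same pattern and replacement as in the example above.
    static String swapAfterBracket(String subject) {
        String pattern = "((?:<|\\G(?<!\\A))\\s*)(p[0-9]+)(\\s*=\\s*)(v[0-9]+)";
        // $1 = "<" or the gap after the previous pair, $2 = pX, $3 = "=", $4 = vX
        return subject.replaceAll(pattern, "$1$4$3$2");
    }

    public static void main(String[] args) {
        // p5=v5 (before the bracket) and p4=v4 (after the closing bracket) stay as they are
        System.out.println(swapAfterBracket("p5=v5 <p0=v0 p1=v1 p2=v2 p3=v3> p4=v4"));
    }
}
```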
If you need p and v to have the same number you can change it to:
String pattern = "((?:<|\\G(?<!\\A))\\s*)(p([0-9]+))(\\s*=\\s*)(v\\3)";
String result = subject.replaceAll(pattern, "$1$5$4$2");
If parts between angle brackets can contain other things (that are not pairs):
String pattern = "((?:<|\\G(?<!\\A))(?:[^\\s>]+\\s*)*?\\s*)(p([0-9]+))(\\s*=\\s*)(v\\3)";
String result = subject.replaceAll(pattern, "$1$5$4$2");
Note: all these patterns only check for an opening angle bracket; they do not check for a closing one. If a closing angle bracket is missing, pairs are replaced until there are no more contiguous pairs (first two patterns), or until the next closing angle bracket or the end of the string (third pattern).
You can check for a closing angle bracket by adding (?=[^<>]*>) at the end of each pattern. However, adding this makes the pattern much slower. It is better to find the parts between angle brackets with (?<=<)[^<>]++(?=>) and to replace the pairs in a callback function. You can take a look at this post to implement it.
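A minimal sketch of the callback idea, assuming Java 9 or later, where Matcher.replaceAll accepts a function. The outer pattern is the one quoted above; the inner pair pattern, the class name BracketedPairSwap and the method swap are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketedPairSwap {
    // Text between < and >; a match requires the closing bracket to be present.
    private static final Pattern INSIDE = Pattern.compile("(?<=<)[^<>]++(?=>)");
    // A single pX=vX pair (assumed shape, following the patterns above).
    private static final Pattern PAIR = Pattern.compile("(p[0-9]+)(\\s*=\\s*)(v[0-9]+)");

    static String swap(String subject) {
        return INSIDE.matcher(subject).replaceAll(r ->
                // quoteReplacement keeps any $ or \ in the swapped text literal
                Matcher.quoteReplacement(PAIR.matcher(r.group()).replaceAll("$3$2$1")));
    }

    public static void main(String[] args) {
        System.out.println(swap("p5=v5 <p0=v0 p1=v1 p2=v2 p3=v3> p4=v4"));
        System.out.println(swap("<p0=v0 p1=v1")); // no closing bracket, so nothing is swapped
    }
}
```

Because the inner replacement only ever sees the text between one pair of brackets, it needs no \G anchor and no lookahead for the closing bracket.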
Replacing everything between < and > (let's call it a tag) with a single regex is, in my opinion, not possible if the same pattern can occur outside the tag.
Instead of replacing everything at once, I'd go for two regexes:
String str = "<p1=v1 p2=v2> p3=v3 <p4=v4>";
Pattern insideTag = Pattern.compile("<(.+?)>");
Matcher m = insideTag.matcher(str);
while(m.find()) {
str = str.replace(m.group(1), m.group(1).replaceAll("(\\w*)=(\\w*)", "$2=$1"));
}
System.out.println(str);
//prints: <v1=p1 v2=p2> p3=v3 <v4=p4>
The matcher grabs everything between < and > and for each match it replaces the content of the first capturing group with the swapped one on the original string, but only if it matches (\w*)=(\w*), of course.
Trying it with
<p1=v1 p2=v2 just some trash> p3=v3 <p4=v4>
gives the output
<v1=p1 v2=p2 just some trash> p3=v3 <v4=p4>
This should work to swap only those pairs between < and >:
String string = "<p0=v0 p1=v1 p2=v2> a=b c=d xyz=abc <foo=bar baz=bat>";
Pattern pattern1 = Pattern.compile("<[^>]+>");
Pattern pattern2 = Pattern.compile("(\\w+)=(\\w+)");
Matcher matcher1 = pattern1.matcher(string);
StringBuffer sbuf = new StringBuffer();
while (matcher1.find()) {
Matcher matcher2 = pattern2.matcher(matcher1.group());
matcher1.appendReplacement(sbuf, matcher2.replaceAll("$2=$1"));
}
matcher1.appendTail(sbuf);
System.out.println(sbuf);
OUTPUT:
<v0=p0 v1=p1 v2=p2> a=b c=d xyz=abc <bar=foo bat=baz>
If Java can do the \G anchor, this will work for unnested <>'s
Find: ((?:(?!\A|<)\G|<)[^<>]*?)(\w+)=(\w+)(?=[^<>]*?>)
Replace (globally): $1$3=$2
Regex explained
( # (1 start)
(?:
(?! \A | < )
\G # Start at last match
|
< # Or, <
)
[^<>]*?
) # (1 end)
( \w+ ) # (2)
=
( \w+ ) # (3)
(?= [^<>]*? > ) # There must be a closing > ahead
Perl test case
$/ = undef;
$str = <DATA>;
$str =~ s/((?:(?!\A|<)\G|<)[^<>]*?)(\w+)=(\w+)(?=[^<>]*?>)/$1$3=$2/g;
print $str;
__DATA__
<p0=v0 p1=v1 p2=v2 ....>
Output >>
<v0=p0 v1=p1 v2=p2 ....>

split string for returning only the latter part

I have a string like this:
abc:def,ghi,jkl;mno:pqr,stu;vwx:yza,aaa,bbb;
I want to split first on ; and then on :
Finally the output should be only the latter part around : i.e. my output should be
def, ghi, jkl, pqr, stu, yza,aaa,bbb
This can be done using split twice, i.e. once on ; and then on :, followed by a pattern match to find just the right part next to the :. However, is there a better, more optimized solution to achieve this?
So basically you want to fetch the content between ; and :, with : on the left and ; on the right.
You can use this regex: -
"(?<=:)(.*?)(?=;)"
This contains a look-behind for : and a look-ahead for ;. And matches the string preceded by a colon(:) and followed by a semi-colon (;).
Regex Explanation: -
(?<= // Look behind assertion.
: // Check for preceding colon (:)
)
( // Capture group 1
. // Any character except newline
* // repeated 0 or more times
? // Reluctant matching. (Match shortest possible string)
)
(?= // Look ahead assertion
; // Check for string followed by `semi-colon (;)`
)
Here's the working code: -
String str = "abc:def,ghi,jkl;mno:pqr,stu;vwx:yza,aaa,bbb;";
Matcher matcher = Pattern.compile("(?<=:)(.*?)(?=;)").matcher(str);
StringBuilder builder = new StringBuilder();
while (matcher.find()) {
builder.append(matcher.group(1)).append(", ");
}
System.out.println(builder.substring(0, builder.lastIndexOf(",")));
OUTPUT: -
def,ghi,jkl, pqr,stu, yza,aaa,bbb
String[] tabS="abc:def,ghi,jkl;mno:pqr,stu;vwx:yza,aaa,bbb;".split(";");
StringBuilder sb = new StringBuilder();
Pattern patt = Pattern.compile("(.*:)(.*)");
String sep = ",";
for (String string : tabS) {
sb.append(patt.matcher(string).replaceAll("$2 ")); // ' ' after $2 == ';' replaced
sb.append(sep);
}
System.out.println(sb.substring(0,sb.lastIndexOf(sep)));
output
def,ghi,jkl ,pqr,stu ,yza,aaa,bbb
Don't pattern match unless you have to in Java; if you can't have the ':' character in the field name (abc in your example), you can use indexOf(":") to figure out the "right part".
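A minimal sketch of the indexOf idea, assuming (as stated above) that the field name never contains a ':'. The class name RightParts and the method rightParts are hypothetical; split(";") is kept only to separate the records.

```java
import java.util.ArrayList;
import java.util.List;

class RightParts {
    // Hypothetical helper: keeps what follows the first ':' of every ';'-separated record.
    static String rightParts(String input) {
        List<String> parts = new ArrayList<>();
        for (String record : input.split(";")) { // trailing empty records are dropped by split
            int colon = record.indexOf(':');
            if (colon >= 0) {                    // skip records that have no ':'
                parts.add(record.substring(colon + 1));
            }
        }
        return String.join(", ", parts);
    }

    public static void main(String[] args) {
        System.out.println(rightParts("abc:def,ghi,jkl;mno:pqr,stu;vwx:yza,aaa,bbb;"));
    }
}
```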

Why isn't this lookahead assertion working in Java?

I come from a Perl background and am used to doing something like the following to match leading digits in a string and perform an in-place increment by one:
my $string = '0_Beginning';
$string =~ s|^(\d+)(?=_.*)|$1+1|e;
print $string; # '1_Beginning'
With my limited knowledge of Java, things aren't so succinct:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
String digit = string.replaceFirst( p.toString(), "$1" ); // To get the digit
Integer oneMore = Integer.parseInt( digit ) + 1; // Evaluate ++digit
string.replaceFirst( p.toString(), oneMore.toString() ); //
The regex doesn't match here... but it did in Perl.
What am I doing wrong here?
Actually it matches. You can find out by printing
System.out.println(p.matcher(string).find());
The issue is with line
String digit = string.replaceFirst( p.toString(), "$1" );
which is actually a do-nothing, because it replaces the first group (which is all you match, the lookahead is not part of the match) with the content of the first group.
You can get the desired result (namely the digit) via the following code
Matcher m = p.matcher(string);
String digit = m.find() ? m.group(1) : "";
Note: you should check the result of m.find() anyway, in case nothing matches. Without a match, digit is the empty string and parseInt throws a NumberFormatException. Thus the full code looks something like
Pattern p = Pattern.compile("^(\\d+)(?=_.*)");
String string = "0_Beginning";
Matcher m = p.matcher(string);
if (m.find()) {
String digit = m.group(1);
Integer oneMore = Integer.parseInt(digit) + 1;
string = m.replaceAll(oneMore.toString());
System.out.println(string);
} else {
System.out.println("No match");
}
Let's see what you are doing here.
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
You declare and initialize String and pattern objects.
String digit = string.replaceFirst( p.toString(), "$1" ); // To get the digit
(You are converting the pattern back into a string, and replaceFirst creates a new Pattern from this. Is this intentional?)
As Howard says, this replaces the first match of the pattern in the string with the contents of the first group, and the match of the pattern is just 0 here, as the first group. Thus digit is equal to string, ...
Integer oneMore = Integer.parseInt( digit ) + 1; // Evaluate ++digit
... and your parsing fails here.
string.replaceFirst( p.toString(), oneMore.toString() ); //
This would work (but convert the pattern again to string and back to pattern).
Here is how I would do this:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
Matcher matcher = p.matcher(string);
StringBuffer result = new StringBuffer();
while(matcher.find()) {
int number = Integer.parseInt(matcher.group());
matcher.appendReplacement(result, String.valueOf(number + 1));
}
matcher.appendTail(result);
return result.toString(); // 1_Beginning
(Of course, for your regex the loop will only execute once, since the regex is anchored.)
Edit: To clarify my statement about string.replaceFirst:
This method does not return a pattern, but uses one internally. From the documentation:
Replaces the first substring of this string that matches the given regular expression with the given replacement.
An invocation of this method of the form str.replaceFirst(regex, repl) yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceFirst(repl)
Here we see that a new pattern is compiled from the first argument.
This also shows us another way to do what you did want to do:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
Matcher m = p.matcher(string);
if(m.find()) {
String digit = m.group();
int oneMore = Integer.parseInt( digit ) + 1;
return m.replaceFirst(String.valueOf(oneMore));
}
This only compiles the pattern once, instead of thrice like in your original program - but still does the matching twice (once for find, once for replaceFirst), instead of once like in my program.
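For completeness, a sketch that compiles the pattern once and matches only once, rebuilding the string from the match offsets instead of calling replaceFirst. The class name LeadingNumber and the method increment are hypothetical, and the sketch assumes the leading number fits into an int.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingNumber {
    // Same pattern as in the question, compiled once.
    private static final Pattern LEADING = Pattern.compile("^(\\d+)(?=_.*)");

    // Hypothetical helper: increments the leading number, or returns s unchanged.
    static String increment(String s) {
        Matcher m = LEADING.matcher(s);
        if (!m.find()) {
            return s; // no leading digits followed by '_'
        }
        int oneMore = Integer.parseInt(m.group(1)) + 1;
        // The lookahead is not part of the match, so m.end() points at the '_'.
        return s.substring(0, m.start()) + oneMore + s.substring(m.end());
    }

    public static void main(String[] args) {
        System.out.println(increment("0_Beginning"));
        System.out.println(increment("9_Chapter"));
    }
}
```

Note that parseInt discards leading zeros, so an input such as 007_x would become 8_x with this approach.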
