How to extract a part of a string using regex - Java

I am trying to extract the last three fields, i.e. 05,06,07. However, my regex works the other way around and extracts the first three fields. Can someone please help me fix the mistake in my code?
Pattern p = Pattern.compile("^((?:[^,]+,){2}(?:[^,]+)).+$");
String line = "CgIn,f,CgIn.util:srv2,1,11.65,42,42,42,42,04,05,06,07";
Matcher m = p.matcher(line);
String result;
if (m.matches()) {
result = m.group(1);
}
System.out.println(result);
My current output:
CgIn,f,CgIn.util:srv2
Expected output:
05,06,07

You can fix it as follows:
Pattern p = Pattern.compile("[^,]*(?:,[^,]*){2}$");
String line = "CgIn,f,CgIn.util:srv2,1,11.65,42,42,42,42,04,05,06,07";
Matcher m = p.matcher(line);
String result = "";
if (m.find()) {
result = m.group(0);
}
System.out.println(result);
The regex is
[^,]*(?:,[^,]*){2}$
Pattern details
[^,]* - 0+ chars other than ,
(?:,[^,]*){2} - 2 repetitions of:
    , - a comma
    [^,]* - 0+ chars other than ,
$ - end of string.
Note that you should use Matcher#find() with this regex to find a partial match; Matcher#matches() succeeds only if the entire string matches the pattern.
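If the number of trailing fields varies, the same pattern can be built from a count. The following is a minimal sketch rather than part of the answer above: the class name LastFields and the method lastFields are hypothetical, and it assumes n is at least 1.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastFields {
    // Hypothetical helper: returns the last n comma-separated fields of line,
    // or an empty string if the line has fewer than n fields. Assumes n >= 1.
    static String lastFields(String line, int n) {
        // One field, then (n - 1) repetitions of ",field", anchored at the end.
        Pattern p = Pattern.compile("[^,]*(?:,[^,]*){" + (n - 1) + "}$");
        Matcher m = p.matcher(line);
        // find() looks for a partial match; matches() would need the whole line to fit.
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String line = "CgIn,f,CgIn.util:srv2,1,11.65,42,42,42,42,04,05,06,07";
        System.out.println(lastFields(line, 3)); // 05,06,07
    }
}
```

Returning an empty string for short input is just one possible choice; returning null or throwing would work equally well, depending on the caller.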

Regular Expression in Java. Splitting a string using pattern and matcher

I am trying to get all the matching groups in my string.
My regular expression is "(?<!')/|/(?!')". I am trying to split the string using a Pattern and a Matcher. The string needs to be split on /, but a / surrounded by single quotes ('/') must be skipped. For example, "One/Two/Three'/'3/Four" should be split into ["One", "Two", "Three'/'3", "Four"], but without using the .split method.
I am currently using the code below:
// String to be scanned to find the pattern.
String line = "Test1/Test2/Tt";
String pattern = "(?<!')/|/(?!')";
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.matches()) {
System.out.println("Found value: " + m.group(0) );
} else {
System.out.println("NO MATCH");
}
But it always prints "NO MATCH". What am I doing wrong, and how can I fix it?
Thanks in advance

To get the matches without using split, you might use
[^'/]+(?:'/'[^'/]*)*
Explanation
[^'/]+ Match 1+ times any char except ' or /
(?: Non capture group
'/'[^'/]* Match '/' followed by optionally matching any char except ' or /
)* Close group and optionally repeat it
String regex = "[^'/]+(?:'/'[^'/]*)*";
String string = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Output
One
Two
Three'/'3
Four
Edit
If you do not want to use split, you can also use a pattern that steps over a / only when it is surrounded by single quotes:
[^/]+(?:(?<=')/(?=')[^/]*)*
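Both patterns from this answer can be checked side by side. The sketch below is only an illustration: the class SlashTokens and the helper tokens are hypothetical names, not part of the original answer. It also shows why the question's code printed "NO MATCH": matches() has to match the whole input, while find() walks through the input match by match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlashTokens {
    // Hypothetical helper: returns every match of regex in input, in order.
    static List<String> tokens(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {          // find() advances to the next partial match
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "One/Two/Three'/'3/Four";
        // Both lines print [One, Two, Three'/'3, Four]
        System.out.println(tokens("[^'/]+(?:'/'[^'/]*)*", input));
        System.out.println(tokens("[^/]+(?:(?<=')/(?=')[^/]*)*", input));
    }
}
```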

Try this.
String line = "One/Two/Three'/'3/Four";
Pattern pattern = Pattern.compile("('/'|[^/])+");
Matcher m = pattern.matcher(line);
while (m.find())
System.out.println(m.group());
output:
One
Two
Three'/'3
Four

Here is a simple pattern that matches all the desired /, so you can split on them:
(?<=[^'])\/(?=')|(?<=')\/(?=[^'])|(?<=[^'])\/(?=[^'])
The logic is as follows. There are 4 cases:
1. / is surrounded by ', i.e. '/'
2. / is preceded by ', i.e. '/
3. / is followed by ', i.e. /'
4. / is surrounded by characters other than '
You only want to exclude case 1, so we need a regex for the other three cases. I have written three similar regexes and joined them with alternation.
Explanation of the first part (the other two are analogous):
(?<=[^']) - positive lookbehind, asserts that what precedes is different from ' (negated character class [^'])
\/ - matches / literally
(?=') - positive lookahead, asserts that what follows is '
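As a rough sketch of how this separator pattern could be used from Java, assuming splitting through Pattern#split is acceptable (the question itself asked to avoid split): the class QuoteAwareSplit and its members are hypothetical names. In a Java regex the / does not need a backslash, so it is left unescaped here.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteAwareSplit {
    // The three alternatives from the answer; '/' needs no escaping in a Java regex.
    static final Pattern SEPARATOR = Pattern.compile(
            "(?<=[^'])/(?=')|(?<=')/(?=[^'])|(?<=[^'])/(?=[^'])");

    // Hypothetical helper: splits on every '/' that is not enclosed as '/'.
    static String[] split(String input) {
        return SEPARATOR.split(input);
    }

    public static void main(String[] args) {
        // Prints [One, Two, Three'/'3, Four]
        System.out.println(Arrays.toString(split("One/Two/Three'/'3/Four")));
    }
}
```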

Try something like this:
String line = "One/Two/Three'/'3/Four";
String pattern = "([^/]+'/'\\d)|[^/]+";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
boolean found = false;
while(m.find()) {
System.out.println("Found value: " + m.group() );
found = true;
}
if(!found) {
System.out.println("NO MATCH");
}
Output:
Found value: One
Found value: Two
Found value: Three'/'3
Found value: Four

How to replace character in the string using regex in java?

I want to replace every x that is at the end of a line or string and follows any letter except a, i, u, e, o with nya.
Expected input and output:
Input: bapakx
Output: bapaknya
I've tried this one:
String myString = "bapakx";
String regex = "[^aiueo]x(\\s|$)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(myString);
if(m.find()){
myString = m.replaceAll("nya");
}
But the output is bapanya instead of bapaknya: the k character is also replaced. How can I solve this?

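To keep the consonant, use a zero-width lookbehind in your regex:
String regex = "(?<=[^aiueo])x(?=\\s|$)";
Here (?<=[^aiueo]) only asserts that the character before x is not one of a, i, u, e, o; it does not consume that character, so it is not replaced. The lookahead (?=\\s|$) likewise leaves any trailing whitespace in place.
Alternatively, you can use capture groups:
String regex = "([^aiueo])x(\\s|$)";
and put the captured text back in the replacement ($2 restores the whitespace consumed by the second group):
myString = m.replaceAll("$1nya$2");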
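The two approaches can be compared with String#replaceAll. This is a minimal sketch: the class NyaSuffix and its method names are made up for illustration, and the capture-group variant includes $2 so that whitespace after the x is kept.

```java
class NyaSuffix {
    // Lookaround variant: only the x itself is matched, so only it is replaced.
    static String withLookarounds(String s) {
        return s.replaceAll("(?<=[^aiueo])x(?=\\s|$)", "nya");
    }

    // Capture-group variant: the neighbours are matched too, so they are put
    // back with $1 (character before x) and $2 (whitespace or end after x).
    static String withGroups(String s) {
        return s.replaceAll("([^aiueo])x(\\s|$)", "$1nya$2");
    }

    public static void main(String[] args) {
        System.out.println(withLookarounds("bapakx"));  // bapaknya
        System.out.println(withGroups("bapakx dan"));   // bapaknya dan
        System.out.println(withLookarounds("sapix"));   // sapix (x follows a vowel)
    }
}
```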

How to replace multiple consecutive occurrences of a character with a maximum allowed number of occurrences?

CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
String replace = "-";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
boolean isMatch = matcher.find();
StringBuffer buffer = new StringBuffer();
for (int i = 0; i < content.length(); i++) {
while (matcher.find()) {
matcher.appendReplacement(buffer, replace);
}
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
In the above code, content is the input string.
I am trying to find repeated occurrences of a character in the string and replace them with the maximum allowed number of occurrences.
For example:
input - ("abaaadccc", 2)
output - "abaadcc"
Here aaa and ccc are replaced by aa and cc, as the maximum allowed repetition is 2.
In the above code I found such occurrences and tried replacing them with -, and that works. But how can I get the current character and replace the run with the allowed number of occurrences?
I.e. if aaa is found, it is replaced by aa.
Or is there an alternative method without using regex?

You can wrap the allowed repetition in an outer group and use that group as the replacement:
String result = "aaabbbccaaa".replaceAll("(([a-zA-Z])\\2)\\2+", "$1");
Here's how it works:
( start of the first group - a character repeated two times
([a-zA-Z]) second group - a character
\2 the same character once more
) end of the first group
\2+ the same character repeated at least once more
Thus, the first group captures the replacement string.
It isn't hard to extrapolate this solution for a different maximum value of allowed repeats:
String input = "aaaaabbcccccaaa";
int maxRepeats = 4;
String pattern = String.format("(([a-zA-Z])\\2{%s})\\2+", maxRepeats-1);
String result = input.replaceAll(pattern, "$1");
System.out.println(result); //aaaabbccccaaa

Since you defined a group in your regex, you can get the matching characters of this group by calling matcher.group(1). In your case it contains the first character from the repeating group so by appending it twice you get your expected result.
CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
System.out.println("found : "+matcher.start()+","+matcher.end()+":"+matcher.group(1));
matcher.appendReplacement(buffer, matcher.group(1)+matcher.group(1));
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
Output:
found : 0,3:a
found : 3,6:b
found : 8,11:a
aabbccaa
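The question also asks whether this can be done without regex. One possible approach is to count the length of the current run while copying characters. This is a sketch under stated assumptions: RunLimiter and limitRepeats are hypothetical names, and unlike the [a-zA-Z] patterns above it limits runs of any character, not only letters.

```java
class RunLimiter {
    // Hypothetical helper: copies input, keeping at most maxRepeats
    // consecutive copies of any character (not only letters).
    static String limitRepeats(String input, int maxRepeats) {
        StringBuilder out = new StringBuilder();
        int run = 0; // length of the current run of identical characters
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            // Extend the run if c equals the previous character, else restart it.
            run = (i > 0 && c == input.charAt(i - 1)) ? run + 1 : 1;
            if (run <= maxRepeats) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(limitRepeats("abaaadccc", 2));   // abaadcc
        System.out.println(limitRepeats("aaabbbccaaa", 2)); // aabbccaa
    }
}
```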

How to get a dollar sign in Java regex

I have been looking through this: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
However, I still have difficulty writing the right expression to get all the tokens following this pattern:
<$FB $TWTR are getting plummetted>
(The <> just signal the beginning and end of the sentence, actually a tweet, as I am parsing Twitter.) I want to extract FB and TWTR.
Any help is much appreciated.

Here is a 2-step approach: we extract <...> groups with a regex and then split the chunks into words and see if they start with $.
String s = "<$FB $TWTR are getting plummetted>";
Pattern pattern = Pattern.compile("<([^>]+)>");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
String[] chks = matcher.group(1).split(" ");
for (int i = 0; i<chks.length; i++)
{
if (chks[i].startsWith("$"))
System.out.println(chks[i].substring(1));
}
}
And here is a 1-regex approach; use it only if you feel confident with regex:
String s = "<$FB $TWTR are getting plummetted>";
Pattern pattern = Pattern.compile("(?:<|(?!^)\\G)[^>]*?\\$([A-Z]+)");
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
System.out.println(matcher.group(1));
}
The regex used here is (?:<|(?!^)\G)[^>]*?\$([A-Z]+).
It matches:
(?:<|(?!^)\G) - either a literal < or the end of the previous successful match (\G), but not the start of the string
[^>]*? - 0 or more characters other than > (as few as possible)
\$ - literal $
([A-Z]+) - match and capture uppercase letters (replace with what best suits your purpose, perhaps \\w).
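If it does not matter whether a symbol sits inside <...>, a shorter pattern may be enough. The following sketch rests on that assumption, which the question does not confirm; Cashtags and symbols are hypothetical names. The key point is the escaping: an unescaped $ is the end-of-input anchor in a regex, so the literal dollar sign is written \$ in the pattern, which becomes "\\$" in a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Cashtags {
    // "\\$" in the Java literal is the regex \$, i.e. a literal dollar sign.
    private static final Pattern SYMBOL = Pattern.compile("\\$([A-Z]+)");

    // Hypothetical helper: returns the uppercase words that directly follow a '$'.
    // Assumption: symbols outside <...> should be collected as well.
    static List<String> symbols(String tweet) {
        List<String> result = new ArrayList<>();
        Matcher m = SYMBOL.matcher(tweet);
        while (m.find()) {
            result.add(m.group(1)); // group 1 excludes the '$' itself
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [FB, TWTR]
        System.out.println(symbols("<$FB $TWTR are getting plummetted>"));
    }
}
```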

Java Regex group matches spaces

I have this regex, and my output seems to match at every single space, even though the character class only contains alphabetic characters. I must be missing something.
String regexstring = new String("1234567 Mike Peloso ");
Pattern pattern = Pattern.compile("[A-Za-z]*");
Matcher matcher = pattern.matcher(regexstring);
while(matcher.find())
{
System.out.println(Integer.toString(matcher.start()));
String someNumberStr = matcher.group();
System.out.println(someNumberStr);
}

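There is no capturing group in your pattern. You need to use the + quantifier (meaning 1 or more times). The * quantifier matches the preceding element zero or more times, so the pattern also matches the empty string at every position that does not start with a letter, which floods the output with empty matches.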
Pattern pattern = Pattern.compile("[A-Za-z]+");
And then print the match result:
while (matcher.find()) {
System.out.println(matcher.start());
System.out.println(matcher.group());
}
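The difference between the two quantifiers is easy to see by collecting the matches of each pattern. A minimal sketch follows; the class QuantifierDemo and the helper allMatches are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: collects every match of regex in input, including empty ones.
    static List<String> allMatches(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "1234567 Mike Peloso ";
        // With *, zero-length matches are reported wherever no letter starts.
        System.out.println(allMatches("[A-Za-z]*", input));
        // With +, only the two words remain: [Mike, Peloso]
        System.out.println(allMatches("[A-Za-z]+", input));
    }
}
```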
