Java Regex / blank spaces issue

String regex = "(\\s*T\\s*R\\s*A\\s*)*";
Pattern p = Pattern.compile(regex);
I'm trying to match "TRA", "T R A", and similar variants with any amount of whitespace between the letters. It works fine for the first case, with no spaces, but anything with spaces is simply ignored. I'm not sure what I'm doing wrong.
EDIT
Essentially, I'm trying to match all occurrences of TRA, whether or not there are an arbitrary number of spaces between each letter (or occurrence).
For example: "TRATTR A T RA T RA" has 4 occurrences, and I want to match them all with one regex.

You should use:
String regex = "(\\s*T\\s*R\\s*A\\s*)";
instead of:
String regex = "(\\s*T\\s*R\\s*A\\s*)*";
Your regex matches zero or more occurrences of the group, so it can also match the empty string, whereas per your question you are trying to match it once.
Update: To match multiple occurrences use code like this:
String regex = "(\\s*T\\s*R\\s*A\\s*)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher("T R A T R A T R A");
while (m.find())
System.out.printf("name=[%s]%n", m.group(1));
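As a quick check against the string from the question, the find() loop above can be wrapped in a small counting helper. This is only a sketch: the class and method names (TraCounter, countTra) are made up for illustration, and the pattern drops the leading and trailing \\s* because they are not needed just to count occurrences.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TraCounter {
    // T, R, A with any amount of whitespace between the letters
    private static final Pattern TRA = Pattern.compile("T\\s*R\\s*A");

    // Counts non-overlapping occurrences by calling find() repeatedly
    static int countTra(String input) {
        Matcher m = TRA.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countTra("TRATTR A T RA T RA")); // prints 4
    }
}
```

Each call to find() resumes where the previous match ended, which is why no explicit "global" flag is needed.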

For your goal, the correct regex would be (\\s*T\\s*R\\s*A\\s*)+, as it requires at least one occurrence of the TRA group and won't match the empty string.
Example:
String regex = "(\\s*T\\s*R\\s*A\\s*)+";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher("S T R A T R A T R A N G E");
if (m.find()) {
System.out.println(m.group());
} else {
System.out.println("No match");
}
Output:
T R A T R A T R A

This works for me:
String regex = "(\\s*T\\s*R\\s*A\\s*)";

Related

No match for Java Regular Expression

I am running into an issue where my code is unable to find regex occurrences. Code:
String content = "This\\ is\\ an\\ example.=This is an example\nThis\\ is\\ second\\:=This is second";
String regex = "\"^.*(?=\\=)\"gm";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(content);
List<String> mKeys = new ArrayList<>();
while (m.find()) {
mKeys.add(m.group());
}
mKeys turns out to be empty. I have already validated my regex here https://regex101.com/r/YResRc/3. I am expecting the list to contain two keys from the content.
Your content contains no " quotes, and no text gm, so why would you expect that regex to match?
FYI: Syntaxes like "foo"gm or /foo/gm are something other languages do for regex literals. Java doesn't do that.
The g flag is implied by the fact that you're using a find() loop. The m flag is MULTILINE, which affects ^ and $. You can specify it with the embedded (?m) flag or by passing a second parameter to compile(), i.e. one of these ways:
Pattern p = Pattern.compile("foo", Pattern.MULTILINE);
Pattern p = Pattern.compile("(?m)foo");
Your regex should simply be:
(?m)^.*(?==)
which means: Match everything from the beginning of a line up to the last = sign on the line.
Test
String content = "This is an example.=This is an example\nThis is second:=This is second";
String regex = "(?m)^.*(?==)";
Matcher m = Pattern.compile(regex).matcher(content);
List<String> mKeys = new ArrayList<>();
while (m.find()) {
mKeys.add(m.group());
}
System.out.println(mKeys);
Output
[This is an example., This is second:]
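To illustrate the "up to the last = sign" behaviour: .* is greedy, so a line whose value also contains = yields a longer key than you might expect. The sketch below contrasts it with a reluctant .*?, which stops at the first =. The class name, method name, and the sample content are assumptions made for illustration; they are not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyExtractor {
    // Collects every match of the given regex in the content
    static List<String> keys(String regex, String content) {
        Matcher m = Pattern.compile(regex).matcher(content);
        List<String> result = new ArrayList<>();
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String content = "a=b=c\nkey=value";
        System.out.println(keys("(?m)^.*(?==)", content));  // greedy: [a=b, key]
        System.out.println(keys("(?m)^.*?(?==)", content)); // reluctant: [a, key]
    }
}
```

Which variant is appropriate depends on whether = may legitimately appear inside a key or inside a value.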

how to extract a part of string using regex

I am trying to extract the last three strings, i.e. 05,06,07. However, my regex works the other way around and extracts the first three strings. Can someone please help me fix the mistake in my code?
Pattern p = Pattern.compile("^((?:[^,]+,){2}(?:[^,]+)).+$");
String line = "CgIn,f,CgIn.util:srv2,1,11.65,42,42,42,42,04,05,06,07";
Matcher m = p.matcher(line);
String result;
if (m.matches()) {
result = m.group(1);
}
System.out.println(result);
My current output:
CgIn,f,CgIn.util:srv2
Expected output:
05,06,07
You may fix it as follows:
Pattern p = Pattern.compile("[^,]*(?:,[^,]*){2}$");
String line = "CgIn,f,CgIn.util:srv2,1,11.65,42,42,42,42,04,05,06,07";
Matcher m = p.matcher(line);
String result = "";
if (m.find()) {
result = m.group(0);
}
System.out.println(result);
The regex is
[^,]*(?:,[^,]*){2}$
Pattern details
[^,]* - 0+ chars other than ,
(?:,[^,]*){2} - 2 repetitions of:
    , - a comma
    [^,]* - 0+ chars other than ,
$ - end of string.
Note that you should use Matcher#find() with this regex to find a partial match.
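The difference between the two Matcher methods can be seen in a short sketch. matches() requires the whole input to fit the pattern, while find() looks for the pattern anywhere in the input; the $ anchor then pins the match to the end. The class and method names here are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastThreeFields {
    private static final Pattern LAST_THREE = Pattern.compile("[^,]*(?:,[^,]*){2}$");

    // Returns the last three comma-separated fields, or "" if there are fewer than three
    static String lastThree(String line) {
        Matcher m = LAST_THREE.matcher(line);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String line = "CgIn,f,CgIn.util:srv2,1,11.65,42,42,42,42,04,05,06,07";
        // matches() fails because the pattern describes only three fields, not all thirteen
        System.out.println(LAST_THREE.matcher(line).matches()); // prints false
        System.out.println(lastThree(line));                    // prints 05,06,07
    }
}
```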

Get matched pattern value in java regex

I'm doing some transliteration with Java and everything works great, but it would be nice to know which parts of the pattern actually matched. Is it possible?
For example:
For the surname GULEVSKAIA I generate this pattern:
(^g+(yu|u|y)l+(io|e|ye|yo|jo|ye)(v|b|w)+(s|c)+(k|c)+a(ya|ia|ja|a|y)(a)*)
Can I somehow get the information about what actually matched:
g
u
l
e
...
etc
As you can see, sometimes it is NOT one letter.
You can achieve this as follows: once the pattern has matched, retrieve the matched string using the group() method of the Matcher class, passing 0 as the argument. Then convert that string to a char array and print the characters, as below:
String line = "gulevskaia";
String pattern = "(^g+(yu|u|y)l+(io|e|ye|yo|jo|ye)(v|b|w)+(s|c)+(k|c)+a(ya|ia|ja|a|y)(a)*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Found value: " + m.group(0));
char[] chars = m.group(0).toCharArray();
for (int i = 0; i < chars.length; i++)
System.out.println(chars[i]);
}
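If the goal is to see which alternative each part of the pattern picked, rather than single characters, one option is to read the capturing groups instead of splitting group(0). The sketch below assumes the same pattern and input as above; the class and method names are hypothetical. Two caveats: the g+ and l+ parts are not inside their own groups, so they do not show up, and a group that is repeated or skipped (such as the trailing (a)*) reports only its last iteration, or null if it never took part in the match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDump {
    private static final Pattern SURNAME = Pattern.compile(
        "(^g+(yu|u|y)l+(io|e|ye|yo|jo|ye)(v|b|w)+(s|c)+(k|c)+a(ya|ia|ja|a|y)(a)*)");

    // Lists what each inner capturing group matched, in order
    static List<String> groups(String input) {
        Matcher m = SURNAME.matcher(input);
        List<String> parts = new ArrayList<>();
        if (m.find()) {
            // Group 1 wraps the whole pattern here, so start at group 2
            for (int i = 2; i <= m.groupCount(); i++) {
                parts.add(m.group(i)); // null if the group did not participate
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(groups("gulevskaia")); // prints [u, e, v, s, k, ia, null]
    }
}
```

Wrapping g+ and l+ in their own parentheses would make those parts visible too, at the cost of shifting the group numbers.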

replace special characters in java

I have an Arabic string from which I need to remove all special characters, the Latin alphabet, punctuation, e.g. (, . ;), and Arabic punctuation, e.g. (َ ً ُ ِ). I have written the following code:
String input = "some text";
Pattern p = Pattern.compile("[\\p{P}\\w]");
java.util.regex.Matcher m = p.matcher(input);
while (m.find()) {
}
m.reset();
input = m.replaceAll(" ");
p = Pattern.compile("[\\p{Mn}\\p{Nd}\\p{InLatin-1Supplement}]+");
m = p.matcher(input);
while (m.find()) {
}
m.reset();
input = m.replaceAll("");
It worked well for almost all characters, but I still have problems removing or replacing these: ($ ^ + < > |). I don't want to remove each one separately by repeating the replaceAll statement. I even tried
Pattern p = Pattern.compile("[^\\p{L}\\p{Nd}]+");
but I still kept finding those in the resulting text ($ ^ + < > |). Is there any way to do it?
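One thing worth checking, offered as a sketch rather than a confirmed fix: in Unicode, $ ^ + < > | are classified as symbols (category S), not punctuation, so \\p{P} does not cover them, while \\p{S} does. The example below assumes that is the cause. The class name, method name, and sample input are made up for illustration, and \\p{IsLatin} (Latin script) is swapped in here for the \\w and Latin-1 Supplement classes used in the question.

```java
class ArabicCleaner {
    static String clean(String input) {
        // Drop non-spacing marks (e.g. Arabic diacritics) without leaving a gap
        String noMarks = input.replaceAll("\\p{Mn}+", "");
        // Replace runs of punctuation, symbols, decimal digits and Latin letters with one space
        return noMarks.replaceAll("[\\p{P}\\p{S}\\p{Nd}\\p{IsLatin}]+", " ");
    }

    public static void main(String[] args) {
        // Arabic word with one diacritic, followed by symbols, Latin letters and punctuation
        String input = "\u0633\u064E\u0644\u0627\u0645$^+<>|abc,.;";
        System.out.println("[" + clean(input) + "]");
    }
}
```

Removing the marks first and replacing everything else with a space afterwards keeps Arabic words from being split where a diacritic used to be.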

A sample regular expression

I have a sample content string, repeated in a file, from which I want to retrieve a double value. The string content is "(AIC)|234.654 |" and I want to retrieve 234.654 from it. The "(AIC)|" part is always fixed, but the number changes from one occurrence to the next, so I am using a regular expression as follows. However, it reports no match with the expression below. Any help would be appreciated.
String contents="(AIC)|234.654 |";
Pattern p = Pattern.compile("AIC\\u0029{1}\\u007C{1}\\d+u002E{1}\\d+");
Matcher m = p.matcher(contents);
boolean b = m.find();
String t=m.group();
The above expression doesn't find any match and throws an exception.
Thanks for any help
Your code has several typos (for example, u002E is missing its leading backslashes, so it is matched as the literal text u002E). Besides that, you say you need to match the number, but you are reading the whole match with .group(). You need to put a capturing group around the number and access it with .group(1).
Here is a fixed code:
String content="(AIC)|234.654 |";
Pattern p = Pattern.compile("AIC\\)\\|(\\d+\\.\\d+)");
Matcher m = p.matcher(content);
if (m.find())
{
System.out.println(m.group(1));
}
If the number can be an integer, just use an optional non-capturing group around the decimal part: Pattern.compile("AIC\\)\\|(\\d+(?:\\.\\d+)?)");
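Since the end goal is a double value, the captured text can be handed to Double.parseDouble. A minimal sketch, assuming the optional-decimal pattern above; the class and method names are hypothetical, and returning OptionalDouble for the no-match case is a design choice made here, not something from the question.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AicParser {
    private static final Pattern AIC = Pattern.compile("AIC\\)\\|(\\d+(?:\\.\\d+)?)");

    // Returns the number following "AIC)|", or empty if the input has no such entry
    static OptionalDouble parse(String content) {
        Matcher m = AIC.matcher(content);
        return m.find()
            ? OptionalDouble.of(Double.parseDouble(m.group(1)))
            : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("(AIC)|234.654 |").getAsDouble()); // prints 234.654
        System.out.println(parse("no entry here").isPresent());     // prints false
    }
}
```

Checking find() before calling group() is what avoids the exception mentioned in the question.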
I think this regex should do the work:
(?<=\|)[\d\.]*(?=\s*\|)
It will only match digits and dots after a | and before an optional space and another |
And the complete code:
String content="(AIC)|234.654 |";
Pattern p = Pattern.compile("(?<=\\|)[\\d\\.]*(?=\\s*\\|)");
Matcher m = p.matcher(content);
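// Only call group() after a successful find(); otherwise it throws IllegalStateException
String t = m.find() ? m.group() : null; // "234.654" for the content above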
