Get matched pattern value in Java regex

I'm doing some transliteration with Java and everything works great, but it would be nice to know which part of the pattern actually matched. Is that possible?
For example:
For the surname GULEVSKAIA I generate this pattern:
(^g+(yu|u|y)l+(io|e|ye|yo|jo|ye)(v|b|w)+(s|c)+(k|c)+a(ya|ia|ja|a|y)(a)*)
Can I somehow find out what each part actually matched, i.e.
g
u
l
e
...
etc
As you can see, sometimes it is NOT one letter.

You can do this: once the pattern has matched, retrieve the matched string by calling the group() method of the Matcher class with 0 as the argument, then convert that string to a char array and print the characters, as below:
String line = "gulevskaia";
String pattern = "(^g+(yu|u|y)l+(io|e|ye|yo|jo|ye)(v|b|w)+(s|c)+(k|c)+a(ya|ia|ja|a|y)(a)*)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("Found value: " + m.group(0) );
char chars[] =m.group(0).toCharArray();
for(int i=0;i<chars.length;i++)
System.out.println(chars[i]);
}
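Since the question points out that a piece is sometimes more than one letter, splitting the whole match into chars may not be enough. A possible alternative, sketched below, is to give every piece of the pattern its own capturing group and read the groups back one by one. The class name TranslitGroups and the method matchedParts are made up for this illustration, and the pattern is a regrouped variant of the one in the question, not the asker's generated pattern (repeated alternatives are wrapped in non-capturing groups so the outer group captures the whole run).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TranslitGroups {
    // Same alternatives as in the question, but each piece has its own capturing group.
    // (?:...) groups only hold the alternation together and are not captured themselves.
    static final Pattern SURNAME = Pattern.compile(
        "^(g+)(yu|u|y)(l+)(io|e|ye|yo|jo)((?:v|b|w)+)((?:s|c)+)((?:k|c)+)(a)(ya|ia|ja|a|y)(a*)");

    // Returns the text captured by each group in order, or an empty list if nothing matched.
    static List<String> matchedParts(String input) {
        List<String> parts = new ArrayList<>();
        Matcher m = SURNAME.matcher(input);
        if (m.find()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                parts.add(m.group(i));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        // One entry per piece of the pattern; the trailing a* piece may be empty.
        System.out.println(matchedParts("gulevskaia"));
    }
}
```

For "gulevskaia" this yields g, u, l, e, v, s, k, a, ia and an empty string for the final optional piece, so multi-letter pieces such as "ia" stay together.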

Related

A sample regular expression

I have a sample content string repeated in a file, and I want to retrieve the double value from it. The string content is "(AIC)|234.654 |", from which I want to retrieve 234.654. The "(AIC)|" part is always fixed, but the number changes between occurrences, so I am using the regular expression below. However, it says there is no match. Any help would be appreciated.
String contents="(AIC)|234.654 |";
Pattern p = Pattern.compile("AIC\\u0029{1}\\u007C{1}\\d+u002E{1}\\d+");
Matcher m = p.matcher(contents);
boolean b = m.find();
String t=m.group();
The above expression doesn't find any match and throws an exception.
Thanks for any help
Your code has several typos, but besides those, you say you need only the number, yet you are referring to the whole match with .group(). You need a capturing group so that you can access the number with .group(1).
Here is the fixed code:
String content="(AIC)|234.654 |";
Pattern p = Pattern.compile("AIC\\)\\|(\\d+\\.\\d+)");
Matcher m = p.matcher(content);
if (m.find())
{
System.out.println(m.group(1));
}
If the number can also be an integer, just use an optional non-capturing group around the decimal part: Pattern.compile("AIC\\)\\|(\\d+(?:\\.\\d+)?)");
I think this regex should do the work:
(?<=\|)[\d\.]*(?=\s*\|)
It will only match digits and dots after a | and before an optional space and another |
And the complete code:
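String content="(AIC)|234.654 |";
Pattern p = Pattern.compile("(?<=\\|)[\\d\\.]*(?=\\s*\\|)");
Matcher m = p.matcher(content);
// only call group() after find() has returned true, otherwise it throws IllegalStateException
if (m.find()) {
    String t = m.group(); // "234.654"
}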
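Since the goal was to get the value as a double, here is a minimal sketch that combines a capturing group with Double.parseDouble. The class name AicValue and the method parse are hypothetical, and the pattern assumes the number always follows the literal "(AIC)|" prefix, as described in the question.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AicValue {
    // Literal "(AIC)|" followed by an integer or decimal number in group 1.
    private static final Pattern AIC =
        Pattern.compile("\\(AIC\\)\\|\\s*(\\d+(?:\\.\\d+)?)");

    // Returns the parsed number, or an empty OptionalDouble when the prefix is not found.
    static OptionalDouble parse(String content) {
        Matcher m = AIC.matcher(content);
        return m.find()
            ? OptionalDouble.of(Double.parseDouble(m.group(1)))
            : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("(AIC)|234.654 |"));
        System.out.println(parse("no value here"));
    }
}
```

Returning an OptionalDouble is just one way to make the "no match" case explicit instead of calling group() on a matcher that did not match.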

How do I build a regex to match these `long` values?

How do I build a regular expression for a long data type in Java? I currently have a regular expression for three double values as my pattern:
String pattern = "(max=[0-9]+\\.?[0-9]*) *(total=[0-9]+\\.?[0-9]*) *(free=[0-9]+\\.?[0-9]*)";
I am constructing the pattern using the line:
Pattern a = Pattern.compile("control.avgo:", Pattern.CASE_INSENSITIVE);
I want to match the numbers following the equals signs in the example text below, from the file control.avgo.
max=259522560, total=39325696, free=17979640
What do I need to do to correct my code to match them?
Could it be that you actually need
Pattern a = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
instead of
Pattern a = Pattern.compile("control.avgo:", Pattern.CASE_INSENSITIVE);
because your current code uses "control.avgo:" as the regex, and not the pattern you have defined.
You need to address several errors, including:
Your pattern specifies real numbers, but your question asks for long integers.
Your pattern omits the commas in the string being searched.
The first argument to Pattern.compile() is the regular expression, not the string being searched.
This will work:
String sPattern = "max=([0-9]+), total=([0-9]+), free=([0-9]+)";
Pattern pattern = Pattern.compile( sPattern, Pattern.CASE_INSENSITIVE );
String source = "control.avgo: max=259522560, total=39325696, free=17979640";
Matcher matcher = pattern.matcher( source );
if ( matcher.find()) {
System.out.println("max=" + matcher.group(1));
System.out.println("total=" + matcher.group(2));
System.out.println("free=" + matcher.group(3));
}
If you want to convert the numbers you find to a numeric type, use Long.valueOf( String ).
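As a rough sketch of that conversion step, the captured groups can be passed to Long.parseLong, which returns a primitive long (Long.valueOf returns a boxed Long). The class name MemoryStats and the method parse are invented for this example, and the input format is assumed to be the one shown in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MemoryStats {
    // Digits only, because the values are long integers; \s* tolerates varying spacing after commas.
    private static final Pattern STATS = Pattern.compile(
        "max=(\\d+),\\s*total=(\\d+),\\s*free=(\\d+)", Pattern.CASE_INSENSITIVE);

    // Returns {max, total, free}; throws if the line does not contain the three values.
    static long[] parse(String line) {
        Matcher m = STATS.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("no stats found in: " + line);
        }
        return new long[] {
            Long.parseLong(m.group(1)),
            Long.parseLong(m.group(2)),
            Long.parseLong(m.group(3))
        };
    }

    public static void main(String[] args) {
        long[] stats = parse("control.avgo: max=259522560, total=39325696, free=17979640");
        System.out.println("used=" + (stats[1] - stats[2]));
    }
}
```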
In case you only need to find any number preceded by "="...
String test = "3.control.avgo: max=259522560, total=39325696, free=17979640";
// looks for the "=" sign preceding any numerical sequence of any length
Pattern pattern = Pattern.compile("(?<=\\=)\\d+");
Matcher matcher = pattern.matcher(test);
// keeps on searching until cannot find anymore
while (matcher.find()) {
// prints out whatever found
System.out.println(matcher.group());
}
Output:
259522560
39325696
17979640

java find() always returning true

I am trying to find a pattern in a string in Java. Below is the code:
String line = "10011011001;0110,1001,1001,0,10,11";
String regex ="[A-Za-z]?"; //[A-Za-z2-9\W]?
//create a pattern obj
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(line);
boolean a = m.find();
System.out.println("The value of a is::"+a +" asdsd "+m.group(0));
I am expecting the boolean value to be false, but instead it always returns true. Any idea where I am going wrong?
The ? makes the entire character class optional. So your regex essentially means "find any letter* ... or not". And the "or not" part means it matches the empty string.
* not really "any" letter, just the ASCII letters A-Z and a-z.
[A-Za-z]? means "zero or one letters". It will always match somewhere in the string; even if there aren't any letters, it will match zero of them.
The ? quantifier means "once or not at all", which is why [A-Za-z]? always matches. Drop the ? and one of the regexes below should work.
Reference:
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html
String line = "10011011001;0110,1001,1001,0,10,11";
String regex = "[A-Za-z]"; // to find a letter
// String regex = "[A-Za-z]+$"; // to find a trailing run of letters
// String regex = "[^0-9,;]"; // any character other than digits, ',' and ';'
// create a pattern object
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(line);
boolean a = m.find();
// group(0) throws IllegalStateException when find() returned false, so guard it
System.out.println("The value of a is::" + a + (a ? " asdsd " + m.group(0) : ""));
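To see the difference for yourself, a small sketch like the following compares the optional and the non-optional character class on the same input. The class name OptionalQuantifierDemo and the helper firstMatch are made up for the demonstration; returning null for "no match" is just a convenience here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalQuantifierDemo {
    // Returns the first match of regex in input, or null if find() fails.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "10011011001;0110,1001,1001,0,10,11";
        // With '?', find() succeeds with an empty match at index 0.
        System.out.println("[" + firstMatch("[A-Za-z]?", line) + "]");
        // Without '?', there is no letter to match, so the helper returns null.
        System.out.println(firstMatch("[A-Za-z]", line));
    }
}
```

The first call finds a zero-length match, which is why m.group(0) in the question printed nothing after "asdsd" while find() still returned true.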

Java Regex / blank spaces issue

String regex = "(\\s*T\\s*R\\s*A\\s*)*";
Pattern p = Pattern.compile(regex);
I'm trying to match "TRA", "T R A", "T R A", etc. It works fine for the first case, with no spaces, but not for anything with spaces (those are just ignored). I'm not sure what I'm doing wrong.
EDIT
Essentially, I'm trying to match all occurrences of TRA, whether or not there are an arbitrary number of spaces between each letter (or occurrence).
For example: "TRATTR A T RA T RA" has 4 occurrences, and I want to match them all with one regex.
You should use:
String regex = "(\\s*T\\s*R\\s*A\\s*)";
instead of:
String regex = "(\\s*T\\s*R\\s*A\\s*)*";
Your regex is trying to match 0 or more occurrences of the given text and as per your question you're just trying to match it once.
Update: To match multiple occurrences use code like this:
String regex = "(\\s*T\\s*R\\s*A\\s*)";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher("T R A T R A T R A");
while (m.find())
System.out.printf("name=[%s]%n", m.group(1));
For your goal, the correct regex would be (\\s*T\\s*R\\s*A\\s*)+, as it requires at least one occurrence of the TRA group and won't match the empty string.
Example:
String regex = "(\\s*T\\s*R\\s*A\\s*)+";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher("S T R A T R A T R A N G E");
if (m.find()) {
System.out.println(m.group());
} else {
System.out.println("No match");
}
Output:
T R A T R A T R A
This works for me:
String regex = "(\\s*T\\s*R\\s*A\\s*)";
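If the aim is to collect every occurrence separately, as in the "TRATTR A T RA T RA" example from the question, one option is to leave the surrounding whitespace out of the pattern and loop with find(). This is only a sketch; the class name TraCounter and the method occurrences are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TraCounter {
    // T, R, A with any amount of whitespace between the letters (none around them).
    private static final Pattern TRA = Pattern.compile("T\\s*R\\s*A");

    // Collects each occurrence exactly as it appears in the input.
    static List<String> occurrences(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = TRA.matcher(input);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> found = occurrences("TRATTR A T RA T RA");
        System.out.println(found.size() + " occurrences: " + found);
    }
}
```

For the example string this finds the four occurrences "TRA", "TR A", "T RA" and "T RA".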

Why isn't this lookahead assertion working in Java?

I come from a Perl background and am used to doing something like the following to match leading digits in a string and perform an in-place increment by one:
my $string = '0_Beginning';
$string =~ s|^(\d+)(?=_.*)|$1+1|e;
print $string; # '1_Beginning'
With my limited knowledge of Java, things aren't so succinct:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
String digit = string.replaceFirst( p.toString(), "$1" ); // To get the digit
Integer oneMore = Integer.parseInt( digit ) + 1; // Evaluate ++digit
string.replaceFirst( p.toString(), oneMore.toString() ); //
The regex doesn't match here... but it did in Perl.
What am I doing wrong here?
Actually it matches. You can find out by printing
System.out.println(p.matcher(string).find());
The issue is with line
String digit = string.replaceFirst( p.toString(), "$1" );
which is actually a no-op, because it replaces the match (which consists only of the first group, since the lookahead is not part of the match) with the content of the first group.
You can get the desired result (namely the digit) via the following code
Matcher m = p.matcher(string);
String digit = m.find() ? m.group(1) : "";
Note: you should check the result of m.find() anyway, in case nothing matches. In that case you must not call parseInt, or you'll get an error. Thus the full code looks something like:
Pattern p = Pattern.compile("^(\\d+)(?=_.*)");
String string = "0_Beginning";
Matcher m = p.matcher(string);
if (m.find()) {
String digit = m.group(1);
Integer oneMore = Integer.parseInt(digit) + 1;
string = m.replaceAll(oneMore.toString());
System.out.println(string);
} else {
System.out.println("No match");
}
Let's see what you are doing here.
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
You declare and initialize String and pattern objects.
String digit = string.replaceFirst( p.toString(), "$1" ); // To get the digit
(You are converting the pattern back into a string, and replaceFirst creates a new Pattern from this. Is this intentional?)
As Howard says, this replaces the first match of the pattern in the string with the contents of the first group, and the match of the pattern is just 0 here, as the first group. Thus digit is equal to string, ...
Integer oneMore = Integer.parseInt( digit ) + 1; // Evaluate ++digit
... and your parsing fails here.
string.replaceFirst( p.toString(), oneMore.toString() ); //
This would work (but convert the pattern again to string and back to pattern).
Here is how I would do this:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
Matcher matcher = p.matcher(string);
StringBuffer result = new StringBuffer();
while (matcher.find()) {
    // group() is just the digits; the lookahead is not part of the match
    int number = Integer.parseInt(matcher.group());
    matcher.appendReplacement(result, String.valueOf(number + 1));
}
matcher.appendTail(result);
return result.toString(); // 1_Beginning
(Of course, for your regex the loop will only execute once, since the regex is anchored.)
Edit: To clarify my statement about string.replaceFirst:
This method does not return a pattern, but uses one internally. From the documentation:
Replaces the first substring of this string that matches the given regular expression with the given replacement.
An invocation of this method of the form str.replaceFirst(regex, repl) yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceFirst(repl)
Here we see that a new pattern is compiled from the first argument.
This also shows us another way to do what you wanted to do:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
Matcher m = p.matcher(string);
if (m.find()) {
    String digit = m.group();
    int oneMore = Integer.parseInt( digit ) + 1;
    // Matcher.replaceFirst takes only the replacement; it resets the matcher and matches again
    return m.replaceFirst(String.valueOf(oneMore));
}
This only compiles the pattern once, instead of thrice like in your original program - but still does the matching twice (once for find, once for replaceFirst), instead of once like in my program.
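On Java 9 or later, something closer to Perl's s///e is possible, because Matcher.replaceFirst also accepts a function that computes the replacement from each match. The sketch below assumes Java 9+; the class name IncrementLeadingDigits and the method increment are hypothetical, and the lookahead is shortened to (?=_) since the trailing .* does not change what matches.

```java
import java.util.regex.Pattern;

class IncrementLeadingDigits {
    // Leading digits that are followed by an underscore (the underscore is not consumed).
    private static final Pattern LEADING = Pattern.compile("^(\\d+)(?=_)");

    // Replaces the leading number with number + 1; input without a match is returned unchanged.
    static String increment(String s) {
        return LEADING.matcher(s).replaceFirst(
            match -> String.valueOf(Long.parseLong(match.group(1)) + 1));
    }

    public static void main(String[] args) {
        System.out.println(increment("0_Beginning"));
    }
}
```

This compiles the pattern once and matches once, and the "no match" case needs no separate check because replaceFirst simply returns the original string.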
