Why isn't this lookahead assertion working in Java? - java

I come from a Perl background and am used to doing something like the following to match leading digits in a string and perform an in-place increment by one:
my $string = '0_Beginning';
$string =~ s|^(\d+)(?=_.*)|$1+1|e;
print $string; # '1_Beginning'
With my limited knowledge of Java, things aren't so succinct:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
String digit = string.replaceFirst( p.toString(), "$1" ); // To get the digit
Integer oneMore = Integer.parseInt( digit ) + 1; // Evaluate ++digit
string.replaceFirst( p.toString(), oneMore.toString() ); //
The regex doesn't match here... but it did in Perl.
What am I doing wrong here?

Actually it matches. You can find out by printing
System.out.println(p.matcher(string).find());
The issue is with line
String digit = string.replaceFirst( p.toString(), "$1" );
which is effectively a no-op: the lookahead is zero-width and not part of the match, so the whole match is just the first group, and you are replacing the first group with the content of the first group.
You can get the desired result (namely the digit) via the following code
Matcher m = p.matcher(string);
String digit = m.find() ? m.group(1) : "";
Note: you should check the result of m.find() anyway. If nothing matches, you must not call parseInt, or you'll get an error. Thus the full code looks something like
Pattern p = Pattern.compile("^(\\d+)(?=_.*)");
String string = "0_Beginning";
Matcher m = p.matcher(string);
if (m.find()) {
String digit = m.group(1);
Integer oneMore = Integer.parseInt(digit) + 1;
string = m.replaceAll(oneMore.toString());
System.out.println(string);
} else {
System.out.println("No match");
}
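To see why the "$1" replacement is a no-op, it can help to print what the match actually consumes. This is only a minimal sketch; the class name LookaheadDemo and the helper wholeMatch are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    static final Pattern P = Pattern.compile("^(\\d+)(?=_.*)");

    // Returns the text consumed by the whole match (group 0), or null if nothing matches.
    static String wholeMatch(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String string = "0_Beginning";
        // The lookahead only checks what follows; it is not part of the match.
        System.out.println(wholeMatch(string));                      // 0
        // Replacing the match "0" with group 1 ("0") leaves the string unchanged.
        System.out.println(string.replaceFirst(P.pattern(), "$1"));  // 0_Beginning
    }
}
```

Since the whole match and group 1 are the same text here, m.group() and m.group(1) are interchangeable for this pattern.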

Let's see what you are doing here.
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
You declare and initialize String and pattern objects.
String digit = string.replaceFirst( p.toString(), "$1" ); // To get the digit
(You are converting the pattern back into a string, and replaceFirst creates a new Pattern from this. Is this intentional?)
As Howard says, this replaces the first match of the pattern in the string with the contents of the first group, and the match of the pattern is just 0 here, as the first group. Thus digit is equal to string, ...
Integer oneMore = Integer.parseInt( digit ) + 1; // Evaluate ++digit
... and your parsing fails here.
string.replaceFirst( p.toString(), oneMore.toString() ); //
This would work (but convert the pattern again to string and back to pattern).
Here is how I would do this:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
Matcher matcher = p.matcher(string);
StringBuffer result = new StringBuffer();
while(matcher.find()) {
    // group() is the whole match, i.e. just the digits (the lookahead is not included)
    int number = Integer.parseInt(matcher.group());
    matcher.appendReplacement(result, String.valueOf(number + 1));
}
matcher.appendTail(result);
return result.toString(); // 1_Beginning
(Of course, for your regex the loop will only execute once, since the regex is anchored.)
Edit: To clarify my statement about string.replaceFirst:
This method does not return a pattern, but uses one internally. From the documentation:
Replaces the first substring of this string that matches the given regular expression with the given replacement.
An invocation of this method of the form str.replaceFirst(regex, repl) yields exactly the same result as the expression
Pattern.compile(regex).matcher(str).replaceFirst(repl)
Here we see that a new pattern is compiled from the first argument.
This also shows us another way to do what you wanted to do:
String string = "0_Beginning";
Pattern p = Pattern.compile( "^(\\d+)(?=_.*)" );
Matcher m = p.matcher(string);
if(m.find()) {
    String digit = m.group();
    int oneMore = Integer.parseInt( digit ) + 1;
    // Matcher.replaceFirst takes only the replacement; it resets the matcher and searches again
    return m.replaceFirst(String.valueOf(oneMore));
}
This only compiles the pattern once, instead of thrice like in your original program - but still does the matching twice (once for find, once for replaceFirst), instead of once like in my program.
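If you are on Java 9 or later (an assumption, since the question does not say which version is in use), Matcher.replaceFirst also accepts a function that computes the replacement from the match, which is probably the closest thing to Perl's /e modifier. A minimal sketch, with a made-up class and method name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IncrementLeadingDigits {
    // Compiled once and reused for every call.
    private static final Pattern LEADING_NUMBER = Pattern.compile("^(\\d+)(?=_.*)");

    // Increments the leading number if the pattern matches; otherwise returns the input unchanged.
    static String incrementLeadingNumber(String input) {
        Matcher m = LEADING_NUMBER.matcher(input);
        // Requires Java 9+: replaceFirst(Function<MatchResult, String>)
        return m.replaceFirst(mr -> String.valueOf(Integer.parseInt(mr.group(1)) + 1));
    }

    public static void main(String[] args) {
        System.out.println(incrementLeadingNumber("0_Beginning")); // 1_Beginning
        System.out.println(incrementLeadingNumber("Beginning"));   // Beginning
    }
}
```

This compiles the pattern once and matches once. Note that Integer.parseInt will still throw for digit runs too large for an int.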

Related

How to replace multiple consecutive occurrences of a character with a maximum allowed number of occurrences?

CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
String replace = "-";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
boolean isMatch = matcher.find();
StringBuffer buffer = new StringBuffer();
for (int i = 0; i < content.length(); i++) {
while (matcher.find()) {
matcher.appendReplacement(buffer, replace);
}
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
In the above code, content is the input string.
I am trying to find repeated occurrences of a character in the string and replace each run with at most the maximum allowed number of occurrences.
For example:
input - ("abaaadccc", 2)
output - "abaadcc"
Here aaa and ccc are replaced by aa and cc, as the maximum allowed repetition is 2.
In the above code, I found such occurrences and tried replacing them with -, and that works. But can someone help me get the current character and replace the run with the allowed number of occurrences?
I.e., if aaa is found, it is replaced by aa.
Or is there an alternative method without using regex?
You can declare the second group in a regex and use it as a replacement:
String result = "aaabbbccaaa".replaceAll("(([a-zA-Z])\\2)\\2+", "$1");
Here's how it works:
( first group - a character repeated two times
([a-zA-Z]) second group - a character
\2 a character repeated once
)
\2+ a character repeated at least once more
Thus, the first group captures a replacement string.
It isn't hard to extrapolate this solution for a different maximum value of allowed repeats:
String input = "aaaaabbcccccaaa";
int maxRepeats = 4;
String pattern = String.format("(([a-zA-Z])\\2{%s})\\2+", maxRepeats-1);
String result = input.replaceAll(pattern, "$1");
System.out.println(result); //aaaabbccccaaa
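The same idea can be wrapped in a small helper so that the maximum becomes a parameter. This is just a sketch; the class name RepeatLimiter and the method limitRepeats are hypothetical, and the argument check is my own addition:

```java
class RepeatLimiter {
    // Trims every run of the same ASCII letter to at most maxRepeats characters.
    static String limitRepeats(String input, int maxRepeats) {
        if (maxRepeats < 1) {
            throw new IllegalArgumentException("maxRepeats must be at least 1");
        }
        // Group 1 = the letter plus (maxRepeats - 1) repetitions; \2+ = the excess to drop.
        String pattern = "(([a-zA-Z])\\2{" + (maxRepeats - 1) + "})\\2+";
        return input.replaceAll(pattern, "$1");
    }

    public static void main(String[] args) {
        System.out.println(limitRepeats("abaaadccc", 2));       // abaadcc
        System.out.println(limitRepeats("aaaaabbcccccaaa", 4)); // aaaabbccccaaa
    }
}
```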
Since you defined a group in your regex, you can get the matching characters of this group by calling matcher.group(1). In your case it contains the first character from the repeating group so by appending it twice you get your expected result.
CharSequence content = new StringBuffer("aaabbbccaaa");
String pattern = "([a-zA-Z])\\1\\1+";
Pattern patt = Pattern.compile(pattern, Pattern.CASE_INSENSITIVE);
Matcher matcher = patt.matcher(content);
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
System.out.println("found : "+matcher.start()+","+matcher.end()+":"+matcher.group(1));
matcher.appendReplacement(buffer, matcher.group(1)+matcher.group(1));
}
matcher.appendTail(buffer);
System.out.println(buffer.toString());
Output:
found : 0,3:a
found : 3,6:b
found : 8,11:a
aabbccaa
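As for the alternative without regex that the question asks about, a plain loop that counts the length of the current run should also do. A minimal sketch with made-up names; note that, unlike the regex answers, it applies to every character (not only letters) and compares characters case-sensitively:

```java
class RunLimiter {
    // Keeps at most maxRepeats consecutive copies of any character.
    static String limitRuns(String input, int maxRepeats) {
        StringBuilder out = new StringBuilder();
        int run = 0; // length of the current run of identical characters
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            run = (i > 0 && c == input.charAt(i - 1)) ? run + 1 : 1;
            if (run <= maxRepeats) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(limitRuns("abaaadccc", 2));   // abaadcc
        System.out.println(limitRuns("aaabbbccaaa", 2)); // aabbccaa
    }
}
```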

Java Regexp "\\d+" (Digits Only) not working

Input string: 07-000
JAVA Regexp: \\d+ (digits only)
Expected Result: 07000 (digits only from input string)
Then why does this Java code return 07 only?
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("07-000");
String result = null;
if (matcher.find()) {
result = matcher.group();
}
System.out.println(result);
I guess that what you want to achieve is rather this:
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("07-000");
StringBuilder result = new StringBuilder();
// Iterate over all the matches
while (matcher.find()) {
// Append the new match to the current result
result.append(matcher.group());
}
System.out.println(result);
Output:
07000
Indeed, matcher.find() looks for the next subsequence of the input that matches the pattern, so if you call it only once, you will get only the first match, which is 07 here. To get everything, you need to loop until it returns false, indicating that there are no more matches.
However, in this particular case it would be simpler to call myString.replaceAll("\\D+", "") directly, which replaces every run of non-digit characters with an empty String.
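For completeness, the replaceAll approach could look like this. It is a minimal sketch; the class and method names are made up for illustration:

```java
class DigitsOnly {
    // Removes every run of non-digit characters, keeping only the digits.
    static String digitsOnly(String s) {
        return s.replaceAll("\\D+", "");
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("07-000")); // 07000
    }
}
```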
Then why does this Java code return 07 only?
It returns only 07 because that is the first match found by your regex. You need a while loop to visit all the matches; you can then concatenate them to get all the digits in one string.
Pattern pattern = Pattern.compile("\\d+");
Matcher matcher = pattern.matcher("07-000");
StringBuilder sb = new StringBuilder();
while (matcher.find())
{
sb.append( matcher.group() );
}
System.out.println( "All the numbers are : " + sb.toString() );

java find() always returning true

I am trying to find a pattern in a string in Java. Below is the code:
String line = "10011011001;0110,1001,1001,0,10,11";
String regex ="[A-Za-z]?"; //[A-Za-z2-9\W]?
//create a pattern obj
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(line);
boolean a = m.find();
System.out.println("The value of a is::"+a +" asdsd "+m.group(0));
I am expecting the boolean value to be false, but instead it always comes back true. Any input or idea where I am going wrong?
The ? makes the entire character class optional. So your regex essentially means "find any character* ... or not". And the "or not" part means it matches the empty string.
* not really "any", just the ASCII letters A-Z and a-z.
[A-Za-z]? means "zero or one letters". It will always match somewhere in the string; even if there aren't any letters, it will match zero of them.
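You can make the empty match visible by printing where the match starts and ends. A small sketch; the class name EmptyMatchDemo and the helper describeFirstMatch are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyMatchDemo {
    // Describes the first match of regex in input, or reports that there is none.
    static String describeFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        return "matched \"" + m.group() + "\" at " + m.start() + "-" + m.end();
    }

    public static void main(String[] args) {
        String line = "10011011001;0110,1001,1001,0,10,11";
        // With '?', zero letters is acceptable, so an empty match is found at index 0.
        System.out.println(describeFirstMatch("[A-Za-z]?", line)); // matched "" at 0-0
        // Without '?', an actual letter is required, and the line contains none.
        System.out.println(describeFirstMatch("[A-Za-z]", line));  // no match
    }
}
```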
The regexes below should work. In your pattern, the ? quantifier means "once or not at all":
[A-Za-z]? -----> once or not at all
Reference:
http://docs.oracle.com/javase/1.5.0/docs/api/java/util/regex/Pattern.html
String line = "10011011001;0110,1001,1001,0,10,11";
// Pick one of the following:
String regex = "[A-Za-z]"; // to find a letter
// String regex = "[A-Za-z]+$"; // to find a trailing run of letters
// String regex = "[^0-9,;]"; // any character other than a digit, ',' or ';'
// create a pattern object
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(line);
boolean a = m.find();
// group(0) may only be called after a successful find()
System.out.println("The value of a is::" + a + (a ? " asdsd " + m.group(0) : ""));

Using Regex to ignore a pattern in java

I have a sentence: "we:PR show:V".
I want to match only those characters after ":" and before "\\s" using regex pattern matcher.
I used following pattern:
Pattern pattern=Pattern.compile("^(?!.*[\\w\\d\\:]).*$");
But it did not work.
What is the best pattern to get the output?
For a situation such as this, if you are using Java, it may be easier to do something with substrings:
String input = "we:PR show:V";
String colon = ":";
String space = " ";
List<String> results = new ArrayList<String>();
int colonLocation = input.indexOf(colon);
while (colonLocation != -1) {
    // first space after the colon, or the end of the string if there is none
    int spaceLocation = input.indexOf(space, colonLocation);
    spaceLocation = (spaceLocation == -1 ? input.length() : spaceLocation);
    results.add(input.substring(colonLocation + 1, spaceLocation));
    if (spaceLocation != input.length()) {
        input = input.substring(spaceLocation + 1);
    } else {
        input = ""; // reached the end of the string
    }
    // look for the next colon in what is left of the input
    colonLocation = input.indexOf(colon);
}
return results;
This will likely be faster than trying to match with a regex.
The following regex assumes that any non-whitespace characters following a colon (in turn preceded by non-colon characters) are a valid match:
[^:]+:(\S+)(?:\s+|$)
Use like:
String input = "we:PR show:V";
Pattern pattern = Pattern.compile("[^:]+:(\\S+)(?:\\s+|$)");
Matcher matcher = pattern.matcher(input);
int start = 0;
while (matcher.find(start)) {
String match = matcher.group(1); // = "PR" then "V"
// Do stuff with match
start = matcher.end( );
}
The pattern matches, in order:
At least one character that isn't a colon.
A colon.
At least one non-whitespace character (our match).
At least one whitespace character, or the end of input.
The loop continues as long as the regex matches an item in the string, beginning at the index start, which is always adjusted to point to after the end of the current match.
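If you only want the text after each colon, a lookbehind is another option. Note that this is a different pattern from the one above, offered as a sketch: (?<=:) asserts that a colon precedes the match without consuming it, and \S+ then takes everything up to the next whitespace (including any further colons). The class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // A run of non-whitespace characters that is immediately preceded by a colon.
    private static final Pattern AFTER_COLON = Pattern.compile("(?<=:)\\S+");

    // Collects every run of non-whitespace text that directly follows a colon.
    static List<String> extractTags(String input) {
        List<String> tags = new ArrayList<>();
        Matcher m = AFTER_COLON.matcher(input);
        while (m.find()) { // find() continues after the previous match on its own
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extractTags("we:PR show:V")); // [PR, V]
    }
}
```

Plain find() already resumes where the previous match ended, so no explicit start index is needed here.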

JAVA RegEx on _ delimited string

OK, I need a regex that captures the first word up to the underscore character, but then captures the following words, which may themselves contain an underscore character. So, here is a set of inputs and the expected results:
gear_Armor_Blessed_Robes = "gear", "Armor" and "Blessed_Robes"
gear_Armor_Chain_Coif = "gear", "Armor" and "Chain_Coif"
gear_Armor_Chain_Hauberk = "gear", "Armor" and "Chain_Hauberk"
gear_Armor_Chain_Shirt = "gear", "Armor" and "Chain_Shirt"
gear_Armor_Chain_Leggings = "gear", "Armor" and "Chain_Leggings"
There's no need to use a regex for this, just use indexOf and substring:
String s = "Armor_Blessed_Robes";
int idx = s.indexOf("_");
System.out.println(s.substring(0, idx)); // Armor
System.out.println(s.substring(idx + 1)); // Blessed_Robes
With regex, you'd have to use the following, which is a tad more complicated and harder to read:
Pattern p = Pattern.compile("([^_]+)_(.+)");
Matcher m = p.matcher(s);
if (m.find()) {
String first = m.group(1); // Armor
String second = m.group(2); // Blessed_Robes
}
You can split along _, limiting the number of splits to 3:
assert Arrays.equals("gear_Armor_Blessed_Robes".split("_", 3),
new String[] { "gear", "Armor", "Blessed_Robes" });
It should give you a String[] that contains the 3 Strings as specified in your question.
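As a runnable sketch of the split approach (the class name SplitDemo and the helper splitGearName are made up for illustration):

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on the first two underscores only; the third part keeps any remaining underscores.
    static String[] splitGearName(String name) {
        return name.split("_", 3);
    }

    public static void main(String[] args) {
        String[] parts = splitGearName("gear_Armor_Blessed_Robes");
        System.out.println(Arrays.toString(parts)); // [gear, Armor, Blessed_Robes]
    }
}
```

Keep in mind that an input with fewer than two underscores yields an array with fewer than three elements, so check parts.length before indexing.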
