java.lang.StringIndexOutOfBoundsException: from java.util.regex.Matcher - java

I am trying to use a regex to remove &nbsp; from my string. The program follows.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class MyTest {
private static final StringBuffer testRegex =
new StringBuffer("<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font></p><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font></p><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font>" +
"<BLOCKQUOTE style=\"MARGIN-RIGHT: 0px\" dir=ltr><br><p>Test</p><strong>" +
"<FONT color=#333333>TestTest</font></strong></p><br><p>Test</p></blockquote>" +
"<br><p>TestTest</p><br><BLOCKQUOTE style=\"MARGIN-RIGHT: 0px\" dir=ltr><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ffcc66\">TestTestTestTestTest</font><br>" +
"<p>TestTestTestTest</p></blockquote><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #003333\">TestTestTest</font></p><p>" +
"<FONT style=\"BACKGROUND-COLOR: #003399\">TestTest</font></p><p> </p>");
//"This is test<P>Tag Tag</P>";
public static void main(String[] args) {
System.out.println("***Testing***");
String temp = checkRegex(testRegex);
System.out.println("***FINAL = "+temp);
}
private static String checkRegex(StringBuffer sample){
Pattern pattern = Pattern.compile("<[^>]+? [^<]+?>");
Matcher matcher = pattern.matcher(sample);
while (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
String group = matcher.group();
System.out.println("start = "+start+" end = "+end+"" +"***GROUP = "+group);
String substring = sample.substring(start, end);
System.out.println(" Substring = "+substring);
String replacedSubString = substring.replaceAll("&nbsp;"," ");
System.out.println("Replaced Substring = "+replacedSubString);
sample.replace(start, end, replacedSubString);
System.out.println(" NEW SAMPLE = "+sample);
}
System.out.println("********WHILE OVER ********");
return sample.toString();
}
}
I am getting a java.lang.StringIndexOutOfBoundsException at the line while (matcher.find()). I am using Java's Pattern and Matcher to find &nbsp; and replace it with " ". Does anyone know what causes this? What should I do to remove the extra &nbsp; from my string?
Thanks

Call matcher.reset(); after sample.replace(start, end, replacedSubString);
This is needed because the matcher remembers the length of the input from when it was created or last reset. When you replace part of sample with a shorter string, that remembered end points to an invalid position, so you need to call matcher.reset(); after every replace.
For example, if start is 0 and end is 5 and the replacement is shorter than five characters, the buffer shrinks while the matcher still uses the old end position; find() then throws a StringIndexOutOfBoundsException once it reads a position beyond the new length.
If the string is huge, reset() can become a major performance bottleneck, because it restarts matching from the beginning. You can instead use
matcher.region(start,sample.length());
This resumes matching from the start of the last match instead of the beginning of the buffer.
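A minimal sketch of the region approach described above. It assumes the pattern is meant to match tags that contain the literal text &nbsp;; the pattern, the class name RegionDemo and the sample input are illustrative and not taken from the question. The region start is moved past the replaced text so the matcher does not rescan it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegionDemo {
    // Replaces "&nbsp;" with a space, but only inside tags, editing the buffer in place.
    static String replaceInTags(StringBuilder sample) {
        // Assumed pattern: a tag whose body contains the literal text &nbsp;
        Pattern pattern = Pattern.compile("<[^>]*&nbsp;[^>]*>");
        Matcher matcher = pattern.matcher(sample);
        while (matcher.find()) {
            int start = matcher.start();
            int end = matcher.end();
            String replaced = matcher.group().replace("&nbsp;", " ");
            sample.replace(start, end, replaced);
            // region() resets the matcher and re-reads the current buffer length,
            // so the next find() never looks past the end of the shortened buffer.
            matcher.region(start + replaced.length(), sample.length());
        }
        return sample.toString();
    }

    public static void main(String[] args) {
        StringBuilder html = new StringBuilder("<p&nbsp;class=x>a&nbsp;b</p><b&nbsp;id=y>");
        System.out.println(replaceInTags(html)); // prints <p class=x>a&nbsp;b</p><b id=y>
    }
}
```

The &nbsp; between the tags is left alone because the assumed pattern only matches inside angle brackets.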

You need to create a new StringBuffer to hold the replaced string, then use the appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods of the Matcher class to do the replacement. There is probably a way to do this in place, but the approach above is the most straightforward.
This is your checkRegex method rewritten:
private static String checkRegex(String inputString){
Pattern pattern = Pattern.compile("<[^>]+? [^<]+?>");
Matcher matcher = pattern.matcher(inputString);
// Create a new StringBuffer to hold the string after replacement
StringBuffer replacedString = new StringBuffer();
while (matcher.find()) {
// matcher.group() returns the substring that matches the whole regex
String substring = matcher.group();
System.out.println(" Substring = "+substring);
String replacedSubstring = substring.replaceAll("&nbsp;"," ");
System.out.println("Replaced Substring = "+replacedSubstring);
// appendReplacement is a clean approach to append the text which comes
// before a match, and append the replacement text for the matched text
// Note that appendReplacement will interpret $ in the replacement string
// with special meaning (for referring to text matched by capturing group).
// Matcher.quoteReplacement is necessary to provide a literal string as
// replacement
matcher.appendReplacement(replacedString, Matcher.quoteReplacement(replacedSubstring));
System.out.println(" NEW SAMPLE = "+replacedString);
}
// appendTail is used to append the text after the last match to the
// replaced string.
matcher.appendTail(replacedString);
System.out.println("********WHILE OVER ********");
return replacedString.toString();
}
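To illustrate why the comments above insist on Matcher.quoteReplacement, here is a small sketch with a replacement that contains a dollar sign. The class name QuoteReplacementDemo, the method addDollar and the sample text are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteReplacementDemo {
    // Prefixes every run of digits in the input with a literal dollar sign.
    static String addDollar(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = "$" + m.group();
            // Unquoted, "$5" would be read as a reference to capturing group 5,
            // which does not exist in this pattern, so the call would fail.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(addDollar("costs 5 or 12")); // prints costs $5 or $12
    }
}
```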

// change the group and it is source string is automatically updated
There is no way whatsoever to change a String in Java, because strings are immutable, so what you're asking for is impossible.
Removing or replacing a pattern with a string can be done with a call like
someString = someString.replaceAll(toReplace, replacement);
To transform the matched substring, as seems to be indicated by your line
m.group().replaceAll("something","");
the best solution is probably to use a StringBuffer for the result, together with Matcher.appendReplacement and Matcher.appendTail.
Example:
String regex = "ipsum";
String sourceString = "lorem ipsum dolor sit";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(sourceString);
StringBuffer sb = new StringBuffer();
while (m.find()) {
// For example: transform match to upper case
String replacement = m.group().toUpperCase();
// quoteReplacement keeps any $ or \ in the replacement literal
m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
}
m.appendTail(sb);
sourceString = sb.toString();
System.out.println(sourceString); // "lorem IPSUM dolor sit"

Related

How can I replace this?

How can I replace this
String str = "KMMH12DE1433";
String pattern = "^[a-z]{2}([0-9]{2})[a-z]{1,2}([0-9]{4})$";
String str2 = str.replaceAll(pattern, "repl");
Log.e("Founded_words2",str2);
What I got: KMMH12DE1433
What I want: MH12DE1433
Try it like this using a proper java.util.regex.Pattern and a java.util.regex.Matcher:
String str = "KMMH12DE1433";
//Make the pattern, case-insensitive using (?i)
Pattern pattern = Pattern.compile("(?i)[a-z]{2}([0-9]{2})[a-z]{1,2}([0-9]{4})");
//Create the Matcher
Matcher m = pattern.matcher(str);
//Check if we find anything
if(m.find()) {
//Use what you found - with proper capturing groups you
//gain access to parts of your pattern as needed
System.out.println("Found this: " + m.group());
}
If you just want to remove the first two characters and if the first two characters will always be uppercase letters:
String str = "KMMH12DE1433";
String pattern = "^[A-Z]{2}";
String str2 = str.replaceAll(pattern, "");
Log.e("Output string: ", str2);
Try this:
String a = "KMMH12DE1433";
String pattern = "^[A-Z]{2}";
String rs = a.replaceAll(pattern,"");
Change it like this (substring(2) drops the first two characters; substring(0) would return the whole string):
String ans=str.substring(2);
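The original pattern fails for two reasons: [a-z] does not match uppercase letters, and the pattern is anchored with ^ and $ but does not account for the two leading letters. A minimal sketch that validates the whole string and keeps everything after the prefix, assuming the format is "two letters to drop, then two letters, two digits, one or two letters, four digits" (that format is inferred from the question's pattern and sample, and the class name DropPrefixDemo is illustrative):

```java
class DropPrefixDemo {
    // Returns the input without its two-letter prefix if it has the assumed format,
    // otherwise returns the input unchanged (replaceAll leaves non-matching input alone).
    static String dropPrefix(String s) {
        return s.replaceAll("^[A-Z]{2}([A-Z]{2}[0-9]{2}[A-Z]{1,2}[0-9]{4})$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(dropPrefix("KMMH12DE1433")); // prints MH12DE1433
    }
}
```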

How to find match for exact word using pattern matcher in java

Here is my sample code. I am trying to find the word "engine" in different strings, using a word boundary to match whole words.
It also matches when the word is preceded by #, as in #engine.
It should match only the exact word.
private void checkMatch() {
String source1 = "search engines has ";
String source2 = "search engine exact word";
String source3 = "enginecheck";
String source4 = "has hashtag #engine";
String key = "engine";
System.out.println(isContain(source1, key));
System.out.println(isContain(source2, key));
System.out.println(isContain(source3, key));
System.out.println(isContain(source4, key));
}
private boolean isContain(String source, String subItem) {
String pattern = "\\b" + subItem + "\\b";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(source);
return m.find();
}
**Expected output**
false
true
false
false
**actual output**
false
true
false
true
For this case, you have to use regex alternation (OR) instead of a word boundary. \\b matches at the position between a word character and a non-word character, in either order. Your regex therefore finds a match in #engine, since # is a non-word character.
private boolean isContain(String source, String subItem) {
String pattern = "(?m)(^|\\s)" + subItem + "(\\s|$)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(source);
return m.find();
}
or
String pattern = "(?<!\\S)" + subItem + "(?!\\S)";
Change your pattern as below.
String pattern = "\\s" + subItem + "\\b";
If you are looking for a literal text enclosed with spaces or start/end of the string, you can split the string with a mere whitespace pattern like \s+ and check if any of the chunks equals the search text.
Java demo:
String s = "Can't start the #engine here, but this engine works";
String searchText = "engine";
boolean found = Arrays.stream(s.split("\\s+"))
.anyMatch(word -> word.equals(searchText));
System.out.println(found); // => true
Change the regexp to
String pattern = "\\s"+subItem + "\\s";
I'm using \s, which the java.util.regex.Pattern javadoc defines as:
\s A whitespace character: [ \t\n\x0B\f\r]
See that javadoc for more information.
Also if you want to support strings like these:
"has hashtag engine"
"engine"
You can improve it by adding the start and end anchors (^ and $),
using this pattern:
String pattern = "(^|\\s)"+subItem + "(\\s|$)";
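The answers above build the regex by concatenating the search word directly, which breaks if the word contains regex metacharacters. A minimal sketch of the lookaround variant with the word escaped through Pattern.quote; the class name ExactWordDemo and the method name containsWord are illustrative:

```java
import java.util.regex.Pattern;

class ExactWordDemo {
    // True if 'word' occurs in 'source' delimited by whitespace or the string boundaries.
    static boolean containsWord(String source, String word) {
        // (?<!\S) = not preceded by a non-whitespace char; (?!\S) = not followed by one.
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?!\\S)");
        return p.matcher(source).find();
    }

    public static void main(String[] args) {
        String key = "engine";
        System.out.println(containsWord("search engines has ", key));       // false
        System.out.println(containsWord("search engine exact word", key));  // true
        System.out.println(containsWord("enginecheck", key));               // false
        System.out.println(containsWord("has hashtag #engine", key));       // false
    }
}
```

Because of Pattern.quote, a search word such as "c++" is treated as literal text rather than as a quantifier.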

Extract content after "=" and before "&", Regex expression in java

I want to extract part of a string: the content after the "=" and before the "&", as in this example:
asdfaf=afl10109&adsfjkl
I want to extract "afl10109" from the string. Can anyone show me how to do this? I am very new to regular expressions.
Use replaceAll() to replace the whole input with just what you want:
String target = str.replaceAll(".*=(.*)&.*", "$1");
The target is captured in a group (group number 1), which is then referenced in the replacement string.
Try:
public static void main(String args[]) {
String input="asdfaf=afl10109&adsfjkl";
Pattern pattern = Pattern.compile("=[^&]*&");
Matcher m = pattern.matcher(input);
while (m.find()) {
String str = m.group();
System.out.println( str.substring(1,str.length()-1));
}
}
This is not regex but you can also use split()
String str = "asdfaf=afl10109&adsfjkl";
System.out.println(str.split("=")[1].split("&")[0]);
Output:
afl10109
Using good old String#substring()
String str = "foo=bar&baz";
int begin = str.indexOf('=');
if (begin != -1) {
int end = str.indexOf('&', begin);
if (end != -1) {
System.out.println(str.substring(begin+1, end)); // bar
}
}
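The Pattern/Matcher answer above trims the delimiters with substring after matching. A capturing group can return the inner text directly; here is a minimal sketch, where the class name BetweenDemo and the method name between are illustrative and only the first "=...&" pair is returned:

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenDemo {
    // Group 1 captures everything between the first '=' and the next '&'.
    private static final Pattern BETWEEN = Pattern.compile("=([^&]*)&");

    static Optional<String> between(String input) {
        Matcher m = BETWEEN.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(between("asdfaf=afl10109&adsfjkl").orElse("(no match)")); // prints afl10109
    }
}
```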

Replace different Regex-Matches with Match-based results in Java

One common usage for regex is the replacement of the matches with something that is based on the matches.
For example a commit-text with ticket numbers ABC-1234: some text (ABC-1234) has to be replaced with <ABC-1234>: some text (<ABC-1234>) (<> as example for some surroundings.)
This is very simple in Java:
String message = "ABC-9913 - Bugfix: Some text. (ABC-9913)";
String finalMessage = message;
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
if (matcher.find()) {
String ticket = matcher.group();
finalMessage = finalMessage.replace(ticket, "<" + ticket + ">");
}
System.out.println(finalMessage);
results in <ABC-9913> - Bugfix: Some text. (<ABC-9913>).
But if there are different matches in the input String, this is different. I tried a slightly different code replacing if (matcher.find()) { with while (matcher.find()) {. The result is messed up with doubled replacements (<<ABC-9913>>).
How can I replace all matching values in an elegant way?
You can simply use replaceAll:
String input = "ABC-1234: some text (ABC-1234)";
System.out.println(input.replaceAll("ABC-\\d+", "<$0>"));
prints:
<ABC-1234>: some text (<ABC-1234>)
$0 is a reference to the matched string.
Java regex reference (see "Groups and capturing").
The problem is that String.replace() replaces every occurrence of the ticket in the whole string, so when the same ticket is found a second time it gets wrapped again.
A better way is to replace one match at a time. The Matcher class has an appendReplacement method for this:
String message = "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)";
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String ticket = matcher.group();
matcher.appendReplacement(sb, "<" + ticket + ">");
}
matcher.appendTail(sb);
System.out.println(sb);
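On Java 9 or later, the same per-match replacement can be written without the explicit loop by passing a function to Matcher.replaceAll. A minimal sketch, assuming Java 9+ is available; the class name TicketDemo and the method name wrapTickets are illustrative:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TicketDemo {
    // Wraps every ticket number in angle brackets, one match at a time.
    static String wrapTickets(String message) {
        return Pattern.compile("ABC-\\d+")
                .matcher(message)
                // The function's result is still scanned for $ and \, so quote it.
                .replaceAll(mr -> Matcher.quoteReplacement("<" + mr.group() + ">"));
    }

    public static void main(String[] args) {
        // prints <ABC-9913>, <ABC-9915> - Bugfix: Some text. (<ABC-9913>,<ABC-9915>)
        System.out.println(wrapTickets("ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)"));
    }
}
```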

Splitting a string java

I have a string in format:
<+923451234567>: Hi here is the text.
I want to get the mobile number (without any non-alphanumeric characters), i.e. 923451234567, which sits at the start of the string between the < > symbols, and also the text, i.e. Hi here is the text.
I am currently doing this with hardcoded logic:
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
String[] splitted = stringReceivedInSms.split(">: ", 2);
String mobileNumber=MyUtils.removeNonDigitCharacters(splitted[0]);
String text=splitted[1];
How can I neatly extract the required strings with a regular expression, so that I don't have to change the code whenever the format of the string changes?
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
Pattern pattern = Pattern.compile("<\\+?([0-9]+)>: (.*)");
Matcher matcher = pattern.matcher(stringReceivedInSms);
if(matcher.matches()) {
String phoneNumber = matcher.group(1);
String messageText = matcher.group(2);
}
Use a regex that matches the pattern - <\\+?(\\d+)>: (.*)
Use the Pattern and Matcher java classes to match the input string.
Pattern p = Pattern.compile("<\\+?(\\d+)>: (.*)");
Matcher m = p.matcher("<+923451234567>: Hi here is the text.");
if(m.matches())
{
System.out.println(m.group(1));
System.out.println(m.group(2));
}
You need to use regex, the following pattern will work:
^<\\+?(\\d++)>:\\s*+(.++)$
Here is how you would use it -
public static void main(String[] args) throws IOException {
final String s = "<+923451234567>: Hi here is the text.";
final Pattern pattern = Pattern.compile(""
+ "#start of line anchor\n"
+ "^\n"
+ "#literal <\n"
+ "<\n"
+ "#an optional +\n"
+ "\\+?\n"
+ "#match and grab at least one digit\n"
+ "(\\d++)\n"
+ "#literal >:\n"
+ ">:\n"
+ "#any amount of whitespace\n"
+ "\\s*+\n"
+ "#match and grab the rest of the string\n"
+ "(.++)\n"
+ "#end anchor\n"
+ "$", Pattern.COMMENTS);
final Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
}
}
I have added the Pattern.COMMENTS flag so the code will work with the comments embedded for future reference.
Output:
923451234567
Hi here is the text.
You can get your phone number by just doing:
stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">"))
So try this snippet:
public static void main(String[] args){
String stringReceivedInSms="<+923451234567>: Hi here is the text.";
System.out.println(stringReceivedInSms.substring(stringReceivedInSms.indexOf("<+") + 2, stringReceivedInSms.indexOf(">")));
}
You don't need to split your String.
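Since the question asks for code that survives format changes, named capturing groups may help: the calling code refers to group names, so only the pattern needs editing if the layout changes. A minimal sketch; the group names number and text, the class name SmsDemo and the method name parse are chosen for this example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SmsDemo {
    // <, optional +, digits (group "number"), >:, optional whitespace, rest (group "text")
    private static final Pattern SMS =
            Pattern.compile("<\\+?(?<number>\\d+)>:\\s*(?<text>.*)");

    // Returns {number, text}; throws if the input does not have the expected shape.
    static String[] parse(String sms) {
        Matcher m = SMS.matcher(sms);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected SMS format: " + sms);
        }
        return new String[] { m.group("number"), m.group("text") };
    }

    public static void main(String[] args) {
        String[] parts = parse("<+923451234567>: Hi here is the text.");
        System.out.println(parts[0]); // prints 923451234567
        System.out.println(parts[1]); // prints Hi here is the text.
    }
}
```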
