Java Regex pattern not matching brackets


I am trying to write a regex pattern that matches the following:
[#123456]
[#aabc36]
I tried #[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3}, which successfully matches #aabc36, but it fails once I add the brackets [].
I tried the pattern below:
[#[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3}]
Below is my method for the regex replacement:
public String replaceColor(String text , String bbcode , String imageLocation ){
//"\\[("+bbcode+")\\]" for [369] , [sosad]
// String imageLocation = "file:///android_asset/smileyguy.png";
// builder.append("<img src=\"" + imageLocation + "\" />");
StringBuffer imageBuffer = new StringBuffer ("");
// Pattern pattern = Pattern.compile("\\"+bbcode);
Pattern pattern = Pattern.compile(Pattern.quote(bbcode));
Matcher matcher = pattern.matcher(text);
//populate the replacements map ...
StringBuilder builder = new StringBuilder();
int i = 0;
while (matcher.find()) {
//String orginal = replacements.get(matcher.group(1));
imageBuffer.append("<img src=\"" + imageLocation + "\" />");
String replacement = imageBuffer.toString();
builder.append(text.substring(i, matcher.start()));
if (replacement == null) {
builder.append(matcher.group(0));
} else {
builder.append(replacement);
}
i = matcher.end();
}
builder.append(text.substring(i, text.length()));
return builder.toString();
}

To match [ and ] literally, you should escape them. Otherwise they are treated as metacharacters that delimit a character class (a set of characters).
\[#[A-Fa-f0-9]{6}\]|\[[A-Fa-f0-9]{3}\]
In Java string literals, \ itself must be escaped:
"\\[#[A-Fa-f0-9]{6}\\]|\\[[A-Fa-f0-9]{3}\\]"

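You need to escape the brackets with a \ in order to match them, as they are regex metacharacters. The alternation also needs a group, otherwise the opening bracket belongs only to the first alternative and the closing bracket only to the second:
\[#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})\]
In a Java string you will also need to escape the backslash, so:
String pattern = "\\[#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})\\]";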

If you want to include brackets in the pattern to match, you must escape them with a \. But because Java already uses \ as an escape character in string literals, you must use two of them: "\\[...\\]"
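The escaping rules from the answers above can be tried out with a small sketch like the following. The class name ColorTagDemo and the helper replaceColorTags are hypothetical and not part of the question's code; the pattern assumes the '#' is required for both the 6-digit and the 3-digit form.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColorTagDemo {
    // "\\[" and "\\]" match literal brackets. The non-capturing group makes the
    // brackets and the '#' apply to both the 6-digit and the 3-digit alternative.
    private static final Pattern COLOR_TAG =
            Pattern.compile("\\[#(?:[A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})\\]");

    /** Returns true if the whole input is a single color tag such as [#aabc36]. */
    static boolean isColorTag(String input) {
        return COLOR_TAG.matcher(input).matches();
    }

    /** Hypothetical helper: replaces every color tag in text with an img element. */
    static String replaceColorTags(String text, String imageLocation) {
        String img = "<img src=\"" + imageLocation + "\" />";
        // quoteReplacement keeps any '$' or '\' in the path from being interpreted.
        return COLOR_TAG.matcher(text).replaceAll(Matcher.quoteReplacement(img));
    }

    public static void main(String[] args) {
        System.out.println(isColorTag("[#123456]"));
        System.out.println(isColorTag("#aabc36"));
        System.out.println(replaceColorTags("red [#ff0000] text",
                "file:///android_asset/smileyguy.png"));
    }
}
```

Compiling the pattern once as a constant avoids recompiling it on every call, which matters if the method runs for every message in a list.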

Related

How can I extract substring from the given url using regex in Android Studio

I'm trying to extract CANseIqFMnf from the URL https://www.instagram.com/p/CANseIqFMnf/ using regex in Android Studio. Please help me find a regular expression that works in Android Studio.
Here is the code for my method:
String url = "https://www.instagram.com/p/CANseIqFMnf/";
String REGEX = "/p\//";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(url);
boolean match = matcher.matches();
if (match){
Log.e("success", "start = " + matcher.start() + " end = " + matcher.end() );
}else{
Log.e("failed", "failed");
}
But it logs "failed" in return!

Method 1
You just need to use the replaceAll method of String; there is no need to compile a pattern and complicate things:
String input = "https://www.instagram.com/p/CANseIqFMnf/";
String output = input.replaceAll("https://www.instagram.com/p/", "").replaceAll("/", "");
Log.v(TAG, output);
Note that the first replaceAll removes the URL prefix and the second removes any remaining slashes (/).
Method 2
Pattern pattern = Pattern.compile("https://www.instagram.com/p/(.*?)/");
Matcher matcher = pattern.matcher("https://www.instagram.com/p/CANseIqFMnf/");
while(matcher.find()) {
System.out.println(matcher.group(1));
}
Note that when matcher.find() returns true, the text matched by the capturing group (.*?) is available as group(1), while group(0) holds the entire regex match, which in your case is the entire URL.
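A likely reason the code in the question logs "failed" is that matches() only succeeds when the pattern covers the entire input, whereas find() looks for the pattern anywhere in it. The sketch below illustrates the difference; the class and method names are hypothetical, and the pattern /p/([^/]+) assumes the ID is the path segment directly after /p/.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ShortcodeExtractor {
    // Capture everything after "/p/" up to the next slash (or end of input).
    private static final Pattern POST_ID = Pattern.compile("/p/([^/]+)");

    /** Returns the path segment after "/p/", or null if there is none. */
    static String extractPostId(String url) {
        Matcher m = POST_ID.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    /** matches() requires the pattern to cover the whole input, not a part of it. */
    static boolean wholeInputMatches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String url = "https://www.instagram.com/p/CANseIqFMnf/";
        System.out.println(wholeInputMatches("/p/", url));
        System.out.println(extractPostId(url));
    }
}
```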

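An alternative without regex uses the java.nio.file.Paths API, as below. Note that this treats the URL as a file system path, so the behaviour depends on the platform: on Windows, Paths.get may reject the ':' in the URL with an InvalidPathException.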
public class Url {
public static void main(String[] args) {
String url = "https://www.instagram.com/p/CANseIqFMnf/";
String name = java.nio.file.Paths.get(url).getFileName().toString();
System.out.println(name);
}
}
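Another regex-free option, sketched here as an assumption rather than as part of the answers above, is to parse the string with java.net.URI and take the last segment of its path. The class name LastPathSegment is made up for illustration.

```java
import java.net.URI;

class LastPathSegment {
    /** Returns the last non-empty path segment of the URL, or "" if there is none. */
    static String lastSegment(String url) {
        // For the example URL, getPath() yields "/p/CANseIqFMnf/".
        String path = URI.create(url).getPath();
        if (path == null) {
            return "";
        }
        // split drops trailing empty strings, so a trailing slash is harmless.
        String[] segments = path.split("/");
        return segments.length == 0 ? "" : segments[segments.length - 1];
    }

    public static void main(String[] args) {
        System.out.println(lastSegment("https://www.instagram.com/p/CANseIqFMnf/"));
    }
}
```

Unlike the Paths approach, this does not depend on the file system rules of the platform, although URI.create throws IllegalArgumentException for strings that are not valid URIs.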

How to put Java Regex matches to Resultant String?

How to tokenize an String like in lexer in java?
Please refer to the question above. I have never used Java regex. How can I put all the substrings into a new string, with the matched characters (symbols like '(' ')' '.' '<' '>' '"') separated by a single space? For example, before the regex:
String c = "List<String> uncleanList = Arrays.asList(input1.split(\"x\"));";
I want a resultant string like this:
String r = " List < String > uncleanList = Arrays . asList ( input1 . split ( \" x \" ) ) ; ";

Referring to the code that you linked to, matcher.group() will give you a single token. Simply use a StringBuilder to append this token and a space to get a new string where the tokens are space-separated.
String c = "List<String> uncleanList = Arrays.asList(input1.split(\"x\"));" ;
Pattern pattern = Pattern.compile("\\w+|[+-]?[0-9\\._Ee]+|\\S");
Matcher matcher = pattern.matcher(c);
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
String token = matcher.group();
sb.append(token).append(" ");
}
String r = sb.toString();
System.out.println(r);

String c = "List<String> uncleanList = Arrays.asList(input1.split('x'));";
// Surround each symbol with spaces in a single pass. Calling c.replace(...)
// inside a find() loop would pad a repeated symbol once per occurrence.
c = c.replaceAll("[<>\".()]", " $0 ");
Actually, if you look closely, you only need to separate the characters that are neither letters nor spaces: ((?![a-zA-Z]|\ ).)
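As a rough sketch of the tokenizing idea from the answers above, the following collects tokens and joins them with single spaces. It assumes a simplified token pattern (a run of word characters, or any single non-whitespace character); the class name SymbolSpacer is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SymbolSpacer {
    // A token is either a run of word characters or one non-whitespace symbol.
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\S");

    /** Returns the tokens of source joined by exactly one space. */
    static String spaceSeparate(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        // String.join avoids the trailing space that append(" ") would leave.
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(spaceSeparate("List<String> x = a.b(\"x\");"));
    }
}
```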

How to find match for exact word using pattern matcher in java

I have shared my sample code here. I am trying to find the word "engine" in different strings, and I used a word boundary to match whole words.
However, it also matches when the word is preceded by #, as in #engine.
It should only match the exact word.
private void checkMatch() {
String source1 = "search engines has ";
String source2 = "search engine exact word";
String source3 = "enginecheck";
String source4 = "has hashtag #engine";
String key = "engine";
System.out.println(isContain(source1, key));
System.out.println(isContain(source2, key));
System.out.println(isContain(source3, key));
System.out.println(isContain(source4, key));
}
private boolean isContain(String source, String subItem) {
String pattern = "\\b" + subItem + "\\b";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(source);
return m.find();
}
**Expected output**
false
true
false
false
**actual output**
false
true
false
true

For this case, you have to use regex alternation (or lookarounds) instead of a word boundary. \\b matches between a word character and a non-word character (in either order), so your regex finds a match in #engine, since # is a non-word character.
private boolean isContain(String source, String subItem) {
String pattern = "(?m)(^|\\s)" + subItem + "(\\s|$)";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(source);
return m.find();
}
or
String pattern = "(?<!\\S)" + subItem + "(?!\\S)";

Change your pattern as below.
String pattern = "\\s" + subItem + "\\b";

If you are looking for a literal text enclosed with spaces or start/end of the string, you can split the string with a mere whitespace pattern like \s+ and check if any of the chunks equals the search text.
Java demo:
String s = "Can't start the #engine here, but this engine works";
String searchText = "engine";
boolean found = Arrays.stream(s.split("\\s+"))
.anyMatch(word -> word.equals(searchText));
System.out.println(found); // => true

Change the regexp to
String pattern = "\\s"+subItem + "\\s";
I'm using:
\s A whitespace character: [ \t\n\x0B\f\r]
For more info, look into the java.util.regex.Pattern javadoc.
Also, if you want to support strings like these:
"has hashtag engine"
"engine"
you can improve it by adding the start/end anchors (^ and $)
with this pattern:
String pattern = "(^|\\s)"+subItem + "(\\s|$)";
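The lookaround variant suggested above can be combined with Pattern.quote so that a search word containing regex metacharacters is still treated literally. This is a sketch under that assumption; the class name ExactWordMatcher is hypothetical, and "exact word" is taken to mean "delimited by whitespace or the ends of the string".

```java
import java.util.regex.Pattern;

class ExactWordMatcher {
    /** Returns true if word occurs in source delimited by whitespace or string ends. */
    static boolean containsExactWord(String source, String word) {
        // (?<!\S) : no non-whitespace character directly before the word
        // (?!\S)  : no non-whitespace character directly after the word
        // Pattern.quote treats the word literally, even if it contains . + ( etc.
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?!\\S)");
        return p.matcher(source).find();
    }

    public static void main(String[] args) {
        String key = "engine";
        System.out.println(containsExactWord("search engines has ", key));
        System.out.println(containsExactWord("search engine exact word", key));
        System.out.println(containsExactWord("enginecheck", key));
        System.out.println(containsExactWord("has hashtag #engine", key));
    }
}
```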

Replace different Regex-Matches with Match-based results in Java

One common usage for regex is the replacement of the matches with something that is based on the matches.
For example a commit-text with ticket numbers ABC-1234: some text (ABC-1234) has to be replaced with <ABC-1234>: some text (<ABC-1234>) (<> as example for some surroundings.)
This is very simple in Java
String message = "ABC-9913 - Bugfix: Some text. (ABC-9913)";
String finalMessage = message;
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
if (matcher.find()) {
String ticket = matcher.group();
finalMessage = finalMessage.replace(ticket, "<" + ticket + ">");
}
System.out.println(finalMessage);
This results in <ABC-9913> - Bugfix: Some text. (<ABC-9913>).
But if there are different matches in the input String, this no longer works. I tried replacing if (matcher.find()) { with while (matcher.find()) {, but the result is messed up with doubled replacements (<<ABC-9913>>).
How can I replace all matching values in an elegant way?

You can simply use replaceAll:
String input = "ABC-1234: some text (ABC-1234)";
System.out.println(input.replaceAll("ABC-\\d+", "<$0>"));
prints:
<ABC-1234>: some text (<ABC-1234>)
$0 is a reference to the matched string.
Java regex reference (see "Groups and capturing").

The problem is that the replace() method replaces every occurrence of the ticket in the whole string on each iteration, so a ticket that occurs twice gets wrapped twice.
A better way is to replace one match at a time. The Matcher class has an appendReplacement method for this.
String message = "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)";
Matcher matcher = Pattern.compile("ABC-\\d+").matcher(message);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
String ticket = matcher.group();
matcher.appendReplacement(sb, "<" + ticket + ">");
}
matcher.appendTail(sb);
System.out.println(sb);
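On Java 9 or later (an assumption about the runtime, not something the question states), the same per-match replacement can be written with Matcher.replaceAll and a function that computes each replacement. The class name TicketWrapper is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TicketWrapper {
    private static final Pattern TICKET = Pattern.compile("ABC-\\d+");

    /** Wraps every ticket number in angle brackets, one match at a time. */
    static String wrapTickets(String message) {
        // The function's result is processed like a replacement string, so
        // quoteReplacement guards against '$' or '\' in computed replacements.
        return TICKET.matcher(message)
                .replaceAll(match -> Matcher.quoteReplacement("<" + match.group() + ">"));
    }

    public static void main(String[] args) {
        System.out.println(wrapTickets(
                "ABC-9913, ABC-9915 - Bugfix: Some text. (ABC-9913,ABC-9915)"));
    }
}
```

Because each match is replaced exactly once as it is found, a ticket that appears twice is not wrapped twice.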

java.lang.StringIndexOutOfBoundsException: from java.util.regex.Matcher

I am trying to use regex to remove &nbsp; from my string. The program follows.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class MyTest {
private static final StringBuffer testRegex =
new StringBuffer("<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font></p><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font></p><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ff6600\">Test</font>" +
"<BLOCKQUOTE style=\"MARGIN-RIGHT: 0px\" dir=ltr><br><p>Test</p><strong>" +
"<FONT color=#333333>TestTest</font></strong></p><br><p>Test</p></blockquote>" +
"<br><p>TestTest</p><br><BLOCKQUOTE style=\"MARGIN-RIGHT: 0px\" dir=ltr><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #ffcc66\">TestTestTestTestTest</font><br>" +
"<p>TestTestTestTest</p></blockquote><br><p>" +
"<FONT style=\"BACKGROUND-COLOR: #003333\">TestTestTest</font></p><p>" +
"<FONT style=\"BACKGROUND-COLOR: #003399\">TestTest</font></p><p> </p>");
//"This is test<P>Tag Tag</P>";
public static void main(String[] args) {
System.out.println("***Testing***");
String temp = checkRegex(testRegex);
System.out.println("***FINAL = "+temp);
}
private static String checkRegex(StringBuffer sample){
Pattern pattern = Pattern.compile("<[^>]+? [^<]+?>");
Matcher matcher = pattern.matcher(sample);
while (matcher.find()) {
int start = matcher.start();
int end = matcher.end();
String group = matcher.group();
System.out.println("start = "+start+" end = "+end+"" +"***GROUP = "+group);
String substring = sample.substring(start, end);
System.out.println(" Substring = "+substring);
String replacedSubString = substring.replaceAll("&nbsp;"," ");
System.out.println("Replaced Substring = "+replacedSubString);
sample.replace(start, end, replacedSubString);
System.out.println(" NEW SAMPLE = "+sample);
}
System.out.println("********WHILE OVER ********");
return sample.toString();
}
}
I am getting java.lang.StringIndexOutOfBoundsException at the line while (matcher.find()). I am currently using Java's Pattern and Matcher to find &nbsp; and replace it with " ". Does anyone know what causes this? What should I do to remove the extra &nbsp; from my string?
Thanks

Use matcher.reset(); after sample.replace(start, end, replacedSubString);
This is because when you replace part of the buffer sample, its length changes and the end position can become invalid. So you need to call matcher.reset(); after every replace.
For example, if start is 0 and end is 5 and you replace the match with a shorter string such as " ", the end would point to an invalid position, and find would throw a StringIndexOutOfBoundsException if that position lies outside the new string length.
If the string is huge, reset can become a major performance bottleneck, because matching starts again from the beginning. You can instead use
matcher.region(start,sample.length());
This restarts matching from the start of the last match.

You need to create a new StringBuffer to hold the replaced string, then use the appendReplacement(StringBuffer sb, String replacement) and appendTail(StringBuffer sb) methods of the Matcher class to do the replacement. There is probably a way to do this in place, but the approach above is the most straightforward.
This is your checkRegex method re-written:
private static String checkRegex(String inputString){
Pattern pattern = Pattern.compile("<[^>]+? [^<]+?>");
Matcher matcher = pattern.matcher(inputString);
// Create a new StringBuffer to hold the string after replacement
StringBuffer replacedString = new StringBuffer();
while (matcher.find()) {
// matcher.group() returns the substring that matches the whole regex
String substring = matcher.group();
System.out.println(" Substring = "+substring);
String replacedSubstring = substring.replaceAll("&nbsp;"," ");
System.out.println("Replaced Substring = "+replacedSubstring);
// appendReplacement is a clean approach to append the text which comes
// before a match, and append the replacement text for the matched text
// Note that appendReplacement will interpret $ in the replacement string
// with special meaning (for referring to text matched by capturing group).
// Matcher.quoteReplacement is necessary to provide a literal string as
// replacement
matcher.appendReplacement(replacedString, Matcher.quoteReplacement(replacedSubstring));
System.out.println(" NEW SAMPLE = "+replacedString);
}
// appendTail is used to append the text after the last match to the
// replaced string.
matcher.appendTail(replacedString);
System.out.println("********WHILE OVER ********");
return replacedString.toString();
}
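A compact variant of the same appendReplacement technique is sketched below. It assumes the entity to remove is &nbsp; and that it should only be replaced inside tags, not in the text between them; the class name NbspInTags and the simplified tag pattern are illustrative choices, not taken from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NbspInTags {
    // Simplified assumption: a tag is '<', any characters except '<' or '>', then '>'.
    private static final Pattern TAG = Pattern.compile("<[^<>]*>");

    /** Replaces &nbsp; with a space inside tags only, leaving the text content as is. */
    static String replaceNbspInTags(String html) {
        Matcher m = TAG.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String tag = m.group().replace("&nbsp;", " ");
            // The input string is never modified, so the matcher's positions stay valid.
            m.appendReplacement(out, Matcher.quoteReplacement(tag));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceNbspInTags("<p style=\"a:&nbsp;b\">x&nbsp;y</p>"));
    }
}
```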

// change the group and it is source string is automatically updated
There is no way whatsoever to change an existing string in Java (strings are immutable), so what you're asking for is impossible.
Removing or replacing a pattern with a string can be achieved with a call like
someString = someString.replaceAll(toReplace, replacement);
To transform the matched substring, as seems to be indicated by your line
m.group().replaceAll("something","");
the best solution is probably to use a StringBuffer for the result, together with Matcher.appendReplacement and Matcher.appendTail.
Example:
String regex = "ipsum";
String sourceString = "lorem ipsum dolor sit";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(sourceString);
StringBuffer sb = new StringBuffer();
while (m.find()) {
// For example: transform match to upper case
String replacement = m.group().toUpperCase();
m.appendReplacement(sb, replacement);
}
m.appendTail(sb);
sourceString = sb.toString();
System.out.println(sourceString); // "lorem IPSUM dolor sit"
