How to get match within [ and ] in String using regex - java

I have a requirement to get the matching string within square brackets [].
For eg., in a String input like "[***]qwerty",
I should get the match as "***" string.
The regex I used in vain is "\\[(.+)\\]"
My Java code is as below:
Pattern pattern = Pattern.compile(regex_custom_delimiter_pattern); //see regex above
Matcher matcher = pattern.matcher("[***]qwerty");
String delimiter = null;
if (matcher.find()) {
delimiter = matcher.group(0);
}
Any help is appreciated..wondering what I'm missing in the regex that I used :(

That should work correctly, but you can use a more efficient expression if the value between [ and ] doesn't contain [ or ] literally:
\\[([^\\]]+)]
Or if the value can contain [ or ] then:
\\[(.+?)\\]
Your main problem, though, is that you are reading group 0: matcher.group(0) is the entire match. Your value is stored in group 1, so you need matcher.group(1).
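
To make the difference between the two groups concrete, here is a minimal sketch using the input from the question. The class name BracketGroupDemo and the helper extract are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketGroupDemo {
    // Hypothetical helper: returns the text between the first [ and the
    // next ], or null when the input has no bracketed part.
    static String extract(String input) {
        Matcher matcher = Pattern.compile("\\[([^\\]]+)\\]").matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        // Same regex as in the question; only the group index differs.
        Matcher matcher = Pattern.compile("\\[(.+)\\]").matcher("[***]qwerty");
        if (matcher.find()) {
            System.out.println(matcher.group(0)); // whole match: [***]
            System.out.println(matcher.group(1)); // first capturing group: ***
        }
        System.out.println(extract("[***]qwerty")); // ***
    }
}
```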

You need group 1 instead of group 0. Group 0 is the whole match.
delimiter = matcher.group(1);

Related

java regular expression to extract uuid within square brackets

I have string inside brackets like following format:
[space string space]
I want to extract the string if the string is in UUID format.
example : [ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]
With java regular expression how can I get d6a413f4-059c-11e8-ba89-0ed5f89f718b ?
For your given example, you could use a lookaround to match what is between the [ and the ]:
(?<=\[ ).*?(?= \])
Explanation
(?<=\[ ) positive lookbehind to assert that what comes before is [ followed by a space
.*? match any character zero or more times non greedy
(?= \]) positive lookahead to assert that what follows is ]
For example:
String regex = "(?<=\\[ ).*?(?= \\])";
String string = "[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
Using regex
\[ ([a-f0-9]{8}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{4}-[a-f0-9]{12}) ]
Why you don't want to do this
If you know that your string will definitely have the right format then you can just use substring to get the UUID
class Main {
public static void main(String... args) {
String s = "[ d6a413f4-059c-11e8-ba89-0ed5f89f718b ]";
System.out.println(s.substring(2, s.length()-2));
}
}
This will be faster than using the regex option.
Regex to check if given String contains valid UUID:
"\\[ ([a-f0-9]{8}\\-(?:[a-f0-9]{4}\\-){3}[a-f0-9]{12}) \\]"
So, what is going on in this regex:
\\[ - the character '[' and the space after it
[a-f0-9]{8} - characters from 'a' to 'f' and from '0' to '9', exactly eight times (the 123e5670 part)
\\- - the '-' character
(?:[a-f0-9]{4}\\-){3} - a non-capturing group that must occur exactly three times; each occurrence is exactly 4 characters in the range 'a' to 'f' or '0' to '9', followed by a '-' character (the a234-b234-c234- part)
[a-f0-9]{12} - characters from 'a' to 'f' and from '0' to '9', exactly twelve times (the d23456789012 part)
 \\] - a space and the ']' character
After searching the String for a match with the find() method, you print only capturing group #1 with the group(1) method (capturing group #1 is the part of the pattern enclosed in parentheses).
Your UUID is in capture group 1. Here is a simple example how you can get UUID from source String:
String source = "[ 123e5670-a234-b234-c234-d23456789012 ]";
Pattern p = Pattern.compile("\\[ ([a-f0-9]{8}\\-(?:[a-f0-9]{4}\\-){3}[a-f0-9]{12}) \\]");
Matcher m = p.matcher(source);
if(m.find()) {
System.out.println( m.group(1));
}

Splitting strings delimited by [[ ]] in java?

I have an input string of the form "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]" and I need to extract the tokens "Animal rights", "Anthropocentrism" and so on.
I tried using the split method in the String library but I am not able to find the appropriate regular expression to get the tokens, it would be great if someone could help.
I am basically trying to parse the internal links in a Wikipedia XML file you can check out the format here.
You probably shouldn't be using split() here but instead a Matcher:
String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile("\\[\\[(.*?)\\]\\]").matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Animal rights
Anthropocentrism
Anthropology
A pattern like this should work:
\[\[(.*?)\]\]
This will match a literal [[ followed by zero or more of any character, non-greedily, captured in group 1, followed by a literal ]].
Don't forget to escape the \ in the Java string literal:
Pattern.compile("\\[\\[(.*?)\\]\\]");
It's pretty easy with regex.
\[\[(.+?)\]\]
I recommend using .+ rather than .* so that a match requires at least one character between the brackets; that way you never store an empty string in your array.
String[] output = new String[10];
String pattern = "\\[\\[(.+?)\\]\\]";
string input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile(pattern).matcher(input);
int increment= 0;
while (m.find()) {
output[increment] = m.group(1);
increment++;
}
Since you said you wanted to learn regex too, I'll break it down.
\[ twice matches the two opening [ brackets; you need the \ because [ is one of regex's special characters
. matches any character except newlines
+ means one or more of the preceding item
? after the + makes it lazy: the engine first matches the previous item only once, then extends the match one character at a time until the rest of the pattern matches.
\] twice matches the two closing ] brackets; the parentheses around .+? capture what is in between
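
The same pattern can also fill a List instead of a fixed-size array, which avoids guessing the number of links up front. This is only a sketch; the class name WikiLinkExtractor and the method extractLinks are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiLinkExtractor {
    // [[ ... ]] with a lazy, non-empty body captured in group 1.
    private static final Pattern LINK = Pattern.compile("\\[\\[(.+?)\\]\\]");

    // Hypothetical helper: collects the text of every [[...]] token, in order.
    static List<String> extractLinks(String input) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        while (m.find()) {
            links.add(m.group(1)); // group 1 excludes the surrounding brackets
        }
        return links;
    }

    public static void main(String[] args) {
        String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
        // prints "[Animal rights, Anthropocentrism, Anthropology]"
        System.out.println(extractLinks(input));
    }
}
```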
Try the next:
String str = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
str = str.replaceAll("(^\\[\\[|\\]\\]$)", "");
String[] array = str.split("\\]\\] \\[\\[");
System.out.println(Arrays.toString(array));
// prints "[Animal rights, Anthropocentrism, Anthropology]"

Regular expression to get characters before brackets or comma

I'm pulling my hair out a bit with this.
Say I have a string 7f8hd::;;8843fdj fls "": ] fjisla;vofje]]} fd)fds,f,f
I want to extract 7f8hd::;;8843fdj fls "": from it, on the premise that the part I want ends at a }, a ], a , or a ). All of those characters may be present; I only need the text before the first one.
I have tried without success to create a regular expression with a Matcher and Pattern class but I just can't seem to get it right.
The best I could come up with is below but my reg exp just doesn't seem to work like I think it should.
String line = "7f8hd::;;8843fdj fls \"\": ] fjisla;vofje]]} fd)fds,f,f";
Matcher m = Pattern.compile("(.*?)\\}|(.*?)\\]|(.*?)\\)|(.*?),").matcher(line);
while (m.find()) {
System.out.println(m.group());
}
I'm clearly not understanding reg exp correctly. Any help would be great.
^[^\]}),]*
matches from the start of the string until (but excluding) the first ], }, ) or ,.
In Java:
Pattern regex = Pattern.compile("^[^\\]}),]*");
Matcher regexMatcher = regex.matcher(line);
if (regexMatcher.find()) {
System.out.println(regexMatcher.group());
}
(You can actually remove the backslashes ([^]}),]), but I like to keep them there for clarity and for compatibility since not all regex engines recognize that idiom.)
Explanation:
^ # Match the start of the string
[^\]}),]* # Match zero or more characters except ], }, ) or ,
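
As a self-contained sketch of the approach above, the following wraps the pattern in a small helper and runs it on the string from the question (with the inner quotes escaped so that it compiles). The class name PrefixBeforeDelimiter and the method prefix are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixBeforeDelimiter {
    // Everything from the start of the input up to (excluding) the first ], }, ) or ,
    private static final Pattern PREFIX = Pattern.compile("^[^\\]}),]*");

    // Hypothetical helper: returns the leading part of the line. The pattern
    // can match the empty string, so find() succeeds on any input.
    static String prefix(String line) {
        Matcher m = PREFIX.matcher(line);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String line = "7f8hd::;;8843fdj fls \"\": ] fjisla;vofje]]} fd)fds,f,f";
        // Angle brackets make the trailing space of the result visible.
        System.out.println("<" + prefix(line) + ">");
    }
}
```

Note that the result keeps the space that precedes the first ]; call trim() on it if that is not wanted.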
you could just cut the rest part by replaceAll:
String newStr = yourStr.replaceAll("[\\])},].*", "");
or by split() and get the first element.
String newStr = yourStr.split("[\\])},]")[0];
You can use this (as java string):
"(.+?)[\\]},)].*"
Could you try the regular expression (.*?)[}\]),](.*?) I tested it on rubular and worked against your example.

Find words in string surrounded by "[" and "]":

I need help with a simple task in java. I have the following sentence:
Where Are You [Employee Name]?
your have a [Shift] shift..
I need to extract the strings that are surrounded by [ and ] signs.
I was thinking of using the split method with " " as the parameter and then picking out the single words, but that breaks when the phrase I'm looking for itself contains " ". Using indexOf might be an option as well, only I don't know how to tell that I have reached the end of the String.
What is the best way to perform this task?
Any help would be appreciated.
Try with regex \[(.*?)\] to match the words.
\[: escaped [ for literal match as it is a meta char.
(.*?) : match everything in a non-greedy way.
Sample code:
Pattern p = Pattern.compile("\\[(.*?)\\]");
Matcher m = p.matcher("Where Are You [Employee Name]? your have a [Shift] shift.");
while(m.find()) {
System.out.println(m.group(1)); // group(1) is the text inside the brackets; group() would include the [ and ]
}
Here you go Java regular expression that extract text between two brackets including white spaces:
import java.util.regex.*;
class Main
{
public static void main(String[] args)
{
String txt="[ Employee Name ]";
String re1=".*?";
String re2="( )";
String re3="((?:[a-z][a-z]+))"; // Word 1
String re4="( )";
String re5="((?:[a-z][a-z]+))"; // Word 2
String re6="( )";
Pattern p = Pattern.compile(re1+re2+re3+re4+re5+re6,Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
Matcher m = p.matcher(txt);
if (m.find())
{
String ws1=m.group(1);
String word1=m.group(2);
String ws2=m.group(3);
String word2=m.group(4);
String ws3=m.group(5);
System.out.print("("+ws1.toString()+")"+"("+word1.toString()+")"+"("+ws2.toString()+")"+"("+word2.toString()+")"+"("+ws3.toString()+")"+"\n");
}
}
}
if you want to ignore white space remove "( )";
This is a Scanner-based solution:
Scanner sc = new Scanner("Where Are You [Employee Name]? your have a [Shift] shift..");
for (String s; (s = sc.findWithinHorizon("(?<=\\[).*?(?=\\])", 0)) != null;) {
System.out.println(s);
}
output
Employee Name
Shift
Use a StringBuilder (I assume you don't need synchronization).
As you suggested, indexOf() with your square bracket delimiters will give you a starting index and an ending index. Use substring(startIndex + 1, endIndex) to get exactly the string you want; the end index of substring is exclusive, so it needs no adjustment.
I'm not sure what you meant by the end of the String, but indexOf("[") is the start and indexOf("]") is the end.
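
A rough sketch of this indexOf approach, extended to walk through every bracket pair, might look as follows. The class name IndexOfBrackets and the method extract are made up for illustration; indexOf returning -1 is the signal that no further bracket exists before the end of the String.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfBrackets {
    // Hypothetical helper: returns the text of each [ ... ] pair, in order.
    static List<String> extract(String text) {
        List<String> result = new ArrayList<>();
        int start = text.indexOf('[');
        while (start >= 0) {
            int end = text.indexOf(']', start + 1);
            if (end < 0) {
                break; // unmatched [: nothing more to extract
            }
            // substring's end index is exclusive, so ']' itself is left out
            result.add(text.substring(start + 1, end));
            start = text.indexOf('[', end + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // prints "[Employee Name, Shift]"
        System.out.println(extract("Where Are You [Employee Name]? your have a [Shift] shift.."));
    }
}
```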
That's pretty much the use case for a regular expression.
Try "(\\[[\\w ]*\\])" as your expression.
Pattern p = Pattern.compile("(\\[[\\w ]*\\])");
Matcher m = p.matcher("Where Are You [Employee Name]? your have a [Shift] shift..");
if (m.find()) {
String found = m.group();
}
What does this expression do?
First it defines a group (...)
Then it defines the starting point for that group. \[ matches a literal [. Since [ is a metacharacter in regular expressions, it has to be escaped with \, and since \ is itself reserved in Java strings, it has to be escaped with another \.
Then it defines the body of the group: [\w ]*. Here the regex brackets [] define a character class containing \w (any letter, digit or underscore) and a blank. The * means zero or more characters from that class.
Then it defines the endpoint of the group \]
and closes the group )
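
The snippet above stops after the first match and its group still contains the brackets. A possible variation, sketched here with the hypothetical names WordBracketFinder and find, moves the parentheses inside the brackets and loops with while so that every bracketed phrase is returned without its delimiters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBracketFinder {
    // Group 1 now covers only the word characters and blanks between [ and ].
    private static final Pattern BRACKETED = Pattern.compile("\\[([\\w ]*)\\]");

    // Hypothetical helper: returns the content of each matching [ ... ] pair.
    static List<String> find(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = BRACKETED.matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // prints "[Employee Name, Shift]"
        System.out.println(find("Where Are You [Employee Name]? your have a [Shift] shift.."));
    }
}
```

Because the character class only allows letters, digits, underscores and blanks, a bracketed phrase containing anything else (a hyphen, for instance) is skipped rather than matched.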

Regular expression for a string starting with some string

I have a string of this form: (notice)Any_other_string (note that the parentheses are part of the string).
I want to split this string into 2 parts: (notice) and the rest. I do it as follows:
private static final Pattern p1 = Pattern.compile("(^\\(notice\\))([a-z_A-Z1-9])+");
String content = "(notice)Stack Over_Flow 123";
Matcher m = p1.matcher(content);
System.out.println("Printing");
if (m.find()) {
System.out.println(m.group(0));
System.out.println(m.group(1));
}
I expected the result to be (notice) and Stack Over_Flow 123, but instead the result is (notice)Stack and (notice).
I cannot explain this result. Which regex is suitable for my purpose?
Issue 1: group(0) will always return the entire match - this is specified in the javadoc - and the actual capturing groups start from index 1. Simply replace it with the following:
System.out.println(m.group(1));
System.out.println(m.group(2));
Issue 2: You do not take spaces and other characters, such as underscores, into account (not even the digit 0). I suggest using the dot, ., for matching unknown characters. Or include \\s (whitespace) and _ into your regex. Either of the following regexes should work:
(^\\(notice\\))(.+)
(^\\(notice\\))([A-Za-z0-9_\\s]+)
Note that you need the + inside the capturing group, or it will only find the last character of the second part.
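
Putting both fixes together, a minimal sketch could look like this. The class name NoticeSplitter and the method split are hypothetical; the regex is the first of the two suggested above, with the anchor moved in front of the group, which does not change what it matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NoticeSplitter {
    // Group 1: the literal "(notice)" prefix; group 2: everything after it.
    private static final Pattern P = Pattern.compile("^(\\(notice\\))(.+)");

    // Hypothetical helper: returns {prefix, rest}, or null when the content
    // does not start with "(notice)" followed by at least one character.
    static String[] split(String content) {
        Matcher m = P.matcher(content);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] parts = split("(notice)Stack Over_Flow 123");
        System.out.println(parts[0]); // (notice)
        System.out.println(parts[1]); // Stack Over_Flow 123
    }
}
```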
