Splitting strings delimited by [[ ]] in java? - java

I have an input string of the form "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]" and I need to extract the tokens "Animal rights", "Anthropocentrism", and so on.
I tried the split method of the String class, but I can't find the right regular expression to get the tokens. It would be great if someone could help.
I am basically trying to parse the internal links in a Wikipedia XML file.

You probably shouldn't be using split() here but instead a Matcher:
String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile("\\[\\[(.*?)\\]\\]").matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Animal rights
Anthropocentrism
Anthropology

A pattern like this should work:
\[\[(.*?)\]\]
This will match a literal [[ followed by zero or more of any character, non-greedily, captured in group 1, followed by a literal ]].
Don't forget to escape the \ in the Java string literal:
Pattern.compile("\\[\\[(.*?)\\]\\]");
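To see why the non-greedy quantifier matters here, the following is a minimal sketch that runs the lazy pattern and a greedy variant against the input from the question. The class name WikiLinkDemo and the helper extract are made up for this illustration and are not part of the original answers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustration.
class WikiLinkDemo {
    // Collects capture group 1 of every match of the given regex in the input.
    static List<String> extract(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
        // Lazy .*? stops at the first closing ]] and yields three tokens.
        System.out.println(extract("\\[\\[(.*?)\\]\\]", input));
        // Greedy .* runs to the last closing ]] and yields a single long token.
        System.out.println(extract("\\[\\[(.*)\\]\\]", input));
    }
}
```

With the greedy variant the single token spans from the first [[ to the last ]], which is why the lazy form is the one to use for this input.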

It's pretty easy with regex.
\[\[(.+?)\]\]
I recommend using .+ so that a match must have something inside the brackets and you don't store an empty token in your array.
String[] output = new String[10]; // assumes at most 10 matches
String pattern = "\\[\\[(.+?)\\]\\]";
String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile(pattern).matcher(input);
int increment = 0;
while (m.find()) {
    output[increment] = m.group(1);
    increment++;
}
Since you said you also wanted to learn regex, I'll break it down.
\[ twice matches the two literal [ brackets; you need the \ because [ is a special character in regex
. matches any character except newlines
+ means one or more of the preceding item
? after + makes the quantifier lazy: the engine first matches the preceding item only once, then extends the match one character at a time until the rest of the pattern matches
\] twice matches the two literal ] brackets
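One caveat may be worth checking before relying on .+? to skip empty brackets: because . also matches ], an empty pair such as [[]] can make the match run on into the next link. The sketch below compares .+? with a negated character class [^\]]+ on such an input. The class name BracketTokenDemo and the sample input are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, only for illustration.
class BracketTokenDemo {
    // Collects capture group 1 of every match into a list, so no fixed array size is needed.
    static List<String> extract(String regex, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "[[]] [[Anthropology]]"; // made-up input with an empty link
        // .+? must consume at least one character, so it swallows the first "]]"
        // and keeps going until the final "]]".
        System.out.println(extract("\\[\\[(.+?)\\]\\]", input));
        // [^\]]+ cannot cross a closing bracket, so the empty pair is simply skipped.
        System.out.println(extract("\\[\\[([^\\]]+)\\]\\]", input));
    }
}
```

Using a list here also avoids having to guess the number of matches in advance.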

Try this:
String str = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
str = str.replaceAll("(^\\[\\[|\\]\\]$)", "");
String[] array = str.split("\\]\\] \\[\\[");
System.out.println(Arrays.toString(array));
// prints "[Animal rights, Anthropocentrism, Anthropology]"

Related

Java | Split words and round brackets with its content into elements of a String Array using regex

Hopefully you can help me out, since I'm really bad at regex, so
Given these examples of String input patterns:
"string1 string2 (more strings here)"
"string1 (more words)"
"str1 str2 str3 [...] strn [...] (words. again.)"
I want to end up with a String[] that looks like this:
["string1", "string2", "(more strings here)"]
Basically, it should detect each word, and everything inside round brackets (including non-word characters), as an individual group and put them into a String array.
I understand that this captures the round brackets and their content: (\((.*?)\))
and this captures the words: (\w+)
but I have no idea how to combine them. Or is there a better alternative in Java?
Pattern pattern = Pattern.compile("([\\w]+|\\(.*?\\))"); // match continuous word characters or everything between "(" and ")"
Matcher matcher = pattern.matcher("string1 (more words)"); // input string
List<String> stringArrayList = new ArrayList<>();
// run the matcher repeatedly to find the next match of the regex in the input
while (matcher.find()) {
    stringArrayList.add(matcher.group());
}
String[] output = stringArrayList.toArray(new String[0]); // final output
for (String entry : output) {
    System.out.println(entry); // printing
}
You could match the string with the following regular expression (with the case-insensitive flag set), collecting the matches in an array.
"\\([^)]*\\)|[a-z\\d]+"
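As a rough sketch of how that expression could be applied in Java, the class below compiles it with Pattern.CASE_INSENSITIVE and collects the matches into a String array. The class name WordAndParenTokenizer and the method tokenize are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class WordAndParenTokenizer {
    // Either a parenthesized group without nested ")" or a run of letters and digits.
    private static final Pattern TOKEN =
            Pattern.compile("\\([^)]*\\)|[a-z\\d]+", Pattern.CASE_INSENSITIVE);

    static String[] tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        for (String token : tokenize("string1 string2 (more strings here)")) {
            System.out.println(token);
        }
    }
}
```

The parenthesized alternative is listed first so that an opening bracket is always consumed together with its content rather than being skipped.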

Java Regex , How to get equivalent value of the particular substring using regex in java

I have following string:
mydata ="\nmyName=ram\nmySalaryL=$2,256.00\n";
How to get my name and salary values using regex?
Try using regex capture groups. They are placed in parentheses and give you access to a matched substring.
https://docs.oracle.com/javase/tutorial/essential/regex/groups.html
A reasonable regex might be:
java.util.regex.Pattern regex = Pattern.compile("\nmyName=(\\w+)\nmySalaryL=\\$((?:\\d{1,3},)*\\d{1,3}\\.\\d{2})\n");
java.util.regex.Matcher match = regex.matcher(mydata);
if (match.matches()) {
    String myName = match.group(1);
    String mySalary = match.group(2);
}
Note that capture group 0 is the entire match, and that matches() must be called before accessing the capture groups, because it performs the actual regex matching.
No need to use regular expressions at all! It is enough to split on the \n character and then on = to get that information:
String mydata ="\nmyName=ram\nmySalaryL=$2,256.00\n";
String[] arr = mydata.split("\n");
System.out.println(arr[1].split("=")[1]);
System.out.println(arr[2].split("=")[1]);
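The same split-based idea can be sketched without hard-coded array indexes by collecting every key=value line into a map. The class name KeyValueParser and the method parse are hypothetical, and the sketch assumes that each useful line contains at least one = character.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, only for illustration.
class KeyValueParser {
    // Splits the data into lines and each line at its first '='.
    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : data.split("\n")) {
            int eq = line.indexOf('=');
            if (eq < 0) {
                continue; // skips empty lines and lines without '='
            }
            result.put(line.substring(0, eq), line.substring(eq + 1));
        }
        return result;
    }

    public static void main(String[] args) {
        String mydata = "\nmyName=ram\nmySalaryL=$2,256.00\n";
        Map<String, String> values = parse(mydata);
        System.out.println(values.get("myName"));
        System.out.println(values.get("mySalaryL"));
    }
}
```

Cutting at the first = keeps values intact even if they contain further = characters.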

How to split comma-separated string but exclude some words containing comma in Java

Assume that we have below string:
"test01,test02,test03,exceptional,case,test04"
What I want is to split the string into string array, like below:
["test01","test02","test03","exceptional,case","test04"]
How can I do that in Java?
This negative lookaround regex should work for you:
(?<!exceptional),|,(?!case)
Java Code:
String[] arr = str.split("(?<!exceptional),|,(?!case)");
Explanation:
This regex matches a comma if either of these two conditions is met:
the comma is not preceded by the word exceptional, using the negative lookbehind (?<!exceptional)
the comma is not followed by the word case, using the negative lookahead (?!case)
That effectively disallows splitting on the comma only when it has exceptional on its left and case on its right.
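The behaviour of the lookaround split can be checked with a small sketch like the one below. The class name ExceptionalCommaSplit is made up for this illustration; the regex is the one from the answer.

```java
import java.util.Arrays;

// Hypothetical demo class, only for illustration.
class ExceptionalCommaSplit {
    // Splits on every comma except the one between "exceptional" and "case".
    static String[] split(String s) {
        return s.split("(?<!exceptional),|,(?!case)");
    }

    public static void main(String[] args) {
        // The comma inside "exceptional,case" is the only one that is kept.
        System.out.println(Arrays.toString(split("test01,test02,test03,exceptional,case,test04")));
        // Only one side matches here, so both strings are split normally.
        System.out.println(Arrays.toString(split("somethingelse,case")));
        System.out.println(Arrays.toString(split("exceptional,notcase")));
    }
}
```

Note that the lookahead only checks the text immediately after the comma, so an input such as exceptional,casework would also be left unsplit.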
@anubhava's answer is great, so use it. For completeness, here's a general solution that is applicable to many situations and uses a beautifully simple regex:
exceptional,case|(,)
The left side of the alternation | matches complete exceptional,case. We will ignore these matches. The right side matches and captures commas to Group 1, and we know they are the right ones because they were not matched by the expression on the left. We then replace these commas by something distinctive, and split on that string.
This program shows how to use the regex:
String subject = "somethingelse,case,test02,test03,exceptional,case,test04,exceptional,notcase";
Pattern regex = Pattern.compile("exceptional,case|(,)");
Matcher m = regex.matcher(subject);
StringBuffer b= new StringBuffer();
while (m.find()) {
if(m.group(1) != null) m.appendReplacement(b, "##SplitHere##");
else m.appendReplacement(b, m.group(0));
}
m.appendTail(b);
String replaced = b.toString();
String[] splits = replaced.split("##SplitHere##");
for (String split : splits) System.out.println(split);
Reference
How to match (or replace) a pattern except in situations s1, s2, s3...
Article about matching a pattern unless...
How could Java know that exceptional,case is a single word that must not be split?
If the values were wrapped in some other recurring character, such as double quotes, you could split on that.
For example, if the input were
"test01","test02","test03","exceptional,case","test04"
you could split it on ","
In your case that is not possible unless you use a regular expression.
Here's a dead-simple answer, don't know why I didn't think of it yesterday:
(?<!exceptional(?=,case)),
Explanation
A comma (the last character of the regex) that is not preceded by exceptional followed by ,case
String s1 = "test01.test02.test03.{i}.case.test04.test03.{i}.test03.{i}.test03.{i}";
String[] arr1 = s1.split("\\.(?!\\{i})"); // split on every dot that is not followed by {i}
Output:
test01
test02
test03.{i}
case
test04
test03.{i}
test03.{i}
test03.{i}
You probably want to use split()
Like this:
String[] array = "test01,test02,test03,exceptional,case,test04".split(",");

Help in writing a Regular expression for a string

Hi, please help me write a regular expression for the following requirement.
I have strings such as
String vStr = "Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)";
String sStr = "Every 1 form(s) - Earth: (Air,Fire) ";
From these strings I need to extract the values "Air,Earth,Water sea,Fire" and "Air,Fire" using a regex.
That means that afterwards
String vStrRegex ="Air,Earth,Water sea,Fire";
String sStrRegex ="Air,Fire";
All input strings contain a ":" separator, and the values I need are always inside the parentheses.
Thanks
The regular expression would be something like this:
: \((.*?)\)
Spelt out:
Pattern p = Pattern.compile(": \\((.*?)\\)");
Matcher m = p.matcher(vStr);
if (m.find()) { // find() must succeed before group() can be called
    String result = m.group(1);
    // ...
}
This will capture the content of the parentheses as the first capture group.
Try the following:
\(([^()]*)\)\s*$
The ending $ is important, and so is excluding parentheses from the captured text; otherwise the match would start at the "(s)".
If you have each string separately, try this expression: \(([^\(]*)\)\s*$
This would get you the content of the last pair of brackets, as group 1.
If the strings are concatenated by : try to split them first.
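As a quick check of the "last pair of brackets" expression, here is a minimal sketch that applies it to both strings from the question. The class name LastParensRegex and the method extract are hypothetical, and returning null when nothing matches is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class LastParensRegex {
    // A parenthesized group with no "(" inside, followed only by optional whitespace.
    private static final Pattern LAST_GROUP = Pattern.compile("\\(([^\\(]*)\\)\\s*$");

    // Returns the content of the last pair of parentheses, or null if there is none.
    static String extract(String s) {
        Matcher m = LAST_GROUP.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)"));
        System.out.println(extract("Every 1 form(s) - Earth: (Air,Fire) "));
    }
}
```

The \s* before $ is what lets the second string, which ends in a space, still match.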
Ask yourself if you really need a regex. Does the text you need always appear within the last pair of parentheses? If so, you can keep it simple and use substring instead:
String vStr = "Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)";
int lastOpeningParens = vStr.lastIndexOf('(');
int lastClosingParens = vStr.lastIndexOf(')');
String text = vStr.substring(lastOpeningParens + 1, lastClosingParens);
This is much more readable than a regex.
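If the input might lack parentheses, the substring approach needs a small guard, because substring throws when the indexes are out of order. The sketch below shows one way to add it; the class name LastParensSubstring and the null return value are assumptions made for this illustration.

```java
// Hypothetical helper class, only for illustration.
class LastParensSubstring {
    // Returns the text between the last "(" and the last ")", or null if that pair is missing.
    static String extract(String s) {
        int open = s.lastIndexOf('(');
        int close = s.lastIndexOf(')');
        if (open < 0 || close < open) {
            return null; // no usable pair of parentheses
        }
        return s.substring(open + 1, close);
    }

    public static void main(String[] args) {
        System.out.println(extract("Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)"));
        System.out.println(extract("Every 1 form(s) - Earth: (Air,Fire) "));
    }
}
```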
I assume that there are only whitespace characters between : and the opening bracket (:
Pattern regex = Pattern.compile(":\\s+\\((.+)\\)");
You'll find your results in capturing group 1.
Try this regex:
.*\((.*)\)
$1 will contain the required string

How to find and replace a substring?

For example, I have a string in which I must find and replace multiple substrings, all of which start with #, contain 6 symbols, end with ', and should not contain ). What do you think would be the best way of achieving that?
Thanks!
Edit:
One more thing I forgot: to make the replacement I need the substring itself, i.e. it gets replaced by a string generated from the substring being replaced.
yourNewText=yourOldText.replaceAll("#[^)]{6}'", "");
Or programmatically:
Matcher matcher = Pattern.compile("#[^)]{6}'").matcher(yourOldText);
StringBuffer sb = new StringBuffer();
while (matcher.find()) {
    // implement your custom logic in someReplacement(); matcher.group() is the found String
    matcher.appendReplacement(sb, someReplacement(matcher.group()));
}
matcher.appendTail(sb);
String yourNewString = sb.toString();
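To make the programmatic variant concrete, here is a self-contained sketch under a few assumptions: the replacement rule (upper-casing the six characters and wrapping them in square brackets) is invented for the example, as are the class name HashTokenReplacer and the sample text. Matcher.quoteReplacement is used so that any $ or \ in the generated text is inserted literally.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, only for illustration.
class HashTokenReplacer {
    // '#', then six characters that are not ')', then a closing quote.
    private static final Pattern TOKEN = Pattern.compile("#([^)]{6})'");

    // Made-up rule: derive the replacement from the six captured characters.
    static String someReplacement(String inner) {
        return "[" + inner.toUpperCase(Locale.ROOT) + "]";
    }

    static String replaceTokens(String text) {
        Matcher matcher = TOKEN.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            // quoteReplacement keeps "$" and "\" in the generated text literal.
            matcher.appendReplacement(sb, Matcher.quoteReplacement(someReplacement(matcher.group(1))));
        }
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // The second candidate contains ")" and is therefore left untouched.
        System.out.println(replaceTokens("a #abcdef' b #12345)' c"));
    }
}
```

Capturing the six characters in group 1 means the replacement logic does not have to strip the leading # and trailing ' itself.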
Assuming you just know the substrings are formatted like you explained above, but not exactly which 6 characters, try the following:
String result = input.replaceAll("#[^\\)]{6}'", "replacement"); //pattern to replace is #+6 characters not being ) + '
You must use replaceAll with the right regular expression:
myString.replaceAll("#[^)]{6}'", "something")
If you need to replace with an extract of the matched string, use a capture group, like this:
myString.replaceAll("#([^)]{6})'", "blah $1 blah")
The $1 in the second String refers to the text matched by the first parenthesized expression in the first String.
this might not be the best way to do it but...
youstring = youstring.replace("#something'", "new stringx");
youstring = youstring.replace("#something2'", "new stringy");
youstring = youstring.replace("#something3'", "new stringz");
//edited after reading comments, thanks
