Help in writing a Regular expression for a string - java

Hi, please help me write a regular expression for the following requirement.
I have strings such as:
String vStr = "Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)";
String sStr = "Every 1 form(s) - Earth: (Air,Fire) ";
From these strings I need to extract the values "Air,Earth,Water sea,Fire" and "Air,Fire" using a regex, that is, the result should be:
String vStrRegex ="Air,Earth,Water sea,Fire";
String sStrRegex ="Air,Fire";
Every input string contains a ":" separator, and the values I need are always inside the parentheses that follow it.
Thanks

The regular expression would be something like this:
: \((.*?)\)
Spelt out:
Pattern p = Pattern.compile(": \\((.*?)\\)");
Matcher m = p.matcher(vStr);
// find() must succeed before group() may be called
if (m.find()) {
    String result = m.group(1);
}
This will capture the content of the parentheses as the first capture group.
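For completeness, here is a minimal runnable sketch of the approach above applied to both input strings. The class name ParenExtract and the helper extract are made up for illustration, and returning null when nothing matches is an assumption about how you want to handle bad input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenExtract {
    // Matches ": (" ... ")" and captures the text between the parentheses.
    private static final Pattern AFTER_COLON = Pattern.compile(": \\((.*?)\\)");

    // Hypothetical helper: returns the captured text, or null if there is no ": (...)" part.
    static String extract(String input) {
        Matcher m = AFTER_COLON.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)"));
        System.out.println(extract("Every 1 form(s) - Earth: (Air,Fire) "));
    }
}
```

Because the pattern requires ": (" literally, the "(s)" earlier in the string is never a candidate.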

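Try the following:
\(([^()]*)\)\s*$
The ending $ is important, otherwise you'll accidentally match the "(s)". The [^()]* matters too: with a plain .* the match would start at the first opening parenthesis and capture "s) - Universe: (Air,Earth,Water sea,Fire".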

If you have each string separately, try this expression: \(([^\(]*)\)\s*$
This would get you the content of the last pair of brackets, as group 1.
If the strings are concatenated by : try to split them first.

Ask yourself if you really need a regex. Does the text you need always appear within the last pair of parentheses? If so, you can keep it simple and use substring instead:
String vStr = "Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)";
int lastOpeningParens = vStr.lastIndexOf('(');
int lastClosingParens = vStr.lastIndexOf(')');
String text = vStr.substring(lastOpeningParens + 1, lastClosingParens);
This is much more readable than a regex.
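One caveat: substring throws if either parenthesis is missing or they appear in the wrong order. A small defensive sketch is shown below; the class and method names are invented for the example, and returning null for malformed input is an assumption.

```java
public class LastParens {
    // Returns the text between the last '(' and the last ')',
    // or null when the input has no such pair in that order.
    static String lastParenContent(String s) {
        int open = s.lastIndexOf('(');
        int close = s.lastIndexOf(')');
        if (open < 0 || close < open) {
            return null;
        }
        return s.substring(open + 1, close);
    }

    public static void main(String[] args) {
        System.out.println(lastParenContent("Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)"));
        System.out.println(lastParenContent("Every 1 form(s) - Earth: (Air,Fire) "));
    }
}
```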

I assume that there is only whitespace between the : and the opening parenthesis:
Pattern regex = Pattern.compile(":\\s+\\((.+)\\)");
You'll find your results in capturing group 1.

Try this regex:
.*\((.*)\)
$1 will contain the required string

Related

Java Regex , How to get equivalent value of the particular substring using regex in java

I have following string:
mydata ="\nmyName=ram\nmySalaryL=$2,256.00\n";
How to get my name and salary values using regex?
Try using regex capture groups. They are written in parentheses and give you access to the matched substring.
https://docs.oracle.com/javase/tutorial/essential/regex/groups.html
A reasonable regex might be:
java.util.regex.Pattern regex = Pattern.compile("\nmyName=(\\w+)\nmySalaryL=\\$((?:\\d{1,3},)*\\d{1,3}\\.\\d{2})\n");
java.util.regex.Matcher match = regex.matcher(mydata);
if (match.matches()) {
    String myName = match.group(1);   // the name
    String mySalary = match.group(2); // the amount without the leading $
}
Note that capture group 0 is the entire match, and that matches() (or find()) must be called before accessing the capture groups, because it performs the actual matching. In a Java string literal the backslashes must be doubled, and the $ must be escaped because it is otherwise an end-of-input anchor.
No need to use regular expressions at all! It is enough to split on the \n character and then on = to get that information:
String mydata ="\nmyName=ram\nmySalaryL=$2,256.00\n";
String[] arr = mydata.split("\n");
System.out.println(arr[1].split("=")[1]);
System.out.println(arr[2].split("=")[1]);
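If the number or order of lines may vary, the same split idea can be generalised into a lookup table. The sketch below is only an illustration under that assumption; the class name KeyValueLines and the method parse are hypothetical. It splits each line at the first = only, so values that themselves contain = are kept whole.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueLines {
    // Parses "key=value" lines into a map, skipping blank or malformed lines.
    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String line : data.split("\n")) {
            int eq = line.indexOf('=');
            if (eq > 0) { // requires a non-empty key before '='
                result.put(line.substring(0, eq), line.substring(eq + 1));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> values = parse("\nmyName=ram\nmySalaryL=$2,256.00\n");
        System.out.println(values.get("myName"));
        System.out.println(values.get("mySalaryL"));
    }
}
```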

split string based on text qualifier regex java

I want to split a string based on text qualifier for example
"1","10411721","MikeTison","08/11/2009","21/11/2009","2800.00","002934538","051","New York","10411720-002",".\Images\b.jpg",".\RTF\b.rtf"
Qualifier = "
Splitter = ,
I want to split the string on the splitter, but if the splitter appears inside the qualifier " it should be ignored and kept as part of the returned string.
The regular expression I am using is (?:|,)(\"(?:[^\"]+|\"\")*\"|[^,]*)
but it only returns commas. Please help, as I am new to regular expressions.
Please note that if the string contains newline characters (\r\n), they should be ignored:
"1","10411","Muis","a","21/11/2009","2800.06","0029683778","03005136851","Awan","10411720-001",".\Images\a.jpg",".\RTF\a.rtf"
"2","08/10/2009","07:32","Call","On-Net","030092343242342376543","Monk","00:00","1.500","0.000","10.000","0.200"
"2","08/10/2009","02:50","Call","Off-Net","030092343242342376543","Une","08:00","1.500","2.000","20.000","3.500"
"2","09/10/2009","03:55","SMS","On-Net","030092343242342376543","Mink","00:00","1.500","0.000","5.000","100.500"
"2","09/10/2009","12:30","Call","Off-Net","030092343242342376543","Zog","01:01","3.500","3.000","70.000","6.500"
"2","09/10/2009","09:11","Call","On-Net","030092343242342376543","Monk","02:30","2.00","2.000","90.000","4.000"
Probably the easiest solution is not to search for places to split, but to find the elements you want to return. In your case these elements
start with "
end with "
have no " inside.
So you can try something like
String data = "\"1\",\"10411721\",\"MikeTison\",\"08/11/2009\",\"21/11/2009\",\"2800.00\",\"002934538\",\"051\",\"New York\",\"10411720-002\",\".\\Images\\b.jpg\",\".\\RTF\\b.rtf\"";
Pattern p = Pattern.compile("\"([^\"]+)\"");
Matcher m = p.matcher(data);
while(m.find()){
System.out.println(m.group(1));
}
Output:
1
10411721
MikeTison
08/11/2009
21/11/2009
2800.00
002934538
051
New York
10411720-002
.\Images\b.jpg
.\RTF\b.rtf
You can split using this regex:
String[] arr = input.split( "(?=(([^\"]*\"){2})*[^\"]*$),+" );
This regex splits on commas only when they are outside double quotes: the lookahead makes sure that an even number of quotes follows the comma.
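To see the lookahead at work, here is a small sketch using an input that really does contain a comma inside a quoted field. The sample data, the class name QuotedSplit and the helper fields are invented for the example; it also strips the surrounding qualifiers, which the plain split does not do.

```java
import java.util.ArrayList;
import java.util.List;

public class QuotedSplit {
    // A comma counts as a separator only if an even number of quotes follows it.
    private static final String COMMA_OUTSIDE_QUOTES = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)";

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        // Limit -1 keeps trailing empty fields.
        for (String raw : line.split(COMMA_OUTSIDE_QUOTES, -1)) {
            // Strip one surrounding pair of qualifiers, if present.
            if (raw.length() >= 2 && raw.startsWith("\"") && raw.endsWith("\"")) {
                raw = raw.substring(1, raw.length() - 1);
            }
            out.add(raw);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields("\"New York, NY\",\"10\",\"\""));
    }
}
```

Keep in mind that the lookahead rescans the rest of the line for every comma, so for very long lines or for qualifiers escaped inside fields a dedicated CSV parser is likely the better choice.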
Remove the first and the last character of the whole string. Then split with ","
String test = "\"1\",\"10411721\",\"MikeTison\",\"08/11/2009\",\"21/11/2009\",\"2800.00\",\"002934538\",\"051\",\"New York\",\"10411720-002\",\".\\Images\\b.jpg\",\".\\RTF\\b.rtf\"";
if (test.length() > 1) // need at least the two surrounding quotes
    test = test.substring(1, test.length()-1);
System.out.println(Arrays.toString(test.split("\",\"")));
This works even if the string contains newline characters. Try it out:
String str="\"1\",\"10411721\",\"MikeTison\",\"08/11/2009\",\"21/11/2009\",\"2800.00\",\"002934538\",\"051\",\"New York\",\"10411720-002\",\".\\Images\\b.jpg\",\".\\RTF\\b.rtf\"";
System.out.println(Arrays.toString(str.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)")));

Splitting strings delimited by [[ ]] in java?

I have the input string of the following form "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]" and I need to extract the tokens "Animal rights" , "Anthropocentrism" and so on etc.
I tried using the split method in the String library but I am not able to find the appropriate regular expression to get the tokens, it would be great if someone could help.
I am basically trying to parse the internal links in a Wikipedia XML file you can check out the format here.
You probably shouldn't be using split() here but instead a Matcher:
String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile("\\[\\[(.*?)\\]\\]").matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Animal rights
Anthropocentrism
Anthropology
A pattern like this should work:
\[\[(.*?)\]\]
This will match a literal [[ followed by zero or more of any character, non-greedily, captured in group 1, followed by a literal ]].
Don't forget to escape the \ in the Java string literal, and keep the ? inside the group so that the quantifier stays non-greedy:
Pattern.compile("\\[\\[(.*?)\\]\\]");
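The non-greedy quantifier is what keeps the links apart. The following sketch runs a greedy and a non-greedy version of the pattern against the sample input; the class name GreedyVsLazy and the helper captures are hypothetical names used only for this comparison.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsLazy {
    // Collects group 1 of every match of the given regex in the input.
    static List<String> captures(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
        // Greedy: one match that runs from the first [[ to the last ]].
        System.out.println(captures("\\[\\[(.*)\\]\\]", input));
        // Non-greedy: one match per link.
        System.out.println(captures("\\[\\[(.*?)\\]\\]", input));
    }
}
```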
It's pretty easy with regex.
\[\[(.+?)\]\]
I recommend using .+ so that a match requires at least one character between the brackets; that way you never store an empty string in your array.
String[] output = new String[10];
String pattern = "\\[\\[(.+?)\\]\\]";
String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile(pattern).matcher(input);
int increment = 0;
// stop when the fixed-size array is full
while (increment < output.length && m.find()) {
    output[increment] = m.group(1);
    increment++;
}
Since you said you also wanted to learn regex, I'll break it down.
\[\[ matches two literal [ brackets; each needs a \ because [ is a special character in regex
. matches any character except line terminators
+ means one or more of the preceding item
? placed after + makes the quantifier lazy: the engine first matches the preceding item only once, then extends the match one character at a time until the rest of the pattern matches
\]\] matches the two literal ] brackets
Try the following:
String str = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
str = str.replaceAll("(^\\[\\[|\\]\\]$)", "");
String[] array = str.split("\\]\\] \\[\\[");
System.out.println(Arrays.toString(array));
// prints "[Animal rights, Anthropocentrism, Anthropology]"

Regular expression match a-alphanumeric&b-digits&c-digits

I have query about java regular expressions. Actually, I am new to regular expressions.
So I need help to form a regex for the statement below:
Statement: a-alphanumeric&b-digits&c-digits
Possible matching Examples: 1) a-90485jlkerj&b-34534534&c-643546
2) A-RT7456ffgt&B-86763454&C-684241
Use case: First of all I have to validate input string against the regular expression. If the input string matches then I have to extract a value, b value and c value like
90485jlkerj, 34534534 and 643546 respectively.
Could someone please share how I can achieve this in the best possible way?
I really appreciate your help on this.
You can use this pattern:
^(?i)a-([0-9a-z]++)&b-([0-9]++)&c-([0-9]++)$
If what you are trying to match is not the whole string, just remove the anchors:
(?i)a-([0-9a-z]++)&b-([0-9]++)&c-([0-9]++)
explanations:
(?i) make the pattern case-insensitive
[0-9]++ digit one or more times (possessive)
[0-9a-z]++ the same with letters
^ anchor for the string start
$ anchor for the string end
The parentheses in the two patterns are capture groups (they capture the values you want).
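Put together, validation and extraction can be done in a single step. This is a minimal sketch, assuming the whole input must match; the class name AbcParser and the method parse are illustrative, and returning null for invalid input is an assumption. Since matches() already requires the entire input to match, the anchors are left out.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AbcParser {
    private static final Pattern FORMAT =
            Pattern.compile("(?i)a-([0-9a-z]+)&b-([0-9]+)&c-([0-9]+)");

    // Returns {a, b, c} when the input is valid, or null otherwise.
    static String[] parse(String input) {
        Matcher m = FORMAT.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("a-90485jlkerj&b-34534534&c-643546")));
        System.out.println(Arrays.toString(parse("A-RT7456ffgt&B-86763454&C-684241")));
        System.out.println(Arrays.toString(parse("a-abc&b-12x&c-3"))); // invalid: b is not all digits
    }
}
```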
Given a string with the format a-XXX&b-XXX&c-XXX, you can extract all XXX parts in one simple line:
String[] parts = str.replaceAll("(?i)[abc]-", "").split("&"); // (?i) also handles the upper-case A-, B-, C- form
parts will be an array with 3 elements, being the target strings you want.
The simplest regex that matches your string is (written as a Java string literal, hence the doubled backslashes):
^(?i)a-([\\da-z]+)&b-(\\d+)&c-(\\d+)
Your target strings are then in groups 1, 2 and 3, but you need a lot of code around the regex to get at them, which as shown above is not necessary.
Following code will help you:
String[] texts = new String[]{"a-90485jlkerj&b-34534534&c-643546", "A-RT7456ffgt&B-86763454&C-684241"};
Pattern full = Pattern.compile("^(?i)a-([\\da-z]+)&b-(\\d+)&c-(\\d+)");
Pattern patternA = Pattern.compile("(?i)([\\da-z]+)&[bc]");
Pattern patternB = Pattern.compile("(\\d+)");
for (String text : texts) {
if (full.matcher(text).matches()) {
for (String part : text.split("-")) {
Matcher m = patternA.matcher(part);
if (m.matches()) {
System.out.println(part.substring(m.start(), m.end()).split("&")[0]);
}
m = patternB.matcher(part);
if (m.matches()) {
System.out.println(part.substring(m.start(), m.end()));
}
}
}
}

Regular Expression - Java

For the string value "ABC_12" (including quotes), I would like to extract only the content and exclude the double quotes, i.e. ABC_12. My code is:
private static void checkRegex()
{
final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9])+\"");
Matcher findMatches = stringPattern.matcher("\"ABC_12\"");
if (findMatches.matches())
System.out.println("Match found" + findMatches.group(0));
}
Now I have tried doing findMatches.group(1);, but that only returns the last character in the string (I did not understand why !).
How can I extract only the content leaving out the double quotes?
Try this regex:
Pattern.compile("\"([a-zA-Z_0-9]+)\"");
OR
Pattern.compile("\"([^\"]+)\"");
The problem in your code is the + placed outside the closing parenthesis. The group itself matches only one character and is then repeated, so after the match it holds only the last character it matched.
A nice simple (read: non-regex) way to do this is:
String myString = "\"ABC_12\"";
String myFilteredString = myString.replaceAll("\"", "");
System.out.println(myFilteredString);
gets you
ABC_12
You should change your pattern to this:
final Pattern stringPattern = Pattern.compile("\"([a-zA-Z_0-9]+)\"");
Note that the + sign was moved inside the group, since you want the character repetition to be part of the group. In the code you posted, what you were actually searching for was a repetition of the group, which consisted of a single occurrence of a single character in [a-zA-Z_0-9].
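The difference is easy to check side by side. In the sketch below (class and method names are made up for the demonstration), the first pattern repeats the group, so group 1 keeps only the last iteration, while the second repeats the character class inside the group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroup {
    // Returns group 1 if the whole input matches the regex, otherwise null.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "\"ABC_12\"";
        // + outside the group: the group is overwritten on every repetition.
        System.out.println(firstGroup("\"([a-zA-Z_0-9])+\"", input)); // 2
        // + inside the group: the group spans the whole run of characters.
        System.out.println(firstGroup("\"([a-zA-Z_0-9]+)\"", input)); // ABC_12
    }
}
```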
If your pattern is strictly any text in between double quotes, then you may be better off using substring:
String str = "\"ABC_12\"";
System.out.println(str.substring(1, str.lastIndexOf('\"')));
Assuming it is a bit more complex (double quotes inside a larger string), you can use the split() method of the Pattern class with \" as your regex. This splits the string around each " so you can easily extract the content you want:
Pattern p = Pattern.compile("\"");
// Split input with the pattern
String[] result = p.split(str);
for (int i = 0; i < result.length; i++) {
    System.out.println(result[i]);
}
http://docs.oracle.com/javase/1.4.2/docs/api/java/util/regex/Pattern.html#split%28java.lang.CharSequence%29
