nested brackets in Regex - java

I'm using regex in java to extract data from user entries like these:
String entry1 = "add to xx16,John Doe";
String entry2 = "add to ab20,John Doe;Richard Roe;John Stiles";
They can have multiple names, but if they do, the names have to be separated by semicolons. Now I want a regex that gives me back those parameters. I came up with this:
Pattern pattern = Pattern.compile("add to ([a-z|\\d]*),([a-zA-Z]*\\s[a-zA-Z]*)[;([a-zA-Z]*\\s[a-zA-Z]*)]*");
Matcher matcher = pattern.matcher(entry);
matcher.matches();
//get inputs with matcher.group();
It works well with entries like entry1, but not with entry2. Does anyone see my mistake?

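You can't have an unlimited, variable number of groups like that: a pattern has a fixed number of capturing groups, and a repeated group keeps only its last match. (Also note that the square brackets in your pattern form a character class, not a group.) Just capture all the names together, then split.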
Since you are not testing whether or not the matcher actually matches, I assume you don't care too much about validating the format of the input and just want to grab the values. So you could do something like this:
Pattern pattern = Pattern.compile("add to (\\w+),(.*)");
Matcher matcher = pattern.matcher(entry);
matcher.matches(); // FIXME: check if it matches
String[] names = matcher.group(2).split(";");
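A minimal sketch of the same capture-then-split idea with the match check filled in. The class name EntryParser, the method name names, and the choice to return an empty array for non-matching input are assumptions made for illustration, not part of the original answer.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntryParser {
    // Group 1 is the target code, group 2 is everything after the comma.
    private static final Pattern ENTRY = Pattern.compile("add to (\\w+),(.*)");

    // Returns the semicolon-separated names, or an empty array
    // when the entry does not have the expected shape (an assumption).
    static String[] names(String entry) {
        Matcher matcher = ENTRY.matcher(entry);
        if (!matcher.matches()) {
            return new String[0];
        }
        return matcher.group(2).split(";");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                names("add to ab20,John Doe;Richard Roe;John Stiles")));
        System.out.println(Arrays.toString(names("remove xx16")));
    }
}
```

Calling group(2) without a successful matches() throws an IllegalStateException, which is why the check comes first.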

Skipping the first 7 characters ("add to ") from the beginning using the regular expression (?:^.{7}), and then splitting on either a comma or a semicolon with [,;]:
String entry1 = "add to xx16,John Doe";
String entry2 = "add to ab20,John Doe;Richard Roe;John Stiles";
String[] str = entry1.split("(?:^.{7})|[,;]");
for(String st : str ){
System.out.println(st);
}
str = entry2.split("(?:^.{7})|[,;]");
for(String st : str ){
System.out.println(st);
}
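Output (each array actually starts with an empty string, because the prefix match at index 0 has positive width, so a blank line is printed first):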
xx16
John Doe
ab20
John Doe
Richard Roe
John Stiles
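A sketch of a variant that returns only the values, assuming every entry starts with exactly the prefix "add to ". It strips the prefix with substring before splitting, so no empty first element arises. The class and method names are hypothetical.

```java
import java.util.Arrays;

class PrefixSplit {
    private static final String PREFIX = "add to ";

    // Assumes the entry always begins with PREFIX; no validation is done.
    static String[] values(String entry) {
        return entry.substring(PREFIX.length()).split("[,;]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(
                values("add to ab20,John Doe;Richard Roe;John Stiles")));
    }
}
```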

Related

Regex to split by special characters with exceptions JAVA

I am very new to regular expressions and I'm having difficulties with this one:
I want to split a String when I find this pattern but also this one "text here" and this one "text here"^^ (this should be considered as one in the output).
Note these symbols: ^^
The three cases can be repeated each many times or can be one after the other and are always separated by spaces.
Example:
<\herewouldbeurl> "HEY THERE" "Asioc-project.org/."^^<\anotherurl/>
would produce:
1.<\herewouldbeurl>
2."HEY THERE"
3."Asioc-project.org/."^^<\anotherurl/>
I've found this: "\s+(?=(?:(?<=[a-zA-Z])\"(?=[A-Za-z])|\"[^\"]\"|[^\"])$)" but it does not work for the third case.
Any ideas?
Don't use split(). Use a find() loop.
String input = "<\\herewouldbeurl> \"HEY THERE\" \"Asioc-project.org/.\"^^<\\anotherurl/>";
Pattern p = Pattern.compile("\".*?\"(?:\\^\\^)?");
Matcher m = p.matcher(input);
int start = 0;
while (m.find()) {
String s = input.substring(start, m.start()).trim();
if (! s.isEmpty())
System.out.println(s);
System.out.println(m.group());
start = m.end();
}
String s = input.substring(start).trim();
if (! s.isEmpty())
System.out.println(s);
Output
<\herewouldbeurl>
"HEY THERE"
"Asioc-project.org/."^^
<\anotherurl/>
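The output above separates the ^^ suffix from what follows it. If the third case should stay in one piece, as the question asks, one possible sketch is to match whole tokens instead. It assumes that quoted strings contain no escaped quotes and that whatever follows ^^ contains no spaces; the class name TokenFinder and the pattern are illustrative, not taken from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenFinder {
    // Either a quoted string with an optional ^^suffix (no spaces in the suffix),
    // or any run of non-space characters.
    private static final Pattern TOKEN =
            Pattern.compile("\"[^\"]*\"(?:\\^\\^\\S+)?|\\S+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "<\\herewouldbeurl> \"HEY THERE\" \"Asioc-project.org/.\"^^<\\anotherurl/>";
        for (String token : tokens(input)) {
            System.out.println(token);
        }
    }
}
```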

Finding Upper Case in String Array and extracting it out

I have an array input like this which is an email id in reverse order along with some data:
MOC.OOHAY#ABC.PQRqwertySDdd
MOC.OOHAY#AB.JKLasDDbfn
MOC.OOHAY#XZ.JKGposDDbfn
I want my output to come as
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
How should I filter the string since there is no pattern?
There is a pattern, and that is any upper case character which is followed either by another upper case letter, a period or else the # character.
Translated, this would become something like this:
String[] input = new String[]{"MOC.OOHAY#ABC.PQRqwertySDdd","MOC.OOHAY#AB.JKLasDDbfn" , "MOC.OOHAY#XZ.JKGposDDbfn"};
Pattern p = Pattern.compile("([A-Z.]+#[A-Z.]+)");
for(String string : input)
{
Matcher matcher = p.matcher(string);
if(matcher.find())
System.out.println(matcher.group(1));
}
Yields:
MOC.OOHAY#ABC.PQR
MOC.OOHAY#AB.JKL
MOC.OOHAY#XZ.JKG
Why do you think there is no pattern?
You clearly want to get the string till you find a lowercase letter.
You can use the regex (^[^a-z]+) to match it and extract.
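A short sketch of how that regex could be applied with Matcher; the class name UpperPrefix and the empty-string fallback are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UpperPrefix {
    // Everything from the start of the string up to the first lowercase letter.
    private static final Pattern LEADING = Pattern.compile("^[^a-z]+");

    // Returns the leading part, or "" if the string starts with a lowercase letter.
    static String extract(String s) {
        Matcher m = LEADING.matcher(s);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(extract("MOC.OOHAY#ABC.PQRqwertySDdd")); // MOC.OOHAY#ABC.PQR
    }
}
```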
Simply split on [a-z], with limit 2:
String s1 = "MOC.OOHAY#ABC.PQRqwertySDdd";
String s2 = "MOC.OOHAY#AB.JKLasDDbfn";
String s3 = "MOC.OOHAY#XZ.JKGposDDbfn";
System.out.println(s1.split("[a-z]", 2)[0]);
System.out.println(s2.split("[a-z]", 2)[0]);
System.out.println(s3.split("[a-z]", 2)[0]);
You can do it like this:
String[] arr = { "MOC.OOHAY#ABC.PQRqwertySDdd", "MOC.OOHAY#AB.JKLasDDbfn", "MOC.OOHAY#XZ.JKGposDDbfn" };
// Compile the pattern once, outside the loop
Pattern p = Pattern.compile("[A-Z]*\\.[A-Z]*#[A-Z]*\\.[A-Z.]*");
for (String test : arr) {
Matcher m = p.matcher(test);
if (m.find()) {
System.out.println(m.group());
}
}

split string to parts by regex

I need to split a string into parts by regex.
The string is either AA2 DE3 or AA2, and I need the 2.
String code = "AA2 DE3";
String[] parts = code.split("^(AA(\\d)+){1}( )?(\\w*)?$");
and here length of parts is 0.
I tried
String[] parts = code.split("^((AA){1}(\\d)+){1}( )?(\\w*)?$");
but also 0.
It looks like the regex is wrong, although it works fine in PHP.
edit
In fact I need to get the number after "AA" but there may be additional word after it.
Assuming that you only want to extract the number and don't care to validate the rest:
Pattern pattern = Pattern.compile("^AA(\\d+)");
Matcher matcher = pattern.matcher(code);
String id = null;
if (matcher.find()) {
id = matcher.group(1);
}
Note that I rewrote (\d)+ as (\d+) to capture all the digits. When there is more than one digit, your regex captures only the last one.
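A small sketch of the difference between the two groupings. The input "AA23" and the helper name capture are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitGroups {
    // Returns group 1 of the first match, or null if nothing matches.
    static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The repeated group is overwritten on each iteration, so only the last digit remains.
        System.out.println(capture("^AA(\\d)+", "AA23")); // 3
        // The quantifier inside the group captures the whole run of digits.
        System.out.println(capture("^AA(\\d+)", "AA23")); // 23
    }
}
```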
If you want to keep your validation:
Pattern pattern = Pattern.compile("^AA(\\d+) ?\\w*$");
With String.split, the regex specifies what goes in between the parts. In your case, the regex matches the entire string, so nothing is left over and split returns an empty array.
If you want to match this regex, use:
Pattern pattern = Pattern.compile("^(AA(\\d)+){1}( )?(\\w*)?$");
Matcher matcher = pattern.matcher(code);
if(!matcher.matches()) {
// the string doesn't match your regex; handle this
} else {
String part1 = matcher.group(1);
String part2 = matcher.group(2);
// repeat the above line similarly for the third and fourth groups
// do something with part1/part2/...
}
It is indeed better to use Pattern and Matcher APIs for this.
This is purely for academic purposes, in case you must use String#split only. You can use this lookbehind-based regex for the split:
(?<=AA\\d{1,999}) *
Code:
String[] toks = "AA2 DE3".split( "(?<=AA\\d{1,999}) *" ); // [AA2, DE3]
OR
String[] toks = "AA2".split( "(?<=AA\\d{1,999}) *" ); // [AA2]
If you'd like String#split() to handle the Pattern/Matcher for you, you can use:
String[] inputs = { "AA2 DE3", "AA3", "BB45 FG6", "XYZ321" };
try {
for (String input : inputs) {
System.out.println(
input.split(" ")[0].split("(?=\\d+$)", 2)[1]
);
}
} catch (ArrayIndexOutOfBoundsException e) {
System.err.println("Input format is incorrect.");
}
Output :
2
3
45
321
If the input is guaranteed to start with AA, you can also use
System.out.println(
input.split(" ")[0].split("(?<=^AA)")[1]
);

Java Regex: how to capture multiple matches in the same line

I am trying to match a regex pattern in Java, and I have two questions:
1. Inside the pattern I'm looking for there is a known beginning and then an unknown string that I want to get up until the first occurrence of an &.
2. There are multiple occurrences of these patterns in the line and I would like to get each occurrence separately.
For example I have this input line:
1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All&subCatView=true 0 2819357575609397706
And I am interested in these strings:
Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.
Screen+Refresh+Rate%7C120HZ
Assuming the known beginning is filter=**, the regular expression pattern (?:filter=\\*\\*)(.*?)(?:&) should get you what you need. Use Matcher.find() to get all occurrences of the pattern in a given string. Using the test string you provided, the following:
final Pattern p = Pattern.compile("(?:filter=\\*\\*)(.*?)(?:&)");
final Matcher m = p.matcher(testString);
int cnt = 0;
while (m.find()) {
System.out.println(++cnt + ": G1: " + m.group(1));
}
Will output:
1: G1: Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.
2: G1: Screen+Refresh+Rate%7C120HZ**
If I knew that I might need other query parameters in the future, I think it would be more prudent to decode and parse the URL.
String url = URLDecoder.decode("http://www.gold.com/shc/s/c_10153_12605_" +
"Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate" +
"%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true"
,"utf-8");
Pattern amp = Pattern.compile("&");
Pattern eq = Pattern.compile("=");
Map<String, String> params = new HashMap<String, String>();
String queryString = url.substring(url.indexOf('?') + 1);
for(String param : amp.split(queryString)) {
String[] pair = eq.split(param);
params.put(pair[0], pair[1]);
}
for(Entry<String, String> param : params.entrySet()) {
System.out.format("%s = %s\n", param.getKey(), param.getValue());
}
Output
subCatView = true
viewItems = 25
sName = View All
filter = Screen Refresh Rate|120HZ^Screen Size|37 in. to 42 in.
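One caveat with decoding the whole URL before splitting: a value that contains an encoded & (%26) or = (%3D) would then be split in the wrong place. A hedged sketch of a variant that splits first and decodes each key and value afterwards follows. The class name QueryParams, the use of LinkedHashMap to keep insertion order, and the example.com URL are assumptions for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class QueryParams {
    // Decodes one URL-encoded component as UTF-8.
    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Splits the query string on & and =, then decodes each part.
    static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = url.indexOf('?');
        if (q < 0) {
            return params; // no query string
        }
        for (String param : url.substring(q + 1).split("&")) {
            if (param.isEmpty()) {
                continue;
            }
            String[] pair = param.split("=", 2); // keep any further '=' in the value
            String value = pair.length > 1 ? decode(pair[1]) : "";
            params.put(decode(pair[0]), value);
        }
        return params;
    }

    public static void main(String[] args) {
        String url = "http://example.com/s?filter=Rate%7C120HZ%5ESize&sName=View+All&note=a%26b";
        for (Map.Entry<String, String> e : parse(url).entrySet()) {
            System.out.format("%s = %s%n", e.getKey(), e.getValue());
        }
    }
}
```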
In your example, there is sometimes a "**" at the end before the "&". Basically, assuming "filter=" is the start pattern you are looking for, you want something like:
"filter=([^&]+)&"
Using the regular expression (?<=filter=\*{0,2})[^&]*[^&*]+ in Java:
Pattern p = Pattern.compile("(?<=filter=\\*{0,2})[^&]*[^&*]+");
String s = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0 2819357575609397706";
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
EDIT:
Added [^&*]+ to the end of the regex to prevent the ** from being included in the second match.
EDIT2:
Changed regular expression to use lookbehind.
The regex you're looking for is
Screen\+Refresh\+Rate[^&]*
You could use Matcher.find() to find all matches.
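A sketch of that find() loop, using a shortened, made-up input line in place of the full one from the question. It assumes each wanted value is followed by an &, since [^&]* otherwise runs on to the end of the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FilterFinder {
    private static final Pattern FILTER =
            Pattern.compile("Screen\\+Refresh\\+Rate[^&]*");

    // Collects every match in the line, in order of appearance.
    static List<String> findAll(String line) {
        List<String> matches = new ArrayList<>();
        Matcher m = FILTER.matcher(line);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Shortened, hypothetical version of the input line.
        String line = "1 http://www.gold.com/a?filter=Screen+Refresh+Rate%7C120HZ%5EScreen+Size&sName=View+All "
                + "2 http://www.gold.com/b?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All 0";
        for (String match : findAll(line)) {
            System.out.println(match);
        }
    }
}
```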
Are you looking for the string that follows "filter=", ignoring any leading "*" and ending at the first "&"?
You can try the following:
String str = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0 2819357575609397706";
Pattern p = Pattern.compile("filter=(?:\\**)([^&]+?)(?:\\**)&");
Matcher matcher = p.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}

Java split regular expression

If I have a string, e.g.
setting=value
How can I remove the '=' and turn that into two separate strings containing 'setting' and 'value' respectively?
Thanks very much!
Two options spring to mind.
The first split()s the String on =:
String[] pieces = s.split("=", 2);
String name = pieces[0];
String value = pieces.length > 1 ? pieces[1] : null;
The second uses regexes directly to parse the String:
Pattern p = Pattern.compile("(.*?)=(.*)");
Matcher m = p.matcher(s);
if (m.matches()) {
String name = m.group(1);
String value = m.group(2);
}
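The second gives you more power. For example, you can automatically strip surrounding whitespace if you change the pattern to the following (the last group is lazy so that trailing whitespace is not swallowed into the value):
Pattern p = Pattern.compile("\\s*(.*?)\\s*=\\s*(.*?)\\s*");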
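A small sketch of the whitespace-stripping idea. It makes the final group lazy, because a greedy (.*) there would consume trailing whitespace before the closing \s* could. The class name SettingParser and the null return for non-matching input are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SettingParser {
    // Both groups are lazy so that surrounding whitespace falls outside them.
    private static final Pattern SETTING =
            Pattern.compile("\\s*(.*?)\\s*=\\s*(.*?)\\s*");

    // Returns {name, value}, or null if the input contains no '='.
    static String[] parse(String s) {
        Matcher m = SETTING.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] pair = parse("  setting = some value  ");
        System.out.println("[" + pair[0] + "] [" + pair[1] + "]"); // [setting] [some value]
    }
}
```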
You don't need a regular expression for this, just do:
String str = "setting=value";
String[] split = str.split("=");
// split[0] == "setting", split[1] == "value"
You might want to set a limit if the value can have an = in it too; see the Javadoc for String.split(String, int).
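A brief sketch of what the limit changes, using a made-up input whose value itself contains an =. The class and method names are hypothetical.

```java
import java.util.Arrays;

class SplitLimit {
    // A limit of 2 yields at most two parts, so only the first '=' separates.
    static String[] keyValue(String s) {
        return s.split("=", 2);
    }

    public static void main(String[] args) {
        String s = "setting=a=b";
        System.out.println(Arrays.toString(s.split("=")));  // [setting, a, b]
        System.out.println(Arrays.toString(keyValue(s)));   // [setting, a=b]
    }
}
```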
