Exclude null regex groups - java

Given the following line, I would like to extract some values using Pattern class in Java:
user1#machine1:command#user2#machine2:command....
Two commands are extracted:
one to be executed on machine1 using user1
one to be executed on machine2 using user2
If I use the following regex
"([^#]+)#([^:]+):([^#]+)(?:#([^#]+)#([^:]+):([^#]+))*"
the elements in group 1, 4, 7, ... are users
the elements in group 2, 5, 8, ... are machines
the elements in group 3, 6, 9, ... are commands
The only problem is that for only one command, the matcher detects null groups for 4, 5, 6.
Is there any Regex option for not receiving null values, for that particular situation?

Instead of using one regex to find all the users, machines, and commands at once, I'd suggest splitting the process in two: first, find blocks of user#machine:command, then identify the parts in each block. This way it will work for any number of blocks.
First, trim down your regex to match just one "block":
Pattern p = Pattern.compile("([^#]+)#([^:]+):([^#]+)");
String input = "user1#machine1:command1#user2#machine2:command2#user3#machine3:command3";
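Then either use String.split to cut the input into blocks and match each block against the regex. A plain split("#") would also cut between user and machine, so split only on a # that is followed by another user# prefix:
for (String block : input.split("#(?=[^#:]+#)")) {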
Matcher m = p.matcher(block);
if (m.matches()) {
System.out.println(m.groupCount());
for (int i = 0; i < m.groupCount(); i++) {
System.out.println(m.group(i + 1));
}
}
}
Or just repeatedly find more matches in the original string:
Matcher m = p.matcher(input);
while (m.find()) {
System.out.println(m.groupCount());
for (int i = 0; i < m.groupCount(); i++) {
System.out.println(m.group(i + 1));
}
}
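For reference, here is a minimal self-contained sketch of the find() approach. The class name CommandParser and the method parse are made up for illustration; only the pattern comes from the answer above. Because each call to find() matches exactly one block, groups 1 to 3 are always filled and no null groups appear, whether the input has one block or many.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class CommandParser {
    // One user#machine:command block; same pattern as in the answer above.
    private static final Pattern BLOCK = Pattern.compile("([^#]+)#([^:]+):([^#]+)");

    // Returns one {user, machine, command} array per block found in the input.
    static List<String[]> parse(String input) {
        List<String[]> blocks = new ArrayList<>();
        Matcher m = BLOCK.matcher(input);
        while (m.find()) {
            blocks.add(new String[] { m.group(1), m.group(2), m.group(3) });
        }
        return blocks;
    }

    public static void main(String[] args) {
        for (String[] b : parse("user1#machine1:command1#user2#machine2:command2")) {
            System.out.println(b[2] + " on " + b[1] + " as " + b[0]);
        }
    }
}
```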

Why not just check?
if (myMatcher.find()) {
if (myMatcher.group(4) == null) {
// TODO
}
// etc
}

I think you have a bigger problem when there are 3 or more commands. You should probably just .split("#") the string first, and then deal with each one individually.
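To make the "bigger problem" concrete, here is a small sketch (the class name RepeatedGroupDemo and the helper groupOf are made up for illustration). The number of groups is fixed by the pattern, so the regex from the question always has exactly 6 of them, and a capturing group inside a repeated (?:...)* only keeps the value from its last iteration. With three commands, the second one is therefore no longer reachable through the groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is the one from the question.
class RepeatedGroupDemo {
    static final Pattern ALL_AT_ONCE = Pattern.compile(
            "([^#]+)#([^:]+):([^#]+)(?:#([^#]+)#([^:]+):([^#]+))*");

    // Matches the whole input and returns the requested group.
    // Returns null if the input does not match or the group did not take part in the match.
    static String groupOf(String input, int group) {
        Matcher m = ALL_AT_ONCE.matcher(input);
        return m.matches() ? m.group(group) : null;
    }

    public static void main(String[] args) {
        String three = "user1#machine1:command1#user2#machine2:command2#user3#machine3:command3";
        // groupCount() depends only on the pattern, never on the input.
        System.out.println(ALL_AT_ONCE.matcher(three).groupCount());
        // Group 4 holds the user of the LAST repetition; user2 has been overwritten.
        System.out.println(groupOf(three, 4));
        // With a single command the repetition never runs, so group 4 is null.
        System.out.println(groupOf("user1#machine1:command", 4));
    }
}
```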


How can I split a string without knowing the split characters a-priori?

For my project I have to read various input graphs. Unfortunately, the input edges do not all have the same format. Some of them are comma-separated, others are tab-separated, etc. For example:
File 1:
123,45
67,89
...
File 2
123 45
67 89
...
Rather than handling each case separately, I would like to automatically detect the split characters. Currently I have developed the following solution:
String str = "123,45";
String splitChars = "";
for(int i=0; i < str.length(); i++) {
if(!Character.isDigit(str.charAt(i))) {
splitChars += str.charAt(i);
}
}
String[] endpoints = str.split(splitChars);
Basically I pick the first row and select all the non-numeric characters, then I use the generated substring as split characters. Is there a cleaner way to perform this?
split takes a regexp, so your code would fail for several reasons. If the separator has meaning in a regexp (say, +), it'll fail. If the separator is longer than one character, the split only works where that exact sequence occurs. If a line contains more than exactly 2 numbers, it will also fail. Imagine the separator is a comma surrounded by spaces: then your splitChars string becomes " , ", and your split would only break the line at that exact sequence (it would split the string "test , abc" into two, nothing else).
Why not make a regexp to fetch digits, and then find all sequences of digits, instead of focussing on the separators?
You're using regexps whether you want to or not, so let's make it official and use Pattern, while we are at it.
private static final Pattern ALL_DIGITS = Pattern.compile("\\d+");
// then in your split method..
Matcher m = ALL_DIGITS.matcher(str);
List<Integer> numbers = new ArrayList<Integer>();
// dont use arrays, generally. List is better.
while (m.find()) {
numbers.add(Integer.parseInt(m.group(0)));
}
\d+ means: one or more digits.
m.find() finds the next match (so, the next block of digits), returning false if there aren't any more.
m.group(0) retrieves the entire matched string.
Split the string on \\D+ which means one or more non-digit characters.
Demo:
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
// Test strings
String[] arr = { "123,45", "67,89", "125 89", "678 129" };
for (String s : arr) {
System.out.println(Arrays.toString(s.split("\\D+")));
}
}
}
Output:
[123, 45]
[67, 89]
[125, 89]
[678, 129]
Why not split on [^\d]+ (every run of non-digits):
for (String n : "123,456 789".split("[^\\d]+")) {
System.out.println(n);
}
Result:
123
456
789
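One difference between the two suggestions above is worth seeing side by side. The sketch below is only an illustration (the class name EdgeLineParser and the method endpoints are made up, and it assumes non-negative integers without decimal points): when a line starts with a separator, split("\\D+") returns an empty first element, while the find() loop simply skips to the first run of digits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class EdgeLineParser {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Collects every run of digits in the line, whatever the separators are.
    static List<Integer> endpoints(String line) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = DIGITS.matcher(line);
        while (m.find()) {
            numbers.add(Integer.parseInt(m.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String line = "\t123,45"; // note the leading tab
        // split keeps an empty string in front of the first number
        System.out.println(Arrays.toString(line.split("\\D+"))); // [, 123, 45]
        // find only reports the digit runs
        System.out.println(endpoints(line)); // [123, 45]
    }
}
```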

Extract string between a set of multiple limiters with groups

As the title says, I have a string and I want to extract some data from it.
This is my String:
text = "|tab_PRO|1|1|#tRecordType#||0|tab_PRO|";
and I want to extract all the data between the pipes: tab_PRO, 1, 1, and so on.
I've tried:
Pattern p = Pattern.compile("\\|(.*?)\\|");
Matcher m = p.matcher(text);
while(m.find())
{
for(int i = 1; i< 10; i++) {
test = m.group(i);
System.out.println(test);
}
}
With this I get the first group, which is tab_PRO, but I also get an error:
java.lang.IndexOutOfBoundsException: No group 2
I probably haven't understood how groups work, but I thought this would give me the remaining data that I need. I can't see what I'm missing.
Thanks in advance
Use String.split(). Keep in mind that it expects a regex as its argument and that | is a regex metacharacter (alternation), so you need to escape it with a \. Inside a Java string literal the backslash itself has to be escaped as well, so write two of them; a single \| would be rejected by the compiler as an invalid escape sequence:
String[] parts = text.split("\\|");
See it working here:
https://ideone.com/WibjUm
If you want to go with your regex approach, you'll need to group and capture every repetition of characters after every | and restrict them to be anything except |, possibly using a regex like \\|([^\\|]*).
In your loop, you iterate with m.find() and just use capture group 1, because it's the only group every match will have.
String text = "|tab_PRO|1|1|#tRecordType#||0|tab_PRO|";
Pattern p = Pattern.compile("\\|([^\\|]*)");
Matcher m = p.matcher(text);
while(m.find()){
System.out.println(m.group(1));
}
https://ideone.com/RNjZRQ
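As a rough side-by-side of the two approaches in this answer, here is a small sketch (the class name PipeFields and both method names are made up for illustration). The two do not treat the edges of the string the same way: split keeps an empty string in front of the leading | and drops trailing empty strings, while the regex loop has no leading empty entry but reports an empty capture after the final |. Both keep the empty field between the two adjacent pipes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for illustration only.
class PipeFields {
    // Splits on a literal pipe.
    static List<String> viaSplit(String text) {
        return Arrays.asList(text.split("\\|"));
    }

    // Collects capture group 1 of every match: the text following each pipe.
    static List<String> viaFind(String text) {
        List<String> fields = new ArrayList<>();
        Matcher m = Pattern.compile("\\|([^|]*)").matcher(text);
        while (m.find()) {
            fields.add(m.group(1));
        }
        return fields;
    }

    public static void main(String[] args) {
        String text = "|tab_PRO|1|1|#tRecordType#||0|tab_PRO|";
        System.out.println(viaSplit(text));
        System.out.println(viaFind(text));
    }
}
```

If the empty entries are not wanted, they can be skipped with an isEmpty() check before use.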
Try using .split() or .substring()
As mentioned in the comments, this is easier done with String.split.
As for your own code, you are unnecessarily using the inner loop, and that's leading to that exception. You only have one group, but the for loop will cause you to query more than one group. Your loop should be as simple as:
Pattern p = Pattern.compile("(?<=\\|)(.*?)\\|");
Matcher m = p.matcher(text);
while (m.find()) {
String test = m.group(1);
System.out.println(test);
}
And that prints
tab_PRO
1
1
#tRecordType#
0
tab_PRO
Note that I had to use a look-behind assertion in your regex.

Java - Extract string from pattern

Given some strings that look like this:
(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)
(((((((((((((4)+13)*6)/1)+5)+12)*2)-7)-11)+8)*3)/9)-10)
(((((((((((((4)+13)*6)/1)+7)+12)*2)-8)-11)+5)*3)/9)-10)
(btw, they are solutions for a puzzle that I am writing a program for :) )
They all share this pattern
"(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)"
For one solution: how can I get the values with this given pattern?
So for the first solution I would get a collection, list, or array (doesn't matter) like this:
[4,5,1,7,3,6,9,2,8]
You've done most of the work actually by providing the pattern. All you need to do is use capturing groups where the . are (and escape the rest).
I put your inputs in a String array and got the results into a List of integers (as you said, you can change it to something else). As for the pattern, you want to capture the dots; this is done by surrounding them with ( and ). The problem in your case is that the whole string is full of them, so we need to quote / escape them out (meaning, tell the regex compiler that we mean the literal / character ( and )). This can be done by putting the part we want to escape between \Q and \E.
The code below shows a coherent (though maybe not efficient) way to do this. Just be careful to use the right number of \ in the right places:
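import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Example {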
public static void main(String[] args) {
String[] inputs = new String[3];
inputs[0] = "(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)";
inputs[1] = "(((((((((((((4)+13)*6)/1)+5)+12)*2)-7)-11)+8)*3)/9)-10)";
inputs[2] = "(((((((((((((4)+13)*6)/1)+7)+12)*2)-8)-11)+5)*3)/9)-10)";
List<Integer> results;
String pattern = "(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)"; // Copy-paste from your question.
pattern = pattern.replaceAll("\\.", "\\\\E(.)\\\\Q");
pattern = "\\Q" + pattern;
Pattern p = Pattern.compile(pattern);
Matcher m;
for (String input : inputs) {
m = p.matcher(input);
results = new ArrayList<>();
if (m.matches()) {
for (int i = 1; i < m.groupCount() + 1; i++) {
results.add(Integer.parseInt(m.group(i)));
}
}
System.out.println(results);
}
}
}
Output:
[4, 5, 1, 7, 3, 6, 9, 2, 8]
[4, 6, 1, 5, 2, 7, 8, 3, 9]
[4, 6, 1, 7, 2, 8, 5, 3, 9]
Notes:
You are using a single ., which means
Any character (may or may not match line terminators)
So if you have a number there which is not a single digit or a single character which is not a number (digit), something will go wrong either in the matches or parseInt. Consider \\d to signify a single digit or \\d+ for a number instead.
See Pattern for more info on regex in Java.
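Following up on the note about \\d+, here is a hedged sketch of a variant that accepts multi-digit values. It is not the code from the answer above: instead of \Q and \E it quotes each literal piece with Pattern.quote, and the names TemplateExtractor, compileTemplate and extract are made up for illustration. It assumes every . in the template stands for a non-negative integer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class TemplateExtractor {
    // Turns a template into a regex: literal text is quoted, every '.' becomes (\d+).
    static Pattern compileTemplate(String template) {
        StringBuilder regex = new StringBuilder();
        // limit -1 keeps empty pieces at the end, so a trailing '.' is not lost
        String[] literals = template.split("\\.", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append("(\\d+)");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Returns the captured numbers, or an empty list if the input does not fit the template.
    static List<Integer> extract(Pattern p, String input) {
        List<Integer> values = new ArrayList<>();
        Matcher m = p.matcher(input);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                values.add(Integer.parseInt(m.group(i)));
            }
        }
        return values;
    }

    public static void main(String[] args) {
        Pattern p = compileTemplate("(((((((((((((.)+13)*.)/.)+.)+12)*.)-.)-11)+.)*.)/.)-10)");
        System.out.println(extract(p, "(((((((((((((4)+13)*5)/1)+7)+12)*3)-6)-11)+9)*2)/8)-10)"));
    }
}
```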

Regex - extract indefinite number of hits

The method getPolygonPoints() (see below) receives a String name as a parameter, which looks something like this:
points={{-100,100},{-120,60},{-80,60},{-100,100},{-100,100}}
The first number stands for the x-coordinate, the second for the y coordinate. For example,the first point is
x=-100
y=100
The second point is
x=-120
y=60
and so on.
Now I want to extract the points from the String and put them in an ArrayList, which should look like this in the end:
[-100, 100, -120, 60, -80, 60, -100, 100, -100, 100]
The catch is that the number of points in the given String varies and is not always the same.
I have written the following code:
private ArrayList<Integer> getPolygonPoints(String name) {
// the regular expression
String regGroup = "[-]?[\\d]{1,3}";
// compile the regular expression into a pattern
Pattern regex = Pattern.compile("\\{(" + regGroup + ")");
// the matcher
Matcher matcher;
ArrayList<Integer> points = new ArrayList<Integer>();
// matcher that will match the given input against the pattern
matcher = regex.matcher(name);
int i = 1;
while(matcher.find()) {
System.out.println(Integer.parseInt(matcher.group(i)));
i++;
}
return points;
}
The first x coordinate is extracted correctly, but then an IndexOutOfBoundsException is thrown. I think that happens because group 2 is not defined.
I think I first have to count the points and then iterate over this number. Inside the iteration I would put the int values in the ArrayList with a simple add(). But I don't know how to do this. Maybe I don't understand the regex part at this point, especially how the groups work.
Please help!
String points = "{{-100,100},{-120,60},{-80,60},{-100,100},{-100,100}}";
String[] strs = points.replaceAll("(\\{|\\})", "").split(",");
ArrayList<Integer> list = new ArrayList<Integer>(strs.length);
for (String s : strs)
{
list.add(Integer.valueOf(s));
}
The part you don't seem to understand about the regex API is that the capture group numbers "reset" with every call to find(). Or, to put it another way: the number of a capture group is its position in the pattern, not in the input string.
You're also going about this the wrong way. You should match the whole construct you're looking for, in this case the {x,y} pairs. I'm assuming you don't want to validate the format of the whole string, so we can ignore the outside brackets and comma:
Pattern p = Pattern.compile("\\{(-?\\d+),(-?\\d+)\\}");
Matcher m = p.matcher(name);
while (m.find()) {
String x = m.group(1);
String y = m.group(2);
// parse and add to list
}
Alternately, since you don't care about which coordinate is X and which is Y, you can even do:
Matcher m = Pattern.compile("-?\\d+").matcher(name);
while (m.find()) {
String xOrY = m.group();
// parse etc.
}
Now, if you want to validate the input as well, I'd say that's a separate concern. I wouldn't necessarily try to do it in the same step as the parsing, to keep the regex readable. (It might be possible in this case, but if you don't need it, why bother in the first place?)
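Put together, the pair-based loop could look like the sketch below. The class name PolygonPoints is made up, and the method is static and returns List rather than ArrayList only to keep the example short; treat it as one possible completion of the "parse and add to list" comment, not the definitive one. Groups 1 and 2 are read again after every find(), which is exactly the "reset" behaviour described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical wrapper class for illustration only.
class PolygonPoints {
    // One {x,y} pair: group 1 is x, group 2 is y.
    private static final Pattern PAIR = Pattern.compile("\\{(-?\\d+),(-?\\d+)\\}");

    static List<Integer> getPolygonPoints(String name) {
        List<Integer> points = new ArrayList<>();
        Matcher m = PAIR.matcher(name);
        while (m.find()) {
            // The same two group numbers are valid for every pair found.
            points.add(Integer.parseInt(m.group(1)));
            points.add(Integer.parseInt(m.group(2)));
        }
        return points;
    }

    public static void main(String[] args) {
        System.out.println(getPolygonPoints(
                "points={{-100,100},{-120,60},{-80,60},{-100,100},{-100,100}}"));
    }
}
```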
You can also try this regex:
((-?\d+)\s*,\s*(-?\d+))
It will give you three groups:
Group 1 : x,y
Group 2 : x
Group 3 : y
Use whichever one you need.
How about doing it in just one line:
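List<String> list = Arrays.asList(name.replaceAll("(^\\w+=\\{+)|(\\}+$)", "").split("\\}?,\\{?"));
Your whole method would then be (converting each piece to an Integer so that it matches the declared return type):
private ArrayList<Integer> getPolygonPoints(String name) {
ArrayList<Integer> points = new ArrayList<Integer>();
for (String s : name.replaceAll("(^\\w+=\\{+)|(\\}+$)", "").split("\\}?,\\{?")) { points.add(Integer.valueOf(s)); }
return points;
}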
This works by first stripping off the leading and trailing text, then splits on commas optionally surrounded by braces.
BTW You really should return the abstract type List, not the concrete implementation ArrayList.

Discard the leading and trailing series of a character, but retain the same character otherwise

I have to process a string with the following rules:
It may or may not start with a series of '.
It may or may not end with a series of '.
Whatever is enclosed between the above should be extracted. However, the enclosed string also may or may not contain a series of '.
For example, I can get following strings as input:
''''aa''''
''''aa
aa''''
''''aa''bb''cc''''
For the above examples, I would like to extract the following from them (respectively):
aa
aa
aa
aa''bb''cc
I tried the following code in Java:
Pattern p = Pattern.compile("[^']+(.+'*.+)[^']*");
Matcher m = p.matcher("''''aa''bb''cc''''");
while (m.find()) {
int count = m.groupCount();
System.out.println("count = " + count);
for (int i = 0; i <= count; i++) {
System.out.println("-> " + m.group(i));
}
}
But I get the following output:
count = 1
-> aa''bb''cc''''
-> ''bb''cc''''
Any pointers?
EDIT: Never mind, I was using a * at the end of my regex, instead of +. Doing this change gives me the desired output. But I would still welcome any improvements for the regex.
This one works for me.
String str = "''''aa''bb''cc''''";
Pattern p = Pattern.compile("^'*(.*?)'*$");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1));
}
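Here is the same pattern wrapped in a small method and run against all four inputs from the question. The class name QuoteTrimmer and the method strip are made up for illustration, and the sketch assumes single-line input, since . does not match line terminators by default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class QuoteTrimmer {
    // Leading quotes are consumed greedily; the lazy group stops as soon as
    // only quotes remain before the end of the input.
    private static final Pattern TRIM = Pattern.compile("^'*(.*?)'*$");

    // Returns the input without its leading and trailing runs of ' characters.
    // Falls back to the unchanged input if the pattern does not match (e.g. multi-line text).
    static String strip(String s) {
        Matcher m = TRIM.matcher(s);
        return m.matches() ? m.group(1) : s;
    }

    public static void main(String[] args) {
        String[] inputs = { "''''aa''''", "''''aa", "aa''''", "''''aa''bb''cc''''" };
        for (String input : inputs) {
            System.out.println(strip(input));
        }
    }
}
```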
Have a look at the boundary matchers of Java's Pattern class (http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html). Especially $ (end of a line) might be interesting. I also recommend the following Eclipse plugin for regex testing: http://sourceforge.net/projects/quickrex/ It lets you see exactly what the match and the groups of your regex will be for a given test string.
E.g. try the following pattern: [^']+(.+'*.+)+[^'$]
I'm not that good in Java, so I hope the regex is sufficient. For your examples, it works well.
s/^'*(.+?)'*$/$1/gm
