How to split after some specific word - java

I have to split a String wherever ^ and _live occur. I am able to split on ^ alone, but I need the split to consume both ^ and _live. The result should be
[ab,cb,db,qw]
How can this be done?
String usergroup="ab_live^cb_live^db_live^qw_live";
String[] userGroupParts = usergroup.split("\\^");
List<String> listUserGroupParts = Arrays.asList(userGroupParts);
Set<String> SMGroupDetails = new HashSet<String>(listUserGroupParts);

In other words, the split separator should be either _live^ or a _live at the end of the line.
The regular expression therefore consists of _live followed by the capturing group (\^|$), which contains two alternatives separated by | (or):
the 1st alternative, \^, matches the character ^ literally (thanks to the escape character before it), and the 2nd alternative, $, asserts the position at the end of a line.
String[] userGroupParts = usergroup.split("_live(\\^|$)");
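A minimal, self-contained sketch of the same split, in case it helps to see it run. The class and method names (LiveSuffixSplit, groupNames) are made up for illustration; only the regex comes from the answer above. Note that split drops the trailing empty string left after the final _live.

```java
import java.util.Arrays;
import java.util.List;

final class LiveSuffixSplit {
    // Splits on "_live^" or on a "_live" at the very end of the input.
    // String.split discards trailing empty strings, so no empty entry is left over.
    static List<String> groupNames(String usergroup) {
        return Arrays.asList(usergroup.split("_live(\\^|$)"));
    }

    public static void main(String[] args) {
        // prints [ab, cb, db, qw]
        System.out.println(groupNames("ab_live^cb_live^db_live^qw_live"));
    }
}
```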

This should do it...
public static void main(String[] args) {
String usergroup = "ab_live^cb_live^db_live^qw_live";
String[] userGroupParts = usergroup.split("\\^");
for (int i=0; i<userGroupParts.length; i++) userGroupParts[i] = userGroupParts[i].split("\\_")[0];
for (String s : userGroupParts) System.out.println(s);
}
That is, you first split on ^ and then loop over the resulting strings, splitting each one on _ and keeping only the part before the underscore.

I would not use method split, of class java.lang.String, but rather regular expressions.
You want to create a list of all the occurrences of the letters that appear after the literal character ^ and before the string _live. The following code achieves this. (Explanations after the code.)
/* Required imports:
* java.util.ArrayList
* java.util.List
* java.util.regex.Matcher
* java.util.regex.Pattern
*/
String usergroup="ab_live^cb_live^db_live^qw_live";
Pattern pattern = Pattern.compile("\\^?(\\w+)_live");
Matcher matcher = pattern.matcher(usergroup);
List<String> listUserGroupParts = new ArrayList<>();
while (matcher.find()) {
listUserGroupParts.add(matcher.group(1));
}
System.out.println(listUserGroupParts);
The regular expression, i.e. the argument to method compile in the above code, looks for the following:
1. an optional literal character ^ (the ? makes it optional, so the first entry, which has no leading ^, also matches), followed by
2. at least one word character, followed by
3. the literal string _live
Note that part 2 is surrounded by parentheses, which makes it a capturing group.
The while loop searches usergroup for the next occurrence of the regular expression and each time it finds an occurrence, it extracts the contents of the group and adds it to the List.
The output when running the above code is:
[ab, cb, db, qw]

Related

“minus-sign” into this regular expression. How?

Consider:
String str = "XYhaku(ABH1235-123548)";
From the above string, I need only "ABH1235-123548" and so far I created a regular expression:
Pattern.compile("ABH\\d+")
But it returns false. So what is the correct regular expression for it?
I would just grab whatever is in the parentheses:
Pattern p = Pattern.compile("\\((?<data>[A-Z\\d]+\\-\\d+)\\)");
Or, if you want to be even more open (anything inside parentheses):
Pattern p = Pattern.compile("\\((?<data>.+)\\)");
Then just nab it:
String s = /* some input */;
Matcher m = p.matcher(s);
if (m.find()) { //just find first
String tag = m.group("data"); //ABH1235-123548
}
\d only matches digits. To include other characters, use a character class:
Pattern.compile("ABH[\\d-]+")
Note that the - must be placed first or last in the character class, because otherwise it will be treated as a range indicator ([A-Z] matching every letter between A and Z, for example). Another way to avoid that would be to escape it, but that adds two more backslashes to your string...
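Here is a small sketch of the character-class approach applied to the string from the question. The class and method names (HyphenInClass, firstTag) are just placeholders. It uses find() rather than matches(), since matches() only succeeds when the whole input matches the pattern, and here the tag is only part of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HyphenInClass {
    // The hyphen comes last inside the class, so it is a literal '-', not a range.
    private static final Pattern TAG = Pattern.compile("ABH[\\d-]+");

    // Returns the first tag found in the input, or null if there is none.
    static String firstTag(String input) {
        Matcher m = TAG.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // prints ABH1235-123548
        System.out.println(firstTag("XYhaku(ABH1235-123548)"));
    }
}
```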

Java match whole word in String

I have an ArrayList<String> which I iterate through to find the correct index given a String. Basically, given a String, the program should search through the list and find the index where the whole word matches. For example:
ArrayList<String> foo = new ArrayList<String>();
foo.add("AAAB_11232016.txt");
foo.add("BBB_12252016.txt");
foo.add("AAA_09212017.txt");
So if I give the String AAA, I should get back index 2 (the last one). So I can't use the contains() method as that would give me back index 0.
I tried with this code:
String str = "AAA";
String pattern = "\\b" + str + "\\b";
Pattern p = Pattern.compile(pattern);
for(int i = 0; i < foo.size(); i++) {
// Check each entry of list to find the correct value
Matcher match = p.matcher(foo.get(i));
if(match.find() == true) {
return i;
}
}
Unfortunately, this code never reaches the if statement inside the loop. I'm not sure what I'm doing wrong.
Note: This should also work if I searched for AAA_0921, the full name AAA_09212017.txt, or any part of the String that is unique to it.
Since a word boundary does not match between a word character and an underscore, you need:
String pattern = "(?<=_|\\b)" + str + "(?=_|\\b)";
Here, (?<=_|\b) positive lookbehind requires a word boundary or an underscore to appear before the str, and the (?=_|\b) positive lookahead requires an underscore or a word boundary to appear right after the str.
If your word may have special chars inside, you might want to use a more straight-forward word boundary:
"(?<![^\\W_])" + Pattern.quote(str) + "(?![^\\W_])"
Here, the negative lookbehind (?<![^\W_]) fails the match if str is immediately preceded by a word character other than an underscore. ([^...] is a negated character class that matches any character other than the characters and ranges listed inside it, so [^\W_] matches every character that is neither a non-word character \W nor a _, i.e. letters and digits.) Likewise, the negative lookahead (?![^\W_]) fails the match if str is immediately followed by a word character other than an underscore.
Note that the second example has a quoted search string, so that even AA.A_str.txt could be matched well with AA.A.
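Putting the quoted variant into the loop from the question might look roughly like this. WholeWordIndex and indexOfWord are hypothetical names, and returning -1 for "not found" is my own assumption, since the question does not say what should happen in that case.

```java
import java.util.List;
import java.util.regex.Pattern;

final class WholeWordIndex {
    // Returns the index of the first entry that contains `word` as a whole word,
    // where an underscore also counts as a word separator; -1 if there is none.
    static int indexOfWord(List<String> names, String word) {
        Pattern p = Pattern.compile("(?<![^\\W_])" + Pattern.quote(word) + "(?![^\\W_])");
        for (int i = 0; i < names.size(); i++) {
            if (p.matcher(names.get(i)).find()) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> foo = List.of("AAAB_11232016.txt", "BBB_12252016.txt", "AAA_09212017.txt");
        // prints 2: "AAAB" is skipped because a letter follows "AAA" there
        System.out.println(indexOfWord(foo, "AAA"));
    }
}
```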

Splitting line based on comma, strange line

I have the following line comma separated,
LanguageID=0,LastKnownPeriod="Active",c_MultiPartyCall={Counter=1,TimeStamp=1394539271448},LTH={Data=["1|MTC|01.01.1970 15:00:00|0.0|7|-1|OnPeakAccountID|0|1000||","1|MTC|01.01.1970 15:00:00|0.0|7|-1|OnPeakAccountID|0|1000||"}
Using the split method, I can get the comma-separated values, but the actual problem comes with text such as c_MultiPartyCall={Counter=1,TimeStamp=1394539271448}, since it contains a comma itself.
So the values after splitting should be:
LanguageID=0
LastKnownPeriod="Active"
c_MultiPartyCall={Counter=1,TimeStamp=1394539271448} (comma is again found within the word)
LTH={Data=["1|MTC|01.01.1970 15:00:00|0.0|7|-1|OnPeakAccountID|0|1000||","1|MTC|01.01.1970 15:00:00|0.0|7|-1|OnPeakAccountID|0|1000||"} (comma is again found within the word in curly brackets)
I tried with following code but didn't work:
String arr[]=input_line.split("(.*!{),(.*!})");
for (int i=0;i<arr.length;i++)
System.out.println(arr[i]);
Please advise.
Use regular expressions instead:
([\w_]+=(?:\{[\w=_,\[\]"\|:\.\s-]*\}))|([^,]+)
This will group the line into 4 sections:
LanguageID=0
LastKnownPeriod="Active"
c_MultiPartyCall={Counter=1,TimeStamp=1394539271448}
LTH={Data=["1|MTC|01.01.1970 15:00:00|0.0|7|-1|OnPeakAccountID|0|1000||","1|MTC|01.01.1970 15:00:00|0.0|7|-1|OnPeakAccountID|0|1000||"}
Code:
import java.util.regex.*;
public class JavaRegEx {
public static void main(String[] args) {
String line = "LanguageID=0,LastKnownPeriod=\"Active\",c_MultiPartyCall={Counter=1,TimeStamp=1394539271448},LTH={Data=[\"1|MTC|01.01.1970 15:00:00|0.0|7|-1|OnPeakAccountID|0|1000||\",\"1|MTC|01.01.1970 15:00:00|0.0|7|-1|OnPeakAccountID|0|1000||\"}";
Pattern pattern = Pattern.compile("([\\w_]+=(?:\\{[\\w=_,\\[\\]\"\\|:\\.\\s-]*\\}))|([^,]+)");
Matcher matcher = pattern.matcher(line);
while(matcher.find())
System.out.println(matcher.group(0));
}
}
First, just splitting on a comma isn't how CSV works:
a,b,"c,d"
has only three values, a, b, and c,d. I recommend using a CSV parser, like opencsv. CSV is not terribly complicated, but it isn't as simple as split by comma.
Second, your CSV data is invalid because you have a quote and a comma in a field that isn't quoted.
In other words, if you want the values a, b","c, then the CSV is
a,"b"",""c"
(Note that quotes inside a quoted field are escaped by doubling them.)
Otherwise, it is impossible to tell what fields you actually wanted. A CSV parser would choke on your data.
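To make the quoting rule concrete, here is a tiny sketch of how a field containing quotes and commas would be written out. CsvQuote and quote are hypothetical names for illustration, not part of opencsv or any other library.

```java
final class CsvQuote {
    // Hypothetical helper: wraps a field in double quotes and doubles
    // every double quote that appears inside the field.
    static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // The second value is  b","c  (it contains two quotes and a comma).
        // prints a,"b"",""c"
        System.out.println(String.join(",", "a", quote("b\",\"c")));
    }
}
```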
While it might be possible to do this by split(), it's much easier to match the actual tokens (where split() matches the delimiters between the tokens). Your tokens all consist of one or more of any characters other than comma or brace, optionally followed by a pair of braces enclosing some non-brace characters (which can include commas):
[^,{}]+(?:\{[^{}]+\})?
The Java code for that would be:
List<String> matchList = new ArrayList<String>();
Pattern p = Pattern.compile("[^,{}]+(?:\\{[^{}]+\\})?");
Matcher m = p.matcher(s);
while (m.find()) {
matchList.add(m.group());
}
But it looks like you can break it down further:
Pattern p = Pattern.compile("(\\w+)=([^,{}]+|\\{[^{}]+\\})");
Matcher m = p.matcher(TEST_STR);
while (m.find()) {
System.out.printf("%nname = %s%nvalue = %s%n",
m.group(1), m.group(2));
}
output:
name = LanguageID
value = 0
name = LastKnownPeriod
value = "Active"
name = c_MultiPartyCall
value = {Counter=1,TimeStamp=1394539271448}
name = LTH
value = {Data=["1|MTC|01.01.1970 15:00:00|0.0|7|-1|OnPeakAccountID|0|1000||","1|MTC|01.01.1970 15:00:00|0.0|7|-1|OnPeakAccountID|0|1000||"}

Regex of base classes

I am trying to create a hexadecimal calculator but I have a problem with the regex.
Basically, I want the string to only accept 0-9, A-E, and special characters +-*_
My code keeps returning false no matter how I change the regex, and adding the asterisk gives me a PatternSyntaxException.
public static void main(String[] args) {
String input = "1A_16+2B_16-3C_16*4D_16";
String regex = "[0-9A-E+-_]";
System.out.println(input.matches(regex));
}
Also whenever I add the * as part of the regex it gives me this error:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal character range near index 9
[0-9A-E+-*_]+
^
You need to match more than one character with your regex. As it currently stands you only match one character.
To match one or more characters add a + to the end of the regex
[0-9A-E+-_]+
Also to match a * just add a star in the brackets so the final regex would be
[0-9A-E+\\-_*]+
You need to escape the -; otherwise the regex thinks you want to accept all characters between + and _, which is not what you want.
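A quick way to see the difference between the escaped and the unescaped hyphen is sketched below. The class and method names are invented for this example. With +-_ left unescaped, the class covers every character from '+' to '_' in code-point order, which includes '/' but not '*'. That also explains the exception in the question: '*' comes before '+', so +-* is an illegal (reversed) range.

```java
final class HexInputCheck {
    // Escaped hyphen: only digits, A-E, '+', '-', '_' and '*' are accepted.
    static boolean isValid(String input) {
        return input.matches("[0-9A-E+\\-_*]+");
    }

    // Unescaped hyphen: "+-_" is a range from '+' to '_', so '/' slips through.
    static boolean isValidLoose(String input) {
        return input.matches("[0-9A-E+-_]+");
    }

    public static void main(String[] args) {
        System.out.println(isValid("1A_16+2B_16-3C_16*4D_16")); // prints true
        System.out.println(isValid("1A/2B"));                   // prints false
        System.out.println(isValidLoose("1A/2B"));              // prints true
    }
}
```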
Your regex is OK and there should be no exceptions; just add + at the end of the regex, which means one or more of the characters listed in the brackets. It also seems you wanted * as well.
"[0-9A-E+-_]+"
public static boolean isValidCode (String code) {
Pattern p = Pattern.compile("[fFtTvV\\-~^<>()]+"); //a-zA-Z
Matcher m = p.matcher(code);
return m.matches();
}

Explain working Regex expression

I found this code, which breaks out CSV fields even if they contain double quotes,
but I don't really understand the regex pattern matching.
If someone can give me a step-by-step explanation of how this expression evaluates a pattern, it would be appreciated.
"([^\"]*)"|(?<=,|^)([^,]*)(?:,|$)
Thanks
====
Old posting
This is working well for me - either it matches on "two quotes and whatever is between them", or "something between the start of the line or a comma and the end of the line or a comma". Iterating through the matches gets me all the fields, even if they are empty. For instance,
the quick, "brown, fox jumps", over, "the",,"lazy dog" breaks down into
the quick "brown, fox jumps" over "the" "lazy dog"
import java.util.ArrayList;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class CSVParser {
/*
* This Pattern will match on either quoted text or text between commas, including
* whitespace, and accounting for beginning and end of line.
*/
private final Pattern csvPattern = Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)(?:,|$)");
private ArrayList<String> allMatches = null;
private Matcher matcher = null;
private String match = null;
private int size;
public CSVParser() {
allMatches = new ArrayList<String>();
matcher = null;
match = null;
}
public String[] parse(String csvLine) {
matcher = csvPattern.matcher(csvLine);
allMatches.clear();
String match;
while (matcher.find()) {
match = matcher.group(1);
if (match!=null) {
allMatches.add(match);
}
else {
allMatches.add(matcher.group(2));
}
}
size = allMatches.size();
if (size > 0) {
return allMatches.toArray(new String[size]);
}
else {
return new String[0];
}
}
public static void main(String[] args) {
String lineinput = "the quick,\"brown, fox jumps\",over,\"the\",,\"lazy dog\"";
CSVParser myCSV = new CSVParser();
System.out.println("Testing CSVParser with: \n " + lineinput);
for (String s : myCSV.parse(lineinput)) {
System.out.println(s);
}
}
}
I'll try to give you some hints and the vocabulary you need to find very good explanations on regular-expressions.info
"([^\"]*)"|(?<=,|^)([^,]*)(?:,|$)
() is a group
* is a quantifier
If there is a ? right after the opening parenthesis then it's a special group; here (?<=,|^) is a lookbehind assertion.
Square brackets declare a character class e.g. [^\"]. This one is a special one, because of the ^ at the start. It is a negated character class.
| denotes an alternation, i.e. an OR operator.
(?:,|$) is a non capturing group
$ is a special character in regex, it is an anchor (which matches the end of the string)
"([^\"]*)"|(?<=,|^)([^,]*)(?:,|$)
() capture group
(?:) non-capture group
[] any character within the bracket matches
\ escape character (here it escapes the double quote ")
(?<=) positive lookbehind (checks that the contained pattern matches just before the current position)
| either-or operator (matches either side of the pipe)
^ beginning of line operator
* zero or more of the preceding element
$ or \z end of line operator
For future reference, please bookmark a good regex reference; it can explain each part quite well.
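If it helps, the two alternatives can be watched in action with a stripped-down version of the parser from the question. CsvRegexTrace and fields are names I made up; the pattern itself is unchanged. For each match, group(1) is non-null when the quoted alternative matched, and group(2) holds the text when the unquoted alternative matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CsvRegexTrace {
    // Alternative 1: "([^"]*)"  -> a quoted field, captured without the quotes.
    // Alternative 2: (?<=,|^)([^,]*)(?:,|$) -> text that starts right after a comma
    // or at the start of the line and runs up to the next comma or the end.
    private static final Pattern FIELD =
            Pattern.compile("\"([^\"]*)\"|(?<=,|^)([^,]*)(?:,|$)");

    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            // Exactly one of the two capture groups is set for each match.
            out.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "the quick,\"brown, fox jumps\",over,\"the\",,\"lazy dog\"";
        // Six fields; the fifth one is the empty string between the two commas.
        for (String f : fields(line)) {
            System.out.println("[" + f + "]");
        }
    }
}
```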
