split string on comma, but not commas inside parenthesis - java

I have a Java IO task: I need to read a file line by line and save the data into a database, but I've run into a problem.
"desk","12","15","(small,median,large)"
I want to use string.split(",") to split on , and save each value into its own column. But the data inside the () gets split as well. I do not want to split (small,median,large); I want to keep it intact. How can I do that? I know I can use a regular expression, but I really do not know how to write it.

You could solve this by using Pattern and Matcher. Any solution using split would just seem like a nasty workaround. Here's an example:
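import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String s = "\"desk\",\"12\",\"15\",\"(small,median,large)\"";
    // Match each quoted field; the match includes the surrounding quotes.
    Pattern p = Pattern.compile("\".+?\"");
    Matcher m = p.matcher(s);
    List<String> matches = new ArrayList<String>();
    while (m.find()) {
        matches.add(m.group());
    }
    System.out.println(matches);
}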

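Or, if it has to be Java :), you can split on "\\s*\"\\s*,\\s*\"". The split consumes the quotes around each comma, so afterwards only the first field still starts with " and only the last field still ends with "; strip those two (or add quotes back to the other fields) if necessary.
I put \s in the pattern because I see that you also have blanks as separators - 15",blank"(small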
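A minimal sketch of this split-based approach, assuming every field on the line is wrapped in double quotes as in the question. The class and method names (QuotedFieldSplit, splitQuoted) are made up for illustration. It strips the outer quotes first, so the split pattern from the answer leaves clean fields:

```java
import java.util.Arrays;

public class QuotedFieldSplit {
    // Splits a line of fully quoted fields on the "," delimiter between them.
    // Assumption: every field is quoted and contains no embedded "," sequence.
    static String[] splitQuoted(String line) {
        String s = line.trim();
        // Drop the outer quotes so only the delimiters between fields remain.
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            s = s.substring(1, s.length() - 1);
        }
        return s.split("\\s*\"\\s*,\\s*\"");
    }

    public static void main(String[] args) {
        String line = "\"desk\",\"12\",\"15\",\"(small,median,large)\"";
        System.out.println(Arrays.toString(splitQuoted(line)));
        // prints [desk, 12, 15, (small,median,large)]
    }
}
```

The commas inside the parentheses survive because they are not surrounded by quotes, so the delimiter pattern never matches them.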

(\(.+?\)|\w+)
The regular expression above produces the matches below, which allows for a more flexible solution than some of the other posted ones. The Pattern/Matcher code is in another answer on this page; just use this regular expression instead.
desk
12
15
(small,median,large)
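As a rough sketch, this is how that expression could be plugged into a Pattern/Matcher loop. The class and method names (ParenAwareTokens, tokens) are hypothetical, and the outer capturing group is dropped because group() already returns the whole match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenAwareTokens {
    // Either a parenthesised group (kept whole) or a run of word characters.
    private static final Pattern TOKEN = Pattern.compile("\\(.+?\\)|\\w+");

    static List<String> tokens(String line) {
        List<String> result = new ArrayList<String>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "\"desk\",\"12\",\"15\",\"(small,median,large)\"";
        for (String token : tokens(line)) {
            System.out.println(token);
        }
    }
}
```

Note that \w+ only covers letters, digits and underscores, so a field containing spaces or punctuation outside parentheses would be broken into several tokens.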

Related

how to break the string using keywords using regex

I have a scenario where i need to break the below input string based on the keywords using regex.
Keywords are UPRCAS, REPLC, LOWCAS and TUPIL.
String input = "UPRCAS-0004-abcdREPLC-0003-123TUPIL-0005-adf2344LOWCAS-0003-ABCD";
The output should be as follows
UPRCAS-0004-abcd
REPLC-0003-123
TUPIL-0005-adf2344
LOWCAS-0003-ABCD
How can I achieve this using Java regex?
I have tried splitting on '-' and using a regex, but both approaches give an array of strings, and I then have to process each string and join 3 strings back together to form UPRCAS-0004-abcd. I felt this is not an efficient way to do it, as it takes an extra array and more processing to recombine the pieces.
String[] tokens = input.split("-");
String[] r = input.split("(?=\\p{Upper})");
Please let me know if we can split the string using a regex based on the keywords. Basically I need to extract the string between the keyword boundaries.
Edited question after understanding the limitation of the existing approach
The regex should be generic and extract the strings from the input based on the UPPERCASE characters.
The regex should not contain the keywords themselves.
I understand that it is a bad idea to add a new keyword to the regex pattern every time. My expectation is for it to be as generic as possible.
Thanks all for your time. Really appreciate it.
Split using the following regex:
(?=UPRCAS|REPLC|LOWCAS|TUPIL)
The (?=xxx) is a zero-width positive lookahead, meaning that it matches the empty space immediately preceding one of the 4 keywords.
See Regular-Expressions.info for more information: Lookahead and Lookbehind Zero-Length Assertions
Test
String input = "UPRCAS-0004-abcdREPLC-0003-123TUPIL-0005-adf2344LOWCAS-0003-ABCD";
String[] output = input.split("(?=UPRCAS|REPLC|LOWCAS|TUPIL)");
for (String value : output)
System.out.println(value);
Output
UPRCAS-0004-abcd
REPLC-0003-123
TUPIL-0005-adf2344
LOWCAS-0003-ABCD
You can try this regex:
\w+-\w+-(?:[a-z0-9]+|[A-Z]+)
Demo: https://regex101.com/r/etKBjI/3
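A short sketch of how this keyword-free expression could be used from Java. It assumes each record has the shape name-digits-payload and that a payload is either all lowercase/digits or all uppercase, as in the sample input. The class and method names (RecordExtractor, records) are invented for illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RecordExtractor {
    // name-number-payload; the payload is lowercase/digits or all uppercase.
    private static final Pattern RECORD =
            Pattern.compile("\\w+-\\w+-(?:[a-z0-9]+|[A-Z]+)");

    static List<String> records(String input) {
        List<String> result = new ArrayList<String>();
        Matcher m = RECORD.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "UPRCAS-0004-abcdREPLC-0003-123TUPIL-0005-adf2344LOWCAS-0003-ABCD";
        for (String record : records(input)) {
            System.out.println(record);
        }
    }
}
```

Because the keywords are not listed in the pattern, a new keyword needs no regex change, but a payload that mixes upper and lower case would be cut at the first uppercase letter.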

How can I push regex matches to array in java?

I've currently got a string, of which I want to use certain parts. With these parts I want to do various things, like pushing them to an array or showing them in a text area.
First I tried the split method. It deletes my regex matches and prints the other parts of the string. I want the opposite: delete the other parts and print the regex matches.
How can I do this?
For example:
There are lot of youtube links like this
https://www.youtube.com/watch?v=qJuoXM7G322&list=PLRfAW_jVDn06M7qxHIwlowgLY3Io1pG6z&index=7
I want to take only simple video link with this expression
"https:\\/\\/www.youtube.com\\/watch\\?v=.{11}"
when I use this code :
String ytLink = linkArea.getText();
String regexp = "https:\\/\\/www.youtube.com\\/watch\\?v=.{11}";
String[] tokenVal;
tokenVal = ytLink.split(regexp);
System.out.println("Count of Links : "+tokenVal.length);
for (String t : tokenVal) {
System.out.println(t);
}
It prints
"&list=PLRfAW_jVDn06M7qxHIwlowgLY3Io1pG6z&index=7"
I want the output to be like this:
"https://www.youtube.com/watch?v=SATL2mTfZO0"
You are splitting the string with that regular expression, which is not the correct tool for the job.
It is dividing your example string into:
"" // The bit before the separator.
"https://www.youtube.com/watch?v=qJuoXM7G322" // The separator
"&list=PLRfAW_jVDn06M7qxHIwlowgLY3Io1pG6z&index=7" // The bit after the separator
but then discarding the separator, so you'd get back a 2-element array containing:
"" // The bit before the separator.
"&list=PLRfAW_jVDn06M7qxHIwlowgLY3Io1pG6z&index=7" // The bit after the separator
If you want to get the thing that matches the regex, you'd need to use Pattern and Matcher:
Pattern pattern = Pattern.compile("https:\\/\\/www.youtube.com\\/watch\\?v=.{11}");
Matcher matcher = pattern.matcher(ytLink);
if (matcher.find()) {
System.out.println(matcher.group());
}
(I don't entirely trust your escaped backslashes in your regular expression; however the pattern is not really important to the principle)
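Since the question asks for all matches in an array, here is a minimal sketch that loops with find() and collects every match. The class and method names (LinkCollector, videoLinks) and the sample text are assumptions for illustration. The pattern is a slightly tidied variant of the one in the question: the dots are escaped and the forward slashes are not, since / needs no escaping in a Java regex:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkCollector {
    // A watch URL followed by exactly 11 characters of video id.
    private static final Pattern LINK =
            Pattern.compile("https://www\\.youtube\\.com/watch\\?v=.{11}");

    static String[] videoLinks(String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = LINK.matcher(text);
        while (m.find()) {          // find() advances to the next match each call
            found.add(m.group());
        }
        return found.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String text = "https://www.youtube.com/watch?v=qJuoXM7G322&list=PLRfAW_jVDn06M7qxHIwlowgLY3Io1pG6z&index=7\n"
                + "https://www.youtube.com/watch?v=SATL2mTfZO0";
        String[] links = videoLinks(text);
        System.out.println("Count of Links : " + links.length);
        for (String link : links) {
            System.out.println(link);
        }
    }
}
```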
You can negate your regex using the negative lookaround: (?!pattern)
See also : How to negate the whole regex?

get the last portion of the link using java regex

I have an ArrayList of links. All links have the same format: abc.([a-z]*)/\\d{4}/
List<String > links= new ArrayList<>();
links.add("abc.com/2012/aa");
links.add("abc.com/2014/dddd");
links.add("abc.in/2012/aa");
I need to get the last portion of every link. ie, the part after domain name. Domain name can be anything(.com, .in, .edu etc).
/2012/aa
/2014/dddd
/2012/aa
This is the output i want. How can i get this using regex?
Thanks
Some people, when confronted with a problem, think “I know, I'll use
regular expressions.” Now they have two problems.
(see here for background)
Why use regex ? Perhaps a simpler solution is to use String.split("/") , which gives you an array of substrings of the original string, split by /. See this question for more info.
Note that String.split() does in fact take a regex to determine the boundaries upon which to split. However you don't need a regex in this case and a simple character specification is sufficient.
Try the regex below and use the regex grouping feature: the part you want is captured by the parentheses ().
\.[a-zA-Z]{2,3}(/.*)
Pattern description :
dot followed by two or three letters followed by forward slash then any characters
DEMO
Sample code:
Pattern pattern = Pattern.compile("\\.[a-zA-Z]{2,3}(/.*)");
Matcher matcher = pattern.matcher("abc.com/2012/aa");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
output:
/2012/aa
Note:
You can make it more precise by using \\.[a-zA-Z]{2,3}(/\\d{4}/.*) if there are always 4 digits in the pattern.
String result = s.replaceAll("^[^/]*","");
s would be the string in your list.
Some people, when confronted with a problem, think “I know, I'll use regular expressions.” Now they have two problems.
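Why not just use the URI class? Note that the links in the question have no scheme, and without one URI treats the whole string as the path, so prepend one first:
output = new URI("http://" + link).getPath()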
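A small sketch of the URI approach, under the assumption that the links carry no scheme (as in the question) and that prefixing "http://" is acceptable. The class and method names (UriPathDemo, pathOf) are hypothetical. URI.create is used here instead of the constructor only to avoid the checked URISyntaxException:

```java
import java.net.URI;

public class UriPathDemo {
    // Assumption: links have no scheme, so "http://" is prepended before
    // parsing; otherwise the host would be treated as part of the path.
    static String pathOf(String link) {
        return URI.create("http://" + link).getPath();
    }

    public static void main(String[] args) {
        System.out.println(pathOf("abc.com/2012/aa"));   // prints /2012/aa
        System.out.println(pathOf("abc.in/2014/dddd"));  // prints /2014/dddd
    }
}
```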
Try this one and use the second capturing group
(.*?)(/.*)
Use foreach loop to iterate over list.
Use substring and indexOf('/').
FOR EXAMPLE
String s="abc.com/2014/dddd";
System.out.println(s.substring(s.indexOf('/')));
OUTPUT
/2014/dddd
Or you can go for split method.
System.out.println(s.split("/",2)[1]);//OUTPUT:2014/dddd --->you need to add /
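Putting the loop and substring steps together, a minimal sketch might look like this. The class and method names (LinkTails, tailOf) are made up, and returning an empty string for a link without any / is an assumption, not something the question specifies:

```java
import java.util.ArrayList;
import java.util.List;

public class LinkTails {
    // Returns everything from the first '/' onwards, or "" if there is none.
    static String tailOf(String link) {
        int slash = link.indexOf('/');
        return slash >= 0 ? link.substring(slash) : "";
    }

    public static void main(String[] args) {
        List<String> links = new ArrayList<>();
        links.add("abc.com/2012/aa");
        links.add("abc.com/2014/dddd");
        links.add("abc.in/2012/aa");
        for (String link : links) {
            System.out.println(tailOf(link));
        }
    }
}
```

The indexOf check matters because substring(-1) would throw a StringIndexOutOfBoundsException for a link that contains no slash.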

Java Repeat regular expression

I have following RegEx which should match e.g. some ids in brackets:
[swpf_02-7679, swpf_02-7622, ...]
Pattern p = Pattern.compile("[\\[\\s]*?[a-z]{1,8}[0-9]*?_[0-9]{2,}\\-[0-9]+[\\s]*?\\]");
The goal now is to combine this pattern with a split at "," so that it fits the whole string [swpf_02-7679, swpf_02-7622] and not only [swpf_02-7679], as the regex posted above does.
Can someone give me a hint?
Just remove the [ and ] from the string then split at the ,
I think the easiest way to do what you want is to remove the '[' and ']' at the front and back (use String.substring()), then split on the comma with String.split() and apply the regex to each individual string returned (adjusting the regex to remove the brackets, of course).
OK, assuming that the bits you want are the ids like "swpf_02-7622", split on the comma and loop through the pieces, trimming as you go. Something like
List<String> cleanIds = new ArrayList<String>();
for(String id : ids.split(","))
cleanIds.add(id.trim());
If you want rid of the "swpf_" bits, then id.substring(5).
Finally, to get rid of the square brackets, check for them with id.startsWith("[") and id.endsWith("]").
Why don't you use the Java StringTokenizer class and then just use the regex on the tokens you get out of this? You can post-process them to include the brackets you need or modify the regex slightly.
As @was and @garyh already mentioned, the simplest way is to remove the [], then split your list using String.split("\\s*,\\s*"), then match each member using your pattern.
You can also match your string multiple times using start position as a end position of the previous iteration:
Pattern p = .... // your pattern in capturing brackets ()
Matcher m = p.matcher(str);
for (int start = 0; m.find(start); start = m.end()) {
String element = m.group(1);
// do what you need with the element.
}
If you simply want to extract all the codes in your list, you could use this regular expression:
[^,\s\[\]]+
Getting all the matches from the following string:
[swpf_02-7679, swpf_02-762342, swpf_02-7633 , swpf_02-723422]
Would give you the following results:
swpf_02-7679
swpf_02-762342
swpf_02-7633
swpf_02-723422
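For reference, a brief sketch of that extraction in Java. The class and method names (IdExtractor, ids) are hypothetical. The pattern simply collects every run of characters that is not a comma, whitespace or square bracket:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class IdExtractor {
    // One or more characters that are not ',', whitespace, '[' or ']'.
    private static final Pattern ID = Pattern.compile("[^,\\s\\[\\]]+");

    static List<String> ids(String bracketedList) {
        List<String> result = new ArrayList<String>();
        Matcher m = ID.matcher(bracketedList);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "[swpf_02-7679, swpf_02-762342, swpf_02-7633 , swpf_02-723422]";
        for (String id : ids(input)) {
            System.out.println(id);
        }
    }
}
```

This does not validate the id format; if that is needed, each extracted value could still be checked against the stricter pattern from the question with the brackets removed.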

Split number string on java using regex

I want to use a regex in Java to split a string of digits.
I tested the regex with an online regex tester and it works there.
But in Java the result is wrong.
Pattern pattern = Pattern.compile("[\\\\d]{1,4}");
String[] results = pattern.split("123456");
// I expect 2 results ["1234","56"]
// Actual results is ["123456"]
Am I missing anything?
I know this question is boring, but I want to solve this problem.
Answer
Pattern pattern = Pattern.compile("[\\d]{1,4}");
String[] results = pattern.split("123456");
// Results length is 0
System.out.println(results.length);
does not work. I have tried it; it returns nothing in the results.
Please try your code before answering.
Sincere thanks to the people who helped me.
Solution:
Pattern pattern = Pattern.compile("([\\d]{1,4})");
Matcher matcher = pattern.matcher("123456");
List<String> results = new ArrayList<String>();
while (matcher.find()) {
results.add(matcher.group(1));
}
Output 2 results ["1234","56"]
Pattern pattern = Pattern.compile("[\\\\d]{1,4}")
Too many backslashes; try "[\\d]{1,4}" as the Java string literal. You only have to escape the backslash once, so the backslash in front of the d becomes \\. The literal you wrote, "[\\\\d]{1,4}", is actually the regex [\\d]{1,4} (a literal backslash or a literal d, one to four times).
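To make the escaping difference concrete, here is a small sketch comparing the two string literals. The class and method names (EscapeDemo, matchesDoubled, matchesSingle) are invented for illustration:

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // Literal "[\\\\d]{1,4}" is the regex [\\d]{1,4}: a backslash or the letter d.
    static boolean matchesDoubled(String input) {
        return Pattern.matches("[\\\\d]{1,4}", input);
    }

    // Literal "[\\d]{1,4}" is the regex [\d]{1,4}: one to four digits.
    static boolean matchesSingle(String input) {
        return Pattern.matches("[\\d]{1,4}", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesDoubled("1234"));  // false: no digits allowed
        System.out.println(matchesDoubled("d\\d"));  // true: d, backslash, d
        System.out.println(matchesSingle("1234"));   // true
    }
}
```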
When Java decided to add regular expressions to the standard library, they should have also added a regular expression literal syntax instead of shoe-horning it over Strings (with the unreadable extra escaping and no compile-time syntax checking).
You can't do it in one method call, because you can't specify a capturing group for the split, which would be needed to break up into four char chunks.
It's not "elegant", but you must first insert a character to split on, then split:
String[] results = "123456".replaceAll("....", "$0,").split(",");
Here's the output:
System.out.println(Arrays.toString(results)); // prints [1234, 56]
Note that you don't need to use Pattern etc because String has a split-by-regex method, leading to a one-line solution.
