Java split with certain pattern - java

String abc ="abc_123,low,101.111.111.111,100.254.132.156,abc,1";
String[] ab = abc.split("(\\d+),[a-z]");
System.out.println(ab[0]);
Expected Output:
abc_123
low
101.111.111.111,100.254.132.156
abc
1
The problem is that I am not able to find an appropriate regex for this pattern.

I would suggest not trying to solve every problem with one regular expression.
It seems that your initial string contains values that are separated by ",". So split those values with ",".
Then iterate the output of that process; and "join" those elements that are IP addresses (as it seems that this is what you are looking for).
And just for the sake of it: keep in mind that IP addresses are actually pretty complicated; a pattern that matches every valid form is far longer than a simple digits-and-dots check.

You could use lookahead and lookbehind to check whether three digits and a . precede or follow the , in the right positions:
String[] ab = abc.split("(?<!\\.\\d{3}),|,(?!\\d{3}\\.)");
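To see what that lookaround pattern does on the sample input, here is a minimal sketch. The class and method names are made up for illustration. Note the assumption baked into the regex: the octets next to the comma must have exactly three digits, so a pair such as 10.0.0.1,10.0.0.2 would still be split.

```java
import java.util.Arrays;

class LookaroundSplit {
    // Split on every comma except one that sits between ".ddd" and "ddd."
    // (hypothetical helper name; the regex is the one from the answer above).
    static String[] splitKeepingAddresses(String input) {
        return input.split("(?<!\\.\\d{3}),|,(?!\\d{3}\\.)");
    }

    public static void main(String[] args) {
        String abc = "abc_123,low,101.111.111.111,100.254.132.156,abc,1";
        // Print one field per line, as in the question's expected output.
        for (String part : splitKeepingAddresses(abc)) {
            System.out.println(part);
        }
        System.out.println(Arrays.toString(splitKeepingAddresses(abc)));
    }
}
```

The two addresses stay together as one element because the comma between them fails both alternatives.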

String[] ab = abc.split(",");
System.out.println(ab[0]);
System.out.println(ab[1]);
int i = 2;
while(i < ab.length && ab[i].matches("[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}")) {
if(i > 2) System.out.print(",");
System.out.print(ab[i++]);
}
System.out.println();
System.out.println(ab[i++]);
System.out.println(ab[i++]);

First split them into an array on , and then apply a regex to check whether each element is in the desired format. If it is, concatenate those elements, separated by ,
String abc ="abc_123,low,101.111.111.111,100.254.132.156,abc,1";//or something else.
String[] split = abc.split(",");
String concat="";
for(String data:split){
boolean matched=data.matches("[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}");
if(matched){
concat=concat+","+data;
}else{
System.out.println(data);
}
}
if(concat.length()>0){
System.out.println(concat.substring(1)); // the joined addresses are printed after all other fields
}

Related

How can I split a string without knowing the split characters a-priori?

For my project I have to read various input graphs. Unfortunately, the input edges do not all have the same format. Some of them are comma-separated, others are tab-separated, etc. For example:
File 1:
123,45
67,89
...
File 2
123 45
67 89
...
Rather than handling each case separately, I would like to automatically detect the split characters. Currently I have developed the following solution:
String str = "123,45";
String splitChars = "";
for(int i=0; i < str.length(); i++) {
if(!Character.isDigit(str.charAt(i))) {
splitChars += str.charAt(i);
}
}
String[] endpoints = str.split(splitChars);
Basically I pick the first row and select all the non-numeric characters, then I use the generated substring as split characters. Is there a cleaner way to perform this?
Split requires a regexp, so your code would fail for many reasons: If the separator has meaning in regexp (say, +), it'll fail. If there is more than 1 non-digit character, your code will also fail. If your input line contains more than exactly 2 numbers, it will also fail. Imagine it contains hello, world - then your splitChars string becomes " , " - and your split would do nothing (that would split the string "test , abc" into two, nothing else).
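The first failure mode (a separator that is a regex metacharacter) can be sketched as follows. If you do want to split on a detected separator, one option is Pattern.quote, which turns the separator into a literal. The class and method names are illustrative only.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Treat the separator as plain text rather than as a regex.
    static String[] splitLiteral(String line, String separator) {
        return line.split(Pattern.quote(separator));
    }

    public static void main(String[] args) {
        // "12+34".split("+") would throw a PatternSyntaxException,
        // because a bare + is a dangling quantifier.
        System.out.println(Arrays.toString(splitLiteral("12+34", "+")));
        System.out.println(Arrays.toString(splitLiteral("123,45", ",")));
    }
}
```

This only addresses the metacharacter problem; the other failure cases listed above still apply.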
Why not make a regexp to fetch digits, and then find all sequences of digits, instead of focussing on the separators?
You're using regexps whether you want to or not, so let's make it official and use Pattern, while we are at it.
private static final Pattern ALL_DIGITS = Pattern.compile("\\d+");
// then in your split method..
Matcher m = ALL_DIGITS.matcher(str);
List<Integer> numbers = new ArrayList<Integer>();
// Generally prefer a List over an array; it grows as matches are found.
while (m.find()) {
numbers.add(Integer.parseInt(m.group(0)));
}
\\d+ means: one or more digits.
m.find() finds the next match (so, the next block of digits), returning false if there aren't any more.
m.group(0) retrieves the entire matched string.
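Put together as a self-contained sketch (the class name and the extractNumbers helper are made up for illustration), the digit-matching approach might look like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitExtractor {
    // Compiled once; matches each run of one or more digits.
    private static final Pattern ALL_DIGITS = Pattern.compile("\\d+");

    static List<Integer> extractNumbers(String line) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = ALL_DIGITS.matcher(line);
        // find() advances to the next block of digits until none are left.
        while (m.find()) {
            numbers.add(Integer.parseInt(m.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extractNumbers("123,45"));
        System.out.println(extractNumbers("67\t89"));
    }
}
```

Because it looks for the numbers rather than the separators, the same method handles comma-, tab- and space-separated lines. Integer.parseInt will throw on digit runs too large for an int, which this sketch does not handle.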
Split the string on \\D+ which means one or more non-digit characters.
Demo:
import java.util.Arrays;
public class Main {
public static void main(String[] args) {
// Test strings
String[] arr = { "123,45", "67,89", "125 89", "678 129" };
for (String s : arr) {
System.out.println(Arrays.toString(s.split("\\D+")));
}
}
}
Output:
[123, 45]
[67, 89]
[125, 89]
[678, 129]
Why not split with [^\d]+ (every group of non-digits):
for (String n : "123,456 789".split("[^\\d]+")) {
System.out.println(n);
}
Result:
123
456
789

Split a string based on pattern and merge it back

I need to split a string based on a pattern and then merge part of it back together.
For example, below are the actual and expected strings.
String actualstr="abc.def.ghi.jkl.mno";
String expectedstr="abc.mno";
When I use the code below, I can store the parts in an array and iterate over it to build the result. Is there a simpler and more efficient way to do this?
String[] splited = actualstr.split("[\\.\\.\\.\\.\\.\\s]+");
Though I can access the parts by index, is there an easier way to do this? Please advise.
You do not understand how regexes work.
Here is your regex without the escapes: [\.\.\.\.\.\s]+
You have a character class ([]). Which means there is no reason to have more than one . in it. You also don't need to escape .s in a char class.
Here is an equivalent regex to your regex: [.\s]+. As a Java String that's: "[.\\s]+".
You can do .split("regex") on your string to get an array. It's very simple to get a solution from that point.
I would use a replaceAll in this case
String actualstr="abc.def.ghi.jkl.mno";
String str = actualstr.replaceAll("\\..*\\.", ".");
This will replace everything from the first . to the last . (inclusive) with a single .
You could also use split
String[] parts = actualstr.split("\\.");
String str = parts[0]+"."+parts[parts.length-1]; // first and last word
public static String merge(String string, String delimiter, int... partnumbers)
{
String[] parts = string.split(delimiter);
String result = "";
for ( int x = 0 ; x < partnumbers.length ; x ++ )
{
result += result.length() > 0 ? delimiter.replaceAll("\\\\","") : "";
result += parts[partnumbers[x]];
}
return result;
}
and then use it like:
merge("abc.def.ghi.jkl.mno", "\\.", 0, 4);
I would do it this way
Pattern pattern = Pattern.compile("(\\w*\\.).*\\.(\\w*)");
Matcher matcher = pattern.matcher("abc.def.ghi.jkl.mno");
if (matcher.matches()) {
System.out.println(matcher.group(1) + matcher.group(2));
}
If you can cache the result of
Pattern.compile("(\\w*\\.).*\\.(\\w*)")
and reuse "pattern" all over again, this code will be very efficient, as pattern compilation is the most expensive step. The java.lang.String.split() method that other answers suggest uses the same Pattern.compile() internally if the pattern length is greater than 1, meaning that it repeats this expensive compilation on each invocation of the method. See java.util.regex - importance of Pattern.compile()?. So it is much better to compile the Pattern once, cache it, and reuse it.
matcher.group(1) refers to the first group of () which is "(\w*\.)"
matcher.group(2) refers to the second one which is "(\w*)"
We don't use it here, but note that group(0) is the match for the whole regex.
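A minimal sketch of the caching idea, with the pattern held in a static final field. The class and method names are illustrative, and returning the input unchanged when it does not match is an assumption of this sketch, not something the question specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstAndLast {
    // Compiled once when the class is loaded and reused on every call.
    private static final Pattern FIRST_AND_LAST =
            Pattern.compile("(\\w*\\.).*\\.(\\w*)");

    static String firstAndLast(String input) {
        Matcher matcher = FIRST_AND_LAST.matcher(input);
        // Assumption: inputs with fewer than two dots are returned as-is.
        return matcher.matches() ? matcher.group(1) + matcher.group(2) : input;
    }

    public static void main(String[] args) {
        System.out.println(firstAndLast("abc.def.ghi.jkl.mno"));
    }
}
```

A Pattern is immutable and safe to share between threads; each call creates its own Matcher.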

Java String Regex Divide - Always the Same Pattern

I never understood how to properly write a regex to divide my Strings.
I have Strings of this type: String example = "on[?a, ?b, ?c]";
Sometimes I have this: String example2 = "not clear[?c]";
For the first Example I would like to divide into this:
[on, a, b, c]
or
String name = "on";
String [] vars = [a,b,c];
And for the second example I would like to divide into this type:
[not clear, c]
or
String name = "not clear";
String [] vars = [c];
Thanks a lot in advance, guys ;)
If you know the character set of your identifiers, you can simply do a split on all of the text that isn't in that set. For example, if your identifiers only consist of word characters ([a-zA-Z_0-9]) you can use:
String[] parts = "on[?a, ?b, ?c]".split("[\\W]+");
String name = parts[0];
String[] vars = Arrays.copyOfRange(parts, 1, parts.length);
If your identifiers only have A-Z (upper and lower) you could replace \\W above with ^A-Za-z.
I feel that this is more elegant than using a complex regular expression.
Edit: I realize that this will have issues with your second example "not clear". If you have no option of using something like an underscore instead of a space there, you could do one split on [? (or substring) to get the "name", and another split on the remainder, like so:
String s = "not clear[?a, ?b, ?c]";
String[] parts = s.split("\\[\\?"); //need the '?' so we don't get an extra empty array element in the next split
String name = parts[0];
String[] vars = parts[1].split("[\\W]+");
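Wrapped into two small helpers, the two-step split could be sketched like this. The class and method names are hypothetical, and the sketch assumes every input contains "[?" at least once; an input without it would make varsOf throw.

```java
import java.util.Arrays;

class PredicateParser {
    // Everything before the first "[?" is the name.
    static String nameOf(String s) {
        return s.split("\\[\\?", 2)[0];
    }

    // The remainder is split on runs of non-word characters.
    static String[] varsOf(String s) {
        return s.split("\\[\\?", 2)[1].split("\\W+");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"on[?a, ?b, ?c]", "not clear[?c]"}) {
            System.out.println(nameOf(s) + " -> " + Arrays.toString(varsOf(s)));
        }
    }
}
```

The limit of 2 on the first split keeps everything after the first "[?" in one piece, and the trailing ] produces only a trailing empty string, which split drops.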
This comes close, but the problem is the third remembered group is actually repeated so it only captures the last match.
(.*?)\[(?:\s*(?:\?(.*?)(?:\s*,\s*\?(.*?))*)\s*)?]
For example, the first one you list, on[?a, ?b, ?c], would give group 1 as on, 2 as a, and 3 as c. If you are using Perl, you could use the g flag to apply a regex to a line multiple times and use this:
my @tokens;
while ( $line =~ /\s*(.*?)\s*[[,\]]/g ) {
push( @tokens, $1 );
}
Note, I did not actually test the Perl code; it is just off the top of my head. It should give you the idea, though.
String[] parts = example.split("[^\\w ]");
List<String> x = new ArrayList<String>();
for (int i = 0; i < parts.length; i++) {
if (!"".equals(parts[i]) && !" ".equals(parts[i])) {
x.add(parts[i]);
}
}
This will work as long as you don't have more than one space separating your non-space characters. There's probably a cleverer way of filtering out the empty and " " strings.

Help in writing a Regular expression for a string

Hi, please help me write a regular expression for the following requirement.
I have strings such as:
String vStr = "Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)";
String sStr = "Every 1 form(s) - Earth: (Air,Fire) ";
From these strings, after applying the regex, I need to get the values "Air,Earth,Water sea,Fire" and "Air,Fire",
that is:
String vStrRegex ="Air,Earth,Water sea,Fire";
String sStrRegex ="Air,Fire";
All the input strings will be separated by ":" and the values needed are always inside brackets.
Thanks
The regular expression would be something like this:
: \((.*?)\)
Spelt out:
Pattern p = Pattern.compile(": \\((.*?)\\)");
Matcher m = p.matcher(vStr);
// ...
String result = m.group(1);
This will capture the content of the parentheses as the first capture group.
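Note that a Matcher has to attempt a match (find() or matches()) before group(1) can be read; otherwise it throws IllegalStateException. A runnable sketch of the same pattern follows. The class and method names are illustrative, and returning null on no match is an assumption of this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParenExtractor {
    private static final Pattern AFTER_COLON = Pattern.compile(": \\((.*?)\\)");

    static String extract(String input) {
        Matcher m = AFTER_COLON.matcher(input);
        // find() must succeed before group(1) is available.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)"));
        System.out.println(extract("Every 1 form(s) - Earth: (Air,Fire) "));
    }
}
```

The leading ": " in the pattern is what skips the "(s)" earlier in the string.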
Try the following:
\((.*)\)\s*$
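The ending $ is important, otherwise you'll accidentally match the "(s)". Be aware that with Matcher.find() the match still starts at the first "(" and .* is greedy, so on the sample strings group 1 begins with "s) - ..."; excluding "(" from the group, as in \(([^(]*)\)\s*$, avoids that.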
If you have each string separately, try this expression: \(([^\(]*)\)\s*$
This would get you the content of the last pair of brackets, as group 1.
If the strings are concatenated by : try to split them first.
Ask yourself if you really need a regex. Does the text you need always appear within the last pair of parentheses? If so, you can keep it simple and use substring instead:
String vStr = "Every 1 nature(s) - Universe: (Air,Earth,Water sea,Fire)";
int lastOpeningParens = vStr.lastIndexOf('(');
int lastClosingParens = vStr.lastIndexOf(')');
String text = vStr.substring(lastOpeningParens + 1, lastClosingParens);
This is much more readable than a regex.
I assume that there are only whitespace characters between : and the opening bracket (:
Pattern regex = Pattern.compile(":\\s+\\((.+)\\)");
You'll find your results in capturing group 1.
Try this regex:
.*\((.*)\)
$1 will contain the required string

Use String.split() with multiple delimiters

I need to split a string based on the delimiters - and .. Below is my desired output.
AA.BB-CC-DD.zip ->
AA
BB
CC
DD
zip
but my following code does not work.
private void getId(String pdfName){
String[]tokens = pdfName.split("-\\.");
}
I think you need to include the regex OR operator:
String[]tokens = pdfName.split("-|\\.");
What you have will match:
[DASH followed by DOT together] -.
not
[DASH or DOT any of them] - or .
Try this regex "[-.]+". The + after treats consecutive delimiter chars as one. Remove plus if you do not want this.
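A small sketch of the difference the + makes (the class and method names are illustrative):

```java
import java.util.Arrays;

class MultiDelimiterSplit {
    // Runs of - and . count as a single delimiter.
    static String[] tokens(String name) {
        return name.split("[-.]+");
    }

    // Every single - or . is a delimiter, so adjacent ones yield empty strings.
    static String[] tokensStrict(String name) {
        return name.split("[-.]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokens("AA.BB-CC-DD.zip")));
        System.out.println(Arrays.toString(tokens("AA.-BB")));
        System.out.println(Arrays.toString(tokensStrict("AA.-BB")));
    }
}
```

Placing - first inside the brackets makes it a literal character, so no escaping is needed.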
You can use the regex "\W". This matches any non-word character. The required line would be:
String[] tokens=pdfName.split("\\W");
The string you give split is the string form of a regular expression, so:
private void getId(String pdfName){
String[]tokens = pdfName.split("[\\-.]");
}
That means to split on any character in the [] (we have to escape - with a backslash because it's special inside []; and of course we have to escape the backslash because this is a string). (Conversely, . is normally special but isn't special inside [].)
Using Guava you could do this:
Iterable<String> tokens = Splitter.on(CharMatcher.anyOf("-.")).split(pdfName);
For two character sequences as delimiters, "AND" and "OR", this should work. Use word boundaries (\\b) so that the "OR" inside "YORK" is not treated as a delimiter, and don't forget to trim the results.
String text ="ISTANBUL AND NEW YORK AND PARIS OR TOKYO AND MOSCOW";
String[] cities = text.split("\\b(?:AND|OR)\\b");
Result : cities = {"ISTANBUL ", " NEW YORK ", " PARIS ", " TOKYO ", " MOSCOW"}
pdfName.split("[.-]+");
[.-] -> any one of the . or - can be used as delimiter
+ sign signifies that if the aforementioned delimiters occur consecutively we should treat it as one.
I'd use Apache Commons:
import org.apache.commons.lang3.StringUtils;
private void getId(String pdfName){
String[] tokens = StringUtils.split(pdfName, "-.");
}
It'll split on any of the specified separators, as opposed to StringUtils.splitByWholeSeparator(str, separator) which uses the complete string as a separator
String[] token=s.split("[.-]");
It's better to use something like this:
s.split("[\\s\\-\\.\\'\\?\\,\\_\\#]+");
I have added a few other characters as a sample. Most of these escapes are not required inside [], but they are harmless and avoid surprises with characters such as . and -.
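Try this code (JavaScript, not Java; the character class must list - and . to match the delimiters in the question):
var string = 'AA.BB-CC-DD.zip';
var array = string.split(/[-.]/);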
You may also specify a regular expression as the argument to the split() method. See the example below:
private void getId(String pdfName){
String[]tokens = pdfName.split("-|\\.");
}
s.trim().split("[\\W]+")
should work.
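Note that split does not accept varargs: its overloads are split(String regex) and split(String regex, int limit), so a call such as pdfName.split("-", ".") does not compile. Combine the delimiters into a single regex instead:
String[]tokens = pdfName.split("[-.]");
You can list as many delimiter characters inside the brackets as you want.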
If you know the string will always be in the same format, first split it on . (escaped as \\. because split takes a regex) and store the element at the first index in a variable. Then split the element at the second index on - and store indexes 0, 1 and 2. The element at index 2 of the first split already holds the last field.
Refer to the following snippet:
String[] tmp = pdfName.split("\\."); // an unescaped "." matches every character and yields an empty array
String val1 = tmp[0];
tmp = tmp[1].split("-");
String val2 = tmp[0];
...
