Java Split String Consecutive Delimiters - java

I need to split a string that is passed in to my app from an external source. This String is delimited with a caret "^", and here is how I split the String into an array:
String[] barcodeFields = contents.split("\\^+");
This works fine, except that some of the passed-in fields are empty and I need to account for them. I need to insert either "", "null" or "empty" into any missing field.
The missing fields show up as consecutive delimiters. How do I split a Java String into an array and insert a string such as "empty" as a placeholder wherever there are consecutive delimiters?

The answer by mureinik is quite close, but wrong in an important edge case: when the empty fields are at the end of the string (trailing delimiters). To account for that you have to pass a negative limit:
contents.split("\\^", -1)
E.g. look at the following code:
final String line = "alpha ^beta ^^^";
List<String> fieldsA = Arrays.asList(line.split("\\^"));
List<String> fieldsB = Arrays.asList(line.split("\\^", -1));
System.out.printf("# of fieldsA is: %d\n", fieldsA.size());
System.out.printf("# of fieldsB is: %d\n", fieldsB.size());
The above prints:
# of fieldsA is: 2
# of fieldsB is: 5

String.split leaves an empty string ("") where it encounters consecutive delimiters, as long as you split on a single delimiter rather than on ^+. Trailing empty strings are dropped unless you pass a negative limit. If you want to replace the empty strings with "empty", you have to do so yourself:
String[] split = contents.split("\\^", -1);
for (int i = 0; i < split.length; ++i) {
    if (split[i].isEmpty()) {
        split[i] = "empty";
    }
}
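Putting these answers together, a minimal self-contained sketch might look like the following. It assumes the raw input is in a string such as `contents` and that "empty" is the placeholder you want; the class and method names (`BarcodeSplit`, `splitWithPlaceholders`) are hypothetical. The `-1` limit is what keeps trailing empty fields.

```java
import java.util.Arrays;

public class BarcodeSplit {
    // Hypothetical helper: split on every single caret and replace empty fields with a placeholder
    static String[] splitWithPlaceholders(String contents, String placeholder) {
        // One caret per split (no "+"), and limit -1 so trailing empty fields are kept
        String[] fields = contents.split("\\^", -1);
        for (int i = 0; i < fields.length; i++) {
            if (fields[i].isEmpty()) {
                fields[i] = placeholder;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String contents = "a^^b^";
        // prints [a, empty, b, empty]
        System.out.println(Arrays.toString(splitWithPlaceholders(contents, "empty")));
    }
}
```

Passing "" or "null" as the placeholder gives the other variants mentioned in the question.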

Using ^+ means one or more consecutive caret characters. Remove the plus:
String[] barcodeFields = contents.split("\\^");
and it won't eat the empty fields between delimiters (trailing ones are still dropped, as noted above). You'll get (your requested) "" for empty fields.

The following results in [blah, , bladiblah, moarblah]:
String test = "blah^^bladiblah^moarblah";
System.out.println(Arrays.toString(test.split("\\^")));
The ^^ produces an empty string ("") between the two delimiters.

Related

Regex to remove special characters in java

I have a string with a couple of special characters and need to remove only a few of them (~ and `). I have written the code below, but when I print the split strings, I also get empty strings along with the values.
String str = "ABC123-xyz`~`XYZ 1.7A";
String[] str1 = str.split("[\\~`]");
for (int i = 0; i < str1.length; i++) {
    System.out.println("str===" + str1[i]);
}
Output:
str===ABC123-xyz
str===
str===
str===XYZ 1.7A
Why are empty strings also printed here?
You’re splitting on one special character at a time; split on one or more:
String[] str1 = str.split("[~`]+");
Note also that the tilde ~ doesn’t need escaping.
It's because the .split() method returns a String array of 4 items, shown below:
String[4] { "ABC123-xyz", "", "", "XYZ 1.7A" }
Your for loop then prints all items of that array. You can use the following to skip the empty ones:
for (int i = 0; i < str1.length; i++) {
    if (!str1[i].isEmpty()) {
        System.out.println("str===" + str1[i]);
    }
}
The split method returns the pieces of text around every match of the regex. Your regex, [~`], matches a single character that is either "~" or "`".
The parts of the string separated by matches of that regex are determined as follows:
The string "ABC123-xyz" is returned because it is split off the given string at the first "`".
Between that character and the next match, "~", is the empty string, and so on.
If you want a run of these characters to count as one delimiter, use [~`]+
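A quick sketch comparing the single-character class with the `[~`]+` version on the string from the question. One caveat worth knowing: a delimiter at the very start of the string still produces a leading empty string even with `+`, so a stream filter is shown as one way to drop it. The class and method names are made up for illustration.

```java
import java.util.Arrays;

public class SpecialCharSplit {
    // "+" treats a run of ` and ~ characters as a single delimiter
    static String[] splitOnRuns(String s) {
        return s.split("[~`]+");
    }

    // Same split, but explicitly drops any empty pieces (e.g. from a leading delimiter)
    static String[] splitNonEmpty(String s) {
        return Arrays.stream(s.split("[~`]+"))
                .filter(part -> !part.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String str = "ABC123-xyz`~`XYZ 1.7A";
        System.out.println(Arrays.toString(str.split("[~`]")));            // [ABC123-xyz, , , XYZ 1.7A]
        System.out.println(Arrays.toString(splitOnRuns(str)));             // [ABC123-xyz, XYZ 1.7A]
        System.out.println(Arrays.toString(splitOnRuns("`abc~def")));      // [, abc, def]
        System.out.println(Arrays.toString(splitNonEmpty("`abc~def")));    // [abc, def]
    }
}
```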

How to escape delimiter in data?

I have a list of titles that I want to save as a String:
- title1
- title2
- title|3
Now, I want to save this as a single line String delimited by |, which would mean it ends up like this: title1|title2|title|3.
But now, when I split the String:
String input = "title1|title2|title|3";
String[] splittedInput = input.split("\\|");
splittedInput will be the following array: {"title1", "title2", "title", "3"}.
Obviously, this is not what I want, I want the third entry of the array to be title|3.
Now my question: how do I correctly escape the | in the titles so that when I split the String I end up with the correct array of three titles, instead of 4?
@Gábor Bakos:
Running this code snippet:
String input = "title1|title2|title\\|3";
String[] split = input.split("(?<!\\\\)\\|");
for (int i = 0; i < split.length; i++) {
split[i] = split[i].replace("\\\\(?=\\|)", "");
}
System.out.println(Arrays.toString(split));
I get this output: [title1, title2, title\|3]. What am I doing wrong?
You can use any escape character, for example \:
"title1|title2|title\\|3".split("(?<!\\\\)\\|").map(_.replaceAll("\\\\(?=\\|)", "")) //Scala syntax
Resulting:
Array(title1, title2, title|3)
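The final mapping is required to remove the escaping character too. Note that it uses replaceAll, which interprets its first argument as a regex; String.replace treats its argument as literal text, which is why the Java snippet above leaves the backslash in place.
(?<!\\\\) is a negative lookbehind (the | must not be preceded by a backslash), and (?=\\|) is a lookahead so that only backslashes in front of a | are removed.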
Well, if you use a delimiter-separated format like TSV, the chosen separator must never be left unescaped in the data.
You could simply escape your data (for example, title1|title2|title\|3) and then split on (?<!\\)\| (a negative lookbehind).
In Java, it gives:
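import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
public class PipeParser {
    public static void main(String[] args) {
        // prints out [title1, title2, title|3, title|4]
        System.out.println(parsePipeSeparated("title1|title2|title\\|3|title\\|4"));
    }
    private static List<String> parsePipeSeparated(String input) {
        // split on | only when it is not preceded by a backslash
        return Stream.of(input.split("(?<!\\\\)\\|"))
                // unescape: turn \| back into a plain |
                .map(escapedText -> escapedText.replace("\\|", "|"))
                .collect(Collectors.toList());
    }
}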
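The answers above cover reading the line back. Here is a sketch of the writing side as well, under the assumption that backslash is the escape character: escape backslashes first and pipes second when joining, then parse with a small character scan instead of a lookbehind regex. The scan is a swapped-in technique, not what the answers use; its advantage is that a title ending in a backslash (e.g. `a\`) survives the round trip, which the `(?<!\\\\)\\|` lookbehind would mis-split. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EscapedPipes {
    // Escape "\" first, then "|", so a literal backslash can't be mistaken for an escape
    static String joinTitles(List<String> titles) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < titles.size(); i++) {
            if (i > 0) sb.append('|');
            sb.append(titles.get(i).replace("\\", "\\\\").replace("|", "\\|"));
        }
        return sb.toString();
    }

    // A backslash makes the next character literal; an unescaped '|' ends the current field
    static List<String> splitTitles(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == '\\' && i + 1 < line.length()) {
                current.append(line.charAt(++i));
            } else if (c == '|') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        String line = joinTitles(Arrays.asList("title1", "title2", "title|3"));
        System.out.println(line);              // title1|title2|title\|3
        System.out.println(splitTitles(line)); // [title1, title2, title|3]
    }
}
```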
Use another separator, for instance "title1,title2,title|3" instead of "title1|title2|title|3", and then split(","). This only works as long as no title contains a comma.

how to remove multiple token from string array in java by split along with [ ]

String Order_Menu_Name = "[pohe-7, puri-3]";
String[] s2 = Order_Menu_Name.split("-|,");
int j = 0;
//out.println("s2.length "+s2.length);
while (j < s2.length) {
    // print s2[j] here
    j++;
}
The expected output is each value separately,
e.g. pohe 7 puri 3
Your question is not clear. Assuming that your string contains "pohe-7, puri-3" you can split them using a separator such as "," or "-" or whitespace. See below.
String Order_Menu_Name= "[pohe-7, puri-3]";
To remove "[" and "]" from the above String, you can use Java's replace method as follows:
Order_Menu_Name = Order_Menu_Name.replace("[", "");
Order_Menu_Name = Order_Menu_Name.replace("]", "");
You can replace the above two lines with a single replaceAll call using a regular expression that matches either bracket, if you wish.
After you have removed those characters, you can split your string as follows:
String[] chunks = Order_Menu_Name.split(",");
int i = 0;
while (i < chunks.length) {
    System.out.println(chunks[i]);
    i++;
}
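Combining the steps above into one sketch: a single `replaceAll` with a character class strips both brackets, and splitting on a run of hyphens, commas and whitespace yields each value separately. The separator set `[-,\\s]+` is an assumption based on the sample `[pohe-7, puri-3]`; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class MenuSplit {
    // Hypothetical helper: "[pohe-7, puri-3]" -> [pohe, 7, puri, 3]
    static String[] menuTokens(String orderMenuName) {
        // Remove every '[' and ']' in one pass
        String withoutBrackets = orderMenuName.replaceAll("[\\[\\]]", "");
        // Treat any run of '-', ',' or whitespace as one separator
        return withoutBrackets.trim().split("[-,\\s]+");
    }

    public static void main(String[] args) {
        String[] tokens = menuTokens("[pohe-7, puri-3]");
        // prints "pohe 7 puri 3"
        System.out.println(String.join(" ", tokens));
    }
}
```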
You can pass one or two parameters to Java's split() method: the first is the regular expression that defines the pattern to be found, and the optional second argument is limit, which controls how many chunks are returned:
public String[] split(String regex, int limit)
or
public String[] split(String regex)
For example
String str = "Welcome-to-Stackoverflow.com";
for (String retval : str.split("-", 3)) {
    System.out.println(retval);
}
When splitting the above str using the separator "-", you should get 3 chunks of strings, as follows:
Welcome
to
Stackoverflow.com
If you pass split a limit of 2 instead of 3, you get the following:
Welcome
to-Stackoverflow.com
Notice that "to-Stackoverflow.com" is returned as is, because we limited the chunks to 2.
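The limit argument also controls what happens to trailing empty strings, which is what the `-1` in the first question relies on. Per the `String.split` documentation, a positive limit caps the array length, zero (the one-argument default) drops trailing empty strings, and a negative limit keeps them all. A small sketch on a made-up input:

```java
import java.util.Arrays;

public class SplitLimits {
    public static void main(String[] args) {
        String s = "a,b,,,";
        // limit 0 (same as split(",")): trailing empty strings are removed
        System.out.println(Arrays.toString(s.split(",")));     // [a, b]
        // negative limit: every field is kept, including trailing empty ones
        System.out.println(Arrays.toString(s.split(",", -1))); // [a, b, , , ]
        // positive limit: at most 2 elements; the last one holds the rest of the input
        System.out.println(Arrays.toString(s.split(",", 2)));  // [a, b,,,]
    }
}
```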

String.replace() not replacing all occurrences

I have a very long string which looks similar to this.
355,356,357,358,359,360,361,382,363,364,365,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,368,369,313,370,371,372,373,374,375,376,377,378,379,380,381,382,382,382,382,382,382,383,384,385,380,381,382,382,382,382,382,386,387,388,389,380,381,382,382,382,382,382,382,390,391,380,381,382,382,382,382,382,392,393,394,395,396,397,398,399,....
When I tried using the following code to remove the number 382 from the string.
String str = "355,356,357,358,359,360,361,382,363,364,365,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,368,369,313,370,371,372,373,374,375,376,377,378,379,380,381,382,382,382,382,382,382,383,384,385,380,381,382,382,382,382,382,386,387,388,389,380,381,382,382,382,382,382,382,390,391,380,381,382,382,382,382,382,392,393,394,395,396,397,398,399,...."
str = str.replace(",382,", ",");
But it seems that not all occurrences are being replaced. The string which originally had above 3000 occurrences still was left with about 630 occurrences after replacing.
Is the capability of String.replace() limited? If so, is there a possible way of achieving what I need?
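You need to replace the trailing comma as well (if one exists, which it won't if 382 is last in the list):
str = str.replaceAll("\\b382\\b,?", "").replaceAll(",$", "");
Note the \b word boundaries, which prevent matching the 382 inside numbers such as 1382 or 3821, and the second replaceAll, which removes the comma left behind when 382 is the last element.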
The above will convert:
382,111,382,1382,222,382
to:
111,1382,222
I think the issue is your first argument to replace(), in particular the comma (,) before and after 382. If you have "382,382,383", you will only match the inner ",382," and leave the initial one behind. Try:
str = str.replace("382,", "");
Although this will fail to match "382" at the very end, as it does not have a comma after it.
A full solution might entail two method calls thus:
str = str.replace("382", ""); // Remove all instances of 382
str = str.replaceAll(",,+", ","); // Compress runs of two or more commas into one
This combines the two approaches:
str = str.replaceAll("382,?", ""); // Remove 382 and an optional comma after it
Note: both of the last two approaches leave a trailing comma if 382 is at the end.
Try this:
str = str.replaceAll(",382,", ",");
First, drop the leading comma from your search string. Then collapse runs of commas into a single comma using a regular expression.
String input = "355,356,357,358,359,360,361,382,363,364,365,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,368,369,313,370,371,372,373,374,375,376,377,378,379,380,381,382,382,382,382,382,382,383,384,385,380,381,382,382,382,382,382,386,387,388,389,380,381,382,382,382,382,382,382,390,391,380,381,382,382,382,382,382,392,393,394,395,396,397,398,399";
String result = input.replace("382,", ","); // turn each "382," into a lone comma
String result2 = result.replaceAll("[,]+", ","); // collapse runs of commas into one
System.out.println(result2);
As dave already said, the problem is that your pattern overlaps. In the string "...,382,382,..." there are two occurrences of ",382,":
"...,382,382,..."
    -----         first occurrence
        -----     second occurrence
These two occurrences overlap at the comma, and thus Java can only replace one of them. While searching for occurrences, it does not look at the text produced by earlier replacements, so it does not notice that replacing the first occurrence with a comma creates a new occurrence of ",382,".
If your data is known not to contain numbers with more than 3 digits, then you might do:
str = str.replace("382,", "");
and then handle occurrences at the end as a special case. But if your data can contain big numbers, then "...,1382,..." will be replaced by "...,1,..." which probably is not what you want.
Here are two solutions that do not have the above problem:
First, simply repeat the replacement until no changes occur anymore:
String oldString = str;
str = str.replace(",382,", ",");
while (!str.equals(oldString)) {
oldString = str;
str = str.replace(",382,", ",");
}
After that, you will have to handle possible occurrences at the end of the string.
Second, if you have Java 8, you can do a little more work yourself and use Java streams:
str = Arrays.stream(str.split(","))
.filter(s -> !s.equals("382"))
.collect(Collectors.joining(","));
This first splits the string at ",", then filters out all strings which are equal to "382", and then concatenates the remaining strings again with "," in between.
(Both code snippets are untested.)
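To see both points from this answer in action, here is a runnable sketch that wraps the snippets above in methods (the class and method names are made up for illustration) and runs them on a short sample containing adjacent 382s, a 1382 and a trailing 382.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class Remove382 {
    // Naive single pass: overlapping ",382," matches are missed
    static String naive(String str) {
        return str.replace(",382,", ",");
    }

    // Repeat the replacement until the string stops changing
    static String repeatUntilStable(String str) {
        String oldString;
        do {
            oldString = str;
            str = str.replace(",382,", ",");
        } while (!str.equals(oldString));
        return str; // a leading or trailing 382 is still not handled
    }

    // Split, drop tokens that are exactly "382", join again
    static String viaStreams(String str) {
        return Arrays.stream(str.split(","))
                .filter(s -> !s.equals("382"))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String str = "1,382,382,2,1382,382";
        System.out.println(naive(str));             // 1,382,2,1382,382
        System.out.println(repeatUntilStable(str)); // 1,2,1382,382
        System.out.println(viaStreams(str));        // 1,2,1382
    }
}
```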
Traditional way:
String str = ",abc,null,null,0,0,7,8,9,10,11,12,13,14";
String newStr = "", word = "";
for (int i=0; i<str.length(); i++) {
if (str.charAt(i) == ',') {
if (word.equals("null") || word.equals("0"))
word = "";
newStr += word+",";
word = "";
} else {
word += str.charAt(i);
if (i == str.length()-1)
newStr += word;
}
}
System.out.println(newStr);
Output:
,abc,,,,,7,8,9,10,11,12,13,14

String.split(String pattern) Java method is not working as intended

I'm using String.split() to divide some Strings containing IPs, but it's returning an empty array. I fixed my problem using String.substring(), but I'm wondering why split is not working as intended. My code is:
// filtrarIPs("196.168.0.1 127.0.0.1 255.23.44.1 100.168.100.1 90.168.0.1","168");
public static String filtrarIPs(String ips, String filtro) {
String resultado = "";
String[] lista = ips.split(" ");
for (int c = 0; c < lista.length; c++) {
String[] ipCorta = lista[c].split("."); // Returns an empty array
if (ipCorta[1].compareTo(filtro) == 0) {
resultado += lista[c] + " ";
}
}
return resultado.trim();
}
It should return a String[] such as {"196", "168", "0", "1"}....
split works with regular expressions. '.' in regular expression notation matches any single character. To use split to split on an actual dot, you must escape it like this: split("\\.").
Use
String[] ipCorta = lista[c].split("\\.");
In regular expressions, the . matches almost any character.
If you want to match a literal dot, you have to escape it (\\.).
Your statement
lista[c].split(".")
will split the first String "196.168.0.1" on every character, because String.split takes a regular expression as its argument and . matches any character.
However, the reason you are getting an empty array is that split also removes all trailing empty Strings from the result.
For example, consider the following statement:
String[] tiles = "aaa".split("a");
This splits the String into four empty values: one before each "a" and one after the last. Because trailing empty values are removed, the resulting array is empty: [].
If you have the following statement:
String[] tiles = "aaab".split("a");
it will split the String into three empty values and one filled value b like [ , , , "b"]
Since there are no trailing empty values, the result remains with these four values.
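The behaviour described in this answer can be checked directly with a few lines (the class name is arbitrary); the last two calls show why "196.168.0.1".split(".") comes back empty while the escaped version works.

```java
import java.util.Arrays;

public class DotSplitDemo {
    public static void main(String[] args) {
        // "aaa" yields four empty strings; all of them are trailing, so they are removed
        System.out.println("aaa".split("a").length);                     // 0
        // With a negative limit the four empty strings are kept
        System.out.println("aaa".split("a", -1).length);                 // 4
        // The trailing "b" keeps the three empty strings in front of it
        System.out.println(Arrays.toString("aaab".split("a")));          // [, , , b]
        // Unescaped "." matches every character, so every piece is empty and removed
        System.out.println("196.168.0.1".split(".").length);             // 0
        // Escaped "\\." matches only the literal dots
        System.out.println(Arrays.toString("196.168.0.1".split("\\."))); // [196, 168, 0, 1]
    }
}
```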
Since you don't want to split on every character, you have to escape the dot in the regular expression like this:
lista[c].split("\\.")
String.split() takes a regular expression as its parameter, so you have to escape the period (which matches any character). So use split("\\.") instead.
This may help you:
public static void main(String[] args){
String ips = "196.168.0.1 127.0.0.1 255.23.44.1 100.168.100.1 90.168.0.1";
String[] lista = ips.split(" ");
for(String s: lista){
for(String s2: s.split("\\."))
System.out.println(s2);
}
}
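For completeness, a sketch of the question's `filtrarIPs` with the split fixed. It uses `Pattern.quote(".")`, which is equivalent to `"\\."` here, and a `StringBuilder` instead of string concatenation; the length guard for malformed entries is an addition not present in the original code.

```java
import java.util.regex.Pattern;

public class FiltrarIPs {
    // Same logic as the question, with the dot treated as a literal character
    public static String filtrarIPs(String ips, String filtro) {
        StringBuilder resultado = new StringBuilder();
        for (String ip : ips.split(" ")) {
            String[] ipCorta = ip.split(Pattern.quote(".")); // same as split("\\.")
            // Guard against entries with fewer than two octets
            if (ipCorta.length > 1 && ipCorta[1].equals(filtro)) {
                resultado.append(ip).append(' ');
            }
        }
        return resultado.toString().trim();
    }

    public static void main(String[] args) {
        // prints 196.168.0.1 100.168.100.1 90.168.0.1
        System.out.println(filtrarIPs(
                "196.168.0.1 127.0.0.1 255.23.44.1 100.168.100.1 90.168.0.1", "168"));
    }
}
```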
