Java equivalent of Python's String partition

Java's string split(regex) function splits at all instances of the regex. Python's partition function only splits at the first instance of the given separator, and returns a tuple of {left,separator,right}.
How do I achieve what partition does in Java?
e.g.
"foo bar hello world".partition(" ")
should become
"foo", " ", "bar hello world"
Is there an external library which provides this utility already?
How would I achieve it without an external library?
And can it be achieved without an external library and without Regex?
NB. I'm not looking for split(" ",2) as it doesn't return the separator character.

The String.split(String regex, int limit) is close to what you want. From the documentation:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array.
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
If n is non-positive then the pattern will be applied as many times as possible and the array can have any length.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Here's an example to show these differences (as seen on ideone.com):
static void dump(String[] ss) {
    for (String s : ss) {
        System.out.print("[" + s + "]");
    }
    System.out.println();
}
public static void main(String[] args) {
    String text = "a-b-c-d---";
    dump(text.split("-"));
    // prints "[a][b][c][d]"
    dump(text.split("-", 2));
    // prints "[a][b-c-d---]"
    dump(text.split("-", -1));
    // prints "[a][b][c][d][][][]"
}
A partition that keeps the delimiter
If you need functionality similar to partition, and you also want the delimiter string that was matched by an arbitrary pattern, you can use a Matcher and then take substrings at the appropriate indices.
Here's an example (as seen on ideone.com):
import java.util.NoSuchElementException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// returns {text before the match, the matched delimiter, text after the match}
static String[] partition(String s, String regex) {
    Matcher m = Pattern.compile(regex).matcher(s);
    if (m.find()) {
        return new String[] {
            s.substring(0, m.start()),
            m.group(),
            s.substring(m.end()),
        };
    } else {
        throw new NoSuchElementException("Can't partition!");
    }
}
public static void main(String[] args) {
    dump(partition("james007bond111", "\\d+"));
    // prints "[james][007][bond111]"
}
The regex \d+ of course is any digit character (\d) repeated one-or-more times (+).

While not exactly what you want, there's a second version of split which takes a "limit" parameter, telling it the maximum number of partitions to split the string into.
So if you called (in Java):
"foo bar hello world".split(" ", 2);
You'd get the array:
["foo", "bar hello world"]
which is more or less what you want, except for the fact that the separator character isn't embedded at index 1. If you really need this last point, you'd need to do it yourself, but hopefully all you specifically wanted was the ability to limit the number of splits.

How about this:
String[] partition(String string, String separator) {
    // note: split() treats separator as a regex, and parts[1] does not exist if nothing matches
    String[] parts = string.split(separator, 2);
    return new String[] {parts[0], separator, parts[1]};
}
BTW, you have to add some input/result checks to this :)
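The input/result checks mentioned above could look roughly like the following. This is only a sketch: the class name SplitPartition is made up for illustration, and returning {string, "", ""} when the separator is missing is an assumption borrowed from Python's behaviour, not something the answer specifies. Pattern.quote is used so that a separator such as "." or "+" is matched literally rather than as a regex.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitPartition {
    // Hypothetical helper: mimics Python's partition on top of split(regex, 2).
    static String[] partition(String string, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("empty separator");
        }
        // Pattern.quote makes the separator match literally, even if it is "." or "+".
        String[] parts = string.split(Pattern.quote(separator), 2);
        if (parts.length < 2) {
            // Separator not found (assumption: behave like Python and return the input plus two empty strings).
            return new String[] {string, "", ""};
        }
        return new String[] {parts[0], separator, parts[1]};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(partition("foo bar hello world", " ")));
        System.out.println(Arrays.toString(partition("a.b.c", ".")));
        System.out.println(Arrays.toString(partition("nosep", ",")));
    }
}
```

Because the limit is positive, a trailing empty piece is kept, so an input that ends with the separator still yields three elements.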

Use:
"foo bar hello world".split(" ", 2)
Note that, unlike Python's split(), Java's split has no default delimiter, so the separator has to be passed explicitly.

Is there an external library which provides this utility already?
None that I know of.
how would I achieve it without an external library?
And can it be achieved without an external library and without Regex?
Sure, that's no problem at all; just use String.indexOf() and String.substring(). However, Java does not have a tuple datatype, so you'll have to return an array or a List, or write your own result class.
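A minimal sketch of that indexOf()/substring() approach, returning a String[] in place of a tuple. The class name IndexOfPartition and the choice to return {s, "", ""} when the separator is absent (mirroring Python) are assumptions made for illustration.

```java
import java.util.Arrays;

public class IndexOfPartition {
    // Hypothetical helper: returns {left, separator, right}, splitting at the first occurrence only.
    static String[] partition(String s, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("empty separator");
        }
        int i = s.indexOf(separator);
        if (i < 0) {
            // Separator not found (assumption: same result shape as Python's partition).
            return new String[] {s, "", ""};
        }
        return new String[] {
            s.substring(0, i),
            separator,
            s.substring(i + separator.length())
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(partition("foo bar hello world", " ")));
        System.out.println(Arrays.toString(partition("key=>value=>x", "=>")));
    }
}
```

No regex is involved, so separators like "." or "|" need no escaping.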

Related

How can I split a string without knowing the split characters a-priori?

For my project I have to read various input graphs. Unfortunately, the input edges do not all have the same format. Some of them are comma-separated, others are tab-separated, etc. For example:
File 1:
123,45
67,89
...
File 2:
123 45
67 89
...
Rather than handling each case separately, I would like to automatically detect the split characters. Currently I have developed the following solution:
String str = "123,45";
String splitChars = "";
for (int i = 0; i < str.length(); i++) {
    if (!Character.isDigit(str.charAt(i))) {
        splitChars += str.charAt(i);
    }
}
String[] endpoints = str.split(splitChars);
Basically I pick the first row and select all the non-numeric characters, then I use the generated substring as split characters. Is there a cleaner way to perform this?
split() takes a regexp, so your code can fail for several reasons. If the separator has a meaning in regexp (say, +), it'll fail. If there is more than one non-digit character, your code will also fail. If your input contains more than two numbers, it will fail as well. Imagine it contains hello, world - then your splitChars string becomes " , " - and your split would do nothing useful (it would split the string "test , abc" in two, nothing else).
Why not make a regexp to fetch digits, and then find all sequences of digits, instead of focussing on the separators?
You're using regexps whether you want to or not, so let's make it official and use Pattern, while we are at it.
private static final Pattern ALL_DIGITS = Pattern.compile("\\d+");
// then in your split method..
Matcher m = ALL_DIGITS.matcher(str);
List<Integer> numbers = new ArrayList<Integer>();
// don't use arrays, generally. List is better.
while (m.find()) {
    numbers.add(Integer.parseInt(m.group(0)));
}
\\d+ is: one or more digits.
m.find() finds the next match (so, the next block of digits), returning false if there aren't any more.
m.group(0) retrieves the entire matched string.
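Put together as a self-contained sketch, the digit-matching approach might look like this. The class name EdgeParser and the method name parseNumbers are made up for illustration, and the sketch assumes every number fits in an int (Integer.parseInt throws NumberFormatException otherwise).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EdgeParser {
    private static final Pattern ALL_DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: collects every run of digits in the line, whatever separates them.
    static List<Integer> parseNumbers(String line) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = ALL_DIGITS.matcher(line);
        while (m.find()) {
            numbers.add(Integer.parseInt(m.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        for (String line : new String[] {"123,45", "67\t89", "12 + 34"}) {
            System.out.println(parseNumbers(line));
        }
    }
}
```

Since the separator is never turned into a regex, characters such as + or . in the input cause no trouble.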
Split the string on \\D+ which means one or more non-digit characters.
Demo:
import java.util.Arrays;
public class Main {
    public static void main(String[] args) {
        // Test strings
        String[] arr = { "123,45", "67,89", "125 89", "678 129" };
        for (String s : arr) {
            System.out.println(Arrays.toString(s.split("\\D+")));
        }
    }
}
Output:
[123, 45]
[67, 89]
[125, 89]
[678, 129]
Why not split with [^\d]+ (every run of non-digits):
for (String n : "123,456 789".split("[^\\d]+")) {
    System.out.println(n);
}
Result:
123
456
789

How to add a space after certain characters using regex Java

I have a string consisting of 18 characters, e.g. 'abcdefghijklmnopqr'. I need to add a blank space after the 5th character, then after the 9th character, and after the 15th character, making it look like 'abcde fghi jklmno pqr'. Can I achieve this using a regular expression?
Regular expressions are not my cup of tea, so I need help from the regex gurus out here. Any help is appreciated.
Thanks in advance
Regex finds a match in a string and can't perform a replacement by itself. You could however use regex to find a certain matching substring and replace that, but you would still need a separate method for the replacement (making it a two-step algorithm).
Since you're not looking for a pattern in your string, but rather just the n-th char, regex wouldn't be of much use; it would make the solution unnecessarily complex.
Here are some ideas on how you could implement a solution:
Use an array of characters to avoid creating redundant strings: create a character array and copy characters from the string before the given position, put the character at the position, copy the rest of the characters from the String, ... continue until you reach the end of the string. After that construct the final string from that array.
Use the substring() method: concatenate the substring of the string before the position, the new character, the substring of the string after the position and before the next position, ... and so on, until reaching the end of the original string.
Use a StringBuilder and its insert() method.
Note that:
The first idea listed might not be a suitable solution for very large strings. It needs an auxiliary array, using additional space.
The second idea creates redundant strings. Strings are immutable and final in Java (and literals are stored in a pool). Creating temporary strings should be avoided.
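The StringBuilder idea can be sketched as follows. The class and method names are made up for illustration, and the sketch assumes the positions are passed in ascending order and refer to indices in the original string.

```java
public class InsertSpaces {
    // Hypothetical helper: inserts a space at each given index of the original string.
    static String insertSpaces(String s, int... positions) {
        StringBuilder sb = new StringBuilder(s);
        // Work from the last position to the first so that earlier insertions
        // do not shift the indices that are still to be processed.
        for (int i = positions.length - 1; i >= 0; i--) {
            sb.insert(positions[i], ' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(insertSpaces("abcdefghijklmnopqr", 5, 9, 15));
        // prints "abcde fghi jklmno pqr"
    }
}
```

Inserting back to front is the design choice that keeps the index arithmetic trivial; going front to back would require adding an offset after every insertion.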
Yes, you can use regex groups to achieve that. Something like this:
final Pattern pattern = Pattern.compile("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})");
final Matcher matcher = pattern.matcher("abcdefghijklmnopqr");
if (matcher.matches()) {
    // group(0) is the whole match; the capturing groups start at index 1
    String first = matcher.group(1);
    String second = matcher.group(2);
    String third = matcher.group(3);
    String fourth = matcher.group(4);
    return first + " " + second + " " + third + " " + fourth;
} else {
    throw new SomeException(); // placeholder exception type
}
Note that pattern should be a constant; I used a local variable here to make it easier to read.
Compared to substrings, which would also work to achieve the desired result, regex also allows you to validate the format of your input data. In the provided example you check that it's an 18-character string composed of only lowercase letters.
If you had a more interesting example, with for instance a mix of letters and digits, you could check with the regex that each group contains the correct type of data.
You can also do a simpler version where you just replace with:
"abcdefghijklmnopqr".replaceAll("([a-z]{5})([a-z]{4})([a-z]{6})([a-z]{3})", "$1 $2 $3 $4")
But you don't have the benefit of checking, because if the string doesn't match the format it will simply not be replaced, and this is less efficient than substrings.
Here is an example solution that builds the result character by character, which would be more efficient if you don't care about checking:
final Set<Integer> breaks = Set.of(5, 9, 15);
final String str = "abcdefghijklmnopqr";
final StringBuilder stringBuilder = new StringBuilder();
for (int i = 0; i < str.length(); i++) {
    if (breaks.contains(i)) {
        stringBuilder.append(' ');
    }
    stringBuilder.append(str.charAt(i));
}
return stringBuilder.toString();

Java Regex Metacharacters returning extra space while spliting

I want to split a string using a regex instead of StringTokenizer. I am using String.split(regex).
The regex contains metacharacters, and when I use \[ it returns an extra space in the resulting array.
import java.util.Scanner;
public class Solution {
    public static void main(String[] args) {
        Scanner i = new Scanner(System.in);
        String s = i.nextLine();
        String[] st = s.split("[!\\[,?\\._'#\\+\\]\\s\\\\]+");
        System.out.println(st.length);
        for (String z : st)
            System.out.println(z);
    }
}
When I enter the input [a\m], it returns an array of length 3 and prints a and m, with an extra empty element before a.
Can anyone please explain why this is happening and how I can correct it? I don't want the extra space in the resulting array.
Since the [ is at the beginning of the string, when split removes [, two elements appear after the first split step: the empty string at the beginning of the string, and the rest of the string. String#split only removes trailing empty elements (as it is executed with limit=0 by default), so the leading one is kept.
Remove the characters you split on from the start of the string (using .replaceAll("^[!\\[,?._'#+\\]\\s\\\\]+", ""), note the ^ at the beginning of the pattern). Here is some sample code you can leverage:
String[] st = "[a\\m]".replaceAll("^[!\\[,?._'#+\\]\\s\\\\]+", "")
        .split("[!\\[,?._'#+\\]\\s\\\\]+");
System.out.println(st.length);
for (String z : st) {
    System.out.println(z);
}
As an addition to Wiktor Stribiżew’s answer, you may do the same without having to specify the pattern twice, by dealing with the java.util.regex package directly. Removing this redundancy may avoid potential errors and may also be more efficient as the pattern doesn’t need to be parsed twice:
Pattern p = Pattern.compile("[!\\[,?\\._'#\\+\\]\\s\\\\]+");
Matcher m = p.matcher(s);
if (m.lookingAt()) s = m.replaceFirst("");
String[] st = p.split(s);
for (String z : st)
    System.out.println(z);
To be able to use the same pattern, i.e. without having to use the anchor ^ for removing a leading separator, we first check via lookingAt() whether the pattern really matches at the beginning of the text before removing the first occurrence. Then, we proceed with the split operation, but reusing the already prepared Pattern.
Regarding your issue mentioned in a comment, the split operation will always return at least one element, the input string, when there is no match, even when the string is empty. If you wish to have an empty array then, the only solution is to replace the result explicitly:
if(st.length==1 && st[0].isEmpty()) st=new String[0];
or, if you only want to treat an empty string specially, you may check this beforehand:
if(s.isEmpty()) st=new String[0];
else {
// the code as shown above
}
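The pieces discussed in this thread can be combined into one small sketch. The class name SplitEdgeCases and the helper tokens are made up for illustration, and checking for an empty string after stripping the leading separators (rather than beforehand) is an assumption about the desired behaviour, so that an input consisting only of separators also yields an empty array.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitEdgeCases {
    private static final Pattern SEP = Pattern.compile("[!\\[,?._'#+\\]\\s\\\\]+");

    // Hypothetical helper: splits on SEP without a leading empty element.
    static String[] tokens(String s) {
        Matcher m = SEP.matcher(s);
        if (m.lookingAt()) {
            s = m.replaceFirst("");   // drop the separator run at the very start
        }
        if (s.isEmpty()) {
            return new String[0];     // avoid the single empty-string element
        }
        return SEP.split(s);
    }

    public static void main(String[] args) {
        // Plain split keeps the leading empty string but drops trailing ones.
        System.out.println(Arrays.toString(SEP.split("[a\\m]")));
        System.out.println(Arrays.toString(tokens("[a\\m]")));
        System.out.println(tokens("").length);
        System.out.println(tokens("[]").length);
    }
}
```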

Java split() a String made out of the String you are splitting with?

When I compile and run this code:
class StringTest {
    public static void main(String[] args) {
        System.out.println("Begin Test");
        String letters = "AAAAAAA";
        String[] broken = letters.split("A");
        for (int i = 0; i < broken.length; i++)
            System.out.println("Item " + i + ": " + broken[i]);
        System.out.println("End Test");
    }
}
The output to the console is:
Begin Test
End Test
Can anyone explain why split() works like this? I saw some other questions sort of like this on here, but didn't fully understand why there is no output when splitting a string made entirely out of the character that you are using for regex. Why does java handle Strings this way?
String.split discards trailing empty strings. For example, "foo,bar,,".split(",") gets split into {"foo", "bar"}. What you're seeing is a string that consists entirely of the separator, so all the empty splits are "trailing" and get discarded.
You could probably get all those empty strings if you used letters.split("A", -1). Alternately, Guava's Splitter doesn't do things like that unless you ask for it: Splitter.on('A').split(letters).
It is because "A" is used as the delimiter in the split method. Since your string contains no text other than the delimiter "A", you are left with nothing after the split (trailing empty strings are not returned in the resulting array).
Since every character in your input is a delimiter, every string found is blank. By default, every trailing blank found is ignored, hence what you're seeing.
However, split() comes in two flavours. There is a second version of the split() method that accepts another int parameter limit, which controls the number of times the match is to be applied, but also the behaviour of ignoring trailing blanks.
If the limit parameter is negative, trailing blanks are preserved.
If you executed this code:
String letters = "AAAAAAA";
String[] broken = letters.split("A", -1); // note the -1
System.out.println(Arrays.toString(broken));
You get this output (eight empty strings, since seven delimiters produce eight pieces):
[, , , , , , , ]
See the javadoc for more, including examples of how various limit values affect behaviour.

Need to split a string into two parts in java

I have a string which contains a contiguous chunk of digits and then a contiguous chunk of characters. I need to split them into two parts (one integer part, and one string).
I tried using String.split("\\D", 1), but it is eating up the first character.
I checked all the String API and didn't find a suitable method.
Is there any method for doing this thing?
Use lookarounds: str.split("(?<=\\d)(?=\\D)")
String[] parts = "123XYZ".split("(?<=\\d)(?=\\D)");
System.out.println(parts[0] + "-" + parts[1]);
// prints "123-XYZ"
\d is the character class for digits; \D is its negation. So this zero-width assertion matches the position where the preceding character is a digit (?<=\d), and the following character is a non-digit (?=\D).
References
regular-expressions.info/Lookarounds and Character Class
Related questions
Java split is eating my characters.
Is there a way to split strings with String.split() and include the delimiters?
Alternate solution using limited split
The following also works:
String[] parts = "123XYZ".split("(?=\\D)", 2);
System.out.println(parts[0] + "-" + parts[1]);
This splits just before we see a non-digit. This is much closer to your original solution, except that since it doesn't actually match the non-digit character, it doesn't "eat it up". Also, it uses limit of 2, which is really what you want here.
API links
String.split(String regex, int limit)
If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter.
There's always an old-fashioned way:
private String[] split(String in) {
    int indexOfFirstChar = 0;
    for (char c : in.toCharArray()) {
        if (Character.isDigit(c)) {
            indexOfFirstChar++;
        } else {
            break;
        }
    }
    return new String[]{in.substring(0, indexOfFirstChar), in.substring(indexOfFirstChar)};
}
(hope it works with digit-only or char-only Strings too - can't test it here - if not, take it as a general idea)
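To check the digit-only and char-only cases mentioned above, the same idea can be wrapped in a small runnable sketch. The class name DigitPrefixSplit is made up for illustration, and the loop is written with charAt() instead of toCharArray(), which is a minor variation rather than the author's exact code.

```java
import java.util.Arrays;

public class DigitPrefixSplit {
    // Splits the input into its leading run of digits and everything after it.
    static String[] split(String in) {
        int i = 0;
        while (i < in.length() && Character.isDigit(in.charAt(i))) {
            i++;
        }
        // substring(0, 0) and substring(length) both yield "", so the edge cases are safe.
        return new String[] {in.substring(0, i), in.substring(i)};
    }

    public static void main(String[] args) {
        for (String s : new String[] {"123XYZ", "123", "XYZ", ""}) {
            System.out.println(Arrays.toString(split(s)));
        }
    }
}
```

For digit-only or char-only input, one of the two parts is simply the empty string.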
