Split Strings separated by an arbitrary character - java

Say we would like to write a method that receives an entire book as a string, together with an arbitrary single-character delimiter, and returns the separated strings. I came up with the following implementation in Java (assume there are no consecutive delimiters, etc.):
ArrayList<String> separater(String book, char delimiter) {
    ArrayList<String> ret = new ArrayList<>();
    String word = "";
    for (int i = 0; i < book.length(); ++i) {
        if (book.charAt(i) != delimiter) {
            word += book.charAt(i);
        } else {
            ret.add(word);
            word = "";
        }
    }
    ret.add(word); // the last piece has no delimiter after it
    return ret;
}
Question: Is there any way to leverage String.split() for a shorter solution? I ask because I could not find a general way of building a regex for an arbitrary delimiter character. For example:
String.split("\\.") if the delimiter is '.'
String.split("\\s+"); if the delimiter is ' ' // space character
That means I could not find a general way of generating the regex argument of split() from the input delimiter character. Any suggestions?

String[] array = string.split(Pattern.quote(String.valueOf(delimiter)));
That said, the Guava Splitter is much more versatile and better behaved than String.split().
And a note on your method: concatenating to a String in a loop is very inefficient. As Strings are immutable, it produces a lot of temporary Strings and StringBuilders. You should use a StringBuilder instead.
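As a minimal sketch of both suggestions above (the class and method names here are made up for illustration), the first method quotes the delimiter so that regex metacharacters such as '.' or '|' are taken literally, and the second is the manual loop rewritten around a StringBuilder:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class DelimiterSplit {
    // Quote the delimiter so that String.split() treats it as a literal.
    static String[] splitQuoted(String book, char delimiter) {
        return book.split(Pattern.quote(String.valueOf(delimiter)));
    }

    // Manual loop that collects each piece in a reusable StringBuilder.
    static List<String> splitManual(String book, char delimiter) {
        List<String> parts = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (int i = 0; i < book.length(); i++) {
            char c = book.charAt(i);
            if (c != delimiter) {
                word.append(c);
            } else {
                parts.add(word.toString());
                word.setLength(0); // reset the builder for the next piece
            }
        }
        parts.add(word.toString()); // last piece, which has no delimiter after it
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", splitQuoted("a.b.c", '.')));
        System.out.println(String.join("|", splitManual("a.b.c", '.')));
    }
}
```

One difference to keep in mind: String.split() drops trailing empty strings, whereas the manual loop keeps them.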

Related

Efficiently split large strings in Java

I have a large string that should be split at a certain character, if it is not preceded by another certain character.
What is the most efficient way to do this?
An example: Split this string at ':', but not at "?:":
part1:part2:https?:example.com:anotherstring
What I have tried so far:
Regex (?<!\?):. Very slow.
First getting the indices where to split the string and then split it. Only efficient if there are not many split characters in the string.
Iterating over the string character by character. Efficient if there are not many protect characters (e.g. '?').
I fear you will have to go through the string and check whether each ":" is preceded by a "?":
int lastIndex = 0; // start of the piece currently being collected
// always continue searching after the last ':' found, even if it was skipped
for (int index = string.indexOf(':'); index >= 0; index = string.indexOf(':', index + 1)) {
    if (index == 0 || string.charAt(index - 1) != '?') {
        String splitString = string.substring(lastIndex, index);
        // add splitString to list or array
        lastIndex = index + 1;
    }
}
// add string.substring(lastIndex) to list or array
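The indexOf-based idea can be wrapped into a self-contained method. This is only a sketch: the class name, method name and parameters are hypothetical, and it has not been benchmarked against the regex version.

```java
import java.util.ArrayList;
import java.util.List;

class ProtectedSplit {
    // Splits s at every delimiter that is not directly preceded by protector.
    static List<String> split(String s, char delimiter, char protector) {
        List<String> parts = new ArrayList<>();
        int start = 0; // start of the piece currently being collected
        for (int i = s.indexOf(delimiter); i >= 0; i = s.indexOf(delimiter, i + 1)) {
            if (i == 0 || s.charAt(i - 1) != protector) {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(s.substring(start)); // remainder after the last split point
        return parts;
    }

    public static void main(String[] args) {
        String input = "part1:part2:https?:example.com:anotherstring";
        System.out.println(String.join(" | ", split(input, ':', '?')));
    }
}
```

The search always resumes one position after the last delimiter found, so a protected delimiter is skipped rather than examined again.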
You will have to test this very carefully (since I didn't do that), but using a regular expression in the split() might produce the results you want:
public static void main(String[] args) {
String s = "Start.Teststring.Teststring1?.Teststring2.?Teststring3.?.End";
String[] result = s.split("(?<!\\?)\\.(?!\\.)");
System.out.println(String.join("|", result));
}
Output:
Start|Teststring|Teststring1?.Teststring2|?Teststring3|?.End
Note:
This only covers your example of splitting at a dot that is not preceded by a question mark.
I don't think you will get a much more performant solution than the regex...

String.replace() not replacing all occurrences

I have a very long string which looks similar to this.
355,356,357,358,359,360,361,382,363,364,365,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,368,369,313,370,371,372,373,374,375,376,377,378,379,380,381,382,382,382,382,382,382,383,384,385,380,381,382,382,382,382,382,386,387,388,389,380,381,382,382,382,382,382,382,390,391,380,381,382,382,382,382,382,392,393,394,395,396,397,398,399,....
I tried using the following code to remove the number 382 from the string:
String str = "355,356,357,358,359,360,361,382,363,364,365,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,368,369,313,370,371,372,373,374,375,376,377,378,379,380,381,382,382,382,382,382,382,383,384,385,380,381,382,382,382,382,382,386,387,388,389,380,381,382,382,382,382,382,382,390,391,380,381,382,382,382,382,382,392,393,394,395,396,397,398,399,....";
str = str.replace(",382,", ",");
But it seems that not all occurrences are being replaced. The string which originally had above 3000 occurrences still was left with about 630 occurrences after replacing.
Is the capability of String.replace() limited? If so, is there a possible way of achieving what I need?
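You need to replace the trailing comma as well (if one exists, which it won't if 382 is last in the list):
str = str.replaceAll("\\b382,?", "");
Note the \b word boundary, which prevents matching inside a longer number such as "1382".
The above will convert:
382,111,382,1382,222,382
to:
111,1382,222,
A trailing comma is left behind when 382 is the last element, because the comma before it is not part of the match.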
I think the issue is your first argument to replace(), in particular the comma (,) before and after 382. If you have "382,382,383", you will only match the inner ",382," and leave the initial one behind. Try:
str = str.replace("382,", "");
Although this will fail to match "382" at the very end as it does not have a comma after it.
A full solution might entail two method calls thus:
str = str.replace("382", ""); // Remove all instances of 382
str = str.replaceAll(",,+", ","); // Compress all duplicates, triplicates, etc. of commas
This combines the two approaches:
str = str.replaceAll("382,?", ""); // Remove 382 and an optional comma after it.
Note: both of the last two approaches leave a trailing comma if 382 is at the end.
Try this:
str = str.replaceAll(",382,", ",");
First, drop the leading comma from your search string. Then collapse each run of commas into a single comma using a Java regular expression.
String input = "355,356,357,358,359,360,361,382,363,364,365,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,360,361,363,366,368,369,313,370,371,372,373,374,375,376,377,378,379,380,381,382,382,382,382,382,382,383,384,385,380,381,382,382,382,382,382,386,387,388,389,380,381,382,382,382,382,382,382,390,391,380,381,382,382,382,382,382,392,393,394,395,396,397,398,399";
String result = input.replace("382,", ","); // remove the preceding comma
String result2 = result.replaceAll("[,]+", ","); // replace duplicate commas
System.out.println(result2);
As dave already said, the problem is that your pattern overlaps. In the string "...,382,382,..." there are two occurrences of ",382,":
"...,382,382,..."
    -----          first occurrence
        -----      second occurrence
These two occurrences overlap at the comma, so Java can replace only one of them. While searching for occurrences, it does not yet look at the replacement text, so it does not notice that a new occurrence of ",382," is created when the first occurrence is replaced by the comma.
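The overlap can be reproduced on a tiny input. In this sketch (the class and method names are made up for illustration), a single replace() pass leaves one 382 behind, and a second pass removes it:

```java
class OverlapDemo {
    // One pass of the replacement from the question.
    static String replaceOnce(String s) {
        return s.replace(",382,", ",");
    }

    public static void main(String[] args) {
        String s = "1,382,382,2";
        String first = replaceOnce(s);      // the two matches share a comma
        String second = replaceOnce(first); // the leftover is matched now
        System.out.println(first);  // prints 1,382,2
        System.out.println(second); // prints 1,2
    }
}
```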
If your data is known not to contain numbers with more than 3 digits, then you might do:
str = str.replace("382,", "");
and then handle occurrences at the end as a special case. But if your data can contain big numbers, then "...,1382,..." will be replaced by "...,1,..." which probably is not what you want.
Here are two solutions that do not have the above problem:
First, simply repeat the replacement until no changes occur anymore:
String oldString = str;
str = str.replace(",382,", ",");
while (!str.equals(oldString)) {
oldString = str;
str = str.replace(",382,", ",");
}
After that, you will have to handle possible occurrences at the end of the string.
Second, if you have Java 8, you can do a little more work yourself and use Java streams:
str = Arrays.stream(str.split(","))
.filter(s -> !s.equals("382"))
.collect(Collectors.joining(","));
This first splits the string at ",", then filters out all strings which are equal to "382", and then concatenates the remaining strings again with "," in between.
(Both code snippets are untested.)
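For reference, here is a self-contained version of the stream approach with the imports it needs. The class name, method name and parameters are illustrative only.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class RemoveValue {
    // Removes every element equal to value from a comma-separated list.
    static String removeValue(String csv, String value) {
        return Arrays.stream(csv.split(","))
                .filter(s -> !s.equals(value))
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(removeValue("382,111,382,1382,222,382", "382"));
    }
}
```

Because whole elements are compared, "1382" is left untouched, and occurrences at the start or end of the list need no special handling. Note that split(",") discards trailing empty elements, so a trailing comma in the input does not survive.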
Traditional way:
String str = ",abc,null,null,0,0,7,8,9,10,11,12,13,14";
String newStr = "", word = "";
for (int i=0; i<str.length(); i++) {
if (str.charAt(i) == ',') {
if (word.equals("null") || word.equals("0"))
word = "";
newStr += word+",";
word = "";
} else {
word += str.charAt(i);
if (i == str.length()-1)
newStr += word;
}
}
System.out.println(newStr);
Output:
,abc,,,,,7,8,9,10,11,12,13,14

Replacing Only Certain White Spaces In a String

I have a string queryInputNameString that is equal to "fir, spotted owl", and I'm trying to use replaceAll() to remove the whitespace and split() to separate the elements into the inputNameArray array wherever a comma occurs.
String noSpaces = queryInputNameString.replaceAll("\\s+","");
String[] inputNameArray = noSpaces.split("\\,");
So far the above returns:
fir
spottedowl
but I would like it to remove only the whitespace that occurs immediately before or after a comma and return this:
fir
spotted owl
How can I make my code ignore white spaces that are not preceded/followed by a comma?
Thanks.
Since split() accepts a regex as argument, you can directly do this:
String[] inputNameArray = queryInputNameString.split("\\s*\\,\\s*");
Otherwise, if you really want to replace only spaces after a comma, you can use:
String noSpaces = queryInputNameString.replaceAll(",\\s+",",");
You do not actually need a more sophisticated regex. If you just split by comma first and then trim each array element, you will get the desired result.
This approach might prove less efficient when dealing with a lot of data.
String[] inputArray = queryInputNameString.split(",");
for (int i = 0; i < inputArray.length; ++i) {
    inputArray[i] = inputArray[i].trim();
}
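A small runnable sketch of the regex-based split, for comparison. The class and method names are hypothetical, and the extra trim() is an addition of this sketch so that whitespace at the very start or end of the input is dropped as well.

```java
class CommaSplit {
    // Splits on commas and swallows any whitespace around each comma.
    static String[] splitOnCommas(String input) {
        return input.trim().split("\\s*,\\s*");
    }

    public static void main(String[] args) {
        for (String name : splitOnCommas("fir, spotted owl")) {
            System.out.println(name);
        }
    }
}
```

Whitespace inside an element, as in "spotted owl", is untouched because it is not adjacent to a comma.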

the best way for character replacement in String in java

I want to go through a string and, for each character, either replace it with another character or keep it. Because the string is long, the running time matters a lot. Which of the following is the best way, or is there a better idea?
For all of them, I append the result to a StringBuilder.
Check every character using a for loop and charAt().
Use a switch, otherwise like the previous way.
Use replaceAll() twice.
And if one of the first two methods is better, is there any way to check a character against a group of characters, like:
if (st.charAt(i)=='a'..'z') ....
Edit:
Please tell me which way takes the least time, and explain why. I already know all of the ways you mentioned!
If you want to replace a single character (or a single sequence), use replace(), as other answers have suggested.
If you want to replace several characters (e.g., 'a', 'b', and 'c') with a single substitute character or character sequence (e.g., "X"), you should use a regular expression replace:
String result = original.replaceAll("[abc]", "X");
If you want to replace several characters, each with a different replacement (e.g., 'a' with 'A', 'b' with 'B'), then looping through the string yourself and building the result in a StringBuilder will probably be the most efficient. This is because, as you point out in your question, you will be going through the string only once.
StringBuilder sb = new StringBuilder();
String targets = "abc";
String replacements = "ABC";
for (int i = 0; i < original.length(); ++i) {
    char c = original.charAt(i);
    // position of c in targets, or -1 if c should be kept as-is
    int loc = targets.indexOf(c);
    sb.append(loc >= 0 ? replacements.charAt(loc) : c);
}
String result = sb.toString();
Check the documentation and find some good methods:
char from = 'a';
char to = 'b';
str = str.replace(from, to);
String replaceSample = "This String replace Example shows how to replace one char from String";
String newString = replaceSample.replace('r', 't');
Output: This Stting teplace Example shows how to teplace one chat ftom Stting
Also, you could use contains:
str1.toLowerCase().contains(str2.toLowerCase())
To check if the substring str2 exists in str1
Edit:
I just read that the String comes from a file. You can use a regex for this. That would be the best method.
http://docs.oracle.com/javase/tutorial/essential/regex/literals.html
This is your comment:
I want to replace all of the uppercases to lower cases and replace all
of the characters except a-z with space.
You can do it like this:
str = str.toLowerCase().replaceAll("[^a-z]", " ");
Your requirement should be part of the question, not in comment #7 under a posted answer...
You should look into regex for Java. You can match an entire set of characters. Strings have several methods, such as replace, replaceAll, and matches, which you may find useful here.
You can match the set of letters, for instance, using [a-zA-Z], which may be what you're looking for.
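The range check the question asks about can be written with two comparisons per character. The sketch below (hypothetical class and method names, ASCII letters assumed) does the lowercase-and-blank-out transformation in a single pass with a StringBuilder. It is likely to do less work than toLowerCase() followed by replaceAll(), but that is worth measuring on real data before relying on it.

```java
class Normalize {
    // Lowercases A-Z, keeps a-z, and turns every other character into a space.
    // Assumes only ASCII letters matter; anything else becomes a space.
    static String normalize(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'A' && c <= 'Z') {
                sb.append((char) (c + ('a' - 'A'))); // shift to lowercase
            } else if (c >= 'a' && c <= 'z') {
                sb.append(c);
            } else {
                sb.append(' ');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("Hello, World 42!"));
    }
}
```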

How could I split a string into valid substrings regardless of how many blanks separate them

For example: after execution, the output of the String "hello world yo" and "hello   world  yo" should be strictly the same.
What's more, the output should be a String[] in which:
String[0] == "hello"; String[1] == "world"; String[2] == "yo";
so that other methods can deal with the actual words later.
I was thinking about String.split(" "), but the number of blanks between the words is unknown, and that would then cause an exception.
You can use
String.split("\\s+") // one or more whitespace.
Don't use == for string comparison; use String.equals() instead.
Edit for question in comment
what's the notation called? what if there is one or more "_" or "\n" ?
As you can see, the String#split() API accepts a regex as its parameter. \s is the shorthand character class for whitespace, whereas + repeats the previous item one or more times.
Now if you want to split a String on
_ i.e. underscore --> "this__is_test".split("[_]+");
\n i.e. newline --> "this__is_test\n new line".split("\\r?\\n");
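A short runnable sketch of the whitespace case (the class and method names are made up). The trim() and the empty check are additions of this sketch: without them, leading whitespace produces an empty first element, and an empty input produces an array holding one empty string.

```java
class WhitespaceSplit {
    // Splits on runs of whitespace, ignoring leading and trailing blanks.
    static String[] words(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // nothing but whitespace
        }
        return trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(String.join("|", words("hello world yo")));
        System.out.println(String.join("|", words("hello   world  yo")));
    }
}
```

Both calls print the same three words, which is the behaviour the question asks for.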
Regex Tutorial
You can split on "\\s+". That splits on one or more whitespace characters.
String.split() takes a regexp, so you can simply do String.split(" +").
I think the split function takes regex, but if it doesn't then the below works.
The regex in this might not be right, but it demonstrates the concept of what you're trying to do.
Pattern p = Pattern.compile("(.*?) *(.*)");
Matcher m = p.matcher(s);
if (m.matches()) {
    String name = m.group(1);
    String value = m.group(2);
    // print every captured group; only valid after a successful match
    for (int i = 1; i <= m.groupCount(); i++) {
        System.out.println(m.group(i));
    }
}
You can use a regular expression to split the String:
String.split(regularExpression);
For multiple whitespace characters, use the regular expression "\\s+", where \s matches any whitespace character.
The "==" operator tests whether its left and right operands are equal. Strings are objects, so == compares their references (like pointers in C), not their contents.
So if you want to compare the contents of two Strings, use the equals() method of String,
e.g. str1.equals(str2)
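The difference is easy to see with a string that is created explicitly with new. This is a minimal sketch with a made-up class name:

```java
class EqualsDemo {
    // Content comparison, as recommended above.
    static boolean sameContent(String a, String b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        String a = "hello";
        String b = new String("hello"); // a distinct object with the same content
        System.out.println(a == b);            // prints false: different references
        System.out.println(sameContent(a, b)); // prints true: same characters
    }
}
```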
