Split by first found String in Java

Split by first found String in Java - java

is ist possible to tell String.split("(") function that it has to split only by the first found string "("?
Example:
String test = "A*B(A+B)+A*(A+B)";
test.split("(") should result to ["A*B" ,"A+B)+A*(A+B)"]
test.split(")") should result to ["A*B(A+B" ,"+A*(A+B)"]

Yes, absolutely:
test.split("\\(", 2);
As the documentation for String.split(String,int) explains:
The limit parameter controls the number of times the
pattern is applied and therefore affects the length of the resulting
array. If the limit n is greater than zero then the pattern
will be applied at most n - 1 times, the array's
length will be no greater than n, and the array's last entry
will contain all input beyond the last matched delimiter.

test.split("\\(",2);
See javadoc for more info
EDIT: Escaped bracket, as per #Pedro's comment below.

Try with this solution, it's generic, faster and simpler than using a regular expression:
public static String[] splitOnFirst(String str, char c) {
int idx = str.indexOf(c);
String head = str.substring(0, idx);
String tail = str.substring(idx + 1);
return new String[] { head, tail} ;
}
Test it like this:
String test = "A*B(A+B)+A*(A+B)";
System.out.println(Arrays.toString(splitOnFirst(test, '(')));
System.out.println(Arrays.toString(splitOnFirst(test, ')')));

Related

finding the middle index of a substring when there are duplicates in the string

I was working on a Java coding problem and encountered the following issue.
Problem:
Given a string, does "xyz" appear in the middle of the string? To define middle, we'll say that the number of chars to the left and right of the "xyz" must differ by at most one
xyzMiddle("AAxyzBB") → true
xyzMiddle("AxyzBBB") → false
My Code:
public boolean xyzMiddle(String str) {
boolean result=false;
if(str.length()<3)result=false;
if(str.length()==3 && str.equals("xyz"))result=true;
for(int j=0;j<str.length()-3;j++){
if(str.substring(j,j+3).equals("xyz")){
String rightSide=str.substring(j+3,str.length());
int rightLength=rightSide.length();
String leftSide=str.substring(0,j);
int leftLength=leftSide.length();
int diff=Math.abs(rightLength-leftLength);
if(diff>=0 && diff<=1)result=true;
else result=false;
}
}
return result;
}
Output I am getting:
Running for most of the test cases but failing for certain edge cases involving more than once occurence of "xyz" in the string
Example:
xyzMiddle("xyzxyzAxyzBxyzxyz")
My present method is taking the "xyz" starting at the index 0. I understood the problem. I want a solution where the condition is using only string manipulation functions.
NOTE: I need to solve this using string manipulations like substrings. I am not considering using list, stringbuffer/builder etc. Would appreciate answers which can build up on my code.

There is no need to loop at all, because you only want to check if xyz is in the middle.
The string is of the form
prefix + "xyz" + suffix
The content of the prefix and suffix is irrelevant; the only thing that matters is they differ in length by at most 1.
Depending on the length of the string (and assuming it is at least 3):
Prefix and suffix must have the same length if the (string's length - the length of xyz) is even. In this case:
int prefixLen = (str.length()-3)/2;
result = str.substring(prefixLen, prefixLen+3).equals("xyz");
Otherwise, prefix and suffix differ in length by 1. In this case:
int minPrefixLen = (str.length()-3)/2;
int maxPrefixLen = minPrefixLen+1;
result = str.substring(minPrefixLen, minPrefixLen+3).equals("xyz") || str.substring(maxPrefixLen, maxPrefixLen+3).equals("xyz");
In fact, you don't even need the substring here. You can do it with str.regionMatches instead, and avoid creating the substrings, e.g. for the first case:
result = str.regionMatches(prefixLen, "xyz", 0, 3);

Super easy solution:
Use Apache StringUtils to split the string.
Specifically, splitByWholeSeparatorPreserveAllTokens.
Think about the problem.
Specifically, if the token is in the middle of the string then there must be an even number of tokens returned by the split call (see step 1 above).
Zero counts as an even number here.
If the number of tokens is even, add the lengths of the first group (first half of the tokens) and compare it to the lengths of the second group.
Pay attention to details,
an empty token indicates an occurrence of the token itself.
You can count this as zero length, count as the length of the token, or count it as literally any number as long as you always count it as the same number.
if (lengthFirstHalf == lengthSecondHalf) token is in middle.

Managing your code, I left unchanged the cases str.lengt<3 and str.lengt==3.
Taking inspiration from #Andy's answer, I considered the pattern
prefix+'xyz'+suffix
and, while looking for matches I controlled also if they respect the rule IsMiddle, as you defined it. If a match that respect the rule is found, the loop breaks and return a success, else the loop continue.
public boolean xyzMiddle(String str) {
boolean result=false;
if(str.length()<3)
result=false;
else if(str.length()==3 && str.equals("xyz"))
result=true;
else{
int preLen=-1;
int sufLen=-2;
int k=0;
while(k<str.lenght){
if(str.indexOf('xyz',k)!=-1){
count++;
k=str.indexOf('xyz',k);
//check if match is in the middle
preLen=str.substring(0,k).lenght;
sufLen=str.substring(k+3,str.lenght-1).lenght;
if(preLen==sufLen || preLen==sufLen-1 || preLen==sufLen+1){
result=true;
k=str.length; //breaks the while loop
}
else
result=false;
}
else
k++;
}
}
return result;
}

Strange behavior of Java String split() method

I have a method which takes a string parameter and split the string by # and after splitting it prints the length of the array along with array elements. Below is my code
public void StringSplitTesting(String inputString) {
String tokenArray[] = inputString.split("#");
System.out.println("tokenArray length is " + tokenArray.length
+ " and array elements are " + Arrays.toString(tokenArray));
}
Case I : Now when my input is abc# the output is tokenArray length is 1 and array elements are [abc]
Case II : But when my input is #abc the output is tokenArray length is 2 and array elements are [, abc]
But I was expecting the same output for both the cases. What is the reason behind this implementation? Why split() method is behaving like this? Could someone give me proper explanation on this?

One aspect of the behavior of the one-argument split method can be surprising -- trailing nulls are discarded from the returned array.
Trailing empty strings are therefore not included in the resulting array.
To get a length of 2 for each case, you can pass in a negative second argument to the two-argument split method, which means that the length is unrestricted and no trailing empty strings are discarded.

Just take a look in the documentation:
Trailing empty strings are therefore not included in the resulting
array.
So in case 1, the output would be {"abc", ""} but Java cuts the trailing empty String.
If you don't want the trailing empty String to be discarded, you have to use split("#", -1).

The observed behavior is due to the inherently asymmetric nature of the substring() method in Java:
This is the core of the implementation of split():
while ((next = indexOf(ch, off)) != -1) {
if (!limited || list.size() < limit - 1) {
list.add(substring(off, next));
off = next + 1;
} else { // last one
//assert (list.size() == limit - 1);
list.add(substring(off, value.length));
off = value.length;
break;
}
}
The key to understanding the behavior of the above code is to understand the behavior of the substring() method:
From the Javadocs:
String java.lang.String.substring(int beginIndex, int endIndex)
Returns a new string that is a substring of this string. The substring
begins at the specified beginIndex and extends to the character at index
endIndex - 1. Thus the length of the substring is endIndex-beginIndex.
Examples:
"hamburger".substring(4, 8) returns "urge" (not "urger")
"smiles".substring(1, 5) returns "mile" (not "miles")
Hope this helps.

Exporting specific pattern of string using split method in a most efficient way

I want to export pattern of bit stream in a String varilable. Assume our bit stream is something like bitStream="111000001010000100001111". I am looking for a Java code to save this bit stream in a specific array (assume bitArray) in a way that all continous "0"s or "1"s be saved in one array element. In this example output would be somethins like this:
bitArray[0]="111"
bitArray[1]="00000"
bitArray[2]="1"
bitArray[3]="0"
bitArray[4]="1"
bitArray[5]="0000"
bitArray[6]="1"
bitArray[7]="0000"
bitArray[8]="1111"
I want to using bitArray to calculate the number of bit which is stored in each continous stream. For example in this case the final output would be, "3,5,1,1,1,4,1,4,4". I figure it out that probably "split" method would solve this for me. But I dont know what splitting pattern would do that for me, if i Using bitStream.split("1+") it would split on contious "1" pattern, if i using bitStream.split("0+") it will do that base on continous"0" but how it could be based on both?
Mathew suggested this solution and it works:
var wholeString = "111000001010000100001111";
wholeString = wholeString.replace('10', '1,0');
wholeString = wholeString.replace('01', '0,1');
stringSplit = wholeString.split(',');
My question is "Is this solution the most efficient one?"

Try replacing any occurrence of "01" and "10" with "0,1" and "1,0" respectively. Then once you've injected the commas, split the string using the comma as the delimiting character.
String wholeString = "111000001010000100001111"
wholeString = wholeString.replace("10", "1,0");
wholeString = wholeString.replace("01", "0,1");
String stringSplit[] = wholeString.split(",");

You can do this with a simple regular expression. It matches 1s and 0s and will return each in the order they occur in the stream. How you store or manipulate the results is up to you. Here is some example code.
String testString = "111000001010000100001111";
Pattern pattern = Pattern.compile("1+|0+");
Matcher matcher = pattern.matcher(testString);
while (matcher.find())
{
System.out.print(matcher.group().length());
System.out.print(" ");
}
This will result in the following output:
3 5 1 1 1 4 1 4 4
One option for storing the results is to put them in an ArrayList<Integer>
Since the OP wanted most efficient, I did some tests to see how long each answer takes to iterate over a large stream 10000 times and came up with the following results. In each test the times were different but the order of fastest to slowest remained the same. I know tick performance testing has it's issues like not accounting for system load but I just wanted a quick test.
My answer completed in 1145 ms
Alessio's answer completed in 1202 ms
Matthew Lee Keith's answer completed in 2002 ms
Evgeniy Dorofeev's answer completed in 2556 ms
Hope this helps

I won't give you a code, but I'll guide you to a possible solution:
Construct an ArrayList<Integer>, iterate on the array of bits, as long as you have 1's, increment a counter and as soon as you have 0, add the counter to the ArrayList. After this procedure, you'll have an ArrayList that contain numbers, etc: [1,2,2,3,4] - Representing a serieses of 1's and 0's.
This will represent the sequences of 1's and 0's. Then you construct an array of the size of the ArrayList, and fill it accordingly.
The time complexity is O(n) because you need to iterate on the array only once.

This code works for any String and patterns, not only 1s and 0s. Iterate char by char, and if the current char is equal to the previous one, append the last char to the last element of the List, otherwise create a new element in the list.
public List<String> getArray(String input){
List<String> output = new ArrayList<String>();
if(input==null || input.length==0) return output;
int count = 0;
char [] inputA = input.toCharArray();
output.add(inputA[0]+"");
for(int i = 1; i <inputA.length;i++){
if(inputA[i]==inputA[i-1]){
String current = output.get(count)+inputA[i];
output.remove(count);
output.add(current);
}
else{
output.add(inputA[i]+"");
count++;
}
}
return output;
}

try this
String[] a = s.replaceAll("(.)(?!\\1)", "$1,").split(",");

I tried to implement #Maroun Maroun solution.
public static void main(String args[]){
long start = System.currentTimeMillis();
String bitStream ="0111000001010000100001111";
int length = bitStream.length();
char base = bitStream.charAt(0);
ArrayList<Integer> counts = new ArrayList<Integer>();
int count = -1;
char currChar = ' ';
for (int i=0;i<length;i++){
currChar = bitStream.charAt(i);
if (currChar == base){
count++;
}else {
base = currChar;
counts.add(count+1);
count = 0;
}
}
counts.add(count+1);
System.out.println("Time taken :" + (System.currentTimeMillis()-start ) +"ms");
System.out.println(counts.toString());
}
I believe it is more effecient way, as he said it is O(n) , you are iterating only once. Since the goal to get the count only not to store it as array. i woul recommen this. Even if we use Regular Expression ( internal it would have to iterate any way )
Result out put is
Time taken :0ms
[1, 3, 5, 1, 1, 1, 4, 1, 4, 4]

Try this one:
String[] parts = input.split("(?<=1)(?=0)|(?<=0)(?=1)");
See in action here: http://rubular.com/r/qyyfHNAo0T

how to split this string[LECT-3A,instr01,Instructor 01,teacher,instr1#learnet.com,,,,male,phone,,] as my requirement in java

hello every one i got a string from csv file like this
LECT-3A,instr01,Instructor 01,teacher,instr1#learnet.com,,,,male,phone,,
how to split this string with comma i want the array like this
s[0]=LECT-3A,s[1]=instr01,s[2]=Instructor 01,s[3]=teacher,s[4]=instr1#learnet.com,s[5]=,s[6]=,s[7]=,s[8]=male,s[9]=phone,s[10]=,s[11]=
can anyone please help me how to split the above string as my array
thank u inadvance

- Use the split() function with , as delimeter to do this.
Eg:
String s = "Hello,this,is,vivek";
String[] arr = s.split(",");

you can use the limit parameter to do this:
The limit parameter controls the number of times the pattern is applied and therefore affects the length of the resulting array. If the limit n is greater than zero then the pattern will be applied at most n - 1 times, the array's length will be no greater than n, and the array's last entry will contain all input beyond the last matched delimiter. If n is non-positive then the pattern will be applied as many times as possible and the array can have any length. If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Example:
String[]
ls_test = "LECT-3A,instr01,Instructor 01,teacher,instr1#learnet.com,,,,male,phone,,".split(",",12);
int cont = 0;
for (String ls_pieces : ls_test)
System.out.println("s["+(cont++)+"]"+ls_pieces);
output:
s[0]LECT-3A
s[1]instr01
s[2]Instructor 01
s[3]teacher
s[4]instr1#learnet.com
s[5]
s[6]
s[7]
s[8]male
s[9]phone
s[10]
s[11]

You could try something like so:
String str = "LECT-3A,instr01,Instructor 01,teacher,instr1#learnet.com,,,,male,phone,,";
List<String> words = new ArrayList<String>();
int current = 0;
int previous = 0;
while((current = str.indexOf(",", previous)) != -1)
{
words.add(str.substring(previous, current));
previous = current + 1;
}
String[] w = words.toArray(new String[words.size()]);
for(String section : w)
{
System.out.println(section);
}
This yields:
LECT-3A
instr01
Instructor 01
teacher
instr1#learnet.com
male
phone

Splitting string N into N/X strings

I would like some guidance on how to split a string into N number of separate strings based on a arithmetical operation; for example string.length()/300.
I am aware of ways to do it with delimiters such as
testString.split(",");
but how does one uses greedy/reluctant/possessive quantifiers with the split method?
Update: As per request a similar example of what am looking to achieve;
String X = "32028783836295C75546F7272656E745C756E742E657865000032002E002E005C0"
Resulting in X/3 (more or less... done by hand)
X[0] = 32028783836295C75546F
X[1] = 6E745C756E742E6578650
x[2] = 65000032002E002E005C0
Dont worry about explaining how to put it into the array, I have no problem with that, only on how to split without using a delimiter, but an arithmetic operation

You could do that by splitting on (?<=\G.{5}) whereby the string aaaaabbbbbccccceeeeefff would be split into the following parts:
aaaaa
bbbbb
ccccc
eeeee
fff
The \G matches the (zero-width) position where the previous match occurred. Initially, \G starts at the beginning of the string. Note that by default the . meta char does not match line breaks, so if you want it to match every character, enable DOT-ALL: (?s)(?<=\G.{5}).
A demo:
class Main {
public static void main(String[] args) {
int N = 5;
String text = "aaaaabbbbbccccceeeeefff";
String[] tokens = text.split("(?<=\\G.{" + N + "})");
for(String t : tokens) {
System.out.println(t);
}
}
}
which can be tested online here: http://ideone.com/q6dVB
EDIT
Since you asked for documentation on regex, here are the specific tutorials for the topics the suggested regex contains:
\G, see: http://www.regular-expressions.info/continue.html
(?<=...), see: http://www.regular-expressions.info/lookaround.html
{...}, see: http://www.regular-expressions.info/repeat.html

If there's a fixed length that you want each String to be, you can use Guava's Splitter:
int length = string.length() / 300;
Iterable<String> splitStrings = Splitter.fixedLength(length).split(string);
Each String in splitStrings with the possible exception of the last will have a length of length. The last may have a length between 1 and length.
Note that unlike String.split, which first builds an ArrayList<String> and then uses toArray() on that to produce the final String[] result, Guava's Splitter is lazy and doesn't do anything with the input string when split is called. The actual splitting and returning of strings is done as you iterate through the resulting Iterable. This allows you to just iterate over the results without allocating a data structure and storing them all or to copy them into any kind of Collection you want without going through the intermediate ArrayList and String[]. Depending on what you want to do with the results, this can be considerably more efficient. It's also much more clear what you're doing than with a regex.

How about plain old String.substring? It's memory friendly (as it reuses the original char array).

well, I think this is probably as efficient a way to do this as any other.
int N=300;
int sublen = testString.length()/N;
String[] subs = new String[N];
for(int i=0; i<testString.length(); i+=sublen){
subs[i] = testString.substring(i,i+sublen);
}
You can do it faster if you need the items as a char[] array rather as individual Strings - depending on how you need to use the results - e.g. using testString.toCharArray()

Dunno, you'll probably need a method that takes string and int times and returns a list of strings. Pseudo code (haven't checked if it works or not):
public String[] splintInto(String splitString, int parts)
{
int dlength = splitString.length/parts
ArrayList<String> retVal = new ArrayList<String>()
for(i=0; i<splitString.length;i+=dlength)
{
retVal.add(splitString.substring(i,i+dlength)
}
return retVal.toArray()
}

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Split by first found String in Java - java

is ist possible to tell String.split("(") function that it has to split only by the first found string "("? Example: String test = "AB(A+B)+A(A+B)"; test.split("(") should result to ["AB" ,"A+B)+A(A+B)"] test.split(")") should result to ["AB(A+B" ,"+A(A+B)"]

test.split("\\(",2); See javadoc for more info EDIT: Escaped bracket, as per #Pedro's comment below.

Related

finding the middle index of a substring when there are duplicates in the string

Strange behavior of Java String split() method

Exporting specific pattern of string using split method in a most efficient way

how to split this string[LECT-3A,instr01,Instructor 01,teacher,instr1#learnet.com,,,,male,phone,,] as my requirement in java

Splitting string N into N/X strings

Categories

Resources

We Keep Coding

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Split by first found String in Java - java

is ist possible to tell String.split("(") function that it has to split only by the first found string "("? Example: String test = "A*B(A+B)+A*(A+B)"; test.split("(") should result to ["A*B" ,"A+B)+A*(A+B)"] test.split(")") should result to ["A*B(A+B" ,"+A*(A+B)"]

test.split("\\(",2); See javadoc for more info EDIT: Escaped bracket, as per #Pedro's comment below.

Related

finding the middle index of a substring when there are duplicates in the string

Strange behavior of Java String split() method

Exporting specific pattern of string using split method in a most efficient way

how to split this string[LECT-3A,instr01,Instructor 01,teacher,instr1#learnet.com,,,,male,phone,,] as my requirement in java

Splitting string N into N/X strings

Categories

Resources

is ist possible to tell String.split("(") function that it has to split only by the first found string "("? Example: String test = "AB(A+B)+A(A+B)"; test.split("(") should result to ["AB" ,"A+B)+A(A+B)"] test.split(")") should result to ["AB(A+B" ,"+A(A+B)"]