Java string split gives array index out of bounds error - java

I came across this unusual error today. Can anyone explain what I am doing wrong? Below is the code:
AreStringsPermuted checkStringPerObj = new AreStringsPermuted();
String[] inputStrings = {"siddu$isdud", "siddu$siddarth", "siddu$sidde"};
for(String inputString : inputStrings){
String[] stringArray = inputString.split("$");
if(checkStringPerObj.areStringsPermuted(stringArray[0],stringArray[1]))
System.out.println("Strings : " + stringArray[0] + " ," + stringArray[1] + " are permuted");
else
System.out.println("Strings : " + stringArray[0] + " ," + stringArray[1] + " are not permuted");
}
The above code errors out when I try to split the string. For some reason, split does not work when I try to divide each string on "$". Can anyone explain what I am doing wrong here?
Below is the error message:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at arraysAndStrings.TestClass.checkStringsPermuted(TestClass.java:24)
at arraysAndStrings.TestClass.main(TestClass.java:43)

String.split() takes a regular expression, so you need to quote strings that contain characters that have special meanings in regular expressions.
String regularExpression = Pattern.quote("$");
for (String inputString : inputStrings) {
String[] stringArray = inputString.split(regularExpression);

String.split() uses a regex pattern, and $ has a special meaning in regex (the end of a line).
In your case, use "\\$" instead of "$".
String []arrayString = inputString.split("\\$");
For more information,
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
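To make the difference between the answers above concrete, here is a minimal sketch comparing the three ways of splitting. The class and method names are made up for illustration; only String.split and Pattern.quote come from the JDK.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; the names are illustrative only.
class DollarSplitDemo {

    // Unescaped: "$" is the regex end-of-input anchor, so nothing is split
    // and the result has a single element (hence stringArray[1] fails).
    static String[] splitUnescaped(String input) {
        return input.split("$");
    }

    // Escaped by hand: "\\$" in Java source is the regex \$ (a literal dollar sign).
    static String[] splitEscaped(String input) {
        return input.split("\\$");
    }

    // Escaped by the library: Pattern.quote turns the text into a literal pattern.
    static String[] splitQuoted(String input) {
        return input.split(Pattern.quote("$"));
    }

    public static void main(String[] args) {
        String input = "siddu$isdud";
        System.out.println(Arrays.toString(splitUnescaped(input))); // [siddu$isdud]
        System.out.println(Arrays.toString(splitEscaped(input)));   // [siddu, isdud]
        System.out.println(Arrays.toString(splitQuoted(input)));    // [siddu, isdud]
    }
}
```

Pattern.quote is probably the safer habit when the delimiter comes from a variable, since you do not have to know which characters are special.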

Related

Regex: starts with messages and string between parent message curly brace

I want to get all the message data: each match should start at "message" and include everything between the curly braces of the parent message. With the pattern below, I am not getting the whole parent body.
String data = "syntax = \"proto3\";\r\n" +
"package grpc;\r\n" +
"\r\n" +
"import \"envoyproxy/protoc-gen-validate/validate/validate.proto\";\r\n" +
"import \"google/api/annotations.proto\";\r\n" +
"import \"google/protobuf/wrappers.proto\";\r\n" +
"import \"protoc-gen-swagger/options/annotations.proto\";\r\n" +
"\r\n" +
"message Acc {\r\n" +
" message AccErr {\r\n" +
" enum Enum {\r\n" +
" UNKNOWN = 0;\r\n" +
" CASH = 1;\r\n" +
" }\r\n" +
" }\r\n" +
" string account_id = 1;\r\n" +
" string name = 3;\r\n" +
" string account_type = 4;\r\n" +
"}\r\n" +
"\r\n" +
"message Name {\r\n" +
" string firstname = 1;\r\n" +
" string lastname = 2;\r\n" +
"}";
List<String> allMessages = new ArrayList<>();
Pattern pattern = Pattern.compile("message[^\\}]*\\}");
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
String str = matcher.group();
allMessages.add(str);
System.out.println(str);
}
}
I am expecting response like below in my array list of string with size 2.
allMessage.get(0) should be:
message Acc {
message AccErr {
enum Enum {
UNKNOWN = 0;
CASH = 1;
}
}
string account_id = 1;
string name = 3;
string account_type = 4;
}
and allMessage.get(1) should be:
message Name {
string firstname = 1;
string lastname = 2;
}
First remove the input before the first "message" at the start of a line, then split on newlines followed by "message" (include the newlines in the split so that the newlines between parent messages are consumed):
String[] messages = data.replaceAll("(?sm)\\A.*?(?=message)", "").split("\\R+(?=message)");
See live demo.
If you actually need a List<String>, pass that result to Arrays.asList():
List<String> messages = Arrays.asList(data.replaceAll("(?sm)\\A.*?(?=message)", "").split("\\R+(?=message)"));
The first regex matches everything from the start up to, but not including, the first line that starts with message, which is replaced with a blank (i.e. deleted). Breaking it down:
(?sm) turns on flags s, which makes dot also match newlines, and m, which makes ^ and $ match the start and end of each line
\\A means the very start of input
.*? means any quantity of any character (including newlines, as the s flag is set); the trailing ? makes the quantifier reluctant, so it matches as few characters as possible while still matching
(?=^message) is a lookahead and means the following characters are the start of a line and then "message"; the code above uses (?=message) without the ^, which gives the same result for this input because "message" does not occur earlier in the text
See regex101 live demo for a thorough explanation.
The split regex matches one or more line break sequences when they are followed by "message":
\\R+ means one or more line break sequences (all OS variants)
(?=message) is a look ahead and means the following characters are "message"
See regex101 live demo for a thorough explanation.
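For anyone who wants to run this without the live demo, here is a self-contained sketch of the same two-step approach. The helper name topLevelMessages and the shortened input are made up for illustration. It also relies on an assumption: nested messages must be indented, so that a line break is never directly followed by a nested "message".

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; only the two regexes come from the answer above.
class ProtoMessageSplitDemo {

    // Returns each top-level "message" block as one string.
    // Assumes nested messages are indented (see the note above).
    static List<String> topLevelMessages(String data) {
        // Step 1: drop everything before the first "message".
        String fromFirstMessage = data.replaceAll("(?sm)\\A.*?(?=message)", "");
        // Step 2: split on line breaks that are directly followed by "message".
        return Arrays.asList(fromFirstMessage.split("\\R+(?=message)"));
    }

    public static void main(String[] args) {
        // Shortened stand-in for the .proto text in the question.
        String data = "syntax = \"proto3\";\r\n"
                + "\r\n"
                + "message Acc {\r\n"
                + "  message AccErr {\r\n"
                + "  }\r\n"
                + "  string account_id = 1;\r\n"
                + "}\r\n"
                + "\r\n"
                + "message Name {\r\n"
                + "  string firstname = 1;\r\n"
                + "}";
        List<String> messages = topLevelMessages(data);
        System.out.println(messages.size()); // 2
        for (String message : messages) {
            System.out.println("---");
            System.out.println(message);
        }
    }
}
```

If the input could contain unindented nested messages, this would split in the wrong place, and counting braces would likely be more robust.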
Try this for your regex. It anchors on message being the start of a line, and uses a positive lookahead to find the next message or the end of messages.
Pattern.compile("(?s)\r\n(message.*?)(?=(\r\n)+message|$)")
// or
Pattern.compile("(?s)\r?\n(message.*?)(?=(\r?\n)+message|$)")
No splitting, parsing, or managing nested braces either :)
https://regex101.com/r/Wa2xxx/1

cannot split a specific kind of strings using Java

I am working in Java. I have a list of parameters stored in a string which comes from Excel. I want to split it only at the leading hyphen of every new line. This string is stored in every Excel cell and I am trying to extract it using Apache POI. The format is as below:
String text =
"- I am string one\n" +
"-I am string two\n" +
"- I am string-three\n" +
"with new line\n" +
"-I am string-four\n" +
"- I am string five";
What I want
array or arraylist which looks like this
[I am string one,
I am string two,
I am string-three with new line,
I am string-four,
I am string five]
What I Tried
I tried to use split function like this:
String[] newline_split = text.split("-");
but the output I get is not what I want
My O/P
[, I am string one,
I am string two,
I am string, // wrong
three // wrong
with new line, // wrong
I am string, // wrong!
four, // wrong!
I am string five]
I might have to tweak the split call a bit, but I cannot work out how, because there are so many hyphens and new lines in the string.
P.S.
If I try splitting only at new lines, then the line - I am string-three \n with new line breaks into two parts, which again is not correct.
EDIT:
Please know that this data inside string is incorrectly formatted just like what is shown above. It is coming from an excel file which I have received. I am trying to use apache poi to extract all the content out of each excel cell in a form of a string.
I intentionally tried to keep the format like what client gave me. For those who are confused about description inside A, I have changed it because I cannot post the contents on here as it is against privacy of my workplace.
You can
remove line separators (replace them with a space) if they are not followed by - on the next line: .replaceAll("\\R(?!-)", " ") should do the trick
\R (written as "\\R" in a string literal) can be used since Java 8 to represent line separators
(?!...) is a negative lookahead: it ensures that there is no - after the place where it is used (the - is not included in the match, so we will not remove a - that it checked for)
then remove the - placed at the start of each line (let's also include any following whitespace to trim the start of the string). In other words, replace a - placed
after a line separator: can be represented by "\\R"
after the start of the string: can be represented by ^
This should do the trick: .replaceAll("(?<=\\R|^)-\\s*","")
split on the remaining line separators: .split("\\R")
Demo:
String text =
"- I am string one\n" +
"-I am string two\n" +
"- I am string-three\n" +
"with new line\n" +
"-I am string-four\n" +
"- I am string five";
String[] split = text.replaceAll("\\R(?!-)", " ")
.replaceAll("(?<=\\R|^)-\\s*","")
.split("\\R");
for (String s: split){
System.out.println("'"+s+"'");
}
Output (surrounded with ' to show start and end of results):
'I am string one'
'I am string two'
'I am string-three with new line'
'I am string-four'
'I am string five'
This is how I would do:
import java.util.*;
public class MyClass {
public static void main(String args[]) {
String A = "- I am string one \n" +
" -I am string two\n" +
" - I am string-three \n" +
" with new line\n" +
" -I am string-four\n" +
"- I am string five";
String[] s2 = A.split("\r?\n");
List<String> lines = new ArrayList<String>();
String line = "";
for (int i = 0; i < s2.length; i++) {
String ss = s2[i].trim();
if (i == 0) { // first line MUST start with "-"
line = ss.substring(1).trim();
} else if (ss.startsWith("-")) {
lines.add(line);
ss = ss.substring(1).trim();
line = ss;
} else {
line = line + " " + ss;
}
}
lines.add(line);
System.out.println(lines.toString());
}
}
I hope it helps.
A little explanation:
I will process line by line, trimming each one.
If it starts with '-' it means the end of the previous line, so I include it in the list. If not, I concatenate with the previous line.
It looks as if you are splitting on the first - of each line, so replace every "newline followed by -" with a plain newline:
str = str.replace("\n-", "\n");
then remove the initial "-":
str = str.substring(1);

Unwanted elements appearing when splitting a string with multiple separators in Java

I have a string from which I need to remove all mentioned punctuations and spaces. My code looks as follows:
String s = "s[film] fever(normal) curse;";
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s+]");
System.out.println("spart[0]: " + spart[0]);
System.out.println("spart[1]: " + spart[1]);
System.out.println("spart[2]: " + spart[2]);
System.out.println("spart[3]: " + spart[3]);
System.out.println("spart[4]: " + spart[4]);
But, I am getting some elements which are blank. The output is:
spart[0]: s
spart[1]: film
spart[2]:
spart[3]: fever
spart[4]: normal
My desired output is:
spart[0]: s
spart[1]: film
spart[2]: fever
spart[3]: normal
spart[4]: curse
Try with this:
public static void main(String[] args) {
String s = "s[film] fever(normal) curse;";
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~##$%^&\\s]+");
for (String string : spart) {
System.out.println("'"+string+"'");
}
}
output:
's'
'film'
'fever'
'normal'
'curse'
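I believe it is because the + sits inside the character class, where it is a literal plus sign rather than a quantifier, so two adjacent separators such as "] " leave an empty string between them. Alternatively, replace every non-word character with a space and split on runs of spaces:
String[] spart = s.replaceAll("\\W", " ").split(" +");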
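Here is a small sketch of why the position of the + matters. The class and method names are invented for the example, and the separator list is shortened to the characters that actually occur in the sample string.

```java
import java.util.Arrays;

// Hypothetical demo class; the separator set is a shortened version of the one in the question.
class QuantifierPlacementDemo {

    // "+" inside the brackets is just one more literal character in the class,
    // so every single separator character is its own delimiter.
    static String[] splitPlusInside(String s) {
        return s.split("[\\[\\]();\\s+]");
    }

    // "+" after the closing bracket quantifies the whole class,
    // so a run of separators counts as one delimiter.
    static String[] splitPlusOutside(String s) {
        return s.split("[\\[\\]();\\s]+");
    }

    public static void main(String[] args) {
        String s = "s[film] fever(normal) curse;";
        System.out.println(Arrays.toString(splitPlusInside(s)));  // [s, film, , fever, normal, , curse]
        System.out.println(Arrays.toString(splitPlusOutside(s))); // [s, film, fever, normal, curse]
    }
}
```

The empty elements in the first result come from "] " and ") ", where two separator characters are next to each other.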

How to remove commas at the end of any string

I have Strings "a,b,c,d,,,,, ", ",,,,a,,,,"
I want these strings to be converted into "a,b,c,d" and ",,,,a" respectively.
I am writing a regular expression for this. My java code looks like this
public class TestRegx{
public static void main(String[] arg){
String text = ",,,a,,,";
System.out.println("Before " +text);
text = text.replaceAll("[^a-zA-Z0-9]","");
System.out.println("After " +text);
}}
But this is removing all the commas here.
How can I write this to achieve the result given above?
Use :
text.replaceAll(",*$", "")
As mentioned by @Jonny in the comments, you can also use:
text.replaceAll(",+$", "")
Your first example had a space at the end, so it needs to match [, ]. When using the same regular expression multiple times, it's better to compile it up front, and it only needs to replace once, and only if at least one character will be removed (+).
Simple version:
text = text.replaceFirst("[, ]+$", "");
Full code to test both inputs:
String[] texts = { "a,b,c,d,,,,, ", ",,,,a,,,," };
Pattern p = Pattern.compile("[, ]+$");
for (String text : texts) {
String text2 = p.matcher(text).replaceFirst("");
System.out.println("Before \"" + text + "\"");
System.out.println("After \"" + text2 + "\"");
}
Output
Before "a,b,c,d,,,,, "
After "a,b,c,d"
Before ",,,,a,,,,"
After ",,,,a"
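To show why the trailing space in the first input matters, here is a minimal sketch comparing the comma-only pattern with the comma-or-space pattern. The class and method names are made up for illustration.

```java
// Hypothetical demo class comparing the two patterns suggested above.
class TrailingCommaDemo {

    // Removes only commas that sit directly at the end of the string.
    static String stripTrailingCommas(String text) {
        return text.replaceAll(",+$", "");
    }

    // Removes a trailing run of commas and spaces in any mix.
    static String stripTrailingCommasAndSpaces(String text) {
        return text.replaceFirst("[, ]+$", "");
    }

    public static void main(String[] args) {
        String withSpace = "a,b,c,d,,,,, ";
        // The final character is a space, so ",+$" finds nothing to remove.
        System.out.println("\"" + stripTrailingCommas(withSpace) + "\"");          // "a,b,c,d,,,,, "
        System.out.println("\"" + stripTrailingCommasAndSpaces(withSpace) + "\""); // "a,b,c,d"
        System.out.println("\"" + stripTrailingCommas(",,,,a,,,,") + "\"");        // ",,,,a"
    }
}
```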

Java empty String split ArrayIndexOutOfBoundsException [duplicate]

I have come across an unexpected feature in the split function of String in Java, here is my code:
final String line = "####";
final String[] lineData = line.split("#");
System.out.println("data: " + lineData[0] + " -- " + lineData[1]);
This code gives me an ArrayIndexOutOfBoundsException, whereas I would expect it to print "" and "" (two empty Strings), or maybe null and null (two null Strings).
If I change my code for
final String line = " # # # #";
final String[] lineData = line.split("#");
System.out.println("data: " + lineData[0] + " -- " + lineData[1]);
Then it prints " " and " " (the expected behaviour).
How can I make my first snippet not throw an exception, and give me an array of empty Strings instead?
Thanks
You can use the limit parameter of the split method to achieve this. Try
final String line = "####";
final String[] lineData = line.split("#", -1);
System.out.println("Array length : " + lineData.length);
System.out.println("data: " + lineData[0] + " -- " + lineData[1]);
As always, the answer is written in the Javadoc:
This method works as if by invoking the two-argument split method with the given expression and a limit argument of zero. Trailing empty strings are therefore not included in the resulting array.
Since your array is composed only of empty strings, none of them are added to it, so trying to access the values results in an ArrayIndexOutOfBoundsException.
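A quick sketch of what the limit argument changes, counting the resulting parts for both inputs from the question. The class and method names are hypothetical; split(String) and split(String, int) are the real String methods.

```java
// Hypothetical demo class; only String.split comes from the JDK.
class SplitLimitDemo {

    // One-argument split behaves like limit 0: trailing empty strings are dropped.
    static int partsWithDefaultLimit(String line) {
        return line.split("#").length;
    }

    // A negative limit keeps every part, including trailing empty strings.
    static int partsKeepingTrailing(String line) {
        return line.split("#", -1).length;
    }

    public static void main(String[] args) {
        System.out.println(partsWithDefaultLimit("####"));     // 0
        System.out.println(partsKeepingTrailing("####"));      // 5
        System.out.println(partsWithDefaultLimit(" # # # #")); // 4
        System.out.println(partsKeepingTrailing(" # # # #"));  // 5
    }
}
```

With "####" every part is empty, so the default call returns an array of length 0 and even lineData[0] is out of bounds.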
If I understand your question, this would do it -
final String line = " # ";
final String[] lineData = line.split("#");
System.out.println("data: " + lineData[0] + " -- " + lineData[1]);
The problem is that the empty string isn't a character.
