I'm using a String as the delimiter when I call the split method.
String[] aExpr;
String strDelimiter = "[-+/=^//%//*//(//);:?]";
aExpr = expr.split(strDelimiter);
This fills aExpr with the substrings of expr, broken up according to strDelimiter.
I also want split() to take this second pattern into account, not only strDelimiter:
String oprDelimiter = "[abcdefghijklmnopqrstuvwxyz0123456789]+";
This matches any run of lowercase letters and digits. I could add all these characters to the first String, but the + at the end won't let me: the + means that any combination of those characters acts as one delimiter. Any ideas on how I could do this?
Try using this regex:
(?<=[abcdefghijklmnopqrstuvwxyz0123456789])(?=[-+=^%*();:?])|(?<=[-+=^%*();:?])(?=[abcdefghijklmnopqrstuvwxyz0123456789])
as the delimiter. It will split on any location that is preceded by any of the characters abcdefghijklmnopqrstuvwxyz0123456789 and followed by any of the characters -+=^%*();:?, or vice versa. Explanation and demonstration here: http://regex101.com/r/mT3lL1.
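A minimal sketch of how that delimiter behaves, assuming the sample expression "a+b*12" (the class and method names are made up for illustration). Because the lookarounds match positions rather than characters, the operators survive as tokens instead of being discarded.

```java
import java.util.Arrays;

class ExpressionTokenizer {
    // Zero-width delimiter from the answer above: a boundary between an
    // operand character and an operator character, in either order.
    static final String BOUNDARY =
        "(?<=[a-z0-9])(?=[-+=^%*();:?])|(?<=[-+=^%*();:?])(?=[a-z0-9])";

    // Hypothetical helper: splits at every operand/operator boundary.
    static String[] tokenize(String expr) {
        return expr.split(BOUNDARY);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("a+b*12"))); // [a, +, b, *, 12]
    }
}
```

Two adjacent operators, such as ")*", stay together in one token, because neither alternative matches between two operator characters.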
Original String: "12312123;www.qwerty.com"
With this Model.getList().get(0).split(";")[1]
I get: "www.qwerty.com"
I tried doing this: Model.getList().get(0).split(";")[1].split(".")[1]
But it didn't work; I get an exception. How can I solve this?
I want only "qwerty"
Try this, to achieve "qwerty":
Model.getList().get(0).split(";")[1].split("\\.")[1]
You need to escape the dot, because split treats its argument as a regular expression.
Try to use split(";|\\.") like this:
for (String string : "12312123;www.qwerty.com".split(";|\\.")) {
System.out.println(string);
}
Output:
12312123
www
qwerty
com
You can split a string which has multiple delimiters. Example below:
String abc = "11;xyz.test.com";
String[] tokens = abc.split(";|\\.");
System.out.println(tokens[tokens.length-2]);
Indexing the result with [1] fails here: it throws an ArrayIndexOutOfBoundsException.
This is because splitting on "." doesn't work the way you want it to. You need to escape the period by writing "\\." instead, because an unescaped "." means something completely different in a regular expression.
You'd need to escape the ., i.e. "\\.". Period is a special character in regular expressions, meaning "any character".
What your current split means is "split on any character"; this means that it splits the string into a number of empty strings, since there is nothing between consecutive occurrences of "any character".
There is a subtle gotcha in the behaviour of the String.split method, which is that it discards trailing empty strings from the token array (unless you pass a negative number as the second parameter).
Since your entire token array consists of empty strings, all of these are discarded, so the result of the split is a zero-length array. Hence the exception when you try to access one of its elements.
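The behaviour described above can be checked with a short sketch. The class and helper names are hypothetical, and the host name is taken from the question.

```java
class DotSplitDemo {
    // Hypothetical helper: returns the part between the first and second dot.
    static String secondLabel(String host) {
        return host.split("\\.")[1];
    }

    public static void main(String[] args) {
        String host = "www.qwerty.com";
        // Unescaped dot: every character is a delimiter, all tokens are
        // empty, and trailing empty tokens are dropped.
        System.out.println(host.split(".").length);     // 0
        // A negative limit keeps the empty tokens.
        System.out.println(host.split(".", -1).length); // 15
        // Escaped dot: splits on literal periods only.
        System.out.println(secondLabel(host));          // qwerty
    }
}
```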
Don't use split, use a regular expression (directly). It's safer, and faster.
String input = "12312123;www.qwerty.com";
String regex = "([^.;]+)\\.[^.;]+$";
Matcher m = Pattern.compile(regex).matcher(input);
if (m.find()) {
System.out.println(m.group(1)); // prints: qwerty
}
I have string with spaces and some non-informative characters and substrings required to be excluded and just to keep some important sections. I used the split as below:
String myString[]={"01: Hi you look tired today? Can I help you?"};
myString=myString[0].split("[\\s+]");// Split based on any white spaces
for(int ii=0;ii<myString.length;ii++)
System.out.println(myString[ii]);
The result is :
01:
Hi
you
look
tired
today?
Can
I
help
you?
The spaces appeared after the split as substrings when the regex is "[\s+]" but disappeared when the regex is "\s+". I am confused and was not able to find an answer in the related Stack Overflow pages. The link regex-Pattern made me more confused.
Please help, I am new to Java.
19/1/2015:Edit
After your valuable advice, I reached a point in my program where a conditional statement needs to be decomposed and processed. The case I have is:
String s1="01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
String [] s2=s1.split(("[\\s\\&\\,]+"));
for(int ii=0;ii<s2.length;ii++)System.out.println(s2[ii]);
The result is fine till now as:
01:IF
rd.h
dq.L
o.LL
v.L
THEN
la.VHB
av.VHR
with
0.4610;
My next step is to add the string "with" to the regex so that this word is dropped during the split.
I tried it this way:
String s1="01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
String [] s2=s1.split(("[\\s\\&\\, with]+"));
for(int ii=0;ii<s2.length;ii++)System.out.println(s2[ii]);
The result is not perfect, because I got an unwanted extra split at every "h" letter:
01:IF
rd.
dq.L
o.LL
v.L
THEN
la.VHB
av.VHR
0.4610;
Any advice on how to specify a delimiter that mixes whitespace, separator marks and a whole word?
Many thanks.
Inside square brackets, [\s+] is a character class containing the whitespace characters and a literal plus sign. It matches only one character, so a sequence of spaces produces many empty strings, as Todd noted, and + also acts as a separator.
You should use \s+ (without brackets) as the separator. That means one or more whitespace characters.
myString=myString[0].split("\\s+");
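To make the difference visible, here is a small sketch that assumes an input with two consecutive spaces (the class and method names are illustrative only). With the character class, each space is its own delimiter, so an empty string appears between them.

```java
import java.util.Arrays;

class WhitespaceSplitDemo {
    // Character class: one whitespace character OR one literal '+' per match.
    static String[] splitPerChar(String s) {
        return s.split("[\\s+]");
    }

    // Quantifier: one or more whitespace characters per match.
    static String[] splitPerRun(String s) {
        return s.split("\\s+");
    }

    public static void main(String[] args) {
        String text = "Hi  you"; // two spaces between the words
        System.out.println(Arrays.toString(splitPerChar(text))); // [Hi, , you]
        System.out.println(Arrays.toString(splitPerRun(text)));  // [Hi, you]
    }
}
```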
Your biggest problem is not understanding enough about regular expressions to write them properly. One key point you don't comprehend is that [...] is a character class, which is a list of characters any one of which can match. For example:
[abc] matches either a, b or c (it does not match "abc")
[\\s+] matches any whitespace or "+" character
[with] matches a single character that is either w, i, t or h
[.$&^?] matches those literal characters - most characters lose their special regex meaning when in a character class
To split on any number of whitespace, comma and ampersand and consume "with" (if it appears), do this:
String [] s2 = s1.split("[\\s,&]+(with[\\s,&]+)?");
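As a quick check, the pattern can be run against the rule string from the question. This is only a sketch; the class and method names are invented for the example.

```java
class RuleSplitDemo {
    // Delimiter: a run of whitespace, commas or ampersands, optionally
    // followed by the whole word "with" and another such run.
    static String[] splitRule(String rule) {
        return rule.split("[\\s,&]+(with[\\s,&]+)?");
    }

    public static void main(String[] args) {
        String s1 = "01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
        for (String token : splitRule(s1)) {
            System.out.println(token);
        }
    }
}
```

Since "with" sits inside a group rather than a character class, it is consumed only as a whole word after a separator, so "rd.h" keeps its final "h".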
You can try it easily here Online Regex and get useful comments.
I am using
mString.replaceAll("[\n,\\s]$", "");
This is not working. What is the correct way to remove newlines, commas, or spaces from the end of a string if they can appear in any order?
Try this
mString = mString.replaceAll("[\n,\\s]+$", "");
There are two reasons your attempt
mString.replaceAll("[\n,\\s]$", "");
doesn't work. First of all, replaceAll does not modify the String instance, because Strings are immutable. It returns the modified string as the result of the method. But the above statement discards the result. So you at least need
mString = mString.replaceAll(...);
The second reason is that the replacement method looks for the pattern in order. If it started over at the beginning of the string after each replacement, then your expression would replace a newline, comma, or whitespace at the end of the string, then it would keep doing it until there were no more such characters at the end. But it doesn't do things this way (and if it did, it would be way too easy to write replaceAll expressions that looped infinitely). replaceAll works like this: It searches for the pattern, and if it finds it, it copies all characters before the pattern to the result. Then, it copies the replacement string to the result. Then, it resets the matcher to the character after the match. In your case, since the pattern match goes to the end of the input (because of the $), the character after the match will be the end of the string, and there can be no more matches. Thus, the matcher would only be able to replace one character. That's why you need to add + to the pattern, as in the other correct answers, like Anubhava's:
mString = mString.replaceAll("[,\\s]+$", "");
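Both points can be seen in a short sketch. The sample strings and the helper name trimTrailing are assumptions made for illustration.

```java
class TrailingTrimDemo {
    // Removes any run of commas and whitespace at the end of the string.
    static String trimTrailing(String s) {
        return s.replaceAll("[,\\s]+$", "");
    }

    public static void main(String[] args) {
        String s = "abc,,";
        s.replaceAll("[,\\s]$", "");                     // result discarded
        System.out.println(s);                           // abc,,
        // Without '+', only the last character is removed.
        System.out.println(s.replaceAll("[,\\s]$", "")); // abc,
        // With '+', the whole trailing run is removed in one match.
        System.out.println(trimTrailing("abc, \n,"));    // abc
    }
}
```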
You can just take out \n since \s includes new lines also. You also need to add + quantifier to make it match more than 1 occurrence of whitespace or comma at end.
mString = mString.replaceAll("[,\\s]+$", "");
Try mString = mString.replaceAll("(\\n|,|\\s)+$", "");
I have a String : testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing
I want to replace the character s with some other character sequence, say <b>X</b>, but an s that is immediately followed by "<" should remain intact, i.e. the regex should not touch it.
I used this Java code:
String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing";
str = str.replace("s[^<]", "<b>X</b>");
The problem is that the regex would match 2 characters, s and the following character if it is not "<", and String.replace would replace both characters. I want only s to be replaced and not the following character.
Any help would be appreciated. Since I have lots of such replacements, I don't want to use a loop matching each character and updating it sequentially.
There are other ways, but you could, for example, capture the second character and put it back:
str = str.replaceAll("s([^<])", "<b>X</b>$1");
Looks like you want a negative lookahead:
s(?!<)
String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing;";
System.out.println(str.replaceAll("s(?!<)", "<b>X</b>"));
output:
te<b>X</b>ting<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing;
Use lookarounds to assert, but not consume, surrounding text:
str = str.replaceAll("s(?!<)", "whatever");
Or, capture and put back using a back reference $1:
str = str.replaceAll("s([^<])", "whatever$1");
Note that you need to use replaceAll() (which uses regex), rather than replace() (which uses plain text).
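The three variants discussed in this thread can be compared side by side. This is a sketch under assumptions: the input "miss<s" and the replacement "X" are chosen only to keep the output short, and the class and method names are hypothetical.

```java
class LookaheadReplaceDemo {
    // replace() treats the first argument as plain text, so nothing matches.
    static String literal(String s) {
        return s.replace("s(?!<)", "X");
    }

    // Negative lookahead: replaces every s that is not followed by '<'.
    static String withLookahead(String s) {
        return s.replaceAll("s(?!<)", "X");
    }

    // Capture and put back: needs a following character, so it misses an s
    // at the end of the input and an s consumed as the captured character.
    static String withCapture(String s) {
        return s.replaceAll("s([^<])", "X$1");
    }

    public static void main(String[] args) {
        String in = "miss<s";
        System.out.println(literal(in));       // miss<s
        System.out.println(withLookahead(in)); // miXs<X
        System.out.println(withCapture(in));   // miXs<s
    }
}
```

On this input the lookahead version also replaces the final s, which the capture version leaves alone because there is no character after it to capture.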
When I try to split a String around occurrences of "." the method split returns an array of strings with length 0. When I split around occurrences of "a" it works fine. Does anyone know why? Is split not supposed to work with punctuation marks?
split takes regex. Try split("\\.").
String a = "a.jpg";
String str = a.split(".")[0];
This will throw an ArrayIndexOutOfBoundsException, because split accepts a regex argument and "." is a reserved character in regular expressions, representing any character.
Instead, we should use the following statement:
String str = a.split("\\.")[0]; //Yes, two backslashes
When the code is compiled, the string literal "\\." becomes the regular expression \., which is what we want it to be.
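A possible alternative to escaping by hand, sketched here with a hypothetical helper named baseName: Pattern.quote wraps a string so that the regex engine treats every character in it literally.

```java
import java.util.regex.Pattern;

class QuoteSplitDemo {
    // Hypothetical helper: returns the text before the first literal dot.
    static String baseName(String fileName) {
        // Pattern.quote(".") yields a pattern that matches only a real '.'.
        return fileName.split(Pattern.quote("."))[0];
    }

    public static void main(String[] args) {
        System.out.println(baseName("a.jpg")); // a
    }
}
```

This can be handy when the delimiter comes from a variable and may contain regex metacharacters.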
Here is the link of my old blog post in case you are interested: http://junxian-huang.blogspot.com/2009/01/java-tip-how-to-split-string-with-dot.html