Splitting a Java String with '.'

I have the line
1. This is a test message
and I want to print
This is a test message
I am trying:
String delimiter = ".";
String[] parts = line.split(delimiter);
int gg = parts.length;
Then I want to print the array:
for (int k = 0; k < gg; k++)
    System.out.println(parts[k]);
But my gg is always 0.
Am I missing anything?
All I need is to remove the number, the . and the whitespace.
The number can have anywhere from 1 to 5 digits.

You are using "." as a delimiter; you need to escape the special meaning of the . character.
In a regex, . means "any character", so your split is splitting on every character, which is obviously not what you are after. Every token between two characters is empty, and split() discards trailing empty strings, which is why the resulting array has length 0.
Use "\\." as the delimiter.
For more information on predefined character classes, you can have a look at the tutorial.
For more information on regex in general (including the above), you can try this tutorial.
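A minimal sketch of the difference, using the question's sample line. Pattern.quote(".") is shown as an alternative way to escape a literal delimiter; the class and method names are illustrative only, not from the original answer.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DotSplitDemo {
    // Splits on a literal '.' by escaping it for the regex engine.
    static String[] splitOnDot(String line) {
        return line.split("\\.");
    }

    // Same result, letting Pattern.quote build the escaped regex.
    static String[] splitOnDotQuoted(String line) {
        return line.split(Pattern.quote("."));
    }

    public static void main(String[] args) {
        String line = "1. This is a test message";

        // Unescaped "." matches every character, so every token is empty;
        // split() drops trailing empty strings, leaving a zero-length array.
        System.out.println(line.split(".").length);          // prints 0

        String[] parts = splitOnDot(line);
        System.out.println(Arrays.toString(parts));          // prints [1,  This is a test message]
        System.out.println(parts[1].trim());                 // prints This is a test message
        System.out.println(Arrays.equals(parts, splitOnDotQuoted(line))); // prints true
    }
}
```

Note that the text after the dot keeps its leading space, so trim() is still needed if you go the split route.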
EDIT:
P.S. What you are up to (removing the number) can be achieved with a one-liner, using the String.replaceAll() method.
System.out.println(line.replaceAll("[0-9]+\\.\\s+", ""));
will print
This is a test message
for your example input.
The idea is: [0-9] is any digit, and the + means one or more of them. \\. is a literal dot (escaped as described above), and \\s+ is one or more whitespace characters.
All of that is replaced with an empty string.
Note, however, that for a string like "1. this is a 2. test" it produces "this is a test", removing the "2. " as well, so think carefully about whether that is indeed what you are after.
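If only the leading number should go, one option (an assumption about what you want, not part of the answer above) is to anchor the pattern with ^ and use replaceFirst, so a later "2. " inside the message is left alone. A short sketch with an illustrative class name:

```java
class StripPrefixDemo {
    // Removes a leading "<digits>. " prefix only; ^ anchors the match at the start.
    static String stripLeadingNumber(String line) {
        return line.replaceFirst("^[0-9]+\\.\\s+", "");
    }

    public static void main(String[] args) {
        System.out.println(stripLeadingNumber("1. This is a test message"));     // This is a test message
        System.out.println(stripLeadingNumber("12345. This is a test message")); // This is a test message
        System.out.println(stripLeadingNumber("1. this is a 2. test"));          // this is a 2. test
        // For comparison, the unanchored replaceAll removes both prefixes:
        System.out.println("1. this is a 2. test".replaceAll("[0-9]+\\.\\s+", "")); // this is a test
    }
}
```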

Use the following code:
String delimiter = "\\."; // escape the dot, because . is a regex metacharacter
String[] parts = line.split(delimiter);
And for your for loop:
for (int k = 0; k < parts.length; k++)
    System.out.println(parts[k]);

Try this:
String delimiter = "\\.";
"." has a special meaning in a regular expression: it matches any character.

If you are just trying to remove the prefix number, you can do it in one line. I'm not sure whether you actually want to split on multiple dots; if it is just the prefix, substring and indexOf are enough:
String s = "1. with single digit";
String s2 = "999. with multiple digits";
String s3 = "999. with multiple digits . and . dots";
assertEquals("with single digit", (s.substring(s.indexOf(".") + 1).trim()));
assertEquals("with multiple digits", (s2.substring(s2.indexOf(".") + 1).trim()));
assertEquals("with multiple digits . and . dots", (s3.substring(s3.indexOf(".") + 1).trim()));

Related

Can't get rid of the empty first line when I use a regex for polynomials

Here is my code:
String a = "X^5+2X^2+3X^3+4X^4";
String exp[]=a.split("(|\\+\\d)[xX]\\^");
for(int i=0;i<exp.length;i++) {
System.out.println("exp: "+exp[i]+" ");
}
I'm trying to get the output 5, 2, 3, 4,
but instead I got this:
exp:
exp:5
exp:2
exp:3
exp:4
I don't know where the empty first line comes from, and I can't find a way to get rid of it. I tried other regexes and also Pattern.compile, but I still can't get rid of the first line. When I tried the string "X+X^5+2X^2+3X^3+4X^4", the first line showed exp:X.
I also tried an online regex tester, and it gives 5, 2, 3, 4, but Eclipse gives an empty line and then 5, 2, 3, 4. I need help figuring this out.
Try matching with a regex instead of splitting, e.g.:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String input = "X^5+2X^2+3X^3+4X^4";
Pattern pattern = Pattern.compile("\\^([0-9]+)"); // a literal ^ followed by digits (group 1)
Matcher matcher = pattern.matcher(input);
while (matcher.find()) {
    System.out.println("exp: " + matcher.group(1));
}
It gives output:
exp: 5
exp: 2
exp: 3
exp: 4
How it works:
Pattern used: \^([0-9]+)
It matches a ^ followed by one or more digits (note the + sign). The caret (^) is prefixed with a backslash (\) because it has a special meaning in regular expressions (the beginning of the input), but in your case you just want to match a literal ^ character.
We want to wrap the digits in a group so we can refer to them later when iterating over the matches. That means marking them with parentheses ( and ).
Then we want to put the pattern into a Java String. In a string literal, the backslash is itself an escape character, e.g. "\n" represents a newline. So when we put the pattern into a string literal, we need to escape the \, and the pattern becomes "\\^([0-9]+)". Note the double \.
Next we iterate through all matches, reading group 1, which is our number. Note that the ^ character is not part of group 1 even though it is part of the pattern, because the parentheses mark only the digits as the group we are after.
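It happens because you are using the split method, which looks for occurrences of the regex and, well, splits the string at those positions. Your string starts with X^, which matches your regex, so the first element is the empty string that comes before that match. split discards trailing empty strings, but it keeps a leading one produced by a match like this at position 0.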
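To make that concrete, here is a small sketch that keeps the question's regex and simply drops empty tokens afterwards. Filtering with a stream is just one option, and, like the original regex, it assumes single-digit coefficients. The class and method names are illustrative.

```java
import java.util.Arrays;

class PolySplitDemo {
    // The question's regex: an optional "+<digit>" followed by "X^" (or "x^").
    static final String REGEX = "(|\\+\\d)[xX]\\^";

    static String[] rawSplit(String poly) {
        return poly.split(REGEX);
    }

    // Same split, but with empty tokens (such as the leading one) removed.
    static String[] exponents(String poly) {
        return Arrays.stream(poly.split(REGEX))
                .filter(s -> !s.isEmpty())
                .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String a = "X^5+2X^2+3X^3+4X^4";
        // The input starts with a non-empty match "X^", so the first token is "".
        System.out.println(Arrays.toString(rawSplit(a)));  // [, 5, 2, 3, 4]
        System.out.println(Arrays.toString(exponents(a))); // [5, 2, 3, 4]
    }
}
```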

Split a String twice with different delimiters, ";" and "."

Original String: "12312123;www.qwerty.com"
With this Model.getList().get(0).split(";")[1]
I get: "www.qwerty.com"
I tried doing this: Model.getList().get(0).split(";")[1].split(".")[1]
But it didn't work; I get an exception. How can I solve this?
I want only "qwerty"
Try this to get "qwerty":
Model.getList().get(0).split(";")[1].split("\\.")[1]
You need to escape the dot.
Try to use split(";|\\.") like this:
for (String string : "12312123;www.qwerty.com".split(";|\\.")) {
System.out.println(string);
}
Output:
12312123
www
qwerty
com
You can split a string on multiple delimiters at once, for example:
String abc = "11;xyz.test.com";
String[] tokens = abc.split(";|\\.");
System.out.println(tokens[tokens.length-2]);
Index 1 does not exist in the resulting array, so you get an ArrayIndexOutOfBoundsException.
This is because splitting on "." doesn't work the way you want it to. You need to escape the period: the regex is \., which is written "\\." in a Java string literal. In a regex, an unescaped "." means something completely different (any character).
You'd need to escape the ., i.e. "\\.". Period is a special character in regular expressions, meaning "any character".
What your current split means is "split on any character"; this splits the string into a number of empty strings, since there is nothing between consecutive occurrences of "any character".
There is a subtle gotcha in the behaviour of the String.split method: it discards trailing empty strings from the token array (unless you pass a negative number as the second parameter).
Since your entire token array consists of empty strings, all of them are discarded, so the result of the split is a zero-length array, hence the exception when you try to access one of its elements.
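A small sketch of that gotcha on the string from the question, including the negative limit mentioned above. The helper name secondLabel is hypothetical and assumes the host has at least two dot-separated labels.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Returns the second dot-separated label, e.g. "qwerty" for "www.qwerty.com".
    // Assumes at least two labels; otherwise index 1 is out of bounds.
    static String secondLabel(String host) {
        return host.split("\\.")[1];
    }

    public static void main(String[] args) {
        String host = "www.qwerty.com"; // 14 characters

        // "." matches every character: 14 matches, 15 empty tokens,
        // and the default split() drops all of them as trailing empties.
        System.out.println(host.split(".").length);      // 0

        // A negative limit keeps the trailing empty strings.
        System.out.println(host.split(".", -1).length);  // 15

        // Escaping the dot gives the intended tokens.
        System.out.println(Arrays.toString(host.split("\\."))); // [www, qwerty, com]
        System.out.println(secondLabel(host));                  // qwerty
    }
}
```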
Don't use split; use a regular expression directly. It's safer, and can be faster.
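import java.util.regex.Matcher;
import java.util.regex.Pattern;
String input = "12312123;www.qwerty.com";
String regex = "([^.;]+)\\.[^.;]+$"; // group 1: the label just before the last ".label"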
Matcher m = Pattern.compile(regex).matcher(input);
if (m.find()) {
System.out.println(m.group(1)); // prints: qwerty
}

Why do spaces appear as substrings after this split?

I have a string with spaces and some non-informative characters and substrings that need to be excluded, keeping only some important sections. I used split as below:
String myString[]={"01: Hi you look tired today? Can I help you?"};
myString=myString[0].split("[\\s+]");// Split based on any white spaces
for(int ii=0;ii<myString.length;ii++)
System.out.println(myString[ii]);
The result is :
01:
Hi
you
look
tired
today?
Can
I
help
you?
The spaces appeared as substrings after the split when the regex was "[\s+]", but disappeared when the regex was "\s+". I am confused and could not find an answer in the related Stack Overflow pages. The regex Pattern documentation made me even more confused.
Please help, I am new to Java.
Edit (19/1/2015):
After your valuable advice, I reached a point in my program where a conditional statement needs to be decomposed and processed. The case I have is:
String s1="01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
String [] s2=s1.split(("[\\s\\&\\,]+"));
for(int ii=0;ii<s2.length;ii++)System.out.println(s2[ii]);
The result so far is fine:
01:IF
rd.h
dq.L
o.LL
v.L
THEN
la.VHB
av.VHR
with
0.4610;
My next step is to add the string "with" to the regex so that this word is removed during the split.
I tried it this way:
String s1="01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
String [] s2=s1.split(("[\\s\\&\\, with]+"));
for(int ii=0;ii<s2.length;ii++)System.out.println(s2[ii]);
The result is not perfect, because I got an unwanted extra split at every "h":
01:IF
rd.
dq.L
o.LL
v.L
THEN
la.VHB
av.VHR
0.4610;
Any advice on how to specify a word together with mixed whitespace and separator characters?
Many thanks.
Inside square brackets, [\s+] is a character class containing the whitespace characters plus the + sign. It matches only one character, so a sequence of spaces produces many empty strings (as Todd noted), and + is also used as a separator.
You should use \s+ (without brackets) as the separator. That means one or more whitespace characters.
myString=myString[0].split("\\s+");
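A short sketch comparing the two patterns on an illustrative input with a double space and a + sign (the input string and class name are made up for the demonstration):

```java
import java.util.Arrays;

class CharClassDemo {
    // [\s+] : exactly one character, either whitespace or '+'
    static String[] splitClass(String s) { return s.split("[\\s+]"); }

    // \s+ : a run of one or more whitespace characters
    static String[] splitRun(String s) { return s.split("\\s+"); }

    public static void main(String[] args) {
        String s = "01:  Hi you+look";
        System.out.println(Arrays.toString(splitClass(s))); // [01:, , Hi, you, look]
        System.out.println(Arrays.toString(splitRun(s)));   // [01:, Hi, you+look]
    }
}
```

The empty element between the two spaces is exactly the "space as a substring" from the question, and the character class also splits on +.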
Your biggest problem is not understanding enough about regular expressions to write them properly. One key point you don't comprehend is that [...] is a character class, which is a list of characters any one of which can match. For example:
[abc] matches either a, b or c (it does not match "abc")
[\\s+] matches any whitespace or "+" character
[with] matches a single character that is either w, i, t or h
[.$&^?] matches those literal characters - most characters lose their special regex meaning when in a character class
To split on any number of whitespace, comma and ampersand and consume "with" (if it appears), do this:
String [] s2 = s1.split("[\\s,&]+(with[\\s,&]+)?");
You can easily try it in an online regex tester, which also explains each part of the pattern.
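A runnable sketch of that split on the string from the question (the class and method names are illustrative). The optional group only consumes "with" when a separator follows it, so a token such as "without" would not be cut.

```java
import java.util.Arrays;

class ConditionSplitDemo {
    // A run of whitespace, ',' or '&', optionally followed by "with" and another run.
    static String[] tokens(String s) {
        return s.split("[\\s,&]+(with[\\s,&]+)?");
    }

    public static void main(String[] args) {
        String s1 = "01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
        // Prints one token per line: 01:IF, rd.h, dq.L, o.LL, v.L, THEN, la.VHB, av.VHR, 0.4610;
        for (String t : tokens(s1)) {
            System.out.println(t);
        }
    }
}
```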

Regular expression for splitting a String while preserving whitespace

I am doing an Android project that needs to split a String into tokens while preserving whitespace, and without splitting at non-word characters like #, & etc.
Using \b splits at every non-word character, so I need a way to split the string as follows.
Input: (. indicates whitespace)
A.A#..A##
Desired output:
A
.
A#
..
A##
So these 5 lines are the 5 values I would like in an array or similar. That means the 4th element of the result-array contains 2 spaces.
I think this is what you want:
(?<=\S)(?=\s)|(?<=\s)(?=\S)
Basically I'm saying "if the previous character is a non-space and the next is a space or if the previous is a space and the next is a non-space, then split".
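A quick way to check the lookaround regex from Java, writing the question's dots as real spaces. java.util.regex supports single-character lookbehinds such as (?<=\S); the class name is illustrative.

```java
import java.util.Arrays;

class WhitespaceTokenDemo {
    // Zero-width split points: non-space followed by space, or space followed by non-space.
    static String[] tokens(String s) {
        return s.split("(?<=\\S)(?=\\s)|(?<=\\s)(?=\\S)");
    }

    public static void main(String[] args) {
        // The question's "A.A#..A##" with each "." written as an actual space.
        for (String p : tokens("A A#  A##")) {
            System.out.println("[" + p + "]"); // brackets make the whitespace tokens visible
        }
        // Prints [A], [ ], [A#], [  ], [A##]
    }
}
```

Because the split points are zero-width, no characters are consumed, so the whitespace runs survive as their own elements.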
Use StringTokenizer:
StringTokenizer st = new StringTokenizer("A.A#..A##", "."); // first argument is the string to split, second is the set of delimiter characters (here "." stands for whitespace)
while(st.hasMoreTokens())
System.out.println(st.nextToken());
The output will be (the delimiters themselves are not returned, so the whitespace is dropped):
A
A#
A##
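Try:
String s = "A.A#..A##";
// Strings are immutable: replace methods return a new String, so reassign the result.
s = s.replaceAll("\\.+", "."); // collapse each run of "." into a single "."
String[] out = s.split("\\."); // escape the dot; an unescaped "." matches any character
This gives you an array with the non-whitespace tokens (A, A#, A##); note that the whitespace runs themselves are not kept as separate elements.
Don't forget to replace the "." with actual spaces :)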

Java and Regex to add some linebreaks

So I have the following string, and I want to insert a \n before each number:
1. Hello 2. Satuday 3.Kidding 4. sdsfjdfkj
I want to replace it to look like this
1. Hello
2. Satuday
3.Kidding
4. sdsfjdfkj
I was thinking of something like this:
variable.replaceAll("\\d.", "\n");
but I'm not sure how to keep the text I matched when replacing it.
You can use replaceAll with a zero-width lookahead, like this:
String res = str.replaceAll("\\b(?=\\d+[.])", "\n");
Given your string as input, it prints the following (preceded by an empty line, because \b also matches at the very start of the string):
1. Hello
2. Satuday
3.Kidding
4. sdsfjdfkj
So basically you want to replace every run of whitespace that is followed by a number and a dot with a newline. Try:
variable = variable.replaceAll("\\s+(\\d+[.])", "\n$1");
// $1 is reference to captured group 1 which will contain number and dot
or
variable = variable.replaceAll("\\s+(?=\\d+[.])", "\n");
// (?=...) is called look-ahead, \\s+(?=\\d+[.]) makes sure that after matched
// whitespace there will be number and dot
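A runnable sketch of both variants from this answer on the question's string. The class and method names are illustrative; both produce the same result, and since the string does not start with whitespace, no leading newline is added.

```java
class LineBreakDemo {
    // Capturing-group version: the matched "<digits>." is put back via $1.
    static String withGroup(String s) {
        return s.replaceAll("\\s+(\\d+[.])", "\n$1");
    }

    // Lookahead version: only the whitespace is matched and replaced.
    static String withLookahead(String s) {
        return s.replaceAll("\\s+(?=\\d+[.])", "\n");
    }

    public static void main(String[] args) {
        String str = "1. Hello 2. Satuday 3.Kidding 4. sdsfjdfkj";
        System.out.println(withLookahead(str));
        // 1. Hello
        // 2. Satuday
        // 3.Kidding
        // 4. sdsfjdfkj
        System.out.println(withGroup(str).equals(withLookahead(str))); // true
    }
}
```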
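A somewhat slow, but easy fix:
string = string.replace("1", "\n1"); // "\n" is the escape sequence for a newline
then repeat for all numbers (note that this also breaks up multi-digit numbers such as 10, and inserts a newline before the first item).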
