String split by dot - Java - java

I have the following code:
public static void main(String[] args) {
String str = "21.12.2015";
String delim = "\\.";
String[] st = str.split(delim);
System.out.println(st[0]+"."+st[1]+"."+st[2]); // 1
System.out.println(st[0]+delim+st[1]+delim+st[2]); // 2
}
Line 1 prints the expected output, 21.12.2015. But why doesn't line 2 give the same output as line 1? Why does it print 21\.12\.2015?
EDIT:
Actually, in my requirement the delimiter changes dynamically for each string (-, / or .). So I am trying to assign the delimiter to a variable, split by it, and finally print the parts as a pattern (say dd.mm.yy or dd-mm-yy). For other delimiters it works, but for the dot the output looks like dd\.mm\.yy. How can I get the expected result?

This handles all delim values:
String str = "21.12.2015";
String delim = "."; // or "-" or "?" or ...
String[] st = str.split(java.util.regex.Pattern.quote(delim));
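For the dynamic-delimiter case from the question, a minimal sketch could look like the following. The class and method names are made up for illustration. The idea is to keep the delimiter as a plain string, quote it only at the moment it is handed to split, and reuse the plain string for output.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, not taken from the question.
class DynamicDelimiterDemo {

    // Splits text on delim, treating delim as literal text rather than as a regex.
    static String[] splitLiteral(String text, String delim) {
        return text.split(Pattern.quote(delim));
    }

    public static void main(String[] args) {
        String[] inputs = {"21.12.2015", "21-12-2015", "21/12/2015"};
        String[] delims = {".", "-", "/"};
        for (int i = 0; i < inputs.length; i++) {
            String[] st = splitLiteral(inputs[i], delims[i]);
            // The unquoted delimiter is reused for output, so no backslash appears.
            System.out.println(st[0] + delims[i] + st[1] + delims[i] + st[2]);
        }
    }
}
```

Because the variable never contains a backslash, printing it gives dd.mm.yy, dd-mm-yy or dd/mm/yy as expected.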

When you call split, delim is used as a regex pattern, so it is interpreted differently: there, \. means a literal dot.
But when you use delim in System.out.println, it is used as a plain string, so the backslash is printed as is. The difference is obvious.

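When you create the delim variable, you escape the backslash, so the real value of the delim variable is \. (a backslash followed by a dot).
For printing, just create the delim variable like this (the backslash is useless there; it is only needed for the regex passed to split):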
String delim = ".";

Because delim = "\\." holds the two characters \. and that escaped form is required only while splitting.

You are using the split method of the String class, which uses a regular expression to split the string.
Because of this, the dot has to be escaped as \\. to split the string at every dot, since an unescaped dot is itself a regex metacharacter.
In the second part you are simply printing the string. In a Java string literal the backslash introduces an escape sequence (like \n for a new line).
The double backslash is the escape sequence for one literal backslash, and that's why you get the "\." result.
For better understanding, try deleting one of the backslashes in the delim variable: the Java compiler will report an error, since "\." is not a valid escape sequence.

\\. is a regex String to parse . literally. You need it while splitting (since split() expects a regex String).
While printing, you need to use . directly instead of "\\." because println() doesn't take a regex.

The split method uses a regex for splitting, so you need to provide \\. there. That is not the case when you are printing; there you just use '.' directly.
In Java, "\\." is printed as \. because \\ in a string literal stands for a single backslash.
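To see what the variable really contains, here is a small sketch (the class and method names are invented for this example). It prints the length and contents of the string written as "\\." in source code.

```java
class BackslashDemo {

    // The delimiter exactly as written in the question's source code.
    static String questionDelim() {
        return "\\.";
    }

    public static void main(String[] args) {
        String delim = questionDelim();
        System.out.println(delim.length());              // 2: a backslash and a dot
        System.out.println(delim);                       // \.
        System.out.println("21" + delim + "12");         // 21\.12
        System.out.println("21.12".split(delim).length); // 2: the regex \. matches the literal dot
    }
}
```

So the same two-character string is a valid regex for split, but shows up verbatim when concatenated into output.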

Related

Java - Splitting String [duplicate]

I am wondering if I am going about splitting a string on a . the right way? My code is:
String[] fn = filename.split(".");
return fn[0];
I only need the first part of the string, that's why I return the first item. I ask because I noticed in the API that . means any character, so now I'm stuck.
split() accepts a regular expression, so you need to escape . to not consider it as a regex meta character. Here's an example :
String[] fn = filename.split("\\.");
return fn[0];
I see only solutions here but no full explanation of the problem so I decided to post this answer
Problem
You need to know a few things about text.split(delim). The split method:
accepts as its argument a regular expression (regex) that describes the delimiter on which we want to split,
if delim occurs at the end of text, as in a,b,c,, (where the delimiter is ,), split will at first create an array like ["a" "b" "c" "" ""], but since in most cases we don't need these trailing empty strings, it also removes them automatically. So it creates another array without the trailing empty strings and returns it.
You also need to know that the dot . is a special character in regex. It represents any character (except line separators, but this can be changed with the Pattern.DOTALL flag).
So for a string like "abc", if we split on ".", the split method will
create an array like ["" "" "" ""],
but since this array contains only empty strings and they are all trailing, they will be removed (as described in the second point above),
which means we will get an empty array [] as the result (with no elements, not even an empty string), so we can't use fn[0] because there is no index 0.
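The behaviour described above can be checked with a short sketch like this one (the class name and sample strings are only for illustration):

```java
import java.util.Arrays;

class TrailingEmptyDemo {

    static String[] splitOn(String text, String regex) {
        return text.split(regex);
    }

    public static void main(String[] args) {
        // Trailing empty strings are dropped.
        System.out.println(Arrays.toString(splitOn("a,b,c,,", ",")));   // [a, b, c]
        // Every character matches ".", so only empty strings remain and all are dropped.
        System.out.println(splitOn("abc", ".").length);                 // 0
        // Escaping the dot makes it match only a literal dot.
        System.out.println(Arrays.toString(splitOn("abc.txt", "\\."))); // [abc, txt]
    }
}
```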
Solution
To solve this problem you simply need to create a regex that represents a literal dot. To do so we need to escape that dot. There are a few ways to do it, but the simplest is probably to use \ (which in a String literal needs to be written as "\\", because \ is also special there and requires another \ to escape it).
So the solution to your problem may look like
String[] fn = filename.split("\\.");
Bonus
You can also use other ways to escape that dot like
using character class split("[.]")
wrapping it in quote split("\\Q.\\E")
using proper Pattern instance with Pattern.LITERAL flag
or simply use split(Pattern.quote(".")) and let regex do escaping for you.
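As a rough comparison, the options listed above can be put side by side. This is only a sketch with a made-up class name and sample file name; each variant should yield the same two parts.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class EscapeOptionsDemo {

    // Returns the result of each escaping style applied to the same input.
    static String[][] allWays(String text) {
        return new String[][] {
            text.split("\\."),                                 // backslash escape
            text.split("[.]"),                                 // character class
            text.split("\\Q.\\E"),                             // \Q...\E quoting
            Pattern.compile(".", Pattern.LITERAL).split(text), // LITERAL flag
            text.split(Pattern.quote("."))                     // Pattern.quote
        };
    }

    public static void main(String[] args) {
        for (String[] parts : allWays("report.txt")) {
            System.out.println(Arrays.toString(parts)); // [report, txt] each time
        }
    }
}
```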
Split uses regular expressions, where '.' is a special character meaning anything. You need to escape it if you actually want it to match the '.' character:
String[] fn = filename.split("\\.");
(one '\' to escape the '.' in the regular expression, and the other to escape the first one in the Java string)
Also I wouldn't suggest returning fn[0], since if you have a file named something.blabla.txt, which is a valid name, you won't be returning the actual file name. Instead I think it's better to use:
int idx = filename.lastIndexOf('.');
return filename.substring(0, idx);
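One thing to watch with the lastIndexOf approach: it returns -1 when the name has no dot, and substring(0, -1) throws. A guarded sketch, with a hypothetical method name, might be:

```java
class BaseNameDemo {

    // Returns the name without its last extension, or the name itself if there is no dot.
    static String stripExtension(String filename) {
        int idx = filename.lastIndexOf('.');
        return idx < 0 ? filename : filename.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(stripExtension("something.blabla.txt")); // something.blabla
        System.out.println(stripExtension("README"));               // README
    }
}
```

Note that with this sketch a name that starts with its only dot, such as ".bashrc", comes back as an empty string, so that case may need its own handling.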
the String#split(String) method uses regular expressions.
In regular expressions, the "." character means "any character".
You can avoid this behavior by either escaping the "."
filename.split("\\.");
or telling the split method to split at a character class:
filename.split("[.]");
Character classes are collections of characters. You could write
filename.split("[-.;ld7]");
and filename would be split at every "-", ".", ";", "l", "d" or "7". Inside character classes, the "." is not a special character ("metacharacter").
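A small sketch of the character-class approach, assuming a made-up input that mixes several separators:

```java
import java.util.Arrays;

class CharacterClassDemo {

    static String[] splitOnPunctuation(String text) {
        // Inside [...], '.' is literal. The '-' is placed first so it is not read as a range.
        return text.split("[-.;]");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnPunctuation("2015-12.21;x"))); // [2015, 12, 21, x]
    }
}
```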
As DOT( . ) is considered as a special character and split method of String expects a regular expression you need to do like this -
String[] fn = filename.split("\\.");
return fn[0];
In a regex, special characters need to be escaped with a "\", but since "\" is also a special character in Java string literals, you need to escape it again with another "\"!
String str="1.2.3";
String[] cats = str.split(Pattern.quote("."));
Wouldn't it be more efficient to use
filename.substring(0, filename.indexOf("."))
if you only want what's up to the first dot?
Usually it's NOT a good idea to escape it by hand. There is a method in the Pattern class for this task:
java.util.regex
static String quote(String s)
The split method takes a regex as its argument. Simply change "." to "\\."
The solution that worked for me is the following
String[] fn = filename.split("[.]");
Note: Further care should be taken with this snippet, even after the dot is escaped!
If filename is just the string ".", then fn will still end up with length 0 and fn[0] will still throw an exception!
This is because, if the pattern matches at least once, split will discard all trailing empty strings (thus also the one before the dot!) from the array, leaving an empty array to be returned.
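One way to guard against that edge case is to check the array length before indexing. The helper below is only a sketch; its name and the choice to return an empty string are assumptions, not something from the answers above.

```java
class FirstPartDemo {

    // Returns the text before the first dot, or "" when split yields no elements.
    static String firstPart(String filename) {
        String[] parts = filename.split("\\.");
        return parts.length > 0 ? parts[0] : "";
    }

    public static void main(String[] args) {
        System.out.println(".".split("\\.").length);   // 0: both empty strings are trailing
        System.out.println(firstPart("a.jpg"));        // a
        System.out.println(firstPart(".").isEmpty());  // true, and no exception
    }
}
```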
Using ApacheCommons it's simplest:
File file = ...
FilenameUtils.getBaseName(file.getName());
Note, it also extracts a filename from full path.
split takes a regex as its argument. So you should pass "\\." instead of "." because "." is a metacharacter in regex.

How to split String with this regular expression?

if (url.contains("|##|")) {
Log.e("url data", "" + url);
final String s[] = url.split("\\|##|");
}
I have a URL with the separator "|##|"
I tried to separate this but didn't find solution.
Use Pattern.quote, it'll do the work for you:
Returns a literal pattern String for the specified String.
final String s[] = url.split(Pattern.quote("|##|"));
Now "|##|" is treated as the string literal "|##|" and not as the regex "|##|". The problem is that you're not escaping the second pipe, which has a special meaning in regex.
An alternative solution (as suggested by @kocko) is escaping* the special characters manually:
final String s[] = url.split("\\|##\\|");
* Escaping a special character is done by \, but in Java \ is represented as \\
You have to escape the second |, as it is a regex operator:
final String s[] = url.split("\\|##\\|");
You should try to understand the concept as well: String.split(String regex) interprets the parameter as a regular expression, and since the pipe character "|" is a logical OR in a regular expression, the unescaped trailing pipe adds an empty alternative that matches between characters, so you get an array with the characters of your string as separate elements.
Even if you had used url.split("|"); you would have got much the same result.
Why did String.contains(CharSequence s) find |##| in the first place? Because it interprets the parameter as a CharSequence and not as a regular expression.
Bottom line: check in the API how a particular method interprets its input. As we have seen, split() interprets it as a regular expression, while contains() interprets it as a character sequence.
You can check the regular expression constructs over here - http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
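To make the contrast between contains() and split() concrete, here is a hedged sketch. The URL string and the class name are invented for the example; only the "|##|" separator comes from the question.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PipeSeparatorDemo {

    static final String SEPARATOR = "|##|";

    // Splits on the separator as literal text.
    static String[] splitOnSeparator(String url) {
        return url.split(Pattern.quote(SEPARATOR));
    }

    public static void main(String[] args) {
        String url = "http://a|##|http://b"; // made-up sample data

        // contains() takes a plain character sequence, so no escaping is needed.
        System.out.println(url.contains(SEPARATOR));

        // Quoted separator: two parts.
        System.out.println(Arrays.toString(splitOnSeparator(url))); // [http://a, http://b]

        // Half-escaped regex from the question: the trailing | adds an empty
        // alternative, so the string is cut into many small pieces.
        System.out.println(url.split("\\|##|").length);
    }
}
```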

Why split method does not support $,* etc delimiter to split string

import java.util.StringTokenizer;
class MySplit
{
public static void main(String S[])
{
String settings = "12312$12121";
StringTokenizer splitedArray = new StringTokenizer(settings,"$");
String splitedArray1[] = settings.split("$");
System.out.println(splitedArray1[0]);
while(splitedArray.hasMoreElements())
System.out.println(splitedArray.nextToken().toString());
}
}
In the above example, splitting the string using $ does not work, while splitting with other symbols works fine.
Why is that? If split only supports regular expressions, why does it work fine for symbols like :, , and ;?
$ has a special meaning in regex, and since String#split takes a regex as an argument, the $ is not interpreted as the string "$", but as the special meta character $. One sexy solution is:
settings.split(Pattern.quote("$"))
Pattern#quote:
Returns a literal pattern String for the specified String.
... The other solution would be escaping $, by adding \\:
settings.split("\\$")
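Important note: It's extremely important to check that you actually got element(s) in the resulting array.
When you do splitedArray1[0], you could get ArrayIndexOutOfBoundsException if the string consists only of $ symbols (for example "$"), because split then returns an empty array. If there is no $ at all, split returns a one-element array containing the whole string. I would add: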
if (splitedArray1.length == 0) {
// return or do whatever you want
// except accessing the array
}
If you take a look at the Java docs, you can see that the split method takes a regex as its parameter, so you have to write a regular expression, not a simple character.
In regex $ has a specific meaning, so you have to escape it this way:
settings.split("\\$");
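This also answers why :, , and ; seem to "just work": they have no special meaning in a regex, so they match themselves. A short sketch using the string from the question (the class and method names are made up):

```java
import java.util.Arrays;

class DollarDemo {

    static String[] splitUnescaped(String s) {
        return s.split("$");   // $ is the end-of-input anchor here
    }

    static String[] splitEscaped(String s) {
        return s.split("\\$"); // matches a literal dollar sign
    }

    public static void main(String[] args) {
        String settings = "12312$12121";
        System.out.println(Arrays.toString(splitUnescaped(settings)));     // [12312$12121]
        System.out.println(Arrays.toString(splitEscaped(settings)));       // [12312, 12121]
        // A colon is not a metacharacter, so it needs no escaping.
        System.out.println(Arrays.toString("12312:12121".split(":")));     // [12312, 12121]
    }
}
```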
The problem is that the split(String str) method expects str to be a valid regular expression. The characters you have mentioned are special characters in regular expression syntax and thus perform a special operation.
To make the regular expression engine take them literally, you would need to escape them like so:
.split("\\$")
Thus given this:
String str = "This is 1st string.$This is the second string";
for(String string : str.split("\\$"))
System.out.println(string);
You end up with this:
This is 1st string.
This is the second string
Dollar symbol $ is a special character in Java regex. You have to escape it so as to get it working like this:
settings.split("\\$");
From the String.split docs:
Splits this string around matches of the given regular expression.
This method works as if by invoking the two-argument split method with
the given expression and a limit argument of zero. Trailing empty
strings are therefore not included in the resulting array.
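If the trailing empty strings matter, the two-argument split mentioned in the docs can be called with a negative limit, which keeps them. A minimal sketch with made-up sample data:

```java
import java.util.Arrays;

class LimitDemo {

    static String[] splitKeepingEmpties(String text, String regex) {
        // A negative limit applies the pattern as often as possible and keeps trailing empty strings.
        return text.split(regex, -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("a,b,,".split(",")));                 // [a, b]
        System.out.println(Arrays.toString(splitKeepingEmpties("a,b,,", ",")));  // [a, b, , ]
    }
}
```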
On a side note:
Have a look at the Pattern class which will give you an idea as to which all characters you need to escape.
Because $ is a special character in regular expressions: it marks the end of the input (or of a line), not a literal dollar sign.
You should escape it using the escape sequence \$, and in a Java string literal it should be written as \\$
Hope that helps.
Cheers

Splitting a string with java not working

I have a string in java that looks something like:
holdingco^(218) 333-4444^scott#holdingco.com
I set a string variable equal to it:
String value = "holdingco^(218) 333-4444^scott#holdingco.com";
Then I want to split this string into its components:
String[] components = value.split("^");
However, it does not split up the string. I have tried escaping the caret delimiter to no avail.
Use
String[] components = value.split("\\^");
The unescaped ^ means beginning of a string in a regex, and the unescaped $ means end. You have to use two backslashes for escaping, as the string literal "\\" represents a single backslash, and that's what regex needs.
If you tried escaping with one backslash, it didn't compile, as \^ is not a valid escape sequence in Java.
try with: value.split("\\^"); this should work a bit better
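Using the string from the question, the difference can be sketched like this (the class and method names are hypothetical):

```java
class CaretDemo {

    static String[] splitOnCaret(String value) {
        return value.split("\\^"); // the regex \^ matches a literal caret
    }

    public static void main(String[] args) {
        String value = "holdingco^(218) 333-4444^scott#holdingco.com";

        // Unescaped ^ only anchors at the start of the input, so the carets are not used as delimiters.
        System.out.println(value.split("^").length);

        for (String component : splitOnCaret(value)) {
            System.out.println(component);
        }
    }
}
```

The escaped version should print the three components on separate lines.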

Is it possible to split a String around "." in java?

When I try to split a String around occurrences of "." the method split returns an array of strings with length 0. When I split around occurrences of "a" it works fine. Does anyone know why? Is split not supposed to work with punctuation marks?
split takes regex. Try split("\\.").
String a = "a.jpg";
String str = a.split(".")[0];
This will throw ArrayIndexOutOfBoundsException because split accepts a regex argument and "." is a reserved character in regular expressions, representing any character.
Instead, we should use the following statement:
String str = a.split("\\.")[0]; //Yes, two backslashes
When the code is compiled, the regular expression is known as "\.", which is what we want it to be
Here is the link of my old blog post in case you are interested: http://junxian-huang.blogspot.com/2009/01/java-tip-how-to-split-string-with-dot.html
