Which characters are not accepted by the split method? - java

I have the following code:
String s = "100$ali$Rezaie" ;
String[] ar = s.split("$") ;
The following characters do not work in split:
. $ ^
Are there any other characters that will not be accepted in split() method?

The argument to split is a regular expression, not a single character. The page at http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html lists all the characters which have a special meaning in regular expressions.

http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#split%28java.lang.String%29
As the docs say, split takes a regex as its argument. Characters such as ., $ and ^ have special meanings in regexes.
That doesn't mean you can't split strings on those characters. You can simply escape them in the regex to make them behave "ordinarily".
String[] ar = s.split("\\$");
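A minimal sketch contrasting the unescaped and escaped delimiter on the question's own input. The class and method names are made up for illustration. Unescaped, "$" is the end-of-input anchor, so the only match is an empty one at the very end and the string is not split at all.

```java
import java.util.Arrays;

class DollarSplit {
    // "$" is the end-of-input anchor: the only match is zero-width at the end,
    // so the input comes back as a single element.
    static String[] unescaped(String s) {
        return s.split("$");
    }

    // The Java literal "\\$" is the regex \$, which matches a literal dollar sign.
    static String[] escaped(String s) {
        return s.split("\\$");
    }

    public static void main(String[] args) {
        String s = "100$ali$Rezaie";
        System.out.println(Arrays.toString(unescaped(s))); // [100$ali$Rezaie]
        System.out.println(Arrays.toString(escaped(s)));   // [100, ali, Rezaie]
    }
}
```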

Use it like this:
String[] parts = str.split("\\$");
The \\ is really equivalent to a single \ (the first \ is required as a Java escape sequence in string literals). That single \ is then a special character in regular expressions which means "use the next character literally, don't interpret its special meaning".
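To make the two escaping layers visible, here is a small sketch (the class and method names are only illustrative) showing that the source literal "\\$" holds just two characters at runtime, and that those two characters are what the regex engine receives.

```java
class EscapeDemo {
    // The Java literal "\\$" compiles to two characters: \ and $.
    // The regex engine then reads \$ as a literal dollar sign.
    static String dollarRegex() {
        return "\\$";
    }

    public static void main(String[] args) {
        String regex = dollarRegex();
        System.out.println(regex);          // \$
        System.out.println(regex.length()); // 2
        System.out.println(String.join(",", "100$ali$Rezaie".split(regex))); // 100,ali,Rezaie
    }
}
```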

In Java, split takes a regular expression.
Read about regex metacharacters here. They are:
the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {.
For the most part you can escape these characters with a backslash. In Java you need to escape that backslash with a second backslash, so to escape a metacharacter you write \\ in front of it.
So in your example:
String[] ar = s.split("\\$") ;
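The rule above (prefix each of the 12 metacharacters with a backslash) can be sketched as a tiny helper. `escapeMeta` is a hypothetical name, not a JDK method; in real code `Pattern.quote` does this job for you.

```java
import java.util.Arrays;

class MetaEscape {
    // The 12 metacharacters listed above, written as a Java literal:
    // \ ^ $ . | ? * + ( ) [ {
    private static final String META = "\\^$.|?*+()[{";

    // Hypothetical helper: prefix every metacharacter with a backslash so the
    // regex engine treats it as an ordinary character.
    static String escapeMeta(String literal) {
        StringBuilder sb = new StringBuilder();
        for (char c : literal.toCharArray()) {
            if (META.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "100$ali$Rezaie";
        System.out.println(escapeMeta("$"));                            // \$
        System.out.println(Arrays.toString(s.split(escapeMeta("$")))); // [100, ali, Rezaie]
    }
}
```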

The split() method accepts a regular expression as its input, so any character that is special in a regex is also special in split().
Here is the Java regex tutorial: http://docs.oracle.com/javase/tutorial/essential/regex/

The Pattern class documents the regular expression syntax used in Java. Any character listed there with a special meaning has to be escaped if you want it treated as a regular character.

We can use "\\" (a double backslash) as a prefix to split the string:
String s = "100$ali$Rezaie" ;
String[] ar = s.split("\\$") ;
for (String str : ar) {
    System.out.println(str);
}

Related

Why an empty array gets printed when trying to split a string using split(".") [duplicate]

This question already has answers here:
How can I use "." as the delimiter with String.split() in java [duplicate]
I'm experimenting with the String class's instance method split(). However, when I use . as the argument to split() and print the result with Arrays.toString(s2), only an empty array is printed.
Why is this happening?
String s1 = "hello.world";
String[] s2 = s1.split(".");
System.out.println(Arrays.toString(s2));
This "weird" behavior happens because you are not escaping the dot in the regex; you need s1.split("\\.") rather than s1.split(".").
Example:
import java.util.Arrays;

public class Main {
    public static void main(String[] args) {
        String s1 = "hello.world";
        String[] s2 = s1.split("\\.");
        System.out.println(Arrays.toString(s2)); // [hello, world]
    }
}
For reference, the dot is one of the regex metacharacters, and whenever you want to match a metacharacter literally it must be escaped.
Take a look at this tutorial for more info:
There are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {. These special characters are often called "metacharacters".
As per the API, the split method takes a regular expression as its argument. In regular expression syntax the period means "any character", so every character becomes a delimiter and every resulting substring is empty. split() discards trailing empty strings, which is why you end up with an empty array.
To fix this, escape the period like so: .split("\\."). The backslash (written twice in the Java literal) tells the regex engine to treat the period as an actual character, devoid of any special meaning.
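The explanation above can be checked directly with the two-argument `split(regex, limit)`: a negative limit keeps trailing empty strings, which exposes the pieces that the default limit of 0 throws away. A minimal sketch, class name illustrative:

```java
import java.util.Arrays;

class DotSplit {
    public static void main(String[] args) {
        String s1 = "hello.world"; // 11 characters, each one matched by "."
        // Negative limit: trailing empty strings are kept, so we see all
        // 12 empty pieces around the 11 single-character matches.
        System.out.println(s1.split(".", -1).length);         // 12
        // Default limit 0: trailing empty strings are removed, and since
        // every piece is empty, nothing is left.
        System.out.println(s1.split(".").length);             // 0
        // Escaped dot: only the literal period is a delimiter.
        System.out.println(Arrays.toString(s1.split("\\."))); // [hello, world]
    }
}
```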
You can escape special characters using either of these methods:
1. String test = "abc.xyz";
   String[] output1 = test.split("\\.");
2. String test = "abc.xyz";
   String[] output2 = test.split(Pattern.quote("."));
You can refer to "split a string" for more.
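Both methods in one runnable sketch. The class and method names are illustrative; `Pattern.quote` is the real JDK method and wraps its argument in `\Q...\E`, which makes it convenient when the delimiter comes from a variable.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuoteVsEscape {
    // Method 1: escape the dot by hand ("\\." in source is the regex \.).
    static String[] escaped(String s) {
        return s.split("\\.");
    }

    // Method 2: let Pattern.quote turn any delimiter into a literal pattern.
    static String[] quoted(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String test = "abc.xyz";
        System.out.println(Pattern.quote("."));                 // \Q.\E
        System.out.println(Arrays.toString(escaped(test)));     // [abc, xyz]
        System.out.println(Arrays.toString(quoted(test, "."))); // [abc, xyz]
    }
}
```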
The . is a special character in regex, and split accepts a regex as its parameter, so you should escape it.
Try:
String[] s2 = s1.split("\\.");
There are 12 characters with special meanings: the backslash \, the caret ^, the dollar sign $, the period or dot ., the vertical bar or pipe symbol |, the question mark ?, the asterisk or star *, the plus sign +, the opening parenthesis (, the closing parenthesis ), the opening square bracket [, and the opening curly brace {. These special characters are often called "metacharacters".
If you want to use any of these characters as a literal in a regex, you need to escape them with a backslash. In literal Java strings the backslash is an escape character. The literal string "\\" is a single backslash.

RegEx special char "|" escaping in Java

I am trying to split a string like: abc|aa||
When I use the regular string.split I am required to provide a regular expression.
I tried the following:
string.split("|")
string.split("\|")
string.split("/|")
string.split("\Q|\E")
None of them work.
Does anyone know how to make it work?
I don't know exactly what you tried, but
import java.util.Arrays;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String a = "abc|aa||";
        String split = Pattern.quote("|");
        System.out.println(split);
        System.out.println(Arrays.toString(a.split(split)));
    }
}
prints out
\Q|\E
[abc, aa]
effectively splitting on |. The \Q ... \E is a regex quote. Anything inside it will be matched as a literal pattern.
string.split("\|");  // won't compile: \| is not a valid escape sequence in a Java string literal
string.split("/|");  // compiles, but splits on "/" or on the empty string, i.e. between every character
string.split("|");   // compiles, but splits on the empty string, i.e. between every character
// true alternative to the quoted solution above
string.split("\\|"); // Java turns "\\|" into \|, which the regex engine reads as a literal |
Using a double backslash is required because the backslash is also a special character in Java string literals, so you need to escape the escape character, i.e. \\|
| is a special character, hence you need to escape it with backslashes. Try using
string.split("\\|")
| is a special character for the regular expression, thus it must be escaped e.g. \|
The backslash \ is a special character in Java, thus it must also be escaped
As a result, you must do the following to achieve the desired effect:
string.split("\\|")
All of the following patterns split it correctly: "\\Q|\\E", "\\|" and "[|]". Of course, the latter two are preferable.
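A short sketch of the two preferred patterns on the question's input (class and method names are illustrative). Inside a character class, | has no special meaning, which is why "[|]" needs no backslash at all.

```java
import java.util.Arrays;

class PipeSplit {
    // Regex \| : an escaped, literal pipe.
    static String[] splitEscaped(String s) {
        return s.split("\\|");
    }

    // Regex [|] : a one-character class; | is not special inside [...].
    static String[] splitCharClass(String s) {
        return s.split("[|]");
    }

    public static void main(String[] args) {
        String s = "abc|aa||";
        // The two trailing empty strings are dropped by split's default limit.
        System.out.println(Arrays.toString(splitEscaped(s)));   // [abc, aa]
        System.out.println(Arrays.toString(splitCharClass(s))); // [abc, aa]
    }
}
```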

Java split on ^ (caret?) not working, is this a special character?

In Java, I am trying to split on the ^ character, but it is not recognized. Escaping it as \^ causes a compile error.
Is this a special character or do I need to do something else to get it to recognize it?
String splitChr = "^";
String[] fmgStrng = aryToSplit.split(splitChr);
The ^ is a special character in Java regex - it means "match the beginning" of an input.
You will need to escape it with "\\^". The double backslash is needed to escape the \, otherwise Java's compiler will think you're attempting to use a special \^ escape sequence in a string, similar to \n for newlines.
\^ is not a valid escape sequence, though, so you will get a compiler error.
In short, use "\\^".
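A minimal sketch (names illustrative) of what happens with and without the escape. Unescaped, ^ only matches the empty start of the input, so the string comes back unsplit rather than throwing an error.

```java
import java.util.Arrays;

class CaretSplit {
    // "^" is the start-of-input anchor: its only match is the empty position 0,
    // so the input is returned as a single element.
    static String[] unescaped(String s) {
        return s.split("^");
    }

    // "\\^" is the regex \^, a literal caret.
    static String[] escaped(String s) {
        return s.split("\\^");
    }

    public static void main(String[] args) {
        String aryToSplit = "word1^word2";
        System.out.println(Arrays.toString(unescaped(aryToSplit))); // [word1^word2]
        System.out.println(Arrays.toString(escaped(aryToSplit)));   // [word1, word2]
    }
}
```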
The ^ matches the start of the string. You need to escape it, and the escape has to reach the regular expression parser, which means escaping the escape as well:
String splitChr = "\\^";
...
should get you what you want.
String.split() accepts a regex. The ^ sign is a special symbol denoting the beginning of the input sequence. You need to escape it to make it work. You were right to try escaping it with \, but \ is also the escape character in Java strings, so you need to escape the escape character with another \. That gives you:
\\^
Use "\\^". Use this example as a guide:
String aryToSplit = "word1^word2";
String splitChr = "\\^";
String[] fmgStrng = aryToSplit.split(splitChr);
System.out.println(fmgStrng[0]+","+fmgStrng[1]);
It should print "word1,word2", effectively splitting the string using "\\^". The first backslash escapes the second one. With only a single backslash, Java would treat \^ as a string escape sequence, like "\n", and since \^ is not a valid one, the code would not compile.
None of the above answers make sense to me. Here is the right explanation.
As we all know, ^ doesn't need to be escaped in a Java string.
Because ^ is a special character in regular expressions, the regex engine expects you to pass in \^.
How do we make the string \^ in Java? Like this: String splitstr = "\\^";

Regular Expression for matching parentheses

What is the regular expression for matching '(' in a string?
Here is the scenario:
I have a string
str = "abc(efg)";
I want to split the string at '(' using a regular expression. For that I am using
Arrays.asList(Pattern.compile("/(").split(str))
But I am getting the following exception:
java.util.regex.PatternSyntaxException: Unclosed group near index 2
/(
Escaping '(' doesn't seem to work.
Two options:
Firstly, you can escape it using a backslash -- \(
Alternatively, since it's a single character, you can put it in a character class, where it doesn't need to be escaped -- [(]
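Both options in a small sketch on the question's input, alongside the unescaped "(" that triggers the PatternSyntaxException from the question. Class and method names are illustrative.

```java
import java.util.Arrays;
import java.util.regex.PatternSyntaxException;

class ParenSplit {
    static String[] escaped(String s) {
        return s.split("\\(");  // regex \( : a literal opening parenthesis
    }

    static String[] charClass(String s) {
        return s.split("[(]");  // inside [...] the ( needs no escape
    }

    public static void main(String[] args) {
        String str = "abc(efg)";
        System.out.println(Arrays.toString(escaped(str)));   // [abc, efg)]
        System.out.println(Arrays.toString(charClass(str))); // [abc, efg)]
        try {
            str.split("("); // an unescaped ( opens a group that is never closed
        } catch (PatternSyntaxException e) {
            System.out.println("rejected: " + e.getDescription());
        }
    }
}
```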
One solution is a regex pattern that matches both opening and closing parentheses:
String str = "Your(String)";
// The argument to split is a character class "[...]" that matches either parenthesis;
// each parenthesis is escaped with "\\" -> "[\\(\\)]"
String[] parts = str.split("[\\(\\)]");
for (String part : parts) {
    // Prints "Your" on the first iteration and "String" on the second
    System.out.println(part);
}
In Java 8 style, this can be written as:
Arrays.asList("Your(String)".split("[\\(\\)]"))
.forEach(System.out::println);
I hope it is clear.
You can escape any metacharacter with a backslash, so you can match ( with the pattern \(.
Many languages come with a built-in escaping function, for example .NET's Regex.Escape or Java's Pattern.quote.
Some flavors support \Q and \E, with literal text between them.
Some flavors (VIM, for example) match ( literally, and require \( for capturing groups.
See also: Regular Expression Basic Syntax Reference
To match any special character literally, escape it with '\'.
So, for matching parentheses - /\(/
Because ( is special in regex, you should escape it \( when matching. However, depending on what language you are using, you can easily match ( with string methods like index() or other methods that enable you to find at what position the ( is in. Sometimes, there's no need to use regex.
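In Java, the plain-string route mentioned above would use `String.indexOf` and `substring`. A minimal sketch, assuming you only want to split at the first '('; `splitAtFirstParen` is a hypothetical helper, not a JDK method.

```java
class ParenIndexOf {
    // Hypothetical helper: split at the first '(' without any regex.
    // Returns the whole string as one element if there is no '('.
    static String[] splitAtFirstParen(String s) {
        int i = s.indexOf('(');
        if (i < 0) {
            return new String[] { s };
        }
        return new String[] { s.substring(0, i), s.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitAtFirstParen("abc(efg)");
        System.out.println(parts[0]); // abc
        System.out.println(parts[1]); // efg)
    }
}
```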

Java regular expression

I want to replace any one of these chars:
% \ , [ ] # & # ! ^
... with empty string ("").
I used this code:
String line = "[ybi-173]";
Pattern cleanPattern = Pattern.compile("%|\\|,|[|]|#|&|#|!|^");
Matcher matcher = cleanPattern.matcher(line);
line = matcher.replaceAll("");
But it doesn't work.
What am I missing in this regular expression?
Some of the characters are special characters that are being interpreted differently. You can either escape them all with backslashes or, better yet, put them in a character class (characters that are not special inside a character class need no escaping, which eases readability):
Pattern cleanPattern = Pattern.compile("[%\\\\,\\[\\]#&#!^]");
There are several reasons why your solution doesn't work.
Several of the characters you wish to match have special meanings in regular expressions, including ^, [, and ]. These must be escaped with a \ character, but, to make matters worse, the \ itself must be escaped so that the Java compiler will pass the \ through to the regular expression constructor. So, to sum up step one, if you wish to match a ] character, the Java string must look like "\\]".
But, furthermore, this is a case for character classes [], rather than the alternation operator |. If you want to match "any of the characters a, b, c", that looks like [abc]. Your character class would be [%\,[]#&#!^], but, because of the Java string escaping rules and the special meaning of certain characters, your Java string will be [%\\\\,\\[\\]#&#!\\^].
You'd define your pattern as a character group enclosed in [ and ] and escape special chars, e.g.
String n = "%\\,[]#&#!^".replaceAll("[%\\\\,\\[\\]#&#!^]", "");
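A sketch comparing the character-class pattern from the answers with the question's original alternation on the question's input; class and field names are illustrative. In the original, \| becomes a literal pipe, [|] is a class containing only |, and ^ is an anchor, so [ and ] are never matched.

```java
import java.util.regex.Pattern;

class CleanChars {
    // Character class from the answers: \ [ ] are escaped; ^ is literal
    // because it is not the first character inside the brackets.
    static final Pattern CLEAN = Pattern.compile("[%\\\\,\\[\\]#&#!^]");

    // The question's alternation, kept for comparison: it never matches [ or ].
    static final Pattern BROKEN = Pattern.compile("%|\\|,|[|]|#|&|#|!|^");

    static String clean(String line) {
        return CLEAN.matcher(line).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(clean("[ybi-173]"));                          // ybi-173
        System.out.println(BROKEN.matcher("[ybi-173]").replaceAll("")); // [ybi-173]
    }
}
```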
