Scala Pattern Syntax Exception - java

I'm trying to split a string by the characters "}{". However, I am getting an error:
> val string = "{one}{two}".split("}{")
java.util.regex.PatternSyntaxException: Illegal repetition near index 0
}{
^
I am not trying to use regex or anything. I tried using "\}\{" and it also doesn't work.

Well... the reason is that split treats its parameter string as a regular expression.
Now, both { and } are special characters in regular expressions: { opens a repetition quantifier such as x{2,3}, which is why the error says "Illegal repetition".
So you have to escape the regex special characters in split's argument, like this:
val string = "{one}{two}".split("\\}\\{")
// string: Array[String] = Array({one, two})
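If you want split to treat the delimiter purely as text, java.util.regex.Pattern.quote turns any string into a regex that matches it literally, so you never escape by hand. The split(String) called on a Scala string here is java.lang.String.split, so the sketch below is plain Java; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Splits input on the exact text of delimiter. Pattern.quote wraps it
    // in \Q...\E, so characters like { and } lose their regex meaning.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        String[] parts = splitLiteral("{one}{two}", "}{");
        System.out.println(Arrays.toString(parts)); // prints [{one, two}]
    }
}
```

From Scala the same call is "{one}{two}".split(java.util.regex.Pattern.quote("}{")).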

Escape the {
val string = "{one}{two}".split("}\\{")

There are two ways to force a metacharacter to be treated as an ordinary character:
-> precede the metacharacter with a backslash.
String[] ss1 = "{one}{two}".split("[}\\{]+");
System.out.println(Arrays.toString(ss1));
output:
[, one, two]
(The leading { is matched too, so split returns an empty first element.)
-> enclose it within \Q (which starts the quote) and \E (which ends it).
When using this technique, the \Q and \E can be placed at any location within the expression, provided that the \Q comes first.
String[] ss2 = "{one}{two}".split("[}\\Q{\\E]+");
System.out.println(Arrays.toString(ss2));
output:
[, one, two]
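Note that both examples above use a character class, [...]+, which splits on any run of { or } characters rather than on the exact sequence "}{". A small sketch comparing the two (class and method names are illustrative only):

```java
import java.util.Arrays;

class ClassVsSequence {
    // Character class: every run of { or } is a delimiter, including the
    // leading {, which produces an empty first element.
    static String[] byClass(String s) {
        return s.split("[}\\{]+");
    }

    // Escaped sequence: splits only where } is immediately followed by {.
    static String[] bySequence(String s) {
        return s.split("\\}\\{");
    }

    public static void main(String[] args) {
        String s = "{one}{two}";
        System.out.println(Arrays.toString(byClass(s)));    // prints [, one, two]
        System.out.println(Arrays.toString(bySequence(s))); // prints [{one, two}]
    }
}
```

Pick the character class when you want to strip all braces, and the escaped sequence when the delimiter really is the two-character text "}{".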

Related

Regex split by ":{ in java?

Basically I have:
String str = "Stream: {"stream":null,"_links":{"self":"https://api.twitch.tv/kraken/streams/tfue","channel":"https://api.twitch.tv/kraken/channels/tfue"}}";
I want to split the str by ":{
but when I do:
String[] BuftoStringparts = BuftoString.split("\":{");
I get below exception:
java.util.regex.PatternSyntaxException: Illegal repetition near index 1
":{
^
All replies are much appreciated :)
The main reason this happens:
java.util.regex.PatternSyntaxException: Illegal repetition near index 1 ":{ ^
It's because { is a special character in Java regular expressions, so you need to escape it for the regex, like this:
String[] BuftoStringparts = BuftoString.split("\":\\{");
First of all you need to escape " in your JSON String, so the resulting String will be:
String str = "Stream: {\"stream\":null,\"_links\":{\"self\":\"https://api.twitch.tv/kraken/streams/tfue\",\"channel\":\"https://api.twitch.tv/kraken/channels/tfue\"}}";
Now as mentioned by others, you also need to escape regex special characters in your regex.
You can try your split with the following regex:
String[] BuftoStringparts = BuftoString.split("\":\\{");
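Putting the two fixes together (escaped quotes in the literal, escaped { in the regex), a runnable sketch looks like this; the class and method names are hypothetical.

```java
class JsonSplit {
    // \" is a literal quote in the Java string; \\{ becomes \{ in the
    // regex, which matches a literal {. The delimiter is therefore ":{
    static String[] splitOnQuoteColonBrace(String s) {
        return s.split("\":\\{");
    }

    public static void main(String[] args) {
        String str = "Stream: {\"stream\":null,\"_links\":{\"self\":\"https://api.twitch.tv/kraken/streams/tfue\",\"channel\":\"https://api.twitch.tv/kraken/channels/tfue\"}}";
        for (String part : splitOnQuoteColonBrace(str)) {
            System.out.println(part);
        }
    }
}
```

The only ":{ in the string is the one after "_links", so this yields two parts, and the delimiter (including the closing quote of "_links") is removed.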

Java regex matches but String.replaceAll() doesn't replace matching substrings

public class test {
public static void main(String[]args) {
String test1 = "N&oslash;rrebro, Denmark";
String test2 = "&oslash;";
String regex = new String("^&\\S*;$");
String value = test1.replaceAll(regex,"");
System.out.println(test2.matches(regex));
System.out.println(value);
}
}
This gives me the following output:
true
N&oslash;rrebro, Denmark
How is that possible? Why does replaceAll() not register a match?
Your regex includes ^, which makes the regex match only at the very start of the string.
If you try
test1.matches(regex)
you will get false.
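The same effect shows up with Matcher.find(), which searches anywhere in the input: the anchored pattern is only found when the entity is the whole string. A sketch, assuming (as the question's output implies) that the strings contain the HTML entity &oslash;; the helper name is made up.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    // True if regex occurs anywhere in input (find(), not matches()).
    static boolean found(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // Assumption: the question's strings hold the entity &oslash;.
        String test1 = "N&oslash;rrebro, Denmark";
        String test2 = "&oslash;";
        // Anchored: ^ and $ pin the match to the start and end of the input.
        System.out.println(found("^&\\S*;$", test2)); // true: the entity is the whole string
        System.out.println(found("^&\\S*;$", test1)); // false: test1 starts with N
        // Unanchored: the entity is found in the middle of test1.
        System.out.println(found("&\\S*;", test1));   // true
    }
}
```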
You need to understand what ^ and $ mean.
You probably put them in there because you want to say:
At the start of each match, I want a &, then 0 or more non-whitespace characters, then a ; at the end of the match.
However, ^ and $ don't mean the start and end of each match. They mean the start and end of the string.
So you should remove the ^ and $ from your regex:
String regex = "&\\S*;";
Now it outputs:
true
Nrrebro, Denmark
"What character specifies the start and end of the match then?" you might ask. Well, since your regex is basically the pattern you are matching, the start of the regex is the start of the match (unless you have lookbehinds)!
It is possible because the ^&\S*;$ pattern matches the entire &oslash; string, but it does not match the entire N&oslash;rrebro, Denmark string. The ^ requires the start of the string to be right before &, and $ requires the ; to appear right at the end of the string.
Just removing the ^ and $ anchors may not work, because \S* is a greedy pattern and it may overmatch, e.g. in N&oslash;rrebro;.
You may use &\w+; or &\S+?; pattern, e.g.:
String test1 = "N&oslash;rrebro, Denmark";
String regex = "&\\w+;";
String value = test1.replaceAll(regex,"");
System.out.println(value); // => Nrrebro, Denmark
The &\w+; pattern matches a &, then one or more word characters, and then ;, anywhere inside the string. In &\S+?;, the lazy \S+? matches one or more non-whitespace characters, but as few as possible.
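To see the overmatching concretely, here is a sketch that strips entities from a made-up input with an extra ; after the entity, using the greedy, lazy, and word-character patterns from this answer (the class and method names are illustrative).

```java
class GreedyVsLazy {
    // Removes every match of regex from input.
    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // Hypothetical input: an entity followed later by another ;
        String input = "N&oslash;rrebro; Denmark";
        // Greedy \S* runs to the last ; before the space and eats "rrebro".
        System.out.println(strip(input, "&\\S*;"));  // prints N Denmark
        // Lazy \S+? stops at the first ; after the &.
        System.out.println(strip(input, "&\\S+?;")); // prints Nrrebro; Denmark
        // \w+ cannot cross the ; at all.
        System.out.println(strip(input, "&\\w+;"));  // prints Nrrebro; Denmark
    }
}
```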
You can use this regex: &(.*?);
String test1 = "N&oslash;rrebro, Denmark";
String test2 = "&oslash;";
String regex = "&(.*?);";
String value = test1.replaceAll(regex,"");
System.out.println(test2.matches(regex));
System.out.println(value);
Output:
true
Nrrebro, Denmark

Splitting strings delimited by [[ ]] in java?

I have an input string of the following form "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]" and I need to extract the tokens "Animal rights", "Anthropocentrism", and so on.
I tried using the split method of String, but I am not able to find the appropriate regular expression to get the tokens; it would be great if someone could help.
I am basically trying to parse the internal links in a Wikipedia XML file.
You probably shouldn't be using split() here but instead a Matcher:
String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile("\\[\\[(.*?)\\]\\]").matcher(input);
while (m.find()) {
System.out.println(m.group(1));
}
Animal rights
Anthropocentrism
Anthropology
A pattern like this should work:
\[\[(.*?)\]\]
This will match a literal [[ followed by zero or more of any character, non-greedily, captured in group 1, followed by a literal ]].
Don't forget to escape the \ in the Java string literal:
Pattern.compile("\\[\\[(.*?)\\]\\]");
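The non-greedy *? matters here. A sketch comparing a greedy group with a lazy one on the same input (the helper name firstGroup is made up):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyBrackets {
    // Returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
        // Greedy .* runs to the LAST ]] in the input.
        System.out.println(firstGroup("\\[\\[(.*)\\]\\]", input));
        // prints Animal rights]] [[Anthropocentrism]] [[Anthropology
        // Lazy .*? stops at the FIRST ]].
        System.out.println(firstGroup("\\[\\[(.*?)\\]\\]", input));
        // prints Animal rights
    }
}
```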
It's pretty easy with regex.
\[\[(.+?)\]\]
I recommend .+ rather than .* so that the captured group is never an empty string, and you don't end up storing empty tokens in your array.
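import java.util.regex.Matcher;
import java.util.regex.Pattern;
// Fixed size: more than 10 links would throw ArrayIndexOutOfBoundsException
String[] output = new String[10];
String pattern = "\\[\\[(.+?)\\]\\]";
String input = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
Matcher m = Pattern.compile(pattern).matcher(input);
int increment = 0;
while (m.find()) {
output[increment] = m.group(1);
increment++;
}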
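If the number of links isn't known in advance, a growable list avoids the fixed-size array. A sketch with a hypothetical helper extractLinks, using the same .+? pattern:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WikiLinks {
    // Compiled once; group 1 captures the text between [[ and ]].
    private static final Pattern LINK = Pattern.compile("\\[\\[(.+?)\\]\\]");

    static List<String> extractLinks(String input) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(input);
        while (m.find()) {
            links.add(m.group(1)); // one entry per [[...]] pair
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(extractLinks("[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]"));
        // prints [Animal rights, Anthropocentrism, Anthropology]
    }
}
```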
Since you said you also wanted to learn regex, I'll break it down.
\[ (used twice) matches a literal [; you need the \ because [ is a special character in regex
. matches any character except newlines
+ means one or more of the preceding item
? after + makes the repetition lazy: the engine first matches the previous item only once, and tries ever longer matches only if the rest of the pattern fails
( ) captures the text between the brackets as group 1, which m.group(1) returns
\] matches a literal ]
Try the following:
String str = "[[Animal rights]] [[Anthropocentrism]] [[Anthropology]]";
str = str.replaceAll("(^\\[\\[|\\]\\]$)", "");
String[] array = str.split("\\]\\] \\[\\[");
System.out.println(Arrays.toString(array));
// prints "[Animal rights, Anthropocentrism, Anthropology]"

Escaping double-slashes with regular expressions in Java

I have this unit test:
public void testDeEscapeResponse() {
final String[] inputs = new String[] {"peque\\\\u0f1o", "peque\\u0f1o"};
final String[] expected = new String[] {"peque\\u0f1o", "peque\\u0f1o"};
for (int i = 0; i < inputs.length; i++) {
final String input = inputs[i];
final String actual = QTIResultParser.deEscapeResponse(input);
Assert.assertEquals(
"deEscapeResponse did not work correctly", expected[i], actual);
}
}
I have this method:
static String deEscapeResponse(String str) {
return str.replaceAll("\\\\", "\\");
}
The unit test is failing with this error:
java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at java.lang.String.charAt(String.java:686)
at java.util.regex.Matcher.appendReplacement(Matcher.java:703)
at java.util.regex.Matcher.replaceAll(Matcher.java:813)
at java.lang.String.replaceAll(String.java:2189)
at com.acme.MyClass.deEscapeResponse
at com.acme.MyClassTest.testDeEscapeResponse
Why?
Use String.replace which does a literal replacement instead of String.replaceAll which uses regular expressions.
Example:
"peque\\\\u0f1o".replace("\\\\", "\\") // gives peque\u0f1o
String.replaceAll takes a regular expression thus \\\\ is interpreted as the expression \\ which in turn matches a single \. (The replacement string also has special treatment for \ so there's an error there too.)
To make String.replaceAll work as you expect here, you would need to do
"peque\\\\u0f1o".replaceAll("\\\\\\\\", "\\\\")
I think the problem is that you're using replaceAll() instead of replace(). replaceAll expects a regular expression as its first argument, and you're just trying to match a plain string.
See the javadoc for Matcher:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.
Thus with replaceAll you cannot use a bare backslash in the replacement; it must itself be escaped (or passed through Matcher.quoteReplacement). A really crazy workaround for your case would be str.replaceAll("\\\\(\\\\)", "$1")
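Both fixes from this thread, side by side: String.replace, which is literal on both sides, and replaceAll made literal with Pattern.quote for the pattern and Matcher.quoteReplacement for the replacement. The class and method names are hypothetical; both methods pass the two cases from the unit test.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DeEscape {
    // Literal replacement: String.replace uses no regex on either side.
    static String withReplace(String str) {
        return str.replace("\\\\", "\\");
    }

    // replaceAll made literal on both sides: Pattern.quote neutralises the
    // pattern, Matcher.quoteReplacement escapes the lone backslash.
    static String withReplaceAll(String str) {
        return str.replaceAll(Pattern.quote("\\\\"), Matcher.quoteReplacement("\\"));
    }

    public static void main(String[] args) {
        String input = "peque\\\\u0f1o"; // contains two backslashes
        System.out.println(withReplace(input));    // one backslash remains
        System.out.println(withReplaceAll(input)); // one backslash remains
    }
}
```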

Splitting a string in Java on ";", but not on "\\;"

In Java I try to use the String.split() method to split a string on ";", but not on "\\\\;" (2 back-slashes followed by a semicolon).
Ex: "aa;bb;cc\\;dd;ee\\;;ff" should be split into:
aa
bb
cc\\;dd
ee\\;
ff
How do I accomplish this using a regular expression?
Markus
Use
"aa;bb;cc\\;dd;ee\\;;ff".split("(?<!\\\\);");
(?<!...) is called a zero-width negative lookbehind. In English, you're splitting on all ; characters that are NOT preceded by a backslash, without actually consuming the backslash. The four backslashes in the Java string literal become \\ in the regex, which matches one literal backslash. The actual regular expression used in the split would then read:
(?<!\\);
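A runnable version of this split on the question's sample string; the class and method names are illustrative. Note that in the Java literal "cc\\;dd" the string itself contains a single backslash.

```java
class LookbehindSplit {
    // Regex (?<!\\); : a ; that is not immediately preceded by a backslash.
    static String[] splitUnescaped(String s) {
        return s.split("(?<!\\\\);");
    }

    public static void main(String[] args) {
        String s = "aa;bb;cc\\;dd;ee\\;;ff";
        for (String part : splitUnescaped(s)) {
            System.out.println(part);
        }
        // prints aa, bb, cc\;dd, ee\; and ff on separate lines
    }
}
```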
This is called negative lookbehind, and the syntax is like (?<!a)b. This matches any b that isn't preceded by an a. You would want something like:
(?<!\\\\);
Here is a code example with . as the separator:
String p = "hello.regex\\.brain\\.twister";
System.out.println( p );
for (String s : p.split( "(?<!\\\\)\\.", -1 )) {
System.out.println( "-> "+ s );
}
Will output:
hello.regex\.brain\.twister
-> hello
-> regex\.brain\.twister
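The -1 limit passed to split in that loop keeps trailing empty strings, which the default split(regex) silently drops. A sketch on a made-up input ending in an unescaped dot (class and method names are hypothetical):

```java
import java.util.Arrays;

class SplitLimit {
    // A . that is not preceded by a backslash.
    static final String UNESCAPED_DOT = "(?<!\\\\)\\.";

    static String[] dropTrailing(String s) {
        return s.split(UNESCAPED_DOT);      // limit 0: trailing empties removed
    }

    static String[] keepTrailing(String s) {
        return s.split(UNESCAPED_DOT, -1);  // negative limit: trailing empties kept
    }

    public static void main(String[] args) {
        String p = "hello.regex\\.brain.";
        System.out.println(Arrays.toString(dropTrailing(p))); // prints [hello, regex\.brain]
        System.out.println(Arrays.toString(keepTrailing(p))); // prints [hello, regex\.brain, ]
    }
}
```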
