I'm trying to extract strings that are wrapped in double brackets. For example, [[this is one token]] should be matched. To make things more elegant, there should be an escape sequence so that double-bracketed items like \[[this escaped token\]] don't get matched.
The pattern [^\\\\]([\\[]{2}.+[^\\\\][\\]]{2}) with "group 1" to extract the token is close, but there are situations where it doesn't work. The problem seems to be that the first negated class is evaluated as "any character except a backslash", and "any character" does not include "nothing", so a token at the very start of the input has no preceding character to match. So, what would make this pattern match "nothing or any character other than a backslash"?
Here is a unit test to show the desired behavior:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import junit.framework.TestCase;

public class RegexSpike extends TestCase {
    private String regex;
    private Pattern pattern;
    private Matcher matcher;

    @Override
    protected void setUp() throws Exception {
        super.setUp();
        regex = "[^\\\\]([\\[]{2}.+[^\\\\][\\]]{2})";
        pattern = Pattern.compile(regex);
    }

    // Returns group 1 of the first match, or "NOT FOUND" if nothing matches.
    private String runRegex(String testString) {
        matcher = pattern.matcher(testString);
        return matcher.find() ? matcher.group(1) : "NOT FOUND";
    }

    public void testBeginsWithTag_Passes() {
        assertEquals("[[should work]]", runRegex("[[should work]]"));
    }

    public void testBeginsWithSpaces_Passes() {
        assertEquals("[[should work]]", runRegex(" [[should work]]"));
    }

    public void testBeginsWithChars_Passes() {
        assertEquals("[[should work]]", runRegex("anything here[[should work]]"));
    }

    public void testEndsWithChars_Passes() {
        assertEquals("[[should work]]", runRegex("[[should work]]with anything here"));
    }

    public void testBeginsAndEndsWithChars_Passes() {
        assertEquals("[[should work]]", runRegex("anything here[[should work]]and anything here"));
    }

    public void testFirstBracketsEscaped_Fails() {
        assertEquals("NOT FOUND", runRegex("\\[[should NOT work]]"));
    }

    public void testSingleBrackets_Fails() {
        assertEquals("NOT FOUND", runRegex("[should NOT work]"));
    }

    public void testSecondBracketsEscaped_Fails() {
        assertEquals("NOT FOUND", runRegex("[[should NOT work\\]]"));
    }
}
You can simply use (^|[^\\]), which will match either the beginning of the string (or the beginning of any line, if you set MULTILINE mode on your regex) or a single character that is not a backslash (including spaces, newlines, etc.).
You'll also want to replace .+ with .+?, because otherwise a string such as "[[one]] and [[two]]" will be seen as a single match, where "one]] and [[two" is considered to be between brackets.
A third point is that you do not have to wrap a single character (even escaped ones such as \[ or \]) in a character class with [].
So that would make the following regex (pardon me removing the double-escapedness for clarity):
(^|[^\\])(\[{2}.+?[^\\]\]{2})
(Also note that you cannot escape the escape character with this regex. Two backslashes before a [ will not be parsed as a single (escaped) backslash, but as a single (unescaped) backslash followed by an escaped bracket.)
You want a "zero-width negative lookbehind assertion", which is (?<!expr). Try:
(?<!\\\\)([\\[]{2}.+[^\\\\][\\]]{2})
Actually, this can be simplified and made more general by cutting out some of those unnecessary brackets, and adding a negative lookbehind for the closing bracket, too. (Your version also will fail if you have an escaped bracket in the middle of the string, like [[text\]]moretext]]).
(?<!\\\\)(\\[{2}.*?(?<!\\\\)\\]{2})
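As a rough illustration of how the two approaches differ, the sketch below runs the consuming prefix (^|[^\\]) and the negative lookbehind over the same inputs. The class name BracketTokens and the helper findAll are made up for this example; the two patterns are the ones quoted in the answers above, written as Java string literals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketTokens {
    // Consuming prefix: the character before "[[" becomes part of the match.
    static final Pattern CONSUMING =
            Pattern.compile("(^|[^\\\\])(\\[{2}.+?[^\\\\]\\]{2})");

    // Zero-width lookbehind: only asserts that no backslash precedes "[[" or "]]".
    static final Pattern LOOKBEHIND =
            Pattern.compile("(?<!\\\\)(\\[{2}.*?(?<!\\\\)\\]{2})");

    // Hypothetical helper: collects the given group from every match in the input.
    static List<String> findAll(Pattern p, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            tokens.add(m.group(group));
        }
        return tokens;
    }

    public static void main(String[] args) {
        String adjacent = "[[ab]][[cd]]";
        System.out.println(findAll(CONSUMING, 2, adjacent));
        System.out.println(findAll(LOOKBEHIND, 1, adjacent));
        // Java literal for the text: \[[escaped]] and [[plain]]
        System.out.println(findAll(LOOKBEHIND, 1, "\\[[escaped]] and [[plain]]"));
    }
}
```

With adjacent tokens, the consuming version should find only the first one, because the second "[[" has no unconsumed character in front of it, while the lookbehind version finds both since it does not consume anything.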
What should happen with this string? (Actual string content, not a Java literal.)
foo\\[[blah]]bar
What I'm asking is whether you're supporting escaped backslashes. If you are, the lookbehind won't work. Instead of looking for a single backslash, you would have to check for an odd but unknown number of them, and Java lookbehinds can't be open-ended like that. Also, what about escaped brackets inside a token: is this valid?
foo[[blah\]]]bar
In any case, I suggest you come at the backslash problem from the other direction: match any number of escaped characters (i.e. backslash plus anything) immediately preceding the first bracket as part of the token. Inside the token, match any number of characters other than square brackets or backslashes, or any number of escaped characters. Here's the actual regex:
(?<!\\)(?:\\.)*+\[\[((?:[^\[\]\\]++|\\.)*+)\]\]
...and here it is as a Java string literal:
"(?<!\\\\)(?:\\\\.)*+\\[\\[((?:[^\\[\\]\\\\]++|\\\\.)*+)\\]\\]"
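To see how this pattern treats the two questionable inputs above, here is a small sketch that applies the Java literal as quoted. The class name EscapeAwareTokens and the helper firstToken are hypothetical; note that group 1 here is the token body without the surrounding brackets.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeAwareTokens {
    // The Java string literal quoted above, unchanged.
    static final Pattern TOKEN = Pattern.compile(
            "(?<!\\\\)(?:\\\\.)*+\\[\\[((?:[^\\[\\]\\\\]++|\\\\.)*+)\\]\\]");

    // Hypothetical helper: returns the token body (group 1), or null if there is no match.
    static String firstToken(String input) {
        Matcher m = TOKEN.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Each Java literal doubles the backslashes of the actual text.
        System.out.println(firstToken("foo\\\\[[blah]]bar")); // text: foo\\[[blah]]bar
        System.out.println(firstToken("foo\\[[blah]]bar"));   // text: foo\[[blah]]bar
        System.out.println(firstToken("foo[[blah\\]]]bar"));  // text: foo[[blah\]]]bar
    }
}
```

Under this pattern, an escaped backslash in front of the brackets leaves the token matchable, a single backslash suppresses it, and an escaped bracket inside the token stays part of the body.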
I would like to escape non-alphanumeric characters occurring in a string as follows:
Say the original string is "test_"; I would like to transform it into "test\_".
One approach is to scan the original string and build a new one, inserting a '\' in front of each non-alphanumeric character found.
But I am wondering if there is a cleaner way to do the same using a regular expression.
You can refer to the matched text with $0 in the replacement string, as shown below:
public class Main {
    public static void main(String[] args) {
        String s = "test_";
        s = s.replaceAll("[^\\p{Alnum}]", "\\\\$0");
        System.out.println(s);
    }
}
Output:
test\_
Notes:
$0 represents the string matched by the complete regex pattern, [^\\p{Alnum}].
\p{Alnum} specifies alphanumeric character and ^ inside [] is used to negate the pattern. Learn more about patterns from the documentation.
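Notice the four backslashes in the replacement literal: the Java compiler reduces "\\\\" to two backslash characters, and in a replacement string \\ stands for one literal backslash.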
import java.util.*;
import java.lang.*;
import java.io.*;

class GFG
{
    public static void main (String[] args)
    {
        int turns;
        Scanner scan=new Scanner(System.in);
        turns=scan.nextInt();
        while(turns-->0)
        {
            String pattern=scan.next();
            String text=scan.next();
            System.out.println(regex(pattern,text));
        }
    }//end of main method

    static int regex(String pattern,String text)
    {
        if(pattern.startsWith("^"))
        {
            if(text.startsWith(pattern.replace("^","")))
                return 1;
        }
        else if(pattern.endsWith("$"))
        {
            if(text.endsWith(pattern.replace("$","")))
                return 1;
        }
        else
        {
            if(text.contains(pattern))
                return 1;
        }
        return 0;
    }
}
Input:
2
or$
hodor
or$
arya
Output:
1
0
In this program I am scanning two parameters (Strings): the first one is the pattern and the second one is the text in which I have to find the pattern. The method should return 1 if the pattern matches, else 0.
With replace() it works fine, but when I change replace() to replaceAll() it does not work as expected.
How can I make replaceAll() work in this program?
Because replaceAll expects a string defining a regular expression, and $ means "end of line" in regular expressions. From the documentation:
public String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.
You need to escape it with a backslash (which also has to be escaped, in the string literal):
if(text.endsWith(pattern.replaceAll("\\$","")))
For complex strings that you want to replace verbatim, Pattern.quote is useful:
if(text.endsWith(pattern.replaceAll(Pattern.quote("$"),"")))
You don't need it here because your replacement is "", but if your replacement may have special characters in it (like backslashes or dollar signs), use Matcher.quoteReplacement on the replacement string as well.
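The difference between these calls can be sketched as follows. The class name DollarReplace and its method names are invented for this example; only String.replace, String.replaceAll, Pattern.quote and Matcher.quoteReplacement are real library calls.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DollarReplace {
    // replace() treats both arguments as plain text.
    static String viaReplace(String s) {
        return s.replace("$", "");
    }

    // replaceAll() compiles its first argument as a regex; an unescaped $
    // is the end-of-input anchor, so it never matches the '$' character itself.
    static String viaUnescapedReplaceAll(String s) {
        return s.replaceAll("$", "");
    }

    // Pattern.quote builds a regex that matches the given text literally.
    static String viaQuotedReplaceAll(String s) {
        return s.replaceAll(Pattern.quote("$"), "");
    }

    // Matcher.quoteReplacement protects '$' and '\' inside the replacement string.
    static String replaceWord(String s, String word, String replacement) {
        return s.replaceAll(Pattern.quote(word), Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(viaReplace("or$"));
        System.out.println(viaUnescapedReplaceAll("or$"));
        System.out.println(viaQuotedReplaceAll("or$"));
        System.out.println(replaceWord("cost: X", "X", "$5"));
    }
}
```

The unescaped variant leaves the input untouched, which is why text.endsWith(...) fails in the question's program once replace() is swapped for replaceAll().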
$ is a special character in regex (EOL). You have to escape it:
pattern.replaceAll("\\$","")
Despite the similar name, these are two very different methods.
replace replaces substrings with other substrings (*).
replaceAll uses regular expression matching, and $ is a special control character there (meaning "end of string/line").
You should not be using replaceAll here, but if you must, you have to quote the $:
pattern.replaceAll(Pattern.quote("$"),"")
(*) To make things more confusing, replace also replaces all occurrences, so the difference in the method names does not describe the difference in function at all.
Introducing another level of complexity by replacing $ by \$.
"$ABC$AB".replaceAll(Matcher.quoteReplacement("$"), Matcher.quoteReplacement("\\\\$"))
// Output - \\$ABC\\$AB
This worked for me.
For the issue reported here,
"$ABC$AB".replaceAll(Matcher.quoteReplacement("$"), "")
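should work. Note that Matcher.quoteReplacement is meant for the replacement argument; it works as the regex here only because it turns $ into \$, which is also a valid regex for a literal dollar sign. Pattern.quote is the method intended for the regex argument.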
I am using a pattern that includes a backslash (\), escaped once, but it is not working at all. I am getting "no match" as the result.
package com.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestClassRegex {
    private static final String VALIDATION = "^[0-9\\-]+$";

    public static void main(String[] args) {
        String line = "1234\56";
        Pattern r = Pattern.compile(VALIDATION);
        Matcher m = r.matcher(line);
        if (m.matches()) {
            System.out.println("match");
        }
        else {
            System.out.println("no match !!");
        }
    }
}
How can I write a pattern which recognizes a backslash literally?
I have actually seen another post:
Java regular expression value.split("\\."), "the back slash dot" divides by character?
which doesn't answer my question completely, hence I need some pointers here.
"1234\56" will not produce "123456" but instead "1234."
Why?
In a Java string literal, a \ followed by octal digits is an octal escape for the character with that code. Here, \56 is octal 56, i.e. decimal 46, which is the character . in the ASCII table.
That's exactly the reason why you're not getting a match here.
Solution
You should first of all change your regex to ^[0-9\\\\-]+$, because a literal \ has to be escaped in the regex and each \ then has to be escaped again in the Java string literal, even though your initial regex does not do it.
Your input literal needs to be written as "1234\\56" for the same reason.
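A minimal sketch of both points, assuming the intent is to accept digits, hyphens and literal backslashes. The class name BackslashInput and the method isValid are hypothetical.

```java
import java.util.regex.Pattern;

class BackslashInput {
    // Regex text: ^[0-9\\-]+$  (digits, a literal backslash, or a hyphen)
    static final Pattern VALIDATION = Pattern.compile("^[0-9\\\\-]+$");

    static boolean isValid(String line) {
        return VALIDATION.matcher(line).matches();
    }

    public static void main(String[] args) {
        String octal = "1234\56";     // \56 is an octal escape for '.'
        String escaped = "1234\\56";  // a real backslash between 4 and 5

        System.out.println(octal.equals("1234."));
        System.out.println(escaped.length());
        System.out.println(isValid(octal));
        System.out.println(isValid(escaped));
    }
}
```

The first literal never contains a backslash at run time, so no change to the regex alone could make it match.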
I am following the suggestions on the page, check if string ends with certain pattern
I am trying to display a string that is
Starts with anything
Has the letters ".mp4" in it
Ends explicitly with ', (apostrophe followed by comma)
Here is my Java code:
import java.util.*;
import java.lang.*;
import java.io.*;
import java.util.regex.*;

class Ideone
{
    public static void main (String[] args) throws java.lang.Exception
    {
        // your code goes here
        String str = " _file='ANyTypEofSTR1ngHere_133444556_266545797_10798866.mp4',";
        Pattern p = Pattern.compile(".*.mp4[',]$");
        Matcher m = p.matcher(str);
        if(m.find())
            System.out.println("yes");
        else
            System.out.println("no");
    }
}
It prints "no". How should I declare my RegEx?
There are several issues in your regex:
"Has the letters .mp4 in it" means somewhere, not necessarily just in front of ',, so another .* should be inserted.
. matches any character. Use \. to match .
[,'] is a character group, i.e. exactly one of the characters in the brackets has to occur.
You can use the following regex instead:
Pattern p = Pattern.compile(".*\\.mp4.*',$");
Your character set [',] is checking whether the string ends with ' or , a single time.
If you want to match those characters one or more times, use [',]+. However, you probably don't want to use a character set in this case since you said order is important.
To match an apostrophe followed by comma, just use:
.*\\.mp4',$
Also, since . has special meaning, you need to escape it in '.mp4'.
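The points made in these answers can be checked with a short sketch. The class name Mp4Suffix and the helper found are invented for this example; the sample string is the one from the question.

```java
import java.util.regex.Pattern;

class Mp4Suffix {
    // Pattern from the question: unescaped dot, and a class that matches ONE of ' or ,
    static final Pattern ORIGINAL = Pattern.compile(".*.mp4[',]$");
    // Escaped dot, then an apostrophe followed by a comma at the end.
    static final Pattern FIXED = Pattern.compile(".*\\.mp4',$");
    // Same ending, but with the dot left unescaped, for comparison.
    static final Pattern UNESCAPED_DOT = Pattern.compile(".*.mp4',$");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String str = " _file='ANyTypEofSTR1ngHere_133444556_266545797_10798866.mp4',";
        System.out.println(found(ORIGINAL, str));
        System.out.println(found(FIXED, str));
        // No literal dot before "mp4": only the unescaped pattern accepts this.
        System.out.println(found(FIXED, "xmp4',"));
        System.out.println(found(UNESCAPED_DOT, "xmp4',"));
    }
}
```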
I am trying to create a hexadecimal calculator but I have a problem with the regex.
Basically, I want the string to only accept 0-9, A-E, and special characters +-*_
My code keeps returning false no matter how I change the regex, and adding the asterisk gives me a PatternSyntaxException.
public static void main(String[] args) {
    String input = "1A_16+2B_16-3C_16*4D_16";
    String regex = "[0-9A-E+-_]";
    System.out.println(input.matches(regex));
}
Also whenever I add the * as part of the regex it gives me this error:
Exception in thread "main" java.util.regex.PatternSyntaxException: Illegal character range near index 9
[0-9A-E+-*_]+
         ^
You need to match more than one character with your regex. As it currently stands you only match one character.
To match one or more characters add a + to the end of the regex
[0-9A-E+-_]+
Also to match a * just add a star in the brackets so the final regex would be
[0-9A-E+\\-_*]+
You need to escape the - otherwise the regex thinks you want to accept all character between + and _ which is not what you want.
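To make the effect of the unescaped hyphen concrete, the sketch below compares the variants discussed here. The class name HexInputCheck and the helper compiles are hypothetical, and the A-E range is kept exactly as written in the question.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class HexInputCheck {
    // Hyphen escaped, so it is a literal character rather than a range operator.
    static final String ESCAPED = "[0-9A-E+\\-_*]+";
    // Unescaped: "+-_" is read as the range from '+' (code 43) to '_' (code 95).
    static final String RANGE = "[0-9A-E+-_]+";
    // Unescaped with the star: "+-*" is a range whose end (42) precedes its start (43).
    static final String BAD_RANGE = "[0-9A-E+-*_]+";

    // Hypothetical helper: reports whether the regex compiles at all.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String input = "1A_16+2B_16-3C_16*4D_16";
        System.out.println(input.matches(ESCAPED));
        System.out.println(input.matches(RANGE));   // '*' (code 42) is outside the range
        System.out.println("Z:".matches(RANGE));    // accepted only because of the range
        System.out.println(compiles(BAD_RANGE));
    }
}
```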
Your regex is OK and should not throw any exception. Just add + at the end of the regex, which means one or more characters like those in the brackets. It also seems you wanted * as well.
"[0-9A-E+-_]+"
public static boolean isValidCode (String code) {
    Pattern p = Pattern.compile("[fFtTvV\\-~^<>()]+"); //a-zA-Z
    Matcher m = p.matcher(code);
    return m.matches();
}