How to pattern match [ and ] in Java? - java

The string is something in the format:
[anything anything]
with a space separating the two, 'anything's.
I've tried:
(string).replaceAll("(^[)|(]$)","");
(string).replaceAll("(^\[)|(\]$)","");
but the latter gives me a compilation error and the first doesn't do anything. I implemented my current solution based on:
Java Regex to remove start/end single quotes but leave inside quotes
Looking around SO turns up many questions with problems similar to mine, but implementing their solutions does not work (they either do nothing or yield compilation errors):
regex - match brackets but exclude them from results
Regular Expressions on Punctuation
What am I doing wrong?

Since both Java and regex treat the \ character as an escape character, you have to double each backslash when writing the regex as a Java string literal.
So the regular expression:
(^\[)|(\]$)
in a Java string actually should be:
"(^\\[)|(\\]$)"

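As a minimal sketch of the doubled-backslash form in use (the class and method names here are made up for illustration, they are not from the question):

```java
class BracketStrip {
    // Removes one leading '[' and one trailing ']' if they are present.
    static String stripBrackets(String s) {
        // Regex: (^\[)|(\]$)  -- every backslash is doubled in the Java literal.
        return s.replaceAll("(^\\[)|(\\]$)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripBrackets("[anything anything]")); // prints: anything anything
    }
}
```

Brackets that are not at the very start or end of the string are left alone, since the pattern is anchored with ^ and $.
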
Related

Unclosed character class error while trying to make calculator in java

This regex should divide a string like 2+2 into the two operand groups and the operator, but I'm getting unclosed character class error at index 41
"^(\\d+)?\\s*([+]|[-]|[*]|[/]|[^])?\\s*(\\d+)\\$"
Your problem is this:
[^]
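The ^ character is a meta-character when it is the first character inside [ ... ] (it negates the class), so [^] is read as a negated class that is never closed. It needs to be escaped if you want to match a literal caret.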
Also:
You probably shouldn't be escaping the $ at the end.
If you use Matcher.matches() then the initial ^ and final $ are unnecessary.
[+]|[-]|[*]|[/]|[^] is equivalent [1] to [+\\-*/\\^] (written as a Java string literal).
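Putting those points together, a sketch might look like the following. It assumes that both operands and the operator are required (the original pattern made two of the groups optional), and the class name SimpleExpression is hypothetical. The caret is left unescaped here because it is not the first character of the class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SimpleExpression {
    // All operators in one character class. '-' is escaped so it cannot be
    // read as a range; '^' is not first in the class, so it is a literal caret.
    private static final Pattern EXPR =
            Pattern.compile("(\\d+)\\s*([+\\-*/^])\\s*(\\d+)");

    // Returns {left operand, operator, right operand}, or null if there is no match.
    static String[] split(String input) {
        Matcher m = EXPR.matcher(input);
        if (!m.matches()) {   // matches() already requires the whole input to match
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", split("2+2"))); // prints: 2 | + | 2
    }
}
```
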
Finally, I would recommend NOT using regexes for parsing expressions. Once you start trying to support expressions with 2 or more operators, precedence, brackets, and so on the complexity of the regexes gets out of hand.
A better idea is to tokenize, and then feed the tokens into a simple (grammar based) parser. You can write one by hand, or use a parser generator. Or look for one that someone has written already. (Google for "expression parser java" or some such.)
[1] I suspect that escaping ^ at that position might be redundant. Unfortunately, the javadocs for Pattern are not completely clear on when it is necessary to escape ^ and - inside [ ... ].
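To give an idea of the tokenizing step suggested above, here is a rough sketch. It only handles unsigned integers, the five operators, and parentheses, and the class name ExpressionTokenizer is invented for this example. The resulting token list would then be handed to a parser, which is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpressionTokenizer {
    // Optional leading whitespace, then one token: a run of digits or a
    // single operator / parenthesis character (captured in group 1).
    private static final Pattern TOKEN = Pattern.compile("\\s*(\\d+|[+\\-*/^()])");

    static List<String> tokenize(String input) {
        String text = input.trim();            // so trailing whitespace is not an error
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        int pos = 0;
        while (pos < text.length()) {
            m.region(pos, text.length());
            if (!m.lookingAt()) {              // no valid token starts at this position
                throw new IllegalArgumentException("Unexpected input at index " + pos);
            }
            tokens.add(m.group(1));
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("12*(3+4)")); // prints: [12, *, (, 3, +, 4, )]
    }
}
```
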

Using regex to only match those Strings which use escape character correctly (according to Java syntax)?

take these strings for example:
"hello world\n" (correct - regex should match this)
"I'm happy \ here" (this is incorrect as the escape character is not used correctly - regex should not match this one)
I've tried searching on google but didn't find anything helpful.
I want this one to be used in a parser which only parses string literals from a java code file.
Here is the regex I used:
"\\\"(\\[tbnrf\'\"\\])*[a-zA-Z0-9\\`\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)\\_\\-\\+\\=\\|\\{\\[\\}\\]\\;\\:\\'\\/\\?\\>\\.\\<\\,]\\\""
what am I doing wrong?
I guess you gave us the regex in Java String literal form, like
String regex = "\\\"(\\[tbnrf\'\"\\])*[a-zA-Z0-9\\`\\~\\!\\#\\#\\$\\%\\^\\&\\*\\(\\)\\_\\-\\+\\=\\|\\{\\[\\}\\]\\;\\:\\'\\/\\?\\>\\.\\<\\,]\\\"";
Unpacking that from Java's String escaping syntax gives the raw regex:
\"(\[tbnrf'"\])*[a-zA-Z0-9\`\~\!\#\#\$\%\^\&\*\(\)\_\-\+\=\|\{\[\}\]\;\:\'\/\?\>\.\<\,]\"
That consists of:
\" matching a double-quote character (Java String literal begins here). Escaping the double quotes with backslash isn't necessary: " on its own is ok as well.
(\[tbnrf'"\])*: a group, repeated 0...n times. I guess you want that to match against the various Java backslash escapes, but that should read (\\[tbnrf'"\\])* with a double backslash in front and inside the character class. And maybe you want to cover the Java octal escapes as well (see the language specification), giving (\\[tbnrf01234567'"\\])*
[a-zA-Z0-9\`\~\!\#\#\$\%\^\&\*\(\)\_\-\+\=\|\{\[\}\]\;\:\'\/\?\>\.\<\,]: a character class matching one character from a selected list of alphabetic and punctuation characters. I'd replace that with [^"\\], meaning anything but double quote or backslash.
\" matching a double-quote character (string literal ends here). Once again, no need to escape the double quote.
Besides the individual elements, the overall structure of the regex probably isn't what you want: You allow only strings beginning with any number of backslash escapes, followed by exactly one non-escape character, and this enclosed in a pair of double quotes.
The overall structure should instead be "(backslash_escape|simple_character)*"
So, the complete regex would be:
"(\\[tbnrf01234567'"\\]|[^"\\])*"
or, expressed in a Java literal:
String regex = "\"(\\\\[tbnrf01234567'\"\\\\]|[^\"\\\\])*\"";
And, although this is shorter than your original attempt, I'd still not call it readable and opt for a different implementation, not using regular expressions.
P.S. Although I did some testing with my regex, I'm not at all sure that it covers all relevant cases correctly.
P.P.S. There are the \uxxxx escapes, not yet covered by the regex.
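For what it's worth, here is a small sketch that runs the regex above against the two examples from the question. The class and method names are placeholders, and as noted it does not handle \uxxxx escapes.

```java
import java.util.regex.Pattern;

class StringLiteralCheck {
    // Raw regex: "(\\[tbnrf01234567'"\\]|[^"\\])*"
    private static final Pattern LITERAL =
            Pattern.compile("\"(\\\\[tbnrf01234567'\"\\\\]|[^\"\\\\])*\"");

    // sourceText is the literal exactly as it appears in a .java file,
    // including the surrounding double quotes.
    static boolean isValidLiteral(String sourceText) {
        return LITERAL.matcher(sourceText).matches();
    }

    public static void main(String[] args) {
        // Source text: "hello world\n"  (a backslash followed by n)
        System.out.println(isValidLiteral("\"hello world\\n\""));     // prints: true
        // Source text: "I'm happy \ here"  (a backslash followed by a space)
        System.out.println(isValidLiteral("\"I'm happy \\ here\""));  // prints: false
    }
}
```
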

Regex to allow only Numbers, alphabets, spaces and hyphens - Java

I need to allow the user to enter only numbers, letters, spaces, or hyphens, or any combination of these.
I tried the following:
String regex = "/^[0-9A-Za-z\s\-]+$/";
sampleString.matches(regex);
But it is not working properly. Could somebody help me fix it, please?
Issue: your regex is trying to match a literal / at the beginning and at the end.
In Java there is no need for / before and after the regex (Java is not JavaScript), so use:
"^[0-9A-Za-z\\s-]+$"
^ : beginning of match
[0-9A-Za-z\\s-]+ : one or more letters, digits, whitespace characters, or hyphens
$ : end of match
You are close but need to make two changes.
The first is to double-escape (i.e., use \\ instead of \). This is needed because the backslash is also the escape character in Java string literals (see the section "Backslashes, escapes, and quoting" in the Javadoc for the Pattern class). The second is to drop the explicit reference to the start and end of the string, which is implied when using matches(). So the correct Java code is:
String regex = "[0-9A-Za-z\\s\\-]+";
sampleString.matches(regex);
While that will work, you can also replace the "0-9" reference with \d and drop the escaping of the "-". That gives you
String regex = "[\\dA-Za-z\\s-]+";
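A quick way to try it out, as a sketch (the class name AllowedChars and the sample inputs are just for illustration). Note that \s also accepts tabs and newlines, not only the space character.

```java
class AllowedChars {
    // One or more digits, ASCII letters, whitespace characters, or hyphens.
    private static final String REGEX = "[\\dA-Za-z\\s-]+";

    static boolean isAllowed(String s) {
        // String.matches() requires the entire string to match, so no ^ or $ is needed.
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("abc 123-def")); // prints: true
        System.out.println(isAllowed("abc_123"));     // prints: false
    }
}
```
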

Regular expression to return results that do not match selection

I work on a product that provides a Java API to extend it.
The API provides a function which
takes a Perl regular expression and
returns a list of matching files.
I want to filter the list to remove all files that end in .xml, .xsl and .cfg; basically the opposite of .*(\.xml|\.xsl|\.cfg).
I have been searching but I haven't been able to get anything to work yet.
I tried .*(?!\.cfg) and ^((?!cfg).)*$ and \.(?!cfg$|?!xml$|?!xsl$).
I don't know if I am on the right track or not.
Note
I know the regex systems are similar, but I can't get a Java regex working either.
You may use
^(?!.*\.(x[ms]l|cfg)$).+
Details:
^ - start of a string
(?!.*\.(x[ms]l|cfg)$) - a negative lookahead that fails the match if any 0+ chars other than line break chars (.*) are followed with xml, xsl or cfg ((x[ms]l|cfg)) at the end of the string ($)
.+ - any 1 or more chars other than linebreak chars. Might be omitted if the entire string match is not required (in some tools it is required though).
You need something like this, which matches only if the end of the string isn't preceded by a dot and one of the three unwanted types
/(?<!\.(?:xml|xsl|cfg))\z/
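Since the question mentions Java as well, here is a sketch of both patterns in Java form. The product's API is not shown in the question, so this only filters a plain list of names; ExtensionFilter and its methods are hypothetical. The lookahead pattern is used with matches(), the lookbehind pattern with find(), and the Perl-style / delimiters are dropped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExtensionFilter {
    // Lookahead version: rejects names ending in .xml, .xsl or .cfg.
    private static final Pattern NOT_CONFIG =
            Pattern.compile("^(?!.*\\.(x[ms]l|cfg)$).+");

    // Lookbehind version: the end of the string must not be preceded by one of the extensions.
    private static final Pattern NOT_CONFIG_LOOKBEHIND =
            Pattern.compile("(?<!\\.(?:xml|xsl|cfg))\\z");

    static List<String> keep(List<String> names) {
        return names.stream()
                .filter(n -> NOT_CONFIG.matcher(n).matches())
                .collect(Collectors.toList());
    }

    static boolean keepByLookbehind(String name) {
        return NOT_CONFIG_LOOKBEHIND.matcher(name).find();
    }

    public static void main(String[] args) {
        System.out.println(keep(Arrays.asList("a.xml", "b.txt", "c.cfg", "d.xsl")));
        // prints: [b.txt]
    }
}
```
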

Match Lua multiline strings and comments with Regex

I have a Lua editor in which I implemented syntax highlighting. I use regexes to match expressions like strings, comments, tokens, numbers, etc of Lua. The whole thing is made in Java and uses Java regexes. I had trouble with two things:
Multiline strings - Lua multiline strings start with [[ and end with ]]. Everything between is the string, and there can even be nested multiline strings. You can see what I made here; the regex is \[\[((?>[^\[\[\]\]]|(?R))*\]\]) and it works. It's similar to what you can see on this page under the match balanced constructs section. It finds expressions with equal numbers of [[ and ]]. The thing is, recursion is not supported by the Java regex engine. How can I replace it with something supported?
Multiline comments - Lua multiline comments start with --[====[ and end with ]====]. A comment ends only when the closing bracket has as many equals signs as the opening bracket. There can be anywhere between 0 and infinitely many equals signs. I made this regex --\[\[((.|\n)*?)\]\] but it only works for the --[[ comment ]] pattern and does not support --[==[ comment ]==]. Maybe I could count the equals signs at the opening and then match the same number at the closing tag. Is this possible in Java regex? How?
Try this
--\[(=*)\[(.|\n)*?\]\1\]
Multiline string literals are absolutely the same but without leading --:
\[((=*)\[(.|\n)*?)\]\2\]
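In Java the comment pattern could be used roughly like this. It is only a sketch: the class name LuaLongBrackets and the sample input are made up. The backreference \1 is what forces the closing bracket to carry the same number of equals signs as the opening one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LuaLongBrackets {
    // Raw regex: --\[(=*)\[(.|\n)*?\]\1\]
    // Group 1 captures the run of '=' in the opener; \1 requires the same run in the closer.
    private static final Pattern COMMENT =
            Pattern.compile("--\\[(=*)\\[(.|\\n)*?\\]\\1\\]");

    // Returns every long-bracket comment found in the given Lua source text.
    static List<String> findComments(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = COMMENT.matcher(source);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String lua = "a = 1 --[==[ one ]] still ]==] b = 2 --[[ two ]]";
        for (String c : findComments(lua)) {
            System.out.println(c);
        }
        // prints:
        // --[==[ one ]] still ]==]
        // --[[ two ]]
    }
}
```
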
