Regular Expression for ")" matching parentheses - java

Every smiling face must have a smiling mouth that should be marked with either ) or D.
I tried to do this using the following code:
import java.util.*;
import java.util.regex.Pattern;
public class SmileFaces {
    public static int countSmileys(List<String> arr) {
        String regx = "/^((:|;)(-|~)?|D|//))$/";
        int count=0;
        ListIterator<String> itr=arr.listIterator();
        while(itr.hasNext()){
            if(Pattern.matches(regx,itr.next())){
                count++;
            }
        }
        return count;
    }
}
This is the regex I tried for detecting smiling faces: /^((:|;)(-|~)?|D|//))$/

You could just patch your current regex by escaping the closing parenthesis correctly, as \\) with two backslashes rather than //) with forward slashes, but I think character classes are easier to read here:
String regx = "^[;:][~-]?[D)]$";
Note that Java regex patterns do not take delimiters as they would in a language such as PHP or JavaScript, so I removed the leading and trailing / from your pattern. Also, with methods that must match the entire input, such as String#matches or Pattern.matches (which you are using), the ^ and $ anchors are redundant and you could remove them.
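For completeness, here is a minimal sketch of the counting method using the character-class pattern. The class name SmileyCounter and the sample faces are made up for illustration; the anchors are left out on the assumption that Matcher.matches() is used, since it has to consume the whole input anyway.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical class name, used only for this sketch.
class SmileyCounter {
    // Eyes (: or ;), optional nose (- or ~), mouth () or D).
    // No ^ and $ needed: matches() must consume the whole input.
    private static final Pattern SMILEY = Pattern.compile("[;:][~-]?[D)]");

    static int countSmileys(List<String> faces) {
        int count = 0;
        for (String face : faces) {
            if (SMILEY.matcher(face).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Made-up sample data: only ":)", ";~D" and ":D" are valid smileys.
        List<String> faces = Arrays.asList(":)", ";~D", ":-(", ";(", ":D", ":->");
        System.out.println(countSmileys(faces)); // 3
    }
}
```

Compiling the pattern once into a constant avoids recompiling it for every list element, which Pattern.matches(regx, ...) would do.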

Related

Why replaceAll("$","") is not working although replace("$","") works just fine?

import java.util.*;
import java.lang.*;
import java.io.*;
class GFG
{
    public static void main (String[] args)
    {
        int turns;
        Scanner scan=new Scanner(System.in);
        turns=scan.nextInt();
        while(turns-->0)
        {
            String pattern=scan.next();
            String text=scan.next();
            System.out.println(regex(pattern,text));
        }
    }//end of main method
    static int regex(String pattern,String text)
    {
        if(pattern.startsWith("^"))
        {
            if(text.startsWith(pattern.replace("^","")))
                return 1;
        }
        else if(pattern.endsWith("$"))
        {
            if(text.endsWith(pattern.replace("$","")))
                return 1;
        }
        else
        {
            if(text.contains(pattern))
                return 1;
        }
        return 0;
    }
}
Input:
2
or$
hodor
or$
arya
Output:
1
0
In this program I scan two strings: the first is the pattern and the second is the text in which I have to find the pattern. The method should return 1 if the pattern matches and 0 otherwise.
With replace() it works fine, but when I change replace() to replaceAll() it no longer works as expected.
How can I make replaceAll() work in this program?
Because replaceAll expects a string defining a regular expression, and $ means "end of line" in regular expressions. From the documentation:
public String replaceAll(String regex, String replacement)
Replaces each substring of this string that matches the given regular expression with the given replacement.
You need to escape it with a backslash (which also has to be escaped, in the string literal):
if(text.endsWith(pattern.replaceAll("\\$","")))
For complex strings that you want to replace verbatim, Pattern.quote is useful:
if(text.endsWith(pattern.replaceAll(Pattern.quote("$"),"")))
You don't need it here because your replacement is "", but if your replacement may have special characters in it (like backslashes or dollar signs), use Matcher.quoteReplacement on the replacement string as well.
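A small sketch of both quoting helpers together, in case it helps. The class and method names (QuoteDemo, stripDollar, replaceDollar) are invented for this example; Pattern.quote and Matcher.quoteReplacement are the standard java.util.regex methods mentioned above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class for illustration only.
class QuoteDemo {
    // Remove every "$": Pattern.quote makes the regex engine treat it literally.
    static String stripDollar(String pattern) {
        return pattern.replaceAll(Pattern.quote("$"), "");
    }

    // Replace every "$" with the given text, taken verbatim
    // (backslashes and dollar signs in it lose their special meaning).
    static String replaceDollar(String input, String literalReplacement) {
        return input.replaceAll(Pattern.quote("$"),
                                Matcher.quoteReplacement(literalReplacement));
    }

    public static void main(String[] args) {
        System.out.println(stripDollar("or$"));                 // or
        // Unquoted, "$" only matches the empty position at the end, so nothing is removed:
        System.out.println("or$".replaceAll("$", ""));          // or$
        System.out.println(replaceDollar("$5 and $10", "\\$")); // \$5 and \$10
    }
}
```

The second line of output shows why the original program fails with replaceAll: the pattern still ends in "$" after the call, so text.endsWith(...) compares against "or$" instead of "or".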
$ is a special character in regex (EOL). You have to escape it:
pattern.replaceAll("\\$","")
Despite the similar name, these are two very different methods.
replace replaces substrings with other substrings (*).
replaceAll uses regular expression matching, and $ is a special control character there (meaning "end of string/line").
You should not be using replaceAll here, but if you must, you have to quote the $:
pattern.replaceAll(Pattern.quote("$"),"")
(*) To make things more confusing, replace also replaces all occurrences, so the difference in the method names does not at all describe the difference in function.
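To make the difference concrete, here is a tiny sketch using "." instead of "$" (my own example string, not from the question). Both methods replace every occurrence; only the interpretation of the first argument differs.

```java
// Hypothetical class and method names, used only for this illustration.
class ReplaceVsReplaceAll {
    // replace: the target is taken literally.
    static String literal(String s) {
        return s.replace(".", "-");
    }

    // replaceAll: the first argument is a regex, and "." matches any character.
    static String asRegex(String s) {
        return s.replaceAll(".", "-");
    }

    // replaceAll with the dot escaped behaves like replace again.
    static String escapedRegex(String s) {
        return s.replaceAll("\\.", "-");
    }

    public static void main(String[] args) {
        System.out.println(literal("a.b.c"));      // a-b-c
        System.out.println(asRegex("a.b.c"));      // -----
        System.out.println(escapedRegex("a.b.c")); // a-b-c
    }
}
```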
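Adding another level of complexity: replacing $ with \\$ (two literal backslashes and a dollar sign).
"$ABC$AB".replaceAll(Matcher.quoteReplacement("$"), Matcher.quoteReplacement("\\\\$"))
// Output - \\$ABC\\$AB
This worked for me. Note that Matcher.quoteReplacement is meant for the replacement argument; it works as the first argument here only because it turns "$" into \$, which also happens to be a valid regex for a literal dollar sign. Pattern.quote("$") is the method intended for the regex argument.
For the issue reported here,
"$ABC$AB".replaceAll(Matcher.quoteReplacement("$"), "")
should work.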

Java regex only backslash(\\) not working

I am using a pattern which has a backslash (\) escaped once, but it is not working at all: I get "no match" as the result.
package com.test;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestClassRegex {
    private static final String VALIDATION = "^[0-9\\-]+$";
    public static void main(String[] args) {
        String line = "1234\56";
        Pattern r = Pattern.compile(VALIDATION);
        Matcher m = r.matcher(line);
        if (m.matches()) {
            System.out.println("match");
        }
        else {
            System.out.println("no match !!");
        }
    }
}
How can I write a pattern which recognizes a literal backslash?
I have seen another post:
Java regular expression value.split("\\."), "the back slash dot" divides by character?
but it doesn't completely answer my question, so I need some pointers here.
"1234\56" will not produce "123456" but instead "1234."
Why?
In a Java string literal, a backslash followed by octal digits is an octal escape that denotes the character with that code. Here \56 is octal 56, which is decimal 46 in the ASCII table: the character "." (a period).
That's exactly why you're not getting a match here: the string contains no backslash at all, and "." is not in your character class.
Solution
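You should first of all change your regex to "^[0-9\\\\-]+$". In a Java string literal \\ is a single backslash, so your original "\\-" reaches the regex engine as \-, an escaped hyphen, and the class only allows digits and hyphens. To allow a literal backslash the regex itself needs \\, which is written as \\\\ in Java source.
Your input needs to be written as "1234\\56" in the source for the same reason.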
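Putting both fixes together, a minimal sketch might look like this. The class name BackslashDemo and the isValid helper are my own; the pattern is the corrected one from above.

```java
import java.util.regex.Pattern;

// Hypothetical class name for this sketch.
class BackslashDemo {
    // Source literal "^[0-9\\\\-]+$" is the regex ^[0-9\\-]+$ :
    // one or more digits, literal backslashes, or hyphens.
    static final Pattern VALIDATION = Pattern.compile("^[0-9\\\\-]+$");

    static boolean isValid(String line) {
        return VALIDATION.matcher(line).matches();
    }

    public static void main(String[] args) {
        System.out.println("1234\56");           // prints 1234.   (octal escape)
        System.out.println("1234\\56");          // prints 1234\56 (escaped backslash)
        System.out.println(isValid("1234\56"));  // false: "." is not in the class
        System.out.println(isValid("1234\\56")); // true
    }
}
```

If the string comes from user input or a file rather than from a literal in the source, no doubling is needed on the input side; only the pattern literal has to be escaped.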

regex last word in a sentence ending with punctuation (period)

I'm looking for the regex pattern, not the Java code, to match the last word in an English (or European language) sentence. If the last word is, in this case, "hi" then I want to match "hi" and not "hi."
The regex (\w+)\.$ will match "hi.", whereas the output should be just "hi". What's the correct regex?
thufir#dur:~/NetBeansProjects/regex$
thufir#dur:~/NetBeansProjects/regex$ java -jar dist/regex.jar
trying
a b cd efg hi
matches:
hi
trying
a b cd efg hi.
matches:
thufir#dur:~/NetBeansProjects/regex$
code:
package regex;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String matchesLastWordFine = "a b cd efg hi";
        lastWord(matchesLastWordFine);
        String noMatchFound = matchesLastWordFine + ".";
        lastWord(noMatchFound);
    }
    private static void lastWord(String sentence) {
        System.out.println("\n\ntrying\n" + sentence + "\nmatches:");
        Pattern pattern = Pattern.compile("(\\w+)$");
        Matcher matcher = pattern.matcher(sentence);
        String match = null;
        while (matcher.find()) {
            match = matcher.group();
            System.out.println(match);
        }
    }
}
My code is in Java, but that's neither here nor there. I'm strictly looking for the regex, not the Java code. (Yes, I know it's possible to strip out the last character with Java.)
What regex should I put in the pattern?
You can use a lookahead assertion. For example, to match the sentence without the period:
[\w\s]+(?=\.)
and, for just the last word (the word before "."):
[\w]+(?=\.)
If you need to have the whole match be the last word you can use lookahead.
\w+(?=(\.))
This matches a set of word characters that are followed by a period, without matching the period.
If you want the last word in the line, regardless of whether the line ends at the end of a sentence or not, you can use:
\w+(?=(\.?$))
Or if you want to also include ,!;: etc then
\w+(?=(\p{Punct}?$))
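Since the question's test harness is in Java, here is a rough sketch of the last variant plugged into it. The class name LastWord and the null return for "no match" are my own choices; the inner capturing group from the pattern above is dropped because only the whole match is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name for this sketch.
class LastWord {
    // Word characters that are followed by optional punctuation and the end of input.
    // The lookahead checks the punctuation without making it part of the match.
    private static final Pattern LAST_WORD = Pattern.compile("\\w+(?=\\p{Punct}?$)");

    // Returns the last word, or null if the sentence does not end in one.
    static String lastWord(String sentence) {
        Matcher m = LAST_WORD.matcher(sentence);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(lastWord("a b cd efg hi"));  // hi
        System.out.println(lastWord("a b cd efg hi.")); // hi
        System.out.println(lastWord("a b cd efg hi!")); // hi
    }
}
```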
You can use matcher.group(1) to get the content of the first capturing group ((\w+) in your case). To say a little more, matcher.group(0) would return the full match. So your regex is almost correct. One improvement concerns your use of $, which matches at the end of the line. Use it only if your sentence fills the line exactly!
With the regular expression (\w+)\p{Punct} you get a group count of 1, which means you get the match with punctuation from matcher.group(0) and the word without punctuation from matcher.group(1).
To write the regular expression in Java, use: "(\\w+)\\p{Punct}"
To test your regular expressions online with Java (and actually a lot of other languages) see RegexPlanet
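A short sketch of the group(0) versus group(1) distinction described above, assuming the (\w+)\p{Punct} pattern. The class name GroupDemo and the array return type are only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name for this sketch.
class GroupDemo {
    private static final Pattern WORD_THEN_PUNCT = Pattern.compile("(\\w+)\\p{Punct}");

    // Returns { full match, first capturing group } for the first match, or null if none.
    static String[] groups(String sentence) {
        Matcher m = WORD_THEN_PUNCT.matcher(sentence);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(0), m.group(1) };
    }

    public static void main(String[] args) {
        String[] g = groups("a b cd efg hi.");
        System.out.println(g[0]); // hi.
        System.out.println(g[1]); // hi
    }
}
```

Note that without the $ anchor this returns the first word followed by punctuation, which is only the last word when the input holds a single sentence.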
By using the $ operator you will only get a match at the end of a line. So if you have multiple sentences on one line you will not get a match in the middle one.
So you should just use:
(\w+)\.
the capture group will give the correct match.
You can see an example here
I don't understand why really, but this works:
package regex;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        String matchesLastWordFine = "a b cd efg hi";
        lastWord(matchesLastWordFine);
        String noMatchFound = matchesLastWordFine + ".";
        lastWord(noMatchFound);
    }
    private static void lastWord(String sentence) {
        System.out.println("\n\ntrying\n" + sentence + "\nmatches:");
        Pattern pattern = Pattern.compile("(\\w+)"); //(\w+)\.
        Matcher matcher = pattern.matcher(sentence);
        String match = null;
        while (matcher.find()) {
            match = matcher.group();
        }
        System.out.println(match);
    }
}
I guess regex \w+ will match all the words (doh). Then the last word is what I was after. Too simple, really, I was trying to exclude punctuation, but I guess regex does that automagically for you..?

Counting words with regular expression "\S+"

Why does wordCount end up being 1, rather than 5, in the code below?
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class WordCount {
    public static void main(String[] args) {
        final Pattern wordCountRegularExpression = Pattern.compile("\\S+");
        final Matcher matcher = wordCountRegularExpression
                .matcher("one two three four five");
        int wordCount = 0;
        while (matcher.find()) {
            wordCount++;
        }
        System.out.println("wordCount: " + wordCount);
    }
}
Doesn't the pattern "\S+" match a word, since it means one or more non-space characters?
This does work by the way:
final Pattern wordCountRegularExpression = Pattern.compile("\\b\\w+\\b");
But I still don't understand why the original code doesn't work.
Doesn't the pattern "\S+" match a word, since it means one or more non-space characters?
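Yes.
Using
import java.util.regex.*;
in Java 7, the following pattern:
Pattern.compile("\\S+");
matches runs of non-whitespace characters, so it counts words, not spaces. It is the lowercase "\\s+" that would match the 4 runs of spaces in "one two three four five".
So with ordinary spaces the code above should print 5; a result of 1 means the string contains no character that \s treats as whitespace.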
It depends on what you're using to separate the words. When I copy the code from your question into my editor, I see plain old spaces (U+0020), but when I view the source of the page I see non-breaking spaces (U+00A0). By default, Java's \s does not treat the NBSP as a whitespace character, so \S+ matches the whole string as a single token.
Now the question is why am I seeing NBSPs in the string literal, but nowhere else? And why are they being converted to spaces when I copy/paste? Is anyone else seeing that?
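The effect can be reproduced with a short sketch that writes the NBSP explicitly as \u00A0. The class name NbspWordCount and the count helper are made up; Pattern.UNICODE_CHARACTER_CLASS is a standard flag (Java 7 and later) under which, as far as I know, \s and \S follow the Unicode White_Space property, which includes U+00A0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name for this sketch.
class NbspWordCount {
    // Counts how many times the pattern is found in the text.
    static int count(Pattern p, String text) {
        Matcher m = p.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        Pattern byDefault = Pattern.compile("\\S+");
        Pattern unicode = Pattern.compile("\\S+", Pattern.UNICODE_CHARACTER_CLASS);

        String spaces = "one two three four five";
        String nbsp = "one\u00A0two\u00A0three\u00A0four\u00A0five";

        System.out.println(count(byDefault, spaces)); // 5
        System.out.println(count(byDefault, nbsp));   // 1: NBSP counts as non-whitespace
        System.out.println(count(unicode, nbsp));     // 5: NBSP is Unicode whitespace
    }
}
```

This also suggests why "\\b\\w+\\b" worked in the question: NBSP is not a word character, so it still separates the words.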

How do I know if a regexp has more than one possible match?

I am writing Java code that has to distinguish regular expressions with more than one possible match from regular expressions that have only one possible match.
For example:
"abc." can have several matches ("abc1", "abcf", ...),
while "abcd" can only match "abcd".
Right now my best idea was to look for all unescaped regexp special characters.
I am convinced that there is a better way to do it in Java. Ideas?
(Late addition):
To make things clearer - there is NO specific input to test against. A good solution for this problem will have to test the regex itself.
In other words, I need a method whose signature may look something like this:
boolean isSingleResult(String regex)
This method should return true if and only if there is exactly one possible String s1 for which s1.matches(regex) returns true. (See examples above.)
This sounds dirty, but it might be worth having a look at the Pattern class in the Java source code.
Taking a quick peek, it seems like it 'normalize()'s the given regex (Line 1441), which could turn the expression into something a little more predictable. I think reflection can be used to tap into some private resources of the class (use caution!). It could be possible that, while tokenizing the regex pattern, there are specific indications when it has reached some kind of "multi-matching" element in the pattern.
Update
After having a closer look, there is some data within package scope that you can use to leverage the work of the Pattern tokenizer to walk through the nodes of the regex and check for multiple-character nodes.
After compiling the regular expression, iterate through the compiled "Node"s starting at Pattern.root. Starting at line 3034 of the class, there are the generalized types of nodes. For example class Pattern.All is multi-matching, while Pattern.SingleI or Pattern.SliceI are single-matching, and so on.
All these token classes appear to be in package scope, so it should be possible to do this without using reflection, but instead creating a java.util.regex.PatternHelper class to do the work.
Hope this helps.
If it can only have one possible match it isn't reeeeeally an expression, now, is it? I suspect your best option is to use a different tool altogether, because this does not at all sound like a job for regular expressions, but if you insist, well, no, I'd say your best option is to look for unescaped special characters.
The only regular expression that can ONLY match one input string is one that specifies the string exactly. So you need to look for expressions with no wildcard characters or character groups AND that specify a start "^" and end "$" anchor.
"the quick" matches:
"the quick brown fox"
"the quick brown dog"
"catch the quick brown fox"
"^the quick brown fox$" matches ONLY:
"the quick brown fox"
Now I understand what you mean. I live in Belgium...
So here is something that works on most expressions. I wrote it myself, so maybe I forgot some rules.
public static final boolean isSingleResult(String regexp) {
    // Predefined character classes always allow more than one match,
    // so check for them before stripping escaped characters.
    String[] exconexc = "\\d \\D \\w \\W \\s \\S".split(" ");
    for (String s : exconexc) {
        int index = regexp.indexOf(s);
        if (index != -1) // Forbidden sequence found
        {
            return false;
        }
    }
    // Then remove all remaining escaped characters (a backslash plus the next char):
    String regex = regexp.replaceAll("\\\\.", "");
    // Now check for all the strings that can mean more than one match
    String[] mtom = "+ . ? | * { [:alnum:] [:word:] [:alpha:] [:blank:] [:cntrl:] [:digit:] [:graph:] [:lower:] [:print:] [:punct:] [:space:] [:upper:] [:xdigit:]".split(" ");
    // Iterate over all mtom strings
    for (String s : mtom) {
        int index = regex.indexOf(s);
        if (index != -1) // Forbidden char found
        {
            return false;
        }
    }
    return true;
}
Martijn
The only way I see is to check whether the regexp matches multiple times for a particular input.
package com;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class AAA {
    public static void main(String[] args) throws Exception {
        String input = "123 321 443 52134 432";
        Pattern pattern = Pattern.compile("\\d+");
        Matcher matcher = pattern.matcher(input);
        int i = 0;
        while (matcher.find()) {
            ++i;
        }
        System.out.printf("Matched %d times%n", i);
    }
}
