Retain emoticons, Java Regex - java

I have a program that filters a string and keeps only the English characters and the emoticons. I am trying to write a regular expression that keeps emoticons like :) , :D , :( etc. but removes a lone ':' or '(' or ')'. Basically, I want to keep ':' and ')' only when they appear together; otherwise they should be filtered out. My program does keep the emoticons, but stray : and ) characters get through as well. Can you please help me out?
String pattern = "[^\\w^\\s^(:))]";
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(text);
text = m.replaceAll("");
Thanks for your help.

You're trying to use grouping parentheses inside square brackets. This doesn't work because parentheses lose their special meaning inside square brackets.
Square brackets define a character class, which is a single atom, not a sequence of atoms. Instead, you should use a two-branch alternation: one branch for : and one for a parenthesis, D, etc., with look-ahead and look-behind in each branch.
You can find more info about regular expression syntax here.
Also, you may give some consideration to more complex emoticons like :-).
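A minimal sketch of the alternation idea described above, assuming only the three emoticons :) , :( and :D need to survive. The class name EmoticonFilter and the method clean are hypothetical, and a third branch is added here to drop every other non-word, non-space character. Note that \w also accepts digits and underscores, so this is not a strict "English letters only" filter.

```java
import java.util.regex.Pattern;

final class EmoticonFilter {
    // Branch 1: any character that is not a word character, whitespace, ':' or a parenthesis.
    // Branch 2: a ':' that is NOT followed by ')', '(' or 'D' (negative look-ahead).
    // Branch 3: a '(' or ')' that is NOT preceded by ':' (negative look-behind).
    static final Pattern UNWANTED =
            Pattern.compile("[^\\w\\s:()]|:(?![()D])|(?<!:)[()]");

    // Removes everything matched by UNWANTED and keeps the rest.
    static String clean(String text) {
        return UNWANTED.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        // The lone ':' after "world", the parentheses around "test", '#' and '!' are removed.
        System.out.println(clean("Hello :) world: (test) :D ok :( #!"));
    }
}
```

Supporting :-) would mean extending the look-ahead and look-behind sets, for example by allowing '-' between the colon and the parenthesis.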

Related

what is missing in my java regex?

I want to fetch
http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png
from
url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
I have tried this code:
String a = "";
Pattern pattern = Pattern.compile("url(.*)");
Matcher matcher = pattern.matcher(imgpath);
if (matcher.find()) {
    a = (matcher.group(1));
}
return a;
but a == (http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_639_o_4746_precious_image_1419867529.png)
how can I fine tune it?
Why use a regular expression to begin with?
Given
final String s = "url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)";
If the string always has the same format, a simple s.substring(4, s.length() - 1) would be better.
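As a rough sketch of the substring approach, assuming the input always starts with "url(" and ends with ")". The class name UrlSubstring, the method unwrap and the example.com address are illustrative only.

```java
final class UrlSubstring {
    // Strips the leading "url(" (4 characters) and the trailing ")".
    // Assumption: the caller guarantees that exact wrapper; anything else is rejected.
    static String unwrap(String s) {
        if (!s.startsWith("url(") || !s.endsWith(")")) {
            throw new IllegalArgumentException("not in url(...) form: " + s);
        }
        return s.substring(4, s.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(unwrap("url(http://example.com/image.png)"));
        // prints http://example.com/image.png
    }
}
```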
That said, if you insist on a regular expression:
You have to escape the ( as \(, and because the backslash itself must be escaped in a Java string literal, it becomes \\(. The same goes for the ).
Then you can capture the group with url\\((.+)\\).
Learn to use RegEx101.com before coming here, it will point out errors like this immediately.
As you already seem to know, ( and ) represent groups, which means that in the regex
url(.*)
(.*) will place everything after url in group 1, which in case of
url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
will be
(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
If you want to exclude ( and ) from the match, you need to add them to the regex as literals, which means you need to escape them. There are several ways to do that, like adding \ before each of them or surrounding each of them with [ ].
Another problem with your regex is that .* finds the longest possible match, and since . matches any character (except line separators), it can also include ( and ). To solve this, you can make the * quantifier reluctant by adding ? after it, so your final regex can be written as the string
"url\\((.*?)\\)"
---------------
url
\\( - ( literal
(.*?) - group 1
\\) - ) literal
or, instead of ., you can use a character class that accepts every character except ), like
"url\\(([^)]*)\\)"
Try this regex:
url\((.*?)\)
The outermost parentheses are escaped so they will be matched literally. The inner parentheses are for capturing a group. The question mark after the .* is to make the match lazy, so the first closing parenthesis found will end the group.
Note that to use this regex in Java, you'll have to additionally escape the backslashes in order to express the above regex as a string literal:
String regex = "url\\((.*?)\\)";
You need to escape the () to match the parentheses in the string, and then add another set of () around the part you want to pull out in group 1, the actual url. I also changed the part inside the parentheses to [^)]*, which will match everything until it finds a ). See below:
url\(([^)]*)\)
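The two variants suggested in these answers can be compared side by side. This is only a sketch: the class name UrlExtractor, the field names and the example.com address are made up for illustration, and an empty string is returned on no match to mirror the String a = "" default in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UrlExtractor {
    // Reluctant quantifier: stop at the first ')' after "url(".
    static final Pattern LAZY = Pattern.compile("url\\((.*?)\\)");
    // Negated character class: consume everything that is not ')'.
    static final Pattern NEGATED = Pattern.compile("url\\(([^)]*)\\)");

    // Returns the text captured by group 1, or "" when nothing matches.
    static String extract(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String css = "url(http://example.com/image.png)";
        System.out.println(extract(LAZY, css));    // http://example.com/image.png
        System.out.println(extract(NEGATED, css)); // http://example.com/image.png
    }
}
```

Both patterns give the same result here. They differ on multi-line input: . does not match line separators by default, while [^)] does.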

Java // No match with RegExp and square brackets

I have a string like
Berlin -> Munich [label="590"]
and now I'm searching a regular expression in Java that checks if a given line (like above) is valid or not.
Currently, my RegExp looks like "\\w\\s*->\\s*\\w\\s*\\[label=\"\\d\"\\]"
However, it doesn't work, and I've found out that \\w\\s*->\\s*\\w\\s* still works but when adding \\[ it can't find the occurrence (\\w\\s*->\\s*\\w\\s*\\[).
What I also found out is that when '->' is removed it works (\\w\\s*\\s*\\w\\s*\\[)
Is the arrow the problem? Can hardly imagine that.
I really need some help on this.
Thank you in advance
This is the correct regular expression:
"\\w+\\s*->\\s*\\w+\\s*\\[label=\"\\d+\"\\]"
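The partial results you report are what find() gives with single-character tokens: \w matches exactly one character, so \w\s*->\s*\w finds "n -> M" inside the Berlin/Munich string, but adding \[ fails because "unich" still sits between that M and the bracket.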
Also, if you are really into German city names, you might have to consider names like Castrop-Rauxel (which some wit has called the Latin name of Wanne-Eickel ;-) )
Try this
String message = "Berlin -> Munich [label=\"590\"]";
Pattern p = Pattern.compile("\\w+\\s*->\\s*\\w+\\s*\\[label=\"\\d+\"\\]");
Matcher matcher = p.matcher(message);
while(matcher.find()) {
    System.out.println(matcher.group());
}
You need to match more than one character: \w and \d each match a single character, so use \w+ and \d+ for whole names and numbers.
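Since the question asks whether a whole line is valid, matches() may be a better fit than find(). The sketch below assumes that reading; the class name LineValidator and its methods are hypothetical. It also shows what a single \w finds in the sample line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineValidator {
    // One or more word characters on each side of "->", then [label="<digits>"].
    static final Pattern EDGE =
            Pattern.compile("\\w+\\s*->\\s*\\w+\\s*\\[label=\"\\d+\"\\]");

    // matches() requires the pattern to cover the entire line.
    static boolean isValid(String line) {
        return EDGE.matcher(line).matches();
    }

    // Returns the first substring found by the given regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String line = "Berlin -> Munich [label=\"590\"]";
        System.out.println(isValid(line));                              // true
        // A single \w on each side only grabs one character per side.
        System.out.println(firstMatch("\\w\\s*->\\s*\\w", line));       // n -> M
        // With \[ appended, "unich" is in the way, so nothing is found.
        System.out.println(firstMatch("\\w\\s*->\\s*\\w\\s*\\[", line)); // null
    }
}
```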

Java Regex lookahead takes too much time

I'm trying to create a proper regex for my problem and apparently ran into a weird issue.
Let me describe what I'm trying to do.
My goal is to remove commas from both ends of the string. E.g., the string , ,, ,,, , , Hello, my lovely, world, ,, , should become just Hello, my lovely, world.
I have prepared the following regex to accomplish this:
(\w+,*? *?)+(?=(,?\W+$))
It works like a charm in regex validators, but when I run it on an Android device, matcher.find() hangs for about a minute before it finds a match.
I assume the problem is the positive lookahead I'm using, but I couldn't find any better solution than trimming the commas separately at the beginning and at the end:
output = input.replaceAll("^(,?\\W?)+", ""); //replace commas at the beginning
output = output.replaceAll("(,?\\W?)+$", ""); //replace commas at the end
Is there something I am missing in positive lookahead in Java regex? How can I retrieve string section between commas at the beginning and at the end?
You don't have to use a lookahead if you use matching groups. Try regex ^[\s,]*(.+?)[\s,]*$:
EDIT: To break it apart, ^ matches the beginning of the line, which is technically redundant if using matches() but may be useful elsewhere. [\s,]* matches zero or more whitespace characters or commas, but greedily--it will accept as many characters as possible. (.+?) matches any string of characters, but the trailing question mark instructs it to match as few characters as possible (non-greedy), and also capture the contents to "group 1" as it forms the first set of parentheses. The non-greedy match allows the final group to contain the same zero-or-more commas or whitespaces ([\s,]*). Like the ^, the final $ matches the end of the line--useful for find() but redundant for matches().
If you need it to match spaces only, replace [\s,] with [ ,].
This should work:
Pattern pattern = Pattern.compile("^[\\s,]*(.+?)[\\s,]*$");
Matcher matcher = pattern.matcher(", ,, ,,, , , Hello, my lovely, world, ,, ,");
if (!matcher.matches())
    return null;
return matcher.group(1); // "Hello, my lovely, world"
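A different technique from the capture-group answer above, offered only as a sketch: a single replaceAll with two anchored alternatives and no nested quantifiers, which also merges the two separate calls from the question. The class name CommaTrimmer and the method trim are hypothetical.

```java
import java.util.regex.Pattern;

final class CommaTrimmer {
    // Either a run of whitespace/commas at the very start,
    // or a run of whitespace/commas at the very end.
    static final Pattern EDGES = Pattern.compile("^[\\s,]+|[\\s,]+$");

    // Removes leading and trailing commas and whitespace; inner ones are untouched.
    static String trim(String input) {
        return EDGES.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(trim(", ,, ,,, , , Hello, my lovely, world, ,, ,"));
        // prints Hello, my lovely, world
    }
}
```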

Java: regex - how do i get the first quote text

As a beginner with regex, I suspect I'm about to ask something too simple, but I'll ask anyway and hope you won't mind helping me.
Let's say I have a text like "hello 'cool1' word! 'cool2'"
and I want to get the first quote's text (which is 'cool1' without the ').
What should my pattern be? And when using a matcher, how do I guarantee it will remain the first quote and not the second?
(Please suggest a solution only with regex.)
Use this regular expression:
'([^']*)'
Use as follows:
Pattern pattern = Pattern.compile("'([^']*)'");
Matcher matcher = pattern.matcher(s);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Or this if you know that there are no new-line characters in your quoted string:
'(.*?)'
when using matcher, how do i guarantee it will remain the first quote and not the second?
It will find the first quoted string first because it searches from left to right. If you ask it for the next match, it will give you the second quoted string.
If you want to find the first quote's text without the ', you can use the lookahead and lookbehind mechanisms, like
(?<=').*?(?=')
for example
System.out.println("hello 'cool1' word! 'cool2'".replaceFirst("(?<=').*?(?=')", "ABC"));
//out -> hello 'ABC' word! 'cool2'
You could just split the string on quotes and get the second piece (which will be between the first and second quotes).
If you insist on regex, try this:
/^.*?'(.*?)'/
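Make sure dot-all mode is enabled (Pattern.DOTALL in Java, so that . also matches newlines), unless you know you'll never have newlines in your input. The surrounding slashes are only delimiters and are not part of the pattern. Then get the subpattern from the result and that will be your string.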
To support double quotes too:
/^.*?(['"])(.*?)\1/
Then get subpattern 2.
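In Java the slash delimiters are dropped and the backreference is written as \\1 inside the string literal. The following is a sketch under those assumptions; the class name FirstQuoted and the method first are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstQuoted {
    // Group 1 captures the opening quote character (' or "),
    // group 2 the quoted text, and \1 requires the same quote character to close it.
    // DOTALL lets '.' cross line breaks.
    static final Pattern QUOTED =
            Pattern.compile("^.*?(['\"])(.*?)\\1", Pattern.DOTALL);

    // Returns the text inside the first pair of matching quotes, if any.
    static Optional<String> first(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? Optional.of(m.group(2)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(first("hello 'cool1' word! 'cool2'").orElse("(none)"));
        // prints cool1
    }
}
```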

How do I match text within parentheses using regex?

I have the following pattern:
(COMPANY) -277.9887 (ASP,) -277.9887 (INC.)
I want the final output to be:
COMPANY ASP, INC.
Currently I have the following code, and it keeps returning the original pattern (I assume because the group spans everything between the first '(' and the last ')'):
Pattern p = Pattern.compile("((.*))",Pattern.DOTALL);
Matcher matcher = p.matcher(eName);
while(matcher.find())
{
    System.out.println("found match:"+matcher.group(1));
}
I am struggling to get the results I need and appreciate any help. I am not worried about concatenating the results after I get each group, just need to get each group.
Pattern p = Pattern.compile("\\((.*?)\\)",Pattern.DOTALL);
Your .* quantifier is 'greedy', so yes, it's grabbing everything between the first and last available parentheses. As chaos says, tersely :), .*? uses a non-greedy quantifier, so it will grab as little as possible while still maintaining the match.
And you need to escape the parentheses within the regex, otherwise they become another group. That's assuming there are literal parentheses in your string. I suspect what you referred to in the initial question as your pattern is in fact your string.
Query: are "COMPANY", "ASP," and "INC." required?
If you must have values for them, then you want to use + instead of *: + is one-or-more and * is zero-or-more, so a * would also match the literal string "()".
e.g.: "\\((.+?)\\)"
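Putting the escaped, non-greedy pattern together with a find() loop might look like the sketch below. The class name ParenGroups and the method extract are hypothetical; the input string is the one from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ParenGroups {
    // Escaped literal parentheses around a reluctant, non-empty capture group.
    static final Pattern IN_PARENS = Pattern.compile("\\((.+?)\\)", Pattern.DOTALL);

    // Collects the text of every parenthesised group, left to right.
    static List<String> extract(String input) {
        List<String> groups = new ArrayList<>();
        Matcher m = IN_PARENS.matcher(input);
        while (m.find()) {
            groups.add(m.group(1));
        }
        return groups;
    }

    public static void main(String[] args) {
        String eName = "(COMPANY) -277.9887 (ASP,) -277.9887 (INC.)";
        System.out.println(String.join(" ", extract(eName)));
        // prints COMPANY ASP, INC.
    }
}
```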
Tested with Java 8:
/**
 * The pattern below returns the string inside parentheses.
 * Breakdown of the regular expression: \(+\s*([^\s)]+)\s*\)+
 * \(+     : matches the character "(" one or more times
 * \s*     : matches zero or more whitespace characters
 * (       : start of the capturing group
 * [^\s)]+ : matches one or more characters other than ")" and whitespace
 * )       : end of the capturing group
 * \s*     : matches zero or more whitespace characters
 * \)+     : matches the character ")" one or more times
 */
private static Pattern REGULAR_EXPRESSION = Pattern.compile("\\(+\\s*([^\\s)]+)\\s*\\)+");
Not a direct answer to your question but I recommend you use RegxTester to get to the answer and any future question quickly. It allows you to test in realtime.
If your strings are always going to look like that, you could get away with just using a couple calls to replaceAll instead. This seems to work for me:
String eName = "(COMPANY) -277.9887 (ASP,) -277.9887 (INC.)";
String eNameEdited = eName.replaceAll("\\).*?\\("," ").replaceAll("\\(|\\)","");
System.out.println(eNameEdited);
Probably not the most efficient thing in the world, but fairly simple.
