I want to get the first word of a string containing alphanumeric fields.
For example, the string can be 'abc123abc' or 'abc-123abc'.
I just want the first 'abc'.
Is there any way to get it without a for loop? (I want to do this using regex, but I don't know much about regular expressions.)
The string pattern is actually:
[A-Za-z]{2,5}[-]{0,1}[0-9]{1,15}[A-Za-z]{0,15}
My aim is to get the first word.
Wrap the part of the expression that you would like to capture in a capturing group, and then use group(1) of the matcher to access it:
([A-Za-z]{2,5})-?[0-9]{1,15}[A-Za-z]{0,15}
The first group will capture everything up to the optional dash:
Pattern p = Pattern.compile("([A-Za-z]{2,5})-?[0-9]{1,15}[A-Za-z]{0,15}");
Matcher m = p.matcher("abc123abc");
if (m.find()) {
    System.out.println(m.group(1));
}
The above prints abc.
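As a self-contained sketch of the same idea, the snippet below wraps the capture in a helper and runs it against both sample inputs from the question. The class name FirstWordDemo and the helper firstWord are illustrative names, not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstWordDemo {
    // Group 1 captures the leading 2-5 letters; the rest is matched but not captured.
    private static final Pattern P =
            Pattern.compile("([A-Za-z]{2,5})-?[0-9]{1,15}[A-Za-z]{0,15}");

    // Hypothetical helper: returns the leading word, or null when the pattern is not found.
    static String firstWord(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstWord("abc123abc"));   // abc
        System.out.println(firstWord("abc-123abc"));  // abc
    }
}
```

Returning null for a non-matching input is one possible design choice; throwing an exception or returning an Optional would work just as well.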
Try this:
System.out.println("abc-123abc".split("[-\\d]+")[0]);
Output:
abc
^[A-Za-z]+
will match ASCII letters at the start of the string. Is that what you need?
You can take the matched text for ^[A-Za-z]{2,5}. This matches the leading run of two to five letters.
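A minimal sketch of reading that matched text with a Matcher follows. The class name LeadingLettersDemo and the helper leadingLetters are assumptions made for illustration, as is the choice to return an empty string when nothing matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeadingLettersDemo {
    // ^ anchors the match to the start of the input.
    private static final Pattern LEADING = Pattern.compile("^[A-Za-z]{2,5}");

    // Hypothetical helper: returns the leading letters, or "" if fewer than two are present.
    static String leadingLetters(String input) {
        Matcher m = LEADING.matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(leadingLetters("abc-123abc")); // abc
        System.out.println(leadingLetters("abcdefg1"));   // abcde
    }
}
```

Note that the {2,5} bound caps the result at five letters, so a longer leading word is truncated; ^[A-Za-z]+ avoids that if the cap is not wanted.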
String word = "abc-123abc".replaceFirst("[^a-zA-Z].*$", "");
This removes everything from the first non-letter character onward. You can also use replaceFirst with a capturing group:
String word = "abc-123abc".replaceFirst("^([a-zA-Z]+).*$", "$1");
See the documentation for String.replaceFirst().
Although I read a large number of posts on the topic (in particular using lookarounds), I haven't understood if this more general case can be solved using regular expressions.
setup:
1) an input regex is passed in
2) the input regex is embedded in a negative regex so that
3) anything that is not identified by the input regex is matched
Example:
given:
input regex: "[-\\s]";
and
text: "self-service restaurant"
I want a negative regex in which to embed my input regex so that I can match my text as:
"self", "service", "restaurant"
Importantly, the negative regex should also be able to match a simple string like:
"restaurant"
Note that what I want could be achieved by changing the input regex from
"[-\\s]"
to
"[^-\\s]"
Yet, I'm after a more general approach where any regular expression can be passed into a negative regex.
You could achieve this through matching or splitting.
Through matching.
String s = "self-service restaurant";
Matcher m = Pattern.compile("[^-\\s]+").matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
Putting the characters inside a negated character class matches every character except those listed in the class.
Through splitting.
String s = "self-service restaurant";
String[] parts = s.split("[-\\s]+");
System.out.println(Arrays.toString(parts));
This splits your input on runs of one or more whitespace or hyphen characters. You can join the parts later to get your desired output.
Through replacing.
String s = "self-service restaurant";
System.out.println(s.replaceAll("[-\\s]+", "\n"));
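For the more general case the question asks about, where the input regex is arbitrary and cannot simply be turned into a negated character class, one option is to leave the regex untouched and collect the text between its matches. The sketch below does this with Matcher.find(). The class name InverseMatchDemo and the helper nonMatching are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InverseMatchDemo {
    // Hypothetical helper: returns every non-empty stretch of text NOT matched by inputRegex.
    static List<String> nonMatching(String inputRegex, String text) {
        List<String> parts = new ArrayList<>();
        Matcher m = Pattern.compile(inputRegex).matcher(text);
        int last = 0; // end of the previous match
        while (m.find()) {
            if (m.start() > last) {
                parts.add(text.substring(last, m.start())); // gap before this match
            }
            last = m.end();
        }
        if (last < text.length()) {
            parts.add(text.substring(last)); // tail after the final match
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(nonMatching("[-\\s]", "self-service restaurant")); // [self, service, restaurant]
        System.out.println(nonMatching("[-\\s]", "restaurant"));              // [restaurant]
    }
}
```

This behaves much like String.split(), except that it never returns empty strings, for example when the text starts with a delimiter.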
I need a regex to match a particular substring, say 1.4.5, in the string below. My string will look like:
absdfsdfsdfc1.4.5kdecsdfsdff
I have a regex which gives [c1.4.5k] as output, but I want to match only 1.4.5. I have tried this pattern:
[^\\W](\\d\\.\\d\\.\\d)[^\\d]
But no luck. I am using Java.
Please let me know the pattern.
If I read your expression [^\\W](\\d\\.\\d\\.\\d)[^\\d] correctly, you want a word character before the number and no digit after it. Is that correct?
For that you can use lookbehind and lookahead assertions. These assertions only check their condition without consuming any characters, so the surrounding text is not included in the result.
(?<=\\w)(\\d\\.\\d\\.\\d)(?!\\d)
Because the assertions keep the surrounding characters out of the match, you can remove the capturing group. The pattern also repeats itself, which you can simplify:
(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)
That would be my pattern. (The ?: makes the group non-capturing.)
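A short runnable sketch of the lookaround pattern applied to the string from the question follows. The class name LookaroundDemo and the helper extract are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaroundDemo {
    // (?<=\w) requires a word character before the match; (?!\d) forbids a digit after it.
    private static final Pattern VERSION =
            Pattern.compile("(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)");

    // Hypothetical helper: returns the first d.d.d sequence, or null if none is found.
    static String extract(String input) {
        Matcher m = VERSION.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("absdfsdfsdfc1.4.5kdecsdfsdff")); // 1.4.5
    }
}
```

Because of the lookbehind, a bare "1.4.5" at the very start of the input is not matched; drop (?<=\\w) if that case should be accepted.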
Your requirements are vague. Do you need to match a series of exactly 3 numbers with exactly two dots?
[0-9]+\.[0-9]+\.[0-9]+
Which could be written as
([0-9]+\.){2}[0-9]+
Or do you need to match any count x of numbers, separated by x-1 dots?
([0-9]+\.)+[0-9]+
Use look ahead and look behind.
(?<=c)[\d\.]+(?=k)
Here c is the character immediately before the 1.4.5 and k is the character immediately after it. You can replace c and k with any regular expression that suits your purposes.
I think this one should do it : ([0-9]+\\.?)+
Regular Expression
((?<!\d)\d(?:\.\d(?!\d))+)
As a Java string:
"((?<!\\d)\\d(?:\\.\\d(?!\\d))+)"
String str= "absdfsdfsdfc**1.4.5**kdec456456.567sdfsdff22.33.55ffkidhfuh122.33.44";
String regex ="[0-9]{1}\\.[0-9]{1}\\.[0-9]{1}";
Matcher matcher = Pattern.compile( regex ).matcher( str);
if (matcher.find()) {
    // group(0) is the entire match
    String version = matcher.group(0);
    System.out.println(version);
} else {
    System.out.println("no match found");
}
Consider the following code snippet:
String input = "Print this";
System.out.println(input.matches("\\bthis\\b"));
Output
false
What could possibly be wrong with this approach? If it is wrong, what is the right way to find an exact word match?
PS: I have found a variety of similar questions here but none of them provide the solution I am looking for.
Thanks in advance.
When you use the matches() method, it is trying to match the entire input. In your example, the input "Print this" doesn't match the pattern because the word "Print" isn't matched.
So you need to add something to the regex to match the initial part of the string, e.g.
.*\\bthis\\b
And if you want to allow extra text at the end of the line too:
.*\\bthis\\b.*
Alternatively, use a Matcher object and use Matcher.find() to find matches within the input string:
Pattern p = Pattern.compile("\\bthis\\b");
Matcher m = p.matcher("Print this");
if (m.find()) {
    System.out.println(m.group());
}
Output:
this
If you want to find multiple matches in a line, you can call find() and group() repeatedly to extract them all.
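The repeated find()/group() calls can be sketched as a loop that collects every match. The class name FindAllDemo, the helper findAll, and the sample sentence are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllDemo {
    // Hypothetical helper: collects every occurrence of regex in input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> hits = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {       // each call resumes after the previous match
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        // "thistle" is skipped because no word boundary follows "this" there.
        System.out.println(findAll("\\bthis\\b", "Print this, then this again; not thistle")); // [this, this]
    }
}
```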
Full example method using String.matches():
public static final String REGEX_FIND_WORD = "(?i).*?\\b%s\\b.*?";
public static boolean containsWord(String text, String word) {
    // Pattern.quote() makes the word a literal, so regex metacharacters in it are harmless
    String regex = String.format(REGEX_FIND_WORD, Pattern.quote(word));
    return text.matches(regex);
}
Explanation:
(?i) - ignorecase
.*? - allow (optionally) any characters before
\b - word boundary
%s - variable to be changed by String.format (quoted to avoid regex
errors)
\b - word boundary
.*? - allow (optionally) any characters after
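To try the method above, it can be dropped into a small class and called with a few inputs. The class name ContainsWordDemo and the sample strings are assumptions made for this sketch.

```java
import java.util.regex.Pattern;

class ContainsWordDemo {
    static final String REGEX_FIND_WORD = "(?i).*?\\b%s\\b.*?";

    static boolean containsWord(String text, String word) {
        // Pattern.quote() turns the word into a literal before it is spliced into the regex.
        String regex = String.format(REGEX_FIND_WORD, Pattern.quote(word));
        return text.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(containsWord("Print this", "this"));     // true
        System.out.println(containsWord("Print THIS now", "this")); // true (case-insensitive)
        System.out.println(containsWord("Print thistle", "this"));  // false (no word boundary)
    }
}
```

One caveat: by default the dot does not match line terminators, so this pattern returns false for text that spans several lines unless a flag such as (?s) is added.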
For a good explanation, see: http://www.regular-expressions.info/java.html
myString.matches("regex") returns true or false depending whether the
string can be matched entirely by the regular expression. It is
important to remember that String.matches() only returns true if the
entire string can be matched. In other words: "regex" is applied as if
you had written "^regex$" with start and end of string anchors. This
is different from most other regex libraries, where the "quick match
test" method returns true if the regex can be matched anywhere in the
string. If myString is abc then myString.matches("bc") returns false.
bc matches abc, but ^bc$ (which is really being used here) does not.
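The difference described in the quote can be checked directly. This is a minimal sketch; the class name MatchesVsFindDemo and its two helpers are illustrative names.

```java
import java.util.regex.Pattern;

class MatchesVsFindDemo {
    // Whole-input match, as String.matches() does.
    static boolean wholeMatch(String regex, String input) {
        return input.matches(regex);
    }

    // Match anywhere in the input, as Matcher.find() does.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("bc", "abc"));   // false
        System.out.println(partialMatch("bc", "abc")); // true
        System.out.println(wholeMatch(".*bc", "abc")); // true
    }
}
```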
This prints true:
String input = "Print this";
System.out.println(input.matches(".*\\bthis\\b"));
You may use groups to find the exact word. The regex API defines groups with parentheses. For example:
A(B(C))D
This expression yields three groups, which are indexed from 0:
0th group - ABCD
1st group - BC
2nd group - C
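The group numbering can be verified with a short sketch that lists every group of a match. The class name GroupIndexDemo and the helper groups are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupIndexDemo {
    // Hypothetical helper: returns group 0..n of a whole-input match, or an empty array.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        // groupCount() does not count group 0, hence the +1.
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("A(B(C))D", "ABCD"))); // [ABCD, BC, C]
    }
}
```

Groups are numbered by the position of their opening parenthesis, counted from left to right.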
So if you need to find a specific word, you can use two methods of the Matcher class: find() to locate the text described by the regex, and group(int) to get the String captured by a given group number:
String statement = "Hello, my beautiful world";
Pattern pattern = Pattern.compile("Hello, my (\\w+).*");
Matcher m = pattern.matcher(statement);
if (m.find()) {
    System.out.println(m.group(1));
}
The above code prints "beautiful".
Is your searchString going to be a regular expression? If not, simply use String.contains(CharSequence s).
System.out.println(input.matches(".*\\bthis$"));
This also works. Here the .* matches everything before the final word, and then this is matched as a whole word at the end of the string.
I need to iterate over a List and remove all strings that contain certain special characters. Using a regex I'm able to remove all strings that start with these special characters, but how can I find out whether a special character is in the middle of the string?
For instance:
Pattern.matches("[()<>/;\\*%$].*", "(123)")
returns true, so I can remove this string,
but it doesn't work with this kind of string: 12(3).
Is it correct to use \* to find the occurrence of "*" char into the string?
Thanks for the help!
Andrea
You are yet another victim of Java's ill-named .matches(), which tries to match the whole input and contradicts the very definition of regex matching.
What you want is matching one character among ()<>/;\\*%$. With Java, you need to create a Pattern, a Matcher from this Pattern and use .find() on this matcher:
final Pattern p = Pattern.compile("[()<>/;\\*%$]");
final Matcher m = p.matcher(yourinput);
if (m.find()) {
    // match, proceed
}
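Applied to the original task of cleaning a List, the find() check could be used as a filter. This is only a sketch: the class name SpecialCharFilterDemo, the helper withoutSpecial, and the sample list are assumptions, and removeIf with a lambda requires Java 8 or later.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SpecialCharFilterDemo {
    // Matches any single one of the special characters ( ) < > / ; * % $
    private static final Pattern SPECIAL = Pattern.compile("[()<>/;\\*%$]");

    // Hypothetical helper: returns a copy of items without the strings containing a special character.
    static List<String> withoutSpecial(List<String> items) {
        List<String> kept = new ArrayList<>(items);
        kept.removeIf(s -> SPECIAL.matcher(s).find()); // find() looks anywhere in the string
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("(123)", "12(3)", "123", "a*b", "abc");
        System.out.println(withoutSpecial(input)); // [123, abc]
    }
}
```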
Try the following:
!Pattern.matches("^[^()<>/;\\*%$]*$", "(123)")
This uses a negated character class to ensure that all the characters in the string are not any of the characters in the class.
You then obviously negate the expression since you are testing for a string that does not match.
Is it correct to use \* to find the occurrence of "*" char into the string?
Yes.
Pattern.matches() tries to match the whole input. So since your regex says that the input has to start with a "special" char, 12(3) doesn't match.
I need to match Twitter hashtags within an Android app, but my code doesn't seem to do what it's supposed to.
What I came up with is:
ArrayList<String> tags = new ArrayList<String>(0);
Pattern p = Pattern.compile("\b#[a-z]+", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(tweet); // tweet contains the tweet as a String
while (m.find()) {
    tags.add(m.group());
}
The variable tweet contains a regular tweet including hashtags - but find() doesn't trigger. So I guess my regular expression is wrong.
Your regex fails because of the \b word boundary anchor. This anchor only matches at a position where a word character (a letter, digit, or underscore) sits on one side and a non-word character, or the start or end of the string, on the other. Since # is not a word character, putting \b directly in front of it causes the regex to fail unless there is a word character immediately before the #. Your regex would match a hashtag in foobarfoo#hashtag blahblahblah but not in foobarfoo #hashtag blahblahblah.
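The boundary behavior described above can be checked with a small sketch. The class name WordBoundaryDemo and the helper found are illustrative, and the pattern is written with a doubled backslash so that Java passes \b to the regex engine.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: true if regex matches anywhere in input, ignoring case.
    static boolean found(String regex, String input) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(input).find();
    }

    public static void main(String[] args) {
        // A word character directly before '#': the boundary exists.
        System.out.println(found("\\b#[a-z]+", "foobarfoo#hashtag blahblahblah"));  // true
        // A space before '#': two non-word characters, so no boundary.
        System.out.println(found("\\b#[a-z]+", "foobarfoo #hashtag blahblahblah")); // false
        // Without the anchor the hashtag is found.
        System.out.println(found("#\\w+", "foobarfoo #hashtag blahblahblah"));      // true
    }
}
```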
Use #\w+ instead, and remember, inside a string, you need to double the backslashes:
Pattern p = Pattern.compile("#\\w+");
Your pattern should be "#(\\w+)" if you are trying to just match the hash tag. Using this and the tweet "retweet pizza to #pizzahut", doing m.group() would give "#pizzahut" and m.group(1) would give "pizzahut".
Edit: Note that the HTML display is mangling the backslash escapes; you need two backslashes before the w in your Java string literal.
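Putting the pieces together, a sketch of collecting every hashtag from a tweet with group(1) might look like this. The class name HashtagDemo, the helper hashtags, and the sample tweet are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagDemo {
    // Group 1 holds the tag text without the leading '#'.
    private static final Pattern TAG = Pattern.compile("#(\\w+)");

    // Hypothetical helper: returns all hashtags in the tweet, without the '#'.
    static List<String> hashtags(String tweet) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(tweet);
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(hashtags("retweet pizza to #pizzahut and #Food_2go")); // [pizzahut, Food_2go]
    }
}
```

Use m.group() instead of m.group(1) to keep the leading '#' in each result.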