java easy Regular expression - java

I have strings like "xxxxx?434334", "xxx?411112", "xxxxxxxxx?11113" and so on.
How can I use substring to retrieve "xxxxx" (everything that comes before the '?' character)?

return s.substring(0, s.indexOf('?'));
No need for a regex for that.
If you have a problem, use a regex. Now you have two problems.
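One caveat with the one-liner above: indexOf returns -1 when the string contains no '?', and substring(0, -1) then throws a StringIndexOutOfBoundsException. A minimal sketch of a guarded version follows; the class and method names are made up for illustration, and returning the input unchanged when there is no '?' is an assumption about the desired behaviour.

```java
final class PrefixBeforeQuestionMark {
    // Returns the part of s before the first '?'.
    // Assumption: when s contains no '?', the whole string is returned unchanged.
    static String prefix(String s) {
        int idx = s.indexOf('?');
        return idx < 0 ? s : s.substring(0, idx);
    }

    public static void main(String[] args) {
        System.out.println(prefix("xxxxx?434334"));     // xxxxx
        System.out.println(prefix("no-question-mark")); // no-question-mark
    }
}
```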

str = str.replaceAll("[?].*", "");
In other words, "remove everything after, and including, the question mark character". The ? has to be enclosed in square brackets because otherwise it has a special meaning.

I agree with the other answers that you should avoid regex where possible, but if you do want to use one for this scenario, you could use the following:
Pattern regex = Pattern.compile("([^\\?]*)\\?{1}");
Matcher m = regex.matcher(str);
if (m.find()) {
    result = m.group(1);
}
where str is your input string.
EDIT:
Description of the regex: match any group of characters that are not a "?" and that are followed by a single "?".

The pattern .*(?=\?) should work as well (written as ".*(?=\\?)" in a Java string literal). ?= is a positive lookahead, which means the pattern matches everything that comes before a question mark, but not the question mark itself.
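As a rough illustration of the lookahead approach, here is a small sketch. The class and method names are hypothetical, and returning an empty Optional when there is no '?' is just one possible choice.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LookaheadPrefix {
    // The lookahead regex from above, as a Java string literal (note the doubled backslash).
    private static final Pattern BEFORE_QUESTION_MARK = Pattern.compile(".*(?=\\?)");

    // Returns the text before the last '?' in s (.* is greedy),
    // or an empty Optional when s contains no '?'.
    static Optional<String> prefix(String s) {
        Matcher m = BEFORE_QUESTION_MARK.matcher(s);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(prefix("xxx?411112").orElse("<no match>"));
        System.out.println(prefix("a?b?c").orElse("<no match>"));
        System.out.println(prefix("abc").orElse("<no match>"));
    }
}
```

Because .* is greedy, an input with several question marks is cut at the last one; the reluctant form .*? would cut at the first one instead.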

Related

Java regex for matching #<string>vs<string>

I have a string "Waiting for match #indvspak and #indvsaus" and want to match the strings "#indvspak" and "#indvsaus" separately.
I am using the following regex: (^|)#.*vs.+?\s\b. But it matches the entire string starting from the first hash sign. How can I achieve this?
I think you want to match strings that start with #, contain vs, and are not preceded by a non-space character.
"(?<!\\S)#\\S*vs\\S+"
(?<!\\S) is a negative lookbehind that asserts the match is not preceded by a non-space character.
Code:
String s = "Waiting for match #indvspak and #indvsaus";
Matcher m = Pattern.compile("(?<!\\S)#\\S*vs\\S+").matcher(s);
while (m.find()) {
    System.out.println(m.group());
}
Output:
#indvspak
#indvsaus
You need this regex:
#[^\\s]+
It matches # and everything after it up to the next whitespace character.
Edit:
As @AvinashRaj suggested, if you want to ensure "vs" appears in the hashtag, use the pattern with the negative lookbehind from that answer.
I highly recommend that you go through the String API; there are many methods that can help you with your problem.
EDITED
(copied from other answer comments)
Use this:
"(?<!\\B)#\\w+vs\\o/\S#vas\\S-[]"
Easy...
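To make the String API suggestion above concrete, here is a minimal sketch that filters whitespace-separated tokens with startsWith and indexOf. The class and method names are invented for the example, and split("\\s+") still takes a small regex for the whitespace itself. The check mirrors the lookbehind pattern from the earlier answer: the token must start with #, contain vs, and have at least one character after vs.

```java
import java.util.ArrayList;
import java.util.List;

final class HashtagTokens {
    // Collects tokens of the form #<something>vs<something>.
    static List<String> matchTags(String text) {
        List<String> tags = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            int vs = token.indexOf("vs", 1); // search after the leading '#'
            // Require a leading '#', a "vs", and at least one character after it.
            if (token.startsWith("#") && vs >= 1 && vs + 2 < token.length()) {
                tags.add(token);
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(matchTags("Waiting for match #indvspak and #indvsaus"));
        // [#indvspak, #indvsaus]
    }
}
```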

Java Match string with optional hyphen

I am trying to match a series of strings that look like this:
item1 = "some value"
item2 = "some value"
I have some strings, though, that look like this:
item-one = "some new value"
item-two = "some new value"
I am trying to parse it using regular expressions, but I can't get it to match the optional hyphen.
Here is my regex string:
Pattern p = Pattern.compile("^(\\w+[-]?)\\w+?\\s+=\\s+\"(.*)\"");
Matcher m = p.matcher(line);
m.find();
String option = m.group(1);
String value = m.group(2);
Could someone please tell me what I am doing wrong?
Thank you
I suspect the main reason for your problem is that you expect \\w+? to make \\w+ optional, whereas in reality the ? makes the + quantifier reluctant. The regex still requires at least one \\w there, which it takes from the end of what ^(\\w+ matched.
Maybe try this way
Pattern.compile("^(\\w+(?:-\\w+)?)\\s+=\\s+\"(.*?)\"");
In (\\w+(?:-\\w+)?), the (?:-\\w+) part is a non-capturing group (the regex does not count it as a group, so (.*?) is still group(2) even when this part is present), and the ? after it makes this part optional.
In \"(.*?)\", *? is a reluctant quantifier, which makes the regex look for the shortest match between the quotation marks.
Demo
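A small sketch of the pattern above applied to both line shapes from the question. The class and method names are hypothetical, and returning null for a non-matching line is an assumption made for brevity. Checking the result of find() also matters: calling group() after a failed match throws an IllegalStateException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OptionLineParser {
    // Name with an optional "-suffix", then = and a double-quoted value.
    private static final Pattern OPTION_LINE =
            Pattern.compile("^(\\w+(?:-\\w+)?)\\s+=\\s+\"(.*?)\"");

    // Returns {name, value}, or null when the line does not match (illustrative choice).
    static String[] parse(String line) {
        Matcher m = OPTION_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "item1 = \"some value\"",
            "item-one = \"some new value\"",
            "not an option line"
        };
        for (String line : lines) {
            String[] parsed = parse(line);
            System.out.println(parsed == null
                    ? "no match: " + line
                    : parsed[0] + " -> " + parsed[1]);
        }
    }
}
```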
Your problem is that you have the ? in the wrong place:
Try this regex:
^((\\w+-)?\\w+)\\s*=\\s*\"([^\"]+)\"
But use groups 1 and 3.
I've cleaned up the regex a bit too
This regex should work for you:
^\w[\w-]*(?<=\w)\s*=\s*\"([^"]*)\"
In Java:
Pattern p = Pattern.compile("^\\w[\\w-]*(?<=\\w)\\s*=\\s*\"([^\"]*)\"");
Live Demo: http://www.rubular.com/r/0CvByDnj5H
You want something like this:
([\w\-]+)\s*=\s*"([^"]*)"
With extra backslashes for Java:
([\\w\\-]+)\\s*=\\s*\"([^\"]*)\"
If you expect other symbols to start appearing in the variable name, you could make it a character class like [^=\s] to accept any characters not = or whitespace, for example.

Need regex to match the given string

I need a regex to match a particular string, say 1.4.5, in the string below. My string will look like this:
absdfsdfsdfc1.4.5kdecsdfsdff
I have a regex which is giving [c1.4.5k] as an output. But I want to match only 1.4.5. I have tried this pattern:
[^\\W](\\d\\.\\d\\.\\d)[^\\d]
But no luck. I am using Java.
Please let me know the pattern.
If I read your expression [^\\W](\\d\\.\\d\\.\\d)[^\\d] correctly, you want a word character before the match and no digit after it. Is that correct?
For that you can use lookbehind and lookahead assertions. These assertions only check their condition; they do not consume characters, so that text is not included in the result.
(?<=\\w)(\\d\\.\\d\\.\\d)(?!\\d)
Because of that, you can remove the capturing group. You are also repeating yourself in the pattern; you can simplify that, too:
(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)
That would be my pattern. (The ?: makes the group non-capturing.)
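For illustration, a short sketch of the lookaround pattern above run against the string from the question. The class and method names are invented here, and returning null when nothing matches is an arbitrary choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VersionExtractor {
    // A word character must come before the match, and no digit may follow it;
    // neither is part of the matched text.
    private static final Pattern VERSION =
            Pattern.compile("(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)");

    // Returns the first d.d.d match, or null when there is none (illustrative choice).
    static String extract(String s) {
        Matcher m = VERSION.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("absdfsdfsdfc1.4.5kdecsdfsdff")); // 1.4.5
    }
}
```

Note that the lookbehind requires a preceding word character, so an input that starts directly with 1.4.5 does not match with this pattern.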
Your requirements are vague. Do you need to match a series of exactly 3 numbers with exactly two dots?
[0-9]+\.[0-9]+\.[0-9]+
Which could be written as
([0-9]+\.){2}[0-9]+
Do you need to match x numbers separated by x-1 dots?
([0-9]+\.)+[0-9]+
Use look ahead and look behind.
(?<=c)[\d\.]+(?=k)
Where c is the character that would be immediately before the 1.4.5 and k is the character immediately after 1.4.5. You can replace c and k with any regular expression that would suit your purposes
I think this one should do it : ([0-9]+\\.?)+
Regular Expression
((?<!\d)\d(?:\.\d(?!\d))+)
As a Java string:
"((?<!\\d)\\d(?:\\.\\d(?!\\d))+)"
String str = "absdfsdfsdfc1.4.5kdec456456.567sdfsdff22.33.55ffkidhfuh122.33.44";
// Three single digits separated by literal dots.
String regex = "[0-9]\\.[0-9]\\.[0-9]";
Matcher matcher = Pattern.compile(regex).matcher(str);
if (matcher.find()) {
    String version = matcher.group(0);
    System.out.println(version);
} else {
    System.out.println("no match found");
}

Java Regex Escape

I've got this bit of code to grab a url within a textarea. It has been working great until I tried a url with a '+' in it.
Pattern pattern = Pattern.compile("(.*)(https?[://.0-9-?a-z=_#!A-Z]*)(.*)");
Matcher matcher = pattern.matcher(text);
So I tried putting \\+ and \\\\+ in my code, but it did not work. So I did some googling, and Stack Overflow questions kept mentioning this:
Pattern.quote("+");
However, I am not sure how I implement that statement into what I currently have now. If that is even the way I want to go. But I'm assuming I need to do something like this...
String quote = Pattern.quote("+");
Pattern pattern = Pattern.compile("(.*)(https?[://.0-9-?a-z=_#!A-Z]*)(.*)");
Matcher matcher = pattern.matcher(text);
And then add the variable quote somewhere in the pattern? Please help! I just learned this stuff today and I'm brand new to it. Thank you!
Just escape the quote with \, for example:
Pattern pattern = Pattern.compile("(.*)(https?[://.0-9-?a-z=_#!A-Z\"]*)(.*)");
(https?[://.0-9-?a-z=_#!A-Z]*)
Bear in mind that [ and ] denote a character class, which means that any character listed inside it can match. [aegl]+ will match "age", "a", "e", "g", "eagle", and "gaggle". It also means that a character listed twice (like /) is redundant.
Pattern.quote is useful, but it does not insert backslashes: it wraps the string in \Q and \E so that everything in between is treated literally. Pattern.quote("+") returns \Q+\E.
Because + has no special meaning between square brackets, you can put an unescaped + inside them. You can also escape it as \\+ if it makes you feel better.
Pattern pattern = Pattern.compile("(.*)(https?[:/.0-9-?a-z=_#!A-Z+]*)(.*)");
Pattern pattern = Pattern.compile("(.*)(https?[:/.0-9-?a-z=_#!A-Z\\+]*)(.*)");
See it here: http://fiddle.re/0780
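The points above about + can be checked with a few lines. This is only a sketch: the class and method names are made up, and the tiny patterns stand in for the much longer URL pattern from the question.

```java
import java.util.regex.Pattern;

final class PlusInPatterns {
    // What Pattern.quote produces for a lone plus sign.
    static String quotedPlus() {
        return Pattern.quote("+");
    }

    // '+' inside a character class is an ordinary character.
    static boolean matchesWithCharClass(String s) {
        return s.matches("a[+]b");
    }

    // A quoted '+' spliced into a larger pattern is also literal.
    static boolean matchesWithQuote(String s) {
        return s.matches("a" + Pattern.quote("+") + "b");
    }

    // Outside a character class, an unescaped '+' is a quantifier.
    static boolean matchesUnescaped(String s) {
        return s.matches("a+b");
    }

    public static void main(String[] args) {
        System.out.println(quotedPlus());                // \Q+\E
        System.out.println(matchesWithCharClass("a+b")); // true
        System.out.println(matchesWithQuote("a+b"));     // true
        System.out.println(matchesUnescaped("a+b"));     // false
        System.out.println(matchesUnescaped("aab"));     // true
    }
}
```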

java Regex - split but ignore text inside quotes?

Using only regular expression methods, the method String.replaceAll, and ArrayList,
how can I split a String into tokens but ignore delimiters that appear inside quotes?
The delimiter is any character that is not alphanumeric or part of quoted text.
for example:
The string :
hello^world'this*has two tokens'
should output:
hello
worldthis*has two tokens
I know there is already a very good accepted answer, but I would like to add another (and, may I say, simpler) regex-based approach: split the text on any non-alphanumeric delimiter that is not inside single quotes, using
Regex:
/(?=(([^']+'){2})*[^']*$)[^a-zA-Z\\d]+/
This basically means: match non-alphanumeric text if it is followed by an even number of single quotes; in other words, match non-alphanumeric text only if it is outside single quotes.
Code:
String string = "hello^world'this*has two tokens'#2ndToken";
System.out.println(Arrays.toString(
string.split("(?=(([^']+'){2})*[^']*$)[^a-zA-Z\\d]+"))
);
Output:
[hello, world'this*has two tokens', 2ndToken]
Demo:
Here is a live working Demo of the above code.
Use a Matcher to identify the parts you want to keep, rather than the parts you want to split on:
String s = "hello^world'this*has two tokens'";
Pattern pattern = Pattern.compile("([a-zA-Z0-9]+|'[^']*')+");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
    System.out.println(matcher.group(0));
}
See it working online: ideone
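The Matcher approach above keeps the single quotes in each token, whereas the expected output in the question has them removed. A possible sketch that collects the tokens into an ArrayList and strips the quotes with String.replaceAll follows; the class and method names are hypothetical, and it assumes quotes are always balanced.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteAwareTokenizer {
    // A token is a run of alphanumerics and/or single-quoted sections.
    private static final Pattern TOKEN = Pattern.compile("([a-zA-Z0-9]+|'[^']*')+");

    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(s);
        while (matcher.find()) {
            // Drop the quote characters but keep the quoted text itself.
            tokens.add(matcher.group().replaceAll("'", ""));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("hello^world'this*has two tokens'"));
        // [hello, worldthis*has two tokens]
    }
}
```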
You cannot in any reasonable way. You are posing a problem that regular expressions aren't good at.
Do not use a regular expression for this. It won't work. Use / write a parser instead.
You should use the right tool for the right task.
