I have been trying to solve this problem. I have a string which has a pattern. Eg.
CW1234 has been despatched to CW334545
i.e the String can have patterns starting with CW followed by any number of intergers (at max 16).
I want to replace all these patters with an empty character. So that the string will look like
has been despatched to
I have tried the following but it replaces only the first digit followed by the CW. I'm pretty new to java. Any insights would be of great help.
if(Pattern.matches(".*[C][W][0-9].*", str1)) {
Matcher m = Pattern.compile(".*[C][W][0-9].*").matcher(str1);
while(m.find()) {
str1 = str1.replaceAll("[C][W][0-9]", "");
}
}
System.out.println(str1);
You need to have {n,m} quantifier on your digits, to enforce maximum digits. Also, for replacement purpose, you don't need to check beforehand whether the pattern is there or not. replaceAll will replace only if there is matching pattern, else will leave the string as it is.
So, remove all those Pattern and Matcher part, and change your regex to:
str1 = str1.replaceAll("CW\\d{0,16}", "");
If you want at least 1 digit, then make it {1,16}. No need to put C and W in different character classes. A character class with single character is as good as that character itself (given that it's not a special character). Also, you can use \\d instead of [0-9].
You're needlessly constructing the pattern and matching the string several times.
str1 = str1.replaceAll("CW\\d+", "");
This is sufficient. All other code is redundant.
You can also opt to do the replace by hand if performance is a problem.
Your replaceAll is missing a +:
str1 = str1.replaceAll("[C][W][0-9]+", "");
The + will make the regex match any number of digits directly following CW.
Your regex is wrong. Try with:
String str1 = CW1234;
str1 = str1.replaceAll("\\bCW\\d{0,16}\\b","");
if the "CW12134" is a single token in a string or with
String str1 = CW1234;
str1 = str1.replaceAll("^CW\\d{0,16}$","");
if the "CW1234" is a full string.
String.replaceAll("CW[0-9\\s]*", "") does what you need, and it also removes the space at the end of the number.
On another note, the whole point of Pattern.compile() is that you need to compile the required expression once in the application, and then use the matcher to find occurences. So I think your usage is inappropriate (rather than incorrect).
Pattern pattern = Pattern.compile("CD[0-9\\s]*");occurs only once in the code and then reuse it as
Matcher matcher = pattern.matcher(stringToMatch);
Related
here is my code
String a = "X^5+2X^2+3X^3+4X^4";
String exp[]=a.split("(|\\+\\d)[xX]\\^");
for(int i=0;i<exp.length;i++) {
System.out.println("exp: "+exp[i]+" ");
}
im try to find the output which is 5,2,3,4
but instead i got this answer
exp:
exp:5
exp:2
exp:3
exp:4
i dont know where is the first line space come from, and i cannot find a will to get rid of that, i try to use others regex for this and also use compile,still can get rid of the first line, i try to use new string "X+X^5+2X^2+3X^3+4X^4";the first line shows exp:X.
and i also use online regex compiler to try my problem, but their answer is 5,2,3,4, buy eclipse give a space ,and then 5,2,3,4 ,need a help to figure this out
Try to use regex, e.g:
String input = "X^5+2X^2+3X^3+4X^4";
Pattern pattern = Pattern.compile("\\^([0-9]+)");
Matcher matcher = pattern.matcher(input);
for (int i = 1; matcher.find(); i++) {
System.out.println("exp: " + matcher.group(1));
}
It gives output:
exp: 5
exp: 2
exp: 3
exp: 4
How does it work:
Pattern used: \^([0-9]+)
Which matches any strings starting with ^ followed by 1 or more digits (note the + sign). Dash (^) is prefixed with backslash (\) because it has a special meaning in regular expressions - beginning of a string - but in Your case You just want an exact match of a ^ character.
We want to wrap our matches in a groups to refer to them late during matching process. It means we need to mark them using parenthesis ( and ).
Then we want to pu our pattern into Java String. In String literal, \character has a special meaning - it is used as a control character, eg "\n" represents a new line. It means that if we put our pattern into String literal, we need to escape a \ so our pattern becomes: "\\^([0-9]+)". Note double \.
Next we iterate through all matches getting group 1 which is our number match. Note that a ^.character is not covered in our match even if it is a part of our pattern. It is so because wr used parenthesis to mark our searched group, which in our case are only digits
Because you are using the split method which looks for the occurrence of the regex and, well.. splits the string at this position. Your string starts with X^ so it very much matches your regex.
So I am working on regex comparing phone numbers and this is the result:
(?:(?:0{2}|\+)?([1-9][0-9]))? ?([1-9][0-9])? ?([1-9][0-9]{5})
As you can see there are spaces between the numbers. I want them to appear only when there is some other number before the space so:
"0022 45 432345" - should match
"45 345678" or "560032" - should match
" 324400" - shouldn't match because of the space in the beginning
I've been reading different tutorials about regexes and found out about look-behinds, but simple construction like that(just for test):
Pattern p2 = Pattern.compile("(?<=abc)aa");
Matcher m2 = p2.matcher("abcaa");
doesn't work.
Can you tell me what's wrong?
Another problem is - I want a character only happen when it is THE FIRST character in a string, otherwise it shouldn't occur. So the code:
0043 022 234567 should not work, but 022 123450 should match.
I'm stuck right now and would appreciate any help a lot.
This should work just fine. The spaces are moved into the optional groups and are themselves optional. This way, they only match if the group before them is present, but even then they are still optional. No look-behind required.
(?:(?:(?:00|\+)?([1-9][0-9]) ?)?([1-9][0-9]) ?)?([1-9][0-9]{5})
Lookbehind is a zero length match.
The javadoc for the Matcher.matches method determines if the whole String is a match.
What you're looking for is something the Matcher.find and Matcher.group methods. Something like:
final Pattern pattern = Pattern.compile("(?<=abc)aa");
final Matcher matcher = pattern.matcher("abaca");
final String subMatch;
if (matcher.find()) {
subMatch = matcher.group();
} else {
subMatch = "";
}
System.out.println(subMatch);
Example.
I need to remove a part of the string which starts with #.
My sample code works for one string and fails for another.
Failed one: Not able to remove #news4buffalo:
String regex = "\\#\\w+ || #\\w*";
String rawContent = "RT #news4buffalo: Police say a shooter fired into a crowd yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(rawContent);
if (matcher.find()) {
rawContent = rawContent.replaceAll(regex, "");
}
Success one:
String regex = "\\#\\w+ || #\\w*";
String rawContent = "#ZaslowShow couldn't agree more. Good crowd last night. #LetsGoFish";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(rawContent);
if (matcher.find()) {
rawContent = rawContent.replaceAll(regex, "");
}
Output:
couldn't agree more. Good crowd last night. #LetsGoFish
From your question it looks like this regex can work for you:
rawContent = rawContent.replaceAll("#\\S*", "");
You can try in this way as well.
String s = "#ZaslowShow couldn't agree more. Good crowd last night. #LetsGoFish";
System.out.println(s.replaceAll("#[^\\s]*\\s+", ""));
// Look till space is not found----^^^^ ^^^^---------remove extra spaces as well
The regex is only considering word characters whereas your input String contains a colon :. You can solve this by replacing \\w with \\S (any non-whitespace character) in your regex. Also there is no need for two patterns.
String regex = "#\\S*";
You don't need to escape # so don't add \ before it like "\\#" (it confuses people).
Don't use matcher to check if string contains part which should be replaced and than use replaceAll because you will have to iterate second time. Just use replaceAll at start, and if it doesn't have anything to replace, it will leave string unchanged. BTW. use replaceAll from Matcher instance to avoid recompiling Pattern.
Regex in form foo||bar doesn't seem right. Regex uses only one pipe | to represent OR so such regex represents foo OR emptyString OR bar. Since empty String is kind of special (every string contains empty string at start, and at end, and even in between characters) it can cause some problems like "foo".replaceAll("|foo", "x") returns xfxoxox, instead of for instance "xxx" because consumption of empty string before f prevented it from being used as potential first character of foo :/
Anyway it seems that you would like to accept any #xxxx words so consider maybe something like "#\\w+" if you want to make sure that there will be at least one character after #.
You can also add condition that # must be first character of word (in case you wouldn't want to remove part after # from e-mail addresses). To do this just use look-behind like (?<=\\s|^)# which will check that before # exist some whitespace, or it is placed at start of the string.
You can also remove space after word you wanted to remove (it there is any).
So you can try with
String regex = "(?<=\\s|^)#\\w*\\s?";
which for data like
RT #news4buffalo: Police say a shooter fired into a crowd yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…
will return
RT : Police say a shooter fired into a crowd yesterday on the Oakmont overpass, striking and killing a 14-year-old. More: http…
But if you would also like to remove other characters beside alphabetic or numeric ones from \\w like : you can simply use \\S which represents non-whitespace-characters, so your regex can look like
String regex = "(?<=\\s|^)#\\S*\\s?";
I need a regex to match a particular string, say 1.4.5 in the below string . My string will be like
absdfsdfsdfc1.4.5kdecsdfsdff
I have a regex which is giving [c1.4.5k] as an output. But I want to match only 1.4.5. I have tried this pattern:
[^\\W](\\d\\.\\d\\.\\d)[^\\d]
But no luck. I am using Java.
Please let me know the pattern.
When I read your expression [^\\W](\\d\\.\\d\\.\\d)[^\\d] correctly, then you want a word character before and not a digit ahead. Is that correct?
For that you can use lookbehind and lookahead assertions. Those assertions do only check their condition, but they do not match, therefore that stuff is not included in the result.
(?<=\\w)(\\d\\.\\d\\.\\d)(?!\\d)
Because of that, you can remove the capturing group. You are also repeating yourself in the pattern, you can simplify that, too:
(?<=\\w)\\d(?:\\.\\d){2}(?!\\d)
Would be my pattern for that. (The ?: is a non capturing group)
Your requirements are vague. Do you need to match a series of exactly 3 numbers with exactly two dots?
[0-9]+\.[0-9]+\.[0-9]+
Which could be written as
([0-9]+\.){2}[0-9]+
Do you need to match x many cases of a number, seperated by x-1 dots in between?
([0-9]+\.)+[0-9]+
Use look ahead and look behind.
(?<=c)[\d\.]+(?=k)
Where c is the character that would be immediately before the 1.4.5 and k is the character immediately after 1.4.5. You can replace c and k with any regular expression that would suit your purposes
I think this one should do it : ([0-9]+\\.?)+
Regular Expression
((?<!\d)\d(?:\.\d(?!\d))+)
As a Java string:
"((?<!\\d)\\d(?:\\.\\d(?!\\d))+)"
String str= "absdfsdfsdfc**1.4.5**kdec456456.567sdfsdff22.33.55ffkidhfuh122.33.44";
String regex ="[0-9]{1}\\.[0-9]{1}\\.[0-9]{1}";
Matcher matcher = Pattern.compile( regex ).matcher( str);
if (matcher.find())
{
String year = matcher.group(0);
System.out.println(year);
}
else
{
System.out.println("no match found");
}
I have strings like "xxxxx?434334", "xxx?411112", "xxxxxxxxx?11113" and so on.
How to substring properly to retrieve "xxxxx" (everything that comes untill '?' character)?
return s.substring(0, s.indexOf('?'));
No need for a regex for that.
If you have a problem, use a regex. Now you have two problems.
str = str.replaceAll("[?].*", "");
In other words, "remove everything after, and including, the question mark character". The ? has to be enclosed in square brackets because otherwise it has a special meaning.
I would agree with others answers that you should avoid using regex wherever possible, but if you did want to use it for this scenario you could use the following
Pattern regex = Pattern.compile("([^\\?]*)\\?{1}");
Matcher m = regex.matcher(str);
if (m.find()) {
result = m.group(1);
}
where str is your input string.
EDIT:
Description of regex match any group of characters that are not a "?" and have a single "?" after the group
The Pattern ".*(?=\?)" should work as well. ?= is a positive lookahead, which means the mattern matches everything that comes before a quotation mark, but not the quotation mark itself.