I'm trying to take a phone number, which can start with either +44 or +4 followed by any number of digits or hyphens, and replace the +44 or +4 with +44 or +4 followed by a space.
I believe I need a lookaround to match the full number but only replace the initial prefix. What I'm trying at the moment is:
^[+]\d[0-9](?:([0-9]+))?
This matches the number (without hyphens). However, I thought the lookahead would match the number without capturing the extra digits, but it seems to capture the whole thing.
Can anyone point me in the right direction as to what I've done wrong?
EDIT:
To be clearer, my Java code is:
Pattern pattern = Pattern.compile("^[+]\\d[0-9](?:([0-9]+))?");
if (pattern.matcher("+441234567890").matches()) {
    String num = pattern.matcher(title).replaceFirst("$0 $1");
}
Thanks.
If you want to match the whole number but replace only part of it, you should not use a positive lookahead, just grouping, as in:
(^\+\d\d)([\d-]+)?
The prefix will be in group 1 and the rest of the number in group 2, so to add a space between these parts, use something like group1 + space + group2.
In your code it would look like this:
Pattern pattern = Pattern.compile("(^\\+\\d\\d)([\\d-]+)?");
if(pattern.matcher("+441234567890").matches()) {
num = pattern.matcher(title).replaceFirst("$1 $2");
}
However, this regex always captures two digits in the prefix. If you want to match +44 or +4, you should use:
(^\+(44|4))([\d-]+)?
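Note that (44|4) is itself a capturing group, so the rest of the number is now in group 3 (use "$1 $3"), unless you write it as (?:44|4). If you have more possible prefixes, you will need to add them to this regex as well.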
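Your regex didn't work as you expected because (?:([0-9]+))? is not a lookahead but an optional non-capturing group. The text it matches is still consumed and becomes part of the overall match, so $0 (the entire match) is the whole number. The ([0-9]+) inside it is a capturing group, so $1 holds the digits after the prefix, and "$0 $1" therefore produces the whole number followed by a space and those digits again.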
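Both behaviours can be checked side by side. The following is a minimal sketch, not the answerer's code: the class and method names are made up for illustration, it assumes the input is the whole phone number (digits and hyphens only), and it writes the alternation as (?:44|4) so that the rest of the number stays in group 2.

```java
import java.util.regex.Pattern;

class PhonePrefixDemo {
    // The question's pattern: (?:...) is a non-capturing group, not a lookahead,
    // so the digits it matches are consumed into $0; the inner (...) is group 1.
    static final Pattern ORIGINAL = Pattern.compile("^[+]\\d[0-9](?:([0-9]+))?");

    // Prefix in group 1 (44 is tried before 4), rest of the number in group 2.
    static final Pattern PREFIX = Pattern.compile("^(\\+(?:44|4))([\\d-]+)$");

    static String originalReplacement(String number) {
        return ORIGINAL.matcher(number).replaceFirst("$0 $1");
    }

    static String addSpaceAfterPrefix(String number) {
        // replaceFirst returns the input unchanged when the pattern does not match
        return PREFIX.matcher(number).replaceFirst("$1 $2");
    }

    public static void main(String[] args) {
        System.out.println(originalReplacement("+441234567890")); // +441234567890 1234567890
        System.out.println(addSpaceAfterPrefix("+441234567890")); // +44 1234567890
        System.out.println(addSpaceAfterPrefix("+4123-456-789")); // +4 123-456-789
    }
}
```

One caveat of any such pattern: because 44 is tried before 4, a +4 number whose remaining digits begin with 4 will be read as +44; the regex alone cannot tell those two cases apart.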
I have the following string:
'pp_3', 365]
The part after pp_ can vary in length. What I'd like to capture (and only that) is what comes after the , and before the ]. Its length also varies, but it is always a number.
I've come up with (?<=pp_).*,(.*)(?=]). It gives 3', 365 as the full match, and group 1 holds what I want, 365. How do I get only 365 as the full match?
Please let me know if I haven't explained this clearly. Thanks
Try this:
[^_]*_(\d*)'\s*,\s*(\-?\d+)\s*]
This regex captures two groups, one for each number: the first after pp_ and the second after ', (which may be negative). If you don't want to capture the first one as a group, write (?:\d*) instead of (\d*).
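A minimal sketch of using this pattern from Java, assuming the value you want is group 2 and that Matcher.find() (rather than matches()) is used so the pattern may sit anywhere in the input. The class and method names are hypothetical, and the \- escape is written as a plain - since it is outside a character class.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PpTwoGroups {
    // Group 1: the digits after pp_; group 2: the (possibly negative) number before ].
    static final Pattern P = Pattern.compile("[^_]*_(\\d*)'\\s*,\\s*(-?\\d+)\\s*]");

    static String secondNumber(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(secondNumber("'pp_3', 365]"));  // 365
        System.out.println(secondNumber("'pp_12', -7]")); // -7
    }
}
```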
Try this expression. The second group should be what you're after:
(?<='pp_)(\d*', )(\d*)]
To match only the digits with a positive lookbehind, you can put a bounded quantifier inside the lookbehind (choose the upper bound yourself); Java supports this kind of finite-length lookbehind:
(?<=pp_[^,]{0,1000}, )\d+(?=])
Explanation
(?<= Positive lookbehind, assert what is on the left is
pp_[^,]{0,1000} Match pp_, match any char except , 0-1000 times
, Match a comma and space
) Close lookbehind
\d+ Match 1+ digits
(?=]) Positive lookahead, assert what is on the right is ]
In Java
String regex = "(?<=pp_[^,]{0,1000}, )\\d+(?=])";
You could also use a capturing group instead:
pp_[^,]*, (\d+)]
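For comparison, here is a small sketch running both patterns with Matcher.find(): with the lookbehind the whole match is just the digits, while with the capturing group the digits have to be read from group 1. The class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PpLookbehindVsGroup {
    // Bounded lookbehind: the whole match is just the digits.
    static final Pattern LOOKBEHIND = Pattern.compile("(?<=pp_[^,]{0,1000}, )\\d+(?=])");
    // Capturing group: the match starts at pp_, the digits are in group 1.
    static final Pattern GROUP = Pattern.compile("pp_[^,]*, (\\d+)]");

    static String viaLookbehind(String s) {
        Matcher m = LOOKBEHIND.matcher(s);
        return m.find() ? m.group() : null;
    }

    static String viaGroup(String s) {
        Matcher m = GROUP.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String input = "'pp_3', 365]";
        System.out.println(viaLookbehind(input)); // 365
        System.out.println(viaGroup(input));      // 365
    }
}
```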
Currently this regular expression:
^(?:\\S+\\s+)*?(\\S+)\\s+(?:No\\.\\s+)?(\\S+)(?:\\s+\\(.*?\\))?$
captures 418—FINAL in group number 2 for an input like:
String text="H.B. 418—FINAL VERSION";
How do I change this regular expression to only capture the number (digits) "418" in group 2?
EDIT:
I'd still like to capture "H.B." in a preceding group.
Just change the boundaries of the second group to include only the digits. To also save the "H.B." part, add parentheses around that part too:
^(?:(\\S+)\\s+)*?(\\d+)\\S+\\s+(?:No\\.\\s+)?(\\S+)(?:\\s+\\(.*?\\))?$
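A quick way to check the group numbering is a small sketch like the one below (the class and method names are hypothetical; the em dash in the input is written as \u2014 to avoid source-encoding issues). Note that group 1 sits inside a repeated group, so it holds only the last token matched before the digits, which happens to be H.B. for this input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BillNumberGroups {
    static final Pattern P = Pattern.compile(
            "^(?:(\\S+)\\s+)*?(\\d+)\\S+\\s+(?:No\\.\\s+)?(\\S+)(?:\\s+\\(.*?\\))?$");

    // Returns groups 1-3, or null if the whole input does not match.
    static String[] parse(String s) {
        Matcher m = P.matcher(s);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        String[] g = parse("H.B. 418\u2014FINAL VERSION");
        System.out.println(g[0] + " | " + g[1] + " | " + g[2]); // H.B. | 418 | VERSION
    }
}
```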
I'm not entirely sure what your exact requirements are (your regex looks for an optional "No.", but you haven't given any examples of it). But this works on the example you gave:
^(?:\\S+\\s+)*?(\\S+)\\s+(?:No\\.\\s+)?(\\d+).*(?:\\s+\\(.*?\\))?$
This assumes you don't need the text following the digits. That is, just change the second \S to \d. I also added .* after it to match any remaining characters following the digits (without capturing them, but you can capture them if you want to). Because .* is greedy, it also consumes any parenthesized suffix, so the optional (?:\s+\(.*?\))? part after it will only ever match the empty string.
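A minimal sketch of the same idea from Java, using Matcher.matches() so the whole input must match. The second input, with a "No." before the number, is a made-up example to exercise the optional part, and the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BillNumberDigits {
    static final Pattern P = Pattern.compile(
            "^(?:\\S+\\s+)*?(\\S+)\\s+(?:No\\.\\s+)?(\\d+).*(?:\\s+\\(.*?\\))?$");

    // Returns the digits in group 2, or null if the whole input does not match.
    static String number(String s) {
        Matcher m = P.matcher(s);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(number("H.B. 418\u2014FINAL VERSION")); // 418
        System.out.println(number("H.B. No. 418\u2014FINAL"));     // 418 (hypothetical input)
    }
}
```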
I am having a problem splitting something like the following string:
43.80USD
What I want is to be able to split the expression into an array that has "43.80" as the first element and "USD" as the second. So the result would be something like:
["43.80", "USD"]
I am sure there is some way to do this with regex, but I am not proficient enough with it to figure it out on my own. Any help would be much appreciated.
If the format of your string is fixed, you can split it as follows:
String[] currency = "48.50USD".split("(?<=\\d)(?=[a-zA-Z])");
System.out.println("Amount='"+currency[0]+"'; Denomination='"+currency[1]+"'");
// prints: Amount='48.50'; Denomination='USD'
The regex above uses a positive lookbehind (?<=...) and a positive lookahead (?=...) to find a zero-length separator that is preceded by a digit and followed by a letter.
If your data really looks like "43.80USD", then you can use:
"43.80USD".split("(?i)(?=[a-z])",2)
(?=[a-z]) will split before any of a-z characters
(?i) will make used regex case-insensitive so it will also work for uSd
The second argument is the maximum size of the result array; since you want ["43.80", "USD"] rather than ["43.80", "U", "S", "D"], we need to use 2.
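A small sketch of the effect of the limit argument (the class and helper names are made up; the call itself is the standard String.split(String, int), where a limit of 0 behaves like split(String)).

```java
import java.util.Arrays;

class CurrencySplitLimit {
    // Split before every letter (case-insensitively), keeping at most `limit` pieces.
    static String[] split(String s, int limit) {
        return s.split("(?i)(?=[a-z])", limit);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("43.80USD", 0))); // [43.80, U, S, D]
        System.out.println(Arrays.toString(split("43.80USD", 2))); // [43.80, USD]
        System.out.println(Arrays.toString(split("43.80uSd", 2))); // [43.80, uSd]
    }
}
```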
This regex works: (\d*\.\d*)([a-zA-Z]*). Group 1 will be the amount, including the decimal point. Group 2 will be USD or another currency name. Note that this regex only requires a decimal point; everything else is optional. So it also matches "45123.15542ABCDEFG": group 1 will be 45123.15542 and group 2 will be ABCDEFG. If you want stricter requirements, tell me what they are and I'll put them in. Otherwise your code will look something like:
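import java.util.regex.Matcher;
import java.util.regex.Pattern;
Pattern p = Pattern.compile("(\\d*\\.\\d*)([a-zA-Z]*)"); // backslashes are doubled to escape them in the Java string literal
Matcher m = p.matcher("43.80USD");
String amount, type;
if (m.matches()) {
    amount = m.group(1); // "43.80"
    type = m.group(2);   // "USD"
}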
I am trying to write a regex for java that will match the following string:
number,number,number (it could be this simple, or it could have a variable number of numbers, but each number has to have a comma after it; there will not be any whitespace, though)
Here was my attempt:
[[0-9],[0-9]]+
but it seems to match anything with a number in it
You could try something along the lines of ([0-9]+,)*[0-9]+
This will match:
Only one number, e.g.: 7
Two numbers, e.g.: 7,52
Three numbers, e.g.: 7,52,999
etc.
This will not match:
Things with spaces, e.g.: 7, 52
A list ending with a comma, e.g.: 7,52,
Many other things out of the scope of this problem.
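A minimal sketch that runs the examples above through String.matches, which requires the whole string to match, so no ^ or $ is needed. The class and method names are illustrative.

```java
class CommaListCheck {
    // String.matches only succeeds if the whole string matches the pattern.
    static boolean isNumberList(String s) {
        return s.matches("([0-9]+,)*[0-9]+");
    }

    public static void main(String[] args) {
        String[] samples = {"7", "7,52", "7,52,999", "7, 52", "7,52,"};
        // prints true for the first three samples and false for the last two
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + isNumberList(s));
        }
    }
}
```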
I think this would work
\d+,(\d+,)+
Note that, as you asked, this only matches numbers that are each followed by a comma, so it needs at least two numbers and a trailing comma (e.g. 7,52,).
I guess you are starting with a String. Why don't you just use String.split(",")?
^ means the start of the string and $ means the end. If you don't use them, you could match something in the middle (for example, b matches inside "abc").
The + applies to the element before it. b is an element, [0-9] is an element, and so are groups (things wrapped in parentheses).
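So, the regex you want matches:
The start of the string ^
a number (one or more digits) [0-9]+
one or more repetitions of a comma followed by a number (,[0-9]+)+
the end of the string $
or, ^[0-9]+(,[0-9]+)+$ (use * instead of the last + if a single number should also be accepted)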
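To see what the anchors change, here is a minimal sketch using Matcher.find(), which searches anywhere in the input (String.matches and Matcher.matches already require the whole input to match). It uses a multi-digit variant of the list pattern, [0-9]+(,[0-9]+)+, chosen for illustration; the class and method names are made up.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    static final Pattern UNANCHORED = Pattern.compile("[0-9]+(,[0-9]+)+");
    static final Pattern ANCHORED = Pattern.compile("^[0-9]+(,[0-9]+)+$");

    static boolean foundUnanchored(String s) {
        return UNANCHORED.matcher(s).find(); // may match anywhere inside s
    }

    static boolean foundAnchored(String s) {
        return ANCHORED.matcher(s).find(); // anchored to the start and end of s
    }

    public static void main(String[] args) {
        System.out.println(foundUnanchored("abc1,2xyz")); // true
        System.out.println(foundAnchored("abc1,2xyz"));   // false
        System.out.println(foundAnchored("12,345"));      // true
    }
}
```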
Try the regex [\d,]*, written as "[\\d,]*" in a Java string, e.g.:
Pattern p4 = Pattern.compile("[\\d,]*");
Matcher m4 = p4.matcher("12,1212,1212ad,v");
System.out.println(m4.find()); //prints true
System.out.println(m4.group());//prints 12,1212,1212
If you want to require at least one comma (,) and two numbers, e.g. 12,1212, then you may want to use the regex (\d+,)+\d+, written as "(\\d+,)+\\d+" in a Java string. It matches a number of at least one digit followed by a comma, repeated one or more times, followed by another number of at least one digit.
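A short sketch of that pattern with Matcher.find(), which returns the first run of comma-separated numbers it finds; the class and helper names are hypothetical, and the helper returns null when there is no match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommaRunFind {
    // One or more "digits followed by a comma", then a final run of digits.
    static final Pattern P = Pattern.compile("(\\d+,)+\\d+");

    static String firstRun(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstRun("12,1212,1212ad,v")); // 12,1212,1212
        System.out.println(firstRun("12"));               // null: no comma
    }
}
```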
I have this string:
String myString ="A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";
and I need to extract these 3 substrings:
1234
06:30
07:45
If I use the regex \\d{2}\\:\\d{2} I'm only able to extract the first hour, 06:30:
Pattern depArrHours = Pattern.compile("\\d{2}\\:\\d{2}");
Matcher matcher = depArrHours.matcher(myString);
matcher.find(); // finds the first match, 06:30
String firstHour = matcher.group(0);
String secondHour = matcher.group(1); // throws IndexOutOfBoundsException: No group 1
matcher.group(1) throws an exception.
Also, I don't know how to extract 1234. This string can change, but it always comes after 'XX~ '.
Do you have any idea on how to match these strings with regex expressions?
UPDATE
Thanks to Adam's suggestion, I now have this regex that matches my string:
Pattern p = Pattern.compile(".*XX~ (\\d{3,4}).*(\\d{1,2}:\\d{2}).*(\\d{1,2}:\\d{2})");
I get the number and the two hours with matcher.group(1), matcher.group(2) and matcher.group(3).
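One thing worth checking with this pattern: the .* in front of each time group is greedy, so it backs off only as far as needed and \d{1,2} then matches just the last digit of the hour. On the sample string that gives 6:30 and 7:45 rather than 06:30 and 07:45. Making those two quantifiers reluctant (.*?) is one way to fix it. A minimal sketch comparing both (class and method names are made up; find() is used, but matches() gives the same groups here):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TimesExtract {
    static final String SAMPLE =
            "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";

    // Greedy .* backs off only as far as needed, so \d{1,2} ends up on the last hour digit.
    static final Pattern GREEDY =
            Pattern.compile(".*XX~ (\\d{3,4}).*(\\d{1,2}:\\d{2}).*(\\d{1,2}:\\d{2})");

    // Reluctant .*? stops at the first position where the next group can match.
    static final Pattern RELUCTANT =
            Pattern.compile(".*XX~ (\\d{3,4}).*?(\\d{1,2}:\\d{2}).*?(\\d{1,2}:\\d{2})");

    static String[] groups(Pattern p, String s) {
        Matcher m = p.matcher(s);
        if (!m.find()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3)};
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", groups(GREEDY, SAMPLE)));    // 1234 6:30 7:45
        System.out.println(String.join(" ", groups(RELUCTANT, SAMPLE))); // 1234 06:30 07:45
    }
}
```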
Matcher.group(int) takes a single integer argument: the index of the capturing group, starting from 1. Index 0 is special and means "the entire match". A capturing group is created with a pair of parentheses "(...)", and whatever matches inside them is captured. Groups are numbered from left to right (again, starting from 1) by the position of their opening parenthesis, which is how nested groups get their numbers. Since there are no parentheses in your regular expression, there can be no group 1. Also note that group() can only be called after a successful find(), matches() or lookingAt(); otherwise it throws IllegalStateException.
The javadoc on the Pattern class covers the regular expression syntax.
If you are looking for a pattern that might recur some number of times, you can call Matcher.find() repeatedly until it returns false. Calling Matcher.group(0) on each iteration then returns what matched that time.
If you want to build one big regular expression that matches everything at once (which I believe is what you want), put a set of capturing parentheses around each of the three things you want to capture, call Matcher.matches(), and then call Matcher.group(n) with n = 1, 2 and 3. Of course, Matcher.matches() might return false, in which case the pattern did not match and you can't retrieve any of the groups.
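A minimal sketch of that loop applied to the string from the question (the class and method names are made up; \d{2}:\d{2} assumes two-digit hours, as in the sample).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAllTimes {
    static final String SAMPLE =
            "A~BC~FGH~~zuzy|XX~ 1234~ ~~ABC~01/01/2010 06:30~BCD~01/01/2011 07:45";

    static List<String> times(String s) {
        Matcher m = Pattern.compile("\\d{2}:\\d{2}").matcher(s);
        List<String> out = new ArrayList<>();
        while (m.find()) {      // each call resumes after the previous match
            out.add(m.group()); // group() is the same as group(0)
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(times(SAMPLE)); // [06:30, 07:45]
    }
}
```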
In your example, what you probably want is to match some preceding text, then start a capturing group, match the digits, end the capturing group, and so on. I don't know enough about your exact input format, but here is an example.
Let's say I had strings of the form:
Eat 12 carrots at 12:30
Take 3 pills at 01:15
And I wanted to extract the quantity and times. My regular expression would look something like:
"\w+ (\d+) [\w ]+ (\d{2}:\d{2})"
The code would look something like:
Pattern p = Pattern.compile("\\w+ (\\d+) [\\w ]+ (\\d{2}:\\d{2})");
String oneline = "Eat 12 carrots at 12:30"; // one line of input
Matcher m = p.matcher(oneline);
if (m.matches()) {
    System.out.println("The quantity is " + m.group(1)); // 12
    System.out.println("The time is " + m.group(2));     // 12:30
}
The regular expression means: a word, a space, one or more digits (captured in group 1), a space, a run of words and spaces ending with a space, followed by a time (captured in group 2; the time pattern assumes the hour is always zero-padded to 2 digits). I would give an example closer to what you are looking for, but the description of the possible input is a little vague.