For Simple Java Mail I'm trying to handle a fairly free-form list of delimited email addresses. Note that I'm specifically not validating, just extracting the addresses from the list. For this use case the addresses can be assumed to be valid.
Here is an example of a valid input:
"name#domain.com,Sixpack, Joe 1 <name#domain.com>, Sixpack, Joe 2 <name#domain.com> ;Sixpack, Joe, 3<name#domain.com> , nameFoo#domain.com,nameBar#domain.com;nameBaz#domain.com;"
So there are two basic forms, "name#domain.com" and "Joe Sixpack <name#domain.com>", which can appear in a comma- or semicolon-delimited string, ignoring whitespace padding. The problem is that the names can contain delimiters as valid characters.
The following array shows the data needed (trailing spaces or delimiters would not be a big problem):
["name#domain.com",
"Sixpack, Joe 1 <name#domain.com>",
"Sixpack, Joe 2 <name#domain.com>",
"Sixpack, Joe, 3<name#domain.com>",
"nameFoo#domain.com",
"nameBar#domain.com",
"nameBaz#domain.com"]
I can't think of a clean way to deal with this. Any suggestions on how I can reliably recognize whether a comma is part of a name or is a delimiter?
Final solution (variation on the accepted answer):
var string = "name#domain.com,Sixpack, Joe 1 <name#domain.com>, Sixpack, Joe 2 <name#domain.com> ;Sixpack, Joe, 3<name#domain.com> , nameFoo#domain.com,nameBar#domain.com;nameBaz#domain.com;"
// recognize value tails and replace the delimiters there, disambiguating delimiters
const result = string
.replace(/(#.*?>?)\s*[,;]/g, "$1<|>")
.replace(/<\|>$/,"") // remove trailing delimiter
.split(/\s*<\|>\s*/) // split on the delimiter, including surrounding whitespace
console.log(result)
Or in Java:
public static String[] extractEmailAddresses(String emailAddressList) {
return emailAddressList
.replaceAll("(#.*?>?)\\s*[,;]", "$1<|>")
.replaceAll("<\\|>$", "")
.split("\\s*<\\|>\\s*");
}
Since you are not validating, I assume that the email addresses are valid.
Based on this assumption, I look for an email address followed by ; or , and that way I know it's valid.
var string = "name#domain.com,Sixpack, Joe 1 <name#domain.com>, Sixpack, Joe 2 <name#domain.com> ;Sixpack, Joe, 3<name#domain.com> , nameFoo#domain.com,nameBar#domain.com;nameBaz#domain.com;"
const result = string.match(/(.*?#.*?\..*?)[,;]/g)
console.log(result)
This pattern works for your provided examples:
([^#,;\s]+#[^#,;\s]+)|(?:$|\s*[,;])(?:\s*)(.*?)<([^#,;\s]+#[^#,;\s]+)>
([^#,;\s]+#[^#,;\s]+) # email defined by an # with connected chars except ',' ';' and white-space
| # OR
(?:$|\s*[,;])(?:\s*) # end of line ($) OR 0 or more spaces followed by a separator, then 0 or more white-space chars
(.*?) # name
<([^#,;\s]+#[^#,;\s]+)> # email enclosed by lt-gt
PCRE Demo
Using Java's replaceAll and split functions (mimicked in JavaScript below), I would say lock onto what you know ends an item (the ".com"), replace the separator characters with a unique temporary marker (a UUID or something like <|>), and then split on that marker.
Here is a JavaScript example, but Java's replaceAll and split can do the same job.
var string = "name#domain.com,Joe Sixpack <name#domain.com>, Sixpack, Joe <name#domain.com> ;Sixpack, Joe<name#domain.com> , name#domain.com,name#domain.com;name#domain.com;"
const result = string.replace(/(\.com>?)[\s,;]+/g, "$1<|>").replace(/<\|>$/,"").split("<|>")
console.log(result)
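Since the answer says Java can do the same job, here is a rough Java sketch of those three steps. The class and method names (ComSuffixSplitter, splitAddresses) are made up for illustration. Like the JavaScript above, it assumes every address ends in ".com" and that the marker <|> never occurs in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ComSuffixSplitter {
    // Temporary marker; assumed never to appear in real input.
    private static final String MARKER = "<|>";

    static String[] splitAddresses(String list) {
        return list
                // ".com" (optionally followed by ">") ends an item:
                // replace the separators that follow it with the marker.
                .replaceAll("(\\.com>?)[\\s,;]+", "$1" + Matcher.quoteReplacement(MARKER))
                // Drop the marker left behind by a trailing separator.
                .replaceAll(Pattern.quote(MARKER) + "$", "")
                // Split on the literal marker.
                .split(Pattern.quote(MARKER));
    }

    public static void main(String[] args) {
        String input = "name#domain.com,Joe Sixpack <name#domain.com>, "
                + "Sixpack, Joe <name#domain.com> ;Sixpack, Joe<name#domain.com> , "
                + "name#domain.com,name#domain.com;name#domain.com;";
        for (String address : splitAddresses(input)) {
            System.out.println(address);
        }
    }
}
```

If the addresses can end in other top-level domains, the ".com" anchor would have to be widened, which is what the question's final solution does by anchoring on the # instead.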
I need a Regex to merge multiple numbers in a line without merging them all together.
Example line :
Hello World9.99 123 456.00 7 890 123.45 0.97
My desired output is :
Hello World9.99 123456.00 7890123.45 0.97
I know basic regex but am not experienced with lookaheads/behinds.
So far I created this method :
final String regex = "(?<!\\.\\d{1,3})\\s+(?=\\d{1,3}\\.?\\d{2}?)";
public String mergeNumbers(String s){
return s.replaceAll(regex, "");
}
This works fine if the number tied to the word has a dot.
But I just can't figure out how to match this line without a dot at the beginning :
Hello World99 123 456.00 7 890 123.45 0.97
This is returning :
Hello World99123456.00 7890123.45 0.97
but I want :
Hello World99 123456.00 7890123.45 0.97
So my question is :
How can I modify my regex to match both cases?
I suggest using
.replaceAll("\\b(?<!\\.)(\\d+)\\s+(?=\\d)", "$1")
See the regex demo.
Details:
\b - a word boundary
(?<!\.) - there can be no . immediately before the current location
(\d+) - Group 1 (referred to with $1 backreference from the string replacement pattern): one or more digits
\s+ - 1+ whitespaces
(?=\d) - there must be a digit immediately to the right of the current location.
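To try the suggested pattern against both lines from the question, a small sketch like the following can be used. The class name NumberMerger is only an illustration; mergeNumbers mirrors the method from the question.

```java
class NumberMerger {
    // The pattern suggested above: a run of digits that starts at a word
    // boundary, is not preceded by a dot, and is followed by whitespace
    // and another digit.
    private static final String REGEX = "\\b(?<!\\.)(\\d+)\\s+(?=\\d)";

    static String mergeNumbers(String s) {
        // Keep the digits (group 1) and drop the whitespace after them.
        return s.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        // prints: Hello World9.99 123456.00 7890123.45 0.97
        System.out.println(mergeNumbers("Hello World9.99 123 456.00 7 890 123.45 0.97"));
        // prints: Hello World99 123456.00 7890123.45 0.97
        System.out.println(mergeNumbers("Hello World99 123 456.00 7 890 123.45 0.97"));
    }
}
```

The word boundary is what protects "World99": there is no boundary between a letter and a digit, so the digits glued to the word never start a match.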
I am learning Java programming. I have a Cisco log:
String logLine="Jul 15 21:12:41 router_provider_pe2 57: *Jul 15 21:12:26.223: %LDP-5-NBRCHG: LDP Neighbor 10.1.1.34:0 (3) is UP";
I am trying this regular expression:
String logPattern = "([\\w]+\\s[\\d]+\\s[\\d:]+) (\\d+:) ([*\\w]+\\s[\\d]+\\s[\\d:]+:) (\\w.+)";
But it does not match. Could you help me?
Your string:
"Jul 15 21:12:41 router_provider_pe2 57: *Jul 15 21:12:26.223: %LDP-5-NBRCHG: LDP Neighbor 10.1.1.34:0 (3) is UP"
Your pattern:
"([\w]+\s[\d]+\s[\d:]+) (\d+:) ([*\w]+\s[\d]+\s[\d:]+:) (\w.+)"
The part of the pattern in the first set of parentheses matches Jul 15 21:12:41. The pattern expects this to be followed by a space, and then by at least one digit. But the string at this point contains a space and the letter r, which is not a digit. Therefore, there is no match.
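Following that diagnosis, one possible fix is to add a group for the host name. The sketch below is an assumption about what the pattern was meant to capture, not a definitive parser. It also allows the fractional seconds (".223") in the second timestamp and lets the message start with any character, since the original [\d:]+ and \w would stop the match at those points too. The names CiscoLogParser and parse are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CiscoLogParser {
    private static final Pattern LOG_PATTERN = Pattern.compile(
            "(\\w+\\s+\\d+\\s[\\d:]+) "            // 1: syslog timestamp, e.g. "Jul 15 21:12:41"
            + "(\\S+) "                            // 2: host name, e.g. "router_provider_pe2"
            + "(\\d+): "                           // 3: sequence number, e.g. "57"
            + "\\*?(\\w+\\s+\\d+\\s[\\d:.]+): "    // 4: device timestamp, optional leading "*"
            + "(.+)");                             // 5: the rest of the message

    // Returns the five captured fields, or null if the line does not match.
    static String[] parse(String logLine) {
        Matcher m = LOG_PATTERN.matcher(logLine);
        if (!m.matches()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String logLine = "Jul 15 21:12:41 router_provider_pe2 57: *Jul 15 21:12:26.223: "
                + "%LDP-5-NBRCHG: LDP Neighbor 10.1.1.34:0 (3) is UP";
        String[] fields = parse(logLine);
        if (fields == null) {
            System.out.println("no match");
        } else {
            for (String field : fields) {
                System.out.println(field);
            }
        }
    }
}
```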
I want to validate an amount of money together with its currency unit.
100 USD : valid
1.11 USD : not valid
1,12 USD : not valid
12 US : not valid
So a valid string is "a number, then a space, then 3 alphabetic characters".
text.matches("^\\d+ [a-zA-Z]{3}*$")
I got error:
Exception caught: Dangling meta character '*' near index 16
^\d+ [a-zA-Z]{3}*$
So how do I fix it?
I fixed it by omitting the *, and then it works:
text.matches("^\\d+ [a-zA-Z]{3}$")
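As a quick check of the corrected pattern against the four examples from the question, here is a minimal sketch. The class and method names (CurrencyAmountValidator, isValid) are invented for the example. Because String.matches already requires the whole string to match, the ^ and $ anchors are left out here; keeping them does no harm.

```java
class CurrencyAmountValidator {
    // One or more digits, a single space, then exactly three ASCII letters.
    private static final String AMOUNT_REGEX = "\\d+ [a-zA-Z]{3}";

    static boolean isValid(String text) {
        // matches() succeeds only if the entire string fits the pattern.
        return text.matches(AMOUNT_REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {"100 USD", "1.11 USD", "1,12 USD", "12 US"};
        for (String sample : samples) {
            System.out.println(sample + " : " + (isValid(sample) ? "valid" : "not valid"));
        }
    }
}
```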
79 0009!017009!0479%0009!0479 0009!0469%0009!0469
0009!0459%0009!0459'009 0009!0459%0009!0449 0009!0449%0009!0449
0009!0439%0009!0439 0009!0429%0009!0429'009 0009!0429%0009!0419
0009!0419%0009!0409 000'009!0399 0009!0389%0009!0389'009
0009!0379%0009!0369 0009!0349%0009!0349 0009!0339%0009!0339
0009!0339%0009!0329'009 0009!0329%0009!0329 0009!032
In this data, I'm supposed to extract the numbers 47, 46, 45, 44 and so on, and ignore the rest. The numbers always follow this pattern: 9!0, then the number, then 9%
for example: 9!0 42 9%
Which language should I use to solve this, and which function might help me?
Is there any function that can locate a special character and copy the next two or three characters?
Ex: 9!0 42 9% and ' 009
Look for ! and then copy 42 from there, and look for ' which refers to another value (009). It's as if two different regexes are needed.
You can use whatever language you want, or even a Unix command-line utility like sed, awk, or grep. The regex should be something like this: you want to match 9!0 followed by digits followed by 9%. Use this regex: 9!0(\d+)9% (or if the numbers are all two digits, 9!0(\d{2})9%).
The other answers are fine, my regex solution is simply "9!.(\d\d)"
And here's a full solution in PowerShell, which can easily be translated to other .NET languages:
$t="79 0009!017009!0479%0009!0479 0009!0469%0009!0469 0009!0459%0009!0459'009 0009!0459%0009!0449 0009!0449%0009!0449 0009!0439%0009!0439 0009!0429%0009!0429'009 0009!0429%0009!0419 0009!0419%0009!0409 000'009!0399 0009!0389%0009!0389'009 0009!0379%0009!0369 0009!0349%0009!0349 0009!0339%0009!0339 0009!0339%0009!0329'009 0009!0329%0009!0329 0009!032"
$p="9!.(\d\d)"
$ms=[regex]::match($t,$p)
while ($ms.Success) {write-host $ms.groups[1].value;$ms=$ms.NextMatch()}
This is Perl:
@result = $subject =~ m/(?<=9!0)\d+(?=9%)/g;
It will give you an array of all your numbers. You didn't provide a language so I don't know if this is suitable for you or not.
Pattern regex = Pattern.compile("(?<=9!0)\\d+(?=9%)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}
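The loop above can be wrapped into a small self-contained program for experimenting. This is only a sketch: the class and method names (MarkerNumberExtractor, extractNumbers) are placeholders, and the sample input is just the first line of the data from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MarkerNumberExtractor {
    // Digits that are preceded by "9!0" and followed by "9%".
    private static final Pattern NUMBER = Pattern.compile("(?<=9!0)\\d+(?=9%)");

    static List<String> extractNumbers(String subject) {
        List<String> numbers = new ArrayList<>();
        Matcher matcher = NUMBER.matcher(subject);
        while (matcher.find()) {
            // group() is the matched digits only; the lookarounds are not included.
            numbers.add(matcher.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String firstLine = "79 0009!017009!0479%0009!0479 0009!0469%0009!0469";
        // prints: [47, 46]
        System.out.println(extractNumbers(firstLine));
    }
}
```

Occurrences such as "9!0479 " are skipped because the digits are not followed by 9%, which matches the rule stated in the question.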