How to get integer values from a mixed string - java

Say I have a string:
String words = "28282 Hello, my name is potato, my age is 245 112 274 141 bla bla bla etc";
Is there any way in Java to get only the age, 245 112 274 141, without using substring()? The strings I'm working with have inconsistent lengths, so I cannot rely on substring().

Using Streams (Java 8+), it's quite simple:
// requires java.util.stream.Stream and java.util.stream.Collectors
String ages = Stream.of(words
.replaceAll("^(\\d*\\s*)?(.*)", "$2").split("\\s+")) // drop the leading number, then tokenize
.filter(part -> part.matches("^\\d+$")) // keep only the purely numeric tokens
.collect(Collectors.joining(" ")); // "245 112 274 141"

You can use a regex: words.replaceAll(".*age\\D*([\\d\\s]+).*","$1").split("\\s+")
The replaceAll call leaves only the digits that follow "age" as a single string; the split then turns that string into an array of number strings.
.*age\\D* : match anything up to "age", then zero or more non-digit characters
([\\d\\s]+).* : capture one or more digits and whitespace characters as group 1 ($1), then match the rest of the line
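As a minimal sketch of the same idea with Pattern and Matcher, the following returns the numbers as integers instead of strings. The class name AgeExtractor and the method extractAfterAge are made up for illustration, and the pattern is a slightly stricter variant (it requires at least one digit after "age"), not the exact expression from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any library.
class AgeExtractor {
    // "age", any non-digits, then a run of whitespace-separated numbers (group 1).
    private static final Pattern AFTER_AGE =
            Pattern.compile("age\\D*(\\d+(?:\\s+\\d+)*)");

    static List<Integer> extractAfterAge(String text) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = AFTER_AGE.matcher(text);
        if (m.find()) {
            // Group 1 starts and ends with a digit, so no trimming is needed.
            for (String token : m.group(1).split("\\s+")) {
                numbers.add(Integer.parseInt(token));
            }
        }
        return numbers; // empty when no number follows "age"
    }

    public static void main(String[] args) {
        String words = "28282 Hello, my name is potato, my age is 245 112 274 141 bla bla bla etc";
        System.out.println(extractAfterAge(words)); // [245, 112, 274, 141]
    }
}
```

Integer.parseInt assumes each number fits in an int; for longer digit runs, Long.parseLong or keeping the strings would be safer.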


How to tokenize, scan or split this string of email addresses

For Simple Java Mail I'm trying to deal with a somewhat free-format of delimited email addresses. Note that I'm specifically not validating, just getting the addresses out of a list of addresses. For this use case the addresses can be assumed to be valid.
Here is an example of a valid input:
"name#domain.com,Sixpack, Joe 1 <name#domain.com>, Sixpack, Joe 2 <name#domain.com> ;Sixpack, Joe, 3<name#domain.com> , nameFoo#domain.com,nameBar#domain.com;nameBaz#domain.com;"
So there are two basic forms, "name#domain.com" and "Joe Sixpack <name#domain.com>", which can appear in a comma / semicolon delimited string, ignoring white space padding. The problem is that the names can contain delimiters as valid characters.
The following array shows the data needed (trailing spaces or delimiters would not be a big problem):
["name#domain.com",
"Sixpack, Joe 1 <name#domain.com>",
"Sixpack, Joe 2 <name#domain.com>",
"Sixpack, Joe, 3<name#domain.com>",
"nameFoo#domain.com",
"nameBar#domain.com",
"nameBaz#domain.com"]
I can't think of a clean way to deal with this. Any suggestion how I can reliably recognize whether a comma is part of a name or is a delimiter?
Final solution (variation on the accepted answer):
var string = "name#domain.com,Sixpack, Joe 1 <name#domain.com>, Sixpack, Joe 2 <name#domain.com> ;Sixpack, Joe, 3<name#domain.com> , nameFoo#domain.com,nameBar#domain.com;nameBaz#domain.com;"
// recognize value tails and replace the delimiters there, disambiguating delimiters
const result = string
.replace(/(#.*?>?)\s*[,;]/g, "$1<|>")
.replace(/<\|>$/,"") // remove trailing delimiter
.split(/\s*<\|>\s*/) // split on delimiter including surround space
console.log(result)
Or in Java:
public static String[] extractEmailAddresses(String emailAddressList) {
return emailAddressList
.replaceAll("(#.*?>?)\\s*[,;]", "$1<|>")
.replaceAll("<\\|>$", "")
.split("\\s*<\\|>\\s*");
}
Since you are not validating, I assume that the email addresses are valid.
Based on this assumption, I look for an email address followed by ; or , so I know where each entry ends.
var string = "name#domain.com,Sixpack, Joe 1 <name#domain.com>, Sixpack, Joe 2 <name#domain.com> ;Sixpack, Joe, 3<name#domain.com> , nameFoo#domain.com,nameBar#domain.com;nameBaz#domain.com;"
const result = string.match(/(.*?#.*?\..*?)[,;]/g)
console.log(result)
This pattern works for your provided examples:
([^#,;\s]+#[^#,;\s]+)|(?:^|\s*[,;])(?:\s*)(.*?)<([^#,;\s]+#[^#,;\s]+)>
([^#,;\s]+#[^#,;\s]+) # email: characters joined by a #, excluding ',' ';' and white-space
| # OR
(?:^|\s*[,;])(?:\s*) # start of line OR 0 or more spaces followed by a separator, then 0 or more white-space chars
(.*?) # name
<([^#,;\s]+#[^#,;\s]+)> # email enclosed by lt-gt
Using Java's replaceAll and split functions (mimicked in JavaScript below), I would lock onto what you know ends an item (the ".com"), replace the separator characters that follow it with a unique temporary marker (a UUID or something like <|>), and then split on that marker.
Here is a JavaScript example, but Java's replaceAll and split can do the same job.
var string = "name#domain.com,Joe Sixpack <name#domain.com>, Sixpack, Joe <name#domain.com> ;Sixpack, Joe<name#domain.com> , name#domain.com,name#domain.com;name#domain.com;"
const result = string.replace(/(\.com>?)[\s,;]+/g, "$1<|>").replace(/<\|>$/,"").split("<|>")
console.log(result)
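A rough Java equivalent of the JavaScript above might look like the following sketch. The class name AddressSplitter and the method splitAddresses are invented for illustration, and the approach still assumes every address ends in ".com". Note that String.split takes a regex in Java, so the | in the marker has to be escaped.

```java
import java.util.Arrays;

// Hypothetical helper, not part of Simple Java Mail.
class AddressSplitter {
    static String[] splitAddresses(String addressList) {
        return addressList
                .replaceAll("(\\.com>?)[\\s,;]+", "$1<|>") // mark the separators that follow an address
                .replaceAll("<\\|>$", "")                  // drop the trailing marker, if any
                .split("<\\|>");                           // split on the (escaped) marker
    }

    public static void main(String[] args) {
        String input = "name#domain.com,Joe Sixpack <name#domain.com>, Sixpack, Joe <name#domain.com> ;"
                + "Sixpack, Joe<name#domain.com> , name#domain.com,name#domain.com;name#domain.com;";
        System.out.println(Arrays.toString(splitAddresses(input)));
    }
}
```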

Regex to merge multiple numbers with spaces in one line

I need a Regex to merge multiple numbers in a line without merging them all together.
Example line :
Hello World9.99 123 456.00 7 890 123.45 0.97
My desired output is :
Hello World9.99 123456.00 7890123.45 0.97
I know basic regex but am not experienced with lookaheads/behinds.
So far I created this method :
final String regex = "(?<!\\.\\d{1,3})\\s+(?=\\d{1,3}\\.?\\d{2}?)";
public String mergeNumbers(String s){
return s.replaceAll(regex, "");
}
This works fine if the number tied to the word has a dot.
But I just can't figure out how to match this line without a dot at the beginning :
Hello World99 123 456.00 7 890 123.45 0.97
This is returning :
Hello World99123456.00 7890123.45 0.97
but I want :
Hello World99 123456.00 7890123.45 0.97
So my question is :
How can I modify my regex to match both cases?
I suggest using
.replaceAll("\\b(?<!\\.)(\\d+)\\s+(?=\\d)", "$1")
Details:
\b - a word boundary
(?<!\.) - there can be no . immediately before the current location
(\d+) - Group 1 (referred to with $1 backreference from the string replacement pattern): one or more digits
\s+ - 1+ whitespaces
(?=\d) - there must be a digit immediately to the right of the current location.
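To check the suggested expression against both inputs from the question, a small sketch could look like this. The class name NumberMerger is made up; the regex is the one from the answer above.

```java
// Hypothetical wrapper around the suggested replaceAll call.
class NumberMerger {
    // A digit run that starts at a word boundary and is not preceded by '.',
    // followed by whitespace and another digit: drop the whitespace.
    private static final String REGEX = "\\b(?<!\\.)(\\d+)\\s+(?=\\d)";

    static String mergeNumbers(String s) {
        return s.replaceAll(REGEX, "$1");
    }

    public static void main(String[] args) {
        // prints: Hello World9.99 123456.00 7890123.45 0.97
        System.out.println(mergeNumbers("Hello World9.99 123 456.00 7 890 123.45 0.97"));
        // prints: Hello World99 123456.00 7890123.45 0.97
        System.out.println(mergeNumbers("Hello World99 123 456.00 7 890 123.45 0.97"));
    }
}
```

The word boundary is what keeps "World99" apart from "123": there is no boundary between "d" and "9", so the digits glued to the word never start a match.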

Parse a log file using java.regex.Matcher

I am learning Java programming. I have a Cisco log:
String logLine="Jul 15 21:12:41 router_provider_pe2 57: *Jul 15 21:12:26.223: %LDP-5-NBRCHG: LDP Neighbor 10.1.1.34:0 (3) is UP";
I am trying this regular expression:
String logPattern = "([\\w]+\\s[\\d]+\\s[\\d:]+) (\\d+:) ([*\\w]+\\s[\\d]+\\s[\\d:]+:) (\\w.+)";
But it does not match. Could you help me?
Your string:
"Jul 15 21:12:41 router_provider_pe2 57: *Jul 15 21:12:26.223: %LDP-5-NBRCHG: LDP Neighbor 10.1.1.34:0 (3) is UP"
Your pattern:
"([\w]+\s[\d]+\s[\d:]+) (\d+:) ([*\w]+\s[\d]+\s[\d:]+:) (\w.+)"
The part of the pattern in the first set of parentheses matches Jul 15 21:12:41. The pattern expects this to be followed by a space, and then by at least one digit. But the string at this point contains a space and the letter r, which is not a digit. Therefore, there is no match.
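One possible fix, sketched below, is to add a group for the host name between the first timestamp and the sequence number. The second timestamp also contains a dot (26.223), so its character class needs to allow "." as well. The class name CiscoLogParser and the split into six groups are assumptions made for illustration; other Cisco log formats may need a different pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for the single log line shown in the question.
class CiscoLogParser {
    private static final Pattern LOG_LINE = Pattern.compile(
            "(\\w+\\s+\\d+\\s+[\\d:]+) "        // 1: syslog timestamp, e.g. Jul 15 21:12:41
            + "(\\S+) "                          // 2: host name
            + "(\\d+): "                         // 3: sequence number
            + "\\*?(\\w+\\s+\\d+\\s+[\\d:.]+): " // 4: device timestamp (optional leading *)
            + "(%\\S+): "                        // 5: facility-severity-mnemonic
            + "(.+)");                           // 6: message text

    // Returns the six captured fields, or an empty list if the line does not match.
    static List<String> parse(String logLine) {
        List<String> fields = new ArrayList<>();
        Matcher m = LOG_LINE.matcher(logLine);
        if (m.matches()) {
            for (int i = 1; i <= m.groupCount(); i++) {
                fields.add(m.group(i));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        String logLine = "Jul 15 21:12:41 router_provider_pe2 57: *Jul 15 21:12:26.223: "
                + "%LDP-5-NBRCHG: LDP Neighbor 10.1.1.34:0 (3) is UP";
        parse(logLine).forEach(System.out::println);
    }
}
```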

What is Java Regex to match number then space then 3 alphabet char?

I want to validate an amount of money with its currency unit.
100 USD : valid
1.11 USD : not valid
1,12 USD : not valid
12 US : not valid
So the valid string is "the number then space then 3 alphabet char".
text.matches("^\\d+ [a-zA-Z]{3}*$")
I got error:
Exception caught: Dangling meta character '*' near index 16
^\d+ [a-zA-Z]{3}*$
So how to fix it?
I fixed it by omitting the *, and then it works:
text.matches("^\\d+ [a-zA-Z]{3}$")
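The error occurs because * directly follows the {3} quantifier, so it has nothing of its own to repeat. A small sketch that checks the four cases from the question is shown below; the class name AmountValidator is made up. Since String.matches always tests the whole string, the ^ and $ anchors are optional and are left out here.

```java
// Hypothetical validator for "<digits> <3 letters>" amounts.
class AmountValidator {
    static boolean isValidAmount(String text) {
        // One or more digits, a single space, exactly three ASCII letters.
        return text.matches("\\d+ [a-zA-Z]{3}");
    }

    public static void main(String[] args) {
        System.out.println(isValidAmount("100 USD"));  // true
        System.out.println(isValidAmount("1.11 USD")); // false
        System.out.println(isValidAmount("1,12 USD")); // false
        System.out.println(isValidAmount("12 US"));    // false
    }
}
```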

Extracting data from a text file - repeated values

79 0009!017009!0479%0009!0479 0009!0469%0009!0469
0009!0459%0009!0459'009 0009!0459%0009!0449 0009!0449%0009!0449
0009!0439%0009!0439 0009!0429%0009!0429'009 0009!0429%0009!0419
0009!0419%0009!0409 000'009!0399 0009!0389%0009!0389'009
0009!0379%0009!0369 0009!0349%0009!0349 0009!0339%0009!0339
0009!0339%0009!0329'009 0009!0329%0009!0329 0009!032
In this data, I'm supposed to extract the numbers 47, 46, 45, 44 and so on, and avoid the rest. The numbers always follow this pattern: 9!0, then the number, then 9%
for example: 9!0 42 9%
Which language should I use to solve this, and which function might help me?
Is there any function that can locate a special character and copy the next two or three characters?
Ex: 9!0 42 9% and ' 009
Look for ! and copy 42 from there, and look for ', which refers to another value (009). It's like two different regexes are needed.
You can use whatever language you want, or even a Unix command line utility like sed, awk, or grep. The regex should be something like this: you want to match 9!0 followed by digits followed by 9%. Use this regex: 9!0(\d+)9% (or if the numbers are all two digits, 9!0(\d{2})9%).
The other answers are fine; my regex solution is simply "9!.(\d\d)"
And here's a full solution in PowerShell, which can easily be translated to other .NET languages:
$t="79 0009!017009!0479%0009!0479 0009!0469%0009!0469 0009!0459%0009!0459'009 0009!0459%0009!0449 0009!0449%0009!0449 0009!0439%0009!0439 0009!0429%0009!0429'009 0009!0429%0009!0419 0009!0419%0009!0409 000'009!0399 0009!0389%0009!0389'009 0009!0379%0009!0369 0009!0349%0009!0349 0009!0339%0009!0339 0009!0339%0009!0329'009 0009!0329%0009!0329 0009!032"
$p="9!.(\d\d)"
$ms=[regex]::match($t,$p)
while ($ms.Success) {write-host $ms.groups[1].value;$ms=$ms.NextMatch()}
This is perl:
@result = $subject =~ m/(?<=9!0)\d+(?=9%)/g;
It will give you an array of all your numbers. You didn't provide a language so I don't know if this is suitable for you or not.
Pattern regex = Pattern.compile("(?<=9!0)\\d+(?=9%)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
// matched text: regexMatcher.group()
// match start: regexMatcher.start()
// match end: regexMatcher.end()
}
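Filling in the loop above, a minimal sketch that collects the matches into a list could look like this. The class name FlowNumberExtractor is made up, and the sample input in main is only the first line of the data from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper built around the lookbehind/lookahead pattern above.
class FlowNumberExtractor {
    // Digits that are preceded by "9!0" and followed by "9%".
    private static final Pattern FLOW = Pattern.compile("(?<=9!0)\\d+(?=9%)");

    static List<Integer> extract(String data) {
        List<Integer> numbers = new ArrayList<>();
        Matcher m = FLOW.matcher(data);
        while (m.find()) {
            numbers.add(Integer.parseInt(m.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        String firstLine = "79 0009!017009!0479%0009!0479 0009!0469%0009!0469";
        System.out.println(extract(firstLine)); // [47, 46]
    }
}
```

Occurrences such as 9!0479 followed by a space are skipped because the lookahead requires a literal 9% after the digits.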
