Currency values string split by comma - java

I have a String that contains formatted currency values such as 45,890.00, with multiple values separated by commas, like 45,890.00,12,345.00,23,765.34,56,908.50.
I want to extract and process all the currency values, but I could not figure out the correct regular expression for this. This is what I have tried:
public static void main(String[] args) {
    String currencyValues = "45,890.00,12,345.00,23,765.34,56,908.50";
    String regEx = "\\.[0-9]{2}[,]";
    String[] results = currencyValues.split(regEx);
    //System.out.println(Arrays.toString(results));
    for (String res : results) {
        System.out.println(res);
    }
}
The output of this is:
45,890 //removing the decimals as the reg ex is exclusive
12,345
23,765
56,908.50
Could someone please help me with this one?

You need a regex "look behind" (?<=regex), which matches, but does not consume, the preceding text:
String regEx = "(?<=\\.[0-9]{2}),";
Here's your test case now working:
public static void main(String[] args) {
    String currencyValues = "45,890.00,12,345.00,23,765.34,56,908.50";
    String regEx = "(?<=\\.[0-9]{2}),"; // Using the regex with the look-behind
    String[] results = currencyValues.split(regEx);
    for (String res : results) {
        System.out.println(res);
    }
}
Output:
45,890.00
12,345.00
23,765.34
56,908.50
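Since the goal is to process the values, not just print them, here is a minimal sketch that feeds the split results into BigDecimal. The class and method names (CurrencySplit, parseAmounts) are made up for illustration, and it assumes every value has exactly two decimals and uses , only as a thousands separator.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

public class CurrencySplit {
    // Hypothetical helper: split on a comma only when it directly follows ".dd".
    static List<BigDecimal> parseAmounts(String csv) {
        List<BigDecimal> amounts = new ArrayList<>();
        for (String token : csv.split("(?<=\\.[0-9]{2}),")) {
            // Drop the thousands separators before handing the text to BigDecimal.
            amounts.add(new BigDecimal(token.replace(",", "")));
        }
        return amounts;
    }

    public static void main(String[] args) {
        // prints [45890.00, 12345.00, 23765.34, 56908.50]
        System.out.println(parseAmounts("45,890.00,12,345.00,23,765.34,56,908.50"));
    }
}
```

BigDecimal is used here instead of double so the two decimal places survive exactly.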

You could also use a different regular expression to match the pattern that you're searching for (then it doesn't really matter what the separator is):
String currencyValues = "45,890.00,12,345.00,23,765.34,56,908.50,55.00,345,432.00";
Pattern pattern = Pattern.compile("(\\d{1,3},)?\\d{1,3}\\.\\d{2}");
Matcher m = pattern.matcher(currencyValues);
while (m.find()) {
    System.out.println(m.group());
}
prints
45,890.00
12,345.00
23,765.34
56,908.50
55.00
345,432.00
Explanation of the regex:
\\d matches a digit
\\d{1,3} matches 1-3 digits
(\\d{1,3},)? optionally matches 1-3 digits followed by a comma.
\\. matches a dot
\\d{2} matches 2 digits.
However, I would also say that having comma as a separator is probably not the best design and would probably lead to confusion.
EDIT:
As #tobias_k points out: \\d{1,3}(,\\d{3})*\\.\\d{2} would be a better regex, as it would correctly match:
1,000,000,000.00
and it won't incorrectly match:
1,00.00
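A small sketch of the improved pattern in use; the class and method names are illustrative only. One thing worth knowing: with find() the pattern still picks up the tail 00.00 inside 1,00.00, so matches() is the safer call when validating a single value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CurrencyMatch {
    private static final Pattern AMOUNT =
        Pattern.compile("\\d{1,3}(,\\d{3})*\\.\\d{2}");

    // Collects every substring that looks like a grouped amount with two decimals.
    static List<String> findAmounts(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = AMOUNT.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Validates that the whole string is exactly one well-formed amount.
    static boolean isAmount(String text) {
        return AMOUNT.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(findAmounts("45,890.00,55.00,1,000,000,000.00"));
        System.out.println(isAmount("1,000,000,000.00")); // true
        System.out.println(isAmount("1,00.00"));          // false
    }
}
```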

All of the above solutions assume that every value in the string is a decimal value with comma grouping. What if the currency value string looks like this:
String str = "1,123.67aed,34,234.000usd,1234euro";
Here not all values are decimals, so there should be a way to decide whether each amount is a decimal or an integer.
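One possible approach, sketched under the assumption that every amount is immediately followed by a letters-only currency code (as in the sample string): capture the integer part, an optional fraction, and the code in separate groups, then check whether the fraction group took part in the match. The names MixedCurrency and describe are hypothetical.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MixedCurrency {
    // group 1: integer part, either comma-grouped or plain digits
    // group 2: optional fraction, including the dot
    // group 3: trailing currency code (letters only; an assumption)
    private static final Pattern VALUE =
        Pattern.compile("(\\d{1,3}(?:,\\d{3})+|\\d+)(\\.\\d+)?([A-Za-z]+)");

    static List<String> describe(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = VALUE.matcher(text);
        while (m.find()) {
            String digits = m.group(1).replace(",", "");
            boolean isDecimal = m.group(2) != null; // null when no fraction matched
            BigDecimal amount = new BigDecimal(isDecimal ? digits + m.group(2) : digits);
            out.add(amount + " " + m.group(3) + (isDecimal ? " (decimal)" : " (integer)"));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [1123.67 aed (decimal), 34234.000 usd (decimal), 1234 euro (integer)]
        System.out.println(describe("1,123.67aed,34,234.000usd,1234euro"));
    }
}
```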

Related

Java String keep numeric characters only at the end of a String

what is the regular expression so I can keep only the LAST numbers at the END of a String?
For example
Test123 -> 123
T34est56 -> 56
123Test89 -> 89
Thanks
I tried
str.replaceAll("[^A-Za-z\\s]", ""); but this removes all the numbers of the String.
I also tried str.replaceAll("\\d*$", ""); but this returns the following:
Test123 -> Test
T34est56 -> T34est
123Test89 -> 123Test
I want exactly the opposite.
Capturing the last group of digits in the string and then replacing the whole string with that group seems to work:
String str = "123Test89";
String result = str.replaceAll(".*[^\\d](\\d+$)", "$1");
System.out.println(result);
This outputs:
89
You can use replaceFirst() to remove everything (.*) up to the last non-digit (\\D):
s = s.replaceFirst(".*\\D", "");
Complete example:
public class C {
    public static void main(String args[]) {
        String s = "T34est56";
        s = s.replaceFirst(".*\\D", "");
        System.out.println(s); // 56
    }
}
You could use a regex like this:
String result = str.replaceFirst(".*?(\\d+$)", "$1");
Explanation:
.*: Any amount of leading characters
?: Makes the preceding .* reluctant (lazy), so it matches as few characters as possible. This gives the regex part after it ((\\d+$)) priority over the .*. Without the ?, every test case would only return the very last digit (i.e. 123Test89 would return 9 instead of 89).
\\d+: One or more digits
$: At the very end of the string
(...): Captured in a capture group
Which is then replaced with:
$1: The match of the first capture group (so the trailing digits)
To perhaps make it slightly more clear, you could add a leading ^ to the regex: "^.*?(\\d+$)", although it's not really necessary because .* already matches every leading character.
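To see the effect of the ? in isolation, here is a small side-by-side sketch (the class and method names are made up). It also shows that a string without trailing digits is returned unchanged, because replaceFirst finds nothing to replace.

```java
public class LazyVsGreedy {
    // Reluctant .*? leaves as many trailing digits as possible for the group.
    static String lazy(String s) {
        return s.replaceFirst(".*?(\\d+$)", "$1");
    }

    // Greedy .* swallows everything it can, leaving a single digit for \d+.
    static String greedy(String s) {
        return s.replaceFirst(".*(\\d+$)", "$1");
    }

    public static void main(String[] args) {
        System.out.println(lazy("123Test89"));   // 89
        System.out.println(greedy("123Test89")); // 9
        System.out.println(lazy("Test"));        // Test (no trailing digits, no match)
    }
}
```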
I like to use the Pattern and Matcher API:
Pattern pattern = Pattern.compile("[0-9]+$"); // [1-9]* would skip zeros and could match the empty string
Matcher matcher = pattern.matcher("Test123");
if (matcher.find()) {
    System.out.println(matcher.group()); // 123
}
I think /.*?(\d+)$/ will work.

Split a sentence in array of string with special characters or spaces intact

I want to split a sentence containing spaces or special characters into an array of words, with each space or special character also kept as an element of the array.
Sentence like:
aman,amit and sumit went to top-up
should be split into an array of String:
{"aman",",","amit"," ","and"," ","sumit"," ","went"," ","to"," ","top","-","up"}
Please suggest any regex or logic to split the same using java.
I missed one thing in my question: I also need to split on numeric characters. But split("\\b") does not split a string like
abc12def
into
{"abc","12","def"} or {"abc","1","2","def"}
It seems all you need is to match either word characters (\w+) or non-word ones (\W+). Combine these with an alternation operator and - perhaps - add a Pattern.UNICODE_CHARACTER_CLASS (or its inline/embedded version (?U)) to make the pattern Unicode-aware:
String value = "aman,amit and sumit went to top-up";
String pattern = "(?U)\\w+|\\W+";
List<String> lst = new ArrayList<>();
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(value);
while (m.find()) {
    lst.add(m.group(0));
}
System.out.println(lst);
I hope the below code snippet helps you solve this.
public static void main(final String[] args) {
    String message = "aman,amit and sumit went to top-up";
    String[] messages = message.split("\\b");
    for (String string : messages) {
        System.out.println(string);
    }
}
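Neither \w+|\W+ nor split("\\b") separates letters from digits, since both count as word characters. For the abc12def follow-up in the question, one option is to match runs of letters, runs of digits, and single other characters as three alternatives. This is only a sketch: the class name is made up, and treating _ as a special character is a side effect of using \p{L} and \p{N} instead of \w.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Tokenizer {
    // letters+ | digits+ | exactly one character that is neither
    private static final Pattern TOKEN =
        Pattern.compile("\\p{L}+|\\p{N}+|[^\\p{L}\\p{N}]");

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc12def")); // [abc, 12, def]
        System.out.println(tokenize("aman,amit and sumit went to top-up"));
    }
}
```

Dropping the + after \p{N} would give the per-digit variant {"abc","1","2","def"} instead.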

Java regular expressions starts and ends with and contains

I have a file that I need to use regex to replace a specific character.
I have strings of the following format:
1234 4215 "aaa.bbb" 5215 1524
and I need to replace the periods with colons.
I know that these periods are always contained within quotation marks, so I need a regex that finds a substring that starts with '"', ends with '"', and contains "." and replace the "." with ":". Could someone shed some light?
You can use:
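str = str.replaceAll("\\.(?!(([^\"]*\"){2})*[^\"]*$)", ":");
This regex finds a dot only when it is inside double quotes: the negative lookahead rejects any dot that is followed by an even number of quotes, so a matched dot is always followed by an odd number of quotes (this assumes the quotes in the string are balanced).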
Update
After thinking about it, your question says "period(s)", so there may be more than one period inside the double quotes.
Here's a way to cover that scenario
public static void main(String[] args) throws Exception {
    String str = "1234 \"aaa.bbb\" \"a.aa.b.bb\" 5215 1524 \"12.345.123\" \".sage.\" \".afwe\" \"....\"";
    // Find all substrings in double quotes
    Matcher matcher = Pattern.compile("\"(.*?)\"").matcher(str);
    while (matcher.find()) {
        // Extract the match
        String match = matcher.group(1);
        // Replace all the periods with colons
        match = match.replaceAll("\\.", ":");
        // Replace the original matched group with the new string
        str = str.replace(matcher.group(1), match);
    }
    System.out.println(str);
}
Results:
1234 "aaa:bbb" "a:aa:b:bb" 5215 1524 "12:345:123" ":sage:" ":afwe" "::::"
After testing #anubhava's pattern, I found it produces the same results, so more credit to him for simplicity (+1).
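One caveat with the loop-based version: str.replace(matcher.group(1), match) replaces every occurrence of the captured text, so the same text appearing outside quotes would be changed too. A possible alternative, sketched here with made-up names, rebuilds the string match by match using Matcher.appendReplacement, so only the quoted spans are touched. It assumes quotes are balanced and never escaped.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedDots {
    // A quoted span: opening quote, any non-quote characters, closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    static String colonsInsideQuotes(String input) {
        Matcher m = QUOTED.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replaced = m.group().replace('.', ':');
            // quoteReplacement keeps any $ or \ in the text from being read as group syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(replaced));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The unquoted 1.5 is left alone; only the quoted one changes.
        System.out.println(colonsInsideQuotes("1.5 \"1.5\" \"a.aa.b.bb\""));
    }
}
```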
OLD ANSWER
You can try this pattern in a String.replaceAll()
"\"([^\\.]*?)(\\.)([^\\.]*?)\""
With a replacement of
"\"$1:$3\""
This essentially captures the contents, between double quotes, into groups (1-3).
Group 1 ($1) - Zero or more characters (*?) that are not a period
Group 2 ($2) - The period
Group 3 ($3) - Zero or more characters (*?) that are not a period
and replaces it with "{Group 1}:{Group 3}"
public static void main(String[] args) throws Exception {
    String str = "1234 4215 \"aaa.bbb\" 5215 1524 \"12345.123\" \"sage.\" \".afwe\" \".\"";
    System.out.println(str.replaceAll("\"([^\\.]*?)(\\.)([^\\.]*?)\"", "\"$1:$3\""));
}
Results:
1234 4215 "aaa:bbb" 5215 1524 "12345:123" "sage:" ":afwe" ":"

Replacing digits separated with commas using String.replace("","");

I have a string which looks like following:
Turns 13,000,000 years old
Now I want to convert the digits to words in English. I have a function ready for that, but I am having trouble detecting the original number (13,000,000 in this case) because it is separated by commas.
Currently I am using the following regex to detect a number in a string:
stats = stats.replace((".*\\d.*"), (NumberToWords.start(Integer.valueOf(notification_data_greet))));
But the above seems not to work, any suggestions?
You need to extract the number using a regex which allows for the commas. The most robust one I can think of right now is
\d{1,3}(,?\d{3})*
which matches any unsigned integer, both with correctly placed commas and without commas (and weird combinations thereof like 100,000000).
Then remove every , from the match and you can parse as usual:
Pattern p = Pattern.compile("\\d{1,3}(,?\\d{3})*"); // You can store this as static final
Matcher m = p.matcher(input);
while (m.find()) { // Go through all matches
    String num = m.group().replace(",", "");
    int n = Integer.parseInt(num);
    // Do stuff with the number n
}
Working example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) throws InterruptedException {
        String input = "1,300,000,000";
        Pattern p = Pattern.compile("\\d{1,3}(,?\\d{3})*"); // You can store this as static final
        Matcher m = p.matcher(input);
        while (m.find()) { // Go through all matches
            String num = m.group().replace(",", "");
            System.out.println(num);
            int n = Integer.parseInt(num);
            System.out.println(n);
        }
    }
}
Gives output
1300000000
1300000000
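To get from detection to the replacement the question asks for, the match loop can rebuild the sentence with Matcher.appendReplacement. The sketch below is an assumption-laden illustration: NumberReplacer and replaceNumbers are made-up names, the converter is passed in as a function because the asker's NumberToWords code is not shown, and the demo uses a stand-in that just wraps the value in brackets.

```java
import java.util.function.LongFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberReplacer {
    // Either a properly comma-grouped number or a plain run of digits.
    private static final Pattern NUMBER =
        Pattern.compile("\\d{1,3}(?:,\\d{3})+|\\d+");

    static String replaceNumbers(String text, LongFunction<String> toWords) {
        Matcher m = NUMBER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Long.parseLong throws NumberFormatException for values beyond the long range.
            long value = Long.parseLong(m.group().replace(",", ""));
            m.appendReplacement(out, Matcher.quoteReplacement(toWords.apply(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in converter; a real number-to-words function would go here.
        System.out.println(replaceNumbers("Turns 13,000,000 years old", n -> "[" + n + "]"));
        // prints: Turns [13000000] years old
    }
}
```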
Try this regex:
[0-9][0-9]?[0-9]?(,[0-9][0-9][0-9])*
This matches numbers that are separated by a comma at each thousand. So it will match
10,000,000
but not
10,1,1,1
You can do it with the help of DecimalFormat instead of a regular expression
DecimalFormat format = (DecimalFormat) DecimalFormat.getInstance();
System.out.println(format.parse("10,000,000"));
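A slightly fuller sketch of the same idea, with two assumptions made explicit: the locale is pinned to Locale.US so that , is the grouping separator regardless of the machine's default locale, and the checked ParseException is wrapped so the helper (a made-up name) is easy to call.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class GroupedNumberParser {
    static long parseGrouped(String text) {
        // Locale.US is an assumption: it makes ',' the grouping separator.
        NumberFormat format = NumberFormat.getNumberInstance(Locale.US);
        try {
            return format.parse(text).longValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseGrouped("10,000,000")); // 10000000
    }
}
```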
Try the regex below to match comma-separated numbers:
\d{1,3}(,\d{3})+
Make the last part optional to also match numbers which aren't separated by commas:
\d{1,3}(,\d{3})*

Regex to replace a repeating string pattern

I need to replace a repeated pattern within a word with each basic construct unit. For example
I have the string "TATATATA" and I want to replace it with "TA". Also I would probably replace more than 2 repetitions to avoid replacing normal words.
I am trying to do it in Java with replaceAll method.
I think you want this (works for any length of the repeated string):
String result = source.replaceAll("(.+)\\1+", "$1");
Or alternatively, to prioritize shorter matches:
String result = source.replaceAll("(.+?)\\1+", "$1");
It first matches a group of characters, and then matches that same group again one or more times (using a back-reference within the match pattern itself). I tried it and it seems to do the trick.
Example
String source = "HEY HEY duuuuuuude what'''s up? Trololololo yeye .0.0.0";
System.out.println(source.replaceAll("(.+?)\\1+", "$1"));
// HEY dude what's up? Trolo ye .0
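The question mentions only collapsing more than two repetitions so that normal words survive. One way to sketch that, with a made-up class name, is to put a minimum count on the back-reference: \1{2,} requires at least two further copies after the captured unit, that is, three or more in total.

```java
public class RepeatCollapser {
    // (.+?) captures the shortest repeating unit; \1{2,} demands two or more extra copies.
    static String collapse(String source) {
        return source.replaceAll("(.+?)\\1{2,}", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("TATATATA")); // TA
        System.out.println(collapse("TATA"));     // TATA (only two copies, left alone)
        System.out.println(collapse("banana"));   // banana
    }
}
```

For comparison, the unrestricted (.+)\\1+ version would turn banana into bana.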
You had better use a Pattern here than .replaceAll(). For instance:
private static final Pattern PATTERN
    = Pattern.compile("\\b([A-Z]{2,}?)\\1+\\b");
//...
final Matcher m = PATTERN.matcher(input);
ret = m.replaceAll("$1");
edit: example:
public static void main(final String... args)
{
    System.out.println("TATATA GHRGHRGHRGHR"
        .replaceAll("\\b([A-Za-z]{2,}?)\\1+\\b", "$1"));
}
This prints:
TA GHR
Since you asked for a regex solution:
(\\w)(\\w)(\\1\\2){2,}
(\w)(\w): matches every pair of consecutive word characters ((.)(.) will catch every consecutive pair of characters of any type), storing them in capturing groups 1 and 2. (\\1\\2) matches whenever the characters in those groups are repeated again immediately afterward, and {2,} requires that repetition to occur two or more times ({2,10} would require between two and ten repetitions, inclusive).
String s = "hello TATATATA world";
Pattern p = Pattern.compile("(\\w)(\\w)(\\1\\2){2,}");
Matcher m = p.matcher(s);
while (m.find()) System.out.println(m.group());
//prints "TATATATA"
