Replacing digits separated with commas using String.replace("",""); - java

I have a string which looks like following:
Turns 13,000,000 years old
Now I want to convert the digits to English words. I have a function ready for that, but I am having trouble detecting the original number (13,000,000 in this case) because it is separated by commas.
Currently I am using the following regex to detect a number in a string:
stats = stats.replace((".*\\d.*"), (NumberToWords.start(Integer.valueOf(notification_data_greet))));
But the above does not seem to work. Any suggestions?

You need to extract the number using a regex which allows for the commas. The most robust one I can think of right now is
\d{1,3}(,?\d{3})*
which matches any unsigned integer, both with correctly placed commas and without commas (and odd combinations thereof, like 100,000000).
Then remove every , from the match and you can parse as usual:
Pattern p = Pattern.compile("\\d{1,3}(,?\\d{3})*"); // You can store this as static final
Matcher m = p.matcher(input);
while (m.find()) { // Go through all matches
String num = m.group().replace(",", "");
int n = Integer.parseInt(num);
// Do stuff with the number n
}
Working example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String input = "1,300,000,000";
Pattern p = Pattern.compile("\\d{1,3}(,?\\d{3})*"); // You can store this as static final
Matcher m = p.matcher(input);
while (m.find()) { // Go through all matches
String num = m.group().replace(",", "");
System.out.println(num);
int n = Integer.parseInt(num);
System.out.println(n);
}
}
}
Gives output
1300000000
1300000000
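To tie this back to the original question (replacing the number inside a longer sentence), here is a minimal sketch using Matcher.appendReplacement. The class name NumberReplacer and the toWords parameter are made up for illustration; toWords stands in for your own number-to-words function. It assumes every number fits into a long.

```java
import java.util.function.LongFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumberReplacer {
    // Same pattern as above: digits with optional thousands separators
    private static final Pattern NUMBER = Pattern.compile("\\d{1,3}(,?\\d{3})*");

    // toWords is a hypothetical stand-in for your own converter
    static String replaceNumbers(String input, LongFunction<String> toWords) {
        Matcher m = NUMBER.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Strip the commas before parsing (assumes the value fits in a long)
            long n = Long.parseLong(m.group().replace(",", ""));
            // quoteReplacement keeps '$' and '\' in the converter's output literal
            m.appendReplacement(out, Matcher.quoteReplacement(toWords.apply(n)));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // The lambda only wraps the number; plug in the real converter here
        System.out.println(replaceNumbers("Turns 13,000,000 years old", n -> "<" + n + ">"));
    }
}
```

Only the matched number is replaced, so the surrounding text ("Turns", "years old") is left untouched.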

Try this regex:
[0-9][0-9]?[0-9]?(,[0-9][0-9][0-9])*
This matches numbers that are separated by a comma at each thousand. So it will match
10,000,000
but not
10,1,1,1

You can do it with the help of DecimalFormat instead of a regular expression
DecimalFormat format = (DecimalFormat) DecimalFormat.getInstance();
System.out.println(format.parse("10,000,000"));
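One thing to keep in mind with this approach: the grouping separator depends on the locale, and parse throws a checked ParseException. A small sketch that pins the locale explicitly; Locale.US is an assumption about the input, and the class and method names are only illustrative.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

final class GroupedNumberParser {
    static long parseGrouped(String text) {
        // Assumption: the input uses ',' as grouping separator, as in Locale.US
        NumberFormat format = NumberFormat.getIntegerInstance(Locale.US);
        try {
            // parse reads from the start of the text and stops at the first
            // character it cannot interpret as part of the number
            return format.parse(text).longValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a number: " + text, e);
        }
    }
}
```

Because parsing has to start at the beginning of the string, you would still need to locate the number first if it sits in the middle of a sentence.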

Try the regex below to match comma-separated numbers:
\d{1,3}(,\d{3})+
Make the last part optional to also match numbers which aren't separated by commas:
\d{1,3}(,\d{3})*
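A caveat with the optional version: when used with find(), an unseparated number such as 1000 is reported as two matches (100 and 0), because the first part takes at most three digits. A possible variant, sketched here under the assumption that you want to validate a whole candidate string, is to add a plain \d+ alternative. The class and method names are made up.

```java
import java.util.regex.Pattern;

final class GroupedNumberCheck {
    // Either strictly grouped digits (1,000,000) or a plain run of digits (1000000)
    private static final Pattern GROUPED_OR_PLAIN =
            Pattern.compile("\\d{1,3}(?:,\\d{3})+|\\d+");

    static boolean isNumber(String candidate) {
        // matches() requires the whole string to fit the pattern
        return GROUPED_OR_PLAIN.matcher(candidate).matches();
    }
}
```

With this, 10,000,000 and 1000 are accepted, while 10,1,1,1 and 100,000000 are rejected.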

Related

Is there a regex where if first expression is valid then check for next [duplicate]

I have several strings in the rough form:
[some text] [some number] [some more text]
I want to extract the text in [some number] using the Java Regex classes.
I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls to take the regex string and use it on the source data to produce the value of [some number].
EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].
Full example:
private static final Pattern p = Pattern.compile("^([a-zA-Z]+)([0-9]+)(.*)");
public static void main(String[] args) {
// create matcher for pattern p and given string
Matcher m = p.matcher("Testing123Testing");
// if an occurrence of the pattern was found in the given string...
if (m.find()) {
// ...then you can use group() methods.
System.out.println(m.group(0)); // whole matched expression
System.out.println(m.group(1)); // first expression from round brackets (Testing)
System.out.println(m.group(2)); // second one (123)
System.out.println(m.group(3)); // third one (Testing)
}
}
Since you're looking for the first number, you can use such regexp:
^\D+(\d+).*
and m.group(1) will return you the first number. Note that signed numbers can contain a minus sign:
^\D+(-?\d+).*
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex1 {
public static void main(String[]args) {
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("hello1234goodboy789very2345");
while(m.find()) {
System.out.println(m.group());
}
}
}
Output:
1234
789
2345
Allain basically has the java code, so you can use that. However, his expression only matches if your numbers are only preceded by a stream of word characters.
"(\\d+)"
should be able to find the first string of digits. You don't need to specify what's before it, if you're sure that it's going to be the first string of digits. Likewise, there is no use to specify what's after it, unless you want that. If you just want the number, and are sure that it will be the first string of one or more digits then that's all you need.
If you expect it to be offset by spaces, specifying
"\\s+(\\d+)\\s+"
will make it even more distinct.
If you need all three parts, this will do:
"(\\D+)(\\d+)(.*)"
EDIT The Expressions given by Allain and Jack suggest that you need to specify some subset of non-digits in order to capture digits. If you tell the regex engine you're looking for \d then it's going to ignore everything before the digits. If J or A's expression fits your pattern, then the whole match equals the input string. And there's no reason to specify it. It probably slows a clean match down, if it isn't totally ignored.
In addition to Pattern, the Java String class also has several methods that work with regular expressions. In your case the code will be:
"ab123abc".replaceFirst("\\D*(\\d*).*", "$1")
where \\D is a non-digit character.
In Java 1.4 and up:
String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
String someNumberStr = matcher.group(1);
// if you need this to be an int:
int someNumberInt = Integer.parseInt(someNumberStr);
}
This function collects all matching sequences from a string. In this example it takes all email addresses from a string.
static final String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*@"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
public List<String> getAllEmails(String message) {
List<String> result = null;
Matcher matcher = Pattern.compile(EMAIL_PATTERN).matcher(message);
if (matcher.find()) {
result = new ArrayList<String>();
result.add(matcher.group());
while (matcher.find()) {
result.add(matcher.group());
}
}
return result;
}
For message = "adf@gmail.com, <another@osiem.osiem>>>> lalala@aaa.pl" it will create a List of 3 elements.
Try doing something like this:
Pattern p = Pattern.compile("^\\D+(\\d+).*"); // \\D+ instead of a greedy .+, which would leave only the last digit for the group
Matcher m = p.matcher("Testing123Testing");
if (m.find()) {
System.out.println(m.group(1));
}
Simple Solution
// Regexplanation:
// ^ beginning of line
// \\D+ 1+ non-digit characters
// (\\d+) 1+ digit characters in a capture group
// .* 0+ any character
String regexStr = "^\\D+(\\d+).*";
// Compile the regex String into a Pattern
Pattern p = Pattern.compile(regexStr);
// Create a matcher with the input String
Matcher m = p.matcher(inputStr);
// If we find a match
if (m.find()) {
// Get the String from the first capture group
String someDigits = m.group(1);
// ...do something with someDigits
}
Solution in a Util Class
public class MyUtil {
private static final Pattern pattern = Pattern.compile("^\\D+(\\d+).*");
// Note: a shared Matcher is not thread-safe, so this only suits single-threaded use
private static Matcher matcher = pattern.matcher("");
// Assumptions: inputStr is a non-null String
public static String extractFirstNumber(String inputStr){
// Reset the matcher with a new input String
matcher.reset(inputStr);
// Check if there's a match
if(matcher.find()){
// Return the number (in the first capture group)
return matcher.group(1);
}else{
// Return some default value, if there is no match
return null;
}
}
}
...
// Use the util function and print out the result
String firstNum = MyUtil.extractFirstNumber("Testing4234Things");
System.out.println(firstNum);
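If the helper may be called from several threads, or the input may start with the number itself, a slightly different sketch could be used: keep only the compiled Pattern static and create a Matcher per call, and search for \d+ directly instead of anchoring on ^\D+. The class name FirstNumber is made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FirstNumber {
    // Pattern objects are immutable and safe to share between threads
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits, or null if the input contains none
    static String extract(String input) {
        Matcher m = DIGITS.matcher(input); // a fresh Matcher for every call
        return m.find() ? m.group() : null;
    }
}
```

Unlike ^\D+(\d+).*, this also finds the number in an input such as "12abc", where nothing precedes the digits.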
You can also do it using StringTokenizer:
String str = "as:"+123+"as:"+234+"as:"+345;
StringTokenizer st = new StringTokenizer(str,"as:");
while(st.hasMoreTokens())
{
String k = st.nextToken(); // you will get first numeric data i.e 123
int kk = Integer.parseInt(k);
System.out.println("k string token in integer " + kk);
String k1 = st.nextToken(); // you will get second numeric data i.e 234
int kk1 = Integer.parseInt(k1);
System.out.println("new string k1 token in integer :" + kk1);
String k2 = st.nextToken(); // you will get third numeric data i.e 345
int kk2 = Integer.parseInt(k2);
System.out.println("k2 string token is in integer : " + kk2);
}
Since we store these numeric values in three different variables, we can use them anywhere in the code later on.
How about [^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).* I think it would take care of numbers with fractional part.
I included white spaces and included , as possible separator.
I'm trying to get the numbers out of a string including floats and taking into account that the user might make a mistake and include white spaces while typing the number.
Sometimes you can use simple .split("REGEXP") method available in java.lang.String. For example:
String input = "first,second,third";
//To retrieve 'first'
input.split(",")[0]
//second
input.split(",")[1]
//third
input.split(",")[2]
If you are reading from a file, then this can help you:
try{
InputStream inputStream = (InputStream) mnpMainBean.getUploadedBulk().getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
String line;
//Ref:03
while ((line = br.readLine()) != null) {
if (line.matches("[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+")) {
String[] splitRecord = line.split(",");
//do something
}
else{
br.close();
//error
return;
}
}
br.close();
}
catch (IOException ioException){
logger.logDebug("Exception " + ioException.getMessage());
}
Pattern p = Pattern.compile("(\\D+)(\\d+)(.*)");
Matcher m = p.matcher("this is your number:1234 thank you");
if (m.find()) {
String someNumberStr = m.group(2);
int someNumberInt = Integer.parseInt(someNumberStr);
}

How to extract a number from a string in a particular format?

I have strings like the ones shown below. From each string I need to extract the number 123. It can be at any position, but there will be only one number in a string and it will always be in the same format: _number_
text_data_123
text_data_123_abc_count
text_data_123_abc_pqr_count
text_tery_qwer_data_123
text_tery_qwer_data_123_count
text_tery_qwer_data_123_abc_pqr_count
Below is the code:
String value = "text_data_123_abc_count";
// this below code will not work as index 2 is not a number in some of the above example
int textId = Integer.parseInt(value.split("_")[2]);
What is the best way to do this?
With a little guava magic:
String value = "text_data_123_abc_count";
Integer id = Ints.tryParse(CharMatcher.inRange('0', '9').retainFrom(value));
see also CharMatcher doc
\\d+
this regex with find should do it for you.
Use lookbehind and lookahead assertions. The |$ alternative also accepts a number at the end of the string, as in text_data_123:
Matcher m = Pattern.compile("(?<=_)\\d+(?=_|$)").matcher(s);
while(m.find())
{
System.out.println(m.group());
}
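For completeness, a self-contained sketch of the lookaround idea that covers all six sample strings from the question, including those where the number is the last segment. The class and method names are invented for this example; it assumes the number is always preceded by an underscore.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnderscoreNumber {
    // Digits preceded by '_' and followed by either '_' or the end of the input
    private static final Pattern ID = Pattern.compile("(?<=_)\\d+(?=_|$)");

    static int extractId(String value) {
        Matcher m = ID.matcher(value);
        if (!m.find()) {
            throw new IllegalArgumentException("No _number_ segment in: " + value);
        }
        return Integer.parseInt(m.group());
    }
}
```

The lookarounds only check the neighbouring characters, so m.group() contains just the digits and no underscores.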
You can use replaceAll to remove all non-digits to leave only one number (since you say there will be only 1 number in the input string):
String s = "text_data_123_abc_count".replaceAll("[^0-9]", "");
See IDEONE demo
Instead of [^0-9] you can use \D (which also means non-digit):
String s = "text_data_123_abc_count".replaceAll("\\D", "");
Given current requirements and restrictions, the replaceAll solution seems the most convenient (no need to use Matcher directly).
You can split the string into its parts and compare each part with its uppercase form; if they are equal, the part contains no lowercase letters, so you can parse it as a number and save it:
public class Main {
public static void main(String[] args) {
String txt = "text_tery_qwer_data_123_abc_pqr_count";
String[] words = txt.split("_");
int num = 0;
for (String t : words) {
if (t.equals(t.toUpperCase())) // equals, not ==, to compare the string contents
num = Integer.parseInt(t);
}
System.out.println(num);
}
}

Currency values string split by comma

I have a String which contains formatted currency values like 45,890.00, with multiple values separated by commas, like 45,890.00,12,345.00,23,765.34,56,908.50 ..
I want to extract and process all the currency values, but could not figure out the correct regular expression for this. This is what I have tried:
public static void main(String[] args) {
String currencyValues = "45,890.00,12,345.00,23,765.34,56,908.50";
String regEx = "\\.[0-9]{2}[,]";
String[] results = currencyValues.split(regEx);
//System.out.println(Arrays.toString(results));
for(String res : results) {
System.out.println(res);
}
}
The output of this is:
45,890 //removing the decimals as the reg ex is exclusive
12,345
23,765
56,908.50
Could someone please help me with this one?
You need a regex "look behind" (?<=regex), which matches, but does not consume:
String regEx = "(?<=\\.[0-9]{2}),";
Here's your test case now working:
public static void main(String[] args) {
String currencyValues = "45,890.00,12,345.00,23,765.34,56,908.50";
String regEx = "(?<=\\.[0-9]{2}),"; // Using the regex with the look-behind
String[] results = currencyValues.split(regEx);
for (String res : results) {
System.out.println(res);
}
}
Output:
45,890.00
12,345.00
23,765.34
56,908.50
You could also use a different regular expression to match the pattern that you're searching for (then it doesn't really matter what the separator is):
String currencyValues = "45,890.00,12,345.00,23,765.34,56,908.50,55.00,345,432.00";
Pattern pattern = Pattern.compile("(\\d{1,3},)?\\d{1,3}\\.\\d{2}");
Matcher m = pattern.matcher(currencyValues);
while (m.find()) {
System.out.println(m.group());
}
prints
45,890.00
12,345.00
23,765.34
56,908.50
55.00
345,432.00
Explanation of the regex:
\\d matches a digit
\\d{1,3} matches 1-3 digits
(\\d{1,3},)? optionally matches 1-3 digits followed by a comma.
\\. matches a dot
\\d{2} matches 2 digits.
However, I would also say that having comma as a separator is probably not the best design and would probably lead to confusion.
EDIT:
As @tobias_k points out: \\d{1,3}(,\\d{3})*\\.\\d{2} would be a better regex, as it would correctly match:
1,000,000,000.00
and it won't incorrectly match:
1,00.00
All of the above solutions assume that every value in the string is a decimal value with comma grouping. What if the currency value string looks like this:
String str = "1,123.67aed,34,234.000usd,1234euro";
Here not all values are decimals. There should be a way to decide whether each currency value is a decimal or an integer.
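One possible way to handle such a mixed string, sketched under the assumption that every amount is immediately followed by an alphabetic currency code (as in the sample): make the fractional part an optional capture group and check whether it took part in the match. The class name, method name and output format are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CurrencyTokens {
    // group 1: integer part (comma-grouped or plain digits)
    // group 2: optional fraction, including the dot
    // group 3: currency code (assumed to be letters only)
    private static final Pattern AMOUNT =
            Pattern.compile("(\\d{1,3}(?:,\\d{3})+|\\d+)(\\.\\d+)?([A-Za-z]+)");

    static List<String> describe(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = AMOUNT.matcher(input);
        while (m.find()) {
            String digits = m.group(1).replace(",", "");
            String fraction = m.group(2); // null when there is no fractional part
            String kind = (fraction == null) ? "integer" : "decimal";
            String amount = (fraction == null) ? digits : digits + fraction;
            out.add(amount + " " + m.group(3) + " (" + kind + ")");
        }
        return out;
    }
}
```

For the sample string this yields 1123.67 aed (decimal), 34234.000 usd (decimal) and 1234 euro (integer). Note that a comma doubling as both value separator and grouping separator stays ambiguous in general; the currency suffix is what makes this input parseable.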

Need to extract an integer from a filename string

I have several png image files with names like this -
house_number_5.png
house_number_512.png
house_number_52352.png
I need to extract the integers from these filenames: 5, 512 and 52352 in the case above. Does anyone know how to do this?
Just copy and paste; this version really works. (Sorry for the previous version, which didn't.)
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args){
Pattern p = Pattern.compile("house_(\\d+)\\.png");
Matcher m = p.matcher("house_234.png");
if (m.find()) {
System.out.println(m.group(1)); //print the number
}
}
}
result
234
If you want to do it without regex:
/* assume valid input */
public int getNumber(String filePath)
{
int startPos = filePath.lastIndexOf("_");
int dotPos = filePath.indexOf(".", startPos); // search for the dot after the last underscore
String numberString = filePath.substring(startPos + 1, dotPos);
return Integer.parseInt(numberString);
}
Pattern intsOnly = Pattern.compile("\\d+");
Matcher makeMatch = intsOnly.matcher("house_number_5.png");
makeMatch.find();
String inputInt = makeMatch.group();
System.out.println(inputInt);
Get the filename.
Remove the .png using the substring(..) method.
Split the rest on the underscore '_', using either StringTokenizer or the split(..) method.
The third token will be the number; convert it to an integer using parseInt.
Replace-all works with regular expressions:
"house_number_52352.png".replaceAll (".*[^0-9]([0-9]+)\\.png", "$1")
.*[^0-9] take a long chain of characters, which end in a non digit ...
followed by at least one digit
and a literal dot
and a literal png
Replace the whole thing by the group of (at least one digit).
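Keep in mind that replaceAll returns the input unchanged when the pattern does not match, so a later parseInt would fail on an unexpected filename. A small sketch with an explicit check instead; the class and method names are made up, and it assumes the number always sits directly before the .png extension.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FileNumber {
    // A run of digits directly before the ".png" extension at the end of the name
    private static final Pattern TRAILING_NUMBER = Pattern.compile("(\\d+)\\.png$");

    static int fromFileName(String fileName) {
        Matcher m = TRAILING_NUMBER.matcher(fileName);
        if (!m.find()) {
            throw new IllegalArgumentException("No number before .png in: " + fileName);
        }
        return Integer.parseInt(m.group(1));
    }
}
```

For the three filenames from the question this gives 5, 512 and 52352.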

Most efficient way to extract all the (natural) numbers from a string

Users may want to delimit numbers as they want.
What is the most efficient (or a simple standard function) to extract all the (natural) numbers from a string?
You could use a regular expression. I modified this example from Sun's regex matcher tutorial:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Test {
private static final String REGEX = "\\d+";
private static final String INPUT = "dog dog 1342 dog doggie 2321 dogg";
public static void main(String[] args) {
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT); // get a matcher object
while(m.find()) {
System.out.println("start(): "+m.start());
System.out.println("end(): "+m.end());
}
}
}
It finds the start and end indexes of each number. Numbers starting with 0 are allowed with the regular expression \d+, but you could easily change that if you want to.
I'm not sure I understand your question exactly. But if all you want is to pull out all non-negative integers then this should work pretty nicely:
String foo = "12,34,56.0567 junk 6745 some - stuff tab tab 789";
String[] nums = foo.split("\\D+");
// nums = ["12", "34", "56", "0567", "6745", "789"]
and then parse out the strings as ints (if needed).
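One detail when parsing the pieces: if the input starts with a non-digit, split returns an empty string as the first element, which Integer.parseInt would reject. A minimal sketch that skips empty pieces; the names are invented, and it assumes every number fits into an int.

```java
import java.util.ArrayList;
import java.util.List;

final class NaturalNumbers {
    static List<Integer> extract(String text) {
        List<Integer> numbers = new ArrayList<>();
        // Split on runs of non-digits; a leading delimiter produces an empty first piece
        for (String piece : text.split("\\D+")) {
            if (!piece.isEmpty()) {
                numbers.add(Integer.parseInt(piece)); // leading zeros are dropped, 0567 -> 567
            }
        }
        return numbers;
    }
}
```

For "junk 12, 34 and 0567" this returns the list [12, 34, 567].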
If you know the delimiter, then:
String X = "12,34,56";
String[] y = X.split(","); // d=delimiter
int[] z = new int[y.length];
for (int i = 0; i < y.length; i++ )
{
z[i] = java.lang.Integer.valueOf(y[i]).intValue();
}
If you don't, you probably need to pre-process: you could do X = X.replaceAll("[A-Za-z]", " "); to replace all letters with spaces (replace would treat the pattern as literal text) and then use space as the delimiter.
Hope that helps - I don't think there is a built-in function.
