Most efficient way to extract all the (natural) numbers from a string


Users may want to delimit numbers as they want.
What is the most efficient (or a simple standard function) to extract all the (natural) numbers from a string?

You could use a regular expression. I modified this example from Sun's regex matcher tutorial:
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Test {
private static final String REGEX = "\\d+";
private static final String INPUT = "dog dog 1342 dog doggie 2321 dogg";
public static void main(String[] args) {
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT); // get a matcher object
while(m.find()) {
System.out.println("start(): "+m.start());
System.out.println("end(): "+m.end());
}
}
}
It finds the start and end indexes of each number. Numbers starting with 0 are allowed with the regular expression \d+, but you could easily change that if you want to.
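
If the values themselves are wanted rather than the indexes, the same loop can collect them. The following is a minimal sketch, not a definitive implementation; the class name NumberExtractor and the method name extractNumbers are made up for illustration, and parsing into long is an assumption about how large the numbers may get.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberExtractor {
    // Compiled once and reused; \d+ matches each maximal run of digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns every digit run in the input, in order of appearance.
    // long is used so that runs of up to 18 digits parse without overflow;
    // much longer runs would still throw NumberFormatException.
    public static List<Long> extractNumbers(String input) {
        List<Long> numbers = new ArrayList<>();
        Matcher m = DIGITS.matcher(input);
        while (m.find()) {
            numbers.add(Long.parseLong(m.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        // prints [1342, 2321]
        System.out.println(extractNumbers("dog dog 1342 dog doggie 2321 dogg"));
    }
}
```

Compiling the pattern once in a static field avoids recompiling it on every call, which is usually where most of the avoidable cost is.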

I'm not sure I understand your question exactly. But if all you want is to pull out all non-negative integers then this should work pretty nicely:
String foo = "12,34,56.0567 junk 6745 some - stuff tab tab 789";
String[] nums = foo.split("\\D+");
// nums = ["12", "34", "56", "0567", "6745", "789"]
and then parse out the strings as ints (if needed).
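
One thing to watch for with split: when the input starts with a non-digit, the first array element is an empty string, which Integer.parseInt rejects. A small sketch of the parsing step that skips such tokens (SplitNumbers and parseAll are illustrative names, not part of any library):

```java
import java.util.ArrayList;
import java.util.List;

public class SplitNumbers {
    // Splits on runs of non-digits and parses whatever remains.
    public static List<Integer> parseAll(String input) {
        List<Integer> result = new ArrayList<>();
        for (String token : input.split("\\D+")) {
            // split yields a leading "" when the input starts with a non-digit
            if (!token.isEmpty()) {
                result.add(Integer.parseInt(token));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [12, 34, 56, 567, 6745]
        System.out.println(parseAll("junk 12,34,56.0567 more junk 6745"));
    }
}
```

Note that parsing drops leading zeros, so "0567" becomes 567.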

If you know the delimiter, then:
String X = "12,34,56";
String[] y = X.split(","); // "," is the delimiter
int[] z = new int[y.length];
for (int i = 0; i < y.length; i++ )
{
z[i] = Integer.parseInt(y[i]);
}
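If you don't, you probably need to pre-process - you could do X.replaceAll("[A-Za-z]", " "); to replace all letters with spaces and then use whitespace as the delimiter. (String.replace treats its argument literally, so it would not work with a character class.)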
Hope that helps - I don't think there is a built-in function.

Related

How to extract a number from a string in a particular format?

I have strings like the ones shown below. From each string I need to extract the number 123. It can be at any position, but there will be only one number per string and it will always be delimited by underscores, in the format _number_ (or _number at the end of the string).
text_data_123
text_data_123_abc_count
text_data_123_abc_pqr_count
text_tery_qwer_data_123
text_tery_qwer_data_123_count
text_tery_qwer_data_123_abc_pqr_count
Below is the code:
String value = "text_data_123_abc_count";
// the code below will not work, because index 2 is not a number in some of the examples above
int textId = Integer.parseInt(value.split("_")[2]);
What is the best way to do this?

With a little guava magic:
String value = "text_data_123_abc_count";
Integer id = Ints.tryParse(CharMatcher.inRange('0', '9').retainFrom(value));
See also the CharMatcher documentation.

\\d+
this regex with find should do it for you.

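Use lookbehind and lookahead assertions. The (?=_|$) alternative also covers a number at the very end of the string, such as text_data_123:
Matcher m = Pattern.compile("(?<=_)\\d+(?=_|$)").matcher(value);
while(m.find())
{
System.out.println(m.group());
}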

You can use replaceAll to remove all non-digits to leave only one number (since you say there will be only 1 number in the input string):
String s = "text_data_123_abc_count".replaceAll("[^0-9]", "");
See IDEONE demo
Instead of [^0-9] you can use \D (which also means non-digit):
String s = "text_data_123_abc_count".replaceAll("\\D", "");
Given current requirements and restrictions, the replaceAll solution seems the most convenient (no need to use Matcher directly).
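
The replaceAll approach relies on the "only one number" guarantee: if a second number ever appears, the digits of both are silently joined together. Where that guarantee is uncertain, a stricter variant might look like the sketch below. SingleNumber and extractOnly are hypothetical names, and throwing IllegalArgumentException is just one possible way to report the problem.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SingleNumber {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the only digit run in the input, or throws if there are zero or several.
    public static int extractOnly(String input) {
        Matcher m = DIGITS.matcher(input);
        if (!m.find()) {
            throw new IllegalArgumentException("no number in: " + input);
        }
        int value = Integer.parseInt(m.group());
        if (m.find()) {
            throw new IllegalArgumentException("more than one number in: " + input);
        }
        return value;
    }

    public static void main(String[] args) {
        // replaceAll joins the digits of both numbers: prints 1234
        System.out.println("a_12_b_34".replaceAll("\\D", ""));
        // prints 123
        System.out.println(extractOnly("text_data_123_abc_count"));
    }
}
```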

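You can split the string into its parts and compare each part with its upper-case form. A part made up only of digits is unchanged by toUpperCase(), so it can be parsed as a number and saved (this assumes every other part contains at least one lower-case letter):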
public class Main {
public static void main(String[] args) {
String txt = "text_tery_qwer_data_123_abc_pqr_count";
String[] words = txt.split("_");
int num = 0;
for (String t : words) {
if(t.equals(t.toUpperCase())) // compare contents with equals, not references with ==
num = Integer.parseInt(t);
}
System.out.println(num);
}
}

Splitting a string between a char

I want to split a String on a delimiter.
Example String:
String str="ABCD/12346567899887455422DEFG/15479897445698742322141PQRS/141455798951";
Now I want strings such as ABCD/12346567899887455422 and DEFG/15479897445698742322141, that is:
only 4 characters before the /
any number of characters (digits and letters) after the /.
Update:
The only time I need the previous 4 characters is after a delimiter is shown, as the string may contain letters or numbers...
My code attempt:
public class StringReq {
public static void main(String[] args) {
String str = "BONL/1234567890123456789CORT/123456789012345678901234567890HOLD/123456789012345678901234567890INTC/123456789012345678901234567890OTHR/123456789012345678901234567890PHOB/123456789012345678901234567890PHON/123456789012345678901234567890REPA/123456789012345678901234567890SDVA/123456789012345678901234567890TELI/123456789012345678901234567890";
testSplitStrings(str);
}
public static void testSplitStrings(String path) {
System.out.println("splitting of sprint starts \n");
String[] codeDesc = path.split("/");
String[] codeVal = new String[codeDesc.length];
for (int i = 0; i < codeDesc.length; i++) {
codeVal[i] = codeDesc[i].substring(codeDesc[i].length() - 4,
codeDesc[i].length());
System.out.println("line" + i + "==> " + codeDesc[i] + "\n");
}
for (int i = 0; i < codeVal.length - 1; i++) {
System.out.println(codeVal[i]);
}
System.out.println("splitting of sprint ends");
}
}

You say that digits and letters can appear after the /, but in your example I don't see any letters that should be included in the result after the /.
Based on that assumption, you can simply split at the places that have a digit before them and an A-Z character after them.
To do so, you can split with a regex that uses the look-around mechanism, like str.split("(?<=[0-9])(?=[A-Z])")
Demo:
String str = "BONL/1234567890123456789CORT/123456789012345678901234567890HOLD/123456789012345678901234567890INTC/123456789012345678901234567890OTHR/123456789012345678901234567890PHOB/123456789012345678901234567890PHON/123456789012345678901234567890REPA/123456789012345678901234567890SDVA/123456789012345678901234567890TELI/123456789012345678901234567890";
for (String s : str.split("(?<=[0-9])(?=[A-Z])"))
System.out.println(s);
Output:
BONL/1234567890123456789
CORT/123456789012345678901234567890
HOLD/123456789012345678901234567890
INTC/123456789012345678901234567890
OTHR/123456789012345678901234567890
PHOB/123456789012345678901234567890
PHON/123456789012345678901234567890
REPA/123456789012345678901234567890
SDVA/123456789012345678901234567890
TELI/123456789012345678901234567890
If letters can actually appear in the second part (after the /), then you can split at the places that are followed by four alphabetic characters and a /, like split("(?=[A-Z]{4}/)"). This assumes that you are using at least Java 8; if not, you will need to manually exclude splitting at the start of the string, for instance by adding (?!^) or (?<=.) at the start of your regex.
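
To illustrate that second variant, here is a short sketch on a made-up input in which letters do appear after the /. The input string, the class name CodeSplitter and the method name splitCodes are assumptions for the example; (?!^) is included so the result is the same on Java versions before 8.

```java
import java.util.Arrays;

public class CodeSplitter {
    // Splits in front of every "four upper-case letters followed by /" group,
    // except at the very start of the input.
    public static String[] splitCodes(String input) {
        return input.split("(?!^)(?=[A-Z]{4}/)");
    }

    public static void main(String[] args) {
        String str = "ABCD/12AB34EFGH/XY99IJKL/7";
        // prints [ABCD/12AB34, EFGH/XY99, IJKL/7]
        System.out.println(Arrays.toString(splitCodes(str)));
    }
}
```

Letters inside a value do not trigger a split unless exactly four of them are immediately followed by a /.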

You can use a regex with Matcher.find():
Pattern pattern = Pattern.compile("[A-Z]{4}/[0-9]*");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}

Instead of:
String[] codeDesc = path.split("/");
Just use this regex (4 characters before / and any characters after):
String[] codeDesc = path.split("(?=.{4}/)(?<=.)");

Even simpler using \d:
path.split("(?=[A-Za-z])(?<=\\d)");
EDIT:
Included a condition that requires four letters of either case:
path.split("(?=[A-Za-z]{4})(?<=\\d)");
output:
BONL/1234567890123456789
CORT/123456789012345678901234567890
HOLD/123456789012345678901234567890
INTC/123456789012345678901234567890
OTHR/123456789012345678901234567890
PHOB/123456789012345678901234567890
PHON/123456789012345678901234567890
REPA/123456789012345678901234567890
SDVA/123456789012345678901234567890
TELI/123456789012345678901234567890
It is still unclear whether this is the author's expected result.

Replacing digits separated with commas using String.replace("","");

I have a string which looks like following:
Turns 13,000,000 years old
Now I want to convert the digits to words in English. I have a function ready for that, but I am having trouble detecting the original number (13,000,000 in this case) because it is separated by commas.
Currently I am using the following regex to detect a number in a string:
stats = stats.replace((".*\\d.*"), (NumberToWords.start(Integer.valueOf(notification_data_greet))));
But the above seems not to work, any suggestions?

You need to extract the number using a regex which allows for the commas. The most robust one I can think of right now is
\d{1,3}(,?\d{3})*
which matches unsigned integers with correctly placed commas, as well as odd combinations such as 100,000000. Be aware that a comma-free run of more than three digits whose length is not a multiple of three is split across matches: 1234 is found as 123 and then 4.
Then replace all , from the match by the empty String and you can parse as usual:
Pattern p = Pattern.compile("\\d{1,3}(,?\\d{3})*"); // You can store this as static final
Matcher m = p.matcher(input);
while (m.find()) { // Go through all matches
String num = m.group().replace(",", "");
int n = Integer.parseInt(num);
// Do stuff with the number n
}
Working example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) throws InterruptedException {
String input = "1,300,000,000";
Pattern p = Pattern.compile("\\d{1,3}(,?\\d{3})*"); // You can store this as static final
Matcher m = p.matcher(input);
while (m.find()) { // Go through all matches
String num = m.group().replace(",", "");
System.out.println(num);
int n = Integer.parseInt(num);
System.out.println(n);
}
}
}
Gives output
1300000000
1300000000
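
To replace the numbers inside a sentence, as the question asks, the matcher loop can be combined with appendReplacement and appendTail. This is only a sketch under a few assumptions: NumberReplacer and replaceNumbers are illustrative names, the converter passed in stands for the asker's own number-to-words function, and the pattern is a variation of the one above that also accepts comma-free runs of any length.

```java
import java.util.function.LongFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberReplacer {
    // Either a comma-grouped number, or a plain run of digits.
    private static final Pattern NUMBER = Pattern.compile("\\d{1,3}(?:,\\d{3})+|\\d+");

    // Replaces each number in the text with whatever the converter returns for it.
    public static String replaceNumbers(String text, LongFunction<String> converter) {
        Matcher m = NUMBER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            long value = Long.parseLong(m.group().replace(",", ""));
            // quoteReplacement keeps '$' and '\' in the converted text literal
            m.appendReplacement(out, Matcher.quoteReplacement(converter.apply(value)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in converter; a real one would spell the number out in English.
        LongFunction<String> toWords = n -> "<" + n + ">";
        // prints Turns <13000000> years old
        System.out.println(replaceNumbers("Turns 13,000,000 years old", toWords));
    }
}
```

Parsing into long rather than int leaves room for values above Integer.MAX_VALUE.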

Try this regex:
[0-9][0-9]?[0-9]?(,?[0-9][0-9][0-9])*
This matches numbers whose groups of three digits are separated by commas; the optional comma comes before each three-digit group. So it will match
10,000,000
but not
10,1,1,1

You can do it with the help of DecimalFormat instead of a regular expression
DecimalFormat format = (DecimalFormat) DecimalFormat.getInstance();
System.out.println(format.parse("10,000,000"));
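
Two details are worth keeping in mind with this approach: parse throws the checked ParseException, and the default instance follows the machine's locale, where the comma may be the decimal separator. A hedged sketch that pins the locale (GroupedNumberParser and parseGrouped are illustrative names; Locale.US is an assumption about the input format):

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

public class GroupedNumberParser {
    // Parses an integer written with US-style grouping commas, e.g. "10,000,000".
    public static long parseGrouped(String text) {
        // An explicit locale avoids depending on the machine's default,
        // where ',' may be the decimal separator instead.
        NumberFormat format = NumberFormat.getIntegerInstance(Locale.US);
        try {
            return format.parse(text).longValue();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a number: " + text, e);
        }
    }

    public static void main(String[] args) {
        // prints 10000000
        System.out.println(parseGrouped("10,000,000"));
    }
}
```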

Try the regex below to match comma-separated numbers:
\d{1,3}(,\d{3})+
Make the last part optional to also match numbers that aren't separated by commas:
\d{1,3}(,\d{3})*

Need to extract an integer from a filename string

I have several png image files with names like this -
house_number_5.png
house_number_512.png
house_number_52352.png
I need to extract the integers from these filenames...5, 12, 2352 in the case above. Anyone know how to do this?

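This version works as written (an earlier version did not). Note that the pattern expects names of the form house_<digits>.png; for the filenames in the question, change it to Pattern.compile("house_number_(\\d+)\\.png").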
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexTest {
public static void main(String[] args){
Pattern p = Pattern.compile("house_(\\d+)\\.png");
Matcher m = p.matcher("house_234.png");
if (m.find()) {
System.out.println(m.group(1)); //print the number
}
}
}
result
234

If you want to do it without regex:
/* assume valid input */
public int getNumber(String filePath)
{
int startPos = filePath.lastIndexOf("_");
int dotPos = filePath.indexOf(".", startPos);
String numberString = filePath.substring(startPos + 1, dotPos);
return Integer.parseInt(numberString);
}

Pattern intsOnly = Pattern.compile("\\d+");
Matcher makeMatch = intsOnly.matcher("house_number_5.png");
makeMatch.find();
String inputInt = makeMatch.group();
System.out.println(inputInt);

Get the filename.
Remove the .png extension using the substring(..) method.
Split the rest on the underscore '_' using the split(..) method (or a StringTokenizer).
The third token will be the number; convert it to an integer using parseInt.

Replace-all works with regular expressions:
"house_number_52352.png".replaceAll (".*[^0-9]([0-9]+)\\.png", "$1")
.*[^0-9] take a long chain of characters, which end in a non digit ...
followed by at least one digit
and a literal dot
and a literal png
Replace the whole thing by the group of (at least one digit).

replace StringTokenizer by String.split(..)

Is it possible to build a regexp for use with Java's Pattern.split(..) method that reproduces the StringTokenizer("...", "...", true) behaviour?
That is, the input should be split into an alternating sequence of the predefined token characters and any arbitrary strings running between them.
The JRE reference states that StringTokenizer should be considered deprecated and that String.split(..) could be used instead. So it is considered possible there.
The reason I want to use split is that regular expressions are often highly optimized. The StringTokenizer, for example, is quite slow on the Android platform's VM, while regex patterns seem to be executed by optimized native code there.

Considering that the documentation for split doesn't specify this behavior, and that its only optional parameter controls how large the array can be: no, you can't.
Also, the only other class I can think of that could have this feature, a Scanner, doesn't have it either. So I think the easiest option would be to continue using the tokenizer, even if it's deprecated. That is better than writing your own class: while that shouldn't be too hard (quite trivial, really), I can think of better ways to spend one's time.

a regex Pattern can help you
Pattern p = Pattern.compile("(.*?)(\\s+)");
//put the boundary regex in between the second brackets (where the \\s+ now is);
//it must not match the empty string, otherwise the loop below never advances
Matcher m = p.matcher(string);
int endindex=0;
while(m.find(endindex)){
//m.group(1) is the part between the pattern
//m.group(2) is the match found of the pattern
endindex = m.end();
}
//then the remainder of the string is string.substring(endindex);

import java.util.List;
import java.util.LinkedList;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class Splitter {
public Splitter(String s, String delimiters) {
this.string = s;
this.delimiters = delimiters;
Pattern pattern = Pattern.compile(delimiters);
this.matcher = pattern.matcher(string);
}
public String[] split() {
String[] strs = string.split(delimiters, -1); // -1 keeps trailing empty strings, so strs always has one more element than delims
String[] delims = delimiters();
if (strs.length == 0) { return new String[0];}
assert(strs.length == delims.length + 1);
List<String> output = new LinkedList<String>();
int i;
for(i = 0;i < delims.length;i++) {
output.add(strs[i]);
output.add(delims[i]);
}
output.add(strs[i]);
return output.toArray(new String[0]);
}
private String[] delimiters() {
List<String> delims = new LinkedList<String>();
while(matcher.find()) {
delims.add(string.subSequence(matcher.start(), matcher.end()).toString());
}
return delims.toArray(new String[0]);
}
public static void main(String[] args) {
Splitter s = new Splitter("a b\tc", "[ \t]");
String[] tokensanddelims = s.split();
assert(tokensanddelims.length == 5);
System.out.print(tokensanddelims[0].equals("a"));
System.out.print(tokensanddelims[1].equals(" "));
System.out.print(tokensanddelims[2].equals("b"));
System.out.print(tokensanddelims[3].equals("\t"));
System.out.print(tokensanddelims[4].equals("c"));
}
private Matcher matcher;
private String string;
private String delimiters;
}
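
As an alternative to a helper class, a split on lookaround assertions appears to give the same alternating sequence of tokens and delimiters. The sketch below assumes single-character delimiters passed as a regex character class; DelimiterKeepingSplit and splitKeepingDelimiters are names invented for this example.

```java
import java.util.Arrays;

public class DelimiterKeepingSplit {
    // Splits directly before and directly after every delimiter character, so
    // each delimiter comes back as a token of its own.
    // delimiterClass is a regex character class such as "[ \t]".
    public static String[] splitKeepingDelimiters(String input, String delimiterClass) {
        String regex = "(?<=" + delimiterClass + ")|(?=" + delimiterClass + ")";
        return input.split(regex);
    }

    public static void main(String[] args) {
        // prints [a, +, b, -, c]
        System.out.println(Arrays.toString(splitKeepingDelimiters("a+b-c", "[+-]")));
    }
}
```

Because both assertions are zero-width, nothing is consumed by the split, which is why the delimiters survive. On Java versions before 8, an input that starts with a delimiter would additionally produce a leading empty string.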
