I am using the following logic to extract a double value from a String. It works fine on the first String but raises an exception on the second String. The exception raised is java.util.NoSuchElementException.
public class StringHandling {
public String processString(String string)
{
Scanner st = new Scanner(string);
while (!st.hasNextDouble())
{
st.next();
}
double value = st.nextDouble();
return String.valueOf(value);
}
public static void main(String[] args)
{
String first = "Hey, he is 70.3 miles away.";
String second = "{\"Hey\", \"he\" \"is\": 1.0, \"miles\" away}";
StringHandling sh = new StringHandling();
System.out.println("First Value is "+sh.processString(first));
System.out.println("Second Value is "+sh.processString(second));
}
}
I just want to know why it is raising the Exception.
This is the problem:
"{\"Hey\", \"he\" \"is\": 1.0, \"miles\" away}"
By default, the next method of class Scanner returns the next token, that is, everything up to the next whitespace.
So next would split your string like this:
{"Hey",
"he"
"is":
1.0,
"miles"
away}
In that case, you have 1.0, which is not a double (note the comma).
That's why you get a NoSuchElementException: you keep doing st.next(), but a double is never found, so the end of the string gets reached and the Scanner doesn't find other elements.
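To see the tokens for yourself, here is a small sketch. The class and method names are made up for illustration, and Locale.US is my assumption so that the result does not depend on the machine's default locale. It lists what next() returns with the default whitespace delimiter and checks whether each token would be accepted as a double.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

class ScannerTokens {
    // Collects every token the Scanner yields with its default (whitespace) delimiter.
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        try (Scanner sc = new Scanner(input)) {
            while (sc.hasNext()) {
                result.add(sc.next());
            }
        }
        return result;
    }

    // Reports whether a Scanner would accept the given token as a double.
    // Locale.US is an assumption: '.' is the decimal separator.
    static boolean isDoubleToken(String token) {
        try (Scanner sc = new Scanner(token).useLocale(Locale.US)) {
            return sc.hasNextDouble();
        }
    }

    public static void main(String[] args) {
        String second = "{\"Hey\", \"he\" \"is\": 1.0, \"miles\" away}";
        for (String t : tokens(second)) {
            System.out.println(t + " -> double? " + isDoubleToken(t));
        }
    }
}
```

None of the six tokens passes the check, which is why the loop in the question runs off the end of the input.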
You could use a regex pattern instead of the scanner:
public String processString(String string) {
Pattern p = Pattern.compile("(\\d+(?:\\.\\d+))");
Matcher m = p.matcher(string);
// Only the first match is needed, so a single find() is enough
if (m.find()) {
return m.group(1);
}
return null;
}
The Problem
By default, a Scanner splits on white space. So your second example finds 1.0, as one of the inputs, which it doesn't recognise as a double.
Your code also assumes that you'll exit the while loop because you found a double. In this case, there is no value to read so you get an exception.
Solution 1 — Change Delimiter
One option is to change the delimiter used by your Scanner object. Perhaps to something like this:
st.useDelimiter("[^\\w.]");
This will break your input into "words" containing any combination of alphanumeric characters, underscores and periods.
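A minimal sketch of how the delimiter change and a guarded loop might fit together. The class name, the Optional return type and the Locale.US setting are my own assumptions, and I appended + to the delimiter so that runs of separators collapse instead of producing empty tokens.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.Scanner;

class DelimiterSketch {
    // Returns the first double found, or an empty Optional instead of throwing.
    static Optional<Double> firstDouble(String input) {
        try (Scanner st = new Scanner(input)) {
            st.useLocale(Locale.US);      // assumption: '.' is the decimal separator
            st.useDelimiter("[^\\w.]+");  // the delimiter above, with '+' added
            while (st.hasNext()) {        // stop cleanly when the input runs out
                if (st.hasNextDouble()) {
                    return Optional.of(st.nextDouble());
                }
                st.next();                // skip a token that is not a double
            }
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstDouble("Hey, he is 70.3 miles away."));
        System.out.println(firstDouble("{\"Hey\", \"he\" \"is\": 1.0, \"miles\" away}"));
    }
}
```

Checking hasNext() before each step is what removes the NoSuchElementException: when no double exists, the method returns an empty Optional instead.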
Solution 2 — Regex
You can find this value with a regular expression. The example below looks for numbers that include a decimal point and at least one decimal place:
public static String findWithRegex(String string) {
Pattern pattern = Pattern.compile("\\d+\\.\\d+");
Matcher matcher = pattern.matcher(string);
if (matcher.find()) {
return matcher.group();
} else {
// do something more useful here
return "none";
}
}
Related
I have several strings in the rough form:
[some text] [some number] [some more text]
I want to extract the text in [some number] using the Java Regex classes.
I know roughly what regular expression I want to use (though all suggestions are welcome). What I'm really interested in are the Java calls to take the regex string and use it on the source data to produce the value of [some number].
EDIT: I should add that I'm only interested in a single [some number] (basically, the first instance). The source strings are short and I'm not going to be looking for multiple occurrences of [some number].
Full example:
private static final Pattern p = Pattern.compile("^([a-zA-Z]+)([0-9]+)(.*)");
public static void main(String[] args) {
// create matcher for pattern p and given string
Matcher m = p.matcher("Testing123Testing");
// if an occurrence of the pattern was found in the given string...
if (m.find()) {
// ...then you can use group() methods.
System.out.println(m.group(0)); // whole matched expression
System.out.println(m.group(1)); // first expression from round brackets (Testing)
System.out.println(m.group(2)); // second one (123)
System.out.println(m.group(3)); // third one (Testing)
}
}
Since you're looking for the first number, you can use a regexp like this:
^\D+(\d+).*
and m.group(1) will return the first number. Note that signed numbers can contain a minus sign. A greedy \D+ would swallow that minus sign before the group sees it, so make the prefix lazy:
^\D*?(-?\d+).*
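Here is a rough sketch of the signed case in Java. The class and method names are hypothetical, and the lazy \D*? prefix is my choice so that a minus sign directly in front of the digits ends up in the capture group rather than in the non-digit prefix.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstSignedNumber {
    // \D*? is lazy, so it cannot consume a '-' that belongs to the number.
    private static final Pattern FIRST_NUMBER = Pattern.compile("^\\D*?(-?\\d+).*");

    // Returns the first (optionally negative) integer, or empty if there is none.
    // Integer.parseInt will still throw for digit runs that overflow an int.
    static OptionalInt firstNumber(String input) {
        Matcher m = FIRST_NUMBER.matcher(input);
        if (m.find()) {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(firstNumber("Testing123Testing"));
        System.out.println(firstNumber("offset -42 units"));
    }
}
```

Be aware that any hyphen immediately before the digits is read as a minus sign, so this only suits input where that interpretation is wanted.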
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Regex1 {
public static void main(String[]args) {
Pattern p = Pattern.compile("\\d+");
Matcher m = p.matcher("hello1234goodboy789very2345");
while(m.find()) {
System.out.println(m.group());
}
}
}
Output:
1234
789
2345
Allain basically has the java code, so you can use that. However, his expression only matches if your numbers are only preceded by a stream of word characters.
"(\\d+)"
should be able to find the first string of digits. You don't need to specify what's before it, if you're sure that it's going to be the first string of digits. Likewise, there is no use to specify what's after it, unless you want that. If you just want the number, and are sure that it will be the first string of one or more digits then that's all you need.
If you expect it to be offset by spaces, specifying them makes the match even more distinct:
"\\s+(\\d+)\\s+"
If you need all three parts, this will do:
"(\\D+)(\\d+)(.*)"
EDIT The Expressions given by Allain and Jack suggest that you need to specify some subset of non-digits in order to capture digits. If you tell the regex engine you're looking for \d then it's going to ignore everything before the digits. If J or A's expression fits your pattern, then the whole match equals the input string. And there's no reason to specify it. It probably slows a clean match down, if it isn't totally ignored.
In addition to Pattern, the Java String class also has several methods that can work with regular expressions. In your case the code will be:
"ab123abc".replaceFirst("\\D*(\\d*).*", "$1")
where \\D is a non-digit character.
In Java 1.4 and up:
String input = "...";
Matcher matcher = Pattern.compile("[^0-9]+([0-9]+)[^0-9]+").matcher(input);
if (matcher.find()) {
String someNumberStr = matcher.group(1);
// if you need this to be an int:
int someNumberInt = Integer.parseInt(someNumberStr);
}
This function collects all matching sequences from a string. In this example it takes all email addresses from the string.
static final String EMAIL_PATTERN = "[_A-Za-z0-9-\\+]+(\\.[_A-Za-z0-9-]+)*#"
+ "[A-Za-z0-9-]+(\\.[A-Za-z0-9]+)*(\\.[A-Za-z]{2,})";
public List<String> getAllEmails(String message) {
List<String> result = null;
Matcher matcher = Pattern.compile(EMAIL_PATTERN).matcher(message);
if (matcher.find()) {
result = new ArrayList<String>();
result.add(matcher.group());
while (matcher.find()) {
result.add(matcher.group());
}
}
return result;
}
For message = "adf#gmail.com, <another#osiem.osiem>>>> lalala#aaa.pl" it will create a List of 3 elements.
Try doing something like this:
// \\D+ instead of a greedy .+ so that the group captures all of 123, not just the last digit
Pattern p = Pattern.compile("^\\D+(\\d+).*");
Matcher m = p.matcher("Testing123Testing");
if (m.find()) {
System.out.println(m.group(1));
}
Simple Solution
// Regexplanation:
// ^ beginning of line
// \\D+ 1+ non-digit characters
// (\\d+) 1+ digit characters in a capture group
// .* 0+ any character
String regexStr = "^\\D+(\\d+).*";
// Compile the regex String into a Pattern
Pattern p = Pattern.compile(regexStr);
// Create a matcher with the input String
Matcher m = p.matcher(inputStr);
// If we find a match
if (m.find()) {
// Get the String from the first capture group
String someDigits = m.group(1);
// ...do something with someDigits
}
Solution in a Util Class
public class MyUtil {
private static Pattern pattern = Pattern.compile("^\\D+(\\d+).*");
private static Matcher matcher = pattern.matcher("");
// Assumptions: inputStr is a non-null String
public static String extractFirstNumber(String inputStr){
// Reset the matcher with a new input String
matcher.reset(inputStr);
// Check if there's a match
if(matcher.find()){
// Return the number (in the first capture group)
return matcher.group(1);
}else{
// Return some default value, if there is no match
return null;
}
}
}
...
// Use the util function and print out the result
String firstNum = MyUtil.extractFirstNumber("Testing4234Things");
System.out.println(firstNum);
You can do it using StringTokenizer:
String str = "as:"+123+"as:"+234+"as:"+345;
StringTokenizer st = new StringTokenizer(str,"as:");
while(st.hasMoreTokens())
{
String k = st.nextToken(); // you will get first numeric data i.e 123
int kk = Integer.parseInt(k);
System.out.println("k string token in integer " + kk);
String k1 = st.nextToken(); // you will get second numeric data i.e 234
int kk1 = Integer.parseInt(k1);
System.out.println("new string k1 token in integer :" + kk1);
String k2 = st.nextToken(); // you will get third numeric data i.e 345
int kk2 = Integer.parseInt(k2);
System.out.println("k2 string token is in integer : " + kk2);
}
Since we store these numeric values in three different variables, we can use them anywhere else in the code.
How about [^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).* I think it would take care of numbers with fractional part.
I included white spaces and included , as possible separator.
I'm trying to get the numbers out of a string including floats and taking into account that the user might make a mistake and include white spaces while typing the number.
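As a sketch of how that pattern could be turned into an actual double: the class and method names below are invented for illustration, and treating the comma as a decimal separator is an assumption taken from the description above.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LooseNumberSketch {
    // The pattern from the answer: digits, an optional '.' or ',', optional spaces around it.
    private static final Pattern LOOSE_NUMBER =
            Pattern.compile("[^\\d]*([0-9]+[\\s]*[.,]{0,1}[\\s]*[0-9]*).*");

    // Returns the first loosely typed number, or empty if the input has no digits.
    static OptionalDouble parseLoose(String input) {
        Matcher m = LOOSE_NUMBER.matcher(input);
        if (!m.matches()) {
            return OptionalDouble.empty();
        }
        String cleaned = m.group(1)
                .replaceAll("\\s", "")  // drop the stray whitespace
                .replace(',', '.');     // assumption: ',' is a decimal separator
        return OptionalDouble.of(Double.parseDouble(cleaned));
    }

    public static void main(String[] args) {
        System.out.println(parseLoose("weight: 12 , 5 kg"));
        System.out.println(parseLoose("70.3 miles"));
    }
}
```

Two caveats follow from the pattern itself: two integers separated only by a space, such as "1 2", are glued together into 12, and a thousands separator as in "1,000.5" is misread as a decimal point.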
Sometimes you can use the simple .split("REGEXP") method available in java.lang.String. For example:
String input = "first,second,third";
//To retrieve 'first'
input.split(",")[0]
//second
input.split(",")[1]
//third
input.split(",")[2]
If you are reading from a file, then this can help you:
try{
InputStream inputStream = (InputStream) mnpMainBean.getUploadedBulk().getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(inputStream));
String line;
//Ref:03
while ((line = br.readLine()) != null) {
if (line.matches("[A-Z],\\d,(\\d*,){2}(\\s*\\d*\\|\\d*:)+")) {
String[] splitRecord = line.split(",");
//do something
}
else{
br.close();
//error
return;
}
}
br.close();
}
catch (IOException ioException){
// getMessage() gives readable text; concatenating getStackTrace() would only print an array reference
logger.logDebug("Exception " + ioException.getMessage());
}
Pattern p = Pattern.compile("(\\D+)(\\d+)(.*)");
Matcher m = p.matcher("this is your number:1234 thank you");
if (m.find()) {
String someNumberStr = m.group(2);
int someNumberInt = Integer.parseInt(someNumberStr);
}
I have a String like the ones shown below. I need to extract the number 123 from it. The number can be at any position, but there will be only one number in a string and it will always be in the same format, _number_
text_data_123
text_data_123_abc_count
text_data_123_abc_pqr_count
text_tery_qwer_data_123
text_tery_qwer_data_123_count
text_tery_qwer_data_123_abc_pqr_count
Below is the code:
String value = "text_data_123_abc_count";
// the code below will not work, as index 2 is not a number in some of the examples above
int textId = Integer.parseInt(value.split("_")[2]);
What is the best way to do this?
With a little guava magic:
String value = "text_data_123_abc_count";
Integer id = Ints.tryParse(CharMatcher.inRange('0', '9').retainFrom(value));
see also CharMatcher doc
\\d+
this regex with find should do it for you.
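For completeness, a small sketch of that idea using only the standard library. The class and method names are hypothetical, and returning an OptionalInt is just one way to signal that no number was found.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdExtractor {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the first run of digits as an int, or empty if the value has none.
    static OptionalInt extractId(String value) {
        Matcher m = DIGITS.matcher(value);
        if (m.find()) {
            return OptionalInt.of(Integer.parseInt(m.group()));
        }
        return OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractId("text_data_123"));
        System.out.println(extractId("text_tery_qwer_data_123_abc_pqr_count"));
    }
}
```

Because find() scans for the digits wherever they are, the position of the number between the underscores does not matter.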
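Use lookbehind and lookahead assertions. The lookahead also accepts the end of the string, so a trailing number such as text_data_123 is matched as well (s is your input string):
Matcher m = Pattern.compile("(?<=_)\\d+(?=_|$)").matcher(s);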
while(m.find())
{
System.out.println(m.group());
}
You can use replaceAll to remove all non-digits to leave only one number (since you say there will be only 1 number in the input string):
String s = "text_data_123_abc_count".replaceAll("[^0-9]", "");
Instead of [^0-9] you can use \D (which also means non-digit):
String s = "text_data_123_abc_count".replaceAll("\\D", "");
Given current requirements and restrictions, the replaceAll solution seems the most convenient (no need to use Matcher directly).
You can split the string into its parts and compare each part with its uppercase version. If they are equal, as they are for a part made up only of digits, you can parse it as a number and save it:
public class Main {
public static void main(String[] args) {
String txt = "text_tery_qwer_data_123_abc_pqr_count";
String[] words = txt.split("_");
int num = 0;
for (String t : words) {
if (t.equals(t.toUpperCase())) // compare contents with equals(), not references with ==
num = Integer.parseInt(t);
}
System.out.println(num);
}
}
I'm trying to parse an HTML tag. So far I have the text, which can be as follows:
"Guide Price £50,000"
or
"£50,000"
or even
"£50,000 - £55,000"
In the third case to make things simpler all I need is the first price listed.
My question is how I can convert these strings into an int or double, preferably an int as the numbers are quite large. Would a number formatter do this, or would I need a regular expression, especially if some text trails the tag block?
Here is what I have so far:
String priceNumber = url.select("span.price").text(); // using the JSoup library
priceNumber = priceNumber.replaceAll("[^\\d.]", "");
This removes everything that is not a digit or a period, I think.
What if the example has 2 numbers in it how do I get the first?
Use a regex with Matcher.find to search for occurrences, then remove the commas and try to parse. Here's the decimal case:
String input = "£50,000 - £55,000";
Pattern regex = Pattern.compile("\\d[\\d,\\.]+");
Matcher finder = regex.matcher(input);
if( finder.find() ) { // or while() if you want to process each
try {
double value = Double.parseDouble(finder.group(0).replaceAll(",", ""));
// do something with value
} catch (NumberFormatException e ) {
// handle unparseable
}
}
You can convert a String to an int or a double with Integer.parseInt(yourString) or Double.parseDouble(yourString) respectively, once everything except the digits has been stripped out.
In your first and second cases this would get you 50000.
In the third case you need to split the string in two first and then repeat the trick.
Your title is a bit misleading as you are not asking on how to convert from pound to lets say euro.
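The split-then-parse idea described above could look roughly like this. The class and method names are made up, and the sketch assumes whole-pound prices with a hyphen between the two values of a range. The pound sign is written as \u00A3 only to sidestep source-encoding issues.

```java
class FirstPrice {
    // Isolates the text before the first '-', strips non-digits, then parses.
    // Throws NumberFormatException if that part contains no digits at all.
    static int firstPrice(String text) {
        String first = text.split("-", 2)[0];            // limit 2: at most one split
        String digits = first.replaceAll("[^\\d]", "");  // drop currency sign, commas, words
        return Integer.parseInt(digits);
    }

    public static void main(String[] args) {
        System.out.println(firstPrice("Guide Price \u00A350,000"));
        System.out.println(firstPrice("\u00A350,000 - \u00A355,000"));
    }
}
```

A price with pence, such as 5.00, would come out as 500 here, so a decimal-aware pattern is needed if such values can occur.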
Use a regex to remove the unimportant characters and then parse the result as a double. You can then truncate to int if you only care about dollar values.
NumberFormat format = NumberFormat.getInstance();
format.parse(priceNumber.replaceAll("[^\\d]*([\\d,]*).*", "$1")).doubleValue()
The first part of the replace pattern [^\\d] matches and throws away leading characters, the second part ([\\d,]) saves the next series of digits and commas, then the third part .* throws away the rest of the input.
Then the whole input is replaced with the contents of the first saved match (the second part of the replace pattern).
Then you use the NumberFormat class to parse the number (you could use Double.parseDouble() if it weren't for the comma)
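Put together as a self-contained sketch, under a few assumptions of mine: the class and method names are invented, Locale.UK is chosen so that the comma is treated as a grouping separator regardless of the default locale, and a missing number is reported as an empty OptionalDouble.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;
import java.util.OptionalDouble;

class NumberFormatPrice {
    // Keeps only the first run of digits and commas, then lets NumberFormat parse it.
    static OptionalDouble firstPrice(String priceText) {
        String firstNumber = priceText.replaceAll("[^\\d]*([\\d,]*).*", "$1");
        NumberFormat format = NumberFormat.getInstance(Locale.UK); // assumption: ',' groups thousands
        try {
            return OptionalDouble.of(format.parse(firstNumber).doubleValue());
        } catch (ParseException e) {
            // Nothing parseable was left, for example when the text has no digits
            return OptionalDouble.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(firstPrice("\u00A350,000 - \u00A355,000"));
    }
}
```

NumberFormat.parse declares a checked ParseException, which is why the call is wrapped in a try block.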
This will work I think!
String string = "This is £50,000 pounds, this is £5.00 pounds.";
String newString = string;
while (string.contains("£")) {
if (string.indexOf("£") != -1) {
// it contains £
string = string.substring(string.indexOf("£"));
newString = string.substring(0, string.indexOf(" "));
string = string.replaceFirst(newString, "");
newString = newString.replaceAll("£", "");
newString = newString.replaceAll(",", "");
double money = Double.parseDouble(newString);
System.out.println(money);
}
}
you can try this out (for all the cases),
String priceNumber = "£500001 wcjnwknv122333- £55,000";
String regex = "£(\\d+,?\\d+)\\D?";
Pattern p =Pattern.compile(regex);
Matcher m = p.matcher(priceNumber);
if(m.find()){
System.out.println(m.group(1));
}
Try below regex :
((\$|£)\d+\s|(\$|£)\d+-(\$|£)\d+\s)
I have a string which looks like following:
Turns 13,000,000 years old
Now I want to convert the digits to words in English. I have a function ready for that; however, I am having trouble detecting the original number (13,000,000 in this case) because it is separated by commas.
Currently I am using the following regex to detect a number in a string:
stats = stats.replace((".*\\d.*"), (NumberToWords.start(Integer.valueOf(notification_data_greet))));
But the above does not seem to work. Any suggestions?
You need to extract the number using a regex which allows for the commas. The most robust one I can think of right now is
\d{1,3}(,?\d{3})*
which matches any unsigned integer, both with correctly placed commas and without commas (and weird combinations thereof, like 100,000000).
Then replace all , from the match by the empty String and you can parse as usual:
Pattern p = Pattern.compile("\\d{1,3}(,?\\d{3})*"); // You can store this as static final
Matcher m = p.matcher(input);
while (m.find()) { // Go through all matches
String num = m.group().replace(",", "");
int n = Integer.parseInt(num);
// Do stuff with the number n
}
Working example:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) throws InterruptedException {
String input = "1,300,000,000";
Pattern p = Pattern.compile("\\d{1,3}(,?\\d{3})*"); // You can store this as static final
Matcher m = p.matcher(input);
while (m.find()) { // Go through all matches
String num = m.group().replace(",", "");
System.out.println(num);
int n = Integer.parseInt(num);
System.out.println(n);
}
}
}
Gives output
1300000000
1300000000
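If the goal is to swap the number for its words inside the original sentence, something along these lines might work. It is only a sketch: the class name is hypothetical, the converter parameter stands in for your own number-to-words function, and I parse with Long.parseLong so that values above the int range do not fail.

```java
import java.util.function.LongFunction;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumberReplacer {
    private static final Pattern NUMBER = Pattern.compile("\\d{1,3}(,?\\d{3})*");

    // Replaces each number in the text with whatever the converter returns for it.
    static String replaceNumbers(String text, LongFunction<String> converter) {
        Matcher m = NUMBER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            long n = Long.parseLong(m.group().replace(",", ""));
            // quoteReplacement keeps '$' and '\' in the converter output literal
            m.appendReplacement(out, Matcher.quoteReplacement(converter.apply(n)));
        }
        m.appendTail(out); // copy the text after the last match
        return out.toString();
    }

    public static void main(String[] args) {
        // Stand-in converter; swap in your own number-to-words function here.
        System.out.println(replaceNumbers("Turns 13,000,000 years old", n -> "[" + n + "]"));
    }
}
```

With the stand-in converter the sentence keeps its surrounding words and only the number is rewritten.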
Try this regex:
[0-9][0-9]?[0-9]?((,)?[0-9][0-9][0-9])*
This matches numbers that are separated by a comma for each 1000; the optional comma has to come before each group of three digits. So it will match
10,000,000
but not
10,1,1,1
You can do it with the help of DecimalFormat instead of a regular expression
DecimalFormat format = (DecimalFormat) DecimalFormat.getInstance();
System.out.println(format.parse("10,000,000"));
Try the regex below to match comma-separated numbers:
\d{1,3}(,\d{3})+
Make the last part optional to also match numbers that aren't separated by commas:
\d{1,3}(,\d{3})*
I have a string that contains a few numbers (usually a date) and separators. The separators can be either "," or ".", for example 01.05,2000.5000
Now I need to separate those numbers and put them into an array, but I'm not sure how to do the separating part. Also, I need to check that the string is valid: it cannot be 01.,05.
I'm not asking for anyone to solve the thing for me (but if someone wants to, I'd appreciate it); just point me in the right direction :)
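This is a way of doing it with the StringTokenizer class: iterate over the tokens and convert each one to an integer with the parseInt method to check that it is a valid integer. Note that StringTokenizer skips adjacent separators and never returns an empty token, so the empty-token check in the code never fires; input such as 01.,05. has to be rejected by a separate check.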
import java.util.*;
public class t {
public static void main(String... args) {
String line = "01.05,2000.5000";
StringTokenizer strTok = new StringTokenizer(line, ",.");
List<Integer> values = new ArrayList<Integer>();
while (strTok.hasMoreTokens()) {
String s = strTok.nextToken();
if (s.length() == 0) {
// Found a repeated separator, String is not valid, do something about it
}
try {
int value = Integer.parseInt(s, 10);
values.add(value);
} catch(NumberFormatException e) {
// Number not valid, do something about it or continue the parsing
}
}
// At the end, get an array from the ArrayList
Integer[] arrayOfValues = values.toArray(new Integer[values.size()]);
for (Integer i : arrayOfValues) {
System.out.println(i);
}
}
}
Iterate through the array generated by String#split(regex) and check each value to make sure your source String is "valid".
For example:
String src = "01.05,2000.5000";
String[] numbers = src.split("[.,]");
numbers here will be an array of Strings, like {"01", "05", "2000", "5000"}. Each value is a number.
Now iterate over numbers. If you find an element that is not a number (it is a number when numbers[i].matches("\\d+") is true), then your src is invalid.
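A compact sketch of that check, with a hypothetical class name and an Optional return type of my choosing. It passes a limit of -1 to split, which is an addition to the approach above: without it Java drops trailing empty strings, so an input ending in a separator would slip through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

class SeparatedNumbers {
    // Returns the parsed numbers, or an empty Optional if any piece is not a number.
    static Optional<List<Integer>> parse(String src) {
        // Limit -1 keeps trailing empty strings, so "01.05." is rejected too.
        String[] pieces = src.split("[.,]", -1);
        List<Integer> numbers = new ArrayList<>();
        for (String piece : pieces) {
            if (!piece.matches("\\d+")) {
                return Optional.empty(); // empty piece (doubled separator) or non-digit text
            }
            numbers.add(Integer.parseInt(piece)); // may still throw if the value overflows an int
        }
        return Optional.of(numbers);
    }

    public static void main(String[] args) {
        System.out.println(parse("01.05,2000.5000"));
        System.out.println(parse("01.,05."));
    }
}
```

A doubled separator shows up as an empty piece, which fails the \d+ test, so 01.,05. is reported as invalid.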
If possible, I would use guava String splitter for that. It is much more reliable, predictable and flexible than String#split. You can tell it exactly what to expect, what to omit, and so on.
For an example usage, and a small rant on how stupid javas split sometimes behaves, have a look here: http://code.google.com/p/guava-libraries/wiki/StringsExplained#Splitter
Use regex to group and match the input
String s = "01.05,2000.5000";
Pattern pattern = Pattern.compile("(\\d{2})[.,](\\d{2})[.,](\\d{4})[.,](\\d{4})");
Matcher m = pattern.matcher(s);
if(m.matches()) {
String[] matches = { m.group(1),m.group(2), m.group(3),m.group(4) };
for(String match : matches) {
System.out.println(match);
}
} else {
System.err.println("Mismatch");
}
Try this:
String str = "01.05,2000.5000";
str = str.replace(".",",");
int number = StringUtils.countMatches(str, ",");
String[] arrayStr = new String[number+1];
arrayStr = str.split(",");
StringUtils is from Apache Commons >> http://commons.apache.org/proper/commons-lang/
To validate:
if (input.matches("^(?!.*[.,]{2})[\\d.,]+"))
This regex checks that:
dot and comma are never adjacent
input is comprised only of digits, dots and commas
To split:
String[] numbers = input.split("[.,]");
In order to separate the string, use split(), the argument of the method is the delimiter
array = string.split("separator");