regex, string to float number, get rid of other chars - java

With Strings like 123.456mm, I would like to get one String with the number and another with the unit of measurement. So in the above case, one String with 123.456 and the other with mm. So far I have this:
String str = "123.456mm";
String length = str.replaceAll("[\\D|\\.*]+","");
String lengthMeasurement = str.replaceAll("[\\W\\d]+","");
System.out.println(length + " " + lengthMeasurement);
The output is:
123456 mm
The dot is gone and I can't get it back.
How can I keep the dots?

You can use:
String str = "123.456mm";
String length = str.replaceAll("[^\\d.]+",""); // 123.456
String lengthMeasurement = str.replaceAll("[\\d.]+",""); // mm

Try:
String str = "123.456mm";
String str1 = str.replaceAll("[a-zA-Z]", "");
String str2 = str.replaceAll("\\d|\\.", "");
System.out.println(str1);
System.out.println(str2);
Output:
123.456
mm

Try Pattern and Matcher with the regex below, and read the matched groups at index 1 and 2.
(\d+\.?\d*)(\D+)
Try the sample code below:
String str = "123.456mm";
Pattern p = Pattern.compile("(\\d+\\.?\\d*)(\\D+)");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println("Length: " + m.group(1));
System.out.println("Measurement : " + m.group(2));
}
Output:
Length: 123.456
Measurement : mm
Pattern description:
( group and capture to \1:
\d+ digits (0-9) (1 or more times)
\.? '.' (optional (0 or 1 time))
\d* digits (0-9) (0 or more times)
) end of \1
( group and capture to \2:
\D+ non-digits (all but 0-9) (1 or more times)
) end of \2
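The question title asks for a float, so the captured number can go straight into Double.parseDouble. A minimal sketch using the same pattern, assuming the number comes before the unit; the class and method names (LengthParser, value, unit) are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LengthParser {
    // Same pattern as above: group 1 = number, group 2 = unit
    private static final Pattern LENGTH = Pattern.compile("(\\d+\\.?\\d*)(\\D+)");

    // Returns the numeric part as a double, or NaN if the input does not match
    static double value(String s) {
        Matcher m = LENGTH.matcher(s);
        return m.find() ? Double.parseDouble(m.group(1)) : Double.NaN;
    }

    // Returns the unit part, or an empty string if the input does not match
    static String unit(String s) {
        Matcher m = LENGTH.matcher(s);
        return m.find() ? m.group(2) : "";
    }

    public static void main(String[] args) {
        String str = "123.456mm";
        System.out.println(value(str) + " " + unit(str)); // 123.456 mm
    }
}
```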

Related

Removing whitespaces at the beginning of the string with Regex gives null Java

I would like to get groups from a string that is loaded from a txt file. The file looks something like this (notice the space at the beginning of the file):
as431431af,87546,3214| 5a341fafaf,3365,54465 | 6adrT43 , 5678 , 5655
The first part of each group, up to the first comma, can contain digits and letters; the second and third parts contain only digits. After each | the same structure repeats.
First, I load the txt file into a string:
String readFile3 = readFromTxtFile("/resources/file.txt");
Then I remove all whitespace with a regex:
String no_whitespace = readFile3.replaceAll("\\s+", "");
After that I try to get the groups:
Pattern p = Pattern.compile("[a-zA-Z0-9]*,\\d*,\\d*", Pattern.MULTILINE);
Matcher m = p.matcher(no_whitespace);
int lastMatchPos = 0;
while (m.find()) {
System.out.println(m.group());
lastMatchPos = m.end();
}
if (lastMatchPos != no_whitespace.length())
System.out.println("Invalid string!");
Now, for each group, I would like to remove the "," and assign every value to its own variable, but I am getting these groups (notice the null):
nullas431431af,87546,3214
5a341fafaf,3365,54465
6adrT43,5678,5655
What am I doing wrong? Even when I physically remove the space from the beginning of the txt file, the same result occurs.
Is there an easier way to get the groups in this string with regex and assign each part, separated by ",", to its own variable?
You can split on | surrounded by optional whitespace, and then split each resulting item on , surrounded by optional whitespace:
String str = "as431431af,87546,3214| 5a341fafaf,3365,54465 | 6adrT43 , 5678 , 5655";
String[] items = str.split("\\s*\\|\\s*");
List<String[]> res = new ArrayList<>();
for(String i : items) {
String[] parts = i.split("\\s*,\\s*");
res.add(parts);
System.out.println(parts[0] + " - " + parts[1] + " - " + parts[2]);
}
This prints:
as431431af - 87546 - 3214
5a341fafaf - 3365 - 54465
6adrT43 - 5678 - 5655
The results are in the res list.
Note that
\s* - matches zero or more whitespaces
\| - matches a pipe char
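To assign each part to its own variable, as the question asks, the split results can be converted into a small holder object. A sketch assuming the second and third parts always fit in an int; the Entry class and the parse method are hypothetical names, and trim() takes care of the leading space that the file contains:

```java
import java.util.ArrayList;
import java.util.List;

public class EntryParser {
    // Hypothetical holder for one "id,number,number" group
    static final class Entry {
        final String id;
        final int first;
        final int second;

        Entry(String id, int first, int second) {
            this.id = id;
            this.first = first;
            this.second = second;
        }

        @Override
        public String toString() {
            return id + " - " + first + " - " + second;
        }
    }

    static List<Entry> parse(String str) {
        List<Entry> entries = new ArrayList<>();
        // Split on "|" and then on ",", each with optional surrounding whitespace
        for (String item : str.trim().split("\\s*\\|\\s*")) {
            String[] parts = item.split("\\s*,\\s*");
            entries.add(new Entry(parts[0], Integer.parseInt(parts[1]), Integer.parseInt(parts[2])));
        }
        return entries;
    }

    public static void main(String[] args) {
        String str = " as431431af,87546,3214| 5a341fafaf,3365,54465 | 6adrT43 , 5678 , 5655";
        for (Entry e : parse(str)) {
            System.out.println(e);
        }
    }
}
```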
The pattern that you tried uses only the optional quantifier *, so it could also match nothing but the commas.
You also don't need Pattern.MULTILINE as there are no anchors in the pattern.
You can use 3 capture groups with + as the quantifier, so that each part matches at least one character, and after each part either match a pipe | or assert the end of the string with $:
([a-zA-Z0-9]+),([0-9]+),([0-9]+)(?:\||$)
For example
String readFile3 = "as431431af,87546,3214| 5a341fafaf,3365,54465 | 6adrT43 , 5678 , 5655";
String no_whitespace = readFile3.replaceAll("\\s+", "");
Pattern p = Pattern.compile("([a-zA-Z0-9]+),([0-9]+),([0-9]+)(?:\\||$)");
Matcher matcher = p.matcher(no_whitespace);
while (matcher.find()) {
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println(matcher.group(i));
}
System.out.println("--------------------------------");
}
Output
as431431af
87546
3214
--------------------------------
5a341fafaf
3365
54465
--------------------------------
6adrT43
5678
5655
--------------------------------
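About the null at the start of the first group: readFromTxtFile is not shown, so this is an assumption, but that output is typical of a helper that starts from String result = null; and appends each line with +=. Java string concatenation turns a null reference into the text "null", and the later replaceAll("\\s+", "") glues it to the first value, which would also explain why removing the leading space changes nothing. A minimal sketch of the pitfall and a StringBuilder fix, with the file contents simulated as an array of lines; the class and method names are hypothetical:

```java
public class NullPrefixDemo {
    // Hypothetical buggy accumulator: starts from null, so the first += yields "null" + line
    static String joinBuggy(String[] lines) {
        String result = null;
        for (String line : lines) {
            result += line;
        }
        return result;
    }

    // Fixed version: a StringBuilder starts empty
    static String joinFixed(String[] lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] lines = {" as431431af,87546,3214| 5a341fafaf,3365,54465"};
        System.out.println(joinBuggy(lines).replaceAll("\\s+", "")); // nullas431431af,87546,3214|5a341fafaf,3365,54465
        System.out.println(joinFixed(lines).replaceAll("\\s+", "")); // as431431af,87546,3214|5a341fafaf,3365,54465
    }
}
```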

How to extract the strings between the delimiters '<' and '>' from the string "Rahul<is>an<entrepreneur>"?

How to extract the strings between the delimiters '<' and '>' from the string
“Rahul<is>an<entrepreneur>”
I tried using the substring() method, but I could only extract one string out of the primary string. How can I loop over it and get all the strings between the delimiters?
You could use Pattern and Matcher to find the matches, as in the code below:
String STR = "Rahul<is>an<entrepreneur>";
Pattern pattern = Pattern.compile("<(.*?)>", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(STR);
while (matcher.find()) {
System.out.println(matcher.start() + " " + matcher.end() + " " + matcher.group());
}
The code above prints the start index, the end index, and the matched substring of each match:
5 9 <is>
11 25 <entrepreneur>
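More specifically, if you only want the strings, you can take the substring between each match's start and end indexes, skipping the delimiters (inside the while loop):
STR.substring(matcher.start() + 1, matcher.end() - 1);
This gives you only the matching strings. matcher.group(1) returns the same text, because (.*?) is capturing group 1.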
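Instead of computing substring offsets, you can read capturing group 1 directly, since (.*?) already captures the text between the delimiters. A minimal sketch that collects every match into a list; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BetweenDelimiters {
    // Group 1 holds the text between '<' and '>'; the lazy .*? stops at the first '>'
    private static final Pattern TAG = Pattern.compile("<(.*?)>");

    static List<String> extract(String s) {
        List<String> result = new ArrayList<>();
        Matcher matcher = TAG.matcher(s);
        while (matcher.find()) {
            result.add(matcher.group(1)); // text without the delimiters
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(extract("Rahul<is>an<entrepreneur>")); // [is, entrepreneur]
    }
}
```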
This worked for me:
String str = "Rahul<is>an<entrepreneur>";
String[] tempStr = str.split("<");
for (String st : tempStr) {
if (st.contains(">")) {
int index = st.indexOf('>');
System.out.println(st.substring(0, index));
}
}
Output:
is
entrepreneur

Java Regex jumps to next match with if clause

I would like to find the first occurrence of m² and the number in front of it, which could be an integer or a decimal number.
E.g.
"some text" 38 m² "some text" ,
"some text" 48,8 m² "some text",
"some text" 48 m² "some text", etc.
What I have so far is:
\d\d,\d\s*(\m\u00B2)|\d\d\s*(\m\u00B2)
Right now this finds all occurrences, although I guess that could be fixed with findFirst(). Any ideas on how to improve the regex itself?
To get the first match, you just need to use Matcher#find() inside an if block:
String rx = "\\d+(?:,\\d+)?\\s*m\\u00B2";
Pattern p = Pattern.compile(rx);
Matcher matcher = p.matcher("E.g. : 4668,68 m² some text, some text 48 m² etc");
if (matcher.find()){
System.out.println(matcher.group());
}
Note that you can get rid of the alternation by using an optional non-capturing group (?:...)?.
Pattern breakdown:
\d+ - 1+ digits
(?:,\d+)? - an optional sequence of a comma followed by 1+ digits
\s* - 0+ whitespace symbols
m\u00B2 - m² (U+00B2 is the superscript two)
This is what I came up with, thanks to your help :) (work in progress; later it should return a BigDecimal value). For now it seems to work:
public static String findArea(String description) {
String tempString = "";
Pattern p = Pattern.compile("\\d+(?:,\\d+)?\\s*m\\u00B2");
Matcher m = p.matcher(description);
if(m.find()) {
tempString = m.group();
}
// remove the m and the superscript 2 (keep only digits and the comma) to parse it to BigDecimal later
tempString = tempString.replaceAll("[^0-9,]","");
System.out.println(tempString);
return tempString;
}
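Since findArea is meant to return a BigDecimal eventually, note that new BigDecimal(String) expects a dot as the decimal separator, so the comma has to be converted first. A minimal sketch, assuming the comma is always a decimal separator (never a thousands separator) and returning null when no area is found; the class and method names are hypothetical. Capturing only the number in group 1 also removes the need for the replaceAll cleanup:

```java
import java.math.BigDecimal;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AreaFinder {
    // Group 1 captures the number; the square-metre suffix (m followed by U+00B2) stays outside it
    private static final Pattern AREA = Pattern.compile("(\\d+(?:,\\d+)?)\\s*m\\u00B2");

    static BigDecimal findAreaValue(String description) {
        Matcher m = AREA.matcher(description);
        if (!m.find()) {
            return null; // no area in the text
        }
        // BigDecimal expects '.', so convert the decimal comma
        return new BigDecimal(m.group(1).replace(',', '.'));
    }

    public static void main(String[] args) {
        System.out.println(findAreaValue("some text 48,8 m\u00B2 some text")); // 48.8
        System.out.println(findAreaValue("some text 38 m\u00B2 some text"));   // 38
    }
}
```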
One simple way of doing it is String.replaceFirst:
description.replaceFirst(@NotNull String regex, @NotNull String replacement)
Javadoc: Replaces the first substring of this string that matches the given regular expression with the given replacement.
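As a sketch of how replaceFirst could pull out the first area: replace the whole text with the captured number. Two caveats: (?s) lets . match line breaks in multi-line descriptions, and replaceFirst returns the input unchanged when nothing matches, so the sketch compares the result with the input. The class and method names are hypothetical:

```java
public class FirstAreaReplace {
    // Lazy .*? stops before the first number that is followed by m and U+00B2;
    // (?s) lets . also match line breaks
    private static final String REGEX = "(?s).*?(\\d+(?:,\\d+)?)\\s*m\\u00B2.*";

    static String firstArea(String description) {
        String result = description.replaceFirst(REGEX, "$1");
        // replaceFirst leaves the text as-is when nothing matches
        return result.equals(description) ? null : result;
    }

    public static void main(String[] args) {
        System.out.println(firstArea("some text 38 m\u00B2 some 48,8 m\u00B2 text")); // 38
    }
}
```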
To find only the last match (or, by matching against the reversed string, only the first):
@Test
public void testFindFirstRegExp() {
String pattern = ".* (\\d+,\\d+) .*";
Pattern r = Pattern.compile(pattern);
String line = "some text 44,66 m² some 33,11 m² text 11,22 m² some text";
Matcher m = r.matcher(new StringBuilder(line).reverse().toString());
String expected = "44,66";
String actual = null;
if (m.find()) {
actual = new StringBuilder(m.group(1)).reverse().toString();
}
System.out.println("got first:" + actual);
Assert.assertEquals(expected, actual);
m = r.matcher(line);
expected = "11,22";
actual = null;
if (m.find()) {
actual = m.group(1);
}
System.out.println("got last:" + actual);
Assert.assertEquals(expected, actual);
}
prints:
got first:44,66
got last:11,22
Note that you may need to reverse the pattern itself when it is not symmetric, for example:
pattern = ".* (\\d+,\\d+-?) .*"; //reverse for (-?\\d+,\\d+)
but this pattern works as expected on the original string:
pattern = " (\\-?\\d+,\\d+) ";
and you can get all of the matches in a loop:
m = Pattern.compile(pattern).matcher(line);
while (m.find()) {
actual = m.group(1);
System.out.println("got:" + actual);
}
Will print:
got:44,66
got:33,11
got:11,22
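For the first match specifically, reversing the string is not required: Matcher.find() scans from the left, so a pattern without the leading greedy .* already returns the first number, while the greedy .* prefix is what pushes the match to the last occurrence. A sketch comparing both on the same line; the class and method names are hypothetical:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstVsLast {
    // Without a leading .*, find() returns the leftmost (first) match
    static String first(String line) {
        Matcher m = Pattern.compile(" (\\d+,\\d+) ").matcher(line);
        return m.find() ? m.group(1) : null;
    }

    // The greedy leading .* consumes as much as possible, so group 1 is the last match
    static String last(String line) {
        Matcher m = Pattern.compile(".* (\\d+,\\d+) .*").matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "some text 44,66 m\u00B2 some 33,11 m\u00B2 text 11,22 m\u00B2 some text";
        System.out.println(first(line)); // 44,66
        System.out.println(last(line));  // 11,22
    }
}
```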

Regex pattern java with commas

I have the string below, which comes from an Excel column:
"\"USE CODE \"\"Gef, sdf\"\" FROM 1/7/07\""
I would like to set up a regex pattern that retrieves the entire string, so that my result is exactly:
"USE CODE ""Gef, sdf"" FROM 1/7/07"
Below is what I tried:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches
{
public static void main( String args[] ){
// String to be scanned to find the pattern.
String line = "\"USE CODE \"\"Gef, sdf\"\" FROM 1/7/07\", Delete , Hello , How are you ? , ";
String line2 = "Test asda ds asd, tesat2 . test3";
String dpattern = "(\"[^\"]*\")(?:,(\"[^\"]*\"))*,|([^,]+),";
// Create a Pattern object
Pattern d = Pattern.compile(dpattern);
Matcher md = d.matcher(line2);
Pattern r = Pattern.compile(dpattern);
// Now create matcher object.
Matcher m = r.matcher(line);
if (m.find( )) {
System.out.println("Found value: 0 " + m.group(0) );
// System.out.println("Found value: 1 " + m.group(1) );
//System.out.println("Found value: 2 " + m.group(2) );
} else {
System.out.println("NO MATCH");
}
}
}
The match stops at the first comma, so the output is:
Found value: 0 "USE CODE ""Gef,
It should be:
Found value: 0 "USE CODE ""Gef, sdf"" FROM 1/7/07",
and for the second line (Matcher m = r.matcher(line2);) the output should be:
Found value: 0 "Test asda ds asd",
You may use
(?:"[^"]*(?:""[^"]*)*"|[^,])+
Explanation:
" - leading quote
[^"]* - 0+ chars other than a double quote
(?:""[^"]*)* - 0+ sequences of a "" text followed with 0+ chars other than a double quote
" - trailing quote
OR:
[^,] - any char but a comma
And the whole pattern is matched 1 or more times as it is enclosed with (?:...)+ and + matches 1 or more occurrences.
Java example:
String line = "\"USE CODE \"\"Gef, sdf\"\" FROM 1/7/07\", Delete , Hello , How are you ? , ";
String line2 = "Test asda ds asd, tesat2 . test3";
Pattern pattern = Pattern.compile("(?:\"[^\"]*(?:\"\"[^\"]*)*\"|[^,])+");
Matcher matcher = pattern.matcher(line);
if (matcher.find()){ // if is used to get the 1st match only
System.out.println(matcher.group(0));
}
Matcher matcher2 = pattern.matcher(line2);
if (matcher2.find()){
System.out.println(matcher2.group(0));
}
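To get every field rather than only the first one, call find() in a loop. The sketch below also trims each field and undoes the quoting, assuming Excel-style CSV escaping (the field is wrapped in quotes and "" stands for a single "). Empty fields between two adjacent commas are skipped, because the pattern needs at least one character. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CsvFields {
    // Same pattern as above: a quoted field (with "" escapes) or any non-comma char, repeated
    private static final Pattern FIELD = Pattern.compile("(?:\"[^\"]*(?:\"\"[^\"]*)*\"|[^,])+");

    static List<String> fields(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            String f = m.group().trim();
            // Strip the outer quotes and turn "" into " (Excel-style CSV escaping)
            if (f.length() >= 2 && f.startsWith("\"") && f.endsWith("\"")) {
                f = f.substring(1, f.length() - 1).replace("\"\"", "\"");
            }
            result.add(f);
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "\"USE CODE \"\"Gef, sdf\"\" FROM 1/7/07\", Delete , Hello , How are you ? , ";
        for (String f : fields(line)) {
            System.out.println("[" + f + "]");
        }
    }
}
```

For the first sample line this yields five fields; the last one is empty because the line ends with ", ".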
