I'm looking for a regex for parsing money amounts. The string s10 should not match. Can someone help, or simplify my regex? This is my attempt:
public static String[] getMoney(String s) {
List<String> ret = new ArrayList<String>();
String regex = "((\\d{1,3}[.,]?)(\\d{3}[.,]?)*[.,]\\d{1,2})(\\D|$)";
Pattern pat = Pattern.compile(regex);
Matcher mat = pat.matcher(s);
while (mat.find()) {
ret.add(mat.group(1));
}
return ret.toArray(new String[0]);
}
public static void main(String[] args) {
String s1 = "0,1"; // should match
String s2 = ",1"; // should not match
String s3 = "1,"; // should not match
String s4 = "1.234,01"; // should match
String s5 = "1234,10"; // should match
String s6 = "1234,100"; // should not match
String s7 = "1234,10a"; // should match
String s8 = "123,456,789.10"; // should match
String s9 = "123.456.789,10"; // should match
String s10 = "123,456.789,10"; // should not match (!)
System.out.println(Arrays.toString(getMoney(s1)));
System.out.println(Arrays.toString(getMoney(s2)));
System.out.println(Arrays.toString(getMoney(s3)));
System.out.println(Arrays.toString(getMoney(s4)));
System.out.println(Arrays.toString(getMoney(s5)));
System.out.println(Arrays.toString(getMoney(s6)));
System.out.println(Arrays.toString(getMoney(s7)));
System.out.println(Arrays.toString(getMoney(s8)));
System.out.println(Arrays.toString(getMoney(s9)));
System.out.println(Arrays.toString(getMoney(s10)));
}
I think you may use
(?<![\d,.])(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})*)?|\d+)(?:(?!\1)[.,]\d{1,2})?(?![,.\d])
See the regex demo
Details
(?<![\d,.]) - no digit, . or , allowed immediately on the left
(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})*)?|\d+) - either of two alternatives:
\d{1,3}(?:(?=([.,]))(?:\1\d{3})*)? - one, two or three digits, optionally followed by a position where the next character is a comma or dot (the lookahead captures it into Group 1) and then 0 or more occurrences of that same captured separator followed by exactly three digits
|\d+ - or 1 or more digits
(?:(?!\1)[.,]\d{1,2})? - an optional sequence of a comma or dot, but not the same char as in Group 1, and then 1 or 2 digits
(?![,.\d]) - no digit, . or , allowed immediately on the right
In Java, do not forget to double the backslashes:
String regex = "(?<![\\d,.])(?:\\d{1,3}(?:(?=([.,]))(?:\\1\\d{3})*)?|\\d+)(?:(?!\\1)[.,]\\d{1,2})?(?![,.\\d])";
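If the backreference is hard to follow, the same rules can also be spelled out as explicit alternatives. The following is only a sketch, not the pattern from this answer: the class name MoneyAlternatives is made up for illustration, and the pattern assumes the two conventions shown in the question (dot groups with comma decimals, or comma groups with dot decimals). Like the pattern above, it also accepts amounts without a decimal part.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class: a backreference-free variant of the money pattern.
class MoneyAlternatives {
    private static final Pattern MONEY = Pattern.compile(
        "(?<![\\d.,])(?:"
        + "\\d{1,3}(?:\\.\\d{3})+(?:,\\d{1,2})?"    // 1.234.567,89 : dot groups, comma decimals
        + "|\\d{1,3}(?:,\\d{3})+(?:\\.\\d{1,2})?"   // 1,234,567.89 : comma groups, dot decimals
        + "|\\d+(?:[.,]\\d{1,2})?"                  // 1234,5 or 1234.56 : no grouping
        + ")(?![\\d.,])");

    static String[] getMoney(String s) {
        List<String> ret = new ArrayList<>();
        Matcher mat = MONEY.matcher(s);
        while (mat.find()) {
            ret.add(mat.group());
        }
        return ret.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(getMoney("1.234,01")));       // [1.234,01]
        System.out.println(Arrays.toString(getMoney("1234,10a")));       // [1234,10]
        System.out.println(Arrays.toString(getMoney("1234,100")));       // []
        System.out.println(Arrays.toString(getMoney("123,456.789,10"))); // []
    }
}
```

Mixing the two separators as group separators (s10) fails here because each alternative fixes one grouping character, and the surrounding lookarounds prevent a partial match inside a longer run of digits, dots and commas.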
Related
I have a string like the one below:
dfdfm;lg 2500$ jshfsnefsfz5405€mnvkjdf64rfmkd554668¢ odsfrknegj 885486¥ dsflkef 588525dollar
I am getting the values below with this regex: [\\d,]+\\s*\\$|[\\d,]+\\s*€|[\\d,]+\\s*¥|[\\d,]+\\s*¢|[\\d,]+\\s*dollar
2500$
5405€
554668¢
885486¥
588525dollar
Problem: I don't need the $ € ¢ ¥ dollar suffixes. How can I drop them from the regex above?
Here is my method:
private String getPrice(String caption) {
String pricePattern = "[\\d,]+\\s*\\$|[\\d,]+\\s*€|[\\d,]+\\s*¥|[\\d,]+\\s*¢|[\\d,]+\\s*dollar|[\\d,]+\\s*Euro";
List<String> lstPrice = new ArrayList<>();
Pattern rPrice = Pattern.compile(pricePattern);
Matcher mPrice = rPrice.matcher(caption);
while (mPrice.find()) {
lstPrice.add(mPrice.group());
}
if (lstPrice.size() > 0) {
return lstPrice.get(0);
}
return "";
}
If you need to return all prices, make sure your getPrice method returns List<String> and adjust the regex to match the prices but capture the numbers only:
private List<String> getPrice(String caption) {
String pricePattern = "(?i)(\\d[\\d,]*)\\s*(?:[$€¥¢]|dollar|Euro)";
List<String> lstPrice = new ArrayList<>();
Pattern rPrice = Pattern.compile(pricePattern);
Matcher mPrice = rPrice.matcher(caption);
while (mPrice.find()) {
lstPrice.add(mPrice.group(1));
}
return lstPrice;
}
See the Java demo online.
String s = "dfdfm;lg 2500$ jshfsnefsfz5405€mnvkjdf64rfmkd554668¢ odsfrknegj 885486¥ dsflkef 588525dollar";
System.out.println(getPrice(s));
returns
[2500, 5405, 554668, 885486, 588525]
Pattern details:
(?i) - a case insensitive modifier (embedded flag option)
(\\d[\\d,]*) - Group 1 capturing a digit and then 0+ digits or ,
\\s* - 0+ whitespaces
(?:[$€¥¢]|dollar|Euro) - either $, €, ¥, ¢, dollar or euro (case insensitive search is enabled via (?i))
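If only the first price is needed, as in the original getPrice signature, the same pattern can be used with a single find() call. This is a minimal sketch under that assumption; the class and method names (FirstPrice, getFirstPrice) are made up, and the currency symbols are written as Unicode escapes only to keep the source file encoding-independent.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: returns only the first price, without its currency marker.
class FirstPrice {
    // Same idea as the pattern above; \u20AC = euro sign, \u00A5 = yen sign, \u00A2 = cent sign.
    private static final Pattern PRICE =
        Pattern.compile("(?i)(\\d[\\d,]*)\\s*(?:[$\u20AC\u00A5\u00A2]|dollar|Euro)");

    static String getFirstPrice(String caption) {
        Matcher m = PRICE.matcher(caption);
        // Group 1 holds the digits (and commas) only; the currency part is matched but not captured.
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(getFirstPrice("dfdfm;lg 2500$ jshfsnefsfz")); // 2500
        System.out.println(getFirstPrice("total 1,200 EURO"));           // 1,200
        System.out.println(getFirstPrice("abc 64rfmkd").isEmpty());      // true
    }
}
```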
You can try with replaceAll
Replaces every subsequence of the input sequence that matches the
pattern with the given replacement string.
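String pricePattern = "2500$ 5405€ 554668¢ 885486¥ 588525dollar";
// \\D+ collapses each run of non-digits into one space; inside [^\\d+] the + would be a literal plus sign
pricePattern = pricePattern.replaceAll("\\D+", " ").trim(); // 2500 5405 554668 885486 588525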
Check Java Demo
String 1:
func1(test1)
String 2:
func1(test2)
I want to compare these 2 strings up to the first opening parenthesis '('.
So for the given example it should return true, since the part before '(' is 'func1' in both strings.
Is there any way to do that without splitting?
The String#substring() method, combined with String#indexOf(), will help in this case:
String x1 = "func1(test1)";
String x2 = "func1(test2)";
String methName1 = x1.substring(0, x1.indexOf("("));
String methName2 = x2.substring(0, x2.indexOf("("));
System.out.println(methName1);
System.out.println(methName2);
System.out.println(methName1.equals(methName2));
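Note that indexOf returns -1 when there is no '(' at all, in which case substring(0, -1) throws. A small sketch that guards against this and also avoids creating substrings, using String#regionMatches, could look as follows; the class and method names (PrefixCompare, samePrefixUpToParen) are made up for illustration.

```java
// Hypothetical helper: compares two strings up to their first '(' without splitting.
class PrefixCompare {
    static boolean samePrefixUpToParen(String a, String b) {
        int i = a.indexOf('(');
        int j = b.indexOf('(');
        // No parenthesis in either string, or prefixes of different length: not equal.
        if (i < 0 || j < 0 || i != j) {
            return false;
        }
        // Compare the first i characters of both strings in place.
        return a.regionMatches(0, b, 0, i);
    }

    public static void main(String[] args) {
        System.out.println(samePrefixUpToParen("func1(test1)", "func1(test2)")); // true
        System.out.println(samePrefixUpToParen("func1(test1)", "func2(test1)")); // false
        System.out.println(samePrefixUpToParen("func1", "func1(test2)"));        // false
    }
}
```

Treating a missing '(' as "not equal" is a design choice of this sketch; returning a plain equals() comparison in that case would be just as defensible.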
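You can use the String.matches() method to test whether the second string starts with the name extracted from the first one. Quote the name and escape the parenthesis, otherwise (.*) is just a regex group and names such as func12 would match too:
String s1 = "func1(test1)";
String s2 = "func1(test2)";
String methName = s1.substring(0, s1.indexOf("("));
System.out.println(s2.matches(Pattern.quote(methName) + "\\(.*")); // needs java.util.regex.Pattern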
This is a working Demo.
Alternatively, you can compare the strings directly after replacing everything from '(' onwards with an empty string.
String str1 = "func1(test1)";
String str2 = "func1(test2)";
System.out.println(str1.replaceAll("\\(.*", "").equals(str2.replaceAll("\\(.*", "")));
You can use a regex to find everything between any two delimiters, in your case ( and ), and compare the results. Note that this compares the text inside the parentheses, for example:
String START_DEL = "\\("; //the start delimiter
String END_DEL = "\\)"; //the end delimiter
String str1 = "func1(test1)";
String str2 = "func1(test2)";
Pattern p = Pattern.compile(START_DEL + "(.*?)" + END_DEL); // i.e. "\\((.*?)\\)"
Matcher m1 = p.matcher(str1);
Matcher m2 = p.matcher(str2);
if (m1.find() && m2.find()) {
System.out.println(m1.group(1).equals(m2.group(1)));
}
I have a regex for validation of UTF-8 characters.
String regex = "[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}]*";
I wanted to do a range check too so I modified it to
String regex = "[[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}]*]";
String rangeRegex = regex + "{0,30}";
Notice that it’s the same regex I just wrapped it with [ ].
Now I can validate with the range by using rangeRegex but regex is now not validating UTF-8 chars.
My question is: how is [] affecting regex? If I remove [] from the original regex it will validate UTF-8 chars but not with range. If I put [] it will validate with range but not without range!
sample test code -
public class Test {
static String regex = "[[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}]*]" ;
public static void main(String[] args) {
String userId = null;
//testUserId(userId);
userId = "";
testUserId(userId);
userId = "æÆbBcCćĆčČçďĎdzDzdzsDzs";
testUserId(userId);
userId = "test123";
testUserId(userId);
userId = "abcxyzsd";
testUserId(userId);
String zip = "i«♣│axy";
testZip(zip);
zip = "331fsdfsdfasdfasd02c3";
testZip(zip);
zip = "331";
testZip(zip);
}
/**
* without range check
* @param userId
*/
static void testUserId(String userId){
boolean pass = true;
if ( !stringValidator(userId, regex)) {
pass = false;
}
System.out.println(pass);
}
/**
* with a range check
* @param zip
*/
static void testZip(String zip){
boolean pass = true;
String regex1 = regex + "{0,10}";
if (StringUtils.isNotBlank(zip) && !stringValidator(zip, regex1)) {
pass = false;
}
System.out.println(pass);
}
static boolean stringValidator(String str, String regex) {
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
return matcher.matches();
}
}
The explanations given are rather wrong for Java regex.
In Java, unescaped paired square brackets inside a character class are not treated as literal [ and ] characters. They have a special meaning in Java character classes:
[a-d[m-p]] a through d, or m through p: [a-dm-p] (union)
[a-z&&[def]] d, e, or f (intersection)
[a-z&&[^bc]] a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]] a through z, and not m through p: [a-lq-z] (subtraction)
So, when you wrap your regex in [...], you get a union of the original character class with a literal * character: the pattern now matches a single character that is either in [\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}] or a literal *.
Also, [[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}]*] is equal to [\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}*] as * symbol inside a character class stops being a special character (a quantifier) and becomes a literal asterisk symbol.
If you use [[]], the engine will throw an exception: Unclosed character class near index 3
See this IDEONE demo:
System.out.println("abc[]".replaceAll("[[abc]]", "")); // => []
System.out.println("abc[]".replaceAll("[[]]", "")); // => error
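The effect of the extra brackets can be reproduced with a much shorter class. The sketch below uses [a-c] as a stand-in for the long \p{...} list (an assumption made only to keep the example readable), and the class and constant names are made up.

```java
// Hypothetical demo: what wrapping "[a-c]*" in another pair of brackets does in Java.
class NestedClassDemo {
    static final String PLAIN = "[a-c]*";      // zero or more of a, b, c
    static final String WRAPPED = "[[a-c]*]";  // ONE character: a, b, c or a literal '*'

    public static void main(String[] args) {
        System.out.println("abc".matches(PLAIN));               // true
        System.out.println("abc".matches(WRAPPED));             // false, only one character allowed
        System.out.println("*".matches(WRAPPED));               // true, '*' is now a literal
        System.out.println("ab*".matches(WRAPPED + "{0,10}"));  // true, the quantifier is added outside
    }
}
```

This mirrors the behavior described in the question: the wrapped pattern only works once a quantifier such as {0,10} is appended to it.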
Whenever you need to check the length of a string with regex, you need anchors and a limiting quantifier. Anchors are automatically added when a regex is used with Matcher#matches method:
The matches method attempts to match the entire input sequence against the pattern.
Example code:
String regex = "[\\p{L}\\p{M}\\p{N}\\p{P}\\p{Z}\\p{S}\\p{C}]";
String new_regex = regex + "{0,30}";
System.out.println("Some string".matches(new_regex)); // => true
See this IDEONE demo
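The role of the implicit anchors can be seen by comparing Matcher#matches with Matcher#find. The following sketch uses a shortened class [\p{L}\p{N}] and a limit of 5 purely for illustration; the class and method names are made up.

```java
import java.util.regex.Pattern;

// Hypothetical demo: a length limit only works when the whole input must match.
class LengthCheckDemo {
    static final String BODY = "[\\p{L}\\p{N}]{0,5}";

    // matches() requires the entire input to match, so the {0,5} limit is enforced.
    static boolean viaMatches(String s) {
        return Pattern.compile(BODY).matcher(s).matches();
    }

    // find() is satisfied by any matching substring, so the limit is not enforced.
    static boolean viaFind(String s) {
        return Pattern.compile(BODY).matcher(s).find();
    }

    // With explicit anchors, find() behaves like matches().
    static boolean viaAnchoredFind(String s) {
        return Pattern.compile("\\A" + BODY + "\\z").matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(viaMatches("abcdefgh"));      // false
        System.out.println(viaFind("abcdefgh"));         // true, finds "abcde"
        System.out.println(viaAnchoredFind("abcdefgh")); // false
        System.out.println(viaMatches("abc"));           // true
    }
}
```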
UPDATE
Here is commented code of yours:
String userId = "";
testUserId(userId); // false - Correct as we test an empty string with an at-least-one-char regex
userId = "æÆbBcCćĆčČçďĎdzDzdzsDzs";
testUserId(userId); // false - Correct as we only match 1 character string, others fail
userId = "test123";
testUserId(userId); // false - see above
userId = "abcxyzsd";
testUserId(userId); // false - see above
String zip = "i«♣│axy";
testZip(zip); // true - OK, 7-symbol string matches against [...]{0,10} regex
zip = "331fsdfsdfasdfasd02c3";
testZip(zip); // false - OK, 21-symbol string does not match a regex that requires only 0 to 10 characters
zip = "331";
testZip(zip); // true - OK, 3-symbol string matches against [...]{0,10} regex
* means 0 or more, so it is equivalent to {0,}. That is, you can replace the * with {0,30} and it should do what you want:
[\p{L}\p{M}\p{N}\p{P}\p{Z}\p{S}\p{C}]{0,30}
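[] creates a character class. In many regex flavors, [[]] would be read as "a character class of just [" followed by a literal ], since the first ] closes the character class prematurely; Java instead treats nested brackets as a union and rejects [[]] as an unclosed class. Either way, it doesn't do what you want.
Also correct me if I'm wrong, but the character list you are generating is pretty much everything, so you could go with .{0,30} for nearly the same effect (note that . does not match line terminators unless DOTALL is enabled).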
I have the following simple code:
String d = "_|,|\\.";
String s1 = "b,_a_.";
Pattern p = Pattern.compile(d);
String[] ss = p.split(s1);
for (String str : ss){
System.out.println(str.trim());
}
The output gives
b

a
Where does the extra space come from between b and a?
You do not have an extra space. You get an empty element in the resulting array because your regex matches only 1 character at a time, so when several of the delimiter characters appear in a row, the string is split at each of them.
Thus, you should match as many of those characters in your character class as possible with + (1 or more) quantifier by placing the whole expression into a non-capturing group ((?:_|,|\\.)+), or - better - using a character class [_,.]+:
String d = "(?:_|,|\\.)+"; // Or better: String d = "[_,.]+";
String s1 = "b,_a_.";
Pattern p = Pattern.compile(d);
String[] ss = p.split(s1);
for (String str : ss){
System.out.println(str.trim());
}
See IDEONE demo
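Printing the arrays directly makes the empty element visible. This is a small sketch with made-up class and method names, using String#split with the character-class form of the pattern.

```java
import java.util.Arrays;

// Hypothetical demo: splitting at each delimiter versus at each run of delimiters.
class SplitRunsDemo {
    static String[] splitEach(String s) {
        return s.split("[_,.]");   // one split per delimiter character
    }

    static String[] splitRuns(String s) {
        return s.split("[_,.]+");  // one split per run of delimiter characters
    }

    public static void main(String[] args) {
        // The empty string between ',' and '_' is kept; trailing empty strings are dropped by split.
        System.out.println(Arrays.toString(splitEach("b,_a_."))); // [b, , a]
        System.out.println(Arrays.toString(splitRuns("b,_a_."))); // [b, a]
    }
}
```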
I was puzzled by this myself, but maybe what you want is to change your regex to
String d = "[_,\\.]+";
I want to split a string like the one below in Java. The normal split function splits the string but drops the delimiter characters:
String s = "123{456]789[012*";
I want to split the string at the {, [, ] and * characters without losing them. I want results like this:
part 1 = 123{
part 2 = 456]
part 3 = 789[
part 4 = 012*
Normally, the split function splits like this:
part 1 = 123
part 2 = 456
part 3 = 789
part 4 = 012
Is it possible?
You can use zero-width lookahead/behind expressions to define a regular expression that matches the zero-length string between one of your target characters and anything that is not one of your target characters:
(?<=[{\[\]*])(?=[^{\[\]*])
Pass this expression to String.split:
String[] parts = "123{456]789[012*".split("(?<=[{\\[\\]*])(?=[^{\\[\\]*])");
If you have a block of consecutive delimiter characters this will split once at the end of the whole block, i.e. the string "123{456][789[012*" would split into four blocks "123{", "456][", "789[", "012*". If you used just the first part (the look-behind)
(?<=[{\[\]*])
then you would get five parts "123{", "456]", "[", "789[", "012*"
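Both variants can be tried side by side on the string with consecutive delimiters. The sketch below only wraps the two expressions from this answer in constants; the class and constant names are made up.

```java
import java.util.Arrays;

// Hypothetical demo: zero-width splits that keep the delimiter with the preceding part.
class KeepDelimiterSplit {
    // Split after a delimiter, but only where a non-delimiter follows.
    static final String AFTER_BLOCK = "(?<=[{\\[\\]*])(?=[^{\\[\\]*])";
    // Split after every delimiter.
    static final String AFTER_EACH = "(?<=[{\\[\\]*])";

    public static void main(String[] args) {
        String s = "123{456][789[012*";
        System.out.println(Arrays.toString(s.split(AFTER_BLOCK))); // [123{, 456][, 789[, 012*]
        System.out.println(Arrays.toString(s.split(AFTER_EACH)));  // [123{, 456], [, 789[, 012*]
    }
}
```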
Using a positive lookbehind:
(?<={|\[|\]|\*)
String str = "123{456]789[012*";
String parts[] = str.split("(?<=\\{|\\[|\\]|\\*)");
System.out.println(Arrays.toString(parts));
Output:
[123{, 456], 789[, 012*]
I think you're looking for something like
String str = "123{456]789[012*";
String[] parts = new String[] {
str.substring(0,4), str.substring(4,8), str.substring(8,12),
str.substring(12)
};
System.out.println(Arrays.toString(parts));
Output is
[123{, 456], 789[, 012*]
You can use Pattern and Matcher to find each splitting character and the index right after it.
public static List<String> split(String string, String splitRegex) {
List<String> result = new ArrayList<String>();
Pattern p = Pattern.compile(splitRegex);
Matcher m = p.matcher(string);
int index = 0;
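while (index < string.length()) {
if (m.find()) {
int splitIndex = m.end();
// m.end() is exclusive, so this keeps the delimiter, whatever its length
result.add(string.substring(index, splitIndex));
index = splitIndex;
} else {
// no more delimiters: keep the remainder and stop, otherwise the loop never ends
result.add(string.substring(index));
break;
}
}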
return result;
}
Example code:
public static void main(String[] args) {
System.out.println(split("123{456]789[012*","\\{|\\]|\\[|\\*"));
}
Output:
[123{, 456], 789[, 012*]