I have a string like the one below:
dfdfm;lg 2500$ jshfsnefsfz5405€mnvkjdf64rfmkd554668¢ odsfrknegj 885486¥ dsflkef 588525dollar
With the regex [\\d,]+\\s*\\$|[\\d,]+\\s*€|[\\d,]+\\s*¥|[\\d,]+\\s*¢|[\\d,]+\\s*dollar I get the values below:
2500$
5405€
554668¢
885486¥
588525dollar
Problem: I don't need the $ € ¢ ¥ dollar suffixes. How can I exclude them in the regex above?
Here is my method :
private String getPrice(String caption) {
String pricePattern = "[\\d,]+\\s*\\$|[\\d,]+\\s*€|[\\d,]+\\s*¥|[\\d,]+\\s*¢|[\\d,]+\\s*dollar|[\\d,]+\\s*Euro";
List<String> lstPrice = new ArrayList<>();
Pattern rPrice = Pattern.compile(pricePattern);
Matcher mPrice = rPrice.matcher(caption);
while (mPrice.find()) {
lstPrice.add(mPrice.group());
}
if (lstPrice.size() > 0) {
return lstPrice.get(0);
}
return "";
}
If you need to return all prices, make sure your getPrice method returns List<String> and adjust the regex to match the prices but capture the numbers only:
private List<String> getPrice(String caption) {
String pricePattern = "(?i)(\\d[\\d,]*)\\s*(?:[$€¥¢]|dollar|Euro)";
List<String> lstPrice = new ArrayList<>();
Pattern rPrice = Pattern.compile(pricePattern);
Matcher mPrice = rPrice.matcher(caption);
while (mPrice.find()) {
lstPrice.add(mPrice.group(1));
}
return lstPrice;
}
For example:
String s = "dfdfm;lg 2500$ jshfsnefsfz5405€mnvkjdf64rfmkd554668¢ odsfrknegj 885486¥ dsflkef 588525dollar";
System.out.println(getPrice(s));
returns
[2500, 5405, 554668, 885486, 588525]
Pattern details:
(?i) - a case insensitive modifier (embedded flag option)
(\\d[\\d,]*) - Group 1 capturing a digit and then 0+ digits or ,
\\s* - 0+ whitespaces
(?:[$€¥¢]|dollar|Euro) - either $, €, ¥, ¢, dollar or euro (case insensitive search is enabled via (?i))
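If you would rather keep calling group() with no argument, another option is to move the currency part into a positive lookahead, so it is checked but never becomes part of the match. The sketch below assumes the same currency markers as above. The class name PriceLookahead and the method name extractPrices are made up for illustration, and the currency signs are written as Unicode escapes only to keep the source ASCII-safe.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceLookahead {
    // \u20AC = euro sign, \u00A5 = yen sign, \u00A2 = cent sign.
    // The lookahead (?=...) requires a currency marker after the number
    // without consuming it, so group() holds the digits (and commas) only.
    private static final Pattern PRICE = Pattern.compile(
            "(?i)\\d[\\d,]*(?=\\s*(?:[$\u20AC\u00A5\u00A2]|dollar|euro))");

    static List<String> extractPrices(String caption) {
        List<String> prices = new ArrayList<>();
        Matcher m = PRICE.matcher(caption);
        while (m.find()) {
            prices.add(m.group());
        }
        return prices;
    }

    public static void main(String[] args) {
        String s = "dfdfm;lg 2500$ jshfsnefsfz5405\u20ACmnvkjdf64rfmkd554668\u00A2"
                + " odsfrknegj 885486\u00A5 dsflkef 588525dollar";
        System.out.println(extractPrices(s)); // [2500, 5405, 554668, 885486, 588525]
    }
}
```

The stray 64 in the sample is skipped because no currency marker follows it.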
You can try with replaceAll
Replaces every subsequence of the input sequence that matches the
pattern with the given replacement string.
String prices = "2500$ 5405€ 554668¢ 885486¥ 588525dollar";
// Collapse every run of non-digit characters into one space, then trim the trailing space
prices = prices.replaceAll("\\D+", " ").trim(); // 2500 5405 554668 885486 588525
I'm looking for a regex for parsing money amounts. The string s10 should not match. Can someone help, or simplify the regex? Here is my attempt:
public static String[] getMoney(String s) {
List<String> ret = new ArrayList<String>();
String regex = "((\\d{1,3}[.,]?)(\\d{3}[.,]?)*[.,]\\d{1,2})(\\D|$)";
Pattern pat = Pattern.compile(regex);
Matcher mat = pat.matcher(s);
while (mat.find()) {
ret.add(mat.group(1));
}
return ret.toArray(new String[0]);
}
public static void main(String[] args) {
String s1 = "0,1"; // should match
String s2 = ",1"; // should not match
String s3 = "1,"; // should not match
String s4 = "1.234,01"; // should match
String s5 = "1234,10"; // should match
String s6 = "1234,100"; // should not match
String s7 = "1234,10a"; // should match
String s8 = "123,456,789.10"; // should match
String s9 = "123.456.789,10"; // should match
String s10 = "123,456.789,10"; // should not match (!)
System.out.println(Arrays.toString(getMoney(s1)));
System.out.println(Arrays.toString(getMoney(s2)));
System.out.println(Arrays.toString(getMoney(s3)));
System.out.println(Arrays.toString(getMoney(s4)));
System.out.println(Arrays.toString(getMoney(s5)));
System.out.println(Arrays.toString(getMoney(s6)));
System.out.println(Arrays.toString(getMoney(s7)));
System.out.println(Arrays.toString(getMoney(s8)));
System.out.println(Arrays.toString(getMoney(s9)));
System.out.println(Arrays.toString(getMoney(s10)));
}
I think you may use
(?<![\d,.])(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})*)?|\d+)(?:(?!\1)[.,]\d{1,2})?(?![,.\d])
Details
(?<![\d,.]) - no digit, . or , allowed immediately on the left
(?:\d{1,3}(?:(?=([.,]))(?:\1\d{3})*)?|\d+) -
\d{1,3}(?:(?=([.,]))(?:\1\d{3})*)? - one, two or three digits, optionally followed by a lookahead that captures the next comma or dot into Group 1 (without consuming it) and then 0 or more occurrences of that captured separator followed by three digits
|\d+ - or 1 or more digits
(?:(?!\1)[.,]\d{1,2})? - an optional sequence of a comma or dot, but not the same char as in Group 1, and then 1 or 2 digits
(?![,.\d]) - no digit, . or , allowed immediately on the right
In Java, do not forget to double the backslashes:
String regex = "(?<![\\d,.])(?:\\d{1,3}(?:(?=([.,]))(?:\\1\\d{3})*)?|\\d+)(?:(?!\\1)[.,]\\d{1,2})?(?![,.\\d])";
If I have a list of strings say:
private List<String> domains;
How do I validate the strings and create a boolean to confirm they all start with a symbol that is not a word or number?
Result of string should be something like:
#yahoo.com
)cloud.com
and not:
yahoo.com
cloud.com
I have
Pattern p = Pattern.compile("/[^\\w._\\s]/g");
Matcher m = p.matcher(values.getList().get(0)); //right now it's only checking first element I'm not sure how to check them all
boolean b = m.matches();
This doesn't seem to be working.
When you're using Java 8 (or higher), you can make use of the following:
Pattern p = Pattern.compile("^[^a-zA-Z0-9].*");
boolean all = domains.stream().allMatch(st -> p.matcher(st).matches());
all is then true only if every domain matches the regex.
The regex matches any string whose first character is not a lowercase letter, an uppercase letter or a digit.
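For reference, here is a self-contained sketch of that check. The class name DomainCheck, the method name allStartWithSymbol and the sample lists are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class DomainCheck {
    // First character must not be an ASCII letter or digit; the rest is unrestricted
    private static final Pattern STARTS_WITH_SYMBOL = Pattern.compile("^[^a-zA-Z0-9].*");

    static boolean allStartWithSymbol(List<String> domains) {
        // allMatch stops at the first element that fails the check
        return domains.stream().allMatch(d -> STARTS_WITH_SYMBOL.matcher(d).matches());
    }

    public static void main(String[] args) {
        System.out.println(allStartWithSymbol(Arrays.asList("#yahoo.com", ")cloud.com"))); // true
        System.out.println(allStartWithSymbol(Arrays.asList("#yahoo.com", "cloud.com")));  // false
    }
}
```

Note that allMatch returns true for an empty list, so an empty domains list passes this check.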
You can use matcher.lookingAt with the simple regex \\W, which means a non-word character (anything other than a letter, a digit or an underscore), like this:
Pattern p = Pattern.compile("\\W");
boolean status = true;
for (String str : domains) {
Matcher m = p.matcher(str);
if ((status = m.lookingAt()) == false) {
break;
}
}
System.out.println( status );
lookingAt attempts to match the input sequence, starting at the beginning of the region, against the pattern.
By the way, you are mixing JavaScript regex syntax into Java. There is no /.../g in Java.
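One detail to keep in mind with \\W: the underscore counts as a word character, so a string such as _yahoo.com is rejected here, while a [^a-zA-Z0-9] check would accept it. A small sketch showing this with lookingAt (the class and method names are hypothetical):

```java
import java.util.regex.Pattern;

class LookingAtDemo {
    private static final Pattern NON_WORD = Pattern.compile("\\W");

    // True if the first character is anything other than an ASCII letter,
    // a digit or an underscore. lookingAt anchors at the start of the input
    // but does not require the whole string to match.
    static boolean startsWithNonWord(String s) {
        return NON_WORD.matcher(s).lookingAt();
    }

    public static void main(String[] args) {
        System.out.println(startsWithNonWord("#yahoo.com")); // true
        System.out.println(startsWithNonWord("yahoo.com"));  // false
        System.out.println(startsWithNonWord("_yahoo.com")); // false
    }
}
```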
I'm trying to get all matches which starts with _ and ends with = from a URL which looks like
?_field1=param1,param2,paramX&_field2=param1,param2,paramX
In that case I'm looking for any instance of _fieldX=
A method which I use to get it looks like
public static List<String> getAllMatches(String url, String regex) {
List<String> matches = new ArrayList<String>();
Matcher m = Pattern.compile("(?=(" + regex + "))").matcher(url);
while(m.find()) {
matches.add(m.group(1));
}
return matches;
}
called as
List<String> fieldsList = getAllMatches(url, "_.=");
but it does not find what I expected.
Any suggestions as to what I have missed?
A regex like (?=(_.=)) finds all matches, including overlapping ones, that start with _, then have any 1 char (other than a line break char) and then =. Because . matches exactly one character, _field1= (six characters between _ and =) is never matched.
You do not need overlapping matches for the string you provided.
You may just use a lazy dot matching pattern, _(.*?)=. Alternatively, you may use a negated character class based regex: _([^=]+)= (it will capture into Group 1 any one or more chars other than = symbol).
Since you are passing a regex to the method, it seems you want a generic function.
If so, you may use this method:
public static List<String> getAllMatches(String url, String start, String end) {
List<String> matches = new ArrayList<String>();
Matcher m = Pattern.compile(start + "(.*?)" + end).matcher(url);
while(m.find()) {
matches.add(m.group(1));
}
return matches;
}
and call it as:
List<String> fieldsList = getAllMatches(url, "_", "=");
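Note that start and end are inserted into the regex as-is, so a delimiter such as ? or + would be read as a regex metacharacter. If the delimiters are meant to be literal text, one option is to wrap them in Pattern.quote. A sketch under that assumption (the class name DelimitedMatches and the second sample string are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimitedMatches {
    static List<String> getAllMatches(String text, String start, String end) {
        List<String> matches = new ArrayList<>();
        // Pattern.quote makes both delimiters match literally;
        // (.*?) lazily captures the shortest text between them
        Pattern p = Pattern.compile(Pattern.quote(start) + "(.*?)" + Pattern.quote(end));
        Matcher m = p.matcher(text);
        while (m.find()) {
            matches.add(m.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        String url = "?_field1=param1,param2,paramX&_field2=param1,param2,paramX";
        System.out.println(getAllMatches(url, "_", "="));        // [field1, field2]
        System.out.println(getAllMatches("a?x&b?y&", "?", "&")); // [x, y]
    }
}
```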
I have an input like google.com and a list of values like
1. *.com
2. *go*.com
3. *abc.com
4. *le.com
5. *.*
I need to write a pattern in java which should return all the matches except *abc.com. I have tried a few but nothing worked as expected. Kindly help. Thanks in advance.
Update:
public static void main(String[] args) {
List<String> values = new ArrayList<String>();
values.add("*.com");
values.add("*go*.com");
values.add("*abc.com");
values.add("*le.com");
values.add("*.*");
String stringToMatch = "google.com";
for (String pattern : values) {
String regex = Pattern.quote(pattern).replace("*", ".*");
System.out.println(stringToMatch.matches(regex));
}
}
Output:
false
false
false
false
false
I have tried this but the pattern doesn't match.
You could transform the given patterns into regexes, and then use normal regex functions like String.matches():
for (String pattern : patterns) {
final String regex = pattern.replaceAll("[\\.\\[\\](){}?+|\\\\]", "\\\\$0").replace("*", ".*");
System.out.println(stringToMatch.matches(regex));
}
edit: Apparently Pattern.quote() just adds \Q...\E around the string. Edited to use manual quoting.
edit 2: Another possibility is:
final String regex = Pattern.quote(pattern).replace("*", "\\E.*\\Q");
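The same idea can also be written by splitting the wildcard pattern on * and quoting each literal piece separately. This is a sketch, not the code from the answer above: the class name GlobMatch and the helpers globToRegex and globMatches are hypothetical, and only * is treated as a wildcard.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class GlobMatch {
    static String globToRegex(String glob) {
        // The -1 limit keeps trailing empty pieces, so a trailing * is not lost.
        // Each literal piece is quoted, and the pieces are rejoined with .*
        return Arrays.stream(glob.split("\\*", -1))
                .map(Pattern::quote)
                .collect(Collectors.joining(".*"));
    }

    static boolean globMatches(String glob, String input) {
        // String.matches requires the whole input to match
        return input.matches(globToRegex(glob));
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("*.com", "*go*.com", "*abc.com", "*le.com", "*.*");
        for (String pattern : values) {
            // Prints true, true, false, true, true (one per line)
            System.out.println(globMatches(pattern, "google.com"));
        }
    }
}
```

Because the dot is quoted, *.com no longer matches an input such as googleXcom.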
Based on a previous answer of mine (read the comments of the question, very instructive), here is a wildcardsToRegex method:
public static String wildcardsToRegex(String wildcards) {
String regex = wildcards;
// .matches() auto-anchors, so add [*] (i.e. "containing")
regex = "*" + regex + "*";
// replace any pair of backslashes by [*]
regex = regex.replaceAll("(?<!\\\\)(\\\\\\\\)+(?!\\\\)", "*");
// minimize unescaped redundant wildcards
regex = regex.replaceAll("(?<!\\\\)[?]*[*][*?]+", "*");
// escape unescaped regexps special chars, but [\], [?] and [*]
regex = regex.replaceAll("(?<!\\\\)([|\\[\\]{}(),.^$+-])", "\\\\$1");
// replace unescaped [?] by [.]
regex = regex.replaceAll("(?<!\\\\)[?]", ".");
// replace unescaped [*] by [.*]
regex = regex.replaceAll("(?<!\\\\)[*]", ".*");
// return the resulting regex
return regex;
}
Then, within your loop, use:
for (String pattern : values) {
System.out.println(stringToMatch.matches(wildcardsToRegex(pattern)));
}
Change this line in your code:
String regex = Pattern.quote(pattern).replace("*", ".*");
To this:
String regex = pattern.replace(".", "\\.").replace("*", ".*");
You can use :
List<String> values = new ArrayList<String>();
values.add("*.com");
values.add("*go*.com");
values.add("*abc.com");
values.add("*le.com");
values.add("*.*");
String stringToMatch = "google.com";
for (String pattern : values) {
// Escape literal dots (the replacement string needs a doubled backslash), then turn each * into .*
String regex = pattern.replaceAll("[.]", "\\\\.").replaceAll("[*]", ".*");
System.out.println(stringToMatch.matches(regex));
}
I have a sentence: "we:PR show:V".
I want to match only those characters after ":" and before "\\s" using regex pattern matcher.
I used following pattern:
Pattern pattern=Pattern.compile("^(?!.*[\\w\\d\\:]).*$");
But it did not work.
What is the best pattern to get the output?
For a situation such as this, if you are using Java, it may be easier to work with substrings:
String input = "we:PR show:V";
List<String> results = new ArrayList<String>();
int colonLocation = input.indexOf(':');
while (colonLocation != -1) {
    // The token ends at the next space after the colon, or at the end of the string
    int spaceLocation = input.indexOf(' ', colonLocation + 1);
    if (spaceLocation == -1) {
        spaceLocation = input.length();
    }
    results.add(input.substring(colonLocation + 1, spaceLocation));
    // Continue searching after the token that was just collected
    colonLocation = input.indexOf(':', spaceLocation);
}
return results;
This avoids regex entirely, which may be faster for simple inputs like this one.
The following regex assumes that any non-whitespace characters following a colon (in turn preceded by non-colon characters) are a valid match:
[^:]+:(\S+)(?:\s+|$)
Use like:
String input = "we:PR show:V";
Pattern pattern = Pattern.compile("[^:]+:(\\S+)(?:\\s+|$)");
Matcher matcher = pattern.matcher(input);
int start = 0;
while (matcher.find(start)) {
String match = matcher.group(1); // = "PR" then "V"
// Do stuff with match
start = matcher.end( );
}
The pattern matches, in order:
At least one character that isn't a colon.
A colon.
At least one non-whitespace character (our match).
At least one whitespace character, or the end of input.
The loop continues as long as the regex matches an item in the string, beginning at the index start, which is always adjusted to point to after the end of the current match.
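Since find() with no argument already resumes after the previous match, the explicit start index is optional. A shorter sketch of the same extraction, assuming every token of interest is simply the run of non-whitespace characters after a colon (the class name TagExtractor and the method name tagsOf are made up):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagExtractor {
    // A colon followed by one or more non-whitespace characters (captured in Group 1)
    private static final Pattern TAG = Pattern.compile(":(\\S+)");

    static List<String> tagsOf(String sentence) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(sentence);
        // Each call to find() continues from the end of the previous match
        while (m.find()) {
            tags.add(m.group(1));
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(tagsOf("we:PR show:V")); // [PR, V]
    }
}
```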