regex for removing zeros in decimal string - java

I need to remove leading zeros from each number in a decimal string,
e.g. 007.004(100.007) should be transformed to 7.4(100.7).
I tried using a matcher based on the pattern "0+(\d)":
Pattern p = Pattern.compile(regex);
Matcher m = null;
try {
m = p.matcher(version);
while (m.find()) {
System.out.println("Group : " + m.group());
System.out.println("Group 1 : " + m.group(1));
version = version.replaceFirst(m.group(), m.group(1));
System.out.println("Version: " + version);
}
but this results in 7.4(10.7). Any thoughts on this?

You need to do a replacement with this pattern:
(\\([^)]+\\))|0+
and this replacement string:
$1
In other words, you need to capture everything between parentheses first and then look for zeros. Use the replaceAll method.

There is no need to run a replacement on the string while you are still matching against it:
while (m.find()) {
version = version.replaceFirst(m.group(), m.group(1));
You can instead use this replacement:
version = version.replaceAll("(^|\\.)0+", "$1");

If you are trying to remove leading zeroes before a nonzero digit, then you can match such runs with this pattern: "(?<!\\d)0+(?=[1-9])". That even uses a zero-length lookahead, as your tags suggest you might have wanted to do. It would be simpler to use than yours, too, because it doesn't match anything you want to keep:
Pattern p = Pattern.compile("(?<!\\d)0+(?=[1-9])");
Matcher m = p.matcher(version);
version = m.replaceAll("");
If you're only going to do this once, then you can simplify to a one-liner:
version = version.replaceAll("(?<!\\d)0+(?=[1-9])", "");
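For reference, here is a minimal self-contained sketch of the lookaround approach above. The class name LeadingZeros and the method strip are made up for illustration; only the pattern comes from the answer. Note that a component consisting only of zeros is left alone, because the lookahead requires a nonzero digit after the run of zeros.

```java
import java.util.regex.Pattern;

public class LeadingZeros {
    // A run of zeros that is not preceded by a digit and is followed by a nonzero digit.
    private static final Pattern LEADING_ZEROS = Pattern.compile("(?<!\\d)0+(?=[1-9])");

    // Hypothetical helper: removes leading zeros from every number in the string.
    static String strip(String version) {
        return LEADING_ZEROS.matcher(version).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("007.004(100.007)")); // 7.4(100.7)
        System.out.println(strip("1.0.10"));           // 1.0.10 (the lone 0 is kept)
    }
}
```

Compiling the pattern once in a constant is only worthwhile if the method is called repeatedly; for a single call the one-liner above does the same thing.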

Related

Java Regex expression not working

I have a problem with a regex that is not working. I don't know what I am doing wrong. My code:
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
Pattern p = Pattern.compile("\\btimetable:(.*);");
//also tried "timetable:(.*);" and "(\\btimetable:)(.*)(;)"
Matcher m = p.matcher(test);
while(m.find()) {
System.out.println("S:" + m.start() + ", E:" + m.end());
System.out.println("x: "+ test.substring(m.start(), m.end()));
}
Expected result:
(1) "timetable:xxxxxtimetable:"
(2) "timetable: fullihhghtO"
Thanks for any help.
A non-capturing group could be handy in our case:
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
Pattern p = Pattern.compile("(?:\\btimetable:(.*?);)+"); // <-- here
Matcher m = p.matcher(test);
int i = 1;
while (m.find()) {
System.out.println(i + ") "+ m.group(1));
i++;
}
OUTPUT
1) xxxxxtimetable:
2) fullihhghtO
Regex explained:
(?:\\btimetable:(.*?);)+ wraps the whole expression in a non-capturing group (?:...), so "timetable:" is consumed without being captured, while the only capturing group, (.*?), captures what we want (everything between \btimetable: and ;). Pay special attention to the non-greedy quantifier .*?, which consumes the minimum possible number of characters up to the next ;. Without this lazy form, the regex would use the default "greedy" mode and consume all the characters up to the last ; in the string!
Now, all that is relevant if you wanted to catch only the unique part, but if you wanted to catch the whole thing:
1) timetable:xxxxxtimetable:;
2) timetable: fullihhghtO;
It can be done easily by modifying the line with the regex to:
Pattern p = Pattern.compile("\\b(timetable:.*?;)+");
which is even simpler: only one capturing group (see that we still have to use the non-greedy mode!).
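To see the greedy/lazy difference described above side by side, here is a small sketch. The class LazyVsGreedy and the helper captures are illustrative names, not part of the original answer; the patterns are the question's greedy one and its lazy counterpart.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // Hypothetical helper: collects group 1 of every match of regex in input.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
        // Greedy: .* runs to the last ';', so there is a single long match.
        System.out.println(captures("\\btimetable:(.*);", test));
        // Lazy: .*? stops at the first ';', so there are two matches.
        System.out.println(captures("\\btimetable:(.*?);", test));
    }
}
```

The second capture of the lazy version keeps its leading space (" fullihhghtO"); call trim() on it if that matters.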
You don't need to use regex; a simple split will do:
public static void main(String[] args) {
String test = "timetable:xxxxxtimetable:; timetable: fullihhghtO;";
String[] array = test.split(";");
String str1 = array[0].trim(); // timetable:xxxxxtimetable:
String str2 = array[1].trim(); // timetable: fullihhghtO
System.out.println(str1 + "\n" + str2);
}

Regex not capturing matching in expected groups

I have been working on a requirement for which I need to create a regex for the following string:
startDate:[2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]
There can be many variations of this string as follows:
startDate:[*;2016-10-12T12:23:23Z]
startDate:[2016-10-12T12:23:23Z;*]
startDate:[*;*]
startDate in the above expression is a key name which can be anything, like endDate, updateDate etc., which means we can't hardcode it in the expression. The key name can be accepted as any word, though: [a-zA-Z_0-9]*
I am using the following compiled pattern
Pattern.compile("([[a-zA-Z_0-9]*):(\\[[[\\*]|[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[Z]];[[\\*]|[0-9]{4}-[0-9]{2}-[0-9]{2}T[0-9]{2}:[0-9]{2}:[0-9]{2}[Z]]\\]])");
The pattern matches but the groups created are not what I expect. I want the group surrounded by parenthesis below:
(startDate):([*:2016-10-12T12:23:23Z])
group1 = "startDate"
group2 = "[*;2016-10-12T12:23:23Z]"
Could you please help me with correct expression in Java and groups?
You are using [ ] (which creates a character class) rather than ( ) to group the alternatives separated by |.
For example, the following code works for me:
Pattern pattern = Pattern.compile("(\\w+):(\\[(\\*|\\d{4}):\\*\\])");
Matcher matcher = pattern.matcher(text);
if (matcher.matches()) {
for (int i = 0; i < matcher.groupCount() + 1; i++) {
System.out.println(i + ":" + matcher.group(i));
}
} else {
System.out.println("no match");
}
To simplify things I just use the year but I'm sure it'll work with the full timestamp string.
This expression captures more than you need in groups but you can make them 'non-capturing' using the (?: ) construct.
Notice in this that I simplified some of your regexp using the predefined character classes. See http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html for more details.
Here is a solution which uses your original regex, modified so that it actually returns the groups you want:
String content = "startDate:[2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]";
Pattern pattern = Pattern.compile("([a-zA-Z_0-9]*):(\\[(?:\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z|\\*):(?:\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z|\\*)\\])");
Matcher matcher = pattern.matcher(content);
// remember to call find() at least once before trying to access groups
matcher.find();
System.out.println("group1 = " + matcher.group(1));
System.out.println("group2 = " + matcher.group(2));
Output:
group1 = startDate
group2 = [2016-10-12T12:23:23Z:2016-10-12T12:23:23Z]
This code has been tested on IntelliJ and appears to be working correctly.
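As a hedged variation on the answers above: the variations listed in the question separate the two bounds with ';' rather than ':'. Assuming ';' is the intended separator, a sketch could look like the following. RangeParser, BOUND and parse are names invented for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RangeParser {
    // One bound: either "*" or a timestamp such as 2016-10-12T12:23:23Z.
    private static final String BOUND =
            "(?:\\*|\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}Z)";

    // Group 1: key name; group 2: bracketed range. Assumption: bounds are separated by ';'.
    private static final Pattern RANGE =
            Pattern.compile("(\\w+):(\\[" + BOUND + ";" + BOUND + "\\])");

    // Hypothetical helper: returns {key, range}, or null if the input does not match.
    static String[] parse(String input) {
        Matcher m = RANGE.matcher(input);
        return m.matches() ? new String[] {m.group(1), m.group(2)} : null;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("startDate:[*;2016-10-12T12:23:23Z]")));
        System.out.println(Arrays.toString(parse("endDate:[*;*]")));
        System.out.println(Arrays.toString(parse("startDate:[2016;*]"))); // null
    }
}
```

The bounds use non-capturing groups (?: ) so that only the key and the whole bracketed range end up in groups 1 and 2. If your real input uses ':' between the bounds, change the separator in RANGE accordingly.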

Regex to get value between two colon excluding the colons

I have a string like this:
something:POST:/some/path
Now I want to take the POST alone from the string. I did this by using this regex
:([a-zA-Z]+):
But this gives me a value along with colons. ie I get this:
:POST:
but I need this
POST
My code to match the same and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
System.out.println(matcher.group());
ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex since I did not want to use replace with colons such as :*:. This is my final solution.
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
s = s.replaceFirst(matcher.group(), "*");
System.out.println("replaced: " + s);
}
else {
System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
As pointed out, the regex captured group can be used for the replacement.
The following code did it:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
ss = ss.replaceFirst(matcher.group(1), "*");
}
System.out.println(ss);
UPDATE
Looking at your update, you just need replaceFirst:
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
See the Java demo
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each location inside the string for a : before it, and once found, tries to match 1+ ASCII letters and then asserts that there is a : after them. With :[A-Za-z]+:, the checking only starts after the regex engine has found a : character. Then, after matching :POST:, the replacement pattern replaces the whole match. It is totally OK to hardcode colons in the replacement pattern since they are hardcoded in the regex pattern.
Original answer
You just need to access Group 1:
if (matcher.find()) {
System.out.println(matcher.group(1));
}
See Java demo
Your :([a-zA-Z]+): regex contains a capturing group (see (....) subpattern). These groups are numbered automatically: the first one has an index of 1, the second has the index of 2, etc.
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
See another demo
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
Which I believe is what you are looking for...
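One more option, sketched under the assumption that you want to replace exactly the captured text and nothing else: use the group's offsets from Matcher.start(1) and Matcher.end(1). This avoids passing the captured text back into replaceFirst(), which treats it as a regex and replaces its first occurrence anywhere in the string. ReplaceGroup and replaceFirstGroup are illustrative names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceGroup {
    private static final Pattern BETWEEN_COLONS = Pattern.compile(":([a-zA-Z]+):");

    // Hypothetical helper: replaces only the text captured by group 1 of the first match.
    static String replaceFirstGroup(String input, String replacement) {
        Matcher m = BETWEEN_COLONS.matcher(input);
        if (!m.find()) {
            return input; // no match, nothing to replace
        }
        // start(1)/end(1) are the offsets of group 1 inside the input.
        return input.substring(0, m.start(1)) + replacement + input.substring(m.end(1));
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstGroup("something:POST:/some/path/", "*")); // something:*:/some/path/
        System.out.println(replaceFirstGroup("POSTER:POST:/x", "*"));             // POSTER:*:/x
    }
}
```

In the second call, replaceFirst(matcher.group(1), "*") would have produced *ER:POST:/x instead, because "POST" first occurs inside "POSTER".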

Retrieving Regex matched pattern

I need to retrieve the strings that match a regex pattern from a given input.
Let's say the pattern I need to match looks like this:
"http://mysite.com/<somerandomvalues>/images/<againsomerandomvalues>.jpg"
Now I created the following regex pattern for this,
http:\/\/.*\.mysite\.com\/.*\/images\/.*\.jpg
Can anybody illustrate how to retrieve all the matches of this regex using Java?
You don't need to escape slashes, only literal dots:
String regex = "http://(.*)\\.mysite\\.com/(.*)/images/(.*)\\.jpg";
String url = "http://www.mysite.com/work/images/cat.jpg";
Pattern pattern = Pattern.compile (regex);
Matcher matcher = pattern.matcher (url);
if (matcher.matches ())
{
int n = matcher.groupCount ();
for (int i = 0; i <= n; ++i)
System.out.println (matcher.group (i));
}
Result (group 0 is the whole match):
http://www.mysite.com/work/images/cat.jpg
www
work
cat
Some simple Java example:
String my_regex = "http://.*\\.mysite\\.com/.*/images/.*\\.jpg";
Pattern pattern = Pattern.compile(my_regex);
Matcher matcher = pattern.matcher(string_to_be_matched);
// Check all occurrences
while (matcher.find()) {
System.out.print("Start index: " + matcher.start());
System.out.print(" End index: " + matcher.end() + " ");
System.out.println(matcher.group());
}
In fact, it is not clear if you want the whole matching string or only the groups.
Bogdan Emil Mariesan's answer can be reduced to
if ( matcher.matches () ) System.out.println(string_to_be_matched);
because you know it matched and there are no groups.
IMHO, user unknown's answer is correct if you want to get the matched groups.
I just want to add (for others) that if you need a single matched group you can use the replaceFirst() method too, as long as the pattern covers the whole string:
String firstGroup = string.replaceFirst("http://mysite\\.com/(.*)/images/.*", "$1");
However, the Pattern.compile approach performs better if there are two or more groups or if you need to do this multiple times (on the other hand, in programming contests, for example, replaceFirst() is faster to write).
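If the URLs are embedded in a longer text, a greedy .* in the pattern can swallow everything from the first URL to the last one. A possible sketch, assuming the random parts contain neither slashes nor whitespace (that is my assumption, not something the question states), with ImageUrls and findAll as made-up names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ImageUrls {
    // Assumption: the variable parts contain no '/' and no whitespace.
    private static final Pattern IMAGE_URL = Pattern.compile(
            "http://([^/\\s]+)\\.mysite\\.com/([^/\\s]+)/images/([^/\\s]+)\\.jpg");

    // Hypothetical helper: returns every matching URL found in the text.
    static List<String> findAll(String text) {
        List<String> urls = new ArrayList<>();
        Matcher m = IMAGE_URL.matcher(text);
        while (m.find()) {
            urls.add(m.group()); // group() is the whole match; group(1..3) are the parts
        }
        return urls;
    }

    public static void main(String[] args) {
        String text = "see http://www.mysite.com/work/images/cat.jpg"
                + " and http://cdn.mysite.com/a1/images/dog.jpg";
        System.out.println(findAll(text));
    }
}
```

Use matches() when the whole input must be one URL, and find() in a loop, as here, when you are scanning for several.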

Java Regex: how to capture multiple matches in the same line

I am trying to match a regex pattern in Java, and I have two questions:
Inside the pattern I'm looking for there is a known beginning and then an unknown string that I want to get up until the first occurrence of an &.
there are multiple occurrences of these patterns in the line and I would like to get each occurrence separately.
For example I have this input line:
1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate%7C120HZ&sName=View+All&subCatView=true 0 2819357575609397706
And I am interested in these strings:
Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.
Screen+Refresh+Rate%7C120HZ
Assuming the known beginning is filter=**, the regular expression pattern (?:filter=\\*\\*)(.*?)(?:&) should get you what you need. Use Matcher.find() to get all occurrences of the pattern in a given string. Using the test string you provided, the following:
final Pattern p = Pattern.compile("(?:filter=\\*\\*)(.*?)(?:&)");
final Matcher m = p.matcher(testString);
int cnt = 0;
while (m.find()) {
System.out.println(++cnt + ": G1: " + m.group(1));
}
Will output:
1: G1: Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.
2: G1: Screen+Refresh+Rate%7C120HZ**
If I knew that I might need other query parameters in the future, I think it would be more prudent to decode and parse the URL.
String url = URLDecoder.decode("http://www.gold.com/shc/s/c_10153_12605_" +
"Computers+%26+Electronics_Televisions?filter=Screen+Refresh+Rate" +
"%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All&viewItems=25&subCatView=true"
,"utf-8");
Pattern amp = Pattern.compile("&");
Pattern eq = Pattern.compile("=");
Map<String, String> params = new HashMap<String, String>();
String queryString = url.substring(url.indexOf('?') + 1);
for(String param : amp.split(queryString)) {
String[] pair = eq.split(param);
params.put(pair[0], pair[1]);
}
for(Entry<String, String> param : params.entrySet()) {
System.out.format("%s = %s\n", param.getKey(), param.getValue());
}
Output
subCatView = true
viewItems = 25
sName = View All
filter = Screen Refresh Rate|120HZ^Screen Size|37 in. to 42 in.
In your example, there is sometimes a "**" at the end before the "&", but basically (assuming "filter=" is the start pattern you are looking for) you want something like:
"filter=([^&]+)&"
Using the regular expression (?<=filter=\*{0,2})[^&]*[^&*]+ in java:
Pattern p = Pattern.compile("(?<=filter=\\*{0,2})[^&]*[^&*]+");
String s = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0 2819357575609397706";
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
EDIT:
Added [^&*]+ to the end of the regex to prevent the ** from being included in the second match.
EDIT2:
Changed regular expression to use lookbehind.
The regex you're looking for is
Screen\+Refresh\+Rate[^&]*
You could use Matcher.find() to find all matches.
Are you looking for a string that follows "filter=", ignores any leading "*", and ends at the first "&"?
You can try the following:
String str = "1234567 100,110,116,129,139,140,144,146 http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size%7C37+in.+to+42+in.&sName=View+All**&viewItems=25&subCatView=true ISx20070515x00001a http://www.gold.com/shc/s/c_10153_12605_Computers+%26+Electronics_Televisions?filter=**Screen+Refresh+Rate%7C120HZ**&sName=View+All&subCatView=true 0 2819357575609397706";
Pattern p = Pattern.compile("filter=(?:\\**)([^&]+?)(?:\\**)&");
Matcher matcher = p.matcher(str);
while(matcher.find()){
System.out.println(matcher.group(1));
}
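The answers above can be condensed into one small sketch that works whether or not the value is wrapped in "**". It assumes the value itself never contains a literal '*' or '&'; FilterValues and extract are names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FilterValues {
    // "filter=", optional leading asterisks, then everything up to the next '&' or '*'.
    private static final Pattern FILTER = Pattern.compile("filter=\\**([^&*]+)");

    // Hypothetical helper: returns the value of every filter= parameter in the line.
    static List<String> extract(String line) {
        List<String> values = new ArrayList<>();
        Matcher m = FILTER.matcher(line);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "http://www.gold.com/a?filter=**Screen+Refresh+Rate%7C120HZ%5EScreen+Size&sName=View+All"
                + " http://www.gold.com/b?filter=Screen+Refresh+Rate%7C120HZ**&subCatView=true";
        for (String value : extract(line)) {
            System.out.println(value);
        }
    }
}
```

Because the character class excludes both '&' and '*', no lookbehind or lazy quantifier is needed; each call to find() yields one occurrence.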
