Removing repeated characters in String - java

I have strings like "aaaabbbccccaaddddcfggghhhh" and I want to remove the repeated characters to get a string like "abcadcfgh".
A simplistic implementation for this would be:
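StringBuilder str2 = new StringBuilder();
char prevChar = 0; // sentinel; assumes the input does not start with '\u0000'
for (char c : str.toCharArray()) {
    // compare primitive chars; != on boxed Character objects compares references
    if (c != prevChar) {
        str2.append(c);
        prevChar = c;
    }
}
return str2.toString();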
Is it possible to have a better implementation, maybe using regex?

You can do this:
"aaaabbbccccaaddddcfggghhhh".replaceAll("(.)\\1+","$1");
The regex uses a capturing group and a backreference.
The plain regex is (.)\1+, but in a Java string literal you have to escape the backslash with another backslash.
If you want the number of repeated characters that were removed:
String test = "aaaabbbccccaaddddcfggghhhh";
System.out.println(test.length() - test.replaceAll("(.)\\1+","$1").length());
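The two calls above can be wrapped in a small helper. This is only a minimal sketch: the class name RunCollapser and the methods collapse and removedCount are made up for this illustration, and the pattern is precompiled on the assumption that it will be reused.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
final class RunCollapser {
    // (.) captures one character; \1+ matches one or more repeats of it.
    private static final Pattern RUN = Pattern.compile("(.)\\1+");

    // Replaces every run of a repeated character with a single copy.
    static String collapse(String input) {
        return RUN.matcher(input).replaceAll("$1");
    }

    // Number of characters dropped by collapse().
    static int removedCount(String input) {
        return input.length() - collapse(input).length();
    }

    public static void main(String[] args) {
        String test = "aaaabbbccccaaddddcfggghhhh";
        System.out.println(collapse(test));     // abcadcfgh
        System.out.println(removedCount(test)); // 17
    }
}
```

Note that . does not match line terminators by default, so runs of repeated newlines are left alone by this pattern.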

With regex, you can replace (.)\1+ with the replacement string $1.

You can use Java's String.replaceAll() method to simply do this with a regular expression.
String s = "aaaabbbccccaaddddcfggghhhh";
System.out.println(s.replaceAll("(.)\\1{1,}", "$1")); //=> "abcadcfgh"
Regular expression
( group and capture to \1:
. any character except \n
) end of \1
\1{1,} what was matched by capture \1 (at least 1 time)

Use the pattern /(.)(?=\1)/g and replace with nothing.
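In Java the same lookahead idea might look like the sketch below. The class and method names are hypothetical; the /g flag has no direct counterpart because replaceAll already replaces every match.

```java
// Hypothetical helper illustrating the lookahead variant.
final class LookaheadDedupe {
    // Deletes every character that is immediately followed by a copy of itself,
    // so only the last character of each run survives.
    static String collapse(String input) {
        return input.replaceAll("(.)(?=\\1)", "");
    }

    public static void main(String[] args) {
        System.out.println(collapse("aaaabbbccccaaddddcfggghhhh")); // abcadcfgh
    }
}
```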

Related

How to extract and replace a String with specific format?

I have an input String like:
(rm01ADS21212, 'adfffddd', rmAdssssss, '1231232131', rm2321312322)
What I want to do is find all words starting with "rm" and wrap the rest of each word in a remove function call:
(remove(01ADS21212), 'adfffddd', remove(Adssssss), '1231232131', remove(2321312322))
I am trying to use the replaceAll function, but I don't know how to extract the parts after the "rm" literal.
statement.replaceAll("\\(rm*.,", "remove($1)");
Is there any way to get these parts?
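You have not captured any substring with a capturing group, so there is no Group 1 for $1 to refer to (in Java, a replacement that references a nonexistent group throws an IndexOutOfBoundsException once the pattern finds a match).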
You may use
.replaceAll("\\brm(\\w*)", "remove($1)")
Details
\b - a word boundary (to start matching only at the start of a word)
rm - a literal part
(\w*) - Group 1: 0+ word chars (letters, digits or underscores)
The $1 in the replacement pattern stands for Group 1 value.
If you mean to match any chars other than a comma and whitespace after rm, use "\\brm([^\\s,]*)".
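Applied to the input from the question, the \brm(\w*) version might be used as in this sketch. The class name RmRewriter and the method wrapRmWords are invented for the example.

```java
// Hypothetical wrapper around the replaceAll call suggested above.
final class RmRewriter {
    // \b anchors at a word start, rm is literal, (\w*) captures the rest of the word.
    // In the replacement, $1 is the captured text; the parentheses are plain characters.
    static String wrapRmWords(String statement) {
        return statement.replaceAll("\\brm(\\w*)", "remove($1)");
    }

    public static void main(String[] args) {
        String in = "(rm01ADS21212, 'adfffddd', rmAdssssss, '1231232131', rm2321312322)";
        System.out.println(wrapRmWords(in));
        // (remove(01ADS21212), 'adfffddd', remove(Adssssss), '1231232131', remove(2321312322))
    }
}
```

Because of the word boundary, a word that merely contains "rm" (for example "arm") is left untouched.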
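Use Replace with an empty string (this example is C#, and it only strips the "rm" prefix rather than wrapping the value in remove(...)).
E.g.: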
string str = "(rm01ADS21212, 'adfffddd', rmAdssssss, '1231232131', rm2321312322)";
Console.WriteLine(str.Replace("rm", ""));
Output : (01ADS21212, 'adfffddd', Adssssss, '1231232131', 2321312322)

Erase any string that doesn't match a pattern using replaceall()

I need to replace ALL characters that don't follow a pattern with "".
I have strings like:
MCC-QX-1081
TEF-CO-QX-4949
SPARE-QX-4500
So far, the closest I have come is the following regex:
String regex = "[^QX,-,\\d]";
Using the String replaceAll method I get QX1081, but the expected result is QX-1081.
You're using a character class which matches single characters, not patterns.
You want something like
String resultString = subjectString.replaceAll("^.*?(QX-\\d+)?$", "$1");
which works as long as nothing follows the QX-digits part in your strings.
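Put the dash at the end of the character class: [^QX,\d-]
Next you just have to use substring to drop the leading dash (a string with more than one prefix segment, such as TEF-CO-QX-4949, leaves more than one leading dash).
I don't know exactly what you expect for all strings, but if you want to match a literal dash in a character class, it must be placed first or last, or be escaped.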
You are using a character class where you have to either escape the hyphen or put it at the start or at the end like [^QX,\d-] or else you are matching a range from a comma to a comma. But changing that will give you -QX-1081 which is not the desired result.
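The effect of the hyphen position can be checked with a short sketch like the following. The class name HyphenInClass and the helper strip are made up here; the regexes are the ones discussed above, plus an escaped-hyphen variant.

```java
// Hypothetical demo of where the hyphen sits inside a negated character class.
final class HyphenInClass {
    // Removes every character matched by the given regex.
    static String strip(String s, String regex) {
        return s.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String s = "MCC-QX-1081";
        // ",-," is a range from comma to comma, so '-' itself is not kept.
        System.out.println(strip(s, "[^QX,-,\\d]"));  // QX1081
        // Hyphen last: it is a literal and survives.
        System.out.println(strip(s, "[^QX,\\d-]"));   // -QX-1081
        // Escaped hyphen: same effect as placing it last.
        System.out.println(strip(s, "[^QX,\\-\\d]")); // -QX-1081
    }
}
```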
You could match your pattern and then replace with the first capturing group $1:
^(?:[A-Z]+-)+(QX-\d+)$
In a Java string literal you have to double the backslash when matching a digit: \\d
That will match:
^ Start of the string
(?:[A-Z]+-)+ Repeat 1+ times one or more uppercase characters followed by a hyphen
(QX-\d+) Capture in a group QX- followed by 1+ digits
$ End of the string
For example:
String result = "MCC-QX-1081".replaceAll("^(?:[A-Z]+-)+(QX-\\d+)$", "$1");
System.out.println(result); // QX-1081
Note that if you are doing just 1 replacement, you could also use replaceFirst
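As a rough sketch of the replaceFirst variant, run against the three sample strings from the question (the class name QxExtractor and the method extract are hypothetical):

```java
// Hypothetical helper built on the anchored pattern shown above.
final class QxExtractor {
    // Keeps only the trailing QX-<digits> part; input that does not match
    // the pattern is returned unchanged.
    static String extract(String code) {
        return code.replaceFirst("^(?:[A-Z]+-)+(QX-\\d+)$", "$1");
    }

    public static void main(String[] args) {
        String[] samples = { "MCC-QX-1081", "TEF-CO-QX-4949", "SPARE-QX-4500" };
        for (String sample : samples) {
            System.out.println(extract(sample));
        }
        // QX-1081
        // QX-4949
        // QX-4500
    }
}
```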

regex for '(number)'

I need to remove all the single quotes (') around numbers. For example:
'1' to 1
'100' to 100
What is the best way to do this? Is there a regex for this that I can use with the replace() function of the String class?
You can use replaceAll method with regex support:
str = str.replaceAll("'(\\d+)'", "$1");
(\\d+) will match and group digits surrounded by single quotes on either side and then we use $1 in replacement which is the back-reference to captured value in regex.
If it's in a String and you want the integer, why don't you just parse it?
int a = Integer.parseInt("100");
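The two suggestions can be combined: strip the quotes first, then parse. The sketch below assumes a made-up class QuoteStripper with a helper unquoteNumbers; note that Integer.parseInt would throw a NumberFormatException if the quotes were still present.

```java
// Hypothetical helper combining the regex and the parseInt suggestions.
final class QuoteStripper {
    // Drops single quotes only when they directly surround a run of digits.
    static String unquoteNumbers(String s) {
        return s.replaceAll("'(\\d+)'", "$1");
    }

    public static void main(String[] args) {
        System.out.println(unquoteNumbers("'1' and '100' but 'abc'")); // 1 and 100 but 'abc'
        int a = Integer.parseInt(unquoteNumbers("'100'"));
        System.out.println(a + 1); // 101
    }
}
```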

what is missing in my java regex?

I want to fetch
http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png
from
url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
I have tried this code:
String a = "";
Pattern pattern = Pattern.compile("url(.*)");
Matcher matcher = pattern.matcher(imgpath);
if (matcher.find()) {
a = (matcher.group(1));
}
return a;
but a == (http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_639_o_4746_precious_image_1419867529.png)
How can I fine-tune it?
Why use a regular expression to begin with?
Given
final String s = "url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)";
If the string is always the same format a simple substring(4,s.length()-1) would be better.
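A minimal sketch of the substring approach follows. The startsWith/endsWith guard is an addition of this example, not something the answer requires, and the class name UrlUnwrapper and the example.com URL are placeholders.

```java
// Hypothetical helper for the fixed-format case.
final class UrlUnwrapper {
    // "url(" is 4 characters long, and the final ")" is dropped.
    static String unwrap(String s) {
        if (s.startsWith("url(") && s.endsWith(")")) {
            return s.substring(4, s.length() - 1);
        }
        return s; // assumption: leave unexpected input untouched
    }

    public static void main(String[] args) {
        System.out.println(unwrap("url(http://example.com/a.png)")); // http://example.com/a.png
    }
}
```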
That said, if you insist on a regular expression:
You have to escape the ( as \(, and because the backslash itself must be escaped in a Java string literal, it becomes \\(; the same goes for the ).
Then you can capture the group with url\\((.+)\\).
A regex tester such as RegEx101.com will point out errors like this immediately.
As you already seem to know, ( and ) represent groups, which means that in the regex
url(.*)
(.*) will place everything after url in group 1, which in case of
url(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
will be
(http://d1oiazdc2hzjcz.cloudfront.net/promotions/precious/2x/p_608_o_6288_precious_image_1419866866.png)
If you want to exclude ( and ) from the match, you need to add their literals to the regex, which means you need to escape them. There are several ways to do it, like adding \ before each of them, or surrounding them with [ ].
Another problem with your regex is that .* finds the longest possible match, and since . represents any character (except line separators), it can also include ( and ). To solve this, you can make the * quantifier reluctant by adding ? after it, so your final regex can be written as the string
"url\\((.*?)\\)"
---------------
url
\\( - ( literal
(.*?) - group 1
\\) - ) literal
or, instead of ., you can use a character class that accepts every character except ), like
"url\\(([^)]*)\\)"
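The difference between the greedy, reluctant, and negated-class versions only shows up when the input contains more than one closing parenthesis. A small sketch, using a made-up input string and a hypothetical helper firstGroup:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo comparing the three quantifier choices.
final class UrlGroupDemo {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String css = "url(a.png) and url(b.png)"; // made-up input
        System.out.println(firstGroup("url\\((.*)\\)", css));    // a.png) and url(b.png
        System.out.println(firstGroup("url\\((.*?)\\)", css));   // a.png
        System.out.println(firstGroup("url\\(([^)]*)\\)", css)); // a.png
    }
}
```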
Try this regex:
url\((.*?)\)
The outermost parentheses are escaped so they will be matched literally. The inner parentheses are for capturing a group. The question mark after the .* is to make the match lazy, so the first closing parenthesis found will end the group.
Note that to use this regex in Java, you'll have to additionally escape the backslashes in order to express the above regex as a string literal:
String regex = "url\\((.*?)\\)";
You need to escape the ( and ) to match the parentheses in the string, and then add another set of ( ) around the part you want to pull out in group 1, the actual URL. I also changed the part inside the parentheses to [^)]*, which matches everything up to the next ). See below:
url\(([^)]*)\)

Regular Expression for matching parentheses

What is the regular expression for matching '(' in a string?
Following is the scenario :
I have a string
str = "abc(efg)";
I want to split the string at '(' using a regular expression. For that I am using
Arrays.asList(Pattern.compile("/(").split(str))
But i am getting the following exception.
java.util.regex.PatternSyntaxException: Unclosed group near index 2
/(
Escaping '(' doesn't seem to work.
Two options:
Firstly, you can escape it using a backslash -- \( (written as "\\(" in a Java string literal; the forward slash in "/(" is not an escape character)
Alternatively, since it's a single character, you can put it in a character class, where it doesn't need to be escaped -- [(]
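Both options can be tried against the string from the question. In this sketch the class name SplitAtParen and the helper splitAt are invented for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo of the two ways to match a literal '('.
final class SplitAtParen {
    static List<String> splitAt(String regex, String str) {
        return Arrays.asList(Pattern.compile(regex).split(str));
    }

    public static void main(String[] args) {
        String str = "abc(efg)";
        System.out.println(splitAt("\\(", str)); // [abc, efg)]
        System.out.println(splitAt("[(]", str)); // [abc, efg)]
    }
}
```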
The solution is a regex pattern that matches opening and closing parentheses:
String str = "Your(String)";
// the argument to split is a character class that matches either parenthesis;
// inside "[ ]" the escapes are optional, so "[()]" works as well as "[\\(\\)]"
String[] parts = str.split("[\\(\\)]");
for (String part : parts) {
// prints "Your" on the first iteration and "String" on the second
System.out.println(part);
}
Writing in Java 8's style, this can be solved in this way:
Arrays.asList("Your(String)".split("[\\(\\)]"))
.forEach(System.out::println);
I hope it is clear.
You can escape any meta-character by using a backslash, so you can match ( with the pattern \(.
Many languages come with a built-in escaping function, for example, .Net's Regex.Escape or Java's Pattern.quote
Some flavors support \Q and \E, with literal text between them.
Some flavors (VIM, for example) match ( literally, and require \( for capturing groups.
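Java is one of the flavors that supports both Pattern.quote and \Q...\E. The sketch below illustrates them with a made-up class LiteralQuoting and helper matchesLiterally.

```java
import java.util.regex.Pattern;

// Hypothetical demo of literal quoting in Java regexes.
final class LiteralQuoting {
    // True when input is exactly the literal text, with no regex interpretation.
    static boolean matchesLiterally(String literal, String input) {
        return Pattern.matches(Pattern.quote(literal), input);
    }

    public static void main(String[] args) {
        // Unquoted, + is a quantifier, so the text does not match itself.
        System.out.println(Pattern.matches("1+1=2", "1+1=2"));  // false
        System.out.println(matchesLiterally("1+1=2", "1+1=2")); // true
        // \Q...\E quotes the parenthesis inside a handwritten pattern.
        System.out.println("abc(efg)".split("\\Q(\\E")[1]);     // efg)
    }
}
```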
See also: Regular Expression Basic Syntax Reference
Escape any special character with a backslash (\).
So, to match an opening parenthesis: /\(/
Because ( is special in regex, you should escape it \( when matching. However, depending on what language you are using, you can easily match ( with string methods like index() or other methods that enable you to find at what position the ( is in. Sometimes, there's no need to use regex.
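In Java the plain string method for this is indexOf. A regex-free version of the split might look like the following sketch; the class name ParenIndex and the method splitAtFirstParen are hypothetical.

```java
// Hypothetical regex-free alternative using String.indexOf.
final class ParenIndex {
    // Splits around the first '('; returns the whole string as a single
    // element when there is no '(' at all.
    static String[] splitAtFirstParen(String str) {
        int i = str.indexOf('(');
        if (i < 0) {
            return new String[] { str };
        }
        return new String[] { str.substring(0, i), str.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] parts = splitAtFirstParen("abc(efg)");
        System.out.println(parts[0]); // abc
        System.out.println(parts[1]); // efg)
    }
}
```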
