This question already has answers here:
Removing duplicates from a String in Java
(50 answers)
Closed 7 years ago.
If I have a string "SSSAAADDDCCC", how would I print just "SADC"? Can it be done using substring() or would I have to use charAt()?
There is a simple way to do this. However, since I don't see any code or any effort on your part, I will not just give you the answer. Below is some pseudocode you can work from to find the right answer. Good luck!
i = 0
print myString.charAt(0)   // as per comments, cover the base case
while (i + 1 < myString.length())
    if myString.charAt(i) != myString.charAt(i + 1)
        print myString.charAt(i + 1)
    i++
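For reference, the pseudocode above could be turned into Java roughly as follows. This is only a sketch: the class name RunCollapser and the method name collapseRuns are made up for illustration, and it builds the result in a StringBuilder instead of printing character by character.

```java
public class RunCollapser {
    // Collapses each run of identical adjacent characters to a single character.
    static String collapseRuns(String s) {
        if (s.isEmpty()) {
            return s; // nothing to collapse
        }
        StringBuilder out = new StringBuilder();
        out.append(s.charAt(0)); // base case: the first character is always kept
        for (int i = 1; i < s.length(); i++) {
            // keep a character only when it differs from the one before it
            if (s.charAt(i) != s.charAt(i - 1)) {
                out.append(s.charAt(i));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(collapseRuns("SSSAAADDDCCC")); // prints SADC
    }
}
```

Note that this only removes adjacent repeats, so "SAS" stays "SAS".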
Use regular expression to replace all repeating characters with a single character:
"SSSAAADDDCCC".replaceAll("(.)\\1+", "$1") // returns "SADC"
(.) matches and captures a character.
\\1+ matches one or more instances of the captured character.
$1 replaces the entire matched value with the captured character.
Non-repeating characters are not matched, and are therefore left alone.
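A small sketch to check the behaviour described above. The class and method names are hypothetical; the regex is the one from this answer.

```java
public class RegexCollapse {
    // Replaces every run of two or more identical characters with one copy.
    static String collapse(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(collapse("SSSAAADDDCCC")); // prints SADC
        System.out.println(collapse("SASSA"));        // prints SASA
        System.out.println(collapse("ABC"));          // prints ABC
    }
}
```

The second call shows that only adjacent repeats are collapsed; a character that appears again later in the string is kept.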
If you don't like the charAt method you could use substrings like this:
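String in = "sssdddaaaccc";
String out = "";
int j = 0;
while (j < in.length()) {
    // take the first character of the current run
    String current = in.substring(j, j + 1);
    out = out + current;
    // skip the remaining characters of the same run
    while (j < in.length() && in.substring(j, j + 1).equals(current)) {
        j++;
    }
}
System.out.println(out); // prints "sdac"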
This question already has answers here:
String.replaceAll() is not working for some strings
(6 answers)
Closed 2 years ago.
I am using the string.replaceFirst() method in order to replace the first instance of <text> with another string. I used the indexOf method to search for both brackets, and then the replaceFirst method. It works perfectly if text is replaced with any string with an alphanumeric character at the end, but fails to replace when I do something like <some string$>. For reference, the method is
public static String substituteWord(String original, String word) {
int index1 = original.indexOf("<");
int index2 = original.indexOf(">");
storyLine = original.replaceFirst(original.substring(index1,index2+1), word);
return original;
}
The code doesn't look broken, but why does using a dollar sign make this method fail?
Strictly speaking, the first argument to replaceFirst() and replaceAll() is a regular expression, in which an unescaped dollar sign is an end-of-input anchor rather than a literal character. A dollar sign in the replacement string is special too: followed by a number, it means 'group x that was matched against (captured by) the regular expression'.
So the solution is to wrap the first argument in Pattern.quote() and the second argument in Matcher.quoteReplacement() to avoid this special behaviour:
String strToReplace = original.substring(index1,index2+1);
storyLine = original.replaceFirst(Pattern.quote(strToReplace), Matcher.quoteReplacement(word));
To see when you would want the special behaviour of the dollar sign, consider this:
str = str.replaceAll("<b>([^<]*)</b>", "<i>$1</i>");
This would take a piece of bold text from some HTML and replace it with 'whatever was inside the bold tags, but in italics instead'. The parentheses () in the regular expression mean 'capture this substring' as group 1, and then the $1 means 'replace with whatever was captured as group 1'.
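Putting both points together, here is a self-contained sketch. The class name PlaceholderDemo and the method boldToItalic are made up for illustration, and substituteWord is a reworked version of the method from the question that returns the result directly instead of assigning to storyLine.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PlaceholderDemo {
    // Replaces the first <...> placeholder with 'word', treating both as literal text.
    static String substituteWord(String original, String word) {
        int open = original.indexOf('<');
        int close = original.indexOf('>', open + 1);
        if (open < 0 || close < 0) {
            return original; // no placeholder found
        }
        String placeholder = original.substring(open, close + 1);
        return original.replaceFirst(Pattern.quote(placeholder),
                                     Matcher.quoteReplacement(word));
    }

    // Here the special meaning of $1 is wanted: it re-inserts the captured text.
    static String boldToItalic(String html) {
        return html.replaceAll("<b>([^<]*)</b>", "<i>$1</i>");
    }

    public static void main(String[] args) {
        System.out.println(substituteWord("Pay <amount$> now", "$5")); // prints Pay $5 now
        System.out.println(boldToItalic("<b>hi</b> there"));           // prints <i>hi</i> there
    }
}
```

Without the quoting, the $ inside the placeholder is read as an anchor, so the pattern never matches and the string comes back unchanged.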
This question already has answers here:
regexp dot with square brackets or not
(2 answers)
Closed 5 years ago.
I use a String where I need to get rid of any occurrence of: < any string here >
I tried line = line.replaceAll("<[.]+>", "");
but it gives the same String... How can I delete the < any string between > substrings?
As per my original comment...
Brief
Your regex <[.]+> says to match <, followed by the dot character . (literally) one or more times, followed by >
Removing [] gives <.+>, which is only partly right. The quantifier is greedy, so it will actually replace everything from the first occurrence of < to the last occurrence of > in the entire string.
What you want is to either make the quantifier lazy or use a character class to ensure it's not going past the ending character.
Code
Method 1 - Lazy Quantifier
This method .*? matches any character any number of times, but as few as possible
<.*?>
Method 2 - Character Set
This method [^>]* matches any character except > any number of times
<[^>]*>
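Note: This method typically performs better than the first, since it consumes the tag contents in one pass instead of testing for > after every character.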
line = line.replaceAll("<[^>]*>", "");
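To compare the variants discussed above side by side, here is a small sketch. The class and method names are hypothetical; the patterns are the ones from this thread.

```java
public class TagStripDemo {
    // The pattern from the question: [.] only matches literal dots.
    static String stripLiteralDots(String s) {
        return s.replaceAll("<[.]+>", "");
    }

    // Greedy: runs from the first < to the last > in the string.
    static String stripGreedy(String s) {
        return s.replaceAll("<.+>", "");
    }

    // Lazy quantifier: stops at the first > after each <.
    static String stripLazy(String s) {
        return s.replaceAll("<.*?>", "");
    }

    // Negated character class: cannot run past a >.
    static String stripNegated(String s) {
        return s.replaceAll("<[^>]*>", "");
    }

    public static void main(String[] args) {
        String line = "a <b>bold</b> c";
        System.out.println(stripLiteralDots(line)); // prints a <b>bold</b> c
        System.out.println(stripGreedy(line));      // prints "a  c" (the word bold is lost)
        System.out.println(stripLazy(line));        // prints a bold c
        System.out.println(stripNegated(line));     // prints a bold c
    }
}
```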
This question already has answers here:
Removing repeated characters in String
(4 answers)
Closed 8 years ago.
Let's say I have a string:
tttteeeeeeessssssttttttt
Using the power of regex, how can that string be turned into:
test
At first glance it seems easy to do, but the current code (not regex) I have for it is not behaving well, and I'm pretty sure regex is the way to go.
You can use:
str = str.replaceAll("([A-Za-z])\\1+", "$1");
Use the String.replaceAll function. Strings are immutable, so assign the result:
strng = strng.replaceAll("(.)\\1+", "$1");
The above regex captures the first character in a sequence of identical characters and then matches the one or more characters that follow it (which must be the same as the one inside the capturing group). Replacing the whole match with the character in group 1 gives you the desired output.
Example:
System.out.println("tttteeeeeeessssssttttttt".replaceAll("(.)\\1+","$1" ));
Output:
test
(.)(?=\1)
Try this. Replace each match with an empty string. See the demo:
https://regex101.com/r/tX2bH4/41
str = str.replaceAll("(.)(?=\\1)", "");
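The three suggestions in this thread can be compared with a short sketch. The class and method names are made up; the patterns are copied from the answers above.

```java
public class DuplicateRegexDemo {
    // Backreference version restricted to ASCII letters.
    static String collapseLetters(String s) {
        return s.replaceAll("([A-Za-z])\\1+", "$1");
    }

    // Backreference version for any character (except line terminators).
    static String collapseAny(String s) {
        return s.replaceAll("(.)\\1+", "$1");
    }

    // Lookahead version: deletes every character that is followed by a copy of itself.
    static String collapseLookahead(String s) {
        return s.replaceAll("(.)(?=\\1)", "");
    }

    public static void main(String[] args) {
        String input = "tttteeeeeeessssssttttttt";
        System.out.println(collapseLetters(input));   // prints test
        System.out.println(collapseAny(input));       // prints test
        System.out.println(collapseLookahead(input)); // prints test
        System.out.println(collapseLetters("aa11!!")); // prints a11!!
        System.out.println(collapseAny("aa11!!"));     // prints a1!
    }
}
```

The last two lines show the only practical difference: the [A-Za-z] version leaves repeated digits and punctuation alone.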
I have a string with spaces and some non-informative characters and substrings that need to be excluded, keeping only the important sections. I used split as below:
String myString[]={"01: Hi you look tired today? Can I help you?"};
myString=myString[0].split("[\\s+]");// Split based on any white spaces
for(int ii=0;ii<myString.length;ii++)
System.out.println(myString[ii]);
The result is :
01:
Hi
you
look
tired
today?
Can
I
help
you?
The spaces appeared after the split as substrings when the regex is "[\s+]" but disappeared when the regex is "\s+". I am confused and could not find an answer in the related Stack Overflow pages. The regex Pattern link made me more confused.
Please help, I am new to Java.
19/1/2015:Edit
After your valuable advice, I reached a point in my program where a conditional statement needs to be decomposed and processed. The case I have is:
String s1="01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
String [] s2=s1.split(("[\\s\\&\\,]+"));
for(int ii=0;ii<s2.length;ii++)System.out.println(s2[ii]);
The result is fine so far:
01:IF
rd.h
dq.L
o.LL
v.L
THEN
la.VHB
av.VHR
with
0.4610;
My next step is to add string "with" to the regex and get rid of this word while doing the split.
I tried it this way:
String s1="01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
String [] s2=s1.split(("[\\s\\&\\, with]+"));
for(int ii=0;ii<s2.length;ii++)System.out.println(s2[ii]);
The result is not perfect, because I got an unwanted extra split at every "h" letter:
01:IF
rd.
dq.L
o.LL
v.L
THEN
la.VHB
av.VHR
0.4610;
Any advice on how to specify string with mixed white spaces and separation marks?
Many thanks.
Inside square brackets, [\s+] is a character class containing the whitespace characters plus the literal plus sign. It matches only one character at a time, so a sequence of spaces produces empty strings between the separators, as Todd noted, and a + is also treated as a separator.
You should use \s+ (without brackets) as the separator. That means one or more whitespace characters.
myString=myString[0].split("\\s+");
Your biggest problem is not understanding enough about regular expressions to write them properly. One key point you don't comprehend is that [...] is a character class, which is a list of characters any one of which can match. For example:
[abc] matches either a, b or c (it does not match "abc")
[\\s+] matches any whitespace or "+" character
[with] matches a single character that is either w, i, t or h
[.$&^?] matches those literal characters - most characters lose their special regex meaning when in a character class
To split on any number of whitespace, comma and ampersand and consume "with" (if it appears), do this:
String [] s2 = s1.split("[\\s,&]+(with[\\s,&]+)?");
You can try it easily here Online Regex and get useful comments.
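To see the difference between these patterns in practice, here is a small sketch. The class and method names are hypothetical; the patterns and the rule string come from the question and answers above.

```java
import java.util.Arrays;

public class SplitDemo {
    // Character class: splits on ONE whitespace character or a literal '+'.
    static String[] splitBracketed(String s) {
        return s.split("[\\s+]");
    }

    // Quantified: splits on a whole run of whitespace.
    static String[] splitWhitespace(String s) {
        return s.split("\\s+");
    }

    // Separator run, optionally followed by the word "with" and another separator run.
    static String[] splitRule(String s) {
        return s.split("[\\s,&]+(with[\\s,&]+)?");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitBracketed("a  b")));  // prints [a, , b]
        System.out.println(Arrays.toString(splitWhitespace("a  b"))); // prints [a, b]
        String s1 = "01:IF rd.h && dq.L && o.LL && v.L THEN la.VHB , av.VHR with 0.4610;";
        System.out.println(Arrays.toString(splitRule(s1)));
    }
}
```

With two spaces between a and b, the bracketed version yields an empty string between the two separators, which is the effect described in the question.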
This question already has answers here:
Verify if String is hexadecimal
(8 answers)
Closed 8 years ago.
I need user input for a hexadecimal number so as long as their input contains the characters A-F or 0-9 it won't re-prompt them.
This is what I have. It runs as long as the entered string contains A-F and/or 0-9, but it still accepts the input if you add other characters, which I don't want.
do {
System.out.print("Enter a # in hex: ");
inputHexNum = keyboard.next();
} while(!(inputHexNum.toUpperCase().matches(".*[A-F0-9].*")));
Could you not change your regex to be [A-F0-9]+?
So your code would look like the following:
do {
System.out.print("Enter a # in hex: ");
inputHexNum = keyboard.next();
} while(!(inputHexNum.toUpperCase().matches("[A-F0-9]+")));
As I understand the question, the problem with your current regex is that it allows any character to occur zero or more times, followed by a hex character, followed by any character zero or more times again. The new regex restricts the entire input to one or more characters, each of which is a letter A-F (uppercase) or a digit 0-9.
Your regular expression probably doesn't do what you want. .* matches anything at all (empty string up to any number of arbitrary characters). Then you expect a single hex character followed again by anything.
So these would be valid inputs:
--0--
a
JFK
You should either say "I want a string which contains only valid hex digits". Then your condition would be:
while(!(inputHexNum.toUpperCase().matches("[A-F0-9]+")));
or you can check for any illegal characters with the pattern [^A-F0-9]. In this case, you'd need to create a Matcher yourself:
// needs java.util.regex.Pattern and java.util.regex.Matcher
Pattern illegalCharacters = Pattern.compile("[^A-F0-9]");
Matcher matcher;
do {
    ...
    matcher = illegalCharacters.matcher(inputHexNum.toUpperCase());
} while( matcher.find() );
The regular expression that you are using matches every string that contains at least one hex digit. Judging from the first paragraph of the question this is exactly what you want. This is because "." matches any character (but possibly not linebreaks), so ".*" matches any (possibly empty) sequence of characters. Thus the regex ".*[A-F0-9].*" means "first, some arbitrary characters, then a hex digit, then some more characters". But from the second paragraph of the question it looks like you want to use the regex "[A-F0-9]+" which means "some hex digits (but at least one, and nothing else)". I assume you are confused about what needs to be done, but actually want the second.
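The two readings of the question can be checked with a short sketch. The class and method names are made up for illustration; the patterns are the ones discussed in the answers.

```java
import java.util.regex.Pattern;

public class HexCheck {
    private static final Pattern ILLEGAL = Pattern.compile("[^A-F0-9]");

    // Original pattern: true if the input contains at least one hex digit anywhere.
    static boolean containsHexDigit(String s) {
        return s.toUpperCase().matches(".*[A-F0-9].*");
    }

    // Suggested pattern: true only if every character is a hex digit (and there is at least one).
    static boolean isHex(String s) {
        return s.toUpperCase().matches("[A-F0-9]+");
    }

    // Matcher-based alternative: true if any non-hex character is present.
    static boolean hasIllegalCharacter(String s) {
        return ILLEGAL.matcher(s.toUpperCase()).find();
    }

    public static void main(String[] args) {
        System.out.println(containsHexDigit("--0--")); // prints true
        System.out.println(isHex("--0--"));            // prints false
        System.out.println(isHex("1aF"));              // prints true
        System.out.println(hasIllegalCharacter("JFK")); // prints true
    }
}
```

One difference worth knowing: an empty string fails isHex but passes the illegal-character check, since it contains no illegal characters. With Scanner.next() this should not come up, because it does not return empty tokens.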