How to repeat every character of a given String in Java?

How can I repeat every character of a given String in Java?
For example:
String s = "Hello";
Becomes:
s = "HHeelllloo";

Use regex!
s = s.replaceAll(".", "$0$0");
OK, so how does this work?
The replaceAll() method takes a regex as the search term, and a dot matches any character (except line terminators, unless the DOTALL flag is set). So every character will be replaced.
The replacement term can contain back references to captured groups, which are written as $n, where n is the group number. There is also a special implicit group zero that is the entire match, so $0$0 means "the whole match twice".
Overall, in English this means "replace every character with two copies of itself".
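A minimal runnable sketch of the one-liner above, assuming Java 8 or later; the class and method names are made up for illustration. The second method shows one caveat: the dot does not match line terminators unless the DOTALL flag (written inline as (?s)) is enabled.

```java
class RepeatChars {
    // "." matches any character except a line terminator;
    // "$0" in the replacement is the whole match, so each character is doubled.
    static String doubleEach(String s) {
        return s.replaceAll(".", "$0$0");
    }

    // Same idea with the inline DOTALL flag (?s), so line terminators are doubled too.
    static String doubleEachIncludingNewlines(String s) {
        return s.replaceAll("(?s).", "$0$0");
    }

    public static void main(String[] args) {
        System.out.println(doubleEach("Hello")); // HHeelllloo
    }
}
```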

Related

Capturing groups and Pattern split method in regular expression

How can I understand the output of the code below? The code's first four print statements are about capturing groups in Java regular expressions, and the rest of the code is about the Pattern split method. I read a few documents to understand the code's output (the expected results are noted in the comments) but could not figure out how exactly it works and produces this output.
Java Code
import java.util.*;
import java.util.regex.*;

/* Name of the class has to be "Main" only if the class is public. */
public class Codechef
{
    public static void main(String[] args) {
        // Capturing groups in a regular expression
        System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); // true
        System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); // false
        System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); // true
        System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); // false
        // Using the Pattern.split() method
        Pattern pattern = Pattern.compile("\\W");
        String[] words = pattern.split("one#two#three:four$five");
        // println(words) would only print the array's type and hash code;
        // Arrays.toString() prints its elements.
        System.out.println(Arrays.toString(words));
        for (String s : words) {
            System.out.println("Split using Pattern.split(): " + s);
        }
    }
}
Queries
If I talk about capturing groups, I cannot figure out what the use of '\1' or '\2' is here. How do these evaluate to true or false?
If I talk about the Pattern split method, I wish to know how the string split is happening. How does this split method differ from a normal string split method?
The first console print lines...
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); //true
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); //false
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); //true
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); //false
use the matches() method, which always returns a boolean (true or false). This method is mostly used for validating strings of one sort or another. The first and second examples both use the regular expression "(\\w\\d)\\1"; run against the strings "a2a2" and "a2b2" through matches(), as in the code, it returns true and then false, in that order.
The real key here is knowing what that particular regular expression is supposed to validate. The expression above contains only one capturing group, denoted by the parentheses. \\w matches any single word character: a-z, A-Z, 0-9, or _ (the underscore). \\d matches a single digit from 0 to 9.
Note: In the regex itself, the metacharacters are written as \w and \d, but because the backslash is the escape character in Java string literals, it must itself be escaped with an additional backslash.
The \1 is a backreference: it matches the same text that was most recently matched by the first capturing group. Since only one capturing group is defined, 1 is the only meaningful group number here. Note that \0 does not refer to the whole match inside a Java regex (Java reads \0 as the start of an octal escape), and a number higher than the number of groups defined, such as \2, refers to a group that never captures anything, so it cannot match.
Bottom line, the expression checks the supplied string as follows:
Is the first character (\\w) an uppercase or lowercase letter A to Z, an underscore, or a digit from 0 to 9? If it isn't, there is no match and false is returned; if it is, then...
Is the second character (\\d) a digit from 0 to 9? If it isn't, false is returned; if it is, then...
Are the remaining characters exactly the same two characters (including letter case if a-z or A-Z are used)? If they are not identical to the first two, or there are more or fewer than two remaining characters, false is returned. Otherwise, true is returned.
In other words, the expression validates that the supplied string is exactly four characters long and that its last two characters match its first two. This is why the second console print:
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2b2")); //false
returns false: b2 is not the same as a2. In the first console print, by contrast:
System.out.println(Pattern.matches("(\\w\\d)\\1", "a2a2")); //true
the last two characters, a2, do indeed match the first two, a2, and therefore true is returned.
You will now notice that in the other two console prints:
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B2AB")); //true
System.out.println(Pattern.matches("(AB)(B\\d)\\2\\1", "ABB2B3AB")); //false
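the regular expression contains two capturing groups (two sets of parentheses), and the same kind of matching applies, just against two groups instead of one. Group 1 captures AB and group 2 captures B2, so \\2\\1 requires the rest of the string to be B2 followed by AB. "ABB2B2AB" satisfies this, but in "ABB2B3AB" the text following group 2 starts with B3, which is not a repeat of B2, so false is returned.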
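To see what each group actually captured, you can ask a Matcher for its groups after a successful matches() call. This is a small sketch; the class name and the groupsOf helper are hypothetical, while Matcher.matches(), groupCount() and group(int) are standard java.util.regex methods.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackrefDemo {
    // Returns the text captured by each group, or null if the whole input does not match.
    static String[] groupsOf(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] groups = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            groups[g - 1] = m.group(g);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Group 1 captured "a2", and \1 required the same "a2" again.
        System.out.println(String.join(", ", groupsOf("(\\w\\d)\\1", "a2a2")));
        // Group 1 captured "AB" and group 2 captured "B2"; \2\1 then required "B2AB".
        System.out.println(String.join(", ", groupsOf("(AB)(B\\d)\\2\\1", "ABB2B2AB")));
        // "B3" does not repeat group 2 ("B2"), so there is no match.
        System.out.println(groupsOf("(AB)(B\\d)\\2\\1", "ABB2B3AB") == null);
    }
}
```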
If you want to see how these regular expressions play out, with explanations of what each part means, try the regular expression tester at regex101.com, which is also a good regular expression resource.
Pattern.split():
In this case, using Pattern.split() is a little overkill in my opinion, since String.split() also accepts regular expressions, but it does have its purpose in other areas. Nevertheless, it is a good example of how it can be used. Here, split() divides the supplied string wherever the regular expression compiled into the Pattern matches, which in this case is "\\W" (otherwise: \W). \W (uppercase W) means 'match any non-word character', that is, anything other than a-z, A-Z, 0-9, or _. This expression is basically the opposite of \w (with the lowercase w). The characters #, #, :, and $ within the supplied string (and likewise commas, semicolons, exclamation marks, etc.):
"one#two#three:four$five"
are considered non-word characters and therefore the split is carried out on any one of them resulting in a String Array containing:
[one, two, three, four, five]
The very same thing can be accomplished with the String.split() method, since this method also accepts a regular expression:
String[] s = "one#two#three:four$five".split("\\W");
or even:
String[] s = "one#two#three:four$five".split("[#:$]");
or even:
String[] s = "one#two#three:four$five".split("#|:|\\$");
// The $ character is a reserved RegEx symbol and therefore
// needs to be escaped.
or on and on and on...
Yup... "\\W" is easier since it covers all non-word characters. ;)
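A sketch, assuming Java 8 or later, that puts these variants side by side on the question's input; the class and method names are illustrative only. All of them should produce the same five words.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitVariants {
    static final String INPUT = "one#two#three:four$five";

    // Pattern.split(): compile the delimiter regex into a Pattern, then split.
    static String[] patternSplit(String regex) {
        return Pattern.compile(regex).split(INPUT);
    }

    // String.split(): takes the same regex directly.
    static String[] stringSplit(String regex) {
        return INPUT.split(regex);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(patternSplit("\\W"))); // [one, two, three, four, five]
        System.out.println(Arrays.toString(stringSplit("\\W")));
        // Inside [...] the $ is literal; outside a character class it must be escaped.
        System.out.println(Arrays.toString(stringSplit("[#:$]")));
        System.out.println(Arrays.toString(stringSplit("#|:|\\$")));
    }
}
```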
If I talk about capturing groups, I cannot figure out what the use of '\1' or '\2' is here. How do these evaluate to true or false?
Answer:
\\1 repeats the first captured group (i.e. a2 captured by (\\w\\d))
\\2 repeats the second captured group (i.e. B2 captured by (B\\d))
The actual name for those combinations is backreferences:
The section of the input string matching the capturing group(s) is saved in memory for later recall via backreference. A backreference is specified in the regular expression as a backslash (\) followed by a digit indicating the number of the group to be recalled.
If I talk about the Pattern split method, I wish to know how the string split is happening. How does this split method differ from a normal string split method?
Answer:
The split() method in the Pattern class splits a text into an array of Strings, using the regular expression (the pattern) as the delimiter.
Rather than splitting a string on a fixed string or character, here you provide a regex, which is much more powerful and flexible.
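One practical difference is reuse: a compiled Pattern can be kept in a static field and applied to many inputs, and Pattern.split() accepts any CharSequence, not just a String. The sketch below assumes that scenario; the class name and the splitAll helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class ReusePattern {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern NON_WORD = Pattern.compile("\\W");

    // Pattern.split() accepts any CharSequence (String, StringBuilder, ...),
    // whereas String.split(regex) generally compiles the regex again on each call.
    static List<List<String>> splitAll(List<? extends CharSequence> records) {
        List<List<String>> result = new ArrayList<>();
        for (CharSequence record : records) {
            result.add(Arrays.asList(NON_WORD.split(record)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("one#two", "three:four$five");
        System.out.println(splitAll(records)); // [[one, two], [three, four, five]]
    }
}
```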

Java Regex Quantifiers in String Split

The code:
String s = "a12ij";
System.out.println(Arrays.toString(s.split("\\d?")));
The output is [a, , , i, j], which confuses me. If the expression is greedy, shouldn't it try and match as much as possible, thereby splitting on each digit? I would assume that the output should be [a, , i, j] instead. Where is that extra empty character coming from?
The pattern you're using only matches one digit at a time:
\d match a digit [0-9]
? matches between zero and one time (greedy)
Since you have more than one digit, it's going to split on each of them individually. There are a few ways to match more than one digit at a time; here are a couple:
\d match a digit [0-9]
+? matches between one and unlimited times (lazy)
Note that in split() the lazy version still stops after a single digit, so it splits "a12ij" the same way \d does. Or you could just do:
\d match a digit [0-9]
+ matches between one and unlimited times (greedy)
This is likely the closest to what I think you want, although it's unclear.
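The following sketch (Java 8 or later, hypothetical class name) prints the split result for each of these quantifiers on the question's input, plus \d*, which is not mentioned above. In a split, the lazy \d+? still consumes only one digit per match, so it behaves like \d.

```java
import java.util.Arrays;

class SplitQuantifiers {
    // Splits the question's input with the given regex and formats the result.
    static String splitToString(String regex) {
        return Arrays.toString("a12ij".split(regex));
    }

    public static void main(String[] args) {
        for (String regex : new String[] {"\\d", "\\d+?", "\\d+", "\\d?", "\\d*"}) {
            System.out.println(regex + " -> " + splitToString(regex));
        }
    }
}
```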
Explanation:
Since the token \d is using the ? quantifier, the pattern matches a digit between zero and one time. Matching zero times means it also matches the empty string, so in addition to matching each digit once, it produces an empty match at every position where no digit starts.
You can picture it something like this:
a,1,2,i,j // each character represents (zero) and is split
| |
a, , ,i,j // digit 1 and 2 are each matched (once)
The digits 1 and 2 are consumed as delimiters, so they are tossed out, but the splits on either side of them remain, producing two empty strings.
If you're specifically looking to have your result as a, ,i,j, use a pattern that consumes all adjacent digits in one match while still allowing an empty match, for example (\\d+)?, which captures one or more digits as a group and makes the group optional with ?, or \\d*, which behaves the same way here. I recommend visiting one of the popular regex sites that let you experiment with patterns and quantifiers; it's also a great way to learn and can teach you a lot!
The javadoc for split() is not clear on what happens when a pattern can match the empty string. My best guess here is the delimiters found by split() are what would be found by successive find() calls of a Matcher. The javadoc for find() says:
This method starts at the beginning of this matcher's region, or, if a
previous invocation of the method was successful and the matcher has
not since been reset, at the first character not matched by the
previous match.
So if the string is "a12ij" and the pattern matches either a single digit or an empty string, then find() should find the following:
Empty string starting at position 0 (before a)
The string "1"
The string "2"
Empty string starting at position 3 (before i). This is because "the first character not matched by the previous match" is the i.
Empty string starting at position 4 (before j).
Empty string starting at position 5 (at the end of the string).
So if the matches found are the substrings denoted by the x, where an x under a blank means the match is an empty string:
 a 1 2 i j
x  x xx x x
Now if we look at the substrings between the x's, they are "a", "", "", "i", "j", as you are seeing. (The substring before the first empty match is not returned, because the split() javadoc says "A zero-width match at the beginning however never produces such empty leading substring." This rule was added in Java 8. Also, split() with the default limit of 0 discards trailing empty strings.)
I'd have to look at the code for split() to confirm this behavior. But it makes sense looking at the Matcher javadoc and it is consistent with the behavior you're seeing.
MORE: I've confirmed from the source that split() does rely on Matcher and find(), except for an optimization for the common case of splitting on a single literal character. So that explains the behavior.
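The find()-based explanation can be checked directly by listing every match a Matcher reports. A minimal sketch; the class and method names are made up, and start()/end() are the standard Matcher accessors, where an empty match has start() == end().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindMatches {
    // Lists every match reported by successive find() calls as "start-end" pairs.
    static List<String> matchSpans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> spans = new ArrayList<>();
        while (m.find()) {
            spans.add(m.start() + "-" + m.end());
        }
        return spans;
    }

    public static void main(String[] args) {
        // Empty matches show up with equal start and end positions.
        System.out.println(matchSpans("\\d?", "a12ij"));
    }
}
```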

Why doesn't [[a-z]*&&[^a]] catch "bc", but "b"?

Ok, so I have tried to become more familiar with the intersection in regex (&&).
On the java.util.regex.Pattern page all the regex constructs are explained, and && is only ever used next to a range (like [a-z&&[^e]]). But I tried to use it like this: [[a-z]*&&[^a]]. To me it seemed logical that this would match all lowercase strings except the string "a", but instead it seems to be equivalent to [a-z&&[^a]].
So the actual question is: Where did the * operator go? How does this only catch single character strings?
I think using an intersection is the wrong approach. To match all lowercase strings except "a":
^(?!a$)[a-z]+$
And you can drop the wrapping ^ and $ when calling matches(), since it must match the entire input anyway:
if (input.matches("(?!a$)[a-z]+")) {
// it's an all-lowercase string, but not "a"
}
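Of course, you don't need a regex at all, although this is a little long-winded:
if (input.equals(input.toLowerCase()) && !input.equals("a"))
but it is easier to read. (It is also looser than the regex: it accepts digits, punctuation, and the empty string too.)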
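A small sketch comparing the two checks from this answer (class and method names are hypothetical). They are not equivalent: the toLowerCase() version also accepts strings such as "1" or the empty string, because toLowerCase() leaves those unchanged.

```java
class LowercaseNotA {
    // (?!a$) rejects exactly "a"; [a-z]+ requires one or more lowercase ASCII letters.
    static boolean byRegex(String input) {
        return input.matches("(?!a$)[a-z]+");
    }

    // Non-regex check: the string has no uppercase letters and is not "a".
    static boolean byLowerCase(String input) {
        return input.equals(input.toLowerCase()) && !input.equals("a");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"bc", "abc", "a", "Bc", "1", ""}) {
            System.out.println("\"" + s + "\": regex=" + byRegex(s) + ", toLowerCase=" + byLowerCase(s));
        }
    }
}
```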
Inside a character class (marked by []) the * character has no special meaning. It simply represents the character itself.
So the regular expression
[[a-z]*&&[^a]]
matches exactly one character, which must be one of the following:
b, c, d, ..., z, *
The [a-z] and the following * are unioned, and the resulting character class is intersected with [^a] which simply removes the a character.
Valid strings are (for example):
b
*
c
But
a
is not, and neither is any string that contains more than one character.
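A quick way to confirm this reading of [[a-z]*&&[^a]] is to test a few single- and multi-character inputs. The class name is illustrative.

```java
import java.util.regex.Pattern;

class StarInsideClass {
    // Inside [...], '*' is a literal, so the class is (a-z plus '*') intersected with [^a].
    // With no quantifier outside the brackets, it matches exactly one character.
    static final Pattern ONE_CHAR = Pattern.compile("[[a-z]*&&[^a]]");

    static boolean matches(String s) {
        return ONE_CHAR.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"b", "*", "c", "a", "bc"}) {
            System.out.println(s + " -> " + matches(s));
        }
    }
}
```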
Now to the solution for what you want. You want to have strings (allowing more than one character, I assume) that could also contain the letter 'a' but not the string "a" alone. The easiest is a group that does this distinction:
(?!a$)[a-z]*
The group (?!a$) is called a zero-width negative lookahead. Zero-width means the characters it looks at are not consumed, and negative means the match fails if what follows is "a". The $ anchors the lookahead to the end of the input, so only the exact string "a" is rejected; without it, words beginning with 'a' would also be rejected.
Character class intersection is supported in Java. The problem is that inside a character class, * loses its special meaning and matches a literal star "*" instead. Your regex should be:
[a-z&&[^a]]*
Now it matches any number of characters from the range a-z except a. (Note that this rejects every string containing an a, not just the string "a".)
Example:
Pattern p = Pattern.compile("[a-z&&[^a]]*");
Matcher m = p.matcher("a");
System.out.println(m.matches()); // false
Try to use * outside of class:
[[a-z]&&[^a]]*
The intersection of two character classes gives you another character class.
And as said in other answers, * does not act as a quantifier inside a class. So, use it outside.
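The sketch below (hypothetical class name) contrasts the two intersection patterns from these answers with the (?!a$)[a-z]* lookahead suggested earlier. The intersections accept any string of letters from b to z but reject every string that contains an a, which is stricter than the question asked for; only the lookahead rejects just the exact string "a".

```java
class IntersectionVsLookahead {
    // Quantifier outside the class: zero or more characters from b-z.
    static boolean intersection(String s) {
        return s.matches("[a-z&&[^a]]*");
    }

    // The same set written with a nested [a-z], as in the last answer.
    static boolean nestedIntersection(String s) {
        return s.matches("[[a-z]&&[^a]]*");
    }

    // Negative lookahead: any lowercase string except exactly "a".
    static boolean lookahead(String s) {
        return s.matches("(?!a$)[a-z]*");
    }

    public static void main(String[] args) {
        for (String s : new String[] {"bc", "abc", "a"}) {
            System.out.println(s + ": " + intersection(s) + " " + nestedIntersection(s) + " " + lookahead(s));
        }
    }
}
```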

Regex for First word and last word of a string separates with

I'm trying to write a regex for the following requirements but can't get it to work:
The string has 4 words separated by dots (.).
First word matches a given one (HELLO for example).
Second and third words could have any character but dot itself (.).
Last word matches a given one again (csv for example).
So:
HELLO.something.Somethi#gElse.csv should match.
something.HELLO.?.csv shouldn't match.
HELLO.something...csv shouldn't match.
HELLO.something.somethingelse.notcsv shouldn't match
I can do it with split(.) and then check for individual words, but I'm trying to get it working with Regex and Pattern class.
Any help would be really appreciated.
This is relatively straightforward, as long as you understand character classes. A regex with square brackets [xyz] matches any character from the list {x, y, z}; a regex [^xyz] matches any character except {x, y, z}.
Now you can construct your expression:
^HELLO\.[^.]+\.[^.]+\.csv$
+ means "one or more of the preceding expression"; \. means "dot itself". ^ means "the beginning of the string"; $ means "the end of the string". These anchors prevent the regex from matching
blahblahHELLO.world.world.csvblahblah
A common goal for writing regular expressions like that is to capture some content, for example, the string between the first and the second dot, and the string between the second and the third dot. Use capturing groups to bring the content of these strings into your Java program:
^HELLO\.([^.]+)\.([^.]+)\.csv$
Each pair of parentheses defines a capturing group, indexed from 1 (group at index zero represents the capture of the entire expression). Once you obtain a match object from the pattern, you can query it for the groups, and extract the corresponding strings.
Note that backslashes in a Java string literal need to be doubled, so \. is written as \\. in code.
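A sketch of how the capturing version might be used from Java with Pattern and Matcher, assuming the goal is to extract the second and third words; the class name and the middleWords helper are hypothetical. Note the doubled backslashes in the string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DottedName {
    // Backslashes are doubled because this is a Java string literal.
    private static final Pattern NAME = Pattern.compile("^HELLO\\.([^.]+)\\.([^.]+)\\.csv$");

    // Returns the second and third words, or null if the input does not fit the format.
    static String[] middleWords(String input) {
        Matcher m = NAME.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] words = middleWords("HELLO.something.Somethi#gElse.csv");
        System.out.println(words[0] + " / " + words[1]); // something / Somethi#gElse
    }
}
```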
(^HELLO\.[^.]+\.[^.]+\.csv$)
Here is the same regex with token explanation on regex101.

Replace multiple capture groups using regexp with java

I have this requirement - for an input string such as the one shown below
8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs
I would like to strip the delimiters from matched words (where the matching pair is 8 or & or %, etc.), which would result in the following:
This is reallly a test of repl%acing %mul%tiple matched 9pairs
The list of characters used for the pairs can vary (e.g. 8, 9, %, #), and only words that start and end with the same one of these characters should be stripped of them; the same character embedded inside the word stays where it is.
Using Java I can do a pattern as \\b8([^\\s]*)8\\b and replacement as $1, to capture and replace all occurrences of 8...8, but how do I do this for all the types of pairs?
I can provide a pattern such as \\b8([^\\s]*)8\\b|\\b9([^\\s]*)9\\b ... and so on that will match all types of matching pairs (8, 9, ...), but how do I specify a 'variable' replacement group?
For example, if the match is 9...9, then the replacement should be $2.
I can of course run it through multiple of these, each replacing a specific type of pair, but I am wondering if there is a more elegant way.
Or is there a completely different way of approaching this problem?
Thanks.
You could use the below regex and then replace the matched characters by the characters present inside the group index 2.
(?<!\S)(\S)(\S+)\1(?=\s|$)
OR
(?<!\S)(\S)(\S*)\1(?=\s|$)
Java regex would be,
(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)
String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
System.out.println(s1.replaceAll("(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)", "$2"));
Output:
This is reallly a test of repl%acing %mul%tiple matched 9pairs
Explanation:
(?<!\\S) Negative lookbehind, asserts that the match wouldn't be preceded by a non-space character.
(\\S) Captures the first non-space character and stores it into group index 1.
(\\S+) Captures one or more non-space characters into group index 2.
\\1 Refers to the character inside first captured group.
(?=\\s|$) And the match must be followed by a space or the end-of-input anchor.
This makes sure that the first and last characters of each whitespace-delimited word are the same. If so, the whole match is replaced by the characters captured in group index 2.
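A runnable version of both replacements from this answer, wrapped in hypothetical helper methods. The extra input "that" illustrates a side effect of the general (\S) version: any word that begins and ends with the same character, letters included, is stripped, which is one reason to prefer the version with an explicit list of delimiters.

```java
class StripPairs {
    // Any whitespace-delimited word whose first and last characters are the same
    // non-space character loses those two characters (group 2 is kept).
    static String strip(String s) {
        return s.replaceAll("(?<!\\S)(\\S)(\\S+)\\1(?=\\s|$)", "$2");
    }

    // Same idea, but only the listed delimiter characters count.
    static String stripListed(String s) {
        return s.replaceAll("(?<!\\S)([89&#%])(\\S+)\\1(?=\\s|$)", "$2");
    }

    public static void main(String[] args) {
        String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
        System.out.println(strip(s1));
        System.out.println(stripListed(s1));
        // The general version also strips ordinary words such as "that".
        System.out.println(strip("that") + " / " + stripListed("that")); // ha / that
    }
}
```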
To strip only a specific set of delimiter characters, you could modify the above regex as:
String s1 = "8This8 is &reallly& a #test# of %repl%acing% %mul%tiple 9matched9 9pairs";
System.out.println(s1.replaceAll("(?<!\\S)([89&#%])(\\S+)\\1(?=\\s|$)", "$2"));
(?<![a-zA-Z])[8&#%9](?=[a-zA-Z])([^\s]*?)(?<=[a-zA-Z])[8&#%9](?![a-zA-Z])
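Try this. Replace with $1 (\1 works in some regex flavors, but in a Java replacement string a backslash only escapes the next character). See the demo: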
https://regex101.com/r/qB0jV1/15
(?<![a-zA-Z])[^a-zA-Z](?=[a-zA-Z])([^\s]*?)(?<=[a-zA-Z])[^a-zA-Z](?![a-zA-Z])
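Use this if you have many delimiters. Note that [^a-zA-Z] accepts any non-letter at each end, so the opening and closing characters do not have to be the same.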
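The question's own idea (one capturing group per delimiter type, joined with |) can also work if the replacement text is chosen in code rather than with a single $n. The sketch below is not taken from any of the answers; it uses the standard Matcher.appendReplacement()/appendTail() loop and, for each match, uses whichever group participated. The class name is hypothetical, and the pattern is limited to the question's 8 and 9 example, because \b only behaves as intended when the delimiter is a word character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VariableGroupReplace {
    // The question's alternation: one capturing group per delimiter type.
    private static final Pattern PAIRS =
            Pattern.compile("\\b8([^\\s]*)8\\b|\\b9([^\\s]*)9\\b");

    static String stripPairs(String input) {
        Matcher m = PAIRS.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Only one alternative matched, so take the first group that is non-null.
            String inner = null;
            for (int g = 1; g <= m.groupCount() && inner == null; g++) {
                inner = m.group(g);
            }
            // quoteReplacement stops '$' or '\' inside the word from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(inner));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripPairs("8This8 is 9matched9 9pairs")); // This is matched 9pairs
    }
}
```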
