Java: is "$1" a placeholder? [duplicate] - java

This question already has answers here:
JAVA - replaceAll in a regex with $1
(1 answer)
What does RegExp.$1 do
(6 answers)
Closed 1 year ago.
I was given a Java exercise:
Break up camelCase writing into words, for example the input "camelCaseTest" should give the output "camel Case Test".
I found this solution online, but I don't understand all of it
public static String camelCaseBetter(String input) {
input = input.replaceAll("([A-Z])", " $1");
return input;
}
What does the $1 do? I think it just takes the String that is to be replaced (A-Z) and replaces it with itself (in this case the method also appends a space to break up the words)
I couldn't find a good explanation for $1, so I hope somebody here can explain it or share a link to the right resource which can explain it.

From the documentation of the String class:
Note that backslashes (\) and dollar signs ($) in the replacement string may cause the results to be different than if it were being treated as a literal replacement string; see Matcher.replaceAll.
From Matcher.replaceAll
The replacement string may contain references to captured subsequences as in the appendReplacement method.
Then the appendReplacement method:
The replacement string may contain references to subsequences captured during the previous match: Each occurrence of ${name} or $g will be replaced by the result of evaluating the corresponding group(name) or group(g) respectively. For $g, the first number after the $ is always treated as part of the group reference. Subsequent numbers are incorporated into g if they would form a legal group reference. Only the numerals '0' through '9' are considered as potential components of the group reference. If the second group matched the string "foo", for example, then passing the replacement string "$2bar" would cause "foobar" to be appended to the string buffer. A dollar sign ($) may be included as a literal in the replacement string by preceding it with a backslash (\$).
So, $1 will reference the the first capturing group (whatever matches the pattern within the first parentheses of the regular expression).
([A-Z]) will match any uppercase character and place it in the first capturing group. $1 will then replace it with a space, followed by the matched uppercase character.

Related

What is the functionality of this regex? [duplicate]

This question already has an answer here:
Reference - What does this regex mean?
(1 answer)
Closed 2 years ago.
I am recently learning regex and i am not quite sure how the following regex works:
str.replaceAll("(\\w)(\\w*)", "$2$1ay");
This allows us to do the following:
input string: "Hello World !"
return string: "elloHay orldWay !"
From what I know: w is supposed to match all word characters including 0-9 and underscore and $ matches stuff at the end of string.
In the replaceAll method, the first parameter can be a regex. It matches all words in the string with the regex and changes them to the second parameter.
In simple cases replaceAll works like this:
str = "I,am,a,person"
str.replaceAll(",", " ") // I am a person
It matched all the commas and replaced them with a space.
In your case, the match is every alphabetic character(\w), followed by a stream of alphabetic characters(\w*).
The () around \w is to group them. So you have two groups, the first letter and the remaining part. If you use regex101 or some similar website you can see a visualization of this.
Your replacement is $2 -> Second group, followed by $1(remaining part), followed by ay.
Hope this clears it up for you.
Enclosing a regex expression in brackets () will make it a Capturing group.
Here you have 2 capturing groups , (\w) captures a single word character, and (\w*) catches zero or more.
$1 and $2 are used to refer to the captured groups, first and second respectively.
Also replaceAll takes each word individually.
So in this example in 'Hello' , 'H' is the first captured groups and 'ello' is the second. It's replaced by a reordered version - $2$1 which is basically swapping the captured groups.
So you get '$2$1ay' as 'elloHay'
The same for the next word also.

How to write a regular expression in Java to remove all characters in a String after the matching character? [duplicate]

I need a regular expression able to match everything but a string starting with a specific pattern (specifically index.php and what follows, like index.php?id=2342343).
Regex: match everything but:
a string starting with a specific pattern (e.g. any - empty, too - string not starting with foo):
Lookahead-based solution for NFAs:
^(?!foo).*$
^(?!foo)
Negated character class based solution for regex engines not supporting lookarounds:
^(([^f].{2}|.[^o].|.{2}[^o]).*|.{0,2})$
^([^f].{2}|.[^o].|.{2}[^o])|^.{0,2}$
a string ending with a specific pattern (say, no world. at the end):
Lookbehind-based solution:
(?<!world\.)$
^.*(?<!world\.)$
Lookahead solution:
^(?!.*world\.$).*
^(?!.*world\.$)
POSIX workaround:
^(.*([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.])|.{0,5})$
([^w].{5}|.[^o].{4}|.{2}[^r].{3}|.{3}[^l].{2}|.{4}[^d].|.{5}[^.]$|^.{0,5})$
a string containing specific text (say, not match a string having foo):
Lookaround-based solution:
^(?!.*foo)
^(?!.*foo).*$
POSIX workaround:
Use the online regex generator at www.formauri.es/personal/pgimeno/misc/non-match-regex
a string containing specific character (say, avoid matching a string having a | symbol):
^[^|]*$
a string equal to some string (say, not equal to foo):
Lookaround-based:
^(?!foo$)
^(?!foo$).*$
POSIX:
^(.{0,2}|.{4,}|[^f]..|.[^o].|..[^o])$
a sequence of characters:
PCRE (match any text but cat): /cat(*SKIP)(*FAIL)|[^c]*(?:c(?!at)[^c]*)*/i or /cat(*SKIP)(*FAIL)|(?:(?!cat).)+/is
Other engines allowing lookarounds: (cat)|[^c]*(?:c(?!at)[^c]*)* (or (?s)(cat)|(?:(?!cat).)*, or (cat)|[^c]+(?:c(?!at)[^c]*)*|(?:c(?!at)[^c]*)+[^c]*) and then check with language means: if Group 1 matched, it is not what we need, else, grab the match value if not empty
a certain single character or a set of characters:
Use a negated character class: [^a-z]+ (any char other than a lowercase ASCII letter)
Matching any char(s) but |: [^|]+
Demo note: the newline \n is used inside negated character classes in demos to avoid match overflow to the neighboring line(s). They are not necessary when testing individual strings.
Anchor note: In many languages, use \A to define the unambiguous start of string, and \z (in Python, it is \Z, in JavaScript, $ is OK) to define the very end of the string.
Dot note: In many flavors (but not POSIX, TRE, TCL), . matches any char but a newline char. Make sure you use a corresponding DOTALL modifier (/s in PCRE/Boost/.NET/Python/Java and /m in Ruby) for the . to match any char including a newline.
Backslash note: In languages where you have to declare patterns with C strings allowing escape sequences (like \n for a newline), you need to double the backslashes escaping special characters so that the engine could treat them as literal characters (e.g. in Java, world\. will be declared as "world\\.", or use a character class: "world[.]"). Use raw string literals (Python r'\bworld\b'), C# verbatim string literals #"world\.", or slashy strings/regex literal notations like /world\./.
You could use a negative lookahead from the start, e.g., ^(?!foo).*$ shouldn't match anything starting with foo.
You can put a ^ in the beginning of a character set to match anything but those characters.
[^=]*
will match everything but =
Just match /^index\.php/, and then reject whatever matches it.
In Python:
>>> import re
>>> p='^(?!index\.php\?[0-9]+).*$'
>>> s1='index.php?12345'
>>> re.match(p,s1)
>>> s2='index.html?12345'
>>> re.match(p,s2)
<_sre.SRE_Match object at 0xb7d65fa8>
Came across this thread after a long search. I had this problem for multiple searches and replace of some occurrences. But the pattern I used was matching till the end. Example below
import re
text = "start![image]xxx(xx.png) yyy xx![image]xxx(xxx.png) end"
replaced_text = re.sub(r'!\[image\](.*)\(.*\.png\)', '*', text)
print(replaced_text)
gave
start* end
Basically, the regex was matching from the first ![image] to the last .png, swallowing the middle yyy
Used the method posted above https://stackoverflow.com/a/17761124/429476 by Firish to break the match between the occurrence. Here the space is not matched; as the words are separated by space.
replaced_text = re.sub(r'!\[image\]([^ ]*)\([^ ]*\.png\)', '*', text)
and got what I wanted
start* yyy xx* end

Regex in Android JAVA - How to specify more than 9 backreferences?

I use multiple groups in a Regex search and replace many parts of a string. I use $1 $2 etc in Android JAVA when using String.replaceFirst.
If I use more than nine groups in my Regex search when trying to reference them in replaceFirst for example $10 , it will replace the first back reference and then prints a literal 0.
Is there anyway I can use a tenth reference? Is there a different way of referencing it?
Example, but I'm trying to use more than nine back references. $10 sees only $1.
String.replaceFirst("(hello)(.*)(this)","$1middle$2");
TL;DR If you experience that $10 is treated as $1 and a 0, then your regex doesn't have 10 capture groups.
The $ back-references in the replacement value is documented in the javadoc for the appendReplacement method:
The replacement string may contain references to subsequences captured during the previous match: Each occurrence of ${name} or $g will be replaced by the result of evaluating the corresponding group(name) or group(g) respectively. For $g, the first number after the $ is always treated as part of the group reference. Subsequent numbers are incorporated into g if they would form a legal group reference. Only the numerals '0' through '9' are considered as potential components of the group reference. If the second group matched the string "foo", for example, then passing the replacement string "$2bar" would cause "foobar" to be appended to the string buffer. A dollar sign ($) may be included as a literal in the replacement string by preceding it with a backslash (\$).
So, let's say we have 11 groups:
System.out.println("ABCDEFGHIJKLMN".replaceFirst("(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)",
"$11$10$9$3$2$1"));
Here we capture the first 11 characters as individual groups, so e.g. group(1) returns "A" and group(11) returns "K". The input string has 14 characters, so the last 3 (LMN) are not replaced. The result is:
KJICBALMN
If we remove capture group 11 from the regex, then $11 is not a legal group reference, and will be interpreted as $1 and the literal 1:
System.out.println("ABCDEFGHIJKLMN".replaceFirst("(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)",
"$11$10$9$3$2$1"));
Prints:
A1JICBAKLMN
So, if you experience that $10 is treated as a $1 back-reference and a literal 0, then your regex doesn't have 10 groups.
You can also name them with (?<name>...) and then reference them with ${name}.
String.replaceFirst("(?<g1>hello)(?<g2>.*)(?<g3>this)","${g1}middle${g2}");

regex help in java

I'm trying to compare following strings with regex:
#[xyz="1","2"'"4"] ------- valid
#[xyz] ------------- valid
#[xyz="a5","4r"'"8dsa"] -- valid
#[xyz="asd"] -- invalid
#[xyz"asd"] --- invalid
#[xyz="8s"'"4"] - invalid
The valid pattern should be:
#[xyz then = sign then some chars then , then some chars then ' then some chars and finally ]. This means if there is characters after xyz then they must be in format ="XXX","XXX"'"XXX".
Or only #[xyz]. No character after xyz.
I have tried following regex, but it did not worked:
String regex = "#[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"]";
Here the quotations (in part after xyz) are optional and number of characters between quotes are also not fixed and there could also be some characters before and after this pattern like asdadad #[xyz] adadad.
You can use the regex:
#\[xyz(?:="[a-zA-z0-9]+","[a-zA-z0-9]+"'"[a-zA-z0-9]+")?\]
See it
Expressed as Java string it'll be:
String regex = "#\\[xyz=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\"\\]";
What was wrong with your regex?
[...] defines a character class. When you want to match literal [ and ] you need to escape it by preceding with a \.
[a-zA-z][0-9] match a single letter followed by a single digit. But you want one or more alphanumeric characters. So you need [a-zA-Z0-9]+
Use this:
String regex = "#\\[xyz(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")?\\]";
When you write [a-zA-z][0-9] it expects a letter character and a digit after it. And you also have to escape first and last square braces because square braces have special meaning in regexes.
Explanation:
[a-zA-z0-9]+ means alphanumeric character (but not an underline) one or more times.
(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")? means that expression in parentheses can be one time or not at all.
Since square brackets have a special meaning in regex, you used it by yourself, they define character classes, you need to escape them if you want to match them literally.
String regex = "#\\[xyz=\"[a-zA-z][0-9]\",\"[a-zA-z][0-9]\"'\"[a-zA-z][0-9]\"\\]";
The next problem is with '"[a-zA-z][0-9]' you define "first a letter, second a digit", you need to join those classes and add a quantifier:
String regex = "#\\[xyz=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\"\\]";
See it here on Regexr
there could also be some characters before and after this pattern like
asdadad #[xyz] adadad.
Regex should be:
String regex = "(.)*#\\[xyz(=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")?\\](.)*";
The First and last (.)* will allow any string before the pattern as you have mentioned in your edit. As said by #ademiban this (=\"[a-zA-z0-9]+\",\"[a-zA-z0-9]+\"'\"[a-zA-z0-9]+\")? will come one time or not at all. Other mistakes are also very well explained by Others +1 to all other.

How to replace a special character with single slash

I have a question about strings in Java. Let's say, I have a string like so:
String str = "The . startup trace ?state is info?";
As the string contains the special character like "?" I need the string to be replaced with "\?" as per my requirement. How do I replace special characters with "\"? I tried the following way.
str.replace("?","\?");
But it gives a compilation error. Then I tried the following:
str.replace("?","\\?");
When I do this it replaces the special characters with "\\". But when I print the string, it prints with single slash. I thought it is taking single slash only but when I debugged I found that the variable is taking "\\".
Can anyone suggest how to replace the special characters with single slash ("\")?
On escape sequences
A declaration like:
String s = "\\";
defines a string containing a single backslash. That is, s.length() == 1.
This is because \ is a Java escape character for String and char literals. Here are some other examples:
"\n" is a String of length 1 containing the newline character
"\t" is a String of length 1 containing the tab character
"\"" is a String of length 1 containing the double quote character
"\/" contains an invalid escape sequence, and therefore is not a valid String literal
it causes compilation error
Naturally you can combine escape sequences with normal unescaped characters in a String literal:
System.out.println("\"Hey\\\nHow\tare you?");
The above prints (tab spacing may vary):
"Hey\
How are you?
References
JLS 3.10.6 Escape Sequences for Character and String Literals
See also
Is the char literal '\"' the same as '"' ?(backslash-doublequote vs only-doublequote)
Back to the problem
Your problem definition is very vague, but the following snippet works as it should:
System.out.println("How are you? Really??? Awesome!".replace("?", "\\?"));
The above snippet replaces ? with \?, and thus prints:
How are you\? Really\?\?\? Awesome!
If instead you want to replace a char with another char, then there's also an overload for that:
System.out.println("How are you? Really??? Awesome!".replace('?', '\\'));
The above snippet replaces ? with \, and thus prints:
How are you\ Really\\\ Awesome!
String API links
replace(CharSequence target, CharSequence replacement)
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence.
replace(char oldChar, char newChar)
Returns a new string resulting from replacing all occurrences of oldChar in this string with newChar.
On how regex complicates things
If you're using replaceAll or any other regex-based methods, then things becomes somewhat more complicated. It can be greatly simplified if you understand some basic rules.
Regex patterns in Java is given as String values
Metacharacters (such as ? and .) have special meanings, and may need to be escaped by preceding with a backslash to be matched literally
The backslash is also a special character in replacement String values
The above factors can lead to the need for numerous backslashes in patterns and replacement strings in a Java source code.
It doesn't look like you need regex for this problem, but here's a simple example to show what it can do:
System.out.println(
"Who you gonna call? GHOSTBUSTERS!!!"
.replaceAll("[?!]+", "<$0>")
);
The above prints:
Who you gonna call<?> GHOSTBUSTERS<!!!>
The pattern [?!]+ matches one-or-more (+) of any characters in the character class [...] definition (which contains a ? and ! in this case). The replacement string <$0> essentially puts the entire match $0 within angled brackets.
Related questions
Having trouble with Splitting text. - discusses common mistakes like split(".") and split("|")
Regular expressions references
regular-expressions.info
Character class and Repetition with Star and Plus
java.util.regex.Pattern and Matcher
In case you want to replace ? with \?, there are 2 possibilities: replace and replaceAll (for regular expressions):
str.replace("?", "\\?")
str.replaceAll("\\?","\\\\?");
The result is "The . startup trace \?state is info\?"
If you want to replace ? with \, just remove the ? character from the second argument.
But when I print the string, it prints
with single slash.
Good. That's exactly what you want, isn't it?
There are two simple rules:
A backslash inside a String literal has to be specified as two to satisfy the compiler, i.e. "\". Otherwise it is taken as a special-character escape.
A backslash in a regular expresion has to be specified as two to satisfy regex, otherwise it is taken as a regex escape. Because of (1) this means you have to write 2x2=4 of them:"\\\\" (and because of the forum software I actually had to write 8!).
String str="\\";
str=str.replace(str,"\\\\");
System.out.println("New String="+str);
Out put:- New String=\
In java "\\" treat as "\". So, the above code replace a "\" single slash into "\\".

Categories