Replace regex pattern to lowercase in Java

I'm trying to convert a URL string to lowercase, but I want to keep certain patterns as they are.
E.g., for input like:
http://BLABLABLA?qUERY=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}
the expected output would be the lowercased URL, with the macros kept in their original case:
http://blablabla?query=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}
I was trying to capture the strings using a regex but couldn't figure out a proper way to do the replacement. Also, replaceAll() alone doesn't seem to do the job. Any hints, please?

It looks like you want to change any uppercase character which is not inside ${...} to its lowercase form.
With the construct
Matcher matcher = ...
StringBuffer buffer = new StringBuffer();
while (matcher.find()) {
    String matchedPart = ...
    ...
    matcher.appendReplacement(buffer, replacement);
}
matcher.appendTail(buffer);
String result = buffer.toString();
or, since Java 9, we can use Matcher#replaceAll(Function<MatchResult,String> replacer) and rewrite it as
String replaced = matcher.replaceAll(m -> {
    String matchedPart = m.group();
    ...
    return replacement;
});
you can dynamically build the replacement based on matchedPart.
So you can let your regex first try to match ${...} and otherwise match [A-Z]. Letters inside ${...} are never tried on their own, because the whole ${...} has already been consumed before the regex cursor reaches them. While iterating over the matches you can decide, based on the match result (e.g. its length, or whether it starts with $), whether to use its lowercase form or its original form as the replacement.
By the way, the regex engine lets us use $x (where x is a group number) or ${name} (where name is a named group) in the replacement, so we can reuse those parts of the match. That also means that to put ${...} into the replacement as a literal, we need to escape the $ as \$. To avoid doing that manually, we can use Matcher.quoteReplacement.
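To see why the quoting matters, here is a minimal sketch (the class and method names are made up for illustration). It puts each ${...} token back through appendReplacement, once through Matcher.quoteReplacement and once as-is; unquoted, the token is parsed as a named-group reference and appendReplacement throws an IllegalArgumentException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteReplacementDemo {
    private static final Pattern MACRO = Pattern.compile("\\$\\{[^}]+\\}");

    // Re-inserts every ${...} token unchanged, optionally quoting it first.
    static String keepMacros(String input, boolean quote) {
        Matcher m = MACRO.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String macro = m.group();
            // Unquoted, "${MACRO_STR1}" is read as a reference to a named group.
            m.appendReplacement(sb, quote ? Matcher.quoteReplacement(macro) : macro);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "macro1=${MACRO_STR1}";
        System.out.println(keepMacros(url, true)); // macro1=${MACRO_STR1}
        try {
            keepMacros(url, false);
        } catch (IllegalArgumentException e) {
            System.out.println("unquoted replacement rejected: " + e.getMessage());
        }
    }
}
```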
Demo:
String yourUrlString = "http://BLABLABLA?qUERY=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}";
Pattern p = Pattern.compile("\\$\\{[^}]+\\}|[A-Z]");
Matcher m = p.matcher(yourUrlString);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    String match = m.group();
    if (match.length() == 1) {
        m.appendReplacement(sb, match.toLowerCase());
    } else {
        m.appendReplacement(sb, Matcher.quoteReplacement(match));
    }
}
m.appendTail(sb);
String replaced = sb.toString();
System.out.println(replaced);
or in Java 9
String replaced = Pattern.compile("\\$\\{[^}]+\\}|[A-Z]")
        .matcher(yourUrlString)
        .replaceAll(m -> {
            String match = m.group();
            if (match.length() == 1)
                return match.toLowerCase();
            else
                return Matcher.quoteReplacement(match);
        });
System.out.println(replaced);
Output: http://blablabla?query=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}
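The Java 9 variant can be wrapped into a reusable method. This is a sketch with hypothetical names (MacroSafeLowercase, lowercaseOutsideMacros). It checks startsWith("$") instead of the length, and passes Locale.ROOT to toLowerCase; the locale argument is my addition so the result does not depend on the default locale (with a Turkish default locale, "I".toLowerCase() yields a dotless ı).

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MacroSafeLowercase {
    // ${...} is tried first, so letters inside a macro are consumed as part of it.
    private static final Pattern MACRO_OR_UPPER = Pattern.compile("\\$\\{[^}]+\\}|[A-Z]");

    static String lowercaseOutsideMacros(String url) {
        return MACRO_OR_UPPER.matcher(url).replaceAll(m -> {   // Java 9+
            String match = m.group();
            if (match.startsWith("$")) {
                // Keep the macro verbatim; quote it so "${...}" is not treated as a group reference.
                return Matcher.quoteReplacement(match);
            }
            // A single uppercase ASCII letter outside any macro.
            return match.toLowerCase(Locale.ROOT);
        });
    }

    public static void main(String[] args) {
        System.out.println(lowercaseOutsideMacros(
                "http://BLABLABLA?qUERY=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}"));
        // http://blablabla?query=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}
    }
}
```

Compiling the pattern once into a static field avoids recompiling it on every call.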

This regex will match all the characters before the first &macro, and put everything between http:// and the first &macro in its own group so you can modify it.
http://(.*?)&macro
UPDATE: If you don't want to use groups, this regex will match only the characters between http:// and the first &macro
(?<=http://)(.*?)(?=&macro)
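In Java, the group-based regex could be used as in this sketch (class and method names are hypothetical). It lowercases only group 1 and splices it back in by index. It assumes the macros all come after the first &macro parameter, and returns the input unchanged when there is no &macro to anchor on.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LowercaseBeforeMacros {
    private static final Pattern PREFIX = Pattern.compile("http://(.*?)&macro");

    static String lowercasePrefix(String url) {
        Matcher m = PREFIX.matcher(url);
        if (!m.find()) {
            return url; // assumption: without "&macro" there is nothing to anchor on
        }
        // Replace only the captured span; everything from "&macro" on is untouched.
        return url.substring(0, m.start(1))
                + m.group(1).toLowerCase(Locale.ROOT)
                + url.substring(m.end(1));
    }

    public static void main(String[] args) {
        System.out.println(lowercasePrefix(
                "http://BLABLABLA?qUERY=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}"));
    }
}
```

Anything after the first &macro, including a later non-macro parameter, stays in its original case with this approach.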

Related

Extract multiple tokens from json path using Regex

I have to extract tokens from a text which I need to match using regex. An example text would be something like this.
data.orderType.`order.created.time`
Right now I'm using the following regex to tokenize this string.
`(.*?)`|[^.]+
This regex tokenizes the string, but not quite as I want; it gives the tokens
data,orderType,`order.created.time`
The problem is that the backticks are included in the tokens. How can I drop the backticks and just get the following?
data,orderType,order.created.time
You already captured the part between backticks; just grab matcher.group(1) if it participated in the match (i.e., if it matched):
Java demo:
String s = "data.orderType.`order.created.time`";
String regex = "`([^`]*)`|[^.`]+";
List<String> result = new ArrayList<>();
Matcher m = Pattern.compile(regex).matcher(s);
while (m.find()) {
    if (m.group(1) != null) {
        result.add(m.group(1));
    } else {
        result.add(m.group());
    }
}
System.out.println(result);
// => [data, orderType, order.created.time]
Note that I also added a backtick to the negated character class, [^.`]+, as I assume backticks only ever appear in pairs.
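On Java 9+, the same loop can be written with Matcher#results(), which streams MatchResult objects. A sketch with a hypothetical class name, using the same regex and the same "group(1) if present, else the whole match" rule:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class JsonPathTokens {
    private static final Pattern TOKEN = Pattern.compile("`([^`]*)`|[^.`]+");

    static List<String> tokenize(String path) {
        return TOKEN.matcher(path)
                .results() // Stream<MatchResult>, Java 9+
                // group(1) is non-null only when the backticked alternative matched
                .map(r -> r.group(1) != null ? r.group(1) : r.group())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(tokenize("data.orderType.`order.created.time`"));
        // [data, orderType, order.created.time]
    }
}
```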

How to Regex replace characters at the end of a string

I have an issue in Java when trying to remove characters from the end of a string. It has now become a generic pattern-matching issue that I cannot resolve.
PROBLEM: remove all pluses, minuses and spaces (other whitespace doesn't matter) from the end of a string.
Pattern myRegex;
Matcher myMatch;
String myPattern = "";
String myString = "";
String myResult = "";
myString = "surname; forename--+ + --++ ";
myPattern = "^(.*)[-+ ]*$";
//expected result = "surname; forename"
myRegex = Pattern.compile(myPattern);
myMatch = myRegex.matcher(myString);
if (myMatch.find()) {
    myResult = myMatch.group(1);
} else {
    myResult = myString;
}
The only way I can get this to work is by reversing the string and the pattern, and then reversing the result to get the right answer!
In the following pattern:
^(.*)[-+ ]*$
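... the .* is a greedy match: it matches as many characters as possible while still allowing the entire pattern to match. Since [-+ ]* is allowed to match zero characters, .* can consume the whole string, trailing pluses, minuses and spaces included, and group 1 ends up equal to the input.
You need to make it non-greedy (lazy) by adding ?: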
^(.*?)[-+ ]*$
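A small sketch comparing the lazy-group fix with an alternative that is not from the answer: deleting the trailing run directly with String.replaceAll("[-+ ]+$", ""). Class and method names are hypothetical, and both assume single-line input, since . does not match line terminators by default.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TrimTrailingSigns {
    // Lazy (.*?) stops at the first position after which only [-+ ] characters remain.
    private static final Pattern LAZY = Pattern.compile("^(.*?)[-+ ]*$");

    static String viaLazyGroup(String s) {
        Matcher m = LAZY.matcher(s);
        return m.find() ? m.group(1) : s;
    }

    // Alternative technique: delete the trailing run of [-+ ] characters directly.
    static String viaReplaceAll(String s) {
        return s.replaceAll("[-+ ]+$", "");
    }

    public static void main(String[] args) {
        String input = "surname; forename--+ + --++ ";
        System.out.println("[" + viaLazyGroup(input) + "]");  // [surname; forename]
        System.out.println("[" + viaReplaceAll(input) + "]"); // [surname; forename]
    }
}
```

The replaceAll version avoids the capturing group entirely, which also makes the intent ("strip a suffix") easier to read.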

Regex to get value between two colon excluding the colons

I have a string like this:
something:POST:/some/path
Now I want to extract just POST from the string. I tried this regex:
:([a-zA-Z]+):
But this gives me the value along with the colons, i.e. I get this:
:POST:
but I need this:
POST
My code to match it and replace it is as follows:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
    System.out.println(matcher.group());
    ss = ss.replaceFirst(":([a-zA-Z]+):", "*");
}
System.out.println(ss);
EDIT:
I've decided to use the lookahead/lookbehind regex, since I did not want to write the colons into the replacement, as in :*:. This is my final solution:
String s = "something:POST:/some/path/";
String regex = "(?<=:)[a-zA-Z]+(?=:)";
Matcher matcher = Pattern.compile(regex).matcher(s);
if (matcher.find()) {
    s = s.replaceFirst(matcher.group(), "*");
    System.out.println("replaced: " + s);
} else {
    System.out.println("not replaced: " + s);
}
There are two approaches:
Keep your Java code, and use lookahead/lookbehind (?<=:)[a-zA-Z]+(?=:), or
Change your Java code to replace the result with ":*:"
Note: You may want to define a String constant for your regex, since you use it in different calls.
As pointed out, the regex's capturing group can be used for the replacement.
The following code did it:
String ss = "something:POST:/some/path/";
Pattern pattern = Pattern.compile(":([a-zA-Z]+):");
Matcher matcher = pattern.matcher(ss);
if (matcher.find()) {
    ss = ss.replaceFirst(matcher.group(1), "*");
}
System.out.println(ss);
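One caveat with ss.replaceFirst(matcher.group(1), "*"): the captured text is reused as a new regex and replaced wherever it first occurs, which is not necessarily between the colons. A sketch (hypothetical names) that instead splices by the group's start and end indices:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceCapturedSpan {
    private static final Pattern BETWEEN_COLONS = Pattern.compile(":([a-zA-Z]+):");

    // Replaces exactly the characters captured by group 1, using their indices.
    static String maskFirst(String s) {
        Matcher m = BETWEEN_COLONS.matcher(s);
        if (!m.find()) {
            return s;
        }
        return s.substring(0, m.start(1)) + "*" + s.substring(m.end(1));
    }

    public static void main(String[] args) {
        System.out.println(maskFirst("something:POST:/some/path/")); // something:*:/some/path/
        // replaceFirst("POST", "*") would hit the leading "POST" here instead:
        System.out.println(maskFirst("POSTer:POST:/x"));             // POSTer:*:/x
    }
}
```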
UPDATE
Looking at your update, you just need replaceFirst:
String result = s.replaceFirst(":[a-zA-Z]+:", ":*:");
When you use (?<=:)[a-zA-Z]+(?=:), the regex engine checks each position in the string for a : before it and, once found, tries to match one or more ASCII letters and then asserts that a : follows them. With :[A-Za-z]+:, matching only starts once the engine has found a : character. After :POST: is matched, the replacement pattern replaces the whole match, so it is totally OK to hardcode the colons in the replacement, since they are hardcoded in the regex pattern.
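The two variants can be checked side by side; a sketch with hypothetical class and method names:

```java
final class ColonReplacementVariants {
    // The colons are part of the match, so they are written back in the replacement.
    static String hardcodedColons(String s) {
        return s.replaceFirst(":[a-zA-Z]+:", ":*:");
    }

    // The lookarounds only assert the colons, so only the letters are replaced.
    static String lookarounds(String s) {
        return s.replaceFirst("(?<=:)[a-zA-Z]+(?=:)", "*");
    }

    public static void main(String[] args) {
        String s = "something:POST:/some/path/";
        System.out.println(hardcodedColons(s)); // something:*:/some/path/
        System.out.println(lookarounds(s));     // something:*:/some/path/
    }
}
```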
Original answer
You just need to access Group 1:
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Your :([a-zA-Z]+): regex contains a capturing group (see (....) subpattern). These groups are numbered automatically: the first one has an index of 1, the second has the index of 2, etc.
To replace it, use Matcher#appendReplacement():
String s = "something:POST:/some/path/";
StringBuffer result = new StringBuffer();
Matcher m = Pattern.compile(":([a-zA-Z]+):").matcher(s);
while (m.find()) {
    m.appendReplacement(result, ":*:");
}
m.appendTail(result);
System.out.println(result.toString());
This is your solution:
regex = (:)([a-zA-Z]+)(:)
And code is:
String ss = "something:POST:/some/path/";
ss = ss.replaceFirst("(:)([a-zA-Z]+)(:)", "$1*$3");
ss now contains:
something:*:/some/path/
Which I believe is what you are looking for...

Remove occurrences of a given character sequence at the beginning of a string using Java Regex

I have a string that begins with one or more occurrences of the sequence "Re:". This "Re:" can appear in any combination of case and spacing, e.g. Re<any number of spaces>:, re:, re<any number of spaces>:, RE:, RE<any number of spaces>:, etc.
Sample string: Re: Re : Re : re : RE: This is a Re: sample string.
I want to define a Java regular expression that identifies and strips off all occurrences of Re:, but only those at the beginning of the string, not the ones occurring within it.
So the output should look like: This is a Re: sample string.
Here is what I have tried:
String REGEX = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)";
String INPUT = title;
String REPLACE = "";
Pattern p = Pattern.compile(REGEX);
Matcher m = p.matcher(INPUT);
StringBuffer sb = new StringBuffer();
while (m.find()) {
    m.appendReplacement(sb, REPLACE);
}
m.appendTail(sb);
I am using \p{Z} to match whitespace (I found this somewhere on this forum, as Java regex did not seem to recognize \s).
The problem I am facing with this code is that the search stops at the first match and exits the while loop.
Try something like this replace statement:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
Explanation of the regex:
(?i) make it case insensitive
^ anchor to start of string
( start a group (this is the "re:")
\\s* any amount of optional whitespace
re "re"
\\s* optional whitespace
: ":"
\\s* optional whitespace
) end the group (the "re:" string)
+ one or more times
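The one-liner can be wrapped and checked like this (class and method names are hypothetical):

```java
final class StripReplyPrefix {
    static String strip(String subject) {
        // (?i): case-insensitive; ^: only at the start; (...)+: one or more "re:" blocks
        return subject.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("Re: Re : Re : re : RE: This is a Re: sample string."));
        // This is a Re: sample string.
    }
}
```

Because the colon is mandatory, a subject such as "Regarding: budget" is left alone.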
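In your regex:
String regex = "^(Re*\\p{Z}*:?|re*\\p{Z}*:?|\\p{Z}Re*\\p{Z}*:?)"
here is what it does: e* means "zero or more e characters" and :? makes the colon optional, so it matches strings like:
Reee   : or
just R
which makes no sense for what you are trying to do. Also, ^ anchors every match to the start of the input, so after the first match find() cannot succeed again, which is why your loop stops after one iteration.
You'd better use a regex like the following:
yourString = yourString.replaceAll("(?i)^(\\s*re\\s*:\\s*)+", "");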
or, to make @Doorknob happy, here's another way to achieve this, using a Matcher:
Pattern p = Pattern.compile("(?i)^(\\s*re\\s*:\\s*)+");
Matcher m = p.matcher(yourString);
if (m.find())
    yourString = m.replaceAll("");
(which, as the docs say, gives exactly the same result as yourString.replaceAll())
(I had the same regex as @Doorknob, but thanks to @jlordo for the replaceAll and @Doorknob for thinking about the (?i) case-insensitivity part ;-) )
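If you would rather keep the original find()/appendReplacement loop, one option (not from the answers above) is to replace ^ with \G. On the first find() it matches at the start of the input, and afterwards only where the previous match ended, so each iteration removes one consecutive "re:" block and the loop stops at the first other text. A sketch with hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class StripReplyPrefixLoop {
    // \G: start of input on the first find(), then the end of the previous match.
    private static final Pattern REPLY_BLOCK = Pattern.compile("(?i)\\G\\s*re\\s*:\\s*");

    static String strip(String subject) {
        Matcher m = REPLY_BLOCK.matcher(subject);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, ""); // drop this "re:" block
        }
        m.appendTail(sb); // keep everything after the last leading block
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("Re: Re : Re : re : RE: This is a Re: sample string."));
        // This is a Re: sample string.
    }
}
```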

Remove all non-word char except if &amp; or &apos; pattern

I am trying to clean a string of all non-word characters except when they form an entity like &amp;, i.e. the pattern might be like &[\w]+;
For example:
abc; => abc
abc &amp; => abc &amp;
abc& => abc
If I use string.replaceAll("\\W", ""), it also removes the ; and the & from the second example, which I don't want.
Could a negative lookahead give a quick regex solution to this problem?
First of all, I really like the question. Now, what you want cannot be done with a single lookbehind-based replaceAll, because that would need a negative lookbehind of unbounded length (&\w+ before the ;), which Java does not allow. If it were allowed, this would not be that difficult.
Anyway, since a lookbehind won't work here, you can use a little hack: first replace the trailing semicolon of your entity reference with some character sequence you are sure won't appear anywhere else in the string, like XXX. I know this is not a clean solution, but it gets the job done.
So, here's what you can try:
String str = "a;b&c &amp;";
str = str.replaceAll("(&\\w+);", "$1XXX")
        .replaceAll("&(?!\\w+?XXX)|[^\\w&]", "")
        .replaceAll("(&\\w+)XXX", "$1;");
System.out.println(str);
Explanation:
The first replaceAll replaces a pattern like &amp; with &ampXXX (or whatever sequence you chose for the trailing ;).
The second replaceAll removes any & not followed by \\w+XXX, as well as any character that is neither a word character nor &. This removes every & that is not part of an &amp;-style entity, plus all other non-word characters.
The third replaceAll turns XXX back into ;, restoring &amp; from &ampXXX.
To make it easier to understand, you can instead use the Pattern and Matcher classes; I always prefer them whenever the replacement criteria are complex.
String str = "a;b&c &amp;";
Pattern pattern = Pattern.compile("&\\w+;|[^\\w]");
Matcher matcher = pattern.matcher(str);
StringBuilder sb = new StringBuilder(); // appendReplacement(StringBuilder, ...) needs Java 9+
while (matcher.find()) {
    String match = matcher.group();
    if (!match.matches("&\\w+;")) {
        matcher.appendReplacement(sb, ""); // drop other non-word characters
    } else {
        matcher.appendReplacement(sb, match); // keep entities such as &amp;
    }
}
matcher.appendTail(sb);
System.out.println(sb.toString());
This one is similar to @Eric's code, but generalizes it: that one only works for &amp;, and only once the NullPointerException it throws is fixed.
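For comparison, a different technique from the ones above: a capturing group inside an alternation, in a single replaceAll. When &\w+; matches, $1 puts the entity back; when \W matches, group 1 did not participate and contributes nothing to the replacement, so the character is dropped. A sketch with hypothetical names; like the code above, it also removes spaces.

```java
final class KeepEntities {
    // Alternative 1 captures an entity such as &amp; ; alternative 2 matches any other non-word char.
    static String clean(String s) {
        // "$1" re-inserts the entity; for a \W match group 1 is unset and adds nothing.
        return s.replaceAll("(&\\w+;)|\\W", "$1");
    }

    public static void main(String[] args) {
        System.out.println(clean("a;b&c &amp;")); // abc&amp;
    }
}
```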
I'm not sure you can do this using a simple String.replaceAll. You should probably use a Pattern and Matcher to loop through the matches, effectively doing a manual search and replace. Something like the following code should do the trick.
public String replaceString(String origString) {
    Pattern pattern = Pattern.compile("&(\\w+);|[^\\w]");
    Matcher matcher = pattern.matcher(origString);
    StringBuffer sb = new StringBuffer();
    while (matcher.find()) {
        // group(1) is null when [^\w] matched, so compare null-safely
        if ("amp".equals(matcher.group(1))) {
            matcher.appendReplacement(sb, matcher.group()); // keep &amp;
        } else {
            matcher.appendReplacement(sb, ""); // drop everything else
        }
    }
    matcher.appendTail(sb);
    return sb.toString();
}
I would suggest you use a negative lookahead like this:
string.replace(/&(?!\w+;)/ig, '');
This removes every & that is not followed by word characters ending with a semicolon.
EDIT (Java):
string = string.replaceAll("&(?!\\w+;)", "");
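A quick check of the Java version (class and method names are hypothetical). It only removes stray & characters; other non-word characters such as ; or spaces stay in place, so the full requirement would need an additional cleanup step.

```java
final class StrayAmpersand {
    static String removeStrayAmpersands(String s) {
        // Delete & unless it starts an entity like &amp; or &apos;
        return s.replaceAll("&(?!\\w+;)", "");
    }

    public static void main(String[] args) {
        System.out.println(removeStrayAmpersands("a;b&c &amp;")); // a;bc &amp;
    }
}
```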
