Regex replace space and word to toFirstUpper of word - java

I was trying to use regex to change the following string
String input = "Creation of book orders"
to
String output = "CreationOfBookOrders"
I tried the following expecting to replace the space and word with word.
input.replaceAll("\\s\\w", "(\\w)");
input.replaceAll("\\s\\w", "\\w");
but here the string is replacing space and word with character 'w' instead of the word.
I am in a position not to use any WordUtils or StringUtils or such Util classes. Else I could have replaced all spaces with empty string and applied WordUtils.capitalize or similar methods.
How else (preferably using regex) can I get the above output from input.

I don't think you can do that with String.replaceAll. The only modifications that you can make in the replacement string are to interpolate groups matched by the regex.
The javadoc for Matcher.replaceAll explains how the replacement string is handled.
You will need to use a loop. Here's a simple version:
StringBuilder sb = new StringBuilder(input);
Pattern pattern = Pattern.compile("\\s\\w");
Matcher matcher = pattern.matcher(sb);
int pos = 0;
while (matcher.find(pos)) {
String replacement = matcher.group().substring(1).toUpperCase();
pos = matcher.start();
sb.replace(pos, pos + 2, replacement);
pos += 1;
}
String output = sb.toString();
(This could be done more efficiently, but it is complicated.)
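One possible single-pass variant is sketched below. It relies on Matcher.appendReplacement and Matcher.appendTail, which are standard java.util.regex methods; the class name CamelCaseDemo and the helper name toCamelCase are made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CamelCaseDemo {
    // Hypothetical helper: drops each whitespace character that is followed by
    // a word character and upper-cases that word character.
    static String toCamelCase(String input) {
        Matcher matcher = Pattern.compile("\\s(\\w)").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String upper = matcher.group(1).toUpperCase();
            // quoteReplacement keeps '$' and '\' from being read as group references
            matcher.appendReplacement(sb, Matcher.quoteReplacement(upper));
        }
        matcher.appendTail(sb); // copy whatever follows the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCamelCase("Creation of book orders")); // CreationOfBookOrders
    }
}
```

Each match is visited once and the text between matches is copied by the matcher itself, so no index bookkeeping is needed.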

Related

Extract substring where startString and endString are same

I want to extract the sentence/word, where the start string and end string are same,
for example :
String originalString = "this is an example to extract sentence between is";
here start string and end string is same that is : "is"
So the final output should be : an example to extract sentence between
I tried as below, but it returned output as "is" only
String originalString = "this is an example to extract sentence between is";
String startEndString = "is";
int startIndex = originalString.indexOf(startEndString);
int endIndex = originalString.indexOf(startEndString, startIndex + startEndString.length());
String substring = originalString.substring(startIndex, endIndex);
System.out.println(substring);
I also checked org.apache.commons.lang.StringUtils substring methods, but could not find any to fulfill this type of extract. Is there any java8 / StringUtil method / API already available to do this job?
You can obtain the desired result using regex:
String originalString = "this is an example to extract sentence between is";
Pattern p = Pattern.compile( "(?<=(?:\\bis\\b))(.*)(?=(?:\\bis\\b))" );
Matcher m = p.matcher( originalString );
if ( m.find() ) {
System.out.println(m.group(1));
}
(?<=(?:\\bis\\b)) is a positive lookbehind containing a non-capturing group. The word you're looking for is placed between two word-boundary anchors (\b), which make sure that only whole words match (so the 'is' inside 'this' is skipped). (?=(?:\\bis\\b)) is a positive lookahead. The result, group 1, is everything between the two occurrences, including the spaces next to them.
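A sketch of the same idea for an arbitrary delimiter word, assuming the word starts and ends with a word character so that \b behaves as expected. The class BetweenDemo and the helper between are hypothetical names; Pattern.quote is used so the word is treated literally, and trim() removes the surrounding spaces.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BetweenDemo {
    // Hypothetical helper: returns the text between the first and the last
    // whole-word occurrence of `word`, or null if no such pair exists.
    static String between(String text, String word) {
        String w = "\\b" + Pattern.quote(word) + "\\b";
        Matcher m = Pattern.compile("(?<=" + w + ")(.*)(?=" + w + ")").matcher(text);
        return m.find() ? m.group(1).trim() : null;
    }

    public static void main(String[] args) {
        String original = "this is an example to extract sentence between is";
        System.out.println(between(original, "is"));
    }
}
```

Because (.*) is greedy, the match runs from the first whole-word occurrence to the last one; a reluctant (.*?) would stop at the next occurrence instead.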

Replace regex pattern to lowercase in java

I'm trying to replace a url string to lowercase but wanted to keep the certain pattern string as it is.
eg: for input like:
http://BLABLABLA?qUERY=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}
The expected output would be lowercased url but the multiple macros are original:
http://blablabla?query=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}
I was trying to capture the strings using regex but didn't figure out a proper way to do the replacement. Also it seemed using replaceAll() doesn't do the job. Any hint please?
It looks like you want to change any uppercase character which is not inside ${...} to its lowercase form.
With a construct like
Matcher matcher = ...
StringBuffer buffer = new StringBuffer();
while (matcher.find()){
String matchedPart = ...
...
matcher.appendReplacement(buffer, replacement);
}
matcher.appendTail(buffer);
String result = buffer.toString();
or, since Java 9, we can use Matcher#replaceAll(Function<MatchResult,String> replacer) and rewrite it like
String replaced = matcher.replaceAll(m -> {
String matchedPart = m.group();
...
return replacement;
});
you can dynamically build replacement based on matchedPart.
So you can let your regex first try to match ${...} and, only where that fails (because the regex cursor is not positioned before a ${...}), let it match [A-Z]. While iterating over the matches you can decide, based on the match (for example its length, or whether it starts with $), whether to use its lowercase form or its original form as the replacement.
BTW, the regex engine lets us put $x (where x is a group number) or ${name} (where name is a named group) in the replacement so that we can reuse those parts of the match. If we want ${...} to appear literally in the replacement, we need to escape the $ as \$. To avoid doing that manually we can use Matcher.quoteReplacement.
Demo:
String yourUrlString = "http://BLABLABLA?qUERY=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}";
Pattern p = Pattern.compile("\\$\\{[^}]+\\}|[A-Z]");
Matcher m = p.matcher(yourUrlString);
StringBuffer sb = new StringBuffer();
while(m.find()){
String match = m.group();
if (match.length() == 1){
m.appendReplacement(sb, match.toLowerCase());
} else {
m.appendReplacement(sb, Matcher.quoteReplacement(match));
}
}
m.appendTail(sb);
String replaced = sb.toString();
System.out.println(replaced);
or in Java 9
String replaced = Pattern.compile("\\$\\{[^}]+\\}|[A-Z]")
.matcher(yourUrlString)
.replaceAll(m -> {
String match = m.group();
if (match.length() == 1)
return match.toLowerCase();
else
return Matcher.quoteReplacement(match);
});
System.out.println(replaced);
Output: http://blablabla?query=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}
This regex will match all the characters before the first &macro, and put everything between http:// and the first &macro in its own group so you can modify it.
http://(.*?)&macro
UPDATE: If you don't want to use groups, this regex will match only the characters between http:// and the first &macro
(?<=http://)(.*?)(?=&macro)
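A minimal sketch of how the captured group could be used, assuming the scheme is already lowercase and all macros appear after the first &macro. The class LowercasePrefixDemo and the helper lowercaseBeforeMacros are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LowercasePrefixDemo {
    // Hypothetical helper: lower-cases only the part captured between
    // "http://" and the first "&macro"; everything else is left untouched.
    static String lowercaseBeforeMacros(String url) {
        Matcher m = Pattern.compile("http://(.*?)&macro").matcher(url);
        if (!m.find()) {
            return url; // no macro section found: return the input unchanged
        }
        return url.substring(0, m.start(1))
                + m.group(1).toLowerCase()
                + url.substring(m.end(1));
    }

    public static void main(String[] args) {
        String url = "http://BLABLABLA?qUERY=sth&macro1=${MACRO_STR1}&macro2=${macro_str2}";
        System.out.println(lowercaseBeforeMacros(url));
    }
}
```

Unlike the alternation-based approach, this does not protect a ${...} that appears before the first &macro, so it only fits URLs shaped like the example.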

Removing dashes in a Java string and capitalizing the character after the dash [duplicate]

I have a String nba-west-teams blazers and I want to convert the string into a format like nbaWestTeams blazers. Essentially, I want to remove all the dashes and replace the character after each dash with its uppercase equivalent.
I know I can use the String method replaceAll to remove all the dashes, but how do I get the character after the dash and uppercase it?
// Input
String withDashes = "nba-west-teams blazers"
String removeDashes = withDashes.replaceAll(....?)
// Expected conversion
String withoutDashes = "nbaWestTeams blazers"
Check out the indexOf and the replace method of the StringBuilder class. StringBuilder allows fast editing of Strings.
When you are finished use toString.
If you need more help just make a comment.
You can use Patterns with regex like this \-([a-z]):
String str = "nba-west-teams blazers";
String regex = "\\-([a-z])";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
str = str.replaceFirst(matcher.group(), matcher.group(1).toUpperCase());
}
System.out.println(str);//Output = nbaWestTeams blazers
So it will match the first lowercase letter after each dash and replace the match with that letter in upper case.
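On Java 9 or later, the same pattern can be applied in a single pass with Matcher.replaceAll(Function<MatchResult,String>). This is only a sketch; the class DashToCamelDemo and the helper dashesToCamel are made-up names.

```java
import java.util.regex.Pattern;

class DashToCamelDemo {
    private static final Pattern DASH_LETTER = Pattern.compile("-([a-z])");

    // Hypothetical helper: removes each dash that is followed by a lowercase
    // letter and upper-cases that letter (requires Java 9+).
    static String dashesToCamel(String s) {
        return DASH_LETTER.matcher(s).replaceAll(m -> m.group(1).toUpperCase());
    }

    public static void main(String[] args) {
        System.out.println(dashesToCamel("nba-west-teams blazers")); // nbaWestTeams blazers
    }
}
```

A dash that is not followed by a lowercase letter is left in place, which matches the behaviour of the regex \-([a-z]) above.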
You can iterate through the string and when a hyphen is found, just skip the hyphen and transform the next character to uppercase. You can use a StringBuilder to store the partial results as follows:
public static String toCamelCase(String str) {
// if the last char is '-', lets set the length to length - 1 to avoid out of bounds
final int len = str.charAt(str.length() - 1) == '-' ? str.length() - 1 : str.length();
StringBuilder builder = new StringBuilder(len);
for (int i = 0; i < len; ++i) {
char c = str.charAt(i);
if (c == '-') {
++i;
builder.append(Character.toUpperCase(str.charAt(i)));
} else {
builder.append(c);
}
}
return builder.toString();
}
You can split the string at the space and use https://github.com/google/guava/wiki/StringsExplained#caseformat to convert the dashed substring into a camel cased string.

String matching in java

I am currently struggling with my "dirty word" filter finding partial matches.
example: if I pass in these two params replaceWord("ass", "passing pass passed ass")
to this method
private static String replaceWord(String word, String input) {
Pattern legacyPattern = Pattern.compile(word, Pattern.CASE_INSENSITIVE);
Matcher matcher = legacyPattern.matcher(input);
StringBuilder returnString = new StringBuilder();
int index = 0;
while(matcher.find()) {
returnString.append(input.substring(index,matcher.start()));
for(int i = 0; i < word.length() - 1; i++) {
returnString.append('*');
}
returnString.append(word.substring(word.length()-1));
index = matcher.end();
}
if(index < input.length() - 1){
returnString.append(input.substring(index));
}
return returnString.toString();
}
I get p*sing p*s p**sed **s
When I really just want "passing pass passed **s".
Does anyone know how to avoid this partial matching with this method??
Any help would be great thanks!
This tutorial from Oracle should point you in the right direction.
You want to use a word boundary in your pattern:
Pattern p = Pattern.compile("\\b" + word + "\\b", Pattern.CASE_INSENSITIVE);
Note, however that this still is problematic (as profanity filtering always is). A "non-word character" that defines the boundary is anything not included in [0-9A-Za-z_]
So for example, _ass would not match.
You also have the problem of profanity derived terms ... where the term is prepended to say, "hole", "wipe", etc
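To see the boundary behaviour concretely, here is a small sketch that lists where whole-word matches start. The class WordBoundaryDemo and the helper wholeWordStarts are hypothetical, and Pattern.quote is added on the assumption that the word should be matched literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: returns the start index of every whole-word,
    // case-insensitive match of `word` in `input`.
    static List<Integer> wholeWordStarts(String word, String input) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(input);
        List<Integer> starts = new ArrayList<>();
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Only the stand-alone word matches; "passing", "pass", "passed" and
        // "_ass" do not, because '_' counts as a word character.
        System.out.println(wholeWordStarts("ass", "passing pass passed ass _ass"));
    }
}
```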
I'm working on a dirty word filter as we speak, and the option I chose to go with was Soundex and some regex.
I first filter out strange characters, keeping only \w, which is [a-zA-Z_0-9].
Then use soundex(String) to make a string that you can check against the soundex string of the word you want to test.
String soundExOfDirtyWord = Soundex.soundex(dirtyWord);
String soundExOfTestWord = Soundex.soundex(testWord);
if (soundExOfTestWord.equals(soundExOfDirtyWord)) {
System.out.println("The test word sounds like " + dirtyWord);
}
I just keep a list of dirty words in the program and have SoundEx run through them to check. The algorithm is something worth looking at.
You could also use the replaceAll() method from the Matcher class. It replaces all the occurrences of the pattern with your specified replacement. Something like below.
private static String replaceWord(String word, String input) {
Pattern legacyPattern = Pattern.compile("\\b" + word + "\\b", Pattern.CASE_INSENSITIVE);
Matcher matcher = legacyPattern.matcher(input);
String replacement = "";
for (int i = 0; i < word.length() - 1; i++) {
replacement += "*";
}
replacement += word.charAt(word.length() - 1);
return matcher.replaceAll(replacement);
}

Extract every complete word that contains a certain substring

I'm trying to write a function that extracts each word from a sentence that contains a certain substring e.g. Looking for 'Po' in 'Porky Pork Chop' will return Porky Pork.
I've tested my regex on regexpal but the Java code doesn't seem to work. What am I doing wrong?
private static String foo()
{
String searchTerm = "Pizza";
String text = "Cheese Pizza";
String sPattern = "(?i)\b("+searchTerm+"(.+?)?)\b";
Pattern pattern = Pattern.compile ( sPattern );
Matcher matcher = pattern.matcher ( text );
if(matcher.find ())
{
String result = "-";
for(int i=0;i < matcher.groupCount ();i++)
{
result+= matcher.group ( i ) + " ";
}
return result.trim ();
}else
{
System.out.println("No Luck");
}
}
In Java to pass \b word boundaries to regex engine you need to write it as \\b. \b represents backspace in String object.
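The difference can be checked directly. In this sketch the class BackspaceDemo and the helper found are made-up names; it compares the two spellings against the text from the question.

```java
import java.util.regex.Pattern;

class BackspaceDemo {
    // Hypothetical helper: true if `regex` matches somewhere in `text`.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // "\b" is one char, the backspace control character (code 8);
        // "\\b" is two chars, a backslash followed by 'b'.
        System.out.println((int) "\b".charAt(0));
        System.out.println("\\b".length());
        // The regex engine sees a word boundary only in the second form.
        System.out.println(found("\\bPizza\\b", "Cheese Pizza"));
        System.out.println(found("\bPizza\b", "Cheese Pizza"));
    }
}
```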
Judging by your example, you want to return all words that contain your substring. To do this, don't use for(int i=0;i < matcher.groupCount ();i++) but while(matcher.find()), since the group-count loop iterates over the groups of a single match, not over all matches.
In case your search term can contain regex special characters, you should probably use Pattern.quote(searchTerm).
In your code you are trying to find "Pizza" in "Cheese Pizza", so I assume that you also want to find words that are the same as the searched substring. Although your regex will work fine for that, you can change the last part, (.+?)?), to \\w*, and also add \\w* at the start if the substring should also be matched in the middle of a word (not only at its start).
So your code can look like
private static String foo() {
String searchTerm = "Pizza";
String text = "Cheese Pizza, Other Pizzas";
String sPattern = "(?i)\\b\\w*" + Pattern.quote(searchTerm) + "\\w*\\b";
StringBuilder result = new StringBuilder("-").append(searchTerm).append(": ");
Pattern pattern = Pattern.compile(sPattern);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
result.append(matcher.group()).append(' ');
}
return result.toString().trim();
}
While the regex approach is certainly a valid method, I find it easier to think through when you split the words up by whitespace. This can be done with String's split method.
public List<String> doIt(final String inputString, final String term) {
final List<String> output = new ArrayList<String>();
final String[] parts = inputString.split("\\s+");
for(final String part : parts) {
if(part.indexOf(term) >= 0) {
output.add(part);
}
}
return output;
}
Of course it is worth noting that doing this will effectively be doing two passes through your input String. The first pass finds the whitespace characters to split on, and the second pass looks through each split word for your substring.
If one pass is necessary though, the regex path is better.
I find nicholas.hauschild's answer to be the best.
However if you really wanted to use regex, you could do it as such:
String searchTerm = "Pizza";
String text = "Cheese Pizza";
Pattern pattern = Pattern.compile("\\b" + Pattern.quote(searchTerm)
+ "\\b", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group());
}
Output:
Pizza
The pattern should have been
String sPattern = "(?i)\\b("+searchTerm+"(?:.+?)?)\\b";
You want to capture the whole (pizza) string. The ?: ensures you don't capture a part of the string twice.
Try this pattern:
String searchTerm = "Po";
String text = "Porky Pork Chop oPod zzz llPo";
Pattern p = Pattern.compile("\\p{Alpha}+" + searchTerm + "|\\p{Alpha}+" + searchTerm + "\\p{Alpha}+|" + searchTerm + "\\p{Alpha}+");
Matcher m = p.matcher(text);
while(m.find()) {
System.out.println(">> " + m.group());
}
OK, here is a pattern in raw form (not a Java string literal, so you must double the backslashes yourself):
(?i)\b[a-z]*po[a-z]*\b
And that's all.
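Written as a Java string literal, with the backslashes doubled, the raw pattern above could be used roughly like this. The class ContainsSubstringDemo and the helper wordsContainingPo are hypothetical names for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContainsSubstringDemo {
    // The raw pattern (?i)\b[a-z]*po[a-z]*\b as a Java string literal.
    private static final Pattern WORD_WITH_PO =
            Pattern.compile("(?i)\\b[a-z]*po[a-z]*\\b");

    // Hypothetical helper: collects every whole word that contains "po",
    // ignoring case.
    static List<String> wordsContainingPo(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD_WITH_PO.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(wordsContainingPo("Porky Pork Chop")); // [Porky, Pork]
    }
}
```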
