How to retain . dot in java regular expressions - java

My program is as below,
/**
* @param args
*/
public static void main(String[] args) {
RegularExpressions r = new RegularExpressions();
// TODO Auto-generated method stub
String []input = {" Dear [name],\n",
"\n",
"Thanks for buying the [num] [item].\n",
"We appreciate your patronage\n",
"\n",
"Best, [sales_person]\n"};
HashMap<String, String> dic = new HashMap<String, String>();
dic.put("name", "Anna Bell Smith");
dic.put("num", "eight");
dic.put("item", "Boxes of Soap.");
dic.put("sales_person", "Karmine Smithe");
String []afterChange = r.replace(input, dic);
r.display(afterChange);
}
String [] replace(String []strings, Map<String, String> dict){
String patternStr = ".(";
for(String key:dict.keySet()){
patternStr = patternStr + key + "|";
}
patternStr = patternStr.substring(0, patternStr.length()-1);
patternStr = patternStr+").";
Pattern pattern = Pattern.compile(patternStr);
for(int i=0;i<strings.length;i++){
StringBuffer sb = new StringBuffer();
Matcher matcher = pattern.matcher(strings[i]);
boolean isMatcherFind = false;
while(matcher.find()){
matcher.appendReplacement(sb, dict.get(matcher.group(1)));
isMatcherFind = true;
}
if(isMatcherFind){
strings[i] = sb.toString();
}else{
strings[i] = strings[i];
}
}
return strings;
}
void display(String []str){
for(String s:str){
System.out.println(s);
}
}
}
The above program is giving an output like
Dear Anna Bell Smith
Thanks for buying the eight Boxes of SoapWe appreciate your patronage
Best, Karmine Smithe
While I am expecting output as
Dear Anna Bell Smith
Thanks for buying the eight Boxes of Soap.
We appreciate your patronage
Best, Karmine Smithe.
In other words, the dot (.) and the "\n" should be retained, but instead they are being dropped. I am on Java 8; how can I retain the dot (.) and the "\n"?

You have a loop that repeatedly calls matcher.find() to find the next match in the string; then it calls appendReplacement(). The javadoc for appendReplacement says it does this:
It reads characters from the input sequence, starting at the append position, and appends them to the given string buffer. It stops after reading the last character preceding the previous match, that is, the character at index start() - 1.
It appends the given replacement string to the string buffer.
It sets the append position of this matcher to the index of the last character matched, plus one, that is, to end().
So for each match, it appends the characters up to the match, then it appends the replacement string (instead of the string that got matched). So far, so good.
But what happens when there are no more matches? There are still characters left in the input, to the right of the last match, that didn't get appended to the output.
Fortunately, there's a method that takes care of exactly that for you: appendTail.
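A minimal sketch of the corrected loop, assuming the placeholders are always written as [key]. The class name TemplateFill and the method fill are hypothetical names used only for illustration. The regex here matches the literal brackets with \\[ and \\] rather than with ".", and Matcher.quoteReplacement guards against $ or \ in the dictionary values. StringBuffer is used so that the code also compiles on Java 8.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TemplateFill {
    // Matches a placeholder such as [name]; group(1) is the key without the brackets.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[(\\w+)\\]");

    static String fill(String line, Map<String, String> dict) {
        Matcher matcher = PLACEHOLDER.matcher(line);
        StringBuffer sb = new StringBuffer();
        while (matcher.find()) {
            String value = dict.get(matcher.group(1));
            // Unknown keys are left untouched (an assumption, not part of the question).
            String replacement = (value != null) ? value : matcher.group();
            matcher.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        // Copy everything after the last match, e.g. the ".\n" that follows [item].
        matcher.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dict = new HashMap<>();
        dict.put("num", "eight");
        dict.put("item", "Boxes of Soap");
        // Prints: Thanks for buying the eight Boxes of Soap.
        System.out.print(fill("Thanks for buying the [num] [item].\n", dict));
    }
}
```

Because appendTail runs unconditionally, a line without any placeholder comes back unchanged, so the isMatcherFind flag from the question is no longer needed.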

You have to use appendReplacement and appendTail explicitly. On Java 8 this requires a StringBuffer; the StringBuilder overloads were only added in Java 9. Here's a snippet:
String content="aaaaaaaaaa";
Pattern pattern = Pattern.compile("a");
Matcher m = pattern.matcher(content);
StringBuffer sb = new StringBuffer();
final int N = 3;
for (int i = 0; i < N; i++) {
if (m.find()) {
m.appendReplacement(sb, "b");
} else {
break;
}
}
m.appendTail(sb);
System.out.println(sb); // bbbaaaaaaa
according to matcher-replace-method

Related

How to trim a string using regex?

I have this alphabet: {'faa','fa','af'}
and I have this string: "faaf"
I have this regex: "(faa|fa|af)*" which helps me match the string with the alphabet.
How do I make Java trim my string into: {fa,af}, which is the correct way to write the string: "faaf" based on my alphabet?
here is my code:
String regex = "(faa|fa|af)*";
String str = "faaf";
boolean isMatch = Pattern.matches(regex, str);
if(isMatch)
{
//trim the string
while(str.length()!=0)
{
Pattern pattern = Pattern.compile("^(faa|fa|af)(faa|fa|af)*$");
Matcher mc = pattern.matcher(str);
if (mc.find())
{
String l =mc.group(1);
alphabet.add(l);
str = str.substring(l.length());
System.out.println("\n"+ l);
}
}
}
Thanks to Aaron who helped me with this problem.
You need a loop.
Pattern pattern = Pattern.compile("(faa|fa|af)*"); // the question's regex already ends in *; appending a second * would not compile
LinkedList<String> parts = new LinkedList<>();
while (!str.isEmpty()) {
Matcher m = pattern.matcher(str);
if (!m.matches()) { // Can only happen in the first loop step.
break;
}
parts.addFirst(m.group(1)); // The last repetition matching group.
str = str.substring(0, m.start(1));
}
String result = parts.stream().collect(Collectors.joining(", ", "{", "}"));
This relies on the fact that, for a repeated group such as (X)+, m.group(1) holds the value of the last repetition of X.
Unfortunately the regex API does not provide a matches variant that hands you every repetition of the group, in the way that the overloaded replaceAll accepts a lambda working on a single MatchResult.
Note that matches applies to the entire string.
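The loop above can be packaged as a small runnable sketch. The class name AlphabetSplit and the method split are hypothetical, the alphabet is hard-coded from the question, and returning an empty list for a string that cannot be written with the alphabet is an assumption made for this example.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AlphabetSplit {
    // One capturing group, repeated; after a match, group(1) holds the LAST repetition.
    private static final Pattern WORDS = Pattern.compile("(faa|fa|af)*");

    static List<String> split(String str) {
        LinkedList<String> parts = new LinkedList<>();
        while (!str.isEmpty()) {
            Matcher m = WORDS.matcher(str);
            if (!m.matches()) {
                // The string cannot be written with the alphabet.
                return new LinkedList<>();
            }
            // Peel the last word off the end and continue with the prefix before it.
            parts.addFirst(m.group(1));
            str = str.substring(0, m.start(1));
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("faaf")); // [fa, af]
    }
}
```

Each iteration recompiles nothing but does rematch the whole remaining prefix, so this is fine for short strings and may be slow for long ones.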

Remove all the leading zero from the number part of a string

I am trying to remove all the leading zeros from the number parts of a string. I came up with the code below. It works for the given example, but when I add a '0' at the beginning it does not give the proper output. Does anybody know how to achieve this? Thanks in advance.
input: (2016)abc00701def00019z -> output: (2016)abc701def19z -> result: correct
input: 0(2016)abc00701def00019z -> output: (2016)abc71def19z -> result: wrong -> expected output: (2016)abc701def19z
EDIT: The string can contain letters outside the English alphabet.
String localReference = "(2016)abc00701def00019z";
String localReference1 = localReference.replaceAll("[^0-9]+", " ");
List<String> lists = Arrays.asList(localReference1.trim().split(" "));
System.out.println(lists.toString());
String[] replacedString = new String[5];
String[] searchedString = new String[5];
int counter = 0;
for (String list : lists) {
String s = CharMatcher.is('0').trimLeadingFrom(list);
replacedString[counter] = s;
searchedString[counter++] = list;
System.out.println(String.format("Search: %s, replace: %s", list,s));
}
System.out.println(StringUtils.replaceEach(localReference, searchedString, replacedString));
str.replaceAll("(^|[^0-9])0+", "$1");
This removes any run of zeros that follows a non-digit character or stands at the beginning of the string.
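A short sketch that runs this one-liner against both inputs from the question. The class name LeadingZeros and the method strip are hypothetical names for illustration.

```java
class LeadingZeros {
    static String strip(String s) {
        // (^|[^0-9]) captures either the start of the string or one non-digit;
        // 0+ is the run of zeros right after it. "$1" keeps the captured part
        // and drops the zeros.
        return s.replaceAll("(^|[^0-9])0+", "$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("(2016)abc00701def00019z"));  // (2016)abc701def19z
        System.out.println(strip("0(2016)abc00701def00019z")); // (2016)abc701def19z
    }
}
```

One caveat of this pattern: a number that consists only of zeros disappears completely, so "abc000" becomes "abc".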
I did this with regex, and it produces the required output for the two test cases you gave. $1 and $2 in the code below refer to the parts captured by the parentheses in the preceding regex.
Please find the code below:
public class Demo {
public static void main(String[] args) {
String str = "0(2016)abc00701def00019z";
/*Below line replaces all 0's which come after any a-z or A-Z and which have any number after them from 1-9. */
str = str.replaceAll("([a-zA-Z]+)0+([1-9]+)", "$1$2");
//Below line only replace the 0's coming in the start of the string
str = str.replaceAll("^0+","");
System.out.println(str);
}
}
Java has \P{Alpha}+, which matches a run of non-alphabetic characters; the leading zeros can then be removed from each such run.
String stringToSearch = "0(2016)abc00701def00019z";
Pattern p1 = Pattern.compile("\\P{Alpha}+");
Matcher m = p1.matcher(stringToSearch);
StringBuffer sb = new StringBuffer();
while(m.find()){
m.appendReplacement(sb,m.group().replaceAll("\\b0+",""));
}
m.appendTail(sb);
System.out.println(sb.toString());
output:
(2016)abc701def19z

How to replace all {!XXX} from string?

I have string with multiple {!XXX} phrases. For example:
Kumar gaurav {!str1} is just {!str2}, adasdas {!str3}
I need to replace all {!str} values with corresponding str, how to replace all {!str} from my string?
You can use a Pattern and Matcher, which let you query the string for an unknown number of elements, in combination with the regular expression \{!str\d\}, which lets you break the text down based on the tags.
For example...
String text = "All that {!str1} is {!str2}";
Map<String, String> values = new HashMap<>(25);
values.put("{!str1}", "glitters");
values.put("{!str2}", "gold");
Pattern p = Pattern.compile("\\{!str\\d\\}");
Matcher matcher = p.matcher(text);
while (matcher.find()) {
String match = matcher.group();
text = text.replace(match, values.get(match)); // literal replacement, so the tag needs no regex escaping
}
System.out.println(text);
Which outputs
All that glitters is gold
You could also use something like...
int previousStart = 0;
StringBuilder sb = new StringBuilder();
while (matcher.find()) {
String match = matcher.group();
int start = matcher.start();
int end = matcher.end();
sb.append(text.substring(previousStart, start));
sb.append(values.get(match));
previousStart = end;
}
if (previousStart < text.length()) {
sb.append(text.substring(previousStart));
}
Which does away with the String creation in a loop and relies more on the position of the match to cut the original text around the tokens, which makes me happier ;)
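The position-based variant above omits the matcher setup, so here is a self-contained sketch of it. The class name TagReplace and the method replaceTags are hypothetical, and keeping a tag unchanged when the map has no entry for it is an assumption made for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TagReplace {
    private static final Pattern TAG = Pattern.compile("\\{!str\\d\\}");

    static String replaceTags(String text, Map<String, String> values) {
        Matcher matcher = TAG.matcher(text);
        StringBuilder sb = new StringBuilder();
        int previousStart = 0;
        while (matcher.find()) {
            String match = matcher.group();
            // Copy the text between the previous tag and this one unchanged.
            sb.append(text, previousStart, matcher.start());
            // Tags without an entry are kept as they are (assumption for this sketch).
            sb.append(values.getOrDefault(match, match));
            previousStart = matcher.end();
        }
        // Copy whatever follows the last tag.
        sb.append(text, previousStart, text.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("{!str1}", "glitters");
        values.put("{!str2}", "gold");
        System.out.println(replaceTags("All that {!str1} is {!str2}", values)); // All that glitters is gold
    }
}
```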
Use this simple regex, which replaces every {!...} tag with the same string:
String string="hello world {!hello}";
string=string.replaceAll("\\{!(.*?)\\}", "replace");
System.out.println(string); //this will print (hello world replace)

How can I make the following regex match my censors? Java

I am trying to censor specific strings, and patterns within my application but my matcher doesn't seem to be finding any results when searching for the Pattern.
public String censorString(String s) {
System.out.println("Censoring... "+ s);
if (findPatterns(s)) {
System.out.println("Found pattern");
for (String censor : foundPatterns) {
for (int i = 0; i < censor.length(); i++)
s.replace(censor.charAt(i), (char)42);
}
}
return s;
}
public boolean findPatterns(String s) {
for (String censor : censoredWords) {
Pattern p = Pattern.compile("(.*)["+censor+"](.*)");//regex
Matcher m = p.matcher(s);
while (m.find()) {
foundPatterns.add(censor);
return true;
}
}
return false;
}
At the moment I'm focusing on just the one pattern, if the censor is found in the string. I've tried many combinations and none of them seem to return "true".
"(.*)["+censor+"](.*)"
"(.*)["+censor+"]"
"["+censor+"]"
"["+censor+"]+"
Any help would be appreciated.
Usage: My censored words are "hello", "goodbye"
String s = "hello there, today is a fine day.";
System.out.println(censorString(s));
is supposed to print " ***** today is a fine day. "
Your regex does find a match (although ["+censor+"] is a character class that matches any single character of the word, not the word itself). The real problem is here:
s.replace(censor.charAt(i), (char)42);
If you expect this line to rewrite the censored parts of your string, it will not: a String is immutable, so replace returns a new string and leaves s unchanged. Please check the Javadoc for String.
Below is a program that does what you intend. I removed your findPatterns method and just used replaceAll with a regex from the String API. Hope this helps.
public class Regex_SO {
private String[] censoredWords = new String[]{"hello"};
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
Regex_SO regex_SO = new Regex_SO();
regex_SO.censorString("hello there, today is a fine day. hello again");
}
public String censorString(String s) {
System.out.println("Censoring... "+ s);
for(String censoredWord : censoredWords){
String replaceStr = "";
for(int index = 0; index < censoredWord.length();index++){
replaceStr = replaceStr + "*";
}
s = s.replaceAll(censoredWord, replaceStr);
}
System.out.println("Censored String is .. " + s);
return s;
}
}
Since this seems like homework I can't give you working code, but here are a few pointers:
consider using the regex \\b(word1|word2|word3)\\b to find specific words
to create a char representing *, write it as '*'; don't use (char)42, to avoid magic numbers
to create a new string that has the same length as the old one but is filled with one specific character, you can use String newString = oldString.replaceAll(".","*")
to replace each match on the fly with a new value, you can use the appendReplacement and appendTail methods of the Matcher class. Here is how code using them should look
StringBuffer sb = new StringBuffer();//buffer for string with replaced values
Pattern p = Pattern.compile(yourRegex);
Matcher m = p.matcher(yourText);
while (m.find()){
String match = m.group(); //this will represent current match
String newValue = ...; //here you need to decide how to replace it
m.appendReplacement(sb, newValue);
}
m.appendTail(sb);
String censoredString = sb.toString();
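Putting these pointers together, one possible sketch looks like the following. The class name Censor and the method censor are hypothetical, and the word list is hard-coded from the question's usage example. StringBuffer is used so the code also compiles on Java 8.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Censor {
    // Whole-word match of any censored word; \\b keeps "othello" from matching.
    private static final Pattern CENSORED = Pattern.compile("\\b(hello|goodbye)\\b");

    static String censor(String text) {
        Matcher m = CENSORED.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Same length as the match, with every character turned into '*'.
            String stars = m.group().replaceAll(".", "*");
            m.appendReplacement(sb, stars);
        }
        // Copy the text after the last censored word.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints: ***** there, today is a fine day.
        System.out.println(censor("hello there, today is a fine day."));
    }
}
```

If the censored words can contain regex metacharacters, each word would need Pattern.quote before being joined into the alternation.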

Extract every complete word that contains a certain substring

I'm trying to write a function that extracts each word from a sentence that contains a certain substring e.g. Looking for 'Po' in 'Porky Pork Chop' will return Porky Pork.
I've tested my regex on regexpal but the Java code doesn't seem to work. What am I doing wrong?
private static String foo()
{
String searchTerm = "Pizza";
String text = "Cheese Pizza";
String sPattern = "(?i)\b("+searchTerm+"(.+?)?)\b";
Pattern pattern = Pattern.compile ( sPattern );
Matcher matcher = pattern.matcher ( text );
if(matcher.find ())
{
String result = "-";
for(int i=0;i < matcher.groupCount ();i++)
{
result+= matcher.group ( i ) + " ";
}
return result.trim ();
}else
{
System.out.println("No Luck");
}
}
In Java, to pass the \b word boundary to the regex engine you need to write it as \\b in a string literal; a bare \b in a String represents a backspace character.
Judging by your example, you want to return all words that contain your substring. To do this, don't use for(int i=0;i < matcher.groupCount ();i++) but while(matcher.find()), since the group count iterates over the groups of a single match, not over all matches.
If your search term can contain regex special characters, you should probably wrap it in Pattern.quote(searchTerm).
In your code you are trying to find "Pizza" in "Cheese Pizza", so I assume you also want to find words that are identical to the searched substring. Although your regex works for that, you can change the last part, (.+?)?), to \\w*, and also add \\w* at the start if the substring should be matched in the middle of a word too (not only at the start).
So your code can look like
private static String foo() {
String searchTerm = "Pizza";
String text = "Cheese Pizza, Other Pizzas";
String sPattern = "(?i)\\b\\w*" + Pattern.quote(searchTerm) + "\\w*\\b";
StringBuilder result = new StringBuilder("-").append(searchTerm).append(": ");
Pattern pattern = Pattern.compile(sPattern);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
result.append(matcher.group()).append(' ');
}
return result.toString().trim();
}
While the regex approach is certainly a valid method, I find it easier to think through when you split the words up by whitespace. This can be done with String's split method.
public List<String> doIt(final String inputString, final String term) {
final List<String> output = new ArrayList<String>();
final String[] parts = inputString.split("\\s+"); // split on runs of whitespace
for(final String part : parts) {
if(part.contains(term)) { // indexOf(term) > 0 would miss a match at index 0
output.add(part);
}
}
return output;
}
Of course it is worth noting that doing this effectively makes two passes through your input String: the first pass finds the whitespace characters to split on, and the second looks through each split word for your substring.
If a single pass is necessary, though, the regex path is better.
I find nicholas.hauschild's answer to be the best.
However if you really wanted to use regex, you could do it as such:
String searchTerm = "Pizza";
String text = "Cheese Pizza";
Pattern pattern = Pattern.compile("\\b" + Pattern.quote(searchTerm)
+ "\\b", Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
System.out.println(matcher.group());
}
Output:
Pizza
The pattern should have been
String sPattern = "(?i)\\b("+searchTerm+"(?:.+?)?)\\b";
You want to capture the whole string (for example "pizza"). The ?: makes the inner group non-capturing, so that part of the string is not captured twice.
Try this pattern:
String searchTerm = "Po";
String text = "Porky Pork Chop oPod zzz llPo";
Pattern p = Pattern.compile("\\p{Alpha}+" + searchTerm + "|\\p{Alpha}+" + searchTerm + "\\p{Alpha}+|" + searchTerm + "\\p{Alpha}+");
Matcher m = p.matcher(text);
while(m.find()) {
System.out.println(">> " + m.group());
}
OK, here is a pattern in raw form (not as a Java string literal; you must double the backslashes yourself):
(?i)\b[a-z]*po[a-z]*\b
And that's all.
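For reference, a sketch of that raw pattern written as a Java string literal, with the backslashes doubled. The class name WordsContaining and the method find are hypothetical, and wrapping the search term in Pattern.quote is an addition that is not part of the raw pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsContaining {
    static List<String> find(String text, String term) {
        // Raw form: (?i)\b[a-z]*po[a-z]*\b, with "po" replaced by the quoted search term.
        Pattern p = Pattern.compile("(?i)\\b[a-z]*" + Pattern.quote(term) + "[a-z]*\\b");
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        // Each call to find() moves on to the next whole word containing the term.
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(find("Porky Pork Chop", "Po")); // [Porky, Pork]
    }
}
```

Note that [a-z] only covers ASCII letters; \\p{L} would be the choice for words with accented or non-Latin letters.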
