I am trying to censor specific strings and patterns within my application, but my matcher doesn't seem to find any results when searching for the pattern.
public String censorString(String s) {
System.out.println("Censoring... "+ s);
if (findPatterns(s)) {
System.out.println("Found pattern");
for (String censor : foundPatterns) {
for (int i = 0; i < censor.length(); i++)
s.replace(censor.charAt(i), (char)42);
}
}
return s;
}
public boolean findPatterns(String s) {
for (String censor : censoredWords) {
Pattern p = Pattern.compile("(.*)["+censor+"](.*)");//regex
Matcher m = p.matcher(s);
while (m.find()) {
foundPatterns.add(censor);
return true;
}
}
return false;
}
At the moment I'm focusing on just one pattern: whether the censored word is found anywhere in the string. I've tried many combinations and none of them seem to return true.
"(.*)["+censor+"](.*)"
"(.*)["+censor+"]"
"["+censor+"]"
"["+censor+"]+"
Any help would be appreciated.
Usage: my censored words are "hello" and "goodbye".
String s = "hello there, today is a fine day.";
System.out.println(censorString(s));
is supposed to print "***** there, today is a fine day."
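Your regex does find a match (although ["+censor+"] is a character class, so it matches any single one of the letters h, e, l, o rather than the whole word). The real problem is here:
s.replace(censor.charAt(i), (char)42);
Strings in Java are immutable: String.replace returns a new string and leaves s unchanged, so this line has no effect because its result is discarded. Even if you assigned the result back to s, replacing character by character would star every h, e, l and o anywhere in the string, not just the censored word.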
Below is a program that does what you intend. I removed your findPatterns method and just used String.replaceAll with the censored word as the regex. Hope this helps.
public class Regex_SO {
private String[] censoredWords = new String[]{"hello"};
/**
* @param args the command line arguments
*/
public static void main(String[] args) {
Regex_SO regex_SO = new Regex_SO();
regex_SO.censorString("hello there, today is a fine day. hello again");
}
public String censorString(String s) {
System.out.println("Censoring... "+ s);
for(String censoredWord : censoredWords){
String replaceStr = "";
for(int index = 0; index < censoredWord.length();index++){
replaceStr = replaceStr + "*";
}
s = s.replaceAll(censoredWord, replaceStr);
}
System.out.println("Censored String is .. " + s);
return s;
}
}
Since this seems like homework I can't give you working code, but here are a few pointers:
Consider using the regex \\b(word1|word2|word3)\\b to find specific whole words.
To write the * character, use the literal '*' rather than (char)42, to avoid magic numbers.
To create a new string with the same length as the old one but filled with only * characters, you can use String newString = oldString.replaceAll(".", "*").
To replace each found match with a new value on the fly, you can use the appendReplacement and appendTail methods of the Matcher class. Here is roughly how code using them should look:
StringBuffer sb = new StringBuffer();//buffer for string with replaced values
Pattern p = Pattern.compile(yourRegex);
Matcher m = p.matcher(yourText);
while (m.find()){
String match = m.group(); //this will represent current match
String newValue = ...; //here you need to decide how to replace it
m.appendReplacement(sb, newValue);
}
m.appendTail(sb);
String censoredString = sb.toString();
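Putting those pointers together, a minimal sketch might look like the following. It assumes the censored words are plain words (Pattern.quote is used so any regex metacharacters in them are matched literally); the class name Censor and its method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Censor {
    // Builds \b(word1|word2|...)\b; Pattern.quote makes each word match literally.
    static Pattern wordsPattern(String... words) {
        StringBuilder alternation = new StringBuilder();
        for (String w : words) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(w));
        }
        return Pattern.compile("\\b(" + alternation + ")\\b");
    }

    static String censor(String text, Pattern pattern) {
        StringBuffer sb = new StringBuffer();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            // Replace every character of the match with '*', keeping its length.
            String stars = m.group().replaceAll(".", "*");
            m.appendReplacement(sb, Matcher.quoteReplacement(stars));
        }
        m.appendTail(sb); // copy the text after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern p = wordsPattern("hello", "goodbye");
        System.out.println(censor("hello there, today is a fine day.", p));
    }
}
```

Because of the \b anchors, "hello" inside a longer word such as "helloworld" is left alone; drop them if you want substring matches.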
Related
I have a Java project and the following regex pattern with named capture groups:
(?<department>\w+(-\w)??)\s{1,5}(?<number>\w+(-\w+)?)-(?<section>\w+)\s(?<term>\d+)\s(?<campus>\w{2})
I wanted to replace the value of one of the named group with a wild card character (*). All of the replace methods in the Matcher class appear to be tied to replacing a specific regex value. Since the string is not guaranteed to be unique, I want to replace by the group name.
Is there a way to leverage the Matcher class to provide this substitution capability?
I realized that I can use the start and end methods of the matcher to determine the range of characters that need to be replaced. I can then use a StringBuilder to delete the range and insert the specified replacement value. I wrote the following method to handle this situation.
public static String replaceNamedGroup(String source, Pattern pattern, String groupName, String replaceValue) {
if (source == null || pattern == null) {
return null;
}
Matcher m = pattern.matcher(source);
if (m.find()) {
int start = m.start(groupName);
int end = m.end(groupName);
StringBuilder sb = new StringBuilder(source);
sb = sb.delete(start, end);
if (replaceValue != null) {
sb = sb.insert(start, replaceValue);
}
return sb.toString();
} else {
return source;
}
}
The following code shows how it is used:
String str = "ABC 123-123 1234 AB";
Pattern pattern = Pattern.compile("(?<department>\\w+(-\\w)??)\\s{1,5}(?<number>\\w+(-\\w+)?)-(?<section>\\w+)\\s(?<term>\\d+)\\s(?<campus>\\w{2})");
String output = replaceNamedGroup(str, pattern, "term", "*");
//outputs Output: ABC 123-123 * AB
System.out.println("Output: " + output);
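The method above only replaces the group in the first match. If the input can contain several matches, a sketch along the same lines can loop over find() and copy the text between matches. NamedGroupReplacer and replaceNamedGroupAll are hypothetical names; a null replaceValue deletes the group, as in the original method.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupReplacer {
    // Replaces the text captured by groupName in every match, not just the first.
    // Matches in which the group did not participate are left unchanged.
    public static String replaceNamedGroupAll(String source, Pattern pattern,
                                              String groupName, String replaceValue) {
        Matcher m = pattern.matcher(source);
        StringBuilder sb = new StringBuilder();
        int last = 0; // index in source up to which text has been copied
        while (m.find()) {
            int start = m.start(groupName);
            if (start < 0) {
                continue; // group did not take part in this match
            }
            sb.append(source, last, start);
            if (replaceValue != null) {
                sb.append(replaceValue); // null means "delete the group"
            }
            last = m.end(groupName);
        }
        sb.append(source, last, source.length());
        return sb.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(?<department>\\w+(-\\w)??)\\s{1,5}(?<number>\\w+(-\\w+)?)-(?<section>\\w+)\\s(?<term>\\d+)\\s(?<campus>\\w{2})");
        System.out.println(replaceNamedGroupAll("ABC 123-123 1234 AB DEF 456-001 5678 CD", p, "term", "*"));
    }
}
```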
I am writing a program that should skip the search and ask for another search pattern if the search pattern (word) contains any non-alphanumeric characters.
I set a boolean flag to false and use an if statement to change it to true if the word contains letters or numbers. Another if statement then lets the search run only if the flag is true.
My logic must be off, because the search still runs if I simply enter "/". The search pattern must not contain any non-alphanumeric characters, including spaces. I am trying to use a regex to solve this problem.
Sample problematic output:
Please enter a search pattern: /
Line number 1
this.a 2test/austin
^
Line number 8
ra charity Charityis 4 times a day/a.a-A
^
Here is my applicable code:
while (again) {
boolean found = false;
System.out.printf("%n%s", "Please enter a search pattern: ", "%n");
String wordToSearch = input.next();
if (wordToSearch.equals("EINPUT")) {
System.out.printf("%s", "Bye!");
System.exit(0);
}
Pattern p = Pattern.compile("\\W*");
Matcher m = p.matcher(wordToSearch);
if (m.find())
found = true;
String data;
int lineCount = 1;
if (found = true) {
try (FileInputStream fis =
new FileInputStream(this.inputPath.getPath())) {
File file1 = this.inputPath;
byte[] buffer2 = new byte[fis.available()];
fis.read(buffer2);
data = new String(buffer2);
Scanner in = new Scanner(data).useDelimiter("\\\\|[^a-zA-z0-9]+");
while (in.hasNextLine()) {
String line = in.nextLine();
Pattern pattern = Pattern.compile("\\b" + wordToSearch + "\\b");
Matcher matcher = pattern.matcher(line);
if (matcher.find()) {
System.out.println("Line number " + lineCount);
String stringToFile = f.findWords(line, wordToSearch);
System.out.println();
}
lineCount++;
}
}
}
}
Stop reinventing the wheel.
Read the Apache Commons Lang StringUtils documentation, focusing on isAlpha, isAlphanumeric and isAlphanumericSpace. One of those is likely to provide the functionality you want.
Create a separate method that checks the word you are searching for:
public boolean isAlphanumeric(String str)
{
char[] charArray = str.toCharArray();
for(char c:charArray)
{
if (!Character.isLetterOrDigit(c))
return false;
}
return true;
}
Then add the following if statement to the code above, just before the try statement:
if (isAlphanumeric(wordToSearch))
Since no one has posted a regex solution yet, here is one:
package com.company;
public class Main {
public static void main(String[] args) {
String x = "ABCDEF123456";
String y = "ABC$DEF123456";
isValid(x);
isValid(y);
}
public static void isValid(String s){
if (s.matches("[A-Za-z0-9]*"))
System.out.println("String doesn't contain non alphanumeric characters !");
else
System.out.println("Invalid characters in string !");
}
}
Right now, the search runs when the search pattern does contain non-alphanumeric characters, because found is set to true as soon as a non-alphanumeric character is detected:
if(m.find())
found = true;
What it should be:
if(!m.find())
found = true;
It should be checking for the absence of non-alphanumeric characters.
Also, the boolean flag can just be simplified to:
boolean found = !m.find();
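You don't need the if statement. There are two more problems in the original code. First, \\W* can match the empty string, so m.find() always succeeds; use \\W instead (or [^A-Za-z0-9] if underscores must be rejected too). Second, if (found = true) assigns true instead of comparing, so the search always runs; write if (found).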
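A minimal sketch of the corrected check, assuming only ASCII letters and digits are allowed: the regex [^A-Za-z0-9] finds a single offending character, so the word is valid only when find() returns false. The names SearchPrompt, isValidSearchWord and readValidWord are hypothetical, and the Scanner reads from a string here only to simulate console input.

```java
import java.util.Scanner;
import java.util.regex.Pattern;

public class SearchPrompt {
    // Matches one character that is NOT an ASCII letter or digit.
    private static final Pattern NON_ALNUM = Pattern.compile("[^A-Za-z0-9]");

    static boolean isValidSearchWord(String word) {
        // Valid only if non-empty and no disallowed character is found.
        return !word.isEmpty() && !NON_ALNUM.matcher(word).find();
    }

    // Keeps reading tokens until a valid one appears; returns null if input runs out.
    static String readValidWord(Scanner input) {
        while (input.hasNext()) {
            String word = input.next();
            if (isValidSearchWord(word)) {
                return word;
            }
            System.out.println("Search pattern must be alphanumeric, try again.");
        }
        return null;
    }

    public static void main(String[] args) {
        Scanner simulated = new Scanner("/ a.b test");
        System.out.println(readValidWord(simulated));
    }
}
```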
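My program is as follows:
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegularExpressions {
/**
* @param args
*/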
public static void main(String[] args) {
RegularExpressions r = new RegularExpressions();
// TODO Auto-generated method stub
String []input = {" Dear [name],\n",
"\n",
"Thanks for buying the [num] [item].\n",
"We appreciate your patronage\n",
"\n",
"Best, [sales_person]\n"};
HashMap<String, String> dic = new HashMap<String, String>();
dic.put("name", "Anna Bell Smith");
dic.put("num", "eight");
dic.put("item", "Boxes of Soap.");
dic.put("sales_person", "Karmine Smithe");
String []afterChange = r.replace(input, dic);
r.display(afterChange);
}
String [] replace(String []strings, Map<String, String> dict){
String patternStr = ".(";
for(String key:dict.keySet()){
patternStr = patternStr + key + "|";
}
patternStr = patternStr.substring(0, patternStr.length()-1);
patternStr = patternStr+").";
Pattern pattern = Pattern.compile(patternStr);
for(int i=0;i<strings.length;i++){
StringBuffer sb = new StringBuffer();
Matcher matcher = pattern.matcher(strings[i]);
boolean isMatcherFind = false;
while(matcher.find()){
matcher.appendReplacement(sb, dict.get(matcher.group(1)));
isMatcherFind = true;
}
if(isMatcherFind){
strings[i] = sb.toString();
}else{
strings[i] = strings[i];
}
}
return strings;
}
void display(String []str){
for(String s:str){
System.out.println(s);
}
}
}
The above program is giving an output like
Dear Anna Bell Smith
Thanks for buying the eight Boxes of SoapWe appreciate your patronage
Best, Karmine Smithe
While I am expecting output as
Dear Anna Bell Smith
Thanks for buying the eight Boxes of Soap.
We appreciate your patronage
Best, Karmine Smithe.
In other words, the dot (.) and "\n" should be kept, but they are being dropped. I am using Java 8; how can I keep the dot (.) and "\n"?
You have a loop that repeatedly calls matcher.find() to find the next match in the string; then it calls appendReplacement(). The javadoc for appendReplacement says it does this:
It reads characters from the input sequence, starting at the append position, and appends them to the given string buffer. It stops after reading the last character preceding the previous match, that is, the character at index start() - 1.
It appends the given replacement string to the string buffer.
It sets the append position of this matcher to the index of the last character matched, plus one, that is, to end().
So for each match, it appends the characters up to the match, then it appends the replacement string (instead of the string that got matched). So far, so good.
But what happens when there are no more matches? Any characters to the right of the last match are still left in the input and never get appended to the output.
Fortunately, there's a method that takes care of exactly that for you: appendTail.
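Applied to the template program, a sketch of the fix adds appendTail after the loop. It also escapes the brackets (\\[ and \\]) instead of using ., which matches any character, and wraps each value with Matcher.quoteReplacement so a $ or \ in a value is not treated as a group reference. TemplateFill and fill are hypothetical names; the sample value drops the trailing period from "Boxes of Soap." so the template's own "." is not doubled.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateFill {
    // Matches [key]; the brackets are escaped so they are matched literally.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\[(\\w+)\\]");

    static String fill(String line, Map<String, String> dict) {
        Matcher m = PLACEHOLDER.matcher(line);
        StringBuffer sb = new StringBuffer(); // Java 8 appendReplacement needs a StringBuffer
        while (m.find()) {
            String value = dict.get(m.group(1));
            // Unknown keys are kept as-is; quoteReplacement protects '$' and '\' in values.
            String replacement = value != null ? value : m.group();
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb); // copies whatever follows the last match, e.g. ".\n"
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dict = new HashMap<>();
        dict.put("num", "eight");
        dict.put("item", "Boxes of Soap");
        System.out.print(fill("Thanks for buying the [num] [item].\n", dict));
    }
}
```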
You'd have to use appendReplacement and appendTail explicitly. Unfortunately, before Java 9 you have to use StringBuffer for this (Java 9 added StringBuilder overloads of both methods). Here's a snippet:
String content="aaaaaaaaaa";
Pattern pattern = Pattern.compile("a");
Matcher m = pattern.matcher(content);
StringBuffer sb = new StringBuffer();
final int N = 3;
for (int i = 0; i < N; i++) {
if (m.find()) {
m.appendReplacement(sb, "b");
} else {
break;
}
}
m.appendTail(sb);
System.out.println(sb); // bbbaaaaaaa
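The same loop can be wrapped into a reusable helper; this is a sketch with a hypothetical name, replaceFirstN. The replacement string is still interpreted by appendReplacement, so $1 refers to group 1 and a literal $ must be escaped (for example with Matcher.quoteReplacement).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReplaceFirstN {
    // Replaces at most n matches of pattern in text; the rest is copied unchanged.
    static String replaceFirstN(String text, Pattern pattern, String replacement, int n) {
        Matcher m = pattern.matcher(text);
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n && m.find(); i++) {
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb); // everything after the last replaced match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceFirstN("aaaaaaaaaa", Pattern.compile("a"), "b", 3));
    }
}
```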
according to matcher-replace-method
I'm trying to replace some letter combinations in a generic string (here called tampon).
Rules:
I want to replace "AM" with "AN", "EM" with "AN", "IM" with "IN", "OM" with "ON", "UM" with "UN" and "YM" with "IN".
I only want to replace them if they are followed by a consonant other than "M" and "N".
I also need to replace them when they stand alone or at the end of the string.
I've tried a few regexes but still get failures in my tests (5/18).
For example, with "UMUMMUM" the test expects "UMUMMUM", but I get "UMUMMUN".
Here is my current code:
public class Phonom {
static String[] consonnant={"B","C","D","F","G","H","J","K","L","P","Q","R","S","T","V","W","X","Z",""};
public static String phonom1(final String tampon){
if (tampon == null){
return "";
}
if (tampon.isEmpty()){
return "";
}
int pos=tampon.indexOf("EM");
int pos1=tampon.indexOf("AM");
int pos2=tampon.indexOf("IM");
int pos3=tampon.indexOf("OM");
int pos4=tampon.indexOf("UM");
int pos5=tampon.indexOf("YM");
if(pos==tampon.length()-2 ||pos1==tampon.length()-2|pos2==tampon.length()-2
||pos3==tampon.length()-2||pos4==tampon.length()-2||pos5==tampon.length()-2){
String temp=tampon.replaceAll("AM","AN");
String temp1=temp.replaceAll("EM","AN");
String temp2=temp1.replaceAll("IM","IN");
String temp3=temp2.replaceAll("OM","ON");
String temp4=temp3.replaceAll("UM","UN");
String result=temp4.replaceAll("YM","IN");
return result;
}
String temp=tampon.replaceAll("AM[^AEIOUMNY]","AN");
String temp1=temp.replaceAll("EM[^AEIOUMNY]","AN");
String temp2=temp1.replaceAll("IM[^AEIOUMNY]","IN");
String temp3=temp2.replaceAll("OM[^AEIOUMNY]","ON");
String temp4=temp3.replaceAll("UM[^AEIOUMNY]","UN");
String result=temp4.replaceAll("YM[^AEIOUMNY]","IN");
return result;
}
}
You could have done this in one line if YM were replaced with YN rather than IN.
tampon.replaceAll("(?<=[AEIOUY])(M)(?![AEIOUYMN])", "N");
Because of the YM-to-IN rule you will need to use appendReplacement and appendTail instead. The code below uses a negative lookahead to ensure possible replacements aren't followed by a vowel, Y, M or N. If the first group is a Y, we replace the match with IN. If not, we use a back reference ($1) to the character in group 1 and follow it with an N.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Phonom {
private static final Pattern PATTERN = Pattern.compile("([AEIOUY])(M)(?![AEIOUYMN])");
public static String phonom1(String tampon) {
Matcher m = PATTERN.matcher(tampon);
StringBuffer sb = new StringBuffer();
while (m.find()) {
if ("Y".equals(m.group(1))) {
m.appendReplacement(sb, "IN");
} else {
m.appendReplacement(sb, "$1N"); // $1 = group 1; "\1" would be an octal escape in a Java string
}
}
m.appendTail(sb);
return sb.toString();
}
}
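If you can use Java 9 or later, Matcher.replaceAll(Function&lt;MatchResult, String&gt;) removes the need for the explicit appendReplacement/appendTail loop. A sketch, keeping the same pattern and restoring the original null check; the class name PhonomJava9 is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhonomJava9 {
    // Vowel (or Y) followed by M, where the next character is not a vowel, Y, M or N.
    private static final Pattern PATTERN = Pattern.compile("([AEIOUY])M(?![AEIOUYMN])");

    public static String phonom1(String tampon) {
        if (tampon == null) {
            return "";
        }
        Matcher m = PATTERN.matcher(tampon);
        // replaceAll(Function<MatchResult, String>) requires Java 9+.
        // The returned strings contain only letters, so no $ or \ escaping is needed.
        return m.replaceAll(mr -> "Y".equals(mr.group(1)) ? "IN" : mr.group(1) + "N");
    }

    public static void main(String[] args) {
        System.out.println(phonom1("TEMPS"));
    }
}
```

Like the loop version, it replaces an M at the very end of the string, so "UMUMMUM" becomes "UMUMMUN".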
Conditions:
There are many rules, maybe hundreds, like these:
{aab*, aabc*, aabcdd*, dtctddds*, *ddt*, *cddt*, *bcddt*, *t, *ttt, *ccddttt}
Each time I get a string, I need to find the longest rule that matches it.
Examples:
Example 1: for the string aabcddttt, the matched rule should be aabcdd*
Example 2: for the string accddttt, the matched rule should be *ccddttt
Question:
I don't want to keep the rules in a long array and match the string against them one by one, since that is inefficient. Maybe I should use the string as a regex to match the hundreds of rules, but I can't find an elegant way to do this.
Can I use regexes to get the result?
What is the best/fastest way to do the matching?
Java, plain C or shell are preferred; please don't use the C++ STL.
Longest common substring
Perhaps this algorithm is what you are looking for =).
Why not do it simply?
String[] rules = {"^aab", "bcd", "aabcdd$", "dtctddds$", "^ddt$", "^cddt$", "^bcddt$", "^t", "^ttt", "^ccddttt"};
String testCase = "aabcddttt";
for (int i = 0; i < rules.length; i++) {
Pattern p = Pattern.compile(rules[i]);
Matcher m = p.matcher(testCase);
if (m.find()) {
System.out.println("String: " + testCase + " has matched the pattern " + rules[i]);
}
}
So in this case rules[0], which is ^aab, is found because the caret (^) means the string must begin with aab. On the other hand, bba$ would mean the string must end with bba. And rules[1] is found because it has no anchors, so bcd may appear anywhere in testCase.
You could try matching them all at once, with parentheses around each sub-rule, and then use the groups to determine which one matched.
public static void main(String... ignored) {
for (String test : "aabaa,wwwaabcdddd,abcddtxyz".split(",")) {
System.out.println(test + " matches " + longestMatch(test, "aab*", "aabc*", "aabcdd*", "dtctddds*", "ddt"));
}
}
public static String longestMatch(String text, String... regex) {
String[] sortedRegex = regex.clone();
Arrays.sort(sortedRegex, new Comparator<String>() {
@Override
public int compare(String o1, String o2) {
return o2.length() - o1.length();
}
});
StringBuilder sb = new StringBuilder();
String sep = "(";
for (String s : sortedRegex) {
sb.append(sep).append('(').append(s).append(')');
sep = "|";
}
sb.append(")");
Matcher matcher = Pattern.compile(sb.toString()).matcher(text);
if (matcher.find()) {
for (int i = 2; i <= matcher.groupCount(); i++) {
String group = matcher.group(i);
if (group != null)
return sortedRegex[i - 2];
}
}
return "";
}
prints
aabaa matches aabc*
wwwaabcdddd matches aabcdd*
abcddtxyz matches ddt
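The answer above treats * as a regex quantifier (so aab* means "aa" followed by zero or more b's), which is why "aabaa" is reported as matching aabc*. If the rules are meant as glob patterns, as the question's examples suggest, a sketch can translate each * into .* and match everything else literally. Two assumptions here: "longest" means the rule with the most non-* characters, and ties go to the rule listed first. The class GlobRules and its methods are hypothetical. It still tests rules one at a time, but each pattern is compiled only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class GlobRules {
    private final List<String> rules = new ArrayList<>();
    private final List<Pattern> patterns = new ArrayList<>();

    // Precompiles each glob once: '*' becomes ".*", everything else is quoted literally.
    public GlobRules(String... globs) {
        for (String glob : globs) {
            String regex = Arrays.stream(glob.split("\\*", -1))
                    .map(Pattern::quote)
                    .collect(Collectors.joining(".*"));
            rules.add(glob);
            patterns.add(Pattern.compile(regex));
        }
    }

    // Returns the matching rule with the most literal (non-'*') characters, or null.
    public String longestMatch(String text) {
        String best = null;
        int bestLen = -1;
        for (int i = 0; i < rules.size(); i++) {
            int literalLen = rules.get(i).replace("*", "").length();
            // matches() requires the whole text to fit the glob.
            if (literalLen > bestLen && patterns.get(i).matcher(text).matches()) {
                best = rules.get(i);
                bestLen = literalLen;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        GlobRules r = new GlobRules("aab*", "aabc*", "aabcdd*", "dtctddds*",
                "*ddt*", "*cddt*", "*bcddt*", "*t", "*ttt", "*ccddttt");
        System.out.println(r.longestMatch("aabcddttt"));
        System.out.println(r.longestMatch("accddttt"));
    }
}
```

With the question's rules this selects aabcdd* for aabcddttt and *ccddttt for accddttt, matching both examples.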