Use a regular expression but keep what was replaced? - java

I need a regular expression to remove certain characters but preserve what was removed into a new string. I'm hoping to avoid using two separate expressions.
For example, let's say I want to remove all digits from a string but keep them in a separate string.
"a1b23c" would become "abc", plus a new string "123".
Thanks for any help!

You can do what you describe with a find / replace loop using Matcher.appendReplacement() and Matcher.appendTail(). For your example:
Matcher matcher = Pattern.compile("\\d+").matcher("a1b23c");
StringBuffer nonDigits = new StringBuffer();
StringBuffer digits = new StringBuffer();
while (matcher.find()) {
digits.append(matcher.group());
matcher.appendReplacement(nonDigits, "");
}
matcher.appendTail(nonDigits);
System.out.println(nonDigits);
System.out.println(digits);
Output:
abc
123
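On Java 8 and earlier you have to use StringBuffer instead of StringBuilder for this approach, because that is all Matcher supports there. Java 9 added StringBuilder overloads of appendReplacement() and appendTail().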
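As a minimal sketch of the same find / replace loop wrapped in a reusable method, assuming Java 9 or later (for the StringBuilder overloads). The class name DigitSplitter and the method split are hypothetical names chosen for illustration, not part of any library:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a string into the text that is kept and the text that was removed.
final class DigitSplitter {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Returns a two-element array: { input without digits, removed digits }. */
    static String[] split(String input) {
        Matcher matcher = DIGITS.matcher(input);
        StringBuilder kept = new StringBuilder();
        StringBuilder removed = new StringBuilder();
        while (matcher.find()) {
            removed.append(matcher.group());     // remember what is about to be dropped
            matcher.appendReplacement(kept, ""); // copy the text before the match, drop the match
        }
        matcher.appendTail(kept);                // copy whatever follows the last match
        return new String[] { kept.toString(), removed.toString() };
    }

    public static void main(String[] args) {
        String[] parts = split("a1b23c");
        System.out.println(parts[0]); // abc
        System.out.println(parts[1]); // 123
    }
}
```

Swapping in a different pattern should work the same way, as long as the replacement string stays empty.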

If you are doing simple things like removing digits, it would be easier to use a pair of StringBuilders:
StringBuilder digits = new StringBuilder();
StringBuilder nonDigits = new StringBuilder();
for (int i = 0; i < str.length(); ++i) {
char ch = str.charAt(i);
if (Character.isDigit(ch)) {
digits.append(ch);
} else {
nonDigits.append(ch);
}
}
System.out.println(nonDigits);
System.out.println(digits);

Related

How can I find multiple words in Java regex

I want to check for prohibited words.
In my code:
public static String filterText(String sText) {
Pattern p = Pattern.compile("test", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(sText);
StringBuffer buf = new StringBuffer();
while (m.find()){
m.appendReplacement(buf, maskWord(m.group()));
}
m.appendTail(buf);
return buf.toString();
}
public static String maskWord(String str) {
StringBuffer buf = new StringBuffer();
char[] ch = str.toCharArray();
for (int i = 0; i < ch.length; i++) {
buf.append("*");
}
return buf.toString();
}
Given the sentence "test is test", the code above produces "**** is ****".
But I want to filter at least a few dozen to a few hundred words.
The words are stored in a database (Oracle).
So how do I check for multiple words?
Assuming you are using Java 9 or later, you could use Matcher.replaceAll with a replacer function to replace the words in one statement. You can also use String.replaceAll to replace every character with '*'.
A pattern can contain many alternatives in it. You could construct a pattern with all the words required.
Pattern pattern = Pattern.compile("(word1|word2|word3)");
String result = pattern.matcher(input)
.replaceAll(w -> w.group(1).replaceAll(".", "*"));
Alternatively, you could have a list of patterns and then replace each in turn:
for (Pattern pattern: patternList)
result = pattern.matcher(result)
.replaceAll(w -> w.group(1).replaceAll(".", "*"));
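One way to build such an alternation from a word list is sketched below. It assumes the words have already been loaded from the database into a List<String> (the loading itself is not shown) and that Java 11 or later is available for String.repeat. WordMasker, compile and mask are hypothetical names. Pattern.quote keeps any regex metacharacters in the words literal, and the \b anchors are an added assumption so that "test" is not masked inside "testing":

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical helper: compiles one case-insensitive pattern for many words and masks matches.
final class WordMasker {
    static Pattern compile(List<String> words) {
        String alternation = words.stream()
                .map(Pattern::quote)              // treat each word as a literal
                .collect(Collectors.joining("|"));
        // \b only behaves as expected when each word starts and ends with a word character.
        return Pattern.compile("\\b(?:" + alternation + ")\\b", Pattern.CASE_INSENSITIVE);
    }

    static String mask(Pattern pattern, String text) {
        // Replace each match with as many '*' as the match is long.
        return pattern.matcher(text).replaceAll(m -> "*".repeat(m.group().length()));
    }

    public static void main(String[] args) {
        Pattern pattern = compile(List.of("test", "foo"));
        System.out.println(mask(pattern, "Test is test, foo!")); // **** is ****, ***!
    }
}
```

Compiling the pattern once and reusing it avoids rebuilding the alternation for every sentence.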

Easier way to convert string of ascii numbers to string of letters?

I have the string "116,101,115,116,49,50,51,52" and I want to convert it from ASCII decimals to ASCII letters. This is the code I'm using to do that:
String charInts = "116,101,115,116,49,50,51,52";
String[] tokenizedCharInts = charInts.split(",");
String phrase = "";
for (int i = 0; i < tokenizedCharInts.length; i++) {
int digit = Integer.parseInt(tokenizedCharInts[i]);
phrase += (char) digit;
}
System.out.println(phrase);
It works, so I'm fairly happy with it, but I'm wondering if anyone knows a more elegant way of doing this, perhaps without using that for loop (having to convert each split string to an int, then a char, then append it, for every small sub-string, makes me feel like there must be a cleaner solution).
Maybe use a StringBuilder and an enhanced for loop:
String charInts = "116,101,115,116,49,50,51,52";
String[] tokenizedCharInts = charInts.split(",");
StringBuilder phrase = new StringBuilder();
for (String a : tokenizedCharInts) {
phrase.append((char)Integer
.parseInt(a));
}
System.out.println(phrase);
You can use an enhanced for loop, since you don't need the array element index.
You should use a StringBuilder to accumulate the string instead of String phrase, to avoid repeated string construction.
You can use a Scanner and StringBuilder combination (no need to split the String and loop through it):
String charInts = "116,101,115,116,49,50,51,52";
Scanner scanner=new Scanner(charInts);
scanner.useDelimiter(",");
StringBuilder sb=new StringBuilder();
while (scanner.hasNextInt()) {
sb.append((char)scanner.nextInt());
}
scanner.close();
System.out.println(sb.toString());
Java-8 solution:
String charInts = "116,101,115,116,49,50,51,52";
String phrase = Arrays.stream(charInts.split(",")).mapToInt(Integer::parseInt)
.collect(StringBuilder::new, StringBuilder::appendCodePoint, StringBuilder::append)
.toString();
System.out.println(phrase);
If this fits your use case:
char[] charInts = new char[] { 116,101,115,116,49,50,51,52 };
String s = new String(charInts);
Yes, Java provides a concise solution using the Streams API:
String charInts = "116,101,115,116,49,50,51,52";
StringBuilder sb = new StringBuilder();
Arrays.stream(charInts.split(","))
.map(Integer::parseInt)
.map(Character::toChars)
.forEach(sb::append);
System.out.println(sb.toString());
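For comparison, the conversion can also be written as a single expression by mapping each token to a one-character String and joining the results. This is only a sketch. AsciiDecoder and decode are hypothetical names, and it assumes every token is a valid decimal code in the char range:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper: turns "116,101,..." into the characters those codes stand for.
final class AsciiDecoder {
    static String decode(String csv) {
        return Arrays.stream(csv.split(","))
                // parse the decimal code, narrow it to a char, wrap it as a String
                .map(token -> String.valueOf((char) Integer.parseInt(token.trim())))
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(decode("116,101,115,116,49,50,51,52")); // test1234
    }
}
```

Integer.parseInt throws NumberFormatException on a malformed token, so real input may need validation first.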

Underscore to camel case except for certain prefixes

I am currently creating a Java program to rewrite some outdated Java classes in our software. Part of the conversion includes changing variable names from containing underscores to using camelCase instead. The problem is, I cannot simply replace all underscores in the code. We have some classes with constants and for those, the underscore should remain.
How can I replace instances like string_label with stringLabel, but DO NOT replace underscores that occur after the prefix "Parameters."?
I am currently using the following which obviously does not handle excluding certain prefixes:
public String stripUnderscores(String line) {
Pattern p = Pattern.compile("_(.)");
Matcher m = p.matcher(line);
StringBuffer sb = new StringBuffer();
while(m.find()) {
m.appendReplacement(sb, m.group(1).toUpperCase());
}
m.appendTail(sb);
return sb.toString();
}
You could possibly try something like:
Pattern.compile("(?<!(class\\s+Parameters.+|Parameters\\.[\\w_]+))_(.)")
which uses a negative lookbehind.
You would probably be better served using some kind of refactoring tool that understood scoping semantics.
If all you check for is a qualified name like Parameters.is_module_installed then you will replace
class Parameters {
static boolean is_module_installed;
}
by mistake. And there are more corner cases like this. (import static Parameters.*;, etc., etc.)
Using regular expressions alone seems troublesome to me. One way you can make the routine smarter is to use regex just to capture an expression of identifiers and then you can examine it separately:
static List<String> exclude = Arrays.asList("Parameters");
static String getReplacement(String in) {
for(String ex : exclude) {
if(in.startsWith(ex + "."))
return in;
}
StringBuffer b = new StringBuffer();
Matcher m = Pattern.compile("_(.)").matcher(in);
while(m.find()) {
m.appendReplacement(b, m.group(1).toUpperCase());
}
m.appendTail(b);
return b.toString();
}
static String stripUnderscores(String line) {
Pattern p = Pattern.compile("([_$\\w][_$\\w\\d]+\\.?)+");
Matcher m = p.matcher(line);
StringBuffer sb = new StringBuffer();
while(m.find()) {
m.appendReplacement(sb, getReplacement(m.group()));
}
m.appendTail(sb);
return sb.toString();
}
But that will still fail for e.g. class Parameters { is_module_installed; }.
It could be made more robust by further breaking down each expression:
static String getReplacement(String in) {
if(in.contains(".")) {
StringBuilder result = new StringBuilder();
String[] parts = in.split("\\.");
for(int i = 0; i < parts.length; ++i) {
if(i > 0) {
result.append(".");
}
String part = parts[i];
if(i == 0 || !exclude.contains(parts[i - 1])) {
part = getReplacement(part);
}
result.append(part);
}
return result.toString();
}
StringBuffer b = new StringBuffer();
Matcher m = Pattern.compile("_(.)").matcher(in);
while(m.find()) {
m.appendReplacement(b, m.group(1).toUpperCase());
}
m.appendTail(b);
return b.toString();
}
That would handle a situation like
Parameters.a_b.Parameters.a_b.c_d
and output
Parameters.a_b.Parameters.a_b.cD
That's impossible Java syntax but I hope you see what I mean. Doing a little parsing yourself goes a long way.
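The two steps above (capture a dotted expression, then examine each part) can be sketched together as follows, assuming Java 9 or later for Set.of and the function-based Matcher.replaceAll. CamelCaser and its method names are hypothetical. Matcher.quoteReplacement is used because identifiers may contain '$', which is special in replacement strings:

```java
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: camel-cases identifiers except members of excluded prefixes.
final class CamelCaser {
    private static final Set<String> EXCLUDED = Set.of("Parameters");
    private static final Pattern UNDERSCORE = Pattern.compile("_(.)");
    // A run of identifier-like parts separated by dots, e.g. Parameters.is_module_installed
    private static final Pattern QUALIFIED_NAME = Pattern.compile("[\\w$]+(?:\\.[\\w$]+)*");

    static String camel(String identifier) {
        return UNDERSCORE.matcher(identifier)
                .replaceAll(m -> Matcher.quoteReplacement(m.group(1).toUpperCase()));
    }

    static String convertQualified(String name) {
        String[] parts = name.split("\\.");
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('.');
            }
            // Leave a part untouched when the part before it is an excluded prefix.
            boolean keep = i > 0 && EXCLUDED.contains(parts[i - 1]);
            out.append(keep ? parts[i] : camel(parts[i]));
        }
        return out.toString();
    }

    static String convertLine(String line) {
        return QUALIFIED_NAME.matcher(line)
                .replaceAll(m -> Matcher.quoteReplacement(convertQualified(m.group())));
    }

    public static void main(String[] args) {
        System.out.println(convertLine("x = string_label + Parameters.is_module_installed;"));
        // x = stringLabel + Parameters.is_module_installed;
    }
}
```

Like the original, this sketch knows nothing about scoping, string literals or comments, so unqualified uses inside class Parameters would still be rewritten.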
Maybe you can have another Pattern:
Pattern p = Pattern.compile("^Parameters.*"); //^ means the beginning of a line
If this matches, don't replace anything.

How can I remove punctuation from input text in Java?

I am trying to get a sentence using input from the user in Java, and I need to make it lowercase and remove all punctuation. Here is my code:
String[] words = instring.split("\\s+");
for (int i = 0; i < words.length; i++) {
words[i] = words[i].toLowerCase();
}
String[] wordsout = new String[50];
Arrays.fill(wordsout,"");
int e = 0;
for (int i = 0; i < words.length; i++) {
if (words[i] != "") {
wordsout[e] = words[e];
wordsout[e] = wordsout[e].replaceAll(" ", "");
e++;
}
}
return wordsout;
I can't seem to find any way to remove all non-letter characters. I have tried using regexes and iterators with no luck. Thanks for any help.
This first removes all non-letter characters, folds to lowercase, then splits the input, doing all the work in a single line:
String[] words = instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
Spaces are initially left in the input so the split will still work.
By removing the rubbish characters before splitting, you avoid having to loop through the elements.
You can use the following regular expression construct:
Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
inputString.replaceAll("\\p{Punct}", "");
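A small sketch of that construct in use, combined with the lowercasing the question asks for. PunctuationStripper and strip are hypothetical names. Note that \p{Punct} is a POSIX class and, without extra flags, matches only the ASCII punctuation listed above:

```java
import java.util.Locale;

// Hypothetical helper: removes ASCII punctuation and folds the rest to lowercase.
final class PunctuationStripper {
    static String strip(String input) {
        return input.replaceAll("\\p{Punct}", "").toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(strip("Hello, World! It's 5 o'clock.")); // hello world its 5 oclock
    }
}
```

Spaces and digits are kept, so the result can still be split on whitespace afterwards.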
You may try this:
Scanner scan = new Scanner(System.in);
System.out.println("Type a sentence and press enter.");
String input = scan.nextLine();
String strippedInput = input.replaceAll("\\W", "");
System.out.println("Your string: " + strippedInput);
\W (equivalent to [^\w]) matches any non-word character, so the above regular expression removes every non-word character, including spaces.
If you don't want to use RegEx (which seems highly unnecessary given your problem), perhaps you should try something like this:
public String modified(final String input){
final StringBuilder builder = new StringBuilder();
for(final char c : input.toCharArray())
if(Character.isLetterOrDigit(c))
builder.append(Character.isLowerCase(c) ? c : Character.toLowerCase(c));
return builder.toString();
}
It loops through the underlying char[] in the String and only appends the char if it is a letter or digit (filtering out all symbols, which I am assuming is what you are trying to accomplish) and then appends the lower case version of the char.
I don't like to use regex, so here is another simple solution.
public String removePunctuations(String s) {
String res = "";
for (Character c : s.toCharArray()) {
if(Character.isLetterOrDigit(c))
res += c;
}
return res;
}
Note: This will include both Letters and Digits
If your goal is to REMOVE punctuation, then refer to the above. If the goal is to find words, none of the above solutions do that.
INPUT: "This. and:that. with'the-other".
OUTPUT: ["This", "and", "that", "with", "the", "other"]
but what most of these "replaceAll" solutions actually give you is:
OUTPUT: ["This", "andthat", "withtheother"]
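If finding words is the goal, one possible approach is to replace each run of non-letters with a single space instead of deleting it, and only then split. The sketch below assumes ASCII letters only and also lowercases, as the question asked. WordExtractor and words are hypothetical names:

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: treats every run of non-letters as a word separator.
final class WordExtractor {
    static String[] words(String input) {
        String cleaned = input.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z]+", " ") // punctuation becomes a separator, not nothing
                .trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split(" ");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("This. and:that. with'the-other")));
        // [this, and, that, with, the, other]
    }
}
```

The trade-off is that contractions such as "it's" are split into two words, which may or may not be what is wanted.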

How to replace each index of a substring

String s = "Elephant";
String srep = (s.replaceAll(s.substring(4,6), "_" ));
System.out.println(srep);
My code outputs Elep_nt, but I want it to replace each individual index of that substring with an underscore, so that it outputs Elep__nt.
Is there any way to do this in a single line, or would I have to use a loop?
The problem with yours is that you are matching "ha" at once, thus it gets replaced by only one char. (Notice also if you had "Elephantha" the last "ha" would be replaced as well.)
You could use a lookbehind to determine each single character to be replaced. So to "replace chars from position 4 to 5" you could use:
String s = "Elephant";
String srep = s.replaceAll("(?<=^.{4,5}).", "_");
System.out.println(srep);
Output:
Elep__nt
You can use a StringBuilder:
StringBuilder result = new StringBuilder(s.length());
result.append(s.substring(0, 4));
for (int i = 4; i < 6; i++)
result.append('_');
result.append(s.substring(6));
String srep = result.toString();
System.out.println(srep);
Elep__nt
Since you asked for a one-liner, here is another possible way:
String srep = s.substring(0,4)+s.substring(4,6).replaceAll(".", "_")+s.substring(6);
Or using StringBuilder
String srep = new StringBuilder(s).replace(4, 6, s.substring(4,6).replaceAll(".", "_")).toString();
Output
Elep__nt
But note that replaceAll uses a loop internally anyway.
int difference = 6-4;
StringBuilder sb = new StringBuilder();
for(int count=0; count<difference; count++){
sb.append("_");
}
String s = "Elephant";
String srep = s.replaceAll(s.substring(4,6), sb.toString());
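Because the replaceAll-based versions match on the substring's text, they can also replace a later occurrence of the same text (as noted for "Elephantha"). An index-based sketch avoids that by overwriting characters in place. RangeMasker and maskRange are hypothetical names, and the method assumes 0 <= from <= to <= s.length():

```java
// Hypothetical helper: overwrites the characters in [from, to) with underscores.
final class RangeMasker {
    static String maskRange(String s, int from, int to) {
        StringBuilder sb = new StringBuilder(s);
        for (int i = from; i < to; i++) {
            sb.setCharAt(i, '_'); // touch only the requested indices
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(maskRange("Elephant", 4, 6));   // Elep__nt
        System.out.println(maskRange("Elephantha", 4, 6)); // Elep__ntha
    }
}
```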
