Remove a specific word from a string - java

I'm trying to remove a specific word from a certain string using the function replace() or replaceAll() but these remove all the occurrences of this word even if it's part of another word!
Example:
String content = "is not like is, but mistakes are common";
content = content.replace("is", "");
output: "not like , but mtakes are common"
desired output: "not like , but mistakes are common"
How can I substitute only whole words from a string?

Use a regular expression with word boundaries:
String regex = "\\s*\\bis\\b\\s*";
content = content.replaceAll(regex, "");
Remember you need to use replaceAll(...) to use regular expressions, not replace(...)
\\b gives you the word boundaries
\\s* sops up any white space on either side of the word being removed (if you want to remove this too).
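As a minimal sketch of the word-boundary approach above, the following wraps it in a small helper. The class name WholeWordRemover and the method removeWord are hypothetical names chosen for illustration, and Pattern.quote is added on the assumption that the word might contain regex metacharacters. It also assumes the word starts and ends with a letter, digit or underscore, since \b only matches next to such characters.

```java
import java.util.regex.Pattern;

class WholeWordRemover {
    // Hypothetical helper: removes every whole-word occurrence of `word`
    // and leaves the surrounding whitespace untouched.
    static String removeWord(String text, String word) {
        // \b matches only between a word character and a non-word character,
        // so the "is" inside "mistakes" is not matched.
        String regex = "\\b" + Pattern.quote(word) + "\\b";
        return text.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        String content = "is not like is, but mistakes are common";
        // Brackets make leading and trailing spaces visible.
        System.out.println("[" + removeWord(content, "is") + "]");
        // With \s* on both sides, the neighbouring whitespace is removed as well.
        System.out.println("[" + content.replaceAll("\\s*\\bis\\b\\s*", "") + "]");
    }
}
```

The first call keeps the spaces that surrounded each removed word; the second variant also swallows them, which may or may not be what you want.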

content = content.replaceAll("\\Wis\\W|^is\\W|\\Wis$", "");

You can try replacing " is " by " ". The is with a space before and one after, replaced by a single space.
Update:
To make it work for the first "is" in the sentence, also do another replace of "is " for "". Replacing the first is and the first space, with an empty string.

import java.util.Scanner;

public class RemoveChar {
    public static void main(String[] args) {
        Scanner s = new Scanner(System.in);
        String input = s.nextLine();    // the text to filter
        char c = s.next().charAt(0);    // the character to remove
        System.out.println(removeAllOccurrencesOfChar(input, c));
    }

    // Copies every character except c into the result.
    public static String removeAllOccurrencesOfChar(String input, char c) {
        StringBuilder r = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) != c) r.append(input.charAt(i));
        }
        return r.toString();
    }
}


I am trying to write a numerify code like in Javafakers API

So, I need to replace '#' in a string with a random number(0-9).
Eg: String: "I am l#earning abou#t Jav#a".
I am expecting an output like "I am l1earning abou5t Jav3a".
With the code below, I get output like
"I am l2earning abou2t Jav2a": the digit is random, but the same digit is used for every '#' and only changes between runs.
What changes have to be performed in the code to generate different random numbers in the same string?
import java.util.Random;
public class numerify {
public static void main(String[] args) {
String str = " I am l#earning abo#ut Jav#a.";
String num="1234567890";
Random r= new Random();
int n=num.length();
for (int i = 0; i < str.length() - 1; i++) {
char ran= num.charAt(r.nextInt(n));
if (str.charAt(i) == '#') {
str = str.replace('#',ran);
}
}
System.out.println(str);
}
}
The method replace replaces all occurrences.
Try using replaceFirst https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#replaceFirst-java.lang.String-java.lang.String-
str = str.replaceFirst("#", String.valueOf(ran));
The condition is only true the first time: that call to replace already replaces every occurrence of #, so there is no # left to replace on later iterations.
You need to replace only one # at a time. Use replaceFirst instead.
if (str.charAt(i) == '#') {
char ran = num.charAt(r.nextInt(n)); // pick a fresh random digit for this '#'
str = str.replaceFirst("#",Character.toString(ran));
}
replaceFirst() needs to be used instead of replace().
The replaceFirst() method replaces the first substring of this string that matches the given regular expression with the given replacement.
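An alternative to repeated replaceFirst calls is a single pass that builds the result character by character. The sketch below is only one way to do it; the class name Numerify and the method numerify are hypothetical, and the Random instance is passed in so that a seed can be fixed when testing.

```java
import java.util.Random;

class Numerify {
    // Replaces each '#' in the template with an independently drawn digit 0-9.
    static String numerify(String template, Random random) {
        StringBuilder out = new StringBuilder(template.length());
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            // Draw a new digit for every '#'; copy every other character as-is.
            out.append(c == '#' ? (char) ('0' + random.nextInt(10)) : c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(numerify("I am l#earning abou#t Jav#a", new Random()));
    }
}
```

Because a new digit is drawn per '#', the digits are independent of each other, although two of them can still happen to be equal by chance.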

Java Finding all words beginning with a letter

I am trying to get all words that begin with a letter from a long string. How would you do this in Java? I don't want to loop through every letter or something inefficient.
EDIT: I also can't use any built-in data structures (except arrays of course); it's for a CS class. I can however make my own data structures (of which I have created several).
You could try obtaining an array collection from your String and then iterating through it:
String s = "my very long string to test";
for(String st : s.split(" ")){
if(st.startsWith("t")){
System.out.println(st);
}
}
You need to be clear about some things. What is a "word"? You want to find only "words" starting with a letter, so I assume that words can have other characters too. But what chars are allowed? What defines the start of such a word? Whitespace, any non letter, any non letter/non digit, ...?
e.g.:
String TestInput = "test séntènce îwhere I'm want,to üfind 1words starting $with le11ers.";
String regex = "(?<=^|\\s)\\pL\\w*";
Pattern p = Pattern.compile(regex, Pattern.UNICODE_CHARACTER_CLASS);
Matcher matcher = p.matcher(TestInput);
while (matcher.find()) {
System.out.println(matcher.group());
}
The regex (?<=^|\s)\pL\w* will find sequences that start with a letter (\pL is a Unicode property for letter), followed by 0 or more "word" characters (Unicode letters and numbers, because of the modifier Pattern.UNICODE_CHARACTER_CLASS).
The lookbehind assertion (?<=^|\s) ensures that there is the start of the string or a whitespace before the sequence.
So my code will print:
test
séntènce ==> contains non ASCII letters
îwhere ==> starts with a non ASCII letter
I ==> 'm is missing, because `'` is not in `\w`
want
üfind ==> starts with a non ASCII letter
starting
le11ers ==> contains digits
Missing words:
,to ==> starting with a ","
1words ==> starting with a digit
$with ==> starting with a "$"
You could build a HashMap -
HashMap<String,String> map = new HashMap<String,String>();
example -
ant, bat, art, cat
Hashmap
a -> ant,art
b -> bat
c -> cat
to find all words that begin with "a", just do
map.get("a")
You can take the first character of each word and check with the API method Character.isLetter whether it is a letter or not.
String input = "jkk ds 32";
String[] array = input.split(" ");
for (String word : array) {
char[] arr = word.toCharArray();
char c = arr[0];
if (Character.isLetter(c)) {
System.out.println( word + "\t isLetter");
} else {
System.out.println(word + "\t not Letter");
}
}
Following are some sample output:
jkk isLetter
ds isLetter
32 not Letter
Scanner scan = new Scanner(text); // text being the string you are looking in
char test = 'x'; //whatever letter you are looking for
while(scan.hasNext()){
String wordFound = scan.next();
if(wordFound.charAt(0)==test){
//do something with the wordFound
}
}
This will do what you are looking for; inside the if statement, do what you want with the word.
Regexp way:
public static void main(String[] args) {
String text = "my very long string to test";
Matcher m = Pattern.compile("(^|\\W)(\\w*)").matcher(text);
while (m.find()) {
System.out.println("Found: "+m.group(2));
}
}
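The pattern above prints every word in the text. As a hedged sketch of how it could be narrowed to words that start with one particular letter, the following uses a word boundary followed by that letter and collects the matches into a plain array, in keeping with the arrays-only constraint from the question. The names WordsByFirstLetter and wordsStartingWith are hypothetical, and a "word" is assumed to be a run of \w characters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsByFirstLetter {
    // Returns all words in text that start with the given letter, ignoring case.
    static String[] wordsStartingWith(String text, char letter) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(String.valueOf(letter)) + "\\w*",
                Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);

        // First pass: count the matches so the array can be sized exactly.
        int count = 0;
        while (matcher.find()) {
            count++;
        }

        // Second pass: rewind the matcher and copy the matches into the array.
        String[] result = new String[count];
        matcher.reset();
        for (int i = 0; i < count; i++) {
            matcher.find();
            result[i] = matcher.group();
        }
        return result;
    }

    public static void main(String[] args) {
        for (String word : wordsStartingWith("my very long string to test", 't')) {
            System.out.println("Found: " + word);
        }
    }
}
```

Note that \b also matches after an apostrophe, so with this definition the "t" in "don't" would count as a word starting with t.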
You can use split() method. Here is an example :
String string = "your string";
String[] parts = string.split(" C");
for(int i=0; i<parts.length; i++) {
String[] word = parts[i].split(" ");
if( i > 0 ) {
// the remaining words in this part do not start with C, so ignore them
System.out.println("C" + word[0]);
}
else { // check the first part explicitly
for(int j=0; j<word.length; j++) {
if ( word[j].startsWith("c") || word[j].startsWith("C"))
System.out.println(word[j]);
}
}
}
where "C" is your letter. Then just loop over the array. parts[0] has to be checked explicitly for words starting with "C", so the loop must start at i = 0, not i = 1.

How can I remove punctuation from input text in Java?

I am trying to get a sentence using input from the user in Java, and I need to make it lowercase and remove all punctuation. Here is my code:
String[] words = instring.split("\\s+");
for (int i = 0; i < words.length; i++) {
words[i] = words[i].toLowerCase();
}
String[] wordsout = new String[50];
Arrays.fill(wordsout,"");
int e = 0;
for (int i = 0; i < words.length; i++) {
if (words[i] != "") {
wordsout[e] = words[e];
wordsout[e] = wordsout[e].replaceAll(" ", "");
e++;
}
}
return wordsout;
I can't seem to find any way to remove all non-letter characters. I have tried using regexes and iterators with no luck. Thanks for any help.
This first removes all non-letter characters, folds to lowercase, then splits the input, doing all the work in a single line:
String[] words = instring.replaceAll("[^a-zA-Z ]", "").toLowerCase().split("\\s+");
Spaces are initially left in the input so the split will still work.
By removing the rubbish characters before splitting, you avoid having to loop through the elements.
You can use following regular expression construct
Punctuation: One of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
inputString.replaceAll("\\p{Punct}", "");
You may try this:-
Scanner scan = new Scanner(System.in);
System.out.println("Type a sentence and press enter.");
String input = scan.nextLine();
String strippedInput = input.replaceAll("\\W", "");
System.out.println("Your string: " + strippedInput);
\W is equivalent to [^\w] and matches a non-word character, so the above regular expression will match and remove all non-word characters, including the spaces between words.
If you don't want to use RegEx (which seems highly unnecessary given your problem), perhaps you should try something like this:
public String modified(final String input){
final StringBuilder builder = new StringBuilder();
for(final char c : input.toCharArray())
if(Character.isLetterOrDigit(c))
builder.append(Character.isLowerCase(c) ? c : Character.toLowerCase(c));
return builder.toString();
}
It loops through the underlying char[] in the String and only appends the char if it is a letter or digit (filtering out all symbols, which I am assuming is what you are trying to accomplish) and then appends the lower case version of the char.
I don't like to use regex, so here is another simple solution.
public String removePunctuations(String s) {
String res = "";
for (Character c : s.toCharArray()) {
if(Character.isLetterOrDigit(c))
res += c;
}
return res;
}
Note: This will include both Letters and Digits
If your goal is to REMOVE punctuation, then refer to the above. If the goal is to find words, none of the above solutions does that.
INPUT: "This. and:that. with'the-other".
OUTPUT: ["This", "and", "that", "with", "the", "other"]
but what most of these "replaceAll" solutions actually give you is:
OUTPUT: ["This", "andthat", "withtheother"]
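One way to get the word list described above is to split on runs of non-letter characters instead of deleting them. This is a minimal sketch under the assumption that a "word" is any run of Unicode letters; the names WordSplitter and words are hypothetical.

```java
class WordSplitter {
    // Splits the input into words, treating every run of non-letters as a separator.
    static String[] words(String input) {
        // Drop leading separators first; otherwise split() would return
        // an empty string as the first element.
        String trimmed = input.replaceAll("^[^\\p{L}]+", "");
        if (trimmed.isEmpty()) {
            return new String[0];
        }
        // Trailing empty strings are discarded by split() automatically.
        return trimmed.split("[^\\p{L}]+");
    }

    public static void main(String[] args) {
        for (String word : words("This. and:that. with'the-other")) {
            System.out.println(word);
        }
    }
}
```

If lowercase words are needed, toLowerCase() can be applied to the input before splitting.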

How to get only the letters from a given alphanumeric word in Java?

Sorry if this is a silly question, but I need to know.
If I have a word made up of letters, digits and special characters, I need to extract only the letters, without the digits and special characters. Is there a built-in function in Java to extract only the letters?
E.g. String word="te123##st";
I need only "test".
This solution works with accented/non-ASCII characters:
"te123##st\néàø_".replaceAll("[\\p{Digit}\\p{Punct}\\p{Space}]", "");
Try this: word.replaceAll("[^a-zA-Z]", "");
This will remove every character that is not an ASCII letter, so digits and symbols go, but accented characters are removed as well.
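To make the difference concrete, here is a small sketch comparing the ASCII-only character class with the Unicode letter property \P{L} (anything that is not a letter). The class and method names are hypothetical; the accented characters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
class LettersOnly {
    // Keeps only the ASCII letters a-z and A-Z.
    static String asciiLettersOnly(String word) {
        return word.replaceAll("[^a-zA-Z]", "");
    }

    // Keeps every Unicode letter, including accented ones.
    static String unicodeLettersOnly(String word) {
        return word.replaceAll("\\P{L}", "");
    }

    public static void main(String[] args) {
        String plain = "te123##st";
        String accented = "\u00e91\u00e0_"; // e-acute, digit 1, a-grave, underscore
        System.out.println(asciiLettersOnly(plain));
        System.out.println(unicodeLettersOnly(plain));
        System.out.println(asciiLettersOnly(accented).isEmpty());
        System.out.println(unicodeLettersOnly(accented).length());
    }
}
```

For purely ASCII input such as "te123##st" both methods return the same result; they only differ once non-ASCII letters appear.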
String word = "te123##st";
word = word.replaceAll("[^\\p{Alpha}]", "");
// or word = word.replaceAll("[\\P{Alpha}]", "");
See apidoc reference.
try
word = word.replaceAll("\\P{Alpha}", "");
String word = "te123##st";
word = word.replaceAll("[\\W\\d._]", "");
try this:
word = word.replaceAll("[\\d##_]", "");
- I won't make this complicated with a regex; the built-in Java methods are enough to answer this.
- Use the toCharArray() method to break the String into char elements, then use the Character class's isAlphabetic() method to test whether each one is a letter or not.
public class T1 {
public static void main(String[] args){
String s = "te123##st";
String tempStr = "";
// iterate over the whole word, not just a prefix of it
char[] cArr = s.toCharArray();
for(char a :cArr){
if(Character.isAlphabetic(a)){
System.out.println(a+" is a alphabet");
tempStr = tempStr + a;
}else{
System.out.println(a+" is not a alphabet");
}
}
System.out.println("The extracted String is: "+tempStr);
}
}

Java regex to filter phone numbers

I have following example string that needs to be filtered
0173556677 (Alice), 017545454 (Bob)
This is how phone numbers are added to a text view. I want the text to look like that
0173556677;017545454
Is there a way to change the text using a regular expression? What would such an expression look like? Or do you recommend another method?
You can do as follows:
String orig = "0173556677 (Alice), 017545454 (Bob)";
String regex = " \\(.+?\\)";
String res = orig.replaceAll(regex, "").replaceAll(",", ";");
// first replaceAll: remove the parenthesized names, including the space before each
// second replaceAll: replace each comma with a semicolon
Use the expression in android.util.Patterns
Access the static variable
Patterns.PHONE
or use this expression here (Android Source Code)
Here's a resource that can guide you :
http://www.zparacha.com/validate-email-ssn-phone-number-using-java-regular-expression/
This solution works with phone numbers separated with any string that does not contain numbers:
String orig = "0173556677 (Alice), 017545454 (Bob)";
String[] numbers = orig.split("\\D+"); //split at everything that is not a digit
StringBuilder sb = new StringBuilder();
if (numbers.length > 0) {
sb.append(numbers[0]);
for (int i = 1; i < numbers.length; i++) { //concatenate all that is left
sb.append(";");
sb.append(numbers[i]);
}
}
String res = sb.toString();
or, with com.google.common.base.Joiner:
String[] numbers = orig.split("\\D+"); //split at everything that is not a digit
String res = Joiner.on(";").join(numbers);
PS. There is a minor deviation from the requirements in the top-voted example: it should be replaceAll(", ", ";"), with a space after the comma (or a \\s), but I cannot make a one-character edit and I do not want to mess with somebody else's code.
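For reference, here is a hedged sketch of both ideas from this thread with the whitespace after the comma handled, using only the standard library. The names PhoneFilter, viaReplace and viaDigitRuns are hypothetical, and the input is assumed to follow the "number (name), number (name)" layout from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PhoneFilter {
    // Removes the parenthesized names, then turns each ", " into ";".
    static String viaReplace(String orig) {
        return orig.replaceAll("\\s*\\(.+?\\)", "").replaceAll(",\\s*", ";");
    }

    // Collects every run of digits and joins the runs with ";".
    static String viaDigitRuns(String orig) {
        Matcher m = Pattern.compile("\\d+").matcher(orig);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (sb.length() > 0) {
                sb.append(';'); // separator only between numbers
            }
            sb.append(m.group());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String orig = "0173556677 (Alice), 017545454 (Bob)";
        System.out.println(viaReplace(orig));
        System.out.println(viaDigitRuns(orig));
    }
}
```

The digit-run variant would also pick up digits that appear inside a name, so it only fits if the names never contain numbers.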
