Java replace New Lines, Commas, and Spaces at end of String

I am using
mString.replaceAll("[\n,\\s]$", "");
It is not working. What is the correct way to remove newlines, commas, or spaces from the end of a string if they can appear in any order?

Try this
mString = mString.replaceAll("[\n,\\s]+$", "");

There are two reasons your attempt
mString.replaceAll("[\n,\\s]$", "");
doesn't work. First of all, replaceAll does not modify the String instance, because Strings are immutable. It returns the modified string as the result of the method. But the above statement discards the result. So you at least need
mString = mString.replaceAll(...);
The second reason is that the replacement method looks for the pattern in order. If it started over at the beginning of the string after each replacement, then your expression would replace a newline, comma, or whitespace at the end of the string, then it would keep doing it until there were no more such characters at the end. But it doesn't do things this way (and if it did, it would be way too easy to write replaceAll expressions that looped infinitely). replaceAll works like this: It searches for the pattern, and if it finds it, it copies all characters before the pattern to the result. Then, it copies the replacement string to the result. Then, it resets the matcher to the character after the match. In your case, since the pattern match goes to the end of the input (because of the $), the character after the match will be the end of the string, and there can be no more matches. Thus, the matcher would only be able to replace one character. That's why you need to add + to the pattern, as in the other correct answers, like Anubhava's:
mString = mString.replaceAll("[,\\s]+$", "");
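The two points above can be checked with a small sketch. The class name, the helper stripTrailing, and the sample input are made up for illustration; the input deliberately has no trailing newline, so $ can only match at the very end.

```java
class TrailingTrimDemo {
    // Hypothetical helper: removes any run of commas and whitespace
    // (newlines included) at the end of the string.
    static String stripTrailing(String s) {
        return s.replaceAll("[,\\s]+$", "");
    }

    public static void main(String[] args) {
        String input = "one,two, ,,";

        // The result is discarded here, so input is unchanged.
        input.replaceAll("[,\\s]+$", "");
        System.out.println("[" + input + "]");

        // Without +, only the single last character is removed.
        System.out.println("[" + input.replaceAll("[,\\s]$", "") + "]");

        // With + and the result assigned or returned, the whole run goes.
        System.out.println("[" + stripTrailing(input) + "]");
    }
}
```

The brackets in the output are only there to make trailing spaces visible.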

You can just take out \n, since \s already includes newlines. You also need to add the + quantifier to make it match more than one occurrence of whitespace or comma at the end.
mString = mString.replaceAll("[,\\s]+$", "");

Try mString = mString.replaceAll("(\\n|,|\\s)+$", "");

Related

Regex expression that keeps upper/lower case characters AND whitespace?

I need to parse some text, I am doing this using a Regex expression within the replaceAll() method. This is the line where I use it:
String parsedValue = selectedValue.replaceAll("[^A-Za-z]", "");
This is nearly perfect: it removes the numbers from the string. However, it also gets rid of the spaces, and I need to keep them. How can I modify it to do this?
For example, "Local Police 101" would become "Local Police".

You're so close! You just need to add a space to your list of "not", so you end up with "[^A-Za-z ]":
String parsedValue = selectedValue.replaceAll("[^A-Za-z ]", "");
Notice the space after the lowercase "z" in your regular expression.
Edit:
Looking at your example, you also want to remove the leftover spaces at the beginning and end of the string. To do this, trim the result by adding .trim() after replaceAll(). You'll end up with something like this:
String parsedValue = selectedValue.replaceAll("[^A-Za-z ]", "").trim();
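As a quick check of the answer above, here is a minimal sketch; the class and method names are invented for the example.

```java
class LettersAndSpaces {
    // Hypothetical helper: drops everything except ASCII letters and spaces,
    // then trims leading and trailing whitespace.
    static String lettersAndSpaces(String s) {
        return s.replaceAll("[^A-Za-z ]", "").trim();
    }

    public static void main(String[] args) {
        // prints: [Local Police]
        System.out.println("[" + lettersAndSpaces("Local Police 101") + "]");
    }
}
```

Note that trim() only touches the ends. If a number sits in the middle of the text, the spaces on both sides of it are kept, so two adjacent spaces remain.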

Split String 2 times but with different splits ";" and "."

Original String: "12312123;www.qwerty.com"
With this Model.getList().get(0).split(";")[1]
I get: "www.qwerty.com"
I tried doing this: Model.getList().get(0).split(";")[1].split(".")[1]
But it didn't work; I get an exception. How can I solve this?
I want only "qwerty"

Try this, to achieve "qwerty":
Model.getList().get(0).split(";")[1].split("\\.")[1]
You need to escape the dot.

Try to use split(";|\\.") like this:
for (String string : "12312123;www.qwerty.com".split(";|\\.")) {
    System.out.println(string);
}
Output:
12312123
www
qwerty
com

You can split a string which has multiple delimiters. Example below:
String abc = "11;xyz.test.com";
String[] tokens = abc.split(";|\\.");
System.out.println(tokens[tokens.length-2]);

The array index 1 part doesn't make sense here. It will throw an ArrayIndexOutOfBoundsException.
This is because splitting on "." doesn't work the way you want it to: split takes a regular expression, in which "." means something completely different. You need to escape the period by writing "\\." instead.

You'd need to escape the ., i.e. "\\.". Period is a special character in regular expressions, meaning "any character".
What your current split means is "split on any character"; this means that it splits the string into a number of empty strings, since there is nothing between consecutive occurrences of "any character".
There is a subtle gotcha in the behaviour of the String.split method, which is that it discards trailing empty strings from the token array (unless you pass a negative number as the second parameter).
Since your entire token array consists of empty strings, all of these are discarded, so the result of the split is a zero-length array, hence the exception when you try to access one of its elements.
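The behaviour described above can be observed directly. This is a small sketch; the class name and the helper secondLabel are made up for illustration. Pattern.quote is used as an alternative to writing "\\." by hand.

```java
import java.util.regex.Pattern;

class SplitOnDot {
    // Hypothetical helper: takes the part after ';' and returns its
    // second dot-separated label.
    static String secondLabel(String s) {
        String host = s.split(";")[1];
        // Pattern.quote(".") makes the dot a literal, like "\\.".
        return host.split(Pattern.quote("."))[1];
    }

    public static void main(String[] args) {
        String host = "www.qwerty.com";

        // Unescaped dot: every character is a delimiter, all tokens are
        // empty, and trailing empty strings are dropped.
        System.out.println(host.split(".").length);     // 0

        // A negative limit keeps the trailing empty strings.
        System.out.println(host.split(".", -1).length); // host.length() + 1

        // Escaped dot: splits on literal periods only.
        System.out.println(host.split("\\.").length);   // 3

        System.out.println(secondLabel("12312123;www.qwerty.com")); // qwerty
    }
}
```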

Don't use split, use a regular expression (directly). It's safer, and faster.
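import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "12312123;www.qwerty.com";
// Capture the label just before the last dot-separated label.
String regex = "([^.;]+)\\.[^.;]+$";
Matcher m = Pattern.compile(regex).matcher(input);
if (m.find()) {
    System.out.println(m.group(1)); // prints: qwerty
}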

Regex required to update a character

I have a String : testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing
I want to replace the character s with some other character sequence, say <b>X</b>, but an s that is already wrapped in tags should remain intact, i.e. the regex should not replace an s that is immediately followed by "<".
I used this Java code:
String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing";
str = str.replace("s[^<]", "<b>X</b>");
The problem is that the regex matches two characters, the s and the following character if it is not "<", and String.replace would replace both of them. I want only the s to be replaced, not the following character.
Any help would be appreciated. Since I have lots of such replacements, I don't want to use a loop that matches each character and updates it sequentially.

There are other ways, but you could, for example, capture the second character and put it back:
str = str.replaceAll("s([^<])", "<b>X</b>$1");

Looks like you want a negative lookahead:
s(?!<)
String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing;";
System.out.println(str.replaceAll("s(?!<)", "<b>X</b>"));
output:
te<b>X</b>ting<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing;

Use lookarounds to assert, but not consume, surrounding text:
str = str.replaceAll("s(?!<)", "whatever");
Or, capture and put back using a back reference $1:
str = str.replaceAll("s([^<])", "whatever$1");
Note that you need to use replaceAll() (which uses regex), rather than replace() (which uses plain text).
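The lookahead and the capture-and-restore approaches agree on the question's kind of input, but they are not equivalent in every case. The sketch below, with invented class and method names, shows two inputs where they differ: an s at the very end of the string, and two consecutive s characters.

```java
class ReplaceUnmarkedS {
    // Replaces every s that is not immediately followed by '<'.
    static String withLookahead(String s) {
        return s.replaceAll("s(?!<)", "<b>X</b>");
    }

    // Replaces an s followed by a non-'<' character, restoring that character.
    static String withCapture(String s) {
        return s.replaceAll("s([^<])", "<b>X</b>$1");
    }

    public static void main(String[] args) {
        String tagged = "testing<b>s</b>ing";
        System.out.println(withLookahead(tagged)); // te<b>X</b>ting<b>s</b>ing
        System.out.println(withCapture(tagged));   // te<b>X</b>ting<b>s</b>ing

        // A final s has no following character for the capture group.
        System.out.println(withLookahead("yes"));  // ye<b>X</b>
        System.out.println(withCapture("yes"));    // yes

        // The capture group consumes the second s, so it is never replaced.
        System.out.println(withLookahead("class")); // cla<b>X</b><b>X</b>
        System.out.println(withCapture("class"));   // cla<b>X</b>s
    }
}
```

Because the lookahead consumes nothing, it is probably the safer choice when every unmarked s has to be replaced.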

How do I remove all punctuation that follows a single word in Java?

I need to remove punctuation following a word. For example, word?! should be changed to word and string: should be changed to string.
Edit: The algorithm should only remove punctuation at the end of the String. Any punctuation within the String should stay. For instance, doesn't; should become doesn't.

Use the method replaceAll(...), which accepts a regular expression. The + after \\p{Punct} lets it remove a run of several punctuation characters, such as ?!.
String s = "don't. do' that! ";
s = s.replaceAll("(\\w+)\\p{Punct}+(\\s|$)", "$1$2");
System.out.println(s);

You could use a regex to modify the string.
String resultString = subjectString.replaceAll("([a-z]+)[?:!.,;]*", "$1");
There are no "words" that I know of where ' is at the end and it is used as a punctuation. So this regex will work for you.
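If the input really is a single word and only punctuation at the very end should go, as the question's edit says, anchoring the pattern with $ may be enough. This is a minimal sketch under that assumption; the class and method names are made up, and \p{Punct} covers ASCII punctuation only.

```java
class TrailingPunctuation {
    // Hypothetical helper: removes a run of ASCII punctuation at the end
    // of the string and leaves punctuation inside the word alone.
    static String stripTrailingPunct(String word) {
        return word.replaceAll("\\p{Punct}+$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingPunct("word?!"));   // word
        System.out.println(stripTrailingPunct("string:"));  // string
        System.out.println(stripTrailingPunct("doesn't;")); // doesn't
    }
}
```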

Problem replacing words using [^a-zA-Z] regex

Just could not get this one and googling did not help much either..
First something that I know: Given a string and a regex, how to replace all the occurrences of strings that matches this regular expression by a replacement string ? Use the replaceAll() method in the String class.
Now something that I am unable to do. The regex I have in my code now is [^a-zA-Z] and I know for sure that this regex is definitely going to have a range. Only some more characters might be added to the list. What I need as output in the code below is Worksheet+blah but what I get using replaceAll() is Worksheet++++blah
String homeworkTitle = "Worksheet%#5_blah";
String unwantedCharactersRegex = "[^a-zA-Z]";
String replacementString = "+";
homeworkTitle = homeworkTitle.replaceAll(unwantedCharactersRegex,replacementString);
System.out.println(homeworkTitle);
What is the way to achieve the output that I wish for? Are there any Java methods that I am missing here?

[^a-zA-Z]+
Will do it nicely.
You just need a quantifier in order to match as many consecutive non-alphabetical characters as you can, and replace the whole match with a single '+' (the + quantifier is greedy by default).
Note: [^a-zA-Z]+? would make the '+' quantifier lazy, and would have given you the same result as [^a-zA-Z], since it would have matched only one non-alphabetical character at a time.

String unwantedCharactersRegex = "[^a-zA-Z]";
This matches a single non-letter. So each single non-letter is replaced by a +. You need to say "one or more", so try
String unwantedCharactersRegex = "[^a-zA-Z]+";
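The three variants discussed in these answers can be compared side by side. This is a small sketch; the class name and the helper collapse are invented for the example.

```java
class CollapseRuns {
    // Hypothetical helper: replaces every match of regex with a single "+".
    static String collapse(String s, String regex) {
        return s.replaceAll(regex, "+");
    }

    public static void main(String[] args) {
        String title = "Worksheet%#5_blah";

        // One replacement per non-letter.
        System.out.println(collapse(title, "[^a-zA-Z]"));   // Worksheet++++blah

        // Greedy: the whole run of non-letters is one match.
        System.out.println(collapse(title, "[^a-zA-Z]+"));  // Worksheet+blah

        // Lazy: each match stops after one character.
        System.out.println(collapse(title, "[^a-zA-Z]+?")); // Worksheet++++blah
    }
}
```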
