Regex expression that keeps upper/lower case characters AND whitespace?


I need to parse some text, and I am doing this with a regular expression inside the replaceAll() method. This is the line where I use it:
String parsedValue = selectedValue.replaceAll("[^A-Za-z]", "");
This is nearly perfect: it removes the numbers from the string, but it also removes the spaces, and I need to keep them. How can I modify it to do this?
For example, "Local Police 101" would become "Local Police".

You're so close! You just need to add a space to the negated character class, so you end up with "[^A-Za-z ]":
String parsedValue = selectedValue.replaceAll("[^A-Za-z ]", "");
Notice the space after the lowercase "z" in your regular expression.
Edit:
Looking at your example, you also want to remove the leftover spaces at the beginning and end of the string. To do this, call .trim() on the result of replaceAll(). You'll end up with something like this:
String parsedValue = selectedValue.replaceAll("[^A-Za-z ]", "").trim();
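As a minimal, self-contained sketch of the answer above, the following wraps the same expression in a helper and runs it on the question's example. The class name KeepLettersAndSpaces and the method name lettersAndSpacesOnly are made up for illustration.

```java
public class KeepLettersAndSpaces {
    // Hypothetical helper: drop everything except ASCII letters and spaces,
    // then trim leading/trailing spaces left behind by the removal.
    static String lettersAndSpacesOnly(String input) {
        return input.replaceAll("[^A-Za-z ]", "").trim();
    }

    public static void main(String[] args) {
        // Without trim() the result would be "Local Police " (trailing space).
        System.out.println("[" + lettersAndSpacesOnly("Local Police 101") + "]"); // [Local Police]
    }
}
```

Note that trim() only affects the ends of the string. Digits removed from the middle, as in "Route 66 North", would still leave two consecutive spaces behind.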

Related

Split String 2 times but with different splits ";" and "."

Original String: "12312123;www.qwerty.com"
With this Model.getList().get(0).split(";")[1]
I get: "www.qwerty.com"
I tried doing this: Model.getList().get(0).split(";")[1].split(".")[1]
But it didn't work; I get an exception. How can I solve this?
I want only "qwerty"

Try this, to achieve "qwerty":
Model.getList().get(0).split(";")[1].split("\\.")[1]
You need to escape the dot symbol.

Try to use split(";|\\.") like this:
for (String string : "12312123;www.qwerty.com".split(";|\\.")) {
System.out.println(string);
}
Output:
12312123
www
qwerty
com

You can split a string which has multiple delimiters. Example below:
String abc = "11;xyz.test.com";
String[] tokens = abc.split(";|\\.");
System.out.println(tokens[tokens.length-2]);

Indexing the result with [1] is what fails here: it throws an ArrayIndexOutOfBoundsException.
This is because splitting on "." doesn't work the way you want it to. In a regular expression an unescaped "." matches any character, so you need to escape the period by writing "\\." instead.

You'd need to escape the ., i.e. "\\.". Period is a special character in regular expressions, meaning "any character".
What your current split means is "split on any character"; this means that it splits the string into a number of empty strings, since there is nothing between consecutive occurrences of "any character".
There is a subtle gotcha in the behaviour of the String.split method, which is that it discards trailing empty strings from the token array (unless you pass a negative number as the second parameter).
Since your entire token array consists of empty strings, all of these are discarded, so the result of the split is a zero-length array. Hence the exception when you try to access one of its elements.
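The behaviour described above can be checked with a short sketch. The class name SplitOnDot and the helper secondLabel are hypothetical; String.split and Pattern.quote are standard library calls.

```java
import java.util.regex.Pattern;

public class SplitOnDot {
    // Hypothetical helper: return the second dot-separated label of a host name.
    static String secondLabel(String host) {
        return host.split("\\.")[1];
    }

    public static void main(String[] args) {
        String host = "www.qwerty.com";

        // Unescaped "." matches any character, so every token is empty
        // and all of them are dropped as trailing empty strings.
        System.out.println(host.split(".").length);             // 0

        // A negative limit keeps the empty strings: 14 characters give 15 empty tokens.
        System.out.println(host.split(".", -1).length);         // 15

        // Escaping the dot splits on literal periods.
        System.out.println(secondLabel(host));                  // qwerty

        // Pattern.quote is another way to treat the delimiter literally.
        System.out.println(host.split(Pattern.quote("."))[1]);  // qwerty
    }
}
```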

Don't use split, use a regular expression (directly). It's safer, and faster.
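// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String input = "12312123;www.qwerty.com";
// Capture the label just before the last dot-separated label
String regex = "([^.;]+)\\.[^.;]+$";
Matcher m = Pattern.compile(regex).matcher(input);
if (m.find()) {
    System.out.println(m.group(1)); // prints: qwerty
}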

Java replace New Lines, Commas, and Spaces at end of String

I am using
mString.replaceAll("[\n,\\s]$", "");
It is not working. What is the correct way to remove newlines, commas, or spaces from the end of a string if they can appear in any order?

Try this
mString = mString.replaceAll("[\n,\\s]+$", "");

There are two reasons your attempt
mString.replaceAll("[\n,\\s]$", "");
doesn't work. First of all, replaceAll does not modify the String instance, because Strings are immutable. It returns the modified string as the result of the method. But the above statement discards the result. So you at least need
mString = mString.replaceAll(...);
The second reason is that the replacement method looks for the pattern in order. If it started over at the beginning of the string after each replacement, then your expression would replace a newline, comma, or whitespace character at the end of the string, and it would keep doing so until there were no more such characters at the end. But it doesn't do things this way (and if it did, it would be far too easy to write replaceAll expressions that loop infinitely).
replaceAll works like this: it searches for the pattern, and if it finds it, it copies all characters before the match to the result. Then it copies the replacement string to the result. Then it resets the matcher to the character after the match. In your case, since the match extends to the end of the input (because of the $), the position after the match is the end of the string, and there can be no more matches. Thus, the matcher can only replace one character. That's why you need to add + to the pattern, as in the other correct answers, like Anubhava's:
mString = mString.replaceAll("[,\\s]+$", "");
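Both points from this answer can be seen in a small sketch. The class name TrimTrailingSeparators and the helper stripTrailing are hypothetical, and the sample inputs are chosen only for illustration.

```java
public class TrimTrailingSeparators {
    // Hypothetical helper: strip any run of commas/whitespace at the end of the string.
    static String stripTrailing(String s) {
        return s.replaceAll("[,\\s]+$", "");
    }

    public static void main(String[] args) {
        String mString = "hello,, ";

        // Strings are immutable: the result is discarded here, so mString is unchanged.
        mString.replaceAll("[,\\s]+$", "");
        System.out.println("[" + mString + "]");                            // [hello,, ]

        // Without "+", only the single character right before the end is removed.
        System.out.println("[" + mString.replaceAll("[,\\s]$", "") + "]");  // [hello,,]

        // With "+", the whole trailing run is removed in one match.
        System.out.println("[" + stripTrailing(mString) + "]");             // [hello]
        System.out.println("[" + stripTrailing("a,b,\n ,\n") + "]");        // [a,b]
    }
}
```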

You can drop \n, since \s already matches newlines. You also need to add the + quantifier so that it matches one or more whitespace characters or commas at the end.
mString = mString.replaceAll("[,\\s]+$", "");

Try mString = mString.replaceAll("(\\n|,|\\s)+$", "");

Validate string that contains template in regex

I have a problem trying to validate this string...
So, the user selects a template: q( ). Then, the user fills in the contents inside the brackets which can end up like this:
q(a,b,c)
I have tried different ways of using regex to validate this String, but it keeps returning the answer "No". I believe the problem is the "q(" and ")" in my regex, as I am not sure what it should look like.
Here's a snippet of the code:
String data2 = "q(a,b,c)";
String regex2 = "q([a-zA-Z0-9,'])";
if(data2.matches(regex2)){
System.out.println("yes");
}
else{
System.out.println("No");
}
I do have an alternative: removing "q(" and ")" from the data2 string. But I would rather handle it in the regex without having to remove characters from the String.
Any suggestions?

You need to escape the parentheses (and escape the backslash itself so that the Java string compiles), and add a + after the character class to allow one or more characters:
String regex2 = "q\\([a-zA-Z0-9,']+\\)";
You can read the meaning of every character in a regular expression in the Pattern javadoc.
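To see why the original pattern printed "No", here is a short sketch comparing the two expressions. The class name TemplateValidation and the helper isValidTemplate are made up for illustration.

```java
public class TemplateValidation {
    // Hypothetical helper wrapping the corrected pattern from the answer.
    static boolean isValidTemplate(String s) {
        return s.matches("q\\([a-zA-Z0-9,']+\\)");
    }

    public static void main(String[] args) {
        // Unescaped parentheses form a capturing group, so the original
        // pattern means "q followed by exactly one allowed character".
        System.out.println("qa".matches("q([a-zA-Z0-9,'])"));        // true
        System.out.println("q(a,b,c)".matches("q([a-zA-Z0-9,'])"));  // false

        // Escaped parentheses match literally.
        System.out.println(isValidTemplate("q(a,b,c)"));  // true
        System.out.println(isValidTemplate("q()"));       // false: "+" needs at least one character
    }
}
```

The corrected pattern only checks which characters appear between the parentheses, so an input such as q(,,,) would also pass. Whether that matters depends on how strict the validation needs to be.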

Regex required to update a character

I have a String : testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing
I want to replace the character s with some other character sequence, say <b>X</b>, but an s that is immediately followed by "<" should remain intact, i.e. the regex should not touch it.
I used this Java code:
String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing";
str = str.replace("s[^<]", "<b>X</b>");
The problem is that the regex matches two characters, the s and the following character if it is not "<", so the replacement would replace both of them. I want only the s to be replaced, not the following character.
Any help would be appreciated. Since I have lots of such replacements, I don't want to use a loop that matches each character and updates it sequentially.

There are other ways, but you could, for example, capture the second character and put it back with $1:
str = str.replaceAll("s([^<])", "<b>X</b>$1");

Looks like you want a negative lookahead:
s(?!<)
String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing;";
System.out.println(str.replaceAll("s(?!<)", "<b>X</b>"));
output:
te<b>X</b>ting<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing;

Use lookarounds to assert, but not capture, surrounding text:
str = str.replaceAll("s(?!<)", "whatever");
Or, capture and put back using a back reference $1:
str = str.replaceAll("s([^<])", "whatever$1");
Note that you need to use replaceAll() (which uses regex), rather than replace() (which uses plain text).
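The two approaches are not fully equivalent, which a small sketch can show. The class name ReplaceUnlessFollowedBy and both helper names are hypothetical, and the input "moss" is just an illustrative edge case.

```java
public class ReplaceUnlessFollowedBy {
    // Lookahead version: replaces every "s" that is not directly followed by "<".
    static String withLookahead(String s) {
        return s.replaceAll("s(?!<)", "<b>X</b>");
    }

    // Capture version: consumes the following character and puts it back via $1.
    static String withCapture(String s) {
        return s.replaceAll("s([^<])", "<b>X</b>$1");
    }

    public static void main(String[] args) {
        // Both agree when each "s" is followed by some other character.
        System.out.println(withLookahead("testing<b>s</b>")); // te<b>X</b>ting<b>s</b>
        System.out.println(withCapture("testing<b>s</b>"));   // te<b>X</b>ting<b>s</b>

        // They differ for consecutive or final "s": the capture version
        // consumes the second "s" as the "following character".
        System.out.println(withLookahead("moss"));            // mo<b>X</b><b>X</b>
        System.out.println(withCapture("moss"));              // mo<b>X</b>s
    }
}
```

The lookahead does not consume the character after the s, so it also handles an s at the very end of the string, which the capture version cannot match.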

How do I remove all punctuation that follows a single word in Java?

I need to remove punctuation following a word. For example, word?! should be changed to word and string: should be changed to string.
Edit: The algorithm should only remove punctuation at the end of the String. Any punctuation within the String should stay. For instance, doesn't; should become doesn't.

Use the method replaceAll(...), which accepts a regular expression. The + after \\p{Punct} lets it remove a run of punctuation such as "?!".
String s = "don't. do' that! ";
s = s.replaceAll("(\\w+)\\p{Punct}+(\\s|$)", "$1$2");
System.out.println(s);

You could use a regex to modify the string.
String resultString = subjectString.replaceAll("([a-z]+)[?:!.,;]*", "$1");
I don't know of any "words" that end with ' used as punctuation, so this regex should work for you.
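For the edited requirement in the question (strip punctuation only at the end of the String and leave the rest alone), one possible sketch anchors the match with $. This is not taken from either answer; the class name StripTrailingPunctuation and its helper are hypothetical, and \p{Punct} here covers ASCII punctuation only.

```java
public class StripTrailingPunctuation {
    // Hypothetical helper: remove a run of ASCII punctuation at the end of the string only.
    static String stripTrailingPunctuation(String word) {
        return word.replaceAll("\\p{Punct}+$", "");
    }

    public static void main(String[] args) {
        System.out.println(stripTrailingPunctuation("word?!"));    // word
        System.out.println(stripTrailingPunctuation("string:"));   // string
        // The apostrophe inside the word is kept; only the trailing ";" goes.
        System.out.println(stripTrailingPunctuation("doesn't;"));  // doesn't
    }
}
```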
