Java Regex: Match a character followed by whitespace?


This is driving me nuts... I have an input string like so:
String input = "T ";
And I'm trying to match and replace the string with something like so:
String output = input.replace("T\\s", "argggghhh");
System.out.println(output); // expected: "argggghhh"
// actual: "T "
What am I doing wrong? Why won't the \\s match the space?
Keep in mind I want to match multiple whitespace characters (\\s+), but I can't get even this simple case to work :(.

Use replaceAll() instead of replace().
replace() does not use regular expressions.
See http://download.oracle.com/javase/6/docs/api/java/lang/String.html#replace(java.lang.CharSequence, java.lang.CharSequence) vs. http://download.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll(java.lang.String, java.lang.String)
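Here is a minimal sketch that contrasts the two calls on the asker's input. The class and method names are only for illustration. replace() searches for the literal text T\s, so nothing changes. replaceAll() compiles its first argument as a regex, and \s+ also covers the multiple-whitespace case.

```java
public final class ReplaceVsReplaceAll {
    // replace() looks for the literal characters T, backslash, s and finds nothing.
    static String literal(String input) {
        return input.replace("T\\s", "argggghhh");
    }

    // replaceAll() compiles its first argument as a regex; \s+ matches
    // one or more whitespace characters after the T.
    static String regex(String input) {
        return input.replaceAll("T\\s+", "argggghhh");
    }

    public static void main(String[] args) {
        System.out.println("[" + literal("T ") + "]");   // [T ]
        System.out.println("[" + regex("T ") + "]");     // [argggghhh]
        System.out.println("[" + regex("T \t\n") + "]"); // [argggghhh]
    }
}
```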

Related

Java Regex - Remove Non-Alphanumeric characters except line breaks

I'm trying to remove all the non-alphanumeric characters from a String in Java but keep the carriage returns. I have the following regular expression, but it keeps joining words before and after a line break.
[^\\p{Alnum}\\s]
How would I be able to preserve the line breaks or convert them into spaces so that I don't have words joining?
An example of this issue is shown below:
Original Text
and refreshingly direct
when compared with the hand-waving of Swinburne.
After Replacement:
and refreshingly directwhen compared with the hand-waving of Swinburne.

Put the line break characters themselves in the negated class instead of \s, since \s matches any whitespace:
String reg = "[^\\p{Alnum}\n\r]";
Or, you may use character class subtraction:
String reg = "[\\P{Alnum}&&[^\n\r]]";
Here, \P{Alnum} matches any non-alphanumeric character, and &&[^\n\r] subtracts LF and CR from that set.
A Java test:
String s = "&&& Text\r\nNew line".replaceAll("[^\\p{Alnum}\n\r]+", "");
System.out.println(s);
// => Text
//    Newline
Note that there are more line break characters than LF and CR. Since Java 8, the \R construct matches any linebreak sequence; it is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029].
So, to avoid removing any of those line break characters, you can use:
String reg = "[^\\p{Alnum}\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]+";
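This small sketch combines both variants above. It assumes the default ASCII meaning of \p{Alnum} (no UNICODE_CHARACTER_CLASS flag), and the class and constant names are made up for illustration. The subtraction form keeps only CR and LF. The longer negated class also keeps the other \R line break characters, such as U+2028.

```java
public final class KeepLineBreaks {
    // Negated class: remove every run of characters that are neither
    // alphanumeric nor one of the line break characters listed for \R.
    static final String KEEP_BREAKS =
        "[^\\p{Alnum}\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]+";

    // Class subtraction: non-alphanumerics (\P{Alnum}) minus CR and LF.
    static final String SUBTRACT_CR_LF = "[\\P{Alnum}&&[^\\n\\r]]+";

    static String clean(String s, String regex) {
        return s.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        // CR LF survives; "&&& " and the space inside "New line" are removed.
        System.out.println(clean("&&& Text\r\nNew line", SUBTRACT_CR_LF));
        // The Unicode LINE SEPARATOR survives only with the longer class.
        System.out.println(clean("&&& Text\u2028New line", KEEP_BREAKS));
    }
}
```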

You can use the regex [^A-Za-z0-9\\n\\r], for example:
String result = str.replaceAll("[^a-zA-Z0-9\\n\\r]", "");
Example
Input
aaze03.aze1654aze987 */-a*azeaze\n hello *-*/zeaze+64\nqsdoi
Output
aaze03aze1654aze987aazeaze
hellozeaze64
qsdoi

I made a mistake with my code. I was reading in a file line by line and building the String, but didn't add a space at the end of each line. Therefore there were no actual line breaks to replace.
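The fix can be sketched as follows. The sketch assumes the text is read through a BufferedReader, with a StringReader standing in for the file, and the class and method names are hypothetical. BufferedReader.lines() drops the line terminators, so the lines are rejoined with "\n" before the regex from the question runs. The asker used a space instead, and either separator keeps the words apart.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.stream.Collectors;

public final class JoinLines {
    // lines() strips each line terminator, so put "\n" back between lines.
    static String readAll(BufferedReader reader) {
        return reader.lines().collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) throws IOException {
        String text = "and refreshingly direct\nwhen compared with the hand-waving of Swinburne.";
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            // The regex from the question keeps letters, digits and whitespace.
            System.out.println(readAll(reader).replaceAll("[^\\p{Alnum}\\s]", ""));
        }
    }
}
```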

That's a perfect case for Guava's CharMatcher:
String input = "and refreshingly direct\n\rwhen compared with the hand-waving of Swinburne.";
String output = CharMatcher.javaLetterOrDigit().or(CharMatcher.whitespace()).retainFrom(input);
Output will be:
and refreshingly direct
when compared with the handwaving of Swinburne

Can't match two different regex on split

So I'm using a String as a delimiter when I call the split method:
String[] aExpr;
String strDelimiter = "[-+/=^//%//*//(//);:?]";
aExpr = expr.split(strDelimiter);
This fills aExpr with the strings broken accordingly with the strDelimiter.
The thing is that I also want split() to break on this pattern, not only on strDelimiter:
String oprDelimiter = "[abcdefghijklmnopqrstuvwxyz0123456789]+";
This is basically any run of lowercase letters and digits. I could add all these characters to the first String, but the + at the end won't let me: the + means that any combination of those characters will break the String. Any ideas how I could do this?

Try using this regex:
(?<=[abcdefghijklmnopqrstuvwxyz0123456789])(?=[-+=^%*();:?])|(?<=[-+=^%*();:?])(?=[abcdefghijklmnopqrstuvwxyz0123456789])
as the delimiter. It will split on any location that is preceded by any of the characters abcdefghijklmnopqrstuvwxyz0123456789 and followed by any of the characters -+=^%*();:?, or vice versa. Explanation and demonstration here: http://regex101.com/r/mT3lL1.
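Here is a quick way to check the lookaround delimiter in Java. The sketch uses the answer's pattern unchanged, and the class and method names are hypothetical. The pattern only matches positions between characters, so split() consumes nothing and the operators come back as separate tokens. A run of operators such as "=(" stays together, because neither alternative matches between two operators.

```java
import java.util.Arrays;

public final class SplitOnBoundary {
    // Zero-width delimiter: letter/digit followed by operator, or operator
    // followed by letter/digit. Nothing is consumed, so operators are kept.
    static final String BOUNDARY =
        "(?<=[abcdefghijklmnopqrstuvwxyz0123456789])(?=[-+=^%*();:?])"
        + "|(?<=[-+=^%*();:?])(?=[abcdefghijklmnopqrstuvwxyz0123456789])";

    static String[] tokenize(String expr) {
        return expr.split(BOUNDARY);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("ab+12*c"))); // [ab, +, 12, *, c]
        System.out.println(Arrays.toString(tokenize("x=(y+1)"))); // [x, =(, y, +, 1, )]
    }
}
```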

Validate string that contains template in regex

I have a problem trying to validate this string...
So, the user selects a template: q( ). Then, the user fills in the contents inside the brackets which can end up like this:
q(a,b,c)
I have tried different ways of validating this String with a regex, but it keeps returning the answer "No". I believe the problem is "q(" and ")" in my regex, as I am not sure what they should look like.
Here's a snippet of the code:
String data2 = "q(a,b,c)";
String regex2 = "q([a-zA-Z0-9,'])";
if(data2.matches(regex2)){
System.out.println("yes");
}
else{
System.out.println("No");
}
I do have an alternative: removing "q(" and ")" from the data2 string. But I would rather do it all in the regex, without removing characters from the String.
Any suggestions?

You need to escape the parentheses (and escape the escape character so that it compiles), and add a + after the character class to allow one or more characters:
String regex2 = "q\\([a-zA-Z0-9,']+\\)";
You can read the meaning of every character in a regular expression in the Pattern javadoc.
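This sketch shows the corrected check with the pattern compiled once, since String.matches() compiles the regex on every call. The class and method names are illustrative. matches() must consume the whole input, so no ^ or $ anchors are needed.

```java
import java.util.regex.Pattern;

public final class TemplateCheck {
    // \( and \) match literal parentheses; + requires at least one allowed character.
    private static final Pattern TEMPLATE = Pattern.compile("q\\([a-zA-Z0-9,']+\\)");

    static boolean isValid(String data) {
        return TEMPLATE.matcher(data).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("q(a,b,c)") ? "yes" : "No"); // yes
        System.out.println(isValid("q()") ? "yes" : "No");      // No
        System.out.println(isValid("q(a b)") ? "yes" : "No");   // No (space not allowed)
    }
}
```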

how to remove only characters in a given string in java?

I am trying to remove only [A-z|a-z] like this:
String input ="A021001208A 711100609C 01111";
String clean = input.replaceAll("\\D+^\\s+","");
System.out.println(clean.toString());
but the above code also removes the spaces; I don't want to remove space.
The expected output is:
021001208 711100609 01111
Please help me format the regex so that it removes only the letters.

Just replace [a-zA-Z] then:
String clean = input.replaceAll("(?i)[A-Z]+","");
(?i) is the embedded flag expression for case-insensitive matching.

Rather than use a positive character class, use a negated one. The regex you want is:
[^\d\s]
Which means "any character other than a digit or a whitespace".
When coded as java, it looks like:
String clean = input.replaceAll("[^\\d\\s]","");

Try this; it will remove every letter from the given string:
String clean = input.replaceAll("[a-zA-Z]", "");

You have to use the [a-zA-Z] character class, so your replaceAll() call will look like this:
String clean = input.replaceAll("[a-zA-Z]","");
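This sketch runs two of the approaches above side by side on the asker's input. The method names are hypothetical. Both produce the expected output here, but they treat other characters differently. [a-zA-Z] removes only ASCII letters, while [^\d\s] also removes punctuation such as "-".

```java
public final class RemoveLetters {
    // Removes ASCII letters only; digits, whitespace and punctuation survive.
    static String removeLetters(String s) {
        return s.replaceAll("[a-zA-Z]+", "");
    }

    // Removes everything that is not a digit or whitespace, punctuation included.
    static String keepDigitsAndWhitespace(String s) {
        return s.replaceAll("[^\\d\\s]+", "");
    }

    public static void main(String[] args) {
        String input = "A021001208A 711100609C 01111";
        System.out.println(removeLetters(input));           // 021001208 711100609 01111
        System.out.println(keepDigitsAndWhitespace(input)); // 021001208 711100609 01111
        System.out.println("[" + removeLetters("A-1 B") + "]");           // [-1 ]
        System.out.println("[" + keepDigitsAndWhitespace("A-1 B") + "]"); // [1 ]
    }
}
```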

Regex required to update a character

I have a String : testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing
I want to replace the character s with some other character sequence, say <b>X</b>, but an s that is followed by "<" must remain intact, i.e. the regex should not replace an s whose next character is "<".
I used this Java code:
String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing";
str = str.replace("s[^<]", "<b>X</b>");
The problem is that the regex matches 2 characters: s, plus the following character if it is not "<". String.replace would then replace both characters. I want only s to be replaced, not the following character.
Any help would be appreciated. I have lots of such replacements, so I don't want to use a loop that matches each character and updates it one at a time.

There are other ways, but you could, for example, capture the second character and put it back with the $1 back reference:
str = str.replaceAll("s([^<])", "<b>X</b>$1");

Looks like you want a negative lookahead:
s(?!<)
String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing;";
System.out.println(str.replaceAll("s(?!<)", "<b>X</b>"));
output:
te<b>X</b>ting<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing;

Use look arounds to assert, but not capture, surrounding text:
str = str.replaceAll("s(?=[^<])", "whatever");
Or, capture and put back using a back reference $1:
str = str.replaceAll("s([^<])", "whatever$1");
Note that you need to use replaceAll() (which uses regex) rather than replace() (which uses plain text).
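This sketch compares the negative lookahead with the capture-and-restore approach on the question's string. The method names are made up. The two give the same result there, but they differ when s is the last character. s([^<]) needs a following character to match, whereas s(?!<) also matches at the end of the input.

```java
public final class ReplaceS {
    static final String WRAPPED = "<b>X</b>";

    // Negative lookahead: an "s" not followed by "<"; only the "s" is consumed.
    static String lookahead(String s) {
        return s.replaceAll("s(?!<)", WRAPPED);
    }

    // Capture group: the next character is consumed too, so $1 writes it back.
    static String captureAndRestore(String s) {
        return s.replaceAll("s([^<])", WRAPPED + "$1");
    }

    public static void main(String[] args) {
        String str = "testing<b>s<b>tringwit<b>h</b>nomean<b>s</b>ing";
        System.out.println(lookahead(str));
        System.out.println(captureAndRestore(str));
        System.out.println(lookahead("bus"));         // bu<b>X</b>
        System.out.println(captureAndRestore("bus")); // bus
    }
}
```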
