Regex for tab doesn't work | Java

I'm trying to use a regex to split a string wherever a tab appears.
I used this:
String line = scan.nextLine(); String[] splitted = line.split("\t");
but it doesn't work, so currently I'm using this (which works for me):
String line = scan.nextLine(); String[] splitted = line.split("\\s\\s\\s\\s");
Do you have any idea why I can't use the "\t" regex?

Yes, \t is a valid regex. In a Java string literal a backslash has a special meaning: "\t" already puts a real tab character into the pattern (and a literal tab matches a tab), while "\\t" passes the regex escape \t through to the regex engine. Either one matches an actual tab, so if split("\t") splits nothing, your input most likely contains no tab at all. Since you are processing user input, you never know what this "tab" really consists of (it could be a tab character or 4 spaces). So you may want to split at (\\t|\\s{2,}) instead; beware, this is written as a Java string literal, hence the double backslashes.
EDIT: In the answer above I assumed you don't want to split at single whitespace characters as well; is that right? If you do want to split at single whitespace characters, you can simply use \\s+ instead.
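A minimal sketch of the difference, using a made-up sample line and hypothetical names (TabSplitDemo, splitOnTabOrSpaces): a real tab is split by both "\t" and "\\t", while a "tab" typed as four spaces is only caught by the alternation from the answer.

```java
import java.util.Arrays;

class TabSplitDemo {
    // Splits on a tab or on any run of two or more whitespace characters
    // (the (\\t|\\s{2,}) pattern suggested above).
    static String[] splitOnTabOrSpaces(String line) {
        return line.split("(\\t|\\s{2,})");
    }

    public static void main(String[] args) {
        String tabbed = "name\tage\tcity";
        // "\t" puts a literal tab into the pattern; "\\t" is the regex escape for a tab.
        System.out.println(Arrays.toString(tabbed.split("\t")));  // [name, age, city]
        System.out.println(Arrays.toString(tabbed.split("\\t"))); // [name, age, city]

        // If the "tab" is really four spaces, a tab-only pattern finds nothing to split on.
        String spaced = "name    age    city";
        System.out.println(Arrays.toString(spaced.split("\t")));          // [name    age    city]
        System.out.println(Arrays.toString(splitOnTabOrSpaces(spaced)));  // [name, age, city]
    }
}
```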

Related

Java Regex - Remove Non-Alphanumeric characters except line breaks

I'm trying to remove all the non-alphanumeric characters from a String in Java but keep the carriage returns. I have the following regular expression, but it keeps joining words before and after a line break.
[^\\p{Alnum}\\s]
How would I be able to preserve the line breaks or convert them into spaces so that I don't have words joining?
An example of this issue is shown below:
Original Text
and refreshingly direct
when compared with the hand-waving of Swinburne.
After Replacement:
and refreshingly directwhen compared with the hand-waving of Swinburne.
You may add just the line break characters to the negated class instead of \s, since \s matches any whitespace:
String reg = "[^\\p{Alnum}\n\r]";
Or, you may use character class subtraction:
String reg = "[\\P{Alnum}&&[^\n\r]]";
Here, \P{Alnum} matches any non-alphanumeric and &&[^\n\r] prevents a LF and CR from matching.
A Java test:
String s = "&&& Text\r\nNew line".replaceAll("[^\\p{Alnum}\n\r]+", "");
System.out.println(s);
// => Text
//    Newline
Note that there are more line break characters than LF and CR. In Java 8, the \R construct matches any Unicode line break sequence; it is equivalent to \u000D\u000A|[\u000A\u000B\u000C\u000D\u0085\u2028\u2029].
So, to avoid removing any of these line break characters, you may use
String reg = "[^\\p{Alnum}\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]+";
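A sketch that puts the patterns from this answer side by side; the class and method names are illustrative only. The negated class and the subtraction form remove exactly the same characters, and the last pattern also keeps the less common line separators such as U+2028.

```java
class KeepLineBreaks {
    // Removes everything except ASCII letters, digits, CR and LF.
    static String keepAlnumAndCrLf(String s) {
        return s.replaceAll("[^\\p{Alnum}\n\r]+", "");
    }

    // Same result via character class subtraction: non-alphanumerics minus CR/LF.
    static String keepAlnumAndCrLfSubtraction(String s) {
        return s.replaceAll("[\\P{Alnum}&&[^\n\r]]+", "");
    }

    // Also preserves every other character that \R treats as a line break.
    static String keepAlnumAndAnyLineBreak(String s) {
        return s.replaceAll("[^\\p{Alnum}\\u000A\\u000B\\u000C\\u000D\\u0085\\u2028\\u2029]+", "");
    }

    public static void main(String[] args) {
        String input = "&&& Text\r\nNew line";
        System.out.println(keepAlnumAndCrLf(input));
        System.out.println(keepAlnumAndCrLfSubtraction(input));
        // U+2028 (LINE SEPARATOR) survives; the comma and exclamation mark do not.
        System.out.println(keepAlnumAndAnyLineBreak("one,\u2028two!"));
    }
}
```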
You can use this regex, [^A-Za-z0-9\\n\\r], for example:
String result = str.replaceAll("[^a-zA-Z0-9\\n\\r]", "");
Example
Input
aaze03.aze1654aze987 */-a*azeaze\n hello *-*/zeaze+64\nqsdoi
Output
aaze03aze1654aze987aazeaze
hellozeaze64
qsdoi
I made a mistake with my code. I was reading in a file line by line and building the String, but didn't add a space at the end of each line. Therefore there were no actual line breaks to replace.
That's a perfect case for Guava's CharMatcher:
String input = "and refreshingly direct\n\rwhen compared with the hand-waving of Swinburne.";
String output = CharMatcher.javaLetterOrDigit().or(CharMatcher.whitespace()).retainFrom(input);
Output will be:
and refreshingly direct
when compared with the handwaving of Swinburne

Can't match two different regex on split

So I'm using a String as the delimiter when I call the split() method.
String[] aExpr;
String strDelimiter = "[-+/=^//%//*//(//);:?]";
aExpr = expr.split(strDelimiter);
This fills aExpr with the strings broken accordingly with the strDelimiter.
The thing is that I also want split() to break the string not only on strDelimiter, but also on this pattern:
String oprDelimiter = "[abcdefghijklmnopqrstuvwxyz0123456789]+";
That is, any run of lowercase letters and digits. I could add all these characters to the first pattern, but the + at the end won't let me: the + means that any combination of these characters should break the string. Any ideas how I could do this?
Try using this regex:
(?<=[abcdefghijklmnopqrstuvwxyz0123456789])(?=[-+=^%*();:?])|(?<=[-+=^%*();:?])(?=[abcdefghijklmnopqrstuvwxyz0123456789])
as the delimiter. It will split on any location that is preceded by any of the characters abcdefghijklmnopqrstuvwxyz0123456789 and followed by any of the characters -+=^%*();:?, or vice versa. Explanation and demonstration here: http://regex101.com/r/mT3lL1.
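A minimal sketch of that delimiter in use, with hypothetical names (OperatorSplit, tokenize). [a-z0-9] is written here as shorthand for the abcdefghijklmnopqrstuvwxyz0123456789 class; it matches the same characters. Because the lookarounds are zero-width, nothing is consumed, so the operators are kept as tokens and a run like ab12 stays in one piece.

```java
import java.util.Arrays;

class OperatorSplit {
    // Matches the empty position between an operand character and an operator
    // character, in either order. Two adjacent operators have no split point.
    private static final String DELIMITER =
        "(?<=[a-z0-9])(?=[-+=^%*();:?])|(?<=[-+=^%*();:?])(?=[a-z0-9])";

    static String[] tokenize(String expr) {
        return expr.split(DELIMITER);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("a+b*3")));   // [a, +, b, *, 3]
        System.out.println(Arrays.toString(tokenize("ab12+c")));  // [ab12, +, c]
        System.out.println(Arrays.toString(tokenize("x=(y+2)"))); // [x, =(, y, +, 2, )]
    }
}
```

The last line shows the design trade-off: "=(" comes back as a single token, since the pattern only splits between an operand and an operator.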

Splitting a string with java not working

I have a string in java that looks something like:
holdingco^(218) 333-4444^scott#holdingco.com
I set a string variable equal to it:
String value = "holdingco^(218) 333-4444^scott#holdingco.com";
Then I want to split this string into its components:
String[] components = value.split("^");
However, it does not split the string. I have tried escaping the caret delimiter, to no avail.
Use
String[] components = value.split("\\^");
The unescaped ^ means beginning of a string in a regex, and the unescaped $ means end. You have to use two backslashes for escaping, as the string literal "\\" represents a single backslash, and that's what regex needs.
If you tried escaping with one backslash, it didn't compile, as \^ is not a valid escape sequence in Java.
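A short sketch comparing the unescaped and escaped forms, plus Pattern.quote as an alternative that escapes the delimiter for you; the class and method names are hypothetical. The single-element result for the unescaped "^" assumes Java 8 or later, where a zero-width match at the start of the input does not produce a leading empty string.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CaretSplit {
    // "\\^" in the literal becomes \^ in the regex, which matches a literal caret.
    static String[] splitOnCaret(String value) {
        return value.split("\\^");
    }

    // Pattern.quote wraps the text in \Q...\E so every character is taken literally.
    static String[] splitOnCaretQuoted(String value) {
        return value.split(Pattern.quote("^"));
    }

    public static void main(String[] args) {
        String value = "holdingco^(218) 333-4444^scott#holdingco.com";

        // Unescaped ^ is the start-of-input anchor: the string is not split at all.
        System.out.println(value.split("^").length); // 1

        System.out.println(Arrays.toString(splitOnCaret(value)));
        System.out.println(Arrays.equals(splitOnCaret(value), splitOnCaretQuoted(value))); // true
    }
}
```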
try with: value.split("\\^"); this should work a bit better

java replaceAll and '+' match

I have some code set up to remove extra spaces between the words of a title:
String formattedString = unformattedString.replaceAll(" +"," ");
My understanding of this type of regex is that it will match as many spaces as possible before stopping. However, my strings that are coming out are not changing in any way. Is it possible that it's only matching one space at a time, and then replacing that with a space? Is there something to the replaceAll method, since it's doing multiple matches, that would alter the way this type of match would work here?
A better approach might be to use "\\s+" to match runs of all possible whitespace characters.
EDIT
Another approach might be to extract all matches for "\\b([A-Za-z0-9]+)\\b" and then join them using a space which would allow you to remove everything except for valid words and numbers.
If you need to preserve punctuation, use "(\\S+)" which will capture all runs of non-whitespace characters.
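A sketch of the extract-and-join idea, assuming the matches are collected with Matcher.find() and joined with a StringJoiner; the helper name joinMatches and the sample title are made up for illustration.

```java
import java.util.StringJoiner;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordJoiner {
    // Collects capture group 1 of every match and joins the pieces with single spaces.
    static String joinMatches(String input, String regex) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringJoiner joiner = new StringJoiner(" ");
        while (m.find()) {
            joiner.add(m.group(1));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String title = "  Hello,   World!\t 42 ";
        // Letters and digits only: punctuation is dropped.
        System.out.println(joinMatches(title, "\\b([A-Za-z0-9]+)\\b")); // Hello World 42
        // Runs of non-whitespace: punctuation is kept.
        System.out.println(joinMatches(title, "(\\S+)"));               // Hello, World! 42
    }
}
```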
Are you sure your string contains spaces and not tabs? The following is a bit more "aggressive" about whitespace.
String formattedString = unformattedString.replaceAll("\\s+"," ");
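replaceAll(" +", " ") is greedy and collapses a whole run of spaces in a single match, so if the output does not change, the gaps probably contain something other than the ordinary space character. A sketch with a made-up title containing a tab (class and method names are illustrative):

```java
class CollapseSpaces {
    // Collapses runs of the space character (U+0020) only; tabs and newlines are untouched.
    static String collapseSpaces(String s) {
        return s.replaceAll(" +", " ");
    }

    // Collapses runs of any ASCII whitespace: space, tab, newline, CR, form feed, vertical tab.
    static String collapseWhitespace(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        System.out.println(collapseSpaces("a     b"));                  // a b
        String title = "The  \t Title";
        System.out.println(collapseSpaces(title).contains("\t"));      // true: the tab survives
        System.out.println(collapseWhitespace(title));                 // The Title
    }
}
```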
All of the responses should work.
Both:
String formattedString = unformattedString.replaceAll(" +"," ");
or
String formattedString = unformattedString.replaceAll("\\s+"," ");
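Maybe your unformattedString spans multiple lines. In that case you can instantiate a Pattern object (strictly, \s already matches \r and \n, and the MULTILINE flag only changes how ^ and $ behave, so plain replaceAll("\\s+", " ") gives the same result, as the last line shows):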
String unformattedString = " Hello \n\r\n\r\n\r World";
Pattern manySpacesPattern = Pattern.compile("\\s+",Pattern.MULTILINE);
Matcher formatMatcher = manySpacesPattern.matcher(unformattedString);
String formattedString = formatMatcher.replaceAll(" ");
System.out.println(unformattedString.replaceAll("\\s+", " "));
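Or, if unformattedString contains special characters, you can experiment with the flags passed to Pattern.compile. For \s the relevant flag is UNICODE_CHARACTER_CLASS (Java 7+), which makes it match Unicode whitespace such as the no-break space U+00A0; UNICODE_CASE only affects case-insensitive matching.
Examples:
Pattern.compile("\\s+",Pattern.MULTILINE|Pattern.UNIX_LINES);
or
Pattern.compile("\\s+",Pattern.MULTILINE|Pattern.UNICODE_CHARACTER_CLASS);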
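By default \s covers only ASCII whitespace, so a no-break space (U+00A0, common in text copied from web pages) is not collapsed. A minimal sketch, assuming Java 7 or later for Pattern.UNICODE_CHARACTER_CLASS; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class UnicodeWhitespace {
    // Default \s: [ \t\n\x0B\f\r] only.
    private static final Pattern ASCII_WS = Pattern.compile("\\s+");
    // With UNICODE_CHARACTER_CLASS, \s follows the Unicode White_Space property.
    private static final Pattern UNICODE_WS =
        Pattern.compile("\\s+", Pattern.UNICODE_CHARACTER_CLASS);

    static String collapseAscii(String s) {
        return ASCII_WS.matcher(s).replaceAll(" ");
    }

    static String collapseUnicode(String s) {
        return UNICODE_WS.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        String title = "Hello\u00A0\u00A0World";
        System.out.println(collapseAscii(title).equals(title)); // true: NBSP is left alone
        System.out.println(collapseUnicode(title));             // Hello World
    }
}
```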

Java Regex: Match a character followed by whitespace?

This is driving me nuts... I have an input string like so:
String input = "T ";
And I'm trying to match and replace the string with something like so:
String output = input.replace("T\\s", "argggghhh");
System.out.println(output); // expected: "argggghhh"
// actual: "T "
What am I doing wrong? Why won't the \\s match the space?
Keep in mind I want to match multiple white space characters (\\s+), but I can't get this simple case to work :(.
Use replaceAll() instead of replace().
replace() does not use regular expressions.
See http://download.oracle.com/javase/6/docs/api/java/lang/String.html#replace(java.lang.CharSequence, java.lang.CharSequence) vs. http://download.oracle.com/javase/6/docs/api/java/lang/String.html#replaceAll(java.lang.String, java.lang.String)
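A small sketch of the difference, reusing the "T " input from the question; the class and method names are illustrative. replace() looks for the literal three characters T, backslash, s, while replaceAll() compiles its first argument as a regex.

```java
class ReplaceVsReplaceAll {
    // Literal replacement: searches for the text "T\s", which "T " does not contain.
    static String viaReplace(String input) {
        return input.replace("T\\s", "argggghhh");
    }

    // Regex replacement: \s+ matches one or more whitespace characters after T.
    static String viaReplaceAll(String input) {
        return input.replaceAll("T\\s+", "argggghhh");
    }

    public static void main(String[] args) {
        System.out.println(viaReplace("T "));      // prints "T " unchanged
        System.out.println(viaReplaceAll("T "));   // argggghhh
        System.out.println(viaReplaceAll("T \t")); // argggghhh
    }
}
```

One caveat: replaceAll() also treats $ and \ in the replacement string specially; Matcher.quoteReplacement() escapes them when the replacement is arbitrary text.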
