String.split() at a meta character + - java

I'm making a simple program that works on equations read from a String input.
When I run it, however, I get an exception when I try to replace each "+" with " +" so that I can split the string at the spaces. How should I use
the String replaceAll method to replace these special characters? Below are the exception and my code.
Exception in thread "main" java.util.regex.PatternSyntaxException: Dangling meta character '+' near index 0
+
^
public static void parse(String x){
String z = "x^2+2=2x-1";
String[] lrside = z.split("=",4);
System.out.println("Left side: " + lrside[0] + " / Right Side: " + lrside[1]);
String rightside = lrside[0];
String leftside = lrside[1];
rightside.replaceAll("-", " -");
rightside.replaceAll("+", " +");
leftside.replaceAll("-", " -"); leftside.replaceAll("+", " +");
List<String> rightt = Arrays.asList(rightside.split(" "));
List<String> leftt = Arrays.asList(leftside.split(" "));
System.out.println(leftt);
System.out.println(rightt);

replaceAll accepts a regular expression as its first argument.
+ is a special character which denotes a quantifier meaning one or more occurrences. Therefore it should be escaped to specify the literal character +:
rightside = rightside.replaceAll("\\+", " +");
(Strings are immutable so it is necessary to assign the variable to the result of replaceAll);
An alternative to this is to use a character class which removes the metacharacter status:
rightside = rightside.replaceAll("[+]", " +");
The simplest solution though would be to use the replace method which uses non-regex String literals:
rightside = rightside.replace("+", " +");
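The options above can be put side by side in a minimal sketch. The class name `EscapePlusDemo` and the helper `splitTerms` are made up for illustration and are not part of the question's code; `Pattern.quote` is a standard `java.util.regex` method that escapes an arbitrary literal and is shown as one more alternative.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class EscapePlusDemo {

    // Hypothetical helper: puts a space before every '+' and '-' and then
    // splits on spaces, so "x^2+2" becomes [x^2, +2].
    static List<String> splitTerms(String side) {
        String spaced = side
                .replace("+", " +")   // replace() takes literal text, not a regex
                .replace("-", " -")
                .trim();              // drops the space created by a leading sign
        return Arrays.asList(spaced.split(" "));
    }

    public static void main(String[] args) {
        String s = "x^2+2";
        // Each call returns a new String; the original is never modified.
        System.out.println(s.replaceAll("\\+", " +"));               // escaped metacharacter
        System.out.println(s.replaceAll("[+]", " +"));               // character class
        System.out.println(s.replaceAll(Pattern.quote("+"), " +"));  // quoted literal
        System.out.println(s.replace("+", " +"));                    // no regex at all

        System.out.println(splitTerms("x^2+2"));
        System.out.println(splitTerms("2x-1"));
    }
}
```

All four replacement calls yield the same string; `replace` is usually the easiest to read when no pattern matching is needed.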

I had a similar problem with regex = "?". It happens for every special character that has a meaning in a regex, so you need to prefix such a character with "\\" in your regex.
rightside = rightside.replaceAll("\\+", " +");

String#replaceAll expects a regex as input, and + on its own is not a valid pattern; \\+ is. Remember to assign the result: rightside = rightside.replaceAll("\\+", " +");

The reason behind this is that some characters are reserved in regex. When you split on one of them with Java's split() method, you have to escape it.
For example, to split by +, * or dot (.), write split("\\+"), split("\\*") or split("\\.") as needed.
I explain this at length because you may face it in other places too.
For example, the same issue occurs with the replaceAll and replaceFirst methods, because they also take a regex. (String.replace does not: it treats its argument as literal text.)
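A short sketch of splitting on reserved characters. The class `SplitLiteralDemo` and the helper `splitLiteral` are illustrative names only; `Pattern.quote` is a standard method that can stand in for manual escaping when the delimiter is not known in advance.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitLiteralDemo {

    // Hypothetical helper: splits on a delimiter taken literally,
    // whatever regex metacharacters it happens to contain.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // Manual escaping: each backslash is doubled inside the Java string literal.
        System.out.println(Arrays.toString("1+2+3".split("\\+")));
        System.out.println(Arrays.toString("2*3*4".split("\\*")));
        System.out.println(Arrays.toString("a.b.c".split("\\.")));

        // Same result without escaping by hand.
        System.out.println(Arrays.toString(splitLiteral("a.b.c", ".")));
    }
}
```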

Related

Regex display with arrays

So I have a regex question. When running this code
if (str1.trim().contains(search2)){
String str3 = str1;
str3 = str3.replaceAll("[^-?0-9]+", " ");
System.out.println("location: " + Arrays.asList(str3.trim().split(" ")));
System.out.println(" ");
}
it produces
location: [290, -70]
Is it possible to replace the bracketed form "[x, x]" with "x x", so that only the values are shown, in quotes?
location: "290 -70"
I'm kind of new to regex, so I tried some things like .replace("[", " "); but they did not work.
EDIT ----
Here's my entire code.
public static void main (String [] args) throws IOException {
BufferedReader in = new BufferedReader (new FileReader ("/Users/Dannybwee/Documents/workspace/csc199/src/csc199/test.txt"));
String str;
List<String> finallist = new ArrayList<String>();
while ((str = in.readLine()) != null){
finallist.add(str);
}
String search = "node";
String search2 = "position";
for (String str1: finallist) {
if (str1.trim().contains(search)){
System.out.print("{ key " + str1+ ",\n" +
"name: " + str1 + ",\n" +
"Truth: 'Tainted'," + "\n" +
"False: 'NotTainted, \n");
}
if (str1.trim().contains(search2)){
String str3 = str1;
str3 = str3.replaceAll("[^-?0-9]+", " ");
System.out.println("location: " + Arrays.asList(str3.trim().split(" ")));
System.out.println("}");
}
}
}
What I'm trying to do is take a text file and change the formatting of its text. I thought it would be easiest to read the file and scan for what needs to change. For instance, all I want is to change the brackets output above to braces.
So basically I want it to output location: "290 -70" instead of location: [290, -70], without the comma and brackets.
I'm splitting because the line is positions = (number number); what I'm trying to do is just extract the numbers from that line.
Then if you split, you get ["(number", "number)"].
You want to remove the round brackets, not the square ones. And you have already done that: [^-?0-9]+ matches one or more characters other than 0-9, - and ?, and replaceAll swaps each such run for a single space.
You don't need to split anything.
if (str1.trim().contains(search2)){
str1 = str1.replaceAll("[^-?0-9]+", " ").trim();
System.out.println("location: \"" + str1 + "\"");
System.out.println("}");
}
You could also forget the regex entirely and use str1.substring(1, str1.length() - 1)
By the way, if you are trying to produce JSON, it isn't valid. The keys need to be quoted
You can specify a literal bracket with the backslash escape character: \[. The same applies to the other characters that have a special meaning in regex:
\\ , \. , \( ... etc
Note that in a Java string literal the backslash itself must be escaped, so you need two backslashes for each backslash in the regex:
\\[, \\\\, \\., \\( ... etc
You can implement this into your existing code, or you could make your life a little easier by using a pattern matcher.
Pattern p = Pattern.compile("\\D+?(-?\\d++)\\D+?(-?\\d++)\\D*");
Matcher m = p.matcher(STRING);
if (m.matches()) { // group() throws IllegalStateException unless a match has succeeded
String results = "location: "+m.group(1)+" "+m.group(2);
}
\\D+? eliminates non-digit (0-9) characters reluctantly, this will spare the '-' when found.
(-?\\d++) will capture m.group(n) which will possessively contain as many digits as it can find in a row. Since the '-' was spared earlier it should be present for this capture if at all.
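As an alternative sketch, the numbers can also be collected with `find()` in a loop and the simpler pattern `-?\\d+`, which is a swapped-in technique and not the two-group pattern described above. The input line `position = (290 -70);` is an assumption based on the format the question describes, and `ExtractNumbersDemo` and `extractNumbers` are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ExtractNumbersDemo {

    // An optional minus sign followed by one or more digits.
    private static final Pattern SIGNED_INT = Pattern.compile("-?\\d+");

    // Hypothetical helper: returns every signed integer found in the line, in order.
    static List<String> extractNumbers(String line) {
        List<String> numbers = new ArrayList<>();
        Matcher m = SIGNED_INT.matcher(line);
        while (m.find()) {          // group() is only valid after find() returns true
            numbers.add(m.group());
        }
        return numbers;
    }

    public static void main(String[] args) {
        String line = "position = (290 -70);";   // assumed input format
        System.out.println("location: \"" + String.join(" ", extractNumbers(line)) + "\"");
    }
}
```

Unlike the two-group pattern, this version does not care how many numbers the line contains, which may or may not be what you want.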

What does regex "\\p{Z}" mean?

I am working with some Java code that has a statement like
String tempAttribute = ((String) attributes.get(i)).replaceAll("\\p{Z}","")
I am not used to regex, so what does it mean? (If you could point me to a website for learning the basics of regex, that would be wonderful.) I've seen that a string like
ept as y gets transformed into eptasy, but this doesn't seem right. I believe the person who wrote this perhaps wanted to trim leading and trailing spaces.
It removes all the whitespace (replaces all whitespace matches with empty strings).
A wonderful regex tutorial is available at regular-expressions.info.
A citation from this site:
\p{Z} or \p{Separator}: any kind of whitespace or invisible separator.
The OP stated that the code fragment was in Java. To comment on the statement:
\p{Z} or \p{Separator}: any kind of whitespace or invisible separator.
the sample code below shows that this does not apply in Java.
public static void main(String[] args) {
// some normal white space characters
String str = "word1 \t \n \f \r " + '\u000B' + " word2";
// various regex patterns meant to remove ALL white spaces
String s = str.replaceAll("\\s", "");
String p = str.replaceAll("\\p{Space}", "");
String b = str.replaceAll("\\p{Blank}", "");
String z = str.replaceAll("\\p{Z}", "");
// \\s removed all white spaces
System.out.println("s [" + s + "]\n");
// \\p{Space} removed all white spaces
System.out.println("p [" + p + "]\n");
// \\p{Blank} removed only \t and spaces not \n\f\r
System.out.println("b [" + b + "]\n");
// \\p{Z} removed only spaces not \t\n\f\r
System.out.println("z [" + z + "]\n");
// NOTE: \p{Separator} throws a PatternSyntaxException
try {
String t = str.replaceAll("\\p{Separator}","");
System.out.println("t [" + t + "]\n"); // N/A
} catch ( Exception e ) {
System.out.println("throws " + e.getClass().getName() +
" with message\n" + e.getMessage());
}
} // public static void main
The output for this is:
s [word1word2]
p [word1word2]
b [word1
word2]
z [word1
word2]
throws java.util.regex.PatternSyntaxException with message
Unknown character property name {Separator} near index 12
\p{Separator}
^
This shows that in Java \\p{Z} removes only spaces and not "any kind of whitespace or invisible separator".
These results also show that in Java \\p{Separator} throws a PatternSyntaxException.
First of all, \p means you are going to match a class, that is, a collection of characters, not a single one. For reference, this is from the Javadoc of the Pattern class: https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
Unicode scripts, blocks, categories and binary properties are written with the \p and \P constructs as in Perl. \p{prop} matches if the input has the property prop, while \P{prop} does not match if the input has that property.
Then Z is the name of a class (collection, set) of characters. In this case it is the abbreviation of Separator. Separator contains 3 subclasses: Space_Separator (Zs), Line_Separator (Zl) and Paragraph_Separator (Zp).
To see which characters those classes contain, refer to the Unicode Character Database or
Unicode Character Categories
More documentation: http://www.unicode.org/reports/tr18/#General_Category_Property
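To make the difference between the Separator category and ordinary whitespace concrete, here is a small sketch. The class and method names are invented for illustration, and the sample string is an assumption chosen to contain one character of each kind. It relies on the default behaviour of `\s` in `java.util.regex`, which covers only space, tab, line feed, vertical tab, form feed and carriage return.

```java
public class SeparatorCategoryDemo {

    // Removes characters in Unicode general category Z (Zs, Zl, Zp).
    static String removeSeparators(String s) {
        return s.replaceAll("\\p{Z}", "");
    }

    // Removes the default \s set: space, \t, \n, \u000B, \f, \r.
    static String removeAsciiWhitespace(String s) {
        return s.replaceAll("\\s", "");
    }

    // Removes both sets in one pass.
    static String removeBoth(String s) {
        return s.replaceAll("[\\p{Z}\\s]", "");
    }

    public static void main(String[] args) {
        // space (Zs), no-break space (Zs), line separator (Zl),
        // then tab and line feed, which are control characters (Cc)
        String s = "a b\u00A0c\u2028d\te\nf";

        System.out.println(removeSeparators(s).equals("abcd\te\nf"));               // tab and \n survive
        System.out.println(removeAsciiWhitespace(s).equals("ab\u00A0c\u2028def"));  // NBSP and U+2028 survive
        System.out.println(removeBoth(s));
    }
}
```

If the intent of the original statement was only to trim the ends of the string, neither pattern is a good fit, since both remove matches anywhere in the text.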

Regex for special characters in java

public static final String specialChars1= "\\W\\S";
String str2 = str1.replaceAll(specialChars1, "").replace(" ", "+");
public static final String specialChars2 = "`~!##$%^&*()_+[]\\;\',./{}|:\"<>?";
String str2 = str1.replaceAll(specialChars2, "").replace(" ", "+");
Whatever str1 is I want all the characters other than letters and numbers to be removed, and spaces to be replaced by a plus sign (+).
My problem is that if I use specialChars1, it does not remove some characters like ;, ' and ", and if I use specialChars2 it gives me an error:
java.util.regex.PatternSyntaxException: Syntax error U_REGEX_MISSING_CLOSE_BRACKET near index 32:
How can this be achieved? I have searched but could not find a perfect solution.
This worked for me:
String result = str.replaceAll("[^\\dA-Za-z ]", "").replaceAll("\\s+", "+");
For this input string:
/-+!##$%^&())";:[]{}\ |wetyk 678dfgh
It yielded this result:
+wetyk+678dfgh
replaceAll expects a regex:
public static final String specialChars2 = "[`~!##$%^&*()_+[\\]\\\\;\',./{}|:\"<>?]";
The problem with your first regex is that "\W\S" means: find a sequence of two characters, the first of which is not a word character (letter, digit or underscore), followed by a character which is not whitespace.
What you mean is "[^\w\s]", which means: find a single character that is neither a word character nor whitespace. (We can't use "[\W\S]", as this means find a character which is not a word character OR is not whitespace, which is essentially every character.)
The second regex is a problem because you are trying to use reserved characters without escaping them. You can enclose them in [] where most characters (not all) do not have special meanings, but the whole thing would look very messy and you have to check that you haven't missed out any punctuation.
Example:
String sequence = "qwe 123 :#~ ";
String withoutSpecialChars = sequence.replaceAll("[^\\w\\s]", "");
String spacesAsPluses = withoutSpecialChars.replaceAll("\\s", "+");
System.out.println("without special chars: '"+withoutSpecialChars+ '\'');
System.out.println("spaces as pluses: '"+spacesAsPluses+'\'');
This outputs:
without special chars: 'qwe 123 '
spaces as pluses: 'qwe+123++'
If you want to collapse multiple spaces into one + then use "\s+" as your regex instead (remember to escape the backslash in the Java string literal: "\\s+").
I had a similar problem to solve and I used following method:
text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
Code with time benchmarking
public static String cleanPunctuations(String text) {
return text.replaceAll("\\p{Punct}+", "").replaceAll("\\s+", "+");
}
public static void test(String in){
long t1 = System.currentTimeMillis();
String out = cleanPunctuations(in);
long t2 = System.currentTimeMillis();
System.out.println("In=" + in + "\nOut="+ out + "\nTime=" + (t2 - t1)+ "ms");
}
public static void main(String[] args) {
String s1 = "My text with 212354 digits spaces and \n newline \t tab " +
"[`~!##$%^&*()_+[\\\\]\\\\\\\\;\\',./{}|:\\\"<>?] special chars";
test(s1);
String s2 = "\"Sample Text=\" with - minimal \t punctuation's";
test(s2);
}
Sample Output
In=My text with 212354 digits spaces and
newline tab [`~!##$%^&*()_+[\\]\\\\;\',./{}|:\"<>?] special chars
Out=My+text+with+212354+digits+spaces+and+newline+tab+special+chars
Time=4ms
In="Sample Text=" with - minimal punctuation's
Out=Sample+Text+with+minimal+punctuations
Time=0ms
you can use a regex like this:
[<#![CDATA[¢<(+|!$*);¬/¦,%_>?:#="~{#}\]]]#>]`
remove "#" at first and at end from expression
regards
@npinti
"\w" is almost the same as "\dA-Za-z", except that \w also matches the underscore (_).
This worked for me:
String result = str.replaceAll("[^\\w ]", "").replaceAll("\\s+", "+");
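A quick sketch of how the two character classes differ in practice: by default `\w` in Java is `[a-zA-Z_0-9]`, so it keeps underscores, while the explicit class `\dA-Za-z` does not. The class name, method names and sample input below are assumptions made for the example.

```java
public class WordClassDemo {

    // Keeps letters, digits, underscores and spaces, then turns whitespace runs into '+'.
    static String cleanWithWordClass(String s) {
        return s.replaceAll("[^\\w ]", "").replaceAll("\\s+", "+");
    }

    // Keeps only letters, digits and spaces, then turns whitespace runs into '+'.
    static String cleanWithExplicitClass(String s) {
        return s.replaceAll("[^\\dA-Za-z ]", "").replaceAll("\\s+", "+");
    }

    public static void main(String[] args) {
        String input = "snake_case & more!";
        System.out.println(cleanWithWordClass(input));      // underscore is kept
        System.out.println(cleanWithExplicitClass(input));  // underscore is removed
    }
}
```

If underscores count as special characters for your purposes, the explicit class is the safer choice.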

How to remove newlines from beginning and end of a string?

I have a string that contains some text followed by a blank line. What's the best way to keep the part with text, but remove the whitespace newline from the end?
Use String.trim() method to get rid of whitespaces (spaces, new lines etc.) from the beginning and end of the string.
String trimmedString = myString.trim();
String.replaceAll("[\n\r]", "");
This Java code does exactly what is asked in the title of the question, that is "remove newlines from beginning and end of a string-java":
String.replaceAll("^[\n\r]", "").replaceAll("[\n\r]$", "")
Remove newlines only from the end of the line:
String.replaceAll("[\n\r]$", "")
Remove newlines only from the beginning of the line:
String.replaceAll("^[\n\r]", "")
tl;dr
String cleanString = dirtyString.strip() ; // Call new `String::strip` method.
String::strip…
The old String::trim method has a strange definition of whitespace.
As discussed here, Java 11 adds new strip… methods to the String class. These use a more Unicode-savvy definition of whitespace. See the rules of this definition in the class JavaDoc for Character::isWhitespace.
Example code.
String input = " some Thing ";
System.out.println("before->>"+input+"<<-");
input = input.strip();
System.out.println("after->>"+input+"<<-");
Or you can strip just the leading or just the trailing whitespace.
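For instance, a small sketch comparing the three strip methods with trim. It assumes Java 11 or later; the class name `StripDemo` and the sample text are made up for illustration. The input starts with a line feed and an EM SPACE (U+2003), a character that `Character.isWhitespace` accepts but that `trim` leaves alone because its code point is above U+0020.

```java
public class StripDemo {

    // Sample text: line feed + EM SPACE + space in front, space + line feed behind.
    static final String INPUT = "\n\u2003 some Thing \n";

    public static void main(String[] args) {
        System.out.println("strip         ->>" + INPUT.strip() + "<<-");
        System.out.println("stripLeading  ->>" + INPUT.stripLeading() + "<<-");
        System.out.println("stripTrailing ->>" + INPUT.stripTrailing() + "<<-");
        // trim() only removes code points up to U+0020, so the EM SPACE stays.
        System.out.println("trim          ->>" + INPUT.trim() + "<<-");
    }
}
```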
You do not mention exactly what code point(s) make up your newlines. I imagine your newline is likely included in this list of code points targeted by strip:
It is a Unicode space character (SPACE_SEPARATOR, LINE_SEPARATOR, or PARAGRAPH_SEPARATOR) but is not also a non-breaking space ('\u00A0', '\u2007', '\u202F').
It is '\t', U+0009 HORIZONTAL TABULATION.
It is '\n', U+000A LINE FEED.
It is '\u000B', U+000B VERTICAL TABULATION.
It is '\f', U+000C FORM FEED.
It is '\r', U+000D CARRIAGE RETURN.
It is '\u001C', U+001C FILE SEPARATOR.
It is '\u001D', U+001D GROUP SEPARATOR.
It is '\u001E', U+001E RECORD SEPARATOR.
It is '\u001F', U+0
If your string is potentially null, consider using StringUtils.trim() - the null-safe version of String.trim().
If you only want to remove line breaks (not spaces or tabs) at the beginning and end of a String (not in between), then you can use this approach:
Use a regular expression to remove carriage returns (\\r) and line feeds (\\n) from the beginning (^) and end ($) of a string:
s = s.replaceAll("(^[\\r\\n]+|[\\r\\n]+$)", "")
Complete Example:
public class RemoveLineBreaks {
public static void main(String[] args) {
var s = "\nHello world\nHello everyone\n";
System.out.println("before: >"+s+"<");
s = s.replaceAll("(^[\\r\\n]+|[\\r\\n]+$)", "");
System.out.println("after: >"+s+"<");
}
}
It outputs:
before: >
Hello world
Hello everyone
<
after: >Hello world
Hello everyone<
I'm going to add an answer to this as well because, while I had the same question, the provided answer did not suffice. Given some thought, I realized that this can be done very easily with a regular expression.
To remove newlines from the beginning:
// Trim left
String[] a = "\n\nfrom the beginning\n\n".split("^\\n+", 2);
System.out.println("-" + (a.length > 1 ? a[1] : a[0]) + "-");
and end of a string:
// Trim right
String z = "\n\nfrom the end\n\n";
System.out.println("-" + z.split("\\n+$", 2)[0] + "-");
I'm certain that this is not the most performance efficient way of trimming a string. But it does appear to be the cleanest and simplest way to inline such an operation.
Note that the same method can be done to trim any variation and combination of characters from either end as it's a simple regex.
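Try this (note that it is JavaScript, not Java, and that it removes every line break in the string, not only the leading and trailing ones):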
function replaceNewLine(str) {
return str.replace(/[\n\r]/g, "");
}
String trimStartEnd = "\n TestString1 linebreak1\nlinebreak2\nlinebreak3\n TestString2 \n";
System.out.println("Original String : [" + trimStartEnd + "]");
System.out.println("-----------------------------");
System.out.println("Result String : [" + trimStartEnd.replaceAll("^(\\r\\n|[\\n\\x0B\\x0C\\r\\u0085\\u2028\\u2029])|(\\r\\n|[\\n\\x0B\\x0C\\r\\u0085\\u2028\\u2029])$", "") + "]");
Start of a string = ^ ,
End of a string = $ ,
regex combination = | ,
Linebreak = \r\n|[\n\x0B\x0C\r\u0085\u2028\u2029]
Another elegant solution.
String myString = "\nLogbasex\n";
myString = org.apache.commons.lang3.StringUtils.strip(myString, "\n");
For anyone else looking for answer to the question when dealing with different linebreaks:
string.replaceAll("(\n|\r|\r\n)$", ""); // Java 7
string.replaceAll("\\R$", ""); // Java 8
This should remove exactly the last line break, preserve all other whitespace in the string, and work with Unix (\n), Windows (\r\n) and old Mac (\r) line breaks: https://stackoverflow.com/a/20056634, https://stackoverflow.com/a/49791415. "\\R" is the linebreak matcher introduced in Java 8 in the Pattern class: https://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html
This passes these tests:
// Windows:
value = "\r\n test \r\n value \r\n";
assertEquals("\r\n test \r\n value ", value.replaceAll("\\R$", ""));
// Unix:
value = "\n test \n value \n";
assertEquals("\n test \n value ", value.replaceAll("\\R$", ""));
// Old Mac:
value = "\r test \r value \r";
assertEquals("\r test \r value ", value.replaceAll("\\R$", ""));
String text = readFileAsString("textfile.txt");
text = text.replace("\n", "").replace("\r", "");

How to replace the last word in a string

Does anyone know how to replace the last word in a String?
Currently I am doing:
someStr = someStr.replace(someStr.substring(someStr.lastIndexOf(" ") + 1), "New Word");
The above code replaces every single occurrence of the word in the string.
Thanks.
You could create a new string "from scratch" like this:
someStr = someStr.substring(0, someStr.lastIndexOf(" ")) + " New Word";
Another option (if you really want to use "replace" :) is to do
someStr = someStr.replaceAll(" \\S*$", " New Word");
replaceAll uses regular expressions, and " \S*$" means a space, followed by any number of non-space characters, followed by the end of the string. (That is, replace the characters after the last space.)
You're not far from the solution. Just keep the original string until the last index of " ", and append the new word to this substring. No need for replace here.
What your code is doing is replacing the substring by "New word".
Instead you need to substring first, and then do a replace on that string.
Here's how I would do it
someStr = someStr.substring(0, someStr.lastIndexOf(" ") + 1) + "New word"
try:
someStr = someStr.substring(0, someStr.lastIndexOf(" ")) + " " + new_word;
Use this: someStr.substring(0, someStr.lastIndexOf(" ")) + " New Word".
You can also use a regular expression, e.g. someStr.replaceFirst("\\s+\\S+$", " " + "New Word")
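Both approaches can be wrapped in small helpers, sketched below. The class and method names are illustrative. The substring version also handles a string with no space at all, since `lastIndexOf` then returns -1 and `substring(0, 0)` is empty. The regex version uses the slightly simpler pattern `\\S+$` and passes the new word through `Matcher.quoteReplacement`, a standard method that keeps `$` and `\` in the replacement from being interpreted as group references.

```java
import java.util.regex.Matcher;

public class ReplaceLastWordDemo {

    // Keeps everything up to and including the last space, then appends the new word.
    // With no space in the text, the whole string counts as the last word.
    static String replaceLastWord(String text, String newWord) {
        int lastSpace = text.lastIndexOf(' ');
        return text.substring(0, lastSpace + 1) + newWord;
    }

    // Regex variant: replaces the final run of non-space characters.
    static String replaceLastWordRegex(String text, String newWord) {
        return text.replaceFirst("\\S+$", Matcher.quoteReplacement(newWord));
    }

    public static void main(String[] args) {
        System.out.println(replaceLastWord("the quick brown fox", "dog"));
        System.out.println(replaceLastWord("fox", "dog"));
        System.out.println(replaceLastWordRegex("the quick brown fox", "dog"));
        System.out.println(replaceLastWordRegex("price is 5", "$10"));
    }
}
```

The two helpers differ when the text ends in a space: the substring version appends the new word after it, while the regex version finds nothing to replace and returns the text unchanged.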
Try this regex (^.+)b(.+$)
Example (Replace the last b character)
System.out.println("1abchhhabcjjjabc".replaceFirst("(^.+)b(.+$)", "$1$2"));
