To remove quotation marks in Java, I understand I can use
replaceAll("\"", "");
Ex: "Hello World" becomes Hello World.
However, it only removes this type of quotation mark: "". Is there a way to remove quotes like these: “Hello World”?
If you simply want to remove those 3 kinds of double-quotes, irrespective of the context:
replaceAll("[\"“”]", "");
If there are other kinds of quote characters that you want to remove, just add them before the ].
These pages list some of the other quote characters that you might encounter:
https://unicode-table.com/en/sets/quotation-marks/
https://en.wikipedia.org/wiki/Quotation_mark
And also see:
Is there a regex to grab all quotation marks?
which talks about the difficulty in creating a regex to match all of them in a future-proof fashion.
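As a rough sketch of one broader approach, Java's regex engine understands the Unicode general categories Pi (initial quote punctuation) and Pf (final quote punctuation), so a character class can lean on them instead of listing every character. This is an assumption about what should count as a "quote", not a complete solution: it misses low quotes such as „ (category Ps) and it also removes the typographic apostrophe ’ (category Pf). The class and method names are made up for illustration.

```java
class UnicodeQuoteStripper {
    // Assumption: a "quote" is the ASCII double quote or any character in the
    // Unicode categories Pi (initial quote) and Pf (final quote).
    static String stripQuotes(String s) {
        return s.replaceAll("[\"\\p{Pi}\\p{Pf}]", "");
    }

    public static void main(String[] args) {
        // Curly double quotes and guillemets, written as Unicode escapes.
        String input = "\u201cHello\u201d \u00abWorld\u00bb";
        System.out.println(stripQuotes(input)); // Hello World
    }
}
```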
Note that since we are including some "funky" characters (non-ASCII) in the source code (above), it is important that the Java compiler is aware of the character encoding that the source code uses. We could avoid that by using Unicode escapes instead. For example:
replaceAll("[\"\u201c\u201d]", "");
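A minimal, self-contained sketch of the escaped version, so the result does not depend on the encoding of the source file. The class and method names are hypothetical.

```java
class StripQuotes {
    // Removes the ASCII double quote plus the left and right curly double quotes.
    static String stripDoubleQuotes(String s) {
        return s.replaceAll("[\"\u201c\u201d]", "");
    }

    public static void main(String[] args) {
        System.out.println(stripDoubleQuotes("\u201cHello World\u201d")); // Hello World
        System.out.println(stripDoubleQuotes("\"Hello World\""));         // Hello World
    }
}
```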
You may try a regex replacement here, e.g.
String input = "“Hello World”";
System.out.println(input.replaceAll("“(.*?)”", "$1")); // prints Hello World
Basically, I want the output to be Path = "C:\Users\Public", and it seems to me that System.out.println("Path = "C:\Users\Public""); should work, but it doesn't. So the question is: why can't Java just print the phrase as a combination of characters?
By the way, this is only my second time "programming", so please answer in simple terms if possible.
You should escape special characters such as " and \:
System.out.println("Path = \"C:\\Users\\Public\"");
When you use a backslash (\), Java assumes that you are starting an escape sequence, so a literal backslash has to be written as \\. You are also putting double quotes inside a string literal, so you have to tell the compiler that they are part of the text; otherwise it treats the first inner double quote as the end of the string. Java has no alternative quote character for strings (single quotes are for char literals), so the inner quotes must be escaped as \".
You can escape each special character with a backslash. Notice that I have put a backslash before the inner double quotes (\") and before every other backslash (\\). So this should work.
System.out.println("Path = \"C:\\Users\\Public\"");
I hope now you can print your required output.
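For completeness, here is a small runnable sketch of the escaped literal. The class and method names are made up for illustration.

```java
class PrintPath {
    static String pathLine() {
        // Inside a string literal, \" is one double quote and \\ is one backslash.
        return "Path = \"C:\\Users\\Public\"";
    }

    public static void main(String[] args) {
        System.out.println(pathLine()); // Path = "C:\Users\Public"
    }
}
```

The resulting string is 24 characters long: the escape sequences exist only in the source code, not in the printed text.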
I want to replace &sp; in the string below with Z.
Input text : ABCD&sp;EF&p;GHIJ&bsp;KL
Output text : ABCDZEFZGHIJZKL
Can anyone tell me how to replace every instance of &\D+; using a Java regular expression?
I am using /(&\D+;)?/ but it doesn't work.
Use String#replaceAll.
You should also add the ? modifier to + to make it reluctant (non-greedy):
String str = "ABCD&sp;EF&p;GHIJ&bsp;KL";
String regex = "&\\D+?;";
System.out.println(str.replaceAll(regex, "Z"));
This should work
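To see why the ? matters, this sketch runs the greedy and the reluctant pattern on the same input. The class and method names are hypothetical.

```java
class GreedyVsReluctant {
    // Greedy: \D+ runs on to the LAST ';' it can reach.
    static String greedy(String s) {
        return s.replaceAll("&\\D+;", "Z");
    }

    // Reluctant: \D+? stops at the FIRST ';' after each '&'.
    static String reluctant(String s) {
        return s.replaceAll("&\\D+?;", "Z");
    }

    public static void main(String[] args) {
        String str = "ABCD&sp;EF&p;GHIJ&bsp;KL";
        System.out.println(greedy(str));    // ABCDZKL
        System.out.println(reluctant(str)); // ABCDZEFZGHIJZKL
    }
}
```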
Match the initial &, then all characters that are not the trailing ;, then that trailing ;, like so: &[^;]+; If excluding digits (as suggested by your use of \D) is a requirement, add the digits to the negated character set: [^;0-9]. In Java, String#replaceAll already replaces every occurrence; in flavors such as JavaScript you would add the global flag g. The site regexr.com is a handy tool to create regexes.
Edit: Sorry, I initially read your question wrong.
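A short sketch of the negated-character-class variant in Java, assuming digits should be excluded as in the question. The class and method names are made up for illustration.

```java
class EntityReplace {
    static String replaceEntities(String s) {
        // '&', then one or more characters that are neither ';' nor a digit, then ';'.
        return s.replaceAll("&[^;0-9]+;", "Z");
    }

    public static void main(String[] args) {
        System.out.println(replaceEntities("ABCD&sp;EF&p;GHIJ&bsp;KL")); // ABCDZEFZGHIJZKL
    }
}
```

Because the character class cannot match ;, no reluctant quantifier is needed here.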
I am using Spock to test a Java app. It seems "$" is a special character in Groovy. Any Java string that is separated by "$" can't be split properly in Groovy. Is there any workaround for this problem?
Update:
The "split" happens in Java code that I can't edit. It turns out that the Java code has the same problem as: Why can't I split a string with the dollar sign?
I didn't think $ was a special character in Groovy strings. Edit: it is, if you use GStrings (double-quoted strings)! But the rest may still be useful: $ is also a special character in the string you give to String#split, because that string is interpreted as a regular expression, and in a regular expression $ means "end of input" (or end of line, depending on flags).
If you're using String#split, to make it split on a literal $, you have to escape it with a backslash. To make the regex engine see a backslash, you have to escape the backslash in a string literal with another backslash.
Example:
'testing$one$two$three'.split('\\$').each {
println it
}
Output:
testing
one
two
three
Better yet, as suggested by Dónal, use tokenize:
Example:
'testing$one$two$three'.tokenize('$').each {
println it
}
(Same output)
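Since the split in the question happens on the Java side, here is a sketch of the same idea in plain Java. Pattern.quote is a standard way to turn any literal into a regex; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class DollarSplit {
    // "$" is the end-of-input anchor, so nothing is actually split off.
    static String[] splitUnescaped(String s) {
        return s.split("$");
    }

    // "\\$" in Java source is the regex \$, which matches a literal dollar sign.
    static String[] splitEscaped(String s) {
        return s.split("\\$");
    }

    // Pattern.quote builds a regex that matches its argument literally.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("$"));
    }

    public static void main(String[] args) {
        String input = "testing$one$two$three";
        System.out.println(Arrays.toString(splitUnescaped(input))); // [testing$one$two$three]
        System.out.println(Arrays.toString(splitEscaped(input)));   // [testing, one, two, three]
        System.out.println(Arrays.toString(splitQuoted(input)));    // [testing, one, two, three]
    }
}
```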
I have a CSV string like apple404, orange pie, wind\,cool, sun\\mooon, earth, in Java. To be precise, each value of the CSV string could be anything, provided commas and backslashes are escaped using a backslash.
I need a regular expression to find the first five values. After some googling I came up with the following, but it won't allow escaped commas within the values.
Pattern pattern = Pattern.compile("([^,]+,){0,5}");
Matcher matcher = pattern.matcher("apple404, orange pie, wind\\,cool, sun\\\\mooon, earth,");
if (matcher.find()) {
System.out.println(matcher.group());
} else {
System.out.println("No match found.");
}
Does anybody know how to make it work for escaped commas within values?
The following negative look-behind based regex will work:
Pattern pattern = Pattern.compile("(?:.*?(?<!(?:(?<!\\\\)\\\\)),){0,5}");
However, for full-fledged CSV parsing it is better to use a dedicated CSV parser like JavaCSV.
You can use String.split() here. By specifying the limit as 6, the first five elements (index 0 to 4) will always be the first five column values from your CSV string. If any extra column values are present, they all overflow into index 5.
The regex (?<!\\\\), makes sure the CSV string is split only at a comma that is not preceded by a \. Note that it therefore also refuses to split at a comma that follows an escaped backslash (\\,).
// Requires: import java.util.Arrays;
// The parentheses make split() apply to the whole concatenated string.
String[] cols = ("apple404, orange pie, wind\\,cool, sun\\\\mooon, earth, " +
        "mars, venus, pluto").split("(?<!\\\\),", 6);
System.out.println(cols.length); // 6
System.out.println(Arrays.toString(cols));
// [apple404,  orange pie,  wind\,cool,  sun\\mooon,  earth,  mars, venus, pluto]
// (every value after the first keeps its leading space)
System.out.println(cols[4]); // 5th = " earth"
System.out.println(cols[5]); // 6th (overflow) = " mars, venus, pluto"
This regular expression works well. It also properly recognizes not only backslash-escaped commas, but also backslash-escaped backslashes. Also, the matches it produces do not contain the commas.
/(?:\\\\|\\,|[^,])*/g
(I am using standard regular expression notation with the understanding that you would replace the delimiters with quote marks and double all backslashes when representing this regular expression within a Java string literal.)
example input
"apple404, orange pie, wind\,cool, sun\\,mooon, earth"
produces this output
"apple404"
" orange pie"
" wind\,cool"
" sun\\"
"mooon"
Note that the double backslash after "sun" is escaped and therefore does not escape the following comma.
The way this regular expression works is by atomizing the input into longest sequences first, beginning with double backslashes (treating them as one possible multi-character alternative), followed by escaped commas (a second possible multi-character alternative), followed by any single non-comma character. Any number of these atoms are matched, and the match stops at the next unescaped comma or at the end of the input.
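The same atomizing idea can be written as a hand-rolled scanner, which some readers may find easier to follow than the regex. This is a sketch and not the regex itself: it treats a backslash plus any following character as one atom, which yields the same field boundaries for input that follows the escaping rules. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedCsvScanner {
    // Splits on commas that are not escaped; a backslash escapes the next character.
    static List<String> fields(String input) {
        List<String> out = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()) {
                // Escaped pair: keep both characters and skip past the second one.
                cur.append(c).append(input.charAt(++i));
            } else if (c == ',') {
                // Unescaped comma: the current field is complete.
                out.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        out.add(cur.toString()); // last field (no trailing comma required)
        return out;
    }

    public static void main(String[] args) {
        // In Java source, every backslash of the raw input is doubled.
        String input = "apple404, orange pie, wind\\,cool, sun\\\\,mooon, earth";
        for (String field : fields(input)) {
            System.out.println("\"" + field + "\"");
        }
    }
}
```

For the example input this produces six fields, the fourth being " sun\\" and the fifth "mooon".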
In order to obtain the first N fields, one may simply slice the array of matches from the previous answer. Alternatively, surround the main expression in additional parentheses, include an optional comma in order to match the separators between fields, anchor it to the beginning of the string to prevent the engine from returning further groups of N fields, and quantify it (with N = 5 here):
/^((?:\\\\|\\,|[^,])*,?){0,5}/g
Once again, I am using standard regular expression notation, but here I will also do the trivial exercise of quoting this as a Java string:
"^((?:\\\\\\\\|\\\\,|[^,])*,?){0,5}"
This is the only solution on this page so far which actually answers both parts of the precise requirements specified by the OP, "...commas and backslash are escaped using a back slash." For the input fi\,eld1\\,field2\\,field3\\,field4\\,field5\\,field6\\,, it properly matches only the first five fields fi\,eld1\\,field2\\,field3\\,field4\\,field5\\,.
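A minimal sketch of how the quoted pattern might be used with java.util.regex, run against the input from the paragraph above. The class, constant, and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstFiveFields {
    // The anchored pattern from above, written as a Java string literal.
    private static final Pattern FIRST_FIVE =
            Pattern.compile("^((?:\\\\\\\\|\\\\,|[^,])*,?){0,5}");

    // Returns the leading part of csv that covers at most five fields.
    static String firstFive(String csv) {
        Matcher m = FIRST_FIVE.matcher(csv);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Raw input: fi\,eld1\\,field2\\,field3\\,field4\\,field5\\,field6\\,
        String input = "fi\\,eld1\\\\,field2\\\\,field3\\\\,field4\\\\,field5\\\\,field6\\\\,";
        System.out.println(firstFive(input));
        // prints fi\,eld1\\,field2\\,field3\\,field4\\,field5\\,
    }
}
```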
Note: my first answer made the same assumption that is implicitly part of the OP's original code and example data, which required a comma to follow every field. The problem was that if the input has five fields or fewer, and the last field is not followed by a comma (equivalently, by an empty field), then the final field would not be matched. I did not like this, and so I updated both of my answers so that they do not require trailing commas.
The shortcoming with this answer is that it follows the OP's assumption that values between commas contain "anything" plus escaped commas or escaped backslashes (i.e., no distinction between strings in double quotes, etc., but only recognition of escaped commas and backslashes). My answer fulfills the criteria of that imaginary scenario. But in the real world, someone would expect to be able to use double quotes around a CSV field in order to include commas within a field without using backslashes.
So I echo the words of @anubhava and suggest that a "real" CSV parser should always be used when handling CSV data. Doing otherwise is just being a script kiddie and not in any way truly "handling" CSV data.