I have an input string that uses '~|~' as the delimiter.
For example:
String s = "1~|~Vijay~|~25~|~Pune";
When I split it with "~\\|~" in Java, it works fine:
String[] sa = s.split("~\\|~", -1);
for (String str : sa) {
    System.out.println(str);
}
I get the following output:
1
Vijay
25
Pune
When I run the same program but pass the delimiter as a command-line argument ('~\\|~'), the string is not parsed properly and I get the following output:
1
|
Vijay
|
25
|
Pune
Has anyone else faced the same issue? Please comment.
You only need a single backslash when running it from the command line. You need two when writing the regular expression in Java source because the backslash starts an escape sequence in a string literal, so a second backslash is needed to make the first one literal. On the command line, pass:
~\|~
Please do a System.out.println("[" + args[i] + "]"); to see what Java actually receives from the command line. The \ character is special to the shell, and so are the | and ~ characters (the last one can expand to your home directory, which could be a problem).
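The check suggested above can be turned into a tiny standalone program. This is only a sketch; the class name ArgDump and the helper bracketed are made up for illustration.

```java
class ArgDump {
    // Hypothetical helper: wraps an argument in brackets so that leading or
    // trailing characters (spaces, backslashes) are easy to spot.
    static String bracketed(String arg) {
        return "[" + arg + "]";
    }

    public static void main(String[] args) {
        // Print every argument exactly as the JVM received it from the shell.
        for (int i = 0; i < args.length; i++) {
            System.out.println(i + ": " + bracketed(args[i]));
        }
    }
}
```

Running it with the candidate argument (for example java ArgDump '~\|~') shows how many backslashes survived the shell before any regex is involved.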
You need to pass:
java foo_bar '~\|~'
(Java still needs a single \ here to escape the vertical bar in the regex. You are not writing a string literal for the Java compiler, only the string that such a literal would produce, so the \ does not need to be doubled. Because it is inside single quotes, the shell passes it to the Java program unchanged.) Any quoting (single or double quotes) is enough to prevent ~ expansion.
If you pass
java foo_bar '~\\|~'
the shell does not treat the \ as an escape character inside single quotes, so both backslashes reach Java. That is equivalent to this string literal:
String sa[] = s.split("~\\\\|~", -1); /* two escapes mean a literal backslash */
(note that the vertical bar is no longer escaped, so it acts as the regex alternation operator)
...which is quite different. This time the regex means: split on a ~\ sequence (a ~ followed by a backslash) or on a single ~ character. As there is no ~ followed by a backslash in the input, the second alternative is always used. You should get:
1
|
Vijay
|
25
|
Pune
which is the output you posted.
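The two cases can be reproduced without a shell by writing, as Java literals, the strings that would arrive in args[0]. This is a minimal sketch; the class and method names are invented for the example.

```java
import java.util.Arrays;

class DelimiterArgDemo {
    // Splits the sample record with a regex exactly as it would arrive in args[0].
    static String[] splitWith(String regexFromCommandLine) {
        return "1~|~Vijay~|~25~|~Pune".split(regexFromCommandLine, -1);
    }

    public static void main(String[] args) {
        // The literal "~\\|~" holds the 4 characters ~\|~ (what '~\|~' passes in).
        System.out.println(Arrays.toString(splitWith("~\\|~")));
        // The literal "~\\\\|~" holds the 5 characters ~\\|~ (what '~\\|~' passes in).
        System.out.println(Arrays.toString(splitWith("~\\\\|~")));
    }
}
```

The first call yields the four fields; the second splits on every ~ and leaves the | characters behind as separate elements, matching the output in the question.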
You don't have to escape:
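import java.util.Arrays;
import java.util.regex.Pattern;
public class Main {
    public static void main(String[] args) {
        // Pattern.LITERAL makes every character of args[0] match literally
        Pattern p = Pattern.compile(args[0], Pattern.LITERAL);
        final String[] result = p.split("1~|~Vijay~|~25~|~Pune");
        Arrays.stream(result).forEach(System.out::println);
    }
}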
Running:
javac Main.java
java Main "~|~"
Output:
1
Vijay
25
Pune
Here args[0] is equal to ~|~ (no escaping). The trick is the pattern flag Pattern.LITERAL, which treats every character, including |, as a normal character and ignores its meta meaning.
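A possible variation on the same idea, if you would rather keep using String.split, is to wrap the raw delimiter with Pattern.quote. This is a sketch under that assumption, not part of the answer above; the class and method names are illustrative.

```java
import java.util.regex.Pattern;

class QuotedDelimiterSplit {
    // Pattern.quote turns the raw delimiter into a regex that matches it literally,
    // so the caller never has to escape | or any other metacharacter.
    static String[] splitLiteral(String input, String delimiter) {
        return input.split(Pattern.quote(delimiter), -1);
    }

    public static void main(String[] args) {
        // Falls back to ~|~ when no argument is given.
        String delimiter = args.length > 0 ? args[0] : "~|~";
        for (String field : splitLiteral("1~|~Vijay~|~25~|~Pune", delimiter)) {
            System.out.println(field);
        }
    }
}
```

The limit of -1 is kept from the question so that trailing empty fields are preserved.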
I have a path in Windows:
assert f.toString() == 'C:\\path\\to\\some\\dir'
I need to convert the backslashes \ to forward slashes /. Using Java syntax, I would write:
assert f.toString().replaceAll('\\\\', '/') == 'C:/path/to/some/dir'
But I am studying Groovy, so I thought I would write a literal regular expression:
assert f.toString().replaceAll(/\\/, '/') == 'C:/path/to/some/dir'
This throws a compilation error:
unexpected token: ) == at line: 4, column: 42
I started looking on the internet and found several comments suggesting that this particular regex literal would not work; instead, you would have to use a workaround like /\\+/. But this obviously changes the semantics of the regex.
I cannot really understand why /\\/ does not work. Maybe somebody does?
The \ at the end of the slashy string ruins it.
The main point is that you need to separate the \ from the trailing / slashy string delimiter.
It can be done in several ways:
println(f.replaceAll('\\\\', '/')) // Using a single-quoted string literal with 4 backslashes, Java style
println(f.replaceAll(/[\\]/, '/')) // Wrapping the backslash with character class
println(f.replaceAll(/\\{1}/, '/')) // Using a {1} limiting quantifier
println(f.replaceAll(/\\(?:)/, '/')) // Using an empty group after it
See the Groovy demo.
However, you may use dollar slashy strings to use the backslash at the end of the string:
f.replaceAll($/\\/$, '/')
See the demo and check this thread:
Slashy strings: backslash escapes end of line chars and slash, $ escapes interpolated variables/closures, can't have backslash as last character, empty string not allowed. Examples: def a_backslash_b = /a\b/; def a_slash_b = /a\/b/;
Dollar slashy strings: backslash escapes only EOL, $ escapes interpolated variables/closures and itself if required and slash if required, use $$ to have $ as last character or to have a $ before an identifier or curly brace or slash, use $/ to have a slash before a $, empty string not allowed. Examples: def a_backslash_b = $/a\b/$; def a_slash_b = $/a/b/$; def a_dollar_b = $/a$$b/$;
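For comparison, here is how the same replacement looks in plain Java, where no slashy strings exist and the delimiter problem does not arise. A minimal sketch; the class and method names are made up. The char overload of String.replace avoids regex entirely.

```java
class BackslashToSlash {
    // Regex version: the regex \\ (one literal backslash) written as a Java literal.
    static String viaReplaceAll(String path) {
        return path.replaceAll("\\\\", "/");
    }

    // Non-regex version: replaces each backslash character with a forward slash.
    static String viaReplace(String path) {
        return path.replace('\\', '/');
    }

    public static void main(String[] args) {
        String path = "C:\\path\\to\\some\\dir";
        System.out.println(viaReplaceAll(path));
        System.out.println(viaReplace(path));
    }
}
```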
I am struggling to understand word boundary \b in regex.
I read that there are three conditions for \b.
Before the first character in the string, if the first character is a
word character.
After the last character in the string, if the last character is a
word character.
Between two characters in the string, where one is a word character
and the other is not a word character.
I am trying to find the start index of the previous match using the Java method start():
import java.util.regex.*;

class Quetico {
    public static void main(String[] args) {
        Pattern p = Pattern.compile(args[0]);
        Matcher m = p.matcher(args[1]);
        System.out.print("match positions: ");
        while (m.find()) {
            System.out.print(m.start() + " ");
        }
        System.out.println();
    }
}
% java Quetico "\b" "^23 *$76 bc"
//string: ^23 *$76 bc pattern:\b
//index : 01234567890
produces: 1 3 5 6 7 9
I'm having trouble understanding why it produces this result, because I'm struggling to see the pattern. I've tried looking at the inverse, \B, which produces 0 2 4 8, but this doesn't make it any clearer for me. If you can help clarify this, it would be appreciated.
The issue isn't Java here, it's Linux/Unix. When you put text between double quote marks on the command line, most of the special shell characters such as *, ?, etc. are no longer special--except for variable interpolation. (And some other things, like ! depending on which shell flavor you're using.) Thus, if you say
% command "this $variable is interesting"
If you've set variable to value, your command will be called with one argument, this value is interesting. In your case, the shell treats $7 as a positional parameter (as in a shell script), even though you're not in a shell script; since it isn't set to anything, it's replaced with an empty string, and the result is the same as if you had run
% java Quetico "\b" "^23 *6 bc"
which gives me 1 3 5 6 7 9 if I use that string literal in a Java program (instead of on the command line).
To prevent $ from being interpreted by the shell, you need to use single quote marks:
% java Quetico "\b" '^23 *$76 bc'
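To take the shell out of the picture entirely, the same check can be run with both strings as Java literals. This is a sketch; the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryPositions {
    // Collects the start index of every match of the regex in the input.
    static List<Integer> starts(String regex, String input) {
        List<Integer> positions = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            positions.add(m.start());
        }
        return positions;
    }

    public static void main(String[] args) {
        // The string the asker intended to pass.
        System.out.println(starts("\\b", "^23 *$76 bc"));
        // The string Java actually received after the shell expanded $7 to nothing.
        System.out.println(starts("\\b", "^23 *6 bc"));
    }
}
```

With the intended string the boundaries are 1 3 6 8 9 11: each one sits between a word character (a digit or letter here) and a non-word character or the end of the input. The shortened string gives the 1 3 5 6 7 9 seen in the question.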
(My programming question may seem somewhat devious, but I see no other solution.)
A text is written in the Eclipse editor. When a self-made table view plugin for Eclipse is activated, the text quality is checked automatically by a Python script (which I cannot edit) that receives the editor text. Whitespace characters (\n, \t) other than the normal space (' ') are stripped from the editor text, because otherwise the sentences cannot be QA checked. When the script is done, it returns the incorrect sentences to the table.
It is possible to click on the sentences in the table, and the plugin will search (row by row) in the active editor for the clicked sentence. This works for single-line sentences. However, multiline sentences cannot be found in the active editor, because all the \n and \t are missing from the compiled sentence.
To overcome this problem, I changed the script so it takes the complete editor text as one string. I tried the following:
String newSentence = tableSentence.replaceAll(" ", "\\s+");
Pattern p = Pattern.compile(newSentence);
Matcher contentMatcher = p.matcher(editorContent); // editorContent is a String
if (contentMatcher.find()) {
    // Get index offset of string and length of string
}
By changing all spaces into \s+, I hoped to get the match. However, this does not work because it will look like the following:
editorContent: The\nright\n\ttasks.
tableSentence: The right tasks.
NewSentence: Thes+rights+tasks. // After the 'replaceAll' action
Should be: The\s+right\s+tasks.
So, my question is: how can I adjust the input for the compiler?
I am inexperienced when it comes to Java, so I do not see how to change this. Unfortunately, I cannot change the Python script to also return the full sentences.
Add a third and fourth backslash to your replacement string, so it looks like this: \\\\s+.
Java doesn't have raw (or verbatim) strings, so every backslash in a string literal has to be escaped, and replaceAll then treats a backslash in its replacement string as an escape character too. This should solve the problem of s+ being inserted instead of \s+.
When you type the replacement string in code, it goes like this:
\\\\s+
| # Compile time
V
\\s+
| # replacement string parsing
V
\s+ # actual regex used
Updated my answer according to @nhahtdh's comment (fixed the number of backslashes).
You need to use "\\\\s+" instead of "\\s+", since \ is the escape character in the regex replacement string syntax. To specify a literal \ in the replacement text, you need to write \\ in the replacement string, and that doubles up to "\\\\" since \ requires escaping in a Java string literal.
Note that \ just happens to be the escape character in regex replacement string syntax in Java. Other languages, such as JavaScript, use $ to escape $, so \ doesn't need to be escaped in JavaScript's regex replacement string.
If you are replacing a match with literal text, you can use Matcher.quoteReplacement to avoid dealing with the escaping in regex replacement string:
String newSentence = tableSentence.replaceAll(" ", Matcher.quoteReplacement("\\s+"));
In this case, since you are searching for a string and replacing it with another string, you can use String.replace instead, which does plain string replacement:
String newSentence = tableSentence.replace(" ", "\\s+");
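The three variants above should all build the same pattern text. The sketch below checks that and then uses the result to locate the sentence in the editor content from the question. The class and method names are hypothetical, and locate is only one assumed way of returning the offset and length the plugin needs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpaceToWhitespaceRegex {
    // Backslash escaped once for the replacement syntax and once for the Java literal.
    static String viaReplaceAll(String s) {
        return s.replaceAll(" ", "\\\\s+");
    }

    // quoteReplacement adds the replacement-level escaping for us.
    static String viaQuoteReplacement(String s) {
        return s.replaceAll(" ", Matcher.quoteReplacement("\\s+"));
    }

    // Plain string replacement: no regex and no replacement syntax involved.
    static String viaReplace(String s) {
        return s.replace(" ", "\\s+");
    }

    // Returns {start, end} of the first match in the editor content, or null if absent.
    static int[] locate(String tableSentence, String editorContent) {
        Matcher m = Pattern.compile(viaReplace(tableSentence)).matcher(editorContent);
        return m.find() ? new int[] {m.start(), m.end()} : null;
    }

    public static void main(String[] args) {
        System.out.println(viaReplaceAll("The right tasks."));
        int[] span = locate("The right tasks.", "The\nright\n\ttasks.");
        System.out.println(span[0] + " to " + span[1]);
    }
}
```

Note that any other regex metacharacter in the sentence (the final . here, for instance) is still interpreted as regex syntax. Quoting each word with Pattern.quote before joining with \s+ would be one way to tighten that.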
Consider the string,
this\is\\a\new\\string
The output should be:
this\is\a\new\string
So basically one or more \ character should be replaced with just one \.
I tried the following:
str = str.replace("[\\]+","\")
but it was no use. The reason I used two \ in [\\]+ was because internally \ is stored as \\. I know this might be a basic regex question, but I am able to replace one or more ordinary letters, just not the \ character. Any help is really appreciated.
str.replace("[\\]+", "\") has a few problems:
replace doesn't use regex (replaceAll does), so "[\\]" represents the literal text [\], not \ or \\ (whichever you expected it to represent);
even if it did accept regex, "[\\]" would not be a correct regex, because in the regex [\] the \] escapes the ], so you would end up with an unclosed character class [...;
it will not compile (your replacement String literal is not closed).
It will not compile because \ is the start of an escape sequence \X, where X is a character that is either
changed from a String special character into a plain literal, as in your case, where \" makes " a literal character (so you could print it, for instance) instead of the start or end of the String, or
changed from a normal character into a special one, as with the line separators \n and \r or the tab \t.
Now we know that \ is special and is used to escape other characters. So what do you think we need to do to make \ represent a literal (when we want to print \)? If you guessed that it needs to be escaped with another \, you are right. To create a \ literal we need to write it in a String as "\\".
Since you now know how to create a String containing a \ literal (an escaped \), you can start thinking about how to create your replacement.
A regex that represents one or more \ can look like
\\+
But that is its native form, and we need to create it using a String. I used \\ here because in regex \ is also a special character (for instance, \d represents a digit, not a \ literal followed by d), so it also needs to be escaped first to represent a \ literal. Just as in a String, we can escape it with another \.
So the String representing this regex needs to be written as
"\\\\+" (we escaped \ twice, once in regex \\+ and once in string)
You can use it as the first argument of replaceAll (because replace, as mentioned earlier, doesn't accept regex).
The last problem you will face is the second argument of the replaceAll method. If you write
replaceAll("\\\\+", "\\")
and it finds a match for the regex, you will see the exception
java.lang.IllegalArgumentException: character to be escaped is missing
This is because in the replacement part (the second argument of replaceAll) we can also use the special form $x, which represents the current match of the group with index x. To be able to turn $ into a literal we need some escape mechanism, and again \ is used for that purpose. So \ is also special in the replacement part of our method.
So again, to create a \ literal we need to escape it with another \, and the string literal representing the expression \\ is "\\\\".
But let's get back to the earlier exception: the message "character to be escaped is missing" refers to the X part of the \X form (X is the character we want to escape). The problem is that your earlier replacement "\\" represented only the \ part, so the method expected either a $ to form \$ or another \ to form \\ (a \ literal). So valid replacements would be "\\$" or "\\\\".
To make things work, you need to write your replacement as
str = str.replaceAll("\\\\+", "\\\\")
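The whole chain of reasoning can be checked with a small program. This is a sketch; the class and method names are made up, and the second helper exists only to show that a lone backslash in the replacement is rejected once a match is found.

```java
class CollapseBackslashes {
    // Regex \\+ (one or more backslashes) and replacement \\ (one literal backslash),
    // each written with doubled backslashes for the Java string literal.
    static String collapse(String s) {
        return s.replaceAll("\\\\+", "\\\\");
    }

    // Returns true if a replacement consisting of a single backslash is rejected.
    static boolean singleBackslashReplacementFails(String s) {
        try {
            s.replaceAll("\\\\+", "\\");
            return false;
        } catch (IllegalArgumentException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // The literal below holds the text this\is\\a\new\\string from the question.
        System.out.println(collapse("this\\is\\\\a\\new\\\\string"));
        System.out.println(singleBackslashReplacementFails("a\\b"));
    }
}
```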
You can use:
str = str.replace("\\\\", "\\");
Remember that String#replace doesn't take a regex, so this replaces each pair of backslashes with a single one.
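The difference between the literal replace and the regex version only shows up when more than two backslashes appear in a row. A small sketch with hypothetical names that compares the two:

```java
class PairVersusRun {
    // Literal replacement: every pair of backslashes becomes one backslash.
    static String collapsePairs(String s) {
        return s.replace("\\\\", "\\");
    }

    // Regex replacement: every run of one or more backslashes becomes one backslash.
    static String collapseRuns(String s) {
        return s.replaceAll("\\\\+", "\\\\");
    }

    public static void main(String[] args) {
        // The literal below holds a, three backslashes, b.
        String tripled = "a\\\\\\b";
        System.out.println(collapsePairs(tripled)); // two backslashes remain
        System.out.println(collapseRuns(tripled));  // one backslash remains
    }
}
```

For the sample string in the question, which contains only single and double backslashes, both approaches give the same result.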
Try this:
str = str.replaceAll("\\\\+", "\\\\");
When writing regular expressions in Java string literals, you typically need to double-escape backslashes. So you would do this:
str = str.replaceAll("\\\\+", "\\\\");
I'd use Matcher.quoteReplacement() and String.replaceAll() here.
Like this:
String s;
[...]
s = s.replaceAll("\\\\+", Matcher.quoteReplacement("\\"));
In Linux and other operating systems, file names can contain characters like (, ), [, ], <space>, etc.
Whenever I try to use any of these files in a bash command like cat, ls, etc., I have to escape them as shown below:
File name: abc(10-oct).txt
cat abc(10-oct).txt won't work.
If I precede the "(" and ")" characters with a "\" character, like
cat abc\(10-oct\).txt
this works.
I am trying to automate some Linux shell commands via a Java program, and I am not sure which characters I need to take care of and escape.
If someone could point me to a resource with the entire list of characters, it would be a great help.
Many Thanks
Quoting from Shell Command Language:
The following characters must be quoted if they are to represent
themselves:
| & ; < > ( ) $ ` \ " ' <space> <tab> <newline>
and the following may need to be quoted under certain circumstances.
That is, these characters may be special depending on conditions
described elsewhere in this specification:
* ? [ # ~ = %
The various quoting mechanisms are the escape character, single-quotes
and double-quotes.
It also says:
Enclosing characters in single-quotes (' ') preserves the literal
value of each character within the single-quotes. A single-quote
cannot occur within single-quotes.
And:
Enclosing characters in double-quotes (" ") preserves the literal
value of all characters within the double-quotes, with the exception
of the characters dollar-sign, backquote and backslash...
You can use single quotes, as in 'filename', which protects every character that would otherwise need escaping in the shell (except a single quote itself, which cannot appear inside single quotes).
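From Java, the single-quote rule can be wrapped in a small helper. This is a sketch with made-up names: singleQuote closes the quoted string, adds an escaped quote, and reopens it for each embedded single quote, which is a common POSIX shell idiom. catCommand illustrates the alternative of passing the file name as a separate ProcessBuilder argument, so that no shell parses it at all.

```java
import java.util.List;

class ShellQuote {
    // Wraps s in single quotes; an embedded ' becomes '\'' (close, escaped quote, reopen).
    static String singleQuote(String s) {
        return "'" + s.replace("'", "'\\''") + "'";
    }

    // Builds (but does not start) a command whose arguments bypass shell parsing.
    static List<String> catCommand(String fileName) {
        return new ProcessBuilder("cat", fileName).command();
    }

    public static void main(String[] args) {
        System.out.println("cat " + singleQuote("abc(10-oct).txt"));
        System.out.println(catCommand("abc(10-oct).txt"));
    }
}
```

The quoted form is only needed when the command line is handed to a shell (for example via sh -c). With separate arguments, the parentheses in the file name need no escaping.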