Find and replace whole line based on regex java - java

I have this string
Chest pain\tab \tab 72%\tab 0%\tab 67%
}d \ql \li0\ri0\nowidctlpar\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\tx10800\tx11520\tx12240\tx12960\faauto\rin0\lin0\itap0 {\insrsid14762702
}d \ql \li0\ri0\nowidctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\faauto\rin0\lin0\itap0 {\b\f1\fs24\ul\insrsid14762702 Waveform}{\insrsid14762702
}{\insrsid14762702 {\*\shppict{\pict{\*\picprop\shplid1025{\sp{\sn shapeType}{\sv 75}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}{\sp{\sn fLine}{\sv 0}}{\sp{\sn fLayoutInCell}{\sv 1}}}
\
I want to get rid of all lines with }d \ql in them
I have tried
String v= u.replace("}d \\ql(\\.*)","");
but it doesn't detect the line. Having tested it out the culprit must be the .* part but I don't know how to put it in the string.replace

replace doesn't use regex syntax; replaceAll does. This means that replace treats \\.* as plain text (a backslash, a dot and an asterisk) and looks for exactly that text instead of interpreting it as a pattern.
So your first solution could look like this (notice that to create a \ literal in regex you need to escape it twice: once in the regex as \\ and once in the string literal as "\\\\")
String v = u.replaceAll("\\}d \\\\ql.*","");
But a possible problem here is that we don't require \} to be placed at the start of a line. We are also skipping any leading whitespace that exists right before \} on that line.
To solve this we can add ^\s* at the start of the regex and make ^ represent the start of a line (we can do that with the MULTILINE flag, which can be written inline as (?m)).
So now our solution could look like:
String v= u.replaceAll("(?m)^\\s*\\}d \\\\ql.*","");
But there is another problem: . can't match line separators, so .* will not include them in the match, which prevents us from removing them.
So we should include them in our match explicitly. We should also make them optional with the ? quantifier, in case the line you want to match is the last one and has no line separator after it. Since Java 8 we can do this with \R, which matches any of several line separators (including paragraph separators). If you want to limit yourself to \r and \n (or can't use Java 8) you can use something like (\r?\n|\r).
So our final solution can look like:
in Java 8
String v = u.replaceAll("(?m)^\\s*\\}d \\\\ql.*\\R?","");
pre Java 8
String v = u.replaceAll("(?m)^\\s*\\}d \\\\ql.*(\r?\n|\r)?","");
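As a minimal sketch of the two final variants above, the class below runs both on a shortened version of the sample text. The class name RemoveRtfLines, the method names and the abbreviated input are made up for illustration; only the two regexes come from the answer.

```java
final class RemoveRtfLines {
    // Java 8+: \R matches any line break; the trailing ? covers a last line without one.
    static String removeJava8(String text) {
        return text.replaceAll("(?m)^\\s*\\}d \\\\ql.*\\R?", "");
    }

    // Pre-Java 8 variant, limited to \r\n, \n and \r.
    static String removeLegacy(String text) {
        return text.replaceAll("(?m)^\\s*\\}d \\\\ql.*(\r?\n|\r)?", "");
    }

    public static void main(String[] args) {
        // Shortened, hypothetical stand-in for the RTF text in the question.
        String u = "Chest pain\\tab 72%\n"
                 + "}d \\ql \\li0\\ri0\\nowidctlpar\n"
                 + "  }d \\ql \\tx720 {\\b Waveform}\n"
                 + "}{\\insrsid14762702 kept";
        // Both "}d \ql" lines disappear together with their line breaks.
        System.out.println(removeJava8(u));
    }
}
```

Both methods should return the same result for input that only uses \n, \r\n or \r as line breaks.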

Related

How to replace a space exactly with "\\\\s+" [duplicate]

I'm trying to convert the String \something\ into the String \\something\\ using replaceAll, but I keep getting all kinds of errors. I thought this was the solution:
theString.replaceAll("\\", "\\\\");
But this gives the below exception:
java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
The String#replaceAll() interprets the argument as a regular expression. The \ is an escape character in both String and regex. You need to double-escape it for regex:
string.replaceAll("\\\\", "\\\\\\\\");
But you don't necessarily need regex for this, simply because you want an exact character-by-character replacement and you don't need patterns here. So String#replace() should suffice:
string.replace("\\", "\\\\");
Update: as per the comments, you appear to want to use the string in JavaScript context. You'd perhaps better use StringEscapeUtils#escapeEcmaScript() instead to cover more characters.
TLDR: use theString = theString.replace("\\", "\\\\"); instead.
Problem
replaceAll(target, replacement) uses regular expression (regex) syntax for target and partially for replacement.
The problem is that \ is a special character both in regex (where, for example, \d represents a digit) and in String literals (where, for example, "\n" represents a line separator and \" escapes a double quote that would otherwise end the literal).
In both cases we can create a literal \ symbol by escaping it (making it a literal instead of a special character), which means placing an additional \ before it (just as we escape " in string literals via \").
So a target regex that represents the \ symbol needs to hold \\, and the string literal representing that text needs to look like "\\\\".
So we escaped \ twice:
once in regex \\
once in String literal "\\\\" (each \ is represented as "\\").
In the replacement, \ is special as well. It lets us escape the other special character there, $, which via the $x notation gives access to the text matched by the regex and held by the capturing group with index x. For example, "012".replaceAll("(\\d)", "$1$1") matches each digit, places it in capturing group 1, and $1$1 replaces it with two copies of itself, resulting in "001122".
So again, to let replacement represent \ literal we need to escape it with additional \ which means that:
replacement must hold two backslash characters \\
and String literal which represents \\ looks like "\\\\"
BUT since we want replacement to hold two backslashes we will need "\\\\\\\\" (each \ represented by one "\\\\").
So version with replaceAll can look like
replaceAll("\\\\", "\\\\\\\\");
Easier way with replaceAll
To make our life easier, Java provides tools that automatically escape text for the target and replacement parts. So now we can focus only on strings and forget about regex syntax:
replaceAll(Pattern.quote(target), Matcher.quoteReplacement(replacement))
which in our case can look like
replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"))
Even better: use replace
If we don't really need regex syntax support, let's not involve replaceAll at all. Instead, let's use replace. Both methods replace all occurrences of the target, but replace doesn't involve regex syntax. So you could simply write
theString = theString.replace("\\", "\\\\");
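The three variants discussed above can be compared side by side. This is only a sketch: the class and method names are hypothetical, while the calls themselves (String.replace, String.replaceAll, Pattern.quote, Matcher.quoteReplacement) are the ones named in the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DoubleBackslashes {
    // Plain-text replacement: only Java string-literal escaping is involved.
    static String withReplace(String s) {
        return s.replace("\\", "\\\\");
    }

    // Regex replacement: every backslash is escaped once for the regex or
    // replacement syntax and once more for the string literal.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\", "\\\\\\\\");
    }

    // Regex replacement with the escaping delegated to the library.
    static String withQuoting(String s) {
        return s.replaceAll(Pattern.quote("\\"), Matcher.quoteReplacement("\\\\"));
    }

    public static void main(String[] args) {
        String input = "\\something\\";              // the text \something\
        System.out.println(withReplace(input));      // prints \\something\\
        System.out.println(withReplaceAll(input));   // prints \\something\\
        System.out.println(withQuoting(input));      // prints \\something\\
    }
}
```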
To avoid this sort of trouble, you can use replace (which takes a plain string) instead of replaceAll (which takes a regular expression). You will still need to escape backslashes, but not in the wild ways required with regular expressions.
You'll need to escape the (escaped) backslash in the first argument, as it is a regular expression. The replacement (2nd argument, see Matcher#replaceAll(String)) also gives backslashes a special meaning, so you'll have to escape those too:
theString.replaceAll("\\\\", "\\\\\\\\");
Yes... by the time the regex compiler sees the pattern you've given it, it sees only a single backslash (since Java's lexer has turned the double backwhack into a single one). You need to write the pattern as "\\\\" and the replacement as "\\\\\\\\", believe it or not! Java really needs a good raw string syntax.

Regex to allow only Numbers, alphabets, spaces and hyphens - Java

Need to allow user to enter only Numbers or alphabets or spaces or hyphens OR combination of any of the above.
and i tried the following
String regex = "/^[0-9A-Za-z\s\-]+$/";
sampleString.matches(regex);
but it is not working properly. would somebody help me to fix please.
Issue: your regex is trying to match a literal / symbol at the beginning and at the end.
In Java there is no need for / before and after the regex (Java is not JavaScript), so use
"^[0-9A-Za-z\\s-]+$"
^[0-9A-Za-z\\s-]+$ : ^ beginning of match
[0-9A-Za-z\\s-]+ : one or more letters, digits, whitespace characters and -
$ : end of match
You are close but need to make two changes.
The first is to double-escape (i.e., use \\ instead of \). This is due to the weirdness of Java (see the section "Backslashes, escapes, and quoting" in Javadoc for the Pattern class). The second thing is to drop the explicit reference to the start and end of the string. That's going to be implied when using matches(). So the correct Java code is
String regex = "[0-9A-Za-z\\s\\-]+";
sampleString.matches(regex);
While that will work, you can also replace the "0-9" reference with \d and drop the escaping of the "-". That gives you
String regex = "[\\dA-Za-z\\s-]+";
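A small sketch of how the last pattern behaves with String.matches, which has to match the whole input. The class name AllowedChars and the sample inputs are assumptions made for illustration.

```java
final class AllowedChars {
    // matches() must cover the entire input, so ^ and $ are not needed.
    static boolean isValid(String s) {
        return s.matches("[\\dA-Za-z\\s-]+");
    }

    public static void main(String[] args) {
        System.out.println(isValid("Room 12-B")); // true: letters, digits, space, hyphen
        System.out.println(isValid("a_b"));       // false: underscore is not allowed
        System.out.println(isValid(""));          // false: + requires at least one character
    }
}
```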

Replace word using java regex but not quotes

I want to replace a word in a sentence using java regex replace.
test string is a_b a__b a_bced adbe a_bc_d 'abcd' ''abcd''
I want to replace all the words which start with a and end with d.
I'm using String.replaceAll("(?i)\\ba[a-zA-Z0-9_.]*d\\b","temp").
It replaces the string as a_b a__b temp adbe a_bc_d 'temp' ''temp''
What should my regex be if I don't want to consider the strings in quotes?
I used String.replaceAll("[^'](?i)\\ba[a-zA-Z0-9_.]*d\\b[^']","temp")
It replaces the string as a_b a__btempadbe temp'abcd' ''abcd''.
It removes the spaces next to that word.
Is there any way to replace only the words that are not inside quotes?
PS: there is a workaround for this: String.replaceAll("[^'](?i)\\ba[a-zA-Z0-9_.]*d\\b[^']"," temp "). But it fails in some cases.
Thanks in advance!
You can use lookaround assertions:
string = string.replaceAll("(?i)(?<!')\\ba[a-zA-Z0-9_.]*d\\b(?!')", "temp");
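A minimal sketch of the lookaround version on a small input. The class name, method name and test sentence are made up here; the regex is the one from the answer.

```java
final class LookaroundReplace {
    // (?<!') rejects a word directly preceded by a quote,
    // (?!') rejects a word directly followed by one.
    static String replaceUnquoted(String s) {
        return s.replaceAll("(?i)(?<!')\\ba[a-zA-Z0-9_.]*d\\b(?!')", "temp");
    }

    public static void main(String[] args) {
        // prints: temp 'abcd' temp
        System.out.println(replaceUnquoted("abcd 'abcd' a_bced"));
    }
}
```

Because the lookarounds are zero-width, the characters around the word are checked but not consumed, so the surrounding spaces survive the replacement.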
Testing whether there is a quote before and after the target is the wrong approach, because you can't know if that quote is an opening quote or a closing quote. (Try adding a quote at the start of your test string and testing a naive pattern and you will see: 'inside'a_outside_d'inside'.)
The only way to know if something is inside or outside quotes is to check the string from the beginning (or from the end, but that's less handy and potentially error-prone if quotes aren't balanced). To do that, you must describe all possible substrings before the target, for example:
\G([^a']*+(?:'[^']*'[^a']*|\Ba+[^a']*|a(?!\w*d\b)[^a']*)*+)\ba\w*d\b
details:
\G # matches the start of the string or the position after the previous match
(
[^a']*+ # all that isn't an "a" or a quote
(?:
'[^']*' [^a']* # content between quotes
|
\Ba+ [^a']* # "a" not at the start of a word
|
a(?!\w*d\b) [^a']* # "a" at the start of a word that doesn't end with "d"
)*+
) # all that can be before the target in a capture group
\ba\w*d\b # the target
Don't forget to escape backslashes in the java string: \ => \\.
To perform the replacement, you need to refer to the capture group 1:
$1temp
Note: to handle escaped quotes between quotes, change '[^']*' to: '[^\\']*+(?s:\\.[^\\']*)*+'.
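The \G-based pattern can be tried out as follows. This is a sketch under assumptions: the class name, method name and sample strings are hypothetical, and the pattern is the basic version from the answer (without the escaped-quote extension and without a case-insensitive flag).

```java
final class QuoteAwareReplace {
    // Group 1 captures everything that may precede the target
    // (unquoted text and complete quoted sections); it is put back via $1.
    private static final String REGEX =
        "\\G([^a']*+(?:'[^']*'[^a']*|\\Ba+[^a']*|a(?!\\w*d\\b)[^a']*)*+)\\ba\\w*d\\b";

    static String replaceOutsideQuotes(String s) {
        return s.replaceAll(REGEX, "$1temp");
    }

    public static void main(String[] args) {
        // prints: temp 'abcd' temp
        System.out.println(replaceOutsideQuotes("abcd 'abcd' a_bced"));
        // prints: 'inside'temp'inside'
        System.out.println(replaceOutsideQuotes("'inside'a_outside_d'inside'"));
    }
}
```

The second call is the case where a plain quote-before/quote-after check gives the wrong answer: the word touches quotes on both sides but is actually outside them.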

Multiline RegEx in Java

(My programming question may seem somewhat devious, but I see no other solution.)
A text is written in the editor of Eclipse. By activating a self-made Table view plugin for Eclipse, the text quality is checked automatically by an activated Python script (not editable by me) that receives the editor text. The editor text is stripped from space characters (\n, \t) except the normal space (' '), because otherwise the sentences cannot be QA checked. When the script is done, it returns the incorrect sentences to the table.
It is possible to click on the sentences in the table, and the plugin will search (row-per-row) in the active editor for the clicked sentence. This works for single-line sentences. However, the multiline sentences cannot be found in the active editor, because all the \n and \t are missing in the compiled sentence.
To overcome this problem, I changed the script so it takes the complete editor text as one string. I tried the following:
String newSentence = tableSentence.replaceAll(" ", "\\s+");
Pattern p = Pattern.compile(newSentence);
Matcher contentMatcher = p.matcher(editorContent); // editorContent is a string
if (contentMatcher.find()) {
// Get index offset of string and length of string
}
By changing all spaces into \s+, I hoped to get the match. However, this does not work because it will look like the following:
editorContent: The\nright\n\ttasks.
tableSentence: The right tasks.
NewSentence: Thes+rights+tasks. // After the 'replaceAll' action
Should be: The\s+right\s+tasks.
So, my question is: how can I adjust the input for the compiler?
I am inexperienced when it comes to Java, so I do not see how to change this.. And I unfortunately cannot change the Python script to also return the full sentences...
Add a third and fourth backslash to the replacement string, so it looks like this: \\\\s+.
Java doesn't have raw (or verbatim) strings, so each backslash has to be escaped in the string literal; the replacement syntax of replaceAll then treats the resulting double backslash as one literal backslash. This fixes the problem of getting s+ in place of your spaces.
When you type the replacement in code it goes like this:
\\\\s+
| # Compile time
V
\\s+
| # replacement-string parsing
V
\s+ # actual text inserted
Updated my answer according to @nhahtdh's comment (fixed number of backslashes)
You need to use "\\\\s+" instead of "\\s+", since \ is the escape character in the regex replacement string syntax. To specify a literal \ in the replacement text, you need to write \\ in the replacement string, and that doubles up to "\\\\" since \ requires escaping in Java string literal.
Note that \ just happens to be used as the escape character in regex replacement string syntax in Java. Other languages, such as JavaScript, use $ to escape $, so \ doesn't need to be escaped in JavaScript's regex replacement string.
If you are replacing a match with literal text, you can use Matcher.quoteReplacement to avoid dealing with the escaping in regex replacement string:
String newSentence = tableSentence.replaceAll(" ", Matcher.quoteReplacement("\\s+"));
In this case, since you are searching for a plain string and replacing it with another plain string, you can use String.replace instead, which does normal string replacement:
String newSentence = tableSentence.replace(" ", "\\s+");
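Put together, the lookup from the question might be sketched as below. The class and method names are hypothetical, and the sketch assumes, as the original code does, that the sentence contains no other regex metacharacters that need escaping (the . in the example is left as a wildcard).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SentenceLocator {
    // Plain-text replace: the literal "\\s+" is the four characters \s+ .
    static String viaReplace(String tableSentence) {
        return tableSentence.replace(" ", "\\s+");
    }

    // Same result through replaceAll, with the backslash escaped for the replacement syntax.
    static String viaReplaceAll(String tableSentence) {
        return tableSentence.replaceAll(" ", "\\\\s+");
    }

    // Same result with the replacement escaping delegated to quoteReplacement.
    static String viaQuoteReplacement(String tableSentence) {
        return tableSentence.replaceAll(" ", Matcher.quoteReplacement("\\s+"));
    }

    // Returns {offset, length} of the first match, or null if the sentence is not found.
    static int[] locate(String editorContent, String tableSentence) {
        Matcher m = Pattern.compile(viaReplace(tableSentence)).matcher(editorContent);
        return m.find() ? new int[] { m.start(), m.end() - m.start() } : null;
    }

    public static void main(String[] args) {
        int[] hit = locate("The\nright\n\ttasks.", "The right tasks.");
        System.out.println(hit[0] + " " + hit[1]); // prints: 0 17
    }
}
```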

How to match ^(\d+) in a particular text using regex

For example I have text like below :
case1:
(1) Hello, how are you?
case2:
Hi. (1) How're you doing?
Now I want to match the text which starts with (\d+).
I have tried the following regex but nothing is working.
^[\(\d+\)], ^\(\d+\).
[] defines a character class: it matches a single character out of those listed inside the brackets, so [\(\d+\)] matches one character that is (, a digit, + or ), not the sequence (1).
The second regexp will work: ^\(\d+\), so check your code.
Check also so there's no space in front of the first parenthesis, or add \s* in front.
EDIT: Also, java can be tricky with escapes depending on if the regexp you type is directly translated to a regexp or is first a string literal. You may need to double escape your escapes.
In Java you have to escape the parentheses, so "\\(\\d+\\)" should match (1) in both case1 and case2. Adding ^ as you did, "^\\(\\d+\\)" will match only case1.
You have to use double backslashes within a Java string. Consider this:
"\n" give you [line break]
"\\n" give you [backslash][n]
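A short sketch of the two patterns from this answer applied to both cases. The class and method names are assumptions; find() is used so the pattern may match anywhere unless it is anchored with ^.

```java
import java.util.regex.Pattern;

final class ParenNumber {
    private static final Pattern ANYWHERE = Pattern.compile("\\(\\d+\\)");
    private static final Pattern AT_START = Pattern.compile("^\\(\\d+\\)");

    // True if "(digits)" occurs anywhere in the text.
    static boolean containsMarker(String s) {
        return ANYWHERE.matcher(s).find();
    }

    // True only if the text begins with "(digits)".
    static boolean startsWithMarker(String s) {
        return AT_START.matcher(s).find();
    }

    public static void main(String[] args) {
        String case1 = "(1) Hello, how are you?";
        String case2 = "Hi. (1) How're you doing?";
        System.out.println(containsMarker(case1) + " " + startsWithMarker(case1)); // true true
        System.out.println(containsMarker(case2) + " " + startsWithMarker(case2)); // true false
    }
}
```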
I believe Java's Regex Engine supports Positive Lookbehind, in which case you can use the following regex:
(?<=[(][0-9]{1,9999}[)]\s?)\b.*$
Which matches:
The literal text (
Any digit [0-9], between 1 and 9999 times {1,9999}
The literal text )
A space, between 0 and 1 times \s?
A word boundary \b
Any character, between 0 and unlimited times .*
The end of a string $
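As a sketch of how this lookbehind could be used from Java: the class and method names below are hypothetical, and the pattern is the one from the answer with its backslashes doubled for the string literal. Java accepts this lookbehind because its length is bounded ({1,9999} rather than +).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AfterMarker {
    private static final Pattern AFTER_MARKER =
        Pattern.compile("(?<=[(][0-9]{1,9999}[)]\\s?)\\b.*$");

    // Returns the text that follows the first "(digits)" marker, or null if there is none.
    static String textAfterMarker(String s) {
        Matcher m = AFTER_MARKER.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(textAfterMarker("(1) Hello, how are you?"));   // Hello, how are you?
        System.out.println(textAfterMarker("Hi. (1) How're you doing?")); // How're you doing?
    }
}
```

Note that this returns the text after the marker in both cases; unlike the ^-anchored pattern, it does not distinguish a marker at the start of the text from one in the middle.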
