Replace word using java regex but not quotes

I want to replace a word in a sentence using Java's regex replace.
The test string is: a_b a__b a_bced adbe a_bc_d 'abcd' ''abcd''
I want to replace all the words that start with a and end with d.
I'm using String.replaceAll("(?i)\\ba[a-zA-Z0-9_.]*d\\b","temp").
It replaces the string as: a_b a__b temp adbe a_bc_d 'temp' ''temp''
What should my regex be if I don't want to consider the strings in quotes?
I tried String.replaceAll("[^'](?i)\\ba[a-zA-Z0-9_.]*d\\b[^']","temp").
It replaces the string as: a_b a__btempadbe temp'abcd' ''abcd''
It also removes the spaces around the replaced word.
Is there any way to replace only the words that are not inside quotes?
PS: There is a workaround, String.replaceAll("[^'](?i)\\ba[a-zA-Z0-9_.]*d\\b[^']"," temp "), but it fails in some cases.
Thanks in advance!

You can use lookaround assertions:
string = string.replaceAll("(?i)(?<!')\\ba[a-zA-Z0-9_.]*d\\b(?!')", "temp");
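As a quick check of the lookaround pattern above, here is a minimal sketch. The class name, method name, and sample inputs are illustrative assumptions, not part of the original answer.

```java
public class LookaroundReplaceDemo {
    // Replaces words that start with 'a' and end with 'd', unless a quote
    // sits directly before or directly after the word.
    static String replaceUnquoted(String input) {
        return input.replaceAll("(?i)(?<!')\\ba[a-zA-Z0-9_.]*d\\b(?!')", "temp");
    }

    public static void main(String[] args) {
        System.out.println(replaceUnquoted("abcd 'abcd' a_bced")); // temp 'abcd' temp
        System.out.println(replaceUnquoted("'x abcd y'"));         // 'x temp y'
    }
}
```

The second call shows the limit of this approach: only the characters adjacent to the word are inspected, so a word in the middle of a quoted phrase is still replaced.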

Testing whether there is a quote before and after the target is the wrong approach, because you can't know whether a given quote is an opening quote or a closing quote. (Try adding a quote at the start of your test string and testing a naive pattern; you will see: 'inside'a_outside_d'inside'.)
The only way to know whether something is inside or outside quotes is to scan the string from the beginning (or from the end, but that is less handy and error-prone if quotes aren't balanced). To do that, you must describe all possible substrings that can precede the target, for example:
\G([^a']*+(?:'[^']*'[^a']*|\Ba+[^a']*|a(?!\w*d\b)[^a']*)*+)\ba\w*d\b
details:
\G # matches the start of the string or the position after the previous match
(
[^a']*+ # all that isn't an "a" or a quote
(?:
'[^']*' [^a']* # content between quotes
|
\Ba+ [^a']* # "a" not at the start of a word
|
a(?!\w*d\b) [^a']* # "a" at the start of a word that doesn't end with "d"
)*+
) # all that can be before the target in a capture group
\ba\w*d\b # the target
Don't forget to escape the backslashes in the Java string: \ => \\.
To perform the replacement, you need to refer to capture group 1:
$1temp
Note: to handle escaped quotes between quotes, change '[^']*' to: '[^\\']*+(?s:\\.[^\\']*)*+'.
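Put together in Java, the pattern and the $1temp replacement might look like the sketch below. The class and method names are hypothetical, and the pattern is the case-sensitive one from the answer with its backslashes doubled for the string literal.

```java
public class OutsideQuotesReplaceDemo {
    // Group 1 captures everything that may precede the target, including
    // whole quoted runs; the target word is matched right after the group.
    private static final String PATTERN =
        "\\G([^a']*+(?:'[^']*'[^a']*|\\Ba+[^a']*|a(?!\\w*d\\b)[^a']*)*+)\\ba\\w*d\\b";

    static String replaceOutsideQuotes(String input) {
        // "$1" puts the skipped prefix back, so only the target is replaced.
        return input.replaceAll(PATTERN, "$1temp");
    }

    public static void main(String[] args) {
        System.out.println(replaceOutsideQuotes("'inside'a_outside_d'inside'"));
        System.out.println(replaceOutsideQuotes("abcd 'abcd' axd"));
    }
}
```

Because every match is anchored with \G, matching stops at the first position where no further target can be reached, and the rest of the string is copied through unchanged.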

Related

How do I match escaped characters in groups in Java RegEx

I've recently been working on a command-line project in Java and I need to parse commands. But I'm having trouble matching this particular command.
15.00|GR,LQ,MD "Uber"
where the amount can be an integer or have a two-digit decimal fraction. I need to collect all the information in groups. "Uber" is an optional description.
Here is what I have tried:
Pattern.compile("ˆ([\\d]+(\\.[\\d]{2})?\\|([A-Z]{2}){1})(,[A-Z]{2})*\\s(\\\".+\\\")?$");
What I expect is to get the number, the two-character user codes, and optionally the description too.

Your regex analyzed:
"ˆ([\\d]+(\\.[\\d]{2})?\\|([A-Z]{2}){1})(,[A-Z]{2})*\\s(\\\".+\\\")?$"
First, let's un-escape the Java string literal into the actual regex string:
ˆ([\d]+(\.[\d]{2})?\|([A-Z]{2}){1})(,[A-Z]{2})*\s(\".+\")?$
Now let's split it apart:
ˆ                  Incorrect character 'ˆ', should be '^'
                   Match start of input, but your input starts with '['
(
  [\d]+            The '[]' is superfluous, use '\d+'
  (\.[\d]{2})?     Don't capture this, use '(?:X)?'
  \|
  ([A-Z]{2}){1}    The '{1}' is superfluous, and don't capture just this
)                  You're capturing too much. Move back to before '\|'
(,[A-Z]{2})*       Will only capture last ',XX'.
                   Use a capture group around all the letters, then split that on ','
\s
(\".+\")?          No need to escape '"', and only capture the content
$                  Match end of input, but your input ends with ']'
So, cleaned up it will be:
^\[
(
\d+
(?:\.[\d]{2})?
)
\|
(
[A-Z]{2}
(?:,[A-Z]{2})*
)
\s
(?:"(.+)")?
\]$
Joined back together:
^\[(\d+(?:\.[\d]{2})?)\|([A-Z]{2}(?:,[A-Z]{2})*)\s(?:"(.+)")?\]$
With input [15.00|GR,LQ,MD "Uber"] that will capture:
15.00 - The full number
GR,LQ,MD - Use split(",") to get array { "GR", "LQ", "MD" }
Uber - Just the text without the quotes
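A minimal sketch of using the joined pattern from Java follows. The class name, the parse helper, and the choice to throw on a non-matching command are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommandParseDemo {
    // The joined regex from above, with backslashes and quotes escaped for Java.
    private static final Pattern COMMAND = Pattern.compile(
        "^\\[(\\d+(?:\\.[\\d]{2})?)\\|([A-Z]{2}(?:,[A-Z]{2})*)\\s(?:\"(.+)\")?\\]$");

    // Returns {amount, codes, description}; description is null when absent.
    static String[] parse(String command) {
        Matcher m = COMMAND.matcher(command);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized command: " + command);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("[15.00|GR,LQ,MD \"Uber\"]");
        System.out.println(parts[0]);                               // 15.00
        System.out.println(Arrays.toString(parts[1].split(",")));   // [GR, LQ, MD]
        System.out.println(parts[2]);                               // Uber
    }
}
```

Note that \s is mandatory in this pattern, so a command without a description still needs the space before the closing bracket.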

The first character is a ˆ and not a ^. Besides that, you should change your first group to ([\d]+(\.[\d]{2})?) to get only 15.00 and not 15.00|GR.
The full example would look like this:
Pattern.compile("^([\\d]+(\\.[\\d]{2})?)\\|(([A-Z]{2})(,[A-Z]{2})*)\\s(\".+\")?$");

There are two main issues:
The ˆ character is a circumflex accent, not a ^ caret.
You're not including the square brackets in the regex.
A possible solution could look like this:
Pattern.compile("^\\[(?<number>[\\d]+(?>\\.[\\d]{2})?)\\|(?<codes>(?>[A-Z]{2},?)+)(?>\\s\\\"(?<comment>.+)\\\")?\\]$");
This solution also uses named capturing groups, which make it easier to specify which group you want to read a value from. https://regex101.com/r/HEboNf/2
All three of the two-letter codes are captured in a single group; you can split them on the comma in your code.
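Reading the named groups back could look like this sketch. The class name and the field helper are hypothetical; the pattern is the one from this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupDemo {
    private static final Pattern COMMAND = Pattern.compile(
        "^\\[(?<number>[\\d]+(?>\\.[\\d]{2})?)\\|(?<codes>(?>[A-Z]{2},?)+)"
        + "(?>\\s\\\"(?<comment>.+)\\\")?\\]$");

    // Returns the text of the named group, or null if the command does not
    // match or the group did not take part in the match.
    static String field(String command, String groupName) {
        Matcher m = COMMAND.matcher(command);
        return m.matches() ? m.group(groupName) : null;
    }

    public static void main(String[] args) {
        String command = "[15.00|GR,LQ,MD \"Uber\"]";
        System.out.println(field(command, "number"));
        System.out.println(String.join(" ", field(command, "codes").split(",")));
        System.out.println(field(command, "comment"));
    }
}
```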

Find and replace whole line based on regex java

I have this string
Chest pain\tab \tab 72%\tab 0%\tab 67%
}d \ql \li0\ri0\nowidctlpar\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\tx10800\tx11520\tx12240\tx12960\faauto\rin0\lin0\itap0 {\insrsid14762702
}d \ql \li0\ri0\nowidctlpar\tx720\tx1440\tx2160\tx2880\tx3600\tx4320\tx5040\tx5760\tx6480\tx7200\tx7920\tx8640\tx9360\tx10080\faauto\rin0\lin0\itap0 {\b\f1\fs24\ul\insrsid14762702 Waveform}{\insrsid14762702
}{\insrsid14762702 {\*\shppict{\pict{\*\picprop\shplid1025{\sp{\sn shapeType}{\sv 75}}{\sp{\sn fFlipH}{\sv 0}}{\sp{\sn fFlipV}{\sv 0}}{\sp{\sn fLine}{\sv 0}}{\sp{\sn fLayoutInCell}{\sv 1}}}
\
I want to get rid of all lines containing }d \ql.
I have tried
String v= u.replace("}d \\ql(\\.*)","");
but it doesn't detect the line. Having tested it, the culprit must be the .* part, but I don't know how to express it in String.replace.

replace doesn't use regex syntax; replaceAll does. This means that with replace, \\.* is treated as literal text: a backslash, a dot, and an asterisk.
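The difference between the two methods is easy to see on a small input. This is a minimal sketch; the class and method names are made up for the example.

```java
public class ReplaceVsReplaceAllDemo {
    // String.replace treats its first argument as literal text.
    static String literalDot(String s) {
        return s.replace(".", "-");
    }

    // String.replaceAll treats its first argument as a regular expression,
    // where an unescaped dot matches any character.
    static String regexDot(String s) {
        return s.replaceAll(".", "-");
    }

    public static void main(String[] args) {
        System.out.println(literalDot("a.b")); // a-b
        System.out.println(regexDot("a.b"));   // ---
    }
}
```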
So your first solution could look like this (notice that to create a literal \ in a regex you need to escape it twice: once in the regex, as \\, and once in the string literal, as "\\\\"):
String v = u.replaceAll("\\}d \\\\ql.*","");
But a possible problem here is that we don't require } to be at the start of the line. We also leave behind any leading whitespace that sits right before } on that line.
To solve this, we can add ^\s* at the start of the regex and make ^ match the start of each line (with the MULTILINE flag, which we can enable with (?m)).
So now our solution could look like:
String v= u.replaceAll("(?m)^\\s*\\}d \\\\ql.*","");
But there is another problem: . can't match line separators, so .* will not include them in the match, which prevents us from removing them.
So we should include them in the match explicitly (and make them optional with the ? quantifier, in case the line you want to match is the last one and has no line separator after it). Since Java 8 we can do this with \R, which matches any line break sequence (including paragraph separators); if you want to limit yourself to \r and \n (or can't use Java 8), you can use something like (\r?\n|\r).
So our final solution can look like this:
in Java 8
String v = u.replaceAll("(?m)^\\s*\\}d \\\\ql.*\\R?","");
pre Java 8
String v = u.replaceAll("(?m)^\\s*\\}d \\\\ql.*(\r?\n|\r)?","");
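To see the Java 8 version at work, here is a small sketch on a shortened input. The class name, method name, and sample text are assumptions; the sample is a trimmed stand-in for the RTF in the question.

```java
public class RemoveRtfLinesDemo {
    // Removes every line whose first non-whitespace text is "}d \ql",
    // together with its trailing line break when there is one (Java 8+).
    static String removeParagraphLines(String rtf) {
        return rtf.replaceAll("(?m)^\\s*\\}d \\\\ql.*\\R?", "");
    }

    public static void main(String[] args) {
        String rtf = "Chest pain\\tab 72%\n"
                   + "}d \\ql \\li0\\ri0\n"
                   + "  }d \\ql \\tx720 Waveform\n"
                   + "}{\\insrsid14762702";
        System.out.println(removeParagraphLines(rtf));
    }
}
```

One thing to keep in mind: \s also matches line breaks, so blank lines directly above a removed line are consumed by ^\s* as well.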

How to match ^(d+) in a particular text using regex

For example I have text like below :
case1:
(1) Hello, how are you?
case2:
Hi. (1) How're you doing?
Now I want to match only text that starts with (\d+).
I have tried the following regexes, but neither works:
^[\(\d+\)] and ^\(\d+\)

[] defines a character class: it matches any single one of the characters listed inside the brackets, and can be followed by a quantifier. So ^[\(\d+\)] matches just one character: a parenthesis, a digit, or a +.
The second regexp will work: ^\(\d+\), so check your code.
Also check that there's no space in front of the first parenthesis, or add \s* in front.
EDIT: Also, Java can be tricky with escapes, depending on whether the regexp you type is used directly as a regexp or is first a string literal. You may need to double your backslashes.
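In Java source the pattern has to be written with doubled backslashes. The sketch below, with hypothetical class and method names, checks both sample lines with and without the ^ anchor.

```java
import java.util.regex.Pattern;

public class LeadingNumberDemo {
    // "^\\(\\d+\\)" in Java source is the regex ^\(\d+\)
    private static final Pattern ANCHORED = Pattern.compile("^\\(\\d+\\)");
    private static final Pattern ANYWHERE = Pattern.compile("\\(\\d+\\)");

    static boolean startsWithNumber(String text) {
        return ANCHORED.matcher(text).find();
    }

    static boolean containsNumber(String text) {
        return ANYWHERE.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(startsWithNumber("(1) Hello, how are you?"));   // true
        System.out.println(startsWithNumber("Hi. (1) How're you doing?")); // false
        System.out.println(containsNumber("Hi. (1) How're you doing?"));   // true
    }
}
```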

In Java you have to escape parentheses, so "\\(\\d+\\)" should match (1) in both case 1 and case 2. Adding ^ as you did, "^\\(\\d+\\)", will match only case 1.
You have to use double backslashes within a Java string. Consider this:
"\n" gives you [line break]
"\\n" gives you [backslash][n]

If you are going to downvote my post, at least comment to tell me WHY it's not useful.
I believe Java's Regex Engine supports Positive Lookbehind, in which case you can use the following regex:
(?<=[(][0-9]{1,9999}[)]\s?)\b.*$
Which matches:
The literal text (
Any digit [0-9], between 1 and 9999 times {1,9999}
The literal text )
A space, between 0 and 1 times \s?
A word boundary \b
Any character, between 0 and unlimited times .*
The end of a string $
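A sketch of how this lookbehind could be used from Java to pull out the text after the number follows. The class and method names are illustrative, and the backslashes are doubled for the string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TextAfterNumberDemo {
    private static final Pattern AFTER_NUMBER =
        Pattern.compile("(?<=[(][0-9]{1,9999}[)]\\s?)\\b.*$");

    // Returns the text following the first "(digits)" marker, or null if
    // the line contains no such marker.
    static String textAfterNumber(String line) {
        Matcher m = AFTER_NUMBER.matcher(line);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(textAfterNumber("(1) Hello, how are you?"));
        System.out.println(textAfterNumber("Hi. (1) How're you doing?"));
    }
}
```

Since the pattern has no ^ anchor, it finds the marker in both sample lines rather than only at the start of the text.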

Match exact word using regex in java

I need a regular expression to match an exact word.
For example:
There is a string "draft guidance allerg Excellence" and I want to search for allerg, so I wrote \ballerg\b. It gives me an exact match. But when I pass the string "draft guidance 12=allerg Excellence", it also returns true, which is wrong.
Which regular expression do I need to match only exact words?

The \b boundary would normally handle this situation, even in your case of "draft guidance 12=allerg Excellence"; however, you're saying that the = is part of the word (in normal English, this is not the case).
I'm assuming then that by "whole word", you mean a word that is surrounded by a space or normal sentence punctuation. In this case, a regex such as the following should work:
(?:^|[\s\.;\?\!,])allerg(?:$|[\s\.;\?\!,])
You can, obviously, add or remove characters as needed.
Regex Explained:
(?: # non-capturing group
^ # beginning of string
| [\s\.;\?\!,] # OR a space, period, and other misc. punctuation
)
allerg # string to match
(?: # non-capturing group
$ # end of string
| [\s\.;\?\!,] # OR a space, period, and other misc. punctuation
)
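The two inputs from the question can be compared against both patterns with a sketch like this one. The class and method names are assumptions.

```java
import java.util.regex.Pattern;

public class DelimitedWordDemo {
    // The delimiter-based pattern from above, escaped for a Java string.
    private static final Pattern DELIMITED =
        Pattern.compile("(?:^|[\\s\\.;\\?\\!,])allerg(?:$|[\\s\\.;\\?\\!,])");
    // The plain word-boundary pattern from the question, for comparison.
    private static final Pattern WORD_BOUNDARY = Pattern.compile("\\ballerg\\b");

    static boolean hasDelimitedWord(String text) {
        return DELIMITED.matcher(text).find();
    }

    static boolean hasBoundedWord(String text) {
        return WORD_BOUNDARY.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDelimitedWord("draft guidance allerg Excellence"));    // true
        System.out.println(hasDelimitedWord("draft guidance 12=allerg Excellence")); // false
        System.out.println(hasBoundedWord("draft guidance 12=allerg Excellence"));   // true
    }
}
```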

If I understood the question correctly, you want to match the word "allerg". A word is surrounded by whitespace characters, and "=allerg" has the "=" character, which you don't want to match.
To match the word "allerg" you can use the following regex:
\s+allerg\s+

java regex string split by " not \"

I need to write a simple Java program to convert MySQL INSERT lines into CSV files (each MySQL table becomes one CSV file).
Is regex the best solution for this in Java?
My main problem is how to correctly match a value like this: 'this is \'cool\'...'
(how to ignore the escaped ')
Example:
INSERT INTO `table1` VALUES ('this is \'cool\'...' ,'some2');
INSERT INTO `table1` (`field1`,`field2`) VALUES ('this is \'cool\'...' ,'some2');
Thanks

Assuming that your SQL statements are syntactically valid, you could use
Pattern regex = Pattern.compile("'(?:\\\\.|[^'\\\\])*'");
to get a regex that matches all single-quoted strings, ignoring escaped characters inside them.
Explanation without all those extra backslashes:
' # Match '
(?: # Either match...
\\. # an escaped character
| # or
[^'\\] # any character except ' or \
)* # any number of times.
' # Match '
Given the string
'this', 'is a \' valid', 'string\\', 'even \\\' with', 'escaped quotes.\\\''
this matches
'this'
'is a \' valid'
'string\\'
'even \\\' with'
'escaped quotes.\\\''
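Applied to one of the INSERT lines from the question, the pattern can be used as in the sketch below. The class name and the quotedValues helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedValueDemo {
    private static final Pattern QUOTED = Pattern.compile("'(?:\\\\.|[^'\\\\])*'");

    // Collects every single-quoted string, quotes and escapes included.
    static List<String> quotedValues(String sql) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(sql);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO `table1` VALUES ('this is \\'cool\\'...' ,'some2');";
        for (String value : quotedValues(sql)) {
            System.out.println(value);
        }
    }
}
```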

You can match the characters within non-escaped quotes by using this regex:
(?<!\\)'(.*?)(?<!\\)'
This uses a negative lookbehind to assert that the character before each quote is not a backslash.
In Java, you have to double-escape (once for the String, once for the regex), so it looks like:
String regex = "(?<!\\\\)'(.*?)(?<!\\\\)'";
If you are working on Linux, I would use sed to do all the work.

Four backslashes (two to represent a backslash) plus dot. "'(\\\\.|.)*'"

Although regexes give you a very powerful mechanism to parse text, I think you might be better off with a non-regex parser. I think your code will be easier to write, easier to understand, and have fewer bugs.
Something like:
find "INSERT INTO"
find table name
find column names
find "VALUES"
find value set (loop this part)
Writing the regex to do all of the above, with optional column values and an optional number of value sets is non-trivial and error-prone.
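As a rough illustration of the "find value set" step without regex, here is a hand-written scanner. It is only a sketch under simplifying assumptions: it handles single-quoted values with backslash escapes, ignores unquoted values such as numbers or NULL, and locates the value list with a plain search for VALUES. The class and method names are invented.

```java
import java.util.ArrayList;
import java.util.List;

public class InsertValueScanner {
    // Returns the unescaped contents of each single-quoted value that
    // follows the VALUES keyword in one INSERT statement.
    static List<String> values(String insert) {
        List<String> out = new ArrayList<>();
        int start = insert.indexOf("VALUES");
        if (start < 0) {
            return out;
        }
        StringBuilder current = null; // non-null while inside a quoted value
        for (int i = start + "VALUES".length(); i < insert.length(); i++) {
            char c = insert.charAt(i);
            if (current == null) {
                if (c == '\'') {
                    current = new StringBuilder(); // opening quote
                }
            } else if (c == '\\' && i + 1 < insert.length()) {
                current.append(insert.charAt(++i)); // keep the escaped character
            } else if (c == '\'') {
                out.add(current.toString()); // closing quote
                current = null;
            } else {
                current.append(c);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(values(
            "INSERT INTO `table1` (`field1`,`field2`) VALUES ('this is \\'cool\\'...' ,'some2');"));
    }
}
```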

You have to use \\\\. In a Java string literal, \\ is a single \, because the backslash introduces escape sequences for whitespace and control characters (\n, \t, ...). In a regex, a literal backslash must also be escaped, as \\, which is why four are needed in the Java source.
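The two levels of escaping can be checked with a very small sketch. The class name, method name, and the sample path are illustrative.

```java
public class BackslashEscapeDemo {
    // "\\\\" in Java source is the two-character regex \\ which matches
    // one literal backslash in the input.
    static String backslashesToSlashes(String path) {
        return path.replaceAll("\\\\", "/");
    }

    public static void main(String[] args) {
        System.out.println("\\\\".length());                     // 2
        System.out.println(backslashesToSlashes("C:\\temp\\x")); // C:/temp/x
    }
}
```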
