Is it possible to replace a lone single quote ' without affecting runs of multiple single quotes (e.g. ''')?
For example, I want to replace ' with ''':
GIVEN --> EXPECT
-------------------------------------------
"text" --> "text"
"long'text" --> "long'''text"
"long'long''text" --> "long'''long''text"
"long'long'''text" --> "long'''long'''text"
"long'long'text" --> "long'''long'''text"
Thanks in advance
For matching use this look-around based regex:
(?<!')'(?!')
and replace it by:
'''
(?<!')'(?!') matches a single quote only if it is neither preceded nor followed by another single quote.
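In case it helps, here is a minimal Java sketch of the same look-around replacement. The class and method names (QuoteTripler, tripleLoneQuotes) are made up for illustration; only the pattern and the replacement come from the answer above.

```java
import java.util.regex.Pattern;

final class QuoteTripler {
    // A single quote that is neither preceded nor followed by another single quote.
    private static final Pattern LONE_QUOTE = Pattern.compile("(?<!')'(?!')");

    // Hypothetical helper: replaces every lone ' with ''' and leaves runs such as '' or ''' alone.
    static String tripleLoneQuotes(String input) {
        return LONE_QUOTE.matcher(input).replaceAll("'''");
    }

    public static void main(String[] args) {
        // Rows taken from the GIVEN --> EXPECT table in the question.
        System.out.println(tripleLoneQuotes("long'text"));
        System.out.println(tripleLoneQuotes("long'long''text"));
    }
}
```

The replacement string contains no $ or \ characters, so it needs no extra quoting for replaceAll().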
I am trying to figure out how to escape single quotes (') inside a locator while writing XPath.
For example:
//affiliation-summary[#ng-if='oaProfileStatus.oa.workflowState === 'PENDING_CONFIRMATION'']
What I am trying to do is escape the ' before P and the ' after N in PENDING_CONFIRMATION, so that XPath doesn't treat them as the start and end of the string.
To be foolproof, in case what I am trying to achieve is still not clear, this is what we do in Java:
System.out.println("\"Hello");
to keep the " from terminating the String, so that "Hello is printed.
How can I achieve this in Xpath?
In XPath 1.0 you can put ' in a string by using " as the string delimiter or vice versa. When the XPath is nested in an XML document (e.g. XSLT, XSD, or XProc), you may also need to escape this as an XML character reference, or when XPath is nested in a Java/C# string you may need to escape it as \' or \"
In XPath 2.0 there is an additional option: for a string delimited by single quotes you can include a doubled single quote ('Don''t do it!'),
and for a string delimited by double quotes you can include a doubled double quote ("Don't say ""I can't""").
A useful trick for XPath-within-XSLT, especially with 1.0, is to use variables, for example
<xsl:variable name="quot">"</xsl:variable>
<xsl:variable name="apos">'</xsl:variable>
<xsl:variable name="sentence"
select="concat('Don', $apos, 't say ', $quot, 'I can', $apos, 't', $quot)"/>
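When the XPath is built in Java rather than XSLT, the same delimiter-switching and concat() ideas can be wrapped in a small helper. This is only a sketch: XPathQuoting, toXPathLiteral and hasItemWithState are made-up names, the item/state document is a toy stand-in for the real page, and it assumes the XPath 1.0 engine that ships with the JDK (javax.xml.xpath).

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

final class XPathQuoting {
    // Turns an arbitrary Java string into an XPath 1.0 string literal.
    static String toXPathLiteral(String value) {
        if (!value.contains("'")) {
            return "'" + value + "'";          // no apostrophe: single-quote delimiters are safe
        }
        if (!value.contains("\"")) {
            return "\"" + value + "\"";        // apostrophes only: switch to double-quote delimiters
        }
        // Both kinds of quote: split on ' and glue the pieces back together with concat().
        StringBuilder sb = new StringBuilder("concat(");
        String[] parts = value.split("'", -1);
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", \"'\", ");        // the apostrophe itself, delimited by double quotes
            }
            sb.append('\'').append(parts[i]).append('\'');
        }
        return sb.append(')').toString();
    }

    // Toy usage: is there an <item> whose state attribute equals the given text?
    static boolean hasItemWithState(String xml, String state) {
        String expr = "//item[@state=" + toXPathLiteral(state) + "]";
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(
                    expr, new InputSource(new StringReader(xml)), XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Bad XPath: " + expr, e);
        }
    }
}
```

For a value such as oaProfileStatus.oa.workflowState === 'PENDING_CONFIRMATION', the helper simply returns the value wrapped in double quotes, which is the XPath 1.0 approach described above.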
I am trying to transform the following string:
<img src="image.jpg" ... />
with this one
<img src="cid:image" ... />
the "image" string needs to be maintained but the string itself could be different. In the html document there are different img tags each one with a different image file.
so for instance if I have:
<img src="mylogo.jpg" ... />
it should transform to:
<img src="cid:mylogo" ... />
The images could be jpg or gif.
Thanks for any help,
Note:
As mentioned in the comments, regex is not the right tool for parsing HTML, and Java has many HTML parsers you could use instead (take a look at jsoup, for example). That said, here is a solution that uses regex, as you asked.
Solution:
You can use the following Regex:
src=\"([\\:\\w\\s\\/]+)\\.\\w{3}\"
This is the code you need:
String html = "<img src=\"folder1/mylogo.jpg\" ... />";
Pattern pattern = Pattern.compile("src=\"([\\:\\w\\s\\/]+)\\.\\w{3}\"");
Matcher matcher = pattern.matcher(html);
while (matcher.find()) {
System.out.println("group 1: " + matcher.group(1));
//This line will give you the wanted output.
System.out.println("src=\"cid:"+matcher.group(1)+"\"");
System.out.println("Final Result: "+html.replaceAll("src=\"([\\:\\w\\s\\/]+)\\.\\w{3}\"", "src=\"cid:$1\""));
}
Explanation:
src= matches the characters src= literally.
\" matches the character " literally.
([\\:\\w\\s\\/]+) is a capturing group that matches the wanted text: one or more colons, word characters, whitespace characters or slashes.
\. matches the character . literally.
\w{3} matches exactly three word characters [a-zA-Z0-9_] for the extension; you can use jpg|gif instead if you do not want to accept any other image extensions.
\" matches the character " literally
EDIT:
Desired output:
To replace this expression with the wanted result, just pass this regex to the replaceAll() method on your HTML, as follows:
html.replaceAll("src=\"([\\:\\w\\s\\/]+)\\.\\w{3}\"", "src=\"cid:$1\"");
We use $1 to point to the first capturing group.
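If only jpg and gif should be rewritten, a narrower variant could look like the sketch below. Note that this pattern is not the one from the answer: it assumes the src value is double-quoted and takes everything up to the final .jpg or .gif as the name. CidRewriter and toCid are illustrative names.

```java
import java.util.regex.Pattern;

final class CidRewriter {
    // Group 1 captures the src value without its .jpg/.gif extension.
    private static final Pattern IMG_SRC =
            Pattern.compile("src=\"([^\"]+)\\.(?:jpg|gif)\"");

    // Rewrites src="name.jpg" to src="cid:name"; $1 refers to the first capturing group.
    static String toCid(String html) {
        return IMG_SRC.matcher(html).replaceAll("src=\"cid:$1\"");
    }

    public static void main(String[] args) {
        System.out.println(toCid("<img src=\"mylogo.jpg\" alt=\"\" />"));
    }
}
```

Compiling the Pattern once and reusing it avoids recompiling the regex on every call, which String.replaceAll() would do.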
I'm writing a regular expression to replace all occurrences of the substring ">(Some Text)</A> with .html">(Some Text)</A> (case insensitive) in an HTML document.
However, it does not appear to produce the intended replacement on the outputted page.
Pattern fixRest = Pattern.compile("(\">.*?</a>)", Pattern.CASE_INSENSITIVE);
Matcher mh2 = fixRest.matcher(pgText);
mh2.replaceAll(".html$1");
When I view the outputted page, it appears that there are plenty of href links that are not suffixed with a .html by this code.
Is there something wrong with my Regex? Running it under RegexBuddy I see it producing the results I expect for the same page that is in the variable pgText.
mh2.replaceAll(".html$1");
doesn't modify pgText in place; it returns the replaced text as a new String. Try using the result, as in
pgText = mh2.replaceAll(".html$1");
In general though, don't use regular expressions to parse HTML.
Here's a sampling of the ways this can fail:
<a href='...'>foo</a> <!-- single quotes -->
<a href=...>foo</a> <!-- no quotes -->
<a href="..." title="...">foo</a> <!-- the href isn't the last attribute. -->
<a href="..."><img src="...">foo</a> <!-- tag inside link -->
<a href="..." >foo</a> <!-- space between attribute and end -->
<a href="...">"y">"x"</a> <!-- text node contains '>' -->
I'm sure you can think of many more.
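To make the first point concrete, here is a small sketch that captures the String returned by Matcher.replaceAll(). The pattern and replacement are the ones from the question; HtmlSuffixDemo and addHtmlSuffix are made-up names, and the HTML caveats above still apply.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HtmlSuffixDemo {
    private static final Pattern LINK_TAIL =
            Pattern.compile("(\">.*?</a>)", Pattern.CASE_INSENSITIVE);

    // Strings are immutable, so the caller must use the returned value.
    static String addHtmlSuffix(String pgText) {
        Matcher mh2 = LINK_TAIL.matcher(pgText);
        return mh2.replaceAll(".html$1");   // returns a new String; pgText itself is unchanged
    }

    public static void main(String[] args) {
        String pgText = "<a href=\"page\">Some Text</A>";
        pgText = addHtmlSuffix(pgText);     // reassign, otherwise the result is thrown away
        System.out.println(pgText);
    }
}
```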
I have below content in text file
some texting content <img src="cid:part123" alt=""> <b> Test</b>
I read it from the file and store it in a String, i.e. inputString.
expectedString = inputString.replaceAll("\\<img.*?cid:part123.*?>",
"NewContent");
I get expected output i.e
some texting content NewContent <b> Test</b>
However, if there is an end-of-line character between img and src, it does not work. For example:
<img
src="cid:part123" alt="">
Is there a way to make the regex ignore end-of-line characters in between while matching?
If you want your dot (.) to match newlines as well, you can use the Pattern.DOTALL flag. Alternatively, in the case of String.replaceAll(), you can add (?s) at the start of the pattern, which is equivalent to this flag.
From the Pattern.DOTALL Javadoc:
Dotall mode can also be enabled via the embedded flag expression (?s).
(The s is a mnemonic for "single-line" mode, which is what this is
called in Perl.)
So, you can modify your pattern like this:
expectedStr = inputString.replaceAll("(?s)<img.*?cid:part123.*?>", "Content");
NOTE: You don't need to escape the angle bracket (<).
By default, the . character will not match newline characters. You can enable this behavior by specifying the Pattern.DOTALL flag. In String.replaceAll(), you do this by attaching a (?s) to the front of your pattern:
expectedString = inputString.replaceAll("(?s)\\<img.*?cid:part123.*?>",
"NewContent");
See also Pattern.DOTALL with String.replaceAll
You need to use Pattern.DOTALL mode.
replaceAll() doesn't take mode flags as a separate argument, but you can enable them in the expression as follows:
expectedString = inputString.replaceAll("(?s)\\<img.*?cid:part123.*?>", ...);
Note, however, that it's not a good idea to parse HTML with regular expressions. It would be better to use an HTML parser instead.
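For completeness, a short sketch showing both spellings side by side: the embedded (?s) flag with String.replaceAll(), and an explicitly compiled Pattern with Pattern.DOTALL. DotallDemo and its method names are illustrative only; the pattern and the sample text are from the question.

```java
import java.util.regex.Pattern;

final class DotallDemo {
    // Same pattern as in the question, compiled with the DOTALL flag instead of (?s).
    private static final Pattern IMG_TAG =
            Pattern.compile("<img.*?cid:part123.*?>", Pattern.DOTALL);

    // Variant 1: embedded flag, usable directly with String.replaceAll().
    static String replaceWithEmbeddedFlag(String input) {
        return input.replaceAll("(?s)<img.*?cid:part123.*?>", "NewContent");
    }

    // Variant 2: precompiled Pattern with Pattern.DOTALL.
    static String replaceWithFlag(String input) {
        return IMG_TAG.matcher(input).replaceAll("NewContent");
    }

    public static void main(String[] args) {
        String input = "some texting content <img\nsrc=\"cid:part123\" alt=\"\"> <b> Test</b>";
        System.out.println(replaceWithEmbeddedFlag(input));
        System.out.println(replaceWithFlag(input));
    }
}
```

Both variants should behave the same; the precompiled Pattern is mainly worth it when the replacement runs many times.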
I want to create an XML file which will be used to store the structure of a Java program. I am able to successfully parse the Java program and create the tags as required. The problem arises when I try to include the source code inside my tags, since Java source code may contain a large number of reserved characters such as &, < and >. As a result, I am not able to create valid XML.
My XML should go like this:
<?xml version="1.0"?>
<prg name="prg_name">
<class name= "class_name">
<parent>parent class</parent>
<interface>Interface name</interface>
.
.
.
<method name= "method_name">
<statement>the ordinary java statement</statement>
<if condition="Conditional Expression">
<statement> true statements </statement>
</if>
<else>
<statement> false statements </statement>
</else>
<statement> usual control statements </statement>
.
.
.
</method>
</class>
.
.
.
</prg>
The problem is that the conditional expressions of if and other statements contain a lot of & and other reserved symbols, which prevent the XML from validating. Since all this data (source code) is given by the user, I have little control over it. Escaping the characters will be very costly in terms of time.
I can use CDATA to escape the element text, but it cannot be used for attribute values containing conditional expressions. I am using the ANTLR Java grammar to parse the Java program and to get the attributes and content for the tags. So is there any other workaround for it?
You will have to escape
" to &quot;
' to &apos;
< to &lt;
> to &gt;
& to &amp;
for XML.
In XML attributes you must escape
" with &quot;
< with &lt;
& with &amp;
if you wrap attribute values in double quotes ("), e.g.
<MyTag attr="If a&lt;b &amp; b&lt;c then a&lt;c, it's obvious"/>
meaning tag MyTag with attribute attr with the text If a<b & b<c then a<c, it's obvious. Note that there is no need to use &apos; to escape the ' character.
If you wrap attribute values in single quotes (') then you should escape these characters:
' with &apos;
< with &lt;
& with &amp;
and you can write " as is.
Escaping > with &gt; in attribute text is not required, e.g. <a b=">"/> is well-formed XML.
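As a rough sketch of the double-quoted case, the escaping can be done in one pass over the string, so the cost grows only linearly with the length of the source text. XmlAttrEscaper and escapeForDoubleQuotedAttr are made-up names, and the sketch deliberately ignores other concerns such as control characters or attribute whitespace normalization.

```java
final class XmlAttrEscaper {
    // Escapes the three characters that must not appear raw inside a
    // double-quoted XML attribute value: &, < and ".
    static String escapeForDoubleQuotedAttr(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;");  break;
                case '<': sb.append("&lt;");   break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);        // ' and > can stay as they are
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String condition = "a < b && name.equals(\"x\")";
        System.out.println("<if condition=\"" + escapeForDoubleQuotedAttr(condition) + "\">");
    }
}
```

If the XML is written through a standard API (for example javax.xml.stream.XMLStreamWriter), that API normally does this escaping for you, which may be simpler than building the markup by hand.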