How to ignore ' ' in Xpath? - java

I am trying to figure out how to ignore ' ' in the locator while writing xpath.
For example:
//affiliation-summary[@ng-if='oaProfileStatus.oa.workflowState === 'PENDING_CONFIRMATION'']
What I am trying to do is ignore the ' before P and the ' after N in PENDING_CONFIRMATION so that XPath doesn't treat them as the start and end of the string literal.
To be foolproof, in case what I am trying to achieve is not clear yet, this is what we do in Java:
System.out.println("\"Hello");
to stop " from terminating the String, so that "Hello is printed.
How can I achieve this in Xpath?

In XPath 1.0 you can put ' in a string by using " as the string delimiter, or vice versa. When the XPath is nested in an XML document (e.g. XSLT, XSD, or XProc), you may also need to escape the quote as an XML entity reference (&apos; or &quot;), and when the XPath is nested in a Java/C# string you may need to escape it as \' or \".
In XPath 2.0 there is an additional option: for a string delimited by single quotes you can include a doubled single quote ('Don''t do it!'),
and for a string delimited by double quotes you can include a doubled double quote ("Don't say ""I can't""").
A useful trick for XPath-within-XSLT, especially with 1.0, is to use variables, for example:
<xsl:variable name="quot">"</xsl:variable>
<xsl:variable name="apos">'</xsl:variable>
<xsl:variable name="sentence"
select="concat('Don', $apos, 't say ', $quot, 'I can', $apos, 't', $quot)"/>
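For the Java side of the question, the XPath 1.0 rules above can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the class and method names (XPathLiterals, toXPathLiteral) are hypothetical. It assumes XPath 1.0, where a literal cannot contain its own delimiter, so a value holding both kinds of quote is assembled with concat().

```java
final class XPathLiterals {

    // Hypothetical helper: turns arbitrary text into an XPath 1.0 string expression.
    static String toXPathLiteral(String value) {
        if (!value.contains("'")) {
            return "'" + value + "'";          // no apostrophe: single-quote it
        }
        if (!value.contains("\"")) {
            return "\"" + value + "\"";        // no double quote: double-quote it
        }
        // Both kinds of quote present: split on ' and rejoin with concat(),
        // passing each apostrophe as the double-quoted literal "'".
        String[] parts = value.split("'", -1);
        StringBuilder sb = new StringBuilder("concat(");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", \"'\", ");
            }
            sb.append('\'').append(parts[i]).append('\'');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        String condition = "oaProfileStatus.oa.workflowState === 'PENDING_CONFIRMATION'";
        // Builds the locator from the question with a safely quoted attribute value.
        System.out.println("//affiliation-summary[@ng-if=" + toXPathLiteral(condition) + "]");
    }
}
```

Because the attribute value in the question contains only single quotes, the helper simply wraps it in double quotes; the concat() branch is only reached when both quote characters occur in the same value.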

Related

Java - Replace single quote without affecting multi single quotes

Is it possible to replace a single quote ' without affecting runs of multiple single quotes (e.g. ''')?
For example, I want to replace ' with ''':
GIVEN --> EXPECT
-------------------------------------------
"text" --> "text"
"long'text" --> "long'''text"
"long'long''text" --> "long'''long''text"
"long'long'''text" --> "long'''long'''text"
"long'long'text" --> "long'''long'''text"
Thanks in advance
To match, use this lookaround-based regex:
(?<!')'(?!')
and replace it by:
'''
(?<!')'(?!') matches a single quote only if it is neither preceded nor followed by another single quote.
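In Java, the lookaround regex above can be applied as in the following sketch. The class and method names (LoneQuotes, tripleLoneQuotes) are made up for illustration; only the pattern and the replacement come from the answer.

```java
import java.util.regex.Pattern;

final class LoneQuotes {

    // A quote that is neither preceded nor followed by another quote.
    private static final Pattern LONE_QUOTE = Pattern.compile("(?<!')'(?!')");

    // Replaces every isolated ' with ''' and leaves runs of quotes untouched.
    static String tripleLoneQuotes(String input) {
        return LONE_QUOTE.matcher(input).replaceAll("'''");
    }

    public static void main(String[] args) {
        String[] samples = {
            "text", "long'text", "long'long''text", "long'long'''text", "long'long'text"
        };
        for (String s : samples) {
            System.out.println(s + " --> " + tripleLoneQuotes(s));
        }
    }
}
```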

concatenating strings and variables

I have problems concatenating strings and variables. I tried adding quotes and slashes and moving them back and forth, but I wasn't able to find a solution.
I have a class that writes a div. I wrote this:
String var = "width:100px";
String div ="<div class=\"divClass\" style="+var+">";
The code I wrote gives me
<div class="divClass" style=width:100px>
But in order to write good code I would need this
<div class="divClass" style="width:100px">
with the value of style between double quotes.
You need to escape the " symbol:
String var = "\"width:100px\"";
String div ="<div class=\"divClass\" style="+var+">";
Then div would be
<div class="divClass" style="width:100px">
The reason we need to do this is that we need to tell the compiler that the quotes symbol " is a part of the String and we are not closing the String literal yet.
Example
System.out.println("hello"); => hello
System.out.println("\"hello\""); => "hello"
When the compiler sees \" it reads the \ and knows that the next character, i.e. ", must be taken literally as part of the String rather than as the end of the literal.
Try
String var = "\"width:100px\"";
as you will need to escape your quotes.
Just try it like this:
String var = "width:100px";
String div ="<div class=\"divClass\" style=\""+var+"\">";
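As an alternative to manual concatenation, the same markup can be produced with String.format, which keeps the escaped quotes in one template. This is only a sketch; the class and method names (DivBuilder, openDiv) are hypothetical.

```java
final class DivBuilder {

    // Builds the opening tag; %s is replaced by the style value,
    // and each \" in the template becomes a literal double quote.
    static String openDiv(String style) {
        return String.format("<div class=\"divClass\" style=\"%s\">", style);
    }

    public static void main(String[] args) {
        System.out.println(openDiv("width:100px"));
    }
}
```

This keeps the quoting in the template rather than in the variable, so the style value itself stays a plain width:100px string.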

Ignoring the line break in regex?

I have the following content in a text file:
some texting content <img src="cid:part123" alt=""> <b> Test</b>
I read it from the file and store it in a String, i.e. inputString:
expectedString = inputString.replaceAll("\\<img.*?cid:part123.*?>",
"NewContent");
I get the expected output, i.e.
some texting content NewContent <b> Test</b>
However, if there is an end-of-line character between img and src, as in the example below, it does not work:
<img
src="cid:part123" alt="">
Is there a way to make the regex ignore end-of-line characters while matching?
If you want your dot (.) to match newlines as well, you can use the Pattern.DOTALL flag. Alternatively, in the case of String.replaceAll(), you can add (?s) at the start of the pattern, which is equivalent to this flag.
From the Pattern.DOTALL Javadoc:
Dotall mode can also be enabled via the embedded flag expression (?s).
(The s is a mnemonic for "single-line" mode, which is what this is
called in Perl.)
So, you can modify your pattern like this:
expectedStr = inputString.replaceAll("(?s)<img.*?cid:part123.*?>", "Content");
NOTE: You don't need to escape the angle bracket (<).
By default, the . character will not match newline characters. You can enable this behavior by specifying the Pattern.DOTALL flag. In String.replaceAll(), you do this by attaching a (?s) to the front of your pattern:
expectedString = inputString.replaceAll("(?s)\\<img.*?cid:part123.*?>",
"NewContent");
See also Pattern.DOTALL with String.replaceAll
You need to use Pattern.DOTALL mode.
replaceAll() doesn't take mode flags as a separate argument, but you can enable them in the expression as follows:
expectedString = inputString.replaceAll("(?s)\\<img.*?cid:part123.*?>", ...);
Note, however, that it's not a good idea to parse HTML with regular expressions. It would be better to use an HTML parser instead.
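The two ways of enabling dotall mode mentioned in the answers can be compared in a short sketch. The class and method names (ImgReplacer, replaceImg) are illustrative only; the pattern is the one from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ImgReplacer {

    // Same pattern as in the question, compiled with DOTALL so '.' also matches '\n'.
    private static final Pattern IMG =
            Pattern.compile("<img.*?cid:part123.*?>", Pattern.DOTALL);

    static String replaceImg(String input, String replacement) {
        // quoteReplacement keeps any '$' or '\' in the replacement literal.
        return IMG.matcher(input).replaceAll(Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String input = "some texting content <img\nsrc=\"cid:part123\" alt=\"\"> <b> Test</b>";
        // Precompiled pattern with the DOTALL flag.
        System.out.println(replaceImg(input, "NewContent"));
        // Equivalent inline form using the embedded (?s) flag.
        System.out.println(input.replaceAll("(?s)<img.*?cid:part123.*?>", "NewContent"));
    }
}
```

One thing to keep in mind with dotall mode is that .*? can now run across lines, so a pattern like this may span from an earlier <img tag to a later cid:part123 if the first tag does not contain it.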

JSP Text Processing with Regex

I have a large number (>1500) of JSP files that I am trying to convert to JSPX. I am using a tool that will parse well-formed JSPs and convert to JSPX, however, my JSPs are not all well-formed :)
My solution is to pre-process the JSPs and convert untidy code so the tool will parse them correctly. The main problem I am trying to resolve is that of unquoted attribute values. Examples:
<INPUT id="foo" size=1>
<input id=body size="2">
My current regex for finding these is (in Java string format):
"(\\w+)=([^\"' >]+)"
And my replacement string is (in Java string format):
"$1=\"$2\""
This works well, EXCEPT for a few patterns, both of which involve inline scriptlets. For example:
<INPUT id=foo value="<%= someBean.method("a=b") %>">
In this case, my pattern matches the string literal "a=b", which I don't want to do. What I'd like to have happen is that the regex would IGNORE anything between <% and %>. Is there a regular expression that will do what I am trying to do?
EDIT:
Changed the title to clarify that I am NOT trying to parse HTML / JSP with regexes... I am doing a simple syntactic transformation to prepare the input for parsing.
If the input can contain an arbitrary number of matching delimiters, such as nested double quotes, it belongs to a context-free language, which cannot be parsed with regular expressions designed to handle regular languages.
Either you make simplifying assumptions (e.g. there are no unmatched double quotes and only a bounded number of them) that permit the use of a regex, or you need to think about using (or creating) a lexer/parser for a context-free language. ANTLR is a good tool for this.
Based on the assumption that there are NO unquoted attribute values inside the scriptlets, the following construct might work for you:
Note: this approach is fragile. Just for your reference.
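import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class test {
    public static void main(String[] args) {
        // Sample tag mixing quoted, unquoted, and scriptlet-valued attributes.
        String s = "<INPUT id=foo abbr='ip ' name = bar color =\"blue\" value=\" <%= someBean.method(\" a = b \") %>\" nickname =box >";
        // Attribute name, '=', then word characters followed by one character
        // that is neither a quote nor whitespace.
        Pattern p = Pattern.compile("(\\w+)\\s*=\\s*(\\w+[^\"'\\s])");
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println("Return Value:" + m.group(1) + "=" + m.group(2));
        }
    }
}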
Output:
Return Value:id=foo
Return Value:name=bar
Return Value:nickname=box
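Another way to make the original pattern skip scriptlets is to match a whole <% ... %> block as the first alternative and copy it back unchanged, quoting only when the attribute groups took part in the match. The sketch below is not from the answers above; the names (AttributeQuoter, quoteAttributes) are hypothetical, and it assumes scriptlets are not nested and never contain %> inside a string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AttributeQuoter {

    // Alternative 1 consumes a scriptlet as a whole; alternative 2 is the
    // pattern from the question for unquoted attribute values.
    private static final Pattern TOKEN =
            Pattern.compile("(?s)<%.*?%>|(\\w+)=([^\"' >]+)");

    static String quoteAttributes(String jsp) {
        Matcher m = TOKEN.matcher(jsp);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = (m.group(1) == null)
                    ? m.group()                                   // scriptlet: keep as is
                    : m.group(1) + "=\"" + m.group(2) + "\"";     // attribute: add quotes
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(quoteAttributes(
                "<INPUT id=foo value=\"<%= someBean.method(\"a=b\") %>\">"));
    }
}
```

Because the scanner consumes the scriptlet in one match, the a=b inside it is never offered to the attribute alternative. The approach is still a text-level transformation and shares the fragility noted above.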

How do I include &, <, > etc in XML attribute values

I want to create an XML file which will be used to store the structure of a Java program. I am able to successfully parse the Java program and create the tags as required. The problem arises when I try to include the source code inside my tags, since Java source code may contain a large number of reserved characters such as &, < and >. I am not able to create valid XML.
My XML should go like this:
<?xml version="1.0"?>
<prg name="prg_name">
  <class name="class_name">
    <parent>parent class</parent>
    <interface>Interface name</interface>
    .
    .
    .
    <method name="method_name">
      <statement>the ordinary java statement</statement>
      <if condition="Conditional Expression">
        <statement> true statements </statement>
      </if>
      <else>
        <statement> false statements </statement>
      </else>
      <statement> usual control statements </statement>
      .
      .
      .
    </method>
  </class>
  .
  .
  .
</prg>
The problem is that the conditional expressions of if and other statements contain many & and other reserved symbols, which prevents the XML from being validated. Since all of this data (source code) is supplied by the user, I have little control over it. Escaping the characters will be very costly in terms of time.
I can use CDATA to escape element text, but it cannot be used for attribute values containing conditional expressions. I am using the ANTLR Java grammar to parse the Java program and to obtain the attributes and content for the tags. Is there any other workaround?
You will have to escape
" to &quot;
' to &apos;
< to &lt;
> to &gt;
& to &amp;
for XML.
In XML attributes you must escape
" with &quot;
< with &lt;
& with &amp;
if you wrap attribute values in double quotes ("), e.g.
<MyTag attr="If a&lt;b &amp; b&lt;c then a&lt;c, it's obvious"/>
meaning a tag MyTag with an attribute attr whose text is If a<b & b<c then a<c, it's obvious. Note that there is no need to use &apos; to escape the ' character.
If you wrap attribute values in single quotes (') then you should escape these characters:
' with &apos;
< with &lt;
& with &amp;
and you can write " as is.
Escaping > as &gt; in attribute text is not required, e.g. <a b=">"/> is well-formed XML.
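The rules for double-quoted attributes can be applied in a single pass over the text, which keeps the cost linear in the length of the source snippet. The following is a minimal sketch with hypothetical names (XmlAttributes, escapeForDoubleQuotedAttribute); it handles only the three characters listed above and assumes the input contains no characters that are illegal in XML, such as most control characters.

```java
final class XmlAttributes {

    // Escapes text for use inside an attribute value delimited by double quotes.
    static String escapeForDoubleQuotedAttribute(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;");  break;
                case '<': sb.append("&lt;");   break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(c);        // ' and > may stay as they are
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String condition = "a < b && s.equals(\"x\")";
        System.out.println("<if condition=\""
                + escapeForDoubleQuotedAttribute(condition) + "\">");
    }
}
```

Since each character is inspected once and appended once, there is no risk of escaping an & that was itself produced by an earlier replacement, which can happen with chained String.replace calls in the wrong order.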
