Escape XML Characters in Attribute Values in Java

I have an XML document represented as a String. I need to replace all the special characters in the attribute values with their escape sequences (entity references).
For example, I want to convert the first line below into the second:
<r1 c1=\"01\" c168=\"<A_ATTR><Updates A_VALUE="959" /><Current A_VALUE="100" /></A_ATTR>\"/>
<r1 c1=\"01\" c168=\"&lt;A_ATTR&gt;&lt;Updates A_VALUE=&quot;959&quot; /&gt;&lt;Current A_VALUE=&quot;100&quot; /&gt;&lt;/A_ATTR&gt;\"/>
This question is similar to the one below, but I need to escape the attribute values. Please advise.
Escape xml characters within nodes of string xml in java

Use String.replace to replace each special character with its entity encoding. For example, if s holds the attribute value:
s = s.replace("<", "&lt;");
s = s.replace(">", "&gt;");
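Applied to the whole document, the two replacements above would also escape the r1 element's own tags, and they leave & and " untouched even though an unescaped " ends the attribute early. A minimal sketch, assuming the attribute value has already been isolated as a String (XmlAttrEscape and escapeAttr are hypothetical names), that escapes all five predefined XML entities:

```java
public class XmlAttrEscape {

    // Escapes a raw string so it can be placed inside a quoted XML attribute value.
    // '&' is replaced first; otherwise the '&' in the entities produced by the
    // later replacements would be escaped a second time.
    public static String escapeAttr(String value) {
        return value.replace("&", "&amp;")
                    .replace("<", "&lt;")
                    .replace(">", "&gt;")
                    .replace("\"", "&quot;")
                    .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        String inner = "<A_ATTR><Updates A_VALUE=\"959\" /><Current A_VALUE=\"100\" /></A_ATTR>";
        // Place the escaped value inside c168="..."
        String xml = "<r1 c1=\"01\" c168=\"" + escapeAttr(inner) + "\"/>";
        System.out.println(xml);
    }
}
```

With the sample value, the printed element should match the escaped form in the question. The order matters: escaping & last would turn the &lt; produced earlier into &amp;lt;.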

Related

read unique char: 'あ' from json file in java

I am reading a JSON file in Java using this code:
String data = Files.readString(jsonFile) // java.nio.file.Files (Java 11+); jsonFile is a Path
.trim()
.replaceAll("[^\\x00-\\x7F]", "")
.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "")
.replaceAll("\\p{C}", "");
My JSON file contains the character 'あ' (code point 12354), but it disappears (becomes "") when the file is read.
How can I make this character show up in my variable data?
From the answers I've received, I understand that replaceAll("[^\\x00-\\x7F]", "") strips every non-ASCII character from the data. But what can I do if I want to remove all non-ASCII characters except 'あ'?
The character you want is the Unicode character HIRAGANA LETTER A, code point U+3042.
You can simply add it to the list of valid characters:
...
.replaceAll("[^\\x00-\\x7F\\u3042]", "")
...
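To check the filter in isolation, here is a minimal sketch that applies the extended character class to an in-memory string (KeepHiragana and clean are hypothetical names). It assumes the file itself was decoded correctly, for example as UTF-8; if 'あ' is already garbled when it is read, no regex will restore it.

```java
public class KeepHiragana {

    // Removes every character outside U+0000..U+007F except HIRAGANA LETTER A (U+3042).
    // java.util.regex understands the escaped code point inside the character class.
    public static String clean(String data) {
        return data.replaceAll("[^\\x00-\\x7F\\u3042]", "");
    }

    public static void main(String[] args) {
        // HIRAGANA A, e-acute and CJK "one", written as Unicode escapes
        String sample = "{\"name\": \"\u3042\u00e9\u4e00\"}";
        System.out.println(clean(sample));
    }
}
```

Only the HIRAGANA A survives inside the quotes; the other two non-ASCII characters are removed.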

How to Parse a Java String containing HTML element as JsonObject?

I have a Java String with the following value, received from an HTTP request:
{SubRefNumber:"3243 ",QBType:"-----",Question:"<p><img title="format.jpg" src="..."></img></p>"};
Because the String contains HTML elements, parsing it as a JSONObject as shown below (quesRow is the variable holding the String above)
JSONObject jsonObject = new JSONObject(quesRow);
fails with a parse error:
org.codehaus.jettison.json.JSONException: Expected a ',' or '}' at character 103 of {SubRefNumber:"3243.....
I need to extract the HTML in the Question key as separate data from this JSON string. Is there any way to handle this scenario? Please guide me. Thanks in advance.
Valid JSON cannot contain an unescaped quotation mark (") inside a string (see RFC 7159, Section 7: https://www.rfc-editor.org/rfc/rfc7159#page-9).
There are two ways to escape the quotation marks in your source string; do it when you build the JSON string, before it is parsed:
escape with a backslash: \"
escape as a Unicode sequence: \u0022
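As a minimal sketch of both options, the hypothetical helper below escapes a raw value by hand before it is embedded between quotes; in real code a JSON library's own quoting or serialization is usually the safer choice. It also escapes backslashes and control characters, which RFC 7159 requires inside strings as well.

```java
public class JsonEscape {

    // Hypothetical helper: escapes a raw value for use inside a JSON string.
    // useUnicode selects the backslash-u-0022 form for quotation marks
    // instead of backslash-quote.
    public static String escape(String raw, boolean useUnicode) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (char c : raw.toCharArray()) {
            if (c == '"') {
                sb.append(useUnicode ? "\\u0022" : "\\\"");
            } else if (c == '\\') {
                sb.append("\\\\");
            } else if (c < 0x20) {
                // Control characters U+0000..U+001F must be escaped too
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String question = "<p><img title=\"format.jpg\" src=\"...\"></img></p>";
        String json = "{\"SubRefNumber\":\"3243 \",\"Question\":\"" + escape(question, false) + "\"}";
        System.out.println(json);
    }
}
```

The resulting string keeps the HTML intact as the value of Question, so a JSON parser should read it as one string.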

What should the regular expression be to replace only two quotes?

<element>
<Argument Name="AWSAccessKeyId" Value="APOEIUVWIE8E78E6"></Argument>
<Argument Name="SearchIndex" Value="Apparel"></Argument>
<LegalDisclaimer>Leriya Fashion products,on amazon.in."Leriya Fashion" in search.</LegalDisclaimer>
</element>
I want to replace only the quotes around this word ("Leriya Fashion"). I have tried many regular expressions, but they replace all the quotes. Right now we know this word, but what if we don't know the actual word? Patterns I have tried:
"|[a-z]$$|"$
"|"+\s
I want to replace the quotes with nothing or with a space. The real problem occurs when we convert this XML to JSON: the JSON treats the double-quoted text as a value, but it is not a value, just a name that happens to be in double quotes. Replacing these quotes in the JSON is very difficult, which is why I'm trying to replace them in the XML file.
If it's only "Leriya Fashion", why not just use String::replace?
str = str.replace("\"Leriya Fashion\"", "Leriya Fashion");
I'm assuming you just want to remove the quotes.
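A more general option is Matcher.replaceAll with a lambda (Java 9+):
// needs java.util.regex.Matcher and java.util.regex.Pattern
str = Pattern.compile("(.)\"([^\"\r\n]*)\"").matcher(str)
        .replaceAll(mr -> Matcher.quoteReplacement(
                mr.group(1).equals("=") ? mr.group()        // attribute value: keep
                        : mr.group(1) + mr.group(2)));      // text: drop the quotes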
This removes the quotes from every "..." span that is not an attribute value (="...").
The dangerous assumption is that quotes only appear in pairs.
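A self-contained sketch of the lambda approach, run on a shortened copy of the XML from the question (RemoveTextQuotes and removeTextQuotes are hypothetical names; Matcher.replaceAll(Function) requires Java 9+). Wrapping each replacement in Matcher.quoteReplacement keeps any $ or \ in the matched text from being read as a group reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RemoveTextQuotes {

    // One character followed by a "..." span on a single line.
    // Group 1: the character before the opening quote; group 2: the quoted text.
    private static final Pattern QUOTED = Pattern.compile("(.)\"([^\"\r\n]*)\"");

    public static String removeTextQuotes(String xml) {
        return QUOTED.matcher(xml).replaceAll(mr -> Matcher.quoteReplacement(
                mr.group(1).equals("=")
                        ? mr.group()                      // attribute value: keep the quotes
                        : mr.group(1) + mr.group(2)));    // text content: drop the quotes
    }

    public static void main(String[] args) {
        String xml = "<element>\n"
                + "<Argument Name=\"SearchIndex\" Value=\"Apparel\"></Argument>\n"
                + "<LegalDisclaimer>Leriya Fashion products,on amazon.in.\"Leriya Fashion\" in search.</LegalDisclaimer>\n"
                + "</element>";
        System.out.println(removeTextQuotes(xml));
    }
}
```

The Argument line keeps its attribute quotes, while the disclaimer becomes amazon.in.Leriya Fashion in search. The pairing assumption still applies.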

Remove special character from a column in dataframe

I am trying to remove a special character (å) from a column in a dataframe.
My data looks like:
ClientID,PatientID
AR0001å,DH_HL704221157198295_91
AR00022,DH_HL704221157198295_92
My original data is approximately 8 TB, and I need to remove this special character from all of it.
Code to load data:
reader.option("header", true)
.option("sep", ",")
.option("inferSchema", false)
.option("charset", "ISO-8859-1")
.schema(schema)
.csv(path)
After loading the data into a DataFrame, df.show() displays:
+--------+--------------------+
|ClientID| PatientID|
+--------+--------------------+
|AR0001Ã¥|DH_HL704221157198...|
|AR00022 |DH_HL704221157198...|
+--------+--------------------+
Code I used to try to replace this character:
df.withColumn("ClientID", functions.regexp_replace(df.col("ClientID"), "\å", ""));
But this didn't work. If I change the charset to "UTF-8" when loading the data into the DataFrame, it works,
but I cannot find a solution that works with the current charset (ISO-8859-1).
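Some things to note:
withColumn returns a new DataFrame, so assign the result to a variable and use that afterwards.
You do not need to escape "å" with \ (in Java source, "\å" is not a valid escape sequence and does not compile).
The column name in the command should be ClientID or PatientID.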
If you have done all of these things, I suggest matching on the characters you want to keep instead of on "å". For example, for the ClientID column:
df.withColumn("ClientID", functions.regexp_replace(df.col("ClientID"), "[^A-Z0-9_]", ""));
Another approach is to convert the UTF-8 character "å" into what its bytes look like when decoded as ISO-8859-1, and replace the resulting string:
String escapeChar = new String("å".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1); // java.nio.charset.StandardCharsets
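A small sketch outside Spark shows what that conversion produces (MojibakeMatch and isoView are hypothetical names): the UTF-8 bytes of "å" (0xC3 0xA5), decoded as ISO-8859-1, become the two characters "Ã¥" seen in df.show(). In the DataFrame that string would be the pattern passed to functions.regexp_replace; Ã and ¥ are not regex metacharacters, so it can be used as-is.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeMatch {

    // Encodes s as UTF-8, then decodes those bytes as ISO-8859-1,
    // reproducing how the value appears after loading with charset ISO-8859-1.
    public static String isoView(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String escapeChar = isoView("\u00e5");           // a-ring
        String clientId = "AR0001\u00c3\u00a5";           // value as loaded with ISO-8859-1
        // Plain String.replace stands in here for regexp_replace on the column
        System.out.println(clientId.replace(escapeChar, "")); // AR0001
    }
}
```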
The command below removes all special characters, keeping only upper- and lower-case ASCII letters and digits:
df.withColumn("ClientID", functions.regexp_replace(df.col("ClientID"), "[^a-zA-Z0-9]", ""));
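Because Spark's regexp_replace takes a Java regular expression, the keep-list patterns above can be tried on sample values with plain String.replaceAll before running them over the full data set. A minimal sketch (KeepListRegex and its methods are hypothetical names):

```java
public class KeepListRegex {

    // Keeps upper-case ASCII letters, digits and '_' (the ClientID pattern above)
    public static String keepUpperDigitsUnderscore(String s) {
        return s.replaceAll("[^A-Z0-9_]", "");
    }

    // Keeps ASCII letters of either case and digits; '_' is removed as well
    public static String keepLettersDigits(String s) {
        return s.replaceAll("[^a-zA-Z0-9]", "");
    }

    public static void main(String[] args) {
        String clientId = "AR0001\u00c3\u00a5";       // as loaded with ISO-8859-1
        String patientId = "DH_HL704221157198295_91";
        System.out.println(keepUpperDigitsUnderscore(clientId)); // AR0001
        System.out.println(keepLettersDigits(clientId));         // AR0001
        System.out.println(keepLettersDigits(patientId));        // DHHL70422115719829591
    }
}
```

Note that [^a-zA-Z0-9] also strips underscores, so applying it to PatientID would change its values.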

Insert " in correct form in a String

I want to put this link into a string:
String url=www.test.com;
String link=<a href=url>contact info</a>
How can I write this ?
You will need to do:
String url = "www.test.com";
Use the \ character to indicate that the next character should be treated specially: \" stands for a double-quote character rather than the end of the string.
String link = "<a href=\"" + url + "\">contact info</a>";
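A character preceded by a backslash is an escape sequence and has special meaning to the compiler. The following table shows the Java escape sequences:
Escape sequence   Meaning
\t                tab
\b                backspace
\n                newline
\r                carriage return
\f                form feed
\'                single quote character
\"                double quote character
\\                backslash character
For more information, check this link.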
First, let's assume you have:
String url = "www.test.com";
(Note the quotes around the string.)
To create your link string, you'd do this:
String link = "<a href=\"" + url + "\">contact info</a>";
// Note ---------------^^-----------^^
To put a " inside a string literal, you put a backslash in front of it. This is called "escaping" the quote.
First put the url value in quotes, then concatenate it into the link string.
String url="www.test.com";
String link="<a href=\"" + url + "\">contact info</a>";
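A minimal runnable sketch of the concatenation approach, with String.format as an equivalent alternative (LinkExample and buildLink are hypothetical names). It assumes url contains no characters that would need HTML escaping, such as " or &.

```java
public class LinkExample {

    public static String buildLink(String url) {
        // \" puts a literal double quote into the string instead of ending the literal
        return "<a href=\"" + url + "\">contact info</a>";
    }

    public static void main(String[] args) {
        String url = "www.test.com";
        System.out.println(buildLink(url)); // <a href="www.test.com">contact info</a>
        // String.format produces the same text and can be easier to read
        System.out.println(String.format("<a href=\"%s\">contact info</a>", url));
    }
}
```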
