Read unique char 'あ' from JSON file in Java

I am reading a JSON file in Java using this code:
String data = Files.readFile(jsonFile)
.trim()
.replaceAll("[^\\x00-\\x7F]", "")
.replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "")
.replaceAll("\\p{C}", "");
My JSON file contains the character 'あ' (code point 12354), but after reading the file it becomes "" (an empty string).
How can I make this char show up in my variable "data"?
From the answers I've received, I understand that replaceAll("[^\\x00-\\x7F]", "") strips every character outside the ASCII range. But what can I do if I want all non-ASCII characters removed except 'あ'?

The character you want is the Unicode character HIRAGANA LETTER A (U+3042).
You can add it to the negated character class so that it is no longer removed:
...
.replaceAll("[^\\x00-\\x7F\\u3042]", "")
...
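A minimal, self-contained sketch of this whitelist approach: it keeps the question's three-step cleanup and only adds \u3042 to the first character class. The class and method names (KeepHiragana, clean) and the sample input are made up for illustration.

```java
public class KeepHiragana {
    // Same cleanup chain as in the question, except that the first character
    // class also whitelists U+3042 (HIRAGANA LETTER A) next to the ASCII range.
    static String clean(String raw) {
        return raw.trim()
                .replaceAll("[^\\x00-\\x7F\\u3042]", "")   // drop non-ASCII except U+3042
                .replaceAll("[\\p{Cntrl}&&[^\r\n\t]]", "") // drop control chars except CR, LF, TAB
                .replaceAll("\\p{C}", "");                 // drop all "Other" category chars
    }

    public static void main(String[] args) {
        // Hypothetical input: U+3042 should survive, U+00E9 (e-acute) should not.
        String raw = "{\"kana\": \"\u3042\", \"accent\": \"\u00E9\"}";
        System.out.println(clean(raw));
    }
}
```

Note that \p{C} includes the Cc (control) category, so the third call also removes \r, \n and \t, which the second call deliberately kept. If those should survive, the third pattern needs the same exclusion.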

Related

How to Parse a Java String containing HTML element as JsonObject?

Hi I am having a Java String with following value received from HTTPRequest
{SubRefNumber:"3243 ",QBType:"-----",Question:"<p><img title="format.jpg" src="data:image/jpeg;base64,/9j/4AAQSkZJRgAB..."></img></p>"};
Because the String contains HTML elements, I get a parse error when I try to parse it as a JSONObject (quesRow is the variable holding the String above):
JSONObject jsonObject = new JSONObject(quesRow);
The error is:
org.codehaus.jettison.json.JSONException: Expected a ',' or '}' at character 103 of {SubRefNumber:"3243.....
I need to extract the HTML in the Question key as separate data from this JSON string. Is there any way to handle this? Thanks in advance.
Valid JSON cannot contain an unescaped quotation mark (") inside a string (see RFC 7159, section 7 - https://www.rfc-editor.org/rfc/rfc7159#page-9).
You need to escape the quotation marks in the HTML before it is put into the JSON string. There are two options:
escape with a backslash - \"
escape as a Unicode sequence - \u0022
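To illustrate the backslash option, here is a small JDK-only sketch that escapes a value before concatenating it into a JSON string. The class JsonQuote and helper escape are hypothetical; in practice, letting a JSON library build the object (so it escapes values itself) is usually the safer choice. The sketch also quotes the keys, which strict JSON requires.

```java
public class JsonQuote {
    // Minimal escaper for a JSON string value (RFC 7159, section 7):
    // backslash and double quote get a backslash, control characters
    // below U+0020 are written as escape sequences.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (char c : s.toCharArray()) {
            switch (c) {
                case '\\': sb.append("\\\\"); break;
                case '"':  sb.append("\\\""); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical HTML standing in for the Question value (base64 data elided).
        String html = "<p><img title=\"format.jpg\" src=\"data:image/jpeg;base64,...\"></img></p>";
        String json = "{\"SubRefNumber\":\"3243 \",\"QBType\":\"-----\",\"Question\":\""
                + escape(html) + "\"}";
        System.out.println(json);
    }
}
```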

Escape XML Characters for Attribute values Java

I have XML stored in a String. I need to replace all the special characters in the attribute values with their escaped forms.
For example,
I want to convert the first line below into the second:
<r1 c1=\"01\" c168=\"<A_ATTR><Updates A_VALUE="959" /><Current A_VALUE="100" /></A_ATTR>\"/>
<r1 c1=\"01\" c168=\"&lt;A_ATTR&gt;&lt;Updates A_VALUE=&quot;959&quot; /&gt;&lt;Current A_VALUE=&quot;100&quot; /&gt;&lt;/A_ATTR&gt;\"/>
This question is similar to the one below, but I need to escape the attribute values. Please advise.
Escape xml characters within nodes of string xml in java
Use String.replace to replace each special character with its entity reference. For example,
if your XML string is s:
s = s.replace("<", "&lt;");
s = s.replace(">", "&gt;");
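A slightly fuller sketch, assuming you already have the raw attribute value as a separate String (pulling values out of an existing XML string is better done with an XML parser). It escapes & first, so that the & introduced by the other entities is not escaped a second time, and also escapes both quote characters. The names XmlAttrEscape and escapeAttr are hypothetical.

```java
public class XmlAttrEscape {
    // Escape a string for use inside a quoted XML attribute value.
    // '&' must be handled first; otherwise "&lt;" would become "&amp;lt;".
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&apos;");
    }

    public static void main(String[] args) {
        String value = "<A_ATTR><Updates A_VALUE=\"959\" /><Current A_VALUE=\"100\" /></A_ATTR>";
        // Build the element with the escaped value in the c168 attribute.
        System.out.println("<r1 c1=\"01\" c168=\"" + escapeAttr(value) + "\"/>");
    }
}
```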

Remove special character from a column in dataframe

I am trying to remove a special character (å) from a column in a dataframe.
My data looks like:
ClientID,PatientID
AR0001å,DH_HL704221157198295_91
AR00022,DH_HL704221157198295_92
My original data is approximately 8 TB in size, and I need to remove this special character from all of it.
Code to load data:
reader.option("header", true)
.option("sep", ",")
.option("inferSchema", false)
.option("charset", "ISO-8859-1")
.schema(schema)
.csv(path)
After loading the data into a DataFrame, df.show() prints:
+--------+--------------------+
|ClientID| PatientID|
+--------+--------------------+
|AR0001Ã¥|DH_HL704221157198...|
|AR00022 |DH_HL704221157198...|
+--------+--------------------+
Code I used to try to replace this character:
df.withColumn("ClientID", functions.regexp_replace(df.col("ClientID"), "\å", ""));
But this didn't work. If I change the charset to "UTF-8" when loading the DataFrame, it works.
However, I cannot find a solution that keeps the current charset (ISO-8859-1).
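Some things to note:
Make sure to assign the result to a new variable and use that afterwards (withColumn returns a new DataFrame; it does not modify df).
You do not need to escape "å" with \ (in Java, "\å" is not a valid escape sequence and does not compile).
The column name in the command should be ClientID or PatientID.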
If you have done all of these, I would suggest matching on the characters you want to keep instead of matching on "å". For example, for the ClientID column:
df.withColumn("ClientID", functions.regexp_replace(df.col("ClientID"), "[^A-Z0-9_]", ""));
Another approach is to convert the UTF-8 character "å" to the string its bytes produce when read as ISO-8859-1, and replace that resulting string:
String escapeChar = new String("å".getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1); // java.nio.charset.StandardCharsets
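To see why this works, here is a plain-Java sketch (no Spark needed) of the byte reinterpretation, assuming the file is really UTF-8 but is being read as ISO-8859-1. "å" (U+00E5) is encoded in UTF-8 as the bytes 0xC3 0xA5, and decoding those bytes as ISO-8859-1 yields the two characters U+00C3 U+00A5 ("Ã¥"), which is what df.show() printed. The class Mojibake and helper asLatin1 are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Mojibake {
    // Encode as UTF-8, then decode those bytes as ISO-8859-1,
    // mimicking a UTF-8 file loaded with charset ISO-8859-1.
    static String asLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String garbled = asLatin1("\u00E5");          // U+00C3 U+00A5
        String loadedValue = asLatin1("AR0001\u00E5"); // ASCII part is unchanged
        // Removing the garbled pair restores the clean ID.
        System.out.println(loadedValue.replace(garbled, ""));
    }
}
```

Because neither character is a regex metacharacter, the same garbled string could be passed as the pattern to regexp_replace.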
The command below removes every character except upper- and lowercase ASCII letters and digits:
df.withColumn("ClientID", functions.regexp_replace(df.col("ClientID"), "[^a-zA-Z0-9]", ""));
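Spark documents the regexp_replace pattern as a Java regular expression, so the whitelist patterns from the answers can be tried on sample values with plain String.replaceAll before running them on 8 TB. A minimal sketch; the class and method names are hypothetical, and the sample values are the garbled ones shown by df.show().

```java
public class WhitelistCheck {
    // Keep only ASCII letters and digits (the last answer's pattern).
    static String keepAlnum(String s) {
        return s.replaceAll("[^a-zA-Z0-9]", "");
    }

    // Keep uppercase letters, digits and underscore (the ClientID pattern).
    static String keepClientIdChars(String s) {
        return s.replaceAll("[^A-Z0-9_]", "");
    }

    public static void main(String[] args) {
        // Values as they appear after loading with ISO-8859-1.
        String[] clientIds = {"AR0001\u00C3\u00A5", "AR00022"};
        for (String id : clientIds) {
            System.out.println(keepAlnum(id) + " / " + keepClientIdChars(id));
        }
    }
}
```

Note that [^a-zA-Z0-9] also strips underscores, so it fits ClientID but would damage PatientID values such as DH_HL704221157198295_91.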

convert string to control characters

I have to replace a string literal with a delimiter that comes from a MySQL result set.
String str1 = value.replace(" - ", mysqlDelimiterValue);
Here I pass the delimiter in the variable mysqlDelimiterValue.
If the delimiter value contains an escape sequence, it is written to the data file as literal text instead of as the character it stands for.
For Example:
My input file record is: "a - b - c - d"
My delimiter is: "\t" (tab delimiter)
Expected output: "a b c d"(delimited by tab)
Actual output: "a\tb\tc\td"
My delimiter is dynamic, so I want a general solution that supports any delimiter.
Please help me. Thanks in advance.
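One way to approach this, sketched under the assumption that the database stores the delimiter as text (for example the two characters \ and t): convert the common escape sequences into the real characters before calling replace. The class DelimiterUnescape and method unescape are hypothetical. On Java 15 and later, String.translateEscapes() performs a similar translation for Java-style escapes.

```java
public class DelimiterUnescape {
    // Turn escape sequences stored as plain text into the characters they denote.
    static String unescape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            // Copy ordinary characters, and a lone backslash at the very end, unchanged.
            if (c != '\\' || i + 1 == s.length()) {
                sb.append(c);
                continue;
            }
            char next = s.charAt(++i);
            switch (next) {
                case 't':  sb.append('\t'); break;
                case 'n':  sb.append('\n'); break;
                case 'r':  sb.append('\r'); break;
                case 'f':  sb.append('\f'); break;
                case 'b':  sb.append('\b'); break;
                case '\\': sb.append('\\'); break;
                // Unknown sequence: keep both characters as they were.
                default:   sb.append('\\').append(next);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String mysqlDelimiterValue = "\\t"; // backslash + 't', as read from the database
        String value = "a - b - c - d";
        String str1 = value.replace(" - ", unescape(mysqlDelimiterValue));
        System.out.println(str1.equals("a\tb\tc\td")); // true
    }
}
```

Delimiters without a backslash (such as | or ,) pass through unchanged, so the same call works for any delimiter.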

Insert " in correct form in a String

I want to put this link into a string.
String url=www.test.com;
String link=<a href=url>contact info</a>
How can I write this?
You will need to do:
String url = "www.test.com";
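Use the \ character to indicate that the next character should be treated specially: \" means a double-quote character inside the string, not the end of the string.
String link = "<a href=\"" + url + "\">contact info</a>";
A character preceded by a backslash is an escape sequence and has special meaning to the compiler. The following table shows the Java escape sequences:
Java Escape Sequences:
\t  Insert a tab in the text at this point.
\b  Insert a backspace in the text at this point.
\n  Insert a newline in the text at this point.
\r  Insert a carriage return in the text at this point.
\f  Insert a form feed in the text at this point.
\'  Insert a single quote character in the text at this point.
\"  Insert a double quote character in the text at this point.
\\  Insert a backslash character in the text at this point.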
First, let's assume you have:
String url = "www.test.com";
(Note the quotes around the string.)
To create your link string, you'd do this:
String link = "<a href=\"" + url + "\">contact info</a>";
// Note ---------------^^-----------^^
To put a " inside a string literal, you put a backslash in front of it. This is called "escaping" the quote.
First put the URL value in quotes, then concatenate it into the link string.
String url="www.test.com";
String link="<a href=\""+url+"\">contact info</a>";
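A runnable sketch of the escaping approach from these answers, plus String.format as an alternative way to build the same string. The class LinkBuilder and its method names are hypothetical.

```java
public class LinkBuilder {
    // Concatenation: \" inside the literal is a double quote, not the end of the string.
    static String link(String url) {
        return "<a href=\"" + url + "\">contact info</a>";
    }

    // Same output with String.format; %s is replaced by url.
    static String linkFormatted(String url) {
        return String.format("<a href=\"%s\">contact info</a>", url);
    }

    public static void main(String[] args) {
        String url = "www.test.com";
        System.out.println(link(url));
        System.out.println(link(url).equals(linkFormatted(url))); // true
    }
}
```

Keep in mind that a browser resolves an href without a scheme relative to the current page, so a value like "https://www.test.com" is usually what you want in the url variable.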
