Java print string as unicode - java

I am processing some Twitter data using Java. I read it from a file, do some processing, and print the result to stdout.
The text in file looks like this:
"RT #Bollogosta319a: #BuyBookSilentSinners \u262fGain Followers\n\u262fRT This\n\u262fMUST FOLLOW ME I FOLLOW BACK\n\u262fFollow everyone who rts\n\u262fGain\n #ANDROID \u2026"
I read it in, and print it out to stdout. The output is supposed to be:
"RT #Bollogosta319a: #BuyBookSilentSinners ☯Gain Followers\n☯RT This\n☯MUST FOLLOW ME I FOLLOW BACK\n☯Follow everyone who rts\n☯Gain\n #ANDROID …"
But my output is like this:
"RT #Bollogosta319a: #BuyBookSilentSinners ?Gain Followers
?RT This
?MUST FOLLOW ME I FOLLOW BACK
?Follow everyone who rts
?Gain
#ANDROID ?"
So, it seems that I have two problems to deal with:
1. print the exact Unicode character instead of Unicode string
2. keep "\n" as it is, instead of a newline in the output.
How can I do this? (Dealing with different encodings in Java is driving me crazy.)

I don't know how you are parsing the file, but the method you are using seems to be interpreting escape codes (like \n and \u262f). To leave instances of \n in the file literally, you could replace \n with \\n prior to using whatever means of interpreting the escape codes. The \\ will be converted to a single \, and the n will be left alone. Have you tried using a plain java.io.FileReader to read the file? That may be simpler.
The Unicode symbols may actually be read correctly; many terminals do not support the full range of Unicode characters and print some symbol in place of those it does not understand. Perhaps your program prints ☯ and the terminal simply doesn't know how to render it, so it prints a ? instead.
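Both points can be sketched together. The example below is a minimal illustration, not the asker's actual parsing code: it assumes the text read from the file still contains the escapes as literal characters (a backslash, a u, and four hex digits), decodes only those, leaves the two-character sequence \n untouched, and prints through a PrintStream that encodes as UTF-8. The class and method names are made up for the example, and it does not handle an escaped backslash that happens to precede a u.

```java
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapeDemo {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each escape of that form with the character it names.
    // Any other backslash sequence (such as backslash-n) is left as it is.
    static String decodeUnicodeEscapes(String raw) {
        Matcher m = UNICODE_ESCAPE.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Sample input with the escapes present as literal characters.
        String raw = "\\u262fGain Followers\\n\\u262fRT This \\u2026";

        // Encode output as UTF-8 instead of relying on the platform default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println(decodeUnicodeEscapes(raw));
    }
}
```

Even with the explicit UTF-8 stream, the terminal itself has to be configured for UTF-8 and use a font that contains the symbols, otherwise a ? or a box may still appear.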

Related

JSCH Library: Getting strange character while reading readLine() [duplicate]

I am working with automation and using Jsch to connect to remote boxes and automate some tasks.
I am having problem parsing the command results because sometimes they come with ANSI Control chars.
I've already seen this answer and this other one, but neither points to a library that does this. I don't want to reinvent the wheel if one already exists, and I don't feel confident in those answers.
Right now, I am trying this, but I am not really sure it's complete enough.
reply = reply.replaceAll("\\[..;..[m]|\\[.{0,2}[m]|\\(Page \\d+\\)|\u001B\\[[K]|\u001B|\u000F", "");
How to remove ANSI control chars (VT100) from a Java String?
Most ANSI VT100 sequences have the format ESC [, optionally followed by a number or by two numbers separated by ;, followed by some character that is not a digit or ;. So something like
reply = reply.replaceAll("\u001B\\[[\\d;]*[^\\d;]","");
or
reply = reply.replaceAll("\\e\\[[\\d;]*[^\\d;]",""); // \e matches escape character
should catch most of them, I think. There may be other cases that you could add individually. (I have not tested this.)
Some of the alternatives in the regex you posted start with \\[, rather than the escape character, which may mean that you could be deleting some text you're not supposed to delete, or deleting part of a control sequence but leaving the ESC character in.
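As a small runnable sketch of the first pattern above (the class name, method name, and sample string are made up for illustration), this strips sequences of the form ESC [ ... final character and leaves ordinary square brackets alone. It only covers that one family of sequences, so other control characters would need their own handling.

```java
class AnsiStripDemo {
    // ESC, then '[', then any run of digits and semicolons,
    // then one final character that is neither a digit nor a semicolon.
    private static final String CSI_SEQUENCE = "\u001B\\[[\\d;]*[^\\d;]";

    static String stripAnsi(String s) {
        return s.replaceAll(CSI_SEQUENCE, "");
    }

    public static void main(String[] args) {
        // Bold red "ERROR", a reset, and an erase-to-end-of-line sequence.
        String reply = "\u001B[1;31mERROR\u001B[0m: disk full\u001B[K";
        System.out.println(stripAnsi(reply)); // prints "ERROR: disk full"
    }
}
```

Because every alternative starts with the escape character, text such as "[ok]" in the command output is not touched.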

Replacing series of new lines in File

I've run into a bit of a rough spot in this Java program I'm writing and thought I would ask for some help. I'm using a regex to replace certain lines in a file being read in, and I'm not getting the desired result. I want to replace every series of 3 new lines in my file and thought this would be straightforward since my regex works in Notepad++, but I guess not. Below is an example of what the file looks like:
FIRST SENTENCECRLF
CRLF
CRLF
CRLF
CRLF
CRLF
SECOND SENTENCECRLF
So, in other words, I want to remove 3 of those carriage return/line feed pairs between the first and second sentence lines. Below is what I've tried so far. The first results in no change to the file in Java (it works fine in Notepad++). The second and third are essentially the same pattern written differently, and they also work in Notepad++ but not in Java. Does anyone have suggestions as to what might work in this situation? At this point anything would be greatly appreciated!
^(\r\n){3}
^\r\n(\r\n)(\r\n)
^\r\n\r\n\r\n
Try the following regex:
(?m)^(\r\n){3}
The (?m) enables multi-line mode in Java, as explained in How to use java regex to match a line
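A minimal sketch of that pattern in use, assuming the whole file content has been read into a single String with its CRLF terminators intact (reading line by line with BufferedReader.readLine() would strip them, and the pattern would never match). The class and method names are made up for the example.

```java
class BlankLineDemo {
    // (?m) makes ^ match at the start of every line, not only at the
    // start of the input. The backslashes are doubled for the Java literal.
    private static final String THREE_BLANK_LINES = "(?m)^(\\r\\n){3}";

    static String dropThreeBlankLines(String content) {
        return content.replaceAll(THREE_BLANK_LINES, "");
    }

    public static void main(String[] args) {
        // One text line, five empty lines, then another text line.
        String content = "FIRST SENTENCE\r\n"
                + "\r\n\r\n\r\n\r\n\r\n"
                + "SECOND SENTENCE\r\n";

        String result = dropThreeBlankLines(content);

        // Show the terminators visibly so the change is easy to see.
        System.out.println(result.replace("\r\n", "CRLF\n"));
    }
}
```

With the sample input, three of the five empty lines are removed and two remain between the sentences.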

Java how to insert a "power of" symbol?

Hello I have a program that displays the area of some land. The number is 1900 square kilometers. I want to write this as 1900 km2 but the two should look like a "to the power of 2" symbol. Is there a way I can insert a symbol like that?
The Unicode escape \u00B2 will give you the superscript two symbol. Try the following:
System.out.println("km\u00B2");
Working example
You could alternatively use the extended ASCII code alt + 253 as stated here
If your output is an html page:
km<sup>2</sup>
or
km²
If your output is a String to the console
System.out.println("km\u00B2");
Other outputs
If you need to print it to other systems (pdf, excel...) and they accept unicode values use the char '\u00B2' for the 2 at the exponent
You can try to use the "SUPERSCRIPT TWO" Unicode Character (escape code \u00B2) if your font supports it.
You can use the string literal "km\u00B2", or just use km² directly in your source code if you are using a unicode-supporting file encoding.

Reading a new line in a string with \n

When building a string, you can create newlines like so:
"This is the first line \n\n And this is the second line";
So, when running this portion of code all works well on the Android Emulator:
TextView newsTextArea = (TextView) view.findViewById(R.id.newsTextView);
newsTextArea.setText("Hello \n\n Whats up");
However, I have downloaded and parsed JSON from a web service we have created, and I have stored what I want in a variable like so:
GlobalSettings globalSettings = new GlobalSettings();
String newsText = globalSettings.getNews();
So the variable newsText equals a string, which lets say for arguments sake here is "Hello, this has two lines. \n\n Welcome to the test".
When I run the above TextView code like this, it outputs it with the \n\n as literal characters.
newsTextArea.setText(newsText);
How can it be done so that the variable newsText keeps the formatting?
im a good guesser..lol well there may be soo many reasons that i think
1. newsTextArea.setSingleLine(false);
2. newsTextArea.setMinLines(2); or newsTextArea.setMaxLines(50);//any figure
3. String newsText = globalSettings.getNews().replace("\\n", System.getProperty("line.separator"));
4. String newsText = globalSettings.getNews().toString();
play with these methods and see if one works for ya or two lol
However, I have downloaded and parsed JSON from a web service we have created ...
There is a good chance that your JSON web service returns the literal sequence \n in its output, not the line-feed character (<LF>) you expected.
At this point you probably have two options:
Fix the web service to return the proper value
Change your client program to "patch" the faulty data in order to obtain the desired behavior. Personally, I wouldn't push in that direction, as it will soon become a maintenance nightmare (as soon as a legitimate \ shows up in your data).
Maybe it is time to post another question, this time about your JSON web service?
It is likely that wherever that text comes from in the first place, the backslash is getting escaped. For example, it may be coming from a hard-coded constant in your service, entered into a form by a human and then validated, etc.
Look at your JSON in its raw format before it is parsed. The JSON spec says that a single backslash followed by n means the newline character. If it looks like ["Hello, this has two lines. \n\n Welcome to the test"] then you should be good, because the JSON parser should interpret the newline characters correctly. However, if it looks like this: ["Hello, this has two lines. \\n\\n Welcome to the test"], then the backslash character is being escaped, not the n.
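One way to tell which case you are in is to inspect the string after parsing. The sketch below does not use a JSON parser at all. It hard-codes the two strings a parser would be expected to hand back in each case, and the class and method names are made up for the example.

```java
class EscapedNewlineDemo {
    // True if the string contains an actual line-feed character.
    static boolean hasRealNewline(String s) {
        return s.indexOf('\n') >= 0;
    }

    // True if the string contains a backslash followed by the letter n.
    static boolean hasLiteralBackslashN(String s) {
        return s.contains("\\n");
    }

    public static void main(String[] args) {
        // Expected parse result when the raw JSON contains \n (one backslash).
        String parsedCorrectly = "Hello, this has two lines. \n\n Welcome to the test";

        // Expected parse result when the raw JSON contains \\n (escaped backslash).
        String parsedDoubleEscaped = "Hello, this has two lines. \\n\\n Welcome to the test";

        System.out.println("real newline: " + hasRealNewline(parsedCorrectly)
                + ", literal backslash-n: " + hasLiteralBackslashN(parsedCorrectly));
        System.out.println("real newline: " + hasRealNewline(parsedDoubleEscaped)
                + ", literal backslash-n: " + hasLiteralBackslashN(parsedDoubleEscaped));
    }
}
```

If the value from getNews() behaves like the second string, the backslash is being escaped somewhere upstream, and the service is the better place to fix it.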

In Java (Pig) Regex, how could I do the following?

I have data coming in a txt file delimited by pipes. The unfortunate thing is 2 fields can have multiple values. To separate these multiples, the sender used pipes again, but put quotes around it. My regex worked for months until a certain rare situation...
Regex currently:
([^\|]*)\|"?([^"]*)"?\|([^\|]*)\|"?([^"]*)"?
And it worked for the following situation which happens most of the time:
abc|"part1|part2"|abc|"tool1|tool2"
But this case is where the ([^"]*) jumps ahead and takes all from the blank to the end of the quotes:
abc||abc|"tool1|tool2"
So I realize I must account for when there is a pipe next instead of a quote.
Just not sure how.............
P.S. For those PIG people that might be looking at this, I removed a backslash from each escape, to make it look more like Java, but in PIG you need 2, fyi.
In your expression you need to specify that the part between |s can be either quoted or not quoted. You can do it as follows:
(("[^"]*")|((?!")[^|]*))
Now you can repeat this part several times with |s in between, as you need.
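As a rough sketch of that idea in plain Java (not Pig, where the backslashes would need doubling again), the field sub-pattern can be repeated four times with escaped pipes in between. The four-field layout comes from the sample rows in the question. The class name, method names, and the quote-stripping step are assumptions added for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PipeFieldsDemo {
    // One field: either a quoted run with no inner quotes, or an
    // unquoted run (possibly empty) that contains no pipe.
    private static final String FIELD = "(\"[^\"]*\"|(?!\")[^|]*)";

    // Four fields separated by literal pipes.
    private static final Pattern ROW = Pattern.compile(
            FIELD + "\\|" + FIELD + "\\|" + FIELD + "\\|" + FIELD);

    static String[] splitRow(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected row format: " + line);
        }
        String[] fields = new String[4];
        for (int i = 0; i < 4; i++) {
            fields[i] = stripQuotes(m.group(i + 1));
        }
        return fields;
    }

    // Removes one pair of surrounding double quotes, if present.
    static String stripQuotes(String s) {
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        String[] rows = {
            "abc|\"part1|part2\"|abc|\"tool1|tool2\"",
            "abc||abc|\"tool1|tool2\""
        };
        for (String row : rows) {
            System.out.println(String.join(" / ", splitRow(row)));
        }
    }
}
```

The empty second field in the second row is captured as an empty string instead of swallowing everything up to the next quote.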
