Hello I have a program that displays the area of some land. The number is 1900 square kilometers. I want to write this as 1900 km2 but the two should look like a "to the power of 2" symbol. Is there a way I can insert a symbol like that?
The Unicode escape \u00B2 gives you the superscript two symbol. Try the following:
System.out.println("km\u00B2");
You could alternatively type the character with the extended ASCII code Alt+253.
If your output is an html page:
km<sup>2</sup>
or
km²
If your output is a String to the console
System.out.println("km\u00B2");
Other outputs
If you need to print it to other systems (PDF, Excel, ...) and they accept Unicode values, use the char '\u00B2' for the exponent 2.
You can try to use the "SUPERSCRIPT TWO" Unicode Character (escape code \u00B2) if your font supports it.
You can use the string literal "km\u00B2", or just use km² directly in your source code if you are using a unicode-supporting file encoding.
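For the console case, a minimal sketch may help. It assumes Java 10 or later (for the PrintStream constructor that takes a Charset) and a terminal that displays UTF-8; the class and method names are made up for illustration.

```java
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class AreaLabel {
    // Formats an area using the SUPERSCRIPT TWO character (U+00B2).
    static String squareKilometres(int area) {
        return area + " km\u00B2";
    }

    public static void main(String[] args) {
        // Wrap System.out so the bytes written are UTF-8 rather than the platform default.
        PrintStream utf8Out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        utf8Out.println(squareKilometres(1900));
    }
}
```

If the terminal or font does not support the character, you may still see a ? or a box, even though the string itself is correct.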
Related
I was processing some Twitter data using Java. I read it from a file, do some processing, and print it to stdout.
The text in file looks like this:
"RT #Bollogosta319a: #BuyBookSilentSinners \u262fGain Followers\n\u262fRT This\n\u262fMUST FOLLOW ME I FOLLOW BACK\n\u262fFollow everyone who rts\n\u262fGain\n #ANDROID \u2026"
I read it in, and print it out to stdout. The output is supposed to be:
"RT #Bollogosta319a: #BuyBookSilentSinners ☯Gain Followers\n☯RT This\n☯MUST FOLLOW ME I FOLLOW BACK\n☯Follow everyone who rts\n☯Gain\n #ANDROID …"
But my output is like this:
"RT #Bollogosta319a: #BuyBookSilentSinners ?Gain Followers
?RT This
?MUST FOLLOW ME I FOLLOW BACK
?Follow everyone who rts
?Gain
#ANDROID ?"
So, it seems that I have two problems to deal with:
1. print the exact Unicode character instead of Unicode string
2. keep "\n" as it is, instead of a newline in the output.
How can I do this? (Dealing with different encodings in Java is really driving me crazy.)
I don't know how you are parsing the file, but the method you are using seems to be interpreting escape codes (like \n and \u262f). To leave instances of \n in the file literally, you could replace \n with \\n prior to using whatever means of interpreting the escape codes. The \\ will be converted to a single \, and the n will be left alone. Have you tried using a plain java.io.FileReader to read the file? That may be simpler.
The Unicode symbols may actually be read correctly; many terminals do not support the full range of Unicode characters and print some symbol in place of those it does not understand. Perhaps your program prints ☯ and the terminal simply doesn't know how to render it, so it prints a ? instead.
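As a rough sketch of the idea, assuming the file contains literal backslash-u sequences with exactly four hex digits, you could decode only those and leave every other backslash sequence alone. The class and method names are hypothetical, and the UTF-8 PrintStream constructor needs Java 10 or later.

```java
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TweetEscapes {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern UNICODE_ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Decodes only the backslash-u sequences; a literal backslash-n stays as two characters.
    static String decodeUnicodeEscapes(String raw) {
        Matcher m = UNICODE_ESCAPE.matcher(raw);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(raw, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        return out.append(raw, last, raw.length()).toString();
    }

    public static void main(String[] args) {
        // The doubled backslashes put literal backslashes into the string, as in the file.
        String raw = "\\u262fGain Followers\\n\\u262fRT This\\n #ANDROID \\u2026";
        // Print through a UTF-8 stream so the symbols are not replaced by '?' on encoding.
        PrintStream utf8Out = new PrintStream(System.out, true, StandardCharsets.UTF_8);
        utf8Out.println(decodeUnicodeEscapes(raw));
    }
}
```

This does not handle an escaped backslash in front of the u, so treat it as a starting point rather than a full parser.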
So I'm currently using the Apache Commons Lang library.
When I tried unescaping this string: &#128512;
This returns the same string: &#128512;
String characters = "&#128512;";
StringEscapeUtils.unescapeHtml(characters);
Output: &#128512;
But when I tried unescaping a string with fewer characters, it works:
String characters = "&#12851;";
StringEscapeUtils.unescapeHtml(characters);
Output: ㈳
Any ideas? When I tried unescaping the string "&#128512;" with an online unescaping utility, it works, so maybe it's a bug in the Apache Commons Lang library? Or can anyone recommend another library?
Thanks.
UPDATES:
I'm now able to unescape the string successfully. The problem now is that when I try to escape the result of that unescape, I don't get the original string (&#128512;) back.
unescapeHtml() leaves &#128512; untouched because, as the documentation says, it only unescapes HTML 4.0 entities, which are limited to 65,536 characters. Unfortunately, 128,512 is far beyond that limit.
Have you tried using unescapeXml()?
XML supports up to 1,114,111 (10FFFFh) character entities.
This is a unicode character whose index is U+1F600 (128512) - GRINNING FACE
The string you mention is the HTML escape of U+1F600. If you unescape it using Apache Commons Lang, it will give you the required smiley.
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range, (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).
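A small sketch of what that means for U+1F600, using only java.lang methods (the class and helper names are made up for illustration):

```java
class SurrogatePairs {
    // Builds a String from a single code point; supplementary ones become two chars.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String grin = fromCodePoint(0x1F600); // GRINNING FACE, decimal 128512
        System.out.println(grin.length());                          // 2 UTF-16 code units
        System.out.println(grin.codePointCount(0, grin.length()));  // 1 code point
        System.out.printf("%04X %04X%n",
                (int) grin.charAt(0), (int) grin.charAt(1));        // D83D DE00
        System.out.println(Character.isHighSurrogate(grin.charAt(0))); // true
        System.out.println(grin.codePointAt(0));                    // 128512
    }
}
```

Code that walks a String char by char therefore sees two surrogates, not one character, which is worth keeping in mind for any escaping routine.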
Regarding your update that it's not converting back to &#128512;:
You can also represent a character using a Numeric Character Reference, of the form &#dddd;, where dddd is the decimal value representing the character's Unicode scalar value. You can alternatively use a hexadecimal representation &#xhhhh;, where hhhh is the hexadecimal value equivalent to the decimal value.
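If the library does not round-trip for you, one option is to handle decimal references by hand. The following is only a sketch with hypothetical names, not how Commons Lang does it: it decodes &#dddd; forms and encodes every non-ASCII code point back, iterating over code points so that a surrogate pair becomes a single reference.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NumericCharRefs {
    // Up to seven digits is enough for 1114111 and cannot overflow an int.
    private static final Pattern DECIMAL_REF = Pattern.compile("&#(\\d{1,7});");

    // Decodes decimal references; references that are not valid code points are left as-is.
    static String decode(String text) {
        Matcher m = DECIMAL_REF.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());
            int codePoint = Integer.parseInt(m.group(1));
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group());
            }
            last = m.end();
        }
        return out.append(text, last, text.length()).toString();
    }

    // Encodes each non-ASCII code point (not each char) as a decimal reference.
    static String encode(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
        });
        return out.toString();
    }
}
```

With this, encode(decode("&#128512;")) gives back "&#128512;". It does not touch named entities or hexadecimal references.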
I have added a few System.out.println statements to help you understand this Unicode behaviour better.
Well - the solution is pretty easy:
Use org.apache.commons.lang3.StringEscapeUtils.unescapeHtml4 instead! (Unless you're using Java < 1.5, which you probably aren't.)
String characters = "&#128512;";
StringEscapeUtils.unescapeHtml4(characters);
I think the problem is that there is no Unicode character "&#128512;",
so the method simply returns this string.
The documentation of the function says only:
Returns: a new unescaped String, null if null string input
If it's a HTML specific question, then you can just use JavaScript for this purpose.
You can do
escape("&#128512;") which gives you %26%23128512%3B
unescape("%26%23128512%3B") which gives you back &#128512;
I want to output "Arabic" and "English" text at the same time in Java for example, outputting the following statement: مرحبا I am Adham.
I searched the internet and found that the BiDi algorithm is needed in this case. Are there any Java classes for BiDi?
I have tried the class BiDiReferenceJava and tested it, but when I call runSample() in the class BidiReferenceTest with an Arabic string as the parameter, I get an OutOfIndexException because the character count is doubled (exactly at this line of code in the class BidiReferenceTestCharmap):
byte[] result = new byte[count];
Where if the string length is 4 the count is 8!
ICU4J is more or less the standard comprehensive Unicode library for Java, and thus supports the bidirectional algorithm. I really wonder why you need this, though; BiDi is usually applied by the display layer, unless you're writing a word processor or something.
BidiReference.java is apparently a demonstration piece; it's designed to show how the algorithm works on ASCII characters instead of using actual Unicode characters.
I'm trying to concatenate several strings containing both arabic and western characters (mixed in the same string). The problem is that the result is a String that is, most likely, semantically correct, but different from what I want to obtain, because the order of the characters is altered by the Unicode Bidirectional Algorithm. Basically, I just want to concatenate as if they were all LTR, ignoring the fact that some are RTL, a sort of "agnostic" concatenation.
I'm not sure if I was clear in my explanation, but I don't think I can do it any better.
Hope someone can help me.
Kind regards,
Carlos Ferreira
BTW, the strings are being obtained from the database.
EDIT
The first 2 Strings are the strings I want to concatenate and the third is the result.
EDIT 2
Actually, the concatenated String is a little different from the one in the image, it got altered during the copy+paste, the 1 is after the first A and not immediately before the second A.
You can embed bidi regions using unicode format control codepoints:
Left-to-right embedding (U+202A)
Right-to-left embedding (U+202B)
Pop directional formatting (U+202C)
So in Java, to embed an RTL language like Arabic in an LTR language like English, you would write
myEnglishString + "\u202B" + myArabicString + "\u202C" + moreEnglish
and to do the reverse
myArabicString + "\u202A" + myEnglishString + "\u202C" + moreArabic
See Bidirectional General Formatting for more details, or the Unicode specification chapter on "Directional Formatting Codes" for the source material.
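A self-contained sketch of the same idea, with made-up helper names, wraps a run in the embedding and pop characters so the pattern is harder to get wrong:

```java
class BidiEmbedding {
    static final char LRE = '\u202A'; // LEFT-TO-RIGHT EMBEDDING
    static final char RLE = '\u202B'; // RIGHT-TO-LEFT EMBEDDING
    static final char PDF = '\u202C'; // POP DIRECTIONAL FORMATTING

    // Marks the given text as a right-to-left run inside surrounding text.
    static String embedRtl(String text) {
        return RLE + text + PDF;
    }

    // Marks the given text as a left-to-right run inside surrounding text.
    static String embedLtr(String text) {
        return LRE + text + PDF;
    }

    public static void main(String[] args) {
        String arabic = "\u0645\u0631\u062D\u0628\u0627"; // the Arabic greeting from the question
        String sentence = embedRtl(arabic) + " I am Adham.";
        // Two invisible formatting characters were added around the Arabic run.
        System.out.println(sentence.length() - arabic.length() - " I am Adham.".length()); // 2
    }
}
```

The formatting characters only take effect in a display layer that implements the bidirectional algorithm; they do not reorder the chars stored in the String.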
It's very likely that you need to insert Unicode directional formatting codes into your string to get your string display correctly. For details see Directional Formatting Codes of the Unicode Bidirectional Algorithm specification.
Maybe the Bidi class can help you in determining the correct sequence, as it implements the Unicode Bidirectional Algorithm.
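Assuming the class meant here is java.text.Bidi from the JDK, a short sketch of inspecting the directional runs of a mixed string could look like this (the wrapper class and helper are hypothetical):

```java
import java.text.Bidi;

class BidiRuns {
    // True when the text contains both left-to-right and right-to-left runs.
    static boolean isMixed(String text) {
        return new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT).isMixed();
    }

    public static void main(String[] args) {
        String text = "abc \u0645\u0631\u062D\u0628\u0627 def";
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        // Each run is a stretch of chars with one embedding level; odd levels are right-to-left.
        for (int i = 0; i < bidi.getRunCount(); i++) {
            System.out.printf("chars [%d, %d) level %d%n",
                    bidi.getRunStart(i), bidi.getRunLimit(i), bidi.getRunLevel(i));
        }
        System.out.println(isMixed(text));
    }
}
```

The run boundaries tell you which parts of the logical string a renderer will lay out in which direction, without changing the string itself.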
It's not changing the order of the code points. What's happening is that when the string is displayed, the renderer sees that it starts with a right-to-left script, so it displays it right-to-left.
What's the best way to insert statistics symbols in a JLabel's text? For example, the x-bar? I tried assigning the text field the following with no success:
<html>x̄
HTML character codes are not interpreted in plain Java strings. However, you can use a Unicode escape in Java strings.
For example:
JLabel label = new JLabel("x\u0304");
Also, here is a cool website for taking Unicode text and turning it into Java String literals.
Well, that's completely malformed HTML, probably even for Swing (I think you would need the </html> at the end for it to work). But I would try never to go down that road if you can help it, as Swing's HTML support has many drawbacks and bugs.
You can probably simply insert the appropriate character directly, either directly in the source code if you're using Unicode or with the appropriate Unicode escape:
"x\u0304"
This should work, actually. But it depends on font support and some fonts are pretty bad in positioning combining characters. But short of drawing it yourself it should be your best option.
You can obtain x̄ by appending \u0304 to the x character.
In your case, you must generate the following string:
"x\u0304"
The character \u0304 alone is only an overscore (overline) character. It is a special diacritic character in the Unicode table. You can combine it with a normal character to obtain a composite character.
You can find more information on diacritic characters at https://en.wikipedia.org/wiki/Combining_character
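To make the "composite character" point concrete, here is a small sketch using java.text.Normalizer (the class and helper names are made up). It shows that x plus U+0304 stays two chars, because Unicode has no single precomposed x with macron, while A plus U+030A can be composed into the single character U+00C5.

```java
import java.text.Normalizer;

class CombiningMarks {
    // Appends COMBINING MACRON (U+0304) to the given base text.
    static String withMacron(String base) {
        return base + "\u0304";
    }

    // Composes base + combining mark into a single character where one exists (NFC).
    static String compose(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        String xbar = withMacron("x");
        System.out.println(xbar.length());                        // 2
        System.out.println(compose(xbar).length());               // 2
        System.out.println(compose("A\u030A").equals("\u00C5"));  // true
    }
}
```

For a JLabel you would pass the string straight to the constructor; how well the bar is positioned over the x still depends on the font.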
You can also use
\u0305 to have a longer bar
\u0307 to have a dot above the previous character (speed in mechanics)
\u0308 to have two dots above the previous character (acceleration in mechanics)
\u030A to have a ring above the previous character
Example
"x\u0304" -> x̄
"x\u0305" -> x̅
"t\u0307" -> ṫ
"u\u0308" -> ü
"A\u030A" -> Å
In the examples, only the x with the long bar is not displayed correctly in HTML, because the Chrome browser has a bug (I think).
If you copy and paste the characters into MS Word, you will see all of them displayed correctly.
To type a new overlined character in MS Word, type the character and immediately afterwards press Alt and type 0772, where 772 is the base-10 value of the combining overline character \u0304.