string to encode in CodePage 857 - java

I need a function that can print Turkish characters.
public String convert(String input) {
    String output = new String(input.getBytes(), "CodePage-857");
    return output;
}
Is there anyone out there who can show me how to achieve this?
Thank you

Java strings are UTF-16 internally, which covers the Turkish character set. You can display the strings in UTF-8, UTF-16, or ISO-8859-3.
What view technology are you working with? It's probably configured for ISO-8859-1, which doesn't support Turkish.

This is how I achieved it (in C#/.NET):
System.Text.Encoding CP857 = System.Text.Encoding.GetEncoding(857);
return CP857.GetBytes("Text goes here");

Java strings are always stored as UTF-16, therefore it won't help to create a new string from your input string. If you want to print a string in a different encoding, you will need to address that at the point of display, e.g. if you want to display it in a JSP page then the JSP page's encoding must be set to CodePage-857.
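To make that concrete, here is a minimal sketch of encoding at the output boundary. Note this assumes the JDK's canonical name for code page 857, "IBM857" (alias "cp857"), not "CodePage-857"; on modular JDKs this charset ships in the jdk.charsets module, so availability depends on your runtime:

```java
import java.io.PrintStream;
import java.nio.charset.Charset;

public class Cp857Output {
    public static void main(String[] args) throws Exception {
        String turkish = "ğüşçĞÜŞÇı";

        // A String itself has no encoding; only bytes do. Encode at the boundary.
        byte[] cp857Bytes = turkish.getBytes(Charset.forName("IBM857"));
        System.out.println(cp857Bytes.length + " bytes"); // CP857 maps each of these chars to one byte

        // Or wrap stdout so everything printed through it is CP857-encoded;
        // this only renders correctly on a console actually set to CP857.
        PrintStream out = new PrintStream(System.out, true, "IBM857");
        out.println(turkish);
    }
}
```

The key point from the answer above: re-wrapping the String itself (new String(..., "CodePage-857")) does nothing; the charset has to be applied where bytes leave the program.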

Related

Java String some characters not showing [duplicate]

I have a problem with turkish special characters on different machines. The following code:
String turkish = "ğüşçĞÜŞÇı";
String test1 = new String(turkish.getBytes());
String test2 = new String(turkish.getBytes("UTF-8"));
String test3 = new String(turkish.getBytes("UTF-8"), "UTF-8");
System.out.println(test1);
System.out.println(test2);
System.out.println(test3);
On a Mac the three Strings are the same as the original string. On a Windows machine the three lines are (Printed with the Netbeans 6.7 console):
?ü?ç?Ü?Ç?
ğüşçĞÜŞÇı
?ü?ç?Ü?Ç?
I don't get the problem.
String test1 = new String(turkish.getBytes());
You're taking the Unicode String including the Turkish characters, and turning it into bytes using the default encoding (using the default encoding is usually a mistake). You're then taking those bytes and decoding them back into a String, again using the default encoding. The result is you've achieved nothing (except losing any characters that don't fit in the default encoding); whether you have put a String through an encode/decode cycle has no effect on what the following System.out.println(test1) does because that's still printing a String and not bytes.
String test2 = new String(turkish.getBytes("UTF-8"));
Encodes as UTF-8 and then decodes using the default encoding. On Mac the default encoding is UTF-8 so this does nothing. On Windows the default encoding is never UTF-8 so the result is the wrong characters.
String test3 = new String(turkish.getBytes("UTF-8"), "UTF-8");
Does precisely nothing.
To write Strings to stdout with a different encoding than the default encoding, you'd create an encoder, something like new OutputStreamWriter(System.out, "cp1252"), and send the string content to that.
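A runnable sketch of that suggestion (with the caveat that characters cp1252 cannot represent are silently replaced with '?'):

```java
import java.io.OutputStreamWriter;
import java.io.Writer;

public class ExplicitEncoder {
    public static void main(String[] args) throws Exception {
        // Encode explicitly on the way out instead of trusting the default;
        // the console must itself be set to cp1252 for this to display right.
        Writer out = new OutputStreamWriter(System.out, "cp1252");
        out.write("ğüşçĞÜŞÇı\n"); // ğşĞŞı have no cp1252 mapping and become '?'
        out.flush();
    }
}
```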
However in this case, it looks like the console is using Windows code page 1252 Western European (+1 ATorres). There is no encoding mismatch issue here at all, so you won't be able to solve it by re-encoding strings!
The default encoding cp1252 matches the console's encoding, it's just that cp1252 doesn't contain the Turkish characters ğşĞŞı at all. You can see the other characters that are in cp1252, üçÜÇ, come through just fine. Unless you can reconfigure the console to use a different encoding that does include all the characters you want, there is no way you'll be able to output those characters.
Presumably on a Turkish Windows install, the default code page will be cp1254 instead and you will get the characters you expect (but other characters don't work). You can test this by changing the ‘Language to use for non-Unicode applications’ setting in the Regional and Language Options Control Panel app.
Unfortunately no Windows locale uses UTF-8 as the default code page. Putting non-ASCII output onto the console with the stdio stream functions is not something that's really reliable at all. There is a Win32 API to write Unicode directly to the console, but unfortunately nothing much uses it.
Don't rely on the console, or on the default platform encoding. Always specify the character encoding for calls like getBytes and the String constructor taking a byte array, and if you want to examine the contents of a string, print out the unicode value of each character.
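A tiny sketch of that last suggestion, dumping each char's code point, which shows what the String really contains independent of any console encoding:

```java
public class CodePointDump {
    public static void main(String[] args) {
        String s = "ğüş";
        for (int i = 0; i < s.length(); i++) {
            // Print each UTF-16 code unit in U+XXXX form:
            // for "ğüş" this is U+011F, U+00FC, U+015F.
            System.out.printf("U+%04X%n", (int) s.charAt(i));
        }
    }
}
```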
I would also advise either restricting your source code to use ASCII (and \uxxxx to encode non-ASCII characters) or explicitly specifying the character encoding when you compile.
Now, what bigger problem are you trying to solve?
You may be dealing with different settings of the default encoding.
java -Dfile.encoding=utf-8
versus
java -Dfile.encoding=something else
Or, you may just be seeing the fact that the Mac terminal window works in UTF-8, and the Windows DOS box does not work in UTF-8.
As per Mr. Skeet, you have a third possible problem, which is that you are trying to embed UTF-8 chars in your source. Depending on the compiler options, you may or may not be getting what you intend there. Put this data in a properties file, or use \u escapes.
Finally, also per Mr. Skeet, never, ever call the zero-argument getBytes().
If you are using the AspectJ compiler, do not forget to set its encoding to UTF-8 too. I struggled for hours to find this.

How to display string from Latin-Extended-A charset

I can't display a string containing Latin Extended-A characters in an appropriate way.
I have tried getting the bytes with different encodings and creating a new String from them.
If you have the string "ăăăăăăăăă", how can you output it in an appropriate way?
Java supports Unicode characters. If you have:
String x = "ăăăăăăăăă";
System.out.println(x);
You'll get
ăăăăăăăăă
If you get question marks or funky-looking characters, then it's most likely not a problem with Java or the code, but with the fonts on your computer not supporting it.

Android: setting up utf-8 encoding to String and Array

I'm Lithuanian and I'm creating an app in Lithuanian, but my Strings can't contain letters such as: ą, č, ę, ė, į, š, ų, ū, ž...
I searched the internet for a simple way to make this possible, but I ended up here...
There is some of my code that I want to modify:
if (dayOfWeek.equals("Wednesday")) {
    dayOfWeek = "Treciadienis"; //this should be Trečiadienis
}
And I have an Array that has a bunch of these letters. How should I deal with it?
static JSONArray jArray = new JSONArray(data);
Thank you in advance!
A string can contain letters like ą; the code dayOfWeek = "Trečiadienis"; is perfectly valid Java.
Have you checked whether your file is encoded in UTF-8? In Eclipse, go to File => Properties, and you'll see the Text file encoding at the bottom.
If you really cannot (I think you're talking about č, a c with caron, U+010D), the other solution is to build the String from its UTF-8 byte values: dayOfWeek = "Tre".concat(new String(new byte[]{(byte) 0xC4, (byte) 0x8D}, "UTF-8")).concat("iadienis"); (yep, quite extreme, but it works).
It's very common, if you're using Windows, that Eclipse sets the default encoding to Cp1252; you must change it to UTF-8 so you're able to use that kind of characters hardcoded in your .java files.
Don't forget that you can also use the string constructor:
String(byte[] data, String charsetName)
Adding to gahfy's answer:
Instead of adding individual bytes you can use the \uxxxx syntax within the String, where xxxx is the hexadecimal Unicode code point of the character. This is of course more annoying than using UTF-8 encoding, but less annoying than adding bytes.
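For instance, a minimal sketch for the word from the question, assuming č (U+010D) is the troublesome letter:

```java
public class UnicodeEscape {
    public static void main(String[] args) {
        // \u010D is č, so the source file stays pure ASCII and compiles
        // the same way regardless of the file's encoding.
        String dayOfWeek = "Tre\u010Diadienis";
        System.out.println(dayOfWeek); // prints Trečiadienis
    }
}
```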

Chinese Strings Handling in Java? [duplicate]

This question already has answers here:
How to read UTF 8 encoded file in java with turkish characters
(3 answers)
Closed 9 years ago.
In my assigned project, the original author has written a function:
public String asString() throws DataException
{
    if (getData() == null) return null;
    CharBuffer charBuf = null;
    try
    {
        charBuf = s_charset.newDecoder().decode(ByteBuffer.wrap(f_data));
    }
    catch (CharacterCodingException e)
    {
        throw new DataException("You can't have a string from this ParasolBlob: " + this, e);
    }
    return charBuf.toString()+"你好";
}
Please note that the constant s_charset is defined as:
private static final Charset s_charset = Charset.forName("UTF-8");
Please also note that I have hard-coded a Chinese string in the return string.
Now when the program flow reaches this method, it will throw the following exception:
java.nio.charset.UnmappableCharacterException: Input length = 2
And more interestingly, the hard-coded Chinese string is shown as "??" on the console if I do a System.out.println().
I think this problem is quite interesting in regard of Localization. And I've tried changing it to
Charset.forName("GBK");
but it seems that is not the solution. Also, I have set the encoding of the Java source file to "UTF-8".
Do any experts have experience in this regard? Would you please share a little? Thanks in advance!
And more interestingly, the hard-coded Chinese string is shown as
"??" on the console if I do a System.out.println().
System.out performs a transcoding operation from UTF-16 strings to the default JRE character encoding. If this does not match the encoding used by the receiving device, the character data is corrupted. So the console should be set to the right character encoding (UTF-8) to render the Chinese characters properly.
If you are using Eclipse then you can change the console encoding by going to
Run Configuration -> Common -> Encoding (select UTF-8 from the dropdown)
Java Strings are Unicode:
System.out.println("你好");
As Kevin stated, depending on the underlying encoding of your source file, that encoding will be used to convert it to UTF-16 (the internal representation of a Java String). So when you see "??" it is surely a simple conversion error.
Now, if you want to convert a simple byte array to a String using a given character encoding, I believe there is a much easier way to do it than using a raw CharsetDecoder:
byte[] bytes = {0x61};
String string = new String(bytes, Charset.forName("UTF-8"));
System.out.println(string);
This will work, provided that the byte array really contains a UTF-8 encoded stream of bytes. And it must be without a BOM, otherwise the conversion will probably fail; make sure that what you are trying to convert does not start with the sequence 0xEF 0xBB 0xBF.
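A minimal sketch of guarding against that: strip the three-byte UTF-8 BOM (0xEF 0xBB 0xBF) before decoding. The sample bytes here are the UTF-8 encoding of "你" preceded by a BOM:

```java
import java.nio.charset.StandardCharsets;

public class BomStrip {
    public static void main(String[] args) {
        // "你" (U+4F60) encoded as UTF-8 (E4 BD A0), preceded by a UTF-8 BOM.
        byte[] data = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF,
                       (byte) 0xE4, (byte) 0xBD, (byte) 0xA0};
        int offset = hasUtf8Bom(data) ? 3 : 0;
        String s = new String(data, offset, data.length - offset, StandardCharsets.UTF_8);
        System.out.println(s); // prints 你
    }

    static boolean hasUtf8Bom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }
}
```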

How can I determine the correct encoding?

I was trying to print some Chinese characters as below, but it won't work. I suppose some sort of encoding needs to be done. Can you please help me with this?
public static void main(String[] args)
{
    String myString = "奥妙洗衣粉";
    System.out.println(myString);
    // Output in eclipse: Some characters cannot be mapped using Cp1252 character encoding.
    // Either change the encoding or remove the characters which are not supported
    // by the Cp1252 character encoding.
}
EDIT: How can I change/apply the encoding programmatically before printing the string?
Windows-1252 character encoding does not support the characters in your code. Use UTF-8.
You can change the default encoding in file output:
new PrintWriter(fileName, "UTF-8");
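A minimal round-trip sketch of that approach, writing UTF-8 to a temporary file and reading it back with the same charset (the file name and content here are just for illustration):

```java
import java.io.File;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class Utf8FileDemo {
    public static void main(String[] args) throws Exception {
        String myString = "奥妙洗衣粉";
        File file = File.createTempFile("utf8-demo", ".txt");

        // Write with an explicit charset instead of the platform default.
        try (PrintWriter out = new PrintWriter(file, "UTF-8")) {
            out.println(myString);
        }

        // Read back, again naming the charset explicitly.
        String line = Files.readAllLines(file.toPath(), StandardCharsets.UTF_8).get(0);
        System.out.println(line.equals(myString)); // prints true
        file.delete();
    }
}
```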
Another problem: the compiler may read only ASCII characters correctly, depending on its configured source encoding (the JVM itself has no such limitation). This means strings can't always be built from foreign characters in source. The safe way is to determine each character's Unicode code point (a 4-digit hexadecimal code) and build the string like this:
String myString = "\u3b12\uc2d4hello";
This creates a string whose first char is U+3B12 (written with the \u escape), followed by U+C2D4 and "hello".
Here's my output:
㬒싔hello
The console in Eclipse does not support those characters by default.
The problem is that the Eclipse console encoding is not UTF-8. You should change the console encoding.
I hope this link helps you: http://www.mkyong.com/java/how-to-display-chinese-character-in-eclipse-console/
