I made a little project that converts a hexadecimal string into an ASCII string. When i convert te value then i send it to a client. But my client doesn't reconised the value.
I searched why and i saw that when i convert the ASCII string back to hexadecimal, then i get a little bit differend value back .. So i think something has going wrong when i sended the data .. But i don't no how to fix my problem ..
I also tried to convert the hex first to dec and then to ascii , also i tried the more noob whay , just send a command with for example this :
char p = 3;
char d = 4;
bw3.write(p + "" + c + "");
So this is the code i get now :
ServerSocket welcomeSocket2 = new ServerSocket(9999);
Socket socket2 = welcomeSocket2.accept();
OutputStream os3 = socket2.getOutputStream();
OutputStreamWriter osw3 = new OutputStreamWriter(os3);
BufferedWriter bw3 = new BufferedWriter(osw3);
String hex4 = "00383700177a0102081c4200000000000001a999c338030201000a080000000000000000184802000007080444544235508001000002080104";
StringBuilder output4 = new StringBuilder();
for (int i =0; i< hex4.length(); i +=2){
String str4 = hex4.substring(i, i+2);
int outputdecimal = Integer.parseInt(str4,16);
char hexchar = (char)outputdecimal;
System.out.println(str4);
output4.append(hexchar);
}
bw3.write(output4.toString());
bw3.flush();
What i also noticed is that when i send a command that is only 4 bytes long or 10 then everything is going good. I receive my converted ascii code good. The command that i now wanne send is 58 bytes long.
ASCII is not capable to represent all possible data expressed in hexadecimal.
Therefore, as long as you'll try to convert your hexa to ASCII, nothing you try will ever work.
Your hexadecimal contain purely binary, computer-y opaque data. ASCII is what you use to represent text. There are some binary data that are made out of ASCII and therefore can be represented in ASCII. And there are all the other data than these ones. Those will always end up wrong when you try to convert hexadecimal to ASCII. This is simply because ASCII is meant to be unable to do that, by the very definition of what it is.
Related
How can UTF-8 value like =D0=93=D0=B0=D0=B7=D0=B5=D1=82=D0=B0 be converted in Java?
I have tried something like:
Character.toCodePoint((char)(Integer.parseInt("D0", 16)),(char)(Integer.parseInt("93", 16));
but it does not convert to a valid code point.
That string is an encoding of bytes in hex, so the best way is to decode the string into a byte[], then call new String(bytes, StandardCharsets.UTF_8).
Update
Here is a slightly more direct version of decoding the string, than provided by "sstan" in another answer. Of course both versions are good, so use whichever makes you more comfortable, or write your own version.
String src = "=D0=93=D0=B0=D0=B7=D0=B5=D1=82=D0=B0";
assert src.length() % 3 == 0;
byte[] bytes = new byte[src.length() / 3];
for (int i = 0, j = 0; i < bytes.length; i++, j+=3) {
assert src.charAt(j) == '=';
bytes[i] = (byte)(Character.digit(src.charAt(j + 1), 16) << 4 |
Character.digit(src.charAt(j + 2), 16));
}
String str = new String(bytes, StandardCharsets.UTF_8);
System.out.println(str);
Output
Газета
In UTF-8, a single character is not always encoded with the same amount of bytes. Depending on the character, it may require 1, 2, 3, or even 4 bytes to be encoded. Therefore, it's definitely not a trivial matter to try to map UTF-8 bytes yourself to a Java char which uses UTF-16 encoding, where each char is encoded using 2 bytes. Not to mention that, depending on the character (code point > 0xffff), you may also have to worry about dealing with surrogate characters, which is just one more complication that you can easily get wrong.
All this to say that Andreas is absolutely right. You should focus on parsing your string to a byte array, and then let the built-in libraries convert the UTF-8 bytes to a Java string for you. From a Java String, it's trivial to extract the Unicode code points if that's what you want.
Here is some sample code that shows one way this can be achieved:
public static void main(String[] args) throws Exception {
String src = "=D0=93=D0=B0=D0=B7=D0=B5=D1=82=D0=B0";
// Parse string into hex string tokens.
String[] tokens = Arrays.stream(src.split("="))
.filter(s -> s.length() != 0)
.toArray(String[]::new);
// Convert the hex string representations to a byte array.
byte[] utf8bytes = new byte[tokens.length];
for (int i = 0; i < utf8bytes.length; i++) {
utf8bytes[i] = (byte) Integer.parseInt(tokens[i], 16);
}
// Convert UTF-8 bytes to Java String.
String str = new String(utf8bytes, StandardCharsets.UTF_8);
// Display string + individual unicode code points.
System.out.println(str);
str.codePoints().forEach(System.out::println);
}
Output:
Газета
1043
1072
1079
1077
1090
1072
There are many similar questions, but no one helped me.
utf-8 can be 1 byte or 2,3,4.
ISO-8859-15 is allways 2 bytes.
But I need 1 byte character like code page Code "page 863" (IBM863).
http://en.wikipedia.org/wiki/Code_page_863
For example "é" is code point 233 and is 2 bytes long in utf 8, how can I convert it to IBM863 (1 byte) in Java?
Running on JVM -Dfile.encoding=UTF-8 possible?
Of course that conversion would mean that some characters can be lost, because IBM863 is smaller.
But I need the language specific characters, like french, è, é etc.
Edit1:
String text = "text with é";
Socket socket = getPrinterSocket( printer);
BufferedWriter bwOut = getPrinterWriter(printer,socket);
...
bwOut.write("PRTXT \"" + text + "\n");
...
if (socket != null)
{
bwOut.close();
socket.close();
}
else
{
bwOut.flush();
}
Its going a label printer with Fingerprint 8.2.
Edit 2:
private BufferedWriter getPrinterWriter(PrinterLocal printer, Socket socket)
throws IOException
{
return new BufferedWriter(new OutputStreamWriter(socket.getOutputStream()));
}
First of all: there is no such thing as "1 byte char" or, in fact, "n byte char" for whatever n.
In Java, a char is a UTF-16 code unit; depending on the (Unicode) code point, either one, or two chars, are necessary to represent a code point.
You can use the following methods:
Character.toChars() to turn a Unicode code point into a char array representing this code point;
a CharsetEncoder to perform the char[] to byte[] conversion;
a CharsetDecoder to perform the byte[] to char[] conversion.
You obtain the two latter from a Charset's .new{Encoder,Decoder}() methods.
It is crucially important here to know what your input is exactly: is it a code point, is it an encoded byte array? You'll have to adapt your code depending on this.
Final note: the file.encoding setting defines the default charset to use when you don't specify a charset to use, for instance in a FileReader constructors; you should avoid not specifying a charset to begin with!
byte[] someUtf8Bytes = ...
String decoded = new String(someUtf8Bytes, StandardCharsets.UTF8);
byte[] someIso15Bytes = decoded.getBytes("ISO-8859-15");
byte[] someCp863Bytes = decoded.getBytes("cp863");
If you start with a string, use just getBytes with a proper encoding.
If you want to write strings with a proper encoding to a socket, you can either use OutputStream instead of PrintStream or Writer and send byte arrays, or you can do:
new BufferedWriter(new OutputStreamWriter(socket.getOutputStream(), "cp863"))
How do you convert a specific charset to unicode in Java?
charsets have been discussed quite a lot here, but I think this one hasn't been covered yet.
I have a hex-string that meets the criteria length%4==0 (e.g. \ud3faef8e). usually I just display this in an HTML container and add &#x to the front and ; to the back of each hex quadruple.
but in this case the following procedure led to the correct output (non-Java)
paste hex string into Hex-Editor and save the file to test.txt (utf-8)
open the file with Notepad++
change the encoding to Simplified Chinese (GB2312)
Now I'm trying to do the same in Java.
// having hex convert to ascii
String ascii = "";
for (int cnt = 0; cnt <= unicode.length() - 2; cnt += 2) {
String tmp = unicode.substring(cnt, cnt + 2);
int decimal = Integer.parseInt(tmp, 16);
ascii += (char) decimal;
}
// writing ascii to file at this point leads to the same result as in step 2 before
try {
// get the bytes
byte[] utf8 = ascii.getBytes("UTF-8"); // == UTF8
// convert to gb2312
String converted = new String(utf8, "GB2312"); // == EUC_CN
// write to file (writer with declared UTF-8)
writeToFile(converted, 20 + cntu);
cntu++;
} catch (Exception e) {
System.err.println(e.getMessage());
}
the output looks according the should-output, except the fact that randomly the following character is displayed: � why does this one come up? and how can I get rid of it?
in the end, what I'd like to get is the converted unicode again to be able to display it with my original approach (폴), but I haven't figured out a way to get to the hex values again (they don't match the criteria length%4==0). how do I get the hex values of the characters?
update1
to be more precise, regarding the input, I'm assuming that it is Unicode, because of the start of the String with \u, which would be sufficient for my usual approach, but not in the case I am describing above.
update2
the writeToFile method
FileOutputStream fos = new FileOutputStream("test" + id + ".txt");
Writer out = new OutputStreamWriter(fos, "UTF8");
out.write(str);
out.close();
I tried with GB2312 as well, but there is no change. I still get the ? inbetween the correct characters.
update3
the expected output for \ud3f6ef8e is 遇飵 , you get to it when following the steps 1 to 3. (HxD as an example of an hex editor)
there was no indication that I should delete my question, thus I'm writing my final comment as the answer
I was misinterpreting the incoming hex-digits. they were in a specific charset and not uni-code, so they represented the hex-values of a character in that charset. What I'm doing now is new String(byteArray, "CharsetName"); and get (int)s.charAt(i) to get the unicode value and write it to HTML. thanks for your ideas and hints
for more details see this answer here: https://stackoverflow.com/a/4049781/1338732 , and this question here: How to convert UTF-8 to unicode in Java?
Good day.
I have an ASCII file with Spanish words. They contain only characters between A and Z, plus Ñ, ASCII Code 165 (http://www.asciitable.com/).
I get this file with this source code:
InputStream is = ctx.getAssets().open(filenames[lang_code][w]);
InputStreamReader reader1 = new InputStreamReader(is, "UTF-8");
BufferedReader reader = new BufferedReader(reader1, 8000);
try {
while ((line = reader.readLine()) != null) {
workOn(line);
// do a lot of things with line
}
reader.close();
is.close();
} catch (IOException e) { e.printStackTrace(); }
What here I called workOn() is a function that should extract the characters codes from the strings and is something like that:
private static void workOn(String s) {
byte b;
for (int w = 0; w < s.length(); w++) {
b = (byte)s.charAt(w);
// etc etc etc
}
}
Unfortunately what happens here is that I cannot identify b as an ASCII code when it represents the Ñ letter. The value of b is correct for any ascii letter, and returns -3 when dealing with Ñ, that, brought to signed, is 253, or the ASCII character ². Nothing similar to Ñ...
What happens here? How should I get this simple ASCII code?
What is getting me mad is that I cannot find a correct coding. Even, if I go and browse the UTF-8 table (http://www.utf8-chartable.de/) Ñ is 209dec and 253dec is ý, 165dec is ¥. Again, not event relatives to what I need.
So... help me please! :(
Are you sure that your source file you are reading is UTF-8 encoded? In UTF-8 encoding, all values greater than 127 are reserved for a multi-byte sequence, and they are never seen standing on their own.
My guess is that the file you are reading is encoded using "code page 237" which is the original IBM PC character set. In that character set, the Ñ is represented by the decimal 165.
Many modern systems use ISO-8859-1, which happen to be equivalent to the first 256 characters of the Unicode character set. In those, the Ñ character is a decimal 209. In a comment, the author clarified that a 209 is actually in the file.
If the file was really UTF-8 encoded, then the Ñ would be represented as a two-byte sequence, and would be neither the value 165 nor the value 209.
Based on the above assumption that the file is ISO-8859-1 encoded, you should be able to solve the situation by using:
InputStreamReader reader1 = new InputStreamReader(is, "ISO-8859-1");
This will translate to the Unicode characters, and you should then find the character Ñ represented by decimal 209.
I am receiving a string text via USB communication in android in form of extended ASCII characters like
String receivedText = "5286T11ɬ ªË ¦¿¯¾ ¯¾ ɬ ¨¬°:A011605286 ª¿ª ¾®:12:45 ¸Í®°:(9619441121)ª¿ª:-, ®¹¿¦Í°¾ ¡ ®¹¿¦Í°¾ ª¨À, ¾¦¿µ²À ¸Í, ¾¦¿µ²À ªÂ°Íµ °¿®¾°Í͸:- ¡Í°Éª:-, ¬¾¹°, ¸¾¤¾Í°Â¼ ªÂ°Íµ~";
Now these character represents a string in hindi.
I am not getting how to convert this received string into hindi equivalent text.
Any one knows how to convert this into equivalent hindi text using java
Following is the piece of code which I am using to convert byte array to byte string
public String byteArrayToByteString(byte[] arayValue, int size) {
byte ch = 0x00;
int i = 0;
if (arayValue == null || arayValue.length <= 0)
return null;
String pseudo[] = { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9",
"A", "B", "C", "D", "E", "F" };
StringBuffer out = new StringBuffer();
while (i < size) {
ch = (byte) (arayValue[i] & 0xF0); // Strip off high nibble
ch = (byte) (ch >>> 4); // shift the bits down
ch = (byte) (ch & 0x0F); // must do this is high order bit is on!
out.append(pseudo[(int) ch]); // convert the nibble to a String
// Character
ch = (byte) (arayValue[i] & 0x0F); // Strip off low nibble
out.append(pseudo[(int) ch]); // convert the nibble to a String
// Character
i++;
}
String rslt = new String(out);
return rslt;
}
Let me know if this helps in finding solution
EDIT:
Its an UTF-16 encoding and the characters in receivedText string is in form of extended ASCII for hindi characters
New Edit
I have new characters
String value = "?®Á?Ƕ ¡??°¿¯¾";
Which says मुकेश in hindi and dangaria in hindi. Google translator is not translating dangaria in hindi so I cannot provide you hindi version of it.
I talked to the person who is encoding he said that he removed 2 bits from the input before encoding i.e. if \u0905 represents अ in hindi then he removed \u09 from the input and converted remaining 05 in extended hexadecimal form.
So the new input string I provided you is decoded in form of above explanation. i.e. \u09 is been removed and rest is converted into extended ascii and then sent to device using USB.
Let me know if this explanation helps you in finding out solution
I've been playing around with this a bit and have an idea of what you might need to do. It looks like the value for receivedText that you have in your posting is encoded in windows-1252 for some reason. It was probably from pasting it into this post perhaps. Providing the raw byte values would be better to avoid any encoding errors. Regardless, I was able to get that String into the following Unicode Devanagari characters:
5286T11फए ऋभ इडऒठ ऒठ फए उएओ:A011605286 ऋडऋ ठऍ:12:45 चयऍओ:(9619441121)ऋडऋ:-, ऍछडइयओठ ँ ऍछडइयओठ ऋउढ, ठइडगऑढ चय, ठइडगऑढ ऋतओयग ओडऍठओययच:- ँयओफऋ:-, एठछओ, चठअठयओतञ ऋतओयग~
With the following code:
final String receivedText = "5286T11ɬ ªË ¦¿¯¾ ¯¾ ɬ ¨¬°:A011605286 ª¿ª ¾®:12:45 ¸Í®°:(9619441121)ª¿ª:-, ®¹¿¦Í°¾ ¡ ®¹¿¦Í°¾ ª¨À, ¾¦¿µ²À ¸Í, ¾¦¿µ²À ªÂ°Íµ °¿®¾°Í͸:- ¡Í°Éª:-, ¬¾¹°, ¸¾¤¾Í°Â¼ ªÂ°Íµ~";
final Charset fromCharset = Charset.forName("x-ISCII91");
final CharBuffer decoded = fromCharset.decode(ByteBuffer.wrap(receivedText.getBytes("windows-1252")));
final Charset toCharset = Charset.forName("UTF-16");
final byte[] encoded = toCharset.encode(decoded).array();
System.out.println(new String(encoded, toCharset.displayName()));
Whether or not those are the expected characters is something you would need to tell me :)
Also, I'm not sure if the x-ISCII91 character encoding is available in Android.
Generally, for a byte array that you know to be a string value, you can use the following.
Assuming byte[] someBytes:
String stringFromBytes = new String(someBytes, "UTF-16");
You may replace "UTF-16" with the approprate charset, which you can find after some experimentation. This link detailing java's supported character encodings may be of help.
From the details you have provided I would suggest considering the following:
If you're reading a file from a USB drive, android might have existing frameworks that will help you do this in a more standard way.
If you most certainly need to read in and manipulate the bytes from the USB port directly, make sure that you are familiar with the API/protocol of the data you are reading. It may be that some of the bytes are control messages or something similar that cannot be converted to strings, and you will need to identify exactly where in the byte stream the string begins (and ends).
hindi = new String(receivedText.getBytes(), "UTF-16");
But this does not really look like hindi.. are you sure it is encoded as UTF-16?
Edit:
String charset = "UTF-8";
hindi = new String(hindi.getBytes(Charset.forName(charset)), "UTF-16");
Replace UTF-8 with the actual charsed that resulted in your loooong String.