Read Java in as Hex - java

I have tried to solve this, but everything I come up with is no help. I'm sure this is easy (when you know how, of course ;) ).
What I would like to do is read in a file using a byte stream like below:
while((read = in.read()) != -1){
//code removed to save space
Integer.toHexString(read);
System.out.println(read);
}
When it prints the hex to the screen, most values print fine, e.g.
31
13
12
0
But when it comes to a hex code that should be 01 31, it prints 0 131. I want to read each byte into a variable the way you would see it in a hex editor, i.e. 00 11 21 31, with no single digits, because I need to scan the whole file and look for patterns. I know how to do that part; I'm just stuck on this :/
So in short, I need a variable to contain the two hex characters, i.e. int temp = 01, not int temp = 0. I hope this all makes sense; I'm a little confused as it's 3am!
If anyone knows how to do this I would be most grateful. P.S. Thanks in advance for the help; this site has saved me loads of research and I have learnt a lot!
Many thanks.

This method:
public static void printHexStream(final InputStream inputStream, final int numberOfColumns) throws IOException {
    long streamPtr = 0;
    int b;
    // read() returns -1 at end of stream; available() > 0 is not a reliable end-of-stream test.
    while ((b = inputStream.read()) != -1) {
        final long col = streamPtr++ % numberOfColumns;
        System.out.printf("%02x ", b);
        if (col == (numberOfColumns - 1)) {
            System.out.printf("%n");
        }
    }
}
will output something like this:
40 32 38 00 5f 57 69 64 65 43
68 61 72 54 6f 4d 75 6c 74 69
42 79 74 65 40 33 32 00 5f 5f
69 6d 70 5f 5f 44 65 6c 65 74
65 46 69 6c 65 41 40 34 00 5f
53 65 74 46 69 6c 65 50 6f 69
6e 74 65 72 40 31 36 00 5f 5f
69 6d 70 5f 5f 47 65 74 54 65
6d 70 50 61 74 68 41 40 38 00
Is it what you are looking for?

I think what you're looking for is a formatter. Try:
Formatter formatter = new Formatter();
formatter.format("%02x", your_int);
System.out.println(formatter.toString());
Does that do what you're looking for? Your question wasn't all that clear (and I think maybe you deleted too much code from your snippet).

import org.apache.commons.io.IOUtils;
import org.apache.commons.codec.binary.Hex;
InputStream is = new FileInputStream(new File("c:/file.txt"));
String hexString = Hex.encodeHexString(IOUtils.toByteArray(is));
In Java 7 you can read a byte array directly from a file, as below:
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.Path;
Path path = Paths.get("path/to/file");
byte[] data = Files.readAllBytes(path);
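Once the bytes are in an array, turning them into a hex string does not strictly need a third-party library. The following is a minimal sketch using a lookup table; the class name FileHex and the method name toHex are made up for illustration and are not part of any library mentioned above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

final class FileHex {
    private static final char[] HEX = "0123456789abcdef".toCharArray();

    /** Converts each byte to two lowercase hex digits, e.g. {0x01, 0x31} becomes "0131". */
    static String toHex(byte[] data) {
        char[] out = new char[data.length * 2];
        for (int i = 0; i < data.length; i++) {
            int v = data[i] & 0xff;          // treat the byte as unsigned
            out[2 * i] = HEX[v >>> 4];       // high nibble
            out[2 * i + 1] = HEX[v & 0x0f];  // low nibble
        }
        return new String(out);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FileHex <file>");
            return;
        }
        // Reads the whole file into memory, so this suits small and medium files only.
        System.out.println(toHex(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

Because every byte always yields exactly two characters, the byte at offset n starts at character 2n of the result, which keeps pattern searches on the string aligned to byte boundaries as long as only even indices are accepted.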

Hi everyone who posted, thanks for the replies, but the way I ended up doing it was:
hexIn = in.read();
s = Integer.toHexString(hexIn);
if(s.length() < 2){
s = "0" + Integer.toHexString(hexIn);
}
Just thought I would post the way I did it for anyone else in future. Thank you so much for your help though!
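The same zero padding can also be had from a format string instead of a length check. Below is a minimal sketch, assuming the goal is one two-character hex string per byte; the names HexBytes and toHexPairs are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

final class HexBytes {
    /** Reads the stream to its end and returns one two-character hex string per byte. */
    static List<String> toHexPairs(InputStream in) throws IOException {
        List<String> pairs = new ArrayList<>();
        int b;
        while ((b = in.read()) != -1) {
            // %02x pads with a leading zero, so 1 becomes "01" rather than "1".
            pairs.add(String.format("%02x", b));
        }
        return pairs;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the file stream used in the question.
        InputStream in = new ByteArrayInputStream(new byte[] {0x01, 0x31, 0x00, (byte) 0xff});
        System.out.println(toHexPairs(in)); // prints [01, 31, 00, ff]
    }
}
```

Note that an int cannot hold "01" as such, since leading zeros are only a matter of how the number is printed; keeping the padded form means keeping a String.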

Related

How to find a specific byte in many bytes?

I read a file using Java and used a hex dump to output the data. It looks like this:
The first and second lines:
one: 31 30 30 31 30 30 30 31 31 30 30 31 30 31 31 31
two: 30 31 31 30 30 31 31 30 31 31 30 30 31 31 30 31
I want to print the data between the first "31 30 30 31" and the second "31 30 30 31". My ideal output is 31 30 30 31 30 30 30 31 31 30 30 31 30 31 31 31 30 31.
But the real output is wrong. I think my code cannot find 31 30 30 31 in data1. How can I figure it out?
I use JDK 1.7 and my IDE is IDEA.
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.File;
public class TestDemo{
public static void main(String[] args) {
try {
File file = new File("/0testData/1.bin");
DataInputStream isr = new DataInputStream(new FileInputStream(file));
int bytesPerLine = 16;
int byteCount = 0;
int data;
while ((data = isr.read()) != -1) {
if (byteCount == 0)
System.out.println();
else if (byteCount % bytesPerLine == 0)
System.out.printf("\n",byteCount );
else
System.out.print(" ");
String data1 = String.format("%02X",data & 0xFF);
System.out.printf(data1);
byteCount += 1;
if(data1.contains("31 30 30 31")) {
int i=data1.indexOf("31 30 30 31",12);
System.out.println("find it!");
String strEFG=data1.substring(i,i+53);
System.out.println("str="+strEFG);
}else {
System.out.println("cannot find it");
}
}
} catch (Exception e) {
System.out.println("Exception: " + e);
}
}
}
My ideal output is 31 30 30 31 30 30 30 31 31 30 30 31 30 31 31 31 30 31.
But the real output is:
31cannot find it
30cannot find it
30cannot find it
31cannot find it
30cannot find it
30cannot find it
30cannot find it
31cannot find it
31cannot find it
30cannot find it
30cannot find it
31cannot find it
30cannot find it
31cannot find it
31cannot find it
31cannot find it
30cannot find it
31cannot find it
31cannot find it
30cannot find it
30cannot find it
31cannot find it
31cannot find it
30cannot find it
31cannot find it
31cannot find it
30cannot find it
30cannot find it
31cannot find it
31cannot find it
30cannot find it
31cannot find it
31cannot find it
31cannot find it
31cannot find it
31cannot find it
30cannot find it
30cannot find it
30cannot find it
30cannot find it
30cannot find it
31cannot find it
30cannot find it
31cannot find it
30cannot find it
31cannot find it
31cannot find it
31cannot find it
31cannot find it
31cannot find it
30cannot find it
31cannot find it
31cannot find it
31cannot find it
31cannot find it
31cannot find it
31cannot find it
31cannot find it
30cannot find it
30cannot find it
31cannot find it
30cannot find it
31cannot find it
31cannot find it
30cannot find it
31cannot find it
31cannot find it
30cannot find it
30cannot find it
31cannot find it
31cannot find it
30cannot find it
30cannot find it
31cannot find it
30cannot find it
30cannot find it
I feel that your input data is a bit confusing. Nevertheless, this probably answers your question.
It doesn't give quite the same output that you are asking for, but I think you should be able to tweak it to turn on or off the output by using the flag "inPattern". If inPattern is true, print your data read from the file, if false, do not print the data read from the file.
This is probably not the best form of coding as it is entirely static methods - but it does what you ask for.
The problem with your code (I think) is that data1 will always be a 2-character string, so it can never contain an 11-character string ("31 30 30 31"). If you tried reversing the test (i.e. "31 30 30 31".contains(data1)) then it would only match a single byte, not the 4 bytes you intend to match.
package hexdump;
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.InputStream;
import java.util.LinkedList;
public class HexDumpWithFilter {
// private static final int beginPattern [] = { 0x47, 0x0d, 0x0a, 0x1a };
private static final int beginPattern [] = { 0x00, 0x83, 0x7d, 0x8a };
private static final int endPattern [] = { 0x23, 0x01, 0x78, 0xa5 };
private static LinkedList<Integer> bytesRead = new LinkedList<>();
public static void main(String[] args) {
try {
InputStream isr = new DataInputStream(new FileInputStream("C:\\Temp\\resistor.png"));
int bytesPerLine = 16;
int byteCount = 0;
int data;
boolean inPattern = false;
while ((data = isr.read()) != -1) {
// Capture the data just read into an input buffer.
bytesRead.add(data);
// If we have too much data in the input buffer to compare to our
// pattern, peel off the first byte.
// Note: This assumes that the begin pattern and end Pattern are the same lengths.
if (bytesRead.size() > beginPattern.length) {
bytesRead.removeFirst();
}
// Output a byte count at the start of each new line of output.
if (byteCount % bytesPerLine == 0)
System.out.printf("\n%04x:", byteCount);
// Output the spacing - if we have found our pattern, then also output an asterisk
System.out.printf(inPattern ? " *%02x" : " %02x", data);
// Finally check to see if we have found our pattern if we have enough bytes
// in our bytesRead buffer.
if (bytesRead.size() == beginPattern.length) {
// If we are not currently in a pattern, then check for the begin pattern
if (!inPattern && checkPattern(beginPattern, bytesRead)) {
inPattern = true;
}
// if we are currently in a pattern, then check for the end pattern.
if (inPattern && checkPattern (endPattern, bytesRead)) {
inPattern = false;
}
}
byteCount += 1;
}
System.out.println();
} catch (Exception e) {
System.out.println("Exception: " + e);
}
}
/**
* Function to check whether our input buffer read from the file matches
* the supplied pattern.
* @param pattern the pattern to look for in the buffer.
* @param bytesRead the buffer of bytes read from the file.
* @return true if pattern and bytesRead have the same content.
*/
private static boolean checkPattern (int [] pattern, LinkedList<Integer> bytesRead) {
int ptr = 0;
boolean patternMatch = true;
for (int br : bytesRead) {
if (br != pattern[ptr++]) {
patternMatch = false;
break;
}
}
return patternMatch;
}
}
There is a small problem with this code in that it does not mark the beginning pattern, but does mark the ending pattern. Hopefully this is not a problem for you. If you need to correctly mark the beginning or not mark the ending, then there will be another level of complexity. Basically you would have to read ahead in the file and write the data out 4 bytes behind the data you have been reading. This could be achieved by printing the value that comes off of the buffer at the line which reads:
bytesRead.removeFirst();
rather than printing the value read from the file (i.e. the value in the "data" variable).
Following is an example of the data produced when run against a PNG file of an image of a resistor.
0000: 89 50 4e 47 0d 0a 1a 0a 00 00 00 0d 49 48 44 52
0010: 00 00 00 60 00 00 00 1b 08 06 00 00 00 83 7d 8a
0020: *3a *00 *00 *00 *09 *70 *48 *59 *73 *00 *00 *2e *23 *00 *00 *2e
0030: *23 *01 *78 *a5 3f 76 00 00 00 07 74 49 4d 45 07 e3
0040: 03 0e 17 1a 0f c2 80 9c d0 00 00 01 09 49 44 41
0050: 54 68 de ed 9a 31 0b 82 40 18 86 cf 52 d4 a1 7e
0060: 45 4e 81 5b a3 9b 10 ae ae 4d 4d 61 7f a1 21 1b
0070: fa 0b 45 53 53 ab ab 04 6e 42 4b 9b d0 64 bf a2
0080: 06 15 a9 6b ef 14 82 ea ec e8 7d c6 f7 0e f1 be
0090: e7 3b 0f 0e 25 4a 29 25 a0 31 5a 28 01 04 fc 35
00a0: f2 73 e0 af af b5 93 fd c9 8c cd 36 cb da f9 ae
00b0: ad 11 d3 50 84 2e 50 92 96 24 88 f2 ca b1 41 7b
00c0: cc 64 c7 db b6 be 7e 5e 87 ef 0e 08 e3 82 64 85
00d0: b8 47 4c 56 50 12 c6 85 b8 9f 20 1e 0b 10 bd 81
00e0: 64 1e 5b 38 49 cb ca 31 e3 7c 67 b2 b4 c7 f6 c4
00f0: 62 da 65 b2 f9 ea c2 64 a7 dd 90 c9 fa a3 3d 0e
0100: 61 00 01 10 00 20 00 02 00 04 40 00 80 00 08 00
0110: 10 00 01 00 02 7e 82 af 5f c6 99 86 42 5c 5b 7b
0120: eb 19 be f7 e2 8d a4 77 f8 e8 bb 07 51 5e 7b 91
0130: 28 c4 0e d0 55 89 38 96 2a 6c 77 3a 96 4a 74 55
0140: 12 57 00 8f 05 88 de 40 12 fe 8a c0 21 0c 01 00
0150: 02 20 00 34 c3 03 f7 3f 46 9a 04 49 f8 9d 00 00
0160: 00 00 49 45 4e 44 ae 42 60 82
Note that some of the bytes have an asterisk in front of them. These are the bytes that follow the beginPattern, up to and including the endPattern.
Also note that I used a beginPattern and an endPattern. You do not need to do this, I only did it to make it easier for me to find a pattern in my resistor.png file to test the pattern matching. You can use one variable for both begin and end, set the same value for both or simply assign endPattern = beginPattern if you want to use a single pattern (e.g. "0x31, 0x30, 0x30, 0x31") for the start and finish.
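If the file is small enough to hold in memory, the original goal (printing everything from the first occurrence of a pattern up to the second) can also be approached by searching a byte array directly. This is only a sketch under that assumption; PatternSlice, indexOf and betweenFirstTwo are illustrative names, and the second match is taken to be the next non-overlapping occurrence.

```java
import java.util.Arrays;

final class PatternSlice {
    /** Index of the first occurrence of pattern in data at or after from, or -1 if absent. */
    static int indexOf(byte[] data, byte[] pattern, int from) {
        for (int i = Math.max(from, 0); i + pattern.length <= data.length; i++) {
            int j = 0;
            while (j < pattern.length && data[i + j] == pattern[j]) {
                j++;
            }
            if (j == pattern.length) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Bytes from the start of the first match up to, but not including, the
     * second (non-overlapping) match; null if there are fewer than two matches.
     */
    static byte[] betweenFirstTwo(byte[] data, byte[] pattern) {
        int first = indexOf(data, pattern, 0);
        if (first < 0) {
            return null;
        }
        int second = indexOf(data, pattern, first + pattern.length);
        if (second < 0) {
            return null;
        }
        return Arrays.copyOfRange(data, first, second);
    }

    public static void main(String[] args) {
        byte[] pattern = {0x31, 0x30, 0x30, 0x31};
        byte[] data = {0x31, 0x30, 0x30, 0x31, 0x41, 0x42, 0x43, 0x31, 0x30, 0x30, 0x31, 0x44};
        StringBuilder sb = new StringBuilder();
        for (byte b : betweenFirstTwo(data, pattern)) {
            sb.append(String.format("%02X ", b));
        }
        System.out.println(sb.toString().trim()); // prints 31 30 30 31 41 42 43
    }
}
```

Run against the sample bytes in the question, the pattern recurs at offset 8, so this strict search returns only the first eight bytes; the 18-byte result the question describes would need a different rule for choosing the end marker.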

Extract .gz files in java

I'm trying to unzip some .gz files in Java. After some research I wrote this method:
public static void gunzipIt(String name){
byte[] buffer = new byte[1024];
try{
GZIPInputStream gzis = new GZIPInputStream(new FileInputStream("/var/www/html/grepobot/API/"+ name + ".txt.gz"));
FileOutputStream out = new FileOutputStream("/var/www/html/grepobot/API/"+ name + ".txt");
int len;
while ((len = gzis.read(buffer)) > 0) {
out.write(buffer, 0, len);
}
gzis.close();
out.close();
System.out.println("Extracted " + name);
} catch(IOException ex){
ex.printStackTrace();
}
}
When I try to execute it I get this error:
java.util.zip.ZipException: Not in GZIP format
How can I solve it? Thanks in advance for your help.
Test a sample, correct, gzipped file to see whether the problem lies in your code or not.
There are many possible ways to build a (g)zip file. Your file may have been built differently from what Java's built-in support expects, and the fact that one uncompressor understands a compression variant is no guarantee that Java will also recognize that variant. Please verify exact file type with file and/or other uncompression utilities that can tell you which options were used when compressing it. You may also have a look at the file itself with a tool such as hexdump. This is the output of the following command:
$ hexdump -C lgpl-2.1.txt.gz | head
00000000 1f 8b 08 08 ed 4f a9 4b 00 03 6c 67 70 6c 2d 32 |.....O.K..lgpl-2|
00000010 2e 31 2e 74 78 74 00 a5 5d 6d 73 1b 37 92 fe 8e |.1.txt..]ms.7...|
00000020 ba 1f 81 d3 97 48 55 34 13 7b 77 73 97 78 2b 55 |.....HU4.{ws.x+U|
00000030 b4 44 d9 bc 95 25 2d 29 c5 eb ba ba aa 1b 92 20 |.D...%-)....... |
00000040 39 f1 70 86 99 17 29 bc 5f 7f fd 74 37 30 98 21 |9.p...)._..t70.!|
00000050 29 7b ef 52 9b da 58 c2 00 8d 46 bf 3c fd 02 d8 |){.R..X...F.<...|
00000060 da fe 3f ef 6f 1f ed cd 78 36 1b 4f ed fb f1 ed |..?.o...x6.O....|
00000070 78 3a ba b1 f7 8f ef 6e 26 97 96 fe 1d df ce c6 |x:.....n&.......|
00000080 e6 e0 13 f9 e7 57 57 56 69 91 db 37 c3 d7 03 7b |.....WWVi..7...{|
00000090 ed e6 65 93 94 7b fb fa a7 9f 7e 32 c6 5e 16 bb |..e..{....~2.^..|
In this case, I used standard gzip on this license text. The first two bytes (1f 8b) identify a gzip file (although they do not specify variants); if your file does not start with 1f 8b, Java will complain, regardless of the remaining contents.
If the problem is due to the file, it is possible that other uncompression libraries available in Java may deal with the format correctly - for example, see Commons Compress
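The header check described above can also be done from Java before handing the file to GZIPInputStream. A minimal sketch follows; GzipCheck and hasGzipMagic are made-up names, while GZIPInputStream.GZIP_MAGIC is the constant the JDK itself exposes for the header value.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

final class GzipCheck {
    /** True if data starts with the gzip magic number (bytes 1f 8b). */
    static boolean hasGzipMagic(byte[] data) {
        if (data.length < 2) {
            return false;
        }
        // GZIP_MAGIC is 0x8b1f: the two header bytes read as a little-endian 16-bit value.
        int magic = (data[0] & 0xff) | ((data[1] & 0xff) << 8);
        return magic == GZIPInputStream.GZIP_MAGIC;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GzipCheck <file>");
            return;
        }
        byte[] header = new byte[2];
        int filled = 0;
        try (InputStream in = new FileInputStream(args[0])) {
            int n;
            // read() may return fewer bytes than requested, so loop until 2 bytes or end of file.
            while (filled < 2 && (n = in.read(header, filled, 2 - filled)) != -1) {
                filled += n;
            }
        }
        boolean ok = filled == 2 && hasGzipMagic(header);
        System.out.println(ok ? "starts with 1f 8b" : "not gzip: first bytes are not 1f 8b");
    }
}
```

If the check fails, the file was most likely never gzip-compressed in the first place (for example, an error page saved under a .gz name), which is worth ruling out before trying other decompression libraries.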
import com.horsefly.utils.GZIP;
import org.apache.commons.io.FileUtils;
....
String content = new String(new GZIP().decompresGzipToBytes(FileUtils.readFileToByteArray(fileName)), "UTF-8");
in case someone needs it.

Store data into aerospike

I have some data which is to be stored in Aerospike against a column.
Suppose the incoming data is
["A", 1]
Now the first question is how to hold this data in Java.
I tried this.
ArrayList value = new ArrayList();
value.add(new String("A"));
value.add(new Integer(2));
When I try to write this data to Aerospike using
client.put(new WritePolicy(),
new Key("namespace", "set", "test"),
new Bin("binName", value) );
and then query with AQL, I see
| AC ED 00 05 73 72 00 13 6A 61 76 61 2E 75 74 69 6C 2E 41 72 72 61 79 4C 69 73 74 78 81 D2 1D 99 C7 61 9D 03 00 01 49 00 04 73 69 7A 65 78 70 00 00 00 02 77 04 00 00 00 02 73 72 00 11 6A 61 76 61 2E 6C 61 6E 67 2E 49 6E 74 65 67 65 72 12 E2 A0 A4 F7 81 87 |
i.e. some hexadecimal numbers.
But when I try to store the data in Aerospike using
client.put(new WritePolicy(),
new Key("namespace", "set", "test"),
new Bin("binName", Value.getAsList(value)) );
then a query through AQL gives me
["A",1]
which seems to be the intended behaviour. But when I use the Aerospike client to fetch the values and check their types
List<Object> ret = (List<Object>) client.get(new Policy(), key, "test").getValue("binName");
if(ret.get(0) instanceof Long){
System.out.println("Got instance of long");
}
then I see the print statement, even though I initially sent Integer data.
Why is this happening, and can anyone suggest an alternative way to save incoming data in Aerospike, say when the data is
["A",1]
PS: Please support your answer with a small code snippet.
FOUND SOME INFO ON GITHUB
In reference to this link there is a function
which I am copy/pasting below:
/**
* Write/Read ArrayList<Object> directly instead of relying on java serializer.
*/
private void testListComplex(AerospikeClient client, Parameters params) throws Exception {
console.info("Read/Write ArrayList<Object>");
Key key = new Key(params.namespace, params.set, "listkey2");
client.delete(params.writePolicy, key);
byte[] blob = new byte[] {3, 52, 125};
ArrayList<Object> list = new ArrayList<Object>();
list.add("string1");
list.add(2);
list.add(blob);
Bin bin = new Bin(params.getBinName("listbin2"), list);
client.put(params.writePolicy, key, bin);
Record record = client.get(params.policy, key, bin.name);
List<?> receivedList = (List<?>) record.getValue(bin.name);
validateSize(3, receivedList.size());
validate("string1", receivedList.get(0));
// Server convert numbers to long, so must expect long.
validate(2L, receivedList.get(1));
validate(blob, (byte[])receivedList.get(2));
console.info("Read/Write ArrayList<Object> successful.");
}
There is a comment that the server converts numbers to long.
Now I have a question: does this mean that in this case an integer cannot be stored?
If you check the Aerospike data types, you'll see that it only supports 64-bit integers, which are long values in Java.
If you don't need to access this data on the server through Lua UDF scripts, then you can just save it as serialized blob data. The Java driver already supports native serialization, which is what you did in the first attempt. AQL is just showing you the serialized bytes, but you can read the value back in the Java client just fine.
Or you can store it as a JSON-serialized string so that it's more compatible with other language drivers that you might use in the future.
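If the list form is kept, the client code only has to accept that the number comes back as a Long and narrow it where an int is needed. The sketch below deliberately uses no Aerospike classes; the list is a stand-in for the value returned by the client, and NumericRead and asInt are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class NumericRead {
    /** Narrows a numeric value read back from the store to int, failing loudly on overflow. */
    static int asInt(Object value) {
        if (!(value instanceof Number)) {
            throw new IllegalArgumentException("not a number: " + value);
        }
        // longValue() covers Integer and Long alike; toIntExact throws
        // ArithmeticException if the value does not fit in an int.
        return Math.toIntExact(((Number) value).longValue());
    }

    public static void main(String[] args) {
        // Stand-in for the list read back from the bin: the integer arrives as a Long.
        List<Object> ret = new ArrayList<>();
        ret.add("A");
        ret.add(1L);
        int n = asInt(ret.get(1));
        System.out.println(ret.get(0) + ", " + n); // prints A, 1
    }
}
```

Checking against Number rather than Long or Integer keeps the reading code indifferent to which integer width the value was originally written with.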

Java serialization with empty and substrings

Had a look at the implementation and haven't been able to think of an explanation to this but maybe someone here will know.
public static void main(String[] args) throws Exception {
List<String> emptyStrings = new ArrayList<String>();
List<String> emptySubStrings = new ArrayList<String>();
for (int i = 0; i < 20000; i++) {
String actuallyEmpty = "";
String subStringedEmpty = " ";
subStringedEmpty = subStringedEmpty.substring(0, 0);
emptyStrings.add(actuallyEmpty);
emptySubStrings.add(subStringedEmpty);
}
System.out.println("Substring test");
// Write to files
long time = System.currentTimeMillis();
writeObjectToFile(emptyStrings, "empty.list");
System.out.println("Time taken to write empty list " + (System.currentTimeMillis() - time));
time = System.currentTimeMillis();
writeObjectToFile(emptySubStrings, "substring.list");
System.out.println("Time taken to write substring list " + (System.currentTimeMillis() - time));
//Read from files
time = System.currentTimeMillis();
List<String> readEmptyString = readObjectFromFile("empty.list");
System.out.println("Time taken to read empty list " + (System.currentTimeMillis() - time));
time = System.currentTimeMillis();
List<String> readEmptySubStrings = readObjectFromFile("substring.list");
System.out.println("Time taken to read substring list " + (System.currentTimeMillis() - time));
}
private static void writeObjectToFile(Object o, String file) throws Exception {
FileOutputStream out = new FileOutputStream(file);
ObjectOutputStream oout = new ObjectOutputStream(out);
oout.writeObject(o);
oout.flush();
oout.close();
}
private static <T> T readObjectFromFile(String file) throws Exception {
ObjectInputStream ois = null;
try {
ois = new ObjectInputStream(new FileInputStream(file));
return (T) ois.readObject();
} finally {
ois.close();
}
}
Ultimately these 2 lists contain 20,000 empty strings (one list contains "" empty strings and the other contains empty strings generated by substring(0,0)). But if you check the sizes of the serialized files generated (empty.list and substring.list) you will notice that the empty.list contains substantially more data.
I have noticed that the callers of remote EJB's which un-serialize these substring objects seem to have severe performance issues also.
The sizes of the files are different because Java uses a mechanism to store multiple references to the same object, as described here:
References to other objects (except in transient or static fields)
cause those objects to be written also. Multiple references to a
single object are encoded using a reference sharing mechanism so that
graphs of objects can be restored to the same shape as when the
original was written.
see ObjectOutputStream
If you look at the generated serialized files, you will see:
With one empty String inside:
empty.list:
ac ed 00 05 73 72 00 13 6a 61 76 61 2e 75 74 69
6c 2e 41 72 72 61 79 4c 69 73 74 78 81 d2 1d 99
c7 61 9d 03 00 01 49 00 04 73 69 7a 65 78 70 00
00 00 01 77 04 00 00 00 01 74 00 00 78
The string "" corresponds to the bytes 74 00 00 (a string marker followed by a two-byte length of zero); the final 78 marks the end of the list's data.
substring.list
ac ed 00 05 73 72 00 13 6a 61 76 61 2e 75 74 69
6c 2e 41 72 72 61 79 4c 69 73 74 78 81 d2 1d 99
c7 61 9d 03 00 01 49 00 04 73 69 7a 65 78 70 00
00 00 01 77 04 00 00 00 01 74 00 00 78
Note that with one element the resulting files are identical.
But if we add the same object more than once, the behaviour differs.
Look at the respective files with that string added twice.
empty.list:
ac ed 00 05 73 72 00 13 6a 61 76 61 2e 75 74 69
6c 2e 41 72 72 61 79 4c 69 73 74 78 81 d2 1d 99
c7 61 9d 03 00 01 49 00 04 73 69 7a 65 78 70 00
00 00 02 77 04 00 00 00 02 74 00 00 71 00 7e 00
02 78
substring.list
ac ed 00 05 73 72 00 13 6a 61 76 61 2e 75 74 69
6c 2e 41 72 72 61 79 4c 69 73 74 78 81 d2 1d 99
c7 61 9d 03 00 01 49 00 04 73 69 7a 65 78 70 00
00 00 02 77 04 00 00 00 02 74 00 00 74 00 00 78
Note that substring.list carries on as normal: two unrelated strings with different references, each written in full (74 00 00). empty.list instead writes the second element as a back-reference to the first (71 00 7e 00 02).
Six bytes at the end of substring.list (00 00 74 00 00 78) versus eight bytes at the end of empty.list (00 00 71 00 7e 00 02 78)
The difference grows because every repeated empty string adds a five-byte reference instead of a three-byte string. So when you fill your ArrayList there will be a lot of extra bytes, which are what make it possible to reconstruct the object graph in its original shape.
If you want to know why there is that sharing mechanism, I suggest you take a look at this question:
What is the meaning of reference sharing in Serialization? How Enums are Serialized?
empty.list contains one String object and lots of references to it.
substring.list contains 20,000 String objects, all of them equal in content.
You could "fix" this by intern()ing the strings.
private void verify(String name, Supplier<String> stringSupplier) throws IOException, ClassNotFoundException {
List<String> inputStrings = new ArrayList<String>();
inputStrings.add(stringSupplier.get());
inputStrings.add(stringSupplier.get());
ByteArrayOutputStream boas = new ByteArrayOutputStream();
ObjectOutputStream emptyOut = new ObjectOutputStream(boas);
emptyOut.writeObject(inputStrings);
emptyOut.flush();
ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(boas.toByteArray()));
List<String> returnedStrings = (List<String>)ois.readObject();
if(returnedStrings.get(0) == returnedStrings.get(1)) {
System.out.println(name + " contains the same object");
} else {
System.out.println(name + " contains DIFFERENT objects");
}
}
@Test
public void test() throws IOException, ClassNotFoundException {
verify("empty string", new Supplier<String>() {
@Override
public String get() {
return "";
}
});
verify("sub string", new Supplier<String>() {
@Override
public String get() {
String data = " ";
return data.substring(0, 0);
}
});
verify("intern()ed substring", new Supplier<String>() {
@Override
public String get() {
String data = " ";
return data.substring(0, 0).intern();
}
});
}
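The size effect itself can be measured without writing any files. This is a rough sketch, not the original poster's test: it uses new String("") to guarantee distinct objects, because whether substring(0, 0) returns a fresh object may depend on the JDK version, and the names SharedVsDistinct, build and serializedSize are invented for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class SharedVsDistinct {
    /** Number of bytes Java serialization produces for the given object. */
    static int serializedSize(Object o) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    /** Builds a list of n empty strings: n distinct objects, or n references to the literal "". */
    static List<String> build(int n, boolean distinct) {
        List<String> list = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            // new String("") always allocates, so each element is its own object.
            list.add(distinct ? new String("") : "");
        }
        return list;
    }

    public static void main(String[] args) {
        int n = 20000;
        System.out.println("shared:   " + serializedSize(build(n, false)) + " bytes");
        System.out.println("distinct: " + serializedSize(build(n, true)) + " bytes");
    }
}
```

For empty strings the shared list comes out larger, since each repeat costs a five-byte back-reference rather than a three-byte empty string; for longer strings the comparison reverses, because the reference stays at five bytes while the string itself grows.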

Decode unknown-charset text saved by PHP

I have some records in MySQL such as
Vận hành linh hoạt trong má»i Ä‘k giao thông
which in hex is
56 c3 a1 c2 ba c2 ad 6e 20 68 c3 83 c2 a0 6e 68 20 6c 69 6e 68 20 68
6f c3 a1 c2 ba c2 a1 74 20 74 72 6f 6e 67 20 6d c3 a1 c2 bb c2 8d 69
20 c3 84 e2 80 98 6b 20 67 69 61 6f 20 74 68 c3 83 c2 b4 6e 67 20
I don't know how PHP saved it, but reading it through the Java MySQL Connector shows some strange characters. I can make it show the original text by:
copy the text above --> Notepad++ - Encoding in ASCII --> Paste text
--> Encoding in UTF-8
the original text should be:
Vận hành linh hoạt trong mọi đk giao thông
I know the problem is that PHP saved the text in the wrong encoding, but is there a way to decode it correctly in Java?
Are you sure the hex is exactly correct? Here is what I did...
String MESS = "56 c3 a1 c2 ba c2 ad 6e 20 68 c3 83 c2 a0 6e 68 20 6c 69 6e 68 20 68 6f c3 a1 c2 ba c2 a1 74 20 74 72 6f 6e 67 20 6d c3 a1 c2 bb c2 8d 69 20 c3 84 e2 80 98 6b 20 67 69 61 6f 20 74 68 c3 83 c2 b4 6e 67 20";
String[] hexchars = MESS.split(" ");
byte[] buf = new byte[hexchars.length];
for (int i = 0; i < hexchars.length; i++) {
buf[i] = (byte) Integer.parseInt(hexchars[i], 16);
}
try {
String s1 = new String(buf, "UTF-8"); // First decode the bytes as UTF-8
buf = s1.getBytes("cp1252"); // ...then re-encode the text as cp1252
s1 = new String(buf, "UTF-8"); // ...then decode those bytes as UTF-8 again
System.out.println(s1);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
And the printed result is:
Vận hành linh hoạt trong m�?i đk giao thông
Which is almost right, except that the decoding of mọi is incorrect, which makes me suspect that the hex you provided may not be correct. If you are 100% sure it is correct, I can try a little more to decode it.
UPDATE:
Here are my further thoughts:
You need to find out what encoding MySQL itself (the database) is set to.
You need to find out what encoding PHP is set to
possibly in PHP.INI
possibly set in the HTML metadata for the page that populates the table.
You need to find out what if any encoding the PHP MySQL driver runs with
Only then will there be a possibility of setting the MySQL Connector/J to the right encoding, and then possibly applying a second conversion in Java.
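One possible explanation for the remaining m�?i, offered as an assumption rather than a confirmed diagnosis: the bytes c3 a1 c2 bb c2 8d decode to text containing U+008D, which Java's cp1252 charset cannot encode, so getBytes("cp1252") substitutes a question mark. The sketch below works around that by writing any character up to U+00FF as the byte of the same value and using cp1252 only for the rest; MojibakeRepair and repair are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class MojibakeRepair {
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Reverses a UTF-8 -> cp1252 -> UTF-8 double encoding, tolerating unmapped C1 characters. */
    static String repair(String garbled) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < garbled.length(); i++) {
            char c = garbled.charAt(i);
            if (c <= 0xFF) {
                // Covers U+0080..U+009F (such as U+008D), which cp1252 may not be able to encode.
                out.write(c);
            } else {
                // Characters such as U+2018 map back to a single cp1252 byte (0x91 here).
                byte[] b = String.valueOf(c).getBytes(CP1252);
                out.write(b, 0, b.length);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The bytes for "mọi đk" as they appear in the hex dump from the question.
        byte[] raw = {0x6d, (byte) 0xc3, (byte) 0xa1, (byte) 0xc2, (byte) 0xbb, (byte) 0xc2,
                (byte) 0x8d, 0x69, 0x20, (byte) 0xc3, (byte) 0x84, (byte) 0xe2, (byte) 0x80,
                (byte) 0x98, 0x6b};
        String garbled = new String(raw, StandardCharsets.UTF_8);
        System.out.println(repair(garbled));
    }
}
```

This only repairs data that was damaged in exactly this way; fixing the connection and table encodings as listed above remains the real cure, since new rows would otherwise keep arriving garbled.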
