Unable to convert bytes from charset 65535 in to Japanese (5035) - java

I have a character set conversion issue:
I am updating Japanese Kanji characters in DB2 in iSeries system with the following conversion method:
AS400 sys = new AS400("<host>","username","password");
CharConverter charConv = new CharConverter(5035, sys);
byte[] b = charConv.stringToByteArray(5035, sys, "試験");
AS400Text textConverter = new AS400Text(b.length, 65535,sys);
While retrieving, I use the following code to convert & display:
CharConverter charConv = new CharConverter(5035, sys);
byte[] bytes = charConv.stringToByteArray(5035, sys, dbRemarks);
String s = new String(bytes);
System.out.println("Remarks after conversion to AS400Text :"+s);
But, the system is displaying garbled characters while displaying. Can anybody help me to decode Japanese characters from binary storage?

Well I don't know anything about CharConverter or AS400Text, but code like this is almost always a mistake:
String s = new String(bytes);
That uses the platform default encoding to convert the binary data to text.
Usually storage and retrieval should go through opposite processes - so while you've started with a string and then converted it to bytes, and converted that to an AS400Text object when storing it, I'd expect you to start with an AS400Text object, convert that to a byte array, and then convert that to a String using CharConverter when fetching. The fact that you're calling stringToByteArray in both cases suggests there's something amiss.
(It would also help if you'd tell us what dbRemarks is, and how you've fetched it.)
I do note that having checked some documentation for AS400Text, I've seen this:
Due to recent changes in the behavior of the character conversion routines, this system object is no longer necessary, except when the AS400Text object is to be passed as a parameter on a Toolbox Proxy connection.
There's similar documentation for CharConverter. Are you sure you actually need to go through this at all? Have you tried just storing the string directly and retrieving it directly, without going through intermediate steps?

Thank you Jon Skeet!
Yes. I have committed a mistake, not encoding the string while declaration.
My issue is to get the data stored in DB2, convert it into Japanese and provide for editing in web page. I am getting dbRemarks from the result set. I have missed another thing in my post:
While inserting, I am converting to text like:
String text = (String) textConverter.toObject(b);
PreparedStatement prepareStatementUpdate = connection.prepareStatement(updateSql);
prepareStatementUpdate.setString(1, text);
int count = prepareStatementUpdate.executeUpdate();
I am able to retrieve and display clearly with this code:
String selectSQL = "SELECT remarks FROM empTable WHERE emp_id = ? AND dep_id=? AND join_date='2013-11-15' ";
prepareStatement = connection.prepareStatement(selectSQL);
prepareStatement.setInt(1, 1);
prepareStatement.setString(2, 1);
ResultSet resultSet = prepareStatement.executeQuery();
while ( resultSet.next() ) {
byte[] bytedata = resultSet.getBytes( "remarks" );
AS400Text textConverter2 = new AS400Text(bytedata.length, 5035,sys);
String javaText = (String) textConverter2.toObject(bytedata);
System.out.println("Remarks after conversion to AS400Text :"+javaText);
}
It is working fine with JDBC, but for working with JPA, I need to convert to string for editing in web page or store in table. So, I have tried this way, but could not succeed:
String remarks = resultSet.getString( "remarks" );
byte[] bytedata = remarks.getBytes();
AS400Text textConverter2 = new AS400Text(bytedata.length, 5035,sys);
String javaText = (String) textConverter2.toObject(bytedata);
System.out.println("Remarks after conversion to AS400Text :"+javaText);

Thanks a lot Jon and Buck Calabro !
With your clues, I have succeeded with the following approach:
String remarks = new String(resultSet.getBytes("remarks"),"SJIS");
byte[] byteData = remarks.getBytes("SJIS");
CharConverter charConv = new CharConverter(5035, sys);
String convertedStr = charConv.byteArrayToString(5035, sys, byteData);
I am able to convert from string. I am planning to implement the same with JPA, and started coding.

Related

What encoding Java uses to create string from give unicode data?

I am quite perplexed on why I should not be encoding unicode text with UTF-8 for comparison when other text(to compare) has been encoded with UTF-8?
I wanted to compare a text(= アクセス拒否 - means Access denied) stored in external file encoded as UTF-8 with a constant string stored in a .java file as
public static final String ACCESS_DENIED_IN_JAPANESE = "\u30a2\u30af\u30bb\u30b9\u62d2\u5426"; // means Access denied
The java file was encoded as Cp1252.
I read the file as as input stream by using below code. Point to note that I am using UTF-8 for encoding.
InputStream in = new FileInputStream("F:\\sample.txt");
int b1;
byte[] bytes = new byte[4096];
int i = 0;
while (true) {
b1 = in.read();
if (b1 == -1)
break;
bytes[i++] = (byte) b1;
}
String japTextFromFile = new String(bytes, 0, i, Charset.forName("UTF-8"));
Now when I compare as
System.out.println(ACCESS_DENIED_IN_JAPANESE.equals(japTextFromFile)); // result is `true` , and works fine
but when I encode ACCESS_DENIED_IN_JAPANESE with UTF-8 and try to compare it with japTextFromFile result is false. The code is
String encodedAccessDenied = new String(ACCESS_DENIED_IN_JAPANESE.getBytes(),Charset.forName("UTF-8"));
System.out.println(encodedAccessDenied .equals(japTextFromFile)); // result is `false`
So my doubt is why above comparison is failing, when both the strings are same and have been encoded with UTF-8? The result should be true.
However, in first case, when compared different encoded strings- one with UTF-16(Java default way of encoding string) and other with UTF-8 , result is true, which I think should be false as it is different encoding ,no matter text we read, is same.
Where I am wrong in my understanding? Any clarification is greatly appreciated.
ACCESS_DENIED_IN_JAPANESE.getBytes() does not use UTF-8. It uses your platform's default charset. But then you use UTF-8 to turn those bytes back into a String. This gets you a different String to the one you started with.
Try this:
String encodedAccessDenied = new String(ACCESS_DENIED_IN_JAPANESE.getBytes(StandardCharsets.UTF_8),StandardCharsets.UTF_8
);
System.out.println(encodedAccessDenied .equals(japTextFromFile)); // result is `true`
The best way I know is put all static texts into a text file encoded with UTF-8. And then read those resources with FileReader, setting encoding parameter to "UTF-8"

convert charset X to unicode in Java

How do you convert a specific charset to unicode in Java?
charsets have been discussed quite a lot here, but I think this one hasn't been covered yet.
I have a hex-string that meets the criteria length%4==0 (e.g. \ud3faef8e). usually I just display this in an HTML container and add &#x to the front and ; to the back of each hex quadruple.
but in this case the following procedure led to the correct output (non-Java)
paste hex string into Hex-Editor and save the file to test.txt (utf-8)
open the file with Notepad++
change the encoding to Simplified Chinese (GB2312)
Now I'm trying to do the same in Java.
// having hex convert to ascii
String ascii = "";
for (int cnt = 0; cnt <= unicode.length() - 2; cnt += 2) {
String tmp = unicode.substring(cnt, cnt + 2);
int decimal = Integer.parseInt(tmp, 16);
ascii += (char) decimal;
}
// writing ascii to file at this point leads to the same result as in step 2 before
try {
// get the bytes
byte[] utf8 = ascii.getBytes("UTF-8"); // == UTF8
// convert to gb2312
String converted = new String(utf8, "GB2312"); // == EUC_CN
// write to file (writer with declared UTF-8)
writeToFile(converted, 20 + cntu);
cntu++;
} catch (Exception e) {
System.err.println(e.getMessage());
}
the output looks according the should-output, except the fact that randomly the following character is displayed: � why does this one come up? and how can I get rid of it?
in the end, what I'd like to get is the converted unicode again to be able to display it with my original approach (폴), but I haven't figured out a way to get to the hex values again (they don't match the criteria length%4==0). how do I get the hex values of the characters?
update1
to be more precise, regarding the input, I'm assuming that it is Unicode, because of the start of the String with \u, which would be sufficient for my usual approach, but not in the case I am describing above.
update2
the writeToFile method
FileOutputStream fos = new FileOutputStream("test" + id + ".txt");
Writer out = new OutputStreamWriter(fos, "UTF8");
out.write(str);
out.close();
I tried with GB2312 as well, but there is no change. I still get the ? inbetween the correct characters.
update3
the expected output for \ud3f6ef8e is 遇飵 , you get to it when following the steps 1 to 3. (HxD as an example of an hex editor)
there was no indication that I should delete my question, thus I'm writing my final comment as the answer
I was misinterpreting the incoming hex-digits. they were in a specific charset and not uni-code, so they represented the hex-values of a character in that charset. What I'm doing now is new String(byteArray, "CharsetName"); and get (int)s.charAt(i) to get the unicode value and write it to HTML. thanks for your ideas and hints
for more details see this answer here: https://stackoverflow.com/a/4049781/1338732 , and this question here: How to convert UTF-8 to unicode in Java?

Cannot deserialize featureset using DigitalPersona Sdk java edition

am trying to create a soap webservice method to match fingerprints using digitalpersona one touch for windows sdk java edition. I capture the featureset from an applet at the client side and compare it with my template on the server side. But I need to deserialize it and create the feature set again so that i can compare it with the template.
I dont know how to recreate the feature set so that i can use it for verification:
//This is for template retrieval: (no problem here)
String dbTemplate = result.getString("template");
byte[] byteArray = new byte[1];
byteArray = hexStringToByteArray(dbTemplate);
DPFPTemplate template = DPFPGlobal.getTemplateFactory().createTemplate();
template.deserialize(byteArray);
byte[] fsArray = new byte[1];
fsArray = hexStringToByteArray(ftSet);
//the problem is here, I've already converted it back into bytearray[] but i need to deserialize it and create the feature set again.
featureSet.deserialise(fsArray);
DPFPFeatureSet features = extractFeatures(sample, DPFPDataPurpose.DATA_PURPOSE_VERIFICATION);
//This is for matching features and template
DPFPVerification matcher = DPFPGlobal.getVerificationFactory().createVerification();
DPFPVerificationResult result1 = matcher.verify(features, template);
if (result1.isVerified()) {
return "The fingerprint was VERIFIED.";
} else {
return "The fingerprint was NOT VERIFIED.";
}
Please help me.
the best thing you can do here is not to convert the bytearray into string. if you are saving it in a database, you can automatically save it as byte array (since the blob can accept a bytearray).
you can insert it like this (just an example)
PreparedStatement st=con.prepareStatement("INSERT INTO EMPLOYEE(employee_id,template)"+"values(?,?)");
st.setInt(1,23);
st.setBytes(2, enroller.getTemplate().serialize());
Statement st = con.createStatement();
ResultSet rec = st.executeQuery("SELECT * FROM EMPLOYEE WHERE EMPLOYEE_ID=3");
Then when accessing the template, deserialize it (just follow the sdk, i think it's around page 37) Onetouch java sdk ==== link
a sample will be available below.
while(rec.next()){
blob = rec.getBlob("template");
int blobLength = (int)blob.length();
blobAsBytes = blob.getBytes(1, blobLength);
}
templater.deserialize(blobAsBytes);
verificator.setFARRequested(DPFPVerification.MEDIUM_SECURITY_FAR);
DPFPVerificationResult result = verificator.verify(fs, templater);
if (result.isVerified())
System.out.print("The fingerprint was VERIFIED.");

converting byte[] to string

I am having a bytearray of byte[] type having the length 17 bytes, i want to convert this to string and want to give this string for another comparison but the output i am getting is not in the format to validate, i am using the below method to convert.I want to output as string which is easy to validate and give this same string for comparison.
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
String value = new String(byteArray);
System.out.println(value);
Output : ���{nY
What encoding is it? You should define it explicitly:
new String(byteArray, Charset.forName("UTF-32")); //or whichever you use
Otherwise the result is unpredictable (from String.String(byte[]) constructor JavaDoc):
Constructs a new String by decoding the specified array of bytes using the platform's default charset
BTW I have just tried it with UTF-8, UTF-16 and UTF-32 - all produce bogus results. The long series of 0 makes me believe that this isn't actually a text. Where do you get this data from?
UPDATE: I have tried it with all character sets available on my machine:
for (Map.Entry<String, Charset> entry : Charset.availableCharsets().entrySet())
{
final String value = new String(byteArray, entry.getValue());
System.out.println(entry.getKey() + ": " + value);
}
and no encoding produces anything close to human-readable text... Your input is not text.
Use as follows:
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
String value = Arrays.toString(byteArray);
System.out.println(value);
Your output will be
[0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0]
Is it actually encoded text? If so, specify the encoding.
However, the data you've got doesn't look like it's actually meant to be text. It just looks like arbitrary binary data to me. If it isn't really text, I'd recommend converting it to hex or base64, depending on requirements. There's a good public domain base64 encoder you can use.
String text = Base64.encodeBytes(byteArray);
And decoding:
byte[] data = Base64.decode(text):
not 100% sure if I get you right. Is this what you want?
String s = null;
StringBuffer buf = new StringBuffer("");
byte[] byteArray = new byte[] {0,127,-1,-2,-54,123,12,110,89,0,0,0,0,0,0,0,0};
for(byte b : byteArray) {
s = String.valueOf(b);
buf.append(s + ",");
}
String value = new String(buf);
System.out.println(value);
Maybe you should specify a charset:
String value = new String(byteArray, "UTF-8");

How to convert String into Byte and Back

For converting a string, I am converting it into a byte as follows:
byte[] nameByteArray = cityName.getBytes();
To convert back, I did: String retrievedString = new String(nameByteArray); which obviously doesn't work. How would I convert it back?
What characters are there in your original city name? Try UTF-8 version like this:
byte[] nameByteArray = cityName.getBytes("UTF-8");
String retrievedString = new String(nameByteArray, "UTF-8");
which obviously doesn't work.
Actually that's exactly how you do it. The only thing that can go wrong is that you're implicitly using the platform default encoding, which could differ between systems, and might not be able to represent all characters in the string.
The solution is to explicitly use an encoding that can represent all characts, such as UTF-8:
byte[] nameByteArray = cityName.getBytes("UTF-8");
String retrievedString = new String(nameByteArray, "UTF-8");

Categories