I am reading text from a longblob with table-wide default charset equal to ascii and deserializing it with Gson's fromJson(). The charset as given by another field is UTF-8. I want to serialize back a modified version, but I want to test that the serialized version is equivalent to the original text in the longblob - obviously aside from the modification.
byte[] words; // from the longblob field, via getBytes()
String in = new String(words, Charsets.UTF_8);
MyClass myObj = gson.fromJson(in, MyClass.class);
// modify myObj...
String out = gson.toJson(myObj);
The problem seems to be Unicode characters: the lengths of the Strings are not equal because of the effect those characters have. For example, out as printed shows "we’ll" whereas in shows "we\u2019ll". I know that if I copy and paste these into Java code as literals they are equal and have equal length, but in memory, in the code above, they are not equal.
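For illustration, a minimal sketch of the difference (the escaped string mimics what the blob appears to contain, while the literal one mimics what Gson writes by default):
String escaped = "we\\u2019ll"; // the six-character escape as stored in the JSON text; length 10
String literal = "we\u2019ll";  // the literal right single quote character; length 5
System.out.println(escaped.length() + " vs " + literal.length()); // 10 vs 5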
I prefer a solution that doesn't rely on changing db field type.
I need to attach a Base64 binary element to a SOAP message. I'm doing a dry run to check whether I can convert a value read from a file into Base64 binary.
Here is the code below. In the last line I try to print the type of encoded1 (which I assume should be Base64 binary values), but I get the following output: "Attachment[B". How can I confirm whether it is a Base64-encoded string?
Path path = Paths.get("c:/tomcatupload/text.csv");
byte[] attachment1 = Files.readAllBytes(path);
byte[] encoded1 = Base64.encode(attachment1);
System.out.println("Attachment" + encoded1.getClass().getName());
Base-64 encoding is a way to convert arbitrary bytes into bytes that fall within a range of text characters in the ASCII encoding. This is done without any interpretation whatsoever - raw bytes are converted to base-64 on the sender's end; the receiver converts them back to a stream of bytes, and that's all there is to it.
When your code prints encoded1.getClass().getName(), all it gets is the runtime class of the byte array, which prints as [B. In order to interpret the data encoded in base-64 as something meaningful to your program, you need to know the format of the underlying data transported as base-64. Once the bytes are delivered to you (in your case, that's the encoded1 array of bytes) you need to decide what's inside, and act accordingly.
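To see that there is no interpretation involved, here is a minimal round trip (a sketch using java.util.Base64 from Java 8; your Base64.encode() may come from a different library, but the behavior is the same):
byte[] raw = { 0x00, (byte) 0xFF, 0x10, 0x20 };                  // arbitrary bytes
byte[] encoded = java.util.Base64.getEncoder().encode(raw);     // ASCII-only bytes
byte[] decoded = java.util.Base64.getDecoder().decode(encoded); // the original bytes again
System.out.println(java.util.Arrays.equals(raw, decoded));      // true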
For example, if a serialized Java object is sent to you as base-64, you need to take encoded1, decode it back to the raw bytes, make an in-memory stream from those bytes, and read the object using the regular serialization mechanism:
// decode first (shown here with java.util.Base64; use the decoder matching your encoder)
byte[] decoded = Base64.getDecoder().decode(encoded1);
ByteArrayInputStream memStream = new ByteArrayInputStream(decoded);
ObjectInputStream objStream = new ObjectInputStream(memStream);
Object attachedObject = objStream.readObject();
As a rough sanity check, base-64 output is about a third larger than its input, so if the encoding by Base64.encode() succeeded, encoded1 should be roughly 4/3 the size of attachment1.
To understand how the encoding works, please refer to http://en.wikipedia.org/wiki/Base64
By the way, your last statement doesn't print the array content; it prints the name of the class to which encoded1 belongs.
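If you want to look at the base-64 text itself rather than the class name, decode the ASCII bytes into a String, or encode straight to a String (a sketch assuming java.util.Base64; encoded1 holds plain ASCII bytes, so this is safe):
String base64Text = new String(encoded1, java.nio.charset.StandardCharsets.US_ASCII);
// or produce the String directly from the raw attachment bytes
String base64Text2 = java.util.Base64.getEncoder().encodeToString(attachment1);
System.out.println(base64Text);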
Given the following example:
String f = "FF00000000000000";
byte[] bytes = DatatypeConverter.parseHexBinary(f);
String f2 = new String(bytes);
I want the output to be FF00000000000000 but it's not working with this method.
You're currently trying to interpret the bytes as if they were text encoded using the platform default encoding (UTF-8, ISO-8859-1 or whatever). That's not what you actually want to do at all - you want to convert it back to hex.
For that, just look at the converter you're using for the parsing step, and look for similar methods which work in the opposite direction. In this case, you want printHexBinary:
String f2 = DatatypeConverter.printHexBinary(bytes);
The approach of "look for reverse operations near the original operation" is a useful one in general... but be aware that sometimes you need to look at a parallel type, e.g. DataInputStream / DataOutputStream. When you find yourself using completely different types for inverse operations, that's usually a bit of a warning sign. (It's not always wrong, it's just worth investigating other options.)
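Putting the two directions together, the round trip looks like this (a sketch assuming javax.xml.bind.DatatypeConverter is on the classpath):
import javax.xml.bind.DatatypeConverter;
...
String f = "FF00000000000000";
byte[] bytes = DatatypeConverter.parseHexBinary(f);  // hex text -> raw bytes
String f2 = DatatypeConverter.printHexBinary(bytes); // raw bytes -> hex text
System.out.println(f2);                              // FF00000000000000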
I have observed an issue: I am trying to save a record in a DB2 database, with length checks on the fields in the Java code. I kept the length check exactly equal to the database varchar limit, but when saving I get the SQL exception DB2 SQL Error: SQLCODE=-302, SQLSTATE=22001, SQLERRMC=null, DRIVER=3.57.82;
Then I reduced (truncated) the length to smaller than the database size - truncated to substring(0, 900) approximately, for a varchar(1000).
Please let me know what could be the reason for this. Is it related to character encoding?
How should it be handled?
What is the default character encoding applied to a String (input from a request parameter of a text area field), and the corresponding number of bytes?
DB2 counts string length in bytes, not characters. The max length of a string you can store can therefore be a lot shorter than the size given for varchar.
Unfortunately the only way to truncate a string to a given number of bytes is to encode it to bytes, truncate, and reconstruct the string. From what you say, it sounds like a variable-length encoding such as UTF-8 is being used. The tricky part is not producing an invalid (half-truncated) character at the end, and the way to do that is with the NIO charset API:
import java.nio.*;
import java.nio.charset.*;
...
CharBuffer in = CharBuffer.wrap(stringToTruncate);
ByteBuffer out = ByteBuffer.allocate(maxLength);   // maxLength = the DB2 column size in bytes
Charset db2charset = Charset.forName("UTF-8");     // the encoding the DB2 column uses
CharsetEncoder db2encoder = db2charset.newEncoder();
// encodes as many complete characters as fit into maxLength bytes; never splits a character
db2encoder.encode(in, out, true);
out.flip();
return db2charset.decode(out).toString();
I have created a byte array from a file.
FileInputStream fileInputStream = null;
File file = new File("/home/user/Desktop/myfile.pdf");
byte[] bFile = new byte[(int) file.length()];
try {
    fileInputStream = new FileInputStream(file);
    fileInputStream.read(bFile);
    fileInputStream.close();
} catch (Exception e) {
    e.printStackTrace();
}
Now I have an API that expects JSON input, where I have to put the above byte array in String form. After reading the byte array as a String, I need to convert it back to a byte array again.
So, help me to find:
1) How to convert a byte array to a String and then back to the same byte array?
The general problem of byte[] <-> String conversion is easily solved once you know the actual character set (encoding) that was used to "serialize" a given text to a byte stream, or which is needed by the peer component to accept a given byte stream as text input - see the perfectly valid answers already given on this. I've seen a lot of problems caused by a lack of understanding of character sets (and text encoding in general) in enterprise Java projects, even with experienced software developers, so I really suggest diving into this quite interesting topic.
It is generally key to keep the character-encoding information as some sort of "meta" information alongside your binary data if it represents text in some way. Hence the header in, for example, XML files, or even suffixes in file names as sometimes seen with Apache htdocs content, not to mention filesystem-specific ways of attaching metadata to files. Also, when communicating via, say, HTTP, the Content-Type header field often carries charset information to allow correct interpretation of the actual content.
However, since in your example you read a PDF file, I'm not sure if you can actually expect pure text data anyway, regardless of any character encoding.
So in this case - depending on the rest of the application you're working on - you may want to transfer binary data within a JSON string. A common way to do so is to convert the binary data to Base64 and, once transferred, recover the binary data from the received Base64 string.
How do I convert a byte array to Base64 in Java?
is a good starting point for such a task.
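For instance, something along these lines (a sketch using java.util.Base64, available since Java 8; the path is just your example file):
byte[] bFile = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("/home/user/Desktop/myfile.pdf"));
// byte[] -> String that is safe to embed in JSON
String asText = java.util.Base64.getEncoder().encodeToString(bFile);
// String -> byte[], recovering the original bytes exactly
byte[] restored = java.util.Base64.getDecoder().decode(asText);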
String class provides an overloaded constructor for this.
String s = new String(byteArray, "UTF-8");
byteArray = s.getBytes("UTF-8");
Providing an explicit charset is encouraged, because different encoding schemes can have different byte representations for the same text.
Also, your input stream may not read all the contents in one go. You have to read in a loop until there is nothing left to be read. Read the documentation: read() returns the number of bytes read.
Reads up to b.length bytes of data from this input stream into an
array of bytes. This method blocks until some input is available
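A typical read loop looks something like this (a sketch; on Java 7+, java.nio.file.Files.readAllBytes is a simpler alternative):
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
byte[] chunk = new byte[8192];
int n;
// read() returns -1 at end of stream, otherwise the number of bytes actually read
while ((n = fileInputStream.read(chunk)) != -1) {
    buffer.write(chunk, 0, n);
}
byte[] bFile = buffer.toByteArray();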
String.getBytes() and String(byte[] bytes) are methods to consider.
Convert byte array to String
String s = new String(bFile, "ISO-8859-1");
Convert String to byte array
byte[] bArray = s.getBytes("ISO-8859-1");
I'm planning to use serialization to save a bean modified by the user, in order to store a history record. But the ByteArrayOutputStream outputs a byte array (byte[]). If I convert it to a String and then convert it back, it can't be deserialized. How is that explained?
If I use a byte array to store it in Oracle, it's complicated. Is there any way to make the String deserializable? Thank you!
I'm Chinese, so please forgive my bad English. :)
Use ObjectOutputStream to serialize and ObjectInputStream to deserialize objects. The API documentation of those classes has examples that show how to use them to serialize and deserialize objects to and from a file.
Don't try to force a byte[] into a String. (Why would you want to put it in a String?). Serialized objects are binary data, not text characters that you would store in a String.
Brief Answer: encode the byte array as a Base64 string.
Base64 is a way of ensuring that binary data can be stored and transmitted as text - a reasonable explanation can be found on Wikipedia; if you don't encode the byte array, data can easily become "corrupted" by the use of different codepages etc. One thing to be aware of - base64 encoding will take more space than the byte array (so a byte array of 20 bytes may take around 30 characters to be stored)
There are many libraries that can encode/decode Base64; Apache Commons Codec is just one. See this question for more discussion on which library to use (there is a "private" one in the JDK, but its use may be considered questionable by some developers).
So, in summary: to serialize an object into a string, use an ObjectOutputStream and a ByteArrayOutputStream to convert it to a byte array, and then use a Base64 encoder to translate that into a string.
To deserialize, use a Base64 decoder to convert the string back into a byte array, and then use a ByteArrayInputStream and an ObjectInputStream to read it back.
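Roughly like this (a sketch using java.util.Base64; Apache Commons Codec's Base64 class can be used the same way, and myBean stands for your Serializable bean):
// Serializable bean -> byte[] -> Base64 String
ByteArrayOutputStream bos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(bos);
oos.writeObject(myBean);
oos.close();
String history = java.util.Base64.getEncoder().encodeToString(bos.toByteArray());
// Base64 String -> byte[] -> bean
byte[] data = java.util.Base64.getDecoder().decode(history);
ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data));
Object restoredBean = ois.readObject();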
In a byte[], all possible byte values can be used, i.e. -128 to 127. In text, however, many of these values and combinations of values are invalid and will not convert to text as expected.
I suggest you consider a text-based serialization format like XML or JSON. These can be read/written as text safely. Text-based serialization can also be read by a human, and possibly edited as text if you want to correct a value.
EDIT: I would look at using XMLEncoder, which is crude but built in, or XStream, which handles both XML and JSON and is more flexible and efficient (but requires a couple of extra libraries).
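For example, with the built-in XMLEncoder the bean becomes an XML string directly (a sketch; the bean must follow JavaBeans conventions, i.e. have a public no-arg constructor and getters/setters, and myBean is a placeholder for your object):
// bean -> XML text
ByteArrayOutputStream bos = new ByteArrayOutputStream();
java.beans.XMLEncoder enc = new java.beans.XMLEncoder(bos);
enc.writeObject(myBean);
enc.close();
String xml = bos.toString("UTF-8"); // XMLEncoder writes UTF-8 by default
// XML text -> bean
java.beans.XMLDecoder dec = new java.beans.XMLDecoder(
        new ByteArrayInputStream(xml.getBytes("UTF-8")));
Object restored = dec.readObject();
dec.close();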
static final String SQL_SERIALIZE_OBJECT =
    "insert into serialized_java_objects (serialized_id, object_name, serialized_object) values (ser_id_seq.nextval, ?, ?)";
// serialize the object to a byte array
ByteArrayOutputStream baos = new ByteArrayOutputStream();
ObjectOutputStream oos = new ObjectOutputStream(baos);
oos.writeObject(objectToBeSerilize);
oos.flush();
byte[] serializeBytes = baos.toByteArray();
// store the bytes in a binary column via setBytes()
prepStatement = connection.prepareStatement(SQL_SERIALIZE_OBJECT);
prepStatement.setString(1, objectToBeSerilize.getClass().getName());
prepStatement.setBytes(2, serializeBytes);
prepStatement.executeUpdate();