Google Protobuf ByteString vs. Byte[] - java

I am working with google protobuf in Java.
I see that it is possible to serialize a protobuf message to String, byte[], ByteString, etc:
(Source: https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/MessageLite)
I don't know what a ByteString is. I got the following definition from the protobuf API documentation (source: https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/ByteString):
"Immutable sequence of bytes. Substring is supported by sharing the reference to the immutable underlying bytes, as with String."
It is not clear to me how a ByteString is different from a String or byte[].
Can somebody please explain?
Thanks.

You can think of ByteString as an immutable byte array. That's pretty much it. It's a byte[] which you can use in a protobuf. Protobuf does not let you use Java arrays because they're mutable.
ByteString exists because String is not suitable for representing arbitrary sequences of bytes. String is specifically for character data.
The protobuf MessageLite Interface provides toByteArray() and toByteString() methods. If ByteString is an immutable byte[], would the byte representation of a message represented by both ByteString and byte[] be the same?
Sort of. If you call toByteArray() you'll get the same value as if you were to call toByteString().toByteArray(). Compare the implementation of the two methods, in AbstractMessageLite:
public ByteString toByteString() {
    try {
        final ByteString.CodedBuilder out =
            ByteString.newCodedBuilder(getSerializedSize());
        writeTo(out.getCodedOutput());
        return out.build();
    } catch (IOException e) {
        throw new RuntimeException(
            "Serializing to a ByteString threw an IOException (should " +
            "never happen).", e);
    }
}

public byte[] toByteArray() {
    try {
        final byte[] result = new byte[getSerializedSize()];
        final CodedOutputStream output = CodedOutputStream.newInstance(result);
        writeTo(output);
        output.checkNoSpaceLeft();
        return result;
    } catch (IOException e) {
        throw new RuntimeException(
            "Serializing to a byte array threw an IOException " +
            "(should never happen).", e);
    }
}

A ByteString gives you the ability to perform more operations on the underlying data without copying it into a new structure. For instance, to hand a subset of a byte[] to another method, you would have to supply a start index and an end index (or copy the range into a new array). With a ByteString, you can give the method a substring without the method knowing anything about the underlying storage, just like a substring of a normal String. You can also concatenate ByteStrings without creating a new data structure and manually copying the data.
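To make the difference concrete, here is a small standard-library sketch: with a plain byte[] you either pass explicit indices everywhere or copy the range, whereas (per the protobuf docs) ByteString.substring shares the immutable underlying bytes. The sample values are arbitrary.

```java
import java.util.Arrays;

public class SubrangeDemo {
    public static void main(String[] args) {
        byte[] data = {10, 20, 30, 40, 50};

        // With a plain byte[], handing out bytes 1..3 means either
        // passing (data, 1, 4) around, or copying the range:
        byte[] middle = Arrays.copyOfRange(data, 1, 4); // copies {20, 30, 40}

        // The copy is independent: mutating it does not affect the original.
        middle[0] = 99;
        System.out.println(Arrays.toString(data));   // [10, 20, 30, 40, 50]
        System.out.println(Arrays.toString(middle)); // [99, 30, 40]

        // With protobuf this would instead be (no copy, shared immutable bytes):
        // ByteString bs = ByteString.copyFrom(data);
        // ByteString sub = bs.substring(1, 4);
    }
}
```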
A String is for representing text and is not a good way to store binary data (as not all binary data has a textual equivalent unless you encode it in a manner that does: e.g. hex or Base64).
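A quick standard-library sketch of why String is unsafe for arbitrary bytes: round-tripping binary data through a text encoding such as UTF-8 silently replaces byte sequences that are not valid in that encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class BinaryVsString {
    public static void main(String[] args) {
        byte[] binary = {(byte) 0xFF, (byte) 0xFE, 0x41}; // not valid UTF-8

        // Decoding replaces the invalid sequences with U+FFFD, so the
        // round trip does NOT reproduce the original bytes.
        String asText = new String(binary, StandardCharsets.UTF_8);
        byte[] roundTrip = asText.getBytes(StandardCharsets.UTF_8);

        System.out.println(Arrays.equals(binary, roundTrip)); // false
    }
}
```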

Related

JNA: Change String encoding for only one external native library

We have a Java application which loads and uses a lot of external libraries. The default encoding of the operating system (Windows) is "windows-1252" (or "cp-1252"). But there is one external library which wants all Strings (incoming and outgoing) in "utf-8". How can I do that? How can I change the String encoding for only one JNA library?
The normal JNA pattern is this:
public interface DemoLibrary extends Library {
    DemoLibrary INSTANCE = Native.load("demoLibrary", DemoLibrary.class);
    // abstract method declarations as interface to native library
}
However, Native#load is overloaded multiple times to support customizing the bindings. The relevant overload is Native#load(String, Class, Map<String,?>). The third argument passes options to the native library loader; the available options are defined in the com.sun.jna.Library interface.
The relevant option here is Library.OPTION_STRING_ENCODING. That option is passed to the loaded NativeLibrary instance and is used as the default String encoding for that library.
The sample above then becomes:
public interface DemoLibrary extends Library {
    DemoLibrary INSTANCE = Native.load("demoLibrary", DemoLibrary.class,
            Collections.singletonMap(Library.OPTION_STRING_ENCODING, "UTF-8"));
}
If you need to customize more (type mapper, calling convention), you'll need to build the option map yourself, for example in a static initializer block.
Matthias Bläsing's answer is a much better solution to this specific use case. Please read it first if you only need character encoding.
My original answer is below and is more general to a wider range of applications.
An easy way to handle it is to not directly map String fields/args at all. Just send and receive byte arrays from the library, and create a helper function to do the translation between Strings and the byte arrays. As you've pointed out, you can write those bytes to an allocated Memory block and pass the pointer.
If you want a more permanent solution to do the same thing behind the scenes, you can use a TypeMapper for that particular library.
The W32APITypeMapper is a good reference; its stringConverter variable shows how, in Unicode mode, it maps String to the wide string type WString (UTF-16).
Create your own UTF8TypeMapper (or similar) and use Java's character set/encoding functions to translate your strings to a sequence of UTF-8 bytes.
This is untested, but should be close to what you need. You could do a bit more abstraction to create a new UTF8String type that handles the details.
public class UTF8TypeMapper extends DefaultTypeMapper {
    public UTF8TypeMapper() {
        TypeConverter stringConverter = new TypeConverter() {
            @Override
            public Object toNative(Object value, ToNativeContext context) {
                if (value == null)
                    return null;
                String str = (String) value;
                byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
                // Allocate an extra byte for the null terminator
                Memory m = new Memory(bytes.length + 1);
                // Write the string's bytes
                m.write(0, bytes, 0, bytes.length);
                // Write the terminating null
                m.setByte((long) bytes.length, (byte) 0);
                return m;
            }

            @Override
            public Object fromNative(Object value, FromNativeContext context) {
                if (value == null)
                    return null;
                Pointer p = (Pointer) value;
                // getString handles the null terminator
                return p.getString(0, StandardCharsets.UTF_8.name());
            }

            @Override
            public Class<?> nativeType() {
                return Pointer.class;
            }
        };
        addTypeConverter(String.class, stringConverter);
    }
}
Then, add the type mapper to the options when loading the library:
private static final Map<String, ?> UTF8_OPTIONS =
        Collections.singletonMap(Library.OPTION_TYPE_MAPPER, new UTF8TypeMapper());

TheUTF8Lib INSTANCE = Native.load("TheUTF8Lib", TheUTF8Lib.class, UTF8_OPTIONS);
The only way I have found is to use the Function class of JNA (see https://java-native-access.github.io/jna/5.2.0/javadoc/com/sun/jna/Function.html), like this:
public void setIp(String ip) {
    Function fSetIp = Function.getFunction("myLib", "setIp", Function.C_CONVENTION, "utf-8");
    Object[] args = {ip};
    fSetIp.invoke(args);
}
But I must implement this for each function I want to call. I am not sure if there is a better/easier way. If so, please answer my question.

Deserialize an array of bytes taking into account its useful length

I have an array of bytes whose length equals XXX. It contains a serialized object which I want to deserialize (i.e., I want to create a copy of this object from these stored bytes).
But I have a constraint: the useful length of my byte array. I want to take the latter into account when deserializing (i.e., the serialized object can be shorter than the array's size).
I hope you will understand more easily with my two little methods (the first serializes, while the second deserializes):
byte[] toBytes() throws IOException {
    byte[] array_bytes;
    ByteArrayOutputStream byte_array_output_stream = new ByteArrayOutputStream();
    ObjectOutput object_output = new ObjectOutputStream(byte_array_output_stream);
    object_output.writeObject(this);
    object_output.close();
    array_bytes = byte_array_output_stream.toByteArray();
    return array_bytes;
}
And the current deserialization method (which is "wrong" for the moment because I don't use the useful length):
static Message fromBytes(byte[] bytes, int length) throws IOException, ClassNotFoundException, ClassCastException {
    Message message;
    ByteArrayInputStream byte_array_input_stream = new ByteArrayInputStream(bytes);
    ObjectInput object_input = new ObjectInputStream(byte_array_input_stream);
    message = (Message) object_input.readObject();
    object_input.close();
    return message;
}
As you can see, readObject doesn't take a length, and I need one: that's a problem, and perhaps I should not use this method.
Thus, my question is: with or without using readObject, how could I take into account the useful length (i.e., the "payload") of my byte array?
I assume that your Message class implements Serializable.
In this case, when you write your message, it gets serialized automatically by the Java runtime, as explained in the Serializable interface.
I cannot be sure how or why you might consider part of the generated byte array as not useful, since it is all part of the serialized instance.
However, I might suggest that you follow the Externalizable approach: your Message class implements Externalizable. You then have the option of controlling exactly how your class gets serialized and deserialized in the writeExternal(ObjectOutput out) and readExternal(ObjectInput in) methods respectively, where you can write the length you want into the stream, read it back, and/or keep only the required number of bytes.
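A minimal sketch of that approach (the class and field names are illustrative, not taken from the question): the class writes an explicit length before its payload, and readExternal reads back only that many bytes, so any trailing slack in the buffer is ignored.

```java
import java.io.*;
import java.util.Arrays;

public class Message implements Externalizable {
    private byte[] payload = new byte[0];

    public Message() {} // public no-arg constructor required by Externalizable

    public Message(byte[] payload) { this.payload = payload; }

    public byte[] getPayload() { return payload; }

    @Override
    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeInt(payload.length); // write the useful length explicitly
        out.write(payload);
    }

    @Override
    public void readExternal(ObjectInput in) throws IOException {
        int length = in.readInt();    // read only the useful length back
        payload = new byte[length];
        in.readFully(payload);
    }

    public static void main(String[] args) throws Exception {
        Message original = new Message(new byte[] {1, 2, 3});
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(original);
        }
        // Pad the buffer: deserialization ignores the trailing slack.
        byte[] buffer = Arrays.copyOf(bos.toByteArray(), bos.size() + 100);
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(buffer))) {
            Message copy = (Message) ois.readObject();
            System.out.println(copy.getPayload().length); // 3
        }
    }
}
```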

Putting a ByteBuffer into a Bundle

I'm working with some code that I have no control over, and the ByteBuffer I'm working with gets passed to a native method. I don't have access to the native code, but it expects "buf" to be a ByteBuffer. Also note that the code doesn't really make sense on its own; there is a lot of it, so I am distilling it down to the issue.
public class otherClass {
    public final void setParams(Bundle params) {
        final String key = params.keySet().iterator().next();
        Object buf = params.get(key);
        nativeSet(key, buf);
    }

    private native void nativeSet(String key, Object buf);
}
Here is my code:
public void myMethod(ByteBuffer myBuffer) {
    final Bundle myBundle = new Bundle();
    myBundle.putByteBuffer("param", myBuffer);
    otherClass.setParams(myBundle);
}
The problem? There is no putByteBuffer method in Bundle. It seems kind of weird that there is a get() that returns an Object, but no generic put().
But what seems weirder to me is that the native code wants a ByteBuffer. When it gets passed from the Java layer, won't it have a bunch of metadata with it? Can code in the native layer predict the metadata and extract it from the ByteBuffer?
Is there any way to reliably pass a ByteBuffer here? It can be a little hacky. I was thinking maybe I could figure out what the ByteBuffer object would be in bits, convert to an integer, and use putInt(). I'm not sure how to go from the ByteBuffer object to raw data.
Hypothetically this should work. Pull the bytes out of the buffer, turn them into a String using a lossless one-byte-per-char encoding such as ISO-8859-1 (UTF-8 would corrupt arbitrary binary data), and pass that in your bundle:
byte[] bytes = new byte[myBuffer.remaining()];
myBuffer.get(bytes);
String byteString = new String(bytes, StandardCharsets.ISO_8859_1);
myBundle.putString("param", byteString);
then reconstruct the ByteBuffer from the string:
byte[] byteArray = byteString.getBytes(StandardCharsets.ISO_8859_1);
ByteBuffer byteBuffer = ByteBuffer.allocate(byteArray.length);
byteBuffer.put(byteArray);
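A quick standard-library check of the charset caveat: ISO-8859-1 maps every possible byte value to a character and back losslessly, while UTF-8 does not.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
    public static void main(String[] args) {
        byte[] raw = new byte[256];
        for (int i = 0; i < 256; i++) raw[i] = (byte) i;

        // ISO-8859-1: every byte is a valid char, so the round trip is exact.
        byte[] latin = new String(raw, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(raw, latin)); // true

        // UTF-8: bytes 0x80-0xFF are not valid on their own and get replaced.
        byte[] utf = new String(raw, StandardCharsets.UTF_8)
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(raw, utf)); // false
    }
}
```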

How to use Data Type Renderers list for byte[] in IntelliJ

So I have all these byte[]'s, and their encoding could be one of many different formats, depending on where I'm currently debugging. I'd like to be able to toss together a list of projections for the byte array. I'm using IntelliJ's Data Type Renderers view, applying the renderer to type byte[], and I'm primarily concerned with the List of Expressions box.
So I've seen how you can render the node with the expression new String(this); however, that doesn't work in the list of expressions view below. Eventually I want to use expressions like new String(this, "UTF16") (or do conversions to hex or Base64 or whatever), but this doesn't appear to be a byte[] in the expressions list box. In fact, when I try to typecast like (byte[])this, the result says Inconvertible types; cannot cast '_Dummy_.__Array__<T>' to 'byte[]' (same behavior for java.lang.Byte[]). It's really strange behavior, that in one place it's a byte[] and in another it's some opaque internal type.
What does work is simply displaying the fields - i.e., an expression like this.length works as expected. Also, an expression like this simply rerenders the node, claiming its type is byte[] and its object id is the same as the original's id.
I answered a similar question here. The message about being unable to cast _Dummy_.__Array__<T> to byte[] sounds like an IntelliJ error where it can't determine the name of the class in the call stack. Perhaps adding DTRs for other "forms" of byte[] will help. I've included examples of three DTRs below: byte[], Byte[], and ArrayList.
The test and the helper method (written in Groovy, so make sure it's on your classpath or rewrite in pure Java):
@Test
void testShouldHandleDTR() {
    // Arrange
    byte[] primitiveArray = "90abcdef".bytes
    Byte[] objectArray = "90abcdef".bytes
    List<Byte> objectList = "90abcdef".bytes

    final String EXPECTED_STRING = Hex.encodeHexString(primitiveArray)
    logger.info("Expected hex string: ${EXPECTED_STRING}")

    // Fully qualified for DTR dialog
    String primitiveDTR = "org.bouncycastle.util.encoders.Hex.toHexString(this);"
    String objectArrayDTR = "org.example.ThisTest.encodeObjectArrayToHex(this);"
    String objectListDTR = "org.example.ThisTest.encodeObjectArrayToHex(this.toArray());"

    def types = [primitiveArray, objectArray, objectList]
    def expressions = [(primitiveArray.class.name): primitiveDTR,
                       (objectArray.class.name)   : objectArrayDTR,
                       (objectList.class.name)    : objectListDTR]

    // Act
    types.each { it ->
        logger.info("Contents: ${it}")
        logger.info("Type: ${it.class.name}")
        String result = Eval.x(it, expressions[it.class.name].replaceAll(/this/, "x"))
        logger.info("Result: ${result}")

        // Assert
        assert result == EXPECTED_STRING
    }
}

public static String encodeObjectArrayToHex(Object[] objectArray) {
    byte[] primitiveArray = objectArray.collect { it as byte }
    Hex.encodeHexString(primitiveArray)
}
For each DTR you want to define, just copy the exact String defined above into the When rendering a node > Use following expression field. I'd recommend putting the utility method into a source class on your classpath as opposed to a test, as on every build you will have to re-import the test class in the DTR dialog because the target/ gets cleaned.
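If you'd rather not depend on BouncyCastle or a similar library on the debug classpath, a plain-JDK helper works just as well in a DTR expression. The class and method names below are illustrative, not from the answer above:

```java
import java.util.Base64;

public class DebugRenderers {
    // Hex-encode a byte[]; usable from a DTR expression such as
    // org.example.DebugRenderers.toHex(this)
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Base64 variant using the JDK's built-in encoder.
    public static String toBase64(byte[] bytes) {
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) {
        byte[] sample = "90abcdef".getBytes();
        System.out.println(toHex(sample));    // 3930616263646566
        System.out.println(toBase64(sample)); // OTBhYmNkZWY=
    }
}
```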

How to send a string array using sockets and objectoutputstream

I have this to send a string or an integer, but if I want to send a string array, what should I use?
// A string ("Hello, World").
out.write("Hello, World".getBytes());
// An integer (123).
out.write(ByteBuffer.allocate(4).putInt(123).array());
Thanks in advance
Just write the array:
ObjectOutputStream out = ...
String[] array = ...
out.writeObject(array);
If you're using ObjectOutputStream, there's no need to muck about with byte arrays - the class provides high-level methods to read and write entire objects.
Similarly:
out.writeInt(123);
out.writeObject("Hello, World");
You only need to use the write(byte[]) methods if you're using the raw, low-level OutputStream class.
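A self-contained sketch of the round trip (the socket is replaced by an in-memory byte array here, since the stream mechanics are identical):

```java
import java.io.*;
import java.util.Arrays;

public class StringArrayRoundTrip {
    public static void main(String[] args) throws Exception {
        String[] original = {"Hello", "World"};

        // Write the array exactly as you would to a socket's OutputStream.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(original);
            out.writeInt(123);
        }

        // Read it back on the "receiving" side, in the same order.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            String[] received = (String[]) in.readObject();
            int number = in.readInt();
            System.out.println(Arrays.toString(received)); // [Hello, World]
            System.out.println(number);                    // 123
        }
    }
}
```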
