Does anybody know a faster way to do what java.nio.charset.Charset.decode(..)/encode(..) does?
It's currently one of the bottlenecks of a technology that I'm using.
[EDIT]
Specifically, in my application, I changed one segment from a Java solution to a JNI solution (because there was a C++ technology that was more suitable for my needs than the Java technology that I was using).
This change brought about some significant decrease in speed (and significant increase in cpu & mem usage).
Looking deeper into the JNI solution that I used, the Java application communicates with the C++ application via byte[]. These byte[] are produced by Charset.encode(..) on the Java side and passed to the C++ side. Then, when the C++ side responds with a byte[], it gets decoded on the Java side via Charset.decode(..).
Running this under a profiler, I see that Charset.decode(..) and Charset.encode(..) both take a significant share of the whole execution time of the JNI solution (I profiled only the JNI solution because it's something I could whip up quite quickly. I'll profile the whole application at a later date once I free up my schedule :-) ).
Upon reading further about my problem, it seems that this is a known problem with Charset.encode(..) and decode(..) and that it's being addressed in Java 7. However, moving to Java 7 is not an option for me (for now) due to some constraints.
Which is why I ask here if somebody knows a Java5 solution / alternative to this (Sorry, should have mentioned that this was for Java5 sooner) ? :-)
The javadoc for encode() and decode() make it clear that these are convenience methods. For example, for encode():
Convenience method that encodes
Unicode characters into bytes in this
charset.
An invocation of this method upon a
charset cs returns the same result as
the expression
cs.newEncoder()
.onMalformedInput(CodingErrorAction.REPLACE)
.onUnmappableCharacter(CodingErrorAction.REPLACE)
.encode(bb);
except that it is potentially more
efficient because it can cache
encoders between successive
invocations.
The language is a bit vague there, but you might get a performance boost by not using these convenience methods. Create and configure the encoder once, and then re-use it:
CharsetEncoder encoder = cs.newEncoder()
.onMalformedInput(CodingErrorAction.REPLACE)
.onUnmappableCharacter(CodingErrorAction.REPLACE);
encoder.encode(...);
encoder.encode(...);
encoder.encode(...);
encoder.encode(...);
It always pays to read the javadoc, even if you think you already know the answer.
First part - it is a bad idea in general to pass arrays into JNI code. Because of GC, Java may have to copy arrays. In the worst case the array will be copied twice - on the way into the JNI code and on the way back :)
That is why the Buffer class hierarchy was introduced. And of course the Java dev team created a nice way to encode/decode chars:
Charset#newDecoder returns a CharsetDecoder, which can be used to convert a ByteBuffer to a CharBuffer according to a Charset. There are two main method versions:
CoderResult decode(ByteBuffer in, CharBuffer out, boolean endOfInput)
CharBuffer decode(ByteBuffer in)
For the max performance you need the first one. It has no hidden memory allocations inside.
Note that an encoder/decoder can maintain internal state, so be careful (for example, if you decode from a 2-byte encoding and the input buffer ends with only half of a char...). Also, encoders/decoders are not thread-safe.
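As a rough illustration of the three-argument decode(..) described above, here is a minimal sketch of a decoder that is created once and reused together with its output buffer. The class name ReusableDecoder and its grow-by-doubling policy are my own assumptions for illustration, not something from the JDK; it also assumes one instance per thread, since CharsetDecoder is not thread-safe.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper: keeps one decoder and one output buffer alive between calls.
// Not thread-safe; use one instance per thread.
final class ReusableDecoder {
    private final CharsetDecoder decoder;
    private CharBuffer out;

    ReusableDecoder(Charset cs, int initialChars) {
        this.decoder = cs.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        this.out = CharBuffer.allocate(Math.max(16, initialChars));
    }

    // Decodes all remaining bytes of 'in' as one complete input.
    String decode(ByteBuffer in) {
        decoder.reset();   // drop any state left over from the previous call
        out.clear();
        // With REPLACE actions the result is either UNDERFLOW (done) or OVERFLOW (out is full).
        while (decoder.decode(in, out, true).isOverflow()) {
            grow();
        }
        while (decoder.flush(out).isOverflow()) {
            grow();
        }
        out.flip();
        return out.toString();
    }

    // Doubles the output buffer, keeping what has been decoded so far.
    private void grow() {
        CharBuffer bigger = CharBuffer.allocate(out.capacity() * 2);
        out.flip();
        bigger.put(out);
        out = bigger;
    }
}
```

The only allocation per call is the final String (plus a one-off buffer growth when an input is larger than anything seen before). If the C++ side can work with direct ByteBuffers, the same decoder can be fed those directly.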
There are very few reasons to "squeeze" a string in a byte array.
I would recommend writing the C functions to take UTF-16 strings as parameters.
This way there is no need for any conversion.
Is there a canonical way to define a data structure in Java (and by extension Kotlin) which can be serialized into a byte array or sequence of bits in the order the bytes are defined in the structure?
Similar to a Struct in C.
Edit: So, further to this, I would like a simple and expressive way of defining a data structure and then accessing any part of it. So, for instance, in pseudocode:
DataStructure Message {
Bit newFlag;
Bit genderFlag;
Bit sizeFlag;
Bit activeFlag;
Bit[4] operation;
Byte messageSize;
Byte[] message;
}
So then we do:
Message firstMessage = new Message(1, 0, 1, 0, b'0010', 11, "Hello there");
And we can say:
ByteArray serialisedMessage = firstMessage.toBytes();
Which would give us an array which looked like:
[b'10100010', b'00001011', "Hello there" (but in bytes)]
Then we could do:
firstMessage.genderFlag = 1;
..and just rerun .toBytes on the object.
Obviously there are a million ways of doing stuff like this in Java, but nothing syntactically simple, as far as I can see - pretty much anything would involve having to write a custom serialisation (and not object serialisation as per Java) method for each object. Perhaps that's the canonical way to do this, but it would be nice to have it simpler, as per C, Rust, and, erm, COBOL.
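For reference, the hand-written approach mentioned above could be sketched roughly as follows. This is only an illustration of the pseudocode Message, assuming the four flags occupy the top four bits of the first byte, the operation the low four bits, and the text is plain ASCII; the class and field layout are hypothetical, not a library API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical hand-rolled equivalent of the pseudocode Message above.
final class Message {
    boolean newFlag;
    boolean genderFlag;
    boolean sizeFlag;
    boolean activeFlag;
    int operation;      // only the low 4 bits are written
    String message;     // assumed to be ASCII, at most 255 bytes

    Message(boolean newFlag, boolean genderFlag, boolean sizeFlag,
            boolean activeFlag, int operation, String message) {
        this.newFlag = newFlag;
        this.genderFlag = genderFlag;
        this.sizeFlag = sizeFlag;
        this.activeFlag = activeFlag;
        this.operation = operation;
        this.message = message;
    }

    byte[] toBytes() {
        byte[] payload = message.getBytes(Charset.forName("US-ASCII"));
        if (payload.length > 255) {
            throw new IllegalStateException("message does not fit a one-byte size field");
        }
        // Pack the four flags and the 4-bit operation into a single header byte.
        int header = (newFlag ? 0x80 : 0)
                   | (genderFlag ? 0x40 : 0)
                   | (sizeFlag ? 0x20 : 0)
                   | (activeFlag ? 0x10 : 0)
                   | (operation & 0x0F);
        ByteBuffer buf = ByteBuffer.allocate(2 + payload.length);
        buf.put((byte) header).put((byte) payload.length).put(payload);
        return buf.array();
    }
}
```

With this, new Message(true, false, true, false, 2, "Hello there").toBytes() starts with the header byte 10100010 followed by the length byte 11, and flipping genderFlag then calling toBytes() again gives the updated header.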
I do not know the answer to your actual question.
I will offer, however, a thought or two about the nature of the question itself. C was not developed as a high-level language; the best description I've heard of it is a "structured assembler". It has operators that were based on addressing modes available on the 16-bit machines on which it was first developed, and it was not developed as a standard to be used in applications but as something (much) easier than assembler that still allowed the programmer enough control to write very efficient code. The first two things done with it were a compiler and an operating system, so runtime efficiency was vital (in the early 70s) in ways that no one but mobile and embedded developers can begin to appreciate these days.
The "order the bytes are defined in the structure" is not, to my mind, a good way to think of the data in Java -- the programmer does not know or care the order in which the fields in his objects are stored, or whether they're stored at all -- it isn't part of the language definition. It seems to me any library or whatever that claimed to do this would have to put in disclaimers and/or have its own compiler; while I don't know of any reason it could not do so and follow the Java specification, I don't know why someone would bother.
So ask yourself why you want to do this. And if you have a good answer, put it in here; I'd be curious.
I need to reconstruct an object on the client side from a byte[] which stores bytes coming from an InputStream(TCP/IP). The Server is in C and structures are sent across as bytes. It is from these series of bytes that I have to reconstruct the object.
I can do this by reading chunks of bytes and converting them to variables of the object I want to reconstruct, but this method is tedious and I was wondering if there is an easy way out?
But this method is tedious and I was wondering if there is an easy way out?
Not that I'm aware of. But if you find yourself writing the same code multiple times, you may well find that if you extract some helper methods it actually becomes pretty simple. Yes, you'll need to call a method to read each field value... but the code should end up being easy to read and understand, and not rely on anything magical.
You could do all of this with reflection, possibly using annotations to specify the order in which fields have been serialized etc. But that's likely to be a lot of code to write - unless you've got a lot of different types to deserialize, it will probably be more code - and more complicated code - than the "dumb but straightforward" approach.
I hope the format of the bytes from the C side of things is well-specified though: if it's basically just dumping the in-memory representation, that can end up being pretty fragile in the face of change.
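To give an idea of what the "dumb but straightforward" helper-method approach might look like, here is a small sketch built on ByteBuffer. The wire layout (a little-endian int32 id, a uint16 flags field and a NUL-padded 8-byte ASCII name, with no padding) is purely an assumption for illustration; the real layout has to come from the C side's specification.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.Charset;

// Hypothetical wire layout, assumed little-endian and packed (no padding):
//   int32_t id; uint16_t flags; char name[8];   // name is NUL-padded ASCII
final class WireRecord {
    final int id;
    final int flags;    // unsigned 16-bit value widened to int
    final String name;

    private WireRecord(int id, int flags, String name) {
        this.id = id;
        this.flags = flags;
        this.name = name;
    }

    static WireRecord parse(byte[] wire) {
        ByteBuffer buf = ByteBuffer.wrap(wire).order(ByteOrder.LITTLE_ENDIAN);
        int id = buf.getInt();
        int flags = buf.getShort() & 0xFFFF;   // Java has no unsigned short
        String name = readFixedAscii(buf, 8);
        return new WireRecord(id, flags, name);
    }

    // Helper: reads a fixed-width char field and cuts it at the first NUL byte.
    static String readFixedAscii(ByteBuffer buf, int width) {
        byte[] raw = new byte[width];
        buf.get(raw);
        int len = 0;
        while (len < width && raw[len] != 0) {
            len++;
        }
        return new String(raw, 0, len, Charset.forName("US-ASCII"));
    }
}
```

Each field costs one line in parse(), and the byte order and field widths are stated explicitly rather than inherited from whatever the C compiler happened to lay out in memory.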
Take a look at JNA. You'll have to dig around a bit. JNA is designed to map C shared libraries (.DLL, .so, etc.) into Java. But it has various helper classes and methods that can be used to map a C structure in memory to a Java object of similar structure. I am almost 100% certain you could read these structures off the wire, write the bytes into a ByteBuffer (direct or otherwise), and then map a Java object over them.
Was recently reviewing some Java Swing code and saw this:
byte[] fooReference;
String getFoo() {
return new String(fooReference);
}
void setFoo(String foo) {
this.fooReference = foo.getBytes();
}
The above can be useful to save on your memory footprint, or so I'm told.
Is this overkill? Is anyone else encapsulating their Strings in this way?
That's a really, really bad idea. Don't use the platform default encoding. There's nothing to say that if you call setFoo and then getFoo that you'll get back the same data.
If you must do something like this, then use UTF-8 which can represent the whole of Unicode for certain... but I really wouldn't do it. It potentially saves some memory, but at the cost of performing conversions unnecessarily for most of the time - and being error-prone, in terms of failing to use an appropriate encoding.
I dare say there are some applications where this would be appropriate, but for 99.99% of them, it's a terrible idea.
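To make the encoding problem concrete, here is a tiny sketch (the helper name roundTrip is mine) that does the same getBytes / new String round trip with an explicit charset. It assumes Java 6+ for the Charset overloads.

```java
import java.nio.charset.Charset;

final class EncodingRoundTrip {
    // Encodes and immediately decodes with the same explicit charset.
    static String roundTrip(String s, Charset cs) {
        return new String(s.getBytes(cs), cs);
    }
}
```

With UTF-8 the string "na\u00efve \u20ac" comes back unchanged; with US-ASCII the two non-ASCII characters come back as '?'. The no-argument getBytes() and new String(byte[]) behave like whichever of these the platform default happens to be.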
This is not really useful:
1. You are copying the string every time getFoo or setFoo are called, therefore increasing both CPU and memory usage
2. It's obscure
A little historical excursion...
Using byte arrays instead of String objects actually used to have some considerable advantages in the early days of Java (1.0/1.1) if you could be sure that you would never need anything outside of ISO-8859-1. With the VMs of that time it was more than 10 times faster to use drawBytes() compared to drawString() and it actually does save memory which was still very scarce at that time and applets used to have a hard coded memory barrier of 32 and later 64 MB anyway. Not only is a byte[] smaller than the embedded char[] of String objects but you could also save the comparatively heavy String object itself which did make quite a difference if you had lots of short strings. Besides that accessing a plain byte array is also faster than using the accessor methods of String with all their extra bounds checks.
But since drawBytes ceased to be any faster in Java 1.2 and since current JITs are much better than the Symantec JIT of that time the remaining minimal performance advantage of byte[] arrays over strings is no longer worth the hassle. The memory advantage is still there and it might thus still be an option in some very rare extreme scenarios but nowadays it's nothing that should be considered if it's not really necessary.
It may well be overkill, and it may even consume more memory, since you now have two copies of the string. How long the actual string lives depends upon the client, but as with many such hacks, it smells a lot like premature optimization.
If you anticipate that you'll have a lot of identical strings, another much better way you can save memory is with the String.intern() method.
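A minimal sketch of that idea, using a hypothetical holder class in place of the one from the question:

```java
// Hypothetical holder: stores the canonical (interned) instance of each value,
// so equal strings held by many objects share one String.
final class NameHolder {
    private String name;

    void setName(String name) {
        this.name = (name == null) ? null : name.intern();
    }

    String getName() {
        return name;
    }
}
```

Two holders given equal but distinct String objects end up returning the same instance from getName(). Whether this actually helps depends on how many duplicates you really have, so measure before and after.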
Each call to getFoo() is instantiating a new String. How is this saving memory? If anything you're adding additional overhead for your garbage collector to go and clean up these new instances when these new references become unreferenced
This does indeed not make any sense. If it were a compile time constant which you don't need to massage back to a String, then it would make a bit more sense. You still have the character encoding problem.
It would make more sense to me if it were a char[] constant. In the real world there are several JSP compilers which optimize String constants away into a char[], which in turn can easily be written with Writer#write(char[]). This ends up being "slightly" more efficient, but those little bits count a lot in large and heavily used applications like Google Search and so on.
Tomcat's JSP compiler Jasper does this as well. Check the genStringAsCharArray setting. It then generates
static final char[] text1 = "some static text".toCharArray();
instead of
static final String text1 = "some static text";
which ends up with less overhead. It doesn't need a whole String instance around those characters.
If, after profiling your code, you find that memory usage for strings is a problem, you're much better off using a general string compressor and storing compressed strings, rather than trying to use UTF-8 strings for the minor reduction in space they give you. With English language strings, you can generally compress them to 1-2 bits per character; most other languages are probably similar. Getting to <1 bit per character is hard, but possible if you have a lot of data.
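As a rough sketch of the "general string compressor" idea, here is a wrapper around java.util.zip.Deflater. The class name CompressedString is hypothetical, and plain DEFLATE on individual strings is only my stand-in for whatever compressor you would actually pick; it will not reach the ratios quoted above on short strings, and very short inputs can even grow.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical wrapper that keeps a string as DEFLATE-compressed UTF-8 bytes.
final class CompressedString {
    private static final Charset UTF8 = Charset.forName("UTF-8");
    private final byte[] packed;

    CompressedString(String s) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(s.getBytes(UTF8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished()) {
            out.write(chunk, 0, deflater.deflate(chunk));
        }
        deflater.end();
        packed = out.toByteArray();
    }

    int packedLength() {
        return packed.length;
    }

    // Decompresses on every call, so this trades CPU for memory.
    @Override
    public String toString() {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(chunk);
                if (n == 0 && inflater.needsInput()) {
                    throw new IllegalStateException("compressed data is truncated");
                }
                out.write(chunk, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), UTF8);
    }
}
```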
Is there any way to create class that extends ByteBuffer class?
Some abstract methods from ByteBuffer are package private, and if I create package java.nio, security exception is thrown.
I would want to do that for performance reasons - getInt for example has about 10 method invocations, as well as quite a few if's. Even if all checks are left, and only method calls are inlined and big/small endian checks are removed, tests that I've created show that it can be about 4 times faster.
You can't extend ByteBuffer, and thank God for that.
You can't extend it because there are no protected constructors. Why the "thank God" part? Well, having only two real subclasses ensures that the JVM can heavily optimize any code involving ByteBuffer.
Lastly, if you really need to extend the class, edit the bytecode: add the protected attribute to the constructor and the public attribute to DirectByteBuffer (and DirectByteBufferR). Extending HeapByteBuffer serves no purpose whatsoever, since you can access the underlying array anyway.
Use -Xbootclasspath/p and add your own classes there, then extend in the package you need (outside java.nio). That's how it's done.
Another way is to use sun.misc.Unsafe and do whatever you need with direct access to the memory after address().
I would want to do that for
performance reasons - getInt for
example has about 10 method
invocations, as well as quite a few
if's. Even if all checks are left, and
only method calls are inlined and
big/small endian checks are removed,
tests that I've created show that it
can be about 4 times faster.
Now the good part, use gdb and check the truly generated machine code, you'd be surprised how many checks would be removed.
I can't imagine why a person would want to extend the classes. They exist to allow good performance, not just OO polymorphic dispatch.
edit:
How to declare any class and bypass Java verifier
On Unsafe: Unsafe has 2 methods that bypass the verifier and if you have a class that extends ByteBuffer you can just call any of them. You need some hacked version (but that's super easy) of ByteBuffer w/ public access and protected c-tor just for the compiler.
The methods are below. You can use 'em on your own risk. After you declare the class like that you can even use it w/ new keyword (provided there is a suitable c-tor)
public native Class defineClass(String name, byte[] b, int off, int len, ClassLoader loader, ProtectionDomain protectionDomain);
public native Class defineClass(String name, byte[] b, int off, int len);
You can disregard protection levels by using reflection, but that kinda defeats the performance goal in a big way.
You can NOT create a class in the java.nio package - doing so (and distributing the result in any way) violates Sun's Java license and could theoretically get you into legal troubles.
I don't think there's a way to do what you want to do without going native - but I also suspect that you're succumbing to the temptation of premature optimization. Assuming that your tests are correct (which microbenchmarks are often not): are you really sure that access to ByteBuffer is going to be the performance bottleneck in your actual application? It's kinda irrelevant whether ByteBuffer.get() could be 4 times faster when your app only spends 5% of its time there and 95% processing the data it's fetched.
Wanting to bypass all checks for the sake of (possibly purely theoretical) performance does not sound a good idea. The cardinal rule of performance tuning is "First make it work correctly, THEN make it work faster".
Edit: If, as stated in the comments, the app actually does spend 20-40% of its time in the ByteBuffer methods and the tests are correct, that means a speedup potential of 15-30% - significant, but IMO not worth starting to use JNI or messing with the API source. I'd try to exhaust all other options first:
Are you using the -server VM?
Could the app be modified to make fewer calls to ByteBuffer rather than trying to speed up those it does make?
Use a profiler to see where the calls are coming from - perhaps some are outright unnecessary
Maybe the algorithm can be modified, or you can use some sort of caching
ByteBuffer is abstract so, yes, you can extend it... but I think what you want to do is extend the class that is actually instantiated which you likely cannot. It could also be that the particular one that gets instantiated overrides that method to be more efficient than the one in ByteBuffer.
I would also say that you are likely wrong in general about all of that being needed - perhaps it isn't for what you are testing, but likely the code is there for a reason (perhaps on other platforms).
If you do believe that you are correct on it open a bug and see what they have to say.
If you want to add to the nio package you might try setting the boot classpath when you call Java. It should let you put your classes in before the rt.jar ones. Type java -X to see how to do that, you want the -Xbootclasspath/p switch.
+50 bounty for a way to circumvent the access restriction (it cannot be
done using reflection alone. Maybe
there is a way using sun.misc.Unsafe
etc.?)
Answer is: there is no way to circumvent all access restrictions in Java.
sun.misc.Unsafe works under the authority of security managers, so it won't help
Like Sarnum said:
ByteBuffer has package private
abstract _set and _get methods, so you
couldn't override it. And also all the
constructors are package private, so
you cannot call them.
Reflection allows you to bypass a lot of stuff, but only if the security manager allows it. There are many situations where you have no control on the security manager, it is imposed on you. If your code were to rely on fiddling with security managers, it would not be 'portable' or executable in all circumstances, so to speak.
The bottom line of the question is that trying to override byte buffer is not going to solve the issue.
There is no other option than implementing a class yourself, with the methods you need. Making methods final where you can will help the compiler in its effort to perform optimizations (it reduces the need to generate code for runtime polymorphism and helps inlining).
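A minimal sketch of such a hand-written class, assuming the data lives in a plain byte[] and big-endian order (ByteBuffer's default) is wanted. The name FastIntReader is hypothetical, and the JVM still bounds-checks the array accesses, so any gain over ByteBuffer needs to be measured rather than assumed.

```java
// Hypothetical final class with only the accessors you need; being final,
// its methods cannot be overridden, which keeps call sites easy to inline.
final class FastIntReader {
    private final byte[] data;

    FastIntReader(byte[] data) {
        this.data = data;
    }

    // Reads a big-endian int, matching ByteBuffer's default byte order.
    int getInt(int offset) {
        return ((data[offset] & 0xFF) << 24)
             | ((data[offset + 1] & 0xFF) << 16)
             | ((data[offset + 2] & 0xFF) << 8)
             | (data[offset + 3] & 0xFF);
    }
}
```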
The simplest way to get the Unsafe instances is via reflection. However if reflection is not available to you, you can create another instance. You can do this via JNI.
I tried, in byte code, to create an instance WITHOUT calling a constructor, which would allow you to create an instance of an object with no accessible constructors. However, this did not work, as I got a VerifyError for the byte code. The object has to have had a constructor called on it.
What I do is have a ParseBuffer which wraps a direct ByteBuffer. I use reflection to obtain the Unsafe reference and the address. To avoid running off the end of the buffer and killing the JVM, I allocate more pages than I need; as long as they are not touched, no physical memory will be allocated to the application. This means I have far fewer bounds checks and only check at key points.
Using the debug version of the OpenJDK, you can see the Unsafe get/put methods turn into a single machine code instruction. However, this is not available in all JVM and may not get the same improvement on all platforms.
Using this approach I would say you can get about a 40% reduction in timings but comes at a risk which normal Java code does not have i.e. you can kill the JVM. The usecase I have is an object creation free XML parser and processor of the data contained using Unsafe compared with using a plain direct ByteBuffer. One of the tricks I use in the XML parser is to getShort() and getInt() to examine multiple bytes at once rather than examining each byte one at a time.
Using reflection to get the Unsafe instance is an overhead you incur once. Once you have the Unsafe instance, there is no overhead.
A Java Agent could modify ByteBuffer's bytecode and change the constructor's access modifier. Of course you'd need to install the agent in the JVM, and you'd still have to get your subclass to compile. If you're considering such optimizations then you must be up for it!
I've never attempted such low level manipulation. Hopefully ByteBuffer is not needed by the JVM before your agent can hook into it.
I am answering the question you WANT the answer to, not the one you asked. Your real question is "how can I make this go faster?" and the answer is "handle the integers an array at a time, and not singly."
If the bottleneck is truly the ByteBuffer.getInt() or ByteBuffer.getInt(location), then you do not need to extend the class, you can use the pre-existing IntBuffer class to grab data in bulk for more efficient processing.
// Needs java.nio.ByteBuffer, java.nio.IntBuffer and java.nio.BufferUnderflowException
int totalLength = numberOfIntsInBuffer;
ByteBuffer myBuffer = whateverMyBufferIsCalled;
int[] block = new int[1024];
IntBuffer intBuff = myBuffer.asIntBuffer();
int partialLength = totalLength / 1024;
// Handle big blocks of 1024 ints at a time
try {
    for (int i = 0; i < partialLength; i++) {
        intBuff.get(block);
        // Do processing on ints, w00t!
    }
    partialLength = totalLength % 1024; // modulo to get remainder
    if (partialLength > 0) {
        intBuff.get(block, 0, partialLength);
        // Do final processing on ints
    }
} catch (BufferUnderflowException bufo) {
    // well, dang!
}
This is MUCH, MUCH faster than getting an int at a time. Iterating over the int[] array, which has set and known-good bounds, will also let your code JIT much tighter by eliminating bounds checks and the exceptions ByteBuffer can throw.
If you need further performance, you can tweak the code, or roll your own size-optimized byte[] to int[] conversion code. I was able to get some performance improvement using that in place of the IntBuffer methods with partial loop unrolling... but it's not suggested by any means.
For some caching I'm thinking of doing for an upcoming project, I've been thinking about Java serialization. Namely, should it be used?
Now I've previously written custom serialization and deserialization (Externalizable) for various reasons in years past. These days interoperability has become even more of an issue and I can foresee a need to interact with .Net applications, so I've thought of using a platform-independent solution.
Has anyone had any experience with high-performance use of GPB? How does it compare in terms of speed and efficiency with Java's native serialization? Alternatively, are there any other schemes worth considering?
I haven't compared Protocol Buffers with Java's native serialization in terms of speed, but for interoperability Java's native serialization is a serious no-no. It's also not going to be as efficient in terms of space as Protocol Buffers in most cases. Of course, it's somewhat more flexible in terms of what it can store, and in terms of references etc. Protocol Buffers is very good at what it's intended for, and when it fits your need it's great - but there are obvious restrictions due to interoperability (and other things).
I've recently posted a Protocol Buffers benchmarking framework in Java and .NET. The Java version is in the main Google project (in the benchmarks directory), the .NET version is in my C# port project. If you want to compare PB speed with Java serialization speed you could write similar classes and benchmark them. If you're interested in interop though, I really wouldn't give native Java serialization (or .NET native binary serialization) a second thought.
There are other options for interoperable serialization besides Protocol Buffers though - Thrift, JSON and YAML spring to mind, and there are doubtless others.
EDIT: Okay, with interop not being so important, it's worth trying to list the different qualities you want out of a serialization framework. One thing you should think about is versioning - this is another thing that PB is designed to handle well, both backwards and forwards (so new software can read old data and vice versa) - when you stick to the suggested rules, of course :)
Having tried to be cautious about the Java performance vs native serialization, I really wouldn't be surprised to find that PB was faster anyway. If you have the chance, use the server vm - my recent benchmarks showed the server VM to be over twice as fast at serializing and deserializing the sample data. I think the PB code suits the server VM's JIT very nicely :)
Just as sample performance figures, serializing and deserializing two messages (one 228 bytes, one 84750 bytes) I got these results on my laptop using the server VM:
Benchmarking benchmarks.GoogleSize$SizeMessage1 with file google_message1.dat
Serialize to byte string: 2581851 iterations in 30.16s; 18.613789MB/s
Serialize to byte array: 2583547 iterations in 29.842s; 18.824497MB/s
Serialize to memory stream: 2210320 iterations in 30.125s; 15.953759MB/s
Deserialize from byte string: 3356517 iterations in 30.088s; 24.256632MB/s
Deserialize from byte array: 3356517 iterations in 29.958s; 24.361889MB/s
Deserialize from memory stream: 2618821 iterations in 29.821s; 19.094952MB/s
Benchmarking benchmarks.GoogleSpeed$SpeedMessage1 with file google_message1.dat
Serialize to byte string: 17068518 iterations in 29.978s; 123.802124MB/s
Serialize to byte array: 17520066 iterations in 30.043s; 126.802376MB/s
Serialize to memory stream: 7736665 iterations in 30.076s; 55.93307MB/s
Deserialize from byte string: 16123669 iterations in 30.073s; 116.57947MB/s
Deserialize from byte array: 16082453 iterations in 30.109s; 116.14243MB/s
Deserialize from memory stream: 7496968 iterations in 30.03s; 54.283176MB/s
Benchmarking benchmarks.GoogleSize$SizeMessage2 with file google_message2.dat
Serialize to byte string: 6266 iterations in 30.034s; 16.826494MB/s
Serialize to byte array: 6246 iterations in 30.027s; 16.776697MB/s
Serialize to memory stream: 6042 iterations in 29.916s; 16.288969MB/s
Deserialize from byte string: 4675 iterations in 29.819s; 12.644595MB/s
Deserialize from byte array: 4694 iterations in 30.093s; 12.580387MB/s
Deserialize from memory stream: 4544 iterations in 29.579s; 12.389998MB/s
Benchmarking benchmarks.GoogleSpeed$SpeedMessage2 with file google_message2.dat
Serialize to byte string: 39562 iterations in 30.055s; 106.16416MB/s
Serialize to byte array: 39715 iterations in 30.178s; 106.14035MB/s
Serialize to memory stream: 34161 iterations in 30.032s; 91.74085MB/s
Deserialize from byte string: 36934 iterations in 29.794s; 99.98019MB/s
Deserialize from byte array: 37191 iterations in 29.915s; 100.26867MB/s
Deserialize from memory stream: 36237 iterations in 29.846s; 97.92251MB/s
The "speed" vs "size" is whether the generated code is optimised for speed or code size. (The serialized data is the same in both cases. The "size" version is provided for the case where you've got a lot of messages defined and don't want to take a lot of memory for the code.)
As you can see, for the smaller message it can be very fast - over 500 small messages serialized or deserialized per millisecond. Even with the 84750-byte message it's taking less than a millisecond per message.
One more data point: this project:
http://code.google.com/p/thrift-protobuf-compare/
gives some idea of expected performance for small objects, including Java serialization on PB.
Results vary a lot depending on your platform, but there are some general trends.
You might also have a look at FST, a drop-in replacement for built-in JDK serialization that should be faster and have smaller output.
Rough estimates from the frequent benchmarking I have done in recent years:
100% = binary/struct based approaches (e.g. SBE, fst-structs)
    inconvenient
    postprocessing (building up "real" objects at the receiver side) may eat up the performance advantage and is never included in benchmarks
~10%-35% protobuf & derivatives
~10%-30% fast serializers such as FST and KRYO
    convenient, deserialized objects can most often be used directly without additional manual translation code
    can be pimped for performance (annotations, class registering)
    preserve links in the object graph (no object serialized twice)
    can handle cyclic structures
    generic solution, FST is fully compatible with JDK serialization
~2%-15% JDK serialization
~1%-15% fast JSON (e.g. Jackson)
    cannot handle arbitrary object graphs, only a small subset of Java data structures
    no ref restoring
0.001-1% full graph JSON/XML (e.g. JSON.io)
These numbers are meant to give a very rough order-of-magnitude impression.
Note that performance depends A LOT on the data structures being serialized/benchmarked. So single simple class benchmarks are mostly useless (but popular: e.g. ignoring unicode, no collections, ..).
see also
http://java-is-the-new-c.blogspot.de/2014/12/a-persistent-keyvalue-server-in-40.html
http://java-is-the-new-c.blogspot.de/2013/10/still-using-externalizable-to-get.html
What do you mean by high performance? If you want millisecond serialization, I suggest you use whichever serialization approach is simplest. If you want sub-millisecond, you are likely to need a binary format. If you want much below 10 microseconds, you are likely to need custom serialization.
I haven't seen many benchmarks for serialization/deserialization, but few support less than 200 microseconds for serialization/deserialization.
Platform independent formats come at a cost (in effort on your part and latency) you may have to decide whether you want performance or platform independence. However, there is no reason you cannot have both as a configuration option which you switch between as required.
If you are torn between PB and native Java serialization on speed and efficiency, just go for PB.
PB was designed to achieve such factors. See http://code.google.com/apis/protocolbuffers/docs/overview.html
PB data is very small, while Java serialization tends to replicate a whole object, including its signature. Why should I always get my class name, field names... serialized, even though I know them inside out at the receiver?
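The class-name and field-name overhead is easy to see for yourself. Here is a small sketch (WirePoint and SizeComparison are hypothetical names) that serializes the same two ints once with ObjectOutputStream and once as bare fields with DataOutputStream. The bare-fields version only stands in for "a compact binary format"; it is not Protocol Buffers.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical sample class with two int fields.
final class WirePoint implements Serializable {
    private static final long serialVersionUID = 1L;
    final int x;
    final int y;

    WirePoint(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class SizeComparison {
    // Standard Java serialization: stream header, class descriptor, then the field values.
    static byte[] javaSerialized(WirePoint p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(p);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Just the two field values, 4 bytes each; the receiver must already know the layout.
    static byte[] fieldsOnly(WirePoint p) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(p.x);
            out.writeInt(p.y);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The fields-only form is 8 bytes, while the Java-serialized form is several times larger and contains the class name as readable text. For a stream of many objects of the same class the descriptor is written once per stream, so the gap narrows, but for small standalone messages it dominates.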
Think about cross-language development. It gets hard if one side uses Java and the other side uses C++...
Some developers suggest Thrift, but I would use Google PB because "I believe in google" :-).. Anyway, it's worth a look:
http://stuartsierra.com/2008/07/10/thrift-vs-protocol-buffers
Here is the off the wall suggestion of the day :-) (you just tweaked something in my head that I now want to try)...
If you can go for the whole caching solution via this it might work: Project Darkstar. It is designed as a very high performance game server, specifically so that reads are fast (so good for a cache). It has Java and C APIs, so I believe (though it has been a long time since I looked at it, and I wasn't thinking of this then) that you could save objects with Java and read them back in C and vice versa.
If nothing else it'll give you something to read up on today :-)
For wire-friendly serialisation, consider using the Externalizable interface. Used cleverly, you'll have the intimate knowledge needed to decide how best to marshal and unmarshal specific fields. That said, you'll need to manage the versioning of each object correctly: un-marshalling is easy, but re-marshalling a V2 object when your code only supports V1 will either break, lose information, or, worse, corrupt data in a way your apps aren't able to process correctly. If you're looking for an optimal path, beware that no library will solve your problem without some compromises. Generally, libraries will fit most use cases and come with the added benefit that they'll adapt and improve over time without your input, if you've opted for an active open source project. They might also add performance problems, introduce bugs, and even fix bugs that haven't affected you yet!
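As a sketch of what versioned Externalizable code might look like (the class CachedUser, its fields and the one-byte version scheme are all assumptions for illustration): the writer puts a version byte first, and the reader refuses anything newer than it understands rather than guessing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

// Hypothetical cached value with an explicit format version.
final class CachedUser implements Externalizable {
    private static final byte VERSION = 2;

    private String name;
    private int age;    // assumed to have been added in version 2

    public CachedUser() { }    // Externalizable requires a public no-arg constructor

    CachedUser(String name, int age) {
        this.name = name;
        this.age = age;
    }

    String getName() { return name; }
    int getAge() { return age; }

    public void writeExternal(ObjectOutput out) throws IOException {
        out.writeByte(VERSION);
        out.writeUTF(name);
        out.writeInt(age);
    }

    public void readExternal(ObjectInput in) throws IOException {
        byte version = in.readByte();
        if (version > VERSION) {
            // Data written by newer code: fail loudly instead of silently losing fields.
            throw new InvalidObjectException("unsupported version " + version);
        }
        name = in.readUTF();
        age = (version >= 2) ? in.readInt() : -1;   // -1 marks "unknown" for version 1 data
    }

    // Convenience round-trip helpers for the example.
    static byte[] toBytes(CachedUser user) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(bytes);
            out.writeObject(user);
            out.close();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    static CachedUser fromBytes(byte[] data) {
        try {
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
            return (CachedUser) in.readObject();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Rejecting newer versions is the conservative choice here. Preserving unknown trailing fields so that a V2 object survives a pass through V1 code takes more work, which is one of the things formats like Protocol Buffers do for you.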