I'm working on a web service that will be accessed from an Android app. After doing some research on what's the best technology, I'm left somewhat confused and dazed by the options.
Obviously on the Android end I want it to be as lightweight as possible. I also would prefer to share the common code since both are java, although that's less important. My primary concern is having it be efficient, and after that, simple and elegant code.
I've tried gson on the Android end, and it works nicely. But then I read about protocol buffers, which seem even more efficient, though I'm not sure whether the difference is significant. I'm also not sure whether to go for RPC or REST.
On the efficiency front, Protocol Buffers will probably be more efficient than any JSON implementation, though not necessarily by as much as you think. GSON is not particularly fast, but the Jackson library can almost compete with most binary serializers (Jackson is 2-4x faster than GSON in most situations and 10-20x faster on UTF-8 because it has special code for UTF-8).
But I'd still take Protocol Buffers over any JSON library because of the programming model. With most JSON libraries, you have to check the structure of a message manually. With Protocol Buffers, you specify message structures declaratively and the library will take care of the structural validation for you (though there will still be things that you need to validate manually).
Other libraries like Protocol Buffers: Apache Avro, Apache Thrift.
The Protostuff library uses the Protocol Buffers data model (so you get structural validation for free) but supports serializing to JSON and YAML in addition to other formats. This can be useful if you want your service to be consumed by JavaScript code, where JSON is often the easiest thing to deal with.
Related
I have a client-server system implemented in C#, and the client and server exchange .Net objects via serialization / deserialization and communicating via TCP/IP. This runs on a local network, it is not web-based or Internet-based.
Now I want to include Android clients connected by wifi. Again, this is local network only, not via the Internet and not web-based. The Android programming will be in Java. (I am aware of Mono for Android, but prefer not to get into that now.)
Is there some fairly simple way to implement object to object interchange between Java and .Net objects, provided, of course, that they are compatible?
I've looked a bit at JSON (Jackson on the Java end and Json.Net on the .Net end), and I'm guessing it can probably be done, but only with major efforts on remapping things at each end as soon as the objects become fairly complicated.
Any other suggestions? JSON-based or otherwise?
PS. My question is somewhat related to this one Mapping tool for converting Java's JSON to/from C#, but it never got a suitable answer, perhaps due to insufficient info in the question. Also, I don't care whether I end up using a JSON-based transport or XML or something else.
I would suggest either JSON or XML (which is based on a .xsd file) because these are independent of their respective implementations (instead of something like an ObjectOutputStream in java).
The problem with sharing a format between the two components (client and server) is that they need to be at the same version. My best practice is to have one underlying definition of the format (I use XML with an xsd file which specifies what the XML has to look like), then use JAXB to generate Java classes. That way you can (un)marshal from/to XML in the Java part.
I am very sure a similar thing exists in the world of .NET.
JSON is smaller than XML in size, but I find XML to be more readable.
SO user "default locale" should get the honor for this, but he/she has only answered via a comment. So just to make it very clear what my choice was I'll answer my own question.
I've decided to go with Google Protocol Buffers, which in my opinion has much better support for moving objects back and forth between Java and .Net than JSON. Because I have a lot of experience with C#, and a lot of existing C#-defined classes, I've selected Marc Gravell's protobuf-net program for the .Net end, and Google's own support for the Android end (no - see edit). This implies that I'm defining the objects in C#, not in .proto files - protobuf-net generates the .proto files from which I then generate the Java code.
Incidentally, as the transport mechanism I'm using a little-known program called naga on the Android end. http://code.google.com/p/naga/ Naga seems to work fine, and is well-documented and has sample programs, and should be better known in my opinion.
EDIT:
OK, I've got it working now to my satisfaction. Here's what I'm using:
Google Protocol buffers as the interchange format: https://developers.google.com/protocol-buffers/
Marc Gravell's protobuf-net at the C# end: http://code.google.com/p/protobuf-net/
A program called protostuff at the Java end: http://code.google.com/p/protostuff/
(I prefer protostuff to the official Google Java implementation of protocol buffers due to Google's implementation being based on the Java objects being immutable.)
Actually, I'm not using pure protocol buffers as the interchange format - I prefix the data with the name of the (outermost) class being transmitted. This makes the data self-identifying for deserializing at the other end.
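The class-name prefix described above could be sketched roughly as follows. This is only an illustration under assumptions, not the layout actually used in this answer: the `NamedFrame` class, the length-prefixed UTF-8 name, and the 4-byte big-endian lengths are all hypothetical choices, and the payload is a stand-in byte array rather than real protocol buffer output. The C# end would have to read the same layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical frame layout (an assumption, not the answer's real format):
// [name length: int][name: UTF-8 bytes][payload length: int][payload bytes]
class NamedFrame {
    final String typeName;
    final byte[] payload;

    NamedFrame(String typeName, byte[] payload) {
        this.typeName = typeName;
        this.payload = payload;
    }

    // Writes the type name first so the receiver knows which class to decode into.
    static byte[] encode(String typeName, byte[] payload) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            byte[] name = typeName.getBytes(StandardCharsets.UTF_8);
            out.writeInt(name.length);     // 4 bytes, big-endian
            out.write(name);
            out.writeInt(payload.length);  // 4 bytes, big-endian
            out.write(payload);
            out.flush();
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the name, then exactly as many payload bytes as the header announces.
    static NamedFrame decode(byte[] frame) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(frame));
            byte[] name = new byte[in.readInt()];
            in.readFully(name);
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return new NamedFrame(new String(name, StandardCharsets.UTF_8), payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "stand-in for serialized bytes".getBytes(StandardCharsets.UTF_8);
        NamedFrame frame = decode(encode("Customer", body));
        System.out.println(frame.typeName + ": " + frame.payload.length + " bytes");
    }
}
```

The receiver can look up a deserializer by `typeName` and hand it `payload`; on a real socket the same reads would run against the connection's input stream instead of a byte array.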
You can also try wox (https://github.com/codelion/wox), it is a cross platform serialization library for Java and C# based on XML.
I have done some searching but haven't come up with anything on this topic. I was wondering if anyone has ever compared (to some degree) the performance difference between an RPC over a socket and a REST web service. If both do the same thing, which would have a tendency to be the better performer? I've already started building some socket code and would like to know if REST would give better performance before I progress much further. Any input would be really appreciated. Thanks indeed
RMI:
Feels like a local API, much like XMLRPC
Can provide some fairly nice remote exception data
Java specific, which causes lock-in and limits your options
Has horrible versioning problems between different versions of clients
Skeleton files must be compiled in, like CORBA, which is not very flexible
REST:
easy to route around firewalls
useful for uploading files as it can be rather lightweight
very simple if you just want to shove simple things at something and get back an integer (like for uploaders)
easy to proxy security behind Apache and let it take the heat
does not define any standard format for the way the data is being exchanged (could be JSON, YAML 1.0, YAML 2.0, arbitrary XML format, etc)
does not define any convention about having remote faults sent back to the caller; integer codes are frequently used, but the method of sending back data is not defined. Ideally this would be standardized.
may require a lot of work on the client side caller of the library to make use of data (custom serialization and so forth)
In short, from here:
web services do allow a loosely coupled architecture. With RMI, you have to make sure that the objects stay in sync in all applications
RMI works best for smaller applications, that are not internet-related and thus not scalable
It's hard to imagine that REST is faster than a simple socket connection, given that it also goes over a socket.
However REST may be performant enough, standard and easier to use. I would test whether REST is fast enough and meets your requirements first (or one of the many other existing solutions) before attempting your own Socket solution.
I'm planning on developing an app for android that requires a back-end server to sync data with other users of the app. I'm planning on writing this server in standard java running on a unix server.
I once did this directly between two android devices, in that case I just serialized all the data needed to be sent on both ends.
However I suspect that the format that Dalvik serializes to and Java SE's format are not compatible. Is this the case? And if it is, what are my alternatives? One thing that popped into my mind was sending raw xml over a socket, but if there are better alternatives I'll be glad to hear them.
Thanks.
If you are doing a server then you should rely on something more standard like XML or JSON. I personally favor JSON. You shouldn't expect all your clients to be Java friendly. Almost every mobile device supports JSON. Look at the Jackson library to generate your JSON. Then you can use Jackson again to deserialize your object.
The beauty of this solution is also that it is stupid simple. You can look at the content just by putting the request in your browser. Not so easy with binary data.
I have used data serialization successfully between Android devices and servers.
I did have to convert TimeZone class to String and back because the TimeZone class in particular is not fully compatible (it tried transferring something in the sun. package which got ClassNotFoundException on Android).
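A minimal sketch of that TimeZone workaround, assuming a hypothetical `ZonedEvent` class (not from the original project): the object carries only the zone ID as a String, so no implementation-specific TimeZone subclass ever goes over the wire, and the receiver rebuilds the TimeZone locally.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.TimeZone;

// Hypothetical DTO: stores the zone as its ID string instead of a TimeZone object.
class ZonedEvent implements Serializable {
    private static final long serialVersionUID = 1L;

    final String title;
    final String zoneId;

    ZonedEvent(String title, TimeZone zone) {
        this.title = title;
        this.zoneId = zone.getID(); // TimeZone -> String before sending
    }

    // String -> TimeZone on the receiving side, using the local platform's classes.
    TimeZone zone() {
        return TimeZone.getTimeZone(zoneId);
    }

    // Serializes and deserializes in memory to mimic a trip over the network.
    static ZonedEvent roundTrip(ZonedEvent event) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(event);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (ZonedEvent) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ZonedEvent copy = roundTrip(new ZonedEvent("sync", TimeZone.getTimeZone("UTC")));
        System.out.println(copy.title + " in " + copy.zone().getID());
    }
}
```

Note that `TimeZone.getTimeZone` falls back to GMT for an ID the receiving platform does not know, so both ends need compatible time zone data.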
Other than that I have been able to transfer objects from java.util collections and maps and from java.sql data types and of course the java.lang types String, Integer, etc..
You can try protobuf for serialization. It is said to be more efficient, and you won't be concerned about compatibility.
You can also use some form of XML serialization (JAXB, XStream, XMLEncoder, etc)
The resolution of this question hints that it is compatible.
If your object graph is pretty simple and if you are comfortable with JSON at all, Android has JSON support included and it would be easy to get support in Java SE. I tend to think of JSON as a good alternative for when XML or Java serialization seems too "heavy".
Have a look at this benchmark. Kryo is the one I'm using. It supports creation of the custom binary serialization, which can be done in a way suitable for both Dalvik and JSE.
You may want to look at a related question, which provides additional discussion and links.
Protocol buffers would be a good format over the wire to consider.
I can't speak to the serialization of Dalvik.
i have set up a basic client and a basic server using java sockets. it can successfully send strings between them.
now i want to design some basic messages.
could you give me any recommendations on how to lay them out?
should i use java's serialization to send classes?
or should i just encode the information i need in a custom string and decode on the other side?
what about recognizing the type of messages? is there some convention for this? like the first 4 characters of each message are an identifier for the message?
thanks!
I would recommend you not to reinvent the wheel. If java serialization suits you, just use it.
Also take into account that there are some nice serialization frameworks around:
thrift, from facebook, and protocol buffers from Google.
Thrift also is a RPC mechanism, so you could also use it instead of opening / reading raw sockets, but this, of course, depends on your problem domain.
Edit: To answer your question about message formatting: if you want to implement your own protocol and you have more than one type of message, then yes, you should definitely implement a header. But I warn you that implementing a protocol is hard and very error-prone. Instead, just create an object containing the different inner objects and methods you need, add a version field if you want one, and make it implement the java.io.Serializable interface.
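A rough sketch of such a wrapper object, under assumptions: the `Envelope` class, its `type` string, and the `CURRENT_VERSION` constant are hypothetical names chosen for illustration. The byte arrays stand in for the socket streams, so with a real connection you would wrap `socket.getOutputStream()` and `socket.getInputStream()` instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical envelope: one Serializable object carrying a version, a type tag and a body.
class Envelope implements Serializable {
    private static final long serialVersionUID = 1L;
    static final int CURRENT_VERSION = 1;

    final int version;
    final String type;        // lets the receiver decide how to handle the body
    final Serializable body;

    Envelope(String type, Serializable body) {
        this.version = CURRENT_VERSION;
        this.type = type;
        this.body = body;
    }

    static byte[] toBytes(Envelope envelope) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(envelope);
            }
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Rejects envelopes written by a peer that uses a different protocol version.
    static Envelope fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            Envelope envelope = (Envelope) in.readObject();
            if (envelope.version != CURRENT_VERSION) {
                throw new IllegalStateException("Unsupported version " + envelope.version);
            }
            return envelope;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Envelope received = fromBytes(toBytes(new Envelope("chat", "hello")));
        System.out.println(received.type + " -> " + received.body);
    }
}
```

The version field is what makes it possible to detect a mismatched peer explicitly rather than failing somewhere inside deserialization.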
Maybe JMS would help you, it's hard to say without knowing the details. But JMS is standard, well thought out and versatile, and there are an impressive number of implementations available, open source and commercial. We use Sun's OpenMQ implementation and we're quite happy with it. It's fast enough for our needs, very mature and reliable.
Mind you, JMS is not a lightweight affair by any standard so it may very well be overkill for your needs.
If you're going to deploy this in a production environment, I'd advise you to look at either RMI or XML web services. (Google's Protocol Buffers are interesting too, but do not include a standard protocol for message transport, although 3rd party implementations exist.)
If you're doing this for the pleasure of learning, there are tons of ways to go about this. In general, a message in a generic messaging system will have some kind of "envelope format" which contains not only the message body, but also metadata about the message. A bare minimum for the header is something that identifies the intended receiver - either an integer identifier, a string representing a method name or a file, or something like it.
A simple example is HTTP, a plain-text format where the envelope is made up of all the lines until the first blank line. The first line identifies the protocol version and the intended receiver (≈the file requested), the following lines are metadata about the request, and the message body follows the first blank line.
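To make the envelope idea concrete, here is a small sketch of splitting an HTTP-style message into its first line, metadata lines, and body. The `TextEnvelope` class is hypothetical and deliberately simplified: it assumes CRLF line endings and ignores real-world details such as case-insensitive or repeated header names, so it is not a usable HTTP parser.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, simplified parser for an HTTP-style plain-text envelope.
class TextEnvelope {
    final String startLine;                        // protocol version and intended receiver
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    private TextEnvelope(String startLine, String body) {
        this.startLine = startLine;
        this.body = body;
    }

    static TextEnvelope parse(String raw) {
        // The first blank line separates the envelope from the message body.
        int split = raw.indexOf("\r\n\r\n");
        String head = split < 0 ? raw : raw.substring(0, split);
        String body = split < 0 ? "" : raw.substring(split + 4);

        String[] lines = head.split("\r\n");
        TextEnvelope envelope = new TextEnvelope(lines[0], body);
        // Every remaining line is metadata in "Name: value" form.
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                envelope.headers.put(lines[i].substring(0, colon).trim(),
                                     lines[i].substring(colon + 1).trim());
            }
        }
        return envelope;
    }

    public static void main(String[] args) {
        String raw = "POST /echo HTTP/1.1\r\n"
                   + "Content-Type: text/plain\r\n"
                   + "Content-Length: 5\r\n"
                   + "\r\n"
                   + "hello";
        TextEnvelope envelope = parse(raw);
        System.out.println(envelope.startLine);
        System.out.println(envelope.headers);
        System.out.println(envelope.body);
    }
}
```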
In general, XML is a common format for distributed services (mostly because of its good schema capabilities and cross-platform support), although some schemes use other formats for simplicity and/or performance. RMI uses standard Java object serialization, for example.
What you choose to use is ultimately based on your needs. If you want to make it easy to interact with your system from a large number of platforms, use XML web services (or REST). For communication between distributed Java subsystems, use RMI. If your system is extremely transaction intensive, maybe a custom binary format is best for faster processing and smaller messages - but before doing this "optimization", remember that it requires a lot more work to get it working properly and that most apps won't benefit a lot from it.
We're looking into transport/protocol solutions and were about to do various performance tests, so I thought I'd check with the community if they've already done this:
Has anyone done server performance tests for simple echo services as well as serialization/deserialization for various messages sizes comparing EJB3, Thrift, and Protocol Buffers on Linux?
Primarily languages will be Java, C/C++, Python, and PHP.
Update: I'm still very interested in this, if anyone has done any further benchmarks please let me know. Also, very interesting benchmark showing compressed JSON performing similar / better than Thrift / Protocol Buffers, so I'm throwing JSON into this question as well.
Latest comparison available here at the thrift-protobuf-compare project wiki. It includes many other serialization libraries.
I'm in the process of writing some code in an open source project named thrift-protobuf-compare, comparing protobuf and thrift. For now it covers a few serialization aspects, but I intend to cover more. The results (for Thrift and Protobuf) are discussed in my blog; I'll add more when I get to it.
You may look at the code to compare API, description language and generated code. I'll be happy to have contributions to achieve a more rounded comparison.
You may be interested in this question: "Biggest differences of Thrift vs Protocol Buffers?"
I did test performance of PB against a number of other data formats (xml, json, default object serialization, hessian, one proprietary one) and libraries (jaxb, fast infoset, hand-written) for a data binding task (both reading and writing), but thrift's format(s) was not included. Performance for formats with multiple converters (like xml) had very high variance, from very slow to pretty-darn-fast. Correlation between claims of authors and perceived performance was rather weak. Especially so for packages that made the wildest claims.
For what it is worth, I found PB performance to be a bit overhyped (usually not by its authors, but by others who only know who wrote it). With default settings it did not beat the fastest textual xml alternative. With optimized mode (why is this not default?), it was a bit faster, comparable with the fastest JSON package. Hessian was rather fast, textual json also. The proprietary binary format (no name here, it was company internal) was the slowest. Java object serialization was fast for larger messages, less so for small objects (i.e. high fixed per-operation overhead).
With PB the message size was compact, but given all the trade-offs you have to make (data is not self-descriptive: if you lose the schema, you lose data; there are indexes of course, and value types, but from those you have to reverse-engineer back to field names if you want them), I personally would only choose it for specific use cases -- size-sensitive, closely coupled systems where the interface/format never (or very very rarely) changes.
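The fixed per-operation overhead of Java object serialization is easy to observe with a small sketch like the one below. It only measures stream size, not speed, and the `SerializationOverhead` class is a hypothetical helper; it makes no claim about the numbers measured in the tests described above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical helper: compares serialized size for a tiny and a larger object.
class SerializationOverhead {
    // Returns how many bytes a fresh ObjectOutputStream needs for one object.
    static int serializedSize(Serializable value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(value);
            }
            return buffer.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // An Integer holds 4 bytes of data, but the stream also carries a header
        // and a class description, so the total is many times larger.
        System.out.println("Integer:   " + serializedSize(Integer.valueOf(42)) + " bytes");
        // For 1000 ints (4000 bytes of data) the same kind of header is negligible.
        System.out.println("int[1000]: " + serializedSize(new int[1000]) + " bytes");
    }
}
```

Each new stream repeats its header and class descriptions, which is one reason small messages fare relatively worse than large ones.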
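The point about data not being self-descriptive can be seen in how a single integer field is laid out on the wire. The sketch below hand-encodes one varint field following the documented Protocol Buffers encoding (a tag holding the field number and wire type, followed by a base-128 varint value). The `VarintField` class is a hypothetical illustration and does not use the protobuf library.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical hand-rolled encoder for one Protocol Buffers varint field.
class VarintField {
    // Base-128 varint: 7 bits per byte, lowest group first,
    // high bit set on every byte except the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // The tag is (field number << 3) | wire type; wire type 0 means varint.
    static byte[] encodeVarintField(int fieldNumber, long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarint(out, ((long) fieldNumber << 3) | 0);
        writeVarint(out, value);
        return out.toByteArray();
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Field number 1 with value 150 prints "08 96 01".
        System.out.println(hex(encodeVarintField(1, 150)));
    }
}
```

Those three bytes say "field 1, varint, 150" and nothing else: the field's name and its meaning exist only in the .proto schema.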
My opinion in this is that (a) implementation often matters more than specification (of data format), (b) end-to-end, differences between best-of-breed (for different formats) are usually not big enough to dictate the choice.
That is, you may be better off choosing format+API/lib/framework you like using most (or has best tool support), find best implementation, and see if that works fast enough.
If (and only if!) not, consider next best alternative.
ps. Not sure what EJB3 here would be. Maybe just plain Java serialization?
If the raw net performance is the target, then nothing beats IIOP (see RMI/IIOP).
Smallest possible footprint -- only binary data, no markup at all. Serialization/deserialization is very fast too.
Since it's IIOP (that is CORBA), almost all languages have bindings.
But I presume the performance is not the only requirement, right?
One of the things near the top of my "to-do" list for PBs is to port Google's internal Protocol Buffer performance benchmark - it's mostly a case of taking confidential message formats and turning them into entirely bland ones, and then doing the same for the data.
When that's been done, I'd imagine you could build the same messages in Thrift and then compare the performance.
In other words, I don't have the data for you yet - but hopefully in the next couple of weeks...
To back up Vladimir's point about IIOP, here's an interesting performance test, that should give some additional info over the google benchmarks, since it compares Thrift and CORBA. (Performance_TIDorb_vs_Thrift_morfeo.pdf // link no longer valid)
To quote from the study:
Thrift is very efficient with small data (basic types as operation arguments)
Thrift's transports are not so efficient as CORBA with medium and large data (struct and complex types > 1 kilobyte).
Another odd limitation, not having to do with performance, is that Thrift is limited to returning only several values as a struct - although this, like performance, can surely be improved perhaps.
It is interesting that the Thrift IDL closely matches the CORBA IDL, nice. I haven't used Thrift, but it looks interesting, especially for smaller messages, and one of the design goals was a less cumbersome install, so these are other advantages of Thrift. That said, although CORBA has a bad rap, there are many excellent implementations out there, like omniORB for example, which has bindings for Python, that are easy to install and use.
Edited: The Thrift and CORBA link is no longer valid, but I did find another useful paper from CERN. They evaluated replacements for their CORBA system, and, while they evaluated Thrift, they eventually went with ZeroMQ. While Thrift performed the fastest in their performance tests, at 9000 msg/sec vs. 8000 (ZeroMQ) and 7000+ RDA (CORBA-based), they chose not to test Thrift further because of other issues notably:
It is still an immature product with a buggy implementation
I have done a study for spring-boot, mappers (manual, Dozer and MapStruct), Thrift, REST, SOAP and Protocol Buffers integration for my job.
The server side: https://github.com/vlachenal/webservices-bench
The client side: https://github.com/vlachenal/webservices-bench-client
It is not finished and has been run on my personal computers (I have to ask for servers to complete the tests) ... but results can be consulted on:
Laptop: https://github.com/vlachenal/webservices-bench/blob/master/results.md
Desktop: https://github.com/vlachenal/webservices-bench/blob/master/results-desktop.md
In conclusion:
Thrift offers the best performance and is easy to use
RESTful webservice with JSON content type is pretty close to Thrift performance, is "browser ready to use" and is quite elegant (from my point of view)
SOAP has very poor performance but offers the best data control
Protocol Buffers has good performance ... until 3 simultaneous calls ... and I don't know why. It is very difficult to use: I gave up (for now) trying to make it work with MapStruct, and I did not try with Dozer.
Projects can be completed through pull requests (either for fixes or other results).