I'm working on a Clojure app where a client needs to send some commands to a server. These will happen in quite large volumes, so I'd like it to be reasonably efficient, both in terms of processing and over-the-wire serialised size.
What would be the best way of doing this in Clojure?
Currently I'm thinking of:
Creating a simple standard representation e.g. {:command-id 1, :params [1 2 3 "abc"]}
Serialising using some efficient Java library such as Kryo, configured to understand the Clojure data types (roughly as in the sketch after this list)
Hacking together an appropriate Client/Server socket implementation using the Java NIO libraries for the transmission over TCP/IP
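For context, I imagine the Kryo part looking roughly like this plain-Java sketch (the Command class is just an illustration; in reality I'd still need custom serializers registered for the actual Clojure data types):

    import com.esotericsoftware.kryo.Kryo;
    import com.esotericsoftware.kryo.io.Input;
    import com.esotericsoftware.kryo.io.Output;

    import java.io.ByteArrayOutputStream;

    public class KryoSketch {

        // Hypothetical stand-in for {:command-id 1, :params [1 2 3 "abc"]}
        public static class Command {
            public int commandId;
            public Object[] params;
        }

        public static void main(String[] args) {
            Kryo kryo = new Kryo();
            // Fine for a sketch; registering the classes explicitly gives a smaller, faster encoding
            kryo.setRegistrationRequired(false);

            Command cmd = new Command();
            cmd.commandId = 1;
            cmd.params = new Object[]{1, 2, 3, "abc"};

            // Serialize to a byte array (this is what would go over the socket)
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            Output out = new Output(buffer);
            kryo.writeObject(out, cmd);
            out.close();

            // Deserialize on the receiving side
            Input in = new Input(buffer.toByteArray());
            Command back = kryo.readObject(in, Command.class);
            System.out.println(back.commandId); // 1
        }
    }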
However this seems a little convoluted and I'm sure other people have come up with smarter approaches. Any ideas / advice much appreciated!
If the parameters aren't too big and the source is trusted, why not send s-expressions back and forth?
    (eval (read-string "(println \"Hello World\")"))
Clojure being a Lisp dialect, code is data.
EDIT:
For safety, after reading the string, check the command against a set of valid commands:
    (contains? #{'println}
               (first (read-string "(println \"Hello World\")")))
or you can use a library designed for this such as
http://github.com/Licenser/clj-sandbox
How about Google's protocol buffers? There's a library for dealing with them from Clojure: clojure-protobuf. I remember someone on Freenode #clojure is doing a Haskell vs. OCaml vs. Clojure comparison on a serious task (processing loads of Twitter data); s/he's been lavishing praise on the lib.
Update: Here's the relevant utterance from the #clojure conversation I had in mind.
My answer is not Clojure-specific, but I tend to prefer strings over HTTP - it's reasonably standard and reasonably efficient.
There are libraries for JSON in pretty much every language, I'd go with that (along with your simple standard command format) unless the data volume is massive.
My experience is that the less you need to fiddle with specialized formats, sockets and protocols the more likely it is that you can spend the weekend on the beach :).
I'd reserve anything more complicated than JSON over HTTP until after benchmarking shows a need for something else.
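To illustrate how little machinery this needs, here is a rough sketch in plain Java (Java 11+ HttpClient plus Jackson; the URL and the server side are invented for the example):

    import com.fasterxml.jackson.databind.ObjectMapper;

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.List;
    import java.util.Map;

    public class JsonCommandClient {
        public static void main(String[] args) throws Exception {
            // The same simple command format, rendered as a JSON document
            Map<String, Object> command = Map.of(
                    "command-id", 1,
                    "params", List.of(1, 2, 3, "abc"));
            String json = new ObjectMapper().writeValueAsString(command);

            // POST it over plain HTTP; any web framework can pick it up on the server
            HttpRequest request = HttpRequest.newBuilder(URI.create("http://localhost:8080/commands"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(json))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
        }
    }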
I have a client-server system implemented in C#; the client and server exchange .Net objects via serialization/deserialization and communicate via TCP/IP. This runs on a local network; it is not web-based or Internet-based.
Now I want to include Android clients connected by wifi. Again, this is local network only, not via the Internet and not web-based. The Android programming will be in Java. (I am aware of Mono for Android, but prefer not to get into that now.)
Is there some fairly simple way to implement object to object interchange between Java and .Net objects, provided, of course, that they are compatible?
I've looked a bit at JSON (Jackson on the Java end and Json.Net on the .Net end), and I'm guessing it can probably be done, but only with a major effort spent remapping things at each end as soon as the objects become fairly complicated.
Any other suggestions? JSON-based or otherwise?
PS. My question is somewhat related to this one Mapping tool for converting Java's JSON to/from C#, but it never got a suitable answer, perhaps due to insufficient info in the question. Also, I don't care whether I end up using a JSON-based transport or XML or something else.
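For reference, the Java end with Jackson is only a few lines; my worry is keeping a class like the one below structurally in sync with its .Net counterpart (the Telemetry class is just an illustration):

    import com.fasterxml.jackson.databind.ObjectMapper;

    import java.util.List;

    public class JacksonRoundTrip {

        // Hypothetical DTO; the .Net counterpart would need matching property names
        public static class Telemetry {
            public String sensorId;
            public long timestamp;
            public List<Double> readings;
        }

        public static void main(String[] args) throws Exception {
            ObjectMapper mapper = new ObjectMapper();

            Telemetry t = new Telemetry();
            t.sensorId = "radar-1";
            t.timestamp = System.currentTimeMillis();
            t.readings = List.of(1.0, 2.5, 3.75);

            String json = mapper.writeValueAsString(t);               // Java -> JSON
            Telemetry back = mapper.readValue(json, Telemetry.class); // JSON -> Java
            System.out.println(json + " -> " + back.sensorId);
        }
    }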
I would suggest either JSON or XML (backed by an .xsd schema file), because these are independent of their respective implementations (unlike something like an ObjectOutputStream in Java).
The problem with having this format between the two components (client and server) is that they need to be at the same version. My best practice is to have one underlying definition of the format (I use XML with an .xsd file which specifies what the XML has to look like), then use JAXB to generate Java classes. That way you can (un)marshal from/to XML on the Java side.
I am very sure a similar thing exists in the world of .NET.
JSON is smaller than XML in size, but I find XML to be more readable.
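A minimal JAXB round trip on the Java side looks roughly like this (a sketch; the Order class stands in for a class generated from the .xsd, or one annotated by hand):

    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.annotation.XmlRootElement;

    import java.io.StringReader;
    import java.io.StringWriter;

    public class JaxbSketch {

        @XmlRootElement
        public static class Order {   // stand-in for a class generated from the .xsd
            public int id;
            public String customer;
        }

        public static void main(String[] args) throws Exception {
            JAXBContext ctx = JAXBContext.newInstance(Order.class);

            Order order = new Order();
            order.id = 42;
            order.customer = "ACME";

            // Marshal: Java object -> XML text
            StringWriter xml = new StringWriter();
            ctx.createMarshaller().marshal(order, xml);

            // Unmarshal: XML text -> Java object
            Order back = (Order) ctx.createUnmarshaller()
                    .unmarshal(new StringReader(xml.toString()));
            System.out.println(back.customer);
        }
    }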
SO user "default locale" should get the honor for this, but he/she has only answered via a comment. So just to make it very clear what my choice was I'll answer my own question.
I've decided to go with Google Protocol Buffers, which in my opinion has much better support for moving objects back and forth between Java and .Net than JSON. Because I have a lot of experience with C#, and a lot of existing C#-defined classes, I've selected Marc Gravell's protobuf-net program for the .Net end, and Google's own support for the Android end (no - see edit). This implies that I'm defining the objects in C#, not in .proto files - protobuf-net generates the .proto files from which I then generate the Java code.
Incidentally, as the transport mechanism I'm using a little-known program called naga on the Android end. http://code.google.com/p/naga/ Naga seems to work fine, is well-documented, has sample programs, and should be better known in my opinion.
EDIT:
OK, I've got it working now to my satisfaction. Here's what I'm using:
Google Protocol buffers as the interchange format: https://developers.google.com/protocol-buffers/
Marc Gravell's protobuf-net at the C# end: http://code.google.com/p/protobuf-net/
A program called protostuff at the Java end: http://code.google.com/p/protostuff/
(I prefer protostuff to the official Google Java implementation of protocol buffers because Google's implementation relies on the Java objects being immutable.)
Actually, I'm not using pure protocol buffers as the interchange format - I prefix the data with the name of the (outermost) class being transmitted. This makes the data self-identifying for deserializing at the other end.
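The framing itself is nothing exotic; on the Java side it boils down to something like this sketch (the payload bytes come from whatever protobuf runtime you use, and the names are only illustrative):

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    public class ClassNameFraming {

        // Write: class name first, then the length-prefixed protobuf payload
        public static void write(DataOutputStream out, String className, byte[] payload)
                throws IOException {
            out.writeUTF(className);      // self-identifying prefix
            out.writeInt(payload.length); // payload length
            out.write(payload);
            out.flush();
        }

        // Read: the prefix tells the receiver which generated class to deserialize into
        public static Framed read(DataInputStream in) throws IOException {
            String className = in.readUTF();
            byte[] payload = new byte[in.readInt()];
            in.readFully(payload);
            return new Framed(className, payload);
        }

        public static class Framed {
            public final String className;
            public final byte[] payload;

            public Framed(String className, byte[] payload) {
                this.className = className;
                this.payload = payload;
            }
        }
    }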
You can also try wox (https://github.com/codelion/wox), it is a cross platform serialization library for Java and C# based on XML.
My requirement is to do IPC between a C client and a Java server on Windows using JSON strings.
Just realized that I can't use a named pipe ("\\.\pipe\filename") on Windows from Java. I'm not too keen on using any network-based architecture, because it's going to get more complicated to ensure security and speed.
Kindly suggest any shared memory/fast solution you happen to know?
Thanks in Advance :)
You can use named pipes on Windows; the answers to this question and this question give different solutions to that.
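To give the flavour: once the C side has created the pipe, the Java side can often just open it through the file namespace (a sketch with an invented pipe name, and no retry or error handling):

    import java.io.RandomAccessFile;
    import java.nio.charset.StandardCharsets;

    public class NamedPipeClient {
        public static void main(String[] args) throws Exception {
            // Windows named pipes are exposed through the file namespace
            RandomAccessFile pipe = new RandomAccessFile("\\\\.\\pipe\\myjsonpipe", "rw");

            // Write a JSON request, newline-terminated so the C side knows where it ends
            pipe.write("{\"cmd\":\"ping\"}\n".getBytes(StandardCharsets.UTF_8));

            // Read the reply line
            String reply = pipe.readLine();
            System.out.println(reply);

            pipe.close();
        }
    }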
In addition to those, by using LLVM (Clang, in your case) followed by the LLJVM Translator, you can compile code from many programming languages to LLVM bitcode and then translate that to JVM bytecode, at which point your existing Java code can call functions from your (compiled and translated) code.
Last but certainly not least, in order to avoid doing work that you might not need to do, you should focus on solving your problems using clear, maintainable code, and leave optimisation until you're certain that it needs to be done. At that point, your profiler becomes your friend for measuring the bottleneck and verifying the optimisations you perform.
You can use UDP or TCP for IPC.
It is also quite a portable solution if you later move your programs to another OS.
With TCP, it is quite easy to scale the system, i.e. to run the programs on different hosts.
Because of the unreliable nature of UDP, it may be a little harder to use over an unreliable network.
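On the Java side, a loopback TCP exchange of JSON lines is only a handful of code (a sketch; the port and messages are arbitrary, and a real server would loop and handle errors):

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.InetAddress;
    import java.net.ServerSocket;
    import java.net.Socket;

    public class TcpIpcServer {
        public static void main(String[] args) throws Exception {
            // Bind to the loopback address only, so nothing outside the machine can connect
            try (ServerSocket server = new ServerSocket(9000, 50, InetAddress.getLoopbackAddress());
                 Socket client = server.accept();
                 BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()));
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {

                String json = in.readLine();        // one JSON message per line
                System.out.println("received: " + json);
                out.println("{\"status\":\"ok\"}"); // reply, also as a JSON line
            }
        }
    }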
Recently I got introduced to node.js and cool packages like express and jade. I have a few questions that keep knocking at my door:
If I pick node.js to build my next website, I will be using JavaScript to write my complicated server-side logic. But I don't think you can compare JavaScript with Java or Python for writing server-side code, as they have such a vast ocean of libraries. Is node.js really meant for that, or have I missed something?
Can I call Java or Python from node.js?
Not quite sure what most of these folks are talking about.
A "vast ocean of libraries" is something the community is actively working on. Check this: http://search.npmjs.org/#/_analytics -- there were 8 packages published yesterday
It's not going to solve your software design for you. As for where and how to write business logic, many of us embrace MVC or MVVM or something close to it. If you're building an application and like how Rubyists (for example) structure their code, you might look at doing something just like that -- ain't nobody going to tell you how to structure your code.
Check https://github.com/joyent/node/wiki/modules
Some of the more popular libraries for doing the day to day:
Express: http://expressjs.com/ - https://github.com/visionmedia/express
Sinatra inspired, use it to build a typical web app
Stats: 3407 watchers, 286 forks, on pull request 778
Compare that to Sinatra itself! 2529 watchers, 366 forks
With connect, it supports all kinds of middleware:
sessions,
all kinds of routing,
static files
some 15 different templating engines
validation, form handling, etc, etc
Socket.io: http://socket.io/ - make it 'real-time'
DNode: https://github.com/substack/dnode - do rpc between anything
Backbone.js: http://documentcloud.github.com/backbone/ - MVC
Variety of techniques for re-using your models on the server:
http://andyet.net/blog/2011/feb/15/re-using-backbonejs-models-on-the-server-with-node/
Spine.js: http://maccman.github.com/spine.tutorials/index.html - MVC
Techniques for re-using code on the server:
http://maccman.github.com/spine.tutorials/node.html
caolan/async: https://github.com/caolan/async - Help manage your async business logic
Database, pick your poison
node_redis, https://github.com/mranney/node_redis - or one of the eight other clients
"This is a complete Redis client for node.js. It supports all Redis commands"
node-mysql, https://github.com/felixge/node-mysql - or one of eleven other clients/orms
node-mongodb-native, https://github.com/christkv/node-mongodb-native
node-postgres, https://github.com/brianc/node-postgres
There's also a host of ORMs out there, if that's your bag. Things like http://mongoosejs.com/, http://sequelizejs.com/ and friends
Test-driven development is at the core of node. There are 15 different TDD packages to choose from that range from full code coverage analysis to custom assert modules.
Saying all modules are incomplete is silly. There is an incredibly dedicated group of people building and maintaining tons of working open source in this community every day.
There might be reasons to pass over node, but it's not for an inactive community or a lack of libraries.
I would say you missed something - more specifically, the core purpose of Node.js, that is, the asynchronous I/O model.
I started a little pet project to test Node.js - how it "feels" and how to program on it. I became impressed by the ease of working in such ecosystem: Node.js code is easy to write (although its asynchronous paradigm is not that straightforward for the conventional programmer), libraries are easy to build etc. etc. Even npm is amazingly easy: I just found the most straightforward way to provide code of your own as a library is to make a public package of it - and it is absurdly easy!
However, there are not many good tools for working with Node.js. Maybe because it is too easy to do anything, most libraries are partially implemented, undocumented solutions.
Also, note that the relevant difference of Node.js is not the JavaScript language, but the asynchronous I/O model. It is the most interesting aspect of Node.js, but the asynchronous programming style is not as well tested as the conventional way of web development. Maybe it is really the marvel that is propagandized - or perhaps, it is not as good as promised.
Even if it pays off, will you have enough developers to maintain such an (at least still) unusual codebase? If you can get a lot of advantages from the asynchronous "way of life" of Node.js, you can use more consolidated languages and frameworks, such as Twisted for Python (which is my preferred language, so take care with my opinion :) ). There may be something like this for Java, too. Anyway, I suspect that you do not have a lot of interest in this model for now, since your question focuses more on languages than on the programming paradigm, so Node.js does not have much to offer you anyway.
So... no, I would not develop something professionally in Node.js for now, although I think it is both fun and instructive to study. You can do it, however - just do not do it without keeping in mind the main purpose of Node.js: asynchronous I/O, event-driven programming. If that is what you want, Node.js is a good alternative.
Ryan did not start with JavaScript. A large part of why Node was created in JavaScript is that JavaScript lacked vast oceans of libraries.
Those vast oceans of libraries are almost all written in blocking code.
To take full advantage of Node.js you need to limit yourself to non-blocking libraries, which means you might need to write some libraries yourself to complete your project in Node.js.
I think you'll be surprised by the amount of work you can get done in JavaScript via Node.js. There are a bunch of libraries available for Node and more are being written all the time. Furthermore, native extensions are also available for those times where you might need to drop down to a lower-level.
If you think there's a gap where Node won't be able to provide for your business logic, take a look around NPM or give Google a quick search to see if anyone else has already solved your problem.
Of course, you can use Python, PHP, C++ or other technologies with Node.js, because Node can run them as child processes. Node.js gives you the freedom to use any technology you want inside it; you can combine whatever programs perform best.
There are some things that JavaScript just can't do. If you come up against those Node might not be the best choice for your app. However you can probably accomplish most of what you need.
As far as the API being limited, I suggest you take a look at npm and all the libraries in its repository. Specifically ones like underscore.js. Many aim to fill in the gaps of what native JavaScript lacks compared to other languages.
I am doing a Software Engineering course in which different teams are building different prototype subsystems of a big system (different subsystem of F35 Lightning aircraft!).
The problem is that teams can use different programming languages (like C++ and Java) depending upon what they are most comfortable with. However, these subsystems need to communicate with each other (for example, the radar needs to provide object coordinates to navigation and control). Hence we need to come up with a solution in which different modules can interact in real time.
Someone suggested XML-RPC and hence I was reading about it. After reading about it, I think it is used in client-server architectures. Is this a good way of doing this kind of interprocess communication? What are my options?
Any help would be appreciated.
regards,
Newbie
There are a couple of options beside XML-RPC. For a short bullet-point comparison, take a look at:
http://michaeldehaan.net/2008/07/17/xmlrpc-vs-rest-vs-soap-vs-all-your-rpc-options/
If your exchange is more data-oriented, Protocol Buffers might be an alternative.
Protocol Buffers are a way of encoding structured data in an efficient yet extensible format.
Personally, I would go for a lightweight exchange format or method first, since the components are considered prototypes. Something like REST or some custom message passing might be simple enough, yet sufficient.
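Since the question mentions XML-RPC specifically: for reference, a call from Java with the Apache XML-RPC client is about this much (a sketch; the endpoint URL and method name are invented, and the radar subsystem would have to expose a matching handler):

    import org.apache.xmlrpc.client.XmlRpcClient;
    import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

    import java.net.URL;

    public class RadarRpcClient {
        public static void main(String[] args) throws Exception {
            XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
            config.setServerURL(new URL("http://localhost:8080/xmlrpc"));

            XmlRpcClient client = new XmlRpcClient();
            client.setConfig(config);

            // Ask the (hypothetical) radar subsystem for the coordinates of track 42
            Object result = client.execute("radar.getCoordinates", new Object[]{42});
            System.out.println(result);
        }
    }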
If you are already familiar with XML, it can be a reasonable answer. An advantage of XML is that you don't have to worry about how different machines represent numbers. A disadvantage is the time it takes to keep converting numbers to text and back to numbers.
We're looking into transport/protocol solutions and were about to do various performance tests, so I thought I'd check with the community if they've already done this:
Has anyone done server performance tests for simple echo services as well as serialization/deserialization for various messages sizes comparing EJB3, Thrift, and Protocol Buffers on Linux?
Primarily languages will be Java, C/C++, Python, and PHP.
Update: I'm still very interested in this; if anyone has done any further benchmarks, please let me know. Also, there is a very interesting benchmark showing compressed JSON performing similarly to or better than Thrift / Protocol Buffers, so I'm throwing JSON into this question as well.
Latest comparison available here at the thrift-protobuf-compare project wiki. It includes many other serialization libraries.
I'm in the process of writing some code in an open source project named thrift-protobuf-compare, comparing protobuf and thrift. For now it covers a few serialization aspects, but I intend to cover more. The results (for Thrift and Protobuf) are discussed on my blog; I'll add more when I get to it.
You may look at the code to compare API, description language and generated code. I'll be happy to have contributions to achieve a more rounded comparison.
You may be interested in this question: "Biggest differences of Thrift vs Protocol Buffers?"
I did test the performance of PB against a number of other data formats (XML, JSON, default object serialization, Hessian, one proprietary one) and libraries (JAXB, Fast Infoset, hand-written) for a data binding task (both reading and writing), but Thrift's format(s) were not included. Performance for formats with multiple converters (like XML) had very high variance, from very slow to pretty darn fast. Correlation between the authors' claims and perceived performance was rather weak, especially for the packages that made the wildest claims.
For what it is worth, I found PB performance to be a bit over-hyped (usually not by its authors, but by others who only know who wrote it). With default settings it did not beat the fastest textual XML alternative. With the optimized mode (why is this not the default?), it was a bit faster, comparable with the fastest JSON package. Hessian was rather fast, and so was textual JSON. The proprietary binary format (no name here, it was company-internal) was the slowest. Java object serialization was fast for larger messages, less so for small objects (i.e. high fixed per-operation overhead).
With PB the message size was compact, but given all the trade-offs you have to make (the data is not self-descriptive: if you lose the schema, you lose the data; there are indexes of course, and value types, but you would have to reverse-engineer your way back to field names if you want them), I personally would only choose it for specific use cases -- a size-sensitive, closely coupled system where the interface/format never (or very, very rarely) changes.
My opinion in this is that (a) implementation often matters more than specification (of data format), (b) end-to-end, differences between best-of-breed (for different formats) are usually not big enough to dictate the choice.
That is, you may be better off choosing the format + API/lib/framework you like using most (or that has the best tool support), finding the best implementation, and seeing if it works fast enough.
If (and only if!) not, consider the next best alternative.
ps. Not sure what EJB3 would be here. Maybe just plain old Java serialization?
If the raw net performance is the target, then nothing beats IIOP (see RMI/IIOP).
Smallest possible footprint -- only binary data, no markup at all. Serialization/deserialization is very fast too.
Since it's IIOP (that is CORBA), almost all languages have bindings.
But I presume the performance is not the only requirement, right?
One of the things near the top of my "to-do" list for PBs is to port Google's internal Protocol Buffer performance benchmark - it's mostly a case of taking confidential message formats and turning them into entirely bland ones, and then doing the same for the data.
When that's been done, I'd imagine you could build the same messages in Thrift and then compare the performance.
In other words, I don't have the data for you yet - but hopefully in the next couple of weeks...
To back up Vladimir's point about IIOP, here's an interesting performance test, that should give some additional info over the google benchmarks, since it compares Thrift and CORBA. (Performance_TIDorb_vs_Thrift_morfeo.pdf // link no longer valid)
To quote from the study:
Thrift is very efficient with small data (basic types as operation arguments).
Thrift's transports are not as efficient as CORBA with medium and large data (structs and complex types > 1 kilobyte).
Another odd limitation, not having to do with performance, is that Thrift can only return multiple values as a struct - although this, like performance, can surely be improved.
It is interesting that the Thrift IDL closely matches the CORBA IDL, nice. I haven't used Thrift; it looks interesting, especially for smaller messages, and one of the design goals was a less cumbersome install, so these are other advantages of Thrift. That said, CORBA has a bad rap, but there are many excellent implementations out there, like omniORB for example, which has Python bindings that are easy to install and use.
Edited: The Thrift and CORBA link is no longer valid, but I did find another useful paper from CERN. They evaluated replacements for their CORBA system, and, while they evaluated Thrift, they eventually went with ZeroMQ. While Thrift performed the fastest in their performance tests, at 9000 msg/sec vs. 8000 (ZeroMQ) and 7000+ (RDA, CORBA-based), they chose not to test Thrift further because of other issues, notably:
It is still an immature product with a buggy implementation
I have done a study of Spring Boot integration with mappers (manual, Dozer and MapStruct), Thrift, REST, SOAP and Protocol Buffers for my job.
The server side: https://github.com/vlachenal/webservices-bench
The client side: https://github.com/vlachenal/webservices-bench-client
It is not finished and has been run on my personal computers (I have to ask for servers to complete the tests) ... but results can be consulted on:
Laptop: https://github.com/vlachenal/webservices-bench/blob/master/results.md
Desktop: https://github.com/vlachenal/webservices-bench/blob/master/results-desktop.md
As a conclusion:
Thrift offers the best performance and is easy to use
RESTful webservice with JSON content type is pretty close to Thrift performance, is "browser ready to use" and is quite elegant (from my point of view)
SOAP has very poor performance but offers the best data control
Protocol Buffers has good performance ... until 3 simultaneous calls ... and I don't know why. It is very difficult to use: I gave up (for now) on making it work with MapStruct, and I did not try with Dozer.
Projects can be completed through pull requests (either for fixes or other results).