I may be wrong, but as far as I understand, the whole Reactive/Event Loop thing, and Netty in particular, was invented as an answer to the C10K+ problem. It has obvious drawbacks: all your code becomes asynchronous, with ugly callbacks and meaningless stack traces, and is therefore hard to maintain and to reason about.
Go, with goroutines, offered a solution: you can write synchronous code and still handle C10K+. Now Java is coming up with Loom, which essentially copies Go's solution; soon we will have fibers and continuations and will be able to write synchronous code again.
So the questions are:
When Loom is released for production use, doesn't it make Netty kind of obsolete?
If we have fibers and continuations in Java, can we write nice synchronous code and handle C10K+ without Netty?
Are there any advantages, for performance or for solving C10K+, in writing asynchronous code and using Netty after the production release of Loom?
I understand that Netty is more than just a reactive/event-loop framework; it also has codecs for various protocols, and those implementations will remain useful anyway, even afterwards.
I'm focusing on the reactive parts of Netty, because those seem to be what you mostly want addressed. Answering on a general level:
Currently, reactive programming paradigms are often used to solve performance problems, not because they fit the problem. Those cases should be covered completely by Project Loom.
However, some problems may remain where the reactive programming approach makes sense and is more straightforward to read than imperative code.
Reactive frameworks are typically stream-oriented and are well suited to combining elements and operations on different entity/data streams. They also provide straightforward local event-bus solutions with their publisher/subscriber model. In such cases the reactive model might still be the best choice: performant and more readable than an imperative approach. But indeed, Project Loom should make obsolete all the "misuse" that stems from the lack of better support in the native language constructs.
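As an illustration of what "synchronous code again" could look like once Loom lands, here is a minimal sketch using the virtual-thread API (names as in Java 21; the port and the echo handler are made up for the example):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BlockingStyleServer {
    public static void main(String[] args) throws Exception {
        // One cheap virtual thread per connection: plain blocking code,
        // readable stack traces, yet far more than 10k concurrent connections.
        try (ServerSocket server = new ServerSocket(8080);
             ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            while (true) {
                Socket socket = server.accept();      // blocks only this virtual thread
                executor.submit(() -> handle(socket));
            }
        }
    }

    static void handle(Socket socket) {
        try (socket;
             BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            String line = in.readLine();              // ordinary blocking read
            out.println("echo: " + line);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```

The blocking calls park only the virtual thread; the carrier thread is freed to run other virtual threads, which is what makes this style viable at C10K+ scale.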
I would like to build a distributed NoSQL database or key-value store using Go, to learn Go and practise the distributed-systems knowledge I've learnt at school. The target use case I can think of is running MapReduce on top of it, and implementing an HDFS-compatible "filesystem" to expose the data to Hadoop, similar to running Hadoop on Ceph or Amazon S3.
My question is: what difficulties should I expect when integrating such a NoSQL database with Hadoop? Or when integrating it with other languages (e.g., providing Ruby/Python/Node.js/C++ APIs), if I use Go to build the system?
Ok, I'm not much of a Hadoop user so I'll give you some more general lessons learned about the issues you'll face:
Protocol. If you're going with REST, Go will be fine, but expect to find some gotchas in the default HTTP library's defaults (not expiring idle keepalive connections, not necessarily knowing when a reader has closed a stream). But if you want something more compact, know that: (a) the Thrift implementation for Go, last I checked, was lacking and relatively slow; (b) Go has great support for RPC, but it might not play well with other languages. So you might want to check out protobuf, or work on top of the Redis protocol or something like that.
GC. Go's GC is very simplistic (stop-the-world, not generational, etc.). If you plan on heavy in-memory caching on the order of multiple gigabytes, expect GC pauses all over the place. There are techniques to reduce GC pressure, but the straightforward Go idioms aren't usually optimized for that.
mmap'ing in Go is not straightforward, so it will be a bit of a struggle if you want to leverage that.
Besides slices, lists and maps, you won't have a lot of built-in data structures to work with, such as a Set type. There are tons of good implementations of them out there, but you'll have to do some digging.
Take the time to learn the concurrency patterns and interface patterns in Go. It's a bit different from other languages, and as a rule of thumb, if you find yourself struggling with a pattern from another language, you're probably doing it wrong. A good talk about Go concurrency, IMHO, is this one: http://www.youtube.com/watch?v=QDDwwePbDtw
A few projects you might want to have a look at:
Groupcache - a distributed key/value cache written in Go by Brad Fitzpatrick for Google's own use. It's a great implementation of a simple yet super robust distributed system in Go. https://github.com/golang/groupcache and check out Brad's presentation about it: http://talks.golang.org/2013/oscon-dl.slide
InfluxDB, which includes a Go-based version of the great Raft consensus algorithm: https://github.com/influxdb/influxdb
My own humble (pretty dead) project, a redis compliant database that's based on a plugin architecture. My Go has improved since, but it has some nice parts, and it includes a pretty fast server for the redis protocol. https://bitbucket.org/dvirsky/boilerdb
As part of a study I am doing, I am exploring the supposed simplicity of using languages like Scala & Clojure to achieve concurrency on the JVM.
By simplicity, I am hoping to prove that these languages provide easier concurrency constructs than what Java 7 provides.
Therefore, I am hoping to find some good references that explain the complexities of Java's concurrency model.
Outside of pointing me in the direction of Google (which I have already searched with limited success), I would appreciate if those in-the-know could provide me with some good references to get me started off in this area.
Thanks
Java (as of Java 7) does not support lambda expressions. Creating an inline callback (e.g., for the completion of an asynchronous call) requires 5 lines of boilerplate for an anonymous type.
This strongly discourages people from using callbacks. This is probably why Java 7 still does not have an interface for a callback that takes a value (as opposed to Runnable and Callable), whereas C# has had one since 2005.
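To make the boilerplate point concrete, here is a hedged sketch; the Callback interface and the fetchUserNameAsync method are made up, precisely because the JDK of that era offered no such interface:

```java
// Hypothetical single-method callback interface that the JDK lacked at the time.
interface Callback<T> {
    void onCompleted(T value);
}

// Hypothetical asynchronous API accepting that callback.
class UserService {
    void fetchUserNameAsync(long id, Callback<String> callback) {
        // ... kick off the request somewhere, and eventually call back ...
        callback.onCompleted("user-" + id);
    }
}

class Caller {
    void example(UserService service) {
        // Pre-Java-8: five-ish lines of anonymous-class noise for one line of logic.
        service.fetchUserNameAsync(42L, new Callback<String>() {
            @Override
            public void onCompleted(String name) {
                System.out.println("Fetched: " + name);
            }
        });
    }
}
```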
Therefore, the JDK does not have any real support for asynchronous operations.
The key to an asynchronous operation is the ability to kick off a long-running request and have it run a callback when it finishes, without consuming a thread for the duration of the request. In Java, the standard API only lets you react to completion by having a separate thread block on get() on a Future<V>, which consumes a thread anyway. This limits the concurrency of an application using the standard API to the number of threads you can sanely support.
To solve this problem, Guava (Google's library for better Java code) introduces a ListenableFuture<V> interface, which does support completion callbacks.
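A hedged sketch of what that looks like with Guava (anonymous-class style to match the Java 7 context; the API shown is the one in recent Guava versions, and the submitted work is just a placeholder):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Executors;

import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import com.google.common.util.concurrent.MoreExecutors;

public class GuavaCallbackExample {
    public static void main(String[] args) {
        ListeningExecutorService executor =
                MoreExecutors.listeningDecorator(Executors.newFixedThreadPool(4));

        // Submitting returns a ListenableFuture instead of a plain Future.
        ListenableFuture<String> future = executor.submit(new Callable<String>() {
            @Override
            public String call() {
                return "some result";   // placeholder for long-running work
            }
        });

        // Register a completion callback; no thread has to block on get().
        Futures.addCallback(future, new FutureCallback<String>() {
            @Override
            public void onSuccess(String result) {
                System.out.println("Got: " + result);
            }

            @Override
            public void onFailure(Throwable t) {
                t.printStackTrace();
            }
        }, MoreExecutors.directExecutor());

        executor.shutdown();
    }
}
```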
Languages like Scala fix this problem by supporting lambda expressions (which compile to anonymous classes) and adding their own Promise / Future types.
While higher-level languages make it easier to use multiple cores, what is often forgotten is why you want to use multiple cores in the first place: to make the program faster, e.g. to increase its throughput.
When you consider options which increase concurrency, you need to test whether these options actually improve performance in some way (because very often they don't).
For example, STM (Software Transactional Memory) makes it easier to write multi-threaded applications without having to worry about concurrency issues. The problem is that, for trivial examples, it would be faster to skip STM and use only one thread.
Using multiple threads adds complexity and makes your application more fragile, so there has to be a good reason to do it; otherwise you should stick to the simplest solution possible.
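As a hedged illustration of "measure before you parallelise" (the workload is made up, and a real comparison would use a proper benchmark harness such as JMH rather than ad-hoc timing):

```java
import java.util.stream.LongStream;

public class MeasureFirst {
    public static void main(String[] args) {
        long n = 1_000_000;

        long t0 = System.nanoTime();
        long seq = LongStream.rangeClosed(1, n).sum();
        long t1 = System.nanoTime();

        long par = LongStream.rangeClosed(1, n).parallel().sum();
        long t2 = System.nanoTime();

        // For work this trivial, the parallel version often loses:
        // thread coordination costs more than the computation itself.
        System.out.printf("sequential: %d in %d us%n", seq, (t1 - t0) / 1_000);
        System.out.printf("parallel:   %d in %d us%n", par, (t2 - t1) / 1_000);
    }
}
```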
For more discussion
http://vanillajava.blogspot.co.uk/2011/11/why-concurency-examples-are-confusing.html
Recently I got introduced to Node.js and cool packages like Express and Jade. I have a few questions that keep knocking at my door:
If I pick Node.js to build my next website, I will be using JavaScript to write my complicated server-side logic, right? But I don't think you can compare JavaScript with Java or Python for writing server-side code, as they have such a vast ocean of libraries. Is Node.js really meant for that, or have I missed something?
Can I call Java or Python from node.js?
Not quite sure what most of these folks are talking about.
A "vast ocean of libraries" is something the community is actively working on. Check this: http://search.npmjs.org/#/_analytics -- there were 8 packages published yesterday
It's not going to solve your software design for you. As for where and how to write business logic, many of us embrace MVC or MVVM or something close to it. If you're building an application and like how Rubyists (for example) structure their code, you might look at doing something just like that -- ain't nobody going to tell you how to structure your code.
Check https://github.com/joyent/node/wiki/modules
Some of the more popular libraries for doing the day to day:
Express: http://expressjs.com/ - https://github.com/visionmedia/express
Sinatra inspired, use it to build a typical web app
Stats: 3407 watchers, 286 forks, on pull request 778
Compare that to Sinatra itself! 2529 watchers, 366 forks
With connect, it supports all kinds of middleware:
sessions,
all kinds of routing,
static files
some 15 different templating engines
validation, form handling, etc, etc
Socket.io: http://socket.io/ - make it 'real-time'
DNode: https://github.com/substack/dnode - do rpc between anything
Backbone.js: http://documentcloud.github.com/backbone/ - MVC
Variety of techniques for re-using your models on the server:
http://andyet.net/blog/2011/feb/15/re-using-backbonejs-models-on-the-server-with-node/
Spine.js: http://maccman.github.com/spine.tutorials/index.html - MCV
Techniques for re-using code on the server:
http://maccman.github.com/spine.tutorials/node.html
caolan/async: https://github.com/caolan/async - Help manage your async business logic
Database: pick your poison
node_redis, https://github.com/mranney/node_redis - or one of the eight other clients
"This is a complete Redis client for node.js. It supports all Redis commands"
node-mysql, https://github.com/felixge/node-mysql - or one of eleven other clients/orms
node-mongodb-native, https://github.com/christkv/node-mongodb-native
node-postgres, https://github.com/brianc/node-postgres
There's also a host of ORMs out there, if that's your bag. Things like http://mongoosejs.com/, http://sequelizejs.com/ and friends
Test-driven development is at the core of node. There are 15 different TDD packages to choose from that range from full code coverage analysis to custom assert modules.
Saying all modules are incomplete is silly. There is an incredibly dedicated group of people building and maintaining tons of working open-source software in this community every day.
There might be reasons to pass over Node, but it's not for an inactive community or a lack of libraries.
I would say you missed something - more specifically, the core purpose of Node.js, that is, the asynchronous I/O model.
I started a little pet project to test Node.js, to see how it "feels" and how to program in it. I was impressed by the ease of working in such an ecosystem: Node.js code is easy to write (although its asynchronous paradigm is not that straightforward for the conventional programmer), libraries are easy to build, etc. Even npm is amazingly easy: I found that the most straightforward way to provide your own code as a library is to make a public package of it - and it is absurdly easy!
However, there are not many good tools for working with Node.js. Maybe because it is so easy to do anything, most libraries are partially implemented, undocumented solutions.
Also, note that the relevant difference of Node.js is not the JavaScript language, but the asynchronous I/O model. It is the most interesting aspect of Node.js, but the asynchronous programming style is not as well tested as the conventional way of developing for the web. Maybe it really is the marvel it is advertised to be - or perhaps it is not as good as promised.
Even in the case that it pays off, will you have enough developers to maintain such an (at least still) unusual codebase? If you can get a lot of advantages from the asynchronous "way of life" of Node.js, you can also use more established languages and frameworks, such as Twisted for Python (which is my preferred language, so take my opinion with a grain of salt :) ). There may be something like this for Java, too. Anyway, I suspect that you do not have a lot of interest in this model for now, since your question focuses more on languages than on the programming paradigm, so Node.js does not have much to offer you anyway.
So... no, I would not develop something professionally in Node.js for now, although I think it is both fun and instructive to study. You can do it, however - just do not do it without keeping in mind the main purpose of Node.js: asynchronous-I/O, event-driven programming. If that is what you want, Node.js is a good alternative.
Ryan Dahl, Node's creator, did not start with JavaScript. A large part of why Node was created in JavaScript is that JavaScript lacked vast oceans of libraries.
Those vast oceans of libraries are almost all written in blocking code.
To take full advantage of Node.js you need to limit yourself to non-blocking libraries, which means you might need to write some libraries yourself to complete your project in Node.js.
I think you'll be surprised by the amount of work you can get done in JavaScript via Node.js. There are a bunch of libraries available for Node and more are being written all the time. Furthermore, native extensions are also available for those times when you might need to drop down to a lower level.
If you think there's a gap where Node won't be able to provide for your business logic, take a look around npm or give Google a quick search to see if anyone else has already solved your problem.
Of course, you can use Python, PHP, C++, or other technologies with Node.js, because Node can run them as child processes. Node.js gives you the freedom to use any technology you want inside it, combining whichever programs perform best.
There are some things that JavaScript just can't do. If you come up against those Node might not be the best choice for your app. However you can probably accomplish most of what you need.
As far as the API being limited, I suggest you take a look at npm and all the libraries in its repository. Specifically ones like underscore.js. Many aim to fill in the gaps of what native JavaScript lacks compared to other languages.
I remember reading that the following features led to the development of interesting frameworks/libraries in Python:
(I read the article from http://www.python.org/workshops/2002-02/papers/09/index.htm)
A simple class model, which facilitates inheritance.
Dynamic typing, which means that the code needs to assume less.
Built-in memory management.
Java is statically typed and compiled, and it has a garbage collector too. I wonder if its class model can be termed simple. However, keeping in mind the above-mentioned points, I have the following doubt:
Does Java have an analogue of Python's Twisted (one which is just as powerful)?
Netty is an event-driven networking framework written in Java, so it would most likely be Twisted's equivalent. The features are relatively similar to Twisted, and it seems powerful (I don't have any firsthand experience). It seems like it is still actively maintained. You'd have to look into it yourself to really get an idea of whether or not it meets your requirements.
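For a flavour of what Netty code looks like, here is a minimal echo-server sketch (Netty 4.x API; the port is arbitrary):

```java
import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;

public final class EchoServer {
    public static void main(String[] args) throws Exception {
        EventLoopGroup boss = new NioEventLoopGroup(1);     // accepts connections
        EventLoopGroup worker = new NioEventLoopGroup();    // handles I/O
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(boss, worker)
             .channel(NioServerSocketChannel.class)
             .childHandler(new ChannelInitializer<SocketChannel>() {
                 @Override
                 protected void initChannel(SocketChannel ch) {
                     ch.pipeline().addLast(new ChannelInboundHandlerAdapter() {
                         @Override
                         public void channelRead(ChannelHandlerContext ctx, Object msg) {
                             ctx.writeAndFlush(msg);         // echo the bytes back
                         }
                     });
                 }
             });
            ChannelFuture f = b.bind(8080).sync();
            f.channel().closeFuture().sync();
        } finally {
            boss.shutdownGracefully();
            worker.shutdownGracefully();
        }
    }
}
```

Note that the handler runs on an event-loop thread, so blocking calls inside channelRead would stall other connections served by the same loop.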
Apache Mina v2.0 is similar to Twisted.
There is one built for the OpenTSDB application; its API is similar to Python's Twisted:
http://tsunanet.net/~tsuna/async/1.0/com/stumbleupon/async/
If Java already provides the capabilities for concurrent programming, what are the major advantages of using Clojure (instead of Java)?
Clojure is designed for concurrency.
Clojure provides concurrency primitives at a higher level of abstraction than Java. Some of these are:
A Software Transactional Memory system for dealing with synchronous and coordinated changes to shared references. You can change several references as an atomic operation and you don't have to worry about what the other threads in your program are doing. Within your transaction you will always have a consistent view of the world.
An agent system for asynchronous change. This resembles message passing in Erlang.
Thread-local changes to variables (vars). These variables have a root binding which is shared by every thread in your program; however, when you re-bind a variable, the new binding is only visible in that thread.
All these concurrency primitives are built on top of Clojure's immutable data structures (i.e., lists, maps, vectors, etc.). When you enter the world of mutable Java objects, all of the primitives break down and you are back to locks and condition variables (which can also be used in Clojure, when necessary).
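For contrast, a hedged sketch of what "back to locks and condition variables" means in plain Java: a tiny bounded buffer guarded by an intrinsic lock with wait/notifyAll (the class is illustrative, not from any library):

```java
import java.util.ArrayDeque;
import java.util.Deque;

class BoundedBuffer<T> {
    private final Deque<T> items = new ArrayDeque<>();
    private final int capacity;

    BoundedBuffer(int capacity) { this.capacity = capacity; }

    public synchronized void put(T item) throws InterruptedException {
        while (items.size() == capacity) {
            wait();                 // block until a consumer makes room
        }
        items.addLast(item);
        notifyAll();                // wake any waiting consumers
    }

    public synchronized T take() throws InterruptedException {
        while (items.isEmpty()) {
            wait();                 // block until a producer adds an item
        }
        T item = items.removeFirst();
        notifyAll();                // wake any waiting producers
        return item;
    }
}
```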
Without being an expert on Clojure I would say that the main advantage is that Clojure hides a lot of the details of concurrent programming and as we all know the devil is in the details, so I consider that a good thing.
You may want to check this excellent presentation from Rich Hickey (creator of Clojure) on concurrency in Clojure. EDIT: Apparently JAOO has removed the old presentations. I haven't been able to locate a new source for this yet.
Because Clojure is based on the functional-programming paradigm, which is to say that it achieves safety in concurrency by following a few simple rules:
immutable state
functions have no side effects
Programs written thus pretty much have horizontal scalability built-in, whereas a lock-based concurrency mechanism (as with Java) is prone to bugs involving race conditions, deadlocks etc.
Because the world has advanced in the past 10 years and the Java language (!= the JVM) is finding it hard to keep up. More modern languages for the JVM are based on new ideas and improved concepts which makes many tedious tasks much more simple and safe.
One of the cool things about having immutable data structures is that a lot of parallelism comes nearly for free: for example, swapping 'map' for 'pmap' spreads the work over multiple cores/processors with essentially no extra work from you.
Sure, you can be multi-threaded in Java, but it involves locks and whatnot. Clojure is multi-threaded without much extra effort.
Yes, Java provides all necessary capabilities for concurrent programs.
An analogy: C provides all necessary capabilities for memory-safe programs, even with lots of string handling. But in C memory safety is the programmer's problem.
As it happens, analyzing concurrency is quite hard. It's better to use inherently safe mechanisms rather than trying to anticipate all possible concurrency hazards.
If you attempt to make a shared-memory, mutable-data-structure concurrent program safe by adding interlocks, you are walking a tightrope. Plus, it's largely untestable.
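A hedged illustration of that tightrope: two threads acquiring the same two locks in opposite order. Run it and it will usually hang, and no ordinary unit test reliably catches the problem:

```java
public class LockOrderingDeadlock {
    private static final Object lockA = new Object();
    private static final Object lockB = new Object();

    public static void main(String[] args) {
        Thread t1 = new Thread(() -> {
            synchronized (lockA) {
                pause(50);
                synchronized (lockB) { System.out.println("t1 done"); }
            }
        });
        Thread t2 = new Thread(() -> {
            synchronized (lockB) {          // opposite acquisition order: the bug
                pause(50);
                synchronized (lockA) { System.out.println("t2 done"); }
            }
        });
        t1.start();
        t2.start();
    }

    private static void pause(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}
```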
One good compromise might be to write concurrent Java code using Clojure's functional style.
In addition to Clojure's approach to concurrency via immutable data, vars, refs (and software transactional memory), atoms and agents... it's a Lisp, which is worth learning. You get Lisp macros, destructuring, first class functions and closures, the REPL, and dynamic typing - plus literals for lists, vectors, maps, and sets - all on top of interoperability with Java libraries (and there's a CLR version being developed too.)
It's not exactly the same as Scheme or Common Lisp, but learning it will help you if you ever want to work through the Structure and Interpretation of Computer Programs or grok what Paul Graham's talking about in his essays, and you can relate to this comic from XKCD. ;-)
This video presentation makes a very strong case, centred around efficient persistent data structures implemented as tries.
The Java programming language evolves quite slowly, mainly because of Sun's concern about backward compatibility.
Why not just use JVM languages like Clojure and Scala directly?