I'm looking into a NoSQL database for use with Vert.x
Based on the not so favorable results mongoDB is out, so I'm looking at CouchDB/CouchBase, not at least since some of our data collection runs on RaberryPI fed by Arduino I/O (with a Rasbery PI CouchDB instance for offline collection).
What Java library would be suitable/best for use with CouchDB and Vert.x
I don't know a lot about vert.x but it appears to run on the JVM, so you should just be able to use Ektorp, which is pretty much the standard Java library for CouchDB nowadays. It covers all the core functionality, it's fairly well thought out, and the maintainer has been reasonably responsive to pull requests etc, as far as I've seen.
There's more documentation on Ektorp here.
Related
I would like to build a distributed NoSQL database or key-value store using golang, to learn golang and practice distribute system knowledge I've learnt from school. The target use case I can think of is running MapReduce on top of it, and implement a HDFS-compatible "filesystem" to expose the data to Hadoop, similar to running Hadoop on Ceph and Amazon S3.
My question is, what difficulties should I expect to integrate such an NoSQl database with Hadoop? Or integrate with other languages (e.g., providing Ruby/Python/Node.js/C++ APIs?) if I use golang to build the system.
Ok, I'm not much of a Hadoop user so I'll give you some more general lessons learned about the issues you'll face:
Protocol. If you're going with REST Go will be fine, but expect to find some gotchas in the default HTTP library's defaults (not expiring idle keepalive connections, not necessarily knowing when a reader has closed a stream). But if you want something more compact, know that: a. the Thrift implementation for Go, last I checked, was lacking and relatively slow. b. Go has great support for RPC but it might not play well with other languages. So you might want to check out protobuf, or work on top the redis protocol or something like that.
GC. Go's GC is very simplistic (STW, not generational, etc). If you plan on heavy memory caching in the orders of multiple Gs, expect GC pauses all over the place. There are techniques to reduce GC pressure but the straight forward Go idioms aren't usually optimized for that.
mmap'ing in Go is not straightforward, so it will be a bit of a struggle if you want to leverage that.
Besides slices, lists and maps, you won't have a lot of built in data structures to work with, like a Set type. There are tons of good implementations of them out there, but you'll have to do some digging up.
Take the time to learn concurrency patterns and interface patterns in Go. It's a bit different than other languages, and as a rule of thumb, if you find yourself struggling with a pattern from other languages, you're probably doing it wrong. A good talk about Go concurrency is this one IMHO http://www.youtube.com/watch?v=QDDwwePbDtw
A few projects you might want to have a look at:
Groupcache - a distributed key/value cache written in Go by Brad Fitzpatrick for Google's own use. It's a great implementation of a simple yet super robust distributed system in Go. https://github.com/golang/groupcache and check out Brad's presentation about it: http://talks.golang.org/2013/oscon-dl.slide
InfluxDB which includes a Go based version of the great Raft algorithm: https://github.com/influxdb/influxdb
My own humble (pretty dead) project, a redis compliant database that's based on a plugin architecture. My Go has improved since, but it has some nice parts, and it includes a pretty fast server for the redis protocol. https://bitbucket.org/dvirsky/boilerdb
I am trying to pick a right web technology both for I/O heavy and CPU heavy tasks. NodeJs is perfect for handling large load and it also can be scaled out. However, I am stuck with the cpu heavy part. Is it possible to integrate another technology (e.g. Java) into node, so that I will have it running my algorithms in other threads and then use the results again in node. Is there any existing solution? Any other suggestions will be very good.
You can intergrate NodeJS with Java using node-java.
As mentioned in a previous answer, you can use node-java which is an npm module that talks to Java. You can also use J2V8 which wraps Node.js as a Java library and provides a Node.js API in Java.
The answer is lambda architecture.
NodeJs is nice by itself - handling fast queries in a lightweight manner, not doing any extra computations on data.
The CPU heavy tasks can be easily delegated to specialized components based on JVM (well, the most famous ones are on JVM). This is nicely implemented by using message brokers and microservices.
An event-based architecture, where nodejs can be hooked up to databases like Cassandra or Mongodb and cluster computing frameworks like Apache Spark (not necessarily, though, it depends on the problem) to handle the cpu-heavy parts of the system. And lightweight containers add an icing to the cake by providing nice isolated runtime environments for each of the components to live in.
That's my conclusion so far regarding this question.
I think the suggestions above sort of eliminate the need to wrap node under java or other JVM based solution for cpu-heavy tasks.
NodeJS is based on the v8 javascript engine which is written in c++.
It is therefore possible to write fully native addons in c++ for NodeJS. Check out some of these resources:
https://github.com/nodejs/node-addon-api
https://github.com/nodejs/node-addon-examples
Our frontend is simple Jetty (might be replaced with Tomcat later on) server. Through servlets, we are providing a public HTTP API (more or less RESTful) to expose our product functionality.
In the backend, we have a Java process which does several kind of maintenance tasks. While the backend process usually does it own tasks when it's time, now and then, the frontend needs to wake-up the backend to execute a certain task in the background.
Which (N)IO library would be ideal for this task? I found Netty, Grizzly, kryonet and plain RMI. For now, I am inclined to say Netty, it seems simple to use and it is probably very reliable.
Does any of you have experience in this kind of setups? What would your choice be?
thanks!
Try to translate this document which answer to your question.
http://blog.xebia.fr/2011/11/09/java-nio-et-framework-web-haute-performance/
This society, as french famous Java EE experts, did a lot of poc of NIO servers in the context of a french challenge sponsored by VmWare (USI2011). It was about building a simple quizz app that can handle a load of 1 million connected users.
They won that challenge with great results.
Their implementation was Netty + Gemfire and they only replaced the CachedThreadPool by a MemoryAwareThreadPool.
Netty seems to offer great performances, and is well documented.
They also considered Deft, inspired by Tornado (python/facebook) but it's still a bit immature for them
Edit: here's the translated link provided in the comments
My preference is Netty. It's simple yet flexible. Very fast and the community around Netty is awesome.
The company I work for is currently evaluating CoralReactor. It is a commercial software but it has the easiest API I have ever seen for Java NIO. My personal opinion is that Netty makes things too complicated, especially if you want to go garbage-free and single-threaded, which are a requirement for many companies from the finance, advertisement and game industry.
I would decouple them by using JMS, just have some (set of) control queues your backend sits there listening on and you're done. No need to write a custom nio api here.
One sample provider is hornetq. This can be run as an in process jms broker as well, it uses Netty under the covers.
Well, I've taken help of Google, Stackoverflow and whatever else I could find, did as much as I could, but it seems that I am unable to find out an exact answer! I have multiple queries, and I would love to have answers from the database-people as well as from the programmers and framework users.
From the programming languages, I know C/C++, Java and Python. I have undertaken a CMS project that would require frequent C's & R's of the CRUD. The project would have 50k users atleast. The head-to-toe of the project has been all figured out, and now I need to code it and make it live online.
Well, I want to use Neo4j as my database as its data representation model (nodes and relationships) is closest to the real project model. Now, neo4j has bindings for various languages, and one of them is Python (whose python bindings are very oldish, the jpype hasn't been updated since ages). I am thinking of going for some Java based framework, but then I leave this idea as I personally haven't heard much of java frameworks. But one of my partner tells me to go for Zend (PHP) as it has some kind of functionality that lets us execute Java code. Won't this slow the code? I mean executing one language's code in another language...
So, it all comes to this:
1) Database: I would want to go for Neo4j. But does it goes well off when the scalability factor kicks in? (From what I could gather from google, there are no scalability issues).
2) What framework to use in case of Neo4j? I would require a framework that is able to handle tonnes of requests and large data as the users of the project would be Creating and Reading data a lot.
P.S.: I know it is a long question, but couldn't jot it down in lesser words!
I can't speak about the scalability or suitability of Neo4J for your particular project.
However, I'd strongly advise you against trying to mix and match languages like Java and PHP. It's so much easier to stick to the best one for your particular task. I'd also strongly advise you against using JNI for anything unless you have no other option. Java is fast enough that you should almost never need JNI for performance.
That said, it's OK to run Neo4j in its "full server" mode and then have your PHP or Python application access it using some driver over the network. I just wouldn't recommend making an ugly hybrid of PHP and Java at your application layer.
Some decent Java frameworks you could check out include:
Spring
Google Guice with Sitebricks
Apache Struts 2
They're pretty standard in the industry and there are tons of good resources available on all of them.
In regards to the mini-question about language interoperability, Java provides the JNI interface, which allows the JVM and user code to make calls into other languages and vice versa. When the native code (e.g. C code called by Java, or Java called from C) runs, it is actually running in its natural environment, so there's no performance loss in terms of actual execution.
Neo4j as a standalone server has also REST API: http://docs.neo4j.org/chunked/milestone/rest-api.html, if you can embedded your requests in single REST queries, there is no need to use native embedded neo4j. If there is no need to use the embedded neo4j, you can take any language of your choice.
Regarding the scalability, recently neo4j can be used on Azure, so it must be quite easy to scale. To learn more how to scale neo4j, go to this page on neo4j.org.
UPDATE: in the newest version of Neo4j, there is added the support for a new query language - http://blog.neo4j.org/2011/06/kiruna-stol-14-milestone-4.html.
Would like to hear from people about their experience with java clustering (ie. implementing HA solutions). aka . terracotta, JGroups etc. It doesn't have to be web apps. Experience writing custom stand alone servers would be great also.
UPDATE : I will be a bit more specific -> not that interested in Web App clustering (unless it can be pulled out and run standalone). I know it works. But we need a bit more than just session clustering. Examining solutions in terms of ease of programming, supported topologies (ie. single data center versus over the WAN ), number of supported nodes. Issues faced, workarounds. At the moment I am doing some POC (Proof of concept) work on Terracotta and JGroups to see if its worth the effort for our app (which is stand alone, outside of a web container).
Jboss clustering was very easy to get up and running.
It seems to work well for us.
You might want to take a look at Hazelcast. It is super lite, easy and free clustering platform with cluster API. If you are clustering your application state/data, Hazelcast can be great help with its distributed/partitioned, queue, map, set, list and lock implementations.
Regards,
-talip
http://www.hazelcast.com
You may look at Oracle Coherence (formerly Tangosole Coherence).
http://www.oracle.com/technology/products/coherence/coherencedatagrid/coherence_solutions.html
I saw a demonstration of GridGain at our local JUG and I was very impressed. The documentation is very complete and it's very easy to get it going. I haven't started using it yet, so I can't quite say that it's working for us.
http://www.gridgain.com/
JBossCache is a standalone open source project that JbossClustering makes use of in the Application Server.
Our company made use of it in our own custom network server, its working well so far in development, though yet to be deployed.
Its a pretty simple API, and it comes in two flavors, a flat cache, or a "POJO Cache" that uses insturmentation to keep State across servers. Basically, updates to fields are propgated across the network using JGroups.