I am trying to set up a Kafka system. Since most of the existing code in my project is already in PHP, I will most probably be writing the producers in PHP itself. But I am comparatively very less constrained when it comes to choosing a language to write the consumer. Now, that there are so many clients which can be used I am in a fix.
In other to order to choose the right tech here, what are the various factors that should be kept in mind?
Would especially like to apply this knowledge to choose between java client vs node client(multithreaded model vs async model)
Any help will be highly appreciated.
The Java client is the most advance client and officially supported by the Kafka Project -- most other clients are third party projects and many do not implement all available features.
Thus, I would recommend to use Java clients.
Kafka is basically written in pure Java and Kafka’s native API is java, so this is the only language where you’re not using a third-party library.You always have an edge over writing in other languages which have an additional overhead.
Node.js isn’t optimized for high throughput applications such as Kafka. So if you need the high processing rates that come standard on Kafka, or perhaps C++.
Also, I believe Kafka consumer clients written in Java has good community support. So it makes sense to implement it using java as long as you don't have any other dependency stopping you from implementing it from.
Also, check this out for the benchmarking results using various Kafka Clients. The results are contrasting.
Client Type Throughput(No of messages)
Java 40,000 - 50,0000
Go 28,000 - 30,0000
Node 6,000 - 8,0000
Kafka-pixy 700 - 800
Logstash 250
As far as Kafka goes, I'd use any of the languages with an official Confluent supported client: JVM, C/C++, .NET, Python, Go
I'm sure you can get others to work like Node or PHP, and maybe those can use the C library, but I would prefer something with official language support and a broader user to ask questions to.
Related
#kafka users:
I have been trying to understand python client for kafka, including PyPy client as well. Following was a good benchmarking i read and realized some similar results:
http://mrafayaleem.com/2016/03/31/apache-kafka-producer-benchmarks/
I am extremely confused whether Java has a massive advantage over python, as the libraries are written using Java and kafka. So my question is, does the native implementation of Kafka in Java helps with performance tremendously when Java is used, or PyPy/Python works equally better?
Being a python programmer, I am not at all comfortable with java, and hence confused.
Apache Kafka defines a language neutral wire protocol (see https://kafka.apache.org/protocol) and so clients can be written in any programming language and do not need to be based on the Java client that ships with the core Kafka distribution. For example, the c/c++ librdkafka library is a very high performance non-Java client. There are a number of python Kafka clients including one that is based on librdkafka. Benchmark results and other comparison information for the various Kafka Python clients is available here http://activisiongamescience.github.io/2016/06/15/Kafka-Client-Benchmarking/
I am trying to pick a right web technology both for I/O heavy and CPU heavy tasks. NodeJs is perfect for handling large load and it also can be scaled out. However, I am stuck with the cpu heavy part. Is it possible to integrate another technology (e.g. Java) into node, so that I will have it running my algorithms in other threads and then use the results again in node. Is there any existing solution? Any other suggestions will be very good.
You can intergrate NodeJS with Java using node-java.
As mentioned in a previous answer, you can use node-java which is an npm module that talks to Java. You can also use J2V8 which wraps Node.js as a Java library and provides a Node.js API in Java.
The answer is lambda architecture.
NodeJs is nice by itself - handling fast queries in a lightweight manner, not doing any extra computations on data.
The CPU heavy tasks can be easily delegated to specialized components based on JVM (well, the most famous ones are on JVM). This is nicely implemented by using message brokers and microservices.
An event-based architecture, where nodejs can be hooked up to databases like Cassandra or Mongodb and cluster computing frameworks like Apache Spark (not necessarily, though, it depends on the problem) to handle the cpu-heavy parts of the system. And lightweight containers add an icing to the cake by providing nice isolated runtime environments for each of the components to live in.
That's my conclusion so far regarding this question.
I think the suggestions above sort of eliminate the need to wrap node under java or other JVM based solution for cpu-heavy tasks.
NodeJS is based on the v8 javascript engine which is written in c++.
It is therefore possible to write fully native addons in c++ for NodeJS. Check out some of these resources:
https://github.com/nodejs/node-addon-api
https://github.com/nodejs/node-addon-examples
I have 2 processes that need to communicate over the same PC and different PCs. In the local case the process communication is among different processes e.g Process A and Process B.
In the remote case it will be among 2 instances of Process A running in different PCs.
I will create them from scratch and I am wondering what is the best approach. I am aware of RMI and sockets but I was wondering for my case as described, and taking also into account that the messages exchanged are small and the number of APIs really small, if there is a standard approach/library for this.
Any suggesstions are highly welcome
Update after #EJP comments:
My interest is 1)to implement the requirement for communication in a light manner since the API exposed will be really small and the messages as well 2)use and learn a new popular framework if possible (I already know RMI and sockets)
If you are just looking for messaging frameworks, there's a bunch available out such as
RabbitMQ - http://www.rabbitmq.com/
ZeroC Ice - http://www.zeroc.com/ice.html
AMQP - http://www.amqp.org
OpenSplice DDS - http://www.prismtech.com/opensplice
But when you use a 3rd party framework, you are then adding an additional dependency to your application. If it is something very simple like your case, perhaps writing a TCP client/server would be sufficient for a client/server paradigm or if you are looking for publisher/subscriber paradigm then you can look into using UDP multicast. You just need your data class to extends Serializable if you want to be able to marshal and unmarshal your data to buffer and send it over to network using typical JAVA socket API.
I strongly suggest having a look at Thrift. From all the technologies I've used (web services, RMI, XML-RPC, Corba comes to mind) it is currently my favourite. Essentially the steps involved are:
Download the Thrift compiler.
Add the Maven dependency (make sure it is the same version as the compiler!) I currently use 0.8.0.
Write your Thrift IDL (incredibly easy, google for it as there are plenty of examples).
Compile it for Java.
Writer your server/client.
In general, you can whip together a server and a client in about 30 lines of code. In terms of speed and reliability it has never failed me before.
You might have a look at Versile Java (full disclosure: I am one of the developers), it satisfies at least your criteria #1. From the API documentation, here are some examples of writing remote-enabled objects, running a service, and connecting to a service.
If you want to learn something new then I'd look at OpenSplice. The reason is pretty simple, among the technologies suggested above is the only one that provides you with Data-Centric abstractions.
The cool thing about OpenSplice is that gives you the abstraction of a Global Data Space, yet the implementation of this global data space is fully distributed and very high performance.
Take a look at some of the slides available at http://www.slideshare.net/angelo.corsaro and I am sure you'll get in love with the technology.
Finally OpenSplice is Open Source.
Happy Hacking.
A+
JMX is a good alternative .
Example :
http://www.javalobby.org/java/forums/t49130.html
IMB JMX Example
http://alvinalexander.com/blog/post/java/source-code-java-jmx-hello-world-application
Our frontend is simple Jetty (might be replaced with Tomcat later on) server. Through servlets, we are providing a public HTTP API (more or less RESTful) to expose our product functionality.
In the backend, we have a Java process which does several kind of maintenance tasks. While the backend process usually does it own tasks when it's time, now and then, the frontend needs to wake-up the backend to execute a certain task in the background.
Which (N)IO library would be ideal for this task? I found Netty, Grizzly, kryonet and plain RMI. For now, I am inclined to say Netty, it seems simple to use and it is probably very reliable.
Does any of you have experience in this kind of setups? What would your choice be?
thanks!
Try to translate this document which answer to your question.
http://blog.xebia.fr/2011/11/09/java-nio-et-framework-web-haute-performance/
This society, as french famous Java EE experts, did a lot of poc of NIO servers in the context of a french challenge sponsored by VmWare (USI2011). It was about building a simple quizz app that can handle a load of 1 million connected users.
They won that challenge with great results.
Their implementation was Netty + Gemfire and they only replaced the CachedThreadPool by a MemoryAwareThreadPool.
Netty seems to offer great performances, and is well documented.
They also considered Deft, inspired by Tornado (python/facebook) but it's still a bit immature for them
Edit: here's the translated link provided in the comments
My preference is Netty. It's simple yet flexible. Very fast and the community around Netty is awesome.
The company I work for is currently evaluating CoralReactor. It is a commercial software but it has the easiest API I have ever seen for Java NIO. My personal opinion is that Netty makes things too complicated, especially if you want to go garbage-free and single-threaded, which are a requirement for many companies from the finance, advertisement and game industry.
I would decouple them by using JMS, just have some (set of) control queues your backend sits there listening on and you're done. No need to write a custom nio api here.
One sample provider is hornetq. This can be run as an in process jms broker as well, it uses Netty under the covers.
I know that there is a supported Scala DSL for Camel. Apart from that
Is it realistic to replace Java (the language) completely by Scala for a Camel based project?
Which kind of known problems are known to exist?
Which workarounds exist for those problems (other than using Java)?
I am mainly looking for less boilerplaty code.
Akka offers stable Scala-idiomatic Camel integration.
The akka-camel module allows actors,
untyped actors and typed actors to
receive and send messages over a great
variety of protocols and APIs. This
section gives a brief overview of the
general ideas behind the akka-camel
module, the remaining sections go into
the details. In addition to the native
Scala and Java actor API, actors can
now exchange messages with other
systems over large number of protcols
and APIs such as HTTP, SOAP, TCP, FTP,
SMTP or JMS, to mention a few. At the
moment, approximately 80 protocols and
APIs are supported.
Apart from that, I'm sure this replacement is possible due to a good interop, and there could hardly be any Scala-specific issues that are not peculiar to Java. E.g., Akka Actors used for publishing to/consuming from Camel endpoints are based on java.util.concurrency, and the only problem I can think of is a fixable bug in the library.
In the meantime a relatively simple Scala DSL has been developed for Camel, that should have the functionality of the Java DSL.
To decide if it is realistic for you, consider:
- The quality of the IDE support for the languages
- The Scala language complexity
- The Scala/Java language popularity
- DSL extension possibilities. In Scala, it should be possible (with some Scala magic) to to extend the DSL (add additional DSL elements)
If you decide to try it out, it would be great if you share your experience with the Apache Camel community your impressions on:
code readability, code maintainability, code efficiency, developer satisfaction, code size, the number of "man-days".
Since then (2010-2011), there is now (Sept 2016) a recent initiative named for Akka Streams Integration, codename Alpakka.
We believe that Akka Streams can be the tool for building a modern alternative to Apache Camel. That will not happen by itself overnight and this is a call for arms for the community to join us on this mission. The biggest asset of Camel is its rich set of endpoint components. We would like to see that similar endpoints are developed for Akka Streams.
See "akka/akka-stream-contrib".