Our application uses a topic to push messages to a small set of subscribers. What sort of things should I look for when modeling a JMS message with respect to the size of the actual message to be pushed? Are there any known limits, or is it application-server specific? Any best practices or suggestions on this topic (pun unintended)?
You are likely to hit practical limits before you hit technical ones. That is, message lengths may be technically bounded by what can be expressed in an int or long, but that is unlikely to be the first constraint you hit.
Message lengths up in the megabytes tend to be heavyweight. Think in terms of a few KB as the sort of ballpark you want to be in.
A technique sometimes used is to send a small message saying "Item 12345 has been updated"; consumers then go retrieve the data associated with Item 12345 from a database or other storage mechanism. That way each client fetches only the data it needs, and we don't spray large chunks of data around when subscribers may not need all of it.
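A minimal sketch of that notification style, assuming a JMS session and topic are already set up (class and method names here are illustrative):

import javax.jms.*;

public class ItemUpdateNotifier {
    private final Session session;
    private final MessageProducer producer;

    public ItemUpdateNotifier(Session session, Topic topic) throws JMSException {
        this.session = session;
        this.producer = session.createProducer(topic);
    }

    // Publish only the item's ID; subscribers fetch the full record themselves.
    public void notifyUpdated(long itemId) throws JMSException {
        TextMessage msg = session.createTextMessage();
        msg.setText(String.valueOf(itemId));   // a few bytes, not megabytes
        producer.send(msg);
    }
}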
I suggest you check the book Enterprise Integration Patterns, where many patterns dealing with issues like the one you are asking about are analyzed exhaustively. In particular, if your messages are large, you can use a Message Sequence to solve the problem:
Huge amounts of data: Sometimes applications want to transfer a really large data structure, one that may not fit comfortably in a single message. In this case, break the data into more manageable chunks and send them as a Message Sequence. The chunks have to be sent as a sequence, and not just a bunch of messages, so that the receiver can reconstruct the original data structure.
Quoted from http://www.eaipatterns.com/MessageConstructionIntro.html
The home page with a brief description of each pattern in the book is available at http://www.eaipatterns.com/index.html
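As a rough illustration of the pattern in JMS terms, here is a sketch of a sender that chops a payload into chunks and tags each message so the receiver can reassemble them in order. The property names (seqId, seqPos, seqTotal) are my own invention, not part of the pattern or the JMS spec:

import javax.jms.*;

public class SequenceSender {
    private static final int CHUNK = 4 * 1024;  // keep each message down to a few KB

    public static void send(Session session, MessageProducer producer,
                            String sequenceId, byte[] payload) throws JMSException {
        int total = (payload.length + CHUNK - 1) / CHUNK;  // ceiling division
        for (int i = 0; i < total; i++) {
            int from = i * CHUNK;
            int len = Math.min(CHUNK, payload.length - from);
            BytesMessage msg = session.createBytesMessage();
            msg.setStringProperty("seqId", sequenceId);  // which sequence this chunk belongs to
            msg.setIntProperty("seqPos", i);             // position within the sequence
            msg.setIntProperty("seqTotal", total);       // lets the receiver know when it's complete
            msg.writeBytes(payload, from, len);
            producer.send(msg);
        }
    }
}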
It is implementation specific. As you might expect, smaller is better. Tibco, for instance, recommends keeping message sizes under 100 KB.
Small messages are faster, obviously. That said, the underlying JMS server implementation may improve performance using message compression, as WebLogic 9 does, for instance. (http://download.oracle.com/docs/cd/E13222_01/wls/docs92/perform/jmstuning.html#wp1150012)
I have a small question about Kafka group IDs. I can use this annotation in Java:
@KafkaListener(topics = "insert", groupId = "user")
There I can set the groupId it should consume with, but it does not consume only for that group ID, and maybe because of that I can't send to a specific group ID. How can I send to one particular groupId only? What is the groupId actually for, and do I instead need a dedicated topic to send specific Kafka messages?
I already tried to find an answer online, but I found nothing; maybe I'm using Google wrong, haha.
I hope everyone understands me; if not, please ask :)
Thanks a lot in advance!
Welcome to Kafka! First of all: You can't send to a consumer group, you send to a Topic.
Too much text below. Be aware of possible drowsiness while trying to read the entire answer.
If you are still reading this, I assume you truly want to know how to direct messages to specific clients, or you really need to get some sleep ASAP.
Maybe both. Do not drive afterwards.
Back to your question.
From that topic, multiple consumer groups can read. Every CG is independent of the others, so each one will read the topic from start to end on its own. Think of a CG as a union of endophobic consumers: they won't care about other groups, they won't ever talk to another group, they don't even know whether the others exist.
I can think of three different ways to achieve your goal, by using different methodologies and/or architectures. The only one using Consumer Groups is the first one, but the other two may also be helpful:
subscribe
assign
Multiple Topics
The first two are based on mechanisms to divide messages within a single topic. The third one is only justified in certain cases. Let's get into these options.
1. Subscribe and Consumer Groups
You could create a new Topic, fill it with messages, and add some metadata in order to recognize who needs to process each message (to who that message is directed).
Messages stored in Kafka contain, among other fields, a KEY and a VALUE (the message itself).
So let's say you want only GROUP-A to process some specific messages. One simple solution could be including an identifier on the key, such as a suffix. One of your keys could look like: key#GA.
On the consumer side, you poll() the messages from that topic and add a little extra conditional logic before processing them: you just read the key and check the suffix. If it corresponds to the specified consumer group (in this case, if it contains GA), then the consumer from GROUP-A knows that it must process the message.
For example, your Topic stores messages of two different natures, and you want them to be directed to two groups: GROUP-A and GROUP-Z.
key value
- [11#GA][MESSAGE]
- [21#GZ][MESSAGE]
- [33#GZ][MESSAGE]
- [44#GA][MESSAGE]
Both consumer groups will poll those 4 messages, but only some of them will be processed by each group.
Group-A will discard the 2nd and 3rd messages. It will process the 1st and 4th.
Group-Z will discard the 1st and 4th messages. It will process the 2nd and 3rd.
This is basically what you are aiming for, using some extra logic and playing with Kafka's architecture. Messages with a certain suffix will be "directed" to a specific consumer group and ignored by the other ones.
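A minimal consumer sketch of this idea, assuming String keys and values, the topic name insert from your annotation, and the #GA suffix convention from the example above:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class GroupAConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "GROUP-A");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("insert"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Extra conditional logic: only process messages "directed" to us.
                    if (record.key() != null && record.key().endsWith("#GA")) {
                        process(record.value());
                    }
                    // Other groups' messages are polled but simply discarded.
                }
            }
        }
    }

    private static void process(String value) { /* application logic */ }
}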
2. Assign
The solution above is focused on consumer groups and Kafka's subscribe methodology. Another possible solution, instead of subscribing consumer groups, would be to use Kafka's assign method. No consumer group is involved here, so references to the previous groups will be put in quotes to avoid confusion.
Assign allows you to directly specify the topic/partition from which your consumer must read.
On the producer side, you should partition your messages in order to divide them between the partitions within your topic, using your own logic. There is some deeper info about custom partitioners here (yeah, the author of the link seems like a complete douche).
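For illustration, here is a bare-bones custom partitioner; the suffix-to-partition mapping is invented to match the example below ("#GA" to the 1st partition, "#GZ" to the 5th):

import java.util.Map;
import org.apache.kafka.clients.producer.Partitioner;
import org.apache.kafka.common.Cluster;

public class GroupSuffixPartitioner implements Partitioner {
    @Override
    public int partition(String topic, Object key, byte[] keyBytes,
                         Object value, byte[] valueBytes, Cluster cluster) {
        String k = String.valueOf(key);
        if (k.endsWith("#GA")) return 0;  // 1st partition: "Group-A" messages
        if (k.endsWith("#GZ")) return 4;  // 5th partition: "Group-Z" messages
        return 1;                         // anything else, arbitrarily
    }

    @Override public void close() {}
    @Override public void configure(Map<String, ?> configs) {}
}

It would be registered on the producer via the partitioner.class configuration property.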
For example, let's say you have 5 different types of consumers, so you create a Topic with 5 partitions, one for each "group". Your producer's custom partitioner identifies the corresponding partition for each message, and after producing the messages from the previous example the topic would present this structure:
- partition 1: [11#GA][MESSAGE] [44#GA][MESSAGE]
- partition 5: [21#GZ][MESSAGE] [33#GZ][MESSAGE]
In order to direct the messages to their corresponding "groups":
"Group-Z" is assigned the 5th partition.
"Group-A" is assigned the 1st partition.
The advantage of this solution is that fewer resources are wasted: each "group" just polls its own messages, and since every message polled is known to be directed at the consumer that polled it, you avoid the discard/accept logic: less traffic on the wire, fewer objects in memory, less CPU work.
The disadvantage is a more complex producer mechanism, which involves a custom partitioner that will almost surely need constant updating as your data or topic structures change. Moreover, every time the producer side is altered you will have to update your consumers' defined assignments as well.
Personal note:
Assign offers better performance, but carries a high price: manual and constant control of producers, topics, partitions and consumers, hence being (possibly) more error-prone. I would call it the efficient solution.
Subscribe makes the whole process much simpler, and will probably involve fewer problems/errors in the system, hence being more reliable. I would call it the effective solution.
Anyway, this is a totally subjective opinion.
Not finished yet
3. Multi-topic solution
The previously proposed solutions assume that the messages share the same nature, hence will be produced in the same Topic.
In order to explain what I'm trying to say here, let's say a Topic is represented as a storage building.
[warehouse 1: laptops, tablets, smartphones, ...]
The previous solutions assume that you store similar elements there, for example electronic devices; their end of life is similar, the storage method is similar regardless of the specific device type, the machinery you use is the same, etc. With this in mind, it is completely logical to store all those elements in the same warehouse, divided into different sections (into the same topic, divided into different partitions).
There is no real reason to build a new warehouse for each electronic-device family (one for TVs, another for mobile phones, ... unless you are rolling in money). The previous solutions assume that your messages are different types of "electronic devices".
But time passes, you are doing well, and you decide to start a new business: fruit storage.
Fruits have a shorter shelf life (log.retention.ms, anyone?), must be stored within a certain temperature range, and the storage elements and techniques from your first warehouse will probably differ a lot from what they need. Moreover, your fruit business could be closed during certain periods of the year, while electronic devices are received 365/24. Even if your device warehouse opens daily, maybe the fruit storage only operates on Mondays and Tuesdays (and, with luck, is not temporarily closed for the season).
As fruits and electronic devices need different types of storage management, you decide to build a new warehouse. Your new fruits topic.
[warehouse 2: bananas, kiwis, apples, chicozapotes, ...]
Creating a second topic is justified here, since each one could need different configuration values and each one stores content of a very different nature, which also leads to consumers with very different processing logic.
So, is this a 3rd possible solution?
Well, it does make you forget about consumer groups, partitioning mechanisms, manual assignations, etc. You only have to decide which consumers subscribe to which Topic, and you're done: you effectively directed the messages to specific consumers.
But if you build a warehouse and start storing computers, would you really build another warehouse to store the phones that just arrived? In real life you'd have to pay for the construction of the second building, pay taxes twice, pay for the cleaning of two buildings, and so on.
[two warehouses: laptops here -> / <- tablets here]
In Kafka's world, this would be represented as extra work for the Kafka cluster (twice the replication requests; ZooKeeper has a newborn with new ACLs and controllers, ...) and extra time for the human assigned to this job, who is now responsible for the management of two topics: a worker spending time on something that could be avoided means money lost by the company. Also, I am not aware if they already do this or ever will, but cloud providers are somewhat fond of charging small fees for certain operations, such as creating a topic (this is just a possibility, and I may be wrong here).
To sum up, this is not necessarily a bad idea: it just needs a justifying context. Use it if you are working with bananas and Qualcomm chips.
If you are working with Laptops and Tablets, go for the consumer group and partition solutions previously shown.
A client and a server application need to be implemented in Java. The scenario requires reading a large number of small objects from a database on the server side and sending them to the client.
This is not about transferring large files; rather, it requires streaming a large number of small objects to the client.
The number of objects that needs to be sent from server to client in a single request could be one or one million (let's assume the number of clients is limited for the sake of discussion; ignore throttling).
The total size of the objects in most cases will be too big to hold them in memory. A way to defer read and send operation on the server side until client requests the object is needed.
Based on my previous experience, the WCF framework in .NET supports the scenario above with:
a transferMode of StreamedResponse
the ability to return an IEnumerable of objects
the help of yield to defer serialization
Is there a Java framework that can stream objects as they requested while keeping the connection open with the client?
NOTE: This may sound like a very general question, but I am hoping to give specific details that would hopefully lead to a clear answer benefiting me and possible others.
A standard approach is to use a form of pagination and fetch the results in chunks that can be held in memory temporarily. How to do that specifically depends on the database, but a basic JDBC approach would be to first execute a statement to find out the number of records and then fetch them in ranges. For example, Oracle has a ROWNUM pseudo-column that you can use to manage the ranges of records to return; other databases have similar options.
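A rough JDBC sketch of that chunked approach, assuming Oracle (hence ROWNUM); the table and column names are made up:

import java.sql.*;

public class ChunkedReader {
    private static final int CHUNK_SIZE = 1000;

    public static void streamAll(Connection conn, ObjectSink sink) throws SQLException {
        // Classic ROWNUM range pagination: rows (lower, upper] of a stable ordering.
        String sql =
            "SELECT id, payload FROM ("
          + "  SELECT t.*, ROWNUM rn FROM (SELECT id, payload FROM items ORDER BY id) t"
          + "  WHERE ROWNUM <= ?"
          + ") WHERE rn > ?";
        int offset = 0;
        while (true) {
            int rows = 0;
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setInt(1, offset + CHUNK_SIZE);  // upper bound
                ps.setInt(2, offset);               // lower bound
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        sink.accept(rs.getLong("id"), rs.getBytes("payload"));
                        rows++;
                    }
                }
            }
            if (rows < CHUNK_SIZE) break;  // last (partial) chunk reached
            offset += CHUNK_SIZE;
        }
    }

    interface ObjectSink { void accept(long id, byte[] payload); }
}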
You could use ObjectOutputStream / ObjectInputStream to do this.
The key to making this work is to periodically call reset() on the output stream. If you don't do that, the sending and receiving ends will build up massive maps containing references to every object sent or received over the stream.
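A minimal sketch of such a send loop (the reset interval and the null end-of-stream marker are arbitrary conventions, not requirements):

import java.io.*;
import java.util.Iterator;

public class ObjectStreamer {
    private static final int RESET_EVERY = 1000;

    public static void send(OutputStream raw, Iterator<?> objects) throws IOException {
        ObjectOutputStream out = new ObjectOutputStream(new BufferedOutputStream(raw));
        int count = 0;
        while (objects.hasNext()) {
            out.writeObject(objects.next());
            if (++count % RESET_EVERY == 0) {
                out.reset();   // drop the handle table on both ends
                out.flush();
            }
        }
        out.writeObject(null);  // simple end-of-stream marker (a convention, not an API)
        out.flush();
    }
}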
However, there may be issues with keeping a single request / response (or database cursor) open for a long time. And resuming a stream that failed could be problematic. So your solution should probably combine the above with some kind of pagination.
The other thing to note is that a scalable solution needs to avoid network latency from becoming the bottleneck. It may be worth implementing a receiver thread that eagerly pulls objects from the stream and buffers them in a (bounded) queue.
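A rough sketch of that receiver thread, reusing the null end-of-stream convention from the sender sketch above; the queue capacity is arbitrary:

import java.io.*;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class EagerReceiver {
    private final BlockingQueue<Object> queue = new ArrayBlockingQueue<>(10_000);

    public void start(InputStream raw) {
        Thread t = new Thread(() -> {
            try (ObjectInputStream in = new ObjectInputStream(new BufferedInputStream(raw))) {
                Object obj;
                while ((obj = in.readObject()) != null) {  // null = end of stream
                    queue.put(obj);  // blocks if the consumer falls behind (bounded queue)
                }
            } catch (Exception e) {
                // real code should propagate this to the consuming side
            }
        });
        t.setDaemon(true);
        t.start();
    }

    public Object next() throws InterruptedException {
        return queue.take();
    }
}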
I'm writing an application that needs to quickly deserialize millions of messages from a single file.
What the application does is essentially fetch one message from the file, do some work, and then throw the message away. Each message is composed of ~100 fields (not all of them are always parsed, but I need them all because the user of the application can decide which fields to work on).
At the moment the application consists of a loop that on each iteration executes a readDelimitedFrom() call.
Is there a way to optimize for this case (splitting into multiple files, etc.)? In addition, at the moment, due to the number of messages and the size of each message, I need to gzip the file (and it is fairly effective in reducing the size, since the field values are quite repetitive), though this reduces performance.
If CPU time is your bottleneck (which is unlikely if you are loading directly from HDD with cold cache, but could be the case in other scenarios), then here are some ways you can improve throughput:
If possible, use C++ rather than Java, and reuse the same message object for each iteration of the loop. This reduces the amount of time spent on memory management, as the same memory will be reused each time.
Instead of using readDelimitedFrom(), construct a single CodedInputStream and use it to read multiple messages like so:
// Do this once:
CodedInputStream cis = CodedInputStream.newInstance(input);
// Then read each message like so:
int limit = cis.pushLimit(cis.readRawVarint32());  // read the length prefix and cap reads at it
builder.mergeFrom(cis);                            // parse exactly one message
cis.popLimit(limit);                               // restore the previous limit
cis.resetSizeCounter();                            // reset the byte counter so the total-size limit isn't tripped
(A similar approach works in C++.)
Use Snappy or LZ4 compression rather than gzip. These algorithms still get reasonable compression ratios but are optimized for speed. (LZ4 is probably better, though Snappy was developed by Google with Protobufs in mind, so you might want to test both on your data set.)
Consider using Cap'n Proto rather than Protocol Buffers. Unfortunately, there was no Java version at first, but EDIT: there is now capnproto-java, as well as implementations in many other languages. In the languages it supports, it has been shown to be quite a bit faster. (Disclosure: I am the author of Cap'n Proto. I am also the author of Protocol Buffers v2, which is the version Google released as open source.)
I expect that the majority of the time spent by your CPU is in garbage collection. I would look to replace the default garbage collector with one better suited to your use case of short-lived objects.
If you do decide to write this in C++ - use an Arena to create the first message before parsing: https://developers.google.com/protocol-buffers/docs/reference/arenas
I am making a game in Java and it is going well. I want to implement multiplayer early on so that I can build on it, instead of porting the entire game to multiplayer once it has a ton of different features.
I would like to make it a Client / Server application.
Now I'm not sure how or with what to implement the multiplayer. I have read the Java tutorials about sockets, tested them, and made a successful connection (in a test project), but I am not sure where to go from here. I don't know how I would transfer, for example, where the different players are on the map, or even whether there ARE any players at all. I don't know whether to use a library or do it myself. Could anyone please either give me some kind of guideline on how to transfer player data (or anything like that) through a TCP connection, or point me to a library that makes it simpler?
This is a pretty wide question and there are multiple ways to do things, but here's my take on it. Disclaimer: I am the server system architect at a mobile multiplayer gaming company. I don't consider myself an expert on these things, but I do have some experience both from my work and from hobbies (I wrote my first "MMORPG", which supported a whopping 255 players, in 2004), and I feel that I can poke you in the right direction. For most concepts here I still suggest you do further research using Google, Stack Overflow, etc.; this is just my "10,000-foot view" of what is needed for game networking.
Depending on the type of game you are making (think realtime games like first-person shooters vs. turn-based games like chess), the choice of the underlying transport-layer protocol is important. Like Matzi suggested, UDP gives you lower latency (and lower packet overhead, as the header is smaller than TCP's), but on the downside the delivery of a packet to its destination is never guaranteed, i.e. you can never be sure that the data you sent actually reached the client, or, if you sent multiple packets in a row, that the data arrived in the correct order. You can implement a "reliable UDP" protocol by acknowledging the arrived data with separate messages (although again, if the acknowledgements use UDP, they can also get lost) and handling the ordering with some extra data, but then you are (at least partially) losing the lower latency and lower overhead. TCP, on the other hand, guarantees delivery and ordering of the data, but has higher latency due to packet acknowledgements and overhead (TCP packets have larger headers). You could say that UDP packets are sort of "separate entities", while TCP is a continuous, unbreaking stream (you need some way to distinguish where one message ends and another begins).
There are games that use both: a separate TCP connection for important data that absolutely must make it to the client, like a player's death, and a UDP connection for "fire and forget" data, like the current position of the player (if a position update does not arrive at another client and the player is moving, there is not much point in resending it, because it is probably already outdated and another update is coming shortly).
After you've selected UDP and/or TCP for transport, you will probably still need a custom protocol that encodes and decodes the data ("payload") the TCP/UDP packets carry. For games, the obvious choice is a binary protocol (vs. text-based protocols like HTTP). A simple binary protocol could, for example, mark the total number of bytes contained in the message, then the type of data, the data-field length and the actual data of the field (repeated for each field in the message); a minimal framing sketch follows. This can be a bit tricky, so for starters you could just serialize and deserialize your message objects, then look at existing protocols or cook your own (it's really not that hard). When you have encoding and decoding of basic data types (Strings, ints, floats...) working and some data moving, you need to design your own high-level protocol: the actual messages your game and server will use to talk to each other. These are messages like "player joined game", "player left game", "player is at this location, facing there and moving this way at this speed", "player died", "player sent a chat message", etc.
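To make the framing idea concrete, here is a minimal sketch of a length-prefixed binary message over a TCP stream; the POSITION_UPDATE type and field layout are invented for the example:

import java.io.*;

public class Wire {
    static final byte POSITION_UPDATE = 1;

    // Layout: [length:int][type:byte][playerId:int][x:float][y:float]
    static void writePosition(DataOutputStream out, int playerId, float x, float y)
            throws IOException {
        out.writeInt(1 + 4 + 4 + 4);  // bytes that follow the length prefix
        out.writeByte(POSITION_UPDATE);
        out.writeInt(playerId);
        out.writeFloat(x);
        out.writeFloat(y);
        out.flush();
    }

    static void readMessage(DataInputStream in) throws IOException {
        int length = in.readInt();   // how many bytes belong to this message
        byte type = in.readByte();
        if (type == POSITION_UPDATE) {
            int playerId = in.readInt();
            float x = in.readFloat();
            float y = in.readFloat();
            // ... update that player's position in the game state
        } else {
            in.skipBytes(length - 1);  // unknown type: skip the rest of the message
        }
    }
}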
In real-time games you have some other challenges too, like predicting the position of a player (remember that the data a client sent could easily be hundreds of milliseconds old by the time it arrives at another player's client, so you need to "guess" where the player is at the time of arrival). Try googling things like "game dead reckoning" and "game network prediction"; Gamasutra also has a pretty good article: Dead Reckoning: Latency Hiding for Networked Games, and there are probably loads of others to be found.
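The basic extrapolation behind dead reckoning fits in a few lines. A sketch, assuming constant velocity between updates (field names are illustrative):

public class DeadReckoning {
    // Last state received from the network, plus the local time it arrived.
    float lastX, lastY, velX, velY;
    long lastUpdateMillis;

    // Estimated position at "now", assuming the velocity hasn't changed since.
    float[] estimate(long nowMillis) {
        float dt = (nowMillis - lastUpdateMillis) / 1000f;  // seconds elapsed
        return new float[] { lastX + velX * dt, lastY + velY * dt };
    }
}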
Another thing you need to think about is the concurrency of the server-side code. Many people will tell you that you need to use Java NIO to achieve good performance and that using a thread per connection is bad, but actually, at least on Linux with the Native POSIX Thread Library (NPTL; pretty much any modern Linux distribution has it out of the box), the situation is reversed. For reference, see: Writing Java Multithreaded Servers - whats old is new. We have servers running 10k+ threads with thousands of users and not choking (of course, at any given time the sheer majority of those threads are sleeping, waiting for client messages or for messages to send to the client).
Lastly, you need to measure how much computing power and bandwidth your game is going to need. For this, you need to measure how much load a given (server?) hardware can take with your software and how much traffic your game generates. This is important for determining how many clients your server can support and how fast a network connection you need (and how much traffic quota per month).
Hope this helped answer some of your questions.
First of all, many multiplayer games use UDP for data transfer. There are a lot of reasons for this, for example lower lag. If your game contains intensive action and needs fast reactions, then you should choose something based on UDP.
There are probably existing solutions for game networking on the web, but it is not that hard to write your own implementation either; if you have problems with that, you will probably have problems with the rest of the game too. There are non-game-oriented libraries and solutions around, even in Java, but they are mostly not designed for something as fast as a game can be. A remote procedure call, for example, can involve costly serialization and generate a much larger packet than you really need. They can be convenient solutions, but they have poor performance for games, as opposed to regular business applications.
For example, if you have 20 players, each has coordinates, state, and of course moving objects. You need at least 20 updates per second to avoid noticeable lag, and that means a lot of traffic: 20 players × 20 updates/s = 400 incoming messages per second containing user input, and 400 outgoing messages containing a lot of information. Do the math. You must pack as many players and as much object data into one packet as you can for optimal performance. This means you will probably want small data packages that serialize to a byte stream easily, and they must contain only viable information. Losing some data is not a problem, but you need to take care that important information reaches its destination; e.g. you don't want players to miss a message about their own death.
I wrote a reliable and usable network "library" in C#, and it was not a huge amount of work, but it is recommended to look around and build it well. This is a good article about the topic; read it. Even if you use an external library, it is good to have a grasp of what it is doing and how you should use it.
For communication between VMs, it doesn't get much simpler than RMI. With RMI you can call methods on an object on an entirely different computer. You can use entire objects as arguments and return values. Thus notifying the server of your move can be as simple as server.sendMove(someMoveObject, somePlayerObject, someOtherObject).
If you're looking for a starting point, this could be a good one.
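A minimal sketch of what that could look like (GameServer, Move and the registry details are all illustrative, and the server side would additionally need to export and bind its implementation):

import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

// The remote interface: methods the client can invoke on the server.
interface GameServer extends Remote {
    void sendMove(Move move, String playerId) throws RemoteException;
}

// Arguments and return values travel by serialization.
class Move implements Serializable {
    int fromX, fromY, toX, toY;
}

// Client side: look up the remote object, then call it like a local one.
class GameClient {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("server.example.com", 1099);
        GameServer server = (GameServer) registry.lookup("GameServer");
        server.sendMove(new Move(), "player-1");
    }
}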
I'm attempting to transfer a large two-dimensional array (17955 x 3) from my server to the client using asynchronous RPC calls. This is taking a very long time, which is especially bad because the data is needed to initialize the application. I've read that using a JSON object might be faster, but I'm not sure how to do the conversion in Java, as I'm pretty new to the language and to GWT, and I don't know if the speed difference is significant. I also read somewhere that I can zip the data, but I only read that in a forum, and I'm not sure it's actually possible, as I couldn't find information about it elsewhere. Is there any way to transfer large amounts of data from server to client? Thanks for your time.
Read this article on adding JSON capabilities to GWT. Regarding compression, this article explains gzipping with GWT.
Also, the size of your array is still very large even with the compression you may achieve with gzipping, which will vary depending on how much data is repeated in your array. You may want to consider logically breaking the array up into multiple RPC calls, if at all possible.
I would recommend revisiting your design if your application needs such a large amount of data to initialize.
As others have pointed out, you should reconsider your design, because even if you somehow solve the data-transfer speed issue, you will likely find other issues waiting for you:
Processing a large amount of data in the browser can be slow.
A lot of data means a lot of used-up memory.
What you can think about is:
Partitioning the data:
How is your user going to cope with a lot of data? The user will probably need some kind of user-interface aid to be able to work with such a huge dataset. If you are going to use paging, tabs or other means to partition the data for the user's consumption, why not load the data on demand? For example, you can load a single page of records if you are using a paging grid, or a single tab's worth of records if you are going to use tabs. Similarly, if you are going to allow filtering of the records, you can set a default filter after the load to keep the data to a minimum.
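For instance, a paging-style GWT-RPC service could look like this; the service and method names are illustrative, each row of the 17955 x 3 array is modeled as a String[3], and in real code each interface would live in its own file:

import com.google.gwt.user.client.rpc.AsyncCallback;
import com.google.gwt.user.client.rpc.RemoteService;
import com.google.gwt.user.client.rpc.RemoteServiceRelativePath;
import java.util.List;

// Synchronous service interface: one page of rows per call instead of all 17955.
@RemoteServiceRelativePath("rows")
interface RowService extends RemoteService {
    List<String[]> getRows(int startRow, int pageSize);
}

// The matching async interface the client actually calls against.
interface RowServiceAsync {
    void getRows(int startRow, int pageSize, AsyncCallback<List<String[]>> callback);
}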
Summarizing the data:
You can also summarize the data on the server if you are not going to show every row to the user. For example, you can initially show a summary for each group of records and let the user drill down into a specific group.