Basically I want a Java, Python, or C++ program running on a server, listening for player instances to join, call, bet, fold, draw cards, etc., and also enforcing a timeout for when players leave or get disconnected.
Each of these actions should be a small request, so that players could either be processes on the same machine talking to a game server, or machines across a network.
Security of messaging is not an issue, this is for learning/research/fun.
My priorities:
Have a good scheme for detecting when players disconnect, but also be able to account for network latencies, etc before booting/causing to lose hand.
Speed. I'm going to be playing millions of these hands as fast as I can.
Run on a shared server instance (I may have limited access to ports or things that need root)
My questions:
Listen on ports or use sockets or HTTP port 80 apache listening script? (I'm a bit hazy on the differences between these).
Any good frameworks to work off of?
Message types? I'm thinking JSON or Protocol Buffers.
How to make it FAST?
Thanks guys - just looking for some pointers and suggestions. I think it is a cool problem with a lot of neat things to learn doing it.
As far as frameworks go, Ginkgo looks promising for building a network service (which is what you're doing). The Python code is very straightforward, and the asynchronicity enabled by gevent lets you do asynchronous things without generally having to worry about callbacks. The gevent core also gives you access to a lot of building blocks.
Rather than having lots of services communicating over ports, you might look into either 1) a good message queue, like RabbitMQ or 0mq, or 2) a distributed coordination server, like Zookeeper.
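If you do go the message-queue route and end up in Java (which the question allows), publishing a player action to RabbitMQ only takes a few lines. This is a hedged sketch using the official com.rabbitmq.client API; the queue name, host, and message shape are invented for illustration:

```java
// Hypothetical sketch: publishing one player action to a RabbitMQ queue using the
// official Java client (com.rabbitmq.client). Queue name, host, and message are made up.
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;

public class ActionPublisher {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");                      // assumes a broker on this machine
        try (Connection conn = factory.newConnection();
             Channel channel = conn.createChannel()) {
            channel.queueDeclare("player-actions", false, false, false, null);
            String msg = "{\"player\":\"p1\",\"action\":\"bet\",\"amount\":20}";
            channel.basicPublish("", "player-actions", null,
                                 msg.getBytes(StandardCharsets.UTF_8));
        }
    }
}
```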
That being said, what you aim to do is difficult, especially if you're not familiar with the basics. It's a worthwhile endeavor to learn about those basics.
Don't worry about speed at first. Get it working, then make it scale. Of course, there are directions you can go that will make it easier to scale in the future. Zookeeper in particular gives you easy-to-implement primitives for scaling horizontally (i.e. multiple workers sharing the load). See the Zookeeper recipe book and the corresponding Python implementations (courtesy of kazoo, a gevent-based client library).
Don't forget that "fast" also means optimizing your own development time, for quicker iterations and less time cursing your development environment. So use Python, which will let you get up and running quickly now, and optimize later if you really truly start to bind on CPU time or memory use. (With this particular application, you're far more likely to bind on network IO.)
Anything else? Maybe a cup of coffee to go with your question :-)
Answering your question from the ground up would require several books worth of text with topics ranging from basic TCP/IP networking to scalable architectures, but I'll try to give you some direction nevertheless.
Questions:
Listen on ports or use sockets or HTTP port 80 apache listening script? (I'm a bit hazy on the differences between these).
I would venture that if you're not clear on the definition of each of these, then designing and implementing a service that will be "playing millions of these hands as fast as I can" is a bit over-reaching. But don't let that stop you; as they say, "ignorance is bliss."
Any good frameworks to work off of?
I think your project is a good candidate for Node.js. The main reason is that Node.js is relatively scalable and good at hiding the complexity required for that scalability. There are downsides to Node.js; just search for 'Node.js scalability criticism'.
The main point against Node.js, as opposed to a more general-purpose framework, is that scalability is inherently difficult, and Node.js, being so high-level and specific, leaves you fewer options for solving tough problems.
The other drawback is that Node.js means JavaScript, not the Java or Python you would prefer.
Message types? I'm thinking JSON or Protocol Buffers.
I don't think there's going to be a lot of traffic between client and server, so it doesn't really matter. I'd go with JSON just because it is more prevalent.
How to make it FAST?
The real question is how to make it scalable. Running human vs human card games is not computationally intensive, so you're probably going to run out of I/O capacity before you reach any computational limit.
Overcoming these limitations is done by spreading the load across machines. The common way to do this in multi-player games is to have a list server that provides links to identical game servers, with each server having a predefined number of slots available for players.
This is a variation of a broker-workers architecture, where the broker machine assigns clients to worker machines based on how busy they are. In gaming, users want to be able to select their server so they can play with their friends.
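To make the broker idea concrete, here is a minimal hypothetical sketch in Java (one of the languages the question allows). The server addresses, slot counts, and the data structure are invented, not part of any particular framework:

```java
// Hypothetical broker sketch: hand new clients the least-busy game server with a free slot.
// Server addresses, slot counts, and the Map layout are made up for illustration.
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class ListServer {
    private static final int SLOTS_PER_SERVER = 1000;
    private final Map<String, Integer> playersPerServer = new HashMap<>();

    public ListServer() {
        playersPerServer.put("game1.example.com:9000", 0);
        playersPerServer.put("game2.example.com:9000", 0);
    }

    /** Returns the address of the least-loaded server that still has a free slot, if any. */
    public synchronized Optional<String> assignServer() {
        Optional<String> best = playersPerServer.entrySet().stream()
                .filter(e -> e.getValue() < SLOTS_PER_SERVER)
                .min(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
        best.ifPresent(addr -> playersPerServer.merge(addr, 1, Integer::sum));
        return best;
    }

    public synchronized void playerLeft(String addr) {
        playersPerServer.merge(addr, -1, Integer::sum);
    }
}
```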
Related:
Have a good scheme for detecting when players disconnect, but also be able to account for network latencies, etc before booting/causing to lose hand.
Since this is on human time scales (seconds as opposed to milliseconds), the client should send keepalives, say every 10 seconds, with, say, a 30-second session timeout.
The keepalives would be JSON messages in your application protocol, not HTTP-level keepalives, which are lower level and handled by the framework.
The framework itself should provide you with HTTP 1.1 connection management/pooling, which allows several HTTP exchanges (request/response) to go through the same connection without requiring the client to stay connected at all times. This is a good compromise between reliability and speed and should be good enough for turn-based card games.
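To illustrate the keepalive/timeout scheme, here is a rough Java sketch. The JSON message shape, the class names, and the onPlayerTimeout hook are invented; only the 10-second/30-second figures come from the paragraph above:

```java
// Rough sketch of the keepalive/session-timeout idea. Names and JSON shape are invented;
// only the 10 s keepalive interval and 30 s timeout come from the answer above.
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SessionTracker {
    // Example keepalive a client would send roughly every 10 seconds:
    // {"type":"keepalive","playerId":"p42"}
    private static final long TIMEOUT_MS = 30_000;

    private final Map<String, Long> lastSeen = new ConcurrentHashMap<>();
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public SessionTracker() {
        // Sweep for dead sessions every 5 seconds (the sweep interval is arbitrary).
        scheduler.scheduleAtFixedRate(this::sweep, 5, 5, TimeUnit.SECONDS);
    }

    /** Call this whenever any message (keepalive or game action) arrives from a player. */
    public void touch(String playerId) {
        lastSeen.put(playerId, System.currentTimeMillis());
    }

    private void sweep() {
        long now = System.currentTimeMillis();
        lastSeen.forEach((playerId, seen) -> {
            if (now - seen > TIMEOUT_MS) {
                lastSeen.remove(playerId);
                onPlayerTimeout(playerId);   // e.g. fold the hand, free the seat
            }
        });
    }

    protected void onPlayerTimeout(String playerId) {
        System.out.println("Player timed out: " + playerId);
    }
}
```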
Honestly, I'd start with classic LAMP. Take a stock Apache server and a MySQL database, and put your Python scripts in the cgi-bin directory. The fact that they're sending and receiving JSON instead of HTML doesn't make much difference.
This is obviously not going to be the most flexible or scalable solution, of course, but it forces you to confront the actual problems as early as possible.
The first problem you're going to run into is game state. You claim there is no shared state, but that's not right—the cards in the deck, the bets on the table, whose turn it is—that's all state, shared between multiple players, managed on the server. How else could any of those commands work? So, you need some way to share state between separate instances of the CGI script. The classic solution is to store the state in the database.
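Since the question also lists Java, here is a hedged sketch of the "state lives in the database" idea using plain JDBC; the table, columns, and connection details are made up, and the same pattern applies to a Python CGI script talking to MySQL:

```java
// Sketch of "shared game state lives in the database" using plain JDBC.
// Table name, columns, URL, and credentials are invented for illustration.
// A real server would also need a transaction with row locking so that two
// concurrent requests don't clobber each other's writes.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class GameStateDao {
    private final String url = "jdbc:mysql://localhost/poker";

    public String loadState(long gameId) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT state_json FROM games WHERE game_id = ?")) {
            ps.setLong(1, gameId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("state_json") : null;
            }
        }
    }

    public void saveState(long gameId, String stateJson) throws SQLException {
        try (Connection conn = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = conn.prepareStatement(
                     "UPDATE games SET state_json = ? WHERE game_id = ?")) {
            ps.setString(1, stateJson);
            ps.setLong(2, gameId);
            ps.executeUpdate();
        }
    }
}
```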
Of course you also need to deal with user sessions in the first place. The details depend on which session-management scheme you pick, but the big problem is how to propagate a disconnect/timeout from the lower level up to the application level. What happens if someone puts $20 on the table and then disconnects? You have to think through all of the possible use cases.
Next, you need to think about scalability. You want millions of games? Well, if there's a single database with all the game state, you can have as many web servers in front of it as you want—John Doe may be on server1 while Joe Schmoe is on server2, but they can be in the same game. On the other hand, you can have a separate database for each server, as long as you have some way to force people in the same game to meet on the same server. Which one makes more sense? Either way, how do you load-balance between the servers? (You not only want to keep them all busy, you want to avoid the situation where 4 players are all ready to go, but they're on 3 different servers, so they can't play each other…)
The end result of this process is going to be a huge mess of a server that runs at 1% of the capacity you hoped for, that you have no idea how to maintain. But you'll have thought through your problem space in more detail, and you'll also have learned the basics of server development, both of which are probably more important in the long run.
If you've got the time, I'd next throw the whole thing out and rewrite everything from scratch by designing a custom TCP protocol, implementing a server for it in something like Twisted, keeping game state in memory, and writing a simple custom broker instead of a standard load balancer.
Related
I have the following situation:
I have 2 JVM processes (really 2 Java processes running separately, not 2 threads) running on a local machine. Let's call them ProcessA and ProcessB.
I want them to communicate (exchange data) with one another (e.g. ProcessA sends a message to ProcessB to do something).
Right now I work around this by writing to a temporary file, and these processes periodically scan the file to pick up messages. I don't think this solution is very good.
What would be a better alternative to achieve what I want?
Multiple options for IPC:
Socket-Based (Bare-Bones) Networking
not necessarily hard, but:
might be verbose for not much,
might offer more surface for bugs, as you write more code.
you could rely on existing frameworks, like Netty
RMI
Technically, that's also network communication, but that's transparent for you.
Fully-fledged Message Passing Architectures
usually built on either RMI or network communications as well, but with support for complicated conversations and workflows
might be too heavy-weight for something simple
frameworks like ActiveMQ or JBoss Messaging
Java Management Extensions (JMX)
more meant for JVM management and monitoring, but could help to implement what you want if you mostly want to have one process query another for data, or send it some request for an action, if they aren't too complex
also works over RMI (amongst other possible protocols)
not so simple to wrap your head around at first, but actually rather simple to use
File-sharing / File-locking
that's what you're doing right now
it's doable, but comes with a lot of problems to handle
Signals
You can simply send signals to your other process.
However, it's fairly limited and requires you to implement a translation layer (it is doable, though it's more of a crazy idea to toy with than anything serious).
Without more details, a bare-bones network-based IPC approach seems the best, as it's the:
most extensible (in terms of adding new features and workflows to your app)
most lightweight (in terms of memory footprint for your app)
most simple (in terms of design)
most educative (in terms of learning how to implement IPC). You mentioned "socket is hard" in a comment, but it really is not, and it is something worth working on (a minimal sketch follows below).
That being said, based on your example (simply requesting the other process to do an action), JMX could also be good enough for you.
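To show that the bare-bones socket option really is approachable, here is a minimal sketch; the port number and message format are arbitrary. One JVM listens, the other connects and sends a request:

```java
// Minimal socket IPC sketch (port and message format are arbitrary).
// Run ProcessB first in one JVM, then ProcessA in another.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

class ProcessB {                       // the "server" side: run this first
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(5000)) {
            while (true) {
                try (Socket client = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(client.getInputStream(), "UTF-8"))) {
                    System.out.println("Received: " + in.readLine());
                }
            }
        }
    }
}

class ProcessA {                       // the "client" side: sends one request and exits
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("localhost", 5000);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true)) {
            out.println("doSomething");
        }
    }
}
```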
I've added a library on github called Mappedbus (http://github.com/caplogic/mappedbus) which enables two (or many more) Java processes/JVMs to communicate by exchanging messages. The library uses a memory-mapped file and makes use of fetch-and-add and volatile reads/writes to synchronize the different readers and writers. I've measured the throughput between two processes using this library at 40 million messages/s, with an average latency of 25 ns for reading/writing a single message.
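Mappedbus's actual fetch-and-add synchronization is more involved than a snippet can show, but the underlying mechanism, a memory-mapped file that both JVMs open, looks roughly like this. The file name, offset, and single-int "message" are made up, and real use needs proper synchronization:

```java
// Bare illustration of the memory-mapped-file mechanism: two JVMs map the same file
// and read/write shared memory. File name, offsets, and the single-int "message" are
// invented; a real message bus needs proper synchronization on top of this.
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MappedFileDemo {
    public static void main(String[] args) throws Exception {
        try (RandomAccessFile file = new RandomAccessFile("/tmp/ipc-demo.dat", "rw");
             FileChannel channel = file.getChannel()) {
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, 4096);
            if (args.length > 0 && args[0].equals("write")) {
                buf.putInt(0, 42);                              // writer JVM: java MappedFileDemo write
                System.out.println("Wrote 42 at offset 0");
            } else {
                System.out.println("Read: " + buf.getInt(0));   // reader JVM: java MappedFileDemo
            }
        }
    }
}
```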
What you are looking for is inter-process communication. Java provides a simple IPC framework in the form of Java RMI API. There are several other mechanisms for inter-process communication such as pipes, sockets, message queues (these are all concepts, obviously, so there are frameworks that implement these).
I think in your case Java RMI or a simple custom socket implementation should suffice.
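For reference, a minimal Java RMI setup looks roughly like this; the remote interface, registry port, and binding name are invented for illustration:

```java
// Minimal Java RMI sketch. The remote interface, registry port, and binding name
// are invented for illustration.
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

interface MessageService extends Remote {
    String deliver(String message) throws RemoteException;
}

class RmiServer implements MessageService {          // ProcessB exposes the service
    public String deliver(String message) throws RemoteException {
        System.out.println("Got: " + message);
        return "ack";
    }

    public static void main(String[] args) throws Exception {
        MessageService stub =
                (MessageService) UnicastRemoteObject.exportObject(new RmiServer(), 0);
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.rebind("MessageService", stub);
        System.out.println("RMI server ready");
    }
}

class RmiClient {                                     // ProcessA calls it
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("localhost", 1099);
        MessageService service = (MessageService) registry.lookup("MessageService");
        System.out.println(service.deliver("do something"));
    }
}
```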
Sockets with DataInputStream/DataOutputStream, to send Java objects back and forth. This is easier than using a disk file, and much easier than Netty.
I tend to use JGroups to form local clusters between processes. It works for nodes (aka processes) on the same machine, within the same JVM, or even across different servers.
Once you understand the basics it is easy to work with, and having the option to run two or more processes in the same JVM makes it easy to test them.
The overhead and latency are minimal if both are on the same machine (usually just a TCP round trip of upwards of about 100 ns per action).
A socket may be a better choice, I think.
Back in 2004 I implemented code which did the job with sockets. Since then I have searched many times for a better solution, because the socket approach triggers firewalls and my clients worry about that. There is no better solution so far. The client must serialize your data and send it, and the server must receive it and deserialize it.
It is easy.
I suppose this is not possible, but I am looking for the best way to separate the different layers of my service while still being able to access them quickly, without the overhead of IPC/RMI.
The main programming language I am using is Java, but I can use C++ if required.
What we have right now is a server that hosts the database and access control, and we use RMI for consumers to request data. This is slow and doesn't scale very well.
We need performance and scalability, which we don't have at the moment.
What we are thinking of is a layered architecture with the database at the base and access control on top of it, along with a notification bus to notify clients of changes in the database.
The main problem is the overhead of communication that we want to avoid/or minimize.
Is there any magic thread that can run in two contexts (switching context) and share information that way? I know the short answer is probably no, but what are the options?
Update
We are currently using Java RMI.
Our base layer will provide an API that can be used to create plugins that run on top, so we don't have a fixed set of collectors/consumers. We can have 5-6 collectors running and the same number of consumers.
We can have up to 1000 consumers.
My first suggestion is that you should buy a book (or find an online tutorial) on building scalable applications, because you seem to be pretty lost.
Sharing a thread between processes doesn't make sense at any level - it is meaningless, but you can share the data that the thread accesses, which is probably what you want.
The fastest method will be C-based IPC (e.g., shared memory, semaphores, etc.; see shmget). You say you want to avoid the overhead of IPC, but really, it isn't going to get any faster than that.
But why do you want multiple processes? If you are worried about the overhead of communicating between processes, just run your threads in one process instead. There is no reason your different layers have to be in different processes.
But anyway, I am not convinced that your original statement that RMI is slow and doesn't scale is completely correct. If it is not scaling, you are probably not using the right framework. Maybe the issue is that you have only one RMI endpoint on the server. Have you considered a J2EE system with stateless session beans?
Without knowing about your requirements, it is hard to say.
It is not possible in general to share a thread between two processes, due to OS design. The problem of sharing data between two or more processes is usually solved by sharing files, sharing a database, or sharing messages (which in turn can be synchronous or asynchronous), by having processes communicate via pipes (say, in Linux), or even by sharing memory. Your scenario description is not very precise; you need to describe all the processes, how information is supposed to flow, what triggers the flow, etc.
Most likely you need a high-performance messaging library; https://github.com/real-logic/Aeron/ is one. But to get a precise answer you would need to describe better exactly what overhead you want to minimize.
If your goal is to notify users, you should consider publish/subscribe messaging (pub/sub). There are many middleware vendors out there that provide this architecture, though most are expensive in production scenarios. For open source, check out http://redis.io/topics/pubsub. (No affiliation.)
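As a rough illustration, assuming the Jedis client library on the Java side (the channel name and message are made up):

```java
// Sketch of Redis pub/sub from Java, assuming the Jedis client library.
// Channel name and message content are invented for illustration.
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisPubSub;

public class PubSubDemo {
    public static void main(String[] args) {
        if (args.length > 0 && args[0].equals("publish")) {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.publish("db-changes", "row 42 updated");   // notify subscribers
            }
        } else {
            try (Jedis jedis = new Jedis("localhost", 6379)) {
                jedis.subscribe(new JedisPubSub() {              // blocks, dispatching messages
                    @Override
                    public void onMessage(String channel, String message) {
                        System.out.println(channel + ": " + message);
                    }
                }, "db-changes");
            }
        }
    }
}
```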
I am wondering how fast client side Javascript is compared to server side Java in terms of raw computational power.
For instance, sorting. Should it all be done server side if possible? And how about iterating through a collection?
The answer is very complex and depends on each specific situation.
A server is generally going to be orders of magnitude more powerful than a client machine; and managed code is generally much faster than scripting.
However, the client machine also usually has a lot of spare computational power that isn't being used, while the server could be handling requests for thousands of users. In that case it is often preferable to offload as much of the work as possible to the client.
You must understand the needs and expectations of your users for each individual piece of functionality in your application and look at the relative load versus development cost for your organization to split development between two environments and figure out what works best. For example, your users probably expect that your site does not freeze their browser or cause unfortunate "this web page is eating your computer" dialogs, so your client scripts should be written intelligently. That's not to say you can't do a ton of work on the client (you can), you just have to be smart about how you do it and remember it blocks the UI thread.
Server side Java will certainly run much faster, you'll need to benchmark for your particular case but you're probably looking at a 10-20x speed advantage.
However that probably doesn't matter much: regardless of raw computational power I would still recommend trying to do as much calculation as possible client side in Javascript for the following reasons:
Even 20x slower is still likely to be unnoticeable to the user
When you factor in the latency of client to server communications, doing it locally on the client will almost certainly be more responsive to the user
Client machines are probably not CPU-bound, so executing some additional code on them is effectively free
If you can offload work from the server to the client, you will need less server side infrastructure, which can get expensive when you need to start scaling up
Having lots of client to server communications is likely to complicate your architecture and make it harder to develop new functionality in the future.
Doing calculations on the client can often reduce bandwidth requirements
There are of course good reasons to keep things on the server e.g.:
Security implications (if client can't be trusted)
Very large data set needed (would take too long to download to client)
Need to exploit massively parallel calculations (e.g. for Google search)
Avoid need to allow for differences in clients (e.g. Javascript versions)
But if these don't apply then I would try to push things to the client as much as possible.
The big difference here is not the speed of the VMs. The difference is that a single server has to serve dozens or hundreds of clients. Another factor: round trips to the server add a lot of overhead, so you want to minimize them.
Basically, anything that's not security-critical and can be done on the client easily, should be done on the client.
These two things cannot be compared side-by-side.
There are far too many factors, and the languages are far too different, and serve far too different purposes to effectively compare their speed.
You really need to decide where you do your calculations on a case-by-case basis.
If the client machine is required to do too much work, it will degrade the performance of the app, but if the server is asked to do too much, it can slow down the response time for everybody.
Javascript is way fast enough to do sorting of data on the client. I have used it with datasets of 5,000 rows, 11 fields per row, and used that to sort tables on the client (with pagination). These sorts used compare functions to sort the rows by field and data type. The actual Javascript part of the process took something on the order of the high tens of milliseconds (~80 if I recall).
I would rather push that kind of mundane task down to the client any day rather than clog up a very busy server with it. YMMV.
Don't mix up Java with JavaScript; the names are similar but they are completely different languages.
JavaScript is a client-side, interpreted language; Java is a bytecode language running inside a virtual machine, with much more optimization for handling large amounts of data.
Given that servers running Java services normally have much more power (faster CPUs and disk I/O, more RAM), computing in Java is, in my experience, always faster.
JavaScript can be used on the client side if you want to compute small amounts of data (like sorting just a few hundred elements).
All in all you will have to decide which way is faster: computing and preparing the data on the server and transmitting it to the client (where the transfer over the Internet is by far the biggest slowdown), or computing the data directly on the client side via JavaScript.
My suggestion is: if none of the data you want is already on the client side, it makes sense to compute the results on the server and transmit the already-prepared data to the client. But if the data is already on the client side and there are no more than a few hundred items, the better user experience is to compute them in the user's browser.
It really depends on the boxes you are running the code on, how big the data is, and their availability to do the work, among other factors; plus you have to consider that sending data over the wire is expensive. You have to balance what you are going to do with that, and whether it's better to spend more time processing things beforehand and leave the resources free for the heavy stuff, rather than sending data back and forth.
There is no specific answer. It depends on the power of your client and the size of the computation. Is it a smartwatch, a smartphone? If you can't guarantee the power of your client, I would leave the computation to the server.
I have to write an architecture case study, but there are some things that I don't know, so I'd like some pointers on the following:
The website must handle 5k simultaneous users.
The backend is composed of a commercial software package, some web services, some message queues, and a database.
I want to recommend using Spring for the backend, to deal with the different elements and to expose some REST services.
I also want to recommend Wicket for the front end (not the point here).
What I don't know is: must I install the front and the back on the same Tomcat server or on two different ones? I am tempted to put two servers in front, with a load balancer (no need for session replication in this case). But if I have two front servers, must I have two back servers? I don't want to create some kind of bottleneck.
Based on what I read on this blog, a really huge load is handled by a single Tomcat for the first website mentioned. But I cannot find any other info on this, so I can't tell whether it is plausible.
If you can enlighten me so I can go on with my case study, that would be really helpful.
Thanks :)
There are probably two main reasons for having multiple servers for each tier: high availability and performance. If you're not doing this for HA reasons, then the unfortunate answer is 'it depends'.
Having two front end servers doesn't force you to have two backend servers. Is the backend going to be under a sufficiently high load that it will require two servers? It will depend a lot on what it is doing, and would be best revealed by load testing and/or profiling. For a site handling 5000 simultaneous users, though, my guess would be yes...
It totally depends on your application. How heavy are your sessions? (Wicket is known for putting a lot in the session.) How heavy are your backend processes?
It might be a better idea to come up with something that can scale: a load balancer with the ability to keep adding new servers as the load grows.
Measurement is the best thing you can do. Create JMeter scripts and find out where your app breaks. Build a plan from there.
To expand on my comment: think through the typical process by which a client makes a request to your server:
it initiates a connection, which has an overhead for both client and server;
it makes one or more requests via that connection, holding on to resources on the server for the duration of the connection;
it closes the connection, releasing application resources, but generally still hogging a port number on your server for some number of seconds after the connection is closed.
So in designing your architecture, you need to think about things such as:
how many connections can you actually hold open simultaneously on your server? If you're using Tomcat or another standard server with one thread per connection, you may have issues with 5,000 simultaneous threads; an NIO-based architecture, on the other hand, can handle thousands of connections without needing one thread per connection (see the sketch after this list). If you're in a shared environment, you may simply not be able to have that many open connections;
if clients don't hold their connections open for the duration of a "session", what is the right balance between number of requests and/or time per connection, bearing in mind the overhead of making and closing a connection (initialisation of encrypted session if relevant, network overhead in creating the connection, port "hogged" for a while after the connection is closed)
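To make the NIO point from the first item concrete, here is a bare single-threaded selector loop; the port and buffer size are arbitrary and error handling is omitted:

```java
// Bare single-threaded NIO selector loop, illustrating how many connections can be
// served without one thread per connection. Port and buffer size are arbitrary.
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

public class NioEchoServer {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(8080));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);

        ByteBuffer buffer = ByteBuffer.allocate(1024);
        while (true) {
            selector.select();                                  // wait for any ready channel
            Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
            while (keys.hasNext()) {
                SelectionKey key = keys.next();
                keys.remove();
                if (key.isAcceptable()) {                       // new client connection
                    SocketChannel client = server.accept();
                    client.configureBlocking(false);
                    client.register(selector, SelectionKey.OP_READ);
                } else if (key.isReadable()) {                  // data from an existing client
                    SocketChannel client = (SocketChannel) key.channel();
                    buffer.clear();
                    int read = client.read(buffer);
                    if (read == -1) {
                        client.close();                         // client went away
                    } else {
                        buffer.flip();
                        client.write(buffer);                   // echo it back
                    }
                }
            }
        }
    }
}
```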
Then more generally, I'd say consider:
in whatever architecture you go for, how easily can you re-architecture/replace specific components if they prove to be bottlenecks?
for each "black box" component/framework that you use, what actual problem does it solve for you, and what are its limitations? (Don't just use Tomcat because your boss's mate's best man told them about it down the pub...)
I would also agree with what other people have said-- at some point you need to not be too theoretical. Design something sensible, then run a test bed to see how it actually copes with your expected volumes of data. (You might not have the whole app built, but you can start making predictions about "we're going to have X clients sending Y requests every Z minutes, and p% of those requests will take n milliseconds and write r rows to the database"...)
At what point is it better to switch from java.net to java.nio? .net (not the Microsoft entity) is easier to understand and more familiar, while nio is scalable, and comes with some extra nifty features.
Specifically, I need to make a choice for this situation: We have one control center managing hardware at several remote sites (each site with one computer managing multiple hardware units (a transceiver, TNC, and rotator)). My idea was to write a server app on each machine that acts as a gateway from the control center to the radio hardware, with one socket for each unit. From my understanding, NIO is meant for one server, many clients, but what I'm thinking of is one client, many servers.
I suppose a third option is to use MINA, but I'm not sure if that's throwing too much at a simple problem.
Each remote server will have up to 8 connections, all from the same client (to control all the hardware, and separate TX/RX sockets). The single client will want to connect to several servers at once, though. Instead of putting each server on different ports, is it possible to use channel selectors on the client side, or is it better to go with multi-threaded I/O on the client side of things and configure the servers differently?
Actually, since the remote machines serve only to interact with other hardware, would RMI or IDL/CORBA be a better solution? Really, I just want to be able to send commands and receive telemetry from the hardware, and not have to make up some application layer protocol to do it.
Avoid NIO unless you have a good reason to use it. It's not much fun and may not be as beneficial as you would think. You may get better scalability once you are dealing with tens of thousands of connections, but at lower numbers you'll probably get better throughput with blocking IO. As always though, make your own measurements before committing to something you might regret.
Something else to consider is that if you want to use SSL, NIO makes it extremely painful.
Scalability will probably drive your choice of package. java.net will require one thread per socket. Coding it will be significantly easier. java.nio is much more efficient, but can be hairy to code around.
I would ask yourself how many connections you expect to be handling. If it's relatively few (say, < 100), I'd go with java.net.
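For contrast with NIO, the java.net "one thread per socket" shape mentioned above looks roughly like this; the port is arbitrary, and for the handful of connections in your setup it is perfectly adequate:

```java
// The java.net "one thread per socket" shape, for contrast with NIO.
// Port is arbitrary; fine for the small number of connections described in the question.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

public class BlockingGatewayServer {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(4000)) {
            while (true) {
                Socket socket = server.accept();             // blocks until a client connects
                new Thread(() -> handle(socket)).start();    // one thread per connection
            }
        }
    }

    private static void handle(Socket socket) {
        try (Socket s = socket;
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println("ack: " + line);                 // e.g. forward a command to hardware
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
```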
There is almost no reason to write this kind of networking code from scratch now. Packages like netty.io will almost always get you more reliable and flexible code with fewer lines of code than a hand-crafted solution will. Also, with Netty, you can get SSL support w/o complicating your implementation at all. Libraries like netty also obviate the "async vs threads" question almost entirely, gives you good performance, and still lets you tweak the threading model as needed.
The number of connections you're talking about tells me you should use java.net. Really, there's no reason to complexify your task with non-blocking I/O. (Unless your remote systems are underpowered, but then why are you using Java on them?)
Take a look at Apache's XML-RPC package. It's easy to use, completely hides the network stuff from you, and works over good ol' HTTP. No need to worry about protocol issues ... it'll all look like method calls to you, on both ends.
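As a rough sketch of what that looks like with the Apache XML-RPC 3.x client API (the server URL and "rotator.setAzimuth" method name are invented for illustration):

```java
// Sketch of a call with the Apache XML-RPC client (org.apache.xmlrpc).
// The server URL and the "rotator.setAzimuth" method name are invented.
import java.net.URL;
import org.apache.xmlrpc.client.XmlRpcClient;
import org.apache.xmlrpc.client.XmlRpcClientConfigImpl;

public class RotatorControl {
    public static void main(String[] args) throws Exception {
        XmlRpcClientConfigImpl config = new XmlRpcClientConfigImpl();
        config.setServerURL(new URL("http://remote-site-1:8080/xmlrpc"));

        XmlRpcClient client = new XmlRpcClient();
        client.setConfig(config);

        // Looks like a method call; the library handles the HTTP and XML marshalling.
        Object result = client.execute("rotator.setAzimuth", new Object[] { 180 });
        System.out.println("Telemetry/result: " + result);
    }
}
```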
Given the small number of connections involved, java.net sounds like the right solution.
Other posters talked about using XML-RPC. This is a good choice if the volumes of data being shipped are small; however, I have had bad experiences with XML-based protocols when writing inter-process communications that ship large amounts of data (e.g. large requests/responses, or frequent small amounts of data). The cost of XML parsing is typically orders of magnitude higher than more optimised wire formats (e.g. ASN.1).
For low volume control applications the simplicity of XML-RPC should outweigh the performance costs. For high volume data communications it may be better to use a more efficient wire protocol.