I am wondering how fast client-side JavaScript is compared to server-side Java in terms of raw computational power.
For instance, sorting. Should it all be done server side if possible? And how about iterating through a collection?
The answer is very complex and depends on each specific situation.
A server is generally going to be orders of magnitude more powerful than a client machine, and managed code is generally much faster than scripting.
However, the client machine also usually has a lot of spare computational power that isn't being used, while the server could be handling requests for thousands of users. So offloading whatever work you can to the client is often preferable.
You must understand the needs and expectations of your users for each individual piece of functionality in your application, and weigh the relative load against the cost to your organization of splitting development between two environments, to figure out what works best. For example, your users probably expect that your site does not freeze their browser or trigger unfortunate "this web page is eating your computer" dialogs, so your client scripts should be written intelligently. That's not to say you can't do a ton of work on the client (you can); you just have to be smart about how you do it and remember that it blocks the UI thread.
Server-side Java will certainly run much faster; you'll need to benchmark for your particular case, but you're probably looking at a 10-20x speed advantage.
However, that probably doesn't matter much: regardless of raw computational power, I would still recommend trying to do as much calculation as possible client side in JavaScript, for the following reasons:
Even 20x slower is still likely to be unnoticeable to the user
When you factor in the latency of client to server communications, doing it locally on the client will almost certainly be more responsive to the user
Client machines are probably not CPU-bound, so executing some additional code on them is effectively free
If you can offload work from the server to the client, you will need less server side infrastructure, which can get expensive when you need to start scaling up
Having lots of client to server communications is likely to complicate your architecture and make it harder to develop new functionality in the future.
Doing calculations on the client can often reduce bandwidth requirements
There are of course good reasons to keep things on the server e.g.:
Security implications (if client can't be trusted)
Very large data set needed (would take too long to download to client)
Need to exploit massively parallel calculations (e.g. for Google search)
Avoid need to allow for differences in clients (e.g. Javascript versions)
But if these don't apply then I would try to push things to the client as much as possible.
The big difference here is not the speed of the VMs. The difference is that a single server has to serve dozens or hundreds of clients. Another factor: round trips to the server add a lot of overhead, so you want to minimize them.
Basically, anything that's not security-critical and can be done on the client easily, should be done on the client.
These two things cannot be compared side-by-side.
There are far too many factors, and the languages are too different and serve too different purposes, to compare their speed meaningfully.
You really need to decide where you do your calculations on a case-by-case basis.
If the client machine is required to do too much work, it will degrade the performance of the app, but if the server is asked to do too much, it can slow down the response time for everybody.
JavaScript is easily fast enough to sort data on the client. I have used it with datasets of 5,000 rows, 11 fields per row, to sort tables on the client (with pagination). These sorts used compare functions so that the rows would be sorted by field and datatype. The actual JavaScript part of the process took something on the order of high tens of milliseconds (~80 if I recall).
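For a sense of what those compare functions look like, here is a minimal sketch (the row shape, field names, and timing harness are invented for illustration):

```javascript
// A compare function keyed by field name and datatype; sort 5,000 rows
// client-side and time just the JavaScript part.
function makeComparator(field, type) {
  return function (a, b) {
    if (type === 'number') return a[field] - b[field];
    if (type === 'date') return new Date(a[field]) - new Date(b[field]);
    return String(a[field]).localeCompare(String(b[field]));
  };
}

var rows = [];
for (var i = 0; i < 5000; i++) {
  rows.push({ name: 'item' + i, price: Math.random() * 100 });
}

var t0 = performance.now();
rows.sort(makeComparator('price', 'number'));
console.log('sorted in ' + (performance.now() - t0).toFixed(1) + ' ms');
```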
I would rather push that kind of mundane task down to the client any day rather than clog up a very busy server with it. YMMV.
Don't mix up Java with JavaScript - the names are similar but they are completely different languages.
JavaScript is a client-side, interpreted language; Java is a byte-code language running inside a virtual machine, with much more optimization for handling large amounts of data.
Given that servers running Java services normally have much more power (faster CPUs and disk I/O, more RAM), computing in Java is, in my experience, always faster.
JavaScript can be used on the client side if you only need to compute small amounts of data (like sorting just a few hundred elements).
All in all, you will have to decide which way is faster: computing and preparing the data on the server and transmitting it to the client (where transmission over the internet is by far the biggest slowdown), or computing the data directly on the client side via JavaScript.
My suggestion: if the data you want is not already on the client side, it makes sense to compute it on the server and transmit the prepared data to the client. But if the data is already on the client side and there are no more than a few hundred items, the better user experience is to compute it in the user's browser.
It really depends on the boxes you are running the code on, how big the data is, and other factors; plus, you have to remember that sending data over the wire is expensive. You have to balance what you are going to do with that data and whether it's better to spend more time processing it up front, freeing resources for the heavy stuff, rather than playing ping-pong with data back and forth.
There is no specific answer. It depends on the power of your client and the size of the computation. Is it a smartwatch? A smartphone? If you can't guarantee the power of your client, I would leave the computation to the server.
Related
Let's assume that I have a JavaScript front-end (Angular.js, for example), a Java-based back-end (Spring, running on Tomcat, for example) and a database management system (SAP HANA In-Memory, in my case). For example, I have graphs that can change relatively quickly.
I am wondering what an efficient and fast architecture could look like. Do you usually send a whole collection of objects to the UI or do you just send deltas?
In my case, data consistency on the UI is very important in order for the application to work properly, but low-latency as well, especially when it comes to data merges.
When it comes to consistency, I often tend to do a SELECT from the database on an insert and read the whole object collection again, but my concerns are that this does not scale.
Is there a generic approach to that problem or even existing frameworks?
Edit:
Currently, it is around 300 objects with a couple of integer attributes and cross-references that can change and rearrange on a millisecond timescale, but it could go up to 10,000 in the future. My challenge here is the communication between front-end and back-end, so that the front-end always has a consistent data set in real time.
How close is the client to the server? Is it a mile/km away or hundreds/thousands of miles away? Is the client on the internet or is it on a high-performance VPN? Are you close to the backbone or dozens of hops away? You're not normally going to consistently get 1 millisecond latency on the web if you're trusting the general internet.
If you are on an internal company network and the client is physically close to the server, e.g., same machine, same local network, you can get single digit ms latency with WebSocket (I personally have gotten 3-4 ms across internal data centers at a big investment bank).
Don't optimize too early. That's usually a bad thing.
Although with any high-performance UI, it's always good to just send the deltas.
You may want to consider some sort of event mechanism to reduce polling of the data source, as sketched below; then you would only update the data when it actually changed.
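A rough sketch of that delta-plus-events idea on the client, using the standard browser WebSocket API; the endpoint and the { id, op, attrs } message shape are invented for illustration:

```javascript
// Keep a local collection and apply only the deltas the server pushes,
// instead of polling and re-reading the whole object collection.
var objects = {};
var socket = new WebSocket('wss://example.com/updates'); // hypothetical endpoint

socket.onmessage = function (event) {
  var delta = JSON.parse(event.data); // e.g. { id: 17, op: 'update', attrs: {...} }
  if (delta.op === 'remove') {
    delete objects[delta.id];
  } else {
    // 'add' or 'update': merge only the changed attributes
    objects[delta.id] = Object.assign(objects[delta.id] || {}, delta.attrs);
  }
  // re-render only the affected part of the UI here
};
```

With ~300 objects (and even 10,000), attribute-level deltas like this keep each message tiny; the hard part is ordering and recovery after a dropped connection, which is where a sequence number per message helps.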
Basically, I want a Java, Python, or C++ script running on a server, listening for player instances to join, call, bet, fold, draw cards, etc., and also handling a timeout for when players leave or get disconnected.
I want each of these actions to be a small request, so that players could either be processes on the same machine talking to a game server, or machines across the network.
Security of messaging is not an issue, this is for learning/research/fun.
My priorities:
Have a good scheme for detecting when players disconnect, but also be able to account for network latencies, etc., before booting a player or causing them to lose the hand.
Speed. I'm going to be playing millions of these hands as fast as I can.
Run on a shared server instance (I may have limited access to ports or things that need root)
My questions:
Listen on ports or use sockets or HTTP port 80 apache listening script? (I'm a bit hazy on the differences between these).
Any good frameworks to work off of?
Message types? I'm thinking JSON or Protocol Buffers.
How to make it FAST?
Thanks guys - just looking for some pointers and suggestions. I think it is a cool problem with a lot of neat things to learn doing it.
As far as frameworks go, Ginkgo looks promising for building a network service (which is what you're doing). The Python code is very straightforward, and the asynchronicity enabled by gevent lets you write asynchronous code without generally having to worry about callbacks. The gevent core also gives you access to a lot of building blocks.
Rather than having lots of services communicating over ports, you might look into either 1) a good message queue, like RabbitMQ or 0mq, or 2) a distributed coordination server, like Zookeeper.
That being said, what you aim to do is difficult, especially if you're not familiar with the basics. It's a worthwhile endeavor to learn about those basics.
Don't worry about speed at first. Get it working, then make it scale. Of course, there are directions you can go that will make it easier to scale in the future. Zookeeper in particular gives you easy-to-implement primitives for scaling horizontally (i.e. multiple workers sharing the load). See the Zookeeper recipe book and the corresponding Python implementations (courtesy of kazoo, a gevent-based client library).
Don't forget that "fast" also means optimizing your own development time, for quicker iterations and less time cursing your development environment. So use Python, which will let you get up and running quickly now, and optimize later if you really truly start to bind on CPU time or memory use. (With this particular application, you're far more likely to bind on network IO.)
Anything else? Maybe a cup of coffee to go with your question :-)
Answering your question from the ground up would require several books worth of text with topics ranging from basic TCP/IP networking to scalable architectures, but I'll try to give you some direction nevertheless.
Questions:
Listen on ports or use sockets or HTTP port 80 apache listening script? (I'm a bit hazy on the differences between these).
I would venture that if you're not clear on the definition of each of these, then designing and implementing a service that will "be playing millions of these hands as fast as I can" is a bit, hmm, over-reaching? But don't let that stop you; as they say, "ignorance is bliss."
Any good frameworks to work off of?
I think your project is a good candidate for Node.js. The main reason being that Node.js is relatively scalable and is good at hiding the complexity required for that scalability. There are downsides to Node.js; just search Google for 'Node.js scalability criticism'.
The main point against Node.js, as opposed to a more general-purpose framework, is that scalability is difficult; there is no way around it, and Node.js, being so high-level and specific, provides fewer options for solving tough problems.
The other drawback is that Node.js is JavaScript, not the Java or Python you would prefer.
Message types? I'm thinking JSON or Protocol Buffers.
I don't think there's going to be a lot of traffic between client and server, so it doesn't really matter. I'd go with JSON just because it is more prevalent.
How to make it FAST?
The real question is how to make it scalable. Running human vs human card games is not computationally intensive, so you're probably going to run out of I/O capacity before you reach any computational limit.
Overcoming these limitations is done by spreading the load across machines. The common way to do this in multi-player games is to have a list server that provides links to identical game servers, with each server having a predefined number of slots available for players.
This is a variation of a broker-workers architecture where the broker machine assigns a worker machine to clients based on how busy they are. In gaming, users want to be able to select their server so they can play with their friends.
Related:
Have a good scheme for detecting when players disconnect, but also be able to account for network latencies, etc., before booting a player or causing them to lose the hand.
Since this is on human time scales (seconds as opposed to milliseconds), the client should send keepalives, say, every 10 seconds, with, say, a 30-second session timeout.
The keepalives would be JSON messages in your application protocol, not HTTP, which is lower level and handled by the framework.
The framework itself should provide you with HTTP 1.1 connection management/pooling, which allows several HTTP sessions (request/response) to go through the same connection but does not require the client to be always connected. This is a good compromise between reliability and speed and should be good enough for turn-based card games.
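A minimal sketch of that keepalive scheme, written here over WebSockets in Node with the 'ws' package for brevity (the question mentions Java/Python/C++; the 10-second and 30-second numbers come from the suggestion above, everything else is an invented example):

```javascript
// Server side: remember when each player was last heard from, and only
// terminate a session after the 30-second timeout, which leaves headroom
// for ordinary network latency.
const { WebSocketServer } = require('ws');

const wss = new WebSocketServer({ port: 8080 });
const SESSION_TIMEOUT_MS = 30000;

wss.on('connection', (ws) => {
  ws.lastSeen = Date.now();
  ws.on('message', (data) => {
    const msg = JSON.parse(data);
    if (msg.type === 'keepalive') ws.lastSeen = Date.now();
    // ...handle join/call/bet/fold messages here...
  });
});

// Sweep periodically for silent sessions.
setInterval(() => {
  wss.clients.forEach((ws) => {
    if (Date.now() - ws.lastSeen > SESSION_TIMEOUT_MS) ws.terminate();
  });
}, 5000);

// Client side would send, as JSON in the application protocol:
// setInterval(() => socket.send(JSON.stringify({ type: 'keepalive' })), 10000);
```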
Honestly, I'd start with classic LAMP. Take a stock Apache server and a MySQL database, and put your Python scripts in the cgi-bin directory. The fact that they're sending and receiving JSON instead of HTML doesn't make much difference.
This is obviously not going to be the most flexible or scalable solution, of course, but it forces you to confront the actual problems as early as possible.
The first problem you're going to run into is game state. You claim there is no shared state, but that's not right—the cards in the deck, the bets on the table, whose turn it is—that's all state, shared between multiple players, managed on the server. How else could any of those commands work? So, you need some way to share state between separate instances of the CGI script. The classic solution is to store the state in the database.
Of course you also need to deal with user sessions in the first place. The details depend on which session-management scheme you pick, but the big problem is how to propagate a disconnect/timeout from the lower level up to the application level. What happens if someone puts $20 on the table and then disconnects? You have to think through all of the possible use cases.
Next, you need to think about scalability. You want millions of games? Well, if there's a single database with all the game state, you can have as many web servers in front of it as you want; John Doe may be on server1 while Joe Schmoe is on server2, but they can still be in the same game. Alternatively, you can have a separate database for each server, as long as you have some way to force people in the same game to meet on the same server. Which one makes more sense? Either way, how do you load-balance between the servers? (You not only want to keep them all busy, you want to avoid the situation where 4 players are all ready to go, but they're on 3 different servers, so they can't play each other…)
The end result of this process is going to be a huge mess of a server that runs at 1% of the capacity you hoped for, that you have no idea how to maintain. But you'll have thought through your problem space in more detail, and you'll also have learned the basics of server development, both of which are probably more important in the long run.
If you've got the time, I'd next throw the whole thing out and rewrite everything from scratch by designing a custom TCP protocol, implementing a server for it in something like Twisted, keeping game state in memory, and writing a simple custom broker instead of a standard load balancer.
Here's the situation:
I currently have a web application that uses PHP to serve HTML/CSS/JS and that talks to a MySQL DB. Completely vanilla and common. The PHP is a mixture of presentation logic (HTML generation, etc) and business logic (the app uses Ajax extensively to make requests for data or to tell the server to make changes to something).
As part of a redesign of this system I am removing all of the presentation logic from the PHP. Instead, I will be using Ext JS 4 (a javascript-based windowing toolkit / app) connected to a web socket gateway (a COMET/AJAX replacement that allows bi-directional communication) on the server. Let's wave a magic wand for a minute and forget about how the Ext JS 4 gets delivered to the browser and how it talks to the web socket gateway.
What we are left with is a web socket gateway (written in Java and running persistently listening on a specific port for web socket connections) and some business logic / DB interaction currently written in PHP.
At this point, I see one of two options:
Keep the business logic / DB interaction in PHP and execute it by either calling PHP from the command line or having PHP / Apache listen on a different port, only for communications from the web socket gateway.
Write a new Java or C++ application that will be persistent and listen on a specific port for communications from the web socket gateway. The business logic / DB integration is re-written in Java or C++ code and is part of this application.
Would re-writing in Java or C++ give better performance than calling PHP over and over? (The PHP code is pretty cleanly written: object-oriented using packages like CodeIgniter and Doctrine).
Would the performance benefits outweigh the hassle of re-writing all the business logic? Obviously dependent on many factors such as quantity of code but what is your gut feeling?
In case it might influence your thinking / feedback, you should know that the web socket gateway (Kaazing) supports JMS, Stomp, AMQP, XMPP, or something custom you build yourself.
Let me know if there is any other info I can provide to help you with your answers.
Thanks!
I know a lot of the solutions I mention here are "ugly" but you sound like a person who's looking to get results and refactor, so I hope it's okay.
Do it the easy way (PHP, if I understood correctly) first. Then run a realistic stress test. Since you're making PHP calls, just create a realistic sequence (log in, change this, do that, log out) and run as many as you think is realistic. 100? 10,000? It depends on how stressed you expect this thing to be while still performing.
That step is easier than it sounds. Don't think "ultimate test framework"; think a 20-line Python script that runs as many threads as you want, each executing a few lines that will keep your application busy. If it takes you more than 40 minutes, stop and simplify. The hour you spend will be worth it.
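To make that concrete, here is roughly that kind of throwaway script, sketched in JavaScript (Node 18+ for the global fetch) rather than the Python suggested above; the URLs and session sequence are placeholders:

```javascript
// Throwaway stress test: N concurrent "users", each walking a realistic
// sequence (log in, change this, do that, log out).
const USERS = 100; // crank up until something breaks

async function oneUser(i) {
  const base = 'http://localhost/cgi-bin'; // placeholder
  await fetch(base + '/login.php', { method: 'POST', body: 'user=test' + i });
  await fetch(base + '/change.php', { method: 'POST', body: 'field=value' });
  await fetch(base + '/logout.php', { method: 'POST' });
}

const t0 = Date.now();
Promise.all(Array.from({ length: USERS }, (_, i) => oneUser(i)))
  .then(() => console.log(USERS + ' sessions in ' + (Date.now() - t0) + ' ms'));
```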
If CPU hits 100% or you run out of some resource, then perhaps it's time for a rewrite, or you can probably guess what's taking the longest and write it in C. If you do use C/C++ and you're not 100% comfortable with it, avoid a major rewrite, since it's a dangerous language with lots of opportunities for introducing bugs. You could even call compiled code from the PHP you have, if that suits your application.
I've written server-side HTML-generating C code once. It's not exactly the right tool for the job. PHP may be hackish but it gets the job done fast. I would avoid optimization unless/until it is actually needed.
Good luck, don't forget to tell us how it goes!
Edit: If you do go for a mixed-language solution, don't forget to clean it up after! Standardize what you do fast and what you do in PHP, do it in a common format, maybe write up a short readme. Again, those fifteen minutes will save you, or the next person, a few days and many hairs.
Writing in a compiled language (Java or C++, in your examples) would almost certainly give better performance than an interpreted language like PHP. The performance benefits almost certainly would not outweigh the hassle of rewriting all of the code.
If your business logic has high processing costs, Java or C++ will give you a much better performance.
If you are simply fetching some results from a DB, do not expect any great performance gains.
I would do some prototyping/testing to identify the performance bottleneck.
My opinion is that PHP is too slow for processing HUGE datasets. If you have many 100,000s of objects to analyse, C++ rocks, and Java benefits from the HotSpot JIT performance optimizer.
The HotSpot effect is very specific to doing number crunching in Java. You really can see the JRE pushing the accelerator, ironing out detected bottlenecks. In some rare cases, HotSpot-JIT-optimised Java can even be faster than C.
In other, also very rare, cases HotSpot performance voodoo can make your code slower!
Have you ever thought of turning a PHP application into a faster Java or C++ app?
Maybe the HipHop PHP-to-C++ compiler is all you want:
https://github.com/facebook/hiphop-php/wiki/
Quercus is a PHP-on-Java runtime which can help you migrate to Java more cheaply.
http://quercus.caucho.com/
Also quite interesting is Joshua Bloch's talk "Performance Anxiety" from last year:
http://www.wiki.jvmlangsummit.com/images/1/1d/PerformanceAnxiety2010.pdf
http://parleys.com/#st=5&id=2103 (32min video)
I am building a complex HTML 5 application that takes advantage of Websockets. I am getting to the point where I have a lot of different types of data that gets updated in real time on the screen.
I want to know if it is going to be better for me to have fewer Websockets that are more complex, or a lot of simple Websockets open per page.
I added the http://github.com/TooTallNate/Java-WebSocket web socket server to my Grails application.
Right now I am going down the path of using a lot of simple web sockets, one for each task. I know using more sockets will use more memory on the server side, but more sockets also means more concurrent processing.
Does anyone have any advice on how I can balance this?
Thanks for any tips in advance. Keith Blanchard
I think it is hard to make any reasonable statements about websockets without measuring the actual performance in specific browsers.
My inclination would be to have a single websocket per client.
There are some pretty hard limits on capacity server-side when doing IO ... it's relatively easy to saturate the channel when you have many connections (something that can bite heavily ajaxified systems as well).
Again, need to really measure to make intelligent statements about this.
Websocket-per-client would also make the application much more manageable ... it depends on your actual use-case, but "more concurrency" is not necessarily better and can make managing state incredibly complex.
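To make the websocket-per-client option concrete, a small sketch of client-side multiplexing (the channel names and endpoint are invented):

```javascript
// One socket per client; route messages by a "channel" field instead of
// opening a separate socket per task.
var handlers = {
  prices: function (data) { /* update the prices widget */ },
  orders: function (data) { /* update the orders grid */ },
  chat: function (data) { /* append to the chat panel */ }
};

var socket = new WebSocket('wss://example.com/stream'); // hypothetical endpoint
socket.onmessage = function (event) {
  var msg = JSON.parse(event.data); // e.g. { channel: 'prices', data: {...} }
  var handler = handlers[msg.channel];
  if (handler) handler(msg.data);
};
```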
I personally did some benchmarking on this one, and the results were:
10 websockets on a single page will make the page a little unresponsive when data is coming in from each socket.
50 websockets on a single page will cause an unbearable freeze of the page.
So somewhere around 10, or fewer, would be your upper limit.
What's a good method for assigning work to a set of remote machines? Consider an example where the task is very CPU and RAM intensive, but doesn't actually process a large dataset. The language of choice would be Java. I was thinking Hadoop would be a good option, but the dataset passed between remote machines is fairly small, and Hadoop seems to focus mainly on the distribution of data rather than distribution of work.
What are some good technologies that can help?
EDIT: I'm mainly interested in load balancing. There will be a series of jobs with a small (< 3MB) dataset, but significant processing and memory needs.
MPI would probably be a good choice; there's even a Java implementation.
MPI may be part of your answer, but looking at the question, I'm not sure if it addresses the portion of the problem you care about.
MPI provides a communication layer between processing components. It is low level, requiring you to do a fair amount of work, but from what I saw in an introductory presentation, it also comes with some common matrix manipulation functions.
In your question, you seem to be more interested in the load balancing/job processing aspects of the problem. If that really is your focus, maybe a small program hosted in a Servlet or an RMI server might be sufficient. Let each worker go to the server for its next unit of work and then submit the results back (you might even be able to use a database/file share, but pay attention to locking issues). In other words, a pull mechanism versus a push mechanism, as in the sketch below.
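The question asks about Java, but the pull loop itself is only a few lines in any language; here is its shape sketched in JavaScript (Node 18+ for the global fetch; the endpoints and the crunch step are placeholders):

```javascript
// Pull mechanism: each worker asks the server for its next unit of work,
// does the CPU/RAM-heavy processing, and posts the result back.
const SERVER = 'http://jobserver.example.com'; // hypothetical

// Placeholder for the real processing.
const crunch = (job) => ({ id: job.id, answer: 42 });

async function workLoop() {
  while (true) {
    const res = await fetch(SERVER + '/next-job');
    if (res.status === 204) { // no work available right now
      await new Promise((resolve) => setTimeout(resolve, 1000));
      continue;
    }
    const job = await res.json();
    const result = crunch(job);
    await fetch(SERVER + '/result/' + job.id, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(result),
    });
  }
}

workLoop();
```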
This approach is fairly simple to implement and gives you the advantage of scaling up by just running more distributed clients. Load balancing isn't too important if you intend to allow your process to take full control of the machine. You can experiment with running multiple clients on a machine that has multiple cores to see if you can improve overall throughput for the node. A multi-threaded client would be more efficient, but can increase complexity depending on the structure of the code you are using to solve the problem.