Latency vs decoupling architecture advice - java

I am designing the architecture of a system (Java EE/Spring).
The main requirement for this system is low latency (we are talking about 1 ms and less, end to end).
We have planned several components for this real-time system.
My question to you experts: I know all the advantages of coupling and decoupling (failover, separation, maintenance, extensibility, etc.).
The problem I am facing here is:
For example, let's say I have two different applications, one on machine A (app1) and one on machine B (app2).
A request must go through both machines, and the final answer is sent to the client only after both machines have processed the request.
The integration latency between those two will surely be higher than having both apps on the same machine (networking time, etc.).
On the other hand, I can update and maintain each application on its own without depending on the same machine; the same goes for failover, clustering, and load balancing.
What would you advise? What should I consider: latency vs. decoupling and maintainability?
thanks,
ray.

A request must go through both machines, and the final answer is sent to the client only after both machines have processed the request.
It could add 0.1 to 0.2 ms. This may be acceptable.
On the other hand, I can update and maintain each application on its own without depending on the same machine.
You are more likely to update the software than the hardware. Hardware can usually be updated at off-peak times, such as on weekends.
the same goes for failover,
The more machines you have the more points of failure you have.
clustering
You might not need to cluster if you have it all on one machine.
load balancing
This makes more sense if you need to use multiple machines.
For a web application, 1 ms is a fairly aggressive target. For a networked service such as a trading system, sub-millisecond or even sub-100-microsecond latency is achievable, depending on your requirements.

The more machines that handle the same request, the more latency; this is obvious. To go further, eliminate all boundaries between the applications (separate JVMs, separate threads) and implement them as two procedures called sequentially on the same thread.
More machines can decrease latency in one case: to distribute load and thereby free resources (processors) on a single machine, eliminating congestion. For example, let different instruments (currencies, shares) be traded on different machines.
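To make the first suggestion concrete, here is a minimal sketch (all names are hypothetical) of the two applications collapsed into two stages invoked sequentially on one thread, so the cost between them is a plain method call rather than serialization plus a network hop:

    // Hypothetical sketch: "app1" and "app2" become two stages on one thread.
    public final class InProcessPipeline {

        interface Stage {
            String process(String input);
        }

        private final Stage app1; // logic that previously lived on machine A
        private final Stage app2; // logic that previously lived on machine B

        public InProcessPipeline(Stage app1, Stage app2) {
            this.app1 = app1;
            this.app2 = app2;
        }

        // Runs both stages on the caller's thread; no sockets, no thread hand-off.
        public String handle(String request) {
            return app2.process(app1.process(request));
        }

        public static void main(String[] args) {
            InProcessPipeline pipeline = new InProcessPipeline(
                    in -> in + " -> enriched",  // stand-in for app1's work
                    in -> in + " -> priced");   // stand-in for app2's work
            System.out.println(pipeline.handle("order#1"));
        }
    }

An in-process call like this costs nanoseconds, whereas even a fast LAN hop adds the 0.1 to 0.2 ms mentioned above, which is a large slice of a 1 ms end-to-end budget.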

Related

Single threaded Node.js on a multi core CPU vs Java (Tomcat)

Since Node.js is single threaded, if I run an app that doesn't have any IO involved on Node.js, will it always use just one CPU core even if it is running on an 8-core CPU machine?
Say I have a service running on Node that returns the sum of two numbers, and 1000 requests hit that service at the same time (assume the service computes the sum on the main thread and doesn't use a callback, since it is a simple task). So Node would handle only one request at a time, even with 7 cores sitting idle? In the Java world, if I set the HTTP thread pool size to 8 in an app server such as Tomcat, the same 1000 requests would be executed 8 times faster than on Node.
So do I have to run 8 instances of Node.js on an 8 core machine and front it with a load balancer so all 8 cores get used?
Since Node.js is single threaded, if I run an app that doesn't have any IO involved on Node.js, will it always use just one CPU core even if it is running on an 8-core CPU machine?
Supposing that you do not fork any child processes, Node will use at most one core at a time, yes, regardless of how many cores are available. Having multiple cores will help a bit, though, because that reduces contention between Node and other processes.
So Node would handle only one request at a time, even with 7 cores sitting idle?
Basically yes, though in practice, you're unlikely to ever have all other cores idle.
In the Java world, if I set the HTTP thread pool size to 8 in an app server such as Tomcat, the same 1000 requests would be executed 8 times faster than on Node.
That follows only if the Java webapp processes requests in the same amount of time that Node does. Whether it will do so depends on a great many factors, and therefore is more or less impossible to predict. It can only be measured. It is conceivable that measuring the performance of a Java implementation of the webapp vs. a Node implementation of the same webapp would find that that particular Java implementation is much slower than that particular Node implementation under the test conditions. Or not.
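For reference, a rough sketch of the threading model the question attributes to the Java side: a fixed pool of 8 worker threads draining a queue of tasks. This is illustrative only (it is not Tomcat's actual connector code), and as noted above, whether it ends up faster than Node can only be established by measurement:

    // Illustrative only: 8 workers processing the "sum of two numbers" tasks.
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    public class EightWorkerDemo {
        public static void main(String[] args) throws InterruptedException {
            ExecutorService pool = Executors.newFixedThreadPool(8);
            for (int i = 0; i < 1000; i++) {
                final int a = i, b = i + 1;
                pool.submit(() -> a + b); // each task is one "request"
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
            // Up to 8 tasks run concurrently, one per core on an 8-core box.
        }
    }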
So do I have to run 8 instances of Node.js on an 8 core machine and front it with a load balancer so all 8 cores get used?
I'm sure there are possible variations on the details, but basically yes.
Use a load balancer to distribute load across multiple servers. To run Node.js in multiple processes on the same server, use cluster mode.
Like you, I also wondered what impact these different threading models would have on performance, which is why I conducted this investigation into the subject. I wrote two functionally identical I/O-bound micro-services, one in Node.js and the other in Java / DropWizard (which uses Jetty). I ran the same load test on both, then captured the performance data and compared the results.
What I found was that latency for the two micro-services was very similar. It was throughput that differed. Without cluster mode, the Node.js service had 40% lower throughput. With cluster mode, the Node.js service had 16% lower throughput.

JAVA Distributed processing on a single machine (Ironic, I know)

I am creating a (semi) big data analysis app. I am utilizing apache-mahout. I am concerned about the fact that with Java I am limited to 4 GB of memory. This 4 GB limitation seems somewhat wasteful of the memory modern computers have at their disposal. As a solution, I am considering using something like RMI or some form of MapReduce. (I have, as of yet, no experience with either.)
First off: is it plausible to have multiple JVMs running on one machine and have them talk to each other? And if so, am I heading in the right direction with the two ideas alluded to above?
Furthermore,
In an attempt to keep this an objective question, I will avoid asking "which is better" and instead will ask:
1) What are the key differences (not necessarily in how they work internally, but in how they would be implemented by me, the user)?
2) Are there drawbacks or benefits to one or the other, and are there certain situations where one or the other is used?
3) Is there another alternative that is more specific to my needs?
Thanks in advance
First, regarding the 4 GB limit, check out Understanding max JVM heap size - 32bit vs 64bit. On a 32-bit system, 4 GB is the maximum, but on a 64-bit system the limit is much higher.
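For example, on a 64-bit JVM you simply ask for a bigger heap at start-up; the heap size and jar name below are placeholders:

    java -Xmx32g -jar analysis-app.jar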
It is a common configuration to have multiple JVMs running and communicating on the same machine. Two good examples would be IBM WebSphere and Oracle's WebLogic application servers. They run the administrative console in one JVM, and it is not unusual to have three or more "working" JVMs under its control.
This allows each JVM to fail independently without impacting the overall system's responsiveness. Recovery is transparent to the end users because some of the "working" JVMs are still doing their thing while the support team is frantically trying to fix things.
You mentioned both RMI and MapReduce, but in a manner that implies that they fill the same slot in the architecture (communication). I think that it is necessary to point out that they fill different slots - RMI is a communications mechanism, but MapReduce is a workload management strategy. The MapReduce environment as a whole typically depends on having a (any) communication mechanism, but is not one itself.
For the communications layer, some of your choices are RMI, Webservices, bare sockets, MQ, shared files, and the infamous "sneaker net". To a large extent I recommend shying away from RMI because it is relatively brittle. It works as long as nothing unexpected happens, but in a busy production environment it can present challenges at unexpected times. With that said, there are many stable and performant large scale systems built around RMI.
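Since the question contrasts RMI with MapReduce, here is a minimal sketch of what the RMI slot looks like: a remote interface, a JVM that exports an implementation into a local registry, and a second JVM on the same machine that looks it up and calls it. All names and the port are placeholders, and error handling is omitted:

    import java.rmi.Remote;
    import java.rmi.RemoteException;
    import java.rmi.registry.LocateRegistry;
    import java.rmi.registry.Registry;
    import java.rmi.server.UnicastRemoteObject;

    // Hypothetical remote interface shared by both JVMs.
    public interface AnalysisService extends Remote {
        double score(double[] features) throws RemoteException;
    }

    // Server JVM: export the implementation and bind it in a local registry.
    class AnalysisServer implements AnalysisService {
        public double score(double[] features) throws RemoteException {
            double sum = 0;
            for (double f : features) sum += f;
            return sum;
        }

        public static void main(String[] args) throws Exception {
            AnalysisServer server = new AnalysisServer();
            AnalysisService stub =
                    (AnalysisService) UnicastRemoteObject.exportObject(server, 0);
            Registry registry = LocateRegistry.createRegistry(1099);
            registry.rebind("analysis", stub);
        }
    }

    // Client JVM (a second process on the same machine): look up the stub and call it.
    class AnalysisClient {
        public static void main(String[] args) throws Exception {
            Registry registry = LocateRegistry.getRegistry("localhost", 1099);
            AnalysisService service = (AnalysisService) registry.lookup("analysis");
            System.out.println(service.score(new double[] {1.0, 2.0, 3.0}));
        }
    }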
The direction the world is going this week for cross-tier communication is SOA on top of something like Spring Integration or Fuse. SOA abstracts the mechanics of communication out of the equation, allowing you to hook things up on the fly (more or less).
MapReduce (MR) is a way of organizing batched work. The MR algorithm itself essentially turns the input data into a bunch of key/value maps, then reduces them to the minimum amount necessary to produce an output. The MR environment is typically governed by a workload manager which receives jobs and parcels out the work in the jobs to its "worker bees" scattered around the network. The communications mechanism may be defined by the MR library, or by the container(s) it runs in.
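As a toy, in-process illustration of that map-then-reduce shape (a real MR framework distributes the same two phases across worker nodes):

    import java.util.Arrays;
    import java.util.Map;
    import java.util.stream.Collectors;

    public class WordCountSketch {
        public static void main(String[] args) {
            String[] lines = { "to be or not to be", "be quick" };

            Map<String, Long> counts = Arrays.stream(lines)
                    // map phase: explode each line into individual word records
                    .flatMap(line -> Arrays.stream(line.split("\\s+")))
                    // reduce phase: collapse records sharing a key into one count
                    .collect(Collectors.groupingBy(w -> w, Collectors.counting()));

            System.out.println(counts); // e.g. {be=3, not=1, or=1, quick=1, to=2}
        }
    }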
Does this help?

Hardware sizing an application which runs fine on a laptop?

If an application already runs well on a laptop with a local web server and DB, how does that impact hardware sizing for when it is deployed into production?
We're piloting this application for the first time, and up until now the application has run fine on a mid-tier laptop.
I assume any server will be more powerful than a laptop. How should one scale the requirements appropriately?
The main impacts I can see are:
Locality of the DB (it may be installed on a separate server or data centre, causing network issues; no idea if this even impacts CPU or memory specs)
Overhead of enterprise web container (currently using jetty, expected to move to tomcat for support reasons)
We're currently using Windows; the server will most likely be running Unix.
Not sure what applications details are relevant but:
- Single-threaded application
- Main function is to host a REST service which computes an algorithm of average complexity. Expecting around 16 requests per second max
- Using Java and PostgreSQL currently
Thanks!
There's no substitute for a few things, including testing on comparable server hardware and knowing the performance model of your stack. Keep in mind your requirements are likely to change over time, and oftentimes it is more important to have a flexible approach to hardware (in terms of being able to move pieces of it onto other servers) than it is to have a rigid formula for what you need.
You should understand however, that different parts of your stack have different needs. PostgreSQL usually (not always, but usually) needs fast disk I/O and more processor cores (processor speed is usually less of a factor) while your Java app is likely to benefit from faster cores.
Here are my general rules:
Do some general performance profiling, come to an understanding of where CPU power is being spent, and what is waiting on disk I/O.
Pay close attention to server specs. Just because the server may be more powerful in some respects does not mean your application will perform better on it.
Once you have this done, select your hardware with your load in mind. And then test on that hardware.
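To make "test on that hardware" concrete, here is a rough sketch of a latency probe; the URL, rate, and duration are assumptions taken from the question (a single REST endpoint at roughly 16 requests per second), and a real test would use a proper load tool, but even this gives p50/p99 numbers to compare across candidate machines:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    public class SizingProbe {
        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("http://candidate-server:8080/compute")) // placeholder URL
                    .GET().build();

            List<Long> latenciesMicros = new ArrayList<>();
            for (int i = 0; i < 16 * 60; i++) {           // about one minute at ~16 req/s
                long start = System.nanoTime();
                client.send(request, HttpResponse.BodyHandlers.ofString());
                latenciesMicros.add((System.nanoTime() - start) / 1_000);
                Thread.sleep(1000 / 16);                  // crude pacing
            }

            Collections.sort(latenciesMicros);
            System.out.println("p50 us: " + latenciesMicros.get(latenciesMicros.size() / 2));
            System.out.println("p99 us: " + latenciesMicros.get((int) (latenciesMicros.size() * 0.99)));
        }
    }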
Edit: What we pay attention to at Efficito is the following, for CPU:
Cache size
Cores
GFLOPS/core
Other performance metrics
For hard drives, we pay attention to the following:
Speed
Type
RAID setup
General burst and sustained throughput
Keep in mind your database portions are more likely to be I/O bound, meaning you get more from paying attention to hard drive specs than to CPU specs, while your application code will more likely be CPU-bound, meaning better CPU specs give you better performance. Keep in mind we are doing virtualized hosting of ERP software on PostgreSQL and Apache, and so we usually get to balance both sides on the same hardware.

ehcache monitor: installation/configuration

In the documentation http://ehcache.org/documentation/user-guide/monitor there is a phrase:
It is recommended that you install the Monitor on an Operations server separate to production.
Why is that so? What will happen if I install it on the production server?
And the second question, to which I did not find the answer there: does this monitor really not affect the performance of the application?
I'll try to explain what I think they mean.
First of all, I don't think the intention is that you not use the Monitor in production. Rather, I think they mean that the Monitor should be installed on a separate server in a production environment. There are at least three good reasons to do this.
The first is one of security. The clients that your production server is servicing shouldn't be able to reach the Monitor's services. By putting it on a separate server (perhaps behind a firewall) you prevent this.
The second is one of landscape simplicity. The Monitor can monitor several servers. By putting it on a separate server, you prevent one application server from being "special" - all the application servers are identical as far as this is concerned. Easier for scaling and maintenance of your landscape.
The third reason is one of performance. Calls to the Monitor won't impact the application server/s. This is as it should be.
As for the second part of your question- obviously, adding ehcache monitoring will affect performance to some extent. Probably it's meant to only incur a minimal overhead, but nothing is completely without cost. But if you end up optimizing the caches, it will probably be worth it.
I found this paragraph detailing how often the Monitor samples:
Memory is estimated by sampling. The first 15 puts or updates are measured and then every 100th put or update
(this is from the statistics section of the Monitor page)

Zookeeper/Chubby -vs- MySql NDB

I have been reading the Paxos paper, the FLP theorem, etc. recently and evaluating Apache ZooKeeper for a project. I have also been going through Chubby (Google's distributed locking service) and the various literature on it that is available online. My fundamental use case for ZooKeeper is to implement replication and general coordination for a distributed system.
I was just wondering, though: what is the specific advantage that ZooKeeper or a Chubby-like distributed locking system brings to the table? Basically I am just wondering why I can't just use a MySQL NDB Cluster. I keep hearing that MySQL has a lot of replication issues. I was hoping someone with more experience on the subject might shed some light on it.
Thanks in advance..
A simplistic listing of my requirements:
I have a homogeneous distributed system.
I need some means of maintaining consistent state across all my nodes.
My system exposes a service, and interaction with clients will lead to some change in collective state of my system.
High availability is a goal, thus a node going down must not affect the service.
I expect the system to service at least a couple of thousand req/sec.
I expect the collective state of the system to be bounded in size (basically inserts/deletes will be transient... but in steady state, I expect lots of updates and reads)
It depends on the kind of data you are managing and the scale and fault tolerance you are going for.
I can answer from the ZooKeeper point of view. Before starting I should mention that ZooKeeper is not a Chubby clone. Specifically it does not do locks directly. It is also designed with different ordering and performance requirements in mind.
In ZooKeeper the entire copy of system state is memory resident. Changes are replicated using an atomic broadcast protocol and synced to disk (using a change journal) by a majority of ZooKeeper servers before being processed. Because of this, ZooKeeper has deterministic performance that can tolerate failures as long as a majority of servers are up. Even with a big outage, such as a power failure, as long as a majority of servers come back online, system state will be preserved. The information stored in ZooKeeper is usually considered the ground truth of the system, so such consistency and durability guarantees are very important.
The other things that ZooKeeper gives you have to do with monitoring dynamic coordination state. Ephemeral nodes allow you to do easy failure detection and group membership. The ordering guarantees allow you to do leader election and client-side locking. Finally, watches allow you to monitor system state and quickly respond to changes.
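A minimal sketch of those primitives using the plain ZooKeeper Java client (the connection string and paths are placeholders, and the /workers parent node is assumed to exist already):

    import java.util.List;
    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    public class MembershipSketch {
        public static void main(String[] args) throws Exception {
            // Session to the ensemble; this default watcher just logs connection events.
            ZooKeeper zk = new ZooKeeper("zk1:2181,zk2:2181,zk3:2181", 15000,
                    event -> System.out.println("session event: " + event));

            // Ephemeral node = group membership + failure detection: it disappears
            // automatically when this client's session dies.
            zk.create("/workers/node-", "host-a".getBytes(),
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);

            // Watch the group: the callback fires once when membership changes,
            // at which point you re-read (and re-watch) the children.
            List<String> members = zk.getChildren("/workers",
                    event -> System.out.println("membership changed: " + event));
            System.out.println("current members: " + members);
        }
    }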
So if you need to manage and respond to dynamic configuration, detect failures, elect leaders, etc. ZooKeeper is what you are looking for. If you need to store lots of data or you need a relational model for that data, MySQL is a much better option.
MySQL with Innodb provides a good general purpose solution, and will probably keep up with your performance requirements quite easily on not-too-expensive hardware. It can easily handle many thousands of updates per second on a dual quad-core box with decent disks. The built-in asynchronous replication will get you most of the way there for your availability requirements - but you might lose a few seconds' worth of data if the primary fails. Some of this lost data might be recoverable when the primary is repaired, or might be recoverable from your application logs: whether you can tolerate this is dependent on how your system works. A less lossy - but slower - alternative is to use MySQL Innodb with shared disk between Primary and Failover units: in this case, the Failover unit will take over the disk when the Primary fails with no loss of data -- as long as the Primary did not have some kind of disk catastrophe. If shared disk is not available, DRBD can be used to simulate this by synchronously copying disk blocks to the Failover unit as they are written: this might have an impact on performance.
Using Innodb and one of the replication solutions above will get your data copied to your Failover unit, which is a large part of the recovery problem solved, but extra glue is required to reconfigure your system to bring the Failover unit on-line. This is usually performed with a cluster system like RHCS or Pacemaker or Heartbeat (on Linux) or the MS Cluster stuff for Windows. These systems are toolkits, and you are left to get your hands dirty building them into a solution that will fit your environment. However, for all of these systems there is a brief outage period while the system notices that the Primary has failed, and reconfigures the system to use the Failover unit. This might be tens of seconds: trying to reduce this can make your failure detection system too sensitive, and you might find your system being failed over unnecessarily.
Moving up, MySQL NDB is intended to reduce the time to recovery, and to some extent help scale up your database for improved performance. However, MySQL NDB has a quite narrow range of applicability. The system maps a relational database on to a distributed hash table, and so for complex queries involving multiple joins across tables, there is quite a bit of traffic between the MySQL component and the storage components (the NDB nodes) making complex queries run slow. However, queries that fit well run very fast indeed. I have looked at this product a few times, but my existing databases have been too complicated to fit well and would require a lot of redesign to get good performance. However, if you are at the design stage of a new system, NDB would work well if you can bear its constraints in mind as you go. Also, you might find that you need quite a few machines to provide a good NDB solution: a couple of MySQL nodes plus 3 or more NDB nodes - although the MySQL and NDB nodes can co-exist if your performance needs are not too extreme.
Even MySQL NDB cannot cope with total site loss (fire at the data centre, admin error, etc.). In this case, you usually need another replication stream running to a DR site. This will normally be done asynchronously so that connectivity blips on the inter-site link do not stall your whole database. This is provided with NDB's Geographic Replication option (in the paid-for telco version), but I think MySQL 5.1 and above can provide this natively.
Unfortunately, I know little about Zookeeper and Chubby. Hopefully someone else can pick up these aspects.
