Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 6 years ago.
I'm currently rewriting a Ruby on Rails web app in Spring Boot. A big part of the move is for performance.
Whilst developing the app, when I hit run in IntelliJ the first response typically takes around 1s, which I assume is JVM startup. After a refresh it drops to ~300ms, then 150ms for 4-5 further requests, and after that it settles at 50-75ms for the most part. Occasionally, though, I'll get a 150ms response again later on.
As a JVM novice I'm wondering what factors are at play here in the varying response times? Which would be closer to the standard "hot" response times that I could expect in production? I realise I'm unlikely to get an accurate depiction of production performance on my local dev machine, but I would like to understand the variance seen above so I can at least gauge a little better what effect my incremental changes are having.
As a JVM novice I'm wondering what factors are at play here in the varying response times?
startup:
JIT warmup
lazy initialization as part of your application
GC needing to settle on some heap size
steady state:
GC pauses
application behavior, e.g. cache entries expiring every now and then
varying load
JIT deoptimizations/recompilations due to some uncommon paths being taken
thermal CPU throttling, especially on but not exclusive to laptops
For server applications you should ignore the ramp-up behavior and focus on steady state. Guessing what the issue might be will not help; measurements are king.
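To make "measure the steady state" concrete, here is a minimal sketch in plain Java. It times repeated calls, throws away the first batch as warm-up, and reports percentiles for the rest. The class and method names are made up for illustration, the fake request is only a stand-in for a real HTTP call to the app, and the warm-up cut-off of 500 is an assumption you would tune by looking at where the timings flatten out.

```java
import java.util.Arrays;

/** Sketch: time repeated calls, drop the warm-up samples, report steady-state percentiles. */
final class SteadyStateLatency {

    /** Runs the task {@code total} times and returns each call's duration in nanoseconds. */
    static long[] sample(Runnable task, int total) {
        long[] nanos = new long[total];
        for (int i = 0; i < total; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Nearest-rank percentile (p in (0, 100]) of the samples that remain after skipping {@code warmup}. */
    static long percentile(long[] samples, int warmup, double p) {
        long[] steady = Arrays.copyOfRange(samples, warmup, samples.length);
        Arrays.sort(steady);
        int rank = (int) Math.ceil(p / 100.0 * steady.length);
        return steady[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for one request; replace with a real call to the application.
        Runnable fakeRequest = () -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) sb.append(i % 10);
            if (sb.length() != 10_000) throw new IllegalStateException();
        };

        long[] samples = sample(fakeRequest, 2_000);
        int warmup = 500; // assumed cut-off, not a universal constant

        System.out.printf("first call: %d us%n", samples[0] / 1_000);
        System.out.printf("steady p50: %d us, p99: %d us%n",
                percentile(samples, warmup, 50) / 1_000,
                percentile(samples, warmup, 99) / 1_000);
    }
}
```

Looking at p50 next to p99 rather than a single average is what lets you tell the occasional 150ms outlier (GC pause, recompilation, throttling) apart from a real regression. For anything more serious than a quick check, a dedicated benchmark harness or a load-testing tool is a better fit than a hand-rolled loop.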
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
If a distributed computing framework spins up nodes for running Java/Scala operations, then it has to include the JVM in every container. E.g. every Map and Reduce step spawns its own JVM.
How does the efficiency of this instantiation compare to spinning up containers for languages like Python? Is it a question of milliseconds, a few seconds, 30 seconds? Does this cost add up in frameworks like Kubernetes where you need to spin up many containers?
I've heard that, much like Alpine Linux is just a few MB, there are stripped down JVMs, but still, there must be a cost. Yet, Scala is the first class citizen in Spark and MR is written in Java.
Linux container technology uses layered filesystems, so bigger container images don't generally have a ton of runtime overhead. You do have to download the image the first time it is used on a node, though, which can potentially add up on truly massive clusters. In general this is not something to worry about, aside from the well-known issue of most JVMs being a bit slow to start up. Spark, however, does not spin up a new container for every operation as you describe. It creates a set of executor containers (pods) which are used for the whole Spark execution run.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I was wondering if a simple program with no threads can run faster on a computer that has many cores, or on a cluster of Linux servers?
Recently I ran my algorithm, which has to process billions of IP packets, on my PC (Core i7 with 16GB RAM) and it took 1881 minutes to finish processing. Then I thought it would be good to run the algorithm on a cluster of Linux servers, each node with 10 processors and 48GB RAM, to get the results quicker. However, there is no big difference between the two experiments.
Can someone comment on what I am missing?
Unless your algorithm actually makes use of those multiple instances and extra memory, there shouldn't be a lot of difference. Parallel programming is an art of its own, and a "regular", single-threaded program doesn't just change into a parallel one by itself.
If you have a single thread of execution, more cores, CPUs or machines won't help. Only a faster CPU would speed things up, and that only if your process is CPU-bound and not IO-bound.
First you should check where your processing time is spent: in the CPU, or waiting for IO. If you have a significant amount of CPU usage, you can try to parallelize your work, i.e. split the data into chunks and have different threads or machines process them in parallel.
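As a rough illustration of "split the data into chunks", here is a minimal Java sketch that does the same computation once on a single thread and once spread over a fixed pool of worker threads. The per-packet work (`process`) is a made-up placeholder, and the whole thing assumes the chunks are independent of each other; if the real algorithm carries state from one packet to the next, it will need more thought than this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch: the same work done sequentially and split into chunks across worker threads. */
final class ChunkedSum {

    /** Hypothetical per-item work; stands in for whatever is done with each packet. */
    static long process(int packet) {
        return (long) packet * packet;
    }

    static long sequential(int[] packets) {
        long total = 0;
        for (int p : packets) total += process(p);
        return total;
    }

    /** Splits the array into at most {@code workers} contiguous chunks (workers must be >= 1). */
    static long parallel(int[] packets, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (packets.length + workers - 1) / workers; // ceiling division
            List<Future<Long>> parts = new ArrayList<>();
            for (int from = 0; from < packets.length; from += chunk) {
                final int start = from;
                final int end = Math.min(packets.length, from + chunk);
                Callable<Long> task = () -> {
                    long sum = 0;
                    for (int i = start; i < end; i++) sum += process(packets[i]);
                    return sum;
                };
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) total += part.get(); // wait for each chunk
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("chunk failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] packets = new int[5_000_000];
        for (int i = 0; i < packets.length; i++) packets[i] = i % 1_500;

        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("sequential: " + sequential(packets));
        System.out.println("parallel on " + cores + " threads: " + parallel(packets, cores));
    }
}
```

This only pays off when the job is CPU-bound. If the time is spent waiting on disk or network, adding threads on the same machine mostly adds contention.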
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 9 years ago.
I've been writing some console applications in C++ for working with audio for a little while now and I'm interested in running them on a website. Most of my programs are quite resource-hungry, however: some have execution times of up to 5-10 min, read and write several gigabytes to and from disk, and require several gigabytes of memory. I've done a few simple PHP/MySQL pages before, but nothing like this, so before I get my hopes up and dive into learning how to get an application running on a website, I figure I should ask a few questions:
Is it even feasible to run a program like this on the web? How would performance on a server compare to my PC?
Do web hosts typically allow a single user to use this kind of memory?
I realize C++ isn't usually the first choice for web programming, but since performance will be critical, would it be better than Java?
I know nothing about this, so I'm just trying to get my expectations straight.
This is my opinion:
1 - The user of your web application is probably not going to wait 5-10 min for a response. You can do the heavy work in another process and have your web app show the results to the user later in some way.
2 - Yes, they allow it, but it costs money. Have a look at Amazon EC2 and DigitalOcean (cheaper).
3 - The programming language in this case (C++ or Java) is not that important. Focus more on your problem, architecture, deferred tasks, batch processing, etc. That will really make a difference.
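To sketch what point 1 could look like in practice: the request that starts the job returns an id straight away, the work runs in the background, and a second request polls for the result. The Java below is only an in-process illustration with made-up names (`JobQueue`, `submit`, `status`, `awaitStatus`). For jobs that need gigabytes of memory you would more likely hand the work to a separate worker process or a proper job queue, but the shape of the interaction is the same.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

/** Sketch: accept a long-running job, return an id at once, let the client poll for the result. */
final class JobQueue {
    private final ExecutorService workers = Executors.newFixedThreadPool(2);
    private final Map<String, Future<String>> jobs = new ConcurrentHashMap<>();

    /** What a "start job" request handler would call: queue the work and return immediately. */
    String submit(Supplier<String> work) {
        String id = UUID.randomUUID().toString();
        Callable<String> task = work::get;
        jobs.put(id, workers.submit(task));
        return id;
    }

    /** What a "job status" request handler would call. Never blocks on unfinished work. */
    String status(String id) {
        Future<String> job = jobs.get(id);
        if (job == null) return "UNKNOWN";
        if (!job.isDone()) return "PENDING";
        try {
            return "DONE: " + job.get();
        } catch (Exception e) {
            return "FAILED";
        }
    }

    /** Blocks until the job finishes. Only for the demo; a web handler would poll status(). */
    String awaitStatus(String id) {
        Future<String> job = jobs.get(id);
        if (job == null) return "UNKNOWN";
        try {
            job.get();
        } catch (Exception e) {
            // status() below reports the failure
        }
        return status(id);
    }

    void shutdown() {
        workers.shutdown();
    }

    public static void main(String[] args) {
        JobQueue queue = new JobQueue();
        // Hypothetical stand-in for the audio processing; the real job would run for minutes.
        String id = queue.submit(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) sum += i % 7;
            return "processed, checksum " + sum;
        });
        System.out.println("job accepted: " + id);
        System.out.println("final status: " + queue.awaitStatus(id));
        queue.shutdown();
    }
}
```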
No, the programming language doesn't matter much. I believe Java used to be slower than C++, but that gap has pretty much closed as compilers have improved. If you want your applications to run better, design them to be efficient. Looking into time complexity may help, if you haven't already done so. The better your time complexity, the faster your program.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
I'm working on a two-level cache (the first level is RAM and the second is the file system) in Java. I have implemented the first level so far.
Could you please clarify:
How can I test my cache implementation well?
Which scenarios are typical for using a cache in a Java application?
For now my testing scenario is just 5-20 threads which try to put data into and get data from the cache.
But I suppose that's not a typical case...
You really need to know some statistics about the requests being made in order to properly test it.
For starters, you could test 3 scenarios with a fixed setup:
Setup: 200 MB RAM, 5 GB FS space.
Test 1: Only do queries that you know are in the RAM.
Test 2: Do a percentage of queries that you know are in the RAM, with the other queries having to be fetched from the file system.
Test 3: Do 100% requests that you know must be fetched from the FS.
If you do these tests, you'll have 3 data points that give you an idea of the performance of each layer. If you want to be more thorough, add 25% and 75%, or finer percentage increments.
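The three tests could be driven by one small workload generator. This is only a sketch in Java: `TwoLevelCache` here is a tiny stand-in with two in-memory maps (the "file system" level is simulated, and there is no promotion from FS to RAM), so swap in your own implementation and real disk reads. All names and sizes are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Sketch: drive a two-level cache with a chosen share of RAM hits and count where lookups land. */
final class HitRatioWorkload {

    /** Minimal stand-in for the cache under test: a RAM map in front of a simulated FS map. */
    static final class TwoLevelCache {
        final Map<Integer, String> ram = new HashMap<>();
        final Map<Integer, String> fs = new HashMap<>();
        int ramHits;
        int fsHits;

        String get(int key) {
            String value = ram.get(key);
            if (value != null) {
                ramHits++;
                return value;
            }
            value = fs.get(key); // a real second level would read from disk here
            if (value != null) fsHits++;
            return value;
        }
    }

    /** Issues {@code requests} lookups; roughly {@code ramPercent}% of them target keys held in RAM. */
    static TwoLevelCache run(int ramPercent, int requests, long seed) {
        TwoLevelCache cache = new TwoLevelCache();
        int keysPerLevel = 1_000;
        for (int k = 0; k < keysPerLevel; k++) {
            cache.ram.put(k, "ram-" + k);               // keys 0..999 live in RAM
            cache.fs.put(keysPerLevel + k, "fs-" + k);  // keys 1000..1999 live on the FS level
        }
        Random rnd = new Random(seed);
        for (int i = 0; i < requests; i++) {
            boolean targetRam = rnd.nextInt(100) < ramPercent;
            int key = rnd.nextInt(keysPerLevel) + (targetRam ? 0 : keysPerLevel);
            cache.get(key);
        }
        return cache;
    }

    public static void main(String[] args) {
        for (int percent : new int[] {100, 75, 50, 25, 0}) {
            long start = System.nanoTime();
            TwoLevelCache cache = run(percent, 100_000, 42L);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("%3d%% RAM target: ramHits=%d fsHits=%d time=%d us%n",
                    percent, cache.ramHits, cache.fsHits, micros);
        }
    }
}
```

With a real file-system level behind it, the time column is the interesting part: the 100% and 0% runs give you the cost of each layer on its own, and the mixed runs show how the total scales between them.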
If you have access to some "live testing", meaning real requests that you're trying to speed up, then create some statistics for your requests and base your cache on those instead of trying to solve a general "caching" problem.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
Twitter has open-sourced its MySQL source code.
This blog post http://engineering.twitter.com/2012/04/mysql-at-twitter.html mentions the different changes.
I have used MySQL as a developer executing queries but never had to dig deep into its source code. I tried going through the source code on GitHub https://github.com/twitter/mysql but was overwhelmed.
So I thought I would post here and get some help.
I would like to better understand how Twitter's changes have improved MySQL over Oracle's version.
If I were to make an apples-to-apples comparison between Oracle's version and Twitter's version, what are the specific advantages and disadvantages of each?
There are many more questions that popped into my head.
I get that this is sort of an advanced DB topic, but I would love to learn about it.
They describe the changes as including:
Add additional status variables, particularly from the internals of InnoDB. This allows us to monitor our systems more effectively and understand their behavior better when handling production workloads.
Optimize memory allocation on large NUMA systems: Allocate InnoDB's buffer pool fully on startup, fail fast if memory is not available, ensure performance over time even when server is under memory pressure.
Reduce unnecessary work through improved server-side statement timeout support. This allows the server to proactively cancel queries that run longer than a millisecond-granularity timeout.
Export and restore InnoDB buffer pool in using a safe and lightweight method. This enables us to build tools to support rolling restarts of our services with minimal pain.
Optimize MySQL for SSD-based machines, including page-flushing behavior and reduction in writes to disk to improve lifespan.
These are things that are important for a Twitter-scale web site. If you are building something that has to scale to that size - yes, those changes would be helpful. But, if you are building something more like every other mildly-popular web site in the world, you probably don't need to be concerned with their changes. Most large web sites that run MySQL run it straight from the distribution.