Closed. This question is opinion-based. It is not currently accepting answers.
Closed 3 years ago.
I am writing a Java application that aims to get and persist messages in real time. Beyond that, my application has to address several non-functional requirements such as lineage control, transaction control, security, logging, monitoring, and so on.
Each feature is defined as a module and will be controlled within a thread competing for my hardware resources (disk, memory, CPU, and GPU). As a consequence, each time my application evolves, my thread control has to evolve too.
To deal with that, I am creating a global ExecutorService to manage all threads in my application. Some of these threads are permanent and defined as daemon threads. My application also controls multiple sources, each one representing a set of all the features described above and evolving arbitrarily.
What are the best practices for this scenario? How can I control multiple threads when some will be created and dropped arbitrarily (as daemon threads or not) and others will be executed regularly, nested within other threads or not?
Generally, if you do not want to manage creation and disposal of threads, use a thread pool (you can create one with ExecutorService executorService = Executors.newFixedThreadPool(10);).
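A minimal sketch of that suggestion, assuming Java 8 or later. Beyond the one-liner above, it shows two things the question raises: a ThreadFactory that marks pool threads as daemon threads, and an explicit shutdown so the pool's threads are released. The daemonFactory helper, the "worker" prefix and the sumOfSquares task are hypothetical names chosen for illustration. Daemon threads are abandoned when the JVM exits, so they only suit work that can safely be cut off mid-task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class FixedPoolExample {

    // Hypothetical helper: names worker threads "<prefix>-1", "<prefix>-2", ...
    // and marks them as daemon threads so they never keep the JVM alive on their own.
    static ThreadFactory daemonFactory(String prefix) {
        AtomicInteger counter = new AtomicInteger();
        return runnable -> {
            Thread t = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
            t.setDaemon(true);
            return t;
        };
    }

    // Submits one small task per number; at most 10 pooled threads run them.
    static int sumOfSquares(int upTo) throws InterruptedException, ExecutionException {
        ExecutorService executorService =
                Executors.newFixedThreadPool(10, daemonFactory("worker"));
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int i = 1; i <= upTo; i++) {
                final int n = i;
                results.add(executorService.submit(() -> n * n)); // a Callable<Integer>
            }
            int sum = 0;
            for (Future<Integer> f : results) {
                sum += f.get(); // blocks until that particular task has finished
            }
            return sum;
        } finally {
            // Stop accepting new tasks, then give running ones a bounded time to finish.
            executorService.shutdown();
            if (!executorService.awaitTermination(5, TimeUnit.SECONDS)) {
                executorService.shutdownNow();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("sum of squares 1..20 = " + sumOfSquares(20));
    }
}
```

Reusing one pool like this means tasks come and go freely while the number of live threads stays bounded, which is the main reason to prefer it over creating and disposing of threads by hand.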
However, be careful of premature optimisation (i.e. spending effort "optimising" code, at the expense of readability/maintainability, often when the performance gain is irrelevant). General good practice is:
Write clean, readable code.
Run your code and see if you encounter performance issues.
Profile your code (e.g. with JProfiler or JVisualVM) to identify performance issues.
Refactor your code if there is a significant problem with the way it performs.
Remember, in general, developer time is more valuable than CPU time. Obviously there are limits to this, but usually you are best served keeping your code clean and readable.
You can also use this process to tweak parameters (e.g. on a fixed thread pool, there are various thresholds that can be fine-tuned to improve performance).
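If the parameters need tuning, note that Executors.newFixedThreadPool is a thin wrapper over ThreadPoolExecutor, whose constructor exposes those thresholds directly. The numbers below are illustrative assumptions, not recommendations, and TunedPoolExample is a hypothetical name; the point is where each threshold lives. A ThreadPoolExecutor only grows beyond its core size once its queue is full, so with an unbounded queue the maximum size never comes into play.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class TunedPoolExample {

    // Illustrative numbers only: the right values come out of profiling your own workload.
    static ThreadPoolExecutor newTunedPool() {
        return new ThreadPoolExecutor(
                4,                              // corePoolSize: threads kept even when idle
                8,                              // maximumPoolSize: extra threads are added only once the queue is full
                30, TimeUnit.SECONDS,           // keepAliveTime: idle threads above the core size are retired after this
                new ArrayBlockingQueue<>(100),  // bounded work queue: back-pressure instead of unlimited growth
                new ThreadPoolExecutor.CallerRunsPolicy()); // when saturated, the submitting thread runs the task itself
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadPoolExecutor pool = newTunedPool();
        for (int i = 0; i < 50; i++) {
            pool.execute(() -> { /* placeholder unit of work */ });
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println("completed tasks: " + pool.getCompletedTaskCount());
    }
}
```

CallerRunsPolicy is one of several built-in rejection policies; it slows producers down when the pool is saturated rather than throwing, which is often a gentler failure mode for message ingestion.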
Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 1 year ago.
I'm new to the ForkJoinPool framework. I don't quite get how it is achieved that each thread in a ForkJoinPool is certainly run by a separate processor/core to provide real parallelism, considering that there are many other threads outside the ForkJoinPool instance executing concurrently in the runtime. I have a clue that it has something to do with thread affinity. Can anyone share some ideas/links?
P.S. Of course, I mean the case when the number of threads is no greater than Runtime.getRuntime().availableProcessors().
You asked:
how is that achieved that each thread in ForkJoinPool is certainly run by separate processor/core to provide real parallelism
There is no way within plain Java code to make certain cores run certain threads at certain times. A Java programmer has no direct control over parallelism.
When a Java thread is scheduled for execution on a CPU core, and for how long that execution runs, is up to the host OS thread technology being leveraged by your Java implementation.
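To make the distinction concrete: the int passed to the ForkJoinPool constructor is a parallelism level, roughly how many worker threads the pool tries to keep active, and nothing in the API names a core. The sketch below (assuming Java 8 or later; ForkJoinParallelismExample, SumTask and parallelSum are hypothetical names) sums a range with a RecursiveTask and records which threads ran pieces of it. Which cores those threads landed on is left entirely to the OS scheduler.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class ForkJoinParallelismExample {

    // Sums the half-open range [from, to) by splitting it until the pieces are small.
    static final class SumTask extends RecursiveTask<Long> {
        private static final long THRESHOLD = 10_000;
        private final long from;
        private final long to;
        private final Set<String> threadNames; // records which threads ran a piece

        SumTask(long from, long to, Set<String> threadNames) {
            this.from = from;
            this.to = to;
            this.threadNames = threadNames;
        }

        @Override
        protected Long compute() {
            threadNames.add(Thread.currentThread().getName());
            if (to - from <= THRESHOLD) {
                long sum = 0;
                for (long i = from; i < to; i++) {
                    sum += i;
                }
                return sum;
            }
            long mid = (from + to) >>> 1;
            SumTask left = new SumTask(from, mid, threadNames);
            left.fork();                                              // make the left half available to idle workers
            long right = new SumTask(mid, to, threadNames).compute(); // work on the right half in this thread
            return right + left.join();
        }
    }

    // 'parallelism' is a target number of worker threads, not a set of cores.
    static long parallelSum(long n, int parallelism, Set<String> threadNames) {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.invoke(new SumTask(0, n, threadNames));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        Set<String> names = ConcurrentHashMap.newKeySet();
        long sum = parallelSum(1_000_000, cores, names);
        // Which physical cores those threads ran on was decided by the OS, not by this code.
        System.out.println("sum = " + sum + ", threads used = " + names.size()
                + ", parallelism requested = " + cores);
    }
}
```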
As for processor affinity, also known as CPU pinning, see How to set a Java thread's cpu core affinity?. Note the caveat in the answer by rdalmeida:
… thread affinity is pointless unless you have previously isolated the core from kernel/user threads and hardware interrupts
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 4 years ago.
If a distributed computing framework spins up nodes for running Java/Scala operations, then it has to include the JVM in every container. E.g. every Map and Reduce step spawns its own JVM.
How does the efficiency of this instantiation compare to spinning up containers for languages like Python? Is it a question of milliseconds, a few seconds, or 30 seconds? Does this cost add up in frameworks like Kubernetes, where you need to spin up many containers?
I've heard that, much like Alpine Linux is just a few MB, there are stripped-down JVMs, but there must still be a cost. Yet Scala is the first-class citizen in Spark, and MapReduce is written in Java.
Linux container technology uses layered filesystems, so bigger container images don't generally have a ton of runtime overhead, though you do have to download the image the first time it is used on a node, which can potentially add up on truly massive clusters. In general this is not something to worry about, aside from the well-known issue of most JVMs being a bit slow to start up. Spark, however, does not spin up a new container for every operation as you describe. It creates a set of executor containers (pods) which are used for the whole Spark execution run.
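If you want a number for your own image rather than a rule of thumb, one rough approach (a sketch, assuming a JVM that ships the java.management module, which a heavily stripped-down runtime might leave out) is to compare the start time the JVM reports through RuntimeMXBean with the clock at the top of main(). JvmStartupProbe and millisSinceJvmStart are hypothetical names. This covers only the JVM's own start-up, not image download or container creation.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.RuntimeMXBean;

class JvmStartupProbe {

    // Rough figure: milliseconds between the start time the JVM recorded for itself and "now".
    // Work done before the JVM records that time (e.g. process launch) may not be included,
    // and loading the management classes adds a little on top.
    static long millisSinceJvmStart() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        return System.currentTimeMillis() - runtime.getStartTime();
    }

    public static void main(String[] args) {
        // Measured as the very first thing in main(), this approximates the start-up cost
        // (VM initialisation plus class loading) paid before application code runs.
        System.out.println("time to reach main(): ~" + millisSinceJvmStart() + " ms");
    }
}
```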
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 5 years ago.
I am trying to understand multi-threading concepts. Does support for multi-threading come from:
1) The operating system? (OR)
2) The language itself? (like Java, if I am correct)
What is the role of the CPU? Is multi-threading capability also due to the CPU (not considering multi-core processors)?
Can there be a scenario where the OS or CPU doesn't support multi-threading, but it is still possible with the language itself?
Can anyone help me understand this?
The question "Understanding java's native threads and the jvm" has some information.
Basically, in Java's case, multithreading support comes from both the OS and Java. This is not the case for, for instance, Python: the standard CPython just wraps the operating system's native threads (POSIX threads on Linux).
https://superuser.com/questions/740611/what-is-the-difference-between-multithreading-and-hyperthreading details what a CPU does in order to multithread.
In theory, yes, a language can implement threading itself. Depending on how you look at it, C doesn't rely on the OS; it does its own threading (mostly because the OS is written in C). The above link also says this.
A language doing its own threading may not be as efficient as OS-level threading, so OS threads are preferred and commonly present.
A thread is a sequence of instructions that can be managed independently from other such sequences by a scheduler.
Usually, the scheduler is part of the operating system (for example, Linux's Completely Fair Scheduler).
In some approaches (for example, green threads, stackless Python), the scheduler is part of the language or the runtime environment.
In modern computing environments it is usually the case that the number of threads exceeds the number of CPU cores. This is usually handled through time slicing, when threads take turns running on the available hardware. It is the job of the scheduler to manage that.
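The time-slicing point can be observed from Java, whose standard (platform) threads are scheduled by the OS on mainstream JVMs such as HotSpot. The sketch below (TimeSlicingDemo and runOversubscribed are hypothetical names) starts four CPU-bound threads per available core. Since there are more runnable threads than cores, the scheduler has to interleave them, and yet every thread completes its work.

```java
import java.util.ArrayList;
import java.util.List;

class TimeSlicingDemo {

    // Starts threadsPerCore x (number of cores) CPU-bound threads. With more runnable
    // threads than cores, the OS scheduler time-slices them, yet every one finishes.
    static long[] runOversubscribed(int threadsPerCore) throws InterruptedException {
        int cores = Runtime.getRuntime().availableProcessors();
        int threadCount = cores * threadsPerCore;
        long[] results = new long[threadCount];
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            final int index = i;
            Thread t = new Thread(() -> {
                long sum = 0;
                for (int j = 1; j <= 2_000_000; j++) {
                    sum += j; // busy work that keeps a core occupied for a while
                }
                results[index] = sum; // each thread writes only its own slot
            }, "busy-" + i);
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            t.join(); // join() also guarantees the thread's writes are visible here
        }
        return results;
    }

    public static void main(String[] args) throws InterruptedException {
        long[] results = runOversubscribed(4);
        System.out.println(results.length + " threads finished on "
                + Runtime.getRuntime().availableProcessors() + " cores");
    }
}
```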
Closed. This question is opinion-based. It is not currently accepting answers.
Closed 6 years ago.
I am using JSP/Servlets for my upcoming web application project. It is a high-traffic website with many concurrent users. There has been much discussion about performance issues in Java 8, especially with Streams.
Does anyone have specific knowledge about the performance of streams, and whether it's advisable to use them in high-traffic web applications without compromising latency and response time?
As a general statement, outside of Java 8 Streams, it is pretty much impossible to answer your question as stated because it depends.
If you've got a method that is called hundreds of times per second then you would need to be very careful about performance. You'd want to tune it the best you could. Conversely, if you've got a method that gets called once a day then you likely wouldn't spend too much time optimizing it.
Streams are a useful tool when used correctly and they are easy to abuse. I've seen developers who thought it was a great idea to read an entire database table and use filtering with streams to effectively do a SQL "where" clause. That's a bad design but it honestly wouldn't be seen in the once-a-day method call.
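To make that anti-pattern concrete, here is a minimal sketch in which an in-memory list stands in for the fetched table (Order, loadWholeTable, openOrdersFilteredInJava and an "orders" table with a status column are all hypothetical). The stream itself is not the problem: every call pays for fetching and materialising the whole table, whereas a query such as SELECT id, status FROM orders WHERE status = 'OPEN' lets the database apply the filter, possibly using an index, and return only the matching rows.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class StreamFilterAntiPattern {

    // Hypothetical row type standing in for one record of an "orders" table.
    static final class Order {
        final long id;
        final String status;

        Order(long id, String status) {
            this.id = id;
            this.status = status;
        }
    }

    // Hypothetical stand-in for "SELECT * FROM orders": every row ends up in memory.
    static List<Order> loadWholeTable(int rows) {
        return IntStream.range(0, rows)
                .mapToObj(i -> new Order(i, i % 100 == 0 ? "OPEN" : "CLOSED"))
                .collect(Collectors.toList());
    }

    // Anti-pattern: the filter a WHERE status = 'OPEN' clause would apply inside the
    // database is applied in Java, after every row has been fetched and materialised.
    static List<Order> openOrdersFilteredInJava(int rows) {
        return loadWholeTable(rows).stream()
                .filter(o -> "OPEN".equals(o.status))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 10,000 rows loaded to return 100: the cost grows with the table, not the result.
        System.out.println(openOrdersFilteredInJava(10_000).size() + " open orders");
    }
}
```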
Don't try to make these blanket "this is good or this is bad" statements. Do a good design and use the tools where they are appropriate. Optimize the parts of your application that need it but don't do premature optimizations - you'll never finish the project.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 9 years ago.
Twitter has open-sourced their MySQL source code.
This blog post http://engineering.twitter.com/2012/04/mysql-at-twitter.html mentions the different changes.
I have used MySQL as a developer executing queries but never had to dig deep into its source code. I tried going through the source code on GitHub https://github.com/twitter/mysql but was overwhelmed.
So I thought I would post here and get some help.
I would like to better understand how Twitter's changes have improved MySQL over Oracle's version.
If I were to make an apples-to-apples comparison between Oracle's version and Twitter's version, what are the specific advantages and disadvantages of each?
There are many more questions that popped into my head.
I get that this is sort of an advanced DB topic, but I would love to learn about it.
They describe the changes as including:
Add additional status variables, particularly from the internals of InnoDB. This allows us to monitor our systems more effectively and understand their behavior better when handling production workloads.
Optimize memory allocation on large NUMA systems: Allocate InnoDB's buffer pool fully on startup, fail fast if memory is not available, ensure performance over time even when server is under memory pressure.
Reduce unnecessary work through improved server-side statement timeout support. This allows the server to proactively cancel queries that run longer than a millisecond-granularity timeout.
Export and restore InnoDB buffer pool in using a safe and lightweight method. This enables us to build tools to support rolling restarts of our services with minimal pain.
Optimize MySQL for SSD-based machines, including page-flushing behavior and reduction in writes to disk to improve lifespan.
These are things that are important for a Twitter-scale web site. If you are building something that has to scale to that size - yes, those changes would be helpful. But, if you are building something more like every other mildly-popular web site in the world, you probably don't need to be concerned with their changes. Most large web sites that run MySQL run it straight from the distribution.