I want to reduce the runtime of a part of the project that currently takes 2 hours.
As currently coded, it pulls almost 700,000 UIDs from one table and distributes them across 16 threads. Each thread then connects over JDBC and fetches the row for each UID one by one, so it runs 700,000 queries, roughly 44,000 per thread.
Because only 3 to 4 fields of each row are used, my plan is to fetch the needed fields up front and not connect to the database again afterwards.
My concerns:
Because it fetches a single row by UID (which I assume should be fast), will this improve performance dramatically?
Do I need to worry about memory, cache misses, and everything else? Putting 700,000 rows with a couple of fields each into memory scares me.
Overall, do you think this will improve performance, or does it not matter that much? If it only saves 5 minutes, the testing required may not be worth it.
So do you think I should pursue this path, or focus more on the logic?
Thanks a lot
As has been suggested in various comments, you should load the records in batches. I don't know how your infrastructure is configured, but if you are using a cloud service, a database round trip can take on the order of hundreds of milliseconds.
However, that may not be your only problem. If you haven't configured connection pooling, it may well be that you are gaining nothing from your multi-threading as each thread waits to grab the database connection. Your pool size should take into account how many connections may be established concurrently (in this case, it sounds like 17 may be your number - 1 for the main thread and 16 for the workers - if I understand your architecture correctly).
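As a rough sketch of the batching idea (the table and column names my_table, uid, field_a, field_b, and the batch size of 1,000 are placeholders, not taken from the question), you could fetch the few fields you need for a whole chunk of UIDs in one round trip instead of one query per UID:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BatchFetcher {

    // Fetch only the columns that are actually needed, for a whole batch of UIDs, in one query.
    static List<String[]> fetchBatch(Connection conn, List<Long> uids) throws SQLException {
        // Build one "?" placeholder per UID in the batch.
        String placeholders = String.join(", ", Collections.nCopies(uids.size(), "?"));
        String sql = "SELECT uid, field_a, field_b FROM my_table WHERE uid IN (" + placeholders + ")";

        List<String[]> rows = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < uids.size(); i++) {
                ps.setLong(i + 1, uids.get(i));
            }
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    rows.add(new String[] {
                            rs.getString("uid"), rs.getString("field_a"), rs.getString("field_b")});
                }
            }
        }
        return rows;
    }

    // Split the full UID list into chunks and fetch each chunk with one query,
    // turning 700,000 single-row queries into roughly 700 batched ones.
    static void fetchAll(Connection conn, List<Long> allUids) throws SQLException {
        int batchSize = 1_000;
        for (int i = 0; i < allUids.size(); i += batchSize) {
            List<Long> batch = allUids.subList(i, Math.min(i + batchSize, allUids.size()));
            List<String[]> rows = fetchBatch(conn, batch);
            // ... hand `rows` off to the worker threads here ...
        }
    }
}

Whether each worker gets its own pooled connection, or the batches are fetched on one connection and only the processing is parallelized, is a separate sizing decision; that is where the pool size discussed above comes in.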
I have a ConcurrentHashMap which stores an ID and the timestamp at which that ID was last updated.
This data needs to be persistent: after a restart, the application should see the state the data was in when shutdown was called.
I am currently serializing the map to a file on shut-down and loading it back when the application restarts.
However, I can foresee the serialization on shutdown failing when the disk is full, and that would mean data loss, which is unacceptable.
I thought of using a DB to store the data, but that would add network overhead on every update.
The only thing that comes to mind right now is to serialize the map on every update. That would ensure that most of the data is persisted even when the disk fills up, and even in the case of an unexpected shutdown.
I am aware that this is a heavy operation and am open to alternative solutions.
Also note that this map may hold over 1.2 million entries...
Thanks in advance
If your scenario allows some data loss, there are a few options:
1. Periodically save a snapshot of your map, so at most you lose the updates from one interval.
2. For a strict scenario, log each action to an append-only file so that you can replay the log and rebuild the current state. Since the log is only appended to and rarely read, it should not be a big performance hit; this log-based technique is what systems like ZooKeeper use for metadata storage.
3. Or persist to some kind of DB asynchronously, pushing updates onto a queue and processing them in batches.
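A minimal sketch of option 1 (periodic snapshots), assuming the map is a ConcurrentHashMap<String, Long> and that serializing to a temporary file and atomically renaming it is acceptable; the class name and the interval are illustrative:

import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SnapshotPersister {
    private final ConcurrentHashMap<String, Long> map;
    private final Path target;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    SnapshotPersister(ConcurrentHashMap<String, Long> map, Path target) {
        this.map = map;
        this.target = target;
    }

    // Snapshot the map every `intervalSeconds`; at most one interval of updates can be lost.
    void start(long intervalSeconds) {
        scheduler.scheduleAtFixedRate(this::snapshot, intervalSeconds, intervalSeconds, TimeUnit.SECONDS);
    }

    private void snapshot() {
        try {
            // Write to a temp file first, then atomically replace the old snapshot,
            // so a crash or a full disk mid-write never corrupts the previous snapshot.
            Path tmp = target.resolveSibling(target.getFileName() + ".tmp");
            try (ObjectOutputStream out = new ObjectOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(tmp)))) {
                out.writeObject(new java.util.HashMap<>(map)); // copy for a consistent snapshot
            }
            Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            e.printStackTrace(); // log and keep the previous snapshot intact
        }
    }

    void stop() {
        scheduler.shutdown();
    }
}

Writing to a temporary file first means a failure in the middle of a snapshot leaves the previous snapshot intact, which addresses the disk-full concern for everything except the most recent interval.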
I have a requirement to write a small utility to test APIs (of course there are existing tools, but it has been decided to write one). I need to bombard the API, with the same API call, from say 100 threads, around 100,000 times in total.
I am using PoolingHttpClientConnectionManager to make the calls, along the lines of what is described in the link below:
https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html
My question is:
(1) How can I run the above code for 100,000 iterations? Using that many threads is obviously a bad idea. I initially thought of using an ExecutorService to control the thread count and the number of jobs submitted, but it felt redundant.
(2) I read about setMaxTotal (maximum connections) and setDefaultMaxPerRoute (concurrent connections per route), but I don't think they will help achieve (1), though I will obviously need to increase their values.
Please advise. Thanks in advance.
You could use a thread pool and submit the worker function the required number of times. Then you could even vary the number of worker threads executing the function to simulate different load situations.
Threadpool tutorial:
https://docs.oracle.com/javase/tutorial/essential/concurrency/pools.html
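For example, a minimal sketch of that approach with HttpClient 4.x (the version the linked tutorial covers); the URL, thread count, and iteration count are placeholders:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

public class ApiLoadTest {
    public static void main(String[] args) throws Exception {
        int threads = 100;        // concurrent workers
        int iterations = 100_000; // total calls

        // Size the connection pool to match the number of concurrent workers.
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(threads);
        cm.setDefaultMaxPerRoute(threads);

        CloseableHttpClient client = HttpClients.custom().setConnectionManager(cm).build();

        // 100 threads work through 100,000 queued tasks; we never create 100,000 threads.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < iterations; i++) {
            pool.submit(() -> {
                HttpGet get = new HttpGet("http://localhost:8080/api/under/test"); // placeholder URL
                try (CloseableHttpResponse response = client.execute(get)) {
                    EntityUtils.consume(response.getEntity()); // release the connection back to the pool
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        client.close();
    }
}

The fixed pool keeps the thread count at 100 while the 100,000 submitted tasks queue up inside the executor, and sizing setMaxTotal / setDefaultMaxPerRoute to the worker count keeps the threads from blocking on the connection pool.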
Why don't you use JMeter for such performance/load testing?
Hello Stack Overflow, I have a question: what is the proper way to solve this?
I have this SQL statement:
SELECT *
FROM task
, subject
WHERE task.id_subject = subject.id
AND task.id_tasktype = 1
AND subject.id_evaluation = {ALL the ids of table evaluation}
If I want to execute this statement for every evaluation, what is more efficient: a loop/cursor (or similar) in SQL (I have only basic knowledge of SQL), or a regular for-each in Java?
It depends on your situation. Basically, if your database server and application server are actually two different machines, then you might decide to run the loop on the server which can handle more load. You need to look at some statistics to be able to determine this.
Also, you can implement both solutions and measure the time needed at the db server plus the time needed at the application server. If one of the loops is consistently quicker than the other, then it is practically more efficient in the scenario you are running it in, according to your experiments. Of course, the scenario might change over time.
Generally speaking, people tend to run this loop on the application server (Java), since you might need to execute some things available only there in the future, but if you have a very good reason to run this on the database server, like the case when a trigger should trigger this functionality, then you might decide to run it there.
Basically, you are trying to optimize a loop where you do not necessarily have a problem. If you encounter performance issues, then you might decide to experiment with a few things, including, but not limited to the suggestions shown in your question.
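For reference, a minimal sketch of the "regular for each in Java" option from the question, run over JDBC; the query for the evaluation ids is an assumption, and the task/subject statement follows the question:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class PerEvaluationLoop {

    static void run(Connection conn) throws SQLException {
        // First collect the evaluation ids (assumed to live in an `evaluation` table).
        List<Long> evaluationIds = new ArrayList<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT id FROM evaluation")) {
            while (rs.next()) {
                evaluationIds.add(rs.getLong("id"));
            }
        }

        // Then run the task/subject query once per evaluation id.
        String sql = "SELECT * FROM task, subject "
                   + "WHERE task.id_subject = subject.id "
                   + "AND task.id_tasktype = 1 "
                   + "AND subject.id_evaluation = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (Long evaluationId : evaluationIds) {
                ps.setLong(1, evaluationId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // ... process each task row for this evaluation ...
                    }
                }
            }
        }
    }
}

Depending on what you do with the rows, a single statement that joins the evaluation table and drops the per-id loop entirely may also be worth considering, but that changes how the results are grouped on the Java side.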
I have three million rows in a database table and I need to load all the values as objects and operate on those objects. What is the best possible solution?
The best solution is to not load all of them.
Why would you need to load them all and operate on them?
Maybe you can write a stored procedure (SP) and work on these rows on the DB server.
If you still need to load them all, try not to load all the columns of these rows.
Maybe you can use something like paging (if that is applicable to your case).
My answer is maybe too general but so is your question.
As Peter said, don't load them all. Instead, use an iterator, like a database cursor (ResultSet for the rows in a SQL query) to keep track of your place in the data. For any more specific answer, you'll need to give more detail, but you should also consider whether you can use SQL aggregation functions (COUNT, GROUP BY, etc.) to reduce the number of rows your application needs to process.
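A minimal sketch of that cursor-style approach over JDBC; the table and column names are placeholders, and note that the fetch-size hint behaves differently across drivers (PostgreSQL, for instance, only uses a cursor inside a transaction, and MySQL needs Integer.MIN_VALUE or useCursorFetch for true streaming):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class RowStreamer {

    // Stream through the rows with a forward-only cursor instead of loading
    // all three million into memory at once. Table/column names are placeholders.
    static void processAll(Connection conn) throws SQLException {
        conn.setAutoCommit(false); // some drivers only use a server-side cursor inside a transaction
        try (Statement st = conn.createStatement(ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
            st.setFetchSize(1_000); // hint: fetch rows from the server in chunks of 1,000
            try (ResultSet rs = st.executeQuery("SELECT id, value FROM big_table")) {
                while (rs.next()) {
                    long id = rs.getLong("id");
                    String value = rs.getString("value");
                    // ... operate on one row at a time; nothing else is kept in memory ...
                }
            }
        }
    }
}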
I have a query which returns thousands of records (or at some point it will). In that query I have something like this:
CASE WHEN column IN (:params1) THEN :param2
     WHEN column IN (:params3) THEN :param4
     WHEN column IN (:params5) THEN :param6
     WHEN column IN (:params7) THEN :param8
END ABC
Now the question is: is it better to do this in the query, or to return the raw column value and do the if/else in the POJO? And why? I tried testing it, but I currently don't have that much data.
Usually it is better (both because of performance and complexity) to let the database do as much work as possible for you. Doing the work in your application is likely to incur more network traffic than is necessary (which would decrease performance) and the code would have to contain all the nasty logic in it which would add complexity.
Also remember to avoid premature optimization. Try to avoid fixing problems that you don't have yet.
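For contrast, a minimal sketch of the POJO-side alternative the question mentions, with placeholder values standing in for the :params lists; as argued above, keeping this inside the CASE expression usually means less data over the network and less logic in the code:

import java.util.HashMap;
import java.util.Map;

public class AbcMapper {

    // Java-side equivalent of the CASE expression: map raw column values to labels.
    // The concrete values and labels here are placeholders.
    private static final Map<String, String> ABC_BY_COLUMN_VALUE = new HashMap<>();
    static {
        ABC_BY_COLUMN_VALUE.put("A1", "group1"); // column IN (:params1) -> :param2
        ABC_BY_COLUMN_VALUE.put("A2", "group1");
        ABC_BY_COLUMN_VALUE.put("B1", "group2"); // column IN (:params3) -> :param4
        // ...
    }

    static String abcFor(String columnValue) {
        return ABC_BY_COLUMN_VALUE.getOrDefault(columnValue, null);
    }
}

With the Java-side version, every row has to carry the raw column value to the middle tier before it can be classified, which is exactly the extra traffic the CASE expression avoids.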
I would recommend letting the database do the work.
Returning thousands of records to the middle tier, operating on them, and shoving the result back into the database makes no sense to me. Why do all that network back and forth?
If you are truly processing that many records, I'd recommend considering letting the database do the work. No network traffic that way.
If not possible, you should make sure you truly need all those records. I'm betting you only think you do.
Writing queries this way seems like another bad idea to me.