I have a requirement where I am writing a small utility to load-test APIs (of course existing tools exist, but it has been decided to write one). I am required to bombard the API, for the same API call, with say 100 threads, around say 100,000 times in total.
I am using PoolingHttpClientConnectionManager to make the calls, following the approach described in the link below:
https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html
My question is:
(1) How can I run the above code for 100,000 iterations? Using that many threads is obviously a bad idea. I initially thought of using an ExecutorService to control the thread count and the number of jobs submitted, but it felt redundant.
(2) I read about setMaxTotal (maximum total connections) and setDefaultMaxPerRoute (maximum concurrent connections per route), but I don't think they help achieve (1), though I will obviously need to increase their values.
Please advise. Thanks in advance.
You could use a thread pool and submit the worker function the required number of times. Then you could even vary the number of worker threads executing the functions to simulate different load situations.
Thread pool tutorial:
https://docs.oracle.com/javase/tutorial/essential/concurrency/pools.html
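A minimal sketch of that thread-pool approach, assuming Apache HttpClient 4.x; the endpoint http://example.com/api is a placeholder, and the thread and iteration counts are the question's numbers:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

public class ApiLoadTest {
    public static void main(String[] args) throws InterruptedException {
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(100);           // at most 100 open connections overall
        cm.setDefaultMaxPerRoute(100); // all of them may target the same host

        CloseableHttpClient client = HttpClients.custom()
                .setConnectionManager(cm)
                .build();

        ExecutorService pool = Executors.newFixedThreadPool(100); // 100 worker threads
        for (int i = 0; i < 100_000; i++) {                       // 100,000 requests in total
            pool.submit(() -> {
                try (CloseableHttpResponse response =
                             client.execute(new HttpGet("http://example.com/api"))) {
                    EntityUtils.consume(response.getEntity()); // return the connection to the pool
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}

The fixed-size pool bounds the number of live threads while the queue inside the ExecutorService holds the remaining jobs, so 100,000 submissions never means 100,000 threads.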
Why don't you use JMeter for this kind of performance/load testing?
I have not used Spring Batch so far, and I was wondering whether this is the time to break the ice.
I have large CSV files with 10k to 30k lines that I need to import into a database. I would also need to do some processing of that data, like checking whether certain fields exist in the database (if not, the row should be ignored). Since I have never used Spring Batch, I would need to spend some time getting to know it before I could use it, and I have a really strict deadline for this.
But is it really worth it for this kind of work? It would not be a scheduled job run on a daily, weekly, or monthly basis; it would be run as needed, maybe once every few months.
So is it "overkill" to use batch processing, or would it be fine to just iterate line by line with a buffered reader?
If it is a one-off job for "just" 10k-20k lines, Spring Batch is complete overkill and you are better off writing something smaller yourself.
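For scale, a plain BufferedReader plus JDBC batching is only a few dozen lines. A rough sketch, in which the file name, the connection URL, and the customer/orders tables and columns are all made-up placeholders for the question's "check a field exists, otherwise skip the row" logic:

import java.io.BufferedReader;
import java.math.BigDecimal;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CsvImport {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/mydb", "user", "pass");
             BufferedReader reader = Files.newBufferedReader(Paths.get("input.csv"))) {
            con.setAutoCommit(false);
            try (PreparedStatement exists =
                         con.prepareStatement("SELECT 1 FROM customer WHERE code = ?");
                 PreparedStatement insert = con.prepareStatement(
                         "INSERT INTO orders (customer_code, amount) VALUES (?, ?)")) {
                String line;
                int batched = 0;
                while ((line = reader.readLine()) != null) {
                    String[] cols = line.split(","); // naive split; use a CSV library for quoted fields
                    exists.setString(1, cols[0]);
                    try (ResultSet rs = exists.executeQuery()) {
                        if (!rs.next()) continue;    // referenced row missing: ignore this line
                    }
                    insert.setString(1, cols[0]);
                    insert.setBigDecimal(2, new BigDecimal(cols[1]));
                    insert.addBatch();
                    if (++batched % 1000 == 0) insert.executeBatch(); // flush in chunks
                }
                insert.executeBatch();
            }
            con.commit();
        }
    }
}

For 10k-30k lines this kind of loop is usually fast enough, and there is no framework to learn.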
I have a question: what is the proper way to solve this?
I have this SQL statement:
SELECT *
FROM task, subject
WHERE task.id_subject = subject.id
  AND task.id_tasktype = 1
  AND subject.id_evaluation = {ALL the ids of table evaluation}
If I want to execute this statement for every evaluation, what is more efficient: a loop/cursor (or whatever) in SQL (I have basic knowledge of SQL), or a regular for-each in Java?
It depends on your situation. Basically, if your database server and application server are actually two different computers, then you might decide to run the loop on whichever server can handle more load. You need to look at some statistics to be able to determine this.
Also, you can implement both solutions and measure the time needed at the db server plus the time needed at the application server. If one of the loops is consistently quicker than the other, then it is practically more efficient in the scenario you are running it in, according to your experiments. Of course, the scenario might change over time.
Generally speaking, people tend to run this loop on the application server (Java), since you might later need to do things that are only available there, but if you have a very good reason to run it on the database server, such as when a trigger should fire this functionality, then you might decide to run it there.
Basically, you are trying to optimize a loop where you do not necessarily have a problem. If you encounter performance issues, then you might decide to experiment with a few things, including, but not limited to, the suggestions shown in your question.
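If you do run the loop in Java, a minimal sketch might look like the following; the connection URL is a placeholder, the table and column names come from the question, and the commented-out query at the end shows the set-based alternative that avoids the loop entirely:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class PerEvaluationQuery {
    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost/mydb", "user", "pass")) {
            String perEvaluation =
                    "SELECT * FROM task, subject "
                  + "WHERE task.id_subject = subject.id "
                  + "AND task.id_tasktype = 1 "
                  + "AND subject.id_evaluation = ?";
            try (PreparedStatement ids = con.prepareStatement("SELECT id FROM evaluation");
                 ResultSet evalIds = ids.executeQuery();
                 PreparedStatement stmt = con.prepareStatement(perEvaluation)) {
                while (evalIds.next()) {                 // one round trip per evaluation id
                    stmt.setLong(1, evalIds.getLong(1));
                    try (ResultSet rs = stmt.executeQuery()) {
                        while (rs.next()) {
                            // process one task row here
                        }
                    }
                }
            }
            // Set-based alternative, one statement and no loop at all:
            // SELECT * FROM task, subject, evaluation
            // WHERE task.id_subject = subject.id
            //   AND task.id_tasktype = 1
            //   AND subject.id_evaluation = evaluation.id
        }
    }
}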
I want to reduce the running time of a part of a project that currently takes 2 hours.
As currently coded, it pulls almost 700,000 UIDs from one table and hands them to 16 different threads. Each thread then connects via JDBC and fetches the row for each UID one by one, so it runs 700,000 queries in total, roughly 44k per thread.
Because it only uses 3 to 4 fields of each row, my plan is to fetch the needed fields up front and not connect to the database again.
My concerns:
Since it fetches a row by UID (which I assume should be fast), will this change improve performance dramatically?
I need to worry about memory, cache misses, and so on; putting 700,000 rows with a couple of fields each in memory scares me.
Overall, do you think this will help improve performance, or does it not matter that much? Saving 5 minutes would not be worth the testing required.
So do you think I should pursue this path or focus more on the logic?
Thanks a lot
As has been suggested in various comments, you should load the records in batches. I don't know how your infrastructure is configured, but if you are using a cloud service, a database round trip can take on the order of hundreds of milliseconds.
However, that may not be your only problem. If you haven't configured connection pooling, it may well be that you are gaining nothing from your multi-threading, as each thread waits to grab a database connection. Your pool size should take into account how many connections may be established concurrently (in this case it sounds like 17 may be your number: 1 for the main thread and 16 for the workers, if I understand your architecture correctly).
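A sketch of the batching idea, assuming a hypothetical record table with a uid column and the three or four fields actually needed; instead of one query per UID, the UIDs are fetched 1,000 at a time with an IN clause:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Collections;
import java.util.List;

public class BatchedFetch {
    private static final int BATCH_SIZE = 1000;

    // Fetches only the needed fields for the given UIDs, 1,000 at a time.
    static void fetchInBatches(Connection con, List<Long> uids) throws Exception {
        for (int from = 0; from < uids.size(); from += BATCH_SIZE) {
            List<Long> chunk = uids.subList(from, Math.min(from + BATCH_SIZE, uids.size()));
            // Build "?, ?, ..." with one placeholder per UID in this chunk.
            String placeholders = String.join(", ", Collections.nCopies(chunk.size(), "?"));
            String sql = "SELECT uid, field_a, field_b, field_c FROM record WHERE uid IN ("
                    + placeholders + ")";
            try (PreparedStatement ps = con.prepareStatement(sql)) {
                for (int i = 0; i < chunk.size(); i++) {
                    ps.setLong(i + 1, chunk.get(i));
                }
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        // work with rs.getLong("uid"), rs.getString("field_a"), ...
                    }
                }
            }
        }
    }
}

This turns 700,000 round trips into 700, and since only the needed columns come back, the memory footprint is a few fields per row rather than whole rows.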
I need to run some calculations on a distributed map, but I cannot decide which approach to take.
My calculations will result in a map data structure, where the results are mapped to their keys. Think of it as a word-count example, where the word is the key and the occurrence count is the value.
I have looked into both solutions, and as I understand it, MapReduce fits this scenario best, but I want to keep things simple, and I also cannot see why this would not be possible with a distributed executor.
Both options are possible. Before we had the generic MapReduce framework, people built solutions like this using the ExecutorService implementation.
At the moment (this will change in the near future), the MapReduce solution does not offer a way to write to an IMap directly, so all results are sent to the caller first, and the caller has to store them.
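A sketch of the executor approach, assuming Hazelcast 3.x; the map name "texts" and the whitespace tokenization are placeholders. Each member counts the words in the entries it owns, and the caller merges the partial maps:

import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import com.hazelcast.core.IExecutorService;
import com.hazelcast.core.IMap;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

public class DistributedWordCount {

    // Runs on each member and counts words only in the entries that member owns.
    static class LocalCount
            implements Callable<Map<String, Long>>, Serializable, HazelcastInstanceAware {
        private transient HazelcastInstance hz;

        @Override
        public void setHazelcastInstance(HazelcastInstance hz) { this.hz = hz; }

        @Override
        public Map<String, Long> call() {
            IMap<String, String> texts = hz.getMap("texts");
            Map<String, Long> counts = new HashMap<>();
            for (String key : texts.localKeySet()) {             // locally owned entries only
                for (String word : texts.get(key).split("\\s+")) {
                    counts.merge(word, 1L, Long::sum);
                }
            }
            return counts;
        }
    }

    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IExecutorService executor = hz.getExecutorService("wordCountExec");

        // Fan out to every member, then merge the partial counts on the caller.
        Map<String, Long> total = new HashMap<>();
        for (Future<Map<String, Long>> partial
                : executor.submitToAllMembers(new LocalCount()).values()) {
            partial.get().forEach((word, n) -> total.merge(word, n, Long::sum));
        }
        total.forEach((word, n) -> System.out.println(word + " = " + n));
    }
}

Merging on the caller mirrors the limitation mentioned above: whichever approach you pick, the combined result currently has to be stored back into an IMap by the caller itself.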
I have a query that returns thousands of records (or will at some point). In that query I have something like this:
CASE WHEN column IN (:params1) THEN :param2
     WHEN column IN (:params3) THEN :param4
     WHEN column IN (:params5) THEN :param6
     WHEN column IN (:params7) THEN :param8
END ABC
Now the question is: is it better to do this in the query, or to return the raw column value and do the if/else in the POJO? And why? I tried testing it, but I currently don't have that much data.
Usually it is better (both for performance and complexity) to let the database do as much of the work as possible for you. Doing the work in your application is likely to incur more network traffic than necessary (which would decrease performance), and the code would have to contain all the nasty logic, which would add complexity.
Also remember to avoid premature optimization. Try to avoid fixing problems that you don't have yet.
I would recommend letting the database do the work.
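To make the trade-off concrete, here is a rough sketch of both variants; the orders table, the status column, and the group labels are invented placeholders standing in for the question's :params:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.Set;

public class CaseInQueryVsPojo {

    // Variant 1: the database computes the derived value; Java just reads it.
    static void inQuery(Connection con) throws Exception {
        String sql = "SELECT CASE WHEN status IN ('A', 'B') THEN 'group1' "
                   + "            WHEN status IN ('C') THEN 'group2' "
                   + "            ELSE 'other' END AS abc "
                   + "FROM orders";
        try (PreparedStatement ps = con.prepareStatement(sql);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                String abc = rs.getString("abc"); // already classified by the database
            }
        }
    }

    // Variant 2: the raw column comes back and Java classifies every row.
    static void inPojo(Connection con) throws Exception {
        Set<String> group1 = Set.of("A", "B");
        try (PreparedStatement ps = con.prepareStatement("SELECT status FROM orders");
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                String status = rs.getString("status");
                String abc = group1.contains(status) ? "group1"
                           : "C".equals(status)      ? "group2"
                           : "other";
            }
        }
    }
}

Variant 1 keeps the mapping next to the data and needs no extra Java logic; with variant 2 the classification rules must be duplicated anywhere else the query is reused.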
Returning thousands of records to the middle tier, operating on them, and shoving the result back into the database makes no sense to me. Why do all that network back and forth?
If you are truly processing that many records, I'd recommend considering letting the database do the work. No network traffic that way.
If that is not possible, you should make sure you truly need all those records. I'm betting you only think you do.
Writing queries this way seems like another bad idea to me.