Import large csv files spring [closed] - java

I have not used Spring Batch so far, and I was wondering if this is maybe the time to break the ice.
I have large CSV files with 10k to 30k lines that I need to import into a database. I also need to do some processing of that data, like checking whether certain fields already exist in the database (if not, the row should be ignored). I have never used Spring Batch, so I would need to spend some time getting to know it before I can use it, and I also have a really strict deadline for this.
But is it really worth it for this kind of work? It would not be a scheduled job run on a daily, weekly, or monthly basis; it would be run as needed, maybe once every few months.
So is it "overkill" to use batch processing, or would it be fine to just iterate line by line with a buffered reader?

If it is a one-off job for "just" 10k-20k lines, Spring Batch is complete overkill and you are better off writing something smaller yourself.
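For a job of that size, a plain loop over the file is often enough. Below is a minimal sketch of the "something smaller" approach, assuming a hypothetical MySQL database and hypothetical table/column names (customers, orders, customer_code): read the CSV with a BufferedReader, skip rows whose referenced key does not exist in the database, and batch-insert the rest.

```java
import java.io.BufferedReader;
import java.math.BigDecimal;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class CsvImporter {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection string, file name, and table/column names.
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:mysql://localhost/mydb", "user", "pass");
             BufferedReader reader = Files.newBufferedReader(Paths.get("import.csv"))) {

            conn.setAutoCommit(false);

            PreparedStatement exists = conn.prepareStatement(
                    "SELECT COUNT(*) FROM customers WHERE customer_code = ?");
            PreparedStatement insert = conn.prepareStatement(
                    "INSERT INTO orders (customer_code, amount) VALUES (?, ?)");

            reader.readLine(); // skip the header row
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split(",");

                // Ignore rows whose referenced field does not exist in the db.
                exists.setString(1, fields[0]);
                try (ResultSet rs = exists.executeQuery()) {
                    rs.next();
                    if (rs.getInt(1) == 0) continue;
                }

                insert.setString(1, fields[0]);
                insert.setBigDecimal(2, new BigDecimal(fields[1]));
                insert.addBatch();
            }
            insert.executeBatch();
            conn.commit();
        }
    }
}
```

With batching and a single commit at the end, a file of this size typically imports in seconds, and there is no framework to learn under a tight deadline.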

Related

What is the most efficient way to use a database? [closed]

I'm new to using a database, specifically MySQL. I'm creating a web application for class in which you can look up the name of a book and it'll display the book's summary. My question is: should I send a query to the database that collects all of the books' data on initialization and put it into a HashMap inside a manager class for lookups, or should I query the database each time to look up a specific book's information?
It depends on the data transfer time, I would say. If your average query time multiplied by the number of requests is lower than the cost of loading everything into a HashMap up front, use queries. Otherwise, collect everything once and put it into a HashMap.
But if you have thousands of rows, you should use queries, because otherwise you will use too much RAM.
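To make the trade-off concrete, here is a minimal sketch of both options, assuming a hypothetical books table with title and summary columns: one method loads everything into a HashMap at startup, the other runs a query per lookup.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;

public class BookManager {
    private final Map<String, String> summariesByTitle = new HashMap<>();

    // Option 1: load everything once at startup and serve lookups from memory.
    public void loadAll(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT title, summary FROM books")) {
            while (rs.next()) {
                summariesByTitle.put(rs.getString("title"), rs.getString("summary"));
            }
        }
    }

    public String lookupCached(String title) {
        return summariesByTitle.get(title);
    }

    // Option 2: hit the database on every lookup and keep nothing in memory.
    public String lookupQuery(Connection conn, String title) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "SELECT summary FROM books WHERE title = ?")) {
            ps.setString(1, title);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString("summary") : null;
            }
        }
    }
}
```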

make a GET call 100,000 times [closed]

I have a requirement where I am writing a small utility to test APIs (of course there are existing tools, but it has been decided to write one). I am required to bombard the API, with the same API call, using say 100 threads, around 100,000 times.
I am using PoolingHttpClientConnectionManager for making the calls. I am using something like what is described in the link below:
https://hc.apache.org/httpcomponents-client-ga/tutorial/html/connmgmt.html
My questions are:
(1) How can I run the above code for 100,000 iterations? Using that many threads is obviously a bad idea. I initially thought of using an ExecutorService to manage the thread count and the number of jobs to be submitted, but it felt redundant.
(2) I read about setMaxTotal (max connections) and setDefaultMaxPerRoute (concurrent connections per route), but I don't think they will help achieve (1), though I will obviously need to increase those values.
Please advise. Thanks in advance.
You could use a thread pool and submit the worker function the required number of times. You could then vary the number of worker threads executing the function to simulate different load situations.
Thread pool tutorial:
https://docs.oracle.com/javase/tutorial/essential/concurrency/pools.html
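A minimal sketch of that idea, assuming Apache HttpClient 4.x and a hypothetical endpoint URL: a fixed pool of 100 worker threads works through 100,000 GET tasks that all share a single PoolingHttpClientConnectionManager.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.impl.conn.PoolingHttpClientConnectionManager;
import org.apache.http.util.EntityUtils;

public class LoadTest {
    public static void main(String[] args) throws InterruptedException {
        // One connection pool shared by all worker threads.
        PoolingHttpClientConnectionManager cm = new PoolingHttpClientConnectionManager();
        cm.setMaxTotal(100);           // at most 100 open connections in total
        cm.setDefaultMaxPerRoute(100); // all of them may go to the same host

        CloseableHttpClient client = HttpClients.custom()
                .setConnectionManager(cm)
                .build();

        // 100 worker threads drain a queue of 100,000 GET tasks.
        ExecutorService pool = Executors.newFixedThreadPool(100);
        for (int i = 0; i < 100_000; i++) {
            pool.submit(() -> {
                HttpGet get = new HttpGet("http://localhost:8080/api/test"); // hypothetical endpoint
                try (CloseableHttpResponse response = client.execute(get)) {
                    // Consume the body so the connection is released back to the pool.
                    EntityUtils.consume(response.getEntity());
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

Varying the pool size (and setMaxTotal/setDefaultMaxPerRoute to match) changes the concurrency level without changing the total number of requests.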
Why don't you use JMeter for this kind of performance/load testing?

Using SQLite or a File [closed]

I am new to Android development and I am trying to make a Trivia application.
I need to store the data relating to questions somewhere and I am not entirely sure where to store it.
I plan to have multiple people playing so I need each person to have the same questions.
Basically I planned to have a list of categories and within each category I had question objects.
The question objects contain information about the question, such as the answers and the question itself.
However, if I use a database, I believe none of this would be needed, because the questions would be stored in tables representing the categories.
In terms of speed, which would be better:
to store it in a database,
or to read from a file every time the application is loaded and store the data within a data structure?
You almost certainly want a database. Databases are made for fast search and easy insertion/deletion. There's really no advantage to having a file and doing in-memory parsing each time.
Aside from the performance benefits, here's a simple list of the advantages of using SQLite rather than a flat file:
You can query items as you wish -- no need to load all of them and then select the ones you need.
Record deletion is a much less painful process. No rewriting of whole files.
Updating a record is as easy as removing or creating one.
Have you ever tried doing cross-referencing lookups on a flat file? Just. Not. Worth. It.
To summarize: it's every advantage a database has over a text file.
Answer by josephus
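For illustration, a minimal sketch of the database route on Android, assuming a hypothetical questions table with category, question, and answer columns: a SQLiteOpenHelper creates the schema once and lets you query a single category instead of loading every question into memory.

```java
import android.content.Context;
import android.database.Cursor;
import android.database.sqlite.SQLiteDatabase;
import android.database.sqlite.SQLiteOpenHelper;

import java.util.ArrayList;
import java.util.List;

public class TriviaDbHelper extends SQLiteOpenHelper {

    public TriviaDbHelper(Context context) {
        super(context, "trivia.db", null, 1);
    }

    @Override
    public void onCreate(SQLiteDatabase db) {
        // Hypothetical schema: one table, with the category as a column.
        db.execSQL("CREATE TABLE questions (" +
                "id INTEGER PRIMARY KEY, " +
                "category TEXT, " +
                "question TEXT, " +
                "answer TEXT)");
    }

    @Override
    public void onUpgrade(SQLiteDatabase db, int oldVersion, int newVersion) {
        db.execSQL("DROP TABLE IF EXISTS questions");
        onCreate(db);
    }

    // Fetch only the questions for one category instead of loading everything.
    public List<String> questionsForCategory(String category) {
        List<String> result = new ArrayList<>();
        try (Cursor cursor = getReadableDatabase().rawQuery(
                "SELECT question FROM questions WHERE category = ?",
                new String[]{category})) {
            while (cursor.moveToNext()) {
                result.add(cursor.getString(0));
            }
        }
        return result;
    }
}
```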

should I use distributed executors or mapreduce in hazelcast [closed]

I need to run some calculations on a distributed map, but I cannot decide which approach to take.
My calculations will result in a map data structure, where the results are mapped to their keys. Think of it as a word-count example, where the word is the key and the occurrence count is the value.
I have looked into both solutions and, as I understand it, map-reduce fits best in this scenario, but I want to keep things simple, and I also cannot see why this would not be possible with a distributed executor.
Both options are possible. Before we had the generic map-reduce framework, people built solutions like this using the ExecutorService implementation.
At the moment (this will change in the near future) the map-reduce solution doesn't offer a way to write to an IMap directly, so all results are sent to the caller first, and the caller then has to store them.
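To illustrate the executor-based route, here is a minimal word-count sketch, assuming Hazelcast 3.x imports (the packages moved in 4.x) and a hypothetical source IMap named "texts": each member counts the entries it owns locally, and the caller merges the partial maps and stores the combined result, as the answer above describes.

```java
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.HazelcastInstanceAware;
import com.hazelcast.core.IExecutorService;
import com.hazelcast.core.IMap;
import com.hazelcast.core.Member;

import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.Future;

public class WordCountWithExecutor {

    // Each member counts only the entries it owns and returns a partial result.
    static class LocalCountTask implements Callable<Map<String, Integer>>,
            Serializable, HazelcastInstanceAware {
        private transient HazelcastInstance hz;

        @Override
        public void setHazelcastInstance(HazelcastInstance hz) {
            this.hz = hz;
        }

        @Override
        public Map<String, Integer> call() {
            IMap<String, String> texts = hz.getMap("texts"); // hypothetical source map
            Map<String, Integer> counts = new HashMap<>();
            for (String key : texts.localKeySet()) {
                for (String word : texts.get(key).split("\\s+")) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
            return counts;
        }
    }

    public static void main(String[] args) throws Exception {
        HazelcastInstance hz = Hazelcast.newHazelcastInstance();
        IExecutorService executor = hz.getExecutorService("word-count");

        // Submit the task to every member and merge the partial maps on the caller.
        Map<Member, Future<Map<String, Integer>>> partials =
                executor.submitToAllMembers(new LocalCountTask());

        Map<String, Integer> total = new HashMap<>();
        for (Future<Map<String, Integer>> f : partials.values()) {
            f.get().forEach((word, count) -> total.merge(word, count, Integer::sum));
        }

        // The caller stores the merged result in an IMap itself.
        hz.getMap("word-counts").putAll(total);
        System.out.println(total);
    }
}
```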

How to combine many (1000) png files together into one big file where we can iterate and get all of them if needed [closed]

I am putting many images on HDFS. However, each one takes a 64MB block there. As the number of images is very high, I wanted to put all the image information into one big file, which would then be fed to a mapper to process faster. What InputFormat can I use? Or do I need to use the SequenceFile concept? I am not sure how to proceed; could someone please suggest a better way to deal with this?
Just throw them all in a zip.
Really, though, you would be better off using a database (for example MongoDB) and storing them all in there.
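Since the question mentions SequenceFile: here is a minimal sketch, assuming Hadoop 2.x and hypothetical local/HDFS paths, that packs a directory of PNGs into a single SequenceFile keyed by file name, so a mapper can later iterate over all the images without paying one HDFS block per file.

```java
import java.io.File;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class ImagePacker {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path output = new Path("hdfs:///user/me/images.seq"); // hypothetical output path

        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(output),
                SequenceFile.Writer.keyClass(Text.class),
                SequenceFile.Writer.valueClass(BytesWritable.class))) {

            // Pack every local PNG into the sequence file, keyed by its file name.
            File dir = new File("/data/images"); // hypothetical local directory
            for (File png : dir.listFiles((d, name) -> name.endsWith(".png"))) {
                byte[] bytes = Files.readAllBytes(png.toPath());
                writer.append(new Text(png.getName()), new BytesWritable(bytes));
            }
        }
    }
}
```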
