We can read/write data to SQL Server using JDBC, but for scalability reasons (volume and frequency) we want to use the bulk copy functionality.
According to the documentation here, there is a SQLServerBulkCopy class. There are numerous examples, including reading from other tables and reading from files, but there is no example of how to insert an array of rows.
SQLServerBulkCopy has three writeToServer methods, which take a RowSet, a ResultSet, and an ISQLServerBulkData. Is converting our array/list into one of these classes the only way to do a bulk copy? Is there any other way to do it?
Would be glad of any pointers if you've come across this before.
Is converting our array/list into one of these classes the only way to do a bulk copy? Is there any other way to do it?
Yes, you need to wrap your rows in one of those types to call writeToServer directly. Alternatively, the Microsoft JDBC driver can also use the bulk copy API for ordinary batched insert operations.
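Below is a minimal sketch of the batch-insert route with a plain PreparedStatement, assuming a hypothetical table dbo.MyTable(id, name); the useBulkCopyForBatchInsert connection property shown in the URL is how the Microsoft driver is asked to route such batches through the bulk copy API, so check the documentation of your driver version before relying on it.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchInsertSketch {

    // Each Object[] holds one row's column values, e.g. {42, "some name"}.
    public static void insert(List<Object[]> rows) throws Exception {
        // useBulkCopyForBatchInsert asks the driver to use the bulk copy API for batched inserts.
        String url = "jdbc:sqlserver://localhost;databaseName=MyDb;useBulkCopyForBatchInsert=true";

        try (Connection con = DriverManager.getConnection(url, "user", "password");
             PreparedStatement ps = con.prepareStatement(
                     "INSERT INTO dbo.MyTable (id, name) VALUES (?, ?)")) {
            con.setAutoCommit(false);
            int count = 0;
            for (Object[] row : rows) {
                ps.setObject(1, row[0]);
                ps.setObject(2, row[1]);
                ps.addBatch();
                if (++count % 1000 == 0) {
                    ps.executeBatch();   // flush in chunks to keep the batch size bounded
                }
            }
            ps.executeBatch();           // flush the remainder
            con.commit();
        }
    }
}

If you want to call writeToServer directly instead, populating a CachedRowSet (javax.sql.rowset) from the list and passing it in is one option.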
Related
I have a Spring Boot project that will pull a large amount of data from one database, do some kind of transformation on it, and then insert it into a table in a PostgreSQL database. This process will continue for a few billion records, so performance is key.
I've been researching the best way to do this, for example using an ORM or a JdbcTemplate. One thing I keep seeing regarding bulk inserts into PostgreSQL is the COPY command. https://www.postgresql.org/docs/current/populate.html
I'm confused because using COPY requires the data to be written into a file, and while I've seen people recommend it, I've yet to come across a case where someone mentions how to get the data into the file. Isn't writing to a file slow? And if it is, wouldn't that cancel out whatever performance gain COPY brings?
This kind of data migration and conversion is better handled in stored procedures. This assumes the source data is already loaded into Postgres (if not, use a Postgres utility to load the raw data into some flat staging table). Then write a series of stored procedures to transform the data and insert it into the destination table.
I have done some complex data migrations and I used this approach. If you have to do a lot of complex data conversion, write a Python script (which is usually faster to set up than Spring Boot/Spring Data), insert the partially converted data, then use stored procedures for the final conversion.
It is better to keep the business logic that converts/massages the data close to the data source (in stored procedures) instead of pulling the data to the app server and reinserting it.
Hope it helps.
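On the question of whether COPY requires an intermediate file: the PostgreSQL JDBC driver exposes COPY through its CopyManager API, which can stream rows straight from memory into a staging table, so no file needs to be written. A minimal sketch, assuming a hypothetical staging table raw_data(col1 text, col2 text) and local connection details:

import java.io.StringReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.List;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CopyInSketch {

    // Streams rows from memory into the (hypothetical) staging table raw_data(col1, col2).
    public static void copyRows(List<String[]> rows) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/mydb";
        try (Connection con = DriverManager.getConnection(url, "user", "password")) {
            CopyManager copy = con.unwrap(PGConnection.class).getCopyAPI();

            // Build CSV in memory; for very large chunks, stream through a Reader instead.
            StringBuilder sb = new StringBuilder();
            for (String[] row : rows) {
                sb.append(row[0]).append(',').append(row[1]).append('\n');
            }

            long inserted = copy.copyIn(
                    "COPY raw_data (col1, col2) FROM STDIN WITH (FORMAT csv)",
                    new StringReader(sb.toString()));
            System.out.println("Copied " + inserted + " rows");
        }
    }
}

From the staging table onwards, the transformation into the destination table can stay in stored procedures as described above.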
What is the best approach for saving statistical data to a file using the Spring framework? Is there any available library that offers reading and updating data in a file, or should I build my own I/O code?
I already have a relational database, but I don't like the approach of creating additional tables for the calculated values spread across multiple tables with joins, and I also don't want to add more complexity to the project by using an additional database such as MongoDB for just one task.
To understand the complexity of this report, imagine drawing a chart of the total number of daily transactions for a full year over billions of records, with a lot of extra information (totals and averages in different currencies at different rates).
So my approach was to generate that data in a file on a regular basis, so that later I don't need to generate it again when requested, only append the new dates to the file as they become available.
Is this approach fine? And what is the best library to do that efficiently?
Update
I found this answer useful for understanding why people sometimes prefer flat files over a relational or non-relational database:
Is it faster to access data from files or a database server?
I would prefer to use MongoDB for such purposes, but if you need a simple approach, you can write your data to a CSV/Excel file.
Just use plain I/O:
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

List<String> data = new ArrayList<>();
data.add("head1;head2;head3");   // header row
data.add("a;b;c");
data.add("e;f;g");
data.add("9;h;i");
Files.write(Paths.get("my.csv"), data);   // throws IOException
That is all.
How to convert your own object to such a string ('field1;field2'), I think you already know.
You can also use Apache POI or a CSV library, but I think plain I/O like this is much faster.
If you want to append data to an existing file, there are many different options in StandardOpenOption:
Files.write(Paths.get("my.csv"), data, StandardOpenOption.APPEND);   // needs import java.nio.file.StandardOpenOption
For reading, use Files.readAllLines(Paths.get("my.csv")); it returns a list of strings.
You can also read a range of lines.
But if you need to retrieve a single column, or update a couple of columns matching some condition, and so on, you should read about MongoDB or other non-relational databases. It is difficult to cover MongoDB here; read its documentation.
Enjoy!
I found a library that can read/write CSV files easily and map them to objects as well: Jackson data formats.
Find an example with Spring.
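Here is a minimal sketch of writing and reading POJOs as CSV with jackson-dataformat-csv; the DailyStat class and its fields are just placeholders for whatever your statistics rows contain.

import java.io.File;
import java.util.List;

import com.fasterxml.jackson.databind.MappingIterator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class CsvStatsSketch {

    // Hypothetical record type for one line of the statistics file.
    public static class DailyStat {
        public String date;
        public long transactionCount;
        public double totalAmount;
    }

    public static void write(File file, List<DailyStat> stats) throws Exception {
        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = mapper.schemaFor(DailyStat.class).withHeader();
        mapper.writer(schema).writeValue(file, stats);   // one CSV line per DailyStat
    }

    public static List<DailyStat> read(File file) throws Exception {
        CsvMapper mapper = new CsvMapper();
        CsvSchema schema = mapper.schemaFor(DailyStat.class).withHeader();
        MappingIterator<DailyStat> it = mapper.readerFor(DailyStat.class)
                .with(schema)
                .readValues(file);
        return it.readAll();
    }
}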
We have an application that runs with any of IBM Informix, MySQL, and Oracle, and we are using Java with Hibernate to connect to the database. We will store XML, CSV, and other text-based files inside the database (CLOB column). The corresponding entity fields in Java are byte[] objects.
One feature request for the application is now to "grep" content inside the data, i.e. I need to find all files containing a specific string.
On regular char/varchar fields I can use LIKE '%xyz%', but this does not work on byte[] / CLOB columns.
The first approach was to load each entity, convert the byte[] into a String, and use the contains method in Java. If the user enters any filter parameters on other (non-CLOB) columns, I apply those filters first in order to reduce the number of CLOBs I have to scan.
That worked quite well for 100 files (CLOBs), as long as the application and database are on the same server. But I think it will get really slow if I have 1,000,000 files inside the database and the database is not always on the same network, so that is not a good idea.
My next thought was creating a database procedure, but I am not quite sure whether that is possible on all of Informix, MySQL, and Oracle.
The last, but not favored, option is to not store the content inside a CLOB at all. Maybe I can use a different datatype for that?
Does anyone have a good idea how to realize this? I need a solution for all three DBMSs. The application knows what kind of DBMS it is connected to, so it would be okay to have three different solutions (one for each DBMS).
I am completely open to changing what kind of datatype I use (BLOB, CLOB ...) — I can modify that as I want.
Note: the clobs will range from about 5 KiB to about 500 KiB, with a maximum of 1 MiB.
Look into Apache Lucene or another text-indexing library.
https://en.wikipedia.org/wiki/Lucene
http://en.wikipedia.org/wiki/Full_text_search
If you go with a DB-specific solution like Oracle Text Search, you will have to implement a custom solution for each database. I know from experience that Oracle Text Search takes significant time to learn and involves a lot of tweaking to get just right.
Also, if you use a DB solution you would receive different results in each DB, even if the data sets were the same (each DB has its own methods of indexing and retrieving the data).
By going with a third-party solution like Lucene, you only have to learn one solution and the results will be consistent regardless of the DB.
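A minimal sketch of what that could look like, assuming the lucene-core and lucene-queryparser modules; the field names ("id", "content") and the index directory are placeholders, and API details vary slightly between Lucene versions.

import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ClobSearchSketch {

    // Index one CLOB: store the row's primary key, index (but do not store) the text.
    // In practice, keep a single IndexWriter open and add all documents through it.
    public static void index(String primaryKey, String clobText) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("clob-index"));
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
            Document doc = new Document();
            doc.add(new StringField("id", primaryKey, Field.Store.YES));
            doc.add(new TextField("content", clobText, Field.Store.NO));
            writer.addDocument(doc);
        }
    }

    // Print the primary keys of rows whose CLOB text matches the search term.
    public static void search(String term) throws Exception {
        Directory dir = FSDirectory.open(Paths.get("clob-index"));
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            IndexSearcher searcher = new IndexSearcher(reader);
            Query query = new QueryParser("content", new StandardAnalyzer()).parse(term);
            TopDocs hits = searcher.search(query, 100);
            for (ScoreDoc sd : hits.scoreDocs) {
                System.out.println(searcher.doc(sd.doc).get("id"));
            }
        }
    }
}

The byte[] / CLOB content would be decoded to a String once at indexing time (and re-indexed whenever a row changes), so the database itself never has to scan the blobs for a search.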
What is the best way to implement the following scenario?
I need to query a database table containing millions of records from a Java application. Then, for each record in the table, my application should call a third-party API and get a status field as a response. Then my application should update each row in the table with the status returned by the API.
Note: I am trying to figure out how to do this in the best possible way. I understand that querying all the records at once is not the way forward.
Do not try to eat the elephant in one bite. Chunk it. Heard of pagination? Use it. See here: MySQL pagination without double-querying?
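A minimal sketch of that chunked approach with plain JDBC and keyset pagination; the table records(id, payload, status) and the ThirdPartyApi interface are hypothetical, and the LIMIT syntax shown is MySQL/PostgreSQL style.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class ChunkedStatusUpdate {

    private static final int CHUNK_SIZE = 1000;

    // Hypothetical schema: records(id BIGINT PRIMARY KEY, payload VARCHAR, status VARCHAR).
    public static void run(Connection con, ThirdPartyApi api) throws Exception {
        con.setAutoCommit(false);
        try (PreparedStatement select = con.prepareStatement(
                     "SELECT id, payload FROM records WHERE id > ? ORDER BY id LIMIT ?");
             PreparedStatement update = con.prepareStatement(
                     "UPDATE records SET status = ? WHERE id = ?")) {
            long lastId = 0;
            while (true) {
                select.setLong(1, lastId);
                select.setInt(2, CHUNK_SIZE);
                int fetched = 0;
                try (ResultSet rs = select.executeQuery()) {
                    while (rs.next()) {
                        long id = rs.getLong("id");
                        String status = api.getStatus(rs.getString("payload")); // one API call per record
                        update.setString(1, status);
                        update.setLong(2, id);
                        update.addBatch();
                        lastId = id;
                        fetched++;
                    }
                }
                update.executeBatch();   // write the whole chunk back in one round trip
                con.commit();
                if (fetched < CHUNK_SIZE) {
                    break;               // last (partial) chunk processed
                }
            }
        }
    }

    // Placeholder for the third-party API client mentioned in the question.
    public interface ThirdPartyApi {
        String getStatus(String payload);
    }
}

Paging by the primary key (WHERE id > ? ORDER BY id) avoids the growing OFFSET cost that the linked question discusses.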
You can use Oracle features such as SQL*Loader or Data Pump, called via JDBC or a script.
Databases are not designed for updating millions of records repeatedly via a Java API; this can take many minutes. If that is not fast enough, you may need to use a dataset embedded in Java (either as a cache or replacing your database).
I have to design a web application that retrieves data from a huge single table with 40 columns and several thousand rows for select queries, and a few rows/columns for updates.
Can you please suggest whether, for faster performance, Hibernate is feasible, given that I only have a single table and no joins?
Or should I use a JDBC DAO?
Database: SQL Server 2008
Java 7
If you use Hibernate right, there's no problem fetching an arbitrarily large result set. Just avoid plain from queries (use select ... from ... queries) and use ScrollableResults. If you use plain JDBC, you'll be able to get started more quickly, because Hibernate needs to be configured first, you need to write the mapping files, etc., but later on Hibernate might pay off since the code you write will be much simpler. Hibernate is very good at taking the boilerplate out of client code.
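A minimal sketch of the ScrollableResults approach, assuming Hibernate 5.x and a hypothetical mapped entity named MyEntity; clearing the session periodically keeps the persistence context from growing with every row that is read.

import org.hibernate.ScrollMode;
import org.hibernate.ScrollableResults;
import org.hibernate.Session;
import org.hibernate.SessionFactory;

public class ScrollingReadSketch {

    // Streams a large result set row by row instead of loading it all into memory.
    public static void process(SessionFactory sessionFactory) {
        Session session = sessionFactory.openSession();
        try {
            ScrollableResults results = session
                    .createQuery("select e from MyEntity e")   // MyEntity is a placeholder
                    .setReadOnly(true)
                    .setFetchSize(1000)
                    .scroll(ScrollMode.FORWARD_ONLY);
            int count = 0;
            while (results.next()) {
                MyEntity e = (MyEntity) results.get(0);
                // ... work with one row ...
                if (++count % 1000 == 0) {
                    session.clear();   // detach processed entities so memory use stays flat
                }
            }
            results.close();
        } finally {
            session.close();
        }
    }

    // Stand-in for your own mapped entity class.
    public static class MyEntity { }
}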
If you want to retrieve several thousand records and pagination is not possible, it might be a performance issue, because Hibernate will create an object for every row and store it in its persistence context. If you create too many objects, it uses up a lot of memory. For this type of operation, JDBC is better. For a similar discussion, see Hibernate performance issues using huge databases.