I have a table with 50 columns and I want to insert all items in a HashMap variable into it (HashMap keys and table column names are the same).
How can I do that without writing 50 lines of code?
Get the key set for the HashMap. Iterate that key set to build a String containing your INSERT statement. Use the resulting String to create a PreparedStatement. Then iterate that key set again, in the same order, to set the parameters by position using the Objects you retrieve from the HashMap.
You might have to write a few extra lines of special-case code if any of your values are of a Class that the JDBC driver isn't sure how to map.
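For instance, a minimal JDBC sketch of that idea (the table name and the Map parameter here are just placeholders):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class MapInsert {

    // Builds "INSERT INTO <table> (col1, col2, ...) VALUES (?, ?, ...)" from the
    // map's keys, then binds each value by position in the same key order.
    static int insert(Connection conn, String table, Map<String, Object> values)
            throws SQLException {
        List<String> columns = new ArrayList<>(values.keySet());
        String columnList = String.join(", ", columns);
        String placeholders = String.join(", ", Collections.nCopies(columns.size(), "?"));
        String sql = "INSERT INTO " + table + " (" + columnList + ") VALUES (" + placeholders + ")";

        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            int i = 1;
            for (String column : columns) {
                ps.setObject(i++, values.get(column)); // special-case here if the driver can't map a type
            }
            return ps.executeUpdate();
        }
    }
}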
I'd suggest you bite the bullet and simply write a method that will do the dirty work for you, containing 50 lines of parameter-setting code. This isn't so bad, and you only have to write it once. I hope you aren't that lazy ;-)
And by the way, isn't 50 columns in a table a bit much? Perhaps a normalization process could help and lower complexity of your database and the code that will manipulate it.
Another way to go is to use an ORM like Hibernate, or a more lightweight approach like Spring JDBC template.
Call map.keySet() to get the names of all the columns.
Create an INSERT statement by iterating the key set.
The column is from an item (a key) in the key set.
The data is from map.get(key).
I am using JDBI to iterate through a resultset via streams. Currently mapToMap is causing problems when there is a column with the same name in the result. What I need is just the values without the column names.
Is there a way to map the results to an Object list/array? The docs do not have an example of this. I would like to have something like
query.mapTo(List<Object>.class).useStream(s -> { . . .})
First of all - what kind of use case would allow you not to care about the column names at all, only the values? I am genuinely curious.
If it does make sense, it is trivial to implement a RowMapper<List<Object>> in your case, which runs through all the columns by index and puts the results of rs.getObject(i) into a list.
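A minimal sketch of such a mapper, assuming JDBI 3:

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

import org.jdbi.v3.core.mapper.RowMapper;
import org.jdbi.v3.core.statement.StatementContext;

// Maps each row to a positional list of values, ignoring column names entirely.
public class ColumnValuesMapper implements RowMapper<List<Object>> {

    @Override
    public List<Object> map(ResultSet rs, StatementContext ctx) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        int columnCount = meta.getColumnCount();
        List<Object> row = new ArrayList<>(columnCount);
        for (int i = 1; i <= columnCount; i++) { // JDBC columns are 1-based
            row.add(rs.getObject(i));
        }
        return row;
    }
}

You would then plug it in with something like query.map(new ColumnValuesMapper()).useStream(s -> { ... }) instead of mapTo.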
We have a large set of data (bulk data) that needs to be checked if the record is existing in the database.
We are using SQL Server 2012/JPA/Hibernate/Spring.
What would be an efficient or recommended way to check if a record exists in the database?
Our entity ProductCodes has the following fields:
private Integer productCodeId // this is the PK
private Integer refCode1 // ref code 1-5 has a unique constraint
private Integer refCode2
private Integer refCode3
private Integer refCode4
private Integer refCode5
... other fields
The service that we are creating will be given a file where each line is a combination of refCode1-5.
The task of the service is to check and report all lines in the file that are already existing in the database.
We are looking at approaching this in two ways.
Approach 1: the usual approach.
Loop through each line and call the DAO to query whether the refCode1-5 combination already exists in the DB.
// pseudo code
for each line in the file
    call the dao, passing refCode1-5 to the query
    (select * from ProductCodes where refCode1=? and refCode2=? and refCode3=? and refCode4=? and refCode5=?)
Given a large list of lines to check, this might be inefficient since we will be invoking the DAO once per line. If the file consists of, say, 1000 lines to check, that means 1000 round trips to the DB.
Approach 2: query all records in the DB.
We will query all records in the DB
Create a hash map with concatenated refCode1-5 as keys
Loop though each line in the file validating against the hashmap
We think this is more efficient in terms of DB round trips since it will not issue 1000 queries against the DB. However, if the DB table has, for example, 5000 records, then Hibernate/JPA will create 5000 entities in memory and probably crash the application.
We are thinking of going for the first approach since refCode1-5 has a unique constraint and will benefit from the implicit index.
But is there a better way of approaching this problem aside from the first approach?
Try something like a batched SELECT statement for, say, 100 refCodes instead of doing a single SELECT for each refCode.
Construct a query like
select <what ever you want> from <table> where ref_code in (.....)
Construct the SELECT projection in a way that gives you not just what you want but also the details of ref_code. Then in code you can do a count, or a multi-threaded scan of the result set, if the DB returned fewer refCodes than the number of codes you put in the query.
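As a rough JDBC sketch of the batching idea (here ref_code stands for a single comparable key for the refCode1-5 combination, e.g. a computed or concatenated column; adjust to your actual schema):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BatchExistenceCheck {

    // Checks which of the given keys already exist, querying them in chunks
    // of batchSize rather than one SELECT per key.
    static Set<String> findExisting(Connection conn, List<String> keys, int batchSize)
            throws SQLException {
        Set<String> existing = new HashSet<>();
        for (int from = 0; from < keys.size(); from += batchSize) {
            List<String> chunk = keys.subList(from, Math.min(from + batchSize, keys.size()));
            String placeholders = String.join(", ", Collections.nCopies(chunk.size(), "?"));
            String sql = "SELECT ref_code FROM ProductCodes WHERE ref_code IN (" + placeholders + ")";
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                for (int i = 0; i < chunk.size(); i++) {
                    ps.setString(i + 1, chunk.get(i));
                }
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        existing.add(rs.getString(1));
                    }
                }
            }
        }
        return existing;
    }
}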
You can try to use the concat operator.
select <your cols> from <your table> where concat(refCode1, refCode2, refCode3, refCode4, refCode5) IN (<set of concatenation from your file>);
I think this will be quite efficient, and it may be worth trying to see whether pre-sorting the lines and varying the number of concatenations taken each time brings you some benefit.
I would suggest you create a temp table in your application, store all the records from the file in it with a batch save, and then run a query joining the new temp table and the ProductCodes table to do whatever filtering you like. This way you are not locking the ProductCodes table many times to check individual rows, since SQL Server takes locks on SELECT statements as well.
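A rough JDBC sketch of that idea for SQL Server (the temp table name #file_codes is made up, and everything runs on the same connection so the session-scoped temp table stays visible):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class TempTableCheck {

    // Loads all refCode combinations from the file into a temp table in one
    // batch, then lets the database do the matching with a single join.
    static void reportExisting(Connection conn, List<int[]> fileLines) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("CREATE TABLE #file_codes (refCode1 int, refCode2 int, "
                     + "refCode3 int, refCode4 int, refCode5 int)");
        }
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO #file_codes VALUES (?, ?, ?, ?, ?)")) {
            for (int[] codes : fileLines) {
                for (int i = 0; i < 5; i++) {
                    ps.setInt(i + 1, codes[i]);
                }
                ps.addBatch();
            }
            ps.executeBatch();
        }
        String join = "SELECT f.* FROM #file_codes f JOIN ProductCodes p "
                    + "ON p.refCode1 = f.refCode1 AND p.refCode2 = f.refCode2 "
                    + "AND p.refCode3 = f.refCode3 AND p.refCode4 = f.refCode4 "
                    + "AND p.refCode5 = f.refCode5";
        try (PreparedStatement ps = conn.prepareStatement(join);
             ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                System.out.println("already in DB: " + rs.getInt(1) + "," + rs.getInt(2) + ","
                        + rs.getInt(3) + "," + rs.getInt(4) + "," + rs.getInt(5));
            }
        }
    }
}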
I am not able to find any satisfying solution so asking here.
I need to compare the data of two large tables (~50M rows) with the same schema definition in Java.
I cannot use an ORDER BY clause when getting the ResultSet object, and the records might not be in the same order in the two tables.
Can anyone help me what can be the right way to do it?
You could extract the data of the first DB table into a text file, and create a while loop on the resultSet for the 2nd table. As you iterate through the ResultSet do a search/verify against the text file. This solution works if memory is of concern to you.
If not, then just use a HashMap to hold the data for the first table and do the while loop and look up the records of the 2nd table from the HashMap.
This really depends on what you mean by 'compare'? Are you trying to see if they both contain the exact same data? Find rows in one not in the other? Find rows with the same primary keys that have differing values?
Also, why do you have to do this in Java? Regardless of what exactly you are trying to do, it's probably easier to do with SQL.
In Java, you'll want to create a class that represents the primary key for the tables, and a second class that represents the rest of the data, which also includes the primary key class. If you only have a single column as the primary key, then this is easier.
We'll call P the primary key class, and D the rest.
Map<P, D> map = new HashMap<>();
Select all of the rows from the first table, and insert them into the hash map.
Query all of the rows in the second table.
For each row, create a P object.
Use that to look up what data was in the first table with the same key.
Now you know whether both tables contained the same row, and you can compare the non-key values from both.
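A rough sketch of that approach, using Java records (Java 16+) for P and D, with made-up table and column names:

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class TableCompare {

    // P: the primary key columns; D: the remaining columns plus the key.
    record P(long id) {}
    record D(P key, String colA, String colB) {}

    static void compare(Connection conn) throws SQLException {
        Map<P, D> first = new HashMap<>();
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, col_a, col_b FROM table_one")) {
            while (rs.next()) {
                P key = new P(rs.getLong("id"));
                first.put(key, new D(key, rs.getString("col_a"), rs.getString("col_b")));
            }
        }
        try (Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT id, col_a, col_b FROM table_two")) {
            while (rs.next()) {
                P key = new P(rs.getLong("id"));
                D other = new D(key, rs.getString("col_a"), rs.getString("col_b"));
                D mine = first.remove(key);
                if (mine == null) {
                    System.out.println("only in table_two: " + key);
                } else if (!Objects.equals(mine, other)) {
                    System.out.println("differs: " + key);
                }
            }
        }
        first.keySet().forEach(k -> System.out.println("only in table_one: " + k));
    }
}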
Like I said, this is much much easier to do in straight SQL.
You basically do a full outer join between the two tables. How exactly that join looks depends on exactly what you are trying to do.
I have recently started taking much interest in CQL as I am thinking of using the DataStax Java driver. Previously, I was using a column family instead of a table, with the Astyanax driver. I need to clarify something here-
I am using the below column family definition in my production cluster. And I can insert any arbitrary columns (with their values) on the fly without actually modifying the column family schema.
create column family FAMILY_DATA
with key_validation_class = 'UTF8Type'
and comparator = 'UTF8Type'
and default_validation_class = 'BytesType'
and gc_grace = 86400;
But after going through this post, it looks like I need to alter the schema every time I get a new column to insert, which is not what I want to do... as I believe CQL3 requires column metadata to exist...
Is there any other way I can still add arbitrary columns and their values if I am going with the DataStax Java driver?
Any code samples/examples will help me to understand better. Thanks.
I believe in CQL you solve this problem using collections.
You can define the data type of a field to be a map, and then insert an arbitrary number of key-value pairs into the map, which should mostly behave the way dynamic columns did in traditional Thrift.
Something like:
CREATE TABLE data ( data_id int PRIMARY KEY, data_time bigint, data_values map<text, float> );
INSERT INTO data (data_id, data_time, data_values) VALUES (1, 21341324, {'sum': 2134, 'avg': 44.5 });
Here is more information.
Additionally, you can find the mapping between the CQL3 types and the Java types used by the DataStax driver here.
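For a concrete (if minimal) sketch with the DataStax Java driver, assuming the 3.x driver and a keyspace called demo that contains the data table above:

import java.util.HashMap;
import java.util.Map;

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class DynamicValuesInsert {

    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("demo")) { // keyspace name is hypothetical

            PreparedStatement ps = session.prepare(
                    "INSERT INTO data (data_id, data_time, data_values) VALUES (?, ?, ?)");

            // Arbitrary key/value pairs go into the map column; no schema change needed.
            Map<String, Float> values = new HashMap<>();
            values.put("sum", 2134f);
            values.put("avg", 44.5f);

            session.execute(ps.bind(1, 21341324L, values));
        }
    }
}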
If you enable compact storage for that table, it will be backwards compatible with Thrift and CQL 2.0, both of which allow you to enter dynamic column names.
You can have as many columns of whatever name you want with this approach. The primary key is composed of two things, the first element which is the row_key and the remaining elements which when combined as a set form a single column name.
See the tweets example here
Though you've said this is in production already, it may not be possible to alter a table with existing data to use compact storage.
What is the best practice to update a table record most effectively (in my case with a primary key), when not all values are present?
Imagine:
PRIMARY_KEY1, COLUMN_2, COLUMN_3, COLUMN_4, COLUMN_5, COLUMN_6, ...
I always get tuples like (PRIMARY_KEY1, COLUMN_5, COLUMN_4) or (PRIMARY_KEY1, COLUMN_2, COLUMN_6, COLUMN_3) and want to just update them in the fastest way possible without having a database lookup for all other values.
Since I have to do this very fast, I would like to use something like batches for prepared statements in order to prevent massive numbers of database requests.
Thanks for all replies!
You can 'cheat' by expecting SQL to fill in the values at row-access time. E.g., this type of statement:
UPDATE MyTable SET (column_1, column_2, ..., column_6)
    = (COALESCE(#suppliedValue1, column_1),
       COALESCE(#suppliedValue2, column_2),
       ...,
       COALESCE(#suppliedValue6, column_6))
WHERE primary_Key1 = #primaryKey
Then, when filling out the parameters, just leave anything unsupplied null... and you should be good.
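A rough sketch of how that could be combined with JDBC batching (column and table names follow the example above; the per-tuple Map is just one way to carry the partially supplied values):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;
import java.util.List;
import java.util.Map;

public class PartialUpdateBatch {

    private static final String SQL =
            "UPDATE MyTable SET "
          + "column_2 = COALESCE(?, column_2), "
          + "column_3 = COALESCE(?, column_3), "
          + "column_4 = COALESCE(?, column_4), "
          + "column_5 = COALESCE(?, column_5), "
          + "column_6 = COALESCE(?, column_6) "
          + "WHERE primary_Key1 = ?";

    // Each map holds the primary key plus whichever columns the tuple supplied;
    // missing columns are bound as NULL so COALESCE keeps the current value.
    static void update(Connection conn, List<Map<String, Object>> tuples) throws SQLException {
        String[] columns = {"column_2", "column_3", "column_4", "column_5", "column_6"};
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            for (Map<String, Object> tuple : tuples) {
                for (int i = 0; i < columns.length; i++) {
                    Object value = tuple.get(columns[i]);
                    if (value == null) {
                        ps.setNull(i + 1, Types.VARCHAR); // column type assumed; adjust as needed
                    } else {
                        ps.setObject(i + 1, value);
                    }
                }
                ps.setObject(columns.length + 1, tuple.get("primary_Key1"));
                ps.addBatch();
            }
            ps.executeBatch();
        }
    }
}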
You are not required to update the entire row in SQL. Just use UPDATE's SET syntax:
UPDATE table SET COLUMN_5 = 'foo', COLUMN_4 = 'goo' WHERE PRIMARY_KEY1 = 'hoo';
See this post here,
JDBC batch insert performance
Read it. Then look on the right column of the page under related links for other similar posts
You should find all the answers you need in no time.