I'm writing the backend for a Java HTTP server for a class project and I have to insert a few records into a database using JDBC. The maximum number of insertions I have at one time is currently 122, which takes a whopping 18.7 s to execute, about 6.5 insertions per second. This is outrageously slow, since the server needs to respond to the request that inserts the records in less than 5 s, and a real server would be expected to be many times faster. I'm pretty sure this has something to do with my code or my declaration of the table schema, but I can't seem to find the bottleneck anywhere. The table schema looks like this:
CREATE TABLE Events (
    ID varchar(38) primary key,
    ownerName varchar(32) not null,
    personID varchar(38) not null,
    latitude float not null,
    longitude float not null,
    country varchar(64) not null,
    city varchar(128) not null,
    eventType varchar(8) not null,
    year int not null,
    foreign key (ownerName)
        references Users (userName)
        on delete cascade
        on update cascade,
    foreign key (ID)
        references People (ID)
        on delete cascade
        on update cascade
);
and the code to perform the insertions is the following function
public class EventAccessor {
    private Connection handle;
    ...
    public void insert(Event event) throws DataInsertException {
        String query = "insert into Events(ID,ownerName,personID,latitude,longitude,country,"
                + "city,eventType,year)\nvalues(?,?,?,?,?,?,?,?,?)";
        try (PreparedStatement stmt = handle.prepareStatement(query)) {
            stmt.setString(1, event.getID());
            stmt.setString(2, event.getUsername());
            stmt.setString(3, event.getPersonID());
            stmt.setDouble(4, event.getLatitude());
            stmt.setDouble(5, event.getLongitude());
            stmt.setString(6, event.getCountry());
            stmt.setString(7, event.getCity());
            stmt.setString(8, event.getType());
            stmt.setInt(9, event.getYear());
            stmt.executeUpdate();
        } catch (SQLException e) {
            throw new DataInsertException(e.getMessage(), e);
        }
    }
}
Where Event is a class that holds an entry matching the schema and DataInsertException is a simple exception defined elsewhere in the API. I was instructed to use PreparedStatement because it's apparently "safer" than using a Statement, but I have the choice to switch, so if it's faster I'll gladly change the code. The function that I use to insert the 122 entries is actually a wrapper around an array of Event objects that looks like this:
void insertEvents(Event[] events) throws DataInsertException {
    for (Event e : events) {
        insert(e);
    }
}
I'm willing to try anything to improve performance at this point.
I disabled auto-commit on the JDBC connection with connection.setAutoCommit(false) and performance increased by over 1000x. New benchmarks show that inserting 122 records completed in a mere 0.008265739 s, a rate of about 14,000 insertions per second, which is closer to what I was expecting.
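For reference, a minimal sketch of what insertEvents can look like with auto-commit disabled and the rows sent as a single JDBC batch. The batching and the rollback handling are additions beyond the auto-commit change described above, not the exact code I benchmarked:

void insertEvents(Event[] events) throws DataInsertException {
    String query = "insert into Events(ID,ownerName,personID,latitude,longitude,country,"
            + "city,eventType,year) values(?,?,?,?,?,?,?,?,?)";
    try (PreparedStatement stmt = handle.prepareStatement(query)) {
        handle.setAutoCommit(false);            // one transaction for the whole request
        for (Event event : events) {
            stmt.setString(1, event.getID());
            stmt.setString(2, event.getUsername());
            stmt.setString(3, event.getPersonID());
            stmt.setDouble(4, event.getLatitude());
            stmt.setDouble(5, event.getLongitude());
            stmt.setString(6, event.getCountry());
            stmt.setString(7, event.getCity());
            stmt.setString(8, event.getType());
            stmt.setInt(9, event.getYear());
            stmt.addBatch();                    // queue the row; no round trip yet
        }
        stmt.executeBatch();                    // send all queued rows at once
        handle.commit();                        // single commit instead of one per insert
    } catch (SQLException e) {
        try {
            handle.rollback();                  // assumption: discard the partial batch on failure
        } catch (SQLException ignored) {
        }
        throw new DataInsertException(e.getMessage(), e);
    }
}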
Related
Good day. I posted this question previously, but it seems I was not clear enough, so I will try to be as detailed as possible here about my situation.
I need to implement a solution to do a daily extraction of data from some CSV files and using only JDBC insert this data into a production environment database tables.
I have to insert into 2 tables
Tables:
Table1 (
    [func] [varchar](8) NOT NULL,
    [Ver] [smallint] NOT NULL,
    [id] [varchar](32) NOT NULL,
    [desc] [varchar](300) NOT NULL,
    [value] [float] NOT NULL,
    [dtcreated] [date] NOT NULL,
    [dtloaded] [date] NULL,
    CONSTRAINT [Table1_PK] PRIMARY KEY CLUSTERED
    (
        [func] ASC,
        [ver] ASC,
        [id] ASC,
        [desc] ASC,
        [dtcreated] ASC
    )
);
table2 (
    [id] [varchar](32) NOT NULL,
    [f1] [varchar](50) NOT NULL,
    [f2] [varchar](32) NOT NULL,
    [f3] [varchar](6) NULL,
    [f4] [varchar](3) NULL,
    [f5] [varchar](3) NULL,
    [f6] [varchar](32) NULL,
    [DtStart] [date] NOT NULL,
    [DtEnd] [date] NOT NULL,
    [dtcreated] [date] NOT NULL,
    [dtloaded] [date] NULL,
    CONSTRAINT [table2_PK] PRIMARY KEY CLUSTERED
    (
        [id] ASC,
        [DtStart] DESC,
        [DtEnd] DESC
    )
);
Table1 has a size of 400+GB with 6,500+ Million Records.
Table2 has a size of 30+GB with about 5 Million Records.
In table1 I need to process and insert 1.5 Million records.
In table2 I need to process and update/insert 1.1 Million records, this is done using a merge-when-matched query.
I need to be able to do these 2 processes without interruption of usage of these tables.
My code does the following:
public void processFile(String fileLocation) throws IOException, SQLException {
    try {
        SqlClient sqlClient = SqlClient.from(DriverClassName.SQLSERVER, DriverConnectionString.barra());
        Connection connection = sqlClient.getConnection();
        PreparedStatement pstmt = connection.prepareStatement(getSql());
        File file = new File(fileLocation);
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            int lnproc = 0;
            int batchCount = 0;
            String line;
            while ((line = br.readLine()) != null) {
                String[] parts = line.split(",");
                pstmt.clearParameters();
                ..... // Process parts and add them to the PreparedStatement
                pstmt.addBatch();
                batchCount++;
                if (batchCount >= batchSize) {
                    batchCount = 0;
                    try {
                        pstmt.executeBatch();
                    } catch (BatchUpdateException ex) {
                        // batch failures are currently swallowed
                    }
                }
            }
            try {
                pstmt.executeBatch();
            } catch (BatchUpdateException ex) {
                // batch failures are currently swallowed
            }
        }
        connection.commit();
        connection.close();
    } catch (ClassNotFoundException | InstantiationException | IllegalAccessException e) {
        // connection setup failures are currently swallowed
    }
}
Because of the huge number of records to insert into each table, I can generate different locks on the tables that can affect the production environment.
I have done some research and there are multiple strategies I am thinking of using:
creating batches of at most 5k inserts and committing them to prevent lock escalation
committing after every record to prevent locks and transaction-log growth
I would like to pick the brains of the community about what you think could be the best strategy to use in this case, and any recommendations you can make.
After looking into it, the best solution I found was the following.
First, as stated in the comments, I read the whole file and loaded it into memory in a Java structure.
After loading the file I iterated over that structure and started adding each record to the batch, keeping a counter of the items added.
When the counter hits 5,000, I execute and commit the batch, reset the counter to 0, and keep adding the following items until I either hit 5,000 again or reach the end of the iteration.
By doing this I am preventing SQL Server from escalating to a table-level lock, and the table can still be used by the other processes and applications.
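A minimal sketch of that pattern at the method level (bindRow and the in-memory rows list are hypothetical placeholders; only the 5,000-row commit interval comes from the description above):

// imports assumed: java.sql.Connection, java.sql.PreparedStatement, java.sql.SQLException, java.util.List
private void loadInChunks(Connection connection, PreparedStatement pstmt,
                          List<String[]> rows) throws SQLException {
    final int COMMIT_EVERY = 5000;        // keep each transaction well below the lock-escalation threshold
    connection.setAutoCommit(false);      // commit manually, once per chunk
    int pending = 0;
    for (String[] parts : rows) {
        bindRow(pstmt, parts);            // hypothetical helper that sets the parameters for one row
        pstmt.addBatch();
        if (++pending == COMMIT_EVERY) {
            pstmt.executeBatch();         // send the chunk to SQL Server
            connection.commit();          // release the row locks taken by this chunk
            pending = 0;
        }
    }
    if (pending > 0) {                    // flush the final partial chunk
        pstmt.executeBatch();
        connection.commit();
    }
}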
I am making a JavaFX app (IntelliJ IDEA with Java JDK 11) using SQLite version 3.30.1 with DB Browser for SQLite.
I have a table called "beehives" and each beehive can have diseases (stored in the table "diseases").
this is my "beehives" table:
CREATE TABLE "beehives" (
"number" INTEGER NOT NULL,
"id_apiary" INTEGER NOT NULL DEFAULT -2,
"date" DATE,
"type" TEXT,
"favorite" BOOLEAN DEFAULT 'false',
PRIMARY KEY("number","id_apiary"),
FOREIGN KEY("id_apiary") REFERENCES "apiaries"("id") ON DELETE SET NULL
);
this is my "diseases" table:
CREATE TABLE "diseases" (
"id" INTEGER NOT NULL,
"id_beehive" INTEGER NOT NULL,
"id_apiary" INTEGER NOT NULL,
"disease" TEXT NOT NULL,
"treatment" TEXT NOT NULL,
"start_treat_date" DATE NOT NULL,
"end_treat_date" DATE,
PRIMARY KEY("id"),
FOREIGN KEY("id_beehive","id_apiary") REFERENCES "beehives"("number","id_apiary") ON UPDATE CASCADE
);
this is my "apiaries" table in case you need it:
CREATE TABLE "apiaries" (
"id" INTEGER NOT NULL,
"name" TEXT NOT NULL,
"address" TEXT,
PRIMARY KEY("id")
);
Everything works fine, but when I update a beehive (for example when I update "number", which is part of the primary key in the beehives table), the diseases do not update that number. The result is that the diseases get somewhat disconnected, since the beehive changes its "number" correctly but the disease doesn't update it. There is no error message.
My java method that calls the update is:
public void updateBeehiveInDB(Beehives newBeehive, Beehives oldBeehive) {
    try {
        s = "UPDATE beehives SET number=?, id_apiary=?, date=?, type=?, favorite=? WHERE number=? and id_apiary=? ";
        preparedStatement = connection.prepareStatement(s);
        preparedStatement.setInt(1, newBeehive.getNumber());
        preparedStatement.setInt(2, newBeehive.getId_apiary());
        preparedStatement.setDate(3, newBeehive.getDate());
        preparedStatement.setString(4, newBeehive.getType());
        preparedStatement.setBoolean(5, newBeehive.isFavorite());
        preparedStatement.setInt(6, oldBeehive.getNumber());
        preparedStatement.setInt(7, oldBeehive.getId_apiary());
        int i = preparedStatement.executeUpdate();
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
I tried to check whether foreign keys are "on" by following the SQLite documentation here, but my English is not good enough and I am using DB Browser, so I have no idea how to check whether this is on, or how to turn it on manually.
What can I do to update the diseases' "id_beehive" when I update "number" in the beehives table?
The problem was that I am using a composite foreign key and I needed to implement it correctly in the other tables too, even if I was not using them yet in this new project. It was very hard to find the problem because IntelliJ normally shows all the SQL error messages, but in this case it was not showing anything. When I tried to run the SQL statement manually in DB Browser, I got an error message there and was able to fix it.
I also had to activate foreign key enforcement on the connection:
public Connection openConnection() {
    try {
        String dbPath = "jdbc:sqlite:resources/db/datab.db";
        Class.forName("org.sqlite.JDBC");
        SQLiteConfig config = new SQLiteConfig();
        config.enforceForeignKeys(true);
        connection = DriverManager.getConnection(dbPath, config.toProperties());
        return connection;
    } catch (ClassNotFoundException e) {
        e.printStackTrace();
    } catch (SQLException e) {
        e.printStackTrace();
    }
    return null;
}
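If you want to double-check that the pragma is actually active on a given connection, you can query it from Java (a small sketch, assuming connection is the field assigned in openConnection() above); the same PRAGMA foreign_keys; statement can also be run in DB Browser's Execute SQL tab:

// imports assumed: java.sql.Statement, java.sql.ResultSet
// Quick check that foreign key enforcement is really on for this connection.
try (Statement check = connection.createStatement();
     ResultSet rs = check.executeQuery("PRAGMA foreign_keys;")) {
    if (rs.next()) {
        System.out.println("foreign_keys = " + rs.getInt(1));  // 1 = on, 0 = off
    }
}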
How can I add a new customer or supplier? Last time I was using this class for a single table, "customer":
Code:
public int addnewcustomer() {
    int idcust;
    DBConnection eConnexion = new DBConnection();
    try {
        //Statement state = eConnexion.getConnexion().createStatement();
        String sql = "INSERT INTO customer(name_cust, num_cust, adress_cust, city_cust , tel_cust, ref_cust)";
        sql += " VALUES (?,?,?,?,?,?)";
        PreparedStatement insertQuery = eConnexion.getConnexion().prepareStatement(sql);
        insertQuery.setString(1, Name_cust);
        insertQuery.setString(2, Num_cust);
        insertQuery.setString(3, Adress_cust);
        insertQuery.setString(4, City_cust);
        insertQuery.setString(5, Tel_cust);
        insertQuery.setString(6, Ref_cust);
        insertQuery.executeUpdate();
    } catch (SQLException e) {
        e.printStackTrace();
        //JOptionPane.showMessageDialog(null, "Error: the addition was not performed successfully!");
        idcust = 0;
    }
    eConnexion.closeConnection();
    idcust = Services.getLastInsertedId("customer", "id_customer");
    return idcust;
}
Currently, I attach all tables to a new "person" table; all tables now extend "person". I tried to add a new customer using the inherited "person" fields, but I'm stuck on filling the foreign key "id_pers FK".
First you need to persist a person into your database. After a successful(!) persist, you can query for the id the database used to insert the data. Most databases also provide a method to directly retrieve the used id after an insert.
After you have successfully persisted the person you can use the id for the foreign key column.
You may consider using a transaction for these actions, as there should never be a person persisted without a customer, employee, or whatever else extends the person's data.
With a transaction, you can rollback the previous actions, for example if something goes wrong during the insertion of the customer.
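A minimal sketch of that flow with plain JDBC and generated keys (the person column names and the exact customer columns here are assumptions; id_pers and the Name_cust/Adress_cust/... fields come from the question):

// imports assumed: java.sql.Connection, java.sql.PreparedStatement, java.sql.ResultSet, java.sql.SQLException, java.sql.Statement
// Persist the person first, read back its generated id, then use it as the customer's FK.
public int addNewCustomer(Connection conn) throws SQLException {
    conn.setAutoCommit(false);                        // person + customer succeed or fail together
    try {
        int personId;
        String insertPerson = "INSERT INTO person(name_pers, tel_pers) VALUES (?,?)";
        try (PreparedStatement ps = conn.prepareStatement(insertPerson,
                Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, Name_cust);
            ps.setString(2, Tel_cust);
            ps.executeUpdate();
            try (ResultSet keys = ps.getGeneratedKeys()) {
                keys.next();
                personId = keys.getInt(1);            // the id the database just assigned
            }
        }
        String insertCustomer =
                "INSERT INTO customer(id_pers, adress_cust, city_cust, ref_cust) VALUES (?,?,?,?)";
        try (PreparedStatement ps = conn.prepareStatement(insertCustomer)) {
            ps.setInt(1, personId);                   // the foreign key pointing at person
            ps.setString(2, Adress_cust);
            ps.setString(3, City_cust);
            ps.setString(4, Ref_cust);
            ps.executeUpdate();
        }
        conn.commit();
        return personId;
    } catch (SQLException e) {
        conn.rollback();                              // undo the person insert if the customer insert fails
        throw e;
    }
}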
I have a webapp that I'm trying to set up an SQLite database with. To build a foundation I have kept it very simple: at this point there are only two tables, and one table uses a foreign key constraint to point to the other. The problem I am having is that when I try to insert data, I always receive the error Error processing SQL: could not execute statement due to a constraint failure (19 constraint failed) -- Code: 6. Code 6, apparently, means the table is locked. How can it be locked if I can successfully insert values into it? Confused...
My code...
I set up the tables with this:
// Create a system table, if it doesn't exist
db.transaction(function(tx){
tx.executeSql('CREATE TABLE IF NOT EXISTS system(systemID TEXT PRIMARY KEY, numZones INT NULL, numHeads INT NULL)', [],nullHandler,errorHandler);
},errorHandler, successCallBack);
// Create a work order table, if it doesn't exist
db.transaction(function(tx){
tx.executeSql('CREATE TABLE IF NOT EXISTS wo(woID_id TEXT PRIMARY KEY, woType TEXT NOT NULL, systemID_fk TEXT NOT NULL, FOREIGN KEY (systemID_fk) REFERENCES system(systemID))', [],nullHandler,errorHandler);
},errorHandler, successCallBack);
Presumably now I will have two tables, one having a field that points to the other table. I am pulling in a JSON feed, parsing it, and trying to put it into these two tables. Here's the code for that parsing:
function GetSystems(){
    // First we see if there are credentials stored. If not, we don't try to retrieve the work orders.
    db.transaction(function(transaction){
        transaction.executeSql('SELECT * FROM Creds;', [], function(transaction, result) {
            // If the user hasn't entered their creds yet, we create a new record, otherwise update the current one.
            if(result.rows.length != 0){
                var row;
                db.transaction(function(transaction){
                    transaction.executeSql('SELECT * FROM Creds where id=1;', [], function(transaction, result) {
                        $.getJSON(baseURL + "get-wos/?callback=?", { username:result.rows.item(0).username, password:result.rows.item(0).password }, function(data) {
                            $.each(data, function(i, obj) {
                                db.transaction(function(transaction){
                                    transaction.executeSql('INSERT INTO system(systemID, numZones, numHeads) VALUES (?, null, null)', [obj.systemID], nullHandler, errorHandler);
                                    transaction.executeSql('INSERT INTO wo (woID, woType, systemID_fk) ' +
                                        'VALUES ((SELECT systemID FROM system WHERE systemID = ' + obj.systemID + '), ?, ?)',
                                        [obj.woID, obj.woType], nullHandler, errorHandler);
                                });
                            });
                        });
                    });
                });
            }
        });
    });
}
When I run the above code, the systems are loaded properly but the wos are not. My research into this issue tells me that I might be having a few issues. One suggestion is that there may already be data in the table. I fixed that by having a drop tables function to clear out the database entirely (I use Chrome dev tools to investigate the db).
So really, I'm not sure what I'm doing wrong. Is my syntax incorrect for inserting a foreign key constraint?
Solved
I stumbled upon this thread and @havexz mentioned that the variable in the insertion didn't have quotes around it. I looked at mine and it had the same problem. Here's my edited insert to add a record with a foreign key. Notice the systemID='" instead of the original, which was simply systemID=". I was missing the single quotes around my variable.
db.transaction(function(transaction){
transaction.executeSql("INSERT INTO wo (woID, woType, systemID_fk) " +
"VALUES (?, ?, (SELECT systemID FROM system WHERE systemID='" + obj.systemID + "'))", [obj.woID, obj.woType], nullHandler, errorHandler);
});
Is the order of the parameters correct for the INSERT into wo? It looks like you're putting the systemID into the woID field rather than into systemID_fk, which has the constraint. Should it be:
transaction.executeSql('INSERT INTO wo (woID, woType, systemID_fk) ' +
'VALUES (?, ?, (SELECT systemID FROM system WHERE systemID = ' + obj.systemID + '))',
[obj.woID, obj.woType], nullHandler, errorHandler);
Here is the problem: at my company we have a large database on which we want to perform some automated operations. To test that, we got a small sample of that data: about six 10 MB CSV files. We want to use H2 to test the results of our program against it. H2 seemed to work fine with our previous CSVs, though they were at most 1,000 entries long. When it comes to any of our 10 MB files, the command
insert into myschema.mytable (select * from csvread('mycsvfile.csv'));
reports a failure because one of the records is supposedly duplicated and violates our primary key constraint.
Unique index or primary key violation: "PRIMARY_KEY_6 ON MYSCHEMA.MYTABLE(DATETIME, LARGENUMBER, KIND)"; SQL statement:
insert into myschema.mytable (select * from csvread('src/test/resources/h2/data/mycsvfile.csv')) [23001-148] 23001/23001
Breaking mycsvfile.csv into smaller pieces, I was able to see that the problem starts to appear after about 10,000 rows are inserted (though the number varies depending on what data I used). I could, however, insert more than 10,000 rows if I broke the file into pieces and ran the command on each one individually. But even if I manage to insert all that data manually, I need an automated method to fill the database.
Since running the command would not tell me which row was causing the problem, I guessed that the problem could be some cache in the csvread routine.
Then I created a small Java program that could insert the data into the H2 database manually. No matter whether I batched the commands or closed and reopened the connection every 1,000 rows, H2 reported that I was trying to insert a duplicate entry into the database.
org.h2.jdbc.JdbcSQLException: Unique index or primary key violation: "PRIMARY_KEY_6 ON MYSCHEMA.MYTABLE(DATETIME, LARGENUMBER, KIND)"; SQL statement:
INSERT INTO myschema.mytable VALUES ( '1997-10-06 01:00:00.0',25485116,1.600,0,18 ) [23001-148]
Doing a normal search for that record using Emacs, I can see that it is not duplicated, as the datetime column is unique in the whole dataset.
I cannot give you that data to test, since the company sells that information. But here is what my table definition looks like:
create table myschema.mytable (
    datetime timestamp,
    largenumber numeric(8,0) references myschema.largenumber(largecode),
    value numeric(8,3) not null,
    flag numeric(1,0) references myschema.flag(flagcode),
    kind smallint references myschema.kind(kindcode),
    primary key (datetime, largenumber, kind)
);
This is what our CSV looks like:
datetime,largenumber,value,flag,kind
1997-06-11 16:45:00.0,25485116,0.710,0,18
1997-06-11 17:00:00.0,25485116,0.000,0,18
1997-06-11 17:15:00.0,25485116,0.000,0,18
1997-06-11 17:30:00.0,25485116,0.000,0,18
And here is the Java code that would fill our test database (forgive my ugly code, I got desperate :)
private static void insertFile(MyFile file) throws SQLException {
    int updateCount = 0;
    ResultSet rs = Csv.getInstance().read(file.toString(), null, null);
    ResultSetMetaData meta = rs.getMetaData();
    Connection conn = DriverManager.getConnection(
            "jdbc:h2:tcp://localhost/mytestdatabase", "sa", "pass");
    rs.next(); // note: this skips a row before the loop starts
    while (rs.next()) {
        Statement stmt = conn.createStatement();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < meta.getColumnCount(); i++) {
            if (i == 0)
                sb.append("'" + rs.getString(i + 1) + "'"); // quote only the timestamp column
            else
                sb.append(rs.getString(i + 1));
            sb.append(',');
        }
        updateCount++;
        if (sb.length() > 0)
            sb.deleteCharAt(sb.length() - 1);
        stmt.execute(String.format(
                "INSERT INTO myschema.mytable VALUES ( %s ) ",
                sb.toString()));
        if (updateCount == 1000) {
            conn.close();
            conn = DriverManager.getConnection(
                    "jdbc:h2:tcp://localhost/mytestdatabase", "sa", "pass");
            updateCount = 0;
        }
    }
    if (!conn.isClosed()) {
        conn.close();
    }
    rs.close();
}
I'll be glad to provide more information if requested.
EDIT
@Randy: I always check that the database is clean before running the command, and in my Java program I have a routine that deletes all data from a file that fails to be inserted.
select * from myschema.mytable where largenumber = 25485116;
DATETIME LARGENUMBER VALUE FLAG KIND
(no rows, 8 ms)
The only thing that I can think of is that there is a trigger on the table that sets the timestamp to "now". Although that would not explain why you are successful with a few rows, it would explain why the primary key is being violated.
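A side note on the loader itself: building each INSERT by concatenating strings makes quoting fragile and forces the engine to re-parse every statement. A PreparedStatement with bound parameters plus addBatch() avoids both. Here is a minimal sketch against the table definition in the question (the setTimestamp/setBigDecimal conversions and the 1,000-row flush interval are assumptions):

// imports assumed: java.sql.*, java.math.BigDecimal, org.h2.tools.Csv
private static void insertFile(MyFile file) throws SQLException {
    String insert = "INSERT INTO myschema.mytable "
            + "(datetime, largenumber, value, flag, kind) VALUES (?,?,?,?,?)";
    ResultSet rs = Csv.getInstance().read(file.toString(), null, null);
    try (Connection conn = DriverManager.getConnection(
                 "jdbc:h2:tcp://localhost/mytestdatabase", "sa", "pass");
         PreparedStatement ps = conn.prepareStatement(insert)) {
        int pending = 0;
        while (rs.next()) {
            ps.setTimestamp(1, Timestamp.valueOf(rs.getString(1)));  // datetime
            ps.setBigDecimal(2, new BigDecimal(rs.getString(2)));    // largenumber
            ps.setBigDecimal(3, new BigDecimal(rs.getString(3)));    // value
            ps.setBigDecimal(4, new BigDecimal(rs.getString(4)));    // flag
            ps.setInt(5, Integer.parseInt(rs.getString(5)));         // kind
            ps.addBatch();
            if (++pending == 1000) {                                 // flush in chunks of 1,000 rows
                ps.executeBatch();
                pending = 0;
            }
        }
        if (pending > 0) {
            ps.executeBatch();                                       // flush the final partial chunk
        }
    } finally {
        rs.close();
    }
}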