I have a REST service that takes XML with 400_000 records; each record contains the following fields: code, type, price.
In the DB (MySQL) I have a table named PriceData with 2_000_000 rows. The purpose of this service is: select all PriceData rows from the DB matching the code and type from the XML, replace the price of each PriceData with the price from the XML, and if there is no PriceData with that code and type, create a new one with the provided price.
Right now it works like this: select one PriceData from the DB according to the first record from the XML, set the new price or create a new PriceData, save it, and repeat these steps 400_000 times. (It takes about 5 minutes.)
I want to speed up this process.
First try:
Select 1000 elements at a time from PriceData, and once all of them have been selected, update them:
Code:
private void updateAll(final List<XmlData> prices /* data from the XML */) {
    final List<PriceData> models = new ArrayList<>();
    // load the existing rows in chunks of 1000
    int begin = 0;
    while (begin < prices.size()) {
        final int end = Math.min(begin + 1000, prices.size());
        models.addAll(dao.findByPrices(prices.subList(begin, end)));
        begin = end;
    }
    final Map<String, XmlData> xmlData = prices.stream()
            .collect(Collectors.toMap(this::keyForPriceDate, e -> e));
    final Map<String, PriceData> modelMap = models.stream()
            .collect(Collectors.toMap(this::keyForRowModel, e -> e));
    final List<PriceData> modelsToSave = new ArrayList<>();
    for (final String key : xmlData.keySet()) {
        final XmlData price = xmlData.get(key);
        PriceData model = modelMap.get(key);
        if (model == null) {
            model = onEmptyPriceData(price);
        }
        model.setPrice(price.getPrice());
        modelsToSave.add(model);
    }
    modelService.saveAll(modelsToSave);
}
I convert the two lists to maps so I can tell whether a PriceData already exists (the keys for xmlData and modelMap are built as code+type).
The findByPrices method creates a query in the following format:
select * from PriceData where (code = 123 and type = 'qwe') or (...) // and this OR repeats 1000 times
Now it takes 2 minutes.
Second try:
Select all PriceData rows from the DB (2 million)
and use the algorithm above.
It takes 3 minutes. The first try is better, but in the future my service may receive 500_000 records, and I want to know which approach will be better in that scenario, or whether there is a better way to do this task.
My select method:
public List<PriceData> findBy(final List<XmlData> selectData) {
    final StringBuilder query = new StringBuilder("SELECT * from PriceData ");
    query.append("WHERE \n");
    final Iterator<PriceRowSelectData> selectDataIterator = selectData.iterator();
    while (selectDataIterator.hasNext()) {
        final PriceRowSelectData data = selectDataIterator.next();
        query.append("( \n")
             .append("productCode = " + data.getProductId() + " \n")
             .append(" AND type = " + data.getPriceind() + " \n")
             .append(" ) \n");
        if (selectDataIterator.hasNext()) {
            query.append("OR \n");
        }
    }
    final SearchResult<PriceRowModel> searchRes = search(query.toString());
    /*
     * Here I use a custom mapper that maps the result list to my objects.
     */
    return searchRes.getResult();
}
You should use MySQL's INSERT ... ON DUPLICATE KEY UPDATE statement, combined with JDBC batch processing. This of course assumes that (code, type) is the primary key, or at least covered by a unique index.
private void updateAll(final List<XmlData> prices) throws SQLException {
    String sql = "INSERT INTO PriceData (code, type, price)" +
                 " VALUES (?,?,?)" +
                 " ON DUPLICATE KEY" +
                 " UPDATE price = ?";
    try (PreparedStatement stmt = this.conn.prepareStatement(sql)) {
        int batchSize = 0;
        for (XmlData price : prices) {
            if (batchSize == 1000) { // flush the batch every 1000 statements
                stmt.executeBatch();
                batchSize = 0;
            }
            stmt.setInt       (1, price.getCode());
            stmt.setString    (2, price.getType());
            stmt.setBigDecimal(3, price.getPrice());
            stmt.setBigDecimal(4, price.getPrice());
            stmt.addBatch();
            batchSize++;
        }
        if (batchSize != 0)
            stmt.executeBatch();
    }
}
You can twiddle the batch size, but not flushing at all will use a lot of memory. I think 1000 statements per batch is good, but I have no numbers to back that up.
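One driver-level note (an assumption about your setup, worth verifying): with MySQL Connector/J, executeBatch() is only rewritten into true multi-row INSERT statements when rewriteBatchedStatements=true is set on the connection URL; otherwise the driver still sends one statement per row and the batching gains are much smaller. A minimal sketch, with a hypothetical host and database name:
String url = "jdbc:mysql://localhost:3306/pricedb?rewriteBatchedStatements=true"; // hypothetical host/schema
Connection conn = DriverManager.getConnection(url, user, password); // user/password assumed to be defined elsewhere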
I have a List<String> of categories, and for each category I want to add it to my WHERE clause, combined with the AND operator, like: SELECT question_id FROM question WHERE category = categ1 AND category = categ2 AND category = ...
Since the size of the categories list changes, I cannot do something like this:
String sql = "SELECT question_id FROM question WHERE category = ? AND category = ?";
jdbcTemplate.query(sql, stringMapper, "categ1", "categ2");
How can I achieve what I want?
Either check whether Spring's JdbcTemplate handles that for you using a syntax which could be something like this (from the docs, I don't think it does):
SELECT question_id FROM question WHERE category in (?...)
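As an aside, the plain JdbcTemplate with ? placeholders does not expand lists, but the named-parameter variant, NamedParameterJdbcTemplate, does expand a collection bound to a single named parameter. A minimal sketch, assuming the same categories list and that stringMapper maps each row to a String:
// requires org.springframework.jdbc.core.namedparam.NamedParameterJdbcTemplate and MapSqlParameterSource
NamedParameterJdbcTemplate namedTemplate = new NamedParameterJdbcTemplate(jdbcTemplate);
String inSql = "SELECT question_id FROM question WHERE category IN (:categories)";
MapSqlParameterSource params = new MapSqlParameterSource("categories", categories);
List<String> questionIds = namedTemplate.query(inSql, params, stringMapper);
Note that the Oracle 1000-element IN limit mentioned in the notes below still applies here.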
Or write your own query, with the problems that may arise:
// requires: import static java.util.stream.Collectors.joining;
List<Object> parameters = new ArrayList<>(categories.size());
StringBuilder sb = new StringBuilder("SELECT question_id FROM question WHERE 1=1");
if (!categories.isEmpty()) {
    if (categories.size() == 1) {
        sb.append(" and category = ?");
    } else {
        sb.append(" and category in ");
        sb.append(categories.stream()
                .map(ignored -> "?")
                .collect(joining(", ", "(", ")")));
    }
    parameters.addAll(categories);
}
Object[] paramArray = parameters.toArray();
jdbcTemplate.query(sb.toString(), stringMapper, paramArray);
Notes:
some security/quality tools may report SQL issues because you are building dynamic SQL.
Oracle puts a limit of 1000 elements per IN list. You would have to partition categories into groups of 1000 (or fewer); see the sketch after these notes.
I used a stream() in a slightly unusual fashion in order to generate the "?" placeholders. If you use commons-lang3, you can replace it with "(" + StringUtils.repeat("?", ", ", categories.size()) + ")" (the example in the javadoc was probably written with this kind of usage in mind).
if you only have category as a single criterion, you can probably remove the 1=1 as well as the and.
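If you do hit that IN-list limit, here is a minimal sketch of the partitioning in plain Java (no Guava), assuming the same jdbcTemplate and stringMapper, running one query per chunk and concatenating the results:
int chunkSize = 1000;
List<String> questionIds = new ArrayList<>();
for (int from = 0; from < categories.size(); from += chunkSize) {
    List<String> chunk = categories.subList(from, Math.min(from + chunkSize, categories.size()));
    String inList = chunk.stream().map(c -> "?").collect(joining(", ", "(", ")"));
    questionIds.addAll(jdbcTemplate.query(
            "SELECT question_id FROM question WHERE category in " + inList,
            stringMapper,
            chunk.toArray()));
}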
I believe this may work for you:
// The SQL Query
String sql = "SELECT question_id FROM question";
// Create the WHERE clause based on the number of items in List...
StringBuilder whereClause = new StringBuilder(" WHERE ");
StringBuilder ps = new StringBuilder("");
for (int i = 0; i < categories.size(); i++) {
    if (!ps.toString().isEmpty()) {
        ps.append(" AND ");
    }
    ps.append("category = ?");
}
whereClause.append(ps.toString()).append(";");
//Append the WHERE clause string to the SQL query string
sql = sql + whereClause.toString();
//System.out.println(sql);
/* Convert the categories List to an Object[] Array so as to
pass in as varArgs to the jdbcTemplate.query() method. */
Object[] psArgs = categories.toArray(new Object[categories.size()]);
jdbcTemplate.query(sql, stringMapper, psArgs);
I have a parent entity called OrderEvent and a child entity called PreCondition. One OrderEvent can have many PreConditions (>= 200). I need to save 100000 OrderEvents + 100000 * 200 PreConditions. I used Repository.save(list of OrderEvents) and saved to the DB every 1000 records. It takes approximately 30 seconds to insert 1000 OrderEvents.
It takes almost an hour to save all 100000 OrderEvents.
Is there any way to bring this down to below 2 minutes?
I tried the save-entities method of the repository:
public void parseOrder(String path, String collectionName) throws ParseException {
    BufferedReader reader;
    Connection conn = (Connection) em.unwrap(java.sql.Connection.class);
    System.out.println(conn);
    try {
        reader = new BufferedReader(new FileReader(path));
        String line = reader.readLine();
        String jobNumber = line.substring(0, 7).trim();
        String recordType = line.substring(7, 9).trim();
        Integer len = line.length();
        preId = 0L;
        postId = 0L;
        eventId = 0L;
        OrderEvent orderEvent = this.paraseHeader(line, len, jobNumber, collectionName);
        Integer count = 1;
        Integer batch = 0;
        long startTime = System.nanoTime();
        List<OrderEvent> list = new ArrayList<OrderEvent>();
        while (line != null) {
            line = reader.readLine();
            if (line == null) {
                continue;
            }
            jobNumber = line.substring(0, 7).trim();
            recordType = line.substring(7, 9).trim();
            len = line.length();
            if (recordType.equals("0H")) {
                count++;
                batch++;
                if (batch.equals(1000)) {
                    orderRepository.save(list);
                    list.clear();
                    long estimatedTime = System.nanoTime() - startTime;
                    System.out.println("Processed " + batch + " records in " + estimatedTime / 1_000_000_000. + " second(s).");
                    batch = 0;
                    startTime = System.nanoTime();
                }
                list.add(orderEvent);
                //orderRepository.saveAndFlush(orderEvent);
                orderEvent = this.paraseHeader(line, len, jobNumber, collectionName);
            } else if (recordType.equals("2F")) {
                this.paraseFeature(line, len, jobNumber, orderEvent);
            }
        }
        reader.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

private OrderEvent paraseHeader(String line, Integer len, String jobNumber, String collectionName) throws ParseException {
    String model = line.substring(9, 16).trim();
    String processDate = line.substring(len - 11, len - 3).trim();
    String formattedProcessDate = processDate.substring(0, 4) + "-" +
            processDate.substring(4, 6) + "-" + processDate.substring(6, 8) + " 00:00:00";
    //eventId++;
    OrderEvent orderEvent = new OrderEvent(jobNumber, UUID.randomUUID().toString(), collectionName,
            formatter.parse(formattedProcessDate));
    // preId++;
    //postId++;
    orderEvent.fillPrecondition("Model", "Stimulus", "OP_EQ", model);
    orderEvent.fillPostcondition("Add_Fact", "Coded", "Response", "True");
    return orderEvent;
}

private void paraseFeature(String line, Integer len, String jobNumber, OrderEvent orderEvent) {
    // preId++;
    String feature = line.substring(len - 7, len).trim();
    orderEvent.fillPrecondition("Feature", "Stimulus", "OP_EQ", feature);
}
This usually depends on the database setup, e.g. what the latency to the client is, what indexes are on the tables, how queries lock the table, and so on.
Make sure that you understand how much time is spent in network operations. It could be the limiting factor, especially if your database sits on the other side of the world.
First establish what the latency between the client and the database server is. If it's 10 ms, then inserting these rows one by one would take: 100,000 * 200 * 10 ms = 200,000 s ~ 56 h. This is very slow, so make sure you are using batch inserts with JDBC.
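Since the code above goes through Spring Data JPA (orderRepository.save), also make sure Hibernate is actually batching: JDBC batching is off by default (hibernate.jdbc.batch_size unset), and insert batching is silently disabled for IDENTITY-generated ids. A minimal sketch of chunked persisting with periodic flush/clear, assuming a batch size that matches hibernate.jdbc.batch_size and an id strategy that allows batching:
// assumes javax.persistence.EntityManager/@PersistenceContext and Spring's @Transactional are available
@PersistenceContext
private EntityManager em;

@Transactional
public void saveInBatches(List<OrderEvent> events) {
    final int batchSize = 50; // should match hibernate.jdbc.batch_size
    for (int i = 0; i < events.size(); i++) {
        em.persist(events.get(i)); // cascades to the PreConditions if the mapping cascades PERSIST
        if ((i + 1) % batchSize == 0) {
            em.flush(); // send the pending INSERTs as one JDBC batch
            em.clear(); // detach entities so the persistence context stays small
        }
    }
    em.flush();
    em.clear();
}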
Sometimes the insertion process can be sped up significantly by creating a shadow table:
Create new tables that are identical to OrderEvents and PreCondition tables. Some RDBMS allow for CREATE TABLE ... AS SELECT ... FROM ... syntax.
Disable foreign keys and indexes on the shadow tables.
Bulk insert all the data.
Enable foreign keys and indexes on the shadow tables. This will hopefully confirm that the imported data is correct.
Insert from shadow tables into the actual tables e.g. by running INSERT INTO ... SELECT ... FROM ....
Delete shadow table.
However, the best option would be to skip JDBC and switch to the bulk load utility provided by your database, e.g. Oracle DB has External Tables and SQL*Loader. These tools are specifically designed to ingest large quantities of data efficiently, while JDBC is a general-purpose interface.
In C# I can use SqlBulkCopy for this type of task.
Maybe in Java there is an equivalent API,
something like this: com.microsoft.sqlserver.jdbc.SQLServerBulkCopy
Something like that is better done using the DB server's bulk-processing operations.
Yes, it's a totally different process, but it will take seconds, not even minutes.
Unfortunately the HOWTO is very dependent on the SQL server:
MS SQL: BULK INSERT: https://learn.microsoft.com/en-us/sql/t-sql/statements/bulk-insert-transact-sql?view=sql-server-2017
PostgreSQL: COPY: https://www.postgresql.org/docs/current/sql-copy.html
I am trying to create an ordering application in Java where the user places an order, specifying the products. The combination of the order's timestamp and the customer's id is unique. What I'm trying to do is check for new orders every few seconds (or milliseconds), fetch the new orders and then create the corresponding objects. I have an ArrayList of orders in which each Order has an ArrayList of Products, a timestamp and a customer id.
I send queries to the DB every 700 ms in order to retrieve the new orders and create the objects.
My problem is that sometimes the query doesn't return a new order, that is, the new entries in the DB. I tried various values for the interval, but if it's too small, the program creates two or even more Order objects for the same entry. I want exactly one Order object for every new entry.
Here's the code:
DBConnect db = new DBConnect();
ScheduledExecutorService executorService =
        Executors.newSingleThreadScheduledExecutor();
String today = new SimpleDateFormat("YYYY-MM-dd").format(new Date());
today = today + " 00:00:00";
ArrayList<Order> orders = new ArrayList<Order>();
db.connect();
executorService.scheduleAtFixedRate(new Runnable()
{
    @Override
    public void run()
    {
        String now = new SimpleDateFormat("YYYY-MM-dd HH:mm:ss").format(new Date());
        try
        {
            String query = "SELECT DISTINCT last_edit, customer_id, employee_username, closed\r\n" +
                    "FROM orders\r\n" +
                    "WHERE last_edit = '" + now + "' ORDER BY last_edit";
            ResultSet rs = db.getStatement().executeQuery(query);
            while (rs.next())
                orders.add(new Order(rs.getString("last_edit"), rs.getString("customer_id"), rs.getString("employee_username"), rs.getString("closed")));
        }
        catch (Exception ex)
        {
            System.out.println(ex);
        }
        try
        {
            String query = "SELECT last_edit, product.id AS product_id,\r\n" +
                    "quantity_weight,\r\n" +
                    "orders.price\r\n" +
                    "FROM customer, product, orders\r\n" +
                    "WHERE orders.customer_id = customer.id\r\n" +
                    "AND orders.product_id = product.id\r\n" +
                    "AND last_edit = '" + now + "' ORDER BY last_edit";
            ResultSet rs = db.getStatement().executeQuery(query);
            for (Order order : orders)
            {
                while (rs.next())
                    if (order.getLast_edit().equals(rs.getString("last_edit")))
                    {
                        order.getProducts().add(new Product(rs.getString("product_id"), rs.getString("quantity_weight"), rs.getString("orders.price")));
                        System.out.println(order.getLast_edit());
                    }
                rs.beforeFirst();
            }
        }
        catch (Exception ex)
        {
            System.out.println(ex);
        }
    }
}, 0, 700, TimeUnit.MILLISECONDS);
Why you are using equality (=) instead of a >= condition is a mystery to me. Besides that, you rely on the system timestamp instead of the timestamp of the newest order. In edge cases multiple orders can have the same timestamp, and this could get you into trouble. I assume that you have an auto-increment primary key on your table, so you should rely on the id (your PK), which is guaranteed to be unique, instead of the timestamp. Your process would look like this:
1. Fetch the list of all orders, process them and take the max PK of the selected orders.
2. SELECT (whatever) FROM (somewhere) WHERE order.id > lastSavedId
3. Process the new orders, save the max id and wait however long you want.
4. Go to 2 and repeat.
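A minimal sketch of step 2 against the orders table from the question, assuming a hypothetical auto-increment id column and reusing the orders list and Order constructor shown above:
// lastSavedId holds the largest id processed so far (0 before the first poll)
private long lastSavedId = 0;

private void pollNewOrders(Connection conn) throws SQLException {
    String sql = "SELECT id, last_edit, customer_id, employee_username, closed "
            + "FROM orders WHERE id > ? ORDER BY id";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
        ps.setLong(1, lastSavedId);
        try (ResultSet rs = ps.executeQuery()) {
            while (rs.next()) {
                orders.add(new Order(rs.getString("last_edit"), rs.getString("customer_id"),
                        rs.getString("employee_username"), rs.getString("closed")));
                lastSavedId = Math.max(lastSavedId, rs.getLong("id")); // remember the newest id seen
            }
        }
    }
}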
I need to perform a database insertion in my Spring Boot web app. What I have so far is a list of sport competitions and their respective information.
One competition can be held at different times on the same day.
The first thing bothering me is how to store these times in the DB. Is a new table needed here? (I created a date table.)
I searched around and I still can't figure out how to combine the insertion of
the competition info and its dates in my insert function at the same time.
My insert function needs some work; I need some help.
This answer is good, but it did not meet my requirements.
My database schema:
CREATE TABLE competition (
    competition_id integer PRIMARY KEY,
    nom varchar(128) NOT NULL
);

CREATE TABLE date (
    id integer PRIMARY KEY,
    date_time timestamptz,
    competition_id integer REFERENCES competition (competition_id)
);
JSON data:
{
    "id": "420",
    "name": "SOCCER",
    "dates": [
        "2016-05-12T03:00:00.000Z",
        "2016-05-12T04:00:00.000Z",
        "2016-05-12T05:00:00.000Z"
    ]
},
{
    "id": "220",
    "name": "BASKETBALL",
    "dates": [
        "2016-05-12T03:00:00.000Z",
        "2016-05-12T04:00:00.000Z"
    ]
}
My Competition class:
public class Competition {
    private int id;
    private String name;
    private String[] dates;
    // setters ... getters
}
Function to insert data:
private static final String INSERT_STMT =
        " insert into competition (id, name)"
        + " values (?, ?)";

public int insert(Competition competition) {
    return jdbcTemplate.update(conn -> {
        PreparedStatement ps = conn.prepareStatement(INSERT_STMT);
        ps.setInt(1, competition.getId());
        ps.setString(2, competition.getName());
        return ps;
    });
    // insert also in date table ???
}
First of all, if you need data consistency then you should wrap your insert statements in a transaction. To insert data into multiple tables you execute multiple insert statements, the same as you would do in plain SQL. If you need to return the number of updated rows, you can create a wrapper class to store and return it.
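A minimal sketch of that wrapping with Spring's declarative transactions; the two helper methods are hypothetical placeholders for the competition insert and the date inserts:
@Transactional
public void insertCompetitionWithDates(Competition competition) {
    insertCompetitionRow(competition); // hypothetical helper: INSERT INTO competition ...
    insertDateRows(competition);       // hypothetical helper: INSERT INTO date ..., one row per date
    // if either helper throws, both inserts are rolled back together
}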
First, I would make the ID in the date table auto-increment, so you don't have to provide an ID for every date, and go with this base query:
private static final String INSERT_DATES = "INSERT INTO date (date_time, competition_id) VALUES ";
And then build the Statement like this:
public int insert(Competition competition) {
    // All local variables must be final because the lambdas will be executed at an undefined time
    final int id = competition.getId();
    final String name = competition.getName();
    final String[] dates = competition.getDates();
    final String dateValueStr = String.join(", ", Collections.nCopies(dates.length, "(?, ?)"));

    // Execute updates
    int updatedRows1 = jdbcTemplate.update(conn -> {
        PreparedStatement ps = conn.prepareStatement(INSERT_STMT);
        ps.setInt(1, id);
        ps.setString(2, name);
        return ps;
    });
    if (updatedRows1 < 1)
    {
        // Something went wrong
        return -1;
    }

    int updatedRows2 = jdbcTemplate.update(conn -> {
        PreparedStatement ps = conn.prepareStatement(INSERT_DATES + dateValueStr);
        int idx = 1;
        for (String date : dates)
        {
            ps.setString(idx, date); // date_time
            idx++;
            ps.setInt(idx, id);      // competition_id
            idx++;
        }
        return ps;
    });
    if (updatedRows2 < 1)
    {
        // Something went wrong
        // Roll back the first insert here (depends on how your transactions are set up)
        return -1;
    }
    return updatedRows1 + updatedRows2;
}
I'm creating a program that outputs data from a database into a pie chart and a table. I'm trying to use the model-view-controller design. My problem is that the result set I am creating will have multiple rows, whose values I need to extract, but I don't know how to store them. My code at the moment is:
PreparedStatement returnDealerData = con.prepareStatement
        ("SELECT * FROM project.standard_dealer_link WHERE DealerID = " + this.DealerID +
         " and Quarter = " + this.Quarter + " and Year = " + this.Year + ";");
ResultSet dealerRS = returnDealerData.executeQuery();
while (dealerRS.next()) {
    this.DealerID = dealerRS.getInt("DealerID");
    this.StandardID = dealerRS.getInt("StandardID");
    this.Achieved = dealerRS.getString("Achieved");
    this.NSCWaiver = dealerRS.getString("NSC_Waiver");
    this.Override = dealerRS.getString("Override");
    this.Quarter = dealerRS.getInt("Quarter");
    this.Year = dealerRS.getInt("Year");
}
The data that changes will be StandardID through Override. So what I need to do is, for each iteration, store the values somehow so I can then put them into the pie chart and table.
I think what is tripping me up is that I previously used this method for a result set that only contained one row of values.
Any help is much appreciated :).
Please iterate over the values and store each row in a separate data object. So, we need to construct a List of objects.
List<DataObject> list = new ArrayList<DataObject>();
while (dealerRS.next()) {
    DataObject obj = new DataObject();
    obj.setDealerID(dealerRS.getInt("DealerID"));
    obj.setStandardID(dealerRS.getInt("StandardID"));
    obj.setAchieved(dealerRS.getString("Achieved"));
    obj.setNSCWaiver(dealerRS.getString("NSC_Waiver"));
    obj.setOverride(dealerRS.getString("Override"));
    obj.setQuarter(dealerRS.getInt("Quarter"));
    obj.setYear(dealerRS.getInt("Year"));
    list.add(obj);
}