I currently have a performance issue with an application that runs many SQL select queries.
The programming language is Java and I'm using a MySQL database. It contains about 10 million records.
The application needs to find records in the database with zipcode and house number as parameters. When it does not find a record, it needs to query with only the zipcode and take the record with the lowest house number. When the zipcode cannot be found in the database at all, the application needs to deal with that.
The code for a single lookup looks like this:
Statement select = "select * from zipcode_addresses where zipcode = ? and houseNo =?";
ResultSet rs = select.executeQuery();
if (rs.next()) {
    dealWithResult(rs);
} else {
    Statement alternativeSelect = "select * from zipcode_addresses where zipcode = ? group by houseNo having min(houseNo)";
    ResultSet alternativeRs = alternativeSelect.executeQuery();
    if (alternativeRs.next()) {
        dealWithResult(alternativeRs);
    } else {
        System.err.println("Could not find zipcode: " + zipcode);
    }
}
Is there a proper way of doing batch select queries which deals with data not being found?
Thanks!
Update
The table structure is the following:
+-----------------+-------------+------+-----+---------+-------+
| Field           | Type        | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| zipcode         | varchar(6)  | NO   | PRI | NULL    |       |
| house_no        | int(11)     | NO   | PRI | NULL    |       |
| sanddcode       | varchar(45) | NO   |     | NULL    |       |
| depot           | varchar(3)  | NO   |     | NULL    |       |
| network_point   | varchar(6)  | NO   |     | NULL    |       |
| region          | varchar(3)  | NO   |     | NULL    |       |
| seq             | int(11)     | NO   |     | NULL    |       |
| cluster_id      | varchar(1)  | NO   |     | NULL    |       |
| strand_id       | int(11)     | NO   |     | NULL    |       |
| strand_props_id | int(11)     | NO   |     | NULL    |       |
| version_id      | int(11)     | NO   | PRI | NULL    |       |
+-----------------+-------------+------+-----+---------+-------+
The primary key is on version_id, zipcode and house_no.
There is an index on zipcode and house_no and another index on zipcode, both BTREE indexes.
The application is sometimes used to run 1 million distinct select queries, at which point it just takes too long.
Your code snippet doesn't show how your statements are being prepared. If your statements are being called numerous times then you should take a look at the PreparedStatement object:
http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html
Your statements can then be cached to reduce future overhead.
You could create a stored procedure with two parameters and make the house number optional, or let the procedure itself check whether the exact record exists and fall back if it doesn't.
A lot depends on the usage pattern: how many queries you run, how often there is a ZIP code miss, etc. First off, I would use PreparedStatements where possible. I am not that familiar with MySQL, but they are usually cached and reused by the connection or the database, which will help with performance. Next, if ZIP code misses were common, I would probably build an in-memory cache of ZIP codes to short-circuit doing 3 queries on a miss. After that, I might make a view that's ZIP + house number. Going further depends more on how your application works, but these things would help.
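To make the in-memory idea concrete, here is a minimal sketch. It assumes the relevant part of the table fits in memory and that each record can be reduced to a single payload string. The class name ZipIndex and its methods are made up for illustration and are not part of any library.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical in-memory index: zipcode -> (house number -> record payload).
final class ZipIndex {
    private final Map<String, TreeMap<Integer, String>> byZip = new HashMap<>();

    // Registers one record; a TreeMap keeps house numbers sorted per zipcode.
    void put(String zipcode, int houseNo, String record) {
        byZip.computeIfAbsent(zipcode, z -> new TreeMap<>()).put(houseNo, record);
    }

    // Exact match first, else the lowest house number for the zipcode, else empty.
    Optional<String> find(String zipcode, int houseNo) {
        TreeMap<Integer, String> houses = byZip.get(zipcode);
        if (houses == null) {
            return Optional.empty(); // unknown zipcode: no database round trip needed
        }
        String exact = houses.get(houseNo);
        return Optional.of(exact != null ? exact : houses.firstEntry().getValue());
    }
}
```

If only the set of known ZIP codes fits in memory, the same short-circuit works with a plain HashSet of zipcodes in front of the database queries.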
The 'group by' in your second SQL query is unnecessary and killing performance. For maximum performance, replace this select (the second one in your code) ...
select * from zipcode_addresses where zipcode = ?
group by houseNo having min(houseNo)
with this ...
select min(houseNo) from zipcode_addresses where zipcode = ?
Also, ensure you have an index on zipcode + houseNo (which it looks like you have - from the updated post).
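Note that select min(houseNo) returns only the number, not the row. If you need the whole record, ordering by house number and taking the first row should be able to use the same index. Below is a rough sketch of both lookups with statements that are prepared once and reused. It assumes the column is called house_no as in the posted table structure, reads only the sanddcode column as an example, and ignores version_id. The class ZipLookup and its method names are invented for this example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper: prepares both statements once and reuses them per lookup.
final class ZipLookup implements AutoCloseable {
    static final String EXACT_SQL =
        "SELECT * FROM zipcode_addresses WHERE zipcode = ? AND house_no = ?";
    // Returns the full row with the lowest house number for the zipcode.
    static final String LOWEST_SQL =
        "SELECT * FROM zipcode_addresses WHERE zipcode = ? ORDER BY house_no LIMIT 1";

    private final PreparedStatement exact;
    private final PreparedStatement lowest;

    ZipLookup(Connection connection) throws SQLException {
        exact = connection.prepareStatement(EXACT_SQL);
        lowest = connection.prepareStatement(LOWEST_SQL);
    }

    // Returns the sanddcode of the matching row, or null if the zipcode is unknown.
    String findSanddcode(String zipcode, int houseNo) throws SQLException {
        exact.setString(1, zipcode);
        exact.setInt(2, houseNo);
        try (ResultSet rs = exact.executeQuery()) {
            if (rs.next()) {
                return rs.getString("sanddcode");
            }
        }
        lowest.setString(1, zipcode);
        try (ResultSet rs = lowest.executeQuery()) {
            return rs.next() ? rs.getString("sanddcode") : null;
        }
    }

    @Override
    public void close() throws SQLException {
        try {
            exact.close();
        } finally {
            lowest.close();
        }
    }
}
```

You would create one ZipLookup per connection and call findSanddcode in a loop, instead of building new statements for every zipcode.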
Related
Is it possible to select only the default value of a column when the table is empty?
I have a completely empty table and I just want to select the default value of one of its columns. It is important for my Java app, which fills the table.
Thanks.
You can get the default from the INFORMATION_SCHEMA.COLUMNS
select COLUMN_DEFAULT
from INFORMATION_SCHEMA.COLUMNS
where TABLE_SCHEMA='your_db' and TABLE_NAME='your_table' and COLUMN_NAME='your_column'
You can define a default value for a column when you create a table, if you just want MySQL to insert it automatically:
create table my_table (i INT DEFAULT 1);
But if you mean you want the default value which is stored in the DB dictionary, you can get it by this query:
SELECT Column_Default
FROM Information_Schema.Columns
WHERE Table_Schema = 'yourSchema'
AND Table_Name = 'yourTableName'
AND Column_Name = 'yourColumnName'
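Since the question mentions a Java app, the same lookup could be run through JDBC roughly like this. It is only a sketch: the class ColumnDefaults and the method readDefault are names chosen for this example, and the schema, table and column names are whatever you pass in.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Hypothetical helper that runs the INFORMATION_SCHEMA query shown above.
final class ColumnDefaults {
    static final String DEFAULT_SQL =
        "SELECT COLUMN_DEFAULT FROM INFORMATION_SCHEMA.COLUMNS "
        + "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ? AND COLUMN_NAME = ?";

    // Returns the default as text, or null when the column has no default
    // or no such column exists.
    static String readDefault(Connection connection, String schema, String table, String column)
            throws SQLException {
        try (PreparedStatement ps = connection.prepareStatement(DEFAULT_SQL)) {
            ps.setString(1, schema);
            ps.setString(2, table);
            ps.setString(3, column);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}
```

The value comes back as a string, so the app still has to convert it to the column's Java type.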
I can only think of two ways:
Inserting a row
Insert a row without specifying a value for that column
Select the column from that row; it will have the default value of the column
Delete the row
...probably all in a transaction so nothing else sees it.
Using describe (explain)
The describe command (aka explain) describes objects in the system, including tables. So if you do explain YourTable, you'll get back information about the table, including its default values.
Here's an example from that linked documentation:
mysql> DESCRIBE City;
+------------+----------+------+-----+---------+----------------+
| Field      | Type     | Null | Key | Default | Extra          |
+------------+----------+------+-----+---------+----------------+
| Id         | int(11)  | NO   | PRI | NULL    | auto_increment |
| Name       | char(35) | NO   |     |         |                |
| Country    | char(3)  | NO   | UNI |         |                |
| District   | char(20) | YES  | MUL |         |                |
| Population | int(11)  | NO   |     | 0       |                |
+------------+----------+------+-----+---------+----------------+
So you can extract the default from the Default column in the returned rows.
Ah, of course, there's a third way, see slaakso's answer for it.
I'm working with a non normalized 3rd party database meaning I cannot change the schema. I'm trying to map the tables to JPA entities using Hibernate 5.1
There are 2 simple tables A and B:
| A_ID(pk) |    | B_ID(pk) |
------------    ------------
| 1        |    | 1        |
------------    | 2        |
                ------------
Table C has a composite primary key and has a Many-To-One relation to Table A:
| A_ID(pk&fk) | QUANTITY(pk) | VALID_FROM(pk) |
---------------------------------------------------
| 1 | 1 | 2017-05-21 |
| 1 | 1 | 2018-01-01 |
| 1 | 2 | 2017-05-21 |
Table D has a composite primary key:
| A_ID(pk&fk) | QUANTITY(pk) | VALID_FROM(pk) | B_ID(pk&fk) |
--------------------------------------------------------------------
| 1 | 1 | 2018-01-21 | 1 |
| 1 | 2 | 2018-01-21 | 1 |
| 1 | 2 | 2018-05-01 | 2 |
The VALID_FROM column is not part of the join condition between the tables and can take any value.
I'm trying to set up a relation between Table C and Table D, but because of the VALID_FROM primary key component they cannot be modelled as Many-To-One. And since there is no join table, they cannot be modelled as Many-To-Many either.
The best solution would be to create a view like
CREATE VIEW C_NORM AS
SELECT DISTINCT A_ID, QUANTITY
FROM TABLE_C;
which would produce view C_NORM:
| A_ID(pk&fk) | QUANTITY(pk) |
----------------------------------
| 1 | 1 |
| 1 | 2 |
Creating the C_NORM entity on this view could have
a One-To-Many relation with Table C
and another One-To-Many relation with Table D
but I cannot change the schema thus I cannot create a new view.
Is there any way to define an entity as a class with annotations that is basically based on a native SQL query rather than a view or table in the DB?
No, that's not possible and it doesn't make sense.
Entities are for update, insert and delete. If you don't want to do any of these operations, you shouldn't use entities.
You can use @SqlResultSetMapping to map the result of a native query to a class:
Query q = em.createNativeQuery(
"SELECT c.id, c.name, COUNT(o) as orderCount, AVG(o.price) AS avgOrder " +
"FROM Customer c " +
"JOIN Orders o ON o.cid = c.id " +
"GROUP BY c.id, c.name",
"CustomerDetailsResult");
@SqlResultSetMapping(name="CustomerDetailsResult",
    classes={
        @ConstructorResult(targetClass=com.acme.CustomerDetails.class,
            columns={
                @ColumnResult(name="id"),
                @ColumnResult(name="name"),
                @ColumnResult(name="orderCount"),
                @ColumnResult(name="avgOrder", type=Double.class)})
    })
Or alternatively use QLRM: https://github.com/simasch/qlrm
I have a ResultSet table shown below:
+------------+--------------------+--------------------+---------+-----------------------+
| test_date | upload_kbps | download_kbps | latency | network_operator_name |
+------------+--------------------+--------------------+---------+-----------------------+
| 2017-04-02 | 19.12741903076923 | 44.614721153846155 | 32.1250 | Alcatel |
| 2017-03-31 | 18.30683616557377 | 44.294387978142076 | 34.7432 | Alcatel |
| 2017-03-31 | 20.643555595555555 | 50.99801587301587 | 32.1640 | Vodafone |
I want to modify the ResultSet for further use by adding a row to it, like so:
+------------+--------------------+--------------------+---------+-----------------------+
| test_date | upload_kbps | download_kbps | latency | network_operator_name |
+------------+--------------------+--------------------+---------+-----------------------+
| 2017-04-02 | 19.12741903076923 | 44.614721153846155 | 32.1250 | Alcatel |
| 2017-04-02 | 0 | 0 | 0 | Vodafone |
| 2017-03-31 | 18.30683616557377 | 44.294387978142076 | 34.7432 | Alcatel |
| 2017-03-31 | 20.643555595555555 | 50.99801587301587 | 32.1640 | Vodafone |
The logic behind this is to add a null row for a telecom operator on any day when no speedtest was done for it. For further clarification: the table in the MySQL db does not record a row for tests that were not done, hence the missing row in my original ResultSet, and hence the need to add a 'NULL/0' row to reflect the lack of a test for that telco on that day. I don't currently have direct access to that database to modify the entries, so this was the best I could think of.
Any idea how I can do this? Appreciate the help!
It sounds like you want to add rows to the ResultSet after the fact. AFAIK, we can't do that. Instead we need to construct our SQL query so that it will produce the "extra" rows we need.
So if we have a table named "test" and
SELECT * FROM test
ORDER BY test_date DESC, network_operator_name
produces
test_date upload_kbps download_kbps latency network_operator_name
---------- ---------------- ---------------- ------- ---------------------
2017-04-02 19.1274190307692 44.6147211538461 32.125 Alcatel
2017-03-31 18.3068361655737 44.294387978142 34.7432 Alcatel
2017-03-31 20.6435555955555 50.9980158730158 32.164 Vodafone
then we can start with a query to produce a row for every combination of test_date and network_operator_name
SELECT test_date, network_operator_name
FROM
(SELECT DISTINCT network_operator_name FROM test) unique_operators
CROSS JOIN
(SELECT DISTINCT test_date FROM test) unique_dates
which gives us
test_date network_operator_name
---------- ---------------------
2017-03-31 Alcatel
2017-03-31 Vodafone
2017-04-02 Alcatel
2017-04-02 Vodafone
Then we can LEFT JOIN that query with the actual table
SELECT
required_rows.test_date,
COALESCE(test.upload_kbps, 0) AS upload_kbps,
COALESCE(test.download_kbps, 0) AS download_kbps,
COALESCE(test.latency, 0) AS latency,
required_rows.network_operator_name
FROM
(
SELECT test_date, network_operator_name
FROM
(SELECT DISTINCT network_operator_name FROM test) unique_operators
CROSS JOIN
(SELECT DISTINCT test_date FROM test) unique_dates
) required_rows
LEFT JOIN
test
ON required_rows.test_date = test.test_date
AND required_rows.network_operator_name = test.network_operator_name
ORDER BY required_rows.test_date DESC, required_rows.network_operator_name
producing
test_date upload_kbps download_kbps latency network_operator_name
---------- ---------------- ---------------- ------- ---------------------
2017-04-02 19.1274190307692 44.6147211538461 32.125 Alcatel
2017-04-02 0 0 0 Vodafone
2017-03-31 18.3068361655737 44.294387978142 34.7432 Alcatel
2017-03-31 20.6435555955555 50.9980158730158 32.164 Vodafone
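If changing the query is not an option, the same "every date times every operator" idea can be applied on the Java side after copying the ResultSet rows into plain objects. The sketch below assumes dates are ISO strings (so they sort correctly as text) and that there is at most one row per date and operator. SpeedTestFiller, Row and fillMissing are names invented for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class SpeedTestFiller {
    // One row copied out of the ResultSet; fields mirror the columns in the question.
    static final class Row {
        final String testDate;
        final double uploadKbps;
        final double downloadKbps;
        final double latency;
        final String operator;

        Row(String testDate, double uploadKbps, double downloadKbps, double latency, String operator) {
            this.testDate = testDate;
            this.uploadKbps = uploadKbps;
            this.downloadKbps = downloadKbps;
            this.latency = latency;
            this.operator = operator;
        }
    }

    // Returns one row per (date, operator) pair, newest date first,
    // using an all-zero row where no test was recorded.
    static List<Row> fillMissing(List<Row> rows) {
        TreeSet<String> dates = new TreeSet<>(Comparator.reverseOrder());
        TreeSet<String> operators = new TreeSet<>();
        Map<String, Row> existing = new HashMap<>();
        for (Row r : rows) {
            dates.add(r.testDate);
            operators.add(r.operator);
            existing.put(r.testDate + "|" + r.operator, r);
        }
        List<Row> filled = new ArrayList<>();
        for (String date : dates) {
            for (String operator : operators) {
                Row found = existing.get(date + "|" + operator);
                filled.add(found != null ? found : new Row(date, 0, 0, 0, operator));
            }
        }
        return filled;
    }
}
```

This produces the same ordering as the ORDER BY in the query above, but the SQL version is usually preferable because the database does the work.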
You can use MySQL's NULLIF() function. Pass it the column you use to test whether a speedtest was done. Suppose latency is zero when no speedtest was done on that day.
Then NULLIF(latency, 0) would be the value for the latency column in your insert command, and likewise for the other columns that need to be filled with NULL under certain conditions. The function returns NULL if the first argument equals the second argument; otherwise it returns the value of the first argument.
How can I get the name of the DB field that causes a ConstraintViolationException when inserting into the database with Hibernate?
I have a table like this:
mysql> desc Mytable;
+-------+-------------+------+-----+---------+----------------+
| Field | Type        | Null | Key | Default | Extra          |
+-------+-------------+------+-----+---------+----------------+
| id    | bigint(11)  | NO   | PRI | NULL    | auto_increment |
| name  | varchar(20) | YES  | UNI | NULL    |                |
| city  | varchar(20) | YES  | UNI | NULL    |                |
+-------+-------------+------+-----+---------+----------------+
And the records in the table are:
mysql> select * from Mytable;
+----+--------+-------+
| id | name   | city  |
+----+--------+-------+
|  1 | SATISH | BLORE |
+----+--------+-------+
1 row in set (0.00 sec)
Now, if I try to insert
"RAMESH","BLORE" through Hibernate,
it throws a ConstraintViolationException because "BLORE" (city) already exists.
If I try to insert
"SATISH","MLORE" through Hibernate,
it throws a ConstraintViolationException because "SATISH" (name) already exists.
My question is:
how do I get the name of the field that caused the ConstraintViolationException through Hibernate?
Since there might be other constraints that could be violated (e.g. combined keys), you only have the name of the constraint that is violated, which in your case might just be the column name (however, I'm not entirely sure about that). You can get the name of the violated constraint by calling getConstraintName() on the ConstraintViolationException.
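If getConstraintName() turns out to be null or unhelpful in your setup, one fallback is to parse the message of the underlying SQLException (available through getSQLException() on the exception). The sketch below assumes the message has MySQL's usual duplicate-entry form, "Duplicate entry '...' for key '...'", possibly with a table prefix on the key name. DuplicateKeyParser and keyName are names made up for this example, and for a unique index created on a single column the key name is normally the column name.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the key name from a MySQL duplicate-entry message.
final class DuplicateKeyParser {
    // Matches e.g. "Duplicate entry 'BLORE' for key 'city'" and also
    // "... for key 'Mytable.city'" (the table prefix is dropped).
    private static final Pattern DUPLICATE =
        Pattern.compile("Duplicate entry '.*' for key '(?:.*\\.)?([^'.]+)'");

    static Optional<String> keyName(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = DUPLICATE.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }
}
```

Parsing error text is brittle, so treat this as a last resort and prefer the constraint name whenever Hibernate provides it.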
I want to store millions of time series entries (long time, double value) with Java. (Our monitoring system currently stores every entry in a large MySQL table, but performance is very bad.)
Are there time series databases implemented in java out there?
Check out http://opentsdb.net/ as used by StumbleUpon.
Check out http://square.github.com/cube/ as used by Square.
I hope to see additional suggestions in this thread.
The performance was bad because of a wrong database design. I am using MySQL and the table had this layout:
+-------------+--------------------------------------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+--------------------------------------+------+-----+-------------------+-----------------------------+
| fk_category | smallint(6) | NO | PRI | NULL | |
| method | enum('min','max','avg','sum','none') | NO | PRI | none | |
| time | timestamp | NO | PRI | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
| value | float | NO | | NULL | |
| accuracy | tinyint(1) | NO | | 0 | |
+-------------+--------------------------------------+------+-----+-------------------+-----------------------------+
My mistake was an inappropriate index. After adding a multi-column primary key, all my queries are lightning fast:
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| job | 0 | PRIMARY | 1 | fk_category | A | 18 | NULL | NULL | | BTREE | | |
| job | 0 | PRIMARY | 2 | method | A | 18 | NULL | NULL | | BTREE | | |
| job | 0 | PRIMARY | 3 | time | A | 452509710 | NULL | NULL | | BTREE | | |
+-------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
Thanks for all your answers!
You can take a look at KDB. It's primarily used by financial companies to fetch market time series data.
What do you need to do with the data and when?
If you are just saving the values for later, a plain text file might do nicely, and then later upload to a database.