How to ensure that a ResultSet includes rows for "missing" observations - java

I have a ResultSet table shown below:
+------------+--------------------+--------------------+---------+-----------------------+
| test_date  | upload_kbps        | download_kbps      | latency | network_operator_name |
+------------+--------------------+--------------------+---------+-----------------------+
| 2017-04-02 | 19.12741903076923  | 44.614721153846155 | 32.1250 | Alcatel               |
| 2017-03-31 | 18.30683616557377  | 44.294387978142076 | 34.7432 | Alcatel               |
| 2017-03-31 | 20.643555595555555 | 50.99801587301587  | 32.1640 | Vodafone              |
+------------+--------------------+--------------------+---------+-----------------------+
I want to modify the ResultSet for further use by adding a row to it, like so:
+------------+--------------------+--------------------+---------+-----------------------+
| test_date  | upload_kbps        | download_kbps      | latency | network_operator_name |
+------------+--------------------+--------------------+---------+-----------------------+
| 2017-04-02 | 19.12741903076923  | 44.614721153846155 | 32.1250 | Alcatel               |
| 2017-04-02 | 0                  | 0                  | 0       | Vodafone              |
| 2017-03-31 | 18.30683616557377  | 44.294387978142076 | 34.7432 | Alcatel               |
| 2017-03-31 | 20.643555595555555 | 50.99801587301587  | 32.1640 | Vodafone              |
+------------+--------------------+--------------------+---------+-----------------------+
The logic is to add a zero row for each telecom operator that had no speed test on a given day. To clarify: the table in the MySQL database does not record a row for tests that were not done, so my original ResultSet has no row for them, and I need to add a 'NULL/0' row to reflect the missing test for that telco on that day. I currently don't have direct access to the database to modify its entries, so this was the best I could think of.
Any idea how I can do this? Appreciate the help!

It sounds like you want to add rows to the ResultSet after the fact. AFAIK, we can't do that. Instead we need to construct our SQL query so that it will produce the "extra" rows we need.
So if we have a table named "test" and
SELECT * FROM test
ORDER BY test_date DESC, network_operator_name
produces
test_date upload_kbps download_kbps latency network_operator_name
---------- ---------------- ---------------- ------- ---------------------
2017-04-02 19.1274190307692 44.6147211538461 32.125 Alcatel
2017-03-31 18.3068361655737 44.294387978142 34.7432 Alcatel
2017-03-31 20.6435555955555 50.9980158730158 32.164 Vodafone
then we can start with a query to produce a row for every combination of test_date and network_operator_name
SELECT test_date, network_operator_name
FROM
(SELECT DISTINCT network_operator_name FROM test) unique_operators
CROSS JOIN
(SELECT DISTINCT test_date FROM test) unique_dates
which gives us
test_date network_operator_name
---------- ---------------------
2017-03-31 Alcatel
2017-03-31 Vodafone
2017-04-02 Alcatel
2017-04-02 Vodafone
Then we can LEFT JOIN that query with the actual table
SELECT
required_rows.test_date,
COALESCE(test.upload_kbps, 0) AS upload_kbps,
COALESCE(test.download_kbps, 0) AS download_kbps,
COALESCE(test.latency, 0) AS latency,
required_rows.network_operator_name
FROM
(
SELECT test_date, network_operator_name
FROM
(SELECT DISTINCT network_operator_name FROM test) unique_operators
CROSS JOIN
(SELECT DISTINCT test_date FROM test) unique_dates
) required_rows
LEFT JOIN
test
ON required_rows.test_date = test.test_date
AND required_rows.network_operator_name = test.network_operator_name
ORDER BY required_rows.test_date DESC, required_rows.network_operator_name
producing
test_date upload_kbps download_kbps latency network_operator_name
---------- ---------------- ---------------- ------- ---------------------
2017-04-02 19.1274190307692 44.6147211538461 32.125 Alcatel
2017-04-02 0 0 0 Vodafone
2017-03-31 18.3068361655737 44.294387978142 34.7432 Alcatel
2017-03-31 20.6435555955555 50.9980158730158 32.164 Vodafone
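If changing the query is not an option, the same gap filling can be sketched on the client side after the rows have been read out of the ResultSet. This is only an illustration of the cross-join idea in plain Java (16+ for records); the Row record and the fillGaps method are hypothetical names, not part of JDBC.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class GapFiller {
    // One row of the speed-test result; fields mirror the question's columns.
    record Row(String testDate, double uploadKbps, double downloadKbps,
               double latency, String operator) {}

    // Returns one row per (date, operator) pair, using zeros where no test exists.
    static List<Row> fillGaps(List<Row> rows) {
        // ISO dates sort correctly as strings; reverse order puts the newest first.
        TreeSet<String> dates = new TreeSet<>(Comparator.reverseOrder());
        TreeSet<String> operators = new TreeSet<>();
        Map<String, Row> byKey = new HashMap<>();
        for (Row r : rows) {
            dates.add(r.testDate());
            operators.add(r.operator());
            byKey.put(r.testDate() + "|" + r.operator(), r);
        }
        List<Row> filled = new ArrayList<>();
        for (String date : dates) {
            for (String operator : operators) {
                Row existing = byKey.get(date + "|" + operator);
                filled.add(existing != null ? existing : new Row(date, 0, 0, 0, operator));
            }
        }
        return filled;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
            new Row("2017-04-02", 19.127, 44.615, 32.125, "Alcatel"),
            new Row("2017-03-31", 18.307, 44.294, 34.7432, "Alcatel"),
            new Row("2017-03-31", 20.644, 50.998, 32.164, "Vodafone"));
        fillGaps(rows).forEach(System.out::println);
    }
}
```

Doing this in SQL, as above, is usually preferable because the database does the work; the in-memory version only helps when the query itself cannot be changed.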

You can use MySQL's NULLIF() function. NULLIF(a, b) returns NULL if the first argument equals the second; otherwise it returns the value of the first argument. You pass it the column that tells you whether a speed test was done. Suppose latency is zero when no speed test was done on that day.
Then NULLIF(latency, 0) would be the value for the latency column in your insert command, and you can fill the other columns with NULL under the same kind of condition.

Related

Database schema for leave report with changing structure

I have a requirement to show leave history and forecast. The data is received weekly in a report which I need to store in a table. I can use any DB supported by Java.
A sample of the data looks like this:
To be able to show past totals by department I need to store the data that comes out in the report each week.
How should I store the forecast data, given that the data structure of the report keeps changing? In the sample above the last 12 columns are the 12 months following the date the report was run. Next month the first column will be October etc.
I have created a fiddle here
I have considered just storing the last 4 weeks of reports (each report in a separate table) and inserting work group totals into a separate totals table where each row would represent a department and its totals.
If there is a better way - what sort of data structure/schema should I use?
I can think of 3 approaches:
You can add a date column and a forecast column and get rid of the columns that are named after months/years. It's like the transpose action in Excel. Additionally, since Dept, Leave_Balance and projected_balance_6m are not at the same grain as the new columns, I'd create a new table. Example rows from the new table would look like:
+------------+-----------+----------+
| EmployeeID | YearMonth | Forecast |
+------------+-----------+----------+
| 456 | 201901 | 0 |
| 456 | 201902 | 5 |
+------------+-----------+----------+
Again in a new table, you can add a year column and name the forecast columns after months. This wouldn't be continuous like your current solution, but it is easier to handle in BI software.
+------------+------+-----+-----+-----+-----+-----+-----+
| EmployeeID | Year | Jan | Feb | Mar | Apr | May | Jun |
+------------+------+-----+-----+-----+-----+-----+-----+
| 456 | 2019 | 0 | 0 | 0 | 0 | 0 | 0 |
| 456 | 2020 | 0 | 5 | 0 | 6 | 0 | 0 |
| 123 | 2020 | 0 | 0 | 1 | 0 | 0 | 0 |
+------------+------+-----+-----+-----+-----+-----+-----+
Another approach could be to rename the columns relative to the current date. Here, cur is SEPT19, cur+1 is OCT19 and so on. This solution has the least impact, but its drawback is that it is not clear when the table was last updated or what the cur value actually is, so that information should be made available somewhere.
+-----+------+-------+---------------+--------------+-----+-------+-------+
| ID | Name | Dept | Leave_Balance | p_balance_6m | cur | cur+1 | cur+2 |
+-----+------+-------+---------------+--------------+-----+-------+-------+
| 456 | Mary | Sales | 32.3 | 45.6 | 0 | 0 | 0 |
+-----+------+-------+---------------+--------------+-----+-------+-------+
I like the first and second solutions more because they are more self contained. Your choice would depend on how much you want to rely on BI software (Tableau, Qlikview etc).
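The first approach (one row per employee and month) amounts to unpivoting each report line before inserting it. A minimal Java sketch, assuming the report has already been parsed into an employee id and an array of monthly forecast values; ForecastRow and unpivot are hypothetical names used only for illustration.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

class ForecastUnpivot {
    // One row of the long-format table: EmployeeID, YearMonth, Forecast.
    record ForecastRow(int employeeId, int yearMonth, double forecast) {}

    // Turns the monthly forecast columns of one report line into one row per month.
    // firstMonth is the month that the first forecast column refers to.
    static List<ForecastRow> unpivot(int employeeId, YearMonth firstMonth, double[] forecasts) {
        List<ForecastRow> rows = new ArrayList<>();
        for (int i = 0; i < forecasts.length; i++) {
            YearMonth month = firstMonth.plusMonths(i);
            int key = month.getYear() * 100 + month.getMonthValue(); // e.g. 201901
            rows.add(new ForecastRow(employeeId, key, forecasts[i]));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical report line for employee 456 whose first forecast column is Dec 2018.
        unpivot(456, YearMonth.of(2018, 12), new double[] {1, 0, 5})
            .forEach(System.out::println);
    }
}
```

Because the month is stored as data rather than as a column name, the table structure stays the same when next month's report starts at a different column. Storing the report date alongside each row would presumably be needed as well if every weekly snapshot has to be kept.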

Define an entity via a custom SQL query in Hibernate 5.1

I'm working with a non-normalized 3rd-party database, meaning I cannot change the schema. I'm trying to map the tables to JPA entities using Hibernate 5.1.
There are 2 simple tables A and B:
| A_ID(pk) | | B_ID(pk) |
------------- -------------
| 1 | | 1 |
------------- | 2 |
-------------
Table C has a composite primary key and has a Many-To-One relation to Table A:
| A_ID(pk&fk) | QUANTITY(pk) | VALID_FROM(pk) |
---------------------------------------------------
| 1 | 1 | 2017-05-21 |
| 1 | 1 | 2018-01-01 |
| 1 | 2 | 2017-05-21 |
Table D has a composite primary key:
| A_ID(pk&fk) | QUANTITY(pk) | VALID_FROM(pk) | B_ID(pk&fk) |
--------------------------------------------------------------------
| 1 | 1 | 2018-01-21 | 1 |
| 1 | 2 | 2018-01-21 | 1 |
| 1 | 2 | 2018-05-01 | 2 |
The VALID_FROM column is not part of the join condition between the tables and can take any value.
I'm trying to set up a relation between Table C and D, but because of the VALID_FROM primary key component they cannot be modelled with Many-To-One. And since there is no join table, they cannot be modelled with Many-To-Many either.
The best solution would be to create a view like
CREATE VIEW C_NORM AS
SELECT DISTINCT A_ID, QUANTITY
FROM TABLE_C;
which would produce view C_NORM:
| A_ID(pk&fk) | QUANTITY(pk) |
----------------------------------
| 1 | 1 |
| 1 | 2 |
Creating the C_NORM entity on this view could have
a One-To-Many relation with Table C
and another One-To-Many relation with Table D
but I cannot change the schema thus I cannot create a new view.
Is there any way to define an entity as a class with annotations that is basically based on a native SQL query rather than a view or table in the DB?
No that's not possible and it doesn't make sense.
Entities are for update, insert and delete. If you don't want to do any of these operations you shouldn't use entities.
You can use @SqlResultSetMapping to map the result of a native query to a class:
Query q = em.createNativeQuery(
    "SELECT c.id, c.name, COUNT(o) as orderCount, AVG(o.price) AS avgOrder " +
    "FROM Customer c " +
    "JOIN Orders o ON o.cid = c.id " +
    "GROUP BY c.id, c.name",
    "CustomerDetailsResult");
@SqlResultSetMapping(name="CustomerDetailsResult",
    classes={
        @ConstructorResult(targetClass=com.acme.CustomerDetails.class,
            columns={
                @ColumnResult(name="id"),
                @ColumnResult(name="name"),
                @ColumnResult(name="orderCount"),
                @ColumnResult(name="avgOrder", type=Double.class)})
    })
Or alternatively use QLRM: https://github.com/simasch/qlrm
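The ConstructorResult mapping above needs a target class with a constructor that takes the mapped columns in the listed order. The answer does not show com.acme.CustomerDetails, so the following is only a guess at its shape. The parameter types in particular are assumptions: apart from avgOrder, which the mapping declares as Double, the types actually passed in depend on the JDBC driver unless each ColumnResult declares a type.

```java
class CustomerDetails {
    private final Long id;
    private final String name;
    private final Long orderCount;
    private final Double avgOrder;

    // Parameter order mirrors the order of the ColumnResult entries:
    // id, name, orderCount, avgOrder. The types are assumed, see the note above.
    public CustomerDetails(Long id, String name, Long orderCount, Double avgOrder) {
        this.id = id;
        this.name = name;
        this.orderCount = orderCount;
        this.avgOrder = avgOrder;
    }

    public Long getId() { return id; }
    public String getName() { return name; }
    public Long getOrderCount() { return orderCount; }
    public Double getAvgOrder() { return avgOrder; }

    public static void main(String[] args) {
        // Stand-in for what the persistence provider would do for each result row.
        CustomerDetails details = new CustomerDetails(1L, "Ann", 3L, 12.5);
        System.out.println(details.getName() + " placed " + details.getOrderCount() + " orders");
    }
}
```

The class is a plain value holder and not an entity, which matches the answer's point that entities are meant for insert, update and delete.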

Join using Criteria API without foreign constraint

I'm really new to the Criteria API and I do not know how I can create a join query for the following situation. I already looked into the Oracle documentation of the Criteria API, but I could not make the examples work.
Say you have the following two tables in your database.
Item Export
---------------------------- ---------------------------
| ItemId | DateUpdated | | ItemId | ExportDate |
---------------------------- ---------------------------
| 1 | 02/02/2016 | | 1 | 02/02/2016 |
---------------------------- ---------------------------
| 2 | 03/02/2016 | | 2 | 03/02/2016 |
---------------------------- ---------------------------
| 3 | 06/02/2016 | | 3 | 05/02/2016 |
---------------------------- ---------------------------
| 4 | 07/02/2016 |
----------------------------
The corresponding entity classes are exact representations of the tables.
The query should join Item with Export, but there is no foreign key from Export.ItemId to Item.ItemId. As a result, the query should select the Item with ItemId 3, because Export.ExportDate is before Item.DateUpdated, and the Item with ItemId 4, because its id is not in Export.
How can I do that?

Java Collections Optimization

I'm working on a Java application that connects to a database to fetch some records, processes each record and writes the updated record back to the db table.
Following is my db schema (with sample data):
Table A: Requests
| REQUESTID | STATUS |
-------------------------
| 1 | PENDING|
| 2 | PENDING|
Table B: RequestDetails
| DETAILID | REQUESTID | STATUS | USERID |
---------------------------------------------
| 1 | 1 | PENDING | RA1234 |
| 2 | 1 | PENDING | YA7266 |
| 3 | 2 | PENDING | KAJ373 |
Following is my requirement:
1) Fetch requests with PENDING status, along with their detail data, from both tables
I'm using below query for this:
SELECT Requests.REQUESTID as "RequestID", RequestDetails.USERID as "UserID", RequestDetails.DETAILID as "DetailID"
FROM Requests Requests
JOIN RequestDetails RequestDetails
ON (Requests.REQUESTID=RequestDetails.REQUESTID AND Requests.STATUS='PENDING' AND RequestDetails.STATUS='PENDING')
2) I'm using a HashMap<String, List<HashMap<String,String>>> to store all the values
3) Iterate over each request and get its details as a List<HashMap<String,String>>
Perform action for each detail record and update status
4) After all detail records are processed for a request, update status of the request on requests table
The end state should be something like this:
Table A: Requests
| REQUESTID | STATUS |
-------------------------
| 1 | PENDING|
| 2 | PENDING|
Table B: RequestDetails
| DETAILID | REQUESTID | STATUS | USERID |
---------------------------------------------
| 1 | 1 | PENDING | RA1234 |
| 2 | 1 | PENDING | YA7266 |
| 3 | 2 | PENDING | KAJ373 |
My question is: the collection I'm using is quite complex ("HashMap<String, List<HashMap<String,String>>>"). Is there a simpler or more efficient way to do this?
Thank you,
Sash
I think you should use a class, something like:
class RequestDetails {
    int detailId;
    int statusId;
    String status;
    String userId;
}
Instead of HashMap<String, List<HashMap<String,String>>> you could then use HashMap<String, List<RequestDetails>>, since one request can have several detail rows. This keeps the code simpler and gives each field a proper type. Also, when you work with large amounts of data and need to modify values, it is better not to store everything as String: String is immutable, so every change creates a new object, which costs performance.
Hope this helps.
In addition to what Darshan suggested: if you use your own class as a HashMap key, you must override the hashCode and equals methods too. That is the basic contract HashMap relies on, and a well-distributed hashCode also helps lookup performance.
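Putting the two suggestions together, the nested maps could be replaced by a small typed class grouped by request id. This is a sketch under assumptions: Detail and groupByRequest are hypothetical names, the fields follow the RequestDetails table in the question, and a record (Java 16+) is used because it generates equals and hashCode automatically.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RequestGrouping {
    // Typed replacement for the inner HashMap<String,String>.
    record Detail(int detailId, int requestId, String status, String userId) {}

    // Groups detail rows by request id, keeping requests in first-seen order.
    static Map<Integer, List<Detail>> groupByRequest(List<Detail> details) {
        Map<Integer, List<Detail>> byRequest = new LinkedHashMap<>();
        for (Detail detail : details) {
            byRequest.computeIfAbsent(detail.requestId(), id -> new ArrayList<>()).add(detail);
        }
        return byRequest;
    }

    public static void main(String[] args) {
        List<Detail> details = List.of(
            new Detail(1, 1, "PENDING", "RA1234"),
            new Detail(2, 1, "PENDING", "YA7266"),
            new Detail(3, 2, "PENDING", "KAJ373"));
        groupByRequest(details).forEach((requestId, rows) ->
            System.out.println("request " + requestId + " has " + rows.size() + " detail row(s)"));
    }
}
```

Each map entry then holds everything needed to process one request and update its status once all of its detail rows are done.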

Many SQL select queries in java with possibility of not finding

I currently have a performance issue with an application that runs many SQL select queries.
The programming language is java and I'm using a mysql database. It contains about 10 million records.
What it needs to do is to find records in a database with zipcode and house number as parameters. When it does not find a record, it needs to do a query with only the zipcode and get the record with the lowest house number. When the zipcode cannot be found in the database the application needs to deal with this.
Thus the code for doing single queries looks like this:
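// conn is an open java.sql.Connection; zipcode and houseNo are the lookup parameters
PreparedStatement select = conn.prepareStatement(
    "select * from zipcode_addresses where zipcode = ? and houseNo = ?");
select.setString(1, zipcode);
select.setInt(2, houseNo);
ResultSet rs = select.executeQuery();
if (rs.next()) {
    dealWithResult(rs);
} else {
    // No exact match: fall back to a query on the zipcode only
    PreparedStatement alternativeSelect = conn.prepareStatement(
        "select * from zipcode_addresses where zipcode = ? group by houseNo having min(houseNo)");
    alternativeSelect.setString(1, zipcode);
    ResultSet altRs = alternativeSelect.executeQuery();
    if (altRs.next()) {
        dealWithResult(altRs);
    } else {
        System.err.println("Could not find zipcode: " + zipcode);
    }
}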
Is there a proper way of doing batch select queries which deals with data not being found?
Thanks!
Update
The table structure is the following:
+-----------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+-------------+------+-----+---------+-------+
| zipcode | varchar(6) | NO | PRI | NULL | |
| house_no | int(11) | NO | PRI | NULL | |
| sanddcode | varchar(45) | NO | | NULL | |
| depot | varchar(3) | NO | | NULL | |
| network_point | varchar(6) | NO | | NULL | |
| region | varchar(3) | NO | | NULL | |
| seq | int(11) | NO | | NULL | |
| cluster_id | varchar(1) | NO | | NULL | |
| strand_id | int(11) | NO | | NULL | |
| strand_props_id | int(11) | NO | | NULL | |
| version_id | int(11) | NO | PRI | NULL | |
+-----------------+-------------+------+-----+---------+-------+
The primary key is on version_id, zipcode and house_no.
There is an index on zipcode and house_no and another index on zipcode, both using BTREE.
The application might sometimes be used to run 1 million distinct select queries, at which point it just takes too long.
Your code snippet doesn't show how your statements are being prepared. If your statements are being called numerous times then you should take a look at the PreparedStatement object:
http://docs.oracle.com/javase/tutorial/jdbc/basics/prepared.html
Your statements can then be cached to reduce future overhead.
You could create a stored procedure with two parameters, with the house number being optional, or just let the procedure work out whether the record exists or not.
A lot depends on the usage pattern: how many queries you run, how often there is a ZIP code miss, etc. First off, I would use PreparedStatements where possible. I am not that familiar with MySQL, but prepared statements are usually cached and reused by the connection/database, which will help with performance. Next, if ZIP code misses were common, I would probably build an in-memory cache of ZIP codes to short-circuit doing 3 queries on a miss. After that, I might make a view that's ZIP + house number. Going further depends more on how your application works, but these things would help.
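The in-memory cache idea can be taken one step further by keeping the house numbers per ZIP code in a sorted map, so that the exact match, the lowest-house-number fallback and the unknown-ZIP case are all answered without a query. This is a rough sketch and not a drop-in solution: ZipcodeCache is a hypothetical class, it assumes the needed columns fit in memory, and it stores only the depot value to stay short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.TreeMap;

class ZipcodeCache {
    // zipcode -> (house number -> depot); TreeMap keeps house numbers sorted.
    private final Map<String, TreeMap<Integer, String>> byZip = new HashMap<>();

    // Called once per table row while loading the cache.
    void put(String zipcode, int houseNo, String depot) {
        byZip.computeIfAbsent(zipcode, zip -> new TreeMap<>()).put(houseNo, depot);
    }

    // Exact match if present, otherwise the entry with the lowest house number;
    // empty when the zipcode is not known at all.
    Optional<String> lookup(String zipcode, int houseNo) {
        TreeMap<Integer, String> houses = byZip.get(zipcode);
        if (houses == null) {
            return Optional.empty();
        }
        String exact = houses.get(houseNo);
        return Optional.of(exact != null ? exact : houses.firstEntry().getValue());
    }

    public static void main(String[] args) {
        ZipcodeCache cache = new ZipcodeCache();
        cache.put("1234AB", 5, "D1");
        cache.put("1234AB", 2, "D2");
        System.out.println(cache.lookup("1234AB", 5));
        System.out.println(cache.lookup("1234AB", 9));
        System.out.println(cache.lookup("9999ZZ", 1));
    }
}
```

Whether this pays off depends on how often the table changes and how much memory is available; for a one-off batch of a million lookups, loading the table once may well be cheaper than a million round trips.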
The 'group by' in your second SQL query is unnecessary and killing performance. For maximum performance, replace this select (the second one in your code) ...
select * from zipcode_addresses where zipcode = ?
group by houseNo having min(houseNo)
with this ...
select min(houseNo) from zipcode_addresses where zipcode = ?
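Note that this returns only the house number, not the whole row; if you need the full record, ordering by houseNo and taking the first row (order by houseNo limit 1) is one way to get it.
Also, ensure you have an index on zipcode + houseNo (which, judging from the updated post, you do).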
