Java Collections Optimization

I'm working on a Java application that connects to a database, fetches records, processes each record, and writes the updated record back to the table.
Following is my db schema (with sample data):
Table A: Requests
+-----------+---------+
| REQUESTID | STATUS  |
+-----------+---------+
| 1         | PENDING |
| 2         | PENDING |
+-----------+---------+
Table B: RequestDetails
+----------+-----------+---------+--------+
| DETAILID | REQUESTID | STATUS  | USERID |
+----------+-----------+---------+--------+
| 1        | 1         | PENDING | RA1234 |
| 2        | 1         | PENDING | YA7266 |
| 3        | 2         | PENDING | KAJ373 |
+----------+-----------+---------+--------+
Following is my requirement:
1) Fetch requests in PENDING status, along with their detail rows, from both tables.
I'm using the query below for this:
SELECT Requests.REQUESTID AS "RequestID",
       RequestDetails.USERID AS "UserID",
       RequestDetails.DETAILID AS "DetailID"
FROM Requests
JOIN RequestDetails
  ON Requests.REQUESTID = RequestDetails.REQUESTID
WHERE Requests.STATUS = 'PENDING'
  AND RequestDetails.STATUS = 'PENDING'
2) I'm using a HashMap<String, List<HashMap<String, String>>> to store all the values.
3) Iterate over each request and get its details as a List<HashMap<String, String>>.
Perform an action for each detail record and update its status.
4) After all detail records are processed for a request, update the request's status in the Requests table (the sketch below shows this flow).
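In code, steps 2-4 look roughly like this (a simplified sketch; the fetch/process/update helpers are placeholders for my actual JDBC calls):

import java.util.List;
import java.util.Map;

// fetchPendingRequests(), process(), updateDetailStatus() and
// updateRequestStatus() are placeholders, not real APIs.
Map<String, List<Map<String, String>>> pendingRequests = fetchPendingRequests(); // step 2
for (Map.Entry<String, List<Map<String, String>>> request : pendingRequests.entrySet()) {
    for (Map<String, String> detail : request.getValue()) {
        process(detail);                             // perform the action
        updateDetailStatus(detail.get("DetailID"));  // step 3: update the detail row
    }
    updateRequestStatus(request.getKey());           // step 4: update the request row
}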
The end state should be something like this (assuming a processed row's status becomes COMPLETED):
Table A: Requests
+-----------+-----------+
| REQUESTID | STATUS    |
+-----------+-----------+
| 1         | COMPLETED |
| 2         | COMPLETED |
+-----------+-----------+
Table B: RequestDetails
+----------+-----------+-----------+--------+
| DETAILID | REQUESTID | STATUS    | USERID |
+----------+-----------+-----------+--------+
| 1        | 1         | COMPLETED | RA1234 |
| 2        | 1         | COMPLETED | YA7266 |
| 3        | 2         | COMPLETED | KAJ373 |
+----------+-----------+-----------+--------+
My question is: the collection I'm using is quite complex (HashMap<String, List<HashMap<String, String>>>). Is there a more efficient way to do this?
Thank you,
Sash

I think you should use a class, something like:
class RequestDetails {
    int detailId;
    int statusId;
    String status;
    String userId;
}
Instead of the map HashMap<String, List<HashMap<String, String>>> you can then use HashMap<String, List<RequestDetails>>. That has advantages like code simplicity and proper types for each field. Also, when you are working with huge data and need to modify strings repeatedly, it is better to use StringBuilder: String is immutable, so every modification creates a new object and decreases performance.
Hope this helps.
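Populating the restructured map might then look like this (a sketch; the key and field values would come from your query's result set, hard-coded here from the sample data):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

Map<String, List<RequestDetails>> detailsByRequest = new HashMap<>();

// For each row of the result set:
RequestDetails detail = new RequestDetails();
detail.detailId = 1;
detail.status = "PENDING";
detail.userId = "RA1234";
detailsByRequest.computeIfAbsent("1", k -> new ArrayList<>()).add(detail);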

On top of all that and what Darshan suggested: if you use RequestDetails as a key in a HashMap (or store it in a HashSet), you must also override the hashCode and equals methods. That is the basic contract when dealing with hash-based collections, and a well-distributed hashCode improves lookup performance too.
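For the RequestDetails class above, that could look like this (a sketch that keys on detailId, assuming it uniquely identifies a record):

@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof RequestDetails)) return false;
    return detailId == ((RequestDetails) o).detailId;
}

@Override
public int hashCode() {
    return Integer.hashCode(detailId);
}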

Related

Database schema for leave report with changing structure

I have a requirement to show leave history and forecast. The data is received weekly in a report which I need to store in a table. I can use any DB supported by Java.
A sample of the data looks like this:
To be able to show past totals by department I need to store the data that comes out in the report each week.
How should I store the forecast data, given that the report's structure keeps changing? In the sample above, the last 12 columns are the 12 months following the date the report was run; next month the first column will be October, and so on.
I have created a fiddle here.
I have considered just storing the last 4 weeks of reports (each report in a separate table) and inserting work group totals into a separate totals table where each row would represent a department and its totals.
If there is a better way - what sort of data structure/schema should I use?
I can think of 3 approaches:
You can add a date column and a forecast column, and get rid of the columns that are named after months/years; it's like the transpose action in Excel. Additionally, since Dept, Leave_Balance, and projected_balance_6m will not be at the same grain as the new columns, I'd create a new table. Example rows from the new table would look like this:
+------------+-----------+----------+
| EmployeeID | YearMonth | Forecast |
+------------+-----------+----------+
| 456        | 201901    | 0        |
| 456        | 201902    | 5        |
+------------+-----------+----------+
Again in a new table, you can add a year column and name the forecast columns after months. This wouldn't be as continuous as your current solution, but it is easier to handle in BI software.
+------------+------+-----+-----+-----+-----+-----+-----+
| EmployeeID | Year | Jan | Feb | Mar | Apr | May | Jun |
+------------+------+-----+-----+-----+-----+-----+-----+
| 456        | 2019 | 0   | 0   | 0   | 0   | 0   | 0   |
| 456        | 2020 | 0   | 5   | 0   | 6   | 0   | 0   |
| 123        | 2020 | 0   | 0   | 1   | 0   | 0   | 0   |
+------------+------+-----+-----+-----+-----+-----+-----+
A third approach could be to name the columns relative to the current date. Here, cur is SEPT19, cur+1 is OCT19, and so on. This solution requires the fewest changes, but its drawback is that the table itself doesn't tell you when it was last updated or which month cur actually refers to, so that information should be made available somewhere.
+-----+------+-------+---------------+--------------+-----+-------+-------+
| ID  | Name | Dept  | Leave_Balance | p_balance_6m | cur | cur+1 | cur+2 |
+-----+------+-------+---------------+--------------+-----+-------+-------+
| 456 | Mary | Sales | 32.3          | 45.6         | 0   | 0     | 0     |
+-----+------+-------+---------------+--------------+-----+-------+-------+
I like the first and second solutions more because they are more self-contained. Your choice will depend on how much you want to rely on BI software (Tableau, QlikView, etc.).

How do I run a spark sql aggregator cumulatively?

I am currently working on a project with Spark Datasets (in Java) where I have to create a new column derived from an accumulator run over all the previous rows.
I have been implementing this using a custom UserDefinedAggregateFunction over a Window from unboundedPreceding to currentRow.
This goes something like this:
df.withColumn("newColumn", customAccumulator
.apply(columnInputSeq)
.over(customWindowSpec));
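For reference, a running window like the one used above can be built as follows (a sketch, assuming the Index column defines row order):

import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;

// Frame covering everything from the first row up to and including the current row.
WindowSpec customWindowSpec = Window
        .orderBy("Index")
        .rowsBetween(Window.unboundedPreceding(), Window.currentRow());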
However, I would really prefer to use a typed Dataset for type-safety reasons and generally cleaner code, i.e. to perform the same operation with an org.apache.spark.sql.expressions.Aggregator over a Dataset<CustomType>. The problem is that I have looked through all the documentation and can't work out how to make it behave the same way as above; I can only get a final aggregate over the whole column, rather than the cumulative state at each row.
Is what I am trying to do possible and if so, how?
Example added for clarity:
Initial table:
+-------+------+------+
| Index | Col1 | Col2 |
+-------+------+------+
| 1     | abc  | def  |
| 2     | ghi  | jkl  |
| 3     | mno  | pqr  |
| 4     | stu  | vwx  |
+-------+------+------+
Then with an example aggregation operation:
First reverse the accumulator, prepend Col1, append Col2, and return this value, which also becomes the new accumulator.
+-------+------+------+--------------------------+
| Index | Col1 | Col2 | Accumulator              |
+-------+------+------+--------------------------+
| 1     | abc  | def  | abcdef                   |
| 2     | ghi  | jkl  | ghifedcbajkl             |
| 3     | mno  | pqr  | mnolkjabcdefihgpqr       |
| 4     | stu  | vwx  | sturqpghifedcbajklonmvwx |
+-------+------+------+--------------------------+
Using a UserDefinedAggregateFunction I have been able to produce this but with an Aggregator I can only get the last row.
You don't.
My source for this is a friend who has been working on an identical problem and has now concluded it's impossible.

Join using Criteria API without foreign constraint

I'm really new to the Criteria API and I don't know how to create a join query for the following situation. I have already looked into the Oracle documentation of the Criteria API, but I could not make the examples work.
Say you have the following two tables in your database.
Item
+--------+-------------+
| ItemId | DateUpdated |
+--------+-------------+
| 1      | 02/02/2016  |
| 2      | 03/02/2016  |
| 3      | 06/02/2016  |
| 4      | 07/02/2016  |
+--------+-------------+
Export
+--------+------------+
| ItemId | ExportDate |
+--------+------------+
| 1      | 02/02/2016 |
| 2      | 03/02/2016 |
| 3      | 05/02/2016 |
+--------+------------+
The corresponding entity classes are exact representations of the tables.
The query should join Item with Export, but there is no foreign key from Export.ItemId to Item.ItemId. Furthermore, the query should select the Item with ItemId 3, because its Export.ExportDate is before Item.DateUpdated, and the Item with ItemId 4, because its id does not appear in Export at all.
How can I do that?
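One way to express this without a mapped relationship is a correlated NOT EXISTS subquery. Below is a sketch, assuming entity classes Item and Export with fields itemId, dateUpdated, and exportDate, and an EntityManager em:

import java.util.Date;
import java.util.List;
import javax.persistence.criteria.CriteriaBuilder;
import javax.persistence.criteria.CriteriaQuery;
import javax.persistence.criteria.Root;
import javax.persistence.criteria.Subquery;

// Select items that have no sufficiently recent export row:
// either never exported (id 4), or exported before the last update (id 3).
CriteriaBuilder cb = em.getCriteriaBuilder();
CriteriaQuery<Item> cq = cb.createQuery(Item.class);
Root<Item> item = cq.from(Item.class);

Subquery<Integer> sub = cq.subquery(Integer.class);
Root<Export> export = sub.from(Export.class);
sub.select(export.<Integer>get("itemId"))
   .where(cb.equal(export.get("itemId"), item.get("itemId")),
          cb.greaterThanOrEqualTo(export.<Date>get("exportDate"),
                                  item.<Date>get("dateUpdated")));

cq.select(item).where(cb.not(cb.exists(sub)));
List<Item> result = em.createQuery(cq).getResultList();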

Create dynamic classes / objects to be included in a list

I have an xlsx file that has some tabs with different data. I want to be able to save each row of a tab in a list. The first thing that comes to mind is a list of lists, but I was wondering if there is another way. I'd like to save that information in an object, with all its benefits, but can't think of a way to generate/create such diverse objects on the fly. The data in the xlsx is diverse, and ideally the program is agnostic of any content.
So instead of, say, creating a list for each row, then putting that list in another list for each tab, and each tab in yet another list, I'd like to store the information that each row represents in a single object and just have a list of different objects.
A small graphic to visualize the problem :
+--------------------------------------------------------------------+
| LIST                                                               |
|                                                                    |
|   +------------------+  +------------------+  +-----------------+  |
|   | Class1           |  | Class2           |  | Class3          |  |
|   |------------------|  |------------------|  |-----------------|  |
|   | var1             |  | var1             |  | var5            |  |
|   | var2             |  | var2             |  | var6            |  |
|...| var3             |  |                  |  | var7            |..|
|   |                  |  |                  |  |                 |  |
|   +------------------+  +------------------+  +-----------------+  |
|                                                                    |
+--------------------------------------------------------------------+
How about a generic class Row which contains all the information in a row from your file? Then you simply create a list of Rows. Methods on Row can allow you to get each column.
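A minimal sketch of that Row idea, assuming cells can be kept as strings:

import java.util.ArrayList;
import java.util.List;

// One object per spreadsheet row; cells kept in column order.
class Row {
    private final List<String> cells;

    Row(List<String> cells) {
        this.cells = new ArrayList<>(cells);
    }

    String getColumn(int index) {
        return cells.get(index);
    }

    int columnCount() {
        return cells.size();
    }
}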
Without knowing more about the data, you will not be able to write classes ahead of time to encapsulate it. You could "dynamically" create classes by generating new source code, but then the question is: how would you use the new classes?
Well, since you want to avoid a "list of lists" kind of solution, there is another way.
It might not be very efficient or fast, but I don't have any experience with it, so maybe it isn't too bad. Here's the idea:
For each row, use javassist to dynamically create a class with as many fields as needed to hold each cell's information. Then create an instance of this class and store it in your list of rows. You could also add a field with information about this particular row (e.g. how many fields there are, or their names or types).
The number of fields or methods can later be determined using reflection.
To get started with javassist there's a tutorial here; a sketch of the idea follows below.
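A minimal sketch of that idea (class and field names here are made up for illustration):

import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtField;
import javassist.Modifier;

public class DynamicRowDemo {
    public static void main(String[] args) throws Exception {
        ClassPool pool = ClassPool.getDefault();

        // Build a class with one public String field per cell;
        // a single hypothetical field "col0" is shown here.
        CtClass ct = pool.makeClass("GeneratedRow");
        CtField field = new CtField(pool.get("java.lang.String"), "col0", ct);
        field.setModifiers(Modifier.PUBLIC);
        ct.addField(field);

        // Instantiate the generated class and set the cell value reflectively.
        Class<?> rowClass = ct.toClass();
        Object row = rowClass.getDeclaredConstructor().newInstance();
        rowClass.getField("col0").set(row, "cell value");

        System.out.println(rowClass.getField("col0").get(row)); // prints: cell value
    }
}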
Besides that, I don't think there's much you can do that does not involve some sort of List<List<SomeType>>.

How do I selectively update columns in a table when using LOAD DATA INFILE?

I am trying to load data from a text file into a MySQL table by calling MySQL's LOAD DATA INFILE from a Java process. The file can contain data for the current date and also for previous days, and the table can already contain data for those previous dates. The problem is that some of the columns in the file for previous dates might have changed, but I don't want to update all of those columns; I only want the latest values for some of them.
Example,
Table
+----+-------------+------+------+------+
| id | report_date | val1 | val2 | val3 |
+----+-------------+------+------+------+
| 1  | 2012-12-01  | 10   | 1    | 1    |
| 2  | 2012-12-02  | 20   | 2    | 2    |
| 3  | 2012-12-03  | 30   | 3    | 3    |
+----+-------------+------+------+------+
Data in Input file:
1|2012-12-01|10|1|1
2|2012-12-02|40|4|4
3|2012-12-03|40|4|4
4|2012-12-04|40|4|4
5|2012-12-05|50|5|5
Table after the load should look like
mysql> select * from load_infile_tests;
+----+-------------+------+------+------+
| id | report_date | val1 | val2 | val3 |
+----+-------------+------+------+------+
| 1  | 2012-12-01  | 10   | 1    | 1    |
| 2  | 2012-12-02  | 40   | 4    | 2    |
| 3  | 2012-12-03  | 40   | 4    | 3    |
| 4  | 2012-12-04  | 40   | 4    | 4    |
| 5  | 2012-12-05  | 50   | 5    | 5    |
+----+-------------+------+------+------+
5 rows in set (0.00 sec)
Note that the val3 values are not updated. I also need to do this for large files; some files can be 300 MB or more, so the solution needs to scale.
Thanks,
Anirudha
LOAD DATA INFILE with the REPLACE option sounds appealing, but replaced records are deleted and re-inserted, so the old val3 values would be lost.
Instead, try loading the data into a temporary table, then updating your table from the temp table using INSERT ... SELECT or INSERT ... ON DUPLICATE KEY UPDATE statements.
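In Java, the two-step approach might look roughly like this (a sketch; the file path and staging table name are placeholders, and id is assumed to be the primary key):

import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;

class ReportLoader {
    static void loadReport(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement()) {
            // Stage the file in a table with the same layout as the target.
            st.execute("CREATE TEMPORARY TABLE staging LIKE load_infile_tests");
            st.execute("LOAD DATA INFILE '/path/to/report.txt' INTO TABLE staging "
                     + "FIELDS TERMINATED BY '|'");
            // Upsert: refresh val1 and val2 on existing rows, leave val3 alone;
            // new ids are inserted with all columns.
            st.executeUpdate("INSERT INTO load_infile_tests (id, report_date, val1, val2, val3) "
                           + "SELECT id, report_date, val1, val2, val3 FROM staging "
                           + "ON DUPLICATE KEY UPDATE val1 = VALUES(val1), val2 = VALUES(val2)");
        }
    }
}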
