I have a requirement to show leave history and forecast. The data is received weekly in a report which I need to store in a table. I can use any DB supported by Java.
A sample of the data looks like this:
To be able to show past totals by department I need to store the data that comes out in the report each week.
How should I store the forecast data, given that the structure of the report keeps changing? In the sample above the last 12 columns are the 12 months following the date the report was run; next month the first column will be October, and so on.
I have created a fiddle here.
I have considered just storing the last 4 weeks of reports (each report in a separate table) and inserting work group totals into a separate totals table, where each row would represent a department and its totals.
If there is a better way, what sort of data structure/schema should I use?
I can think of 3 approaches:
You can add a date and a forecast column and then get rid of the columns that are named after months/years; it's like the transpose action in Excel. Additionally, since Dept, Leave_Balance and projected_balance_6m will not be at the same grain as the new columns, I'd create a new table. Example rows from the new table would look like this:
+------------+-----------+----------+
| EmployeeID | YearMonth | Forecast |
+------------+-----------+----------+
| 456        | 201901    | 0        |
| 456        | 201902    | 5        |
+------------+-----------+----------+
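The transpose for this first approach can be done at load time in plain Java. Below is a minimal sketch; the class, method, and record names are illustrative, not from the report:

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;

public class ForecastPivot {
    // One output row of the long-format table: (EmployeeID, YearMonth, Forecast)
    public record ForecastRow(int employeeId, String yearMonth, double forecast) {}

    /**
     * Converts the positional forecast columns of one report row into
     * (EmployeeID, YearMonth, Forecast) rows, using the report run month
     * to resolve which calendar month each column represents.
     */
    public static List<ForecastRow> pivot(int employeeId, YearMonth runMonth, double[] forecasts) {
        List<ForecastRow> rows = new ArrayList<>();
        for (int i = 0; i < forecasts.length; i++) {
            // Column i is the (i+1)-th month after the report run month
            YearMonth ym = runMonth.plusMonths(i + 1);
            rows.add(new ForecastRow(employeeId,
                    String.format("%04d%02d", ym.getYear(), ym.getMonthValue()),
                    forecasts[i]));
        }
        return rows;
    }

    public static void main(String[] args) {
        // A report run in December 2018: the first column is January 2019
        for (ForecastRow r : pivot(456, YearMonth.of(2018, 12), new double[]{0, 5})) {
            System.out.println(r.employeeId() + " " + r.yearMonth() + " " + r.forecast());
        }
    }
}
```

Each weekly load then appends fresh rows instead of forcing a schema change when the report's column headings shift.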
Again in a new table, you can add a year column and name the forecast columns after the months. This wouldn't be as continuous as your current solution, but it is easier to handle in BI software.
+------------+------+-----+-----+-----+-----+-----+-----+
| EmployeeID | Year | Jan | Feb | Mar | Apr | May | Jun |
+------------+------+-----+-----+-----+-----+-----+-----+
| 456        | 2019 | 0   | 0   | 0   | 0   | 0   | 0   |
| 456        | 2020 | 0   | 5   | 0   | 6   | 0   | 0   |
| 123        | 2020 | 0   | 0   | 1   | 0   | 0   | 0   |
+------------+------+-----+-----+-----+-----+-----+-----+
Another approach would be to rename the columns relative to the current date. Here, cur is SEPT19, cur+1 is OCT19, and so on. This solution has the least impact, but its drawback is that it is not clear when you last updated the table or which month cur actually refers to, so that information should be made available somewhere.
+-----+------+-------+---------------+--------------+-----+-------+-------+
| ID | Name | Dept | Leave_Balance | p_balance_6m | cur | cur+1 | cur+2 |
+-----+------+-------+---------------+--------------+-----+-------+-------+
| 456 | Mary | Sales | 32.3          | 45.6         | 0   | 0     | 0     |
+-----+------+-------+---------------+--------------+-----+-------+-------+
I like the first and second solutions more because they are more self-contained. Your choice will depend on how much you want to rely on BI software (Tableau, QlikView etc.).
Related
I am currently working on a project with spark datasets (in Java) where I have to create a new column derived from an accumulator run over all the previous rows.
I have been implementing this using a custom UserDefinedAggregateFunction over a Window from unboundedPreceding to currentRow.
This goes something like this:
df.withColumn("newColumn", customAccumulator
.apply(columnInputSeq)
.over(customWindowSpec));
However, I would really prefer to use a typed Dataset for type-safety reasons and generally cleaner code, i.e. perform the same operation with an org.apache.spark.sql.expressions.Aggregator over a Dataset<CustomType>. The problem is that I have looked through all the documentation and can't work out how to make it behave the same way as above: I can only get a final aggregate over the whole column rather than the cumulative state at each row.
Is what I am trying to do possible and if so, how?
Example added for clarity:
Initial table:
+-------+------+------+
| Index | Col1 | Col2 |
+-------+------+------+
| 1     | abc  | def  |
| 2     | ghi  | jkl  |
| 3     | mno  | pqr  |
| 4     | stu  | vwx  |
+-------+------+------+
Then, applying the example aggregation operation:
First reverse the accumulator, prepend Col1, append Col2, and return this value, also setting it as the new accumulator.
+-------+------+------+--------------------------+
| Index | Col1 | Col2 | Accumulator |
+-------+------+------+--------------------------+
| 1     | abc  | def  | abcdef                   |
| 2     | ghi  | jkl  | ghifedcbajkl             |
| 3     | mno  | pqr  | mnolkjabcdefihgpqr       |
| 4     | stu  | vwx  | sturqpghifedcbajklonmvwx |
+-------+------+------+--------------------------+
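The accumulation rule can be written out in plain Java; this is only an illustration of the intended semantics, not Spark code:

```java
import java.util.List;

public class RunningAccumulator {
    /** Applies "reverse the accumulator, prepend Col1, append Col2" to each row in order. */
    public static String[] accumulate(List<String[]> rows) {
        String[] out = new String[rows.size()];
        String acc = "";
        for (int i = 0; i < rows.size(); i++) {
            String col1 = rows.get(i)[0];
            String col2 = rows.get(i)[1];
            // Reverse the running accumulator, then wrap it in this row's values
            acc = col1 + new StringBuilder(acc).reverse() + col2;
            out[i] = acc;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
                new String[]{"abc", "def"},
                new String[]{"ghi", "jkl"},
                new String[]{"mno", "pqr"},
                new String[]{"stu", "vwx"});
        for (String s : accumulate(rows)) {
            System.out.println(s);
        }
    }
}
```

The point of the question is that this per-row state depends on every previous row, which is exactly what the windowed UserDefinedAggregateFunction provides and a plain Aggregator does not.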
Using a UserDefinedAggregateFunction I have been able to produce this but with an Aggregator I can only get the last row.
You don't.
My source for this is a friend who has been working on an identical problem and has now concluded that it's impossible.
I am working on an algorithm, using SQL and Java, that operates on big datasets.
In SQL I have a table with all the data, and I want to do as much of the work as possible in SQL queries before loading it into Java.
I generate random datasets (in Java), consisting exclusively of integers between 1 and 40001, and then insert them into a MySQL table.
The rows can be of different lengths, with a maximum of 30 items per row (this includes the ID). So normally the number of columns is 30 (COL1, COL2, COL3, ..., COL30), but this number will also be random at some point.
What I want to do is count the occurrences of every distinct item in the table and put them in a new table with their counts. This is tricky, however, since I want to count over the entire table, not just one column. How do I do this?
To specify:
Take this table for example (this is a very small one in comparison with my usual tables):
ID | COL1 | COL2 | COL3 | COL4 | COL5 |
---------------------------------------
 1 |    8 |   35 |   42 |   12 |   27 |
 2 |   22 |   42 |   35 |    8 | NULL |
 3 |   18 |   22 |    8 | NULL | NULL |
 4 |   42 |   12 |   27 |   35 |    8 |
 5 |   18 |   27 |   12 |   22 | NULL |
What I want to extract from this table is this:
Item | Count
-------------
   8 |     4
  35 |     3
  42 |     3
  12 |     3
  27 |     3
  22 |     3
  18 |     2
It may also help to know that an item can never appear more than once in the same row.
Can anyone help me? Or can it simply not be done in SQL? Would it be better, performance-wise, to do this in Java?
Thanks in advance!
You can do this by unpivoting the data and then aggregating:
select col, count(*)
from (select col1 as col from t union all
select col2 from t union all
. . .
select col30 from t
) t
group by col;
If you don't have a known set of columns, then you will need to use dynamic SQL.
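As for the asker's Java alternative: once the rows are in memory, the count is a few lines with a map. A sketch, assuming the rows are held as an Integer[][] mirroring the sample table (with null for missing items):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ItemCounter {
    /** Counts every non-null item across all columns of all rows. */
    public static Map<Integer, Integer> countItems(Integer[][] rows) {
        Map<Integer, Integer> counts = new LinkedHashMap<>();
        for (Integer[] row : rows) {
            for (Integer item : row) {
                if (item != null) {
                    counts.merge(item, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The non-ID columns of the sample table above
        Integer[][] rows = {
                {8, 35, 42, 12, 27},
                {22, 42, 35, 8, null},
                {18, 22, 8, null, null},
                {42, 12, 27, 35, 8},
                {18, 27, 12, 22, null}};
        countItems(rows).forEach((item, n) -> System.out.println(item + " | " + n));
    }
}
```

This is a single pass over the data, so for very large tables the deciding factor is usually the cost of shipping all rows out of MySQL rather than the counting itself.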
I'm new to Java programming, and I need help filtering the records in an ArrayList into weeks (starting on the first day of each week, Monday) of the current year.
I have displayed all my records from the ArrayList in a simple table:
+----+-------+-----------+-------+-------------------+
| ID | Name  | LastName  | Email | Registration Date |
+----+-------+-----------+-------+-------------------+
| 1  | Name1 | LastName1 | Email | 01-01-2017        |
+----+-------+-----------+-------+-------------------+
| 2  | Name2 | LastName2 | Email | 05-02-2017        |
+----+-------+-----------+-------+-------------------+
| 3  | Name3 | LastName3 | Email | 15-02-2017        |
+----+-------+-----------+-------+-------------------+
| 4  | Name4 | LastName4 | Email | 18-03-2017        |
+----+-------+-----------+-------+-------------------+
| 5  | Name5 | LastName5 | Email | 22-04-2017        |
+----+-------+-----------+-------+-------------------+
| 6  | Name6 | LastName6 | Email | 15-05-2017        |
+----+-------+-----------+-------+-------------------+
| 7  | Name7 | LastName7 | Email | 26-06-2017        |
+----+-------+-----------+-------+-------------------+
| 8  | Name8 | LastName8 | Email | 26-06-2017        |
+----+-------+-----------+-------+-------------------+
This is the result I need to show, using the registration date as the filter that groups my records by week:
+--------------------+
|Week from 02-01-2017|
+----+-------+-----------+-------+-------------------+
| ID | Name  | LastName  | Email | Registration Date |
+----+-------+-----------+-------+-------------------+
| 1  | Name1 | LastName1 | Email | 02-01-2017        |
+----+-------+-----------+-------+-------------------+
| 2  | Name2 | LastName2 | Email | 05-01-2017        |
+----+-------+-----------+-------+-------------------+

+--------------------+
|Week from 13-02-2017|
+----+-------+-----------+-------+-------------------+
| ID | Name  | LastName  | Email | Registration Date |
+----+-------+-----------+-------+-------------------+
| 3  | Name3 | LastName3 | Email | 15-02-2017        |
+----+-------+-----------+-------+-------------------+

+--------------------+
|Week from 13-03-2017|
+----+-------+-----------+-------+-------------------+
| 4  | Name4 | LastName4 | Email | 18-03-2017        |
+----+-------+-----------+-------+-------------------+

+--------------------+
|Week from 17-04-2017|
+----+-------+-----------+-------+-------------------+
| 5  | Name5 | LastName5 | Email | 22-04-2017        |
+----+-------+-----------+-------+-------------------+

+--------------------+
|Week from 15-05-2017|
+----+-------+-----------+-------+-------------------+
| 6  | Name6 | LastName6 | Email | 15-05-2017        |
+----+-------+-----------+-------+-------------------+
| 7  | Name7 | LastName7 | Email | 19-05-2017        |
+----+-------+-----------+-------+-------------------+

+--------------------+
|Week from 26-06-2017|
+----+-------+-----------+-------+-------------------+
| 8  | Name8 | LastName8 | Email | 26-06-2017        |
+----+-------+-----------+-------+-------------------+
| 9  | Name9 | LastName9 | Email | 29-06-2017        |
+----+-------+-----------+-------+-------------------+
The dates in "Week from dd-MM-yyyy" are the Mondays of the year in which my program is executed; if it runs in 2026, it should filter the records weekly for 2026.
This is the code I developed:
//from an ArrayList already instantiated:
List<CertificatMajCrBean> lines = new ArrayList<CertificatMajCrBean>();
//with the elements I want to add:
lines.add(records);
//I show the result like this:
int numRec = 1;
System.out.println("+----+-------+-----------+-------+-------------------+");
System.out.println("| ID | Name  | LastName  | Email | Registration Date |");
System.out.println("+----+-------+-----------+-------+-------------------+");
for (CertificatMajCrBean line : lines) {
    System.out.println("| " + numRec + " | " + line.getName() + " | " + line.getLastName()
            + " | " + line.getEmail() + " | " + line.getRegistrationDate() + " |");
    System.out.println("+----+-------+-----------+-------+-------------------+");
    numRec++;
}
I want to add my filter inside this loop.
Any help making this algorithm real would be much appreciated!
I suggest that in the name of good style and of breaking a problem into smaller pieces first we separate the computation of your result from the printing of it. To that end we need a data structure that can hold a number of lists of beans, one list per week. I suggest:
Map<LocalDate, List<CertificatMajCrBean>> byWeek = new TreeMap<>();
Most often one would use a HashMap rather than a TreeMap. You may here too, but the advantage of a TreeMap is that it keeps the map sorted by date, which will help us when we want to print the weekly lists in chronological order.
For the computation, iterate through your ArrayList the way you are probably already doing. For each CertificatMajCrBean from the list, get the registration date. I am assuming this is a LocalDate, as I suggested in a comment. Find the Monday where the week begins from bean.getRegistrationDate().with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY)). I know it looks complicated, but it's pleasantly terse: as the method name suggests, it finds the Monday before the registration date, or the same date if it is already a Monday.
Next, check whether this Monday is already a key in the map. It's easiest to use byWeek.get(mondayOfRegWeek). This will return null the first time because there isn't a map entry yet; in that case, create a new ArrayList, store your bean in it, and put it into the map: byWeek.put(mondayOfRegWeek, weeklyBeans);, where weeklyBeans is your newly created list. If byWeek.get(mondayOfRegWeek) returned a list, just add the bean to it and you're done. After you've processed all the beans from your list, your map is finished.
For printing the result use a for loop over the entry set of the map:
for (Map.Entry<LocalDate, List<CertificatMajCrBean>> entry : byWeek.entrySet()) {
// print beans from one week here
}
Inside the for loop use entry.getKey() to get the Monday where the week begins. If you just print the date, it will come out as 2017-06-26. To format it in some other way, you may use a DateTimeFormatter. To get the list of beans for the week, use entry.getValue(). Again you may use a loop inside the loop to print each bean.
If any of it calls for clarification, you may want to check the API documentation first, or else follow up in a comment here.
If you were more advanced, I would have told you to use a Stream and Collectors.groupingBy() for the computation; if you are new to programming, you probably don't want to try that just yet.
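Putting those steps together, here is a minimal sketch. To keep it self-contained it uses bare LocalDate values in place of the bean, and computeIfAbsent condenses the get-then-put steps described above:

```java
import java.time.DayOfWeek;
import java.time.LocalDate;
import java.time.temporal.TemporalAdjusters;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WeeklyGrouper {
    /** Groups registration dates by the Monday that starts their week. */
    public static Map<LocalDate, List<LocalDate>> groupByWeek(List<LocalDate> dates) {
        // TreeMap keeps the weeks in chronological order for printing
        Map<LocalDate, List<LocalDate>> byWeek = new TreeMap<>();
        for (LocalDate regDate : dates) {
            LocalDate monday = regDate.with(TemporalAdjusters.previousOrSame(DayOfWeek.MONDAY));
            byWeek.computeIfAbsent(monday, k -> new ArrayList<>()).add(regDate);
        }
        return byWeek;
    }

    public static void main(String[] args) {
        List<LocalDate> dates = List.of(
                LocalDate.of(2017, 1, 2),
                LocalDate.of(2017, 1, 5),
                LocalDate.of(2017, 2, 15));
        for (Map.Entry<LocalDate, List<LocalDate>> e : groupByWeek(dates).entrySet()) {
            System.out.println("Week from " + e.getKey() + ": " + e.getValue());
        }
    }
}
```

In your program the list values would be CertificatMajCrBean objects and the inner loop would print each bean's fields, as in your existing table-printing code.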
Summary
Reading an entry dated on the day DST ends and changing a non-date field also shifts the date field back one day (read as Feb 22, persisted as Feb 21).
Background
postgres 9.2.7
jboss 4.2.3
centos
DST end date: Feb 22 2015 (at Feb 22 00:00 clocks go back to Feb 21 23:00)
Foo
+----+-----------------+--------+
| id | eventDate(date) | name   |
+----+-----------------+--------+
| 1  | 2015-02-22      | Iorek  |
| 2  | 2015-02-22      | Lyra   |
| 3  | 2015-02-22      | Stanis |
| 4  | 2015-02-22      | Asriel |
| 5  | 2015-02-23      | Will   |
+----+-----------------+--------+
Foo.java (Pojo)
java.util.Date for eventDate
Problem Scenario
Using the Java web application: loading a Foo object with id 1 via HQL, for example, changing only the name from Iorek to Compass, and persisting it will also change the eventDate to one day earlier. The table will then look like this:
Foo after change
+----+-----------------+---------+
| id | eventDate(date) | name    |
+----+-----------------+---------+
| 1  | 2015-02-21      | Compass |
| 2  | 2015-02-22      | Lyra    |
| 3  | 2015-02-22      | Stanis  |
| 4  | 2015-02-22      | Asriel  |
| 5  | 2015-02-23      | Will    |
+----+-----------------+---------+
If I change the name of the id 5 entry, for example, nothing happens to its eventDate. The problem only occurs for entries dated 2015-02-22 (the transition from DST back to standard time).
Attempts to reproduce
This scenario only happens in the production environment and could not be reproduced in development.
Pointing the development application at the production database (with read-only access), I could confirm that the timestamp is read properly and the date comes back as 2015-02-22 (I was unable to test writing a changed entry for day 22).
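One common mitigation for date-only columns, sketched here as an assumption rather than a confirmed fix for this setup, is to map them to a type with no time-of-day and no time zone, so DST transitions cannot shift them during conversion:

```java
import java.time.LocalDate;

public class DateOnlyDemo {
    public static void main(String[] args) {
        // A LocalDate carries no time-of-day and no zone, so a DATE column
        // round-trips unchanged regardless of any DST transition.
        LocalDate eventDate = LocalDate.parse("2015-02-22");
        System.out.println(eventDate); // 2015-02-22

        // By contrast, java.util.Date models an instant in time; a DATE value
        // represented as "midnight in the JVM's zone" can land on the wrong
        // day when it is converted across a DST boundary, which is the class
        // of bug described above.
    }
}
```

Whether the stack in question (Hibernate on JBoss 4.2.3) can map the column this way depends on the Hibernate version and driver, so treat this only as the direction to investigate.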
I am trying to load data from a text file into a MySQL table by calling MySQL's LOAD DATA INFILE from a Java process. The file can contain data for the current date and also for previous days, and the table can already contain data for previous dates. The problem is that some of the column values in the file for previous dates might have changed, but I don't want to update all of those columns; I only want the latest values for some of them.
Example,
Table
+----+-------------+------+------+------+
| id | report_date | val1 | val2 | val3 |
+----+-------------+------+------+------+
| 1  | 2012-12-01  | 10   | 1    | 1    |
| 2  | 2012-12-02  | 20   | 2    | 2    |
| 3  | 2012-12-03  | 30   | 3    | 3    |
+----+-------------+------+------+------+
Data in Input file:
1|2012-12-01|10|1|1
2|2012-12-02|40|4|4
3|2012-12-03|40|4|4
4|2012-12-04|40|4|4
5|2012-12-05|50|5|5
Table after the load should look like
mysql> select * from load_infile_tests;
+----+-------------+------+------+------+
| id | report_date | val1 | val2 | val3 |
+----+-------------+------+------+------+
| 1  | 2012-12-01  | 10   | 1    | 1    |
| 2  | 2012-12-02  | 40   | 4    | 2    |
| 3  | 2012-12-03  | 40   | 4    | 3    |
| 4  | 2012-12-04  | 40   | 4    | 4    |
| 5  | 2012-12-05  | 50   | 5    | 5    |
+----+-------------+------+------+------+
5 rows in set (0.00 sec)
Note that the val3 values are not updated. I also need to do this for large files; some files can be 300 MB or more, so it needs to be a scalable solution.
Thanks,
Anirudha
It would be convenient to use LOAD DATA INFILE with the REPLACE option, but in that case matching records are deleted and re-inserted, so the old val3 values would be lost.
Instead, try loading the data into a temporary table, then update your table from the temporary table using an INSERT ... SELECT or INSERT ... ON DUPLICATE KEY UPDATE statement.
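A sketch of that staging approach, using the table and column names from the example above; the file path is a placeholder and the assumption that id is the primary (or a unique) key is mine:

```sql
-- Stage the file into a temporary copy of the target table
CREATE TEMPORARY TABLE load_infile_stage LIKE load_infile_tests;

LOAD DATA INFILE '/path/to/input.txt'
INTO TABLE load_infile_stage
FIELDS TERMINATED BY '|'
(id, report_date, val1, val2, val3);

-- Upsert only the columns that should take the latest values;
-- val3 is deliberately left out of the UPDATE clause, so existing
-- rows keep their old val3 while new rows get it from the file.
INSERT INTO load_infile_tests (id, report_date, val1, val2, val3)
SELECT id, report_date, val1, val2, val3
FROM load_infile_stage
ON DUPLICATE KEY UPDATE
  val1 = VALUES(val1),
  val2 = VALUES(val2);
```

Because LOAD DATA INFILE still does the bulk parsing and the upsert runs entirely inside MySQL, this scales to the 300 MB files mentioned above far better than row-by-row updates from Java.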