Spark and non-denormalized tables


I know Spark works much better with denormalized tables, where all the needed data is in one row. I am wondering whether, when that is not the case, there is a way to retrieve data from previous or next rows.
Example:
Formula:
value = (value from 2 years ago) + (current year value) / (value from 2 years ahead)
Table
+-------+-----+
| YEAR|VALUE|
+-------+-----+
| 2015| 100 |
| 2016| 34 |
| 2017| 32 |
| 2018| 22 |
| 2019| 14 |
| 2020| 42 |
| 2021| 88 |
+-------+-----+
Dataset<Row> dataset ...
Dataset<Result> results = dataset.map(row -> {
    int currentValue = Integer.valueOf(row.getAs("VALUE")); // 2019
    // nonsense code, just to illustrate the idea
    int twoYearsBackValue = Integer.valueOf(row[???].getAs("VALUE")); // 2017
    int twoYearsAheadValue = Integer.valueOf(row[???].getAs("VALUE")); // 2021
    // cast to double so the division is not truncated to an int
    double resultValue = twoYearsBackValue + (double) currentValue / twoYearsAheadValue;
    return new Result(2019, resultValue);
});
Result[] collected = results.collect();
Is it possible to grab these values (which belong to other rows) without changing the table format (no denormalization, no pivots ...) and without collecting the data, or does that go totally against Spark/Big Data principles?
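The formula in the example can be checked in plain Java before worrying about how to express it in Spark. The sketch below is only an illustration: YearOffsetFormula, compute and sampleTable are hypothetical names, and it looks up the neighbouring years in an in-memory map rather than in a Dataset. Inside Spark, the lag and lead window functions over a window ordered by YEAR are probably the closest built-in equivalent for reading values from earlier or later rows without collecting.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Spark: illustrates the arithmetic only.
class YearOffsetFormula {

    // The table from the question, keyed by year.
    static Map<Integer, Integer> sampleTable() {
        Map<Integer, Integer> table = new TreeMap<>();
        table.put(2015, 100);
        table.put(2016, 34);
        table.put(2017, 32);
        table.put(2018, 22);
        table.put(2019, 14);
        table.put(2020, 42);
        table.put(2021, 88);
        return table;
    }

    // value = (value from 2 years ago) + (current year value) / (value from 2 years ahead)
    // Returns null when one of the three years is missing from the table.
    static Double compute(Map<Integer, Integer> valueByYear, int year) {
        Integer back = valueByYear.get(year - 2);
        Integer current = valueByYear.get(year);
        Integer ahead = valueByYear.get(year + 2);
        if (back == null || current == null || ahead == null) {
            return null;
        }
        // Use a double for the division: int / int would truncate 14 / 88 to 0.
        return back + current.doubleValue() / ahead;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> table = sampleTable();
        for (int year : table.keySet()) {
            System.out.println(year + " -> " + compute(table, year));
        }
    }
}
```

Only 2017, 2018 and 2019 have both neighbours in the sample table, so every other year yields null here. Note that the division binds tighter than the addition, exactly as in the formula as written.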

Related

Join two entries rows ( Start time and End time) as a single row in Talend

I have data coming from an MS SQL database. It concerns the working hours of employees.
The problem is that the start time and the end time are stored as two different entries: when the employee arrives, he scans his badge and this is recorded as the arrival time, and when he leaves, he scans his badge again and this is recorded as the departure time.
There is one column that distinguishes the start time from the end time (CodeNr column: B1 = StartTime, B2 = EndTime).
This is what my table looks like:
Now I need this data as a single entry, either in Talend or from the database,
so that it looks like this:
What should I use to achieve this (preferably in Talend, and in MS SQL only if that turns out to be too complicated)?

CREATE TABLE EmployeeWorkLoad(
EmployeeNr bigint,
Year int,
Month int,
Day int,
Hour int,
Minute int,
CodeNr char(2)
)
Insert into [EmployeeWorkLoad] ( [EmployeeNr],[Year],[Month] ,[Day],[Hour], [Minute] ,[CodeNr]) Values (1,2020,1,4,8,30,'B1'),
(1,2020,1,4,16,45,'B2'),
(1,2020,1,6,8,15,'B1'),
(1,2020,1,6,16,45,'B2'),
(2,2020,3,2,8,10,'B1'),
(2,2020,3,2,16,5,'B2')
GO
6 rows affected
WITH CTE AS (
select EmployeeNr,Year,Month,Day,
MAX(CASE WHEN CodeNr='B1' THEN Hour END) AS StartHour,
MAX(CASE WHEN CodeNr = 'B1' THEN Minute END) AS StartMinute,
MAX(CASE WHEN CodeNr = 'B2' THEN Hour END) AS EndHour,
MAX(CASE WHEN CodeNr = 'B2' THEN Minute END) AS EndMinute
from EmployeeWorkLoad
group by EmployeeNr,Year,Month,Day )
-- Work on total minutes so that an end minute smaller than the start minute borrows from the hour
SELECT * , ((EndHour*60+EndMinute) - (StartHour*60+StartMinute)) / 60 AS DurationHour
,((EndHour*60+EndMinute) - (StartHour*60+StartMinute)) % 60 AS DurationMinute
FROM
CTE
GO
EmployeeNr | Year | Month | Day | StartHour | StartMinute | EndHour | EndMinute | DurationHour | DurationMinute
---------: | ---: | ----: | --: | --------: | ----------: | ------: | --------: | -----------: | -------------:
1 | 2020 | 1 | 4 | 8 | 30 | 16 | 45 | 8 | 15
1 | 2020 | 1 | 6 | 8 | 15 | 16 | 45 | 8 | 30
2 | 2020 | 3 | 2 | 8 | 10 | 16 | 5 | 7 | 55
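Since Talend jobs are generated as Java, the duration of one start/end pair can also be cross-checked in plain Java. This is a minimal sketch using java.time; ShiftDuration and its methods are hypothetical names. It works on the full time of day, which matters for pairs such as 8:10 to 16:05, where the end minute is smaller than the start minute.

```java
import java.time.Duration;
import java.time.LocalTime;

// Hypothetical helper: duration between a B1 (start) and a B2 (end) badge scan on the same day.
class ShiftDuration {

    static Duration between(int startHour, int startMinute, int endHour, int endMinute) {
        return Duration.between(
                LocalTime.of(startHour, startMinute),
                LocalTime.of(endHour, endMinute));
    }

    // Formats a duration as whole hours and remaining minutes.
    static String format(Duration d) {
        long minutes = d.toMinutes();
        return (minutes / 60) + "h " + (minutes % 60) + "m";
    }

    public static void main(String[] args) {
        // The three start/end pairs from the sample data.
        System.out.println(format(between(8, 30, 16, 45)));
        System.out.println(format(between(8, 15, 16, 45)));
        System.out.println(format(between(8, 10, 16, 5)));
    }
}
```

For the third pair the difference is 475 minutes, that is 7 hours and 55 minutes.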

SearchRequest in RootDSE

I have the following function to query users from an AD server:
public List<LDAPUserDTO> getUsersWithPaging(String filter)
{
    List<LDAPUserDTO> userList = new ArrayList<>();
    try(LDAPConnection connection = new LDAPConnection(config.getHost(),config.getPort(),config.getUsername(),config.getPassword()))
    {
        SearchRequest searchRequest = new SearchRequest("", SearchScope.SUB,filter, null);
        ASN1OctetString resumeCookie = null;
        while (true)
        {
            searchRequest.setControls(
                new SimplePagedResultsControl(100, resumeCookie));
            SearchResult searchResult = connection.search(searchRequest);
            for (SearchResultEntry e : searchResult.getSearchEntries())
            {
                LDAPUserDTO tmp = new LDAPUserDTO();
                tmp.distinguishedName = e.getAttributeValue("distinguishedName");
                tmp.name = e.getAttributeValue("name");
                userList.add(tmp);
            }
            LDAPTestUtils.assertHasControl(searchResult,
                SimplePagedResultsControl.PAGED_RESULTS_OID);
            SimplePagedResultsControl responseControl =
                SimplePagedResultsControl.get(searchResult);
            if (responseControl.moreResultsToReturn())
            {
                resumeCookie = responseControl.getCookie();
            }
            else
            {
                break;
            }
        }
        return userList;
    } catch (LDAPException e) {
        logger.error(e.getExceptionMessage());
        return null;
    }
}
However, this breaks when I try to search on the RootDSE.
What I've tried so far:
baseDN = null
baseDN = "";
baseDN = RootDSE.getRootDSE(connection).getDN()
baseDN = "RootDSE"
All resulting in various exceptions or empty results:
Caused by: LDAPSDKUsageException(message='A null object was provided where a non-null object is required (non-null index 0).
2020-04-01 10:42:22,902 ERROR [de.dbz.service.LDAPService] (default task-1272) LDAPException(resultCode=32 (no such object), numEntries=0, numReferences=0, diagnosticMessage='0000208D: NameErr: DSID-03100213, problem 2001 (NO_OBJECT), data 0, best match of:
''
', ldapSDKVersion=4.0.12, revision=aaefc59e0e6d110bf3a8e8a029adb776f6d2ce28')

So, I really spent a lot of time on this. It is possible to query the RootDSE, in a way, but it's not as straightforward as one might think.
I mainly used Wireshark to see what the guys at Softerra are doing with their LDAP Browser.
Turns out I wasn't that far off:
As you can see, the baseObject is empty here.
Also, there is one additional control with the OID LDAP_SERVER_SEARCH_OPTIONS_OID and the ASN.1 string 308400000003020102.
So what does this 308400000003020102 (more readable: 30 84 00 00 00 03 02 01 02) actually do?
First of all, we decode it into something we can read: 30 is a SEQUENCE tag, 84 00 00 00 03 is a long-form length of 3 bytes, and 02 01 02 is an INTEGER of length 1. In this case, the value is the int 2.
In binary, this gives us: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
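The decoding step can be reproduced with a few lines of plain Java. The following is a minimal sketch of a BER reader that only understands the shape seen on the wire here (a SEQUENCE containing a single non-negative INTEGER); BerFlagsDecoder is a hypothetical name and this is not a general ASN.1 parser.

```java
// Hypothetical helper: decodes SEQUENCE { INTEGER } and returns the integer.
class BerFlagsDecoder {
    private final byte[] data;
    private int pos = 0;

    BerFlagsDecoder(byte[] data) {
        this.data = data;
    }

    private int readByte() {
        return data[pos++] & 0xFF;
    }

    // Short form: a single byte below 0x80.
    // Long form: 0x80 | n, followed by n bytes holding the length (0x84 means 4 length bytes).
    private int readLength() {
        int first = readByte();
        if (first < 0x80) {
            return first;
        }
        int count = first & 0x7F;
        int length = 0;
        for (int i = 0; i < count; i++) {
            length = (length << 8) | readByte();
        }
        return length;
    }

    int decode() {
        if (readByte() != 0x30) {
            throw new IllegalArgumentException("expected SEQUENCE tag 0x30");
        }
        readLength(); // length of the sequence content, not needed further
        if (readByte() != 0x02) {
            throw new IllegalArgumentException("expected INTEGER tag 0x02");
        }
        int intLength = readLength();
        int value = 0; // assumes a small, non-negative integer
        for (int i = 0; i < intLength; i++) {
            value = (value << 8) | readByte();
        }
        return value;
    }

    // Convenience wrapper that accepts the hex string as captured.
    static int decodeHex(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new BerFlagsDecoder(bytes).decode();
    }

    public static void main(String[] args) {
        System.out.println(decodeHex("308400000003020102")); // prints 2
    }
}
```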
As we know from the documentation, we have the following notation:
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 |
|---|---|---|---|---|---|---|---|---|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|----|-------|-------|
| x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | x | SSFPR | SSFDS |
or we just take the int values from the documentation:
1 = SSFDS -> SERVER_SEARCH_FLAG_DOMAIN_SCOPE
2 = SSFPR -> SERVER_SEARCH_FLAG_PHANTOM_ROOT
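Reading the flags out of the decoded integer is a plain bit test. A small sketch, with SearchOptionFlags as a hypothetical class name and the two constants taken from the values listed above:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: names the search option flags set in a decoded control value.
class SearchOptionFlags {
    static final int SSFDS = 1; // SERVER_SEARCH_FLAG_DOMAIN_SCOPE
    static final int SSFPR = 2; // SERVER_SEARCH_FLAG_PHANTOM_ROOT

    static List<String> describe(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & SSFDS) != 0) {
            names.add("SERVER_SEARCH_FLAG_DOMAIN_SCOPE");
        }
        if ((flags & SSFPR) != 0) {
            names.add("SERVER_SEARCH_FLAG_PHANTOM_ROOT");
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(describe(2)); // prints [SERVER_SEARCH_FLAG_PHANTOM_ROOT]
    }
}
```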
So, in my example, we have SSFPR which is defined as follows:
For AD DS, instructs the server to search all NC replicas except
application NC replicas that are subordinate to the search base, even
if the search base is not instantiated on the server. For AD LDS, the
behavior is the same except that it also includes application NC
replicas in the search. For AD DS and AD LDS, this will cause the
search to be executed over all NC replicas (except for application NCs
on AD DS DCs) held on the DC that are subordinate to the search base.
This enables search bases such as the empty string, which would cause
the server to search all of the NC replicas (except for application
NCs on AD DS DCs) that it holds.
NC stands for Naming Context; the naming contexts are stored in the RootDSE as an operational attribute named namingContexts.
The other value, SSFDS does the following:
Prevents continuation references from being generated when the search
results are returned. This performs the same function as the
LDAP_SERVER_DOMAIN_SCOPE_OID control.
So, someone might ask why I even do this. As it turns out, I have a customer with several sub-DCs under one DC. If I tell the search to follow referrals, the execution time is far too long, so this wasn't really an option for me. But when I turned referrals off, I wasn't getting all the results when I set the BaseDN to the group whose members I wanted to retrieve.
Searching via the RootDSE option in Softerra's LDAP Browser was way faster and returned the results in less than one second.
I personally don't have any clue why this is so much faster, but Active Directory without any interface or tool from Microsoft is kind of black magic to me anyway. To be frank, that's not really my area of expertise.
In the end, I ended up with the following Java code:
SearchRequest searchRequest = new SearchRequest("", SearchScope.SUB, filter, null);
[...]
Control globalSearch = new Control("1.2.840.113556.1.4.1340", true, new ASN1OctetString(Hex.decode("308400000003020102")));
searchRequest.setControls(new SimplePagedResultsControl(100, resumeCookie, true),globalSearch);
[...]
The Hex.decode() used here is org.bouncycastle.util.encoders.Hex.decode().
A huge thanks to the guys at Softerra, who more or less put an end to my journey into the abyss of AD.

You can't query users from the RootDSE.
Use either a domain or, if you need to query users across domains in a forest, the global catalog (which runs on different ports, 3268 / 3269, not the default 389 / 636 for LDAP(S)).
RootDSE only contains metadata. Probably this question should be asked elsewhere for more information but first read up on the documentation from Microsoft, e.g.:
https://learn.microsoft.com/en-us/windows/win32/ad/where-to-search
https://learn.microsoft.com/en-us/windows/win32/adschema/rootdse
E.g.: namingContexts attribute can be read to find which other contexts you may want to query for actual users.
Maybe start with this nice article as introduction:
http://cbtgeeks.com/2016/06/02/what-is-rootdse/

How to put a list of data table in a list of objects

I have a data table in my feature file, which I want to convert to a list of objects. The problem is data table has headers, which are supposed to be set in the value of objects. As an example:
| ANNOTATION_TYPE_ID | ANNOTATION_SUBTYPE_ID | PAGE_NB | LEFT_NB | TOP_NB | WIDTH_NB | HEIGHT_NB | FONTSIZE_NB | COLOR_X | ANNOTATION_TEXT_X |
| 1 | 1 | 1 | 400 | 200 | 88 | 38 | 15 | FFFFFF | TEST Annotation |
| 2 | 2 | 1 | 150 | 150 | 88 | 38 | 20 | FFFFF0 | TEST Annotation |
I want to convert this to a list of objects, List<Annotation> annotations, where Annotation is a class and the headers of the above data table are essentially the fields of that class.
What is an efficient way to do this?
The moment I convert the data table to a list (List<String> annotationList = annotation.asList(String.class)), it becomes one big flat list, and how to group the values back into rows is what I am struggling with.

One approach would be to look at this as a list of annotations, with each annotation having a set of key/value pairs based upon each row in your file. It would look like List of HashMaps where each HashMap key is the row header value and the value is the row value. This may not be the most efficient approach depending upon your usage. Here's sample code that was able to parse the data you provided - it produces a List with two items, each has a HashMap with the number of key/values for the columns above. Good luck.
public static void main(String[] args) {
Path filePath = Path.of("C:\\tmp\\anno");
if (!Files.exists(filePath)) {
System.out.println("File does not exist at '" + filePath + "'");
System.exit(1);
}
List<HashMap<String, String>> annotations = new ArrayList<HashMap<String, String>>();
try {
List<String> annoFile = Files.readAllLines(filePath);
List<String> headers = Arrays.asList(annoFile.remove(0).split("\\|"));
headers.forEach(System.out::println);
while (annoFile.size() > 0) {
List<String> rowValues = Arrays.asList(annoFile.remove(0).split("\\|"));
HashMap<String, String> annotation = new HashMap<>();
for (int i = 0; i < headers.size(); i++) {
if (rowValues.size() > i) { // strictly greater: index i must exist in rowValues
annotation.put(headers.get(i).strip(), rowValues.get(i).strip());
}
}
annotations.add(annotation);
}
} catch (IOException e) {
e.printStackTrace();
}
}
}
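Once every row is available as a header-to-value map, turning it into a typed object is a small step. The sketch below assumes the rows already have that shape (Cucumber's DataTable.asMaps produces a comparable list of maps); the Annotation class, its field names and the row helper are hypothetical, since the original class is not shown.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Annotation class: one field per column header of the data table.
class Annotation {
    static final String[] HEADERS = {
        "ANNOTATION_TYPE_ID", "ANNOTATION_SUBTYPE_ID", "PAGE_NB", "LEFT_NB", "TOP_NB",
        "WIDTH_NB", "HEIGHT_NB", "FONTSIZE_NB", "COLOR_X", "ANNOTATION_TEXT_X"
    };

    final int typeId, subtypeId, page, left, top, width, height, fontSize;
    final String color, text;

    Annotation(Map<String, String> row) {
        typeId = intOf(row, "ANNOTATION_TYPE_ID");
        subtypeId = intOf(row, "ANNOTATION_SUBTYPE_ID");
        page = intOf(row, "PAGE_NB");
        left = intOf(row, "LEFT_NB");
        top = intOf(row, "TOP_NB");
        width = intOf(row, "WIDTH_NB");
        height = intOf(row, "HEIGHT_NB");
        fontSize = intOf(row, "FONTSIZE_NB");
        color = row.get("COLOR_X").trim();
        text = row.get("ANNOTATION_TEXT_X").trim();
    }

    private static int intOf(Map<String, String> row, String header) {
        return Integer.parseInt(row.get(header).trim());
    }

    // Converts every header-to-value map into one Annotation.
    static List<Annotation> fromRows(List<Map<String, String>> rows) {
        List<Annotation> result = new ArrayList<>();
        for (Map<String, String> row : rows) {
            result.add(new Annotation(row));
        }
        return result;
    }

    // Test helper: pairs the cells of one table row with the headers, in order.
    static Map<String, String> row(String... cells) {
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < HEADERS.length; i++) {
            map.put(HEADERS[i], cells[i]);
        }
        return map;
    }

    public static void main(String[] args) {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("1", "1", "1", "400", "200", "88", "38", "15", "FFFFFF", "TEST Annotation"));
        rows.add(row("2", "2", "1", "150", "150", "88", "38", "20", "FFFFF0", "TEST Annotation"));
        for (Annotation a : fromRows(rows)) {
            System.out.println(a.typeId + " " + a.left + "x" + a.top + " " + a.color + " " + a.text);
        }
    }
}
```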

MySQL query to fetch list of data using logical operations

The following is the list of the different kinds of books that customers read in a library. The values are stored as powers of 2 in a column called bookType.
I need to fetch the list for the combinations of persons who read
only Novel, or only Fairytale, or only BedTime, or both Novel + Fairytale
from the database with a query that uses logical (bitwise) operations.
Fetch the list for the following combinations:
a person who reads only novels (stored in DB as 1)
a person who reads both novels and fairy tales (stored in DB as 1+2 = 3)
a person who reads all three, i.e. {novel + fairy tale + bed time} (stored in DB as 1+2+4 = 7)
These values are stored in the database in a column called BookType (marked in red in the figure).
How can I fetch the above list using a MySQL query?
For example, I need to fetch users who read novels (1, 3, 5, 7).

The heart of this question is the conversion of decimal to binary, and MySQL has a function to do just that: CONV(num, from_base, to_base);
In this case from_base would be 10 and to_base would be 2.
I would wrap this in a UDF
So given
MariaDB [sandbox]> select id,username
-> from users
-> where id < 8;
+----+----------+
| id | username |
+----+----------+
| 1 | John |
| 2 | Jane |
| 3 | Ali |
| 6 | Bruce |
| 7 | Martha |
+----+----------+
5 rows in set (0.00 sec)
MariaDB [sandbox]> select * from t;
+------+------------+
| id | type |
+------+------------+
| 1 | novel |
| 2 | fairy Tale |
| 3 | bedtime |
+------+------------+
3 rows in set (0.00 sec)
This UDF
drop function if exists book_type;
delimiter //
CREATE DEFINER=`root`@`localhost` FUNCTION `book_type`(
`indec` int
)
RETURNS varchar(255) CHARSET latin1
LANGUAGE SQL
NOT DETERMINISTIC
CONTAINS SQL
SQL SECURITY DEFINER
COMMENT ''
begin
declare tempstring varchar(100);
declare outstring varchar(100);
declare book_types varchar(100);
declare bin_position int;
declare str_length int;
declare checkit int;
set tempstring = reverse(lpad(conv(indec,10,2),4,0));
set str_length = length(tempstring);
set checkit = 0;
set bin_position = 0;
set book_types = '';
looper: while bin_position < str_length do
set bin_position = bin_position + 1;
set outstring = substr(tempstring,bin_position,1);
if outstring = 1 then
set book_types = concat(book_types,(select trim(type) from t where id = bin_position),',');
end if;
end while;
set outstring = book_types;
return outstring;
end //
delimiter ;
Results in
+----+----------+---------------------------+
| id | username | book_type(id) |
+----+----------+---------------------------+
| 1 | John | novel, |
| 2 | Jane | fairy Tale, |
| 3 | Ali | novel,fairy Tale, |
| 6 | Bruce | fairy Tale,bedtime, |
| 7 | Martha | novel,fairy Tale,bedtime, |
+----+----------+---------------------------+
5 rows in set (0.00 sec)
Note the loop in the UDF to walk through the binary string and that the position of the 1's relate to the ids in the look up table;
I leave it to you to code for errors and tidy up.
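The same bit logic can be written down compactly outside the database. This is a sketch in plain Java with hypothetical names (BookTypes, reads, names); the flag values 1, 2 and 4 come from the question. In MySQL itself, the bitwise & operator in a WHERE clause, for example bookType & 1, expresses the same "reads novels" test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: bookType is a bit mask, one bit per kind of book.
class BookTypes {
    static final int NOVEL = 1;
    static final int FAIRY_TALE = 2;
    static final int BED_TIME = 4;

    // True when the given flag is set, regardless of the other bits.
    static boolean reads(int bookType, int flag) {
        return (bookType & flag) != 0;
    }

    // Lists the kinds of books encoded in the mask, lowest bit first.
    static List<String> names(int bookType) {
        List<String> result = new ArrayList<>();
        if (reads(bookType, NOVEL)) {
            result.add("novel");
        }
        if (reads(bookType, FAIRY_TALE)) {
            result.add("fairy tale");
        }
        if (reads(bookType, BED_TIME)) {
            result.add("bedtime");
        }
        return result;
    }

    // All mask values from 1 to 7 that include the given flag.
    static List<Integer> valuesWith(int flag) {
        List<Integer> result = new ArrayList<>();
        for (int bookType = 1; bookType <= 7; bookType++) {
            if (reads(bookType, flag)) {
                result.add(bookType);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(valuesWith(NOVEL)); // prints [1, 3, 5, 7]
        System.out.println(names(6));          // prints [fairy tale, bedtime]
    }
}
```

"Only novel" or "exactly novel and fairy tale" are then simple equality checks against 1 or 3, while "reads novels at all" is the bit test.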

Removing null elements and keeping non-null elements together on a list in jasper reports

I am using JRBeanCollectionDataSource as datasource for a subreport. Each record in the list contains elements with either null or non-null value . This is my POJO:
public class PayslipDtl {
private String earningSalaryHeadName;
private double earningSalaryHeadAmount;
private String deductionSalaryHeadName;
private double deductionSalaryHeadAmount;
String type;
public PayslipDtl(String salaryHeadName,
double salaryHeadAmount, String type) {
if(type.equalsIgnoreCase("Earning")) {
earningSalaryHeadName = salaryHeadName;
earningSalaryHeadAmount = salaryHeadAmount;
} else {
deductionSalaryHeadName = salaryHeadName;
deductionSalaryHeadAmount = salaryHeadAmount;
}
}
//getters and setters
}
Based on the "type", the list is populated as such: {"Basic", 4755, null, 0.0}, {"HRA", 300, null, 0.0}, {null, 0.0, "Employee PF", 925}, {"Medical Allowance", 900, null, 0.0} and so on...
After setting isBlankWhenNull to true and using "Print when" expression, the record is displayed as such:
|Earning |Amount|Deduction |Amount|
--------------------|------|---------------------|------|
| Basic | 4755 | | |
| HRA | 300 | | |
| | | Employee PF | 925 |
| Medical Allowance | 900 | | |
| Fuel Reimbursement| 350 | | |
| | | Loan | 1000 |
---------------------------------------------------------
I want it to be displayed as such:
|Earning |Amount|Deduction |Amount|
--------------------|------|---------------------|------|
| Basic | 4755 | Employee PF | 925 |
| HRA | 300 | Loan | 1000 |
| Medical Allowance | 900 | | |
| Fuel Reimbursement| 350 | | |
---------------------------------------------------------
Setting isRemoveLineWhenBlank to true doesn't work since it is not the entire row which is blank but only a subset of elements of a row that is null.
Is it possible in Jasper?
I am using iReport Designer 5.0.1 with compatibility set to JasperReports3.5.1.

Use a List component for the deduction/amount, here you have a video tutorial on how to do this.
Then deduction and amount fields on the list component need the following options Blank when null and Remove line when blank.
If this still gives you blank lines, try putting both fields on a frame inside the list and mark those options for the frame too.

The only good solution is to create a separate table such as:
table employeeED:
srno int,
Earning varchar(50),
EarnAmount Double,
Deduction varchar(50),
DedAmount Double
Then insert all earnings on the earning side and update all deductions on the deduction side.
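// conn is an open java.sql.Connection and rs a scrollable ResultSet (both assumed)
PreparedStatement insert = conn.prepareStatement(
    "insert into employeeED (srno, Earning, EarnAmount) values (?, ?, ?)");
int i = 1;
rs.beforeFirst(); // rs.first() followed by rs.next() would skip the first row
while (rs.next())
{
    if (rs.getString("type").equalsIgnoreCase("Earning"))
    {
        insert.setInt(1, i++);
        insert.setString(2, rs.getString("earning"));
        insert.setDouble(3, rs.getDouble("eamt"));
        insert.executeUpdate();
    }
}
PreparedStatement update = conn.prepareStatement(
    "update employeeED set Deduction = ?, DedAmount = ? where srno = ?");
int j = 1;
rs.beforeFirst();
while (rs.next())
{
    if (rs.getString("type").equalsIgnoreCase("Deduction"))
    {
        update.setString(1, rs.getString("earning"));
        update.setDouble(2, rs.getDouble("eamt"));
        update.setInt(3, j++); // the n-th deduction goes next to the n-th earning
        update.executeUpdate();
    }
}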
then use employeeED table as datasource.
100% working.
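The same pairing can also be done in memory, before the list is handed to the report, which avoids a staging table. This is an alternative sketch rather than either of the approaches described above; PayslipRows, Item and Row are hypothetical names, and getters are omitted for brevity (a bean data source would need them).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: puts the n-th earning and the n-th deduction on the same row.
class PayslipRows {

    static class Item {
        final String name;
        final double amount;
        final String type; // "Earning" or anything else for a deduction

        Item(String name, double amount, String type) {
            this.name = name;
            this.amount = amount;
            this.type = type;
        }
    }

    static class Row {
        String earningName;     // null when this row has no earning
        Double earningAmount;
        String deductionName;   // null when this row has no deduction
        Double deductionAmount;
    }

    static List<Row> pair(List<Item> items) {
        List<Item> earnings = new ArrayList<>();
        List<Item> deductions = new ArrayList<>();
        for (Item item : items) {
            if (item.type.equalsIgnoreCase("Earning")) {
                earnings.add(item);
            } else {
                deductions.add(item);
            }
        }
        List<Row> rows = new ArrayList<>();
        int count = Math.max(earnings.size(), deductions.size());
        for (int i = 0; i < count; i++) {
            Row row = new Row();
            if (i < earnings.size()) {
                row.earningName = earnings.get(i).name;
                row.earningAmount = earnings.get(i).amount;
            }
            if (i < deductions.size()) {
                row.deductionName = deductions.get(i).name;
                row.deductionAmount = deductions.get(i).amount;
            }
            rows.add(row);
        }
        return rows;
    }

    // The six entries from the question, in their original order.
    static List<Item> sampleItems() {
        List<Item> items = new ArrayList<>();
        items.add(new Item("Basic", 4755, "Earning"));
        items.add(new Item("HRA", 300, "Earning"));
        items.add(new Item("Employee PF", 925, "Deduction"));
        items.add(new Item("Medical Allowance", 900, "Earning"));
        items.add(new Item("Fuel Reimbursement", 350, "Earning"));
        items.add(new Item("Loan", 1000, "Deduction"));
        return items;
    }

    public static void main(String[] args) {
        for (Row row : pair(sampleItems())) {
            System.out.println(row.earningName + " | " + row.earningAmount
                    + " | " + row.deductionName + " | " + row.deductionAmount);
        }
    }
}
```

With the sample data this yields four rows; the last two carry null on the deduction side, which "Blank when null" can then render as empty cells.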
