How to join two frames' rows in H2O? - java

I am implementing my own algorithm in H2O's Java source code (under package h2o-algos).
How can I join the rows (i.e. vectors) of two frames using H2O's Java methods?
For instance, given two Frames A and B:
Frame A:
| Id | Name |
| -------- | -------------- |
| 123 | John |
| 456 | Bob |
Frame B:
| Id | Name |
| -------- | -------------- |
| 789 | Alice |
I want the resultant Frame C to be:
| Id | Name |
| -------- | -------------- |
| 123 | John |
| 456 | Bob |
| 789 | Alice |
Is there a faster way to do this than making new vectors and then creating a new frame from them? I have read the documentation and found that the Frame::append() method creates new columns rather than joining rows.

This functionality is called "row binding". It is not exposed as an API method, but it is available as a Rapids expression (Rapids is a simple Scheme-like language, so the call is wrapped in parentheses). You can follow this example to row-bind two H2O Frames: https://github.com/h2oai/h2o-3/blob/master/h2o-core/src/test/java/water/rapids/ast/prims/mungers/AstRBindTest.java#L40 In a nutshell, if you have two frames with keys A and B, you would run water.rapids.Rapids.exec("(rbind A B)").getFrame()

Related

multiple columns in parent to use same child table to save data in JPA

I need to save employee details to a child table. The same employee data can be referenced by multiple parent records, and by multiple columns within one record. I want to reuse the existing child table: insert the employee if not present, and update it if already present. How can I do that using Spring JPA / Hibernate?
Parent Table
|id|project_id|owner_emp_id|developer_emp_id|lead_emp_id|tester_emp_id|
|:---- |:------:| -----:|:---- |:------:| -----:|
| 1 | 100 | emp_10 | emp_20 | emp_20 | emp_30 |
| 2 | 200 | emp_11 | emp_21 | emp_22 | emp_30 |
employee child table
|emp_id|first_name|last_name|email|phone|
|:---- |:------:| -----:|:---- |:------:|
| emp_10 |..| .. | .. | .. |
| emp_20 | .. | .. | .. | .. |
| emp_30 | .. | .. | .. | .. |
| emp_11 | .. | .. | .. | .. |
| emp_21 | .. | .. | .. | .. |
| emp_22 | .. | .. | .. | .. |
In the example above, when the 1st record is saved, the data for emp_20 needs to be saved only once. Similarly, when the 2nd record is inserted, emp_30 is already present, so it should either be updated or skipped. How can I do this in JPA?

How do I run a spark sql aggregator cumulatively?

I am currently working on a project with spark datasets (in Java) where I have to create a new column derived from an accumulator run over all the previous rows.
I have been implementing this using a custom UserDefinedAggregateFunction over a Window from unboundedPreceding to currentRow.
This goes something like this:
df.withColumn("newColumn", customAccumulator
.apply(columnInputSeq)
.over(customWindowSpec));
However, I would really prefer to use a typed Dataset for type safety and generally cleaner code, i.e. perform the same operation with an org.apache.spark.sql.expressions.Aggregator over a Dataset<CustomType>. The problem is that I have looked through all the documentation and can't work out how to make it behave in the same way as above: I can only get a final aggregate over the whole column rather than a cumulative state at each row.
Is what I am trying to do possible and if so, how?
Example added for clarity:
Initial table:
+-------+------+------+
| Index | Col1 | Col2 |
+-------+------+------+
| 1 | abc | def |
| 2 | ghi | jkl |
| 3 | mno | pqr |
| 4 | stu | vwx |
+-------+------+------+
Then with example aggregation operation:
For each row: reverse the accumulator, prepend Col1, append Col2, then return this value and also store it as the new accumulator.
+-------+------+------+--------------------------+
| Index | Col1 | Col2 | Accumulator |
+-------+------+------+--------------------------+
| 1 | abc | def | abcdef |
| 2 | ghi | jkl | ghifedcbajkl |
| 3 | mno | pqr | mnolkjabcdefihgpqr |
| 4 | stu | vwx | sturqpghifedcbajklonmvwx |
+-------+------+------+--------------------------+
Using a UserDefinedAggregateFunction I have been able to produce this but with an Aggregator I can only get the last row.
You don't.
My source for this is a friend who has been working on an identical problem and has now concluded it's impossible.
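For reference, the cumulative operation in the question is a sequential scan, which can be sketched in plain Java without Spark. This is only an illustration of the step rule from the example tables; the class and method names are hypothetical, the accumulator is assumed to start empty, and it says nothing about whether an Aggregator can be made to do this over a window.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Spark) of the running accumulator from the example.
class CumulativeScan {

    // One step: reverse the previous accumulator, prepend col1, append col2.
    static String step(String previous, String col1, String col2) {
        return col1 + new StringBuilder(previous).reverse() + col2;
    }

    // Returns the accumulator value after each row, in row order.
    static List<String> scan(String[][] rows) {
        List<String> out = new ArrayList<>();
        String acc = ""; // assumption: the accumulator starts empty
        for (String[] row : rows) {
            acc = step(acc, row[0], row[1]);
            out.add(acc);
        }
        return out;
    }

    public static void main(String[] args) {
        String[][] rows = {{"abc", "def"}, {"ghi", "jkl"}, {"mno", "pqr"}, {"stu", "vwx"}};
        scan(rows).forEach(System.out::println);
    }
}
```

Each output depends on the full previous state, which is why the rows have to be processed in index order.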

How to map one-to-many relationships in MyBatis?

So I have the following table that I must map to Java Objects:
+---------+-----------+---------------------+---------------------+--------+
| task_id | attribute | lastModified | activity | row_id |
+---------+-----------+---------------------+---------------------+--------+
| 1 | 1 | 2016-08-23 21:05:09 | first activity | 1 |
| 1 | 3 | 2016-08-23 21:08:28 | connect to db | 2 |
| 1 | 3 | 2016-08-23 21:08:56 | create web services | 3 |
| 1 | 4 | 2016-08-23 21:08:56 | data dump | 4 |
| 1 | 5 | 2016-08-23 21:08:56 | test cases | 5 |
| 1 | 6 | 2016-08-23 21:08:57 | dao object | 6 |
| 1 | 7 | 2016-08-23 21:08:57 | buy streetfood | 7 |
| 2 | 6 | 2016-08-23 21:08:57 | drink coke | 8 |
| 2 | 6 | 2016-08-23 21:09:00 | drink tea | 9 |
| 2 | 1 | 2016-08-23 21:12:30 | make tea | 10 |
| 2 | 2 | 2016-08-23 21:13:32 | charge phone | 11 |
| 2 | 3 | 2016-08-23 21:13:32 | shower | 12 |
| 2 | 4 | 2016-08-23 21:13:32 | sleep | 13 |
+---------+-----------+---------------------+---------------------+--------+
Here, each Task object (identified by the task_id column) has multiple Attribute objects. These Attribute objects have the lastModified and activity fields. So far my approach has been to create a Row object and have each row of the table mapped to a Row object via MyBatis, then do some Java-side processing to sort everything out. Is there a way to map this table directly via MyBatis annotations and/or XML so that the 2 Task objects are created, each holding a list of populated Attribute objects?
You may be able to solve this with the MyBatis collection element of a resultMap; see the documentation: http://www.mybatis.org/mybatis-3/sqlmap-xml.html

Set any field once for all the tests in a FitNesse table

I want to set one field in the FitNesse table only once for all the tests. For example, I want to set Operator to + for all the tests in the table.
Below is the regular table.
!|CalculatorFixture |
|Value1|Operator|Value2|calculate?|
|3.0 |+ |5.0 |8.0 |
|2.0 |* |3.5 |7.0 |
I want something like:
!| CalculatorFixture |
|Operator |
|+ |
|Value1|Value2|calculate?|
|3.0 |5.0 |8.0 |
|6.0 |3.0 |9.0 |
|5.0 |2.0 |7.0 |
Any idea how I can achieve this in the fixture or in the FitNesse table?
FYI, I am using Slim: !define TEST_SYSTEM {slim}
You can set a Java static field in a previous table fixture and then access it in the CalculatorFixture.
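A minimal sketch of the static-field approach, assuming Slim decision-table conventions (a setter per input column, a method per `?` output column). OperatorSetup is a hypothetical helper fixture, and the body of CalculatorFixture is an assumption since the original fixture is not shown. In a real FitNesse project each class would be public and live in its own file.

```java
// Hypothetical setup fixture: a one-column decision table placed before the
// calculator table stores the operator in a static field.
class OperatorSetup {
    // Shared by every fixture instance created later in the same test run.
    static String operator = "+";

    public void setOperator(String op) {
        operator = op;
    }
}

// Assumed shape of the calculator fixture, without an Operator column.
class CalculatorFixture {
    private double value1;
    private double value2;

    public void setValue1(double v) { value1 = v; }

    public void setValue2(double v) { value2 = v; }

    // Backs the "calculate?" output column; reads the operator set earlier.
    public double calculate() {
        switch (OperatorSetup.operator) {
            case "+": return value1 + value2;
            case "-": return value1 - value2;
            case "*": return value1 * value2;
            case "/": return value1 / value2;
            default:
                throw new IllegalArgumentException("Unknown operator: " + OperatorSetup.operator);
        }
    }
}
```

With this in place, a small table for OperatorSetup with a single operator column would come first on the page, and the CalculatorFixture table would keep only the Value1, Value2 and calculate? columns. Because the field is static, its value persists across tables until something sets it again.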
You can also pass 'constructor parameters' to scenarios by using having or given as the first cell after the scenario name (example from FitNesse's tests):
|scenario | myDivision _ _ _|numerator, denominator, quotient|
|setNumerator | #numerator |
|setDenominator | #denominator|
|check | quotient| #quotient |
| myDivision | having |numerator| 12|
| denominator|quotient|
| 3 |4.0 |
| 6 |2.0 |
| 4 |3.0 |

Java: Transaction processing system

I have the tables accounts and action. accounts needs to be modified according to the instruction stored in action.
In action each row contains an account-id, an action (i=insert, u=update, d=delete, x=invalid operation) and an amount by which to update the account.
On an insert, if the account already exists, an update should be done instead.
On an update, if the account does not exist, it is created by an insert.
On a delete, if the row does not exist, no action is taken.
Input
accounts:
+---id----value--+
| 1 | 1000 |
| 2 | 2000 |
| 3 | 1500 |
| 4 | 6500 |
| 5 | 500 |
+----------------+
action:
+---account_id---o---new_value---status---+
| 3 | u | 599 | |
| 6 | i | 2099 | |
| 5 | d | | |
| 7 | u | 1599 | |
| 1 | i | 399 | |
| 9 | d | | |
| 10 | x | | |
+-----------------------------------------+
Output
accounts:
+---id----value--+
| 1 | 399 |
| 2 | 800 |
| 3 | 599 |
| 4 | 1400 |
| 6 | 20099 |
| 7 | 1599 |
+----------------+
action:
+---account_id---o---new_value-------------------status----------------+
| 3 | u | 599 | Update: Success |
| 6 | i | 20099 | Update: Success |
| 5 | d | | Delete: Success |
| 7 | u | 1599 | Update: ID not found. Value inserted |
| 1 | i | 399 | Insert: Acc exists. Updated instead |
| 9 | d | | Delete: ID not found |
| 10 | x | | Invalid operation: No action taken |
+----------------------------------------------------------------------+
I am experienced with Java and JDBC, but unfortunately I just don't know how to start here.
Do I need an additional table? Do I have to use triggers?
I've seen two techniques for an upsert. With the first technique, within a transaction, you first test whether the row exists and use the result to decide whether to perform an insert or an update. With the second technique, you try performing an update and check the number of records updated (JDBC gives you this). If it's zero, you do an insert; if it's one, you're done.
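The second technique, combined with the rules from the question, can be sketched in memory as follows. This is not JDBC code: a Map stands in for the accounts table, and Map.replace plays the role of an UPDATE whose affected-row count is checked. The class name is hypothetical, new_value is assumed to be the value to store (following the column name and sample output), and "Insert: Success" is an assumed status text for a plain insert.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// In-memory sketch of the action rules; a Map stands in for the accounts table.
class ActionProcessor {
    final Map<Integer, Integer> accounts = new LinkedHashMap<>();

    // Applies one action row and returns the text to store in action.status.
    String apply(int accountId, char op, Integer newValue) {
        switch (op) {
            case 'i':
            case 'u': {
                // Update-first upsert: replace() returns null when no row matched
                // (values are assumed non-null), like an update count of zero.
                boolean updated = accounts.replace(accountId, newValue) != null;
                if (!updated) {
                    accounts.put(accountId, newValue); // fall back to an insert
                }
                if (op == 'i') {
                    return updated ? "Insert: Acc exists. Updated instead"
                                   : "Insert: Success"; // assumed status text
                }
                return updated ? "Update: Success"
                               : "Update: ID not found. Value inserted";
            }
            case 'd':
                // remove() returns null when there was nothing to delete.
                return accounts.remove(accountId) != null
                        ? "Delete: Success"
                        : "Delete: ID not found";
            default:
                return "Invalid operation: No action taken";
        }
    }
}
```

In a JDBC version, the same branches would inspect the int returned by PreparedStatement.executeUpdate() for the UPDATE and DELETE statements, with the whole batch of actions wrapped in one transaction. No extra table or trigger is needed for this approach.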
