We have a messaging logs table, and we use it to provide a search UI that lets users search messages by id, status, auditor, or date. The audit table looks like this:
+-----------+----------+---------+---------------------+
| messageId | auditor  | status  | timestamp           |
+-----------+----------+---------+---------------------+
| 10        | program1 | Failed  | 2020-08-01 10:00:00 |
| 11        | program2 | success | 2020-08-01 10:01:10 |
| 12        | program3 | Failed  | 2020-08-01 10:01:15 |
+-----------+----------+---------+---------------------+
Since a given date range could match many messages, we added pagination to the query. Now, as a new feature, we are adding another table, with a one-to-many relation, that contains tags for the possible reasons for the failure. The failure_tags table will look like this:
+-----------+----------+-------+--------+
| messageId | auditor  | type  | cause  |
+-----------+----------+-------+--------+
| 10        | program1 | type1 | cause1 |
| 10        | program1 | type1 | cause2 |
| 10        | program1 | type2 | cause3 |
+-----------+----------+-------+--------+
A general search query for status = 'Failed', using a left join with the new table, now retrieves 4 rows, as below:
+-----------+----------+-------+--------+---------------------+
| messageId | auditor  | type  | cause  | timestamp           |
+-----------+----------+-------+--------+---------------------+
| 10        | program1 | type1 | cause1 | 2020-08-01 10:00:00 |
| 10        | program1 | type1 | cause2 | 2020-08-01 10:00:00 |
| 10        | program1 | type2 | cause3 | 2020-08-01 10:00:00 |
| 12        | program3 |       |        | 2020-08-01 10:01:15 |
+-----------+----------+-------+--------+---------------------+
Since the 3 rows for messageId 10 belong to the same message, the requirement is to merge them into one element in the JSON response, so the response will have only 2 elements:
[
    {
        "messageId": "10",
        "auditor": "program1",
        "failures": [
            {
                "type": "type1",
                "cause": [
                    "cause1",
                    "cause2"
                ]
            },
            {
                "type": "type2",
                "cause": [
                    "cause3"
                ]
            }
        ],
        "date": "2020-08-01 10:00:00"
    },
    {
        "messageId": "12",
        "auditor": "program3",
        "failures": [],
        "date": "2020-08-01 10:01:15"
    }
]
Because of this merge, a pagination request for 10 elements can end up with fewer than 10 results after fetching from the database and merging.
The one solution I could think of is: after merging, if the result is smaller than the page size, initiate the search again, repeat the combining process, and take the top 10 elements. Is there a better solution that gets all the results in one query instead of going to the DB two or more times?
We use plain Spring JDBC, not JPA.
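One commonly suggested shape for doing this in a single round trip is to apply the pagination in a derived table over audit before joining, so that LIMIT/OFFSET counts messages rather than joined rows, and then merge while reading the result set. Below is a minimal sketch assuming MySQL-style LIMIT/OFFSET and Spring's JdbcTemplate; the class name, method, and the merged-map shapes are illustrative, not from the original post:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.ResultSetExtractor;

public class FailedMessageSearch {

    // Paginate the parent table in a derived table so LIMIT counts
    // messages, then LEFT JOIN the tags and merge while iterating.
    public List<Map<String, Object>> search(JdbcTemplate jdbc, int pageSize, int offset) {
        String sql =
              "SELECT a.messageId, a.auditor, a.timestamp, f.type, f.cause "
            + "FROM (SELECT messageId, auditor, timestamp FROM audit "
            + "      WHERE status = 'Failed' "
            + "      ORDER BY timestamp LIMIT ? OFFSET ?) a "
            + "LEFT JOIN failure_tags f ON f.messageId = a.messageId "
            + "ORDER BY a.timestamp";

        ResultSetExtractor<List<Map<String, Object>>> merge = rs -> {
            Map<Object, Map<String, Object>> byId = new LinkedHashMap<>();
            Map<Object, Map<String, List<String>>> failuresById = new HashMap<>();
            while (rs.next()) {
                Object id = rs.getObject("messageId");
                Map<String, Object> msg = byId.computeIfAbsent(id, k -> new LinkedHashMap<>());
                msg.put("messageId", id);
                msg.put("auditor", rs.getString("auditor"));
                msg.put("date", rs.getString("timestamp"));
                // "failures" here is a type -> causes map; reshape it to the
                // exact JSON structure above when serializing.
                Map<String, List<String>> failures =
                        failuresById.computeIfAbsent(id, k -> new LinkedHashMap<>());
                msg.put("failures", failures);
                String type = rs.getString("type");
                if (type != null) {
                    failures.computeIfAbsent(type, t -> new ArrayList<>()).add(rs.getString("cause"));
                }
            }
            return new ArrayList<>(byId.values());
        };

        return jdbc.query(sql, new Object[] { pageSize, offset }, merge);
    }
}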
I have a task to look at a database (SAP IDoc) that has specific values in it derived from segments. At the end of the mapping I have to export an XML that has a subcomponent which can have more than one row. My problem is that we have a component with two values that are separated by a qualifier.
Every transaction looks like so:
+---------+----------+--------+
| QUALF_1 | BETRG_dc | DOCNUM |
+---------+----------+--------+
| 001     | 20       | xxxxxx |
| 001     | 22       | xxxxxx |
+---------+----------+--------+

+---------+----------+--------+
| QUALF_2 | BETRG_pr | DOCNUM |
+---------+----------+--------+
| 013     | 30       | xxxxxx |
| 013     | 40       | xxxxxx |
+---------+----------+--------+
My problem is that when the two are joined with the built-in transformations, we get a Cartesian product, like so:
+--------+----------+----------+
| DOCNUM | BETRG_dc | BETRG_pr |
+--------+----------+----------+
| xxxxxx | 20       | 30       |
| xxxxxx | 20       | 40       |
| xxxxxx | 22       | 30       |
| xxxxxx | 22       | 40       |
+--------+----------+----------+
As you can see, only the first and last rows are correct.
The problem comes from the fact that if BETRG_dc is 0, the whole segment is not sent, so a filter transformation fails.
What I found out is that the segment numbers of QUALF_1 and QUALF_2 are sequential: QUALF_1 is, for example, 48 and QUALF_2 is 49.
Can you help me create a Java transformation that adds a row for a missing QUALF_1?
Here is a table of requirements:
+-------+-------+---------------+
| QUALF | BETRG | SegmentNumber |
+-------+-------+---------------+
| 013   | 20    | 48            |
| 001   | 150   | 49            |
| 013   | 15    | 57            |
| 001   | 600   | 58            |
+-------+-------+---------------+
I want the transformation to take a look, and if we have a source like this:
+-------+-------+---------------+
| QUALF | BETRG | SegmentNumber |
+-------+-------+---------------+
| 001   | 150   | 49            |
| 013   | 15    | 57            |
| 001   | 600   | 58            |
+-------+-------+---------------+
to go ahead and insert a row with segment number 48 and a BETRG value of "0".
I have tried every transformation I can think of.
The expected output should be like this:
+-------+-------+---------------+
| QUALF | BETRG | SegmentNumber |
+-------+-------+---------------+
| 013   | 0     | 48            |
| 001   | 150   | 49            |
| 013   | 15    | 57            |
| 001   | 600   | 58            |
+-------+-------+---------------+
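For clarity, the required rule can also be sketched as standalone Java, independent of the Informatica transformation; the Row class and all names below are hypothetical and only restate the requirement:

import java.util.ArrayList;
import java.util.List;

public class FillMissingQualf {

    // Hypothetical row holder for (QUALF, BETRG, SegmentNumber).
    static class Row {
        final String qualf;
        final String betrg;
        final int segmentNumber;

        Row(String qualf, String betrg, int segmentNumber) {
            this.qualf = qualf;
            this.betrg = betrg;
            this.segmentNumber = segmentNumber;
        }
    }

    // For every "001" row at segment n, ensure a "013" partner exists at
    // segment n - 1; if it does not, insert one with BETRG = "0".
    static List<Row> fill(List<Row> input) {
        List<Row> out = new ArrayList<>();
        for (Row r : input) {
            boolean hasPartner = input.stream().anyMatch(o ->
                    o.qualf.equals("013") && o.segmentNumber == r.segmentNumber - 1);
            if (r.qualf.equals("001") && !hasPartner) {
                out.add(new Row("013", "0", r.segmentNumber - 1));
            }
            out.add(r);
        }
        return out;
    }
}

Feeding it the deficient source above yields exactly the expected output: a synthesized (013, 0, 48) row followed by the original three rows.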
You should join both of the tables in a Joiner transformation.
Use a left (master) outer join and then take it into a target. Then map the BETRG column from the right table to the target and the rest of the columns from the left table.
What happens is that whenever there is no match, BETRG will be empty. Take it into an Expression transformation, check whether the value is null or empty, and change it to 0 or whatever value you wish.
Here is what I have created, but unfortunately for now it works on a row level only and not on the whole data set. I am working on making the code run properly:
QUALF_out = QUALF;
BETRG_out = BETRG;
SegmentNumber_out = SegmentNumber;

// For a "001" row, the missing "013" partner would sit one segment earlier.
if (QUALF.equals("001")) {
    segment_new = (SegmentNumber - 1);
}

myList.add(SegmentNumber);
System.out.println("SegmentNumber_out: " + segment_new);

// Note: Arrays.asList(myList) wraps the whole list inside another list, so
// contains(segment_new) checks against the list itself, not its elements.
if (Arrays.asList(myList).contains(segment_new)) {
    // Emit the synthesized "013" row with BETRG = 0.
    QUALF_out = "013";
    BETRG_out = "0";
    SegmentNumber_out = segment_new;
    generateRow();
} else {
    QUALF_out = QUALF;
    BETRG_out = BETRG;
    SegmentNumber_out = SegmentNumber;
    generateRow();
}
Here is what works:
import java.util.*;

private ArrayList<String> myList2 = new ArrayList<String>();

QUALF_out = QUALF;
BETRG_out = BETRG;
SegmentNumber_out = SegmentNumber;
DOCNUM = DOCNUM;

// Remember every QUALF/segment/document combination seen so far.
array_for_search = QUALF + ParentSegmentNumber + DOCNUM;
myList2.add(array_for_search);
System.out.println("myList: " + myList2);
System.out.println("Array: " + myList2.contains("910" + ParentSegmentNumber + DOCNUM));

// If no "910" row exists yet for this segment/document, synthesize one.
if (!myList2.contains("910" + ParentSegmentNumber + DOCNUM)) {
    QUALF_out = "910";
    BETRG_out = "0";
}
generateRow();
I'm new to the Spark Java API. I want to filter out the rows of my Dataset where a column is not a number. My dataset ds1 is something like this:
+---------+--------+
| account | amount |
+---------+--------+
| aaaaaa  |        |
| aaaaaa  |        |
| bbbbbb  |        |
| 123333  |        |
| 555555  |        |
| 666666  |        |
+---------+--------+
I want to return a Dataset ds2 like this:
+---------+--------+
| account | amount |
+---------+--------+
| 123333  |        |
| 555555  |        |
| 666666  |        |
+---------+--------+
I tried this, but it doesn't work for me:
ds2 = ds1.select("account").where(dsFec.col("account").isNaN());
Can someone please guide me with a sample Spark expression to resolve this?
You can define a UDF to check whether the string in the account column is numeric or not:
import org.apache.commons.lang3.StringUtils;
import org.apache.spark.sql.api.java.UDF1;

UDF1<String, Boolean> checkNumeric = new UDF1<String, Boolean>() {
    @Override
    public Boolean call(final String account) throws Exception {
        return StringUtils.isNumeric(account);
    }
};
sqlContext.udf().register("numeric", checkNumeric, DataTypes.BooleanType);
and then use the callUDF function to call it:
df.filter(callUDF("numeric", col("account"))).show();
which should give you
+-------+------+
|account|amount|
+-------+------+
| 123333| |
| 555555| |
| 666666| |
+-------+------+
Just cast and check whether the result is null:
ds1.select("account").where(dsFec.col("account").cast("bigint").isNotNull());
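For completeness, the same cast-and-null-check can be wrapped in a self-contained Java program (a sketch; the session setup and the Account bean are illustrative, not from the question):

import java.util.Arrays;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;

public class NumericAccountFilter {

    // Simple bean used only to build sample rows.
    public static class Account implements java.io.Serializable {
        private String account;
        private String amount;

        public Account() {}
        public Account(String account, String amount) {
            this.account = account;
            this.amount = amount;
        }
        public String getAccount() { return account; }
        public void setAccount(String account) { this.account = account; }
        public String getAmount() { return amount; }
        public void setAmount(String amount) { this.amount = amount; }
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("NumericAccountFilter")
                .master("local[*]")
                .getOrCreate();

        Dataset<Row> ds1 = spark.createDataFrame(Arrays.asList(
                new Account("aaaaaa", ""),
                new Account("123333", ""),
                new Account("555555", "")), Account.class);

        // Non-numeric strings cast to null, so isNotNull keeps numeric rows.
        Dataset<Row> ds2 = ds1.filter(col("account").cast("bigint").isNotNull());
        ds2.show();

        spark.stop();
    }
}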
One way to do this:
Scala Equivalent:
import scala.util.Try
df.filter(r => Try(r.getString(0).toInt).isSuccess).show()
+-------+------+
|account|amount|
+-------+------+
| 123333| |
| 555555| |
| 666666| |
+-------+------+
Or you can do the same using a try/catch:
df.map(r => (r.getString(0), r.getString(1), {
    try {
      r.getString(0).toInt
      true
    } catch {
      case runtime: RuntimeException => false
    }
  }))
  .filter(_._3 == true)
  .drop("_3")
  .show()
+------+---+
| _1| _2|
+------+---+
|123333| |
|555555| |
|666666| |
+------+---+
I have an Apache Spark DataFrame in the following format:
| ID | groupId | phaseName |
|----|-----------|-----------|
| 10 | someHash1 | PhaseA |
| 11 | someHash1 | PhaseB |
| 12 | someHash1 | PhaseB |
| 13 | someHash2 | PhaseX |
| 14 | someHash2 | PhaseY |
Each row represents a phase in a procedure that consists of several of these phases. The ID column represents the sequential order of phases and the groupId column shows which phases belong together.
I want to add a new column to the DataFrame: previousPhaseName. This column should indicate the previous different phase from the same procedure. The first phase of a procedure (the one with the minimum ID) will have null as its previous phase. When a phase occurs twice or more, the second (third, ...) occurrence will have the same previousPhaseName. For example:
df =
| ID | groupId | phaseName | prevPhaseName |
|----|-----------|-----------|---------------|
| 10 | someHash1 | PhaseA | null |
| 11 | someHash1 | PhaseB | PhaseA |
| 12 | someHash1 | PhaseB | PhaseA |
| 13 | someHash2 | PhaseX | null |
| 14 | someHash2 | PhaseY | PhaseX |
I am not sure how to implement this. My first approach would be:
create a second empty dataframe df2
for each row in df:
    find the row with groupId = row.groupId, ID < row.ID, and maximum ID
    add this row to df2
join df and df2
Partial Solution using Window Functions
I used window functions to aggregate the name of the previous phase, the number of previous occurrences (not necessarily in a row) of the current phase in the group, and whether the current and previous phase names are equal:
import static org.apache.spark.sql.functions.*;

WindowSpec windowSpecPrev = Window
        .partitionBy(df.col("groupId"))
        .orderBy(df.col("ID"));

WindowSpec windowSpecCount = Window
        .partitionBy(df.col("groupId"), df.col("phaseName"))
        .orderBy(df.col("ID"))
        .rowsBetween(Long.MIN_VALUE, 0);

df = df
        .withColumn("prevPhase", lag("phaseName", 1).over(windowSpecPrev))
        .withColumn("phaseCount", count("phaseName").over(windowSpecCount))
        .withColumn("prevSame", when(col("prevPhase").equalTo(col("phaseName")), 1).otherwise(0));
df =
| ID | groupId | phaseName | prevPhase | phaseCount | prevSame |
|----|-----------|-----------|-------------|------------|----------|
| 10 | someHash1 | PhaseA | null | 1 | 0 |
| 11 | someHash1 | PhaseB | PhaseA | 1 | 0 |
| 12 | someHash1 | PhaseB | PhaseB | 2 | 1 |
| 13 | someHash2 | PhaseX | null | 1 | 0 |
| 14 | someHash2 | PhaseY | PhaseX | 1 | 0 |
This is still not what I wanted to achieve, but it is good enough for now.
Further Ideas
To get the name of the previous distinct phase, I see three possibilities that I have not investigated thoroughly:
Implement my own lag function that does not take an offset but recursively checks the previous row until it finds a value different from the current row's. (Though I don't think it is possible to use custom analytic window functions in Spark SQL.)
Find a way to dynamically set the offset of the lag function according to the value of phaseCount. (That may fail if the previous occurrences of the same phaseName do not appear in a single sequence.)
Use a UserDefinedAggregateFunction over the window that stores the ID and phaseName of the first given input and looks for the highest ID with a different phaseName.
I was able to solve this problem in the following way:
Get the (ordinary) previous phase.
Introduce a new id that groups phases occurring in sequential order (with the help of this answer). This takes two steps: first, checking whether the current and previous phase names are equal and assigning a seqCount value accordingly; second, computing a cumulative sum over this value.
Assign the previous phase of the first row of a sequential group to all of its members.
Implementation
import static org.apache.spark.sql.functions.*;

WindowSpec specGroup = Window.partitionBy(col("groupId"))
        .orderBy(col("ID"));

WindowSpec specSeqGroupId = Window.partitionBy(col("groupId"))
        .orderBy(col("ID"))
        .rowsBetween(Long.MIN_VALUE, 0);

WindowSpec specPrevDiff = Window.partitionBy(col("groupId"), col("seqGroupId"))
        .orderBy(col("ID"))
        .rowsBetween(Long.MIN_VALUE, 0);

df.withColumn("prevPhase", coalesce(lag("phaseName", 1).over(specGroup), lit("NO_PREV")))
  .withColumn("seqCount", when(col("prevPhase").equalTo(col("phaseName")).or(col("prevPhase").equalTo("NO_PREV")), 0).otherwise(1))
  .withColumn("seqGroupId", sum("seqCount").over(specSeqGroupId))
  .withColumn("prevDiff", first("prevPhase").over(specPrevDiff));
Result
df =
| ID | groupId | phaseName | prevPhase | seqCount | seqGroupId | prevDiff |
|----|-----------|-----------|-----------|----------|------------|----------|
| 10 | someHash1 | PhaseA | NO_PREV | 0 | 0 | NO_PREV |
| 11 | someHash1 | PhaseB | PhaseA | 1 | 1 | PhaseA |
| 12 | someHash1 | PhaseB | PhaseA | 0 | 1 | PhaseA |
| 13 | someHash2 | PhaseX | NO_PREV | 0 | 0 | NO_PREV |
| 14 | someHash2 | PhaseY | PhaseX | 1 | 1 | PhaseX |
Any suggestions, especially regarding the efficiency of these operations, are appreciated.
I guess you can use Spark window (row frame) functions. Check the API documentation and the following post:
https://databricks.com/blog/2015/07/15/introducing-window-functions-in-spark-sql.html
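For example, a basic lag over such a window in the Java API looks like the snippet below (assuming the df from the question; note that a plain lag returns the immediately preceding phase, not yet the previous distinct phase):

import org.apache.spark.sql.expressions.Window;
import org.apache.spark.sql.expressions.WindowSpec;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.lag;

WindowSpec byGroup = Window.partitionBy(col("groupId")).orderBy(col("ID"));
df.withColumn("prevPhaseName", lag(col("phaseName"), 1).over(byGroup)).show();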
I need a solution for my problem here.
I have 2 tables, assetdetail and assetcondition. Here is the structure of those tables:
assetdetail
+---------------+-----------+-----------+-----------------+
| sequenceindex | assetcode | assetname | acquisitionyear |
+---------------+-----------+-----------+-----------------+
| 1             | 110       | Car       | 2012-06-30      |
| 2             | 111       | Bus       | 2013-02-12      |
+---------------+-----------+-----------+-----------------+
assetcondition
+---------------+------------------+------------+-------------+------------+
| sequenceindex | indexassetdetail | fiscalyear | assetamount | assetprice |
+---------------+------------------+------------+-------------+------------+
| 1             | 1                | 2012       | 1           | 20000000   |
| 2             | 1                | 2013       | 1           | 15000000   |
| 3             | 2                | 2013       | 1           | 25000000   |
+---------------+------------------+------------+-------------+------------+
And I want the result to be like this:
+-----------+------------+
| assetname | assetprice |
+-----------+------------+
| Car       | 20000000   |
| Bus       | 25000000   |
+-----------+------------+
Note: using "SELECT WHERE fiscalyear = "
Without explaining how your tables are linked, one can only guess. Here's the query I came up with:
select assetdetail.assetname,
sum( assetcondition.assetprice )
from assetdetail
inner join assetcondition
on assetcondition.indexassetdetail = assetdetail.sequenceindex
where assetcondition.fiscalyear = 2013
group by assetdetail.assetname;
I don't fully understand your query from a logical point of view. In any case, the operator you have to use is JOIN.
The SQL that follows may or may not be what you want:
Select assetname, assetprice
From assetdetail as ad join assetcondition as ac on (ad.sequenceindex = ac.sequenceindex)
Where fiscalyear = '2013'
Not quite sure if it is what you're looking for, but I guess what you want is a JOIN:
SELECT
assetdetail.assetname, assetcondition.assetprice
FROM
assetdetail
JOIN
assetcondition
ON
assetdetail.sequenceindex = assetcondition.sequenceindex
WHERE
YEAR(acquisitionyear) = '2013'
I have the following client_question table:
+----+------------+---------+-----+------+------+
| id | is_deleted | version | cid | pqid | qtid |
+----+------------+---------+-----+------+------+
| 1  |            | 0       | 1   | 1    | 1    |
| 2  |            | 0       | 1   | 2    | 4    |
| 3  |            | 0       | 1   | 2    | 4    |
+----+------------+---------+-----+------+------+
This is the Parent_question table:
+----+------------+---------+-----+------+
| id | is_deleted | version | pid | qid  |
+----+------------+---------+-----+------+
| 1  |            | 0       | 1   | 1    |
| 2  |            | 0       | 1   | 2    |
| 3  |            | 0       | 1   | 3    |
| 4  |            | 0       | 1   | 4    |
| 5  |            | 0       | 1   | 5    |
| 6  |            | 0       | 1   | 6    |
| 7  |            | 0       | 2   | 7    |
| 8  |            | 0       | 2   | 1    |
| 9  |            | 0       | 2   | 2    |
| 10 |            | 0       | 2   | 8    |
| 11 |            | 0       | 3   | 9    |
| 12 |            | 0       | 3   | 1    |
| 13 |            | 0       | 3   | 10   |
| 14 |            | 0       | 3   | 11   |
| 15 |            | 0       | 4   | 12   |
+----+------------+---------+-----+------+
And this is the question_option table:
+----+------------+-----------+---------+
| id | is_deleted | name      | version |
+----+------------+-----------+---------+
| 1  |            | Excellent | 0       |
| 2  |            | Good      | 0       |
| 3  |            | Fair      | 0       |
| 4  |            | Poor      | 0       |
+----+------------+-----------+---------+
I want to retrieve JSON and send it to the front end via AJAX, so I tried this:
public List<Map<String, Object>> getSavedQuestionOptions(Long parentId, long clientId) {
    Client client = (Client) entityManagerUtil.find(Client.class, clientId);
    List<ClientQuestionOption> questionsList =
            (List<ClientQuestionOption>) serviceClientDaoImpl.getSavedQuestionOptionsList(parentId, client);
    System.out.println("The size is " + questionsList.size());

    List<Map<String, Object>> optionsList = new ArrayList<>();
    for (ClientQuestionOption option : questionsList) {
        Map<String, Object> map = new HashMap<>();
        map.put("qid", option.getCqid().getPqid().getQid().getId());
        map.put("name", option.getOid().getName());
        optionsList.add(map);
    }
    return optionsList;
}
The JSON I get is like this:
[
    {
        "name": "Excellent",
        "qid": 2
    },
    {
        "name": "Poor",
        "qid": 2
    }
],
But I want the JSON like this:
{
    "options": [
        "Poor",
        "Excellent"
    ],
    "qid": 2
},
Can anybody please tell me how to do so?
Edit:
This is how I am building the JSON:
JSONObject object = new JSONObject();
List<Map<String, Object>> optionslist =
        serviceClientServiceImpl.getSavedQuestionOptions(parentId, Long.valueOf(clientId));
object.accumulate("optionslist", optionslist);
return object.toString();
First, realise that what the function returns (an ArrayList) is not JSON at all; the conversion seems to happen elsewhere. You'll need to replace JSONObject() with JSONArray() to change the {..} wrapper into [..].
Second, to do the grouping per qid you could write a function that takes a List<ClientQuestionOption> as input and returns the grouped structure, as in the sketch below.
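For instance, such a grouping function could look like the sketch below, which collapses flat (name, qid) rows into one entry per qid; all names are illustrative:

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class OptionGrouper {

    // Collapse flat {"name": ..., "qid": ...} rows into one
    // {"qid": ..., "options": [...]} entry per qid.
    public static List<Map<String, Object>> groupByQid(List<Map<String, Object>> flat) {
        Map<Object, Map<String, Object>> byQid = new LinkedHashMap<>();
        for (Map<String, Object> row : flat) {
            Map<String, Object> entry = byQid.computeIfAbsent(row.get("qid"), qid -> {
                Map<String, Object> m = new LinkedHashMap<>();
                m.put("qid", qid);
                m.put("options", new ArrayList<String>());
                return m;
            });
            @SuppressWarnings("unchecked")
            List<String> options = (List<String>) entry.get("options");
            options.add((String) row.get("name"));
        }
        return new ArrayList<>(byQid.values());
    }
}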
You should check out Google's GSON project; here is what your code would look like:
public class ClientQuestionOption {
    String name;
    int qid;

    public ClientQuestionOption() {
    }

    public ClientQuestionOption(String name, int qid) {
        this.name = name;
        this.qid = qid;
    }
}
and then use it this way:
public static void main(String[] args) {
    List<ClientQuestionOption> questionOptions = Arrays.asList(
            new ClientQuestionOption("Excellent", 1),
            new ClientQuestionOption("Excellent", 2),
            new ClientQuestionOption("Poor", 3));

    Gson gson = new GsonBuilder().setPrettyPrinting().create();
    System.out.println(gson.toJson(questionOptions));
}
You only need a class whose field names match the JSON keys (or declare a qualifier such as GSON's @SerializedName; see the sketch after the output below), and GSON will handle the conversion to and from JSON.
Here is the output for the above code:
[
    {
        "name": "Excellent",
        "qid": 1
    },
    {
        "name": "Excellent",
        "qid": 2
    },
    {
        "name": "Poor",
        "qid": 3
    }
]
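As an aside, if a Java field name needs to differ from the JSON key, GSON's @SerializedName annotation bridges the two; the class below is illustrative:

import com.google.gson.annotations.SerializedName;

public class QuestionOption {
    // Serialized as "qid" even though the field is named questionId.
    @SerializedName("qid")
    int questionId;

    String name;
}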