I've got a few columns in my table. I want to choose one of them and return all of the records where that column's value is duplicated, i.e. pick one column, check which of its values appear more than once in the table, and return those records. Let's say the table looks like this:
id;col1;col2;col3;col4
'1','ab','cd','ef','1'
'2','ad','bg','ee','5'
'3','xx','bg','cc','6'
'4','vv','zz','ff','4'
'5','zz','ee','gg','4'
'6','zz','vv','zz','2'
'7','vv','aa','bb','8'
'8','ww','nn','zz','4'
'9','zz','yy','ff','9'
'10','qq','oo','ii','3'
and I want my result for col1 to look like so
4,'vv','zz','ff',4
5,'zz','ee','gg',4
6,'zz','vv','zz',2
7,'vv','aa','bb',8
9,'zz','yy','ff',9
Here we present the duplicates in 2 different ways. The first is the format you have requested, with additional information. The second is more concise.
create table t1(
id varchar(10),
col1 varchar(10),
col2 varchar(10),
col3 varchar(10),
col4 varchar(10));
insert into t1 values
('1','ab','cd','ef','1'),
('2','ad','bg','ee','5'),
('3','xx','bg','cc','6'),
('4','vv','zz','ff','4'),
('5','zz','ee','gg','4'),
('6','zz','vv','zz','2'),
('7','vv','aa','bb','8'),
('8','ww','nn','zz','4'),
('9','zz','yy','ff','9'),
('10','qq','oo','ii','3');
select * from t1;
id | col1 | col2 | col3 | col4
:- | :--- | :--- | :--- | :---
1 | ab | cd | ef | 1
2 | ad | bg | ee | 5
3 | xx | bg | cc | 6
4 | vv | zz | ff | 4
5 | zz | ee | gg | 4
6 | zz | vv | zz | 2
7 | vv | aa | bb | 8
8 | ww | nn | zz | 4
9 | zz | yy | ff | 9
10 | qq | oo | ii | 3
with cte as(
select
id,
col1,
col2,
col3,
col4,
row_number() over
( partition by col1
order by id desc) r1,
row_number() over
( partition by col2
order by id desc) r2,
row_number() over
( partition by col3
order by id desc) r3,
row_number() over
( partition by col4
order by id desc) r4
from t1
)
select *
from cte
where
r1 > 1
or r2 > 1
or r3 > 1
or r4 > 1;
id | col1 | col2 | col3 | col4 | r1 | r2 | r3 | r4
:- | :--- | :--- | :--- | :--- | -: | -: | -: | -:
6 | zz | vv | zz | 2 | 2 | 1 | 2 | 1
5 | zz | ee | gg | 4 | 3 | 1 | 1 | 2
4 | vv | zz | ff | 4 | 2 | 1 | 2 | 3
2 | ad | bg | ee | 5 | 1 | 2 | 1 | 1
select 'col1' as "column",
col1 "value",
count(id) "count"
from t1 group by col1
having count(id)>1
union all
select 'col2',col2, count(id)
from t1 group by col2
having count(id)>1
union all
select 'col3',col3, count(id)
from t1 group by col3
having count(id)>1
union all
select 'col4',col4, count(id)
from t1 group by col4
having count(id)>1
order by "column","value";
column | value | count
:----- | :---- | ----:
col1 | vv | 2
col1 | zz | 3
col2 | bg | 2
col3 | ff | 2
col3 | zz | 2
col4 | 4 | 3
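If only the col1 duplicates are wanted, as in the requested result, a single windowed count is enough. This is a minimal sketch against the same t1 table (assuming an engine with window functions, as already used above):
select id, col1, col2, col3, col4
from (
    select t1.*,
           count(*) over (partition by col1) as cnt
    from t1
) d
where cnt > 1
order by id;  -- id is text in this sample, so the ordering is lexicographic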
I am using this select:
SELECT
asl.id, asl.outstanding_principal as outstandingPrincipal, the_date as theDate, asl.interest_rate as interestRate, asl.interest_payment as interestPayment, asl.principal_payment as principalPayment,
asl.total_payment as totalPayment, asl.actual_delta as actualDelta, asl.outstanding_usd as outstandingUsd, asl.disbursement, asl.floating_index_rate as floatingIndexRate,
asl.upfront_fee as upfrontFee, asl.commitment_fee as commitmentFee, asl.other_fee as otherFee, asl.withholding_tax as withholdingTax, asl.default_fee as defaultFee,
asl.prepayment_fee as prepaymentFee, asl.total_out_flows as totalOutFlows, asl.net_flows as netFlows, asl.modified, asl.new_row as newRow, asl.interest_payment_modified as
interestPaymentModified, asl.date, asl.amortization_schedule_initial_id as amortizationScheduleInitialId, asl.tranche_id as trancheId, asl.user_id as userId, tr.local_currency_id as localCurrencyId,
f.facility_id
FROM
GENERATE_SERIES
(
(SELECT MIN(ams.date) FROM amortization_schedules ams),
(SELECT MAX(ams.date) + INTERVAL '1' MONTH FROM amortization_schedules ams),
'1 MONTH'
) AS tab (the_date)
FULL JOIN amortization_schedules asl on to_char(the_date, 'yyyy-mm') = to_char(asl.date, 'yyyy-mm')
LEFT JOIN tranches tr ON asl.tranche_id = tr.id
LEFT JOIN facilities f on tr.facility_id = f.id
In this select, I'm using generate_series to produce a row for each month, since the database does not have a record for every month. The select works, but it gives me more rows than I need. I use it in my Spring Boot application, where I need all of the generated months but filtered, for example, to a certain facility_id. When I add a condition
WHERE f.id = :id and tr.tranche_number_id = :trancheNumberId
the generate_series effectively stops working (as I understand it, because the conditions restrict the rows it can be joined to), and instead of 30 rows I get only 3.
How do I keep generating theDate by month while still being able to filter by specific IDs?
I tried different options.
With this option:
FULL JOIN amortization_schedules asl on to_char(the_date, 'yyyy-mm') = to_char(asl.date, 'yyyy-mm')
| id | outstandingprincipal | thedate |
-------------------------------------------------------------------
| 1 | 10000 | 2022-05-16 00:00:00.000000 |
| 2 | 50000 | 2023-05-16 00:00:00.000000 |
| 3 | 0 | 2024-05-16 00:00:00.000000 |
In this case it does not work correctly: the months are not generated and only three rows are returned (when the condition is to_char(the_date, 'yyyy-mm') = to_char(asl.date, 'yyyy-mm')).
If I change it to to_char(the_date, 'yyyy') = to_char(asl.date, 'yyyy'), the generation works, but the result is still wrong because the match is only year-based:
| id | outstandingprincipal | thedate |
-------------------------------------------------------------------
| 1 | 10000 | 2022-05-16 00:00:00.000000 |
| 1 | 10000 | 2022-06-16 00:00:00.000000 |
| 1 | 10000 | 2022-06-16 00:00:00.000000 |
| 1 | 10000 | 2022-07-16 00:00:00.000000 |
... ... ....
| 1 | 10000 | 2022-12-16 00:00:00.000000 |
| 2 | 50000 | 2023-01-16 00:00:00.000000 |
| 2 | 50000 | 2023-02-16 00:00:00.000000 |
| 2 | 50000 | 2023-03-16 00:00:00.000000 |
| 2 | 50000 | 2023-04-16 00:00:00.000000 |
... ... ....
| 3 | 0 | 2024-01-16 00:00:00.000000 |
but it should be:
| id | outstandingprincipal | thedate |
-------------------------------------------------------------------
| 1 | 10000 | 2022-05-16 00:00:00.000000 |
| 1 | 10000 | 2022-06-16 00:00:00.000000 |
| 1 | 10000 | 2022-06-16 00:00:00.000000 |
| 1 | 10000 | 2022-07-16 00:00:00.000000 |
... ... ....
| 1 | 10000 | 2023-04-16 00:00:00.000000 |
| 2 | 50000 | 2023-05-16 00:00:00.000000 |
| 2 | 50000 | 2023-06-16 00:00:00.000000 |
| 2 | 50000 | 2023-07-16 00:00:00.000000 |
| 2 | 50000 | 2023-08-16 00:00:00.000000 |
... ... ....
| 3 | 0 | 2024-05-16 00:00:00.000000 |
| 3 | 0 | 2024-06-16 00:00:00.000000 |
| 3 | 0 | 2024-07-16 00:00:00.000000 |
I'm making a few intuitive leaps here, so if something looks off it might be because I don't have the entire picture.
From what I can tell you want the amortization schedule starting from the "date" for each ID and then going out a specific amount of time. I am guessing it is not truly the max date in that entire table, and that it varies by ID. In your example you went out one year, so for now I'm going with that.
You can use a generate_series inline, which will explode out each row. I believe something like this will give you the output you seek:
with schedule as (
select
id,
generate_series (date, date + interval '1 year', interval '1 month')::date as dt
from
amortization_schedules
)
select
asl.id, s.dt, asl.outstanding_principal
from
amortization_schedules asl
join schedule s on asl.id = s.id
JOIN tranches tr ON asl.tranche_id = tr.id
JOIN facilities f on tr.facility_id = f.id
WHERE
f.id = :id and
tr.tranche_number_id = :trancheNumberId
Is there another field that tells, by id, when the payments should end or one that will let us derive it (number of payments, payment end date, etc)?
One final note: if you use [left] outer joins with a where clause, as below:
LEFT JOIN tranches tr ON asl.tranche_id = tr.id
LEFT JOIN facilities f on tr.facility_id = f.id
WHERE
f.id = :id and
tr.tranche_number_id = :trancheNumberId
You have effectively nullified the "left" and made these inner joins. In this case, get rid of "left," not because it will return wrong results but because it misleads. You are saying those fields must have those specific values, which means they must first exist. That's an inner join.
If you truly wanted these as left joins, this would have been more appropriate, but I don't think this is what you meant:
LEFT JOIN tranches tr ON
asl.tranche_id = tr.id and
tr.tranche_number_id = :trancheNumberId
LEFT JOIN facilities f on
tr.facility_id = f.id and
f.id = :id
I have a dataframe df1 of the format
+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| A | z | m |
| B | w | n |
| C | x | o |
| A | z | n |
| A | p | o |
+------+------+------+
and another dataframe df2 of the format
+------+------+
| Col1 | Col2 |
+------+------+
| 0-A | 0-z |
| 1-B | 3-w |
| 2-C | 1-x |
| | 2-P |
+------+------+
I am trying to replace the values in Col1 and Col2 of df1 with values from df2 using Spark Java.
The end dataframe df3 should look like this.
+------+------+------+
| Col1 | Col2 | Col3 |
+------+------+------+
| 0-A | 0-z | m |
| 1-B | 3-w | n |
| 2-C | 1-x | o |
| 0-A | 0-z | n |
| 0-A | 2-p | o |
+------+------+------+
I am trying to replace all the values in Col1 and Col2 of df1 with the corresponding values from Col1 and Col2 of df2.
Is there any way that I can achieve this in Spark Java dataframe syntax?
The initial idea I had was to do the following:
String pattern1="\\p{L}+(?: \\p{L}+)*$";
df1=df1.join(df2, df1.col("col1").equalTo(regexp_extract(df2.col("col1"),pattern1,1)),"left-semi");
Replace your last join operation with the join below.
df1.alias("x").join(df2.alias("y").select(col("y.Col1").alias("newCol1")), col("x.Col1") === regexp_extract(col("newCol1"),"\\p{L}+(?: \\p{L}+)*$",0), "left")
.withColumn("Col1", col("newCol1"))
.join(df2.alias("z").select(col("z.Col2").alias("newCol2")), col("x.Col2") === regexp_extract(col("newCol2"),"\\p{L}+(?: \\p{L}+)*$",0), "left")
.withColumn("Col2", col("newCol2"))
.drop("newCol1", "newCol2")
.show(false)
+----+----+----+
|Col1|Col2|Col3|
+----+----+----+
|2-C |1-x |o |
|0-A |0-z |m |
|0-A |0-z |n |
|0-A |2-p |o |
|1-B |3-w |n |
+----+----+----+
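If the chained column-API calls are hard to follow, the same left-join-plus-regexp_extract idea can also be written in Spark SQL after registering the dataframes as temporary views (the view names df1 and df2 below are assumptions). This is only a sketch; COALESCE keeps the original value when no mapping row matches, and it assumes the trailing letters match case exactly:
-- e.g. df1.createOrReplaceTempView("df1"); df2.createOrReplaceTempView("df2");
SELECT COALESCE(y.Col1, d.Col1) AS Col1,
       COALESCE(z.Col2, d.Col2) AS Col2,
       d.Col3
FROM df1 d
LEFT JOIN df2 y ON d.Col1 = regexp_extract(y.Col1, '\\p{L}+(?: \\p{L}+)*$', 0)
LEFT JOIN df2 z ON d.Col2 = regexp_extract(z.Col2, '\\p{L}+(?: \\p{L}+)*$', 0)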
I have an Oracle Database with a table containing asset price data for which I want to calculate a 10-day moving average in a separate column.
Is it faster to use SQL for this, or should I load the data into a Java HashMap/ArrayList first, do the calculation, and transfer the results back to the Oracle DB afterwards?
The table looks like this:
| ASSET_ID | PRICE | DATE | MA |
-----------------------------------------
| 43 | 33.12 | 2018-09-17 | 33.05 |
| 43 | 34.02 | 2018-09-18 | 33.07 |
| 43 | 30.22 | 2018-09-19 | 33.01 |
| 43 | 31.52 | 2018-09-20 | 32.85 |
Use AVG( PRICE ) OVER ( PARTITION BY asset_id ORDER BY "DATE" RANGE BETWEEN 10 PRECEDING AND 0 FOLLOWING ) to get a moving average:
Oracle 11g R2 Schema Setup:
CREATE TABLE table_name ( ASSET_ID, PRICE, "DATE", MA ) AS
SELECT 43, 33.12, DATE '2018-09-17', CAST( NULL AS NUMBER(8,2) ) FROM DUAL UNION ALL
SELECT 43, 34.02, DATE '2018-09-18', NULL FROM DUAL UNION ALL
SELECT 43, 30.22, DATE '2018-09-19', NULL FROM DUAL UNION ALL
SELECT 43, 31.52, DATE '2018-09-20', NULL FROM DUAL UNION ALL
SELECT 43, 32.52, DATE '2018-09-21', NULL FROM DUAL UNION ALL
SELECT 43, 33.52, DATE '2018-09-22', NULL FROM DUAL UNION ALL
SELECT 43, 34.52, DATE '2018-09-23', NULL FROM DUAL UNION ALL
SELECT 43, 35.52, DATE '2018-09-24', NULL FROM DUAL UNION ALL
SELECT 43, 36.52, DATE '2018-09-25', NULL FROM DUAL UNION ALL
SELECT 43, 37.52, DATE '2018-09-26', NULL FROM DUAL UNION ALL
SELECT 43, 38.52, DATE '2018-09-27', NULL FROM DUAL UNION ALL
SELECT 43, 39.52, DATE '2018-09-28', NULL FROM DUAL UNION ALL
SELECT 43, 40.52, DATE '2018-09-29', NULL FROM DUAL UNION ALL
SELECT 43, 41.52, DATE '2018-09-30', NULL FROM DUAL;
Query 1:
MERGE INTO table_name dst
USING (
SELECT ROWID rid,
ROUND(
AVG( price ) OVER (
PARTITION BY asset_id
ORDER BY "DATE"
RANGE BETWEEN 10 PRECEDING AND 0 FOLLOWING
),
2
) AS new_MA
FROM table_name
) src
ON ( dst.ROWID = src.rid )
WHEN MATCHED THEN
UPDATE SET MA = src.new_MA
Results:
14 Rows Updated
Query 2:
SELECT *
FROM table_name
Results:
| ASSET_ID | PRICE | DATE | MA |
|----------|-------|----------------------|-------|
| 43 | 33.12 | 2018-09-17T00:00:00Z | 33.12 |
| 43 | 34.02 | 2018-09-18T00:00:00Z | 33.57 |
| 43 | 30.22 | 2018-09-19T00:00:00Z | 32.45 |
| 43 | 31.52 | 2018-09-20T00:00:00Z | 32.22 |
| 43 | 32.52 | 2018-09-21T00:00:00Z | 32.28 |
| 43 | 33.52 | 2018-09-22T00:00:00Z | 32.49 |
| 43 | 34.52 | 2018-09-23T00:00:00Z | 32.78 |
| 43 | 35.52 | 2018-09-24T00:00:00Z | 33.12 |
| 43 | 36.52 | 2018-09-25T00:00:00Z | 33.5 |
| 43 | 37.52 | 2018-09-26T00:00:00Z | 33.9 |
| 43 | 38.52 | 2018-09-27T00:00:00Z | 34.32 |
| 43 | 39.52 | 2018-09-28T00:00:00Z | 34.9 |
| 43 | 40.52 | 2018-09-29T00:00:00Z | 35.49 |
| 43 | 41.52 | 2018-09-30T00:00:00Z | 36.52 |
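One thing worth noting: because the ORDER BY column is a DATE, the RANGE frame of 10 PRECEDING counts calendar days, so gaps (weekends, missing quotes) simply shrink the window. If a moving average over the last 10 rows actually present in the table is intended instead, a ROWS frame is the usual alternative; a sketch against the same table_name:
SELECT t.*,
       ROUND(
         AVG( price ) OVER (
           PARTITION BY asset_id
           ORDER BY "DATE"
           ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
         ),
         2
       ) AS ma_10_rows
FROM table_name t;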
Here is the table with the employees' logs:
What I want is to generate the time in and time out of each employee, like this:
Can anyone help me with this? Any logic or algorithm is welcome.
This is one way, and it will work for day or night shifts, provided the first (minimum) datetime of each pair represents the IN punch.
select t.enno
,max(datetime) as time_out
,min(datetime) as time_in
,time_to_sec(timediff(max(datetime), min(datetime) )) / 3600
as No_of_hours
from
(
SELECT
floor(@row1 := @row1 + 0.5) as day,
t.*
FROM Table4356 t,
(SELECT @row1 := 0.5) r1
order by t.datetime
) t
group by t.day,t.enno
;
Output
+------+---------------------+---------------------+-------------+
| enno | time_out | time_in | No_of_hours |
+------+---------------------+---------------------+-------------+
| 6 | 16.05.2017 06:30:50 | 15.05.2017 18:30:50 | 12,0000 |
| 6 | 17.05.2017 05:30:50 | 16.05.2017 18:10:50 | 11,3333 |
+------+---------------------+---------------------+-------------+
Explanation:
SELECT
floor(@row1 := @row1 + 0.5) as day,
t.*
FROM Table4356 t,
(SELECT @row1 := 0.5) r1
order by t.datetime
This query increments the user variable @row1 by 0.5 for each row, so you get 1, 1.5, 2, 2.5. If you then take just the integer part with floor, you generate a sequence like 1, 1, 2, 2, which pairs every two consecutive rows. So this query gives you this output:
+-----+------+---------------------+
| day | enno | datetime |
+-----+------+---------------------+
| 1 | 6 | 15.05.2017 18:30:50 |
| 1 | 6 | 16.05.2017 06:30:50 |
| 2 | 6 | 16.05.2017 18:10:50 |
| 2 | 6 | 17.05.2017 05:30:50 |
+-----+------+---------------------+
Now you can group by day and get max and min time.
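For what it is worth, on MySQL 8 or later the same pairing can be done without user variables: number each employee's punches in time order and put every two consecutive punches into one shift. A sketch, assuming the same Table4356 columns (enno, datetime) and strict IN/OUT alternation:
select enno,
       min(datetime) as time_in,
       max(datetime) as time_out,
       time_to_sec(timediff(max(datetime), min(datetime))) / 3600 as no_of_hours
from (
    select t.*,
           floor((row_number() over (partition by enno order by datetime) + 1) / 2) as shift_no
    from Table4356 t
) x
group by enno, shift_no;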
I have a requirement to pick the top 2 students within each subject. Here is my table and the query I am using to get that.
CREATE TABLE `students` (
`student` varchar(10) DEFAULT NULL,
`subject` varchar(10) DEFAULT NULL,
`marks` int(10) DEFAULT NULL
);
INSERT INTO students VALUES
('Deepak', 'Maths', 100),
('Neha', 'Maths', 90),
('Jyoti', 'Maths', 80),
('Ashwini', 'Maths', 70),
('Amit', 'Maths', 30),
('Sandeep', 'Maths', 95),
('Cinni', 'Maths', 86),
('Anand', 'Maths', 75),
('Deepak', 'Science', 100),
('Neha', 'Science', 90),
('Jyoti', 'Science', 80),
('Ashwini', 'Science', 70),
('Amit', 'Science', 30),
('Sandeep', 'Science', 95),
('Cinni', 'Science', 86),
('Anand', 'Science', 75),
('Deepak', 'History', 100),
('Neha', 'History', 90),
('Jyoti', 'History', 80),
('Ashwini', 'History', 70),
('Amit', 'History', 30),
('Sandeep', 'History', 95),
('Cinni', 'History', 86),
('Anand', 'History', 75);
mysql> SELECT * FROM students
+---------+---------+-------+
| student | subject | marks |
+---------+---------+-------+
| Deepak | Maths | 100 |
| Neha | Maths | 90 |
| Jyoti | Maths | 80 |
| Ashwini | Maths | 70 |
| Amit | Maths | 30 |
| Sandeep | Maths | 95 |
| Cinni | Maths | 86 |
| Anand | Maths | 75 |
| Deepak | Science | 100 |
| Neha | Science | 90 |
| Jyoti | Science | 80 |
| Ashwini | Science | 70 |
| Amit | Science | 30 |
| Sandeep | Science | 95 |
| Cinni | Science | 86 |
| Anand | Science | 75 |
| Deepak | History | 100 |
| Neha | History | 90 |
| Jyoti | History | 80 |
| Ashwini | History | 70 |
| Amit | History | 30 |
| Sandeep | History | 95 |
| Cinni | History | 86 |
| Anand | History | 75 |
+---------+---------+-------+
24 rows in set (0.00 sec)
mysql> Set character_set_connection=latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> Set character_set_results=latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> Set character_set_client=latin1;
Query OK, 0 rows affected (0.00 sec)
mysql> SET @rowcnt := 0; SET @grp := ''; SELECT d.* FROM (
-> SELECT
-> cs.*,
-> @rowcnt := IF(@grp != cs.subject, 1, @rowcnt + 1) AS rowcnt,
-> @grp := cs.subject
-> FROM (
-> SELECT * FROM students ORDER BY subject, marks DESC
-> ) cs
-> ) d
-> WHERE d.rowcnt < 3;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
+---------+---------+-------+--------+--------------------+
| student | subject | marks | rowcnt | @grp := cs.subject |
+---------+---------+-------+--------+--------------------+
| Deepak | History | 100 | 1 | History |
| Sandeep | History | 95 | 2 | History |
| Deepak | Maths | 100 | 1 | Maths |
| Sandeep | Maths | 95 | 2 | Maths |
| Deepak | Science | 100 | 1 | Science |
| Sandeep | Science | 95 | 2 | Science |
+---------+---------+-------+--------+--------------------+
6 rows in set (0.00 sec)
mysql>
Now, everything works fine from the console, but when I execute the same query through Spring JdbcTemplate, it gives me an error.
jdbcTemplate.query(query, new StudentRowMapper());
The query prints out as follows, which is exactly the same as the query I am using on the command line:
SET @rowcnt := 0; SET @grp := ''; SELECT d.* FROM ( SELECT cs.*, @rowcnt := IF(@grp != cs.subject, 1, @rowcnt + 1) AS rowcnt, @grp := cs.subject FROM ( SELECT * FROM students ORDER BY subject, marks DESC ) cs ) d WHERE d.rowcnt < 3;
Here is the error which I get while running this:
com.mysql.jdbc.exceptions.jdbc4.MySQLSyntaxErrorException: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'SET @grp := ''; SELECT d.* FROM ( SELECT cs.*, ' at line 1
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
at com.mysql.jdbc.Util.getInstance(Util.java:386)
at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1054)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4190)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4122)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2570)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2731)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2818)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2157)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2324)
at org.springframework.jdbc.core.JdbcTemplate$1.doInPreparedStatement(JdbcTemplate.java:646)
at org.springframework.jdbc.core.JdbcTemplate.execute(JdbcTemplate.java:589)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:639)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:668)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:676)
at org.springframework.jdbc.core.JdbcTemplate.query(JdbcTemplate.java:716)
By default the MySQL JDBC driver does not allow multiple statements in one query, which is why the two leading SET statements are rejected. Just initialize the variables inside the select itself, like this:
SELECT d.*
FROM
( SELECT cs.*, @rowcnt := IF(@grp != cs.subject, 1, @rowcnt + 1) AS rowcnt, @grp := cs.subject
FROM ( SELECT * FROM (SELECT @rowcnt := 0, @grp := '') a, students ORDER BY subject, marks DESC ) cs ) d
WHERE d.rowcnt < 3;
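Alternatively, on MySQL 8 or later the same top-2-per-subject result can be produced with a window function instead of user variables, which also sidesteps the multi-statement problem in JdbcTemplate. A sketch against the same students table:
SELECT student, subject, marks
FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY subject ORDER BY marks DESC) AS rn
    FROM students s
) d
WHERE rn < 3;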