SQL: Left join on calendar table (Spark SQL)

I am trying to left join data onto a calendar table that has been cross joined with user IDs, so that every date/user combination appears along with its corresponding columns. I have tried joining both with and without the date condition, and I created a cross-joined master table to left join the other data onto. However, it seems I am missing something.
DATE_TBL looks like:
CAL_DT BUYER_ID
2019-03-31 1
2019-03-31 2
2019-03-31 3
2019-03-30 1
2019-03-30 2
2019-03-30 3
2019-03-29 1
2019-03-29 2
2019-03-29 3 ......
DATA2 looks like:
CREATED_DT BUYER_ID ITEM_PRICE
2019-03-31 1 10
2019-03-30 2 12
2019-03-29 3 45
2019-03-29 2 13 ........
Here is my code:
WITH DATE_TBL AS
(
SELECT CAL.CAL_DT, CK.BUYER_ID
FROM DATA1 CAL
CROSS JOIN DATA2 CK
WHERE cal.CAL_DT BETWEEN '2018-01-01' AND '2019-03-31'
AND CK.BYR_CNTRY_ID IN (1,2,3) AND CK.CREATED_DT BETWEEN '2019-03-01' AND '2019-03-31'
GROUP BY 1,2
)
,
REVENUE_CALC AS
(
SELECT CAL.CAL_DT
,CK.BYR_CNTRY_ID
,CK.BUYER_ID
,CK.CREATED_DT AS CREATED_DT
,SUM(CK.ITEM_PRICE) AS ITEM_PRICE
,SUM(CK.QUANTITY) AS QUANTITY
,MAX(COALESCE(I.CURNCY_PLAN_RATE, 1)) AS CURNCY_PLAN_RATE
,SUM(CK.ITEM_PRICE *CK.QUANTITY *I.CURNCY_PLAN_RATE) AS REVENUE
FROM DATE_TBL CAL
LEFT JOIN DATA2 CK
ON CAL.BUYER_ID = CK.BUYER_ID AND CAL.CAL_DT = CK.CREATED_DT
LEFT JOIN DATA3 I
ON I.CURNCY_ID = CK.LSTG_CURNCY_ID
GROUP BY 1,2,3,4
ORDER BY CAL.CAL_DT DESC, CK.BUYER_ID
)
SELECT *
FROM REVENUE_CALC
The desired result must look like:
CAL_DT BUYER_ID ITEM_PRICE
2019-03-31 1 10
2019-03-31 2 null
2019-03-31 3 null
2019-03-30 1 null
2019-03-30 2 12
2019-03-30 3 null
2019-03-29 1 null
2019-03-29 2 13
2019-03-29 3 45 ......
What I get is only the data for common dates. Could someone help me understand what I am doing wrong?
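As a minimal sketch of the desired output, the example below builds the date/buyer grid and left joins the sample data onto it. It rests on several assumptions: SQLite (through Python's sqlite3 module) stands in for Spark SQL, the `calendar` table is a hypothetical replacement for DATA1, and only ITEM_PRICE is aggregated. The point it illustrates is that the key columns in the SELECT and GROUP BY come from the grid side of the join; columns taken from the left-joined table are NULL whenever there is no match.

```python
import sqlite3

# SQLite stands in for Spark SQL; `calendar` is a hypothetical stand-in for DATA1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (cal_dt TEXT);
    INSERT INTO calendar VALUES ('2019-03-29'), ('2019-03-30'), ('2019-03-31');

    CREATE TABLE data2 (created_dt TEXT, buyer_id INTEGER, item_price INTEGER);
    INSERT INTO data2 VALUES
        ('2019-03-31', 1, 10),
        ('2019-03-30', 2, 12),
        ('2019-03-29', 3, 45),
        ('2019-03-29', 2, 13);
""")

calendar_join_rows = conn.execute("""
    WITH date_tbl AS (
        -- one row per (date, buyer): the grid every output row hangs off
        SELECT c.cal_dt, b.buyer_id
        FROM calendar AS c
        CROSS JOIN (SELECT DISTINCT buyer_id FROM data2) AS b
    )
    SELECT cal.cal_dt,
           cal.buyer_id,                  -- taken from the grid, never NULL
           SUM(ck.item_price) AS item_price
    FROM date_tbl AS cal
    LEFT JOIN data2 AS ck
           ON ck.buyer_id = cal.buyer_id
          AND ck.created_dt = cal.cal_dt
    GROUP BY cal.cal_dt, cal.buyer_id     -- group on the grid columns only
    ORDER BY cal.cal_dt DESC, cal.buyer_id
""").fetchall()

for row in calendar_join_rows:
    print(row)
```

With the sample rows from the question this yields nine rows, one per date/buyer combination, with a NULL (None) price wherever no purchase exists.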

Related

Add remaining value to next rows in sql server

I have a table, shown below, that contains each customer's electricity volume per period. The available data looks like this:
OwnerID StartDate EndDate Volume
1 2019-01-01 2019-01-15 10.40
1 2019-01-16 2019-01-31 5.80
1 2019-02-01 2019-02-10 7.90
1 2019-02-11 2019-02-28 8.50
2 2019-03-01 2019-03-04 10.50
Another table holds each customer's existing remaining volume. The two tables are linked by the OwnerID column.
OwnerID ExistingVolume
1 0.90
2 0.60
Now add (apply) the ExistingVolume to the current Volume (first table) as follows:
take the whole-number part of the sum as the new volume, and carry the remaining decimal part forward to the customer's next period.
The expected result set should look like this:
OwnerId StartDate EndDate CalulatedVolume RemainingExistingVolume
1 2019-01-01 2019-01-15 11 0.30
1 2019-01-16 2019-01-31 6 0.10
1 2019-02-01 2019-02-10 8 0.00
1 2019-02-11 2019-02-28 8 0.50
2 2019-03-01 2019-03-04 11 0.10
Don't round the CalulatedVolume; just take the whole-number part of table1.Volume + table2.ExistingVolume.
The remaining decimal part (from the first row) should then be added to table1.Volume in the next row.
Could someone suggest how to achieve this in a SQL query?
If I understand correctly, you want to accumulate the "error" from rounding and apply that against the value in the second table.
You can use a cumulative sum for this purpose -- along with some arithmetic:
select t1.ownerid, t1.startdate, t1.enddate,
round(t1.volume, 0) as calculatedvolume,
( sum( t1.volume - round(t1.volume, 0) ) over (partition by t1.ownerid order by t1.startdate) +
t2.existingvolume
) as remainingexisting
from table1 t1 left join
table2 t2
on t1.ownerid = t2.ownerid;
You have a non-standard definition of rounding. This can be implemented as ceil(x - 0.5). With this definition, the code is:
select t1.ownerid, t1.startdate, t1.enddate,
ceiling(t1.volume - 0.5) as calculatedvolume,
( sum( t1.volume - ceiling(t1.volume - 0.5) ) over (partition by t1.ownerid order by t1.startdate) +
t2.existingvolume
) as remainingexisting
from table1 t1 left join
table2 t2
on t1.ownerid = t2.ownerid;
Here is a db<>fiddle.
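For comparison, here is a sketch of a different formulation that reproduces the expected table from the question exactly: keep a running total of ExistingVolume plus all volumes so far, take the increase in its whole-number part as CalulatedVolume, and report its fractional part as the remainder. This is a swapped-in technique, not the query above. The assumptions are SQLite (via Python's sqlite3, version 3.25 or later for window functions) and volumes stored as integer hundredths in hypothetical `volume_x100` / `existing_x100` columns so the arithmetic stays exact.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- volumes are stored in hundredths (hypothetical columns) to avoid float drift
    CREATE TABLE table1 (ownerid INTEGER, startdate TEXT, enddate TEXT, volume_x100 INTEGER);
    INSERT INTO table1 VALUES
        (1, '2019-01-01', '2019-01-15', 1040),
        (1, '2019-01-16', '2019-01-31',  580),
        (1, '2019-02-01', '2019-02-10',  790),
        (1, '2019-02-11', '2019-02-28',  850),
        (2, '2019-03-01', '2019-03-04', 1050);
    CREATE TABLE table2 (ownerid INTEGER, existing_x100 INTEGER);
    INSERT INTO table2 VALUES (1, 90), (2, 60);
""")

carry_rows = conn.execute("""
    WITH running AS (
        SELECT t1.ownerid, t1.startdate, t1.enddate, t1.volume_x100,
               -- existing volume plus every volume up to and including this row
               t2.existing_x100
               + SUM(t1.volume_x100) OVER (PARTITION BY t1.ownerid
                                           ORDER BY t1.startdate) AS cum_x100
        FROM table1 AS t1
        JOIN table2 AS t2 ON t2.ownerid = t1.ownerid
    )
    SELECT ownerid, startdate, enddate,
           -- whole units now minus whole units before this row (integer division)
           cum_x100 / 100 - (cum_x100 - volume_x100) / 100 AS calculatedvolume,
           -- fractional part still waiting to be carried forward
           (cum_x100 % 100) / 100.0 AS remainingexisting
    FROM running
    ORDER BY ownerid, startdate
""").fetchall()

for row in carry_rows:
    print(row)
```

On the sample data this gives calculated volumes 11, 6, 8, 8 for owner 1 and 11 for owner 2, with remainders 0.3, 0.1, 0.0, 0.5 and 0.1.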

SQL query to check if the next row value is same or different

I am joining two tables on a common date column. However, the column I am trying to get from one of the tables (cmg in this case) should return the next row's value only if it differs from the previous row's value.
Table A
Date comp.no
-----------------------
2019-03-08 5
2019-02-26 5
2019-01-17 5
2019-01-10 5
2018-12-27 5
Table B
Date cmg
-----------------
2019-07-17 NULL
2019-04-20 NULL
2019-02-26 RHB
2019-01-19 NULL
2019-01-17 RHB
2019-01-10 RMB
2018-12-28 NULL
2018-12-27 RHB
2018-12-12 RUB
2018-11-28 RUB
2018-10-20 NULL
2018-07-21 NULL
2018-04-21 NULL
2018-01-20 NULL
2017-10-21 NULL
2017-07-29 NULL
2017-05-07 NULL
2017-02-13 NULL
2016-11-22 NULL
2016-08-29 NULL
2016-06-07 NULL
2016-04-06 RUB
2016-03-21 RUB
2016-03-07 RUB
You can use the lag() function to compare each value with the previous one. For the first row you'll need an isnull() check, since it has no previous value.
;with cte as(
select case
when isnull(lag(t2.cmg) over (order by t2.date), '') <> t2.cmg then 1 else 0 end as isresult
,t2.date,t2.cmg
from TableA t1
inner join TableB t2
on t1.date=t2.date
)
select date,cmg from cte where isresult=1
Use lag():
select date, cmg
from (select b.date, b.cmg, lag(b.cmg) over (order by b.date) as prev_cmg
from a join
b
on a.date = b.date
) b
where prev_cmg is null or prev_cmg <> cmg
order by date;
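The lag() query above can be tried on a small excerpt of the sample data. This is only a sketch: it assumes SQLite 3.25 or later (through Python's sqlite3) rather than SQL Server, loads just the rows of Table B from late 2018 onwards, and renames comp.no to `comp_no`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (date TEXT, comp_no INTEGER);
    INSERT INTO a VALUES
        ('2019-03-08', 5), ('2019-02-26', 5), ('2019-01-17', 5),
        ('2019-01-10', 5), ('2018-12-27', 5);

    -- excerpt of Table B from the question
    CREATE TABLE b (date TEXT, cmg TEXT);
    INSERT INTO b VALUES
        ('2019-02-26', 'RHB'), ('2019-01-19', NULL), ('2019-01-17', 'RHB'),
        ('2019-01-10', 'RMB'), ('2018-12-28', NULL), ('2018-12-27', 'RHB');
""")

changed_rows = conn.execute("""
    SELECT date, cmg
    FROM (
        SELECT b.date, b.cmg,
               LAG(b.cmg) OVER (ORDER BY b.date) AS prev_cmg  -- value on the previous joined date
        FROM a
        JOIN b ON a.date = b.date
    ) AS j
    WHERE prev_cmg IS NULL OR prev_cmg <> cmg   -- first row, or value changed
    ORDER BY date
""").fetchall()

for row in changed_rows:
    print(row)
```

Four dates survive the join; the 2019-02-26 row is dropped because its cmg (RHB) repeats the value from 2019-01-17.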

How to join a table to another one depending on two date columns?

I have two tables which are
T1:
UserID Tier BeginDate EndDate
8278020 1 2019-03-02 18:33:04.893 2019-03-28 10:34:33.837
8278020 2 2019-03-28 10:34:33.837 2019-04-01 16:48:22.107
8278020 3 2019-04-01 16:48:22.107 2019-04-07 21:44:40.060
8278020 4 2019-04-07 21:44:40.060 2019-06-30 23:59:59.999
T2:
UserID GiftCardID UseDate OrderID IsUsed
8278020 165491838 2019-03-06 23057796 1
8278020 165491839 2019-03-10 23106429 1
8278020 165491840 2019-03-24 23277217 1
8278020 166418161 NULL NULL 0
8278020 166418162 NULL NULL 0
8278020 167026357 2019-04-22 23594414 1
8278020 167026358 2019-04-28 23668492 1
I want to match the two tables so that I can show the customer's tier at the time he or she used each gift card.
For example, when the user used the gift card with GiftCardID = '165491839', he was in tier 1.
At GiftCardID = '167026357', the tier is 4.
I couldn't work out how to match the tables this way.
I look forward to your help.
Just use JOIN:
select t2.*, t1.tier
from table2 t2 left join
table1 t1
on t2.userid = t1.userid and
t2.usedate >= t1.begindate and
t2.usedate < t1.enddate;
This is a left join, so you won't lose rows if, for some reason, the dates don't match.
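A quick way to check the range join is to run it against the sample rows. The sketch below assumes SQLite (through Python's sqlite3) instead of SQL Server and stores the timestamps as ISO-8601 text, which compares correctly as strings; the column spelling is `usedate`, matching the UseDate column in T2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (userid INTEGER, tier INTEGER, begindate TEXT, enddate TEXT);
    INSERT INTO table1 VALUES
        (8278020, 1, '2019-03-02 18:33:04.893', '2019-03-28 10:34:33.837'),
        (8278020, 2, '2019-03-28 10:34:33.837', '2019-04-01 16:48:22.107'),
        (8278020, 3, '2019-04-01 16:48:22.107', '2019-04-07 21:44:40.060'),
        (8278020, 4, '2019-04-07 21:44:40.060', '2019-06-30 23:59:59.999');

    CREATE TABLE table2 (userid INTEGER, giftcardid INTEGER, usedate TEXT,
                         orderid INTEGER, isused INTEGER);
    INSERT INTO table2 VALUES
        (8278020, 165491838, '2019-03-06', 23057796, 1),
        (8278020, 165491839, '2019-03-10', 23106429, 1),
        (8278020, 165491840, '2019-03-24', 23277217, 1),
        (8278020, 166418161, NULL, NULL, 0),
        (8278020, 166418162, NULL, NULL, 0),
        (8278020, 167026357, '2019-04-22', 23594414, 1),
        (8278020, 167026358, '2019-04-28', 23668492, 1);
""")

tier_rows = conn.execute("""
    SELECT t2.giftcardid, t1.tier
    FROM table2 AS t2
    LEFT JOIN table1 AS t1
           ON t2.userid = t1.userid
          AND t2.usedate >= t1.begindate   -- half-open interval [begindate, enddate)
          AND t2.usedate <  t1.enddate
    ORDER BY t2.giftcardid
""").fetchall()

tier_by_card = dict(tier_rows)
print(tier_by_card)
```

The unused gift cards (NULL UseDate) stay in the result with a NULL tier, which is the effect of the left join.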

How to duplicate rows in SQL based on the difference between date columns and divide an aggregated column across the duplicated rows?

I have a table with some records about fuel consumption. The important columns in the table are: CONSUME_DATE_FROM and CONSUM_DATE_TO.
I want to calculate the average fuel consumption per car on a monthly basis, but some rows do not fall within a single month. For example, some span three months, with the total gas in litres aggregated in a single row.
So I need to find the records that span more than one month between CONSUME_DATE_FROM and CONSUM_DATE_TO, duplicate them (in the current table or a second one) once per month covered, and divide the total gas in litres among the resulting rows.
I've this table with the following data:
ID VehicleId CONSUME_DATE_FROM CONSUM_DATE_TO GAS_PER_LITER
1 100 2018-10-25 2018-12-01 600
2 101 2018-07-19 2018-07-24 100
3 102 2018-12-31 2019-01-01 400
4 103 2018-03-29 2018-05-29 200
5 104 2018-02-05 2018-02-09 50
The expected output table should be as below
ID VehicleId CONSUME_DATE_FROM CONSUM_DATE_TO GAS_PER_LITER
1 100 2018-10-25 2018-12-01 200
1 100 2018-10-25 2018-12-01 200
1 100 2018-10-25 2018-12-01 200
2 101 2018-07-19 2018-07-24 100
3 102 2018-12-31 2019-01-01 200
3 102 2018-12-31 2019-01-01 200
4 103 2018-03-29 2018-05-29 66.66
4 103 2018-03-29 2018-05-29 66.66
4 103 2018-03-29 2018-05-29 66.66
5 104 2018-02-05 2018-02-09 50
Or as below
ID VehicleId CONSUME_DATE_FROM CONSUM_DATE_TO GAS_PER_LITER DATE_RELOAD_GAS
1 100 2018-10-25 2018-12-01 200 2018-10-01
1 100 2018-10-25 2018-12-01 200 2018-11-01
1 100 2018-10-25 2018-12-01 200 2018-12-01
2 101 2018-07-19 2018-07-24 100 2018-07-01
3 102 2018-12-31 2019-01-01 200 2018-12-01
3 102 2018-12-31 2019-01-01 200 2019-01-01
4 103 2018-03-29 2018-05-29 66.66 2018-03-01
4 103 2018-03-29 2018-05-29 66.66 2018-04-01
4 103 2018-03-29 2018-05-29 66.66 2018-05-01
5 104 2018-02-05 2018-02-09 50 2018-02-01
Can someone please help me out with this query?
I'm using oracle database
Your business rule treats the difference between CONSUME_DATE_FROM and CONSUM_DATE_TO as absolute months. So you expect the difference between 2018-10-25 and 2018-12-01 to be three months whereas the difference in days actually equates to about 1.1 months. So we can't use simple date arithmetic to get your desired output, we need to do some additional massaging of the dates.
The query below implements your desired logic by deriving the first day of the month for CONSUME_DATE_FROM and the last day of the month for CONSUM_DATE_TO, then using ceil() to round the difference up to the nearest whole number of months.
This is calculated in a subquery which is used in the main query with the old connect by level trick to multiply a record by level number of times:
with cte as (
select f.*
, ceil(months_between(last_day(CONSUM_DATE_TO)
, trunc(CONSUME_DATE_FROM,'mm'))) as diff
from fuel_consumption f
)
select cte.id
, cte.VehicleId
, cte.CONSUME_DATE_FROM
, cte.CONSUM_DATE_TO
, cte.GAS_PER_LITER/cte.diff as GAS_PER_LITER
, add_months(trunc(cte.CONSUME_DATE_FROM, 'mm'), level-1) as DATE_RELOAD_GAS
from cte
connect by level <= cte.diff
and prior cte.id = cte.id
and prior sys_guid() is not null
;
"what about if add a additional column "DATE_RELOAD_GAS" that display difference date for similar rows"
From your posted sample it seems like DATE_RELOAD_GAS is the first day of the month for each month bounded by CONSUME_DATE_FROM and CONSUM_DATE_TO. I have amended my solution to implement this rule.
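The same month-expansion idea can be sketched outside Oracle. The example below is an assumption-laden illustration, not the query above: it runs on SQLite through Python's sqlite3, replaces connect by level with a recursive CTE, and counts calendar months by year/month arithmetic instead of months_between(). The lower-case table and column names are adapted from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fuel_consumption (
        id INTEGER, vehicle_id INTEGER,
        consume_date_from TEXT, consum_date_to TEXT, gas_per_liter REAL);
    INSERT INTO fuel_consumption VALUES
        (1, 100, '2018-10-25', '2018-12-01', 600),
        (2, 101, '2018-07-19', '2018-07-24', 100),
        (3, 102, '2018-12-31', '2019-01-01', 400),
        (4, 103, '2018-03-29', '2018-05-29', 200),
        (5, 104, '2018-02-05', '2018-02-09', 50);
""")

month_rows = conn.execute("""
    WITH RECURSIVE spans AS (
        -- diff = number of calendar months touched by the period
        SELECT f.*,
               (CAST(strftime('%Y', consum_date_to) AS INTEGER) * 12
                + CAST(strftime('%m', consum_date_to) AS INTEGER))
             - (CAST(strftime('%Y', consume_date_from) AS INTEGER) * 12
                + CAST(strftime('%m', consume_date_from) AS INTEGER))
             + 1 AS diff
        FROM fuel_consumption AS f
    ),
    expanded AS (
        -- n = 0 is the first month; recurse until diff rows exist per id
        SELECT id, vehicle_id, consume_date_from, consum_date_to,
               gas_per_liter, diff, 0 AS n
        FROM spans
        UNION ALL
        SELECT id, vehicle_id, consume_date_from, consum_date_to,
               gas_per_liter, diff, n + 1
        FROM expanded
        WHERE n + 1 < diff
    )
    SELECT id, vehicle_id, consume_date_from, consum_date_to,
           gas_per_liter / diff AS gas_per_liter,
           date(consume_date_from, 'start of month',
                '+' || n || ' months') AS date_reload_gas
    FROM expanded
    ORDER BY id, n
""").fetchall()

for row in month_rows:
    print(row)
```

This produces ten rows for the five sample records, with each record's total split evenly across the months it touches. Unlike the expected output in the question, the per-month value is not truncated to two decimals.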
Using a connect by level structure, and treating to_char(c.CONSUME_DATE_FROM + level - 1,'yyyymm') as the month, I was able to solve it as below:
select ID, VehicleId, myMonth, CONSUME_DATE_FROM, CONSUM_DATE_TO,
trunc(GAS_PER_LITER/max(rn) over (partition by ID order by ID),2) as GAS_PER_LITER,
'01.'||substr(myMonth,5,2)||'.'||substr(myMonth,1,4) as DATE_RELOAD_GAS
from
(
with consumption( ID, VehicleId, CONSUME_DATE_FROM, CONSUM_DATE_TO, GAS_PER_LITER ) as
(
select 1,100,date'2018-10-25',date'2018-12-01',600 from dual union all
select 2,101,date'2018-07-19',date'2018-07-24',100 from dual union all
select 3,102,date'2018-12-31',date'2019-01-01',400 from dual union all
select 4,103,date'2018-03-29',date'2018-05-29',200 from dual union all
select 5,104,date'2018-02-05',date'2018-02-09', 50 from dual
)
select ID, to_char(c.CONSUME_DATE_FROM + level - 1,'yyyymm') myMonth,
VehicleId, c.CONSUME_DATE_FROM, c.CONSUM_DATE_TO, GAS_PER_LITER,
row_number() over (partition by ID order by ID) as rn
from dual join consumption c
on c.ID >= 2
group by ID, to_char(c.CONSUME_DATE_FROM + level - 1,'yyyymm'), VehicleId,
c.CONSUME_DATE_FROM, c.CONSUM_DATE_TO, c.GAS_PER_LITER
connect by level <= c.CONSUM_DATE_TO - c.CONSUME_DATE_FROM + 1
union all
select ID, to_char(c.CONSUME_DATE_FROM + level - 1,'yyyymm') myMonth,
VehicleId, c.CONSUME_DATE_FROM, c.CONSUM_DATE_TO, GAS_PER_LITER,
row_number() over (partition by ID order by ID) as rn
from dual join consumption c
on c.ID = 1
group by ID, to_char(c.CONSUME_DATE_FROM + level - 1,'yyyymm'), VehicleId,
c.CONSUME_DATE_FROM, c.CONSUM_DATE_TO, c.GAS_PER_LITER
connect by level <= c.CONSUM_DATE_TO - c.CONSUME_DATE_FROM + 1
) q
group by ID, VehicleId, myMonth, CONSUME_DATE_FROM, CONSUM_DATE_TO, GAS_PER_LITER, rn
order by ID, myMonth;
I ran into an interesting issue: if I write the join condition in the subquery as c.ID >= 1, the query hangs for a very long time, so I split it into two parts with union all, one using c.ID >= 2 and the other c.ID = 1.

What is the best method to identify which pairs of rows have identical Products, Customers and Measures, and overlapping date ranges?

(Image of the sample question.) I have to identify duplicate rows and then make their date ranges not overlap.
Rows 1 and 2 overlap, like this:
20130101 |--------------------| 20130401
20130301 |----------------------| 20131231
You can use T-SQL in MS SQL Server:
select t1a.id , t1.id second_id,
t1.valid_from_day , t1.valid_to_day ,
t1a.valid_from_day second_valid_from_day ,
t1a.valid_to_day second_valid_to_day
from t1 t1a
cross apply
(
select * from t1
where t1.product = t1a.product
and t1.customer = t1a.customer
and t1.measure = t1a.measure
and t1.id <> t1a.id
and t1.valid_from_day >= t1a.valid_from_day -- overlap
and t1.valid_to_day >= t1a.valid_to_day
) t1
The result of the query is:
id second_id valid_from_day valid_to_day second_valid_from_day second_valid_to_day
1 2 2013-03-01 2013-12-31 2013-01-01 2013-04-01
4 5 2013-03-01 2014-04-01 2013-01-01 2013-04-01
9 10 2014-04-01 2015-01-01 2013-03-01 2013-12-31
So the identical pairs are:
pair 1,2
pair 4,5
pair 9,10
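For reference, the usual test for two closed date ranges overlapping is that each one starts on or before the other ends. The sketch below applies that predicate with a plain self-join; it is an illustration under assumptions, not the query above. It runs on SQLite through Python's sqlite3, the product/customer/measure values are hypothetical, rows 1 and 2 use the dates from the diagram in the question, and rows 3 and 4 are made up to show non-matching cases. A predicate that only compares start with start and end with end would also match a range that simply begins later, such as row 3 here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (id INTEGER, product TEXT, customer TEXT, measure TEXT,
                     valid_from_day TEXT, valid_to_day TEXT);
    INSERT INTO t1 VALUES
        (1, 'P1', 'C1', 'M1', '2013-01-01', '2013-04-01'),  -- from the diagram
        (2, 'P1', 'C1', 'M1', '2013-03-01', '2013-12-31'),  -- from the diagram
        (3, 'P1', 'C1', 'M1', '2014-04-01', '2015-01-01'),  -- hypothetical: later, no overlap
        (4, 'P2', 'C1', 'M1', '2013-02-01', '2013-06-01');  -- hypothetical: other product
""")

overlap_pairs = conn.execute("""
    SELECT a.id, b.id
    FROM t1 AS a
    JOIN t1 AS b
      ON b.product  = a.product
     AND b.customer = a.customer
     AND b.measure  = a.measure
     AND a.id < b.id                          -- list each pair once
     AND a.valid_from_day <= b.valid_to_day   -- a starts before b ends
     AND b.valid_from_day <= a.valid_to_day   -- b starts before a ends
    ORDER BY a.id, b.id
""").fetchall()

print(overlap_pairs)
```

Only the pair (1, 2) is reported: row 3 shares the same keys but starts after both other ranges have ended, and row 4 overlaps in time but belongs to a different product.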