Interview question: How to get the last 3 months' aggregation at column level? - sql

This is a question I was asked at an Apple onsite interview, and it blew my mind. The data looks like this:
orderdate,unit_of_phone_sale
20190806,3000
20190704,3789
20190627,789
20190503,666
20190402,765
I had to write a query that returns, for each month, that month's sales together with the sales values of the previous 3 months. Here is the expected output:
order_month,unit_sale,M-1_Sale,M-2_Sale,M-3_Sale
201908,3000,3789,789,666
201907,3789,789,666,765
201906,789,666,765,0
201905,666,765,0,0
201904,765,0,0,0
I could only get the month-wise sales, and I used CASE statements with hardcoded months, which was wrong. I racked my brain trying to write this SQL, but I could not.
Can anyone help with this? It would really help me prepare for SQL interviews.
Update: This is what I tried:
with abc as(
select to_char(order_date,'YYYYMM') as yearmonth,to_char(order_date,'YYYY') as year,to_char(order_date,'MM') as month, sum(unit_of_phone_sale) as unit_sale
from t1 group by to_char(order_date,'YYYYMM'),to_char(order_date,'YYYY'),to_char(order_date,'MM'))
select yearmonth, year, case when month=01 then unit_sale else 0 end as M1_Sale,
case when month=02 then unit_sale else 0 end as M2_Sale...
case when month=12 then unit_sale else 0 end as M12_Sale
from abc

First sum the data per month, then use the LAG function to fetch the previous months' values, as follows:
SELECT
ORDER_MONTH,
LAG(UNIT_OF_PHONE_SALE, 1) OVER(
ORDER BY
ORDER_MONTH
) AS "M-1_Sale",
LAG(UNIT_OF_PHONE_SALE, 2) OVER(
ORDER BY
ORDER_MONTH
) AS "M-2_Sale",
LAG(UNIT_OF_PHONE_SALE, 3) OVER(
ORDER BY
ORDER_MONTH
) AS "M-3_Sale"
FROM
(
SELECT
TO_CHAR(ORDERDATE, 'YYYYMM') AS ORDER_MONTH,
SUM(UNIT_OF_PHONE_SALE) AS UNIT_OF_PHONE_SALE
FROM
DATAA
GROUP BY
TO_CHAR(ORDERDATE, 'YYYYMM')
)
ORDER BY
ORDER_MONTH DESC;
Output:
ORDER_MONTH   M-1_Sale   M-2_Sale   M-3_Sale
-----------   --------   --------   --------
201908            3789        789        666
201907             789        666        765
201906             666        765
201905             765
201904
db<>fiddle demo
Cheers!!
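A minimal runnable sketch of the LAG approach above, using Python's built-in sqlite3 module in place of Oracle (an assumption; it needs SQLite 3.25 or later for window functions). The table name sales, the text dates in YYYYMMDD form, and substr() standing in for TO_CHAR are illustrative choices, not part of the original answer.

```python
import sqlite3

def last_three_months(rows):
    """Return (order_month, m1, m2, m3) tuples, newest month first.

    rows: iterable of (orderdate as 'YYYYMMDD' text, units).
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (orderdate TEXT, unit_of_phone_sale INTEGER)")
    con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    query = """
        SELECT order_month,
               LAG(units, 1) OVER (ORDER BY order_month) AS m1,
               LAG(units, 2) OVER (ORDER BY order_month) AS m2,
               LAG(units, 3) OVER (ORDER BY order_month) AS m3
        FROM (
            -- substr() stands in for Oracle's TO_CHAR(orderdate, 'YYYYMM')
            SELECT substr(orderdate, 1, 6) AS order_month,
                   SUM(unit_of_phone_sale) AS units
            FROM sales
            GROUP BY substr(orderdate, 1, 6)
        )
        ORDER BY order_month DESC
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Sample data from the question.
SAMPLE = [("20190806", 3000), ("20190704", 3789), ("20190627", 789),
          ("20190503", 666), ("20190402", 765)]
```

Calling last_three_months(SAMPLE) returns None where no earlier month exists, which corresponds to the blank cells in the output above. Note that LAG steps back by rows, not by calendar months, so it only matches "previous month" when no month is missing from the data.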
-- Update --
For the requirement mentioned in the comments, the following query will work:
WITH CTE AS (
SELECT
TRUNC(ORDERDATE, 'MONTH') AS ORDER_MONTH,
SUM(UNIT_OF_PHONE_SALE) AS UNIT_OF_PHONE_SALE
FROM
DATAA
GROUP BY
TRUNC(ORDERDATE, 'MONTH')
)
SELECT
TO_CHAR(C.ORDER_MONTH,'YYYYMM') as ORDER_MONTH,
NVL(C1.UNIT_OF_PHONE_SALE, 0) AS "M-1_Sale",
NVL(C2.UNIT_OF_PHONE_SALE, 0) AS "M-2_Sale",
NVL(C3.UNIT_OF_PHONE_SALE, 0) AS "M-3_Sale"
FROM
CTE C
LEFT JOIN CTE C1 ON ( C1.ORDER_MONTH = ADD_MONTHS(C.ORDER_MONTH, - 1) )
LEFT JOIN CTE C2 ON ( C2.ORDER_MONTH = ADD_MONTHS(C.ORDER_MONTH, - 2) )
LEFT JOIN CTE C3 ON ( C3.ORDER_MONTH = ADD_MONTHS(C.ORDER_MONTH, - 3) )
ORDER BY
C.ORDER_MONTH DESC
db<>fiddle demo of updated answer.
Cheers!!
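The self-join on ADD_MONTHS looks up each previous month by calendar date rather than by row position, so a month with no sales yields 0 instead of shifting the older values forward. A small Python sketch of that lookup, under the assumption that the monthly totals are already in a dict keyed by 'YYYYMM'; the helper names and the data with a missing month are hypothetical:

```python
def previous_month(yyyymm, back):
    """Step a 'YYYYMM' string back by `back` calendar months."""
    year, month = int(yyyymm[:4]), int(yyyymm[4:])
    index = year * 12 + (month - 1) - back
    return f"{index // 12:04d}{index % 12 + 1:02d}"

def sales_by_calendar_offset(monthly):
    """monthly: {'YYYYMM': units}.

    For each month, look up M-1, M-2 and M-3 by date, defaulting to 0
    (the role NVL plays in the query above).
    """
    return {
        month: tuple(monthly.get(previous_month(month, k), 0) for k in (1, 2, 3))
        for month in sorted(monthly, reverse=True)
    }

# Hypothetical data with 201906 missing, to show the difference from LAG.
GAPPY = {"201908": 3000, "201907": 3789, "201905": 666, "201904": 765}
```

With GAPPY, the entry for 201908 keeps a 0 in the M-2 slot for the missing June, whereas a row-based LAG would have reported May's value there.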

I think the LEAD function can help here:
SELECT TO_CHAR(orderdate, 'YYYYMM') "DATE"
,unit_of_phone_sale M_1_Sale
,LEAD(unit_of_phone_sale,1,0) OVER(ORDER BY TO_CHAR(orderdate, 'YYYYMM') DESC) M_2_Sale
,LEAD(unit_of_phone_sale,2,0) OVER(ORDER BY TO_CHAR(orderdate, 'YYYYMM') DESC) M_3_Sale
,LEAD(unit_of_phone_sale,3,0) OVER(ORDER BY TO_CHAR(orderdate, 'YYYYMM') DESC) M_4_Sale
FROM table_sales
Here is the DB Fiddle

You can use this query:
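select a.order_month, a.unit_of_phone_sale,
LEAD(a.unit_of_phone_sale, 1, 0) OVER (ORDER BY a.order_month DESC) AS M_1,
LEAD(a.unit_of_phone_sale, 2, 0) OVER (ORDER BY a.order_month DESC) AS M_2,
LEAD(a.unit_of_phone_sale, 3, 0) OVER (ORDER BY a.order_month DESC) AS M_3
from (
-- order the window by the month itself: a rownum selected next to an
-- ORDER BY is assigned before the sort, so it does not follow the sorted order
select TO_CHAR(orderdate, 'YYYYMM') order_month,
unit_of_phone_sale
from Y) a
order by a.order_month desc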

Related

SQL QUERY to get previous date

PERIOD_SERV
PERSON_NUMBER DATE_sTART PERIOD_ID
10 06-JAN-2020 192726
10 04-APR-2019 12827
11 01-FEB-2021 282726
11 09-APR-2018 827266
For each person_number I want to add a column with the previous date_start. The query below gives me repeated rows.
I want only one row per person, with an additional column holding the previous date_start. For example:
PERSON_NUMBER DATE_sTART PERIOD_ID PREVIOUS_DATE
10 06-JAN-2020 192726 04-APR-2019
11 01-FEB-2021 282726 09-APR-2018
I am using the query below, but it returns two rows per person:
SELECT person_number,
period_id AS pv_period_id,
LAG(date_start) OVER ( PARTITION BY person_number ORDER BY date_start) AS previous_date
FROM period_serv
You can restrict the set of rows in the outer query:
select person_number, pv_period_id, PREVIOUS_DATE
from (
select person_number,
PERIOD_ID pv_period_id,
lag(date_start) OVER ( partition BY person_number order by DATE_sTART ) PREVIOUS_DATE ,
row_number() OVER ( partition BY person_number order by DATE_sTART desc) rn
from period_serv
) t
where rn = 1
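A runnable sketch of the same LAG plus ROW_NUMBER pattern, using Python's sqlite3 module as a stand-in for Oracle (an assumption; it needs SQLite 3.25 or later). Dates are stored as ISO text so that they sort correctly, which is a choice made for this sketch only.

```python
import sqlite3

def latest_with_previous(rows):
    """Return one row per person: the latest period plus the previous date_start."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE period_serv (person_number INTEGER, date_start TEXT, period_id INTEGER)"
    )
    con.executemany("INSERT INTO period_serv VALUES (?, ?, ?)", rows)
    query = """
        SELECT person_number, date_start, period_id, previous_date
        FROM (
            SELECT person_number, date_start, period_id,
                   -- previous date_start within the same person
                   LAG(date_start) OVER (PARTITION BY person_number
                                         ORDER BY date_start) AS previous_date,
                   -- rn = 1 marks the most recent row of each person
                   ROW_NUMBER() OVER (PARTITION BY person_number
                                      ORDER BY date_start DESC) AS rn
            FROM period_serv
        )
        WHERE rn = 1
        ORDER BY person_number
    """
    result = con.execute(query).fetchall()
    con.close()
    return result

# Sample data from the question, with dates rewritten as ISO text.
PERIODS = [(10, "2020-01-06", 192726), (10, "2019-04-04", 12827),
           (11, "2021-02-01", 282726), (11, "2018-04-09", 827266)]
```

The window functions are evaluated over all rows of each person before the outer WHERE discards everything except rn = 1, which is why the filter has to sit in the outer query.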
One option is to use the MAX(..) KEEP (DENSE_RANK ..) OVER (PARTITION BY ..) analytic function, such as:
WITH p AS
(
SELECT MAX(date_start) KEEP (DENSE_RANK FIRST ORDER BY date_start)
OVER (PARTITION BY person_number) AS previous_date,
p.*
FROM period_serv p
)
SELECT p.person_number, p.date_start, p.period_id, p.previous_date
FROM p
JOIN period_serv ps
ON ps.person_number = p.person_number
AND ps.period_id = p.period_id
WHERE ps.date_start != previous_date
Demo

How to get the validity date range of a price from individual daily prices in SQL

I have some prices for the month of January.
Date,Price
1,100
2,100
3,115
4,120
5,120
6,100
7,100
8,120
9,120
10,120
The output I need is a non-overlapping date range for each price.
price,from,To
100,1,2
115,3,3
120,4,5
100,6,7
120,8,10
I need to do this using SQL only.
For now, if I simply group by and take min and max dates, I get the below, which is an overlapping range:
price,from,to
100,1,7
115,3,3
120,4,10
This is a gaps-and-islands problem. The simplest solution is the difference of row numbers:
select price, min(date), max(date)
from (select t.*,
row_number() over (order by date) as seqnum,
row_number() over (partition by price order by date) as seqnum2
from t
) t
group by price, (seqnum - seqnum2)
order by min(date);
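Why this works is a little hard to explain. Within a run of consecutive rows with the same price, both row numbers increase by one per row, so their difference stays constant; when the same price comes back after other prices, the difference is larger. If you look at the results of the subquery, you will see how the adjacent rows are identified by the difference in the two values.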
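To make the difference-of-row-numbers idea concrete, here is a small pure-Python sketch that computes the same two sequence numbers and groups on (price, seqnum - seqnum2). The function name and the tuple layout are illustrative, not part of the original answer.

```python
from collections import defaultdict

def price_ranges(rows):
    """rows: list of (day, price). Return [(price, first_day, last_day)] per run."""
    seen = defaultdict(int)   # rows seen so far for each price (drives seqnum2)
    islands = {}              # (price, seqnum - seqnum2) -> (first_day, last_day)
    for seqnum, (day, price) in enumerate(sorted(rows), start=1):
        seen[price] += 1
        seqnum2 = seen[price]
        key = (price, seqnum - seqnum2)
        first_day, _ = islands.get(key, (day, day))
        islands[key] = (first_day, day)
    # Mirror the query's ORDER BY min(date).
    return sorted(
        ((price, lo, hi) for (price, _), (lo, hi) in islands.items()),
        key=lambda r: r[1],
    )

# Sample data from the question: (day of January, price).
PRICES = [(1, 100), (2, 100), (3, 115), (4, 120), (5, 120),
          (6, 100), (7, 100), (8, 120), (9, 120), (10, 120)]
```

For the sample data, price 100 gets the difference 0 on days 1 and 2 but 3 on days 6 and 7, so the two runs land in separate groups even though the price is the same.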
SELECT Lag.price,Lag.[date] AS [From], MIN(Lead.[date]-Lag.[date])+Lag.[date] AS [to]
FROM
(
SELECT [date],[Price]
FROM
(
SELECT [date],[Price],LAG(Price) OVER (ORDER BY DATE,Price) AS LagID FROM #table1 A
)B
WHERE CASE WHEN Price <> ISNULL(LagID,1) THEN 1 ELSE 0 END = 1
)Lag
JOIN
(
SELECT [date],[Price]
FROM
(
SELECT [date],Price,LEAD(Price) OVER (ORDER BY DATE,Price) AS LeadID FROM [#table1] A
)B
WHERE CASE WHEN Price <> ISNULL(LeadID,1) THEN 1 ELSE 0 END = 1
)Lead
ON Lag.[Price] = Lead.[Price]
WHERE Lead.[date]-Lag.[date] >= 0
GROUP BY Lag.[date],Lag.[price]
ORDER BY Lag.[date]
Another method, using ROWS UNBOUNDED PRECEDING:
SELECT price, MIN([date]) AS [from], [end_date] AS [To]
FROM
(
SELECT *, MIN([abc]) OVER (ORDER BY DATE DESC ROWS UNBOUNDED PRECEDING ) end_date
FROM
(
SELECT *, CASE WHEN price = next_price THEN NULL ELSE DATE END AS abc
FROM
(
SELECT a.* , b.[date] AS next_date, b.price AS next_price
FROM #table1 a
LEFT JOIN #table1 b
ON a.[date] = b.[date]-1
)AA
)BB
)CC
GROUP BY price, end_date

Query to get the sum of values for the maximum number of months

This query in Oracle 11 gets the sum of value for the last year, and it works when there is a full year of data.
When there is less than a year of data, the query returns 0 instead of the sum of values back to the oldest available period.
For example, if there are only 6 months of data, the query should return the sum of values until the 6th month.
SELECT SUM (DECODE (rnk, 11, rt, 0)) 1Y
FROM (SELECT entity_id,rnk,
SUM (ABS(NVL (value, 0))) OVER (PARTITION BY TRIM (entity_id) ORDER BY rnk) rt
FROM (SELECT psm.*,RANK () OVER (PARTITION BY entity_id ORDER BY period_end_date DESC) AS rnk
FROM myTable psm
WHERE psm.entity_id = '1'
ORDER BY period_end_date DESC
) rank_tab
WHERE rnk < 12
);
If the biggest rank is 6, the result from the above query is 0
I attempted this, but got the error "ORA-00978: nested group function without GROUP BY"
SELECT case when rnk < 11
then SUM (DECODE (rnk, Max(rnk), rt, 0))
else SUM (DECODE (rnk, 11, rt, 0))
end as Y
FROM (SELECT entity_id,rnk,
SUM (ABS(NVL (value, 0))) OVER (PARTITION BY TRIM (entity_id) ORDER BY rnk) rt
FROM (SELECT psm.*,RANK () OVER (PARTITION BY entity_id ORDER BY period_end_date DESC) AS rnk
FROM myTable psm
WHERE psm.entity_id = '1'
ORDER BY period_end_date DESC
) rank_tab
WHERE rnk < 12
);
Sample data:
entity_id value period_end_date
1 1 9/30/19
1 2 8/31/19
1 3 7/31/19
1 4 6/30/19
1 5 5/31/19
1 6 4/30/19
In the above example, 1Y should return 1+2+3+4+5+6 = 21.
Instead my query returns 0 because it is looking for rnk = 11, which doesn't exist.
SUM (DECODE (rnk, 11, rt, 0)) 1Y
Thank you.
EDIT:
This works. But, if you know of a better way to do it, please let me know. Thank you.
SELECT
CASE WHEN MRank < 11 then maxY else OneY end as lc_incearned_1Y
FROM (
WITH R as
(SELECT MAX(RNK) MaxRank FROM (
SELECT RANK () OVER (PARTITION BY TRIM (entity_id) ORDER BY period_end_date
DESC) AS rnk FROM myTbl psm
WHERE TRIM (psm.entity_id) = '1' AND period_end_date <
to_date('9/30/2019','MM/DD/YYYY')
ORDER BY period_end_date DESC))
select MAX(MaxRank) MRank,
SUM (DECODE (rnk, MaxRank, rt, 0)) maxY,
SUM (DECODE (rnk, 11, rt, 0)) OneY --13051.97
FROM (SELECT entity_id,rnk,
SUM (ABS (NVL (value, 0))) OVER (PARTITION BY TRIM (entity_id) ORDER BY rnk) rt
FROM (SELECT psm.*,RANK () OVER (PARTITION BY TRIM (entity_id) ORDER BY period_end_date DESC) AS rnk FROM CREF.PORTFOLIO_SUMM_MTHEND psm
WHERE TRIM (psm.entity_id) = '1' AND period_end_date < to_date('9/30/2019','MM/DD/YYYY')
ORDER BY period_end_date DESC) rank_tab WHERE rnk < 12) T,R)
It seems you need to sum your values from the latest period_end_date back to the earliest date within an eleven-month range. The max(period_end_date) over (partition by entity_id order by period_end_date desc) analytic function is suitable for this, alongside your current rank() function; then filter on months_between(<max_period_end_date>, period_end_date). If you need to look back from the current date instead, drop the max() analytic function and replace <max_period_end_date> with trunc(sysdate) in months_between(). So, use:
with t as
(
select max(period_end_date) over (partition by entity_id order by period_end_date desc) as mx,
rank() over (partition by entity_id order by period_end_date desc) as rnk,
t.*
from myTable t
)
select sum(nvl(value,0)) as sum_value
from t
where months_between(mx,period_end_date)<=11
Demo
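The filtering idea in the query above (keep every row within 11 months of the latest date, however many rows exist) can be sketched in plain Python. This is an approximation under a stated assumption: the gap is counted in whole calendar months, which agrees with Oracle's MONTHS_BETWEEN for month-end dates like the sample but not for arbitrary days. The function name is hypothetical.

```python
from datetime import date

def sum_last_months(rows, months=11):
    """rows: list of (period_end_date, value). None values count as 0.

    Sums every value whose period_end_date lies within `months` calendar
    months of the latest period_end_date in the data.
    """
    if not rows:
        return 0
    latest = max(d for d, _ in rows)

    def month_gap(d):
        # Whole calendar months between d and the latest date.
        return (latest.year - d.year) * 12 + (latest.month - d.month)

    return sum(v or 0 for d, v in rows if month_gap(d) <= months)

# Sample data from the question (only 6 months available).
SIX_MONTHS = [(date(2019, 9, 30), 1), (date(2019, 8, 31), 2), (date(2019, 7, 31), 3),
              (date(2019, 6, 30), 4), (date(2019, 5, 31), 5), (date(2019, 4, 30), 6)]
```

Because the filter is a date range rather than a fixed rank, six months of data simply contribute six rows to the sum, and rows older than the window are ignored.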

Group by in columns and rows, counts and percentages per day

I have a table that has data like following.
attr |time
----------------|--------------------------
abc |2018-08-06 10:17:25.282546
def |2018-08-06 10:17:25.325676
pqr |2018-08-05 10:17:25.366823
abc |2018-08-06 10:17:25.407941
def |2018-08-05 10:17:25.449249
I want to group the rows by the attr column (one row per attr) and add columns showing the count and percentage for each day, as shown below.
attr |day1_count| day1_%| day2_count| day2_%
----------------|----------|-------|-----------|-------
abc |2 |66.6% | 0 | 0.0%
def |1 |33.3% | 1 | 50.0%
pqr |0 |0.0% | 1 | 50.0%
I'm able to display one count using GROUP BY, but I can't work out how to separate the counts into multiple columns. I tried to generate the day1 percentage with:
SELECT attr, count(attr), count(attr) / sum(sub.day1_count) * 100 as percentage from (
SELECT attr, count(*) as day1_count FROM my_table WHERE DATEPART(week, time) = DATEPART(day, GETDate()) GROUP BY attr) as sub
GROUP BY attr;
But this does not give the correct answer either: I get all zeroes for the percentage and 1 for the count. Any help is appreciated. I'm trying to do this in Redshift, which follows PostgreSQL syntax.
Let's nail the logic before presenting:
with CTE1 as
(
select attr, DATEPART(day, time) as theday, count(*) as thecount
from MyTable
group by attr, DATEPART(day, time)
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
select t1.attr, t1.theday, t1.thecount, t1.thecount::float/t2.daytotal as percentofday -- cast avoids integer division
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
From here you can pivot to get one column per day, if you feel the need.
This builds on johnHC's query. If you need all 7 days, you have to add those days as further CASE WHEN branches.
with CTE1 as
(
select attr, time::date as theday, count(*) as thecount
from t group by attr,time::date
)
, CTE2 as
(
select theday, sum(thecount) as daytotal
from CTE1
group by theday
)
,
CTE3 as
(
select t1.attr, EXTRACT(DOW FROM t1.theday) as day_nmbr,t1.theday, t1.thecount, t1.thecount::float/t2.daytotal as percentofday
from CTE1 t1
inner join CTE2 t2
on t1.theday = t2.theday
)
select CTE3.attr,
max(case when day_nmbr=0 then CTE3.thecount end) as day1Cnt,
max(case when day_nmbr=0 then percentofday end) as day1,
max(case when day_nmbr=1 then CTE3.thecount end) as day2Cnt,
max( case when day_nmbr=1 then percentofday end) day2
from CTE3 group by CTE3.attr
http://sqlfiddle.com/#!17/54ace/20
In case that you have only 2 days:
http://sqlfiddle.com/#!17/3bdad/3 (days descending as in your example from left to right)
http://sqlfiddle.com/#!17/3bdad/5 (days ascending)
The main idea is already mentioned in the other answers. Instead of joining the CTEs for calculating the values I am using window functions which is a bit shorter and more readable I think. The pivot is done the same way.
SELECT
attr,
COALESCE(max(count) FILTER (WHERE day_number = 0), 0) as day1_count, -- D
COALESCE(max(percent) FILTER (WHERE day_number = 0), 0) as day1_percent,
COALESCE(max(count) FILTER (WHERE day_number = 1), 0) as day2_count,
COALESCE(max(percent) FILTER (WHERE day_number = 1), 0) as day2_percent
/*
Add more days here
*/
FROM(
SELECT *, (count::float/count_per_day)::decimal(5, 2) as percent -- C
FROM (
SELECT DISTINCT
attr,
MAX(time::date) OVER () - time::date as day_number, -- B
count(*) OVER (partition by time::date, attr) as count, -- A
count(*) OVER (partition by time::date) as count_per_day
FROM test_table
)s
)s
GROUP BY attr
ORDER BY attr
A: counts the rows per day and the rows per day and attr.
B: for readability, the date is converted into a number: the difference between the maximum date in the table and the date of the row. This gives a counter from 0 (the most recent day) up to n - 1 (the oldest day).
C: calculates the percentage and rounds it.
D: pivots by filtering on the day numbers. COALESCE replaces the NULL values with 0. To add more days, repeat these columns with the next day numbers.
Edit: Made the day counter more flexible for more days; new SQL Fiddle
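Steps A to D can also be followed in a short Python sketch, which may help when checking the expected numbers by hand. It is only an illustration of the logic, not a Redshift solution; the function name and the (attr, date) tuple layout are assumptions, and percentages are scaled to 0-100 as in the question's expected output.

```python
from collections import Counter
from datetime import date

def daily_share(rows, days=2):
    """rows: list of (attr, day). Return {attr: [(count, percent), ...]},
    where list index 0 is the most recent day in the data."""
    latest = max(d for _, d in rows)
    # A + B: counts per (day_number, attr) and per day_number,
    # with day_number = latest date - row date.
    per_day_attr = Counter(((latest - d).days, attr) for attr, d in rows)
    per_day = Counter((latest - d).days for _, d in rows)
    result = {}
    for attr in sorted({a for a, _ in rows}):
        cells = []
        for n in range(days):
            count = per_day_attr.get((n, attr), 0)   # D: missing cell becomes 0
            # C: percentage of that day's rows, rounded to one decimal
            percent = round(100.0 * count / per_day[n], 1) if per_day[n] else 0.0
            cells.append((count, percent))
        result[attr] = cells
    return result

# Sample data from the question, reduced to dates.
EVENTS = [("abc", date(2018, 8, 6)), ("def", date(2018, 8, 6)),
          ("pqr", date(2018, 8, 5)), ("abc", date(2018, 8, 6)),
          ("def", date(2018, 8, 5))]
```

Adding more day columns corresponds to raising the days argument, just as the SQL version repeats the FILTER columns for further day numbers.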
Basically, I see this as conditional aggregation. But you need to get an enumerator for the date for the pivoting. So:
SELECT attr,
COUNT(*) FILTER (WHERE day_number = 1) as day1_count,
COUNT(*) FILTER (WHERE day_number = 1) / cnt as day1_percent,
COUNT(*) FILTER (WHERE day_number = 2) as day2_count,
COUNT(*) FILTER (WHERE day_number = 2) / cnt as day2_percent
FROM (SELECT attr,
DENSE_RANK() OVER (ORDER BY time::date DESC) as day_number,
1.0 * COUNT(*) OVER (PARTITION BY attr) as cnt
FROM test_table
) s
GROUP BY attr, cnt
ORDER BY attr;
Here is a SQL Fiddle.

How to get stockBalance of every end of the month

I have four columns like this:
Date Customer InvoiceNo StockBalance
11/29/2017 A IN000414 5000
11/30/2017 B IN000415 4000
12/27/2017 A IN000416 3500
12/30/2017 B IN000417 2000
I want to get the StockBalance at the end of every month. I need the output as:
11/30/2017 B IN000415 4000
12/30/2017 B IN000417 2000
How can I get this? Could anybody guide me?
You can use the row_number() function:
select t.*
from (select *, row_number() over (partition by year(date), month(date) order by date desc) seq
from table
) t
where seq = 1;
EDIT : You want apply :
select t.*
from table t cross apply
( select top (1) t1.*
from table t1
where t1.Customer = t.Customer and
EOMONTH(t1.Dat) = t.Dat
order by t1.Dat desc
) t1;
Use row_number(), but be sure you include the year and month in the calculation:
select t.*
from (select t.*,
row_number() over (partition by year(date), month(date)
order by date desc
) as seqnum
from t
) t
where seqnum = 1;
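The partition by year and month, then keep the latest row, can be sketched in Python as follows. The tuple layout (date, customer, invoice_no, stock_balance) and the function name are assumptions made for this example.

```python
from datetime import date

def month_end_rows(rows):
    """rows: list of (date, customer, invoice_no, stock_balance).

    Keep, for each (year, month), the row with the latest date; this is what
    row_number() ... order by date desc with seqnum = 1 selects.
    """
    latest = {}
    for row in rows:
        key = (row[0].year, row[0].month)   # partition by year and month
        if key not in latest or row[0] > latest[key][0]:
            latest[key] = row
    return [latest[key] for key in sorted(latest)]

# Sample data from the question.
STOCK = [(date(2017, 11, 29), "A", "IN000414", 5000),
         (date(2017, 11, 30), "B", "IN000415", 4000),
         (date(2017, 12, 27), "A", "IN000416", 3500),
         (date(2017, 12, 30), "B", "IN000417", 2000)]
```

Including the year in the key matters as soon as the data spans more than twelve months; keyed on the month alone, December 2017 and December 2018 would compete for the same slot.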