logic for subtracting and getting variance in oracle - sql

how to write a logic for getting the below output
with b2 as
(
select COUNT(SaLE) As cnt,
TO_CHAR(date,'YYYY-MON') As Period
from Order
where date between date '2020-02-01' and date '2020-02-28'
group by TO_CHAR(BATCH_DTE_CYMD,'YYYY-MON')
union all
select COUNT(Sale) As cnt,
TO_CHAR(Date,'YYYY-MON') As Period
from Order
where date between date '2020-01-01' and date '2020-01-31'
group by TO_CHAR(Date,'YYYY-MON')
)
select cnt, Period,
100*(cnt-lag(cnt,1,cnt) over (order by period))
/lag(cnt,1,cnt) over (order by period)
as "variance(%)"
from b2
order by period
i am getting this ouput
Cnt | period | variance(%)
11917 | 2020-FEB | 0
11707 | 2020-JAN | -1.76218847025258034740286984979441134514
but i want the this output
Cnt | period | variance(%) | sign
11917 | 2020-FEB | JAN-FEB (Variance we get in feb in % (with no decimal)) | Increase/decrease
11707 | 2020-JAN | 0 | 0

The issue with your code is, You are using LAG on the CHAR column(PERIOD) which is not correct as 2020-FEB is lower than the 2020-JAN when the comparison is in the string.
You must use them as the date in LAG function as following:
WITH B2 AS (
SELECT COUNT(SALE) AS CNT,
TRUNC(DATE, 'MON') AS PERIOD
FROM ORDER
WHERE DATE BETWEEN DATE '2020-01-01' AND DATE '2020-02-28'
GROUP BY TRUNC(DATE, 'MON')
)
SELECT
CNT,
TO_CHAR(PERIOD,'YYYY-MON') AS PERIOD,
100 * ( CNT - LAG(CNT, 1, CNT) OVER( ORDER BY PERIOD ) )
/ LAG(CNT, 1, CNT) OVER(ORDER BY PERIOD)
AS "variance(%)"
FROM B2
ORDER BY PERIOD
Cheers!!

Try below query:
select cnt,period,
case when variance<>0 then
'JAN-FEB(Variance we get in feb in '||to_char(round(variance,0))||')'
else to_char(variance) end as variance,
case when SIGN (variance)<0 then 'Decrease'
when sign(variance)=0 then 'No Change' else 'Increase'
end as sign from(select cnt, Period,
100*(cnt-lag(cnt,1,cnt) over (order by period desc ))
/lag(cnt,1,cnt) over (order by period desc)
as variance
from test
)t
order by period
I have used round function which will convert -1.7.. to -2. you can also use Ceil

Related

Minimum and maximum dates within continuous date range grouped by name

I have a data ranges with start and end date for a persons, I want to get the continuous date ranges only per persons:
Input:
NAME | STARTDATE | END DATE
--------------------------------------
MIKE | **2019-05-15** | 2019-05-16
MIKE | 2019-05-17 | **2019-05-18**
MIKE | 2020-05-18 | 2020-05-19
Expected output like:
MIKE | **2019-05-15** | **2019-05-18**
MIKE | 2020-05-18 | 2020-05-19
So basically output is MIN and MAX for each continuous period for the person.
Appreciate any help.
I have tried the below query:
With N AS ( SELECT Name, StartDate, EndDate
, LastStop = MAX(EndDate)
OVER (PARTITION BY Name ORDER BY StartDate, EndDate
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) FROM Table ), B AS ( SELECT Name, StartDate, EndDate
, Block = SUM(CASE WHEN LastStop Is Null Then 1
WHEN LastStop < StartDate Then 1
ELSE 0
END)
OVER (PARTITION BY Name ORDER BY StartDate, LastStop) FROM N ) SELECT Name
, MIN(StartDate) DateFrom
, MAX(EndDate) DateTo FROM B GROUP BY Name, Block ORDER BY Name, Block
But its not considering the continuous period. It's showing the same input.
This is a type of gap-and-islands problem. There is no need to expand the data out by day! That seems very inefficient.
Instead, determine the "islands". This is where there is no overlap -- in your case lag() is sufficient. Then a cumulative sum and aggregation:
select name, min(startdate), max(enddate)
from (select t.*,
sum(case when prev_enddate >= dateadd(day, -1, startdate) then 0 else 1 end) over
(partition by name order by startdate) as grp
from (select t.*,
lag(enddate) over (partition by name order by startdate) as prev_enddate
from t
) t
) t
group by name, grp;
Here is a db<>fiddle.
Here is an example using an ad-hoc tally table
Example or dbFiddle
;with cte as (
Select A.[Name]
,B.D
,Grp = datediff(day,'1900-01-01',D) - dense_rank() over (partition by [Name] Order by D)
From YourTable A
Cross Apply (
Select Top (DateDiff(DAY,StartDate,EndDate)+1) D=DateAdd(DAY,-1+Row_Number() Over (Order By (Select Null)),StartDate)
From master..spt_values n1,master..spt_values n2
) B
)
Select [Name]
,StartDate= min(D)
,EndDate = max(D)
From cte
Group By [Name],Grp
Returns
Name StartDate EndDate
MIKE 2019-05-15 2019-05-18
MIKE 2020-05-18 2020-05-19
Just to help with the Visualization, the CTE generates the following
This will give you the same result
SELECT subquery.name,min(subquery.startdate),max(subquery.enddate1)
FROM (SELECT NAME,startdate,
CASE WHEN EXISTS(SELECT yt1.startdate
FROM t yt1
WHERE yt1.startdate = DATEADD(day, 1, yt2.enddate)
) THEN null else yt2.enddate END as enddate1
FROM t yt2) as subquery
GROUP by NAME, CAST(MONTH(subquery.startdate) AS VARCHAR(2)) + '-' + CAST(YEAR(subquery.startdate) AS VARCHAR(4))
For the CASE WHEN EXISTS I refered to SQL CASE
For the group by month and year you can see this GROUP BY MONTH AND YEAR
DB_FIDDLE

SQL - unique users who are visiting for the first time

Given following table visitorLog, write a SQL to find the following by date.
Total_Visitors
VisitorGain - compare to previous day
VisitorLoss - compare to previous day
Total_New_Visitors - unique users who are visiting for the first time
visitorLog :
*----------------------*
| Date Visitor |
*----------------------*
| 01-Jan-2011 V1 |
| 01-Jan-2011 V2 |
| 01-Jan-2011 V3 |
| 02-Jan-2011 V2 |
| 03-Jan-2011 V2 |
| 03-Jan-2011 V4 |
| 03-Jan-2011 V5 |
*----------------------*
Expected output:
*---------------------------------------------------------------------*
| Date Total_Visitors VisitorGain VisitorLoss Total_New_Visitors |
*---------------------------------------------------------------------*
| 01-Jan-2011 3 3 0 3 |
| 02-Jan-2011 1 0 2 0 |
| 03-Jan-2011 3 2 0 2 |
*---------------------------------------------------------------------*
Here is my SQL and SLQ fiddle.
with cte as
(
select
date,
total_visitors,
lag(total_visitors) over (order by date) as prev_visitors,
row_number() over (order by date ) as rnk
from
(
select
*,
count(visitor) over (partition by date) as total_visitors
from visitorLog
) val
group by
date,
total_visitors
),
cte2 as
(
select
date,
sum(case when rnk = 1 then 1 else 0 end) as total_new_visitors
from
(
select
date,
visitor,
row_number() over (partition BY visitor order by date) as rnk
from visitorLog
) t
group by
date
)
select
c.date,
sum(total_visitors) as total_visitors,
sum(
case
when rnk = 1 then total_visitors
when (rnk > 1 and prev_visitors < total_visitors) then (total_visitors - prev_visitors)
else
0
end
)visitorGain,
sum(
case
when rnk = 1 then 0
when prev_visitors > total_visitors then (prev_visitors - total_visitors)
else
0
end
) as visitorLoss,
sum(total_new_visitors) as total_new_visitors
from cte c
join cte2 c2
on c.date = c2.date
group by
c.date
order by
c.date
My solution is working as expected but I am wondering if I am missing any any edge cases here which may break my logic. any help would be great.
This logic does what you want:
select date, count(*) as num_visitor,
greatest(count(*) - lag(count(*)::int, 1, 0) over (order by date), 0) as visitor_gain,
greatest(lag(count(*)::int, 1, 0) over (order by date) - count(*), 0) as visitor_loss,
count(*) filter (where seqnum = 1) as num_new_visitors
from (select vl.*,
row_number() over (partition by visitor order by date) as seqnum
from visitorLog vl
) vl
group by date
order by date
Here is a db<>fiddle.
I would use window functions and aggregation:
select
date,
count(*) no_visitor,
count(*) - lag(count(*), 1, 0) over(partition by date) no_visitor_diff,
count(*) filter(where rn = 1) no_new_visitors
from (
select t.*, row_number() over(partition by visitor order by date) rn
from visitorLog
) t
group by date
order by date
The subquery ranks the visits of each customer using row_number() (the first visit of each customer gets row number 1). Then, the outer query aggregates by date, and uses lag() to get the visitor count of the "previous" day.
I don't really see the point to have two distinct columns for the difference of visitors compared to the last day, so this gives you a single column, with a value that's either positive or negative depending whether customers were gained or lost.
If you really want two columns, then:
greatest(count(*) - lag(count(*), 1, 0) over(partition by date), 0) visitor_gain,
- least(count(*) - lag(count(*), 1, 0) over(partition by date), 0) visitor_loss

Get difference for each month data

I am having a table like below. I want to get a difference for each month (Subtract the value of last day of the month by the first date of the month).
========================
Date | type| value
------------------------
01-01-2019 | A | 10
01-02-2019 | A | 30
31-01-2019 | A | 40
Expected output
A -> 30
You can use row_number():
select type, year(date), month(date),
(max(case when seqnum_desc = 1 then value end) -
max(case when seqnum_asc = 1 then value end)
) as month_diff
from (select t.*,
row_number() over (partition by type, year(date), month(date) order by date) as seqnum_asc,
row_number() over (partition by type, year(date), month(date) order by date desc) as seqnum_desc
from t
) t
group by type, year(date), month(date);
If you know what the values are increasing (or non-decreasing), then you can use aggregation:
select type, year(date), month(date), max(value) - min(value)
from t
group by type, year(date), month(date);
try this query:
select diff
from
(
select abs(value-
(select value
from yourtable as a2
where a2.date=EOMONTH(a1.date))) diff
from yourtable as a1
)
where diff is not null

SQL partition by on date range

Assume this is my table:
ID NUMBER DATE
------------------------
1 45 2018-01-01
2 45 2018-01-02
2 45 2018-01-27
I need to separate using partition by and row_number where the difference between one date and another is greater than 5 days. Something like this would be the result of the above example:
ROWNUMBER ID NUMBER DATE
-----------------------------
1 1 45 2018-01-01
2 2 45 2018-01-02
1 3 45 2018-01-27
My actual query is something like this:
SELECT ROW_NUMBER() OVER(PARTITION BY NUMBER ODER BY ID DESC) AS ROWNUMBER, ...
But as you can notice, it doesn't work for the dates. How can I achieve that?
You can use lag function :
select *, row_number() over (partition by number, grp order by id) as [ROWNUMBER]
from (select *, (case when datediff(day, lag(date,1,date) over (partition by number order by id), date) <= 1
then 1 else 2
end) as grp
from table
) t;
by using lag and datediff funtion
select * from
(
select t.*,
datediff(day,
lag(DATE) over (partition by NUMBER order by id),
DATE
) as diff
from t
) as TT where diff>5
http://sqlfiddle.com/#!18/130ae/11
I think you want to identify the groups, using lag() and datediff() and a cumulative sum. Then use row_number():
select t.*,
row_number() over (partition by number, grp order by date) as rownumber
from (select t.*,
sum(grp_start) over (partition by number order by date) as grp
from (select t.*,
(case when lag(date) over (partition by number order by date) < dateadd(day, 5, date)
then 1 else 0
end) as grp_start
from t
) t
) t;

SQL Server - find absence date occurrences [duplicate]

This question already has an answer here:
SQL: Gaps and Islands, Grouped dates
(1 answer)
Closed 5 years ago.
I have the following dataset:
enter image description here
Here is script for this data:
;with dataset AS (
select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-07' AS DATE) AS CUT_DATE
UNION
select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-15' AS DATE) AS CUT_DATE
UNION
select 'EMP02' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-09' AS DATE) AS CUT_DATE
)
select *
from dataset
I need to divide these periods (PERIOD_START and PERIOD_END) by CUT_DATE (exclude cut dates from that periods) The number of cut dates could be any (3,5,8 etc).
Expecting result for the dataset above is:
If your version of SQL Server supports LAG, you can use this.
SELECT EMPLOYEE_ID,
ITEM_TYPE,
MIN(APPLY_DATE) AS STARTDATE,
MAX(APPLY_DATE) AS ENDDATE
FROM
(SELECT T.*,
SUM(CASE WHEN PREV_TYPE=ITEM_TYPE THEN 0 ELSE 1 END)
OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS GRP
FROM (SELECT D.*,
LAG(ITEM_TYPE) OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS PREV_TYPE
FROM DATA D
) T
) T
WHERE ITEM_TYPE IN ('Sickness','Vacation')
GROUP BY EMPLOYEE_ID,ITEM_TYPE,GRP
The logic is to get the previous row's item_type (based on ascending order of apply_date) and compare it with the current row's value. If they are equal, they belong to the same group. Else you start a new group. This is done in the sum window function. After groups are assigned, you just need to get the max and min date for an employee_id,item_type.
Sample Demo
You would use the LAG function.
If you order by something, the LAG function gives the previous value;
a full description can be found at: http://www.sqlservercentral.com/articles/T-SQL/106783/
Take a look at vkp's answer for a full query
This is another way if way if lag is supported.
Rextester Sample
with tbl as
(select d.*
,case when (item_type = lag(item_type) over (partition by employee_id order by apply_date))
then 0
else 1
end grp_tmp
from DATA2 d
where
item_type <> 'Worked'
)
,tbl2 as
(select t.*
,sum(grp_tmp) over (order by employee_id,apply_date
rows between unbounded preceding and current row
)
as grp
from tbl t
)
select
EMPLOYEE_ID
,ITEM_TYPE
,(CONVERT(VARCHAR(24),min(apply_date),103)
+' - '
+CONVERT(VARCHAR(24),max(apply_date),103)
) as range
from tbl2
group by EMPLOYEE_ID,
ITEM_TYPE
,grp
order by
employee_id
,min(apply_date);
Output
+-------------+-----------+-------------------------+
| EMPLOYEE_ID | ITEM_TYPE | range |
+-------------+-----------+-------------------------+
| 1 | Sickness | 23/05/2017 - 24/05/2017 |
| 1 | Vacation | 26/05/2017 - 29/05/2017 |
| 1 | Sickness | 01/06/2017 - 01/06/2017 |
| 2 | Sickness | 25/05/2017 - 30/05/2017 |
+-------------+-----------+-------------------------+