How to get calculations from two rows - sql

I started learning SQL recently and would like to know whether it is possible to do the calculations below.
Basically, my table looks like this:
id  Date        Fill_nbr
1   01/01/2015  30
1   02/05/2015  30
1   03/02/2015  30
1   07/01/2015  30
1   07/26/2015  30
2   03/01/2015  30
....
And I'd like to create a table like this:
id  Date        Fill_nbr  Date_last   Gap  First
1   01/01/2015  30        01/30/2015  0    1
1   02/05/2015  30        03/06/2015  5    0
1   03/02/2015  30        03/31/2015  0    0
1   07/01/2015  30        07/30/2015  91   1
1   07/26/2015  30        08/24/2015  0    0
2   03/01/2015  30        03/30/2015  0    1
....
The rule for column 'Date_last' is Date_last = Date + Fill_nbr - 1 (the fill date itself counts as day 1, so 01/01/2015 + 30 - 1 = 01/30/2015), which is easy to get.
The difficult part for me is the 'Gap' column. The rules are:
Gap = Date - the previous record's Date_last - 1 (the number of uncovered days in between).
For example, the gap for the second row is 02/05/2015 - 01/30/2015 - 1 = 5;
Gap = 0 for everyone's first record or when the calculated gap < 0;
The rule for column 'First':
First=1 for everyone's first record OR when gap>60.
Otherwise, First=0;
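To check these rules against the sample table, here is a minimal plain-Python sketch (not SQL) of the same row-by-row logic. It assumes Date_last = Date + Fill_nbr - 1 and Gap = Date - previous Date_last - 1, because those are the formulas that reproduce the sample values. The function name `add_gap_columns` is made up for illustration.

```python
from datetime import date, timedelta

def add_gap_columns(rows):
    """rows: (id, fill_date, fill_nbr) tuples, sorted by id then fill_date.

    Returns (id, fill_date, fill_nbr, date_last, gap, first) tuples.
    """
    out = []
    prev_id, prev_last = None, None
    for rid, d, fill in rows:
        date_last = d + timedelta(days=fill - 1)  # the fill date counts as day 1
        if rid != prev_id:
            gap, first = 0, 1  # first record for this id
        else:
            # uncovered days after the previous fill ran out, never negative
            gap = max((d - prev_last).days - 1, 0)
            first = 1 if gap > 60 else 0
        out.append((rid, d, fill, date_last, gap, first))
        prev_id, prev_last = rid, date_last  # what LAG(...) would supply in SQL
    return out

sample = [
    (1, date(2015, 1, 1), 30),
    (1, date(2015, 2, 5), 30),
    (1, date(2015, 3, 2), 30),
    (1, date(2015, 7, 1), 30),
    (1, date(2015, 7, 26), 30),
    (2, date(2015, 3, 1), 30),
]
for row in add_gap_columns(sample):
    print(row)
```

The key step is carrying the previous row's Date_last within the same id. In SQL that is what LAG(...) OVER (PARTITION BY id ORDER BY Date) provides.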

It looks like this question has been abandoned, since important details are still missing, but I thought it would be interesting to at least find a solution. The solution below works on SQL Server 2012 or later because it uses LAG.
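WITH Fills AS (
    SELECT
        id,
        [Date],
        Fill_nbr,
        DATEADD(DD, Fill_nbr - 1, [Date]) AS Date_last,
        -- Date_last of the previous fill for the same id (NULL for the first record)
        LAG(DATEADD(DD, Fill_nbr - 1, [Date])) OVER (PARTITION BY id ORDER BY [Date]) AS Prev_last
    FROM Records
)
SELECT
    id,
    [Date],
    Fill_nbr,
    Date_last,
    -- uncovered days since the previous fill ran out; 0 for the first record or an overlap
    CASE WHEN Prev_last IS NULL OR Prev_last >= [Date] THEN 0
         ELSE DATEDIFF(DD, Prev_last, [Date]) - 1 END AS Gap,
    CASE WHEN Prev_last IS NULL OR DATEDIFF(DD, Prev_last, [Date]) - 1 > 60 THEN 1 ELSE 0 END AS [First]
FROM Fills
ORDER BY id, [Date];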
SQL Fiddle: http://sqlfiddle.com/#!6/a9b68/8

Thanks, Jason! I figured out that the LAG or LEAD functions would work for this problem. Here is my solution, which is similar to yours. Thanks again for your input!
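select
    id,
    date,
    fill_nbr,
    date_last,
    -- uncovered days since the previous fill ran out; 0 for the first record or an overlap
    case when lagv is null or date - lagv - 1 < 0 then 0 else date - lagv - 1 end as gap,
    case when rk = 1 or date - lagv - 1 > 60 then 1 else 0 end as first
from (
    select
        id,
        date,
        fill_nbr,
        date + fill_nbr - 1 as date_last,
        -- previous record's date_last for the same id
        lag(date + fill_nbr - 1) over (partition by id order by date) as lagv,
        row_number() over (partition by id order by date) as rk
    from your_table  -- placeholder name; "table" itself is a reserved word
) t;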

Related

Find non-consecutive date ranges

I want to find whether any of the date ranges has a gap before it. Where a range is not consecutive with the previous one, the query should return the RowId of that range.
Table Name: Subscriptions
RowId  ClientId  Status  StartDate   EndDate
1      1         1       01/01/2022  02/01/2022
2      1         1       03/01/2022  04/01/2022
3      1         1       12/01/2022  15/01/2022
4      2         1       03/01/2022  06/01/2022
I want a SQL statement that finds the RowId of the non-consecutive ranges for each client, considering only rows with Status in (1, 3). Example result:
RowId
3
I want to solve the problem using SQL only.
Thanks.
One way you could do this is to use LAG (or LEAD) to identify gaps between neighbouring rows' date ranges, then use TOP (1) WITH TIES to return every row where the gap exceeds 1 day. (If no row has a gap, every row ties and all rows are returned.)
select top (1) with ties rowId
from t
where status in (1,3)
order by
    case when DateDiff(day, lag(enddate, 1, enddate)
                                over (partition by clientid order by startdate),
                       startdate) > 1
         then 0 else 1 end;
You can detect gaps with LAG() and mark them. Then it's easy to keep only the marked rows. For example:
select *
from (
    select *,
        case when dateadd(day, -1, StartDate) >
                  lag(EndDate) over (partition by ClientId order by StartDate)
             then 1 else 0 end as i
    from t
    where Status in (1, 3)
) x
where i = 1
Or, more simply:
select *
from (
    select *,
        lag(EndDate) over (partition by ClientId order by StartDate) as prev_end
    from t
    where Status in (1, 3)
) x
where dateadd(day, -1, StartDate) > prev_end
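The LAG-based idea from these answers can be sketched in plain Python to check it against the sample data. This is only an illustration. It assumes the dates are dd/mm/yyyy (as 15/01/2022 suggests), applies the Status filter before looking at the previous row, and treats a range as non-consecutive when it starts more than one day after the previous range for the same client ends. The helper names `parse` and `non_consecutive_row_ids` are hypothetical.

```python
from datetime import datetime
from itertools import groupby

def parse(d):
    # the sample dates look like dd/mm/yyyy (e.g. 15/01/2022)
    return datetime.strptime(d, "%d/%m/%Y").date()

def non_consecutive_row_ids(rows, statuses=(1, 3)):
    """rows: (row_id, client_id, status, start, end) with dd/mm/yyyy strings."""
    kept = [r for r in rows if r[2] in statuses]
    kept.sort(key=lambda r: (r[1], parse(r[3])))
    result = []
    for _, client_rows in groupby(kept, key=lambda r: r[1]):
        prev_end = None  # plays the role of LAG(EndDate) within one client
        for row_id, _, _, start, end in client_rows:
            if prev_end is not None and (parse(start) - prev_end).days > 1:
                result.append(row_id)
            prev_end = parse(end)
    return result

subscriptions = [
    (1, 1, 1, "01/01/2022", "02/01/2022"),
    (2, 1, 1, "03/01/2022", "04/01/2022"),
    (3, 1, 1, "12/01/2022", "15/01/2022"),
    (4, 2, 1, "03/01/2022", "06/01/2022"),
]
print(non_consecutive_row_ids(subscriptions))  # [3]
```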

Redshift SQL Window Function frame_clause with days

I am trying to perform a window function on a data set in Redshift, using days as the interval for the preceding rows.
Example data:
date ID score
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 3
3/5/2017 555 2
SQL window function for avg score from the last 3 scores:
select
    date,
    id,
    avg(score) over
        (partition by id order by date
         rows between 2 preceding and current row) as LAST_3_SCORES_AVG
from DATASET
Result:
date ID LAST_3_SCORES_AVG
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 2
3/5/2017 555 2
The problem is that I would like the average score over the last 3 DAYS (a moving average), not over the last three tests. I have gone over the Redshift and PostgreSQL documentation and can't seem to find any way of doing it.
Desired Result:
date ID 3_DAY_AVG
3/1/2017 123 1
3/1/2017 555 1
3/2/2017 123 1
3/3/2017 555 2
3/5/2017 555 2.5
Any direction would be appreciated.
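You can use lag() and explicitly calculate the average. This assumes each id has at most one row per day, so the 3-day window (the current day and the two days before it) holds at most the current row and the two rows before it: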
select t.*,
       (score +
        (case when lag(date, 1) over (partition by id order by date) >= date - interval '2 day'
              then lag(score, 1) over (partition by id order by date)
              else 0
         end) +
        (case when lag(date, 2) over (partition by id order by date) >= date - interval '2 day'
              then lag(score, 2) over (partition by id order by date)
              else 0
         end)
       ) * 1.0 /  -- * 1.0 avoids integer division when score is an integer column
       (1 +
        (case when lag(date, 1) over (partition by id order by date) >= date - interval '2 day'
              then 1 else 0
         end) +
        (case when lag(date, 2) over (partition by id order by date) >= date - interval '2 day'
              then 1 else 0
         end)
       ) as avg_3_day
from dataset t;
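Because the query above only inspects the two previous rows, it matches a true 3-day average only when each id has at most one row per day. Here is a plain-Python sketch of the same logic (not Redshift SQL). It makes that assumption explicit and reproduces the desired result on the sample data. The function name is illustrative.

```python
from datetime import date, timedelta

def three_day_avg_lag(rows):
    """rows: (day, id, score) tuples; assumes at most one row per id per day.

    Mirrors the lag(..., 1) / lag(..., 2) query: only the two previous rows
    for the same id are candidates for the 3-day window.
    """
    by_id = {}
    out = []
    for day, rid, score in sorted(rows, key=lambda r: (r[1], r[0])):
        history = by_id.setdefault(rid, [])
        window = [score]
        for prev_day, prev_score in history[-2:]:  # like lag(..., 1) and lag(..., 2)
            if prev_day >= day - timedelta(days=2):
                window.append(prev_score)
        out.append((day, rid, sum(window) / len(window)))  # true division, like * 1.0
        history.append((day, score))
    return out

dataset = [
    (date(2017, 3, 1), 123, 1),
    (date(2017, 3, 1), 555, 1),
    (date(2017, 3, 2), 123, 1),
    (date(2017, 3, 3), 555, 3),
    (date(2017, 3, 5), 555, 2),
]
for row in three_day_avg_lag(dataset):
    print(row)
```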
The following approach can be used instead of a RANGE window frame in many (possibly all) cases.
You can introduce an "expiry" record for each input record. The expiry record negates the original one, so when you aggregate all preceding records, only the ones in the desired range are counted.
AVG is a bit harder because it doesn't have a direct opposite, so we need to treat it as SUM/COUNT and negate both.
SELECT id, date, running_avg_score
FROM
(
    SELECT id, date, n,
           -- ORDER BY date, n puts expiry rows (n = -1) before real rows (n = 1) on the same date
           SUM(score) OVER (PARTITION BY id ORDER BY date, n ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) * 1.0
           / NULLIF(SUM(n) OVER (PARTITION BY id ORDER BY date, n ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW), 0) AS running_avg_score
    FROM
    (
        SELECT date, id, score, 1 AS n
        FROM DATASET
        UNION ALL
        -- expiry row: cancels the original score and count 3 days later
        SELECT DATEADD(DAY, 3, date), id, -1 * score, -1
        FROM DATASET
    ) u
) a
WHERE a.n = 1;
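The expiry idea can be checked with a small plain-Python sketch (not Redshift SQL). Every row emits a negated copy three days later, and a running SUM/COUNT over the sorted events gives the trailing 3-day average. The function name and the (day, id, score) tuple layout are illustrative assumptions.

```python
from datetime import date, timedelta
from itertools import groupby

def three_day_avg_expiry(rows, window_days=3):
    """rows: (day, id, score) tuples. Each row also gets an expiry copy
    window_days later that subtracts its score and count, so a running
    SUM/COUNT over all earlier events only covers the last window_days days."""
    events = []
    for day, rid, score in rows:
        events.append((rid, day, 1, score))  # real row, n = 1
        events.append((rid, day + timedelta(days=window_days), -1, -score))  # expiry, n = -1
    # sort by id, date, n: expiry rows come before real rows on the same date
    events.sort(key=lambda e: (e[0], e[1], e[2]))
    out = []
    for rid, group in groupby(events, key=lambda e: e[0]):
        running_sum = running_n = 0
        for _, day, n, score in group:
            running_sum += score
            running_n += n
            if n == 1:  # keep only real rows, like WHERE a.n = 1
                out.append((day, rid, running_sum / running_n))
    return out

dataset = [
    (date(2017, 3, 1), 123, 1),
    (date(2017, 3, 1), 555, 1),
    (date(2017, 3, 2), 123, 1),
    (date(2017, 3, 3), 555, 3),
    (date(2017, 3, 5), 555, 2),
]
for row in three_day_avg_expiry(dataset):
    print(row)
```

Sorting on n within a date matters. Without it, an expiry row could land after a real row on the same day and fail to cancel in time, which is why the SQL above orders by date, n.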

Netezza Grouping by Week Start (Sunday) AND Month Start

I have a little bit of an unusual question. I'm using Python to write some data to a text file that I then use Tableau to read from and build visualizations. I'm grouping the query results by week in order to reduce the size of the output file. I think the SQL is pretty standard for that type of operation.
SELECT [Date] - EXTRACT(DOW FROM [Date]) + 1  -- gives the Sunday of the week for any date
However, I occasionally want to group by months rather than weeks, which is impossible with the current output. What I want is a modification to the query which will group by week EXCEPT when a week overlaps two months. If the week overlaps two months, it will split the results into the first part of the week which is in the first month, and then the second part of the week which is in the second month. That way, we could use the output to show weekly result OR monthly/quarterly/yearly results simply by grouping the dates within Tableau.
Has anyone tackled a problem like this before?
As an illustration, consider the following values.
2016-08-21 1
2016-08-22 1
2016-08-23 1
2016-08-24 1
2016-08-25 1
2016-08-26 1
2016-08-27 1
2016-08-28 1
2016-08-29 1
2016-08-30 1
2016-08-31 1
2016-09-01 1
2016-09-02 1
2016-09-03 1
2016-09-04 1
... ...
I would like the code to output the following values:
2016-08-21 7
2016-08-28 4
2016-09-01 3
2016-09-04 1...
Would really appreciate any help!
Based on the Netezza syntax I found by searching, this could work:
select
min([Date]) as MinDate, count(*) as TotalDays
from YourTable
group by
extract(year from [Date]),
extract(month from [Date]),
(case
when extract(dow from [Date]) = 1 -- dow 1 is sunday
then extract(week from [Date]) + 1 -- week starts on monday
else extract(week from [Date])
end);
Or, as suggested in the comments, group on the Sunday:
select
min([Date]) as MinDate, count(*) as TotalDays
from YourTable
group by
([Date] - (extract(dow from [Date]) - 1));
Here's the final code that I used.
CASE
WHEN EXTRACT(MONTH FROM [Date]) <> EXTRACT(MONTH FROM [Date] - EXTRACT(DOW FROM [Date]) + 1)
THEN DATE_TRUNC('month', [Date])
ELSE [Date] - EXTRACT(DOW FROM [Date]) + 1 END
Then I grouped on that field.
It checks whether the month of the date matches the month of its week start (the Sunday). If it doesn't, it returns the first day of the month; if it does, it returns the week start. This code returns the values in the example from the original post.
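The bucketing rule can be checked outside Netezza with a small plain-Python sketch. The helper name `bucket_start` is illustrative. Python's weekday() numbers Monday as 0 and Sunday as 6, so stepping back to Sunday is written differently from the EXTRACT(DOW ...) arithmetic. The rule is the same: use the Sunday week start unless it falls in an earlier month, in which case use the first of the month.

```python
from collections import Counter
from datetime import date, timedelta

def bucket_start(d):
    """Sunday-based week start, unless that Sunday is in an earlier month,
    in which case the first day of d's month is used instead."""
    # weekday(): Monday=0 ... Sunday=6, so this steps back to the previous Sunday
    week_start = d - timedelta(days=(d.weekday() + 1) % 7)
    if week_start.month != d.month:
        return d.replace(day=1)
    return week_start

days = [date(2016, 8, 21) + timedelta(days=i) for i in range(15)]  # 2016-08-21 .. 2016-09-04
counts = Counter(bucket_start(d) for d in days)
for start in sorted(counts):
    print(start, counts[start])
```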

SQL - How to count records for each status in one line per day?

I have a table Sales
Sales
--------
id
FormUpdated
TrackingStatus
There are several statuses, e.g. Complete, Incomplete, SaveforLater, ViewRates, etc.
I want my results in this form for the last 8 days (including today).
Expected Result:
Date Part of FormUpdated, Day of Week, Counts of ViewRates, Counts of Sales(complete), Counts of SaveForLater
--------------------------------------
2015-05-19 Tuesday 3 1 21
2015-05-18 Monday 12 5 10
2015-05-17 Sunday 6 1 8
2015-05-16 Saturday 5 3 7
2015-05-15 Friday 67 5 32
2015-05-14 Thursday 17 0 5
2015-05-13 Wednesday 22 0 9
2015-05-12 Tuesday 19 2 6
Here is my SQL query:
select datename(dw, FormUpdated), count(ID), TrackingStatus
from Sales
where FormUpdated <= GETDATE()
AND FormUpdated >= GetDate() - 8
group by datename(dw, FormUpdated), TrackingStatus
order by datename(dw, FormUpdated) desc
I don't know how to take the next step.
Update
I forgot to mention, I only need the Date part of the FormUpdated, not all parts.
You can use SUM(CASE WHEN TrackingStatus = 'SomeTrackingStatus' THEN 1 ELSE 0 END) to count each tracking status in its own column. Something like this (SQL Fiddle):
select
    CONVERT(DATE, FormUpdated) AS FormUpdated,
    DATENAME(dw, CONVERT(DATE, FormUpdated)) AS DayOfWeek,
    SUM(CASE WHEN TrackingStatus = 'ViewRates' THEN 1 ELSE 0 END) AS c_ViewRates,
    SUM(CASE WHEN TrackingStatus = 'Complete' THEN 1 ELSE 0 END) AS c_Complete,
    SUM(CASE WHEN TrackingStatus = 'SaveforLater' THEN 1 ELSE 0 END) AS c_SaveforLater
from Sales
where FormUpdated <= GETDATE()
    -- midnight 7 days ago, so the result covers 8 calendar days including today
    AND FormUpdated >= DATEADD(DAY, -7, CONVERT(DATE, GETDATE()))
group by CONVERT(DATE, FormUpdated)
order by CONVERT(DATE, FormUpdated) desc
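To see the conditional-aggregation pattern run end to end, here is a sketch using Python's built-in sqlite3 module. It is not SQL Server. SQLite's date() stands in for CONVERT(DATE, ...), the day-name column and date filter are left out, and the rows are invented test data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (id INTEGER, FormUpdated TEXT, TrackingStatus TEXT)")
# hypothetical rows, only to exercise the query
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?)",
    [
        (1, "2015-05-18 09:15:00", "ViewRates"),
        (2, "2015-05-18 10:00:00", "Complete"),
        (3, "2015-05-18 16:30:00", "ViewRates"),
        (4, "2015-05-19 08:45:00", "SaveforLater"),
        (5, "2015-05-19 11:20:00", "Incomplete"),
    ],
)

# One row per calendar day; each SUM(CASE ...) counts a single status.
query = """
SELECT date(FormUpdated) AS day,
       SUM(CASE WHEN TrackingStatus = 'ViewRates'    THEN 1 ELSE 0 END) AS c_ViewRates,
       SUM(CASE WHEN TrackingStatus = 'Complete'     THEN 1 ELSE 0 END) AS c_Complete,
       SUM(CASE WHEN TrackingStatus = 'SaveforLater' THEN 1 ELSE 0 END) AS c_SaveforLater
FROM Sales
GROUP BY date(FormUpdated)
ORDER BY day DESC
"""
rows = conn.execute(query).fetchall()
conn.close()
for row in rows:
    print(row)
```

Statuses without their own SUM(CASE ...) column, such as Incomplete here, still fall into a day's group but are not counted in any column.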
You can also use PIVOT to achieve this result. You just need to complete the list of TrackingStatus names in both the SELECT list and the FOR clause, and no GROUP BY is required:
WITH DatesOnly AS
(
SELECT Id, CAST(FormUpdated AS DATE) AS DateOnly, DATENAME(dw, FormUpdated) AS DayOfWeek, TrackingStatus
FROM Sales
)
SELECT DateOnly, DayOfWeek,
-- List of Pivoted Columns
[Complete],[Incomplete], [ViewRates], [SaveforLater]
FROM DatesOnly
PIVOT
(
COUNT(Id)
-- List of Pivoted columns
FOR TrackingStatus IN([Complete],[Incomplete], [ViewRates], [SaveforLater])
) pvt
WHERE DateOnly <= GETDATE() AND DateOnly >= GetDate() - 8
ORDER BY DateOnly DESC
SqlFiddle
Also, I think your ORDER BY is wrong: it should order by the date, not the day of the week (DATENAME returns text, so it sorts alphabetically).

I'd like to group by number of days (+ or -) and use the min date

ID Date Count
1, 2014-05-01 1
1, 2014-05-04 1
1, 2014-05-10 1
2, 2014-05-02 1
2, 2014-05-03 1
2, 2014-05-09 1
if I was to group where the time difference +/- 5 days, this would become
ID Date Count
1, 2014-05-01 2
1, 2014-05-10 1
2, 2014-05-02 2
2, 2014-05-09 1
Is this possible in SQL Server 2012? Any pointers would be greatly appreciated. Thanks
I think you want to start a new group when there is a gap of five or more days. So, if you had a record with (1, 2014-05-07), you would have only one group for id 1.
If so, the following will work:
select id, min(date) as date, sum([count]) as [count]
from (select t.*, sum(HasGap) over (partition by id order by date) as grpid
      from (select t.*,
                   (case when datediff(day,
                                       lag(date) over (partition by id order by date),
                                       date) < 5
                         then 0 else 1
                    end) as HasGap
            from YourTable t  -- placeholder name; "table" itself is a reserved word
           ) t
     ) t
group by id, grpid;
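The HasGap / running-SUM pattern ("gaps and islands") can be traced in a plain-Python sketch. It is not T-SQL, and the function name `group_by_gaps` is illustrative. A new group starts at each id's first row and whenever the previous row is 5 or more days earlier. Each group reports its minimum date and summed count.

```python
from datetime import date
from itertools import groupby

def group_by_gaps(rows, max_gap_days=5):
    """rows: (id, day, count) tuples. Returns (id, min_day, total_count) per group."""
    out = []
    rows = sorted(rows, key=lambda r: (r[0], r[1]))
    for rid, id_rows in groupby(rows, key=lambda r: r[0]):
        prev_day = None
        for _, day, count in id_rows:
            # HasGap = 1 for the first row (no LAG value) or a gap of max_gap_days or more
            has_gap = prev_day is None or (day - prev_day).days >= max_gap_days
            if has_gap:
                out.append([rid, day, 0])  # new group; its min date is this day
            out[-1][2] += count
            prev_day = day
    return [tuple(g) for g in out]

sample = [
    (1, date(2014, 5, 1), 1),
    (1, date(2014, 5, 4), 1),
    (1, date(2014, 5, 10), 1),
    (2, date(2014, 5, 2), 1),
    (2, date(2014, 5, 3), 1),
    (2, date(2014, 5, 9), 1),
]
for row in group_by_gaps(sample):
    print(row)
```

Accumulating HasGap as you go gives each row a group number, which is the role sum(HasGap) over (partition by id order by date) plays in the query.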