calculate sum based on value of other row in another column - sql

I am trying to figure out how to calculate the number of days on which the customer did not eat any candy.
Assume that the customer eats 1 candy per day.
If the customer purchases more candy, it is added to the existing stock.
E.g.
Day Candy Puchased
0 30
40 30
65 30
110 30
125 40
170 30
The answer here is 20.
On day 0 the customer bought 30 candies, and the next purchase was on day 40, so there was no candy to eat on days 30 through 39. In the same way, there was no candy on days 100 through 109.
Can anyone help me write the query? I think the logic in my query is wrong.
select sum(curr.candy_purchased-(nxt.day-curr.day)) as diff
from candies as curr
left join candies as nxt
on nxt.day=(select min(day) from candies where day > curr.day)

You need a recursive CTE.
First I need a row id, so I use ROW_NUMBER.
Then I need the base case for the recursion:
Day: how many days have passed since the previous purchase (the first row takes its value from the table, 0)
PrevD: the day of the current purchase, carried into the next step to compute Day (starts at 0)
Candy Puchased: how many candies were bought (30 in the first row)
Remaining: how many candies are in stock after eating and adding the new purchase (starts at the first purchase, 30)
NotEat: how many days the customer could not eat candy (starts at 0)
Level: recursion level (starts at 1)
Recursive case:
Day, PrevD and Candy Puchased are easy.
Remaining: if more days have passed than there were candies, the old stock drops to 0; the new purchase is then added.
NotEat: keep adding the difference whenever the candy runs out.
WITH Candy as (
SELECT
ROW_NUMBER() over (order by [Day]) as rn,
*
FROM Table1
), EatCandy ([Day], [PrevD], [Candy Puchased], [Remaining], [NotEat], [Level]) as (
SELECT [Day], 0 as [PrevD], [Candy Puchased], [Candy Puchased] as [Remaining], 0 as [NotEat], 1 as [Level]
FROM Candy
WHERE rn = 1
UNION ALL
SELECT c.[Day] - ec.[PrevD],
c.[Day],
c.[Candy Puchased],
c.[Candy Puchased] +
IIF((c.[Day] - ec.[PrevD]) > ec.[Remaining], 0, ec.[Remaining] - (c.[Day] - ec.[PrevD])),
ec.[NotEat] +
IIF((c.[Day] - ec.[PrevD]) > ec.[Remaining], (c.[Day] - ec.[PrevD]) - ec.[Remaining], 0),
ec.[Level] + 1
FROM Candy c
JOIN EatCandy ec
ON c.rn = ec.[level] + 1
)
select * from EatCandy
OUTPUT
| Day | PrevD | Candy Puchased | Remaining | NotEat | Level |
|-----|-------|----------------|-----------|--------|-------|
| 0 | 0 | 30 | 30 | 0 | 1 |
| 40 | 40 | 30 | 30 | 10 | 2 |
| 25 | 65 | 30 | 35 | 10 | 3 |
| 45 | 110 | 30 | 30 | 20 | 4 |
| 15 | 125 | 40 | 55 | 20 | 5 |
| 45 | 170 | 30 | 40 | 20 | 6 |
To get the final answer, run SELECT MAX(NotEat) FROM EatCandy instead of the last query; for this data it returns 20.
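To sanity-check the recursion outside the database, the same bookkeeping can be sketched as a plain loop. This is only an illustration in Python, not part of the original answer; the function name `candyless_days` is made up for this sketch, and the `remaining` and `not_eat` variables mirror the Remaining and NotEat columns above.

```python
def candyless_days(purchases):
    """Count days with no candy, eating one per day and carrying stock over.

    purchases: list of (day, candies_bought) tuples.
    """
    remaining = 0    # stock on hand right after the latest purchase
    not_eat = 0      # running count of days without candy
    prev_day = None
    for day, bought in sorted(purchases):
        if prev_day is not None:
            gap = day - prev_day            # days elapsed since the last purchase
            if gap > remaining:
                not_eat += gap - remaining  # stock ran out before this purchase
                remaining = 0
            else:
                remaining -= gap            # one candy eaten per elapsed day
        remaining += bought                 # new purchase joins the stock
        prev_day = day
    return not_eat


sample = [(0, 30), (40, 30), (65, 30), (110, 30), (125, 40), (170, 30)]
print(candyless_days(sample))  # 20
```

Each loop iteration corresponds to one recursion level of the CTE.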

Nice question.
Check my answer and try it with different sample data as well.
If it does not work with some other sample data, please let me know.
create table #t ([Day] int, CandyPuchased int)
insert into #t
values (0, 30),(40,30),(65, 30)
,(110, 30),(125,40),(170,30)
select * from #t
;With CTE as
(
select *,ROW_NUMBER()over(order by [day])rn from #t
)
,CTE1 as
(
select [day],[CandyPuchased],rn from CTE c where rn=1
union all
select a.[Day],case when a.Day-b.Day<b.CandyPuchased
then a.CandyPuchased+(b.CandyPuchased-(a.Day-b.Day))
else a.CandyPuchased end CandyPuchased
,a.rn from cte A
inner join CTE B on a.rn=b.rn+1
)
--select * from CTE1
select sum(case when a.Day-b.Day>b.CandyPuchased
then (a.Day-b.Day)-b.CandyPuchased else 0 end)[CandylessDays]
from CTE1 A
inner join CTE1 b on a.rn=b.rn+1

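If you just need the result at the end of the series, you don't really need that join. Note that this only works when no candy is left over at the last purchase; for the sample data, 10 candies are still unused when the customer buys again on day 170, so it returns 10 rather than 20.
select max(days) --The highest day in the table (convert these to int first)
- (sum(candies) --Total candies purchased
- (select top 1 candies from MyTable order by days desc)) --Minus the candies purchased on the last day
from MyTable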
If you need this as a sort of running total, try over:
select *, sum(candies) over (order by days) as TotalCandies
from MyTable
order by days desc
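As a rough check of the closed-form expression (highest day minus all candies bought before the last purchase), here is a small Python sketch. It is only an illustration; `closed_form` is a made-up name, and the second data set is a hypothetical case with no leftover stock.

```python
def closed_form(purchases):
    """max(day) - (total candies - candies bought on the last day)."""
    last_day, last_bought = max(purchases)      # row with the highest day
    total = sum(bought for _, bought in purchases)
    return last_day - (total - last_bought)


sample = [(0, 30), (40, 30), (65, 30), (110, 30), (125, 40), (170, 30)]
no_leftover = [(0, 30), (40, 30)]   # hypothetical: stock is empty by day 40

print(closed_form(sample))       # 10
print(closed_form(no_leftover))  # 10
```

For the hypothetical two-row series the result matches the day-by-day count, but for the question's data it gives 10 instead of the expected 20, because candy carried over between purchases is not accounted for.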

Related

Select Top 20 Distinct Rows in Each Category

I have a database table in the following format.
Product | Date | Score
A | 01/01/18 | 99
B | 01/01/18 | 98
C | 01/01/18 | 97
--------------------------
A | 02/01/18 | 99
B | 02/01/18 | 98
C | 02/01/18 | 97
--------------------------
D | 03/01/18 | 99
A | 03/01/18 | 98
B | 03/01/18 | 97
C | 03/01/18 | 96
I want to pick the first from every month such that there are no repeat products. For example, the output of the above table should be
Product | Date | Score
A | 01/01/18 | 99
B | 02/01/18 | 98
D | 03/01/18 | 99
How do I get this result with a single sql query? The actual table is much bigger than this and I want top 20 from every month without repetition.
This is a hard problem -- a type of subgraph problem that isn't really suitable to SQL. There is a brute force approach:
with jan as (
select *
from t
where date = '2018-01-01'
order by score desc
limit 1
),
feb as (
select *
from t
where date = '2018-02-01' and
product not in (select product from jan)
order by score desc
limit 1
),
mar as (
select *
from t
where date = '2018-03-01' and
product not in (select product from jan) and
product not in (select product from feb)
order by score desc
limit 1
)
select *
from jan
union all
select *
from feb
union all
select *
from mar;
You can generalize this with additional CTEs. But there is no guarantee that a month will have a product -- even when it could have had one.
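The month-by-month idea can be sketched procedurally: walk the months in order, and in each month take the best-scoring products that have not been picked before. The Python below is only an illustration under that greedy assumption; `pick_top_per_month` is a made-up name and the dates are written in ISO form so that they sort correctly.

```python
from collections import defaultdict


def pick_top_per_month(rows, n):
    """Greedy pick: up to n best-scoring, not-yet-used products per month.

    rows: iterable of (product, month, score), month as an ISO date string.
    """
    by_month = defaultdict(list)
    for product, month, score in rows:
        by_month[month].append((score, product))

    used = set()      # products already picked in an earlier month
    picked = []
    for month in sorted(by_month):
        taken = 0
        # Highest score first; ties keep their input order.
        for score, product in sorted(by_month[month], key=lambda sp: -sp[0]):
            if taken == n:
                break
            if product in used:
                continue
            used.add(product)
            picked.append((product, month, score))
            taken += 1
    return picked


sample = [
    ("A", "2018-01-01", 99), ("B", "2018-01-01", 98), ("C", "2018-01-01", 97),
    ("A", "2018-02-01", 99), ("B", "2018-02-01", 98), ("C", "2018-02-01", 97),
    ("D", "2018-03-01", 99), ("A", "2018-03-01", 98), ("B", "2018-03-01", 97),
    ("C", "2018-03-01", 96),
]
print(pick_top_per_month(sample, 1))
```

With n = 1 this picks A, B and D for the three months. As with the CTE version, a greedy pass can leave a later month short of products even when a different earlier choice would have avoided that.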
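It is possible using row_number, partitioned by month and ordered by score. Note that this returns the top 20 per month but does not by itself exclude products already picked in an earlier month.
select * from (
select row_number() over (partition by [Date] order by Score desc) as rno, * from
Products
) as t where t.rno <= 20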
I think you want the top 20 records for every month without repeating products; if so, the solution below should work.
select *
into #temp
from
(values
('A','01/01/18','99')
,('B','01/01/18','98')
,('C','01/01/18','97')
,('A','02/01/18','99')
,('B','02/01/18','98')
,('C','02/01/18','97')
,('D','03/01/18','99')
,('A','03/01/18','98')
,('B','03/01/18','97')
,('C','03/01/18','96')
) AS VTE (Product ,Date, Score )
select * from
(
select * , ROW_NUMBER() over (partition by date,product order by score ) as rn
from #TEMP
)
A where rn < 20

T-SQL, how to get IDs that visit X amount of location within X amount of time?

T-SQL question: I have been trying to find the best solution for this one.
Say we have this theoretical table:
-----------------------------------
ID | DATETIME | Location
11 | 1/27 3:30pm | a
11 | 1/27 3:31pm | b
11 | 1/27 3:32pm | c
22 | 2/14 1:10pm | g
22 | 2/14 1:12pm | i
22 | 2/15 5:48pm | w
55 | 3/18 8:48pm | d
55 | 3/18 9:48pm | e
---------------------------
I want to create a query that returns the IDs that have been in 2 or more different locations within 5 minutes. In this table, IDs 11 and 22 each visit 2 or more different locations within 5 minutes, so the query should return 11 and 22. How do I write a query that returns the IDs that have been to X locations within Y minutes?
I suggest using cross apply to count the distinct locations visited in the 5 minutes starting at each row:
select t.*, ca.num_visit
from table1 as t
cross apply (
select count(distinct c.Location) num_visit
from table1 as c
where c.id = t.id
and c.DATETIME >= t.DATETIME
and c.DATETIME <= dateadd(minute,5,t.DATETIME)
) ca
where num_visit >= 2
If you assume that the locations are different on each row for a given id, you can use lead()/lag():
select id, datetime
from (select t.*,
lead(datetime) over (partition by id order by datetime) as next_datetime
from t
) t
where next_datetime < dateadd(minute, 5, datetime);
This is not a general solution to the problem. But it does solve the particular example you have in your question.
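For the general case (X distinct locations within Y minutes), the check can be sketched as a window that starts at each visit. The Python below is only an illustration; `ids_with_visits` is a made-up name, and the year 2018 is an assumption because the sample table shows no year.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def ids_with_visits(rows, min_locations, minutes):
    """Return IDs that saw at least min_locations distinct locations
    within `minutes` of some starting visit.

    rows: iterable of (id, timestamp, location).
    """
    window = timedelta(minutes=minutes)
    by_id = defaultdict(list)
    for id_, ts, loc in rows:
        by_id[id_].append((ts, loc))

    result = []
    for id_, visits in by_id.items():
        visits.sort()
        for i, (start, _) in enumerate(visits):
            # Distinct locations from this visit up to `window` later.
            locs = {loc for ts, loc in visits[i:] if ts - start <= window}
            if len(locs) >= min_locations:
                result.append(id_)
                break
    return sorted(result)


sample = [
    (11, datetime(2018, 1, 27, 15, 30), "a"),
    (11, datetime(2018, 1, 27, 15, 31), "b"),
    (11, datetime(2018, 1, 27, 15, 32), "c"),
    (22, datetime(2018, 2, 14, 13, 10), "g"),
    (22, datetime(2018, 2, 14, 13, 12), "i"),
    (22, datetime(2018, 2, 15, 17, 48), "w"),
    (55, datetime(2018, 3, 18, 20, 48), "d"),
    (55, datetime(2018, 3, 18, 21, 48), "e"),
]
print(ids_with_visits(sample, 2, 5))  # [11, 22]
```

Counting distinct locations rather than rows matters when an ID can appear at the same location twice within the window.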

Teradata sql query from grouping records using Intervals

In Teradata SQL, how can I assign the same row number to a group of records created within an 8-second interval?
Example:
Customerid Customername Itembought dateandtime
(yyyy-mm-dd hh:mm:ss)
100 ALex Basketball 2017-02-10 10:10:01
100 ALex Circketball 2017-02-10 10:10:06
100 ALex Baseball 2017-02-10 10:10:08
100 ALex volleyball 2017-02-10 10:11:01
100 ALex footbball 2017-02-10 10:11:05
100 ALex ringball 2017-02-10 10:11:08
100 Alex football 2017-02-10 10:12:10
My expected result should have an additional Row_number column that assigns the same number to all of the customer's purchases made within 8 seconds. See the expected result below:
Customerid Customername Itembought dateandtime Row_number
(yyyy-mm-dd hh:mm:ss)
100 ALex Basketball 2017-02-10 10:10:01 1
100 ALex Circketball 2017-02-10 10:10:06 1
100 ALex Baseball 2017-02-10 10:10:08 1
100 ALex volleyball 2017-02-10 10:11:01 2
100 ALex footbball 2017-02-10 10:11:05 2
100 ALex ringball 2017-02-10 10:11:08 2
100 Alex football 2017-02-10 10:12:10 3
This is one way to do it with a recursive CTE. Keep a running total of the differences from the previous row's timestamp; when it exceeds 8 seconds, reset it to 0 and start a new group.
WITH ROWNUMS AS
(SELECT T.*
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY TM) AS RNUM
/*Replace DATEDIFF with Teradata specific function*/
,DATEDIFF(SECOND,COALESCE(MIN(TM) OVER(PARTITION BY ID
ORDER BY TM ROWS BETWEEN 1 PRECEDING AND CURRENT ROW), TM),TM) AS DIFF
FROM T --replace this with your tablename and add columns as required
)
,RECURSIVE CTE(ID,TM,DIFF,SUM_DIFF,RNUM,GRP) AS
(SELECT ID,
TM,
DIFF,
DIFF,
RNUM,
CAST(1 AS int)
FROM ROWNUMS
WHERE RNUM=1
UNION ALL
SELECT T.ID,
T.TM,
T.DIFF,
CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN 0 ELSE C.SUM_DIFF+T.DIFF END,
T.RNUM,
CAST(CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN T.RNUM ELSE C.GRP END AS int)
FROM CTE C
JOIN ROWNUMS T ON T.RNUM=C.RNUM+1 AND T.ID=C.ID
)
SELECT ID,
TM,
DENSE_RANK() OVER(PARTITION BY ID ORDER BY GRP) AS row_num
FROM CTE
Demo in SQL Server
I am going to interpret the problem differently from vkp. Any row within 8 seconds of another row should be in the same group. Such values can chain together, so the overall span can be more than 8 seconds.
The advantage of this method is that recursive CTEs are not needed, so it should be faster. (Of course, this is not an advantage if the OP does not agree with the definition.)
The basic idea is to look at the previous date/time value; if it is more than 8 seconds away, then add a flag. The cumulative sum of the flag is the row number you are looking for.
select t.*,
sum(case when prev_dt >= dateandtime - interval '8' second
then 0 else 1
end) over (partition by customerid order by dateandtime
) as row_number
from (select t.*,
max(dateandtime) over (partition by customerid order by dateandtime rows between 1 preceding and 1 preceding) as prev_dt
from t
) t;
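The flag-and-cumulative-sum idea can be sketched in a few lines of Python for a single customer. This is only an illustration; `assign_groups` is a made-up name, and the timestamps are the ones from the question.

```python
from datetime import datetime, timedelta


def assign_groups(timestamps, max_gap_seconds=8):
    """Start a new group whenever the gap to the previous row exceeds the limit."""
    limit = timedelta(seconds=max_gap_seconds)
    groups = []
    group = 0
    prev = None
    for ts in sorted(timestamps):
        # Flag = 1 for the first row or when the previous row is too far away;
        # the running sum of the flags is the group number.
        if prev is None or ts - prev > limit:
            group += 1
        groups.append((ts, group))
        prev = ts
    return groups


sample = [
    datetime(2017, 2, 10, 10, 10, 1),
    datetime(2017, 2, 10, 10, 10, 6),
    datetime(2017, 2, 10, 10, 10, 8),
    datetime(2017, 2, 10, 10, 11, 1),
    datetime(2017, 2, 10, 10, 11, 5),
    datetime(2017, 2, 10, 10, 11, 8),
    datetime(2017, 2, 10, 10, 12, 10),
]
print([g for _, g in assign_groups(sample)])  # [1, 1, 1, 2, 2, 2, 3]
```

Because each row is compared only with its predecessor, a chain of closely spaced rows stays in one group even when the whole chain spans more than 8 seconds.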
Using Teradata's PERIOD data type and the awesome td_normalize_overlap_meet:
Consider table test32:
SELECT * FROM test32
+----+----+------------------------+
| f1 | f2 | f3 |
+----+----+------------------------+
| 1 | 2 | 2017-05-11 03:59:00 PM |
| 1 | 3 | 2017-05-11 03:59:01 PM |
| 1 | 4 | 2017-05-11 03:58:58 PM |
| 1 | 5 | 2017-05-11 03:59:26 PM |
| 1 | 2 | 2017-05-11 03:59:28 PM |
| 1 | 2 | 2017-05-11 03:59:46 PM |
+----+----+------------------------+
The following will group your records:
WITH
normalizedCTE AS
(
SELECT *
FROM TABLE
(
td_normalize_overlap_meet(NEW VARIANT_TYPE(periodCTE.f1), periodCTE.fper)
RETURNS (f1 integer, fper PERIOD(TIMESTAMP(0)), recordCount integer)
HASH BY f1
LOCAL ORDER BY f1, fper
) as output(f1, fper, recordcount)
),
periodCTE AS
(
SELECT f1, f2, f3, PERIOD(f3, f3 + INTERVAL '9' SECOND) as fper FROM test32
)
SELECT t2.f1, t2.f2, t2.f3, t1.fper, DENSE_RANK() OVER (PARTITION BY t2.f1 ORDER BY t1.fper) as fgroup
FROM normalizedCTE t1
INNER JOIN periodCTE t2 ON
t1.fper P_INTERSECT t2.fper IS NOT NULL
Results:
+----+----+------------------------+-------------+
| f1 | f2 | f3 | fgroup |
+----+----+------------------------+-------------+
| 1 | 2 | 2017-05-11 03:59:00 PM | 1 |
| 1 | 3 | 2017-05-11 03:59:01 PM | 1 |
| 1 | 4 | 2017-05-11 03:58:58 PM | 1 |
| 1 | 5 | 2017-05-11 03:59:26 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:28 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:46 PM | 3 |
+----+----+------------------------+-------------+
A Period in Teradata is a special data type that holds a date or datetime range. The first parameter is the start of the range and the second is the end (up to but not including that value, which is why it's "+ 9 seconds"). The result is an 8-second "Period" for each record, which might "intersect" with the period of another record.
We then use td_normalize_overlap_meet to merge records that intersect, sharing the f1 field's value as the key. In your case that would be customerid. The result is three records for this one customer since we have three groups that "overlap" or "meet" each other's time periods.
We then join the td_normalize_overlap_meet output with the output from when we determined the periods. We use the P_INTERSECT function to see which periods from the normalized CTE INTERSECT with the periods from the initial Period CTE. From the result of that P_INTERSECT join we grab the values we need from each CTE.
Lastly, Dense_Rank() gives us a rank based on the normalized period for each group.
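What the normalize step does can be approximated by ordinary interval merging. The Python below is only a sketch of that idea for one f1 value, not of the Teradata function itself; `merge_groups` is a made-up name and the 9-second span mirrors the PERIOD built above.

```python
from datetime import datetime, timedelta


def merge_groups(timestamps, span_seconds=9):
    """Group rows whose [ts, ts + span) periods overlap or meet."""
    span = timedelta(seconds=span_seconds)
    result = []
    group = 0
    current_end = None          # end of the merged period built so far
    for ts in sorted(timestamps):
        if current_end is None or ts > current_end:
            group += 1          # period starts after the merged one ends
            current_end = ts + span
        else:
            # Overlaps (ts < end) or meets (ts == end): extend the merged period.
            current_end = max(current_end, ts + span)
        result.append((ts, group))
    return result


sample = [
    datetime(2017, 5, 11, 15, 59, 0),
    datetime(2017, 5, 11, 15, 59, 1),
    datetime(2017, 5, 11, 15, 58, 58),
    datetime(2017, 5, 11, 15, 59, 26),
    datetime(2017, 5, 11, 15, 59, 28),
    datetime(2017, 5, 11, 15, 59, 46),
]
print([g for _, g in merge_groups(sample)])  # [1, 1, 1, 2, 2, 3]
```

The output is in timestamp order, so the row at 03:58:58 PM comes first here; the group sizes (3, 2 and 1 rows) match the results table above.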

How to calculate moving average in SQL?

I've a table with 2 columns in SQL
+------+--------+
| WEEK | OUTPUT |
+------+--------+
| 1 | 10 |
| 2 | 20 |
| 3 | 30 |
| 4 | 40 |
| 5 | 50 |
| 6 | 50 |
+------+--------+
How do I sum the output for the current week and the 2 weeks before it (e.g. for week 3, sum the output of weeks 3, 2 and 1)? I've seen many tutorials on moving averages, but they use dates; in my case I want to use an int. Is that possible?
Thanks!
I think you want something like this :
SELECT *,
(SELECT Sum(output)
FROM table1 b
WHERE b.week IN( a.week, a.week - 1, a.week - 2 )) AS SUM
FROM table1 a
Alternatively, the IN clause can be rewritten as BETWEEN a.week - 2 AND a.week.
You can use a self-join. The idea is to put your table beside itself with a condition that brings matching rows into a single row:
SELECT * FROM [output] o1
INNER JOIN [output] o2 ON o1.Week between o2.Week and o2.Week + 2
this select will produce this output:
o1.Week o1.Output o2.Week o2.Output
--------------------------------------------
1 10 1 10
2 20 1 10
2 20 2 20
3 30 1 10
3 30 2 20
3 30 3 30
4 40 2 20
4 40 3 30
4 40 4 40
and so on. Note that for weeks 1 and 2 there aren't previous weeks available.
Now you should just group the data by o1.Week and get the SUM:
SELECT o1.Week, SUM(o2.Output)
FROM [output] o1
INNER JOIN [output] o2 ON o1.Week between o2.Week and o2.Week + 2
GROUP BY o1.Week
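To try the grouped self-join without a SQL Server instance, it can be run against an in-memory SQLite database from Python. This is only a sketch: the table name `weekly_output` and the function name `moving_sums` are made up for the example, and SQLite stands in for whatever engine you actually use.

```python
import sqlite3


def moving_sums(rows):
    """Sum output over the current week and the two weeks before it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE weekly_output (week INTEGER, output INTEGER)")
    conn.executemany("INSERT INTO weekly_output VALUES (?, ?)", rows)
    result = conn.execute(
        """
        SELECT o1.week, SUM(o2.output) AS moving_sum
        FROM weekly_output o1
        INNER JOIN weekly_output o2
            ON o1.week BETWEEN o2.week AND o2.week + 2
        GROUP BY o1.week
        ORDER BY o1.week
        """
    ).fetchall()
    conn.close()
    return result


sample = [(1, 10), (2, 20), (3, 30), (4, 40), (5, 50), (6, 50)]
print(moving_sums(sample))
```

For the table in the question this yields 10, 30, 60, 90, 120 and 140 for weeks 1 through 6.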
If the weeks are continuous, you can simply use a window function:
SELECT [Week], [Output],
SUM([Output]) OVER (ORDER BY [Week] ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM dbo.SomeTable
RANGE is more accurate for your calculation, but SQL Server does not yet support it with a numeric offset. Other database engines may support it:
SELECT [Week], [Output],
SUM([Output]) OVER (ORDER BY [Week] RANGE BETWEEN 2 PRECEDING AND CURRENT ROW)
FROM dbo.SomeTable
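The difference between the two frames only shows up when a week is missing. The Python sketch below imitates both frames on a hypothetical data set without week 3; `rows_sum` and `range_sum` are made-up names for illustration.

```python
def rows_sum(data, preceding=2):
    """Imitates ROWS BETWEEN n PRECEDING AND CURRENT ROW: counts physical rows."""
    data = sorted(data)
    return [
        (week, sum(value for _, value in data[max(0, i - preceding): i + 1]))
        for i, (week, _) in enumerate(data)
    ]


def range_sum(data, preceding=2):
    """Imitates RANGE BETWEEN n PRECEDING AND CURRENT ROW: compares week values."""
    data = sorted(data)
    return [
        (week, sum(value for w, value in data if week - preceding <= w <= week))
        for week, _ in data
    ]


with_gap = [(1, 10), (2, 20), (4, 40)]   # hypothetical data, week 3 is missing
print(rows_sum(with_gap))    # [(1, 10), (2, 30), (4, 70)]
print(range_sum(with_gap))   # [(1, 10), (2, 30), (4, 60)]
```

With ROWS, week 4 still reaches back two physical rows and pulls in week 1; with RANGE it only covers weeks 2 through 4. On contiguous weeks the two give the same result.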
Try this:
SELECT t1.week,
(SELECT SUM(t2.output)
FROM yourtable t2
WHERE t2.week BETWEEN t1.week - 2 AND t1.week) / 3.0 AS moving_avg
FROM yourtable t1
You haven't said which SQL Server version you use. If it is SQL Server 2012 or above, a simple example is:
declare @table table(wk int,outpt int )
insert into @table values (1,10)
,(2,20)
,(3,30)
,(4,40)
,(5,50)
,(6,60)
-- sum of the current week and the two preceding weeks
select *,SUM(outpt) over(partition by id order by wk rows between 2 preceding and current row ) dd
from (
select * , 1 id
from @table
where wk < 5
) a

Get Monthly Totals from Running Totals

I have a table in a SQL Server 2008 database with two columns that hold running totals called Hours and Starts. Another column, Date, holds the date of a record. The dates are sporadic throughout any given month, but there's always a record for the last hour of the month.
For example:
ContainerID | Date | Hours | Starts
1 | 2010-12-31 23:59 | 20 | 6
1 | 2011-01-15 00:59 | 23 | 6
1 | 2011-01-31 23:59 | 30 | 8
2 | 2010-12-31 23:59 | 14 | 2
2 | 2011-01-18 12:59 | 14 | 2
2 | 2011-01-31 23:59 | 19 | 3
How can I query the table to get the total number of hours and starts for each month between two specified years? (In this case 2011 and 2013.) I know that I need to take the values from the last record of one month and subtract the values from the last record of the previous month. I'm having a hard time coming up with a good way to do this in SQL, however.
As requested, here are the expected results:
ContainerID | Date | MonthlyHours | MonthlyStarts
1 | 2011-01-31 23:59 | 10 | 2
2 | 2011-01-31 23:59 | 5 | 1
Try this:
SELECT c1.ContainerID,
c1.Date,
c1.Hours-c3.Hours AS "MonthlyHours",
c1.Starts - c3.Starts AS "MonthlyStarts"
FROM Containers c1
LEFT OUTER JOIN Containers c2 ON
c1.ContainerID = c2.ContainerID
AND datediff(MONTH, c1.Date, c2.Date)=0
AND c2.Date > c1.Date
LEFT OUTER JOIN Containers c3 ON
c1.ContainerID = c3.ContainerID
AND datediff(MONTH, c1.Date, c3.Date)=-1
LEFT OUTER JOIN Containers c4 ON
c3.ContainerID = c4.ContainerID
AND datediff(MONTH, c3.Date, c4.Date)=0
AND c4.Date > c3.Date
WHERE
c2.ContainerID is null
AND c4.ContainerID is null
AND c3.ContainerID is not null
ORDER BY c1.ContainerID, c1.Date
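The logic behind the joins (last record of each month minus last record of the previous month) can be sketched procedurally. The Python below is only an illustration of that rule; `monthly_totals` is a made-up name, and months without a record in the preceding month are simply skipped.

```python
from datetime import datetime


def monthly_totals(records):
    """records: iterable of (container_id, timestamp, hours, starts)."""
    # Last record per (container, year, month).
    last = {}
    for cid, ts, hours, starts in records:
        key = (cid, ts.year, ts.month)
        if key not in last or ts > last[key][0]:
            last[key] = (ts, hours, starts)

    out = []
    for (cid, year, month), (ts, hours, starts) in sorted(last.items()):
        prev_key = (cid, year - 1, 12) if month == 1 else (cid, year, month - 1)
        if prev_key in last:            # need the previous month's closing values
            _, prev_hours, prev_starts = last[prev_key]
            out.append((cid, ts, hours - prev_hours, starts - prev_starts))
    return out


sample = [
    (1, datetime(2010, 12, 31, 23, 59), 20, 6),
    (1, datetime(2011, 1, 15, 0, 59), 23, 6),
    (1, datetime(2011, 1, 31, 23, 59), 30, 8),
    (2, datetime(2010, 12, 31, 23, 59), 14, 2),
    (2, datetime(2011, 1, 18, 12, 59), 14, 2),
    (2, datetime(2011, 1, 31, 23, 59), 19, 3),
]
for row in monthly_totals(sample):
    print(row)
```

For the sample rows this gives 10 hours and 2 starts for container 1, and 5 hours and 1 start for container 2, matching the expected results in the question.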
Using a recursive CTE and some 'creative' JOIN conditions, you can fetch the next month's value for each ContainerID:
WITH CTE_PREP AS
(
--RN will be 1 for last row in each month for each container
--MonthRank will be sequential number for each subsequent month (to increment easier)
SELECT
*
,ROW_NUMBER() OVER (PARTITION BY ContainerID, YEAR(Date), MONTH(DATE) ORDER BY Date DESC) RN
,DENSE_RANK() OVER (ORDER BY YEAR(Date),MONTH(Date)) MonthRank
FROM Table1
)
, RCTE AS
(
--"Zero row", last row in December 2010 for each container
SELECT *, Hours AS MonthlyHours, Starts AS MonthlyStarts
FROM CTE_Prep
WHERE YEAR(date) = 2010 AND MONTH(date) = 12 AND RN = 1
UNION ALL
--for each next row just join on MonthRank + 1
SELECT t.*, t.Hours - r.Hours, t.Starts - r.Starts
FROM RCTE r
INNER JOIN CTE_Prep t ON r.ContainerID = t.ContainerID AND r.MonthRank + 1 = t.MonthRank AND t.Rn = 1
)
SELECT ContainerID, Date, MonthlyHours, MonthlyStarts
FROM RCTE
WHERE Date >= '2011-01-01' --to eliminate "zero row"
ORDER BY ContainerID
SQLFiddle DEMO (I have added some data for February and March in order to test on different lengths of months)