Output 2 columns with average from year t and year t+1 - sql

I have a table that looks like this:
| playerid | season | Stat |
|-----------|---------|---------|
| 1 | 2014 | 2.3 |
| 1 | 2015 | 1.4 |
| 1 | 2016 | 3.5 |
| 2 | 2011 | 1.5 |
| 2 | 2012 | 5.5 |
| 3 | 2010 | 6.7 |
| 3 | 2011 | 2.6 |
I want a table with 2 columns which average 'STAT' for year t in column 1 and year t+1 in column 2.
IE-Column 1 would have averages of 'Stat' for:
playerid=1 & season=2014,
playerid=1 & season=2015,
playerid=2 & season=2011,
playerid=3 & season=2010.
Column 2 would have averages of 'Stat' for:
playerid=1 & season=2015,
playerid=1 & season=2016,
playerid=2 & season=2012,
playerid=3 & season=2011.

You could look up next year with a left join:
select cur.playerid
, cur.season
, cur.stat as stat_this_season
, next.stat as stat_next_season
from YourTable cur
left join
YourTable next
on cur.playerid = next.playerid
and cur.year = next.year - 1
To filter out seasons that do not have a next season, change the left join to an inner join (which you can abbreviate as join.)

You could use the analytic function ROW_NUMBER() to order the rows by year, for each player. Then you can get the requested averages by excluding the first record (for Column 1) and the last record (for Column 2). In case there is only one season, Stat will be included into Column 1 only.
SELECT t.playerid,
AVG(CASE WHEN t.ord1 != 1 OR
(t.ord1 = 1 AND t.ord2 = 1) THEN Stat END) AS Column_1,
AVG(CASE WHEN t.ord2 != 1 THEN Stat END) AS Column_2
FROM (SELECT s.*,
ROW_NUMBER() OVER (PARTITION BY playerid ORDER BY season DESC) AS ord1,
ROW_NUMBER() OVER (PARTITION BY playerid ORDER BY season ASC) AS ord2
FROM table_1 s) t
GROUP BY playerid
ORDER BY playerid
If you remove the GROUP BY from previous query, you will get only one row with both averages over all players:
SELECT AVG(CASE WHEN t.ord1 != 1 OR
(t.ord1 = 1 AND t.ord2 = 1) THEN Stat END) AS Column_1,
AVG(CASE WHEN t.ord2 != 1 THEN Stat END) AS Column_2
FROM (SELECT s.*,
ROW_NUMBER() OVER (PARTITION BY playerid ORDER BY season DESC) AS ord1,
ROW_NUMBER() OVER (PARTITION BY playerid ORDER BY season ASC) AS ord2
FROM table_1 s) t

Related

Find Customers With 4 Consecutive Years of Giving (Including Gaps)

I have a table similar to below:
+------------+-----------+
| CustomerID | OrderYear |
+------------+-----------+
| 1 | 2012 |
| 1 | 2013 |
| 1 | 2014 |
| 1 | 2017 |
| 1 | 2018 |
| 2 | 2012 |
| 2 | 2013 |
| 2 | 2014 |
| 2 | 2015 |
| 2 | 2017 |
+------------+-----------+
How would I identify which CustomerIDs have 4 consecutive years of giving? (In the above, only customer 2.) As you can see, some records will have gaps in order years.
I started down the row of trying to utilize some combination of ROW_NUMBER/LAG/LEAD with no luck to this point.
Very paired down/modified attempt...
WITH CTE
AS
(
SELECT T.ConstituentLookupID,
T.FISCALYEAR,
COUNT(T.FISCALYEAR) OVER (PARTITION BY T.ConstituentLookupID) AS
YearCount,
FIRST_VALUE(T.FISCALYEAR) OVER(PARTITION BY T.ConstituentLookupID ORDER
BY T.FISCALYEAR DESC) - T.FISCALYEAR + 1 as X,
ROW_NUMBER() OVER(PARTITION BY T.ConstituentLookupID ORDER BY
T.FISCALYEAR DESC) AS RN
FROM #Temp AS T)
SELECT CTE.ConstituentLookupID,
CTE.FISCALYEAR,
CTE.YearCount,
CTE.X,
CTE.RN,
FROM CTE
WHERE CTE.YearCount >= 4 --Have at least 4 years of giving
AND CTE.X - CTE.RN = 1 --Some kind of way to calculate consecutive years. Doesnt account current year and gaps...;
Assuming no duplicates, you can use lag():
select distinct customerid
from (
select t.*,
lag(orderyear, 3) over(partition by customerid order by orderyear) oderyear3
from mytable t
) t
where orderyear = orderyear3 + 3
A more conventional approach is to use some gaps-and-islands technique. This is convenient if you want the start and end of each series. Here, an island is a series of rows with "adjacent" order years, and you want islands that are at least 4 years long. We can identify the islands by comparing the order year against an incrementing sequence, then use aggregation:
select customerid, min(orderyear) firstorderyear, max(orderyear) lastorderyear
from (
select t.*,
row_number() over(partition by customerid order by orderyear) rn
from mytable t
) t
group by customerid, orderyear - rn
having count(*) >= 4
Assuming you have no more than one row per customer and year, the simplest method is lag():
select customerid, year
from (select t.*,
lag(orderyear, 3) over (partition by customerid order by orderyear) as prev3_year
from t
) t
where prev3_year = year - 3;
The idea is to look 3 years back. If that year is year - 3, then there are four years in a row. If your data can have duplicates, there are tweaks to the logic (they make the query more slightly more complicated).
This could return duplicates, so you might just want:
select distinct customerid
from (select t.*,
lag(orderyear, 3) over (partition by customerid order by orderyear) as prev3_year
from t
) t
where prev3_year = year - 3;
I have a simple solution using row number and group by
SELECT Max(z.customerid),
Count(z.grp)
FROM (SELECT customerid,
orderyear,
orderyear - Row_number()
OVER (
ORDER BY customerid) AS Grp
FROM mytable)z
GROUP BY z.grp
HAVING Count(z.grp) = 4

Select N rows in aggregate functions SQL Server

I have a table that looks like this:
+--------+----------+--------+------------+-------+
| ID | CHANNEL | VENDOR | num_PERIOD | SALES |
+--------+----------+--------+------------+-------+
| 000001 | Business | Shop | 1 | 40 |
| 000001 | Business | Shop | 2 | 60 |
| 000001 | Business | Shop | 3 | NULL |
+--------+----------+--------+------------+-------+
With many combinations of ID, CHANNEL and VENDOR, and sales records for each of them over time (num_PERIOD).
The idea is to obtain a new column which returns the number of NULLS in SALES column, but in the first 111 registers according to num_PERIOD column.
I have been trying something like this:
SELECT ID,
CHANNEL,
VENDOR,
sum(CASE
WHEN SALES IS NULL THEN 1
ELSE 0
END) OVER (PARTITION BY ID,
CHANNEL,
VENDOR
ORDER BY num_PERIOD ROWS BETWEEN UNBOUNDED PRECEDING AND 111 FOLLOWING) AS NULL_SALES_SET
FROM TABLE
GROUP BY ID,
CHANNEL,
VENDOR
But I'm not obtaining what I'm looking for.
So to obtain a table simillar to:
+--------+--------------+--------+----------------+
| ID | CHANNEL | VENDOR | NULL_SALES_SET |
+--------+--------------+--------+----------------+
| 000001 | Business | Shop | 1 |
| 000002 | Business | Market | 0 |
| 000002 | Non Business | Shop | 3 |
+--------+--------------+--------+----------------+
The difficulty comes when selecting these first 111 rows per ID, CHANNEL AND VENDOR ordered by num_PERIOD.
Use a CTE (Common Table Expression) with the ROW_NUMBER windowed function and you should be set:
;WITH MyCTE AS
(
SELECT
id,
channel,
vendor,
sales,
ROW_NUMBER() OVER (PARTITION BY id, channel, vendor ORDER BY num_period) AS row_num
FROM
MyTable
)
SELECT
id,
channel,
vendor,
SUM(CASE WHEN sales IS NULL THEN 1 ELSE 0 END) AS null_sales_set
FROM
MyCTE
WHERE
row_num <= 111
GROUP BY
id, channel, vendor
Do you have to use the windowing function?
SELECT ID
, CHANNEL
, VENDOR
, NULL_SALES_SET = SUM(CASE WHEN SALES IS NULL THEN 1 ELSE 0 END)
FROM Table
WHERE num_PERIOD <= 111
GROUP BY ID, CHANNEL, VENDOR
Or are you looking for the first 111 num_PERIOD values allowing for gaps in the num_PERIOD column?
SELECT t.ID
, t.CHANNEL
, t.VENDOR
, NULL_SALES_SET = SUM(CASE WHEN t.SALES IS NULL THEN 1 ELSE 0 END)
FROM Table t
INNER JOIN ( SELECT i.ID
, i.CHANNEL
, i.VENDOR
, i.num_PERIOD
, rowNum = ROW_NUMBER(PARTITION BY i.ID, i.CHANNEL, i.VENDOR ORDER BY i.num_PERIOD)
FROM Table i ) l
ON t.ID = l.ID
AND t.CHANNEL = l.CHANNEL
AND t.VENDOR = l.VENDOR
AND t.num_PERIOD = l.num_PERIOD
WHERE l.rowNum <= 111
GROUP BY ID, CHANNEL, VENDOR
Edit: Not sure how I overlooked it, but it is necessary to JOIN on the num_PERIOD column.
Edit: Add the number of distinct num_PERIOD per ID, Channel, Vendor without affecting the NULL_SALES_SET
SELECT t.ID
, t.CHANNEL
, t.VENDOR
-- Counts the NULL Sales when the num_PERIOD is in the
-- first 111 num_PERIODs
, NULL_SALES_SET = SUM(CASE WHEN l.rowNum IS NOT NULL AND t.SALES IS NULL
THEN 1
ELSE 0 END)
-- Counts the distinct num_PERIOD values
, PERIOD_COUNT = COUNT(DISTINCT t.num_PERIOD)
FROM Table t
LEFT OUTER JOIN ( SELECT i.ID
, i.CHANNEL
, i.VENDOR
, i.num_PERIOD
, rowNum = ROW_NUMBER(PARTITION BY i.ID,
i.CHANNEL,
i.VENDOR
ORDER BY i.num_PERIOD)
FROM Table i ) l
ON t.ID = l.ID
AND t.CHANNEL = l.CHANNEL
AND t.VENDOR = l.VENDOR
AND t.num_PERIOD = l.num_PERIOD
AND l.rowNum <= 111
GROUP BY ID, CHANNEL, VENDOR

Count max number of consecutive occurrences of a value in SQL Server

I have a table with players, results and ID:
Player | Result | ID
---------------
An | W | 1
An | W | 1
An | L | 0
An | W | 1
An | W | 1
An | W | 1
Ph | L | 0
Ph | W | 1
Ph | W | 1
Ph | L | 0
Ph | W | 1
A 'W' will always have an ID of 1,
I need to create a query that will count the maximum number of consecutive 'W's for each player:
Player | MaxWinStreak
---------------------
An | 3
Ph | 2
I tried to use Rows Unbounded Preceeding but i can only get it to count the maximum number of Ws in total, and not consecutively
Select
t2.player
,max(t2.cumulative_wins) As 'Max'
From
( Select
t.Player
,Sum(ID) Over (Partition By t.Result,t.player
Order By t.GameWeek Rows Unbounded Preceding) As cumulative_wins
From
t
) t2
Group By
t2.player
Is there a different approach i can take ?
You need a column to specify the ordering. SQL tables represent unordered sets. In the below query, the ? represents this column.
You can use the difference of row numbers to get each winning streak:
select player, count(*) as numwins
from (select t.*,
row_number() over (partition by player order by ?) as seqnum,
row_number() over (partition by player, result order by ?) as seqnum_r
from t
) t
where result = 'W'
group by player, (seqnum - seqnum_r);
You can then get the maximum:
select player, max(numwins)
from (select player, count(*) as numwins
from (select t.*,
row_number() over (partition by player order by ?) as seqnum,
row_number() over (partition by player, result order by ?) as seqnum_r
from t
) t
where result = 'W'
group by player, (seqnum - seqnum_r)
) pw
group by player;

Grouping SQL Results based on order

I have table with data something like this:
ID | RowNumber | Data
------------------------------
1 | 1 | Data
2 | 2 | Data
3 | 3 | Data
4 | 1 | Data
5 | 2 | Data
6 | 1 | Data
7 | 2 | Data
8 | 3 | Data
9 | 4 | Data
I want to group each set of RowNumbers So that my result is something like this:
ID | RowNumber | Group | Data
--------------------------------------
1 | 1 | a | Data
2 | 2 | a | Data
3 | 3 | a | Data
4 | 1 | b | Data
5 | 2 | b | Data
6 | 1 | c | Data
7 | 2 | c | Data
8 | 3 | c | Data
9 | 4 | c | Data
The only way I know where each group starts and stops is when the RowNumber starts over. How can I accomplish this? It also needs to be fairly efficient since the table I need to do this on has 52 Million Rows.
Additional Info
ID is truly sequential, but RowNumber may not be. I think RowNumber will always begin with 1 but for example the RowNumbers for group1 could be "1,1,2,2,3,4" and for group2 they could be "1,2,4,6", etc.
For the clarified requirements in the comments
The rownumbers for group1 could be "1,1,2,2,3,4" and for group2 they
could be "1,2,4,6" ... a higher number followed by a lower would be a
new group.
A SQL Server 2012 solution could be as follows.
Use LAG to access the previous row and set a flag to 1 if that row is the start of a new group or 0 otherwise.
Calculate a running sum of these flags to use as the grouping value.
Code
WITH T1 AS
(
SELECT *,
LAG(RowNumber) OVER (ORDER BY ID) AS PrevRowNumber
FROM YourTable
), T2 AS
(
SELECT *,
IIF(PrevRowNumber IS NULL OR PrevRowNumber > RowNumber, 1, 0) AS NewGroup
FROM T1
)
SELECT ID,
RowNumber,
Data,
SUM(NewGroup) OVER (ORDER BY ID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
FROM T2
SQL Fiddle
Assuming ID is the clustered index the plan for this has one scan against YourTable and avoids any sort operations.
If the ids are truly sequential, you can do:
select t.*,
(id - rowNumber) as grp
from t
Also you can use recursive CTE
;WITH cte AS
(
SELECT ID, RowNumber, Data, 1 AS [Group]
FROM dbo.test1
WHERE ID = 1
UNION ALL
SELECT t.ID, t.RowNumber, t.Data,
CASE WHEN t.RowNumber != 1 THEN c.[Group] ELSE c.[Group] + 1 END
FROM dbo.test1 t JOIN cte c ON t.ID = c.ID + 1
)
SELECT *
FROM cte
Demo on SQLFiddle
How about:
select ID, RowNumber, Data, dense_rank() over (order by grp) as Grp
from (
select *, (select min(ID) from [Your Table] where ID > t.ID and RowNumber = 1) as grp
from [Your Table] t
) t
order by ID
This should work on SQL 2005. You could also use rank() instead if you don't care about consecutive numbers.

How can I calculate the remaining amount per row?

I have a table that I want to find for each row id the amount remaining from the total. However, the order of amounts is in an ascending order.
id amount
1 3
2 2
3 1
4 5
The results should look like this:
id remainder
1 10
2 8
3 5
4 0
Any thoughts on how to accomplish this? I'm guessing that the over clause is the way to go, but I can't quite piece it together.Thanks.
Since you didn't specify your RDBMS, I will just assume it's Postgresql ;-)
select *, sum(amount) over() - sum(amount) over(order by amount) as remainder
from tbl;
Output:
| ID | AMOUNT | REMAINDER |
---------------------------
| 3 | 1 | 10 |
| 2 | 2 | 8 |
| 1 | 3 | 5 |
| 4 | 5 | 0 |
How it works: http://www.sqlfiddle.com/#!1/c446a/5
It works in SQL Server 2012 too: http://www.sqlfiddle.com/#!6/c446a/1
Thinking of solution for SQL Server 2008...
Btw, is your ID just a mere row number? If it is, just do this:
select
row_number() over(order by amount) as rn
, sum(amount) over() - sum(amount) over(order by amount) as remainder
from tbl
order by rn;
Output:
| RN | REMAINDER |
------------------
| 1 | 10 |
| 2 | 8 |
| 3 | 5 |
| 4 | 0 |
But if you really need the ID intact and move the smallest amount on top, do this:
with a as
(
select *, sum(amount) over() - sum(amount) over(order by amount) as remainder,
row_number() over(order by id) as id_sort,
row_number() over(order by amount) as amount_sort
from tbl
)
select a.id, sort.remainder
from a
join a sort on sort.amount_sort = a.id_sort
order by a.id_sort;
Output:
| ID | REMAINDER |
------------------
| 1 | 10 |
| 2 | 8 |
| 3 | 5 |
| 4 | 0 |
See query progression here: http://www.sqlfiddle.com/#!6/c446a/11
I just want to offer a simpler way to do this in descending order:
select id, sum(amount) over (order by id desc) as Remainder
from t
This will work in Oracle, SQL Server 2012, and Postgres.
The general solution requres a self join:
select t.id, coalesce(sum(tafter.amount), 0) as Remainder
from t left outer join
t tafter
on t.id < tafter.id
group by t.id
SQL Server 2008 answer, I can't provide an SQL Fiddle, it seems it strips the begin keyword, resulting to syntax errors. I tested this on my machine though:
create function RunningTotalGuarded()
returns #ReturnTable table(
Id int,
Amount int not null,
RunningTotal int not null,
RN int identity(1,1) not null primary key clustered
)
as
begin
insert into #ReturnTable(id, amount, RunningTotal)
select id, amount, 0 from tbl order by amount;
declare #RunningTotal numeric(16,4) = 0;
declare #rn_check int = 0;
update #ReturnTable
set
#rn_check = #rn_check + 1
,#RunningTotal =
case when rn = #rn_check then
#RunningTotal + Amount
else
1 / 0
end
,RunningTotal = #RunningTotal;
return;
end;
To achieve your desired output:
with a as
(
select *, sum(amount) over() - RunningTotal as remainder
, row_number() over(order by id) as id_order
from RunningTotalGuarded()
)
select a.id, amount_order.remainder
from a
inner join a amount_order on amount_order.rn = a.id_order;
Rationale for guarded running total: http://www.ienablemuch.com/2012/05/recursive-cte-is-evil-and-cursor-is.html
Choose the lesser evil ;-)