Flip multiple columns and multiple rows in a table in Snowflake - SQL

Unable to pivot multiple columns in Snowflake; I would appreciate it if someone could help me.
I basically have the table attached in the screenshot on the left and need to change it to the format on the right. I wonder if PIVOT can work in this case?
My current code:
select
CONCAT(RIGHT(TO_VARCHAR(YEAR(DATE)),2),'-Q',TO_VARCHAR(QUARTER(DATE)) ) closed_date,
IFNULL(sum(case when STAG='Closed' then REVENUE_AMOUNTS end),0) REVENUE AMER,
IFNULL(sum(case when STAG='Closed' then REVENUE_AMOUNTS end),0) REVENUE APAC,
IFNULL(sum(case when STAG='Closed' then REVENUE_AMOUNTS end),0) REVENUE EMEA
from REVENUE_TABLE
where 1=1
group by 1
order by 1 asc

So, assuming the SQL you posted is closer to this (with fake data included in a CTE):
WITH REVENUE_TABLE as (
SELECT * FROM VALUES
('Closed', 1, '2020-01-01'::date, 'amer'),
('Closed', 2, '2020-04-01'::date, 'apac'),
('Closed', 3, '2020-08-01'::date, 'emea'),
('Closed', 4, '2021-01-01'::date, 'emea')
v(stag, REVENUE_AMOUNTS, date, loc)
)
select
CONCAT(RIGHT(TO_VARCHAR(YEAR(DATE)),2),'-Q',TO_VARCHAR(QUARTER(DATE)) ) closed_date,
ZEROIFNULL(sum(IFF(loc='amer' AND STAG='Closed', REVENUE_AMOUNTS, null))) as REVENUE_AMER,
ZEROIFNULL(sum(IFF(loc='apac' AND STAG='Closed', REVENUE_AMOUNTS, null))) as REVENUE_APAC,
ZEROIFNULL(sum(IFF(loc='emea' AND STAG='Closed', REVENUE_AMOUNTS, null))) as REVENUE_EMEA
from REVENUE_TABLE
group by 1
order by 1 asc
I swapped your CASE for an IFF and added the location check that decides which column each amount belongs in. I also swapped IFNULL(x, 0) for ZEROIFNULL(x): it is longer, but it makes the intent clearer.
This gives results that look like your existing output:
CLOSED_DATE  REVENUE_AMER  REVENUE_APAC  REVENUE_EMEA
20-Q1        1             0             0
20-Q2        0             2             0
20-Q3        0             0             3
21-Q1        0             0             4
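To try this shape without a Snowflake account, here is a minimal sketch that runs the same fake rows through Python's built-in sqlite3 module. SQLite has no IFF, ZEROIFNULL or QUARTER, so the sketch assumes CASE, COALESCE(x, 0) and a month-based quarter calculation as stand-ins; the function name current_shape is made up for illustration.

```python
import sqlite3

# Fake rows from the Snowflake CTE above: (stag, revenue_amounts, date, loc)
ROWS = [
    ("Closed", 1, "2020-01-01", "amer"),
    ("Closed", 2, "2020-04-01", "apac"),
    ("Closed", 3, "2020-08-01", "emea"),
    ("Closed", 4, "2021-01-01", "emea"),
]

# SQLite stand-ins: CASE for IFF, COALESCE(x, 0) for ZEROIFNULL(x), and
# (month - 1) / 3 + 1 (integer division) for QUARTER(date)
QUERY = """
WITH labelled AS (
    SELECT stag, revenue_amounts, loc,
           substr(date, 3, 2) || '-Q' ||
           ((CAST(substr(date, 6, 2) AS INTEGER) - 1) / 3 + 1) AS closed_date
    FROM revenue_table
)
SELECT closed_date,
       COALESCE(SUM(CASE WHEN loc = 'amer' AND stag = 'Closed' THEN revenue_amounts END), 0) AS revenue_amer,
       COALESCE(SUM(CASE WHEN loc = 'apac' AND stag = 'Closed' THEN revenue_amounts END), 0) AS revenue_apac,
       COALESCE(SUM(CASE WHEN loc = 'emea' AND stag = 'Closed' THEN revenue_amounts END), 0) AS revenue_emea
FROM labelled
GROUP BY closed_date
ORDER BY closed_date
"""


def current_shape(rows):
    """One row per quarter, one revenue column per location."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE revenue_table "
                "(stag TEXT, revenue_amounts INTEGER, date TEXT, loc TEXT)")
    con.executemany("INSERT INTO revenue_table VALUES (?, ?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


for row in current_shape(ROWS):
    print(row)
```

It prints one row per quarter, matching the table above.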
So, if that output is "the way it is", then to get to "where you want to go" you need to find the distinct set of locations and then join your results to it:
select l.loc,
ZEROIFNULL(sum(IFF(r.cd='20-Q1', r.REVENUE_AMOUNTS, null))) as "20-Q1",
ZEROIFNULL(sum(IFF(r.cd='20-Q2', r.REVENUE_AMOUNTS, null))) as "20-Q2",
ZEROIFNULL(sum(IFF(r.cd='20-Q3', r.REVENUE_AMOUNTS, null))) as "20-Q3",
ZEROIFNULL(sum(IFF(r.cd='21-Q1', r.REVENUE_AMOUNTS, null))) as "21-Q1"
from (
select distinct loc
FROM REVENUE_TABLE
) as l
left join (
select loc,
revenue_amounts,
CONCAT(RIGHT(TO_VARCHAR(YEAR(DATE)),2),'-Q',TO_VARCHAR(QUARTER(DATE)) ) cd
FROM REVENUE_TABLE
WHERE STAG='Closed'
) as r on l.loc = r.loc
group by 1
order by 1 asc;
gives:
LOC   20-Q1  20-Q2  20-Q3  21-Q1
amer  1      0      0      0
apac  0      2      0      0
emea  0      0      3      4
The downside of this pattern is that you need to know the column names explicitly, but you have that problem with PIVOT as well. I believe that could be worked around with Snowflake Scripting.
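One alternative to Snowflake Scripting, sketched here under my own assumptions, is to build the query in client code: first fetch the distinct quarters, then emit one conditional-aggregation column per quarter. The sketch uses Python's sqlite3 module and the same fake rows; the helper names (quote_ident, dynamic_pivot) are hypothetical. Because the column names come from data, they are quoted as identifiers, and the quarter values are passed as bound parameters rather than pasted into the SQL.

```python
import sqlite3

# Fake rows mirroring the Snowflake CTE: (stag, revenue_amounts, date, loc)
ROWS = [
    ("Closed", 1, "2020-01-01", "amer"),
    ("Closed", 2, "2020-04-01", "apac"),
    ("Closed", 3, "2020-08-01", "emea"),
    ("Closed", 4, "2021-01-01", "emea"),
]

# 'YY-Qn' label; SQLite has no QUARTER(), so derive it from the month
QUARTER_EXPR = ("substr(date, 3, 2) || '-Q' || "
                "((CAST(substr(date, 6, 2) AS INTEGER) - 1) / 3 + 1)")


def quote_ident(name):
    """Quote a data-derived column name as an SQL identifier."""
    return '"' + name.replace('"', '""') + '"'


def dynamic_pivot(rows):
    """Return (header, rows): one row per loc, one column per closed quarter."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE revenue_table "
                "(stag TEXT, revenue_amounts INTEGER, date TEXT, loc TEXT)")
    con.executemany("INSERT INTO revenue_table VALUES (?, ?, ?, ?)", rows)

    # Step 1: discover the column names (the distinct closed quarters)
    quarters = [q for (q,) in con.execute(
        f"SELECT DISTINCT {QUARTER_EXPR} AS cd FROM revenue_table "
        "WHERE stag = 'Closed' ORDER BY cd")]

    # Step 2: one conditional-aggregation column per quarter
    select_list = ",\n       ".join(
        ["l.loc AS loc"]
        + [f"COALESCE(SUM(CASE WHEN r.cd = ? THEN r.revenue_amounts END), 0) "
           f"AS {quote_ident(q)}" for q in quarters])
    query = f"""
        SELECT {select_list}
        FROM (SELECT DISTINCT loc FROM revenue_table) AS l
        LEFT JOIN (SELECT loc, revenue_amounts, {QUARTER_EXPR} AS cd
                   FROM revenue_table WHERE stag = 'Closed') AS r
          ON l.loc = r.loc
        GROUP BY l.loc
        ORDER BY l.loc
    """
    cursor = con.execute(query, quarters)  # one bound value per '?'
    header = [d[0] for d in cursor.description]
    data = cursor.fetchall()
    con.close()
    return header, data


header, data = dynamic_pivot(ROWS)
print(header)
for row in data:
    print(row)
```

The same two-step idea (discover the values, then generate the SELECT list) should carry over to Snowflake, whichever language ends up building the statement.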


Can I left join twice to do multiple calculations?

I am trying to calculate, for members who shop in January, what proportion shop again in February and what proportion shop again within 3 months. Ultimately, I want to create a table similar to the attached image.
I have tried the code below. The first left join works, but when I add the second one to calculate within_3months, the error "FROM keyword not found where expected" is shown (for the line shown separately below). Can I left join twice, or must I write separate scripts for the columns?
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
select
year_month_january21
, count(distinct A.members) as num_of_mems_shopped_january21
, count(distinct B.members)as retained_february21
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
, count(distinct C.members)/count(distinct A.members) *100 as within_3months
from
(select
members
, year_month as year_month_january21
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month = 202101
group by
members
, year_month) A
left join
(select
members
, year_month as year_month_february21
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month = 202102
group by
members
, year_month) B on A.members = B.members
left join
(select
members
, year_month as year_month_3months
from table.members t
join table.date tm on t.dt_key = tm.date_key
and year_month between 202102 and 202104
group by
members
, year_month) C on A.members = C.members
group by
year_month_january21;
I have tried creating a separate time table and left joining to it. It does not work. Doing the calculations separately works, but I must do this for multiple time frames, so it will take a long time.
The error isn't coming from the added left join; it's from the as 1month_retention_rate part, because that is an illegal name.
You can see that more simply with:
select dummy as 1month_retention_rate
from dual;
ORA-00923: FROM keyword not found where expected
You could change the column alias so it follows the naming rules (specifically here, does not start with a digit), or if that specific name is actually required then you could make it a quoted identifier - generally not a good option, but sometimes OK in the final output of a query.
So in your code you would just change your new line
, count(distinct B.members)/count(distinct A.members) *100 as 1month_retention_rate
to something like
, count(distinct B.members)/count(distinct A.members) *100 as one_month_retention_rate
or with a quoted identifier
, count(distinct B.members)/count(distinct A.members) *100 as "1month_retention_rate"
That still errors, but now with ORA-00942, as I don't have your tables, and that is after changing your obfuscated schema/table names to something legal too.
There may be more efficient ways to perform the calculation, but that's a separate issue...
As I understand it, you want to get:
the count of all members who visited in January;
the count of all members who visited in January and visited again in February;
the count of all members who visited in January and visited again at least once in February, March or April.
If my understanding is correct, you could simplify your query by using IF instead of LEFT JOIN.
Take a look at the following query, which assumes that the members table has an Id field:
SELECT
mem_jan AS num_of_mems_shopped_january21,
mem_feb AS retained_february21,
mem_feb / mem_jan * 100 AS one_month_retention_rate,
mem_3m / mem_jan * 100 AS within_3months
FROM(
SELECT
SUM(IF(mm_jan>0,1,0)) AS mem_jan,
SUM(IF(mm_jan>0 AND mm_feb>0,1,0)) AS mem_feb,
SUM(IF(mm_jan>0 AND mm_3m>0,1,0)) AS mem_3m
FROM
(
SELECT
t.Id,
SUM(IF(year_month = 202101, 1,0)) AS mm_jan, /*visits for a member in Jan*/
SUM(IF(year_month = 202102, 1,0)) AS mm_feb, /*visits for a member in Feb*/
SUM(IF(year_month between 202102 and 202104,1,0)) AS mm_3m /*visits for a member in Feb-Apr*/
FROM
table.members t
join table.date tm on t.dt_key = tm.date_key
WHERE
year_month between 202101 and 202104
GROUP BY
t.Id
) AS t1
) AS t2
This is not a final, runnable query, but it explains my idea. Depending on your engine, you may need to use CASE or IF ... THEN ... ELSE instead of IF().
Don't use multiple joins; count the shops per member per month and then use conditional aggregation.
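To pin down the three counts listed above independently of any SQL engine, here is a plain-Python sketch over hypothetical (member_id, year_month) shop events. It treats "within 3 months" as at least one shop between 202102 and 202104, the same as the query, and can be used to cross-check whichever SQL you end up with.

```python
from collections import defaultdict

# Hypothetical shop events: (member_id, year_month)
VISITS = [
    (1, 202101), (1, 202101), (1, 202102),
    (2, 202101), (2, 202104),
    (3, 202101),
    (4, 202102),
    (5, 202101), (5, 202103),
]


def retention(visits, base=202101, next_month=202102, window=(202102, 202104)):
    """Return (shopped_jan, retained_feb, retained_3m, pct_1m, pct_3m)."""
    months = defaultdict(set)          # member -> set of months with a shop
    for member, ym in visits:
        months[member].add(ym)

    lo, hi = window
    jan = {m for m, seen in months.items() if base in seen}
    feb = {m for m in jan if next_month in months[m]}
    three = {m for m in jan if any(lo <= ym <= hi for ym in months[m])}

    if not jan:                        # avoid dividing by zero
        return 0, 0, 0, None, None
    return (len(jan), len(feb), len(three),
            100.0 * len(feb) / len(jan), 100.0 * len(three) / len(jan))


print(retention(VISITS))
```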
In Oracle, that would be:
SELECT 202101 AS year_month,
COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END)
AS members_shopped_202101,
COUNT(CASE WHEN cnt_202101 > 0 AND cnt_202102 > 0 THEN 1 END)
AS members_retained_202102,
COUNT(CASE WHEN cnt_202101 > 0 AND cnt_202102 > 0 THEN 1 END)
/ COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END) * 100
AS one_month_retention_rate,
COUNT(CASE WHEN cnt_202101 > 0 AND (cnt_202102 > 0 OR cnt_202103 > 0 OR cnt_202104 > 0) THEN 1 END)
/ COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END) * 100
AS within_3months
FROM (
SELECT members,
year_month
FROM members m
INNER JOIN date d
ON m.dt_key = d.date_key
)
PIVOT (
COUNT(*)
FOR year_month IN (
202101 AS cnt_202101,
202102 AS cnt_202102,
202103 AS cnt_202103,
202104 AS cnt_202104
)
);
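SQLite has no PIVOT, but the same "count per member per month, then aggregate conditionally" idea can be sketched with Python's sqlite3 module, with a GROUP BY producing the per-member counts. The visits table, its member_id/year_month columns and the sample rows are hypothetical stand-ins for your members/date join. Multiplying by 100.0 before dividing avoids SQLite's integer division.

```python
import sqlite3

# Hypothetical shop events: (member_id, year_month)
VISITS = [
    (1, 202101), (1, 202101), (1, 202102),
    (2, 202101), (2, 202104),
    (3, 202101),
    (4, 202102),
    (5, 202101), (5, 202103),
]

QUERY = """
WITH per_member AS (
    -- count the shops per member per month of interest
    SELECT member_id,
           SUM(CASE WHEN year_month = 202101 THEN 1 ELSE 0 END) AS cnt_202101,
           SUM(CASE WHEN year_month = 202102 THEN 1 ELSE 0 END) AS cnt_202102,
           SUM(CASE WHEN year_month BETWEEN 202102 AND 202104 THEN 1 ELSE 0 END) AS cnt_3m
    FROM visits
    WHERE year_month BETWEEN 202101 AND 202104
    GROUP BY member_id
)
-- then aggregate conditionally over the per-member counts
SELECT COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END) AS members_shopped_202101,
       COUNT(CASE WHEN cnt_202101 > 0 AND cnt_202102 > 0 THEN 1 END) AS members_retained_202102,
       100.0 * COUNT(CASE WHEN cnt_202101 > 0 AND cnt_202102 > 0 THEN 1 END)
             / COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END) AS one_month_retention_rate,
       100.0 * COUNT(CASE WHEN cnt_202101 > 0 AND cnt_3m > 0 THEN 1 END)
             / COUNT(CASE WHEN cnt_202101 > 0 THEN 1 END) AS within_3months
FROM per_member
"""


def retention_sql(visits):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE visits (member_id INTEGER, year_month INTEGER)")
    con.executemany("INSERT INTO visits VALUES (?, ?)", visits)
    row = con.execute(QUERY).fetchone()
    con.close()
    return row


print(retention_sql(VISITS))
```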

Get bin ranges from a temporary table in SQL

I have a question related to my previous one.
What I have is a database that looks like:
category price date
-------------------------
Cat1 37 2019-03
Cat2 65 2019-03
Cat3 34 2019-03
Cat1 45 2019-03
Cat2 100 2019-03
Cat3 60 2019-03
This db has hundreds of categories and comes from another one that has different attributes for each observation.
With this code:
WITH table AS
(
SELECT
category, price, date,
substring(date, 1, 4) AS year,
substring(date, 6, 2) as month
FROM
original_table
WHERE
(year = "2019" or year = "2020")
AND (month = "03")
AND product = "XXXXX"
ORDER BY
anno
)
-- I get this from a bigger table, but prefer to make small steps
-- that anyone in the future can understand where this comes from as
-- the original table is expected to grow fast
SELECT
category,
ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
tipo_establecimiento
FROM
(SELECT
*,
LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
CASE
WHEN (category_2>= 35) AND (category_2 <= 61)
THEN 'S'
ELSE 'N'
END 'tipo_establecimiento'
FROM
table)
WHERE
next_date IS NOT NULL AND Pct_change >= 0
ORDER BY
Pct_change DESC
This code gets me a view of the data that looks like:
category Pct_change period
cat1 0.21 2019-2020
cat2 0.53 2019-2020
cat3 0.76 "
This is great! But my next view has to take this one and group the categories into ranges, showing how many categories fall in each range.
It should look like:
range avg num_cat_in
[0.1- 0.4] 0.3 3
This last table is just an example of what I expect
I have been trying code that looks like this, but I get nothing:
WITH table AS (
SELECT category, price, date, substring(date, 1, 4) AS year, substring(date, 6, 2) as month
FROM original_table
WHERE (year= "2019" or year= "2020") and (month= "03") and product = "XXXXX"
order by anno
)
-- I get this from a bigger table, but prefer to make small steps that anyone in the future can understand where this comes from as the original table is expected to grow fast
SELECT category,
ROUND(1.0 * next_price/ price - 1, 2) Pct_change,
SUBSTR(Date, 1, 4) || '-' || SUBSTR(next_date, 1, 4) Period,
tipo_establecimiento
FROM (
SELECT *,
LEAD(Price) OVER (PARTITION BY category ORDER BY year) next_price,
LEAD(year) OVER (PARTITION BY category ORDER BY year) next_date,
CASE
WHEN (category_2>= 35) AND (category_2 <= 61)
THEN 'S'
ELSE 'N'
END 'tipo_establecimiento'
FROM table
)
WHERE next_date IS NOT NULL AND Pct_change>=0
ORDER BY Pct_change DESC
WHERE next_date IS NOT NULL AND Pct_change>=0
)
SELECT
count(CASE WHEN Pct_change> 0.12 AND Pct_change <= 0.22 THEN 1 END) AS [12 - 22],
count(CASE WHEN Pct_change> 0.22 AND Pct_change <= 0.32 THEN 1 END) AS [22 - 32],
count(CASE WHEN Pct_change> 0.32 AND Pct_change <= 0.42 THEN 1 END) AS [32 - 42],
count(CASE WHEN Pct_change> 0.42 AND Pct_change <= 0.52 THEN 1 END) AS [42 - 52],
count(CASE WHEN Pct_change> 0.52 AND Pct_change <= 0.62 THEN 1 END) AS [52 - 62],
count(CASE WHEN Pct_change> 0.62 AND Pct_change <= 0.72 THEN 1 END) AS [62 - 72],
count(CASE WHEN Pct_change> 0.72 AND Pct_change <= 0.82 THEN 1 END) AS [72 - 82]
Thank you!!!
As in my comment, I'm first assuming that your ranges are not hard-coded and that you wish to split your data evenly across quantiles of Pct_change. This means the calculation figures out the ranges that split your sample as uniformly as possible. In this case, the following would work (where theview is the name of your previous view, which calculates the percentages):
select
concat('[',min(Pct_change),'-',max(Pct_change),']') as `range`
, avg(Pct_change) as `avg`
, count(*) as num_cat_in
from(
select *
, ntile(5)over(order by Pct_change) as bin
from theview
) t
group by bin
order by bin;
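Here is a sketch of the quantile approach using Python's sqlite3 module (NTILE needs SQLite 3.25 or later). theview is modelled as a plain table, and the ten Pct_change values are made up, so five bins give two rows each. The label uses MIN and MAX of each bin, and || replaces MySQL's CONCAT.

```python
import sqlite3

# Made-up Pct_change values standing in for `theview`
PCT_CHANGES = [0.05, 0.12, 0.21, 0.33, 0.40, 0.47, 0.53, 0.68, 0.76, 0.90]

QUERY = """
SELECT '[' || MIN(Pct_change) || '-' || MAX(Pct_change) || ']' AS "range",
       AVG(Pct_change) AS "avg",
       COUNT(*) AS num_cat_in
FROM (
    -- NTILE(5) assigns each row to one of 5 equally sized bins
    SELECT Pct_change, NTILE(5) OVER (ORDER BY Pct_change) AS bin
    FROM theview
) AS t
GROUP BY bin
ORDER BY bin
"""


def quantile_bins(values):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE theview (category TEXT, Pct_change REAL)")
    con.executemany("INSERT INTO theview VALUES (?, ?)",
                    [(f"cat{i}", v) for i, v in enumerate(values, start=1)])
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in quantile_bins(PCT_CHANGES):
    print(row)
```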
If, on the other hand, your ranges are hard-coded, I assume they are stored in a table such as the one I create here:
create table theranges (lower DOUBLE, upper DOUBLE);
insert into theranges values (0,0.2),(0.2,0.4),(0.4,0.6),(0.6,0.8),(0.8,1);
(You have to make sure that the ranges do not overlap. By convention, each range includes its lower bound and excludes its upper bound, except for the top range, whose upper bound of 1 is included.) It is then a matter of left-joining the tables:
select
concat('[',lower,'-',upper,']') as `range`
, avg(Pct_change) as `avg`
, sum(if(Pct_change is null, 0, 1)) as num_cat_in
from theranges left join theview on (Pct_change>=lower and if(upper=1,true,Pct_change<upper))
group by lower, upper
order by lower;
(Note that in the part that says upper=1, you must change 1 to the upper bound of your highest hard-coded range; here I am assuming your percentages are between 0 and 1.)
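The hard-coded-range version can be sketched the same way. SQLite has no IF(), so the join condition is written as the equivalent boolean (upper = 1 OR Pct_change < upper), and COUNT(Pct_change) stands in for sum(if(Pct_change is null, 0, 1)), since both count only matched rows. The values are made up so that one range is empty; thanks to the left join, it still appears with a count of 0 and a NULL average.

```python
import sqlite3

# Hard-coded ranges (as in theranges) and made-up Pct_change values
RANGES = [(0, 0.2), (0.2, 0.4), (0.4, 0.6), (0.6, 0.8), (0.8, 1)]
PCT_CHANGES = [0.05, 0.12, 0.21, 0.33, 0.40, 0.47, 0.53, 1.0]

QUERY = """
SELECT '[' || lower || '-' || upper || ']' AS "range",
       AVG(Pct_change) AS "avg",
       COUNT(Pct_change) AS num_cat_in   -- counts only matched (non-NULL) rows
FROM theranges
LEFT JOIN theview
  ON Pct_change >= lower AND (upper = 1 OR Pct_change < upper)
GROUP BY lower, upper
ORDER BY lower
"""


def range_bins(ranges=RANGES, values=PCT_CHANGES):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE theranges (lower REAL, upper REAL)")
    con.executemany("INSERT INTO theranges VALUES (?, ?)", ranges)
    con.execute("CREATE TABLE theview (category TEXT, Pct_change REAL)")
    con.executemany("INSERT INTO theview VALUES (?, ?)",
                    [(f"cat{i}", v) for i, v in enumerate(values, start=1)])
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in range_bins():
    print(row)
```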

Datediff on 2 rows of a table with a condition

My data looks like the following
TicketID OwnedbyTeamT Createddate ClosedDate
1234 A
1234 A 01/01/2019 01/05/2019
1234 A 10/05/2018 10/07/2018
1234 B 10/04/2019 10/08/2018
1234 finance 11/01/2018 11/11/2018
1234 B 12/02/2018
Now, I want to calculate the datediff between the closed dates for teams A and B, if the max closed date for team A is greater than the max closed date for team B. If it is smaller or null, I don't want to see them. So, for example, I want to see only one record like this:
TicketID (Datediff)result-days
1234 86
And for other tickets, I want to display the info too. For example, if the conditions aren't met, then:
TicketID (Datediff)result-days
2456 -1111111
Data sample for 2456:
TicketID OwnedbyTeamT Createddate ClosedDate
2456 A
2456 A 10/01/2019 10/05/2019
2456 B 08/05/2018 08/07/2018
2456 B 06/04/2019 06/08/2018
2456 finance 11/01/2018 11/11/2018
2456 B 12/02/2018
For ticket 1234, I want to see the difference in days between 01/05/2019 for team A and 10/08/2018 for team B.
Here is the query that I wrote; however, all I see is -1111111. Any help, please?
SELECT A.incidentid,
( CASE
WHEN Max(B.[build validation]) <> 'No data'
AND Max(A.crfs) <> 'No data'
AND Max(B.[build validation]) < Max(A.crfs) THEN
Datediff(day, Max(B.[build validation]), Max(A.crfs))
ELSE -1111111
END ) AS 'Days-CRF-diff'
FROM (SELECT DISTINCT incidentid,
Iif(( ownedbyteam = 'B'
AND titlet LIKE '%Build validation%' ), Cast(
closeddatetimet AS NVARCHAR(255)), 'No data') AS
'Build Validation'
FROM incidentticketspecifics) B
INNER JOIN (SELECT incidentid,
Iif(( ownedbyteamt = 'B'
OR ownedbyteamt =
'Finance' ),
Cast(
closeddatetimet AS NVARCHAR(255)), 'No data') AS
'CRFS'
FROM incidentticketspecifics
GROUP BY incidentid,
ownedbyteamt,
closeddatetimet) CRF
ON A.incidentid = B.incidentid
GROUP BY A.incidentid
I hope the following answer will be of help.
Two subqueries, one per team (A and B), bring back the max closed date for every ticket. A left join between them puts this information in the same row so that DATEDIFF can be computed. The final WHERE clause keeps only the rows where team A's date is later than team B's.
Please change [YourDB] and [MytableName] in the following code with your names.
--Select the items to be viewed in the final view along with the difference in days
SELECT A.[TicketID],A.[OwnedbyTeamT], A.[Max_DateA],B.[OwnedbyTeamT], B.[Max_DateB], DATEDIFF(dd,B.[Max_DateB],A.[Max_DateA]) AS My_Diff
FROM
(
--The following subquery creates a table A with the max date for every project for team A
SELECT [TicketID]
,[OwnedbyTeamT]
,MAX([ClosedDate]) AS Max_DateA
FROM [YourDB].[dbo].[MytableName]
GROUP BY [TicketID],[OwnedbyTeamT]
HAVING [OwnedbyTeamT]='A')A
--A join between view A and B to bring the max dates for every project
LEFT JOIN (
--The max date for every project for team B
SELECT [TicketID]
,[OwnedbyTeamT]
,MAX([ClosedDate]) AS Max_DateB
FROM [YourDB].[dbo].[MytableName]
GROUP BY [TicketID],[OwnedbyTeamT]
HAVING [OwnedbyTeamT]='B')B
ON A.[TicketID]=B.[TicketID]
--Keep only the tickets where team A's max date is later than team B's
WHERE A.Max_DateA>B.Max_DateB
You might also be able to do this with a PIVOT. Here is a working example:
SELECT [TicketID], "A", "B", DATEDIFF(dd,"B","A") AS My_Date_Diff
FROM
(
SELECT [TicketID],[OwnedbyTeamT],MAX([ClosedDate]) AS My_Max
FROM [YourDB].[dbo].[MytableName]
GROUP BY [TicketID],[OwnedbyTeamT]
)Temp
PIVOT
(
MAX(My_Max)
FOR Temp.[OwnedbyTeamT] in ("A","B")
)PIV
WHERE "A">"B"
Your sample query is quite complicated and has conditions not mentioned in the text. It doesn't really help.
I want to calculate the datediff between the closeddates for teams A, and B, if the max closeddate for team A is greater than max closeddate team B. If it is smaller or null I don't want to see them.
I think you want this per TicketId. You can do this using conditional aggregation:
SELECT TicketId,
DATEDIFF(day,
MAX(CASE WHEN OwnedbyTeamT = 'B' THEN ClosedDate END),
MAX(CASE WHEN OwnedbyTeamT = 'A' THEN ClosedDate END)
) as diff
FROM incidentticketspecifics its
GROUP BY TicketId
HAVING MAX(CASE WHEN OwnedbyTeamT = 'A' THEN ClosedDate END) >
MAX(CASE WHEN OwnedbyTeamT = 'B' THEN ClosedDate END)
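Here is a sketch of the conditional aggregation using Python's sqlite3 module. SQLite has no DATEDIFF, so the day difference is julianday(a) - julianday(b), and dates are stored as ISO text (YYYY-MM-DD) so that MAX and > compare them correctly. The three tickets are hypothetical: team A closed last, team B closed last, and team B has no closed date. A second query shows the -1111111 variant from the question, which keeps every ticket by moving the condition into a CASE instead of HAVING.

```python
import sqlite3

# Hypothetical rows: (TicketId, OwnedbyTeamT, ClosedDate as ISO text or None)
ROWS = [
    (1, "A", "2019-01-15"), (1, "A", "2019-03-01"),
    (1, "B", "2019-02-01"), (1, "finance", "2019-03-10"),
    (2, "A", "2019-01-05"), (2, "B", "2019-02-01"),
    (3, "A", "2019-01-05"), (3, "B", None),
]

MAX_A = "MAX(CASE WHEN OwnedbyTeamT = 'A' THEN ClosedDate END)"
MAX_B = "MAX(CASE WHEN OwnedbyTeamT = 'B' THEN ClosedDate END)"
DIFF = f"CAST(julianday({MAX_A}) - julianday({MAX_B}) AS INTEGER)"

# Only tickets where team A's latest close is after team B's
# (a NULL on either side makes the comparison NULL, so the ticket drops out)
FILTERED = f"""
SELECT TicketId, {DIFF} AS diff
FROM tickets
GROUP BY TicketId
HAVING {MAX_A} > {MAX_B}
ORDER BY TicketId
"""

# Every ticket, with the question's -1111111 sentinel when the condition fails
WITH_SENTINEL = f"""
SELECT TicketId,
       CASE WHEN {MAX_A} > {MAX_B} THEN {DIFF} ELSE -1111111 END AS diff
FROM tickets
GROUP BY TicketId
ORDER BY TicketId
"""


def run(query, rows=ROWS):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tickets "
                "(TicketId INTEGER, OwnedbyTeamT TEXT, ClosedDate TEXT)")
    con.executemany("INSERT INTO tickets VALUES (?, ?, ?)", rows)
    result = con.execute(query).fetchall()
    con.close()
    return result


print(run(FILTERED))
print(run(WITH_SENTINEL))
```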

Full Outer Join, Coalesce, and Group By (Oh My!)

I'm going to ask this in two parts, because my logic may be way off, and if so, the syntax doesn't really matter.
I have 10 queries. Each query returns month, supplier, and count(some metric). The queries use various tables, joins, etc. Not all month/supplier combinations exist in the output for each query. I would like to combine these into a single data set that can be exported and pivoted on in Excel.
I'd like the output to look like this:
Month | Supplier | Metric1 |Metric2 |..| Metric 10
2018-01 | Supp1 | _value_ | _value_ |...| _value_ |
2018-01 | Supp2 | NULL | _value_ |...| NULL
What is the best / easiest / most efficient way to accomplish this?
I've tried various methods to accomplish the above, but I can't seem to get the syntax quite right. I wanted to make a very simple test case and build upon it, but I only have select privileges on the db, so I am unable to test it out. I was able to create a query that at least doesn't result in any squiggly red error lines, but applying the same logic to the bigger problem doesn't work.
This is what I've got:
create table test1(name varchar(20),credit int);
insert into test1 (name, credit) values ('Ed',1),('Ann',1),('Jim',1),('Ed',1),('Ann',1);
create table test2 (name varchar(10), debit int);
insert into test2 (name, debit) values ('Ann',1),('Sue',1),('Sue',1),('Sue',1);
select
coalesce(a.name, b.name) as name,
cred,
deb
from
(select name, count(credit) as cred
from test1
group by name) a
full outer join
(select name, count(debit) as deb
from test2
group by name) b on
a.name =b.name;
Am I headed down the right path?
UPDATE: Based on Gordon's input, I tried this on the first two queries:
select Month, Supp,
sum(case when which = 1 then metric end) as Exceptions,
sum(case when which = 2 then metric end) as BackOrders
from (
(
select Month, Supp, metric, 1 as which
from (
select (convert(char(4),E.PostDateTime,120)+'-'+convert(char(2),E.PostDateTime,101)) as Month, E.TradingPartner as Supp, count(distinct(E.excNum)) as metric
from db..TrexcMangr E
where (E.DSHERep in ('AVR','BTB') OR E.ReleasedBy in ('AVR','BTB')) AND year(E.PostDateTime) >= '2018'
) a
)
union all
(
select Month, Supp, metric, 2 as which
from (
select (convert(char(4),T.UpdatedDateTime,120)+'-'+convert(char(2),T.UpdatedDateTime,101)) as Month, P.Supplier as Supp, count(*) as metric
from db1..trordertext T
inner join mdid_Tran..trOrderPO P on P.PONum = T.RefNum
where T.TextType = 'BO' AND (T.CreatedBy in ('AVR','BTB') OR T.UpdatedBy in ('AVR','BTB')) AND year(UpdatedDateTime) >=2018
) b
)
) q
group by Month, Supp
... but I'm getting a group by error.
One method uses union all and group by:
select month, supplier,
sum(case when which = 1 then metric end) as metric_01,
sum(case when which = 2 then metric end) as metric_02,
. . .
from ((select Month, Supplier, Metric, 1 as which
from (<query1>) q
. . .
) union all
(select Month, Supplier, Metric, 2 as which
from (<query2>) q
. . .
) union all
. . .
) q
group by month, supplier;
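Applied to the test1/test2 tables from the question, the pattern returns the same rows your FULL OUTER JOIN would. This sketch runs it with Python's sqlite3 module; the which column tags the source query, and a metric is NULL where its query had no row for that name. Note that each branch needs its own GROUP BY; the inner queries in the UPDATE above use COUNT without one, which is the likely cause of the group by error.

```python
import sqlite3

# The question's test tables
SETUP = """
create table test1(name varchar(20),credit int);
insert into test1 (name, credit) values ('Ed',1),('Ann',1),('Jim',1),('Ed',1),('Ann',1);
create table test2 (name varchar(10), debit int);
insert into test2 (name, debit) values ('Ann',1),('Sue',1),('Sue',1),('Sue',1);
"""

QUERY = """
SELECT name,
       SUM(CASE WHEN which = 1 THEN metric END) AS cred,
       SUM(CASE WHEN which = 2 THEN metric END) AS deb
FROM (
    -- each branch aggregates on its own, then is tagged with 'which'
    SELECT name, COUNT(credit) AS metric, 1 AS which FROM test1 GROUP BY name
    UNION ALL
    SELECT name, COUNT(debit) AS metric, 2 AS which FROM test2 GROUP BY name
) AS q
GROUP BY name
ORDER BY name
"""


def combine():
    con = sqlite3.connect(":memory:")
    con.executescript(SETUP)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in combine():
    print(row)
```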
SELECT
CalendarMonthStart,
Supp,
SUM(CASE WHEN metric_id = 1 THEN metric END) as Exceptions,
SUM(CASE WHEN metric_id = 2 THEN metric END) as BackOrders
FROM
(
SELECT
DATEADD(month, DATEDIFF(month, 0, E.PostDateTime), 0) AS CalendarMonthStart,
E.TradingPartner AS Supp,
COUNT(DISTINCT(E.excNum)) AS metric,
1 AS metric_id
FROM
db..TrexcMangr E
WHERE
( E.DSHERep in ('AVR','BTB')
OR E.ReleasedBy in ('AVR','BTB')
)
AND E.PostDateTime >= '2018-01-01'
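GROUP BY
-- SQL Server does not accept column ordinals (GROUP BY 1, 2), so repeat the expressions
DATEADD(month, DATEDIFF(month, 0, E.PostDateTime), 0),
E.TradingPartner
UNION ALL
-- both branches must return the same columns, in the same order
SELECT
DATEADD(month, DATEDIFF(month, 0, T.UpdatedDateTime), 0) AS CalendarMonthStart,
P.Supplier AS Supp,
COUNT(*) AS metric,
2 AS metric_id
FROM
db1..trordertext T
INNER JOIN
mdid_Tran..trOrderPO P
ON P.PONum = T.RefNum
WHERE
( T.CreatedBy in ('AVR','BTB')
OR T.UpdatedBy in ('AVR','BTB')
)
AND T.TextType = 'BO'
AND T.UpdatedDateTime >= '2018-01-01'
GROUP BY
DATEADD(month, DATEDIFF(month, 0, T.UpdatedDateTime), 0),
P.Supplier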
)
combined
GROUP BY
CalendarMonthStart,
Supp

SQL Deduct value from multiple rows

I would like to apply a total discount of $10.00 for each customer. The discount should be applied across multiple transactions until the whole $10.00 is used.
Example:
CustomerID  Transaction Amount  Discount  TransactionID
1           $8.00               $8.00     1
1           $6.00               $2.00     2
1           $5.00               $0.00     3
1           $1.00               $0.00     4
2           $5.00               $5.00     5
2           $2.00               $2.00     6
2           $2.00               $2.00     7
3           $45.00              $10.00    8
3           $6.00               $0.00     9
The query below keeps track of the running sum and calculates the discount depending on whether the running sum is greater than or less than the discount amount.
select
customerid, transaction_amount, transactionid,
(case when 10 > (sum_amount - transaction_amount)
then (case when transaction_amount >= 10 - (sum_amount - transaction_amount)
then 10 - (sum_amount - transaction_amount)
else transaction_amount end)
else 0 end) discount
from (
select customerid, transaction_amount, transactionid,
sum(transaction_amount) over (partition by customerid order by transactionid) sum_amount
from Table1
) t1 order by customerid, transactionid
http://sqlfiddle.com/#!6/552c2/7
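Here is a sketch of the running-sum query run through Python's sqlite3 module (SQLite 3.25+ for window functions), using the example data. Amounts are whole dollars stored as integers to keep the arithmetic exact; in SQL Server you would normally use a money or decimal column. The CASE expression is copied from the query above.

```python
import sqlite3

# Example data: (customerid, transaction_amount, transactionid)
ROWS = [
    (1, 8, 1), (1, 6, 2), (1, 5, 3), (1, 1, 4),
    (2, 5, 5), (2, 2, 6), (2, 2, 7),
    (3, 45, 8), (3, 6, 9),
]

QUERY = """
SELECT customerid, transaction_amount, transactionid,
       -- sum_amount - transaction_amount = total spent before this row
       CASE WHEN 10 > (sum_amount - transaction_amount)
            THEN CASE WHEN transaction_amount >= 10 - (sum_amount - transaction_amount)
                      THEN 10 - (sum_amount - transaction_amount)
                      ELSE transaction_amount END
            ELSE 0 END AS discount
FROM (
    SELECT customerid, transaction_amount, transactionid,
           SUM(transaction_amount) OVER (PARTITION BY customerid
                                         ORDER BY transactionid) AS sum_amount
    FROM Table1
) AS t1
ORDER BY customerid, transactionid
"""


def running_sum_discounts(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE Table1 (customerid INTEGER, "
                "transaction_amount INTEGER, transactionid INTEGER)")
    con.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", rows)
    result = [row[3] for row in con.execute(QUERY)]
    con.close()
    return result


print(running_sum_discounts(ROWS))
```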
The same query with a self-join, which should work on most databases, including MSSQL 2008:
select
customerid, transaction_amount, transactionid,
(case when 10 > (sum_amount - transaction_amount)
then (case when transaction_amount >= 10 - (sum_amount - transaction_amount)
then 10 - (sum_amount - transaction_amount)
else transaction_amount end)
else 0 end) discount
from (
select t1.customerid, t1.transaction_amount, t1.transactionid,
sum(t2.transaction_amount) sum_amount
from Table1 t1
join Table1 t2 on t1.customerid = t2.customerid
and t1.transactionid >= t2.transactionid
group by t1.customerid, t1.transaction_amount, t1.transactionid
) t1 order by customerid, transactionid
http://sqlfiddle.com/#!3/552c2/2
You can do this with recursive common table expressions, although it isn't particularly pretty. SQL Server struggles to optimize these types of queries. See "Sum of minutes between multiple date ranges" for some discussion.
If you wanted to go further with this approach, you would probably need to make a temporary table of x, so you can index it on (customerid, rn).
;with x as (
select
tx.*,
row_number() over (
partition by customerid
order by transaction_amount desc, transactionid
) rn
from
tx
), y as (
select
x.transactionid,
x.customerid,
x.transaction_amount,
case
when 10 >= x.transaction_amount then x.transaction_amount
else 10
end as discount,
case
when 10 >= x.transaction_amount then 10 - x.transaction_amount
else 0
end as remainder,
x.rn as rn
from
x
where
rn = 1
union all
select
x.transactionid,
x.customerid,
x.transaction_amount,
case
when y.remainder >= x.transaction_amount then x.transaction_amount
else y.remainder
end,
case
when y.remainder >= x.transaction_amount then y.remainder - x.transaction_amount
else 0
end,
x.rn
from
y
inner join
x
on y.rn = x.rn - 1 and y.customerid = x.customerid
where
y.remainder > 0
)
update
tx
set
discount = y.discount
from
tx
inner join
y
on tx.transactionid = y.transactionid;
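The recursive idea can also be sketched in SQLite. This is a simplified adaptation, not a port of the T-SQL above: it walks each customer's transactions in transactionid order (the T-SQL orders by amount descending, which gives the same result on this data), numbers them with a correlated COUNT(*) instead of ROW_NUMBER, and carries the unused remainder from row to row until it reaches 0. It returns the discounts rather than running an UPDATE, and uses SQLite's two-argument scalar MIN.

```python
import sqlite3

# Example data: (customerid, transaction_amount, transactionid)
ROWS = [
    (1, 8, 1), (1, 6, 2), (1, 5, 3), (1, 1, 4),
    (2, 5, 5), (2, 2, 6), (2, 2, 7),
    (3, 45, 8), (3, 6, 9),
]

QUERY = """
WITH RECURSIVE x AS (
    -- rn = position of the transaction within its customer
    SELECT t.customerid, t.transaction_amount, t.transactionid,
           (SELECT COUNT(*) FROM tx AS t2
             WHERE t2.customerid = t.customerid
               AND t2.transactionid <= t.transactionid) AS rn
    FROM tx AS t
), y AS (
    -- anchor: the first transaction of each customer takes up to 10
    SELECT customerid, transactionid, rn,
           MIN(transaction_amount, 10) AS discount,
           10 - MIN(transaction_amount, 10) AS remainder
    FROM x
    WHERE rn = 1
    UNION ALL
    -- recursive step: the next transaction takes up to what is left
    SELECT x.customerid, x.transactionid, x.rn,
           MIN(x.transaction_amount, y.remainder),
           y.remainder - MIN(x.transaction_amount, y.remainder)
    FROM y
    JOIN x ON x.customerid = y.customerid AND x.rn = y.rn + 1
    WHERE y.remainder > 0
)
SELECT tx.transactionid, COALESCE(y.discount, 0) AS discount
FROM tx
LEFT JOIN y ON y.transactionid = tx.transactionid
ORDER BY tx.transactionid
"""


def recursive_discounts(rows):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE tx (customerid INTEGER, "
                "transaction_amount INTEGER, transactionid INTEGER)")
    con.executemany("INSERT INTO tx VALUES (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


print(recursive_discounts(ROWS))
```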
I usually like to set up a test environment for such questions. I will use a local temporary table. Please note that I made the data unordered, since order is not guaranteed in real life.
-- play table
if exists (select 1 from tempdb.sys.tables where name like '%transactions%')
drop table #transactions
go
-- play table
create table #transactions
(
trans_id int identity(1,1) primary key,
customer_id int,
trans_amt smallmoney
)
go
-- add data
insert into #transactions
values
(1,$8.00),
(2,$5.00),
(3,$45.00),
(1,$6.00),
(2,$2.00),
(1,$5.00),
(2,$2.00),
(1,$1.00),
(3,$6.00);
go
I am going to give you two answers.
First, SQL Server 2012 and later support window frames such as ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING. This allows us to get a running total (rt) and an rt that stops one row earlier. Given these two values, we can determine whether the maximum discount has been exceeded.
-- Two running totals (SQL Server 2012+)
;
with cte_running_total
as
(
select
*,
SUM(trans_amt)
OVER (PARTITION BY customer_id
ORDER BY trans_id
ROWS BETWEEN UNBOUNDED PRECEDING AND
0 PRECEDING) as running_tot_p0,
SUM(trans_amt)
OVER (PARTITION BY customer_id
ORDER BY trans_id
ROWS BETWEEN UNBOUNDED PRECEDING AND
1 PRECEDING) as running_tot_p1
from
#transactions
)
select
*
,
case
when coalesce(running_tot_p1, 0) <= 10 and running_tot_p0 <= 10 then
trans_amt
when coalesce(running_tot_p1, 0) <= 10 and running_tot_p0 > 10 then
10 - coalesce(running_tot_p1, 0)
else 0
end as discount_amt
from cte_running_total;
Again, the version above uses a common table expression and advanced windowing to get the totals.
Do not fret! The same can be done all the way back to SQL Server 2000.
For the second solution, I am just going to use ORDER BY, subqueries, and a temporary table to store the information that is normally in the CTE. You can swap the temporary table for a CTE on SQL Server 2005 or later if you want.
-- w/o any fancy functions - save to temp table
select *,
(
select count(*) from #transactions i
where i.customer_id = o.customer_id
and i.trans_id <= o.trans_id
) as sys_rn,
(
select sum(trans_amt) from #transactions i
where i.customer_id = o.customer_id
and i.trans_id <= o.trans_id
) as sys_tot_p0,
(
select sum(trans_amt) from #transactions i
where i.customer_id = o.customer_id
and i.trans_id < o.trans_id
) as sys_tot_p1
into #results
from #transactions o
order by customer_id, trans_id
go
-- report off temp table
select
trans_id,
customer_id,
trans_amt,
case
when coalesce(sys_tot_p1, 0) <= 10 and sys_tot_p0 <= 10 then
trans_amt
when coalesce(sys_tot_p1, 0) <= 10 and sys_tot_p0 > 10 then
10 - coalesce(sys_tot_p1, 0)
else 0
end as discount_amt
from #results
order by customer_id, trans_id
go
Cut and paste the code into SSMS to see the results, and have some fun.