I am working with a Raiser's Edge database using SQL Server 2005. I have written SQL that will produce a temporary table containing details of direct debit instalments. Below is a small table containing the key variables for the question I'm going to ask, with some fictional data:
Donor_ID Instalment_ID Instalment_Date Amount
1234 1111 01/01/2011 £5.00
1234 1112 01/02/2011 £0.00
1234 1113 01/03/2011 £5.00
1234 1114 01/04/2011 £5.00
1234 1115 01/05/2011 £0.00
1234 1116 01/06/2011 £0.00
2345 2111 01/01/2011 £0.00
2345 2112 01/02/2011 £5.00
2345 2113 01/03/2011 £5.00
2345 2114 01/04/2011 £0.00
2345 2115 01/05/2011 £0.00
2345 2116 01/06/2011 £0.00
As you will see, some of the values in the Amount column are £0.00. This can occur when a donor has insufficient funds in their account, for example.
What I'd like to do is write a SQL query that will create a field containing an incremental count of consecutive £0.00 payments that resets after a non-£0.00 payment or after a change in Donor_ID. I have reproduced the above data below, with the field I'd like to see.
Donor_ID Instalment_ID Instalment_Date Amount New_Field
1234 1111 01/01/2011 £5.00
1234 1112 01/02/2011 £0.00 1
1234 1113 01/03/2011 £5.00
1234 1114 01/04/2011 £5.00
1234 1115 01/05/2011 £0.00 1
1234 1116 01/06/2011 £0.00 2
2345 2111 01/01/2011 £0.00 1
2345 2112 01/02/2011 £5.00
2345 2113 01/03/2011 £5.00
2345 2114 01/04/2011 £0.00 1
2345 2115 01/05/2011 £0.00 2
2345 2116 01/06/2011 £0.00 3
To help clarify what I'm looking for, I think what I'm looking to do would be similar to a winning streak field on a list of a football team's results. For example:
Opponent Score Winning_Streak
Arsenal 1-0 1
Liverpool 0-0
Swansea 3-1 1
Chelsea 2-1 2
Fulham 4-0 3
Stoke 0-0
Man Utd 1-3
Reading 2-1 1
I've considered various options, but have made no progress. Unless I've missed something obvious, I think that a solution more advanced than my current SQL programming level might be required.
If I am thinking about this problem correctly, I believe that you want a row number when the Amount is 0.00 pounds.
Select 0 As InsufficientCount
    , Donor_ID
    , Instalment_ID
    , Amount
From [Table]
Where Amount > 0.00
Union
Select Row_Number() Over (Partition By Donor_ID Order By Instalment_ID)
    , Donor_ID
    , Instalment_ID
    , Amount
From [Table]
Where Amount = 0.00
This union select should only give you 'ranks' where the Amount equals 0.
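One caveat: numbering all zero rows per donor keeps counting across intervening non-zero payments instead of restarting. A minimal sketch of a "gaps and islands" variant that restarts the count is below. It runs through Python's sqlite3 module purely so it can be executed end to end, and assumes SQLite 3.25+ for window functions. The table name instalments and the columns is_zero, island and streak are illustrative, not from the question. ROW_NUMBER() exists in SQL Server 2005 as well, so the same query shape should carry over, though that is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instalments (donor_id INTEGER, instalment_id INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO instalments VALUES (?, ?, ?)",
    [
        (1234, 1111, 5.0), (1234, 1112, 0.0), (1234, 1113, 5.0),
        (1234, 1114, 5.0), (1234, 1115, 0.0), (1234, 1116, 0.0),
        (2345, 2111, 0.0), (2345, 2112, 5.0), (2345, 2113, 5.0),
        (2345, 2114, 0.0), (2345, 2115, 0.0), (2345, 2116, 0.0),
    ],
)

query = """
WITH flagged AS (
    SELECT donor_id, instalment_id, amount,
           CASE WHEN amount = 0 THEN 1 ELSE 0 END AS is_zero
    FROM instalments
),
numbered AS (
    -- The difference of the two row numbers is constant within each
    -- unbroken run of zero (or non-zero) payments for a donor.
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY donor_id ORDER BY instalment_id)
         - ROW_NUMBER() OVER (PARTITION BY donor_id, is_zero ORDER BY instalment_id)
           AS island
    FROM flagged
)
SELECT donor_id, instalment_id, amount,
       CASE WHEN is_zero = 1 THEN
            ROW_NUMBER() OVER (PARTITION BY donor_id, is_zero, island
                               ORDER BY instalment_id)
       END AS streak
FROM numbered
ORDER BY donor_id, instalment_id
"""
result = conn.execute(query).fetchall()
streaks = [row[3] for row in result]
print(streaks)
```

Partitioning the final count by is_zero as well as island matters: a zero run and a non-zero run can share the same row-number difference.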
I am calling your new field streakAmount:
ALTER TABLE instalments ADD streakAmount int NULL;
Then, to update the value:
UPDATE instalments
SET streakAmount =
    (SELECT COUNT(*)
     FROM instalments streak
     WHERE streak.donor_id = instalments.donor_id
       AND streak.instalment_date <= instalments.instalment_date
       AND streak.instalment_date >
           -- find the most recent non-zero instalment date, if any exists
           COALESCE(
               (SELECT MAX(instalment_date)
                FROM instalments prev
                WHERE prev.donor_id = instalments.donor_id
                  AND prev.amount > 0
                  AND prev.instalment_date < instalments.instalment_date)
               -- otherwise fall back to the minimum datetime value
               -- (datetime, because the date type is not available in SQL Server 2005)
               , CAST('17530101' AS datetime))
    )
WHERE amount = 0;
http://sqlfiddle.com/#!6/a571f/18
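As a quick check of the correlated-subquery idea, here is a sketch that runs the same UPDATE shape against the question's sample data using Python's sqlite3 module. The adaptations are assumptions made for SQLite: dates are stored as ISO text so that string comparison follows date order, the fallback date is '0001-01-01', and the column is named streak_amount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE instalments (
           donor_id INTEGER, instalment_id INTEGER,
           instalment_date TEXT, amount REAL, streak_amount INTEGER)"""
)
conn.executemany(
    "INSERT INTO instalments (donor_id, instalment_id, instalment_date, amount) "
    "VALUES (?, ?, ?, ?)",
    [
        (1234, 1111, "2011-01-01", 5.0), (1234, 1112, "2011-02-01", 0.0),
        (1234, 1113, "2011-03-01", 5.0), (1234, 1114, "2011-04-01", 5.0),
        (1234, 1115, "2011-05-01", 0.0), (1234, 1116, "2011-06-01", 0.0),
        (2345, 2111, "2011-01-01", 0.0), (2345, 2112, "2011-02-01", 5.0),
        (2345, 2113, "2011-03-01", 5.0), (2345, 2114, "2011-04-01", 0.0),
        (2345, 2115, "2011-05-01", 0.0), (2345, 2116, "2011-06-01", 0.0),
    ],
)

# For each zero row, count the donor's rows that fall after the last
# non-zero payment and up to (and including) the current row.
conn.execute(
    """
    UPDATE instalments
    SET streak_amount = (
        SELECT COUNT(*)
        FROM instalments AS streak
        WHERE streak.donor_id = instalments.donor_id
          AND streak.instalment_date <= instalments.instalment_date
          AND streak.instalment_date > COALESCE(
                (SELECT MAX(prev.instalment_date)
                 FROM instalments AS prev
                 WHERE prev.donor_id = instalments.donor_id
                   AND prev.amount > 0
                   AND prev.instalment_date < instalments.instalment_date),
                '0001-01-01')
    )
    WHERE amount = 0
    """
)
streak_column = [
    row[0]
    for row in conn.execute(
        "SELECT streak_amount FROM instalments ORDER BY donor_id, instalment_id"
    )
]
print(streak_column)
```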
I have a slowly changing table: a new row is created each time any of the source fields change, and some metadata is added to show when that version was valid. This is a simplified example (dates are in dd/mm/yyyy format) that doesn't show the fields which have changed.
Startdate   Enddate     Currentrecord  unique id  serial_number
----------  ----------  -------------  ---------  -------------
15/12/2020  31/12/2020  0              1          2345
15/12/2020  8/3/2021    0              2          1234
19/9/2020   15/2/2021   0              3          2345
15/12/2020  8/3/2021    0              4          3456
9/3/2021    10/3/2021   0              5          3456
16/2/2021   10/3/2021   0              6          2345
9/3/2021    26/3/2021   0              7          1234
27/3/2021   2/5/2021    0              8          1234
11/3/2021   17/5/2021   0              9          3456
3/3/2021    27/4/2021   0              10         4567
20/1/2021   7/4/2021    0              11         5678
3/5/2021    30/6/2021   1              12         1234
25/5/2021   31/5/2021   0              13         2345
8/4/2021    22/5/2021   0              14         5678
1/6/2021    26/6/2021   0              15         2345
18/5/2021   3/6/2021    0              16         3456
27/6/2021   2/8/2021    0              17         2345
28/4/2021   28/6/2021   0              18         4567
23/5/2021   6/9/2021    0              19         5678
4/6/2021    28/6/2021   0              20         3456
29/6/2021   25/7/2021   0              21         3456
3/8/2021    31/12/9999  1              22         2345
26/7/2021   31/12/9999  1              23         3456
15/10/2021  31/12/9999  1              24         4567
7/9/2021    1/11/2021   0              25         5678
22/9/2021   10/11/2021  0              26         6789
2/11/2021   16/11/2021  0              27         5678
17/11/2021  21/11/2021  0              28         5678
15/7/2021   31/12/9999  1              29         7891
22/11/2021  31/12/9999  1              30         5678
26/11/2021  31/12/9999  1              31         6789
15/6/2021   31/12/9999  1              32         8912
There is only one record for each serial_number at any given point in time (i.e. the date ranges will not overlap for identical serial_numbers), but there might be gaps between episodes for some serial_numbers (representing something leaving and returning after a gap in service).
I want to supply an arbitrary list of datetimes, say midnight on 01/01/2021, 15/03/2021, 27/05/2021 and 23/10/2021. I want to return a set of records containing every record that was in effect on each of those dates, with each row labelled with the date it was selected by. So the above example should return this:
date        unique id  serial_number
----------  ---------  -------------
1/1/2021    2          1234
1/1/2021    3          2345
1/1/2021    4          3456
15/3/2021   7          1234
15/3/2021   9          3456
15/3/2021   10         4567
15/3/2021   11         5678
27/5/2021   12         1234
27/5/2021   13         2345
27/5/2021   16         3456
27/5/2021   18         4567
27/5/2021   19         5678
23/10/2021  22         2345
23/10/2021  23         3456
23/10/2021  24         4567
23/10/2021  25         5678
23/10/2021  26         6789
23/10/2021  29         7891
23/10/2021  32         8912
I can see how to do this with a cursor, stepping through each date putting them into a variable and using something like
select @date, [unique id], serial_number
from example
where @date between start_date and end_date
to get the rows.
I can’t work out a pattern that would do it in a set based approach. My preferred SQL version is TSQL. Sorry as this is almost certainly a repeat, but I can't find a form of words that hits a worked example.
You can use a temporary table to accomplish this.
CREATE TABLE #RequestedDates([Date] DATE)
Insert the dates you want into the temporary table.
INSERT INTO #RequestedDates([Date])
VALUES ('2021-01-01'), ('2021-03-15') /*Other dates*/
And then you join with the temporary table and use the between clause to get the valid results.
SELECT rd.[Date]
, t.UniqueId
, t.SerialNumber
FROM MyTable t
INNER JOIN #RequestedDates rd on rd.[Date] BETWEEN t.StartDate AND t.EndDate
ORDER BY rd.[Date]
, t.UniqueId
, t.SerialNumber
You can join to VALUES with the dates you need.
Then join the datetimes on the range.
SELECT
datetimes.dt as [date]
, t.[unique id]
, t.serial_number
FROM example t
JOIN (VALUES
(cast('2021-01-01 00:00:00' as datetime)),
('2021-03-15 00:00:00'),
('2021-05-27 00:00:00'),
('2021-10-23 00:00:00')
) datetimes(dt)
ON datetimes.dt >= t.start_date
AND datetimes.dt <= t.end_date
ORDER BY datetimes.dt, t.[unique id], t.serial_number
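To see the range join from both answers working end to end, here is a minimal sketch using Python's sqlite3 module with a handful of rows from the question. The assumptions are mine: dates are stored as ISO text so BETWEEN compares them in date order, the column is named unique_id, and the requested dates go into a temporary table called requested_dates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE example (start_date TEXT, end_date TEXT, "
    "unique_id INTEGER, serial_number INTEGER)"
)
conn.executemany(
    "INSERT INTO example VALUES (?, ?, ?, ?)",
    [
        ("2020-12-15", "2020-12-31", 1, 2345),
        ("2020-12-15", "2021-03-08", 2, 1234),
        ("2020-09-19", "2021-02-15", 3, 2345),
        ("2020-12-15", "2021-03-08", 4, 3456),
        ("2021-03-09", "2021-03-26", 7, 1234),
        ("2021-03-11", "2021-05-17", 9, 3456),
        ("2021-08-03", "9999-12-31", 22, 2345),
    ],
)

# The arbitrary list of dates becomes rows, so one join replaces the cursor loop.
requested = ["2021-01-01", "2021-03-15", "2021-10-23"]
conn.execute("CREATE TEMP TABLE requested_dates (dt TEXT)")
conn.executemany("INSERT INTO requested_dates VALUES (?)", [(d,) for d in requested])

matches = conn.execute(
    """
    SELECT rd.dt, t.unique_id, t.serial_number
    FROM example AS t
    JOIN requested_dates AS rd
      ON rd.dt BETWEEN t.start_date AND t.end_date
    ORDER BY rd.dt, t.unique_id
    """
).fetchall()
for row in matches:
    print(row)
```

Each requested date is paired with every version row whose validity range contains it, which is the labelled result set the question asks for.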
I have a transaction table that looks like this:
transaction_start store_no item_no amount post_voided
2021-03-01 10:00:00 001 101 45 N
2021-03-01 10:00:00 001 105 25 N
2021-03-01 10:00:00 001 109 40 N
2021-03-01 10:05:00 002 103 35 N
2021-03-01 10:05:00 002 135 20 N
2021-03-01 10:08:00 001 140 2 N
2021-03-01 10:11:00 001 101 -45 Y
2021-03-01 10:11:00 001 105 -25 Y
2021-03-01 10:11:00 001 109 -40 Y
The table does not have an id column; two different receipts from the same store_no never share a transaction_start.
Whenever a transaction is post voided, it is repeated with the same store_no and item_no but with a negative amount and an equal or later transaction_start. The post_voided column is then equal to 'Y'.
In the example above, rows 1-3 have the same transaction_start and store_no, so they belong to the same receipt, which contains three different items (101, 105, 109). The same logic applies to the other rows: rows 4-5 belong to one receipt, and so on. The example contains 4 different receipts. The last receipt, given by the last three rows, is a post void of the first receipt (rows 1-3).
What I want to do is change the transaction_start of the post_voided = 'Y' transactions (in my example only one receipt, represented by the last three rows) to the closest datetime of the matching receipt: one with the same store_no and item_no and the opposite (positive) amount, but with post_voided = 'N'. In my example the matching receipt is given by the first three rows, where store_no, every item_no and the (positive) amount match. The transaction_start of the post voided receipt is always equal to or later than that of the "original" receipt.
Desired output:
transaction_start store_no item_no amount post_voided
2021-03-01 10:00:00 001 101 45 N
2021-03-01 10:00:00 001 105 25 N
2021-03-01 10:00:00 001 109 40 N
2021-03-01 10:05:00 002 103 35 N
2021-03-01 10:05:00 002 135 20 N
2021-03-01 10:08:00 001 140 2 N
2021-03-01 10:00:00 001 101 -45 Y
2021-03-01 10:00:00 001 105 -25 Y
2021-03-01 10:00:00 001 109 -40 Y
Here is a link to the table: https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=26142fa24e46acb4213b96c86f4eb94b
Thanks in advance!
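Consider the query below (note that it uses BigQuery Standard SQL syntax, such as select * replace and ifnull, rather than T-SQL):
select a.* replace(ifnull(b.transaction_start, a.transaction_start) as transaction_start)
from `project.dataset.table` a
left join (
    select * replace(-amount as amount)
    from `project.dataset.table`
    where post_voided = 'N'
) b
using (store_no, item_no)
If applied to the sample data in your question, the output should match the desired output shown there.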
Consider the query below for the new / extended example (https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=91f9f180fd672e7c357aa48d18ced5fd):
select x.* replace(ifnull(y.original_transaction_start, x.transaction_start) as transaction_start)
from `project.dataset.table` x
left join (
    select b.transaction_start, b.store_no, b.item_no, b.amount amount,
        max(a.transaction_start) original_transaction_start
    from `project.dataset.table` a
    join `project.dataset.table` b
        on a.store_no = b.store_no
        and a.item_no = b.item_no
        and a.amount = -b.amount
        and a.post_voided = 'N'
        and b.post_voided = 'Y'
        and a.transaction_start < b.transaction_start
    group by b.transaction_start, b.store_no, b.item_no, b.amount
) y
using (store_no, item_no, amount, transaction_start)
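Since the question's fiddle targets SQL Server while the queries above use BigQuery syntax, here is a sketch of the same matching idea written as a plain correlated subquery, run through Python's sqlite3 module on the question's sample data. The table name tx and the aliases v (void) and o (original) are made up for illustration. The query shape should translate to T-SQL, though that is not tested here, and it matches item by item rather than whole receipts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tx (transaction_start TEXT, store_no TEXT, item_no TEXT, "
    "amount INTEGER, post_voided TEXT)"
)
rows = [
    ("2021-03-01 10:00:00", "001", "101", 45, "N"),
    ("2021-03-01 10:00:00", "001", "105", 25, "N"),
    ("2021-03-01 10:00:00", "001", "109", 40, "N"),
    ("2021-03-01 10:05:00", "002", "103", 35, "N"),
    ("2021-03-01 10:05:00", "002", "135", 20, "N"),
    ("2021-03-01 10:08:00", "001", "140", 2, "N"),
    ("2021-03-01 10:11:00", "001", "101", -45, "Y"),
    ("2021-03-01 10:11:00", "001", "105", -25, "Y"),
    ("2021-03-01 10:11:00", "001", "109", -40, "Y"),
]
conn.executemany("INSERT INTO tx VALUES (?, ?, ?, ?, ?)", rows)

# For each void row, look up the latest non-void row with the same store and
# item and the opposite amount that is not later than the void itself.
query = """
SELECT CASE WHEN v.post_voided = 'Y' THEN COALESCE(
           (SELECT MAX(o.transaction_start)
            FROM tx AS o
            WHERE o.post_voided = 'N'
              AND o.store_no = v.store_no
              AND o.item_no = v.item_no
              AND o.amount = -v.amount
              AND o.transaction_start <= v.transaction_start),
           v.transaction_start)
       ELSE v.transaction_start END AS transaction_start,
       v.store_no, v.item_no, v.amount, v.post_voided
FROM tx AS v
ORDER BY v.rowid
"""
fixed = conn.execute(query).fetchall()
for row in fixed:
    print(row)
```

Void rows with no matching original keep their own transaction_start because of the COALESCE fallback.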
I have 2 dimension tables and 1 fact table as follows:
user_dim
user_id  user_name  user_joining_date
-------  ---------  -----------------
1        Steve      2013-01-04
2        Adam       2012-11-01
3        John       2013-05-05
4        Tony       2012-01-01
5        Dan        2010-01-01
6        Alex       2019-01-01
7        Kim        2019-01-01
bundle_dim
bundle_id  bundle_name            bundle_type  bundle_cost_per_day
---------  ---------------------  -----------  -------------------
101        movies and TV          prime        5.5
102        TV and sports          prime        6.5
103        Cooking                prime        7
104        Sports and news        prime        5
105        kids movie             extra        2
106        kids educative         extra        3.5
107        spanish news           extra        2.5
108        Spanish TV and sports  extra        3.5
109        Travel                 extra        2
plans_fact
user_id  bundle_id  bundle_start_date  bundle_end_date
-------  ---------  -----------------  ---------------
1        101        2019-10-10         2020-10-10
2        107        2020-01-15         (null)
2        106        2020-01-15         2020-12-31
2        101        2020-01-15         (null)
2        103        2020-01-15         2020-02-15
1        101        2020-10-11         (null)
1        107        2019-10-10         2020-10-10
1        105        2019-10-10         2020-10-10
4        101        2021-01-01         2021-02-01
3        104        2020-02-17         2020-03-17
2        108        2020-01-15         (null)
4        102        2021-01-01         (null)
4        103        2021-01-01         (null)
4        108        2021-01-01         (null)
5        103        2020-01-15         (null)
5        101        2020-01-15         2020-02-15
6        101        2021-01-01         2021-01-17
6        101        2021-01-20         (null)
6        108        2021-01-01         (null)
7        104        2020-02-17         (null)
7        103        2020-01-17         2020-01-18
1        102        2020-12-11         (null)
2        106        2021-01-01         (null)
7        107        2020-01-15         (null)
Note: a NULL bundle_end_date means the subscription is still active.
User active days can be calculated as bundle_end_date - bundle_start_date (for the given bundle).
Total revenue per user can be calculated as total number of active days * bundle cost per day.
I am looking to write a query to find revenue generated per user per year.
Here is what I have for the overall revenue per user:
select pf.user_id
     , sum(datediff(day, pf.bundle_start_date, coalesce(pf.bundle_end_date, getdate())) * bd.bundle_cost_per_day) total_cost_per_bundle
from plans_fact pf
inner join bundle_dim bd on bd.bundle_id = pf.bundle_id
group by pf.user_id
order by pf.user_id;
You need a 'year' table to help split each multi-year row into its separate years. For each year, you also need to recalculate the start and end dates. That's what I do in the yearParsed CTE in the code below. I hard-code the years into the join that creates y. You will probably do it differently, but however you get those values, it will work.
After that, pretty much sum as you did before, just adding the year column to your grouping.
Aside from that, all I did was move the null coalesce logic to the cte to make the overall logic simpler.
with yearParsed as (
select pf.*,
y.year,
startDt = iif(pf.bundle_start_date > y.startDt, pf.bundle_start_date, y.startDt),
endDt = iif(ap.bundle_end_date < y.endDt, ap.bundle_end_date, y.endDt)
from plans_fact pf
cross apply (select bundle_end_date = isnull(pf.bundle_end_date, getdate())) ap
join (values
(2019, '2019-01-01', '2019-12-31'),
(2020, '2020-01-01', '2020-12-31'),
(2021, '2021-01-01', '2021-12-31')
) y (year, startDt, endDt)
on pf.bundle_start_date <= y.endDt
and ap.bundle_end_date >= y.startDt
)
select yp.user_id,
yp.year,
total_cost_per_bundle = sum(datediff(day, yp.startDt, yp.endDt) * bd.bundle_cost_per_day)
from yearParsed yp
join bundle_dim bd on bd.bundle_id = yp.bundle_id
group by yp.user_id,
yp.year
order by yp.user_id,
yp.year;
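One thing worth checking with any year split is that the per-year day counts add up to the un-split total. With year rows ending on 31 December, datediff(day, ...) on the clipped ranges drops one day at each year boundary: for 2019-10-10 to 2020-10-10 the 2019 slice comes out as 82 days, not 83. A small Python sketch of the same clipping with an exclusive upper bound (1 January of the following year) avoids that. The function name days_per_year is hypothetical and the approach is just one way to do it.

```python
from datetime import date


def days_per_year(start: date, end: date) -> dict:
    """Split the half-open range [start, end) into day counts keyed by year."""
    out = {}
    for year in range(start.year, end.year + 1):
        lo = max(start, date(year, 1, 1))
        hi = min(end, date(year + 1, 1, 1))  # exclusive bound: 1 Jan of next year
        if hi > lo:
            out[year] = (hi - lo).days
    return out


# First plans_fact row from the question: user 1, bundle 101.
split = days_per_year(date(2019, 10, 10), date(2020, 10, 10))
total = (date(2020, 10, 10) - date(2019, 10, 10)).days
print(split, total)
```

In the SQL, the equivalent change would be to give each year row an end date of the next 1 January and compare with a strict less-than on that side of the join.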
Now, if this is a common need, you should probably create a base table for your 'year' table. If it's not common, but you don't want to keep coming back to hard-code the year information into the y table for this report, you can do this:
declare @yearTable table (
    year int,
    startDt char(10),
    endDt char(10)
);
with y as (
    select year = year(min(pf.bundle_start_date))
    from plans_fact pf
    union all
    select year + 1
    from y
    where year < year(getdate())
)
insert @yearTable
select year,
    startDt = convert(char(4), year) + '-01-01',
    endDt = convert(char(4), year) + '-12-31'
from y;
and it will create the appropriate years for you. But you can see why creating a base table may be preferred if you have this or a similar need often.
I have three tables in my database: Sales, SalesPeople and Appliances.
Sales
SaleDate EmployeeID AppID Qty
---------- ---------- ----- -----------
2010-01-01 1412 150 1
2010-01-05 3231 110 1
2010-01-03 2920 110 2
2010-01-13 1412 100 1
2010-01-25 1235 150 2
2010-01-22 1235 100 2
2010-01-12 2920 150 3
2010-01-14 3231 100 1
2010-01-15 1235 300 1
2010-01-03 2920 200 2
2010-01-31 2920 310 1
2010-01-05 1412 420 1
2010-01-15 3231 400 2
SalesPeople
EmployeeID EmployeeName CommRate BaseSalary SupervisorID
---------- ------------------------------ ----------- ----------- ------------
1235 Linda Smith 15 1200 1412
1412 Anne Green 12 1800 NULL
2920 Charles Brown 10 1150 1412
3231 Harry Purple 18 1700 1412
Appliances
ID AppType StoreID Cost Price
---- -------------------- ------- ------------- -------------
100 Refrigerator 22 150 250
110 Refrigerator 20 175 300
150 Television 27 225 340
200 Microwave Oven 22 120 180
300 Washer 27 200 325
310 Washer 22 280 400
400 Dryer 20 150 220
420 Dryer 22 240 360
How can I obtain this result? It displays the profitability of each of the salespeople, ordered from the most profitable to the least. Gross is simply the sum of the quantity of items sold multiplied by the price. Commission is the salesperson's CommRate (as a percentage) applied to the gross minus the cost of those items, i.e. to qty*(price-cost). Net profit is the total profit minus commission.
Name Gross Commission Net Profit
------------- ----- ---------- ---------
Charles Brown 2380 83.5 751.5
Linda Smith 1505 83.25 471.75
Harry Purple 990 65.7 299.3
Anne Green 950 40.2 294.8
My attempt:
CREATE PROC Profitability AS
SELECT
sp.EmployeeName, (sum(s.Qty) * a.Price) as [Gross],
[Gross] - a.Cost, as [Commision],
SOMETHING as [Net Profit]
FROM
Salespeople sp, Appliances a, Sales s
WHERE
s.AppID = a.ID
AND sp.EmployeeID = s.EmployeeID
GROUP BY
sp.EmployeeName
GO
EXEC Profitability
Simple rule: Never use commas in the FROM clause. Always use explicit JOIN syntax.
In addition to fixing the JOIN syntax, your query needs a few other enhancements for the aggregation functions:
SELECT sp.EmployeeName,
       SUM(s.Qty * a.Price) AS Gross,
       SUM(s.Qty * (a.Price - a.Cost)) * sp.CommRate / 100.0 AS Commission,
       SUM(s.Qty * (a.Price - a.Cost)) * (1 - sp.CommRate / 100.0) AS NetProfit
FROM Sales s
JOIN Salespeople sp
    ON sp.EmployeeID = s.EmployeeID
JOIN Appliances a
    ON s.AppID = a.ID
GROUP BY sp.EmployeeName, sp.CommRate
ORDER BY NetProfit DESC;
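As a sanity check, here is a sketch that loads the question's sample data into an in-memory SQLite database through Python's sqlite3 module and runs the same aggregation. Only the columns the query needs are created, and ROUND(..., 2) is my addition to keep floating-point noise out of the printed figures.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Sales (SaleDate TEXT, EmployeeID INTEGER, AppID INTEGER, Qty INTEGER);
    CREATE TABLE SalesPeople (EmployeeID INTEGER, EmployeeName TEXT, CommRate INTEGER);
    CREATE TABLE Appliances (ID INTEGER, Cost INTEGER, Price INTEGER);
    """
)
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("2010-01-01", 1412, 150, 1), ("2010-01-05", 3231, 110, 1),
    ("2010-01-03", 2920, 110, 2), ("2010-01-13", 1412, 100, 1),
    ("2010-01-25", 1235, 150, 2), ("2010-01-22", 1235, 100, 2),
    ("2010-01-12", 2920, 150, 3), ("2010-01-14", 3231, 100, 1),
    ("2010-01-15", 1235, 300, 1), ("2010-01-03", 2920, 200, 2),
    ("2010-01-31", 2920, 310, 1), ("2010-01-05", 1412, 420, 1),
    ("2010-01-15", 3231, 400, 2),
])
conn.executemany("INSERT INTO SalesPeople VALUES (?, ?, ?)", [
    (1235, "Linda Smith", 15), (1412, "Anne Green", 12),
    (2920, "Charles Brown", 10), (3231, "Harry Purple", 18),
])
conn.executemany("INSERT INTO Appliances VALUES (?, ?, ?)", [
    (100, 150, 250), (110, 175, 300), (150, 225, 340), (200, 120, 180),
    (300, 200, 325), (310, 280, 400), (400, 150, 220), (420, 240, 360),
])

# Aggregate per salesperson; CommRate is in the GROUP BY because it is used
# outside the SUM.
report = conn.execute(
    """
    SELECT sp.EmployeeName,
           SUM(s.Qty * a.Price) AS Gross,
           ROUND(SUM(s.Qty * (a.Price - a.Cost)) * sp.CommRate / 100.0, 2) AS Commission,
           ROUND(SUM(s.Qty * (a.Price - a.Cost)) * (1 - sp.CommRate / 100.0), 2) AS NetProfit
    FROM Sales s
    JOIN SalesPeople sp ON sp.EmployeeID = s.EmployeeID
    JOIN Appliances a ON s.AppID = a.ID
    GROUP BY sp.EmployeeName, sp.CommRate
    ORDER BY NetProfit DESC
    """
).fetchall()
for row in report:
    print(row)
```

The four rows come back in the order Charles Brown, Linda Smith, Harry Purple, Anne Green, with the same figures as the expected result in the question.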
Sorry, I posted a similar question earlier, but I was not that clear. I have a table with the fields Customer, ID_Date, Pstng_Date, SumOfAmount, Days_BetweenMax and Days_BetweenMin.
What I want is a query that shows the date difference between the Pstng_Date and the ID_Date where the Pstng_Date is the max value for that customer, and another column that shows the same calculation where the Pstng_Date is the minimum value for that customer. Customers with only one Pstng_Date should display zero.
So the Query should display the results like this:
Customer ID_Date    Pstng_Date SumOfAmount Days_BetweenMAX days_betweenMIN
-------- ---------- ---------- ----------- --------------- ---------------
Holmes   31/01/2014 10/01/2014 $21,545.59  0               0
James    31/01/2014 10/01/2014 -$21,197.89 0               21
James    31/01/2014 5/01/2014  -$7,823.14  0               0
James    31/01/2014 24/01/2014 $308.00     7               0
Rod      31/01/2014 17/01/2014 -$2,603.95  0               0
Lisa     31/01/2014 17/01/2014 $22,019.49  0               0
Assuming that your existing table is called [Postings], you could create a query to calculate the MIN() and MAX() values of [Pstng_Date]
SELECT
Customer,
MIN(Pstng_Date) AS MinOfPstng_Date,
MAX(Pstng_Date) AS MaxOfPstng_Date
FROM Postings
GROUP BY Customer
returning
Customer MinOfPstng_Date MaxOfPstng_Date
-------- --------------- ---------------
Holmes 2014-01-10 2014-01-10
James 2014-01-05 2014-01-24
Lisa 2014-01-17 2014-01-17
Rod 2014-01-17 2014-01-17
Then you could use that as a subquery in the query to calculate the date differences
SELECT
p.Customer,
p.ID_Date,
p.Pstng_Date,
p.SumOfAmount,
IIf(q.MaxOfPstng_Date=q.MinOfPstng_Date,0,IIf(p.Pstng_Date=q.MaxOfPstng_Date,DateDiff("d",p.Pstng_Date,p.ID_Date),0)) AS Days_BetweenMAX,
IIf(q.MaxOfPstng_Date=q.MinOfPstng_Date,0,IIf(p.Pstng_Date=q.MinOfPstng_Date,DateDiff("d",p.Pstng_Date,p.ID_Date),0)) AS Days_BetweenMIN
FROM
Postings AS p
INNER JOIN
(
SELECT
Customer,
MIN(Pstng_Date) AS MinOfPstng_Date,
MAX(Pstng_Date) AS MaxOfPstng_Date
FROM Postings
GROUP BY Customer
) AS q
ON p.Customer = q.Customer
returning
Customer ID_Date Pstng_Date SumOfAmount Days_BetweenMAX Days_BetweenMIN
-------- ---------- ---------- ----------- --------------- ---------------
Holmes 2014-01-31 2014-01-10 21545.59 0 0
James 2014-01-31 2014-01-10 -21197.89 0 0
James 2014-01-31 2014-01-05 -7823.14 0 26
James 2014-01-31 2014-01-24 308.00 7 0
Rod 2014-01-31 2014-01-17 -2603.95 0 0
Lisa 2014-01-31 2014-01-17 22019.49 0 0
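The IIf and DateDiff("d", ...) calls above are Access syntax. For anyone wanting to try the same min/max join elsewhere, here is a sketch of the logic using CASE and julianday() in SQLite, driven from Python's sqlite3 module. The ISO date storage and the shortened aliases MinD and MaxD are my own choices, and other engines will need their own date-difference function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Postings (Customer TEXT, ID_Date TEXT, Pstng_Date TEXT, SumOfAmount REAL)"
)
conn.executemany("INSERT INTO Postings VALUES (?, ?, ?, ?)", [
    ("Holmes", "2014-01-31", "2014-01-10", 21545.59),
    ("James", "2014-01-31", "2014-01-10", -21197.89),
    ("James", "2014-01-31", "2014-01-05", -7823.14),
    ("James", "2014-01-31", "2014-01-24", 308.00),
    ("Rod", "2014-01-31", "2014-01-17", -2603.95),
    ("Lisa", "2014-01-31", "2014-01-17", 22019.49),
])

# A day difference is reported only on the row holding the customer's max
# (or min) posting date, and only when the customer has more than one date.
diffs = conn.execute(
    """
    SELECT p.Customer, p.Pstng_Date,
           CASE WHEN q.MaxD <> q.MinD AND p.Pstng_Date = q.MaxD
                THEN CAST(julianday(p.ID_Date) - julianday(p.Pstng_Date) AS INTEGER)
                ELSE 0 END AS Days_BetweenMAX,
           CASE WHEN q.MaxD <> q.MinD AND p.Pstng_Date = q.MinD
                THEN CAST(julianday(p.ID_Date) - julianday(p.Pstng_Date) AS INTEGER)
                ELSE 0 END AS Days_BetweenMIN
    FROM Postings AS p
    JOIN (SELECT Customer,
                 MIN(Pstng_Date) AS MinD,
                 MAX(Pstng_Date) AS MaxD
          FROM Postings
          GROUP BY Customer) AS q
      ON p.Customer = q.Customer
    ORDER BY p.rowid
    """
).fetchall()
for row in diffs:
    print(row)
```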