T-SQL calculate the percent increase or decrease between the earliest and latest for each project - sql

I have a table like below, I am trying to run a query in T-SQL to get the earliest and latest costs for each project_id according to the date column and calculate the percent cost increase or decrease and return the data-set show in the second table (I have simplified the table in this question).
project_id date cost
-------------------------------
123 7/1/17 5000
123 8/1/17 6000
123 9/1/17 7000
123 10/1/17 8000
123 11/1/17 9000
456 7/1/17 10000
456 8/1/17 9000
456 9/1/17 8000
876 1/1/17 8000
876 6/1/17 5000
876 8/1/17 10000
876 11/1/17 8000
Result:
(Edit: Fixed the result)
project_id "cost incr/decr pct"
------------------------------------------------
123 80% which is (9000-5000)/5000
456 -20%
876 0%
Whatever query I run I get duplicates.
This is what I tried:
select distinct
p1.Proj_ID, p1.date, p2.[cost], p3.cost,
(nullif(p2.cost, 0) / nullif(p1.cost, 0)) * 100 as 'OVER UNDER'
from
[PROJECT] p1
inner join
(select
[Proj_ID], [cost], min([date]) min_date
from
[PROJECT]
group by
[Proj_ID], [cost]) p2 on p1.Proj_ID = p2.Proj_ID
inner join
(select
[Proj_ID], [cost], max([date]) max_date
from
[PROJECT]
group by
[Proj_ID], [cost]) p3 on p1.Proj_ID = p3.Proj_ID
where
p1.date in (p2.min_date, p3.max_date)

Unfortunately, SQL Server does not have a first_value() aggregation function. It does have an analytic function, though. So, you can do:
select distinct project_id,
first_value(cost) over (partition by project_id order by date asc) as first_cost,
first_value(cost) over (partition by project_id order by date desc) as last_cost,
(first_value(cost) over (partition by project_id order by date desc) /
first_value(cost) over (partition by project_id order by date asc)
) - 1 as ratio
from project;
If cost is an integer, you may need to convert to a representation with decimal places.

You can use row_number and OUTER APPLY over top 1 ... prior to SQL 2012
select
min_.projectid,
latest_.cost - min_.cost [Calculation]
from
(select
row_number() over (partition by projectid order by date) rn
,projectid
,cost
from projectable) min_ -- get the first dates per project
outer apply (
select
top 1
cost
from projectable
where
projectid = min_.projectid -- get the latest cost for each project
order by date desc
) latest_
where min_.rn = 1

This might perform a little better
;with costs as (
select *,
ROW_NUMBER() over (PARTITION BY project_id ORDER BY date) mincost,
ROW_NUMBER() over (PARTITION BY project_id ORDER BY date desc) maxcost
from table1
)
select project_id,
min(case when mincost = 1 then cost end) as cost1,
max(case when maxcost = 1 then cost end) as cost2,
(max(case when maxcost = 1 then cost end) - min(case when mincost = 1 then cost end)) * 100 / min(case when mincost = 1 then cost end) as [OVER UNDER]
from costs a
group by project_id

Related

How to do a Min and Max of date but following the changes in price points

I'm not really sure how to word this question better so I'll provide the data that I have and the result that I'm after.
This is the data that I have
sku sales qty date
A 100 1 1-Jan-19
A 200 2 2-Jan-19
A 100 1 3-Jan-19
A 240 2 4-Jan-19
A 360 3 5-Jan-19
A 360 4 6-Jan-19
A 200 2 7-Jan-19
A 90 1 8-Jan-19
B 100 1 9-Jan-19
B 200 2 10-Jan-19
And this is the result that I'm after
sku price sum(qty) sum(sales) min(date) max(date)
A 100 4 400 1-Jan-19 3-Jan-19
A 120 5 600 4-Jan-19 5-Jan-19
A 90 4 360 6-Jan-19 6-Jan-19
A 100 2 200 7-Jan-19 7-Jan-19
A 90 1 90 8-Jan-19 8-Jan-19
B 100 3 300 9-Jan-19 10-Jan-19
As you can see, I'm trying to get the min and max date of each price point, where price = sales/qty. At this point, I can get the min and max date of the same price but I can separate it when there's another price in between. I think I have to use some sort of min(date) over (partition by sales/qty order by date) but I can't figure it out yet.
I'm using Redshift SQL
This is a gaps-and-islands query. You can do this by generating a sequence and subtracting that from the date. Then aggregate:
select sku, price, sum(qty), sum(sales),
min(date), max(date)
from (select t.*,
row_number() over (partition by sku, price order by date) as seqnum
from t
) t
group by sku, price, (date - seqnum * interval '1 day')
order by sku, price, min(date);
You can do with Sub Query and LAG
FIDDLE DEMO
SELECT SKU, Price, SUM(Qty) SumQty, SUM(Sales) SumSales, MIN(date) MinDate, MAX(date) MaxDate
FROM (
SELECT SKU,Price,SUM(is_change) OVER(order by SKU, date) is_change,Sales, Qty,date
FROM (SELECT SKU, Sales/Qty AS Price, Sales, Qty,date,
CASE WHEN Sales/Qty = lag(Sales/Qty) over (order by SKU, date)
and SKU = lag(SKU) OVER (order by SKU, date) then 0 ELSE 1 END AS is_change
FROM Tbl
)InnerSelect
) X GROUP BY sku, price,is_change
ORDER BY SKU,MIN(date)
Output

SQL - How to count number of distinct values (payments), after sum of rows where they have another column value (Due Date) in common

My 'deals_payments' table is:
Due Date Payment ID
1-Mar-19 1,000.00 123
1-Apr-19 1,000.00 123
1-May-19 1,000.00 123
1-Jun-19 1,000.00 123
1-Jul-19 1,000.00 123
1-Aug-19 1,000.00 123
1-Jun-19 500.00 456
1-Jul-19 500.00 456
1-Aug-19 500.00 456
I have the SQL code:
select
count(*), payment
from (select deals_payments.*,
(row_number() over (order by due_date) -
row_number() over (partition by payment order by due_date)
) as grp
from deals_payments
where id = 123
) deals_payments
group by grp, payment
order by grp
which gives me what I want - the number of payments on each distinct amount - (here I only asked for ID 123):
COUNT(*) PAYMENT
6 1000.00
But now I need the sum of payments of the two ID's (123 and 456), where the due dates are the same, and count the number of payments on each distinct amount, as:
COUNT(*) PAYMENT
3 1000.00
3 1500.00
I tried the below but it gives me the 'missing right parenthesis' error. What is wrong??
select
count(*),
(select
sum(total) total
from (select distinct
due_date,
(select
sum(payment)
from deals_payments
where (due_date = a.due_date)) as total
from deals_payments a
where a.id in (123, 456)
and payment > 0)
group by due_date
order by due_date) b
from (select deals_payments.*,
(row_number() over (order by due_date) -
row_number() over (partition by payment order by due_date)
) as grp
from deals_payments
where id = 123
) deals_payments
group by grp, payment
order by grp
Taking your earlier comments into consideration, I agree that the SQL can be simplified to get the intended result. My understanding is that the expected output is the frequency of the total payment of a subset of IDs on any given date.
select count(*) as PaymentFrequency, TotalPaidOnDueDate from
(
select due_date, sum(payment) as TotalPaidOnDueDate from #deals_payments
where ID in (123, 456)
group by due_date
) a
group by a.TotalPaidOnDueDate
Here is a sql fiddle I used to verify: http://sqlfiddle.com/#!18/6b04f/1
This seems really strange. I don't understand why your logic is so complicated.
How about this?
select id, count(*), max(payment)
from (select dp.*,
count(*) over (partition by due_date) as cnt
from deal_payments dp
where dp.id in (123, 456)
) dp
where cnt = 2
group by id;
An interesting question. Could this do the trick???
select payment, count(*)
from deals_payments
where due_date in
(select due_date
from deals_payments
group by due_date
having count(*) > 1)
group by payment;
You can add a filter by id if you want, of course.

Additional condition withing partition over

https://www.db-fiddle.com/f/rgLXTu3VysD3kRwBAQK3a4/3
My problem here is that I want function partition over to start counting the rows only from certain time range.
In this example, if I would add rn = 1 at the end, order_id = 5 would be excluded from the results (because partition is ordering by paid_date and there's order_id = 6 with earlier date) but it shouldn't be as I want that time range for partition starts from '2019-01-10'.
Adding condition rn = 1expected output should be order_id 3,5,11,15, now its only 3,11,15
it should include only orders with is_paid = 0 that are the first one within given time range (if there's preceeding order with is_paid = 1 it shouldn't be counted)
use correlated subquery with not exists
DEMO
SELECT order_id, customer_id, amount, is_paid, paid_date, rn FROM (
SELECT o.*,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY paid_date,order_id) rn
FROM orders o
WHERE paid_date between '2019-01-10'
and '2019-01-15'
) x where rn=1 and not exists (select 1 from orders o1 where x.order_id=o1.order_id
and is_paid=1)
OUTPUT:
order_id customer_id amount is_paid paid_date rn
3 101 30 0 10/01/2019 00:00:00 1
5 102 15 0 10/01/2019 00:00:00 1
11 104 31 0 10/01/2019 00:00:00 1
15 105 11 0 10/01/2019 00:00:00 1
If priority should be given to order_id then put that before paid date in the partition function order by clause, this will solve your issue.
SELECT order_id, customer_id, amount, is_paid, paid_date, rn FROM (
SELECT o.*,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY order_id,paid_date) rn
FROM orders o
) x WHERE is_paid = 0 and paid_date between
'2019-01-10' and '2019-01-15' and rn=1
Since you need the paid date to be ordered first you need to imply a where condition in the partitioning table in order to avoid unnecessary dates interrupting the partition function.
SELECT order_id, customer_id, amount, is_paid, paid_date, rn FROM (
SELECT o.*,
ROW_NUMBER() OVER(PARTITION BY customer_id ORDER BY paid_date, order_id) rn
FROM orders o
where paid_date between '2019-01-10' and '2019-01-15'
) x WHERE is_paid = 0 and rn=1

SQL - Same Table join to calculate profit from last entry

I have a table of transactions for various products. I want to calculate the profit made on each
Product Date Profit Incremental Profit
--------------------- --------------------------- -----------
Apple 2016-05-21 100
Banana 2016-05-21 60
Apple 2016-06-15 30
Apple 2016-08-20 10
Banana 2016-08-20 5
Can I create a SQL query that can group based on product and give me incremental profit on every date for each product. For example on 21-05-2015 since it is first date so incremental profit will be 0. But on 15-06-2016 it will be -70 (30-100).
The expected output is:
Product Date Profit Incremental Profit
--------------------- --------------------------- -----------
Apple 2016-05-21 100 0
Banana 2016-05-21 60 0
Apple 2016-06-15 30 -70
Apple 2016-08-20 10 -20
Banana 2016-08-20 5 -55
maybe u can use this.
select
a.product
,a.date
,a.profit
,isnull(a.profit - (select top 1 x.profit from profit x where x.product = a.product and x.date < a.date),0) as profit
from PROFIT a
order by product, date
Try this
DECLARE #Tbl TABLE (Product NVARCHAR(50), Date_ DATETIME, Profit INT)
INSERT INTO #Tbl
VALUES
('Apple' , '2016-05-21', 100),
('Banana', '2016-05-21', 60 ),
('Apple', '2016-06-15', 30 ),
('Apple', '2016-08-20', 10 ),
('Banana', '2016-08-20', 5 )
;WITH CTE
AS
(
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY Product ORDER BY Date_) RowId
FROM #Tbl
)
SELECT
CurrentRow.Product ,
CurrentRow.Date_ ,
CurrentRow.Profit ,
CurrentRow.Profit - ISNULL(PrevRow.Profit, CurrentRow.Profit) 'Incremental Profit'
FROM
CTE CurrentRow LEFT JOIN
(SELECT CTE.Product ,CTE.Profit, CTE.RowId + 1 RowId FROM CTE) PrevRow ON CurrentRow.Product = PrevRow.product AND
CurrentRow.RowId = PrevRow.RowId
ORDER BY CurrentRow.Date_
Result:
Product Date_ Profit Incremental Profit
Apple 2016-05-21 100 0
Banana 2016-05-21 60 0
Apple 2016-06-15 30 -70
Apple 2016-08-20 10 -20
Banana 2016-08-20 5 -55
Edit:
UPDATE #Tbl
SET [Incremental Profit] = A.[Incremental Profit]
FROM
(
SELECT
CurrentRow.Product ,
CurrentRow.Date_ ,
CurrentRow.Profit ,
CurrentRow.Profit - ISNULL(PrevRow.Profit, CurrentRow.Profit) 'Incremental Profit'
FROM
(SELECT *, ROW_NUMBER() OVER (PARTITION BY Product ORDER BY Date_) RowId FROM #Tbl) CurrentRow LEFT JOIN
(SELECT *, ROW_NUMBER() OVER (PARTITION BY Product ORDER BY Date_) + 1 RowId FROM #Tbl) PrevRow ON CurrentRow.Product = PrevRow.Product AND
CurrentRow.RowId = PrevRow.RowId
) A
WHERE
[#Tbl].Product = A.Product AND
[#Tbl].Date_ = A.Date_

Finding the interval between dates in SQL Server

I have a table including more than 5 million rows of sales transactions. I would like to find sum of date intervals between each customer three recent purchases.
Suppose my table looks like this :
CustomerID ProductID ServiceStartDate ServiceExpiryDate
A X1 2010-01-01 2010-06-01
A X2 2010-08-12 2010-12-30
B X4 2011-10-01 2012-01-15
B X3 2012-04-01 2012-06-01
B X7 2012-08-01 2013-10-01
A X5 2013-01-01 2015-06-01
The Result that I'm looking for may looks like this :
CustomerID IntervalDays
A 802
B 135
I know the query need to first retrieve 3 resent transactions of each customer (based on ServiceStartDate) and then calculate the interval between startDate and ExpiryDate of his/her transactions.
You want to calculate the difference between the previous row's ServiceExpiryDate and the current row's ServiceStartDate based on descending dates and then sum up the last two differences:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc
, ServiceExpiryDate desc -- don't know if this 2nd column is necessary
) as rn
from tab
)
select t2.customerId,
sum(datediff(day, prevEnd, ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte as t2 left join cte as t1
on t1.customerId = t2.customerId
and t1.rn = t2.rn+1 -- previous and current row
where t2.rn <= 3 -- last three rows
group by t2.customerId;
Same result using LEAD:
with cte as
(
select tab.*,
row_number()
over (partition by customerId
order by ServiceStartDate desc) as rn
,lead(ServiceExpiryDate)
over (partition by customerId
order by ServiceStartDate desc
) as prevEnd
from tab
)
select customerId,
sum(datediff(day, prevEnd, ServiceStartDate)) as Intervaldays
,count(*) as purchases
from cte
where rn <= 3
group by customerId;
Both will not return the expected result unless you subtract purchases (or max(rn)) from Intervaldays. But as you only sum two differences this seems to be not correct for me either...
Additional logic must be applied based on your rules regarding:
customer has less than 3 purchases
overlapping intervals
Assuming there are no overlaps, I think you want this:
select customerId,
sum(datediff(day, ServiceStartDate, ServieEndDate) as Intervaldays
from (select t.*, row_number() over (partition by customerId
order by ServiceStartDate desc) as seqnum
from table t
) t
where seqnum <= 3
group by customerId;
Try this:
SELECT dt.CustomerID,
SUM(DATEDIFF(DAY, dt.PrevExpiry, dt.ServiceStartDate)) As IntervalDays
FROM (
SELECT *
, ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY ServiceStartDate DESC) AS rn
, (SELECT Max(ti.ServiceExpiryDate)
FROM yourTable ti
WHERE t.CustomerID = ti.CustomerID
AND ti.ServiceStartDate < t.ServiceStartDate) As PrevExpiry
FROM yourTable t )dt
GROUP BY dt.CustomerID
Result will be:
CustomerId | IntervalDays
-----------+--------------
A | 805
B | 138