Use last value with operations in SQL Server - sql

Let's assume I have a table in SQL Server called Budget_Spend like this
I know that with a proper GROUP BY, SUM and ORDER BY I can reach the next table (that part is pretty obvious)
However, I don't know how to replicate the "Available" column, which is constructed with the following logic:
For the first month, it's Budget - Spend - Taxes
For the following months, it's computed as PREVIOUS(Available) - CURRENT(Spend) - CURRENT(Taxes)
I've tried to use the LAG function without success (most of my attempts didn't run due to syntax problems).
Any idea how to do this? I imagine I need LAG and maybe a CASE in order to get the first value.
This is the DDL for creating the table
/* CREATE TABLE */
CREATE TABLE Budget_Spend(
Month FLOAT, /* SQL Server has no DOUBLE type; FLOAT is the double-precision type */
Budget FLOAT,
Spend FLOAT,
Taxes FLOAT);
/* INSERT */
INSERT INTO Budget_Spend(Month, Budget, Spend, Taxes) VALUES
(1, 1000, 75, 11.25);
INSERT INTO Budget_Spend(Month, Budget, Spend, Taxes) VALUES
(1, 1000, 25, 3.75);
INSERT INTO Budget_Spend(Month, Budget, Spend, Taxes) VALUES
(2, 1000, 200, 30);
INSERT INTO Budget_Spend(Month, Budget, Spend, Taxes) VALUES
(3, 1000, 150, 22.5);
INSERT INTO Budget_Spend(Month, Budget, Spend, Taxes) VALUES
(4, 1000, 10, 1.5);
INSERT INTO Budget_Spend(Month, Budget, Spend, Taxes) VALUES
(4, 1000, 10, 1.5);

You need a window function:
select bs.*,
Budget - sum(Spend + Taxes) over (order by month) as Available
from (select month, Budget, sum(Spend) as Spend, sum(Taxes) as Taxes
from Budget_Spend bs
group by month, Budget
) bs;
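To check the running-total idea against the sample rows, here is a minimal sketch that runs the same query shape in SQLite through Python's sqlite3 module. SQLite is only a stand-in for SQL Server here (an assumption made so the example is self-contained; it needs SQLite 3.25 or newer for window functions), and the REAL column type replaces the original DDL's type.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; the query shape is the same.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Budget_Spend (Month REAL, Budget REAL, Spend REAL, Taxes REAL)"
)
conn.executemany(
    "INSERT INTO Budget_Spend VALUES (?, ?, ?, ?)",
    [
        (1, 1000, 75, 11.25),
        (1, 1000, 25, 3.75),
        (2, 1000, 200, 30),
        (3, 1000, 150, 22.5),
        (4, 1000, 10, 1.5),
        (4, 1000, 10, 1.5),
    ],
)

# Aggregate per month first, then subtract the running total of
# Spend + Taxes from the budget.
query = """
SELECT bs.Month,
       bs.Budget - SUM(bs.Spend + bs.Taxes) OVER (ORDER BY bs.Month) AS Available
FROM (SELECT Month, Budget, SUM(Spend) AS Spend, SUM(Taxes) AS Taxes
      FROM Budget_Spend
      GROUP BY Month, Budget) AS bs
ORDER BY bs.Month
"""
available = conn.execute(query).fetchall()
for month, value in available:
    print(month, value)
```

Because a running SUM already carries the previous months forward, no LAG or CASE is needed: month 1 is Budget minus its own Spend and Taxes, and each later month subtracts everything spent so far.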

Related

Get userwise balance and first transaction date of users in SQL

I have created a Transaction table with columns card_id, amount, created_at. There may be more than 1 row of one user so I want to return the value card_id, sum(amount), first created_at date of all users.
CREATE TABLE Transactions(card_id int, amount money, created_at date)
INSERT INTO Transactions(card_id, amount, created_at)
SELECT 1, 500, '2016-01-01' union all
SELECT 1, 100, '2016-01-01' union all
SELECT 1, 100, '2016-01-01' union all
SELECT 1, 200, '2016-01-02' union all
SELECT 1, 300, '2016-01-03' union all
SELECT 2, 100, '2016-01-04' union all
SELECT 2, 200, '2016-01-05' union all
SELECT 3, 700, '2016-01-06' union all
SELECT 1, 100, '2016-01-07' union all
SELECT 2, 100, '2016-01-07' union all
SELECT 3, 100, '2016-01-07'
I have created a function for that, but one of my clients says I need a query, not a function. Can anyone suggest what query to use?
CREATE FUNCTION [dbo].[card_id_data]()
RETURNS @t TABLE
(
card_id text,
amount money,
dateOfFirstTransaction date
)
AS
BEGIN
INSERT INTO @t(card_id)
SELECT DISTINCT(card_id) FROM Transactions;
UPDATE @t
SET dateOfFirstTransaction = b.createdat
FROM
(SELECT DISTINCT(card_id) cardid,
MIN(created_at) createdat
FROM Transactions
WHERE amount < 0
GROUP BY card_id) b
WHERE card_id = b.cardid;
UPDATE @t
SET amount = T.AMOUNT
FROM
(SELECT
card_id AS cardid, SUM(MIN(AMOUNT)) AMOUNT, created_at
FROM Transactions
WHERE amount < 0
GROUP BY card_id, created_at) T
WHERE card_id = cardid
AND dateOfFirstTransaction = created_at;
RETURN
END
I want a result as shown in this screenshot:
You can use DENSE_RANK for this. It will number the rows, taking into account tied places (same dates)
SELECT
t.card_id,
SumAmount = SUM(amount),
FirstDate = MIN(t.created_at)
FROM (
SELECT *,
rn = DENSE_RANK() OVER (PARTITION BY t.card_id ORDER BY t.created_at)
FROM dbo.Transactions t
) t
WHERE t.rn = 1
GROUP BY t.card_id;
If the dates are actually dates and times, and you want to sum the whole day, change t.created_at to CAST(t.created_at AS date)
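As a quick check of the DENSE_RANK approach on the question's sample rows, the sketch below runs an equivalent query in SQLite via Python's sqlite3 module. SQLite is an assumption used only to keep the example self-contained (window functions need SQLite 3.25 or newer); the column aliases are written in standard `AS` form rather than T-SQL's `alias = expr`.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server for this check.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Transactions (card_id INTEGER, amount NUMERIC, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO Transactions VALUES (?, ?, ?)",
    [
        (1, 500, "2016-01-01"), (1, 100, "2016-01-01"), (1, 100, "2016-01-01"),
        (1, 200, "2016-01-02"), (1, 300, "2016-01-03"), (2, 100, "2016-01-04"),
        (2, 200, "2016-01-05"), (3, 700, "2016-01-06"), (1, 100, "2016-01-07"),
        (2, 100, "2016-01-07"), (3, 100, "2016-01-07"),
    ],
)

# rn = 1 marks every row that falls on a card's earliest date
# (ties on that date all share rank 1), so summing them gives the
# first-day total per card.
query = """
SELECT t.card_id,
       SUM(t.amount) AS SumAmount,
       MIN(t.created_at) AS FirstDate
FROM (SELECT card_id, amount, created_at,
             DENSE_RANK() OVER (PARTITION BY card_id ORDER BY created_at) AS rn
      FROM Transactions) AS t
WHERE t.rn = 1
GROUP BY t.card_id
ORDER BY t.card_id
"""
first_day = conn.execute(query).fetchall()
for row in first_day:
    print(row)
```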
Try this:
/*
CREATE TABLE dbo.Transactions
(
card_id INT,
amount MONEY,
created_at DATE
);
INSERT INTO dbo.Transactions (card_id, amount, created_at)
VALUES (1, 500, '2016-01-01'),
(1, 100, '2016-01-01'),
(1, 100, '2016-01-01'),
(1, 200, '2016-01-02'),
(1, 300, '2016-01-03'),
(2, 100, '2016-01-04'),
(2, 200, '2016-01-05'),
(3, 700, '2016-01-06'),
(1, 100, '2016-01-07'),
(2, 100, '2016-01-07'),
(3, 100, '2016-01-07');
*/
WITH FirstDatePerCard AS
(
SELECT
card_id,
FirstDate = MIN(created_at)
FROM
dbo.Transactions
GROUP BY
card_id
)
SELECT DISTINCT
t.card_id,
SumAmount = SUM(amount) OVER (PARTITION BY t.card_id),
FirstDate = f.FirstDate
FROM
FirstDatePerCard f
INNER JOIN
dbo.Transactions t ON f.card_id = t.card_id AND f.FirstDate = t.created_at
You'll get an output something like this:
card_id SumAmount FirstDate
--------------------------------
1 700.00 2016-01-01
2 100.00 2016-01-04
3 700.00 2016-01-06
Is that what you're looking for??
UPDATE: OK, so you want to sum the amount only for the first_date, for every card_id - is that correct? (wasn't clear from the original question)
Updated my solution accordingly

SQL Server - How can I query to return products only when their sales exceeds a certain percentage?

The basic requirement is this: We capture sales by day of week and product. If more than half* of the day's sales came from one product, we want to capture that. Else we show "none".
So imagine we sell shoes, pants and shirts. On Monday, we sold $100 of each. So it was a three-way split, and each category accounted for 33.3% of sales. We show "none". On Tuesday though, half of our sales came from shoes, and on Wednesday, 80% from shirts. So we want to see that.
The query below returns the desired result, but I'm not a fan of queries within queries within queries. They can be inefficient and hard to read, and I feel like there's a cleaner way. Can this be improved upon?
*The requirement for half will be a parameter (@threshold here). In some cases, we might want to show a product only when it's 75% or more of sales. Obviously that parameter has to be >= 50%.
declare @sales as table (day_of_week varchar(16), product varchar(8), sales_amt int)
insert into @sales values ('monday', 'shoes', 100)
insert into @sales values ('monday', 'pants', 100)
insert into @sales values ('monday', 'shirts', 100)
insert into @sales values ('tuesday', 'shoes', 500)
insert into @sales values ('tuesday', 'pants', 300)
insert into @sales values ('tuesday', 'shirts', 200)
insert into @sales values ('wednesday', 'shoes', 100)
insert into @sales values ('wednesday', 'pants', 100)
insert into @sales values ('wednesday', 'shirts', 800)
declare @threshold as decimal(3,2) = 0.5
select day_of_week, case when pct_of_day >= @threshold then product else 'none' end half_of_sales from (
select day_of_week, product, pct_of_day, row_number() over (partition by day_of_week order by pct_of_day desc) _rn
from (
select day_of_week, product, sum(sales_amt) * 1.0 / sum(sum(sales_amt)) over (partition by day_of_week) pct_of_day
from @sales
group by day_of_week, product
) x
) z
where _rn = 1
Maybe a little easier to read?
DECLARE @threshold AS decimal(3, 2) = 0.5;
WITH ssum
AS (SELECT
day_of_week,
SUM(sales_amt) sa
FROM @sales
GROUP BY day_of_week)
SELECT
s.day_of_week,
MAX(CASE WHEN s.sales_amt * 1.0 / ssum.sa >= @threshold THEN s.product ELSE 'none' END) threshold
FROM ssum
INNER JOIN @sales AS s
ON ssum.day_of_week = s.day_of_week
GROUP BY s.day_of_week
Firstly, you can place the nested queries in CTEs, which can make them easier to read. It won't make them more efficient, but then nested queries are not necessarily inefficient in themselves; I'm not sure why you think so.
Second, the query can be simplified, because the row numbering is equally valid on the raw sum(sales_amt) value, so it can sit at the same level as the windowed sum.
declare @threshold as decimal(3,2) = 0.5;
with GroupedSales as (
select
day_of_week,
product,
sum(sales_amt) * 1.0 / sum(sum(sales_amt)) over (partition by day_of_week) pct_of_day,
row_number() over (partition by day_of_week order by sum(sales_amt) desc) _rn
from @sales
group by
day_of_week,
product
)
select
day_of_week,
case when pct_of_day >= @threshold
then product
else 'none'
end half_of_sales
from GroupedSales
where _rn = 1;
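To see how the threshold parameter changes the result, here is a small sketch that runs the same idea in SQLite through Python's sqlite3 module. It is not the exact query above: SQLite is an assumption standing in for SQL Server, the grouping and the windowing are split into two CTEs for clarity, and `dominant_products` is a hypothetical helper name.

```python
import sqlite3

def dominant_products(threshold):
    """Return (day_of_week, product or 'none') for the given share threshold."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (day_of_week TEXT, product TEXT, sales_amt INTEGER)"
    )
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [
            ("monday", "shoes", 100), ("monday", "pants", 100),
            ("monday", "shirts", 100), ("tuesday", "shoes", 500),
            ("tuesday", "pants", 300), ("tuesday", "shirts", 200),
            ("wednesday", "shoes", 100), ("wednesday", "pants", 100),
            ("wednesday", "shirts", 800),
        ],
    )
    # Step 1: total per day and product. Step 2: share of the day and rank.
    query = """
    WITH by_product AS (
        SELECT day_of_week, product, SUM(sales_amt) AS amt
        FROM sales
        GROUP BY day_of_week, product
    ),
    ranked AS (
        SELECT day_of_week, product,
               amt * 1.0 / SUM(amt) OVER (PARTITION BY day_of_week) AS pct_of_day,
               ROW_NUMBER() OVER (PARTITION BY day_of_week ORDER BY amt DESC) AS rn
        FROM by_product
    )
    SELECT day_of_week,
           CASE WHEN pct_of_day >= ? THEN product ELSE 'none' END AS half_of_sales
    FROM ranked
    WHERE rn = 1
    ORDER BY day_of_week
    """
    return conn.execute(query, (threshold,)).fetchall()

half = dominant_products(0.5)
three_quarters = dominant_products(0.75)
print(half)
print(three_quarters)
```

With a threshold of 0.5, Tuesday reports shoes (exactly half) and Wednesday reports shirts; raising it to 0.75 leaves only Wednesday's shirts.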

How do I make an aggregate on an integer with a grouped column, for which I only want some included?

I have a table prices holding all prices that some products have had:
CREATE TABLE prices (
id INT,
product_id INT, /*Foreign key*/
created_at TIMESTAMP,
price INT
);
The first entry for a product_id is its initial sales price. If the product is then reduced, a new entry is added.
I would like to find the mean and total price change per day across all products.
This is some sample data:
INSERT INTO prices (id, product_id, created_at, price) VALUES (1, 1, '2020-01-01', 11000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (2, 2, '2020-01-01', 3999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (3, 3, '2020-01-01', 9999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (4, 4, '2020-01-01', 2000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (5, 1, '2020-01-02', 9999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (6, 2, '2020-01-02', 2999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (7, 5, '2020-01-02', 2999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (8, 1, '2020-01-03', 8999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (9, 1, '2020-01-03 10:00:00', 7000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (10, 5, '2020-01-03', 4000);
INSERT INTO prices (id, product_id, created_at, price) VALUES (11, 6, '2020-01-03', 3999);
INSERT INTO prices (id, product_id, created_at, price) VALUES (12, 3, '2020-01-03', 6999);
The expected result should be:
date mean_price_change total_price_change
2020-01-01 0 0
2020-01-02 1000.5 2001
2020-01-03 1666 4998
Explanation:
The mean and total price reduction on '2020-01-01' were 0, as all products were new on that date.
On '2020-01-02', however, the mean price change was (11000-9999 + 3999-2999)/2 = 1000.5, as both product_id 1 and 2 were reduced to 9999 and 2999 on that day from their previous prices of 11000 and 3999, and their total reduction was (11000-9999 + 3999-2999) = 2001.
On '2020-01-03' only product_id 1, 3 and 5 were changed: 1 at two different times during the day, 9999 => 8999 => 7000 (the last one governs); 3 going from 9999 => 6999; and 5 going up from 2999 => 4000. This gives a total of (9999-7000 + 9999-6999 + 2999-4000) = 4998 and a mean price reduction on that day of 1666.
I have added the data here too: https://www.db-fiddle.com/f/tJgoKFMJxcyg5gLDZMEP77/1
I started to play around with some DISTINCT ON, but that does not seem to do it...
You seem to want lag() and aggregation:
select created_at, avg(prev_price - price), sum(prev_price - price)
from (select p.*, lag(price) over (partition by product_id order by created_at) as prev_price
from prices p
) p
group by created_at
order by created_at;
You have two prices for product 1 on 2020-01-03. Once I fix that, I get the same results as in your question. Here is the db<>fiddle.
EDIT:
To handle multiple prices per day:
select created_at::date as day, avg(prev_price - price), sum(prev_price - price)
from (select p.*, lag(price) over (partition by product_id order by created_at) as prev_price
from (select distinct on (product_id, created_at::date) p.*
from prices p
-- created_at desc keeps the last price of each day
order by product_id, created_at::date, created_at desc
) p
) p
group by created_at::date
order by created_at::date;
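To check the "last price of the day, then LAG" idea against the expected table in the question, here is a sketch in SQLite run through Python's sqlite3 module. SQLite is an assumption standing in for PostgreSQL: it has no DISTINCT ON, so ROW_NUMBER picks the last row per product and day instead, and COALESCE turns the all-new first day into 0 as the question expects.

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; ROW_NUMBER replaces DISTINCT ON.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE prices (id INTEGER, product_id INTEGER, created_at TEXT, price INTEGER)"
)
conn.executemany(
    "INSERT INTO prices VALUES (?, ?, ?, ?)",
    [
        (1, 1, "2020-01-01", 11000), (2, 2, "2020-01-01", 3999),
        (3, 3, "2020-01-01", 9999), (4, 4, "2020-01-01", 2000),
        (5, 1, "2020-01-02", 9999), (6, 2, "2020-01-02", 2999),
        (7, 5, "2020-01-02", 2999), (8, 1, "2020-01-03", 8999),
        (9, 1, "2020-01-03 10:00:00", 7000), (10, 5, "2020-01-03", 4000),
        (11, 6, "2020-01-03", 3999), (12, 3, "2020-01-03", 6999),
    ],
)

query = """
WITH last_per_day AS (
    -- rn = 1 is the latest price of each product on each day
    SELECT product_id, date(created_at) AS day, price,
           ROW_NUMBER() OVER (PARTITION BY product_id, date(created_at)
                              ORDER BY created_at DESC) AS rn
    FROM prices
),
with_prev AS (
    -- previous day's closing price for the same product (NULL if new)
    SELECT day, price,
           LAG(price) OVER (PARTITION BY product_id ORDER BY day) AS prev_price
    FROM last_per_day
    WHERE rn = 1
)
SELECT day,
       COALESCE(AVG(prev_price - price), 0) AS mean_price_change,
       COALESCE(SUM(prev_price - price), 0) AS total_price_change
FROM with_prev
GROUP BY day
ORDER BY day
"""
changes = conn.execute(query).fetchall()
for row in changes:
    print(row)
```

New products have no previous price, so their difference is NULL and AVG skips them, which is why the mean on 2020-01-02 is taken over two products rather than three.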
Try this:
select
created_at,
avg(change),
sum(change)
from
(
with cte as
(
select
id,
product_id,
created_at,
lag(created_at) over(order by product_id, created_at) as last_date,
price
from prices
)
select
c.id,
c.product_id,
c.created_at,
c.last_date,
p.price as last_price,
c.price,
COALESCE(p.price - c.price,0) as change
from cte c
left join prices p on c.product_id =p.product_id and c.last_date =p.created_at
where p.price != c.price or p.price is null
) tmp
group by created_at
order by created_at
The query below tracks all price changes. Notice that we join today and earlier based on:
their product being the same
earlier being on a date before today's
earlier being the latest item before today's date
today being the latest item on its own date
select today.product_id, (earlier.price - today.price) as difference, today.created_at
from prices today
join prices earlier
on today.product_id = earlier.product_id and earlier.created_at::date < today.created_at::date
where not exists (
select 1
from prices later
where later.product_id = today.product_id and
(
((later.created_at::date = today.created_at::date) and (later.created_at > today.created_at)) or
((later.created_at::date < today.created_at::date) and (later.created_at > earlier.created_at))
)
);
Now, let's do some aggregation:
select today.created_at::date as day,
coalesce(avg(earlier.price - today.price), 0) as mean,
coalesce(sum(earlier.price - today.price), 0) as total
from prices today
left join prices earlier
on today.product_id = earlier.product_id and earlier.created_at::date < today.created_at::date
where not exists (
select 1
from prices later
where later.product_id = today.product_id and
(
((later.created_at::date = today.created_at::date) and (later.created_at > today.created_at)) or
((later.created_at::date < today.created_at::date) and (later.created_at > earlier.created_at))
)
)
group by today.created_at::date
order by today.created_at::date;

Find the customers who bought ProductA in any month and then bought ProductB in the immediate next month

Consider this table,
CREATE TABLE ProductSale
(
cust INT,
[Month] INT,
amt INT,
product VARCHAR(255)
)
INSERT INTO ProductSale (cust, Month, amt, product)
VALUES (103, 11, 493, 'pizza'), (103, 12, 304, 'drink'),
(103, 10, 189, 'drink'), (100, 12, 270, 'pizza'),
(100, 11, 187, 'drink'), (102, 8, 378, 'drink'),
(101, 10, 490, 'drink'), (101, 9, 123, 'Pizza')
A customer buys one product in a month and follows up by buying another product the next month.
I would like to get records of customers who bought Pizza in any month and then bought drink in the immediate next month.
For example, 103 is such a customer. 100 looks like one, but he is not.
How can I achieve this using a SQL query?
You may achieve this by using cross apply.
select p.* from ProductSale as p
cross apply (
select * from ProductSale as ps
where p.cust=ps.cust
and p.month+1=ps.month
and ps.product = 'drink'
and p.product='pizza' ) as pg
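The same "pizza in month N, drink in month N+1" condition can also be written as a plain self-join. The sketch below runs it in SQLite through Python's sqlite3 module; SQLite is an assumption standing in for SQL Server, and LOWER() is used to imitate a case-insensitive collation, which many SQL Server installations use but which is not confirmed for the asker's database.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server; LOWER() imitates a
# case-insensitive collation so that 'Pizza' and 'pizza' compare equal.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ProductSale (cust INTEGER, month INTEGER, amt INTEGER, product TEXT)"
)
conn.executemany(
    "INSERT INTO ProductSale VALUES (?, ?, ?, ?)",
    [
        (103, 11, 493, "pizza"), (103, 12, 304, "drink"),
        (103, 10, 189, "drink"), (100, 12, 270, "pizza"),
        (100, 11, 187, "drink"), (102, 8, 378, "drink"),
        (101, 10, 490, "drink"), (101, 9, 123, "Pizza"),
    ],
)

# p is the pizza purchase, d is the drink purchase one month later.
query = """
SELECT p.cust, p.month AS pizza_month, d.month AS drink_month
FROM ProductSale AS p
JOIN ProductSale AS d
  ON d.cust = p.cust
 AND d.month = p.month + 1
WHERE LOWER(p.product) = 'pizza'
  AND LOWER(d.product) = 'drink'
ORDER BY p.cust
"""
matches = conn.execute(query).fetchall()
for row in matches:
    print(row)
```

Under a case-insensitive comparison, customer 101 ('Pizza' in month 9, drink in month 10) qualifies alongside 103, while 100 does not because the drink came first. Since the table has no year column, `month + 1` will not link December to January.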

Get month over month increase in usage for each customer

I have the following table:
DECLARE @MyTable TABLE (
CustomerName nvarchar(max),
[Date] date,
[Service] nvarchar(max),
UniqueUsersForService int
)
INSERT INTO @MyTable VALUES
('CompanyA', '2016-07-14', 'Service1', 100),
('CompanyA', '2016-07-15', 'Service1', 110),
('CompanyA', '2016-07-16', 'Service1', 120),
('CompanyA', '2016-07-14', 'Service2', 200),
('CompanyA', '2016-07-15', 'Service2', 220),
('CompanyA', '2016-07-16', 'Service2', 500),
('CompanyB', '2016-07-14', 'Service1', 10000),
('CompanyB', '2016-07-15', 'Service1', 10500),
('CompanyB', '2016-07-16', 'Service1', 11000),
('CompanyB', '2016-07-14', 'Service2', 200),
('CompanyB', '2016-07-15', 'Service2', 300),
('CompanyB', '2016-07-16', 'Service2', 300)
Basically it's a list that shows how many people used each service for each company. For instance, in CompanyA, on the 14th of July, 100 unique users used Service1. The actual table contains thousands of customers and dates going back to the 1st of Jan 2015.
I've been researching online for a way to calculate the month-over-month usage increase for each service per customer. What I managed to do so far: I grouped the dates by month.
For instance, the date 7/14/2016 becomes 201607 (the 7th month of 2016), and I selected the maximum usage for each month. So now I need to figure out how to calculate the difference in usage between June and July, for example.
That is, to somehow subtract the usage of June from the usage in July, and so on for each month. The end goal is to identify the customers that had the biggest percentage increase in usage. I want to be able to look at the data and say CompanyA was using 100 licenses in March and in April it jumped to 1000. That's a 900% increase (ten times the previous usage).
I apologize for the way I phrased the question, I am very new to SQL and coding in general and I thank you in advance for any help I might get.
If you are using SQL Server 2012 (and up) you can use LAG function:
;WITH cte AS (
SELECT CustomerName,
LEFT(REPLACE(CONVERT(nvarchar(10),[Date],120),'-',''),6) as [month],
[Service],
MAX(UniqueUsersForService) as MaxUniqueUsersForService
FROM @MyTable
GROUP BY CustomerName,
LEFT(REPLACE(CONVERT(nvarchar(10),[Date],120),'-',''),6),
[Service]
)
SELECT *,
LAG(MaxUniqueUsersForService,1,NULL) OVER (PARTITION BY CustomerName, [Service] ORDER BY [month]) as prevUniqueUsersForService
FROM cte
ORDER BY CustomerName, [month], [Service]
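The LAG query returns the previous month's value; the percentage increase is then one more expression on top. Below is a minimal sketch run in SQLite through Python's sqlite3 module. SQLite is an assumption standing in for SQL Server, `strftime('%Y%m', ...)` replaces the CONVERT/REPLACE/LEFT month key, and the rows are hypothetical, built around the question's "100 in March, 1000 in April" example because the posted sample data covers a single month only.

```python
import sqlite3

# Assumption: SQLite stands in for SQL Server. The rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MyTable (CustomerName TEXT, [Date] TEXT, "
    "[Service] TEXT, UniqueUsersForService INTEGER)"
)
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        ("CompanyA", "2016-03-10", "Service1", 80),
        ("CompanyA", "2016-03-20", "Service1", 100),
        ("CompanyA", "2016-04-05", "Service1", 1000),
        ("CompanyB", "2016-03-15", "Service1", 200),
        ("CompanyB", "2016-04-15", "Service1", 300),
    ],
)

query = """
WITH monthly AS (
    -- maximum usage per customer, service and yyyymm month key
    SELECT CustomerName, [Service],
           strftime('%Y%m', [Date]) AS month,
           MAX(UniqueUsersForService) AS max_users
    FROM MyTable
    GROUP BY CustomerName, [Service], strftime('%Y%m', [Date])
),
with_prev AS (
    SELECT *,
           LAG(max_users) OVER (PARTITION BY CustomerName, [Service]
                                ORDER BY month) AS prev_users
    FROM monthly
)
SELECT CustomerName, [Service], month, max_users, prev_users,
       (max_users - prev_users) * 100.0 / prev_users AS pct_increase
FROM with_prev
ORDER BY CustomerName, [Service], month
"""
growth = conn.execute(query).fetchall()
for row in growth:
    print(row)
```

The first month of each customer and service has no previous value, so its percentage stays NULL. Sorting the result by `pct_increase` in descending order would surface the customers with the biggest jump.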
In SQL Server 2008:
;WITH cte AS (
SELECT CustomerName,
LEFT(REPLACE(CONVERT(nvarchar(10),[Date],120),'-',''),6) as [month],
[Service],
MAX(UniqueUsersForService) as MaxUniqueUsersForService
FROM @MyTable
GROUP BY CustomerName,
LEFT(REPLACE(CONVERT(nvarchar(10),[Date],120),'-',''),6),
[Service]
)
SELECT c.*,
p.MaxUniqueUsersForService as prevUniqueUsersForService
FROM cte c
OUTER APPLY (SELECT TOP 1 * FROM cte WHERE CustomerName = c.CustomerName AND [Service] = c.[Service] and [month] < c.[month] ORDER BY [month] DESC) as p
If you're using SQL Server 2012 or newer, try this:
SELECT *
, CASE
WHEN uniqueUsersPrevMonth = 0 THEN uniqueUsersInMonth
ELSE CAST(uniqueUsersInMonth - uniqueUsersPrevMonth as decimal) / uniqueUsersPrevMonth * 100
END AS Increase
FROM (
SELECT customer, service, DATEPART(MONTH, [date]) as [month]
, SUM(uniqueUsers) AS uniqueUsersInMonth
, LAG(SUM(uniqueUsers),1,0) OVER(PARTITION BY customer, service ORDER BY DATEPART(MONTH, [date])) as uniqueUsersPrevMonth
FROM @tbl AS t
GROUP BY customer, service, DATEPART(MONTH, [date])
) AS t1
ORDER BY customer, service, [month]