It's been a while since I used SQL, so I'm a bit rusty. Let's say you want to compare the cost of items purchased this month with their cost in the previous month. An example would be a data table like this...
An item purchased in October cost $3, but the same item cost $2 and $1 in September. So you'd get the max cost on the max date (which would be the $2, not the $1). This would happen for every row of data.
I've done this with a stored scalar-valued function, but when handling 100K+ rows of data it is nowhere near fast enough. How would you do this within a single SELECT query? What I did before was select both maxes in a SELECT statement and return only one, then call that function from a SELECT statement. I want to do the same without stored procedures or functions, for speed reasons. I know the following query won't work because a subquery in the SELECT list can only return one value, but it's roughly what I'm going for.
Select
Purchase, Item, USD,
(select MAX(Purchase), MAX(USD) from Table
where Item = 845 and MONTH(Purchase) = MONTH(Purchase) -1) LastCost
from Table
An example of what it should display can be portrayed as this.
What would be the best way to approach this?
Attention:
Select MAX(Purchase), MAX(USD) from Table will not return the highest cost on the latest date; it returns the latest date and the highest cost (regardless of date).
This is how I would do this (on at least SQL Server 2012):
To get only one record per month and item (with the highest cost on the latest date), I use a numbering for the purchase date and cost (per item and month) with a descending sort order, first by date, then by cost. In the next step, I filter out only those records where the numbering is 1 (max cost for max date per item and month) and use the LAG function to access the previous cost:
WITH
numbering (Purchase, Item, Cost, p_no) AS (
SELECT Purchase,Item, Cost
,ROW_NUMBER() OVER (PARTITION BY Item, EOMONTH(Purchase) ORDER BY Purchase DESC, Cost DESC)
FROM tbl
)
SELECT Purchase, Item, Cost
, LAG(Cost) OVER (PARTITION BY Item ORDER BY Purchase) AS LastCost
FROM numbering
WHERE p_no = 1
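For readers who want to try the idea without a SQL Server instance, here is a minimal sketch of the same numbering-plus-LAG approach run through Python's built-in sqlite3 module. It assumes a SQLite build with window functions (3.25 or later). The table name purchases, the function name, and the use of strftime('%Y-%m', ...) in place of EOMONTH are assumptions made for the illustration, not part of the answer above.

```python
import sqlite3


def monthly_cost_with_prior(rows):
    """rows: iterable of (purchase 'YYYY-MM-DD', item, usd) tuples (hypothetical layout)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE purchases (purchase TEXT, item INTEGER, usd REAL)")
    con.executemany("INSERT INTO purchases VALUES (?, ?, ?)", rows)
    query = """
        WITH numbering AS (
            SELECT purchase, item, usd,
                   -- number rows per item and month: latest date first, then highest cost
                   ROW_NUMBER() OVER (
                       PARTITION BY item, strftime('%Y-%m', purchase)
                       ORDER BY purchase DESC, usd DESC) AS p_no
            FROM purchases
        )
        SELECT purchase, item, usd,
               -- previous kept row for the same item (NULL when there is none)
               LAG(usd) OVER (PARTITION BY item ORDER BY purchase) AS last_cost
        FROM numbering
        WHERE p_no = 1
        ORDER BY item, purchase
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Called with the September and October purchases described in the question, the function should return one row per item and month, with last_cost holding the cost kept for the earlier month.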
SELECT Date, item, usd,
LAG(Date, 1) OVER(PARTITION BY item ORDER BY Date ASC) as FormerDate,
LAG(usd, 1) OVER(PARTITION BY item ORDER BY Date ASC) as FormerUsd
from (select date, item, max(usd) as usd from Data group by date, item) t
For each entry, this returns the previous purchase date of the same item together with its max price.
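For SQL Server 2017, the query below will work for the sample data (it assumes exactly two purchase dates per item):
select purchase,item,
-- text after the comma: cost on the latest date
substring(usd,CHARINDEX(',',usd)+1,len(usd)) as USD,
-- text before the comma: cost on the earlier date
substring(usd,1,CHARINDEX(',',usd)-1) as lastcost from
(select max(purchase) as purchase,item,
-- order the list by date so the older cost always comes first
STRING_AGG (usd, ',') WITHIN GROUP (ORDER BY purchase) AS usd
from
(
select purchase,item,max(usd) as usd from t
group by purchase,item
) as T group by item
) T1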
For your results, you need to use MAX() and ROW_NUMBER() with OVER(), partitioning the records by item, year, and month. This ensures that the sort is applied per item, per year, per month. ROW_NUMBER() acts as a simple way to put the latest records at the top of the results, so you pick row number 1 for each item to get the latest cost. After that, you use it as a subquery and refine it as needed. As a start (for your sample), you'll need CASE to split USD into the previous cost and the last cost; the rest follows from there.
Note that it's important to sort the records by year first, then by month, and then by day if you need it. This way you ensure the records are sorted correctly.
So the query would look something like this:
SELECT
MAX(Purchase) Purchase
, MAX(Item) Item
, MAX(CASE WHEN LastCost > USD THEN LastCost ELSE NULL END) USD
, MAX(CASE WHEN LastCost = USD THEN LastCost ELSE NULL END) LastCost
FROM (
SELECT
Purchase
, Item
, USD
, MAX(USD) OVER(PARTITION BY Item, YEAR(Purchase), MONTH(Purchase)) LastCost
, ROW_NUMBER() OVER(PARTITION BY Item, YEAR(Purchase), MONTH(Purchase) ORDER BY MONTH(Purchase)) RN
FROM Table
) D
WHERE
RN = 1
with data as (
select Item, eomonth(Purchase) as PurchaseMonth, max(USD) as MaxUSD
from T
group by Item, eomonth(Purchase)
)
select
PurchaseMonth, Item,
lag(MaxUSD) over (partition by Item order by PurchaseMonth) as PriorUSD
from data;
Related
I have a table with different invoice values (column "invoice") aggregated by month (column "date") and partner (column "partner"). I need as output a running sum that starts when my invoice value is negative and continues until the running sum becomes positive. Below is a representation of the result I need:
On 2022-05-01, the invoice value is 100, so it should not be considered in the running sum. On 2022-06-01, the invoice value is negative (-250), so the running sum should start there and continue until it turns positive (which happens on 2022-09-01). The same logic should apply continuously: on 2022-11-01 the running sum starts again and goes until 2023-01-01, when it becomes positive again.
What SQL query could get me the output I need?
I tried a partitioned sum by partner, ordered by date, over the rows where the invoice value is negative. However, the running sum then starts and stops within the negative values and doesn't consider the positive values that follow in the sequence.
Select
date
,partner
,case when invoice > 0 then 1 else 0 end as index
,sum(invoice) over (partition by partner,index order by date)
from table t
The logic you describe requires iterating over the dataset. Window functions do not provide this feature: in SQL, we would use a recursive query to address this problem.
Redshift recently added support for recursive CTEs: yay! Consider:
with recursive
data (date, partner, invoice, rn) as (
select date, partner, invoice,
row_number() over(partition by partner order by date) rn
from mytable
),
cte (date, partner, invoice, rn, rsum) as (
select date, partner, invoice, rn, invoice
from data
where rn = 1
union all
select d.date, d.partner, d.invoice, d.rn,
case when c.rsum >= 0 and c.rn > 1 then d.invoice else c.rsum + d.invoice end
from cte c
inner join data d on d.partner = c.partner and d.rn = c.rn + 1
)
select * from cte order by partner, date
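To make the row-by-row iteration easier to experiment with, here is a small sketch that runs a recursive CTE of the same shape through Python's sqlite3 module. It is an illustration under assumptions, not the Redshift query itself: the table name invoices, the column invoice_date, and the function name are made up, and the reset rule is simplified to "restart from the current invoice whenever the previous running sum is non-negative".

```python
import sqlite3


def running_sum_with_reset(rows):
    """rows: iterable of (invoice_date 'YYYY-MM-DD', partner, invoice) tuples (hypothetical layout)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE invoices (invoice_date TEXT, partner TEXT, invoice INTEGER)")
    con.executemany("INSERT INTO invoices VALUES (?, ?, ?)", rows)
    query = """
        WITH RECURSIVE
        data AS (
            -- number the rows per partner so the recursion can step rn -> rn + 1
            SELECT invoice_date, partner, invoice,
                   ROW_NUMBER() OVER (PARTITION BY partner ORDER BY invoice_date) AS rn
            FROM invoices
        ),
        cte AS (
            SELECT invoice_date, partner, invoice, rn, invoice AS rsum
            FROM data
            WHERE rn = 1
            UNION ALL
            SELECT d.invoice_date, d.partner, d.invoice, d.rn,
                   -- assumed rule: restart once the previous sum is back at >= 0
                   CASE WHEN c.rsum >= 0 THEN d.invoice
                        ELSE c.rsum + d.invoice END
            FROM cte c
            JOIN data d ON d.partner = c.partner AND d.rn = c.rn + 1
        )
        SELECT invoice_date, partner, invoice, rsum
        FROM cte
        ORDER BY partner, invoice_date
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

With made-up monthly invoices that go negative in 2022-06 and 2022-11, the rsum column accumulates until it is non-negative again and then restarts on the next row, which is the behaviour the question describes.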
Here is my dataset,
It has a reservation (unique ID), a reservation_dt, a fiscal year (the same year for the most part), the month as both number and name, and a reservation status. Then it has the total number reserved, followed by a counter (basically 1 for each reservation row).
These are my guidelines (they need to be turned into columns by month):
Requested (count of all distinct reservations)
Num_Requested (sum of total_number_requested by month)
Booked (count of all distinct reservations where status is order created)
Num_Booked (sum of total_number_requested by month where status is order created)
Not_Booked (count of all distinct reservations where status is unfulfilled)
Not_Num_Booked (sum of total_number_requested by month where status is unfulfilled)
I am looking to translate this into a pivot table. This is what I've got so far, and I can't figure out why it's not working.
I figured I would turn each of the above guidelines into a column, using either sum(total_number_requested) or count(total_requested) where the reservation status is ..., and so on.
I'm open to any other ideas for how to make this simpler and make it work.
SELECT [month_name],
fyear AS fyear,
Requested,
Num_Requested
FROM (SELECT reservation,
reservation_status,
total_number_requested,
fyear,
[month_name],
[month],
total_requested
FROM #temp2) SourceTable
PIVOT (SUM(total_number_requested)
FOR reservation_status IN ([Requested])) PivotNumbRequested PIVOT(COUNT(reservation)
FOR total_requested IN ([Num_Requested])) PivotCountRequested
WHERE [month] = 7
ORDER BY fyear,
[month];
Use conditional expressions to emulate the pivot. Example:
SELECT fyear, Month, Monthname, Count(*) AS CountALL, Sum(total_number_requested) AS TotNum,
Sum(IIf(reservation_status = 'Order Created', total_number_requested, Null)) AS SumCreated
FROM tablename
GROUP BY fyear, Month, MonthName
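As a rough illustration of how conditional aggregation can cover all six requested columns in one pass, here is a sketch using Python's sqlite3 module. The table name reservations, the function name, and the exact status strings 'Order Created' and 'Unfulfilled' are assumptions, and CASE is used instead of IIf because it is the portable form.

```python
import sqlite3


def monthly_reservation_summary(rows):
    """rows: (reservation, fyear, month, reservation_status, total_number_requested) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE reservations (reservation TEXT, fyear INTEGER, month INTEGER, "
        "reservation_status TEXT, total_number_requested INTEGER)"
    )
    con.executemany("INSERT INTO reservations VALUES (?, ?, ?, ?, ?)", rows)
    query = """
        SELECT fyear, month,
               COUNT(DISTINCT reservation) AS requested,
               SUM(total_number_requested) AS num_requested,
               -- CASE yields NULL for other statuses, and COUNT ignores NULLs
               COUNT(DISTINCT CASE WHEN reservation_status = 'Order Created'
                                   THEN reservation END) AS booked,
               SUM(CASE WHEN reservation_status = 'Order Created'
                        THEN total_number_requested ELSE 0 END) AS num_booked,
               COUNT(DISTINCT CASE WHEN reservation_status = 'Unfulfilled'
                                   THEN reservation END) AS not_booked,
               SUM(CASE WHEN reservation_status = 'Unfulfilled'
                        THEN total_number_requested ELSE 0 END) AS not_num_booked
        FROM reservations
        GROUP BY fyear, month
        ORDER BY fyear, month
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Each output row then carries one fiscal year and month with the six measures side by side, so no PIVOT clause is needed.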
More info:
SQLServer - Multiple PIVOT on same columns
Crosstab Query on multiple data points
Using SQL Server Management Studio.
Let's say I have a table with transactions that contains User, Date, Transaction amount. I want a query that will return the date when a certain amount is reached - let's say 100.
For example the same user performs 10 transactions for 10 EUR. I want the query to select the date of the last transaction because that's when his volume reached 100. Of course, once 100 is reached, the query shouldn't change the date with the latest transaction anymore, but leave it at when 100 was reached.
Wrote this on pgadmin but I think syntax should be the same.
with cumulative as
(
select customer_id,
sum(amount) over (partition by customer_id order by payment_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) cum_amt,
payment_date
from payment
)
select customer_id
, min(payment_date) as threshold_reached
from cumulative
where cum_amt>=100
group by customer_id
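The cumulative-sum query above can be tried locally with a sketch like the following, which runs it through Python's sqlite3 module. The function name and the parameterised threshold are assumptions added for the example; the payment table and its columns follow the query above.

```python
import sqlite3


def threshold_dates(rows, threshold=100):
    """rows: iterable of (customer_id, payment_date 'YYYY-MM-DD', amount) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE payment (customer_id INTEGER, payment_date TEXT, amount REAL)")
    con.executemany("INSERT INTO payment VALUES (?, ?, ?)", rows)
    query = """
        WITH cumulative AS (
            SELECT customer_id, payment_date,
                   -- running total per customer, in date order
                   SUM(amount) OVER (
                       PARTITION BY customer_id
                       ORDER BY payment_date
                       ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cum_amt
            FROM payment
        )
        -- earliest date on which the running total is at or above the threshold
        SELECT customer_id, MIN(payment_date) AS threshold_reached
        FROM cumulative
        WHERE cum_amt >= ?
        GROUP BY customer_id
        ORDER BY customer_id
    """
    result = con.execute(query, (threshold,)).fetchall()
    con.close()
    return result
```

Customers who never reach the threshold simply do not appear in the result, and later payments do not move the date because MIN keeps the first qualifying row.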
case when sum(amt) over (partition by user order by date) - amt < 100
and sum(amt) over (partition by user order by date) >= 100
then 1 else 0 end
I'm calculating a moving average of the last 100 sales of a particular item. I'd like to know if user X has spent more than 5 times everyone else combined, on that item in the last 100 sales window.
--how much has the current row user spent on this item over the last 100 sales?
SUM(saleprice) OVER(PARTITION BY item, user ORDER BY saledate ROWS BETWEEN 100 PRECEDING AND CURRENT ROW)
--pseudocode: how much has everyone else, excluding this user, spent on that item over the last 100 sales?
SUM(saleprice) OVER(PARTITION BY item ORDER BY saledate ROWS BETWEEN 100 PRECEDING AND CURRENT ROW WHERE preceding_row.user <> current_row.user)
Ultimately, I don't want the purchases made by my big spender to be factored into the total spend by the little spenders. Is there a technique that can exclude rows from a window, if they don't meet some comparison criteria versus the current row? (in my case, don't sum the saleprice from the preceding row if it bears the same user as the current row)
This first one looks fine to me, except you're counting 101 sales. (100 preceding AND the current row)
--how much has the current row user spent on this item over the last 100 sales?
SUM(saleprice)
OVER (
PARTITION BY item, user
ORDER BY saledate
ROWS BETWEEN 100 PRECEDING AND 1 PRECEDING -- 100 excluding this sale
ROWS BETWEEN 99 PRECEDING AND CURRENT ROW -- 100 including this sale
)
(Just use one of the two suggested ROWS BETWEEN clauses)
In the second expression, you can't add a WHERE clause. You can change the aggregation, the partition and the sorting, but I can't see how that would help you. I think you need a correlated sub-query and/or use of OUTER APPLY...
SELECT
*,
SUM(saleprice)
OVER (
PARTITION BY item, user
ORDER BY saledate
ROWS BETWEEN 99 PRECEDING AND CURRENT ROW -- 100 including this sale
)
AS user_total_100_purchases_to_date,
others_sales_top_100_total.saleprice
FROM
sales_data
OUTER APPLY
(
SELECT
SUM(saleprice) AS saleprice
FROM
(
SELECT TOP(100) saleprice
FROM sales_data others_sales
WHERE others_sales.user <> sales_data.user
AND others_sales.item = sales_data.item
AND others_sales.saledate <= sales_data.saledate
ORDER BY others_sales.saledate DESC
)
AS others_sales_top_100
)
AS others_sales_top_100_total
EDIT: Another way to look at it, to make things consistent:
SELECT
*,
usr_last100_saletotal,
all_last100_saletotal,
CASE WHEN usr_last100_saletotal > all_last100_saletotal * 0.8
THEN 'user spent 80%, or more, of last 100 sales'
ELSE 'user spent under 80% of last 100 sales'
END
AS user_share_label -- alias name added for illustration; the original left it blank
FROM
sales_data
OUTER APPLY
(
SELECT
SUM(CASE top100.user WHEN sales_data.user THEN top100.saleprice END) AS usr_last100_saletotal,
SUM( top100.saleprice ) AS all_last100_saletotal
FROM
(
SELECT TOP(100) user, saleprice
FROM sales_data AS others_sales
WHERE others_sales.item = sales_data.item
AND others_sales.saledate <= sales_data.saledate
ORDER BY others_sales.saledate DESC
)
AS top100
)
AS top100_summary
I'm having an odd problem
I have a table with the columns product_id, sales and day
Not all products have sales every day. I'd like to get the average number of sales that each product had in the last 10 days where it had sales
Usually I'd get the average like this
SELECT product_id, AVG(sales)
FROM table
GROUP BY product_id
Is there a way to limit the amount of rows to be taken into consideration for each product?
I'm afraid it's not possible but I wanted to check if someone has an idea
Update to clarify:
Product may be sold on days 1,3,5,10,15,17,20.
Since I don't want the average over all days, but only over the days on which the product actually got sold, doing something like
SELECT product_id, AVG(sales)
FROM table
WHERE day > '01/01/2009'
GROUP BY product_id
won't work
If you want the last 10 calendar days up to each product's most recent sale:
SELECT t.product_id, AVG(t.sales)
FROM table t
JOIN (
SELECT product_id, MAX(sales_date) as max_sales_date
FROM table
GROUP BY product_id
) t_max ON t.product_id = t_max.product_id
AND DATEDIFF(day, t.sales_date, t_max.max_sales_date) < 10
GROUP BY t.product_id;
DATEDIFF is SQL Server specific; you'd have to replace it with your server's syntax for date difference functions.
To get the last 10 days when the product had any sale:
SELECT product_id, AVG(sales)
FROM (
SELECT product_id, sales, DENSE_RANK() OVER
(PARTITION BY product_id ORDER BY sales_date DESC) AS rn
FROM Table
) As t_rn
WHERE rn <= 10
GROUP BY product_id;
This assumes sales_date is a date, not a datetime. You'd have to extract the date part if the field is a datetime.
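Here is a minimal sketch of the DENSE_RANK approach run through Python's sqlite3 module, for anyone who wants to check the behaviour on a few rows. The table name sales, the function name, and the n_days parameter are assumptions made for the example.

```python
import sqlite3


def avg_over_last_sale_days(rows, n_days=10):
    """rows: iterable of (product_id, sales_date 'YYYY-MM-DD', sales) tuples."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (product_id INTEGER, sales_date TEXT, sales REAL)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    query = """
        SELECT product_id, AVG(sales)
        FROM (
            SELECT product_id, sales,
                   -- rows on the same date share a rank, so rn counts distinct sale days
                   DENSE_RANK() OVER (
                       PARTITION BY product_id
                       ORDER BY sales_date DESC) AS rn
            FROM sales
        ) AS t_rn
        WHERE rn <= ?
        GROUP BY product_id
        ORDER BY product_id
    """
    result = con.execute(query, (n_days,)).fetchall()
    con.close()
    return result
```

Because DENSE_RANK gives every row on the same date the same rank, several sales on one day still count as a single day towards the limit.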
And finally, a version without windowing functions:
SELECT product_id, AVG(sales)
FROM Table t
WHERE sales_date IN (
SELECT TOP(10) sales_date
FROM Table s
WHERE t.product_id = s.product_id
ORDER BY sales_date DESC)
GROUP BY product_id;
Again, sales_date is assumed to be a date, not a datetime. Use other limiting syntax if TOP is not supported by your server.
Give this a whirl. The sub-query selects the last ten days on which a product had a sale; the outer query does the aggregation.
SELECT t1.product_id, SUM(t1.sales) / COUNT(*)
FROM table t1
-- CROSS APPLY lets the sub-query refer to t1; a plain INNER JOIN cannot
CROSS APPLY (
SELECT TOP 10 day, product_id
FROM table t2
WHERE (t2.product_id = t1.product_id)
ORDER BY day DESC
) last10
WHERE (last10.day = t1.day)
GROUP BY t1.product_id
BTW: This approach uses a correlated subquery, which may not perform well, and it assumes one row per product per day, but it should work in theory.
I'm not sure I've understood correctly, but if you'd like the average of sales over the last 10 days for your products, you can do it as follows:
SELECT ProductId,Sum(Sales)/Count(*) FROM (SELECT ProductId,Sales FROM Table WHERE SaleDate>=#Date) t GROUP BY ProductId HAVING Count(*)>0
Or you can use the AVG aggregate function, which is easier:
SELECT ProductId,AVG(Sales) FROM (SELECT ProductId,Sales FROM Table WHERE SaleDate>=#Date) t GROUP BY ProductId
Updated
Now I get what you meant. As far as I know, it is not possible to do this in one query. It could be possible if we could do something like this (Northwind database):
select a.CustomerId,count(a.OrderId)
from Orders a INNER JOIN(SELECT CustomerId,OrderDate FROM Orders Order By OrderDate) AS b ON a.CustomerId=b.CustomerId GROUP BY a.CustomerId Having count(a.OrderId)<10
but you can't use ORDER BY in subqueries unless you use TOP, which is not suitable for this case. But maybe you can do it as follows:
SELECT ProductId,Sales INTO #temp FROM table Order By Day
select a.ProductId,Sum(a.Sales) /Count(a.Sales)
from table a INNER JOIN #temp AS b ON a.ProductId=b.ProductId GROUP BY a.ProductId Having count(a.Sales)<=10
If this is a table of sales transactions, then there should not be any rows in there for days on which there were no Sales. I.e., If ProductId 21 had no sales on 1 June, then this table should not have any rows with productId = 21 and day = '1 June'... Therefore you should not have to filter anything out - there should not be anything to filter out
Select ProductId, Avg(Sales) AvgSales
From Table
Group By ProductId
should work fine. So if it's not, then you have not explained the problem completely or accurately.
Also, in your question you show Avg(Sales) in the example SQL query, but in the text you mention "average number of sales that each product ... ". Do you want the average sales amount, or the average count of sales transactions? And do you want this average by product alone (i.e., one output value reported for each product), or do you want the average per product per day?
If you want the average per product alone, is it for just those sales in the ten days prior to now, or in the ten days prior to the date of the last sale for each product?
If the latter, then
Select ProductId, Avg(Sales) AvgSales
From Table T
Where day > (Select Max(Day) - 10
From Table
Where ProductId = T.ProductID)
Group By ProductId
If you want the average per product alone, for just those sales on the last ten days with sales up to the date of the last sale for each product, then
Select ProductId, Avg(Sales) AvgSales
From Table T
-- fewer than 10 later sale days means this row is within the last 10 sale days
Where (Select Count(Distinct day) From Table
Where ProductId = T.ProductID
And Day > T.Day) < 10
Group By ProductId