Before&After purchase of a product - sql

I have two tables:
orders_product: all the orders. Each line is a product sold, with some details about the order in which it was included. So if an order has more than one product, there is more than one line for that order.
orders_grouped: each line is an order with some details about this specific order.
For each product, I would like to know whether the customer made a previous purchase and a following purchase.
SELECT
product_name,
last_value(product_all_grouped_list) over (partition by ord.customer_id order by created_at asc rows between unbounded preceding and 1 preceding ) as last_order,
last_value(product_all_grouped_list) over (partition by ord.customer_id order by created_at desc rows between unbounded preceding and 1 preceding ) as next_order_products,
last_value(basket_size) over (partition by ord.customer_id order by created_at desc rows between unbounded preceding and 1 preceding ) as next_order_basket_size
FROM
`orders_product` ord
left join `orders_grouped` ordgroup
on ord.order_number=ordgroup.order_number
When the order has only one product (basket_size = 1), everything is correct. But when basket_size > 1, the result is correct for the first product of the order and wrong for the remaining products of that order.
Can someone help me?

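Because an order can contain several items, and therefore several rows, the window frame has to be defined differently.
Use RANGE instead of ROWS in the OVER clause, so that the frame is bounded by the value of the ordering expression rather than by a row count. RANGE with a numeric offset needs a numeric ORDER BY expression, which is why the example below converts the timestamp to seconds.
You can also define the window once in a WINDOW clause at the end of the query: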
With tbl as (
Select * from unnest(generate_timestamp_array("2022-09-01","2022-09-15",interval 1 hour)) update_time
)
SELECT
*,
LAST_VALUE(update_time) OVER (ORDER BY update_time ASC ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING ),
timestamp_diff(update_time,timestamp("1999-01-01"),second) ,
LAST_VALUE(update_time) OVER SETUP_window
FROM
tbl
window SETUP_window as (ORDER BY timestamp_diff(update_time,timestamp("1999-01-01"),second) ASC RANGE BETWEEN UNBOUNDED PRECEDING AND 36000 PRECEDING )
order by update_time desc
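To see why ROWS misbehaves when an order spans several rows, here is a minimal sketch that uses SQLite (through Python's sqlite3 module) as a stand-in for BigQuery. The table order_lines, its columns, and the integer created_at values are made up for illustration, and RANGE with a numeric offset assumes SQLite 3.28 or later. With ROWS, one line of a two-line order sees its sibling line as the "previous" row. With RANGE ... 1 PRECEDING, every line of the order only sees rows whose created_at is strictly smaller.

```python
import sqlite3

# Hypothetical data: customer 1 places three orders; the second has two lines.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE order_lines (customer_id INT, created_at INT,
                          product_name TEXT, product_list TEXT);
INSERT INTO order_lines VALUES
  (1, 100, 'apple', 'apple'),
  (1, 200, 'bread', 'bread,milk'),
  (1, 200, 'milk',  'bread,milk'),
  (1, 300, 'tea',   'tea');
""")

rows = conn.execute("""
SELECT product_name,
       -- frame counted in rows: a sibling line of the same order can leak in
       LAST_VALUE(product_list) OVER (
           PARTITION BY customer_id ORDER BY created_at
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_by_rows,
       -- frame bounded by value: only rows with created_at <= current - 1
       LAST_VALUE(product_list) OVER (
           PARTITION BY customer_id ORDER BY created_at
           RANGE BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS prev_by_range
FROM order_lines
ORDER BY created_at, product_name
""").fetchall()

by_product = {name: (by_rows, by_range) for name, by_rows, by_range in rows}
for name, values in by_product.items():
    print(name, values)
```

The same idea should carry over to the "next order" columns by ordering the frame in the opposite direction.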

Related

Faster alternative of MIN/MAX in SQL Server

I need the lowest/highest price of stocks for the past n days. The following query runs really slowly. I would appreciate a faster alternative:
SELECT
*,
MIN(Close) OVER (PARTITION BY Ticker ORDER BY PriceDate ROWS BETWEEN 14 PRECEDING AND 1 PRECEDING) AS MinPrice14d,
MAX(Close) OVER (PARTITION BY Ticker ORDER BY PriceDate ROWS BETWEEN 14 PRECEDING AND 1 PRECEDING) AS MaxPrice14d
FROM
(SELECT CompanyID, Ticker, PriceDate, Close
FROM price.PriceHistoryDaily) a
I need the columns specified.
It is trailing, so I need it day by day.
As for period, I will limit it to one year.
Although it doesn't affect the performance, no subquery is needed. So start with the simpler version:
SELECT phd.CompanyID, phd.Ticker, phd.PriceDate, phd.Close,
min(Close) over (partition by Ticker
order by PriceDate
rows between 14 preceding and 1 preceding
) as MinPrice14d,
max(Close) over (partition by Ticker
order by PriceDate
rows between 14 preceding and 1 preceding
) as MaxPrice14d
FROM price.PriceHistoryDaily phd;
Then try adding an index: PriceHistoryDaily(Ticker, PriceDate).
Note that this returns all rows from PriceHistoryDaily and, depending on the size of the table, that might be what is driving the performance.
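As a small illustration of the trailing frame and the suggested index, here is a sketch in SQLite (run from Python's sqlite3 module) rather than SQL Server. The table price_history, the index name, and the sample prices are invented, and the window is shortened to 2 rows so the result is easy to check by hand. It only shows the shape of the query, not any performance gain.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE price_history (ticker TEXT, price_date TEXT, close REAL);
-- composite index matching PARTITION BY ticker ORDER BY price_date
CREATE INDEX ix_price_history_ticker_date ON price_history (ticker, price_date);
INSERT INTO price_history VALUES
  ('ABC', '2020-01-01', 10),
  ('ABC', '2020-01-02', 12),
  ('ABC', '2020-01-03', 9),
  ('ABC', '2020-01-04', 11);
""")

trailing = conn.execute("""
SELECT price_date, close,
       MIN(close) OVER (PARTITION BY ticker ORDER BY price_date
                        ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS min_prev2,
       MAX(close) OVER (PARTITION BY ticker ORDER BY price_date
                        ROWS BETWEEN 2 PRECEDING AND 1 PRECEDING) AS max_prev2
FROM price_history
ORDER BY price_date
""").fetchall()

for row in trailing:
    print(row)
```

The frame ends at 1 PRECEDING, so the current day's price never takes part in its own minimum or maximum, and the first row of each ticker gets NULL.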

Last_Value in SQL Server

with cte
as
(
SELECT
year(h.orderdate)*100+month(h.orderdate) as yearmonth,
YEAR(h.orderdate) as orderyear,
sum(d.OrderQty*d.UnitPrice) as amount
FROM [AdventureWorks].[Sales].[SalesOrderDetail] d
inner join sales.SalesOrderHeader h
on d.SalesOrderID=h.SalesOrderID
group by
year(h.orderdate)*100+month(h.orderdate),
year(h.orderdate)
)
select
c.*,
last_value(c.amount) over (partition by c.orderyear order by c.yearmonth) as lastvalue,
first_value(c.amount) over (partition by c.orderyear order by c.yearmonth) as firstvalue
from cte c
order by c.yearmonth
I am expecting to see the last value of each year (say, the December value), similar to the first value of each year (the January value). However, last_value is not working at all: it just returns the value of the current month. What did I do wrong?
Thanks for the help.
Your problem is that, when an ORDER BY is present, the default window frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so the value you are getting is the current month's value (that being the last value in that frame). To get LAST_VALUE to look at all values in the partition, you need to extend the frame to include the rows after the current row as well. So you need to change your query to:
last_value(c.amount) over (partition by c.orderyear order by c.yearmonth
rows between unbounded preceding and unbounded following) as lastvalue,
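The difference between the two frames can be checked on a toy table. The sketch below uses SQLite through Python's sqlite3 module instead of SQL Server, and the table monthly with its amounts is made up, but the default-frame behaviour it shows is the same.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE monthly (orderyear INT, yearmonth INT, amount INT);
INSERT INTO monthly VALUES
  (2011, 201110, 10), (2011, 201111, 20), (2011, 201112, 30),
  (2012, 201201, 40), (2012, 201202, 50);
""")

frames = conn.execute("""
SELECT yearmonth, amount,
       -- default frame: stops at the current row
       LAST_VALUE(amount) OVER (PARTITION BY orderyear
                                ORDER BY yearmonth) AS default_frame,
       -- explicit frame: covers the whole partition
       LAST_VALUE(amount) OVER (PARTITION BY orderyear
                                ORDER BY yearmonth
                                ROWS BETWEEN UNBOUNDED PRECEDING
                                         AND UNBOUNDED FOLLOWING) AS full_frame
FROM monthly
ORDER BY yearmonth
""").fetchall()

for row in frames:
    print(row)
```

With the default frame the third column simply repeats amount, while the explicit frame returns the last month's amount of each year on every row.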

How to group same multiple window functions as one and call by an alias wherever needed in the query?

How can I avoid repeating the same window definition multiple times in a single SQL query for different aggregations? Is there any way I can alias it and refer to it as many times as needed in the query?
I tried using the WINDOW clause for this, but SQL Server currently doesn't support the WINDOW clause.
select empid, qty,
sum(qty) over (partition by empid order by month rows between unbounded preceding and current row) as running_sum,
avg(qty) over (partition by empid order by month rows between unbounded preceding and current row) as running_avg,
min(qty) over (partition by empid order by month rows between unbounded preceding and current row) as running_min,
max(qty) over (partition by empid order by month rows between unbounded preceding and current row) as running_max
from employee
Is there a way to remove the redundancy in the code?
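Not in SQL Server versions before 2022. ANSI SQL supports a WINDOW clause for defining windows that can be reused, but older SQL Server versions do not support it (it was added in SQL Server 2022).
I think you can slightly simplify your logic, because with an ORDER BY the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. This gives the same results as your ROWS frame as long as month is unique within each empid: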
select empid, qty,
sum(qty) over (partition by empid order by month) as running_sum,
avg(qty) over (partition by empid order by month) as running_avg,
min(qty) over (partition by empid order by month) as running_min,
max(qty) over (partition by empid order by month) as running_max
from employee;
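For engines that do accept the ANSI WINDOW clause, the redundancy disappears. The following is only a sketch in SQLite (via Python's sqlite3 module, assuming SQLite 3.25 or later), not SQL Server syntax from the question; the employee rows and the window name w are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (empid INT, month INT, qty INT);
INSERT INTO employee VALUES
  (1, 1, 5), (1, 2, 3), (1, 3, 8),
  (2, 1, 4), (2, 2, 6);
""")

running = conn.execute("""
SELECT empid, month, qty,
       SUM(qty) OVER w AS running_sum,
       AVG(qty) OVER w AS running_avg,
       MIN(qty) OVER w AS running_min,
       MAX(qty) OVER w AS running_max
FROM employee
-- the window is defined once and referenced by name above
WINDOW w AS (PARTITION BY empid ORDER BY month
             ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
ORDER BY empid, month
""").fetchall()

for row in running:
    print(row)
```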

Number of sales relative to historical date in previous year

I have a database containing sales transactions. These are in the following (simplified) format:
sales_id | customer_id | sales_date | number_of_units | total_price
The goal for my query is for each of these transactions, to get the number of sales that this specific customer_id made before the current record, during the whole history of this database, but also during the 365 days before the current record.
Lifetime sales works right now, but the last 365 days part has me stuck. My query right now can identify IF a record had at least one sale in the previous 365 days, and I do it like so:
SELECT sales_id ,customer_id,sales_date,number_of_units,total_price,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY sales_date ASC) as 'LifeTimeSales' ,
CASE WHEN DATEDIFF(DAY,sales_date,LAG(sales_date, 1) OVER (PARTITION BY customer_id ORDER BY sales_date ASC)) > -365
THEN 1 ELSE 0 END as 'Last365Sales'
FROM sales_db
+ some non-important WHERE clauses. After which I aggregate the result of this query in some other ways.
But this does not tell me if this purchase is for example the 4th sale in the previous 365 days of a customer.
Note:
This is a query that runs daily on the full database with 6 million records and growing. I drop and recreate this table right now, which is obviously not efficient. Updating the table when new sales come in would be ideal, but right now this is not possible to create. Any ideas?
Some test data:
sales_id,customer_id,sales_date,number_of_units,total_price
1001,2001,2016-01-01,1,86
1002,2001,2016-08-01,3,98
1003,2001,2017-06-01,2,87
1004,2002,2017-06-01,2,15
+ expected result:
sales_id,customer_id,sales_date,number_of_units,total_price,LifeTimeSales,Last365Sales
1001,2001,2016-01-01,1,86,0,0
1002,2001,2016-08-01,3,98,1,1
1003,2001,2017-06-01,2,87,2,1
1004,2002,2017-06-01,2,15,0,0
For the count of sales before a sale you could use correlated subqueries.
SELECT s1.sales_id,
s1.customer_id,
s1.sales_date,
s1.number_of_units,
s1.total_price,
(SELECT count(*)
FROM sales_db s2
WHERE s2.customer_id = s1.customer_id
AND s2.sales_date <= s1.sales_date) - 1 lifetimesales,
(SELECT count(*)
FROM sales_db s2
WHERE s2.customer_id = s1.customer_id
AND s2.sales_date <= s1.sales_date
AND s2.sales_date >= dateadd(day, -365, s1.sales_date)) - 1 last365sales
FROM sales_db s1;
(I used s2.sales_date <= s1.sales_date and then subtracted 1 from the result, so that multiple sales on the same day, if such data exists, are also counted. But as this also counts the sale of the current row, it has to be decremented by 1.)
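The correlated-subquery approach can be checked against the test data from the question. This sketch runs it in SQLite through Python's sqlite3 module, so SQLite's date(x, '-365 days') stands in for the SQL Server date arithmetic; that substitution is an assumption of the sketch, everything else follows the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_db (sales_id INT, customer_id INT, sales_date TEXT,
                       number_of_units INT, total_price INT);
INSERT INTO sales_db VALUES
  (1001, 2001, '2016-01-01', 1, 86),
  (1002, 2001, '2016-08-01', 3, 98),
  (1003, 2001, '2017-06-01', 2, 87),
  (1004, 2002, '2017-06-01', 2, 15);
""")

counts = conn.execute("""
SELECT s1.sales_id,
       (SELECT COUNT(*)
          FROM sales_db s2
         WHERE s2.customer_id = s1.customer_id
           AND s2.sales_date <= s1.sales_date) - 1 AS lifetimesales,
       (SELECT COUNT(*)
          FROM sales_db s2
         WHERE s2.customer_id = s1.customer_id
           AND s2.sales_date <= s1.sales_date
           -- ISO date strings compare correctly as text in SQLite
           AND s2.sales_date >= date(s1.sales_date, '-365 days')) - 1 AS last365sales
FROM sales_db s1
ORDER BY s1.sales_id
""").fetchall()

for row in counts:
    print(row)
```

On the four sample rows this reproduces the LifeTimeSales and Last365Sales columns of the expected result.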
I would create a report view in which all the required fields are available.
Then select whatever you need:
with all_history_statistics as
(select customer_id, sales_id, sales_date, number_of_units, total_price,
max(sales_date) over (partition by customer_id order by (select null)) as last_sale_date,
count(sales_id) over (partition by customer_id order by (select null)) total_number_of_sales,
count(sales_id) over (partition by customer_id order by sales_date asc ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) number_of_sales_for_current_date,
sum(number_of_units) over (partition by customer_id order by (select null)) total_number_saled_units,
sum(number_of_units) over (partition by customer_id order by sales_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) number_saled_units_for_current_date,
sum(total_price) over (partition by customer_id order by (select null)) as total_earned,
sum(total_price) over (partition by customer_id order by sales_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) earned_for_current_date
from sales_db),
last_year_statistics as
(select customer_id, sales_id, sales_date, number_of_units, total_price,
max(sales_date) over (partition by customer_id order by (select null)) as last_sale_date,
count(sales_id) over (partition by customer_id order by (select null)) total_number_of_sales,
count(sales_id) over (partition by customer_id order by sales_date asc ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) number_of_sales_for_current_date,
sum(number_of_units) over (partition by customer_id order by (select null)) total_number_saled_units,
sum(number_of_units) over (partition by customer_id order by sales_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) number_saled_units_for_current_date,
sum(total_price) over (partition by customer_id order by (select null)) as total_earned,
sum(total_price) over (partition by customer_id order by sales_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) earned_for_current_date
-- add a WHERE clause on sales_date here to limit this CTE to the last year; as written it repeats all_history_statistics
from sales_db)
select <specify list of fields which you need>
from all_history_statistics t1 inner join last_year_statistics t2
on t1.customer_id = t2.customer_id
;

Alternative to GROUP BY, when a WINDOW isn't working properly.. SQL

I feel this has a simple answer, but I just can't seem to get it right.
Initially I ran two queries, because I couldn't figure out how to solve this problem in one. This was the query for my initial table "free2":
WITH prep AS (
SELECT *,
(((odds - 1)/div) + 1) AS ew_odds,
(odds*size) AS possible_win_returns,
(((odds - 1)/div) + 1)*size AS possible_ew_returns
FROM scratch.free
),
prof AS(
SELECT *,
(possible_ew_returns+possible_win_returns) AS possible_total_win,
(possible_win_returns*win) - size AS win_profit,
(possible_ew_returns*places) - size AS ew_profit
FROM prep
)
SELECT
date_trunc(prof.date, DAY) AS DAY,
SUM(ew_odds) AS ew_odds,
SUM(size) AS size,
SUM(odds) AS odds,
SUM(places) AS places,
SUM(div) AS divisor,
SUM (total_size) AS total_size,
SUM(won) AS profit,
SUM(ew_profit) AS ew_prof,
SUM(win_profit) AS win_prof,
SUM(possible_total_win) AS pos_tot_win,
SUM(possible_ew_returns) AS pos_ew_ret,
SUM(possible_win_returns) AS pos_win_ret
FROM prof
GROUP BY 1
ORDER BY day DESC
which grouped all my sums by day, which is what I'm trying to do. Then I LEFT JOINED the second table onto the first by running this second query:
SELECT d.*,
f.ew_odds,
f.size,
f.odds,
f.places,
f.divisor,
f.total_size,
f.profit,
f.ew_prof AS ew_profit,
f.win_prof AS win_profit,
f.pos_tot_win AS possible_total_win,
f.pos_ew_ret AS possible_ew_returns,
f.pos_win_ret AS possible_win_returns,
date_trunc(d.day, week) AS week,
date_trunc(d.day, month) AS month,
date_trunc(d.day, year) AS year,
date_trunc(d.day, quarter) AS quarter
FROM scratch.free2 AS f
LEFT JOIN accounts.daily_movement AS d
ON d.day = f.day
Which, as I said, worked fine. However, I need to replicate this as a whole in one query. I can't do this directly, since a GROUP BY clause interferes with the LEFT JOIN. So I tried to turn all of the first table's values into window functions:
prof AS (
SELECT *,
(possible_ew_returns+possible_win_returns) AS possible_total_win,
(possible_win_returns*win) - size AS win_profit,
(possible_ew_returns*places) - size AS ew_profit,
date_trunc(date, DAY) AS day
FROM calculations
),
sum AS (
SELECT prof.day,
SUM(prof.ew_odds)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS ew_odds,
SUM(prof.size)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS size,
SUM(prof.odds)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS odds,
SUM(prof.places)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS places,
SUM(prof.div)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS divisor,
SUM(prof.total_size)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS total_size,
SUM(prof.won)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS won,
SUM(prof.rico)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS rico,
SUM(prof.won)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS profit,
SUM(prof.ew_profit)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS ew_prof,
SUM(prof.win_profit)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS win_prof,
SUM(prof.possible_total_win)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS possible_tot_win,
SUM(prof.possible_ew_returns)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS possible_ew_returns,
SUM(prof.possible_win_returns)
OVER (PARTITION BY prof.day RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
AS possible_win_returns
FROM prof)
SELECT
sum.*,
d.total_euros,
d.profit_bf_exp,
d.percentage_profit,
d.profit_aft_exp,
d.brendan_profit,
d.brendan_transactions,
d.brendan_daily,
d.brendan_percentage,
d.michael_profit,
d.michael_transactions,
d.michael_daily,
d.michael_percentage,
d.general_expenses,
d.thiago_payment,
d.pedro_payment,
d.rodrigues_payment,
d.felipe_payment,
d.expenses_notes,
d.details
FROM sum
LEFT JOIN accounts.daily_movement AS d ON d.day = sum.day
ORDER BY sum.day DESC
I tried changing the RANGE of each window to ROWS, but it's still wrong.
What happens is that the rows are not collapsed per day: each day shows the correct sums, but there are about 10-20 rows with exactly the same sums and day.
This is what the "size" column and first 5 rows of "day DESC" should look like:
Row day size
1 2017-04-30 1679.27
2 2017-04-29 7292.809999999996
3 2017-04-28 3247.04
4 2017-04-27 2209.2000000000003
5 2017-04-26 2932.42
but instead, it comes out like this:
Row day size
1 2017-04-30 1679.27
2 2017-04-30 1679.27
3 2017-04-30 1679.27
4 2017-04-30 1679.27
5 2017-04-30 1679.27
How do I prevent the repetition of SUMs and days in the data?
"Which as I said, worked fine. However, I need to replicate this as a whole in one query..."
Try the query below (for BigQuery Standard SQL).
It simply assembles your two steps into one query, as you wanted.
If, as you say, they work for you separately, the query below should work for you too.
#standardSQL
WITH prep AS (
SELECT *,
(((odds - 1)/DIV) + 1) AS ew_odds,
(odds*size) AS possible_win_returns,
(((odds - 1)/DIV) + 1)*size AS possible_ew_returns
FROM scratch.free
),
prof AS(
SELECT *,
(possible_ew_returns+possible_win_returns) AS possible_total_win,
(possible_win_returns*win) - size AS win_profit,
(possible_ew_returns*places) - size AS ew_profit
FROM prep
),
free2 AS (
SELECT
DATE_TRUNC(prof.date, DAY) AS DAY,
SUM(ew_odds) AS ew_odds,
SUM(size) AS size,
SUM(odds) AS odds,
SUM(places) AS places,
SUM(DIV) AS divisor,
SUM (total_size) AS total_size,
SUM(won) AS profit,
SUM(ew_profit) AS ew_prof,
SUM(win_profit) AS win_prof,
SUM(possible_total_win) AS pos_tot_win,
SUM(possible_ew_returns) AS pos_ew_ret,
SUM(possible_win_returns) AS pos_win_ret
FROM prof
GROUP BY 1
)
SELECT d.*,
f.ew_odds,
f.size,
f.odds,
f.places,
f.divisor,
f.total_size,
f.profit,
f.ew_prof AS ew_profit,
f.win_prof AS win_profit,
f.pos_tot_win AS possible_total_win,
f.pos_ew_ret AS possible_ew_returns,
f.pos_win_ret AS possible_win_returns,
DATE_TRUNC(d.day, week) AS week,
DATE_TRUNC(d.day, month) AS month,
DATE_TRUNC(d.day, year) AS year,
DATE_TRUNC(d.day, quarter) AS quarter
FROM free2 AS f
LEFT JOIN accounts.daily_movement AS d
ON d.day = f.day
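The reason the window-function attempt repeats rows is that a window function keeps every input row, whereas GROUP BY collapses them. Aggregating in a CTE first and joining afterwards, as the answer does, avoids that. Here is a minimal sketch of the difference in SQLite (via Python's sqlite3 module) rather than BigQuery; the tables bets and daily_movement, and their few rows, are invented stand-ins for scratch.free and accounts.daily_movement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bets (day TEXT, size REAL);
CREATE TABLE daily_movement (day TEXT, total_euros REAL);
INSERT INTO bets VALUES
  ('2017-04-29', 10), ('2017-04-29', 5), ('2017-04-30', 7);
INSERT INTO daily_movement VALUES
  ('2017-04-29', 100), ('2017-04-30', 200);
""")

# Window function: one output row per input row, so daily sums repeat.
windowed = conn.execute("""
SELECT day, SUM(size) OVER (PARTITION BY day) AS size
FROM bets
ORDER BY day DESC
""").fetchall()

# GROUP BY inside a CTE: one row per day, then the join adds the other columns.
grouped = conn.execute("""
WITH daily AS (
    SELECT day, SUM(size) AS size
    FROM bets
    GROUP BY day
)
SELECT daily.day, daily.size, d.total_euros
FROM daily
LEFT JOIN daily_movement AS d ON d.day = daily.day
ORDER BY daily.day DESC
""").fetchall()

print(windowed)
print(grouped)
```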