PostgreSQL window function "lag()" only pulls from current resultset - sql

I'm making a stock ticker as a learning experience for PostgreSQL and AngularJS.
In my ticker query, I attempt to discover the change in price from the previous day. I'm implementing the DB queries in PHP right now for ease of testing and I'll port to AngularJS later.
DB Setup
prices
--pk
--fund (foreign key to funds.pk)
--price
--price_date
funds
--pk
--fund_name
--summary
Query
Get the latest price and the price before it (as well as other info) for each fund with an entry in the prices table.
This $query is a single line in my PHP file.
$query = 'SELECT prices.price_date,
prices.price,
(lag(prices.price) over (ORDER BY prices.price_date DESC)) as last_price,
prices.fund,
funds.fund_name
FROM prices
INNER JOIN funds ON prices.fund=funds.pk
WHERE price_date=(SELECT price_date FROM prices ORDER BY price_date DESC LIMIT 1)';
Result
[
{"price_date":"2015-09-08","price":"17.5901","last_price":null,"fund":"1","fund_name":"L Income"},
{"price_date":"2015-09-08","price":"22.8859","last_price":"17.5901","fund":"2","fund_name":"L 2020"},
{"price_date":"2015-09-08","price":"24.6693","last_price":"22.8859","fund":"3","fund_name":"L 2030"},
{"price_date":"2015-09-08","price":"26.1456","last_price":"24.6693","fund":"4","fund_name":"L 2040"},
{"price_date":"2015-09-08","price":"14.7756","last_price":"26.1456","fund":"5","fund_name":"L 2050"},
{"price_date":"2015-09-08","price":"14.8181","last_price":"14.7756","fund":"6","fund_name":"G Fund"},
{"price_date":"2015-09-08","price":"16.93","last_price":"14.8181","fund":"7","fund_name":"F Fund"},
{"price_date":"2015-09-08","price":"26.369","last_price":"16.93","fund":"8","fund_name":"C Fund"},
{"price_date":"2015-09-08","price":"35.9595","last_price":"26.369","fund":"9","fund_name":"S Fund"},
{"price_date":"2015-09-08","price":"24.0362","last_price":"35.9595","fund":"10","fund_name":"I Fund"}
]
As you can see, the lag() window function is only drawing on the current resultset for pulling the previous record's prices.price field.
I am at a loss now. Does anyone have guidance?

I am guessing that you want the previous day's price for the fund. This requires a partition by clause:
SELECT p.price_date, p.price,
lag(p.price) over (PARTITION BY p.fund ORDER BY p.price_date) as last_price,
p.fund, f.fund_name
FROM prices p INNER JOIN
funds f
ON p.fund = f.pk ;
If you then want this only for the last date, then use a subquery:
SELECT pf.*
FROM (SELECT p.price_date, p.price,
lag(p.price) over (PARTITION BY p.fund ORDER BY p.price_date) as last_price,
p.fund, f.fund_name
FROM prices p INNER JOIN
funds f
ON p.fund = f.pk
) pf
WHERE price_date = (SELECT price_date FROM prices ORDER BY price_date DESC LIMIT 1);
The WHERE clause is evaluated before window functions, so a filter at the same query level restricts which rows LAG() can see; applying the filter in the outer query leaves the window untouched. Note: this assumes that the max price_date is the same for all funds, but this is the logic in the question.
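To make the evaluation order concrete, here is a minimal sketch that runs both variants against an in-memory database. It uses Python's built-in sqlite3 module (SQLite 3.25 or later) as a stand-in for PostgreSQL, and the four price rows are made up for illustration. The window is ordered by ascending date so that lag() looks at the earlier row.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; the sample rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prices (fund INTEGER, price REAL, price_date TEXT);
    INSERT INTO prices VALUES
        (1, 17.50, '2015-09-04'), (1, 17.59, '2015-09-08'),
        (2, 22.80, '2015-09-04'), (2, 22.89, '2015-09-08');
""")

LATEST = "(SELECT max(price_date) FROM prices)"

# Filter at the same level: WHERE runs first, so each partition holds
# only the latest row and lag() has nothing to look back at.
filter_first = conn.execute(f"""
    SELECT fund, price,
           lag(price) OVER (PARTITION BY fund ORDER BY price_date) AS last_price
    FROM prices
    WHERE price_date = {LATEST}
    ORDER BY fund
""").fetchall()

# Window in a subquery, filter outside: lag() sees every row of the fund.
window_first = conn.execute(f"""
    SELECT fund, price, last_price
    FROM (
        SELECT fund, price, price_date,
               lag(price) OVER (PARTITION BY fund ORDER BY price_date) AS last_price
        FROM prices
    ) pf
    WHERE price_date = {LATEST}
    ORDER BY fund
""").fetchall()

print(filter_first)   # last_price is NULL (None) for every fund
print(window_first)   # last_price holds the earlier price of the same fund
```

The only difference between the two statements is where the date filter sits, which is exactly what decides whether lag() can reach the earlier row.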

If you need to compare it to the price from previous day, you should use a conditional so it always picks up values from the previous day.
SELECT prices.price_date,
prices.price,
case when price_date = (select max(price_date) from prices) then
lag(prices.price) over (ORDER BY prices.price_date)
end as last_price,
prices.fund,
funds.fund_name
FROM prices
INNER JOIN funds ON prices.fund = funds.pk
WHERE price_date=(SELECT price_date FROM prices ORDER BY price_date DESC LIMIT 1)

Related

How do I apply a condition on a related table for the last records?

I have three tables related this way:
Store (a store has many products)
Products (a product has many product stock histories)
Product_Stock_History
Products has a field named status. It has the current stock status. The possible values are 1 (in stock) or any other value (not in stock).
Product_Stock_History also has a status field, with the same possible values.
The query I want to build in SQL is:
For all stores, I want to get all products not in stock, which have the latest 2 records in their history not in stock either.
In short, I want to know which products have been out of stock for 3 days.
I would also like to know how to build the index, so this query runs efficiently.
select p.product_id
from Products p inner join Product_Stock_History ph
on ph.product_id = p.product_id
where p.status <> 1 and ph.status <> 1 and ph.date > current_date - interval '3 days'
group by p.product_id
having count(*) = 2
Without referencing current date:
select p.product_id
from Products p inner join Product_Stock_History ph
on ph.product_id = p.product_id
where p.status <> 1
qualify
row_number() over (
partition by p.product_id order by date desc
) = 1 and
count(*) filter (where ph.status <> 1) over (
partition by p.product_id order by date desc
rows between current row and 1 following
) = 2
Since Postgres doesn't support QUALIFY, you'll have to compute those window values in a CTE (or subquery) first and filter on them afterwards.
with data as (
select p.product_id,
row_number() over (
partition by p.product_id order by dt desc
) rn,
count(*) filter (where ph.status <> 1) over (
partition by p.product_id order by dt desc
rows between current row and 1 following
) ct
from Products p inner join Product_Stock_History ph
on ph.product_id = p.product_id
where p.status <> 1
)
select * from data where rn = 1 and ct = 2;
https://dbfiddle.uk/?rdbms=postgres_14&fiddle=ad252c27153626eb6c3e33fae5ab1eb7
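A small runnable sketch of the CTE approach, using Python's sqlite3 module as a stand-in for Postgres. The sample rows and the dt column values are invented, and sum(CASE ...) replaces count(*) FILTER only to stay compatible with older SQLite builds; the logic is otherwise the same.

```python
import sqlite3

# SQLite stands in for Postgres; all rows below are made-up test data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, status INTEGER);
    CREATE TABLE product_stock_history (product_id INTEGER, dt TEXT, status INTEGER);
    INSERT INTO products VALUES (1, 0), (2, 0), (3, 1);
    INSERT INTO product_stock_history VALUES
        (1, '2022-01-01', 1), (1, '2022-01-02', 0), (1, '2022-01-03', 0),
        (2, '2022-01-01', 0), (2, '2022-01-02', 1), (2, '2022-01-03', 0),
        (3, '2022-01-02', 0), (3, '2022-01-03', 0);
""")

rows = conn.execute("""
    WITH data AS (
        SELECT p.product_id,
               -- rn = 1 marks the newest history row of each product
               row_number() OVER (
                   PARTITION BY p.product_id ORDER BY ph.dt DESC
               ) AS rn,
               -- counts out-of-stock rows among this row and the next older one
               sum(CASE WHEN ph.status <> 1 THEN 1 ELSE 0 END) OVER (
                   PARTITION BY p.product_id ORDER BY ph.dt DESC
                   ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING
               ) AS ct
        FROM products p
        JOIN product_stock_history ph ON ph.product_id = p.product_id
        WHERE p.status <> 1
    )
    SELECT product_id FROM data WHERE rn = 1 AND ct = 2
    ORDER BY product_id
""").fetchall()

out_of_stock = [product_id for (product_id,) in rows]
print(out_of_stock)
```

Product 2 is currently out of stock but was in stock one record earlier, so it is filtered out; product 3 never reaches the window step because its current status is 1.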
Try this
Select p.* from products p where p.productid in (
Select t.productid from (
Select psh.productid,
row_number() over (partition by psh.productid order by psh.versionid desc) rn from Product_Stock_History psh where psh.status<>1
) t
where
t.rn<=2) and date_col = current_date-3 and status<>1
You can do what you want with a window query like Himanshu or a group by/having like shawnt00. Or you can reorganize your schema to keep it simple.
Instead of storing a flag, store two timestamps: stocked_at and out_of_stock_at.
stores
products
store_products
store_id references stores
product_id references products
unique(store_id, product_id)
stocked_at timestamp,
out_of_stock_at timestamp,
check (stocked_at != out_of_stock_at)
Calculate its status from them.
select
stocked_at > out_of_stock_at as in_stock
from store_products
You can make this convenient with a generated column.
in_stock boolean generated always as (stocked_at > out_of_stock_at) stored
In short, I want to know which products have been out of stock for 3 days.
select product_id
from store_products
where not in_stock
and out_of_stock_at < current_timestamp - '3 days'::interval
I would also like to know how to build the index, so this query runs efficiently.
Make a composite index on (out_of_stock_at, stocked_at).
Status flags can often be replaced by join tables.
We can make one critical observation.
A store's catalog is different from its inventory.
So we have...
There are products.
There are stores.
Stores have a catalog of products they offer.
Stores have an inventory of products they have in stock.
Expressed as tables and constraints...
stores
products
store_product_catalog
store_id references stores
product_id references products
unique(store_id, product_id)
-- This allows a store to have inventory not in their catalog.
-- If you don't want that, give store_product_catalog an id
-- and relate store_product_inventory to store_product_catalog
store_product_inventory
store_id references stores
product_id references products
unique(store_id, product_id)
quantity
updated_at
Write an update trigger to change store_product_inventory.updated_at when the store_product_inventory.quantity changes.
In short, I want to know which products have been out of stock for 3 days.
select product_id
from store_product_inventory
where quantity = 0
and updated_at < current_timestamp - '3 days'::interval
I would also like to know how to build the index, so this query runs efficiently.
Make a composite index on (quantity, updated_at).
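As a rough sketch of the inventory table, the update trigger, and the composite index working together, here is a version that runs in SQLite through Python's sqlite3 module. The trigger syntax is SQLite's (in PostgreSQL you would write a trigger function instead), and the trigger name, index name, and sample rows are all hypothetical.

```python
import sqlite3

# SQLite stands in for PostgreSQL; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_product_inventory (
        store_id   INTEGER NOT NULL,
        product_id INTEGER NOT NULL,
        quantity   INTEGER NOT NULL,
        updated_at TEXT    NOT NULL,
        UNIQUE (store_id, product_id)
    );

    -- Composite index matching the "quantity = 0 AND updated_at < ..." filter.
    CREATE INDEX inventory_quantity_updated_at
        ON store_product_inventory (quantity, updated_at);

    -- Refresh updated_at only when the quantity actually changes.
    CREATE TRIGGER inventory_touch
    AFTER UPDATE OF quantity ON store_product_inventory
    WHEN NEW.quantity <> OLD.quantity
    BEGIN
        UPDATE store_product_inventory
        SET updated_at = datetime('now')
        WHERE store_id = NEW.store_id AND product_id = NEW.product_id;
    END;

    INSERT INTO store_product_inventory VALUES
        (1, 10, 0, '2000-01-01 00:00:00'),   -- out of stock for a long time
        (1, 11, 5, '2000-01-01 00:00:00');   -- in stock, sells out below
""")

# Product 11 sells out now, so the trigger stamps it with the current time.
conn.execute("UPDATE store_product_inventory SET quantity = 0 WHERE product_id = 11")

stale = conn.execute("""
    SELECT product_id
    FROM store_product_inventory
    WHERE quantity = 0
      AND updated_at < datetime('now', '-3 days')
    ORDER BY product_id
""").fetchall()

print(stale)  # only the product that has been at zero for more than 3 days
```

Because the equality column (quantity) comes first in the index and the range column (updated_at) second, the filter can be answered from one contiguous slice of the index.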

SQL How to select customers with highest transaction amount by state

I am trying to write a SQL query that returns the name and purchase amount of the five customers in each state who have spent the most money.
Table schemas
customers
|_state
|_customer_id
|_customer_name
transactions
|_customer_id
|_transact_amt
Attempts look something like this
SELECT state, Sum(transact_amt) AS HighestSum
FROM (
SELECT name, transactions.transact_amt, SUM(transactions.transact_amt) AS HighestSum
FROM customers
INNER JOIN customers ON transactions.customer_id = customers.customer_id
GROUP BY state
) Q
GROUP BY transact_amt
ORDER BY HighestSum
I'm lost. Thank you.
Expected results are the names of customers with the top 5 highest transactions in each state.
ERROR: table name "customers" specified more than once
SQL state: 42712
First, you need for your JOIN to be correct. Second, you want to use window functions:
SELECT ct.*
FROM (SELECT c.customer_id, c.customer_name, c.state, SUM(t.transact_amt) AS total,
ROW_NUMBER() OVER (PARTITION BY c.state ORDER BY SUM(t.transact_amt) DESC) as seqnum
FROM customers c JOIN
transactions t
ON t.customer_id = c.customer_id
GROUP BY c.customer_id, c.customer_name, c.state
) ct
WHERE seqnum <= 5;
You seem to have several issues with SQL. I would start with understanding aggregation functions. You have a SUM() with the alias HighestSum. It is simply the total per customer.
You can get them using aggregation and then by using the RANK() window function. For example:
select
state,
rk,
customer_name
from (
select
*,
rank() over(partition by state order by total desc) as rk
from (
select
c.customer_id,
c.customer_name,
c.state,
sum(t.transact_amt) as total
from customers c
join transactions t on t.customer_id = c.customer_id
group by c.customer_id, c.customer_name, c.state
) x
) y
where rk <= 5
order by state, rk
There are two valid answers already. Here's a third:
SELECT *
FROM (
SELECT c.state, c.customer_name, t.*
, row_number() OVER (PARTITION BY c.state ORDER BY t.transact_sum DESC NULLS LAST, customer_id) AS rn
FROM (
SELECT customer_id, sum(transact_amt) AS transact_sum
FROM transactions
GROUP BY customer_id
) t
JOIN customers c USING (customer_id)
) sub
WHERE rn < 6
ORDER BY state, rn;
Major points
When aggregating all or most rows of a big table, it's typically substantially faster to aggregate before the join. Assuming referential integrity (FK constraints), we won't be aggregating rows that would be filtered otherwise. This might change from nice-to-have to a pure necessity when joining to more aggregated tables. Related:
Why does the following join increase the query time significantly?
Two SQL LEFT JOINS produce incorrect result
Add additional ORDER BY item(s) in the window function to define which rows to pick from ties. In my example, it's simply customer_id. If you have no tiebreaker, results are arbitrary in case of a tie, which may be OK. But every other execution might return different results, which typically is a problem. Or you include all ties in the result. Then we are back to rank() instead of row_number(). See:
PostgreSQL equivalent for TOP n WITH TIES: LIMIT "with ties"?
As long as transact_amt can be NULL (which has not been ruled out), any sum may end up being NULL as well. With an unsuspecting ORDER BY t.transact_sum DESC, those customers come out on top, because NULL sorts first in descending order. Use DESC NULLS LAST to avoid this pitfall. (Or define the column transact_amt as NOT NULL.)
PostgreSQL sort by datetime asc, null first?
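The first two points (aggregate before the join, and add a tiebreaker) can be sketched as follows. This runs on SQLite through Python's sqlite3 module rather than PostgreSQL, the customers and amounts are invented, and TOP_N is set to 1 instead of 5 so that the tie between two customers actually decides who is returned.

```python
import sqlite3

# SQLite stands in for PostgreSQL; names and amounts are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY, customer_name TEXT, state TEXT
    );
    CREATE TABLE transactions (customer_id INTEGER, transact_amt REAL);
    INSERT INTO customers VALUES
        (1, 'Ann', 'NY'), (2, 'Bob', 'NY'), (3, 'Cy', 'NY'), (4, 'Dee', 'CA');
    INSERT INTO transactions VALUES
        (1, 50), (1, 50), (2, 100), (3, 30), (4, 70);
""")

TOP_N = 1  # the question asks for 5; 1 keeps the sample data small

top = conn.execute("""
    SELECT state, customer_name, transact_sum
    FROM (
        SELECT c.state, c.customer_name, t.transact_sum,
               row_number() OVER (
                   PARTITION BY c.state
                   -- customer_id breaks ties, so the result is repeatable
                   ORDER BY t.transact_sum DESC, c.customer_id
               ) AS rn
        FROM (
            -- aggregate first, then join the much smaller result
            SELECT customer_id, sum(transact_amt) AS transact_sum
            FROM transactions
            GROUP BY customer_id
        ) t
        JOIN customers c ON c.customer_id = t.customer_id
    ) sub
    WHERE rn <= ?
    ORDER BY state, rn
""", (TOP_N,)).fetchall()

print(top)
```

Ann and Bob both total 100 in NY; with customer_id as the tiebreaker, Ann is picked on every run instead of whichever row the engine happens to see first.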

SQL: Take 1 value per grouping

I have a very simplified table / view like below to illustrate the issue:
The stock column represents the current stock quantity of the style at the retailer. The reason the stock column is included is to avoid joins for reporting. (the table is created for reporting only)
I want to query the table to get what is currently in stock, grouped by stylenumber (across retailers). Like:
select stylenumber,sum(sold) as sold,Max(stock) as stockcount
from MGTest
I Expect to get Stylenumber, Total Sold, Most Recent Stock Total:
A, 6, 15
B, 1, 6
But using ...Max(Stock) I get 10, and with (Sum) I get 25....
I have tried with over(partition.....) also without any luck...
How do I solve this?
I would answer this using window functions:
SELECT Stylenumber, Date, TotalStock
FROM (SELECT M.Stylenumber, M.Date, SUM(M.Stock) as TotalStock,
ROW_NUMBER() OVER (PARTITION BY M.Stylenumber ORDER BY M.Date DESC) as seqnum
FROM MGTest M
GROUP BY M.Stylenumber, M.Date
) m
WHERE seqnum = 1;
The query is a bit tricky since you want a cumulative total of the Sold column, but only the total of the Stock column for the most recent date. I didn't actually try running this, but something like the query below should work. However, because of the shape of your schema this isn't the most performant query in the world since it is scanning your table multiple times to join all of the data together:
SELECT MDate.Stylenumber, MDate.TotalSold, MStock.TotalStock
FROM (SELECT M.Stylenumber, MAX(M.Date) MostRecentDate, SUM(M.Sold) TotalSold
FROM [MGTest] M
GROUP BY M.Stylenumber) MDate
INNER JOIN (SELECT M.Stylenumber, M.Date, SUM(M.Stock) TotalStock
FROM [MGTest] M
GROUP BY M.Stylenumber, M.Date) MStock ON MDate.Stylenumber = MStock.Stylenumber AND MDate.MostRecentDate = MStock.Date
You can do something like this
SELECT B.Stylenumber,SUM(B.Sold),SUM(B.Stock) FROM
(SELECT Stylenumber AS 'Stylenumber',SUM(Sold) AS 'Sold',MAX(Stock) AS 'Stock'
FROM MGTest A
GROUP BY RetailerId,Stylenumber) B
GROUP BY B.Stylenumber
if you don't want to use joins
My solution, like that of Gordon Linoff, will use the window functions. But in my case, everything will turn around the RANK window function.
SELECT stylenumber, sold, SUM(stock) totalstock
FROM (
SELECT
stylenumber,
SUM(sold) OVER(PARTITION BY stylenumber) sold,
RANK() OVER(PARTITION BY stylenumber ORDER BY [Date] DESC) r,
stock
FROM MGTest
) T
WHERE r = 1
GROUP BY stylenumber, sold
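Here is a runnable sketch of the RANK approach. The original sample table is not shown in the question, so the rows below are invented to reproduce the numbers it mentions (Max gives 10, Sum gives 25, expected 15 for style A); the RetailerId and Date column names are taken from the answers. It uses SQLite through Python's sqlite3 module, not SQL Server.

```python
import sqlite3

# SQLite stands in for SQL Server; rows are invented to match the
# figures quoted in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MGTest (
        RetailerId INTEGER, Stylenumber TEXT, Date TEXT,
        Sold INTEGER, Stock INTEGER
    );
    INSERT INTO MGTest VALUES
        (1, 'A', '2019-01-01', 2, 10),
        (1, 'A', '2019-01-02', 1, 10),
        (2, 'A', '2019-01-02', 3, 5),
        (1, 'B', '2019-01-02', 1, 6);
""")

rows = conn.execute("""
    SELECT Stylenumber, TotalSold, sum(Stock) AS TotalStock
    FROM (
        SELECT Stylenumber, Stock,
               -- total sold over all dates and retailers of the style
               sum(Sold) OVER (PARTITION BY Stylenumber) AS TotalSold,
               -- r = 1 for every row on the style's most recent date
               rank() OVER (PARTITION BY Stylenumber ORDER BY Date DESC) AS r
        FROM MGTest
    ) t
    WHERE r = 1
    GROUP BY Stylenumber, TotalSold
    ORDER BY Stylenumber
""").fetchall()

print(rows)
```

RANK rather than ROW_NUMBER matters here: several retailers share the most recent date, and all of their stock rows must survive the r = 1 filter before being summed.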

How do I proceed on this query

I want to know if there's a way to display more than one column on an aggregate result but without it affecting the group by.
I need to display the name alongside an aggregate result, but I have no idea what I am missing here.
This is the data I'm working with:
It is the result of the following query:
select * from Salesman, Sale,Buyer
where Salesman.ID = Buyer.Salesman_ID and Buyer.ID = sale.Buyer_ID
I need to find the salesman that sold the most stuff (total price) for a specific year.
This is what I have so far:
select DATEPART(year,sale.sale_date)'year', Salesman.First_Name,sum(sale.price)
from Salesman, Sale,Buyer
where Salesman.ID = Buyer.Salesman_ID and Buyer.ID = sale.Buyer_ID
group by DATEPART(year,sale.sale_date),Salesman.First_Name
This returns me the total sales made by each salesman.
How do I continue from here to get the top salesman of each year?
Maybe the query I am doing is completely wrong and there is a better way?
Any advice would be helpful.
Thanks.
This should work for you:
select *
from(
select DATEPART(year,s.sale_date) as SalesYear -- Avoid reserved words for object names
,sm.First_Name
,sum(s.price) as TotalSales
,row_number() over (partition by DATEPART(year,s.sale_date) -- Rank the data within the same year as this data row.
order by sum(s.price) desc -- Order by the sum total of sales price, with the largest first (Descending). This means that rank 1 is the highest amount.
) as SalesRank -- Orders your salesmen by the total sales within each year, with 1 as the best.
from Buyer b
inner join Sale s
on(b.ID = s.Buyer_ID)
inner join Salesman sm
on(sm.ID = b.Salesman_ID)
group by DATEPART(year,s.sale_date)
,sm.First_Name
) a
where SalesRank = 1 -- This means you only get the top salesman for each year.
First, never use commas in the FROM clause. Always use explicit JOIN syntax.
The answer to your question is to use window functions. If there is a tie and you want all tied rows, then use RANK() or DENSE_RANK(). If you always want exactly one row, even if there are ties, then use ROW_NUMBER().
select ss.*
from (select year(s.sale_date) as yyyy, sm.First_Name, sum(s.price) as total_price,
row_number() over (partition by year(s.sale_date)
order by sum(s.price) desc
) as seqnum
from Salesman sm join
Sale s
on sm.ID = s.Salesman_ID
group by year(s.sale_date), sm.First_Name
) ss
where seqnum = 1;
Note that the Buyer table is unnecessary for this query.
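The difference between the ranking functions on ties can be checked with a short sketch. It uses SQLite via Python's sqlite3 module instead of SQL Server, and the yearly_sales table is a hypothetical stand-in for the already-aggregated result of the GROUP BY (year, salesman, total), with invented figures.

```python
import sqlite3

# SQLite stands in for SQL Server; yearly_sales is a made-up,
# pre-aggregated table used only to show tie handling.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yearly_sales (
        sales_year INTEGER, first_name TEXT, total_price INTEGER
    );
    INSERT INTO yearly_sales VALUES
        (2018, 'Ann', 900), (2018, 'Bob', 900), (2018, 'Cy', 400),
        (2019, 'Ann', 300), (2019, 'Bob', 700);
""")

def top_per_year(ranking_function):
    """Return (year, name) rows ranked first by the given window function."""
    # The name is interpolated into the SQL text, so only accept known values.
    if ranking_function not in ("row_number", "rank", "dense_rank"):
        raise ValueError(ranking_function)
    return conn.execute(f"""
        SELECT sales_year, first_name
        FROM (
            SELECT sales_year, first_name,
                   {ranking_function}() OVER (
                       PARTITION BY sales_year
                       ORDER BY total_price DESC
                   ) AS seqnum
            FROM yearly_sales
        ) ss
        WHERE seqnum = 1
        ORDER BY sales_year, first_name
    """).fetchall()

print(top_per_year("rank"))        # both tied salesmen appear for 2018
print(top_per_year("row_number"))  # exactly one row per year
```

With row_number and no tiebreaker in the ORDER BY, which of the two tied salesmen is returned for 2018 is up to the engine, so only the row count is predictable.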

SQL join using TOP condition in subquery

I'm writing a stock program to improve my programming skills and I've hit a roadblock.
I have three tables I'm working with:
**stocks**
---------
id
name
symbol
ipo_year
sector
industry
**stock_trends**
----------------
stock_id
trend_id
direction_id
date
price
breakout_price
**trends**
----------
id
type
An entry is made into the stock_trends table for that stock when a condition of one of my four trends is met.
What I'm looking to do is create a query that will return all the information from the stock table and the date from the stock_trends table where the most recent entry in stock_trends for that stock is the trend_id I'm interested in looking at.
I have this query that works great, which returns the most recent trend for a single stock.
SELECT top 1 stock_id, trend_id, [timestamp], price, breakout_price from stock_trends
WHERE stock_id = #stock_id and trend_id = #trend_id order by [timestamp] desc
I just haven't been able to figure out how to write a query that returns the stocks whose top entry in the stock_trends table is the trend I wish to analyze.
Thanks in advance for your help!
Edit
So I have made some progress and I'm almost there. I'm using this query to return the max "timestamp" (it's really a date, just have to fix it) for each stock.
select s.*, v.latest_trend_date from stocks s
join(select stock_id, MAX(timestamp) as latest_trend_date from stock_trends st
group by st.stock_id) v on v.stock_id = s.id
Now if I could only find a way to determine which trend_id "latest_trend_date" is associated with I would be all set!
select stock_id
from stock_trends
where trend_id = (select top 1 trend_id
from stock_trends
order by [timestamp] desc)
This will select all the stock_id that are in the stock_trends table, with the same trend_id as the most recent entry in the stock_trends table.
See if something like this works:
with TrendsRanked as (
select
*,
rank() over (
partition by stock_id
order by [date] desc
) as daterank_by_stock
from stock_trends
)
select
s.id, s.name, s.symbol,
T.[date]
from stocks as s
join TrendsRanked as T
on T.stock_id = s.id
where T.daterank_by_stock = 1
and T.trend_id = #my_trend
The idea here is to add a date ranking to the stock_trends table: for a given stock, daterank_by_stock will equal 1 for the most recent stock_trends row (including ties) for that stock.
Then in the main query, the only results will be those that match the trend you're following (#my_trend) for a row in stock_trends ranked #1.
This gives what I think you want - stock information for stocks whose latest stock_trends entry happens to be an entry for the trend you're following.
I'm assuming that [timestamp] in your original query is "date" from your table model.
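A quick way to check this logic is a small sketch on sample data. It uses SQLite via Python's sqlite3 module in place of SQL Server, trims the tables to the columns the query needs, and the two stocks and their trend rows are invented; the helper function name is also hypothetical.

```python
import sqlite3

# SQLite stands in for SQL Server; stocks and trend rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stocks (id INTEGER PRIMARY KEY, name TEXT, symbol TEXT);
    CREATE TABLE stock_trends (stock_id INTEGER, trend_id INTEGER, date TEXT);
    INSERT INTO stocks VALUES (1, 'Acme', 'ACM'), (2, 'Globex', 'GBX');
    INSERT INTO stock_trends VALUES
        (1, 1, '2013-01-01'), (1, 2, '2013-02-01'),
        (2, 2, '2013-01-01'), (2, 1, '2013-02-01');
""")

def stocks_whose_latest_trend_is(trend_id):
    """Stocks whose most recent stock_trends row has the given trend_id."""
    return conn.execute("""
        WITH TrendsRanked AS (
            SELECT stock_id, trend_id, date,
                   rank() OVER (
                       PARTITION BY stock_id ORDER BY date DESC
                   ) AS daterank_by_stock
            FROM stock_trends
        )
        SELECT s.id, s.name, s.symbol, t.date
        FROM stocks AS s
        JOIN TrendsRanked AS t ON t.stock_id = s.id
        WHERE t.daterank_by_stock = 1   -- rank first ...
          AND t.trend_id = ?            -- ... then test the trend
        ORDER BY s.id
    """, (trend_id,)).fetchall()

print(stocks_whose_latest_trend_is(2))
print(stocks_whose_latest_trend_is(1))
```

Both stocks have an entry for trend 2, but only the first one has it as its latest entry. Ranking over all trends before filtering on trend_id is what separates this from simply taking the newest row of a given trend.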
select s.*, st1.trend_id, st1.timestamp
from stocks as s
inner join (
select top 1 stock_id, trend_id, [timestamp] as timestamp
from stock_trends
where stock_id = #stock_id and trend_id = #trend_id
order by [timestamp] desc
) as st1
on s.id = st1.stock_id
Putting the sub-query in as a joined in-line view lets you easily include the date in the results, as your specification asks:
"... return all the information from the stock table and the date from the stock_trends table ..."