SQL join using TOP condition in subquery - sql

I'm writing a stock program to improve my programming skills and I've hit a roadblock.
I have three tables I'm working with:
**stocks**
---------
id
name
symbol
ipo_year
sector
industry
**stock_trends**
----------------
stock_id
trend_id
direction_id
date
price
breakout_price
**trends**
----------
id
type
An entry is made into the stock_trends table for that stock when the condition of one of my four trends is met.
What I'm looking to do is create a query that will return all the information from the stock table and the date from the stock_trends table where the most recent entry in stock_trends for that stock is the trend_id I'm interested in looking at.
I have this query, which works great and returns the most recent trend for a single stock.
SELECT top 1 stock_id, trend_id, [timestamp], price, breakout_price from stock_trends
WHERE stock_id = @stock_id and trend_id = @trend_id order by [timestamp] desc
I just haven't been able to figure out how to write a query that returns the stocks whose top entry in the stock_trends table is the trend I wish to analyze.
Thanks in advance for your help!
Edit
So I have made some progress and I'm almost there. I'm using this query to return the max "timestamp" (it's really a date, just have to fix it) for each stock.
select s.*, v.latest_trend_date from stocks s
join(select stock_id, MAX(timestamp) as latest_trend_date from stock_trends st
group by st.stock_id) v on v.stock_id = s.id
Now if I could only find a way to determine which trend_id "latest_trend_date" is associated with I would be all set!

select stock_id
from stock_trends
where trend_id = (select top 1 trend_id
from stock_trends
order by [timestamp] desc)
This will select all the stock_id that are in the stock_trends table, with the same trend_id as the most recent entry in the stock_trends table.

See if something like this works:
with TrendsRanked as (
select
*,
rank() over (
partition by stock_id
order by [date] desc
) as daterank_by_stock
from stock_trends
)
select
s.id, s.name, s.symbol,
T.[date]
from stocks as S
join TrendsRanked as T
on T.stock_id = S.id
where T.daterank_by_stock = 1
and T.trend_id = @my_trend
The idea here is to add a date ranking to the stock_trends table: for a given stock, daterank_by_stock will equal 1 for the most recent stock_trends row (including ties) for that stock.
Then in the main query, the only results will be those that match the trend you're following (@my_trend) for a row in stock_trends ranked #1.
This gives what I think you want - stock information for stocks whose latest stock_trends entry happens to be an entry for the trend you're following.
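To see the ranking idea end to end, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. The sample rows, the reduced column set, and the helper name stocks_with_latest_trend are made up for illustration, and it assumes an SQLite build with window functions (3.25 or later); on SQL Server the parameter would be @my_trend rather than ?.

```python
import sqlite3

# Hypothetical sample data; table and column names follow the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stocks (id INTEGER PRIMARY KEY, name TEXT, symbol TEXT);
    CREATE TABLE stock_trends (stock_id INTEGER, trend_id INTEGER, date TEXT);
    INSERT INTO stocks VALUES (1, 'Acme', 'ACM'), (2, 'Bolt', 'BLT'), (3, 'Cog', 'COG');
    INSERT INTO stock_trends VALUES
        (1, 2, '2015-01-05'), (1, 4, '2015-02-10'),
        (2, 4, '2015-01-07'), (2, 1, '2015-03-01'),
        (3, 4, '2015-02-20');
""")

LATEST_TREND_SQL = """
    WITH TrendsRanked AS (
        SELECT *,
               RANK() OVER (PARTITION BY stock_id ORDER BY date DESC) AS daterank_by_stock
        FROM stock_trends
    )
    SELECT s.id, s.name, s.symbol, t.date
    FROM stocks AS s
    JOIN TrendsRanked AS t ON t.stock_id = s.id
    WHERE t.daterank_by_stock = 1   -- keep only each stock's most recent entry
      AND t.trend_id = ?            -- and only if that entry is the trend of interest
    ORDER BY s.id
"""

def stocks_with_latest_trend(trend_id):
    """Return stocks whose most recent stock_trends row has the given trend_id."""
    return conn.execute(LATEST_TREND_SQL, (trend_id,)).fetchall()

rows = stocks_with_latest_trend(4)
print(rows)
```

Stock 2 has an older entry for trend 4, but its latest entry is trend 1, so it is left out when asking for trend 4.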

I'm assuming that [timestamp] in your original query is "date" from your table model.
select s.*, st1.trend_id, st1.timestamp
from stocks as s
inner join (
select top 1 stock_id, trend_id, [timestamp] as timestamp
from stock_trends
where stock_id = @stock_id and trend_id = @trend_id
order by [timestamp] desc
) as st1
on s.id = st1.stock_id
Putting the sub-query in as a joined in-line view lets you easily include the date in the results, as you were looking to do in your specification.
"... return all the information from the stock table and the date from the stock_trends table ..."

Related

Improve performance of select on select query using temp table

As for the table structure, the table holds weekly product prices per country.
My goal here is to select the lowest price of each product for the most recent week/year per country per product.
The query below fulfills this goal, but is pretty slow performance wise. I was wondering if there is a more efficient way of doing the same task.
In the first part I'm selecting the latest year and week of prices per country. I included the CASE WHEN to account for the new year.
I'm saving this in a #temptable.
Then I am selecting the min price based on the previous selected Year, Week and Country combo.
DECLARE @date DATE SET @date=getdate()
SELECT YearNb, Max(WeekNb) AS WeekNb, ISOCountryCode INTO #TempTable FROM PriceBenchWeekly
WHERE PriceBenchWeekly.YearNb = CASE WHEN DATEPART(ww,@date) = 1 THEN
Year(@date)-1
ELSE
Year(@date)
END
GROUP BY YearNb, ISOCountryCode
SELECT ProdNb,Min(WeeklyPrice) AS MinPrice, MarketPlayerCode, 'MKT' AS PriceOriginTypeCode, NatCoCode
FROM CE.PriceBenchWeekly INNER JOIN #TempTable ON PriceBenchWeekly.YearNb = #TempTable.YearNb AND
PriceBenchWeekly.WeekNb = #TempTable.WeekNb AND PriceBenchWeekly.ISOCountryCode = #TempTable.ISOCountryCode
GROUP BY PriceBenchweekly.YearNb, PriceBenchWeekly.ISOCountryCode, BNCode, MarketPlayerCode
"... the table has weekly product prices for per country. My goal here is to select the lowest price of each product for the most recent week/year per country per product."
Use window functions. Without sample data and desired results, it is a little hard to figure out what you really want. But the following gets the minimum price for each product from the most recent week in the data:
select pbw.*
from (select pbw.*,
min(weeklyprice) over (partition by prodnb) as min_weeklyprice
from (select pbw.*,
dense_rank() over (order by YearNb desc, weeknb desc) as seqnum
from CE.PriceBenchWeekly pbw
) pbw
where seqnum = 1
) pbw
where weeklyprice = min_weeklyprice;
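As a rough illustration of the same window-function approach applied per country (which is what the question asks for), the sketch below ranks weeks within each country and then takes the minimum price per product in the top-ranked week. It runs on an in-memory SQLite database (3.25 or later assumed); the sample rows are invented, and only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PriceBenchWeekly (
        YearNb INTEGER, WeekNb INTEGER, ISOCountryCode TEXT,
        ProdNb TEXT, MarketPlayerCode TEXT, WeeklyPrice REAL
    );
    -- Hypothetical rows: DE has data for 2015 week 1, FR only up to 2014 week 52.
    INSERT INTO PriceBenchWeekly VALUES
        (2014, 52, 'DE', 'P1', 'X', 10.0),
        (2015,  1, 'DE', 'P1', 'X', 12.0),
        (2015,  1, 'DE', 'P1', 'Y', 11.0),
        (2015,  1, 'DE', 'P2', 'X', 20.0),
        (2014, 52, 'FR', 'P1', 'X',  9.0),
        (2014, 52, 'FR', 'P1', 'Y',  9.5);
""")

MIN_PRICE_SQL = """
    SELECT ISOCountryCode, ProdNb, MIN(WeeklyPrice) AS MinPrice
    FROM (
        SELECT pbw.*,
               -- seqnum = 1 marks the most recent year/week within each country
               DENSE_RANK() OVER (
                   PARTITION BY ISOCountryCode
                   ORDER BY YearNb DESC, WeekNb DESC
               ) AS seqnum
        FROM PriceBenchWeekly pbw
    ) ranked
    WHERE seqnum = 1
    GROUP BY ISOCountryCode, ProdNb
    ORDER BY ISOCountryCode, ProdNb
"""

min_prices = conn.execute(MIN_PRICE_SQL).fetchall()
print(min_prices)
```

Ordering by year and then week inside the ranking handles the new-year boundary without the CASE expression, since 2015 week 1 sorts ahead of 2014 week 52.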
If you want to go with temp tables, do not create it using select into, use CREATE TABLE #TempTable instead, then you can create a non clustered index for Year, Week and Country code...
Anyway, I would prefer outer apply
SELECT DISTINCT A.ProductCode, A.CountryCode, B.YearNo, B.WeekNo, B.MinPrice
FROM YourTable A
OUTER APPLY (
SELECT TOP 1 YearNo, WeekNo, Min(Price) AS MinPrice
FROM YourTable
WHERE ProductCode = A.ProductCode AND CountryCode = A.CountryCode
GROUP BY YearNo, WeekNo
ORDER BY YearNo DESC, WeekNo DESC
) B

SQL: Take 1 value per grouping

I have a very simplified table / view like below to illustrate the issue:
The stock column represents the current stock quantity of the style at the retailer. The reason the stock column is included is to avoid joins for reporting. (the table is created for reporting only)
I want to query the table to get what is currently in stock, grouped by stylenumber (across retailers). Like:
select stylenumber,sum(sold) as sold,Max(stock) as stockcount
from MGTest
group by stylenumber
I expect to get Stylenumber, Total Sold, Most Recent Stock Total:
A, 6, 15
B, 1, 6
But using Max(stock) I get 10, and with Sum(stock) I get 25.
I have tried with over(partition.....) also without any luck...
How do I solve this?
I would answer this using window functions:
SELECT Stylenumber, Date, TotalStock
FROM (SELECT M.Stylenumber, M.Date, SUM(M.Stock) as TotalStock,
ROW_NUMBER() OVER (PARTITION BY M.Stylenumber ORDER BY M.Date DESC) as seqnum
FROM MGTest M
GROUP BY M.Stylenumber, M.Date
) m
WHERE seqnum = 1;
The query is a bit tricky since you want a cumulative total of the Sold column, but only the total of the Stock column for the most recent date. I didn't actually try running this, but something like the query below should work. However, because of the shape of your schema this isn't the most performant query in the world since it is scanning your table multiple times to join all of the data together:
SELECT MDate.Stylenumber, MDate.TotalSold, MStock.TotalStock
FROM (SELECT M.Stylenumber, MAX(M.Date) MostRecentDate, SUM(M.Sold) TotalSold
FROM [MGTest] M
GROUP BY M.Stylenumber) MDate
INNER JOIN (SELECT M.Stylenumber, M.Date, SUM(M.Stock) TotalStock
FROM [MGTest] M
GROUP BY M.Stylenumber, M.Date) MStock ON MDate.Stylenumber = MStock.Stylenumber AND MDate.MostRecentDate = MStock.Date
You can do something like this
SELECT B.Stylenumber,SUM(B.Sold),SUM(B.Stock) FROM
(SELECT Stylenumber AS 'Stylenumber',SUM(Sold) AS 'Sold',MAX(Stock) AS 'Stock'
FROM MGTest A
GROUP BY RetailerId,Stylenumber) B
GROUP BY B.Stylenumber
if you don't want to use joins
My solution, like that of Gordon Linoff, will use the window functions. But in my case, everything will turn around the RANK window function.
SELECT stylenumber, sold, SUM(stock) totalstock
FROM (
SELECT
stylenumber,
SUM(sold) OVER(PARTITION BY stylenumber) sold,
RANK() OVER(PARTITION BY stylenumber ORDER BY [Date] DESC) r,
stock
FROM MGTest
) T
WHERE r = 1
GROUP BY stylenumber, sold
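For a quick check of the RANK-based approach, here is a small sketch on an in-memory SQLite database (3.25 or later assumed). The original sample table is not shown, so the rows below are hypothetical, chosen only so that they reproduce the numbers quoted in the question (max stock 10, summed stock 25, expected result A, 6, 15 and B, 1, 6).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MGTest (
        RetailerId INTEGER, Stylenumber TEXT, Date TEXT, Sold INTEGER, Stock INTEGER
    );
    -- Hypothetical rows: style A is stocked by two retailers on the latest date.
    INSERT INTO MGTest VALUES
        (1, 'A', '2016-01-01', 2, 10),
        (1, 'A', '2016-01-02', 2, 10),
        (2, 'A', '2016-01-02', 2,  5),
        (1, 'B', '2016-01-02', 1,  6);
""")

STOCK_SQL = """
    SELECT Stylenumber, Sold, SUM(Stock) AS TotalStock
    FROM (
        SELECT Stylenumber,
               SUM(Sold) OVER (PARTITION BY Stylenumber) AS Sold,  -- total over all dates
               RANK() OVER (PARTITION BY Stylenumber ORDER BY Date DESC) AS r,
               Stock
        FROM MGTest
    ) t
    WHERE r = 1                      -- only rows from the style's most recent date
    GROUP BY Stylenumber, Sold
    ORDER BY Stylenumber
"""

totals = conn.execute(STOCK_SQL).fetchall()
print(totals)
```

Note that this only counts retailers that have a row on the style's most recent date; a retailer whose last row is older would be left out of the stock total.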

PostgreSQL window function "lag()" only pulls from current resultset

I'm making a stock ticker as a learning experience for PostgreSQL and AngularJS.
In my ticker query, I attempt to discover the change in price from the previous day. I'm implementing the DB queries in PHP right now for ease of testing and I'll port to AngularJS later.
DB Setup
prices
--pk
--fund (foreign key to funds.pk)
--price
--price_date
funds
--pk
--fund_name
--summary
Query
Get the latest price and the price before it (as well as other info) for each fund with an entry in the prices table.
This $query is a single line in my PHP file.
$query = 'SELECT prices.price_date,
prices.price,
(lag(prices.price) over (ORDER BY prices.price_date DESC)) as last_price,
prices.fund,
funds.fund_name
FROM prices
INNER JOIN funds ON prices.fund=funds.pk
WHERE price_date=(SELECT price_date FROM prices ORDER BY price_date DESC LIMIT 1)';
Result
[
{"price_date":"2015-09-08","price":"17.5901","last_price":null,"fund":"1","fund_name":"L Income"},
{"price_date":"2015-09-08","price":"22.8859","last_price":"17.5901","fund":"2","fund_name":"L 2020"},
{"price_date":"2015-09-08","price":"24.6693","last_price":"22.8859","fund":"3","fund_name":"L 2030"},
{"price_date":"2015-09-08","price":"26.1456","last_price":"24.6693","fund":"4","fund_name":"L 2040"},
{"price_date":"2015-09-08","price":"14.7756","last_price":"26.1456","fund":"5","fund_name":"L 2050"},
{"price_date":"2015-09-08","price":"14.8181","last_price":"14.7756","fund":"6","fund_name":"G Fund"},
{"price_date":"2015-09-08","price":"16.93","last_price":"14.8181","fund":"7","fund_name":"F Fund"},
{"price_date":"2015-09-08","price":"26.369","last_price":"16.93","fund":"8","fund_name":"C Fund"},
{"price_date":"2015-09-08","price":"35.9595","last_price":"26.369","fund":"9","fund_name":"S Fund"},
{"price_date":"2015-09-08","price":"24.0362","last_price":"35.9595","fund":"10","fund_name":"I Fund"}
]
As you can see, the lag() window function is only drawing on the current resultset for pulling the previous record's prices.price field.
I am at a loss now. Does anyone have guidance?
I am guessing that you want the previous day's price for the fund. This requires a partition by clause:
SELECT p.price_date, p.price,
-- ascending order, so lag() returns the price from the previous date
lag(p.price) over (PARTITION BY p.fund ORDER BY p.price_date) as last_price,
p.fund, f.fund_name
FROM prices p INNER JOIN
funds f
ON p.fund = f.pk ;
If you then want this only for the last date, then use a subquery:
SELECT pf.*
FROM (SELECT p.price_date, p.price,
lag(p.price) over (PARTITION BY p.fund ORDER BY p.price_date) as last_price,
p.fund, f.fund_name
FROM prices p INNER JOIN
funds f
ON p.fund = f.pk
) pf
WHERE price_date = (SELECT price_date FROM prices ORDER BY price_date DESC LIMIT 1);
The WHERE clause is evaluated before the analytic functions, so the filtering affects which record (if any) is chosen by the LAG(). Note: this assumes that the max price_date is the same for all funds, but this is the logic in the question.
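The effect of filtering before versus after the window function can be seen in this small sketch. It uses an in-memory SQLite database (3.25 or later assumed) rather than PostgreSQL, and the earlier-dated prices are invented for illustration; only the table layout and fund names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE funds (pk INTEGER PRIMARY KEY, fund_name TEXT);
    CREATE TABLE prices (pk INTEGER PRIMARY KEY, fund INTEGER, price REAL, price_date TEXT);
    INSERT INTO funds VALUES (1, 'L Income'), (2, 'L 2020');
    -- Hypothetical prices on two dates per fund.
    INSERT INTO prices (fund, price, price_date) VALUES
        (1, 17.50, '2015-09-04'), (1, 17.59, '2015-09-08'),
        (2, 22.80, '2015-09-04'), (2, 22.89, '2015-09-08');
""")

WINDOWED = """
    SELECT p.fund, f.fund_name, p.price_date, p.price,
           lag(p.price) OVER (PARTITION BY p.fund ORDER BY p.price_date) AS last_price
    FROM prices p
    JOIN funds f ON p.fund = f.pk
"""
LATEST = "(SELECT max(price_date) FROM prices)"

# WHERE in the same query level: rows are filtered first, so lag() has
# no earlier row left in each partition and returns NULL.
filtered_first = conn.execute(
    WINDOWED + f" WHERE p.price_date = {LATEST} ORDER BY p.fund"
).fetchall()

# WHERE around a subquery: lag() sees every date, then the filter is applied.
filtered_last = conn.execute(
    f"SELECT * FROM ({WINDOWED}) pf WHERE pf.price_date = {LATEST} ORDER BY pf.fund"
).fetchall()

print(filtered_first)
print(filtered_last)
```

In the first result every last_price is None; in the second each fund's latest row carries the price from its previous date.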
If you need to compare it to the price from previous day, you should use a conditional so it always picks up values from the previous day.
SELECT prices.price_date,
prices.price,
case when prices.price_date = (select max(price_date) from prices) then
lag(prices.price) over (PARTITION BY prices.fund ORDER BY prices.price_date)
end as last_price,
prices.fund,
funds.fund_name
FROM prices
INNER JOIN funds ON prices.fund = funds.pk
-- no WHERE on price_date here: filtering at this level would remove the earlier rows that lag() needs

select multiple records based on order by

I have a table with a bunch of customer IDs. The customers table also contains these IDs, but each ID can appear on multiple records for the same customer. I want to select the most recently used record, which I can get by doing order by <my_field> desc.
Say I have 100 customer IDs in this table and the customers table has 120 records with these IDs (some are duplicates). How can I apply my order by condition to get only the most recent matching record for each ID?
The DBMS is SQL Server 2000.
The table is basically like this:
loc_nbr and cust_nbr are primary keys.
A customer shops at location 1. They get assigned loc_nbr = 1 and cust_nbr = 1,
then a customer_id of 1.
They shop again, but this time at location 2, so they get assigned loc_nbr = 2 and cust_nbr = 1, then the same customer_id of 1 based on their other attributes like name and address.
Because they shopped at location 2 AFTER location 1, that record will have a more recent rec_alt_ts value, and it is the record I want to retrieve.
You want to use the ROW_NUMBER() function with a Common Table Expression (CTE).
Here's a basic example. You should be able to use a similar query with your data.
;WITH TheLatest AS
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY <group-by-fields> ORDER BY <sorting-fields> DESC) AS ItemCount
FROM TheTable
)
SELECT *
FROM TheLatest
WHERE ItemCount = 1
UPDATE: I just noticed that this was tagged with sql-server-2000. This will only work on SQL Server 2005 and later.
Since you didn't give real table and field names, this is just pseudo code for a solution.
select *
from customer_table t2
inner join location_table t1
on t1.some_key = t2.some_key
where t1.LocationKey = (select top 1 (LocationKey) as LatestLocationKey from location_table where cust_id = t1.cust_id order by some_field desc)
Use an aggregate function in the query to group by customer IDs:
SELECT cust_Nbr, MAX(rec_alt_ts) AS most_recent_transaction, other_fields
FROM tableName
GROUP BY cust_Nbr, other_fields
ORDER BY cust_Nbr DESC;
This assumes that rec_alt_ts increases every time, thus the max entry for that cust_Nbr would be the most recent entry.
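One way to turn the MAX(rec_alt_ts) idea into full rows, without window functions, is to join the table back to a derived table of per-customer maximums. The sketch below shows this on an in-memory SQLite database; the table name customers and the sample rows are assumptions, and the column names come from the question. The same join shape should carry over to older SQL Server versions, though that is not tested here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical table name and rows; columns follow the question.
    CREATE TABLE customers (
        loc_nbr INTEGER, cust_nbr INTEGER, customer_id INTEGER, rec_alt_ts TEXT,
        PRIMARY KEY (loc_nbr, cust_nbr)
    );
    INSERT INTO customers VALUES
        (1, 1, 1, '2010-01-01'),   -- customer 1 at location 1
        (2, 1, 1, '2010-02-01'),   -- customer 1 again, later, at location 2
        (1, 2, 2, '2010-01-15');   -- customer 2
""")

LATEST_SQL = """
    SELECT c.customer_id, c.loc_nbr, c.cust_nbr, c.rec_alt_ts
    FROM customers c
    JOIN (
        -- one row per customer_id with its most recent timestamp
        SELECT customer_id, MAX(rec_alt_ts) AS max_ts
        FROM customers
        GROUP BY customer_id
    ) m ON m.customer_id = c.customer_id AND m.max_ts = c.rec_alt_ts
    ORDER BY c.customer_id
"""

latest = conn.execute(LATEST_SQL).fetchall()
print(latest)
```

If two records for the same customer share the maximum rec_alt_ts, both are returned, so a tie-breaker would be needed to guarantee one row per customer.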
By sorting on the date and time you can pull out the most recent detail for the customer.
Use whichever column stores the date and time for the customer.
eg:
SQL> select ename , to_date(hiredate,'dd-mm-yyyy hh24:mi:ss') from emp order by to_date(hiredate,'dd-mm-yyyy hh24:mi:ss');

Product sales by month - SQL

I just created a small data warehouse with the following details.
Fact Table
Sales
Dimensions
Supplier
Products
Time (Range is one year)
Stores
I want to query which product has the max sales by month, I mean the output to be like
Month - Product Code - Num_Of_Items
JAN xxxx xxxxx
FEB xxxx xxxxx
I tried the following query
with product_sales as(
SELECT dd.month,
fs.p_id,
dp.title,
SUM(number_of_items) Num
FROM fact_sales fs
INNER JOIN dim_products dp
ON fs.p_id = dp.p_id
INNER JOIN dim_date dd
ON dd.date_id = fs.date_id
GROUP BY dd.month,
fs.p_id,
dp.title
)
select distinct month,movie_id,max(num)
from product_sales
group by movie_id,title, month;
Instead of at most 12 rows, I am getting 132 records. I need guidance with this. Thanks.
There are a few things about your query that don't make sense, such as:
Where does movie_id come from?
What is from abc? Should it be from product_sales?
That said, if you need the maximum product sales by month and you need to include the product code (or movie ID or whatever), you need an analytical query. Yours would go something like this:
WITH product_sales AS (
SELECT
dd.month,
fs.p_id,
dp.title,
SUM(number_of_items) Num,
RANK() OVER (PARTITION BY dd.month ORDER BY SUM(number_of_items) DESC) NumRank
FROM fact_sales fs
INNER JOIN dim_products dp ON fs.p_id = dp.p_id
INNER JOIN dim_date dd ON dd.date_id = fs.date_id
GROUP BY dd.month, fs.p_id, dp.title
)
SELECT month, p_id, title, num
FROM product_sales
WHERE NumRank = 1
Note that if there's a tie for top sales in any month, this query will show all top sales for the month. In other words, if product codes AAAA and BBBB are tied for top sales in January, the query results will have a January row for both products.
If you want just one row per month even if there's a tie, use ROW_NUMBER instead of RANK(), but note that ROW_NUMBER will arbitrarily pick a winner unless you define a tie-breaker. For example, to have the lowest p_id be the tie-breaker, define the NumRank column like this:
ROW_NUMBER() OVER (
PARTITION BY dd.month
ORDER BY SUM(number_of_items) DESC, p_id
) NumRank
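The difference between RANK and ROW_NUMBER on a tie can be checked with a small sketch like the one below. It uses an in-memory SQLite database (3.25 or later assumed) with a single hypothetical sales table instead of the fact and dimension tables, and aggregates in one CTE before ranking in the next; the helper name top_sellers and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical flattened sales rows; AAAA and BBBB tie in JAN with 5 items each.
    CREATE TABLE sales (month TEXT, p_id TEXT, number_of_items INTEGER);
    INSERT INTO sales VALUES
        ('JAN', 'AAAA', 2), ('JAN', 'AAAA', 3),
        ('JAN', 'BBBB', 5),
        ('JAN', 'CCCC', 3),
        ('FEB', 'AAAA', 2),
        ('FEB', 'CCCC', 4), ('FEB', 'CCCC', 3);
""")

def top_sellers(rank_column):
    """Return top products per month using 'rk' (RANK) or 'rn' (ROW_NUMBER)."""
    if rank_column not in ("rk", "rn"):
        raise ValueError("rank_column must be 'rk' or 'rn'")
    sql = f"""
        WITH totals AS (
            SELECT month, p_id, SUM(number_of_items) AS num
            FROM sales
            GROUP BY month, p_id
        ),
        ranked AS (
            SELECT *,
                   RANK() OVER (PARTITION BY month ORDER BY num DESC) AS rk,
                   -- p_id acts as the tie-breaker, so the lowest p_id wins a tie
                   ROW_NUMBER() OVER (PARTITION BY month ORDER BY num DESC, p_id) AS rn
            FROM totals
        )
        SELECT month, p_id, num FROM ranked
        WHERE {rank_column} = 1
        ORDER BY month, p_id
    """
    return conn.execute(sql).fetchall()

with_ties = top_sellers("rk")
one_per_month = top_sellers("rn")
print(with_ties)
print(one_per_month)
```

With RANK, January yields two rows because of the tie; with ROW_NUMBER and the p_id tie-breaker, each month yields exactly one row.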
You can use MAX() KEEP (DENSE_RANK FIRST ORDER BY ...) to select the movie_id with the max value of num
...
select
month,
MAX(movie_id) KEEP (DENSE_RANK FIRST order by num desc) as movie_id,
MAX(num)
from
abc
group by month
;