SQL Query GroupBy with date parameter - sql

Suppose I have a table, TeamRatings, that looks something like this
|---Team----|--ValuationDate--|-Rating-|
|--Saints---|---10/15/2012----|---81.1-|
|--Broncos--|---10/15/2012----|---91.1-|
|--Ravens---|---10/16/2012----|--101.1-|
|--Broncos--|---10/22/2012----|---82.1-|
|--Ravens---|---10/22/2012----|---83.1-|
|--Saints---|---10/29/2012----|---84.1-|
|--Broncos--|---10/28/2012----|---85.1-|
|--Ravens---|---10/29/2012----|---86.1-|
Also, a team's rating is assumed to remain unchanged until it plays a new game (represented by a new record). E.g., the Broncos' rating on 10/21/2012 is assumed to be 91.1, the value from their 10/15/2012 record.
I want a query with a date parameter that returns one record per team, representing that team's most recent game prior to the specified date. For instance,
If I input 10/23/2012 as my date parameter, the query should return
|---Team---|-ValuationDate---|-Rating-|
|--Saints--|---10/15/2012----|---81.1-|
|--Broncos-|---10/22/2012----|---82.1-|
|--Ravens--|---10/22/2012----|---83.1-|
Any help is greatly appreciated. Thanks!

On MS SQL Server 2005 or later you can use a CTE with the ROW_NUMBER function:
WITH x
     AS (SELECT team,
                valuationdate,
                rating,
                rn = Row_number()
                       OVER(
                         partition BY team
                         ORDER BY valuationdate DESC)
         FROM   teamratings
         WHERE  valuationdate < @DateParam)
SELECT team,
       valuationdate,
       rating
FROM   x
WHERE  rn = 1
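To check the approach against the sample data from the question, here is a minimal sketch that runs the same CTE in an in-memory SQLite database from Python. The SQLite dialect, the ISO date strings, the `:date_param` placeholder and the helper name `latest_ratings` are all assumptions made for the sake of a self-contained example; on SQL Server you would keep real date columns and the T-SQL syntax above. It needs a SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Sample rows from the question. Dates are rewritten as ISO strings so that
# plain string comparison sorts them chronologically (a SQLite convenience).
ROWS = [
    ("Saints", "2012-10-15", 81.1),
    ("Broncos", "2012-10-15", 91.1),
    ("Ravens", "2012-10-16", 101.1),
    ("Broncos", "2012-10-22", 82.1),
    ("Ravens", "2012-10-22", 83.1),
    ("Saints", "2012-10-29", 84.1),
    ("Broncos", "2012-10-28", 85.1),
    ("Ravens", "2012-10-29", 86.1),
]

QUERY = """
WITH x AS (
    SELECT Team, ValuationDate, Rating,
           ROW_NUMBER() OVER (PARTITION BY Team
                              ORDER BY ValuationDate DESC) AS rn
    FROM TeamRatings
    WHERE ValuationDate < :date_param
)
SELECT Team, ValuationDate, Rating
FROM x
WHERE rn = 1
ORDER BY Team
"""


def latest_ratings(date_param):
    """Return each team's most recent record strictly before date_param."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE TeamRatings (Team TEXT, ValuationDate TEXT, Rating REAL)"
    )
    conn.executemany("INSERT INTO TeamRatings VALUES (?, ?, ?)", ROWS)
    result = conn.execute(QUERY, {"date_param": date_param}).fetchall()
    conn.close()
    return result


print(latest_ratings("2012-10-23"))
```

With 2012-10-23 as the parameter this gives one row per team, matching the expected output in the question (sorted by team name here only to make the result deterministic).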

You can use a more general query (one that does not rely on window functions) like this:
select TeamRatings.Team, x.ValuationDate, TeamRatings.Rating
from TeamRatings inner join
(
    select Team, max(ValuationDate) as ValuationDate
    from TeamRatings
    where ValuationDate < @dateParameter
    group by Team
) x on TeamRatings.Team = x.Team and TeamRatings.ValuationDate = x.ValuationDate

Related

Getting value from MAX(Date) Row in SQL Server

I am trying to get the last supplier of an item, by using the MAX function. What I am trying to achieve is showing what the supplier name was for the row with the MAX(Date) for all the stock items (shown below as account links).
The code I am using brings up multiple dates for the same AccountLink, and I am struggling to see why. My code is:
SELECT
MAX(TxDate) AS Date,
ST.AccountLink,
V.Account AS Supplier
FROM _bvSTTransactionsFull AS ST
JOIN Vendor V on ST.DrCrAccount = V.DCLink
WHERE Module = 'AP'
AND Id = 'OGrv'
GROUP BY ST.AccountLink, V.Account
ORDER BY AccountLink
But my results look like the below
Try this out
select AccountLink, Supplier, [date]
from (SELECT
        ST.AccountLink,
        V.Account AS Supplier,
        TxDate as [date],
        row_number() over (partition by ST.AccountLink order by TxDate desc) rownum
      FROM _bvSTTransactionsFull AS ST
      JOIN Vendor V on ST.DrCrAccount = V.DCLink
      WHERE Module = 'AP'
      AND Id = 'OGrv') t
where t.rownum = 1
The GROUP BY has been removed, and a ranking function is used to keep only the latest row per AccountLink.
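You need a simple correlated subquery to select the last supplier.
select V.Account as LastSupplier, X.TxDate as LastDate, X.AccountLink
from _bvSTTransactionsFull X
join Vendor V on X.DrCrAccount = V.DCLink
where X.TxDate = (select max(Y.TxDate)
                  from _bvSTTransactionsFull Y
                  where Y.AccountLink = X.AccountLink)
The subquery finds the last date for each AccountLink, so you can use it in the outer WHERE condition. Add the Module and Id filters from your original query to both the outer query and the subquery if you still need them.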
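To see why the original GROUP BY returns several dates per item, and how the correlated subquery avoids it, here is a small sketch using an in-memory SQLite database from Python. The table name `StTransactions`, the vendor names and the three sample rows are made up for illustration; only the column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Vendor (DCLink INTEGER PRIMARY KEY, Account TEXT);
CREATE TABLE StTransactions (AccountLink INTEGER, TxDate TEXT, DrCrAccount INTEGER);
INSERT INTO Vendor VALUES (1, 'ACME'), (2, 'GLOBEX');
INSERT INTO StTransactions VALUES
    (100, '2018-01-05', 1),   -- item 100 bought from ACME
    (100, '2018-03-10', 2),   -- item 100 bought later from GLOBEX
    (200, '2018-02-01', 1);
""")

# Shape of the question's query: grouping by supplier as well as by item
# yields one row per (item, supplier) pair, each with its own MAX date.
grouped = conn.execute("""
    SELECT ST.AccountLink, V.Account, MAX(ST.TxDate)
    FROM StTransactions ST
    JOIN Vendor V ON ST.DrCrAccount = V.DCLink
    GROUP BY ST.AccountLink, V.Account
    ORDER BY ST.AccountLink, V.Account
""").fetchall()

# Correlated subquery: keep only the row whose date is the latest for its item.
last_supplier = conn.execute("""
    SELECT ST.AccountLink, V.Account, ST.TxDate
    FROM StTransactions ST
    JOIN Vendor V ON ST.DrCrAccount = V.DCLink
    WHERE ST.TxDate = (SELECT MAX(Y.TxDate)
                       FROM StTransactions Y
                       WHERE Y.AccountLink = ST.AccountLink)
    ORDER BY ST.AccountLink
""").fetchall()

print(grouped)        # item 100 appears twice, once per supplier
print(last_supplier)  # item 100 appears once, with its latest supplier
```

Note that if two transactions for the same item share the latest date, the correlated subquery returns both of them; the ROW_NUMBER variant returns exactly one.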

Microsoft SQL Server : return only the rows with the most recent date for each unique ID

I've got some data that has a DeviceID column, a scan time column and some other columns.
For each of the deviceIDs, I want to return only the most recent row based on the scan time.
I am trying to create this query so that I can use it as a view and report on the data.
The database is a Microsoft SQL Server database and I'm running the query from SQL Server 2014 Management Studio.
The closest I've gotten to getting this to work is this :
SELECT
DeviceID,
AVSolutionName,
DefinitionsUpToDate,
ScanningEnabled,
Expired,
ScanTime
FROM
dbo.fact_AVSecurity
WHERE
(ScanTime IN (SELECT DISTINCT MAX(ScanTime) AS LastScan
FROM dbo.fact_AVSecurity AS Avs
GROUP BY DeviceID))
Unfortunately this is returning multiple values for the same ID.
ScanTime ScanningEnabled Expired DeviceID DefinitionsUpToDate AVSolutionName
10/12/2018 10:13 TRUE FALSE 15994 TRUE Webroot SecureAnywhere
4/12/2018 14:30 TRUE TRUE 15994 TRUE Webroot SecureAnywhere
What I'd like returned is just that first most recent row:
ScanTime ScanningEnabled Expired DeviceID DefinitionsUpToDate AVSolutionName
10/12/2018 10:13 TRUE FALSE 15994 TRUE Webroot SecureAnywhere
I've tried different approaches like :
SQL - Returning only the most recent row
But can't seem to get them working. I'm not sure if it's something I'm doing wrong or if the specific brand of SQL I'm using doesn't do the "top 1" thing.
Is there a way to do what I'm after? How close am I with what I have?
You could use a window function with a CTE:
With CTE AS (
SELECT t.DeviceID
, t.AVSolutionName
, t.DefinitionsUpToDate
, t.ScanningEnabled
, t.Expired
, t.ScanTime
, Row_Number() over (partition by DeviceID order by scanTime Desc) RN
FROM dbo.fact_AVSecurity t)
SELECT *
FROM CTE
WHERE RN=1
You are close to the solution. You just need a few changes to your subquery:
- Add a WHERE condition in the subquery that limits the search to the current DeviceID, which turns it into a correlated subquery.
- There is no need for an IN clause to match the subquery; equality is fine, as only one value is expected.
- There is no need for DISTINCT, as you are already using GROUP BY.
Query:
SELECT
t.DeviceID,
t.AVSolutionName,
t.DefinitionsUpToDate,
t.ScanningEnabled,
t.Expired,
t.ScanTime
FROM dbo.fact_AVSecurity AS t
WHERE t.ScanTime =
(SELECT MAX(ScanTime) AS LastScan
FROM dbo.fact_AVSecurity AS Avs
WHERE deviceID = t.deviceID
GROUP BY DeviceID
)
Check this:
SELECT t.DeviceID, t.AVSolutionName, t.DefinitionsUpToDate, t.ScanningEnabled, t.Expired, t.ScanTime
FROM dbo.fact_AVSecurity AS t
WHERE t.ScanTime = (SELECT MAX(Avs.ScanTime) FROM dbo.fact_AVSecurity AS Avs WHERE Avs.DeviceID = t.DeviceID)
For each DeviceID, this fetches the row whose ScanTime equals that device's MAX(ScanTime).
If you have an auto-increment column on your table (you generally should have one on every table), use that instead of timestamps, since SQL Server DateTime type only has a resolution of 1/300th of a second and should not be assumed to be a unique timestamp.
SELECT X.LastEntryID, Y.DeviceID, ...
FROM
(
SELECT LastEntryID = MAX(ID)--latest entry for the device
FROM dbo.fact_AVSecurity
GROUP BY DeviceID--you don't even need to return DeviceID since ID is auto-increment and thus unique in the table
) AS X
INNER JOIN dbo.fact_AVSecurity AS Y ON
Y.ID = X.LastEntryID
This presumes you don't backdate your data or populate using IDENTITY_INSERT
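A quick sketch of the difference, run against an in-memory SQLite database from Python. The `ID` column is the hypothetical auto-increment key this answer presumes (SQLite's AUTOINCREMENT stands in for a SQL Server IDENTITY column), the table is trimmed to a few columns, and the rows are invented; two of them deliberately share a timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_AVSecurity (
    ID INTEGER PRIMARY KEY AUTOINCREMENT,   -- hypothetical surrogate key
    DeviceID INTEGER,
    ScanTime TEXT,
    Expired INTEGER
);
INSERT INTO fact_AVSecurity (DeviceID, ScanTime, Expired) VALUES
    (15994, '2018-12-04 14:30', 1),
    (15994, '2018-12-10 10:13', 0),
    (15994, '2018-12-10 10:13', 1),   -- same timestamp as the previous row
    (20001, '2018-12-09 08:00', 0);
""")

# Matching on MAX(ScanTime) keeps every row that ties for the latest time.
by_time = conn.execute("""
    SELECT t.ID, t.DeviceID
    FROM fact_AVSecurity AS t
    WHERE t.ScanTime = (SELECT MAX(a.ScanTime)
                        FROM fact_AVSecurity AS a
                        WHERE a.DeviceID = t.DeviceID)
    ORDER BY t.ID
""").fetchall()

# Matching on MAX(ID) keeps exactly one row per device.
by_id = conn.execute("""
    SELECT y.ID, y.DeviceID
    FROM (SELECT MAX(ID) AS LastEntryID
          FROM fact_AVSecurity
          GROUP BY DeviceID) AS x
    INNER JOIN fact_AVSecurity AS y ON y.ID = x.LastEntryID
    ORDER BY y.ID
""").fetchall()

print(by_time)  # device 15994 shows up twice
print(by_id)    # one row per device
```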
Just one final option because I didn't see it mentioned
You can use TOP 1 WITH TIES in concert with Row_Number().
That said, xQbert's solution (+1) would be more performant, especially with larger tables
Example
SELECT Top 1 with ties *
FROM dbo.fact_AVSecurity
Order By Row_Number() over (partition by DeviceID order by scanTime Desc)

SQL Max by date and term

I have some data that look like this:
table
I would like to get, for each term, the most recent upload date.
I've tried this, but I know I have put the term conditions in the wrong place, because I only get the max date of the whole group instead of the max date within each of term 1 and term 4.
SELECT Inst, Term, Year, FreezeDate, UploadDate, RecordCount, ErrorCount, FileName, System
FROM table
WHERE UploadDate=(
SELECT MAX(UploadDate) FROM table WHERE System = ('a') and Year = ('2017') and Inst = ('123') and (Term = ('1') or Term = ('4')))
My ideal output would be this:
Could someone assist?
Create a subquery that groups by term and computes the max upload date per term, then join it back to your table:
SELECT t.*
FROM table t
JOIN (SELECT Term, MAX(UploadDate) as MaxUploadDate FROM table GROUP BY Term) tmud
ON t.term = tmud.term AND t.UploadDate = tmud.MaxUploadDate
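If you also need the System, Year, Inst and Term conditions from the question, one way is to apply them inside the grouped subquery and again on the outer table. Below is a rough sketch against an in-memory SQLite database from Python; the table name `uploads` (used because `table` is a reserved word), the trimmed column list and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE uploads (Inst TEXT, Term TEXT, Year TEXT, System TEXT, UploadDate TEXT)"
)
conn.executemany(
    "INSERT INTO uploads VALUES (?, ?, ?, ?, ?)",
    [
        ("123", "1", "2017", "a", "2017-01-10"),
        ("123", "1", "2017", "a", "2017-02-15"),  # latest for term 1
        ("123", "4", "2017", "a", "2017-06-01"),  # latest for term 4
        ("123", "4", "2017", "a", "2017-05-20"),
        ("123", "2", "2017", "a", "2017-09-01"),  # term not requested
        ("999", "1", "2017", "a", "2017-12-31"),  # different institution
    ],
)

latest_per_term = conn.execute("""
    SELECT t.Term, t.UploadDate
    FROM uploads t
    JOIN (SELECT Term, MAX(UploadDate) AS MaxUploadDate
          FROM uploads
          WHERE System = 'a' AND Year = '2017' AND Inst = '123'
            AND Term IN ('1', '4')
          GROUP BY Term) tmud
      ON t.Term = tmud.Term AND t.UploadDate = tmud.MaxUploadDate
    WHERE t.System = 'a' AND t.Year = '2017' AND t.Inst = '123'
    ORDER BY t.Term
""").fetchall()

print(latest_per_term)
```

The filters are repeated on the outer table so that a row from another institution that happens to share a term and date cannot slip through the join.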

Oracle SQL query, getting a maximum of a sum

Hey, guys. I'm struggling with one query and just can't get my head around it.
Basically, I have some tables from a data mart:
DimTheatre(TheatreId(PK), TheatreNo, Name, Address, MainTel);
DimTrow(TrowId(PK), TrowNo, RowName, RowType);
DimProduction(ProductionId(PK), ProductionNo, Title, ProductionDir, PlayAuthor);
DimTime(TimeId(PK), Year, Month, Day, Hour);
TicketPurchaseFact( TheatreId(FK), TimeId(FK), TrowId(FK),
PId(FK), TicketAmount);
What I'm trying to achieve in Oracle is this: I need to retrieve the most popular row type in each theatre by value of ticket sales.
What I'm doing now is:
SELECT dthr.theatreid, dthr.name, max(tr.rowtype) keep(dense_rank last order
by tpf.ticketamount), sum(tpf.ticketamount) TotalSale
FROM TicketPurchaseFact tpf, DimTheatre dthr, DimTrow tr
WHERE dthr.theatreid = tpf.theatreid
GROUP BY dthr.theatreid, dthr.name;
It does give me output, but the 'TotalSale' column is way off: it gives much higher numbers than it should. How could I approach this issue?
I am not sure how MAX() KEEP () would help in your case, if I understand the problem correctly. Note that your query never joins DimTrow to the fact table, so every purchase is paired with every row in DimTrow, which inflates the sum. The approach below should work:
SELECT x.theatreid, x.name, x.rowtype, x.total_sale
FROM
(SELECT z.theatreid, z.name, z.rowtype, z.total_sale, DENSE_RANK() OVER (PARTITION BY z.theatreid, z.name ORDER BY z.total_sale DESC) as popular_row_rank
FROM
(SELECT dthr.theatreid, dthr.name, tr.rowtype, SUM(tpf.ticketamount) as total_sale
FROM TicketPurchaseFact tpf, DimTheatre dthr, DimTrow tr
WHERE dthr.theatreid = tpf.theatreid AND tr.trowid = tpf.trowid
GROUP BY dthr.theatreid, dthr.name, tr.rowtype) z
) x
WHERE x.popular_row_rank = 1;
You want the row type per theatre with the highest total ticket amount. So join purchases and rows, then aggregate to get the total per row type. Use RANK to rank the row types per theatre and keep the best-ranked ones. Finally, join with the theatre table to get the theatre name.
select
theatreid,
t.name,
tr.rowtype
from
(
select
p.theatreid,
r.rowtype,
rank() over (partition by p.theatreid order by sum(p.ticketamount) desc) as rn
from ticketpurchasefact p
join dimtrow r using (trowid)
group by p.theatreid, r.rowtype
) tr
join dimtheatre t using (theatreid)
where tr.rn = 1;
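The aggregate-then-rank idea can be tried on a toy data set. The sketch below uses an in-memory SQLite database from Python rather than Oracle, with trimmed-down versions of the three tables and invented rows, so treat the names of the theatres and row types as placeholders. It needs a SQLite build with window functions (3.25 or later).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimTheatre (TheatreId INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE DimTrow (TrowId INTEGER PRIMARY KEY, RowType TEXT);
CREATE TABLE TicketPurchaseFact (TheatreId INTEGER, TrowId INTEGER, TicketAmount INTEGER);
INSERT INTO DimTheatre VALUES (1, 'Apollo'), (2, 'Lyric');
INSERT INTO DimTrow VALUES (1, 'Stalls'), (2, 'Balcony'), (3, 'Stalls');
INSERT INTO TicketPurchaseFact VALUES
    (1, 1, 50), (1, 3, 30), (1, 2, 70),   -- Apollo: Stalls 80, Balcony 70
    (2, 2, 40), (2, 1, 10);               -- Lyric: Balcony 40, Stalls 10
""")

top_row_types = conn.execute("""
    WITH totals AS (
        -- Total sales per theatre and row type
        SELECT p.TheatreId, r.RowType, SUM(p.TicketAmount) AS TotalSale
        FROM TicketPurchaseFact p
        JOIN DimTrow r ON r.TrowId = p.TrowId
        GROUP BY p.TheatreId, r.RowType
    ),
    ranked AS (
        -- Rank the row types within each theatre by their total
        SELECT TheatreId, RowType, TotalSale,
               RANK() OVER (PARTITION BY TheatreId
                            ORDER BY TotalSale DESC) AS rn
        FROM totals
    )
    SELECT t.TheatreId, t.Name, k.RowType, k.TotalSale
    FROM ranked k
    JOIN DimTheatre t ON t.TheatreId = k.TheatreId
    WHERE k.rn = 1
    ORDER BY t.TheatreId
""").fetchall()

print(top_row_types)
```

Because RANK is used, two row types that tie for the highest total in a theatre would both be returned.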

SQL Server get customer with 7 consecutive transactions

I am trying to write a query that would get the customers with 7 consecutive transactions given a list of CustomerKeys.
I am currently doing a self join on Customer fact table that has 700 Million records in SQL Server 2008.
This is what I came up with, but it's taking a long time to run. I have a clustered index on (CustomerKey, TranDateKey).
SELECT
ct1.CustomerKey,ct1.TranDateKey
FROM
CustomerTransactionFact ct1
INNER JOIN
#CRTCustomerList dl ON ct1.CustomerKey = dl.CustomerKey --temp table with customer list
INNER JOIN
dbo.CustomerTransactionFact ct2 ON ct1.CustomerKey = ct2.CustomerKey -- Same Customer
AND ct2.TranDateKey >= ct1.TranDateKey
AND ct2.TranDateKey <= CONVERT(VARCHAR(8), DATEADD(d, 6, ct1.TranDateTime), 112) -- Consecutive Transactions in the last 7 days
WHERE
ct1.LogID >= 82800000
AND ct2.LogID >= 82800000
AND ct1.TranDateKey between dl.BeginTranDateKey and dl.EndTranDateKey
AND ct2.TranDateKey between dl.BeginTranDateKey and dl.EndTranDateKey
GROUP BY
ct1.CustomerKey,ct1.TranDateKey
HAVING
COUNT(*) = 7
Please help make it more efficient. Is there a better way to write this query in 2008?
You can do this using window functions, which should be much faster. Assuming that TranDateKey is a number and you can subtract a sequential number from it, the difference is constant for consecutive days.
You can put this in a query like this:
SELECT CustomerKey, MIN(TranDateKey), MAX(TranDateKey)
FROM (SELECT ct.CustomerKey, ct.TranDateKey,
(ct.TranDateKey -
DENSE_RANK() OVER (PARTITION BY ct.CustomerKey ORDER BY ct.TranDateKey)
) as grp
FROM CustomerTransactionFact ct INNER JOIN
#CRTCustomerList dl
ON ct.CustomerKey = dl.CustomerKey
) t
GROUP BY CustomerKey, grp
HAVING COUNT(*) = 7;
If your date key is something else, there is probably a way to modify the query to handle that, but you might have to join to the dimension table.
This would be a perfect task for a COUNT(*) OVER (RANGE ...), but SQL Server 2008 supports only a limited syntax for Windowed Aggregate Functions.
SELECT CustomerKey, MIN(TranDateKey), COUNT(*)
FROM
(
SELECT CustomerKey, TranDateKey,
dateadd(d,-ROW_NUMBER()
OVER (PARTITION BY CustomerKey
ORDER BY TranDateKey),TranDateTime) AS dummyDate
FROM CustomerTransactionFact
) AS dt
GROUP BY CustomerKey, dummyDate
HAVING COUNT(*) >= 7
The DATEADD subtracts a per-customer ROW_NUMBER (ordered by date) from the current TranDateTime. The resulting dummyDate has no actual meaning, but it is the same meaningless date for every row in a run of consecutive dates.
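The "date minus row number" trick can be checked on a tiny data set. The sketch below runs it in an in-memory SQLite database from Python, so it is an illustration under different assumptions than the SQL Server 2008 setup: dates are ISO strings in a hypothetical `TranDate` column, `julianday()` turns them into day numbers, and a DISTINCT step collapses several transactions on the same day before numbering. It needs a SQLite build with window functions (3.25 or later).

```python
import sqlite3
from datetime import date, timedelta


def run_of_days(customer, start, n):
    """Return n rows (customer, ISO date) on consecutive days from start."""
    return [(customer, (start + timedelta(days=i)).isoformat()) for i in range(n)]


rows = (
    run_of_days(1, date(2012, 10, 28), 7)   # 7 days in a row, across a month end
    + [(1, "2012-10-30")]                   # second transaction on the same day
    + run_of_days(2, date(2012, 10, 1), 3)  # 3 days, then a gap ...
    + run_of_days(2, date(2012, 10, 5), 4)  # ... then 4 more days
)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CustomerTransactionFact (CustomerKey INTEGER, TranDate TEXT)")
conn.executemany("INSERT INTO CustomerTransactionFact VALUES (?, ?)", rows)

streaks = conn.execute("""
    WITH days AS (
        -- One row per customer per calendar day
        SELECT DISTINCT CustomerKey, TranDate
        FROM CustomerTransactionFact
    ),
    grouped AS (
        -- Day number minus row number is constant within a consecutive run
        SELECT CustomerKey, TranDate,
               CAST(julianday(TranDate) AS INTEGER)
                 - ROW_NUMBER() OVER (PARTITION BY CustomerKey
                                      ORDER BY TranDate) AS grp
        FROM days
    )
    SELECT CustomerKey, MIN(TranDate), MAX(TranDate), COUNT(*)
    FROM grouped
    GROUP BY CustomerKey, grp
    HAVING COUNT(*) >= 7
    ORDER BY CustomerKey
""").fetchall()

print(streaks)
```

Customer 2 also has seven transaction days, but they split into runs of three and four, so only customer 1 qualifies. Working from a real day number rather than a yyyymmdd integer keeps the difference constant across month boundaries.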