Cross apply a table valued function - sql

A real mind bender here guys!
I have a table which basically positions users in a league:
LeagueID  Stake  League_EntryID  UserID  TotalPoints  TotalBonusPoints  Prize
13028     2.00   58659           2812    15           5                 NULL
13028     2.00   58662           3043    8            3                 NULL
13029     5.00   58665           2812    8            3                 NULL
The League_EntryID is the unique field here, but as you can see, this query can return multiple leagues that a user is entered in for that day.
I also have a table value function which returns the current prize standings for the league and this accepts the LeagueID as a parameter and returns the people who qualify for prize money. This is a complex function which ideally I would like to keep as the function accepting the LeagueID. The result of this is as below:
UserID  Position  League_EntryID  WinPerc    Prize
2812    1         58659           36.000000  14.00
3043    6         58662           2.933333   4.40
3075    6         58664           2.933333   4.40
Essentially what I want to do is to join the table value function to the topmost query by passing in the LeagueID to essentially update the Prize Field for that League_EntryID i.e.
SELECT * FROM [League]
INNER JOIN [League_Entry] ON [League].[LeagueID] = [League_Entry].[LeagueID]
INNER JOIN [dbo].[GetPrizesForLeague]([League].[LeagueID]) ....
I'm not sure if a CROSS APPLY would work here but essentially I believe I need to JOIN on both the LeagueID and the League_EntryID to give me my value for the Prize. Not sure on the best way to do this without visiting a scalar function which will in turn call the table value function and obtain the Prize from that.
Speed is worrying me here.
P.S. Not all League_EntryID's will exist as a part of the table value function output so maybe an OUTER JOIN/APPLY can be used?
EDIT See the query below
SELECT DISTINCT [LeagueID],
[CourseName],
[Refunded],
[EntryID],
[Stake],
d.[League_EntryID],
d.[UserID],
[TotalPoints],
[TotalBonusPoints],
[TotalPointsLastRace],
[TotalBonusPointsLastRace],
d.[Prize],
[LeagueSizeID],
[TotalPool],
d.[Position],
[PositionLastRace],
t.Prize
FROM
(
SELECT [LeagueID],
[EntryID],
[Stake],
[MeetingID],
[Refunded],
[UserID],
[League_EntryID],
[TotalPoints],
[TotalBonusPoints],
[TotalPointsLastRace],
[TotalBonusPointsLastRace],
[Prize],
[LeagueSizeID],
[dbo].[GetTotalPool]([LeagueID], 1) AS [TotalPool],
RANK() OVER( PARTITION BY [LeagueID] ORDER BY [TotalPoints] DESC, [TotalBonusPoints] DESC) AS [Position],
RANK() OVER( PARTITION BY [LeagueID] ORDER BY [TotalPointsLastRace] DESC, [TotalBonusPointsLastRace] DESC) AS [PositionLastRace],
ROW_NUMBER() OVER (PARTITION BY [LeagueID]
ORDER BY [TotalPoints] DESC, [TotalBonusPoints] DESC
) as [Position_Rownum]
FROM [DATA] ) AS d
INNER JOIN [Meeting] WITH (NOLOCK) ON [d].[MeetingID] = [Meeting].[MeetingID]
INNER JOIN [Course] ON [Meeting].[CourseID] = [Course].[CourseID]
OUTER APPLY (SELECT * FROM [dbo].[GetLeaguePrizes](d.[LeagueID])) t
WHERE (
([LeagueSizeID] = 3 AND [Position_Rownum] <= 50)
OR (d.[UserID] = @UserID AND [LeagueSizeID] = 3)
)
OR
(
[LeagueSizeID] in (1,2)
)
ORDER BY [LeagueID], [Position]
Any direction would be appreciated.

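You need to use OUTER APPLY. It works like CROSS APPLY, but, like a LEFT JOIN, it keeps rows from the left side even when the function returns no rows for them (the function's columns are then NULL).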
SELECT * FROM [League]
INNER JOIN [League_Entry] ON [League].[LeagueID] = [League_Entry].[LeagueID]
OUTER APPLY [dbo].[GetPrizesForLeague]([League].[LeagueID]) t
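Performance is generally good with CROSS APPLY/OUTER APPLY, particularly when the function is an inline table-valued function, which the optimizer can expand into the calling query. A multi-statement table-valued function is evaluated separately and tends to be slower. APPLY is also a good replacement for some inner queries and cursors.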
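Since the question also needs the prize for one specific League_EntryID, the APPLY can be narrowed to the matching entry. Below is a minimal sketch, assuming an inline function. The Demo_ tables and function are hypothetical stand-ins for the real League_Entry table and GetPrizesForLeague, the multi-row VALUES syntax assumes SQL Server 2008 or later, and GO is the batch separator used by SSMS/sqlcmd:

```sql
-- Hypothetical demo objects; the real tables and prize function will differ.
CREATE TABLE dbo.Demo_League_Entry (
    League_EntryID int PRIMARY KEY,
    LeagueID       int NOT NULL,
    UserID         int NOT NULL
);
CREATE TABLE dbo.Demo_Prize (
    LeagueID       int NOT NULL,
    League_EntryID int NOT NULL,
    Prize          decimal(9, 2) NOT NULL
);
INSERT INTO dbo.Demo_League_Entry (League_EntryID, LeagueID, UserID)
VALUES (58659, 13028, 2812), (58662, 13028, 3043), (58665, 13029, 2812);
INSERT INTO dbo.Demo_Prize (LeagueID, League_EntryID, Prize)
VALUES (13028, 58659, 14.00), (13028, 58662, 4.40);
GO
-- Inline table-valued function: takes only the LeagueID, as in the question.
CREATE FUNCTION dbo.Demo_GetPrizesForLeague (@LeagueID int)
RETURNS TABLE
AS RETURN (
    SELECT League_EntryID, Prize
    FROM dbo.Demo_Prize
    WHERE LeagueID = @LeagueID
);
GO
-- OUTER APPLY calls the function per row and keeps entries without a prize.
SELECT le.LeagueID, le.League_EntryID, le.UserID, t.Prize
FROM dbo.Demo_League_Entry AS le
OUTER APPLY (
    SELECT p.Prize
    FROM dbo.Demo_GetPrizesForLeague(le.LeagueID) AS p
    WHERE p.League_EntryID = le.League_EntryID
) AS t
ORDER BY le.LeagueID, le.League_EntryID;
```

With this sample data, entries 58659 and 58662 get their prizes and entry 58665 comes back with a NULL Prize, which is the LEFT JOIN behaviour the P.S. in the question asks about.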

Problem optimizing sql query with cross apply sub query

So I have three tables:
MakerParts, that holds the primary information of a Vehicle Part:
Id  MakerId  PartNumber  Description
1   1        ABC1234     Tire
2   1        XYZ1234     Door
MakerPrices, that holds the price history variation for the parts (references MakerParts.Id on MakerPartNumberId, and the table MakerPriceUpdates on UpdateId):
Id  MakerPartNumberId  UpdateId  Price
1   1                  1         9.83
2   1                  2         11.23
MakerPriceUpdates, that holds the dates of price updates. Each update is basically a CSV file that is uploaded to our system: one file means one row in this table and multiple price changes in the table MakerPrices.
Id  Date                     FileName
1   2019-01-09 00:00:00.000  temp.csv
2   2019-01-11 00:00:00.000  temp2.csv
This means that one part (MakerParts) may have multiple prices (MakerPrices). The date of the price change is on the table MakerPricesUpdates.
I want to select all MakerParts where the most recent price is zero, filtering by the MakerId on table MakerParts.
What I've tried:
select mp.* from MakerParts mp cross apply
(select top 1 Price from MakerPrices inner join
MakerPricesUpdates on MakerPricesUpdates.Id = MakerPrices.UpdateId where
MakerPrices.MakerPartNumberId = mp.Id order by Date desc) as p
where mp.MakerId = 1 and p.Price = 0
But that is absurdly slow (we have about 100 million rows in the MakerPrices table). I'm having a hard time optimizing this query (the result is only two rows for MakerId 1, and it took 2 minutes to run). I also tried:
select * from (
select
mp.*,
(select top 1 Price from MakerPrices inner join
MakerPricesUpdates on MakerPricesUpdates.Id = MakerPrices.UpdateId
where MakerPrices.MakerPartNumberId = mp.Id order by Date desc) as Price
from MakerParts mp) as temp
where temp.Price = 0 and MakerId = 1
Same result, and same time. My query plan (for the first query) (no new indexes suggested by Management Studio):
I think you can avoid joining MakerPricesUpdates with MakerPrices: provided UpdateId increases with the update date (as it does in your sample data), the highest UpdateId identifies the latest price. That will save you some time.
select mp.* from MakerParts mp cross apply
(select top 1 Price from MakerPrices where
MakerPrices.MakerPartNumberId = mp.Id order by MakerPrices.UpdateId desc) as p
where mp.MakerId = 1 and p.Price = 0
You may be able to reduce the time further with a CTE and row_number(), keeping only the latest row per part (rn = 1):
;with LatestMakerPrices as
(
select *, row_number() over (partition by MakerPartNumberId order by UpdateId desc) rn from MakerPrices
)
select mp.* from MakerParts mp cross apply
(select Price from LatestMakerPrices lmp where lmp.MakerPartNumberId = mp.Id and lmp.rn = 1) as p
where mp.MakerId = 1 and p.Price = 0
Execution plan difference between query in question and my answer:
try:
WITH tab AS (
SELECT *, NULL as Price FROM MakerParts
WHERE not exists (
SELECT Id
FROM MakerPrices
WHERE MakerPrices.MakerPartNumberId = MakerParts.Id
)
)
SELECT * from tab WHERE MakerId = 2
UNION ALL
SELECT a.* , Price
FROM [dbo].[MakerParts] a
LEFT JOIN [dbo].[MakerPrices] b
ON b.MakerPartNumberId = a.Id
WHERE MakerId = 2 AND Price = 0
Try your query:
select mp.* from MakerParts mp cross apply
(select top 1 Price from MakerPrices inner join
MakerPricesUpdates on MakerPricesUpdates.Id = MakerPrices.UpdateId where
MakerPrices.MakerPartNumberId = mp.Id order by Date desc) as p
where mp.MakerId = 1 and p.Price = 0
After creating below index:
CREATE NONCLUSTERED INDEX [NCIdx_MakerPrices_MakerPartNumberId_UpdateId] ON [dbo].[MakerPrices]
(
[MakerPartNumberId] ASC,
[UpdateId] ASC
)
INCLUDE([Price])
Also make the Id column of the MakerPricesUpdates table its primary key.

Selecting from two queries select not null

I have a query that returns a single value. In many cases it returns NULL, which is not what I need, so I have to run a second query to get a value. I need a single query that, when the first query returns NULL, discards that result and returns the result of the second query instead.
The first query is:
SELECT DISTINCT
FIRST_VALUE (pac1.pac_name)
OVER (ORDER BY pac1.pac_final_date DESC)
FROM matricula mac
INNER JOIN
periodo pac1
ON mac.pac_id = pac1.pac_id
WHERE mac.ent_id = 26172 AND mac.mac_estado IN (8072, 10221)
The second query is:
SELECT DISTINCT
FIRST_VALUE (pac1.pac_name)
OVER (ORDER BY pac1.pac_final_date DESC)
FROM registro rea
INNER JOIN
periodo pac1
ON rea.pac_id = pac1.pac_id
WHERE rea.ent_id = 26172
Both queries return the same kind of value, but the first query has to be tried first. There are two cases.
Case 1: query #1 is executed and returns a result.
Result
FIRST_VALUE (pac1.pac_name)
--------------------------
|Oct/2012 - Feb/2013 |
--------------------------
Case 2: query #1 returns NULL, so query #2 is executed, which is guaranteed to return a value.
Result
FIRST_VALUE (pac1.pac_name)
--------------------------
|Oct/2012 - Feb/2013 |
--------------------------
This is probably an easy question, but any help is appreciated.
SELECT COALESCE(
(SELECT DISTINCT
FIRST_VALUE (pac1.pac_name)
OVER (ORDER BY pac1.pac_final_date DESC)
FROM matricula mac
INNER JOIN
periodo pac1
ON mac.pac_id = pac1.pac_id
WHERE mac.ent_id = 26172 AND mac.mac_estado IN (8072, 10221) )
,(SELECT DISTINCT
FIRST_VALUE (pac1.pac_name)
OVER (ORDER BY pac1.pac_final_date DESC)
FROM registro rea
INNER JOIN
periodo pac1
ON rea.pac_id = pac1.pac_id
WHERE rea.ent_id = 26172))
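This relies on a scalar subquery that returns no rows evaluating to NULL, so COALESCE moves on to the second subquery. A tiny illustration with made-up values (the VALUES-derived tables are placeholders, and the syntax assumes SQL Server 2008 or later, or PostgreSQL):

```sql
-- First subquery: WHERE 1 = 0 simulates "no matching rows", so it yields NULL.
-- Second subquery: always returns one row, so it supplies the fallback value.
SELECT COALESCE(
    (SELECT t.pac_name
     FROM (VALUES ('value from query 1')) AS t(pac_name)
     WHERE 1 = 0),
    (SELECT t.pac_name
     FROM (VALUES ('value from query 2')) AS t(pac_name))
) AS pac_name;
```

Here the result is 'value from query 2'. Removing the WHERE 1 = 0 filter makes the first subquery win instead.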
I am not positive without test data and a few more details, but you could also use periodo as the main table and then LEFT OUTER JOIN to the other two tables. Use a CASE expression in the ORDER BY to determine which table's rows come first. A similar method with EXISTS could probably also be considered:
SELECT DISTINCT FIRST_VALUE(pac1.pac_name) OVER (ORDER BY
CASE WHEN mac.pac_id IS NOT NULL THEN 0 ELSE 1 END, pac1.pac_final_date DESC)
FROM
periodo pac1
LEFT JOIN matricula mac
ON pac1.pac_id = mac.pac_id
AND mac.ent_id = 26172
AND mac.mac_estado IN (8072, 10221)
LEFT JOIN registro rea
ON pac1.pac_id = rea.pac_id
AND rea.ent_id = 26172

SQL Group By Clause and Empty Entries

I have a SQL Server 2005 query that I'm trying to assemble right now but I am having some difficulties.
I have a group by clause based on 5 columns: Project, Area, Name, User, Engineer.
Engineer is coming from another table and is a one to many relationship
WITH TempCTE
AS (
SELECT htce.HardwareProjectID AS ProjectId
,area.AreaId AS Area
,hs.NAME AS 'Status'
,COUNT(*) AS Amount
,MAX(htce.DateEdited) AS DateModified
,UserEditing AS LastModifiedName
,Engineer
,ROW_NUMBER() OVER (
PARTITION BY htce.HardwareProjectID
,area.AreaId
,hs.NAME
,htce.UserEditing ORDER BY htce.HardwareProjectID
,Engineer DESC
) AS row
FROM HardwareTestCase_Execution AS htce
INNER JOIN HardwareTestCase AS htc ON htce.HardwareTestCaseID = htc.HardwareTestCaseID
INNER JOIN HardwareTestGroup AS htg ON htc.HardwareTestGroupID = htg.HardwareTestGroupId
INNER JOIN Block AS b ON b.BlockId = htg.BlockId
INNER JOIN Area ON b.AreaId = Area.AreaId
INNER JOIN HardwareStatus AS hs ON htce.HardwareStatusID = hs.HardwareStatusId
INNER JOIN j_Project_Testcase AS jptc ON htce.HardwareProjectID = jptc.HardwareProjectId AND htce.HardwareTestCaseID = jptc.TestcaseId
WHERE (htce.DateEdited > @LastDateModified)
GROUP BY htce.HardwareProjectID
,area.AreaId
,hs.NAME
,htce.UserEditing
,jptc.Engineer
)
The gist of what I want is to be able to deal with empty Engineer columns. I don't want this column to have a blank second entry (where row=2).
What I want to do:
Group the items with "row" value of 1 & 2 together.
Select the Engineer that isn't empty.
Do not deselect engineers where there is not a matching row=2.
I've tried a series of joins to try and make things work. No luck so far.
Use PIVOT on j_Project_Testcase, i.e. PIVOT (MAX(Engineer) FOR Row IN ([1], [2])), then select ISNULL([1], [2]) to get the Engineer value.
I can give you a more robust example if you set up a SQL fiddle
Try reading this: PIVOT and UNPIVOT
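The PIVOT suggestion could look roughly like the sketch below. It assumes empty engineers are stored as '' (hence the NULLIF), and the @CteRows table variable with its sample rows is hypothetical, standing in for the TempCTE output with only the columns needed here:

```sql
-- Hypothetical rows standing in for the TempCTE output.
DECLARE @CteRows TABLE (ProjectId int, Area int, Engineer varchar(50), [row] int);
INSERT INTO @CteRows (ProjectId, Area, Engineer, [row]) SELECT 1, 10, 'Smith', 1;
INSERT INTO @CteRows (ProjectId, Area, Engineer, [row]) SELECT 1, 10, '', 2;
INSERT INTO @CteRows (ProjectId, Area, Engineer, [row]) SELECT 2, 20, 'Jones', 1;

-- Turn row 1 / row 2 into columns [1] / [2], then keep the first non-empty one.
SELECT p.ProjectId, p.Area, ISNULL(p.[1], p.[2]) AS Engineer
FROM (
    SELECT ProjectId, Area, NULLIF(Engineer, '') AS Engineer, [row]
    FROM @CteRows
) AS src
PIVOT (MAX(Engineer) FOR [row] IN ([1], [2])) AS p;
```

Each ProjectId/Area pair collapses to one row. Pairs that only have a row 1 keep their engineer, because [2] is simply NULL for them.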

Limit join to one row

I have the following query:
SELECT sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount, 'rma' as
"creditType", "Clients"."company" as "client", "Clients".id as "ClientId", "Rmas".*
FROM "Rmas" JOIN "EsnsRmas" on("EsnsRmas"."RmaId" = "Rmas"."id")
JOIN "Esns" on ("Esns".id = "EsnsRmas"."EsnId")
JOIN "EsnsSalesOrderItems" on("EsnsSalesOrderItems"."EsnId" = "Esns"."id" )
JOIN "SalesOrderItems" on("SalesOrderItems"."id" = "EsnsSalesOrderItems"."SalesOrderItemId")
JOIN "Clients" on("Clients"."id" = "Rmas"."ClientId" )
WHERE "Rmas"."credited"=false AND "Rmas"."verifyStatus" IS NOT null
GROUP BY "Clients".id, "Rmas".id;
The problem is that the table "EsnsSalesOrderItems" can have the same EsnId in different entries. I want to restrict the query to only pull the last entry in "EsnsSalesOrderItems" that has the same "EsnId".
By "last" entry I mean the following:
The one that appears last in the table "EsnsSalesOrderItems". So for example if "EsnsSalesOrderItems" has two entries with "EsnId" = 6 and "createdAt" = '2012-06-19' and '2012-07-19' respectively it should only give me the entry from '2012-07-19'.
SELECT (count(*) * sum(s."price")) AS amount
, 'rma' AS "creditType"
, c."company" AS "client"
, c.id AS "ClientId"
, r.*
FROM "Rmas" r
JOIN "EsnsRmas" er ON er."RmaId" = r."id"
JOIN "Esns" e ON e.id = er."EsnId"
JOIN (
SELECT DISTINCT ON ("EsnId") *
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC
) es ON es."EsnId" = e."id"
JOIN "SalesOrderItems" s ON s."id" = es."SalesOrderItemId"
JOIN "Clients" c ON c."id" = r."ClientId"
WHERE r."credited" = FALSE
AND r."verifyStatus" IS NOT NULL
GROUP BY c.id, r.id;
Your query in the question has an illegal aggregate over another aggregate:
sum((select count(*) as itemCount) * "SalesOrderItems"."price") as amount
Simplified and converted to legal syntax:
(count(*) * sum(s."price")) AS amount
But do you really want to multiply with the count per group?
I retrieve the single row per group in "EsnsSalesOrderItems" with DISTINCT ON. Detailed explanation:
Select first row in each GROUP BY group?
I also added table aliases and formatting to make the query easier to parse for human eyes. If you could avoid camel case you could get rid of all the double quotes clouding the view.
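To see what DISTINCT ON does in isolation, here is a small PostgreSQL sketch. The temp table is a hypothetical miniature of "EsnsSalesOrderItems", using the column names from the question and made-up rows:

```sql
-- Hypothetical miniature of "EsnsSalesOrderItems" with made-up rows.
CREATE TEMP TABLE "EsnsSalesOrderItems" (
    "EsnId"            int,
    "SalesOrderItemId" int,
    "createdAt"        date
);
INSERT INTO "EsnsSalesOrderItems" ("EsnId", "SalesOrderItemId", "createdAt")
VALUES (6, 100, '2012-06-19'),
       (6, 101, '2012-07-19'),
       (7, 102, '2012-05-01');

-- One row per "EsnId": the first row of each group in ORDER BY order,
-- which is the most recent one because "createdAt" is sorted descending.
SELECT DISTINCT ON ("EsnId") "EsnId", "SalesOrderItemId", "createdAt"
FROM "EsnsSalesOrderItems"
ORDER BY "EsnId", "createdAt" DESC;
```

For "EsnId" 6 only the 2012-07-19 row survives, matching the example in the question. The leading ORDER BY expressions must match the DISTINCT ON expressions.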
Something like:
join (
select "EsnId",
"SalesOrderItemId",
row_number() over (partition by "EsnId" order by "createdAt" desc) as rn
from "EsnsSalesOrderItems"
) t ON t."EsnId" = "Esns"."id" and rn = 1
This will select the latest row per "EsnId" from "EsnsSalesOrderItems" based on the column "createdAt". You can use any column that allows you to define an order on the rows that suits you.
But remember that the concept of the "last row" is only valid if you specify an order for the rows. A table as such is not ordered, nor is the result of a query unless you specify an ORDER BY.
Necromancing because the answers are outdated.
Take advantage of the LATERAL keyword introduced in PG 9.3
left | right | inner JOIN LATERAL
I'll explain with an example:
Assuming you have a table "Contacts".
Now contacts have organisational units.
They can have one OU at a point in time, but N OUs at N points in time.
Now, if you have to query contacts and OU in a time period (not a reporting date, but a date range), you could N-fold increase the record count if you just did a left join.
So, to display the OU, you need to just join the first OU for each contact (where what shall be first is an arbitrary criterion - when taking the last value, for example, that is just another way of saying the first value when sorted by descending date order).
In SQL Server, you would use CROSS APPLY (or rather OUTER APPLY, since we need a left join), which evaluates the right-hand table expression (a subquery or a table-valued function) once for each row it has to join.
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
-- CROSS APPLY -- = INNER JOIN
OUTER APPLY -- = LEFT JOIN
(
SELECT TOP 1
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(@in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(@in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
) AS FirstOE
In PostgreSQL, starting from version 9.3, you can do that, too - just use the LATERAL keyword to achieve the same:
SELECT * FROM T_Contacts
--LEFT JOIN T_MAP_Contacts_Ref_OrganisationalUnit ON MAP_CTCOU_CT_UID = T_Contacts.CT_UID AND MAP_CTCOU_SoftDeleteStatus = 1
--WHERE T_MAP_Contacts_Ref_OrganisationalUnit.MAP_CTCOU_UID IS NULL -- 989
LEFT JOIN LATERAL
(
SELECT
--MAP_CTCOU_UID
MAP_CTCOU_CT_UID
,MAP_CTCOU_COU_UID
,MAP_CTCOU_DateFrom
,MAP_CTCOU_DateTo
FROM T_MAP_Contacts_Ref_OrganisationalUnit
WHERE MAP_CTCOU_SoftDeleteStatus = 1
AND MAP_CTCOU_CT_UID = T_Contacts.CT_UID
/*
AND
(
(__in_DateFrom <= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateTo)
AND
(__in_DateTo >= T_MAP_Contacts_Ref_OrganisationalUnit.MAP_KTKOE_DateFrom)
)
*/
ORDER BY MAP_CTCOU_DateFrom
LIMIT 1
) AS FirstOE
Try using a subquery in your ON clause. An abstract example:
SELECT
*
FROM table1
JOIN table2 ON table2.id = (
SELECT id FROM table2 WHERE table2.table1_id = table1.id LIMIT 1
)
WHERE
...

MSSQL Paging is returning random rows when not supposed to

I'm trying to do some basic paging in MSSQL. The problem I'm having is that I'm sorting the paging on a column that (potentially) has many identical values, and the ORDER BY clause is returning "random" results, which doesn't work well.
So for example:
If I have three rows, and I'm sorting them by a "rating", and all of the ratings are = '5', the rows will seemingly "randomly" order themselves. How do I make it so the rows show up in the same order every time?
I tried ordering by the datetime the row was last edited, but the "rating" is sorted in reverse, and again, it does not work how I expect it to.
Here is the SQL I'm using thus far. I know it's sort of confusing without the data, so I'd be grateful for any help.
SELECT * FROM
(
SELECT
CAST(grg.defaultthumbid AS VARCHAR) + '_' +
CAST(grg.garageid AS VARCHAR) AS imagename,
(
SELECT COUNT(imageid)
FROM dbo.images im (nolock)
WHERE im.garageid = grg.garageid
) AS piccount,
(
SELECT COUNT(commentid)
FROM dbo.comments cmt (nolock)
WHERE cmt.garageid = grg.garageid
) AS commentcount,
grg.GarageID, mk.make, mdl.model, grg.year,
typ.type, usr.username, grg.content,
grg.rating, grg.DateEdit as DateEdit,
ROW_NUMBER() OVER (ORDER BY Rating DESC) As RowIndex
FROM
dbo.garage grg (nolock)
LEFT JOIN dbo.users (nolock) AS usr ON (grg.userid = usr.userid)
LEFT JOIN dbo.make (nolock) AS mk ON (grg.makeid = mk.makeid)
LEFT JOIN dbo.type (nolock) AS typ ON (typ.typeid = mk.typeid)
LEFT JOIN dbo.model (nolock) AS mdl ON (grg.modelid = mdl.modelid)
WHERE
typ.type = 'Automobile' AND
grg.defaultthumbid != 0 AND
usr.username IS NOT NULL
) As QueryResults
WHERE
RowIndex BETWEEN (2 - 1) * 25 + 2 AND 2 * 25
ORDER BY
DateEdit DESC
Try ordering by both, e.g.:
ORDER BY Rating DESC, DateEdit ASC
The query first numbers the rows by [Rating], and then re-sorts the results by [DateEdit]. Possibly not what you intended. Ordering by [RowIndex] ASC should sort it out.
ROW_NUMBER() OVER (ORDER BY [Rating] DESC) As [RowIndex]
...
ORDER BY [RowIndex]
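Putting both answers together: number the rows with an ORDER BY that ends in a unique column, so ties on Rating are always broken the same way, then sort the page by that number. A minimal sketch, assuming SQL Server 2008 or later; the @Garage table variable, its rows, and the @Page/@PageSize variables are hypothetical, and the lower bound uses + 1 so that each page starts right after the previous one:

```sql
-- Hypothetical sample data: three rows tie on Rating.
DECLARE @Garage TABLE (GarageID int PRIMARY KEY, Rating int, DateEdit datetime);
INSERT INTO @Garage (GarageID, Rating, DateEdit)
VALUES (1, 5, '20120103'), (2, 5, '20120101'), (3, 5, '20120102'), (4, 4, '20120104');

DECLARE @Page int, @PageSize int;
SET @Page = 1;
SET @PageSize = 2;

SELECT GarageID, Rating, DateEdit, RowIndex
FROM (
    SELECT GarageID, Rating, DateEdit,
           -- GarageID is unique, so the numbering is fully deterministic.
           ROW_NUMBER() OVER (ORDER BY Rating DESC, DateEdit DESC, GarageID) AS RowIndex
    FROM @Garage
) AS QueryResults
WHERE RowIndex BETWEEN (@Page - 1) * @PageSize + 1 AND @Page * @PageSize
ORDER BY RowIndex;
```

With this data, page 1 always contains GarageID 1 and then GarageID 3, and page 2 contains GarageID 2 and then GarageID 4, however many times the query runs.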