Using both CROSS APPLY and INNER JOIN in the same query

I'm attempting to use both Inner Join and Cross Apply in the same query and running into difficulties.
SELECT vAH.TagName, vAH.EventStamp, -123 Value, vAH.Description,
-- Ack.DateTime, Ack.UserFullName as AckUser, Ack.Description as AckComment,
LEFT(vAH.TagName,9) + CONVERT(nvarchar(30),LLC.StartDateTime,113) as ObjName
FROM WWALMDBArchived.dbo.v_AlarmHistory vAH
--CROSS APPLY (
-- SELECT TOP 1 EventStamp as DateTime, UserFullName, Description
-- FROM WWALMDBArchived.dbo.v_AlarmHistory vAH
-- WHERE TagName = vAH.TagName
-- AND EventStamp > vAH.EventStamp
-- AND AlarmState IN ('ACK_RTN','ACK_ALM')
-- ORDER BY DateTime, UserFullName, Description DESC
-- ) Ack
INNER JOIN CPMS.dbo.LotListConfig LLC
ON vAH.EventStamp >= LLC.StartDateTime
AND vAH.EventStamp <= LLC.EndDateTime
WHERE vAH.TagName LIKE @LineNumber + '%.Action_Alarm_ALM'
AND LLC.LineNumber = @LineNumber
AND LLC.LotNumber = @LotNumber
AND vAH.AlarmState = 'UNACK_ALM'
Essentially what I am doing is getting the boundary information from the LotListConfig table, getting the initial alarm information from v_AlarmHistory, and using the Cross Apply to get some subsequent alarm information from the v_AlarmHistory table.
The query above returns the records I would expect, but uncommenting the Cross Apply causes no records to return. There's an interaction of some kind happening between the Inner Join and the Cross Apply that I'm missing.
Anyone?

Nevermind.
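My query above uses the same table alias (vAH) in both the main query and the CROSS APPLY subquery, so every vAH reference inside the subquery resolves to the inner table and EventStamp > vAH.EventStamp compares each row with itself. Removing the vAH alias inside the Cross Apply resolves the issue.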
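A minimal, self-contained T-SQL sketch of the fix, assuming a table variable in place of v_AlarmHistory (the table variable, the sample rows, and the aliases a and h are hypothetical). The only point is that the inner alias differs from the outer one, so the correlation reaches the outer row:

```sql
-- Hypothetical stand-in for v_AlarmHistory
DECLARE @AlarmHistory TABLE (TagName nvarchar(50), EventStamp datetime, AlarmState varchar(10));
INSERT INTO @AlarmHistory VALUES
    ('L1.Action_Alarm_ALM', '2020-01-01T10:00:00', 'UNACK_ALM'),
    ('L1.Action_Alarm_ALM', '2020-01-01T10:05:00', 'ACK_ALM');

SELECT a.TagName, a.EventStamp, Ack.AckTime
FROM @AlarmHistory a
CROSS APPLY (
    -- h is a different alias from a, so a.* still refers to the outer row
    SELECT TOP 1 h.EventStamp AS AckTime
    FROM @AlarmHistory h
    WHERE h.TagName = a.TagName
      AND h.EventStamp > a.EventStamp
      AND h.AlarmState IN ('ACK_RTN', 'ACK_ALM')
    ORDER BY h.EventStamp
) Ack
WHERE a.AlarmState = 'UNACK_ALM';
```

With both aliases set to the same name, the comparison h.EventStamp > a.EventStamp would turn into a row compared with itself, the subquery would be empty, and CROSS APPLY would drop every outer row.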

Related

Improve the speed of this string_agg?

I have data of the following shape:
BOM -- 500 rows, 4 cols
PartProject -- 2.6mm rows, 4 cols
Project -- 1000 rows, 5 cols
Part -- 200k rows, 18 cols
Yet when I try to do string_agg, my code takes well over 10 minutes to execute on 500 rows. How can I improve this query? (The data is not available.)
select
BOM.*,
childParentPartProjectName
into #tt2 -- tt for some testing
from #tt1 AS BOM -- tt for some testing
-- cross applys for string agg many to one
CROSS APPLY (
SELECT childParentPartProjectName = STRING_AGG(PROJECT_childParentPart.NAME, ', ') WITHIN GROUP (ORDER BY PROJECT_childParentPart.NAME)
FROM (
SELECT DISTINCT PROJECT3.NAME
FROM [dbo].[Project] PROJECT3
LEFT JOIN [dbo].[Part] P3 on P3.ITEM_NUMBER = BOM.childParentPart
LEFT JOIN [dbo].[PartProject] PP3 on PP3.SOURCE_ID = P3.ID
WHERE PP3.RELATED_ID = PROJECT3.ID and P3.CURRENT = 1
) PROJECT_childParentPart ) PROJECT3
The subquery (within a subquery) you have has a code "smell" to it: it looks like it was written with intention, but not correctly.
Firstly, you have 2 LEFT JOINs in the subquery; however, the WHERE clause requires columns from both of the tables aliased as P3 and PP3 to have a non-NULL value, which is impossible if no related row is found. This means the JOINs are implicit INNER JOINs.
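To illustrate the point about implicit INNER JOINs, here is a small T-SQL sketch using hypothetical table variables and sample rows (not the asker's data). The second query filters on the outer-joined table in the WHERE clause, which discards the NULL-extended row:

```sql
-- Hypothetical sample data
DECLARE @Project TABLE (ID int, NAME varchar(20));
DECLARE @PartProject TABLE (SOURCE_ID int, RELATED_ID int);
INSERT INTO @Project VALUES (1, 'Alpha'), (2, 'Beta');
INSERT INTO @PartProject VALUES (10, 1);

-- A true LEFT JOIN: 'Beta' is kept, with SOURCE_ID = NULL
SELECT p.NAME, pp.SOURCE_ID
FROM @Project p
LEFT JOIN @PartProject pp ON pp.RELATED_ID = p.ID;

-- The WHERE predicate on pp rejects NULLs, so 'Beta' disappears
-- and the result is the same as with an INNER JOIN
SELECT p.NAME, pp.SOURCE_ID
FROM @Project p
LEFT JOIN @PartProject pp ON pp.RELATED_ID = p.ID
WHERE pp.SOURCE_ID = 10;
```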
Next, you have a DISTINCT against a single column when SELECTing from multiple tables; this seems wrong. DISTINCT is very expensive, and the fact that you are using it implies that either NAME is not unique or that, due to your implicit INNER JOINs, you are getting duplicate rows. I assume it's the latter. As a result, you should very likely be using an EXISTS, not INNER JOINs written as LEFT JOINs.
The following is very much a guess, but I suspect it will be more performant.
SELECT BOM.*, --Replace this with an explicit list of the columns you need
SA.childParentPartProjectName
INTO #tt2
FROM #tt1 BOM
CROSS APPLY (SELECT STRING_AGG(Prj.NAME, ', ') WITHIN GROUP (ORDER BY Prj.NAME) AS childParentPartProjectName
FROM dbo.Project Prj --Don't use an alias that is longer than the object name
WHERE EXISTS (SELECT 1
FROM dbo.Part P
JOIN dbo.PartProject PP ON P.ID = PP.SOURCE_ID
WHERE PP.Related_ID = Prj.ID
AND P.ITEM_NUMBER = BOM.childParentPart
AND P.Current = 1)) SA;

SQL query: Iterate over values in table and use them in subquery

I have a simple SQL table containing some values, for example:
id | value (table 'values')
----------
0 | 4
1 | 7
2 | 9
I want to iterate over these values, and use them in a query like so:
SELECT value[0], x1
FROM (some subquery where value[0] is used)
UNION
SELECT value[1], x2
FROM (some subquery where value[1] is used)
...
etc
In order to get a result set like this:
4 | x1
7 | x2
9 | x3
It has to be in SQL as it will actually represent a database view. Of course the real query is a lot more complicated, but I tried to simplify the question while keeping the essence as much as possible.
I think I have to select from values and join the subquery, but as the value should be used in the subquery I'm lost on how to accomplish this.
Edit: I oversimplified my question; in reality I want to have 2 rows from the subquery and not only one.
Edit 2: As suggested I'm posting the real query. I simplified it a bit to make it clearer, but it's a working query and the problem is there. Note that I have hardcoded the value '2' in this query two times. I want to replace that with values from a different table, in the example table above I would want a result set of the combined results of this query with 4, 7 and 9 as values instead of the currently hardcoded 2.
SELECT x.fantasycoach_id, SUM(round_points)
FROM (
SELECT DISTINCT fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
LEFT JOIN fantasyworld_fantasyformation AS ff ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
AND _rr.sequence <= 2 /* HARDCODED USE OF VALUE */
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
AND fpc.round_sequence = 2 /* HARDCODED USE OF VALUE */
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Edit 3: I'm using PostgreSQL.
SQL works with tables as a whole, which basically involves set operations. There is no explicit iteration, and generally no need for any. In particular, the most straightforward implementation of what you described would be this:
SELECT value, (some subquery where value is used) AS x
FROM values
Do note, however, that a correlated subquery such as that is very hard on query performance. Depending on the details of what you're trying to do, it may well be possible to structure it around a simple join, an uncorrelated subquery, or a similar, better-performing alternative.
Update:
In view of the update to the question indicating that the subquery is expected to yield multiple rows for each value in table values, contrary to the example results, it seems a better approach would be to just rewrite the subquery as the main query. If it does not already do so (and maybe even if it does) then it would join table values as another base table.
Update 2:
Given the real query now presented, this is how the values from table values could be incorporated into it:
SELECT x.fantasycoach_id, SUM(round_points) FROM
(
SELECT DISTINCT
fc.id AS fantasycoach_id,
ffv.formation_id AS formation_id,
fpc.round_sequence AS round_sequence,
round_points,
fpc.fantasyplayer_id
FROM fantasyworld_FantasyCoach AS fc
-- one row for each combination of coach and value:
CROSS JOIN values
LEFT JOIN fantasyworld_fantasyformation AS ff
ON ff.id = (
SELECT MAX(fantasyworld_fantasyformationvalidity.formation_id)
FROM fantasyworld_fantasyformationvalidity
LEFT JOIN realworld_round AS _rr
ON _rr.id = round_id
LEFT JOIN fantasyworld_fantasyformation AS _ff
ON _ff.id = formation_id
WHERE is_valid = TRUE
AND _ff.coach_id = fc.id
-- use the value obtained from values:
AND _rr.sequence <= values.value
)
LEFT JOIN fantasyworld_FantasyFormationPlayer AS ffp
ON ffp.formation_id = ff.id
LEFT JOIN dbcache_fantasyplayercache AS fpc
ON ffp.player_id = fpc.fantasyplayer_id
-- use the value obtained from values again:
AND fpc.round_sequence = values.value
LEFT JOIN fantasyworld_fantasyformationvalidity AS ffv
ON ffv.formation_id = ff.id
) x
GROUP BY fantasycoach_id
Note in particular the CROSS JOIN, which forms the cross product of two tables; this is the same thing as an INNER JOIN whose join condition is always true (in PostgreSQL, INNER JOIN ... ON TRUE), and it can be written that way if desired.
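A small PostgreSQL sketch of that equivalence, using hypothetical temp tables (coach and round_value are illustration names only; round_value is used here because VALUES is a reserved word in PostgreSQL and would need quoting as a table name):

```sql
-- Hypothetical sample tables
CREATE TEMP TABLE coach (id int);
CREATE TEMP TABLE round_value (value int);
INSERT INTO coach VALUES (1), (2);
INSERT INTO round_value VALUES (4), (7), (9);

-- Cross product: every coach paired with every value (2 x 3 = 6 rows)
SELECT c.id, v.value
FROM coach c
CROSS JOIN round_value v;

-- Same result, written as an INNER JOIN with an always-true condition
SELECT c.id, v.value
FROM coach c
INNER JOIN round_value v ON TRUE;
```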
The overall query could be at least a bit simplified, but I do not do so because it is a working example rather than an actual production query, so it is unclear what other changes would translate to the actual application.
In the example I create two tables. Notice how the outer table has an alias that you use in the inner select.
SQL Fiddle Demo
SELECT T.[value], (SELECT [property] FROM Table2 P WHERE P.[value] = T.[value])
FROM Table1 T
This is a better way for performance
SELECT T.[value], P.[property]
FROM Table1 T
INNER JOIN Table2 p
on P.[value] = T.[value];
Table 2 can be a QUERY instead of a real table
Third Option
Using a cte to calculate your values and then join back to the main table. This way you have the subquery logic separated from your final query.
WITH cte AS (
SELECT
T.[value],
T.[value] * T.[value] as property
FROM Table1 T
)
SELECT T.[value], C.[property]
FROM Table1 T
INNER JOIN cte C
on T.[value] = C.[value];
It might be helpful to extract the computation to a function that is called in the SELECT clause and is executed for each row of the result set.
Here's the documentation for CREATE FUNCTION for SQL Server. It's probably similar to whatever database system you're using, and if not you can easily Google for it.
Here's an example of creating a function and using it in a query:
CREATE FUNCTION dbo.DoComputation(@parameter1 int)
RETURNS int
AS
BEGIN
    -- Do some calculations here and return the function result.
    -- This example returns the value of @parameter1 squared.
    -- You can add additional parameters to the function definition if needed
    DECLARE @Result int
    SET @Result = @parameter1 * @parameter1
    RETURN @Result
END
Here is an example of using the example function above in a query. In SQL Server a scalar function must be called with its schema name (dbo here).
SELECT v.value, dbo.DoComputation(v.value) as ComputedValue
FROM [Values] v
ORDER BY v.value

Performance tuning of row-based subqueries: LEFT OUTER JOIN and OUTER APPLY, alternatives?

The performance of a certain query (on a Dynamics CRM 2011 database) was abysmal. Since it is a normalized datamodel, but a flattened view on this data (an SSRS report) is required, I did a lot (12) of LEFT OUTER JOINs with a SELECT TOP (1) subquery, e.g.:
LEFT JOIN Filterednew_rates FRates ON FRates.new_ratesid =
(SELECT TOP (1)
FRR.new_ratesid
FROM Filterednew_rates FRR
WHERE
FRR.new_contractid = FContract.contractid
AND FRR.statuscode <> 803270000 -- NOT Obsolete
ORDER BY FRR.new_startdate DESC
)
This worked for a small number of result rows (like 10 seconds for 3 rows), but I've had it run for 45 minutes on about 100 expected result rows (the amount of source data is the same, just different WHERE clause). So I started looking for ways to "force" SQL Server to run the subqueries per row (since logically to me, that would scale linearly).
Then I read The power of T-SQL's APPLY operator and managed to change the above to
OUTER APPLY (
SELECT TOP (1)
FRR.*
FROM Filterednew_rates FRR
WHERE
FRR.new_contractid = FContract.contractid
AND FRR.statuscode <> 803270000 -- NOT Obsolete
ORDER BY FRR.new_startdate DESC
) AS FRates
Which made the execution time scale about linearly with the number of result records (about 3:30 minutes for 100 rows, still about 6 seconds for 3 rows). Somehow this made SQL Server decide to change the query execution plan for the better!
Is there any other way in SQL to "flatten" a normalized datamodel without resorting to Integration/Analysis Services?
EDIT:
Thanks for the input @Aaron and @BAReese. I'll try to apply PIVOT/UNPIVOT and the Windowing Functions and report back on query performance differences.
And by popular request, a larger part of the query. I've tried to "anonymize" the query a bit, so the actual query properties are more descriptive.
OUTER APPLY (
SELECT TOP (1)
FCO.*
FROM Filterednew_contractoption FCO
WHERE
FCO.new_contractid = FContract.contractid
AND FCO.new_included = 1 -- Is Included
AND FCO.new_optionidname = 'SomeOption1'
) AS FOptionSomeOption1
OUTER APPLY (
SELECT TOP (1)
FCO.*
FROM Filterednew_contractoption FCO
WHERE
FCO.new_contractid = FContract.contractid
AND FCO.new_included = 1 -- Is Included
AND FCO.new_optionidname = 'SomeOption2'
) AS FOptionSomeOption2
OUTER APPLY (
SELECT TOP (1)
FCD.*
FROM FilteredContractDetail FCD
JOIN FilteredProduct FProd ON FCD.productid = FProd.productid
WHERE
FContract.contractid = FCD.contractid
AND FCD.new_included = 1 -- Is Included
AND FProd.productnumber IN ('COLDEL1', 'COLDEL2', 'COLDEL3', 'COLDEL4')
) AS FColDelContractDetail
LEFT JOIN FilteredProduct FColDelProduct ON FColDelContractDetail.productid = FColDelProduct.productid
OUTER APPLY (
SELECT TOP (1)
FCO.*
FROM Filterednew_contractoption FCO
JOIN Filterednew_contractdetail_new_contractoptions FCD_CO ON FCO.new_contractoptionid = FCD_CO.new_contractoptionid
WHERE
FCD_CO.contractdetailid = FColDelContractDetail.contractdetailid
AND FCO.new_included = 1 -- Is Included
AND FCO.new_optionidname LIKE 'Input1'
) AS FColDelInput1Option
OUTER APPLY (
SELECT TOP (1)
FCO.*
FROM Filterednew_contractoption FCO
JOIN Filterednew_contractdetail_new_contractoptions FCD_CO ON FCO.new_contractoptionid = FCD_CO.new_contractoptionid
WHERE
FCD_CO.contractdetailid = FColDelContractDetail.contractdetailid
AND FCO.new_included = 1 -- Is Included
AND FCO.new_optionidname LIKE 'Input2'
) AS FColDelInput2Option
OUTER APPLY (
SELECT TOP (1)
FCO.*
FROM Filterednew_contractoption FCO
JOIN Filterednew_contractdetail_new_contractoptions FCD_CO ON FCO.new_contractoptionid = FCD_CO.new_contractoptionid
WHERE
FCD_CO.contractdetailid = FColDelContractDetail.contractdetailid
AND FCO.new_included = 1 -- Is Included
AND FCO.new_optionidname LIKE 'Input3'
) AS FColDelInput3Option
OUTER APPLY (
SELECT TOP (1)
FCP.*
FROM Filterednew_price FCP
WHERE FCP.new_contractid = FContract.contractid
AND FCP.statuscode <> 803270000 -- NOT Obsolete
ORDER BY FCP.new_validfrom DESC
) AS FPrice
OUTER APPLY (
SELECT TOP (1)
FCFR.*
FROM Filterednew_contractforecastresult FCFR
WHERE FCFR.new_contractid = FContract.contractid
ORDER BY FCFR.createdon DESC
) AS FForecastResult
Since you're using SQL Server, this would be an excellent opportunity to use windowing functions to improve efficiency.
Something like this might help it run quicker:
LEFT JOIN
(
SELECT FRR.new_contractid, ROW_NUMBER() over(partition by FRR.new_contractid
order by FRR.new_startdate DESC) as Last_ID
FROM Filterednew_rates as FRR
WHERE FRR.statuscode <> 803270000 -- NOT Obsolete
) AS FRates
ON FRates.new_contractid = FContract.contractid
and FRates.Last_ID = 1
What this should do is allow the derived table to produce a list of all contractids but give a priority list. In theory, it will be easier on the server and you won't be hitting the table more times than necessary. Another thing you can do is add SET STATISTICS IO ON and SET STATISTICS TIME ON to the top of your query (assuming you're testing this in SQL Server Management Studio). If in SSMS, you'll get a log on the [Messages] tab telling what the logical/physical read count of each table is, as well as the amount of time spent querying.
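As a rough sketch of the measurement step described above, the two settings can wrap whatever query is being tuned. The SELECT against sys.objects below is only a placeholder, not the report query:

```sql
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Placeholder: put the query under test here
SELECT COUNT(*) FROM sys.objects;

-- Switch the output off again for the rest of the session
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Comparing the logical reads per table before and after a rewrite tends to be more reliable than comparing elapsed time alone, since elapsed time also depends on caching and server load.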

Providing Language FallBack In A SQL Select Statement

I have a table that represents an Object. It has many columns but also fields that require language support.
For simplicity let's say I have 3 tables:
MainObjectTable
LanguageDependantField1
LanguageDependantField2.
MainObjectTable has a PK int called ID, and both LanguageDependantTables have a foreign key link back to the MainObjectTable along with a language code and the date they were added.
I've created a stored procedure that accepts the MainObjectTable ID and a Language. It will return a single row containing the most recent items from the language tables. The select statement looks like
SELECT
MainObjectTable.VariousColumns,
LanguageDependantField1.Description,
LanguageDependantField2.SomeOtherText
FROM
MainObjectTable
OUTER APPLY
(SELECT TOP 1 LanguageDependantField1.Description
FROM LanguageDependantField1
WHERE LanguageDependantField1.MainObjectTable_ID = MainObjectTable.ID
AND LanguageDependantField1.Language_ID = @language
ORDER BY
LanguageDependantField1.[Default], LanguageDependantField1.CreatedDate DESC) LanguageDependantField1
OUTER APPLY
(SELECT TOP 1 LanguageDependantField2.SomeOtherText
FROM LanguageDependantField2
WHERE LanguageDependantField2.MainObjectTable_ID = MainObjectTable.ID
AND LanguageDependantField2.Language_ID = @language
ORDER BY
LanguageDependantField2.[Default] DESC, LanguageDependantField2.CreatedDate DESC) LanguageDependantField2
WHERE
MainObjectTable.ID = @MainObjectTableID
What I want to add is the ability to fall back to a default language if a row isn't found in the specified language. Let's say we use "German" as the selected language. Is it possible to return an English row from LanguageDependantField1 if the German one does not exist, presuming we have @fallbackLanguageID?
Also am I right to use OUTER APPLY in this scenario or should I be using JOIN?
Many thanks for your help.
Try this:
SELECT MainObjectTable.VariousColumns,
       COALESCE(PrefLang.Description, Fallback.Description, 'Not Found Desc') AS Description,
       COALESCE(PrefLang.SomeOtherText, Fallback.SomeOtherText, 'Not found') AS SomeOtherText
FROM MainObjectTable
OUTER APPLY
    (SELECT TOP 1 pl.Description, pl.SomeOtherText
     FROM LanguageDependantField1 pl
     WHERE pl.MainObjectTable_ID = MainObjectTable.ID
       AND pl.Language_ID = @language
     ORDER BY pl.[Default], pl.CreatedDate DESC) PrefLang
OUTER APPLY
    (SELECT TOP 1 fb.Description, fb.SomeOtherText
     FROM LanguageDependantField1 fb
     WHERE fb.MainObjectTable_ID = MainObjectTable.ID
       AND fb.Language_ID = @fallbackLanguageID
     ORDER BY fb.[Default], fb.CreatedDate DESC) Fallback
WHERE MainObjectTable.ID = @MainObjectTableID
Basically, make two queries, one for the preferred language and one for the fallback language (English, the default). Both are attached with OUTER APPLY, because a derived table in a plain LEFT JOIN cannot reference MainObjectTable; if the first one finds no row, COALESCE uses the result of the second...
I don't have your actual tables, so there might be a syntax error in above, but hope it gives you the concept you want to try...
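As an alternative sketch (not the approach above), the fallback can also be folded into a single OUTER APPLY by allowing both languages and ordering the preferred one first. The table variables, sample rows, and language IDs below are hypothetical stand-ins for the asker's tables:

```sql
DECLARE @language int = 2, @fallbackLanguageID int = 1;

-- Hypothetical stand-ins for MainObjectTable and LanguageDependantField1
DECLARE @Main TABLE (ID int);
DECLARE @Field1 TABLE (MainObjectTable_ID int, Language_ID int,
                       Description nvarchar(50), CreatedDate datetime);
INSERT INTO @Main VALUES (1), (2);
INSERT INTO @Field1 VALUES
    (1, 1, N'English text', '2020-01-01'),
    (1, 2, N'German text',  '2020-01-02'),
    (2, 1, N'English only', '2020-01-01');

SELECT m.ID, f.Description
FROM @Main m
OUTER APPLY (
    SELECT TOP 1 t.Description
    FROM @Field1 t
    WHERE t.MainObjectTable_ID = m.ID
      AND t.Language_ID IN (@language, @fallbackLanguageID)
    -- preferred language sorts first, then the most recent row
    ORDER BY CASE WHEN t.Language_ID = @language THEN 0 ELSE 1 END,
             t.CreatedDate DESC
) f;
```

Here object 1 gets its German row and object 2, which has no German row, falls back to the English one. The trade-off is that the fallback is chosen per row rather than per column, which may or may not be what is wanted.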
Yes, the use of Outer Apply is correct if you want to correlate the MainObjectTable table rows to the inner queries. You cannot use Joins with references in the derived table to the outer table. If you wanted to use Joins, you would need to include the joining column(s) and in this case pre-filter the results. Here is what that might look like:
With RankedLanguages As
(
Select LDF1.MainObjectTable_ID, LDF1.Language_ID, LDF1.Description, LDF1.SomeOtherText, ...
, Row_Number() Over ( Partition By LDF1.MainObjectTable_ID, LDF1.Language_ID
Order By LDF1.[Default] Desc, LDF1.CreatedDate Desc ) As Rnk
From LanguageDependantField1 As LDF1
Where LDF1.Language_ID In( @languageId, @defaultLanguageId )
)
Select M.VariousColumns
, Coalesce( SpecificLDF.Description, DefaultLDF.Description ) As Description
, Coalesce( SpecificLDF.SomeOtherText, DefaultLDF.SomeOtherText ) As SomeOtherText
, ...
From MainObjectTable As M
Left Join RankedLanguages As SpecificLDF
On SpecificLDF.MainObjectTable_ID = M.ID
And SpecificLDF.Language_ID = @languageId
And SpecificLDF.Rnk = 1
Left Join RankedLanguages As DefaultLDF
On DefaultLDF.MainObjectTable_ID = M.ID
And DefaultLDF.Language_ID = @defaultLanguageId
And DefaultLDF.Rnk = 1
Where M.ID = @MainObjectTableID

Filtering SQL query by unique id and earliest dates that are in the future

I have this query that returns the correct data, but I would like to filter it.
SELECT TOP (100) PERCENT dbo.Reg_Master.id, dbo.Cart_Programs.cartid, dbo.Reg_Master.F_ID, dbo.BlockPeriod.profileid, dbo.Reg_Master.FirstName,
dbo.Reg_Master.LastName, dbo.BlockPeriod.startdate, dbo.Cart_Programs.blockid
FROM dbo.Cart_Programs LEFT OUTER JOIN
dbo.Reg_Master ON dbo.Cart_Programs.cartid = dbo.Reg_Master.cartid LEFT OUTER JOIN
dbo.BlockPeriod ON dbo.Cart_Programs.blockid = dbo.BlockPeriod.id
WHERE (dbo.BlockPeriod.profileid = xxx) AND (dbo.Reg_Master.F_ID = xxxx)
ORDER BY dbo.BlockPeriod.startdate
For each dbo.Reg_Master.id, I would like to return only the earliest dbo.BlockPeriod.startdate that is today or later (in other words, ignoring dates that have already passed). I cannot seem to get it formatted correctly.
First of all, TOP 100 PERCENT does nothing; the optimizer will just ignore it.
Also, your left joins do not serve any purpose because of your WHERE condition, so I have edited the SQL to use an inner join + cross apply instead of an outer join + outer apply.
If I understand you correctly, for each Reg_Master record you want at most 1 record from BlockPeriod: the one with the earliest start date that has not already passed.
If so, then what you are looking for is an APPLY table operator combined with TOP (1) as shown below:
UPDATED:
SELECT Reg_Master.id,
Cart_Programs.cartid,
Reg_Master.F_ID,
T.profileid,
Reg_Master.FirstName,
Reg_Master.LastName,
T.startdate,
Cart_Programs.blockid
FROM Cart_Programs
JOIN Reg_Master ON Cart_Programs.cartid = Reg_Master.cartid
CROSS APPLY(
SELECT TOP 1 * FROM BlockPeriod
WHERE BlockPeriod.id = Cart_Programs.blockid
AND BlockPeriod.profileid = xxx AND Reg_Master.F_ID = xxxx
AND BlockPeriod.startdate >= GETDATE()
ORDER BY BlockPeriod.startdate ASC
) AS T