I have a query in SQL Server that I'm trying to migrate to redshift. It has OUTER APPLY in it but Redshift doesn't support it. How can I convert it to left join so that I can use it in Redshift?
....
OUTER APPLY
(
SELECT TOP 1 fel.*
FROM fact.FactEventLog fel
WHERE fpt.ParcelProfileKey = fel.ParcelProfileKey
AND
fpt.LastEventKey = fel.EventLegKey
AND
FPT.DateLastEvent = fel.EventDateTimeUTC
) fel
....
Something answered in this stackoverflow or this answer
Maybe.... someting like...
....
LEFT JOIN
(
SELECT fel.*, row_number() over (partition by ParcelProfileKey, EventLegKey, EventDateTimeUTC order by null) RN
FROM fact.FactEventLog fel
) fel
on fpt.ParcelProfileKey = fel.ParcelProfileKey
AND fpt.LastEventKey = fel.EventLegKey
AND fpt.DateLastEvent = fel.EventDateTimeUTC
AND 1=fel.RN
....
but it just seems so wrong w/o an order by actually defined in the window function. it's like you don't care what random result is returned just so long as 1 exists... but then why not use an exists.... shrug
Related
I am dont know to construct a query with Cross Apply and SqlKata. I was searching the net and I find out that cross apply was not supported by SqlKata. Is there any other way to achieve my requirements.
var query = new Query("Test as t")
cross apply
(select top 1 t2.TestID from Test as t2 where t1.LegID = t2.LegID order by t2.Sequence desc)
This is the sql query
select * from Test1 t1
cross apply
(select top 1 s.OperationalStopID
from Test2 t2
where t2.LegID = t1.LegID
order by t2.SequenceNumber desc) t3
Just use the Join method and pass "CROSS APPLY" or whatever method you like in the last parameter
check this example on SqlKata Playground
var another = new Query("Test2 as t2")
.WhereColumns("t2.LegID", "=", "t1.LegID")
.Select("s.OperationalStopID")
.OrderByDesc("t2.SequenceNumber")
.Limit(1);
var query = new Query("Test1 as t1").Join(another.As("t3"), j => j, "CROSS APPLY"); // <---
I am not familiar with sqlkata, but you can rewrite your SQL query to an inner join, like:
select DISTINCT
t1.I,
t1.LegID,
x.OperationalStopID
from test1 t1
inner join (select LegID, MAX(OperationalStopID) as OperationalStopID
from test2
group by LegID) x on x.LegID = t1.LegID
I hope you are able to convert this SQL query to sqlkata syntax ?
This query is tested here: DBFIDDLE
P.S.: oops, the DISTINCT should not have been in there anymore, please remove it.
I have 3 three scenarios
where there is no email - returns me empty
where there are multiple emails with one primary email - should return me primary
where there is multiple emails but no primary - should return me the first one
Here is my query I am trying
select *
from departs
left outer join answers on answers.fkdepartid =departs.departID
inner join emails on emails.userid = departs.userID
and emails.primary = 1
where departs.departid = 100
where I can add the logic of the above
You can do this with the row_number() windowing function:
SELECT *
FROM (
SELECT *,
row_number() over (
partition by d.userID
order by case when e.primary = 1 then 0 else 1 end
) rn
FROM departs d
LEFT JOIN answers a on a.fkdepartid = d.departID
INNER JOIN emails e ON e.userid = d.userID
WHERE d.departid = 100
) t
WHERE t.rn = 1
An APPLY lateral join will also work:
SELECT *
FROM departs d
LEFT JOIN answers a on a.fkdepartid = d.departID
CROSS APPLY (
SELECT TOP 1 *
FROM emails e0
WHERE e0.userid = d.userID
ORDER BY case when e0.primary = 1 then 0 else 1 end
) e
WHERE d.departid = 100
APPLY tends to be slower, but I often find it easier to reason about, which can sometimes matter more than the raw performance.
Note both of these options are NON-DETERMINISTIC, which means they could return different results from one moment to the next. This is because there is not a complete definition of "first" for which email to use if none is primary. In database circles, non-deterministic queries and requirements are frowned upon and something to avoid.
To fix this (because it really is broken until you do) to be fully deterministic, add criteria to the ORDER BY clause to further define which email address you want.
I come from SQL Server and I am migrating some T-SQL code to Postgres.
In PostgreSQL, I now have this UPDATE statement (see below).
In there:
"#reportdata" is a temporary table
kwt.Report is a normal table
This part in the WHERE clause is doing an implicit JOIN. I think that's how they call it in Postgres.
(cr.campaignid = rp.campaignid AND cr.reportdate = rp.reportdate)
That is because this couple (campaignid, reportdate) represents a unique logical key in kwt.Report. Also, both columns are not nullable in kwt.Report.
In "#reportdata" both columns can be NULL.
My question is: when I see such an implicit join in an UPDATE statement, I am somehow always not quite sure if it's INNER or OUTER join. I think it's INNER, there's no way this to be OUTER but I just want to be sure.
Could someone please confirm?
I mean, OK, if rp.campaignid is NULL there's no way this condition to evaluate to true, right?
(cr.campaignid = rp.campaignid AND cr.reportdate = rp.reportdate)
I am asking this, because I am not sure if comparison with NULL works the same way in Postgres as in SQL Server. As far as I recall, in SQL Server NULL = a always evaluates to NULL (not to true (bit 0), not to false (bit 1) but to NULL). Please correct me if this understanding is not right. Is this the same in Postgres?
UPDATE kwt.Report cr
SET
impressions = rp.impressions,
clicks = rp.clicks,
views = rp.views
FROM
"#reportdata" AS rp
WHERE
(cr.campaignid = rp.campaignid AND cr.reportdate = rp.reportdate)
AND (rp.campaignid IS NOT NULL);
In SQL:
A = null is neither true nor false
Check this
with cte0 as
(
select '1' as c
), cte1 as
(
select null as c
)
select * from cte0
inner join cte1 on cte0.c = cte1.c
union
select * from cte0
inner join cte1 on cte0.c != cte1.c
c | c
:- | :-
db<>fiddle here
The performance of a certain query (on a Dynamics CRM 2011 database) was abysmal. Since it is a normalized datamodel, but a flattened view on this data (an SSRS report) is required, I did a lot (12) of LEFT OUTER JOINs with a SELECT TOP (1) subquery, e.g.:
LEFT JOIN Filterednew_rates FRates ON FRates.new_ratesid =
(SELECT TOP (1)
FRR.new_ratesid
FROM Filterednew_rates FRR
WHERE
FRR.new_contractid = FContract.contractid
AND FRR.statuscode <> 803270000 -- NOT Obsolete
ORDER BY FRR.new_startdate DESC
)
This worked for a small number of result rows (like 10 seconds for 3 rows), but I've had it run for 45 minutes on about 100 expected result rows (the amount of source data is the same, just different WHERE clause). So I started looking for ways to "force" SQL Server to run the subqueries per row (since logically to me, that would scale linearly).
Then I read The power of T-SQL's APPLY operator and managed to change the above to
OUTER APPLY (
SELECT TOP (1)
FRR.*
FROM Filterednew_rates FRR
WHERE
FRR.new_contractid = FContract.contractid
AND FRR.statuscode <> 803270000 -- NOT Obsolete
ORDER BY FRR.new_startdate DESC
) AS FRates
Which made the execution time scale about linearly with the number of result records (about 3:30 minutes for 100 rows, still about 6 seconds for 3 rows). Somehow this made SQL Server decide to change the query execution plan for the better!
Is there any other way in SQL to "flatten" a normalized datamodel without resorting to Integration/Analysis Services?
EDIT:
Thanks for the input #Aaron and #BAReese. I'll try to apply PIVOT/UNPIVOT and the Windowing Functions and report back on query performance differences.
And by popular request, a larger part of the query. I've tried to "anonymize" the query a bit, so the actual query properties are more descriptive.
OUTER APPLY (
SELECT TOP (1)
FCO.*
FROM Filterednew_contractoption FCO
WHERE
FCO.new_contractid = FContract.contractid
AND FCO.new_included = 1 -- Is Included
AND FCO.new_optionidname = 'SomeOption1'
) AS FOptionSomeOption1
OUTER APPLY (
SELECT TOP (1)
FCO.*
FROM Filterednew_contractoption FCO
WHERE
FCO.new_contractid = FContract.contractid
AND FCO.new_included = 1 -- Is Included
AND FCO.new_optionidname = 'SomeOption2'
) AS FOptionSomeOption2
OUTER APPLY (
SELECT TOP (1)
FCD.*
FROM FilteredContractDetail FCD
JOIN FilteredProduct FProd ON FCD.productid = FProd.productid
WHERE
FContract.contractid = FCD.contractid
AND FCD.new_included = 1 -- Is Included
AND FProd.productnumber IN ('COLDEL1', 'COLDEL2', 'COLDEL3', 'COLDEL4')
) AS FColDelContractDetail
LEFT JOIN FilteredProduct FColDelProduct ON FColDelContractDetail.productid = FColDelProduct.productid
OUTER APPLY (
SELECT TOP (1)
FCO.*
FROM Filterednew_contractoption FCO
JOIN Filterednew_contractdetail_new_contractoptions FCD_CO ON FCO.new_contractoptionid = FCD_CO.new_contractoptionid
WHERE
FCD_CO.contractdetailid = FColDelContractDetail.contractdetailid
AND FCO.new_included = 1 -- Is Included
AND FCO.new_optionidname LIKE 'Input1'
) AS FColDelInput1Option
OUTER APPLY (
SELECT TOP (1)
FCO.*
FROM Filterednew_contractoption FCO
JOIN Filterednew_contractdetail_new_contractoptions FCD_CO ON FCO.new_contractoptionid = FCD_CO.new_contractoptionid
WHERE
FCD_CO.contractdetailid = FColDelContractDetail.contractdetailid
AND FCO.new_included = 1 -- Is Included
AND FCO.new_optionidname LIKE 'Input2'
) AS FColDelInput2Option
OUTER APPLY (
SELECT TOP (1)
FCO.*
FROM Filterednew_contractoption FCO
JOIN Filterednew_contractdetail_new_contractoptions FCD_CO ON FCO.new_contractoptionid = FCD_CO.new_contractoptionid
WHERE
FCD_CO.contractdetailid = FColDelContractDetail.contractdetailid
AND FCO.new_included = 1 -- Is Included
AND FCO.new_optionidname LIKE 'Input3'
) AS FColDelInput3Option
OUTER APPLY (
SELECT TOP (1)
FCP.*
FROM Filterednew_price FCP
WHERE FCP.new_contractid = FContract.contractid
AND FCP.statuscode <> 803270000 -- NOT Obsolete
ORDER BY FCP.new_validfrom DESC
) AS FPrice
OUTER APPLY (
SELECT TOP (1)
FCFR.*
FROM Filterednew_contractforecastresult FCFR
WHERE FCFR.new_contractid = FContract.contractid
ORDER BY FCFR.createdon DESC
) AS FForecastResult
Since you're using SQL Server, this would be an excellent opportunity to use windowing functions to improve efficiency.
something like this might help it run quicker:
LEFT JOIN
(
SELECT FRR.new_contractid, ROW_NUMBER() over(partition by FRR.new_contractid
order by FRR.new_startdate DESC) as Last_ID
FROM Filterednew_rates as FRR
WHERE FRR.statuscode <> 803270000 -- NOT Obsolete
) AS FRates
ON FRates.new_contractid = FContract.contractid
and FRates.Last_ID = 1
What this should do is allow the derived table to produce a list of all contractids but give a priority list. In theory, it will be easier on the server and you won't be hitting the table more times than necessary. Another thing you can do is add SET STATISTICS IO ON and SET STATISTICS TIME ON to the top of your query (assuming you're testing this in SQL Server Management Studio). If in SSMS, you'll get a log on the [Messages] tab telling what the logical/physical read count of each table is, as well as the amount of time spent querying.
I have a table that represents an Object. It has many columns but also fields that require language support.
For simplicity let's say I have 3 tables:
MainObjectTable
LanguageDependantField1
LanguageDependantField2.
MainObjectTable has a PK int called ID, and both LanguageDependantTables have a foreign key link back to the MainObjectTable along with a language code and the date they were added.
I've created a stored procedure that accepts the MainObjectTable ID and a Language. It will return a single row containing the most recent items from the language tables. The select statement looks like
SELECT
MainObjectTable.VariousColumns,
LanguageDependantField1.Description,
LanguageDependantField2.SomeOtherText
FROM
MainObjectTable
OUTER APPLY
(SELECT TOP 1 LanguageDependantField1.Description
FROM LanguageDependantField1
WHERE LanguageDependantField1.MainObjectTable_ID = MainObjectTable.ID
AND LanguageDependantField1.Language_ID = #language
ORDER BY
LanguageDependantField1.[Default], LanguageDependantField1.CreatedDate DESC) LanguageDependantField1
OUTER APPLY
(SELECT TOP 1 LanguageDependantField2.SomeOtherText
FROM LanguageDependantField2
WHERE LanguageDependantField2.MainObjectTable_ID = MainObjectTable.ID
AND LanguageDependantField2.Language_ID = #language
ORDER BY
LanguageDependantField2.[Default] DESC, LanguageDependantField2.CreatedDate DESC) LanguageDependantField2
WHERE
MainObjectTable.ID = #MainObjectTableID
What I want to add is the ability to fallback to a default language if a row isn't found in the specified language. Let's say we use "German" as the selected language. Is it possible to return an English row from LanguageDependantField1 if the German does not exist presuming we have #fallbackLanguageID
Also am I right to use OUTER APPLY in this scenario or should I be using JOIN?
Many thanks for your help.
Try this:
SELECT MainObjectTable.VariousColumns,
COALESCE(PrefLang.Description,Fallback.Description,'Not Found Desc')
as Description,
COALESCE(PrefLang.SomeOtherText,FallBack.SomeOtherText,'Not found')
as SomeOtherText
FROM MainObjectTable
LEFT JOIN
(SELECT TOP 1 pl.Description,pl.SomeOtherText
FROM LanguageDependantField1 pl
WHERE pl.MainObjectTable_ID = MainObjectTable.ID
AND pl.Language_ID = #language
ORDER BY
pl.[Default], pl.CreatedDate DESC)
PrefLang ON 1=1
LEFT JOIN
(SELECT TOP 1 fb.Description,fb.SomeOtherText
FROM LanguageDependantField1 fb
WHERE fb.MainObjectTable_ID = MainObjectTable.ID
AND fb.Language_ID = #fallbackLanguageID
ORDER BY
fb.[Default], fb.CreatedDate DESC)
Fallback ON 1=1
WHERE
MainObjectTable.ID = #MainObjectTableID
Basically, make two queries, one to the preferred language and one to English (Default). Use the LEFT JOIN, so if the first one isn't found, the second query is used...
I don't have your actual tables, so there might be a syntax error in above, but hope it gives you the concept you want to try...
Yes, the use of Outer Apply is correct if you want to correlate the MainObjectTable table rows to the inner queries. You cannot use Joins with references in the derived table to the outer table. If you wanted to use Joins, you would need to include the joining column(s) and in this case pre-filter the results. Here is what that might look like:
With RankedLanguages As
(
Select LDF1.MainObjectTable_ID, LDF1.Language_ID, LDF1.Description, LDF1.SomeOtherText, ...
, Row_Number() Over ( Partition By LDF1.MainObjectTable_ID, LDF1.Language_ID
Order By LDF1.[Default] Desc, LDF1.CreatedDate Desc ) As Rnk
From LanguageDependantField1 As LDF1
Where LDF1.Language_ID In( #languageId, #defaultLanguageId )
)
Select M.VariousColumns
, Coalesce( SpecificLDF.Description, DefaultLDF.Description ) As Description
, Coalesce( SpecificLDF.SomeOtherText, DefaultLDF.SomeOtherText ) As SomeOtherText
, ...
From MainObjectTable As M
Left Join RankedLanguages As SpecificLDF
On SpecificLDF.MainObjectTable_ID = M.ID
And SpecifcLDF.Language_ID = #languageId
And SpecifcLDF.Rnk = 1
Left Join RankedLanguages As DefaultLDF
On DefaultLDF.MainObjectTable_ID = M.ID
And DefaultLDF.Language_ID = #defaultLanguageId
And DefaultLDF.Rnk = 1
Where M.ID = #MainObjectTableID