Outer Apply in SQLAdapter - sql

Do I have to create a stored procedure to get this working? I am no friend of adapters in Visual Studio (the GUI ones that get destroyed instantly if you edit them :) ).
However, I have this query that I got working (SQL Server Management Studio 2008 R2) using OUTER APPLY (similar to a LEFT JOIN). My VS adapter does not accept it and throws "The OUTER APPLY SQL construct or statement is not supported". I therefore need help writing the code below in "normal" T-SQL :)
SELECT DISTINCT t1.col1,t3.col2,t3.col4
FROM t1
OUTER APPLY
(
SELECT TOP 1 col1,col2,col3,col4
FROM t2
WHERE col3 = value
AND t2.col1 = t1.col1
ORDER BY col4 ASC
) AS t3

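OUTER APPLY works like a LEFT JOIN whose right-hand side can reference the outer row. A plain derived table cannot reference t1, and TOP 1 inside one would return a single row for the whole query rather than one per t1 row, so pick the row with the lowest col4 per col1 through a correlated subquery instead. Will the following code fit your needs? (If several t2 rows tie on the lowest col4, all of them are returned.)
SELECT DISTINCT t1.col1,t3.col2,t3.col4
FROM t1
LEFT JOIN t2 AS t3
ON t3.col1 = t1.col1
AND t3.col3 = value
AND t3.col4 =
(
SELECT MIN(x.col4)
FROM t2 AS x
WHERE x.col3 = value
AND x.col1 = t1.col1
)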
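If the designer still rejects the query text, the stored-procedure route raised in the question is another option. Below is a minimal sketch, assuming the placeholder "value" is an integer parameter; the procedure name and the parameter type are hypothetical and not taken from the question.

```sql
-- Hypothetical procedure name; @value's type is an assumption,
-- since the question only shows a placeholder called "value".
CREATE PROCEDURE dbo.GetT1WithFirstT2
    @value int
AS
BEGIN
    SET NOCOUNT ON;

    -- Same query as in the question, with the placeholder parameterised
    -- and the out-of-scope t2 reference left out of the select list.
    SELECT DISTINCT t1.col1, t3.col2, t3.col4
    FROM t1
    OUTER APPLY
    (
        SELECT TOP 1 t2.col1, t2.col2, t2.col3, t2.col4
        FROM t2
        WHERE t2.col3 = @value
          AND t2.col1 = t1.col1
        ORDER BY t2.col4 ASC
    ) AS t3;
END
```

The adapter would then call it with something like EXEC dbo.GetT1WithFirstT2 @value = 1, so the OUTER APPLY never appears in the command text that the designer has to parse.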

Related

SQL Query doesn't work in Azure SQL Data Warehouse (Synapse) why not?

I have an SQL query that works in an on-premise SQL Database, but when I try to execute it on an Azure SQL Data Warehouse, I get an error.
Does anyone know another way of writing this SQL Query so that it will work in Azure DW?
SQL Query is:
SELECT MAX(Dates) MostRecentDate FROM (VALUES('2020-01-01'), ('2021-01-01')) AS t(Dates)
Azure DW Error is:
Msg 103010, Level 16, State 1, Line 1
Parse error at line: 1, column: 39: Incorrect syntax near '('.
Here is the full SQL Query (real table names have been removed)
SELECT t1.reference, dd.closest_date, d1.date_one, d2.date_two, d3.date_three
FROM dbo.table1 AS t1
LEFT JOIN dbo.table2 AS t2 ON t1.table2_id = t2.id
LEFT JOIN dbo.table3 AS t3 ON t1.table3_id = t3.id
LEFT JOIN dbo.table4 AS t4 ON t3.table4_id = t4.id
LEFT JOIN dbo.table5 AS t5 ON t4.table5_id = t5.id
LEFT JOIN dbo.table6 AS t6 ON t5.table6_id = t6.id
OUTER APPLY (SELECT CASE WHEN t3.outcome IS NULL THEN '5000-01-01' ELSE ISNULL(t3.outcome_date,'5000-01-01') END AS date_one) AS d1
OUTER APPLY (SELECT ISNULL(t2.outcome_date,'5000-01-01') AS date_two) AS d2
OUTER APPLY (SELECT ISNULL(t6.outcome_date,'5000-01-01') AS date_three) AS d3
/*the below works in normal SQL, but doesn't work in Azure SQL!!!*/
-- OUTER APPLY (SELECT MIN(Dates) closest_date FROM (VALUES(d1.date_one),(d2.date_two),(d3.date_three)) AS t(Dates)) AS dd
The table value constructor is convenient, but it is not fully supported in Azure Synapse Analytics: Synapse is absent from the "Applies to" list at the top of its documentation page. VALUES is supported for single rows, but the easiest fix for your example is to rewrite it as a simple UNION ALL statement, e.g.
SELECT t1.reference, dd.closest_date, d1.date_one, d2.date_two, d3.date_three
FROM dbo.table1 AS t1
LEFT JOIN dbo.table2 AS t2 ON t1.table2_id = t2.id
LEFT JOIN dbo.table3 AS t3 ON t1.table3_id = t3.id
LEFT JOIN dbo.table4 AS t4 ON t3.table4_id = t4.id
LEFT JOIN dbo.table5 AS t5 ON t4.table5_id = t5.id
LEFT JOIN dbo.table6 AS t6 ON t5.table6_id = t6.id
OUTER APPLY (SELECT CASE WHEN t3.outcome IS NULL THEN '5000-01-01' ELSE ISNULL(t3.outcome_date,'5000-01-01') END AS date_one) AS d1
OUTER APPLY (SELECT ISNULL(t2.outcome_date,'5000-01-01') AS date_two) AS d2
OUTER APPLY (SELECT ISNULL(t6.outcome_date,'5000-01-01') AS date_three) AS d3
/*the below works in normal SQL, but doesn't work in Azure Synapse*/
--OUTER APPLY (SELECT MIN(Dates) closest_date FROM (VALUES(d1.date_one),(d2.date_two),(d3.date_three)) AS t(Dates)) AS dd
OUTER APPLY (SELECT MIN(Dates) closest_date FROM ( SELECT d1.date_one UNION ALL SELECT d2.date_two UNION ALL SELECT d3.date_three) AS t(Dates)) AS dd
It's just syntactic sugar really. They are all set operations at the end of the day.
Could you please try this
select max(t.dates) from
(
select '2020-01-01' as dates
union all
select '2021-01-01'
) as t
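Because the three dates in the full query are never NULL (each is wrapped in ISNULL or a CASE with a '5000-01-01' fallback), the smallest one can also be picked with a plain CASE expression, which avoids the derived table altogether. This is a sketch in plain T-SQL and has not been tested on Synapse; the variables are hypothetical stand-ins for d1.date_one, d2.date_two and d3.date_three.

```sql
-- Hypothetical sample values standing in for d1.date_one, d2.date_two, d3.date_three.
DECLARE @date_one   date = '2021-03-01';
DECLARE @date_two   date = '2020-01-01';
DECLARE @date_three date = '2022-07-15';

-- Pick the earliest of three non-NULL dates without VALUES or UNION ALL.
SELECT CASE
           WHEN @date_one <= @date_two AND @date_one <= @date_three THEN @date_one
           WHEN @date_two <= @date_three THEN @date_two
           ELSE @date_three
       END AS closest_date;
-- closest_date: 2020-01-01
```

In the full query the same CASE, written against d1.date_one, d2.date_two and d3.date_three, would go straight into the select list in place of dd.closest_date.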

SQL Server Query having multiple left outer joins hangs

I have two tables say for ex: table1 and table2 as below
Table1(id, desc )
Table2(id, col1, col2.. col10.....)
col1 to col10 in table 2 could be linked with id field in table1.
I write a query which has 10 instances of table1 (each one to link col1 to col10 of table2)
select t2.id, t1_1.desc, t1_2.desc,.....t1_10.desc from table2 t2
left outer join table1 t1_1 on t1_1.id = t2.col1
left outer join table1 t1_2 on t1_2.id = t2.col2
left outer join table1 t1_3 on t1_3.id = t2.col3
.
.
.
left outer join table1 t1_10 on t1_10.id = t2.col10
where t2.id ='111'
This query is inside a stored procedure, and when I execute the procedure in SSMS it works without any problems.
However, when my web application runs it, the query works for some WHERE clause values and hangs for others.
I have checked the cost of the query and created one nonclustered index on these 10 columns in table2. The cost of the joins dropped to 0, but the query still hangs.
Table1 has 500 rows and table2 has 700 rows.
Can anyone help?
First of all, why are you rejoining to the table 10 times rather than one join with 10 predicates?
left outer join table1 t1_1 on t1_1.id = t2.col1
left outer join table1 t1_2 on t1_2.id = t2.col2
left outer join table1 t1_3 on t1_3.id = t2.col3
.
.
.
left outer join table1 t1_10 on t1_10.id = t2.col10
vs.
left outer join table1 t1 on t1.col1 = t2.col1
and t1.col2 = t2.col2
and t1.col3 = t2.col3
I just wanted to bring that up because it's very unusual to rejoin to the same table 10 times like that.
As far as your query plan goes, SQL Server sniffs the first parameter value used with the query and caches the resulting plan for reuse by later calls. That plan can be good for some WHERE clause values and bad for others, which is why the query sometimes performs well and sometimes does not. SSMS and a web application typically connect with different SET options (ARITHABORT, for example), so they can end up with separate cached plans, which would explain why the procedure runs fine in SSMS. If your table columns are skewed (some WHERE clause values match far more rows than others), you could consider adding OPTION (RECOMPILE) to the query to force a new execution plan on every call. This has pros and cons; see the discussion in "OPTION (RECOMPILE) is Always Faster; Why?".
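For reference, the hint goes at the very end of the statement. Below is a minimal sketch, assuming the query lives in a procedure with a single parameter; the procedure name, the parameter type and the output column aliases are hypothetical, and only two of the ten joins are shown.

```sql
-- Hypothetical procedure name and parameter type.
CREATE PROCEDURE dbo.GetTable2Descriptions
    @id varchar(10)
AS
BEGIN
    SET NOCOUNT ON;

    -- [desc] is bracketed because DESC is a reserved keyword.
    SELECT t2.id,
           t1_1.[desc] AS desc1,
           t1_2.[desc] AS desc2
    FROM table2 AS t2
    LEFT OUTER JOIN table1 AS t1_1 ON t1_1.id = t2.col1
    LEFT OUTER JOIN table1 AS t1_2 ON t1_2.id = t2.col2
    WHERE t2.id = @id
    OPTION (RECOMPILE);  -- build a fresh plan for each call instead of reusing a sniffed one
END
```

The trade-off is extra compile time on every execution, which is usually small for tables of a few hundred rows.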

Joining selected column to a table

I am trying to run this query, and it takes a long time because of the join I am using.
SELECT T1.Id,T2.T2Id,T2.Col2
FROM Table1 T1
LEFT OUTER JOIN (SELECT TOP 1 Id, TT.T2Id,TT.Col2
FROM Table2 TT
WHERE TT.TypeId=3
ORDER BY TT.OrderId
) AS T2 ON T2.Id = T1.Id
The thing is, it doesn't let me do something like TT.Id = T1.Id within the join query.
Is there any other way I can do this?
Try it with outer apply:
SELECT T1.Id, T2.T2Id, T2.Col2
FROM Table1 T1
OUTER APPLY (SELECT TOP 1 TT.T2Id, TT.Col2
FROM Table2 TT
WHERE TT.TypeId = 3 AND TT.Id = T1.Id
ORDER BY TT.OrderId) T2
SELECT T1.Id, T2.T2Id, T2.Col2
FROM Table1 T1
OUTER APPLY (SELECT TOP 1 TT.T2Id, TT.Col2
FROM Table2 TT
WHERE TT.TypeId = 3 AND T1.Id = TT.Id
ORDER BY TT.T2Id DESC) T2
I would use OUTER APPLY with T1.Id = TT.Id in the WHERE condition, since T1 is the parent table, and add an ORDER BY if you need a predictable TOP 1 row.
Well, first of all, a derived table that uses TOP 1 without an ORDER BY produces non-deterministic results: the row you get back may differ each time you run it, even if the data in the table stays the same. An ORDER BY clause in the derived table prevents that.
Is there an index on Table1.Id? What exactly are you trying to achieve, though? Is it to return all rows from Table1, with just one of the many rows from Table2 that have the same ID?
If so, I would look into using CROSS APPLY instead, or maybe OUTER APPLY in this case. If I get a chance later I'll write up an example if needed, but in the meantime just Google OUTER APPLY for SQL Server.
Dan
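The difference between the two APPLY forms mentioned above can be sketched with the table and column names from the question. The index at the end is an assumption about what might help and is not something the thread confirms; its name is hypothetical.

```sql
-- OUTER APPLY: every Table1 row is returned; rows with no TypeId = 3
-- match in Table2 get NULL for T2Id and Col2.
SELECT T1.Id, T2.T2Id, T2.Col2
FROM Table1 AS T1
OUTER APPLY (SELECT TOP 1 TT.T2Id, TT.Col2
             FROM Table2 AS TT
             WHERE TT.TypeId = 3
               AND TT.Id = T1.Id
             ORDER BY TT.OrderId) AS T2;

-- CROSS APPLY: Table1 rows without a matching Table2 row are dropped.
SELECT T1.Id, T2.T2Id, T2.Col2
FROM Table1 AS T1
CROSS APPLY (SELECT TOP 1 TT.T2Id, TT.Col2
             FROM Table2 AS TT
             WHERE TT.TypeId = 3
               AND TT.Id = T1.Id
             ORDER BY TT.OrderId) AS T2;

-- Hypothetical supporting index, so each per-row TOP 1 lookup can be a seek.
CREATE INDEX IX_Table2_TypeId_Id_OrderId
    ON Table2 (TypeId, Id, OrderId)
    INCLUDE (T2Id, Col2);
```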

Is there documentation on/can someone explain a nested join in TSQL?

I'm not quite sure how to describe this, and I'm not quite sure if it's just syntactical sugar. This is the first time I've seen it, and I'm having trouble finding a reference or explanation as to the why and what of it.
I have a query as follows:
select * from
table1
join table2 on field1 = field2
join (
table3
join table4 on field3 = field4
join table5 on field5 = field6
) on field3 = field2
-- notice the fields in the parens and outside the parens
-- are part of the on clause
Are the parentheses necessary? Will removing them change the join order? I'm in a SQL Server 2005 environment in this case. Thanks!
Join order should make no difference to the result set of a query that uses only inner joins (apart from column order). The query
select *
from t1
join t2 on t2.t1_id = t1.id
produces the same result set as
select *
from t2
join t1 on t1.id = t2.t1_id
If you're using outer joins and change the order of the tables in the from clause, naturally the direction of the outer join must change:
select *
from t1
left join t2 on t2.t1_id = t1.id
is the same as
select *
from t2
right join t1 on t1.id = t2.t1_id
However, if you see a subquery used as a table, with syntax like
select *
from t1
join ( select t2.*
from t2
join t3 on t3.t2_id = t2.id
where t3.foobar = 37
) x on x.t1_id = t1.id
You'll note the table alias (x) assigned to the subquery above.
What you have is something called a derived table (though some people call it a virtual table). You can think of it as a temporary view that exists for the life of a query. It's particularly useful when you need to filter on something like the result of an aggregation (GROUP BY).
The T-SQL documentation on the select, under the from clause goes into the details:
http://msdn.microsoft.com/en-us/library/ms189499(v=SQL.100).aspx
http://msdn.microsoft.com/en-us/library/ms177634(v=sql.100).aspx
It's not necessary in this case.
It's necessary (or at the very least, a lot simpler) in some others, especially where you name the nested call:
select table1.fieldX, table2.fieldY, sq.field6 from
table1 join table2 on field1 = field2
join ( select
top 1 table3.field6
from table3 join table4
on field3 = field4
where table3.field7 = table2.field8
order by fieldGoshIveUsedALotOfFieldsAlready
) sq on sq.field6 = field12345
The code you had could have been: like the above once, and then refactored; machine produced; or a reflection of the developer's thought process as he or she arrived at the query, thinking of that part as a unit and then working it into the larger query.
In this case they are not necessary:
select * from table1
join table2 on field1 = field2
join table3 on field3 = field2
join table4 on field3 = field4
join table5 on field5 = field6
Produces the same result.
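The parentheses stop being cosmetic once an outer join is involved. Here is a small sketch with three hypothetical table variables (a, b, c, none of them from the question); it uses single-row INSERTs so that it also runs on SQL Server 2005.

```sql
-- Hypothetical sample data: a has two rows, each with one b row,
-- but only b.id = 10 has a matching c row.
DECLARE @a TABLE (id int);
DECLARE @b TABLE (id int, a_id int);
DECLARE @c TABLE (b_id int);

INSERT INTO @a (id) VALUES (1);
INSERT INTO @a (id) VALUES (2);
INSERT INTO @b (id, a_id) VALUES (10, 1);
INSERT INTO @b (id, a_id) VALUES (20, 2);
INSERT INTO @c (b_id) VALUES (10);

-- Nested: b and c are inner-joined first, then left-joined to a as a unit,
-- so every a row survives.
SELECT a.id AS a_id, b.id AS b_id
FROM @a AS a
LEFT JOIN (
    @b AS b
    INNER JOIN @c AS c ON c.b_id = b.id
) ON b.a_id = a.id;
-- rows: (1, 10), (2, NULL)

-- Flat: the trailing inner join filters out the a row whose b has no c.
SELECT a.id AS a_id, b.id AS b_id
FROM @a AS a
LEFT JOIN @b AS b ON b.a_id = a.id
INNER JOIN @c AS c ON c.b_id = b.id;
-- rows: (1, 10)
```

With inner joins only, as in the question, both shapes return the same rows.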

Best self join technique when checking for duplicates

I'm trying to optimize a production query that is taking a long time. The goal is to find duplicate records based on matching field values and then delete them. The current query does a self join via an inner join on t1.col1 = t2.col1, followed by a WHERE clause that compares the remaining values.
select * from table t1
inner join table t2 on t1.col1 = t2.col1
where t1.col2 = t2.col2 ...
What would be a better way to do this? Or is it all the same based on indexes? Maybe
select * from table t1, table t2
where t1.col1 = t2.col1 and t1.col2 = t2.col2 ...
this table has 100m+ rows.
MS SQL, SQL Server 2008 Enterprise
select distinct t2.id
from table1 t1 with (nolock)
inner join table1 t2 with (nolock) on t1.ckid=t2.ckid
left join table2 t3 on t1.cid = t3.cid and t1.typeid = t3.typeid
where
t2.id > #Max_id and
t2.timestamp > t1.timestamp and
t2.rid = 2 and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.cid,-1) = isnull(t2.cid,-1) and
isnull(t1.rid,-1) = isnull(t2.rid,-1)and
isnull(t1.typeid,-1) = isnull(t2.typeid,-1) and
isnull(t1.cktypeid,-1) = isnull(t2.cktypeid,-1) and
isnull(t1.oid,'') = isnull(t2.oid,'') and
isnull(t1.stypeid,-1) = isnull(t2.stypeid,-1)
and (
(
t3.uniqueoid = 1
)
or
(
t3.uniqueoid is null and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.col2,'') = isnull(t2.col2,'') and
isnull(t1.rdid,-1) = isnull(t2.rdid,-1) and
isnull(t1.stid,-1) = isnull(t2.stid,-1) and
isnull(t1.huaid,-1) = isnull(t2.huaid,-1) and
isnull(t1.lpid,-1) = isnull(t2.lpid,-1) and
isnull(t1.col3,-1) = isnull(t2.col3,-1)
)
)
Why self join: this is an aggregate question.
Hope you have an index on col1, col2, ...
--DELETE table
--WHERE KeyCol NOT IN (
select
MIN(KeyCol) AS RowToKeep,
col1, col2
from
table
GROUP BY
col1, col2
HAVING
COUNT(*) > 1
--)
However, this will take some time. Have a look at bulk delete techniques.
You can use ROW_NUMBER() to find duplicate rows in one table.
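A minimal sketch of the ROW_NUMBER() approach, assuming a table dbo.MyTable with a unique KeyCol and duplicates defined by col1 and col2. The table name is hypothetical; KeyCol, col1 and col2 are borrowed from the answers above.

```sql
-- Number the rows inside each (col1, col2) group; the lowest KeyCol gets rn = 1.
WITH ranked AS (
    SELECT KeyCol,
           ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY KeyCol) AS rn
    FROM dbo.MyTable
)
-- Deleting through the CTE removes every row except the first of each group.
DELETE FROM ranked
WHERE rn > 1;
```

On a table with 100m+ rows it is probably safer to run the delete in batches, for example with DELETE TOP (n) in a loop, rather than in one transaction.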
The two methods you give should be equivalent. I think most SQL engines would do exactly the same thing in both cases.
And, by the way, this won't work as written. At least one field has to be different, or every record will match itself.
You might want to try something more like:
select col1, col2, col3
from table
group by col1, col2, col3
having count(*)>1
For a table with 100m+ rows, using GROUP BY together with a holding table should perform better, even though it translates into four queries.
STEP 1: create a holding key:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
STEP 2: Push all the duplicate entries into the holddups. This is required for Step 4.
SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
STEP 3: Delete the duplicate rows from the original table.
DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
STEP 4: Put the unique rows back in the original table. For example:
INSERT t1 SELECT * FROM holddups
To detect duplicates, you don't need to join:
SELECT col1, col2
FROM table
GROUP BY col1, col2
HAVING COUNT(*) > 1
That should be much faster.
In my experience, SQL Server performs really badly with OR conditions. It is probably not the self join but the join with table2 (t3) that causes the bad performance. Without seeing the plan, though, I cannot be sure.
In this case, it might help to split your query into two:
one with the WHERE condition t3.uniqueoid = 1 and one with the other conditions on t3, and then use UNION ALL to append one to the other.
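The split could be sketched as follows. Only a few of the shared predicates are spelled out; the comments mark where the rest of the original comparisons belong. @Max_id is an assumption standing in for the #Max_id placeholder in the question, and the outer DISTINCT is added so the result matches the original SELECT DISTINCT.

```sql
DECLARE @Max_id int;  -- assumption: stands in for #Max_id from the question
SET @Max_id = 0;

SELECT DISTINCT d.id
FROM (
    -- Branch 1: t3.uniqueoid = 1 requires a matching t3 row,
    -- so the left join can become an inner join.
    SELECT t2.id
    FROM table1 AS t1
    INNER JOIN table1 AS t2 ON t1.ckid = t2.ckid
    INNER JOIN table2 AS t3 ON t1.cid = t3.cid AND t1.typeid = t3.typeid
    WHERE t2.id > @Max_id
      AND t2.timestamp > t1.timestamp
      AND t2.rid = 2
      -- remaining shared ISNULL comparisons from the original query go here
      AND t3.uniqueoid = 1

    UNION ALL

    -- Branch 2: no t3 row, or uniqueoid not set.
    SELECT t2.id
    FROM table1 AS t1
    INNER JOIN table1 AS t2 ON t1.ckid = t2.ckid
    LEFT JOIN table2 AS t3 ON t1.cid = t3.cid AND t1.typeid = t3.typeid
    WHERE t2.id > @Max_id
      AND t2.timestamp > t1.timestamp
      AND t2.rid = 2
      -- remaining shared ISNULL comparisons from the original query go here
      AND t3.uniqueoid IS NULL
      AND ISNULL(t1.col2, '') = ISNULL(t2.col2, '')
      -- remaining branch-specific comparisons (rdid, stid, huaid, lpid, col3) go here
) AS d;
```

Each branch now has a simple AND-only predicate, which tends to give the optimizer a better chance of using indexes than the combined OR does.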