Wrapping SQL query into outer Select causes Order By reshuffle

I have an engine that builds queries dynamically, so the SQL is not static, which is why I had to go the way shown below. It also has to work for both SQL Server and Oracle (Oracle gets a different wrapper, RowNum, etc.). I have no easy way to test Oracle, so below is the SQL Server problem, step by step.
Let's take a simple query:
Select field1 as f1, myDate dateFld From table1 t1 Where t1.field2 = 1
I may or may not have to union the output with another table:
Select field1 as f1, myDate dateFld From table1 t1 Where t1.field2 = 1
Union
Select field2 as f1, myDate dateFld From table2 t2 Where t2.field2 = 2
I need to get only N records from this Union
Select Top(N) *
From
(
Select field1 as f1, myDate dateFld From table1 t1 Where t1.field2 = 1
Union
Select field2 as f1, myDate dateFld From table2 t2 Where t2.field2 = 2
) Union_Tbl_Alias
Order By dateFld Desc, f1
Remember this "Order by"
I also have subqueries in the Select list (there is nothing I can do but keep them in the Select), which I moved into yet another Select wrapper:
Select
f1,
dateFld,
(Select field99 From table99 t99 Where t99.f1 = Outer_Tbl_Alias.f1) as f3
From
(
Select Top(N) *
From
(
Select field1 as f1, myDate dateFld From table1 t1 Where t1.field2 = 1
Union
Select field2 as f1, myDate dateFld From table2 t2 Where t2.field2 = 2
) Union_Tbl_Alias
Order By dateFld Desc, f1
) Outer_Tbl_Alias
So the problem is that the outermost Select reshuffles the records a bit. They are no longer sorted by dateFld Desc.
I don't want to speculate, but I think this is a SQL Server-only issue; I will test it in Oracle as well. Moving the "Order By" to the outermost statement fixes it for SQL Server.
But I'm wondering:
1 - Why does it happen?
2 - Is there a hint to tell SQL Server to keep the order of the inner Select?

That behavior appears to make sense. Your outer query does not contain an ORDER BY clause so the order of the results is arbitrary. The fact that rows may have been ordered in a subquery is not controlling (though it undoubtedly does end up affecting the order of the results). Since you are building the query programmatically, it would make far more sense to add whatever ORDER BY clause you want than to try to work around the issue (and I'm not aware of a way to work around the issue that is guaranteed to work every time).
You'll have exactly the same issue when you run against an Oracle database and switch out the TOP for a couple of nested queries with rownum predicates. The only way to guarantee the order of your results is to add an ORDER BY clause. Since that is going to be necessary regardless of the database you are using, it makes even more sense to do it correctly by adding the additional ORDER BY to the outer query rather than having different database-specific workarounds.
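A minimal sketch of what adding the ORDER BY to the outer query could look like for the SQL Server version. It reuses the table and column names from the question; 10 stands in for N, and the inner ORDER BY is kept only because TOP needs it to decide which rows to keep:

```sql
-- Sketch (T-SQL). Names come from the question; 10 is a placeholder for N.
SELECT
    Outer_Tbl_Alias.f1,
    Outer_Tbl_Alias.dateFld,
    (SELECT t99.field99 FROM table99 t99 WHERE t99.f1 = Outer_Tbl_Alias.f1) AS f3
FROM
(
    SELECT TOP (10) *
    FROM
    (
        SELECT field1 AS f1, myDate AS dateFld FROM table1 t1 WHERE t1.field2 = 1
        UNION
        SELECT field2 AS f1, myDate AS dateFld FROM table2 t2 WHERE t2.field2 = 2
    ) Union_Tbl_Alias
    ORDER BY dateFld DESC, f1      -- decides WHICH rows TOP keeps
) Outer_Tbl_Alias
ORDER BY Outer_Tbl_Alias.dateFld DESC, Outer_Tbl_Alias.f1;   -- decides the order of the final result
```

The two ORDER BY clauses do different jobs, so a query builder would need to emit the same sort list twice: once next to TOP and once on the outermost statement.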

Related

Write a where clause that compares two columns to the same subquery?

I want to know if it's possible to make a where clause compare 2 columns to the same subquery. I know I could make a temp table/ variable table or write the same subquery twice. But I want to avoid all that if possible. The Subquery is long and complex and will cause significant overhead if I have to write it twice.
Here is an example of what I am trying to do.
SELECT * FROM Table WHERE (Column1 OR Column2) IN (Select column from TABLE)
I'm looking for a simple answer and that might just be NO but if it's possible without anything too elaborate please clue me in.
I updated the select to use OR instead of AND as this clarified my question a little better.
The example you've given would probably perform best using exists, such as:
select *
from t1
where exists (
select 1 from t2
where t2.col = t1.col1 and t2.col = t1.col2
);
To prevent writing the complicated subquery twice, you can use a CTE (Common Table Expression):
;WITH MyFirstCTE (x) AS
(
SELECT [column] FROM [TABLE1]
-- add all the very complicated stuff here
)
SELECT *
FROM Table2
WHERE Column1 IN (SELECT x FROM MyFirstCTE)
AND Column2 IN (SELECT x FROM MyFirstCTE)
Or using EXISTS:
;WITH MyFirstCTE (x) AS
(
SELECT [column] FROM [TABLE1]
-- add all the very complicated stuff here
)
SELECT *
FROM Table2
WHERE EXISTS (SELECT 1 FROM MyFirstCTE WHERE x = Column1)
AND EXISTS (SELECT 1 FROM MyFirstCTE WHERE x = Column2)
I used deliberately clumsy names, best to pick better ones.
I started it with a ; because if it's not the first command in a larger script then a ; is needed to separate the CTE from the commands before it.
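Since the question was updated to use OR, here is a sketch of that variant with the same CTE. It assumes SQL Server and the placeholder names used above; the subquery is written once and referenced once:

```sql
-- Sketch (T-SQL). MyFirstCTE, TABLE1, Table2, Column1 and Column2 are the
-- placeholder names from the answers above.
;WITH MyFirstCTE (x) AS
(
    SELECT [column] FROM [TABLE1]
    -- add all the very complicated stuff here
)
SELECT *
FROM Table2
WHERE EXISTS (
    SELECT 1
    FROM MyFirstCTE
    WHERE x IN (Table2.Column1, Table2.Column2)   -- matches either column
);
```

Keep in mind that a CTE saves you from writing the subquery twice, but SQL Server may still evaluate it once per reference, so referencing it only once can matter when the subquery is expensive.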

Using distinct on in subqueries

I noticed that in PostgreSQL the following two queries output different results:
select a.*
from (
select distinct on (t1.col1)
t1.*
from t1
order by t1.col1, t1.col2
) a
where a.col3 = value
;
create table temp as
select distinct on (t1.col1)
t1.*
from t1
order by t1.col1, t1.col2
;
select temp.*
from temp
where temp.col3 = value
;
I guess it has something to do with using distinct on in subqueries.
What is the correct way to use distinct on in subqueries? E.g. can I use it if I don't use where statement?
Or in queries like
(
select distinct on (a.col1)
a.*
from a
)
union
(
select distinct on (b.col1)
b.*
from b
)
In a normal situation, both examples should return the same result.
I suspect that you are getting different results because the order by clause of your distinct on subquery is not deterministic. That is, there may be several rows in t1 sharing the same col1 and col2.
If the columns in the order by do not uniquely identify each row, then the database has to make its own decision about which row will be retained in the resultset: as a consequence, the results are not stable, meaning that consecutive executions of the same query may yield different results.
Make sure that your order by clause is deterministic (for example by adding more columns in the clause), and this problem should not arise anymore.
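As a sketch of that fix, assuming t1 has a unique column (called id here, which is hypothetical and not in the question), adding it as the last sort key makes the row kept for each col1 well defined:

```sql
-- Sketch (PostgreSQL). t1.id is a hypothetical unique column used as a
-- tiebreaker; 'some_value' stands in for the question's value placeholder.
SELECT a.*
FROM (
    SELECT DISTINCT ON (t1.col1)
        t1.*
    FROM t1
    ORDER BY t1.col1, t1.col2, t1.id   -- ties on (col1, col2) are now broken by id
) a
WHERE a.col3 = 'some_value';
```

With a deterministic ORDER BY, the subquery form and the temp-table form should pick the same row per col1 and therefore return the same result.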

Differences between select @var=column1 from table1 & select top 1 column1 from table1

In SQL Server 2005, what are the differences between select @var=column1 from table1 and select top 1 column1 from table1?
I have a problem with a view whose select statement has a column of this form:
select column0, fn(column0) as col from table2
where fn returns the result of select @var=column1 from table1 where table1.column3=@inputid
I replaced it with this:
select
column0,
(select top 1 column1 from table1 where table1.id = table2.column0) as col
from table2
but the result is not the same as before,
and using order by in
select top 1 column1 from table1 where table1.id = table2.column0
has no effect either.
I need to know how I can change
select top 1 column1 from table1 where table1.id = table2.column0
so that it has the same result as
select @var=column1 from table1 where table1.column3 = @inputid
When SQL Server compiles a query, it does not compile the scalar-valued function along with it, so you can never know which result will come from the function. Moreover, a scalar-valued function performs relatively badly compared with an inline query or a table-valued function.
ORDER BY gets executed after TOP in the case of a UNION, but I doubt that case applies here. Can you paste the execution plan of the query?
Order of execution of a query:
1. FROM, JOIN, APPLY and ON
2. WHERE
3. GROUP BY
4. HAVING
5. SELECT
6. ORDER BY
7. TOP
8. FOR XML
When using UNION, the order of execution changes slightly:
1. FROM, JOIN, APPLY and ON
2. WHERE
3. GROUP BY
4. HAVING
5. TOP
6. UNION and SELECT
7. ORDER BY
8. FOR XML
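One documented difference that may explain the mismatch: when the query in select @var = ... returns more than one row, the variable ends up holding the last value returned, whereas top 1 keeps the first row. Without an ORDER BY, neither "first" nor "last" is guaranteed. A small sketch, with a hypothetical table variable @t standing in for table1:

```sql
-- Sketch (T-SQL). @t is a hypothetical stand-in for table1;
-- three rows share the same id.
DECLARE @t TABLE (id int, column1 varchar(10));
INSERT INTO @t (id, column1) VALUES (1, 'a');
INSERT INTO @t (id, column1) VALUES (1, 'b');
INSERT INTO @t (id, column1) VALUES (1, 'c');

DECLARE @var varchar(10);

-- Assignment form: @var is overwritten for each matching row,
-- so it ends up with the value from the last row processed.
SELECT @var = column1 FROM @t WHERE id = 1;
SELECT @var AS assigned_value;

-- TOP 1 form: keeps the first row; ORDER BY makes "first" well defined.
SELECT TOP 1 column1 FROM @t WHERE id = 1 ORDER BY column1;
```

If more than one row can match, the two forms can disagree. Making the match unique, or ordering the top 1 query by a unique column, is the dependable way to line them up.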

sql delete rows with 1 column duplicated

I have a Microsoft SQL Server 2005 table where no entire row is duplicated, but one column is.
1 aaa
1 bbb
1 ccc
2 abc
2 def
How can I delete all the rows but one for each duplicated value in the first column?
For clarification, I need to get rid of the second, third and fifth rows.
Try the following query in SQL Server 2005:
WITH T AS (SELECT ROW_NUMBER()OVER(PARTITION BY id ORDER BY id) AS rnum,* FROM dbo.Table_1)
DELETE FROM T WHERE rnum>1
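Before running a delete like this, it may help to preview which rows it would remove. The sketch below assumes the two columns are named id and Col1 (the question does not name them) and orders each group by Col1 so that the row kept is well defined:

```sql
-- Sketch (T-SQL). dbo.Table_1 is the table name from the answer above;
-- the column names id and Col1 are assumptions.
WITH T AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY id ORDER BY Col1) AS rnum,
           id,
           Col1
    FROM dbo.Table_1
)
SELECT id, Col1      -- rows that DELETE FROM T WHERE rnum > 1 would remove
FROM T
WHERE rnum > 1;
```

Ordering by id inside PARTITION BY id leaves the choice of survivor up to the server; ordering by Col1 keeps the smallest Col1 per id, which for the sample data in the question should be the first and fourth rows.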
Let's call these the id and the Col1 columns.
DELETE FROM myTable
WHERE EXISTS
(SELECT * FROM myTable T2
WHERE T2.id = myTable.id AND T2.Col1 > myTable.Col1)
Edit: As pointed out by Andomar, the above doesn't get rid of exact duplicates, where both id and Col1 are the same in different rows.
These can be handled as follows
(note: whereas the above query is generic SQL, the following applies to MSSQL 2005 and above).
It uses the Common Table Expression (CTE) feature along with the ROW_NUMBER() function to produce a distinct value for each row. It is essentially the same construct as the above, except that it now works with a "table" (CTEs mostly behave like a table) that has a truly distinct identifier key.
Note that by removing "AND T2.Col1 = T1.Col1", we get a query that handles both types of duplicates (id-only duplicates, and duplicates of both id and Col1) in a single query, much like Hamadri's solution (the PARTITION in his/her CTE serves the same purpose as the subquery in this solution; essentially the same amount of work is done). Depending on the situation, it may be preferable, for performance or other reasons, to handle the two cases in two steps.
WITH T AS
(SELECT ROW_NUMBER() OVER (ORDER BY id, Col1) AS rn, id, Col1 FROM MyTable)
DELETE T1
FROM T AS T1
WHERE EXISTS
(SELECT *
FROM T AS T2
WHERE T2.id = T1.id AND T2.Col1 = T1.Col1
AND T2.rn > T1.rn
)
DELETE FROM tableName
WHERE col2 NOT IN (SELECT MIN(col2) FROM tableName GROUP BY col1)
Make sure the sub select returns the rows you want to keep.
Try this.
DELETE FROM <TABLE_NAME_HERE> WHERE <SECOND_COLUMN_NAME_HERE> IN ('bbb', 'ccc', 'def');
SQL Server is not my native SQL database, but maybe something like this? The idea is to number the duplicates and delete the ones with the larger ROW_NUMBER. This should leave only the first one. I don't know if this is exactly what you want, but the logic seems sound.
WITH Numbered AS
(SELECT Col1, ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col1) AS rn FROM T1)
DELETE FROM Numbered
WHERE rn > 1
Please feel free to correct me if SQL Server can't handle that kind of treatment :)
--Another idea using ROW_NUMBER(); Id must be unique for this to work
Delete MyTable
Where Id IN
(
Select T.Id FROM
(
SELECT Id, ROW_NUMBER() OVER (PARTITION BY UniqueColumn ORDER BY Id) AS RowNumber FROM MyTable
)T
WHERE T.RowNumber > 1
)

Combining several query results into one table, how is the results order determined?

I am returning table results for different queries, but each table will be in the same format and they will all end up in one final table. If I want the results for query 1 to be listed first, query 2 second, etc., what is the easiest way to do it?
Does UNION append the tables, or is the combination random?
The SQL standard does not guarantee an order unless explicitly called for in an order by clause. In practice, this usually comes back chronologically, but I would not rely on it if the order is important.
Across a union you can control the order like this...
select
this,
that
from
(
select
this,
that
from
table1
union
select
this,
that
from
table2
)
order by
that,
this;
UNION ALL usually appends the second query's rows to the first query's, so in practice you tend to get all the first rows first, but that order is not guaranteed.
To make it explicit, you can use:
SELECT Col1, Col2,...
FROM (
SELECT Col1, Col2,..., 1 AS intUnionOrder
FROM ...
UNION ALL
SELECT Col1, Col2,..., 2 AS intUnionOrder
FROM ...
) AS T
ORDER BY intUnionOrder, ...
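For a concrete version, here is a sketch using the this/that/table1/table2 names from the earlier answer; query_order is a made-up helper column. An ORDER BY written after the last branch of a UNION ALL applies to the combined result, so the wrapper is optional if you don't mind the helper column showing up in the output:

```sql
-- Sketch (generic SQL). query_order is a hypothetical helper column that
-- records which branch each row came from.
SELECT this, that, 1 AS query_order FROM table1
UNION ALL
SELECT this, that, 2 AS query_order FROM table2
ORDER BY query_order, that, this;   -- sorts the whole combined result
```

Wrapping the union in a derived table, as shown above, is only needed when the outer select list should leave the helper column out.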