Write a where clause that compares two columns to the same subquery? - sql

I want to know if it's possible to make a where clause compare 2 columns to the same subquery. I know I could make a temp table / table variable or write the same subquery twice, but I want to avoid all that if possible. The subquery is long and complex and will cause significant overhead if I have to write it twice.
Here is an example of what I am trying to do.
SELECT * FROM Table WHERE (Column1 OR Column2) IN (Select column from TABLE)
I'm looking for a simple answer and that might just be NO but if it's possible without anything too elaborate please clue me in.
I updated the select to use OR instead of AND as this clarified my question a little better.

The example you've given would probably perform best using exists, such as:
select *
from t1
where exists (
select 1 from t2
where t2.col = t1.col1 and t2.col = t1.col2
);
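Since the question was updated to use OR rather than AND, the same EXISTS pattern should still work with the predicate adjusted; a minimal sketch using the same t1/t2 placeholder names:
select *
from t1
where exists (
    select 1
    from t2
    where t2.col = t1.col1
       or t2.col = t1.col2   -- OR semantics: match either column
);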

To prevent writing the complicated subquery twice, you can use a CTE (Common Table Expression):
;WITH MyFirstCTE (x) AS
(
SELECT [column] FROM [TABLE1]
-- add all the very complicated stuff here
)
SELECT *
FROM Table2
WHERE Column1 IN (SELECT x FROM MyFirstCTE)
AND Column2 IN (SELECT x FROM MyFirstCTE)
Or using EXISTS:
;WITH MyFirstCTE (x) AS
(
SELECT [column] FROM [TABLE1]
-- add all the very complicated stuff here
)
SELECT *
FROM Table2
WHERE EXISTS (SELECT 1 FROM MyFirstCTE WHERE x = Column1)
AND EXISTS (SELECT 1 FROM MyFirstCTE WHERE x = Column2)
I used deliberately clumsy names; it's best to pick better ones.
I started it with a ; because if the CTE isn't the first statement in a larger script, the statement before it must be terminated with a ; to separate it from the CTE.

Related

SQL - Is there a way to check rows for duplicates in all columns of a table

I have a big table with over 1 million rows and 96 columns.
Using SQL I want to find rows where every value is the same. The table doesn't have any primary key so I'm not sure how to approach this. I'm not permitted to change the table structure.
I've seen people use count(*) and group by but I'm not sure if this is effective for a table with 96 columns.
Using COUNT() as an analytic (window) function, we can try:
WITH cte AS (
    SELECT *, COUNT(*) OVER (PARTITION BY col1, col2, ..., col96) AS cnt
    FROM yourTable
)
SELECT col1, col2, ..., col96
FROM cte
WHERE cnt > 1;
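The plain count(*)/group by approach mentioned in the question also works if you only need each duplicated combination once, rather than every offending row; a sketch with the same yourTable placeholder:
SELECT col1, col2, ..., col96, COUNT(*) AS cnt
FROM yourTable
GROUP BY col1, col2, ..., col96
HAVING COUNT(*) > 1;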
You can use an MD5 hash of the concatenated columns as a surrogate key and group on it:
select md5_col, count(1) as cnt
from (
    select md5(concat_ws('', col1, col2 /* ..., all 96 columns */)) as md5_col
    from db_name.table_name
) tt
group by md5_col
having count(1) > 1;
For convenience, use BINARY_CHECKSUM:
with cte as (
    select *, BINARY_CHECKSUM(*) as checksum
    from mytable
), cte2 as (
    select checksum
    from cte
    group by checksum
    having count(*) > 1
)
select distinct t1.*
from cte t1
join cte t2
  on t1.checksum = t2.checksum
 and t1.col1 = t2.col1
 and t1.col2 = t2.col2
 -- etc. for the remaining columns
where t1.checksum in (select checksum from cte2)
cte2 returns (almost) only checksums of truly duplicated rows, so the join has only a small number of candidate rows for which it must exhaustively compare every column.
Rather than trying to boil the ocean and solve the entire problem with a single sql query (which you certainly can do...), I recommend using any indexes or statistics on the tables to filter out as many rows as you can.
Start by finding the columns with the most / fewest unique values (assuming you have statistics, that is), and smash them up against each other to rapidly exclude as many rows as possible. Take the results, dump them to a temp table, index fields as needed, and repeat.
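A rough T-SQL sketch of that first pass, assuming hypothetical columns colA and colB turn out to be the most selective:
-- Keep only rows whose (colA, colB) pair occurs more than once;
-- true duplicates must agree on every column, so they survive this filter.
SELECT t.*
INTO #candidates
FROM yourTable t
JOIN (
    SELECT colA, colB
    FROM yourTable
    GROUP BY colA, colB
    HAVING COUNT(*) > 1
) d ON d.colA = t.colA AND d.colB = t.colB;

CREATE INDEX ix_candidates ON #candidates (colA, colB);
-- Repeat the idea with further columns against #candidates until the set is small.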
Or you could just do this:
DECLARE @sql nvarchar(max);
-- Builds: where case when col1 != [colX] then 0 ... else 1 end = 1,
-- i.e. keeps rows in which every column holds the same value as col1.
SELECT @sql = 'select column1 from schema.table where case '
    + (SELECT 'when col1 != ' + quotename(name) + ' then 0 '
       FROM sys.columns
       WHERE object_id = object_id('schema.table')
         AND name <> 'col1'  -- no need to compare col1 with itself
       FOR XML PATH(''), TYPE).value('.', 'nvarchar(max)')
    + 'else 1 end = 1';
EXEC sp_executesql @sql;
If you must run that horrorshow of a query in production, please use snapshot isolation or move it to a temp table first (unless no one ever updates the table).
(Honestly, I would probably use something like that query on the temp table containing my filtered-down dataset... but anything you can do to make sure that the comparisons aren't naïve (e.g. taking statistics into account) can improve your performance significantly. If you want to do it all at once, you could always join sys.tables to a temp table that puts your field comparisons into a thoughtful order. After all, once a case statement is found to be true, all the others will be skipped for that record.)

Compare two columns

There are two columns, in the same table.
I need to find values that are in the second column but not in the first.
All I've been able to think of so far:
SELECT DISTINCT [column]
FROM [table]
WHERE column2 LIKE '2';
I know the task is simple, but do you have any ideas?
I need to find values that are in the second column but not in the first.
This suggests not exists or a similar approach:
select t.*
from t
where not exists (select 1 from t t2 where t2.column2 = t.column1);
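If the database supports it, EXCEPT is one such similar approach, and it handles NULLs and duplicates for free; a minimal sketch with the same table and column names:
-- Distinct values present in column2 but missing from column1
SELECT column2 FROM t
EXCEPT
SELECT column1 FROM t;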

Using distinct on in subqueries

I noticed that in PostgreSQL the following two queries output different results:
select a.*
from (
select distinct on (t1.col1)
t1.*
from t1
order by t1.col1, t1.col2
) a
where a.col3 = value
;
create table temp as
select distinct on (t1.col1)
t1.*
from t1
order by t1.col1, t1.col2
;
select temp.*
from temp
where temp.col3 = value
;
I guess it has something to do with using distinct on in subqueries.
What is the correct way to use distinct on in subqueries? E.g. can I use it if I don't use a WHERE clause?
Or in queries like
(
select distinct on (a.col1)
a.*
from a
)
union
(
select distinct on (b.col1)
b.*
from b
)
In a normal situation, both examples should return the same result.
I suspect that you are getting different results because the order by clause of your distinct on subquery is not deterministic. That is, there may be several rows in t1 sharing the same col1 and col2.
If the columns in the order by do not uniquely identify each row, then the database has to make its own decision about which row will be retained in the resultset: as a consequence, the results are not stable, meaning that consecutive executions of the same query may yield different results.
Make sure that your order by clause is deterministic (for example by adding more columns in the clause), and this problem should not arise anymore.
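For example, assuming t1 has a unique id column (a hypothetical name here), adding it as a final tie-breaker makes the choice of retained row deterministic:
select a.*
from (
    select distinct on (t1.col1) t1.*
    from t1
    order by t1.col1, t1.col2, t1.id  -- id (assumed unique) breaks any remaining ties
) a
where a.col3 = value;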

Can you use SELECT INTO with multiple CTEs?

Can you use SELECT INTO with multiple CTEs? For example, in the code below the result of the first CTE, cte_table, is inserted into dbo.table1, and then the other CTE is defined. Is this possible?
WITH cte_table
AS
(
SELECT *
FROM dbo.table
)
INSERT INTO dbo.table1
SELECT *
FROM [cte_table]
, cte_table2
AS
(
SELECT *
FROM dbo.table2
)
Chain all your CTEs and THEN do the select into.
WITH First_CTE AS
(
SELECT
Columns
FROM
Schema.Table
WHERE
Conditions
),
Second_CTE AS
(
SELECT
Columns
FROM
Schema.OtherTable
WHERE
Conditions
)
SELECT
Variables
INTO
NewTable
FROM
First_CTE A
JOIN
Second_CTE B
ON
A.MatchVar = B.MatchVar
This can be helpful if you have no need of the CTEs later but prefer a simpler method than subqueries for your ETL.
If your case requires re-usability of the record set, use a temp table or table variable instead.
e.g.
Select * Into #temp1 From dbo.table
INSERT INTO dbo.table1
SELECT * FROM #temp1
SELECT * FROM #temp1 -- ...and any other operations that reuse the result set.
A chained CTE works as follows (just an example):
;With Cte1 As ( Select * from table1)
,Cte2 As (Select * from table2)
select c1.*,c2.*
from cte1 c1, cte2 c2
Hope you understand when to use what and how.
No, you can't: you get an error because INTO is not allowed there, and, as others have pointed out, that makes sense since a CTE is intended to be a repeatable (and thereby static) reference.
I also recall reading somewhere that a CTE is (or was) in large part syntactic sugar, insofar as the CTE is resolved into a derived table when the SQL is executed.
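In other words, for a single non-recursive reference the two forms below are broadly interchangeable (placeholder table name):
-- CTE form
;WITH c AS (SELECT col1 FROM dbo.SomeTable)
SELECT * FROM c;

-- Equivalent derived-table form
SELECT * FROM (SELECT col1 FROM dbo.SomeTable) AS c;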
No, you can't use SELECT INTO in a CTE, and it wouldn't make much sense anyway.

SQL "In" Statement Match Anything

If I have a query like this
SELECT * FROM table1 WHERE col1 IN ({SUBS})
Is there anything I can replace {SUBS} with that will return all rows in the table?
Further details:
I am building the SQL dynamically in my app, so I cannot (should not) edit other parts of the query except what's in braces. So,
SELECT * FROM table1
will not do.
Also,
SELECT * FROM table1 WHERE col1 IN (SELECT col1 FROM table1)
would be hackish and highly inefficient. Consider that the table has more than 50k rows.
This would do it:
select col1 from table1
Edit: There seems to be a bit of confusion - the OP asked what value could be used to replace {SUBS} that would return all rows from table1. My answer above is what you could use in place of {SUBS} that would return all the rows.
This works for me in SQL Server:
SELECT * FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME IN (COLUMN_NAME)
Have you tried just using COL1 for {SUBS}?
e.g.
SELECT * FROM table1 WHERE col1 IN (col1)
If you replaced {SUBS} with SELECT col1 FROM table1, you would end up with
SELECT * FROM table1 WHERE col1 IN (SELECT col1 FROM table1);
which would return all rows from table1. This is, of course, simply a more roundabout way of saying:
SELECT * FROM table1;
You're right,
SELECT * FROM table1 WHERE col1 IN (SELECT col1 FROM table1)
does work, but it is highly inefficient, requiring a merge join to return all rows.
Use the following, which is just as efficient as a plain SELECT * FROM table1:
SELECT * FROM table1 WHERE col1 IN (col1)
However, that said, I suggest you have a chat with the person who is trying to impose the SELECT * FROM table1 WHERE col1 IN ({SUBS}) structure. There is no good reason to do so.
It unnecessarily complicates queries.
It creates a risk of highly inefficient queries.
It may even prevent developers from using certain techniques.
I suspect the person imposing this is trying to implement some sort of silver-bullet framework. Remember, the golden rule in software development is that there are no silver bullets.
If you're simply trying to retrieve every row in the table, then:
select * from table1
If you're trying to prove a point or win a bet or something, then:
select * from table1 where col1 in (select col1 from table1)
If the query requires some WHERE condition, then I would try to replace it with an EXISTS statement:
select *
from table1 t1
where exists ( {subs} )
Then {subs} can be replaced with any subquery that returns at least one row.
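For instance, substituting a trivial subquery makes the EXISTS true for every row (assuming a DBMS, such as SQL Server, that allows SELECT without FROM):
select *
from table1 t1
where exists ( select 1 );  -- the subquery always returns a row, so all rows match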
This works in Oracle:
select * from table1 where col1 in (col1)