We have a SQL query that does not follow our SQL guidelines. We need to change it, but if we change the logic and remove the inner queries, it takes too much time to execute. Below is the query:
select col1,
       col2,
       case
           when col1 <> '' then (select top 1 BP.col1
                                 from table1 as BP
                                 where BP.col1 = FD.col1
                                 order by BP.col1)
           when col2 <> '' then (select top 1 BP.col2
                                 from table1 as BP
                                 where BP.col2 = FD.col2
                                 order by BP.col2)
           else ''
       end
from table2 FD
The above query is used to insert data into a temp table. table1 has almost 100 million rows. Is there any way to remove the inline queries while keeping good performance? We have already created indexes on table1. Any thoughts?
Try this
;WITH CTE AS
(
    SELECT
        RN = ROW_NUMBER() OVER (PARTITION BY T2.col1, T2.col2
                                ORDER BY COALESCE(T2.col1, T2.col2)),
        T2.col1,
        T2.col2,
        T1Val = COALESCE(T2.col1, T2.col2, '')
    FROM table2 T2
    LEFT JOIN table1 T1
        ON
        (
            (ISNULL(T2.col1, '') <> '' AND T1.col1 = T2.col1)
            OR
            (ISNULL(T2.col2, '') <> '' AND T1.col2 = T2.col2)
        )
)
SELECT *
FROM CTE
WHERE RN = 1
Here is my modest help:
You can prepare and materialize your subquery1 and subquery2 in advance (GROUP BY col1 or col2); this reduces the size of the table1 data you have to hit.
Split your main query (from table2) into 3 different queries:
one with SELECT .. FROM table2 WHERE col1 <> ''
one with SELECT .. FROM table2 WHERE col1 = '' AND col2 <> ''
one with SELECT .. FROM table2 WHERE col1 = '' AND col2 = ''
Then use an INNER JOIN against the tables created in the first point (see the sketch below).
(If you use SSIS, you can run these in parallel and feed the materialized table into a Lookup.)
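For illustration, a rough sketch of the first three suggestions above (the temp-table names and the matched_value alias are mine, not from the original suggestion):
-- Sketch only: materialize the distinct keys once, then join each slice of table2
-- against the much smaller temp tables.
SELECT col1 INTO #bp_col1 FROM table1 GROUP BY col1;
SELECT col2 INTO #bp_col2 FROM table1 GROUP BY col2;

-- Slice 1: rows of table2 where col1 <> ''
SELECT FD.col1, FD.col2, BP.col1 AS matched_value
FROM table2 AS FD
INNER JOIN #bp_col1 AS BP ON BP.col1 = FD.col1
WHERE FD.col1 <> '';
-- ...and the equivalent queries against #bp_col2 for the col1 = '' slices.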
If col1 or col2 is an NVARCHAR(MAX) or another large type, have a look at a hash function (MD5, for example) and compare hashes instead (a sketch follows below).
Be sure you have all the indexes you need.
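A hedged sketch of the hash idea (my own example, assuming SQL Server 2016 or later so that HASHBYTES accepts values longer than 8000 bytes; the column and index names are illustrative):
-- Persist a 16-byte MD5 hash of the wide column, index it, and join on the hash
-- instead of on the long string.
ALTER TABLE table1 ADD col1_hash AS CAST(HASHBYTES('MD5', col1) AS BINARY(16)) PERSISTED;
CREATE INDEX IX_table1_col1_hash ON table1 (col1_hash);

-- Then compare hashes; if hash collisions are a concern, also re-check the original column.
SELECT FD.col1
FROM table2 AS FD
JOIN table1 AS BP
    ON BP.col1_hash = CAST(HASHBYTES('MD5', FD.col1) AS BINARY(16));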
Finally, if it is still not performant, you can try:
OUTER APPLY (SELECT TOP 1 .. )
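For reference, a rough sketch of what such an OUTER APPLY rewrite of the original query could look like (untested; the alias names A1, A2 and the matched_value column name are mine):
SELECT FD.col1,
       FD.col2,
       CASE
           WHEN FD.col1 <> '' THEN A1.col1
           WHEN FD.col2 <> '' THEN A2.col2
           ELSE ''
       END AS matched_value
FROM table2 AS FD
OUTER APPLY (SELECT TOP 1 BP.col1
             FROM table1 AS BP
             WHERE FD.col1 <> '' AND BP.col1 = FD.col1
             ORDER BY BP.col1) AS A1
OUTER APPLY (SELECT TOP 1 BP.col2
             FROM table1 AS BP
             WHERE FD.col1 = '' AND FD.col2 <> '' AND BP.col2 = FD.col2
             ORDER BY BP.col2) AS A2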
Another idea would be:
SELECT col1, col2, col1 AS yourNewCol
FROM table2 T2
WHERE EXISTS( SELECT 1 FROM table1 T1 WHERE T1.col1 = T2.col1)
UNION ALL
SELECT col1, col2, col2 AS yourNewCol
FROM table2 T2
WHERE
NOT EXISTS( SELECT 1 FROM table1 T1 WHERE T1.col1 = T2.col1)
AND EXISTS( SELECT 1 FROM table1 T1 WHERE T1.col2 = T2.col2)
UNION ALL
...
I don't have a clean solution for you, just some ideas.
Let me know if it helps you.
Regards,
Arnaud
If I am interested in selecting records that have the largest value in each group of duplicate records (and some general conditions), I normally do this with the following SQL code:
select a.id, a.col2, a.col3
from (select id, col2, col3,
             rank() over (partition by id order by col2 desc, col3 desc) as r1
      from my_table
      where col2 > 5 and col3 > 5) a
where a.r1 = 1
I am interested in learning alternate ways to do this.
Are there other popular ways to do this in (netezza) SQL?
Thank you!
One way to do it is with NOT EXISTS:
SELECT t1.id, t1.col2, t1.col3
FROM my_table t1
WHERE t1.col2 > 5 AND t1.col3 > 5
AND NOT EXISTS (
SELECT 1
FROM my_table t2
WHERE t2.id = t1.id AND t2.col2 > 5 AND t2.col3 > 5
AND (t2.col2 > t1.col2 OR (t2.col2 = t1.col2 AND t2.col3 > t1.col3))
);
Or, if you use a CTE to preselect from the table:
WITH cte AS (
SELECT id, col2, col3
FROM my_table
WHERE col2 > 5 AND col3 > 5
)
SELECT c1.*
FROM cte c1
WHERE NOT EXISTS (
SELECT 1
FROM cte c2
WHERE c2.id = c1.id
AND (c2.col2 > c1.col2 OR (c2.col2 = c1.col2 AND c2.col3 > c1.col3))
);
Depending on the requirement, the WHERE clause inside the subquery may be a lot more complex. This is one of the reasons that, if you can, you should use a window function.
I have a sample table with two columns, and I need to compare column 1 and column 2 against the other records in the same table and remove the rows where one row's column 1 + column 2 equals another row's column 2 + column 1 (i.e., reversed pairs).
I tried a self join with a CASE condition, but it's not working.
If I understand correctly, you can run a simple select like this if you have all reversed pairs in the table:
select col1, col2
from t
where col1 < col2;
If you have some singletons, then:
select col1, col2
from t
where col1 < col2 or
(col1 > col2 and
not exists (select 1
from t t2
where t2.col1 = t.col2 and
t2.col2 = t.col1
)
);
You can use the EXCEPT operator.
"EXCEPT returns distinct rows from the left input query that aren't output by the right input query."
SELECT C1, C2 FROM table
EXCEPT
SELECT C2, C1 FROM table
Example with your given data set: dbfiddle
I am posting an answer based on an Oracle database, assuming the columns are string/varchar:
delete from table where rowid in (
select rowid from table
where column1 || column2 =column2 || column1 )
Feel free to provide more input and we can tweak the answer.
Okay. There might be a simpler way of doing this but this might work as well. {table} is to be replaced with your table name.
;with orderedtable as (
    select t1.col1, t1.col2,
           ROW_NUMBER() OVER (ORDER BY t1.col1, t1.col2 ASC) AS rownum
    from (select distinct t2.col1, t2.col2 from {table} t2) as t1
)
select f1.col1, f1.col2
from orderedtable f1
left join orderedtable f2
    on f1.col1 = f2.col2 and f1.col2 = f2.col1 and f1.rownum < f2.rownum
where f2.rownum is null
The SQL below will get the reversed col1 and col2 rows:
select distinct t2.col1, t1.col2
from table t1
join table t2
    on t1.col1 = t2.col2 and t1.col2 = t2.col1
Once we have these reversed rows, we can exclude them with a LEFT JOIN; the complete SQL is:
select t.col1, t.col2
from table t
left join
(
    select distinct t2.col1, t1.col2
    from table t1
    join table t2
        on t1.col1 = t2.col2 and t1.col2 = t2.col1
) tmp
    on t.col1 = tmp.col1 and t.col2 = tmp.col2
where tmp.col1 is null
Is it clear?
I'm a Salesforce developer. I got a requirement to write a SQL query and I did, but the performance is very poor. Could you please help me here?
My query is like this:
select col1, col2, col3, col4
from table1
where col1 is not null
  and col2 = 'ABC'
  and (col3 IN (SELECT field1 FROM table2)
       OR col4 IN ('A', 'B', 'C'))
Is there some way I can optimize this for better performance?
Update
I used a left outer join to achieve it. Is that the correct way?
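For what it's worth, a hypothetical sketch of that left-outer-join version (this is not the asker's actual code; the DISTINCT matters, because duplicate field1 values in table2 would otherwise multiply rows):
select distinct t1.col1, t1.col2, t1.col3, t1.col4
from table1 t1
left outer join table2 t2 on t1.col3 = t2.field1
where t1.col1 is not null
  and t1.col2 = 'ABC'
  and (t2.field1 is not null or t1.col4 in ('A', 'B', 'C'))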
Try these queries:
SELECT col1, col2,col3,col4 FROM TABLE1 T1
WHERE (EXISTS (SELECT 1 FROM TABLE2 T2 WHERE T1.COL3 = T2.FIELD1)
OR COL4 IN ('A','B','C'))
SELECT col1, col2,col3,col4 FROM TABLE1 T1
WHERE EXISTS (SELECT 1 FROM TABLE2 T2 WHERE T1.COL3 = T2.FIELD1)
UNION
SELECT col1, col2,col3,col4 FROM TABLE1 T1 WHERE COL4 IN ('A','B','C')
I was wondering if there is a way to accomplish the following in SQL Server 2008?
Table 1:
id column name
-------------------
1 col1
2 col2
3 col3
4 col2
Table 2:
col1 col2 col3
--------------------
a b c
Result Table:
id data
--------------------
1 a
2 b
3 c
4 b
Thanks in advance, I really have no idea how to do this.
You can UNPIVOT table2 to access the data from its columns:
select t1.id, t2.value
from table1 t1
left join
(
select value, col
from table2
unpivot
(
value
for col in (col1, col2, col3)
) u
) t2
on t1.name = t2.col
see SQL Fiddle with Demo
Or you can use a UNION ALL to access the data in table2:
select t1.id, t2.value
from table1 t1
left join
(
select col1 value, 'col1' col
from table2
union all
select col2 value, 'col2' col
from table2
union all
select col3 value, 'col3' col
from table2
) t2
on t1.name = t2.col
see SQL Fiddle with Demo
I don't see how you can do it without a column connecting them:
Table1:
ID
ColumnName
Table2:
Table1ID
Letter
Select table1.id, table2.Letter
from table1
inner join table2 on table1.ID = table2.Table1ID
You can do this with a case statement and cross join:
select t1.id,
(case when t1.columnname = 'col1' then t2.col1
when t1.columnname = 'col2' then t2.col2
when t1.columnname = 'col3' then t2.col3
end) as data
from table1 t1 cross join
table2 t2
I'm trying to optimize a query in production that is taking a long time. The goal is to find duplicate records based on matching field values and then delete them. The current query uses a self join (an inner join on t1.col1 = t2.col1) and then a where clause to check the remaining values.
select * from table t1
inner join table t2 on t1.col1 = t2.col1
where t1.col2 = t2.col2 ...
What would be a better way to do this? Or is it all the same given the indexes? Maybe:
select * from table t1, table t2
where t1.col1 = t2.col1 and t1.col2 = t2.col2 ...
This table has 100m+ rows.
MS SQL, SQL Server 2008 Enterprise
select distinct t2.id
from table1 t1 with (nolock)
inner join table1 t2 with (nolock) on t1.ckid=t2.ckid
left join table2 t3 on t1.cid = t3.cid and t1.typeid = t3.typeid
where
t2.id > #Max_id and
t2.timestamp > t1.timestamp and
t2.rid = 2 and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.cid,-1) = isnull(t2.cid,-1) and
isnull(t1.rid,-1) = isnull(t2.rid,-1) and
isnull(t1.typeid,-1) = isnull(t2.typeid,-1) and
isnull(t1.cktypeid,-1) = isnull(t2.cktypeid,-1) and
isnull(t1.oid,'') = isnull(t2.oid,'') and
isnull(t1.stypeid,-1) = isnull(t2.stypeid,-1)
and (
(
t3.uniqueoid = 1
)
or
(
t3.uniqueoid is null and
isnull(t1.col1,'') = isnull(t2.col1,'') and
isnull(t1.col2,'') = isnull(t2.col2,'') and
isnull(t1.rdid,-1) = isnull(t2.rdid,-1) and
isnull(t1.stid,-1) = isnull(t2.stid,-1) and
isnull(t1.huaid,-1) = isnull(t2.huaid,-1) and
isnull(t1.lpid,-1) = isnull(t2.lpid,-1) and
isnull(t1.col3,-1) = isnull(t2.col3,-1)
)
)
Why a self join? This is an aggregation problem.
Hope you have an index on col1, col2, ...
--DELETE table
--WHERE KeyCol NOT IN (
select
MIN(KeyCol) AS RowToKeep,
col1, col2
from
table
GROUP BY
col1, col2
HAVING
COUNT(*) > 1
--)
However, this will take some time. Have a look at bulk delete techniques.
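For instance, a hedged sketch of a batched ("bulk") delete, assuming KeyCol is a non-nullable key and reusing the table/column names from the commented query above; the batch size is arbitrary:
-- Delete duplicates in chunks so the transaction log and lock footprint stay manageable.
-- One MIN(KeyCol) per (col1, col2) group survives; everything else is removed.
WHILE 1 = 1
BEGIN
    DELETE TOP (10000) t
    FROM [table] AS t
    WHERE t.KeyCol NOT IN (SELECT MIN(KeyCol)
                           FROM [table]
                           GROUP BY col1, col2);

    IF @@ROWCOUNT = 0 BREAK;
END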
You can use ROW_NUMBER() to find duplicate rows within a single table.
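A minimal sketch of that approach (assuming col1 and col2 are what define a duplicate; the table name is a placeholder):
;WITH numbered AS
(
    SELECT col1, col2,
           ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS rn
    FROM [table]
)
DELETE FROM numbered
WHERE rn > 1;  -- keeps one row per (col1, col2) group and deletes the rest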
The two methods you give should be equivalent. I think most SQL engines would do exactly the same thing in both cases.
And, by the way, this won't work. You have to have at least one field that is different or every record will match itself.
You might want to try something more like:
select col1, col2, col3
from table
group by col1, col2, col3
having count(*)>1
For a table with 100m+ rows, using GROUP BY and a holding table is a good approach, even though it translates into four queries.
STEP 1: create a holding key:
SELECT col1, col2, col3=count(*)
INTO holdkey
FROM t1
GROUP BY col1, col2
HAVING count(*) > 1
STEP 2: Push all the duplicate entries into the holddups table. This is required for Step 4.
SELECT DISTINCT t1.*
INTO holddups
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
STEP 3: Delete the duplicate rows from the original table.
DELETE t1
FROM t1, holdkey
WHERE t1.col1 = holdkey.col1
AND t1.col2 = holdkey.col2
STEP 4: Put the unique rows back in the original table. For example:
INSERT t1 SELECT * FROM holddups
To detect duplicates, you don't need to join:
SELECT col1, col2
FROM table
GROUP BY col1, col2
HAVING COUNT(*) > 1
That should be much faster.
In my experience, SQL Server performance is really bad with OR conditions. It is probably not the self join but the join with table3 that causes the bad performance, but without seeing the plan I cannot be sure.
In this case, it might help to split your query into two:
One with a WHERE condition t3.uniqueoid = 1 and one with a WHERE condition for the other conditions on table3, and then use UNION ALL to append one to the other.
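Schematically, a sketch only (the shared isnull() comparisons from the original query are omitted here for brevity; since the two WHERE branches are mutually exclusive, UNION ALL does not double-count rows):
SELECT DISTINCT t2.id
FROM table1 t1
INNER JOIN table1 t2 ON t1.ckid = t2.ckid
LEFT JOIN table2 t3 ON t1.cid = t3.cid AND t1.typeid = t3.typeid
WHERE t3.uniqueoid = 1
  -- ...plus the shared t1/t2 comparisons from the original WHERE clause
UNION ALL
SELECT DISTINCT t2.id
FROM table1 t1
INNER JOIN table1 t2 ON t1.ckid = t2.ckid
LEFT JOIN table2 t3 ON t1.cid = t3.cid AND t1.typeid = t3.typeid
WHERE t3.uniqueoid IS NULL
  -- ...plus the shared comparisons and the extra isnull() comparisons from the second OR branch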