Alternatives to the "Partition By" Statement in SQL - sql

If I am interested in selecting records that have the largest value in each group of duplicate records (and some general conditions), I normally do this with the following SQL code:
select a.id, a.col2, a.col3
from (select id, col2, col3,
rank() over (partition by id order by col2 desc, col3 desc) as r1 from my_table where col2 > 5 and col3 > 5) a
where a.r1 =1
I am interested in learning alternate ways to do this.
Are there other popular ways to do this in (netezza) SQL?
Thank you!

One way to do it is with NOT EXISTS:
SELECT t1.id, t1.col2, t1.col3
FROM my_table t1
WHERE t1.col2 > 5 AND t1.col3 > 5
AND NOT EXISTS (
SELECT 1
FROM my_table t2
WHERE t2.id = t1.id AND t2.col2 > 5 AND t2.col3 > 5
AND (t2.col2 > t1.col2 OR (t2.col2 = t1.col2 AND t2.col3 > t1.col3))
);
Or, if you use a CTE to preselect from the table:
WITH cte AS (
SELECT id, col2, col3
FROM my_table
WHERE col2 > 5 AND col3 > 5
)
SELECT c1.*
FROM cte c1
WHERE NOT EXISTS (
SELECT 1
FROM cte c2
WHERE c2.id = c1.id
AND (c2.col2 > c1.col2 OR (c2.col2 = c1.col2 AND c2.col3 > c1.col3))
);
Depending on the requirement, the WHERE clause inside the subquery may be a lot more complex. This is one of the reasons that, if you can, you should use a window function.
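To check that the NOT EXISTS rewrite returns the same rows as the rank() version, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and SQLite (3.25 or newer, for window functions) stands in for Netezza, so treat it as a sanity check rather than a statement about Netezza behaviour.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the question).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE my_table (id INTEGER, col2 INTEGER, col3 INTEGER);
    INSERT INTO my_table VALUES
        (1, 10, 7), (1, 10, 9), (1, 6, 20),
        (2, 8, 6),  (2, 3, 50),
        (3, 4, 4);
""")

# The window-function version from the question.
window_rows = conn.execute("""
    SELECT a.id, a.col2, a.col3
    FROM (SELECT id, col2, col3,
                 RANK() OVER (PARTITION BY id
                              ORDER BY col2 DESC, col3 DESC) AS r1
          FROM my_table
          WHERE col2 > 5 AND col3 > 5) a
    WHERE a.r1 = 1
    ORDER BY a.id
""").fetchall()

# The NOT EXISTS version: keep a row only if no other qualifying row
# of the same id sorts ahead of it on (col2, col3).
not_exists_rows = conn.execute("""
    SELECT t1.id, t1.col2, t1.col3
    FROM my_table t1
    WHERE t1.col2 > 5 AND t1.col3 > 5
      AND NOT EXISTS (
          SELECT 1
          FROM my_table t2
          WHERE t2.id = t1.id AND t2.col2 > 5 AND t2.col3 > 5
            AND (t2.col2 > t1.col2
                 OR (t2.col2 = t1.col2 AND t2.col3 > t1.col3))
      )
    ORDER BY t1.id
""").fetchall()

print(window_rows)
print(not_exists_rows)
```

Both queries keep ties: if two rows of the same id share the top (col2, col3) pair, rank() gives both rank 1 and NOT EXISTS finds nothing ahead of either.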

Related

Join or Union table with same Characteristics and different Measures

I would like to understand the easy/better way to join 2 tables with same characteristics and different measures as an example described below:
tab1
Col1  Col2  Measure1
1     1     10
1     2     5
tab2
Col1  Col2  Measure2
1     1     20
2     1     25
Expected Result
Col1  Col2  Measure1  Measure2
1     1     10        20
1     2     5         0
2     1     0         25
Questions:
How to avoid message: Ambiguous column name col1?
How to create a correct Join?
I have tried:
select col1, col2, t1.Measure1, t2.Measure2
from tab1 t1
full outer join tab2 t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
I have tried a UNION and it works, but I am looking for an easy way using joins:
Select col1, col2, Measure1, 0 as Measure2 From tab1
Union
Select col1, col2, 0 as Measure1, Measure2 From tab2
The full join is the correct approach. But you need to disambiguate col1 and col2 in the select clause: both tables have both columns, so it is unclear to which column an unprefixed col1 refers.
A typical approach uses coalesce():
select
coalesce(t1.col1, t2.col1) col1,
coalesce(t1.col2, t2.col2) col2,
coalesce(t1.measure1, 0) measure1,
coalesce(t2.measure2, 0) measure2
from tab1 t1
full outer join tab2 t2
on t1.col1 = t2.col1 and t1.col2 = t2.col2
Note that you also need coalesce() around the measures to return 0 instead of null on "missing" values.
In some databases (e.g. Postgres), you can use the USING syntax to declare the join conditions for columns that have the same name across the tables; this syntax automagically disambiguates the unprefixed column names, so:
select
col1,
col2,
coalesce(t1.measure1, 0) measure1,
coalesce(t2.measure2, 0) measure2
from tab1 t1
full join tab2 t2 using (col1, col2)
You should reference the source tables for col1 and col2 as well.
As you're using FULL OUTER JOIN, I'd recommend using the COALESCE function.
SELECT COALESCE(t1.col1, t2.col1) as col1,
COALESCE(t1.col2, t2.col2) as col2,
t1.Measure1,
t2.Measure2
FROM tab1 t1
FULL OUTER JOIN tab2 t2
on t1.col1 = t2.col1
and t1.col2 = t2.col2
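For engines without FULL OUTER JOIN (SQLite before 3.39, for instance), the UNION idea from the question can still produce the expected result if the two halves are aggregated afterwards. The sketch below is an assumption-laden illustration, not one of the answers above: it uses Python's sqlite3 module, the sample rows from the question, and SUM() to collapse the two rows that UNION ALL produces for a key present in both tables. It assumes each (col1, col2) pair appears at most once per table; otherwise the measures would be summed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (col1 INTEGER, col2 INTEGER, measure1 INTEGER);
    CREATE TABLE tab2 (col1 INTEGER, col2 INTEGER, measure2 INTEGER);
    INSERT INTO tab1 VALUES (1, 1, 10), (1, 2, 5);
    INSERT INTO tab2 VALUES (1, 1, 20), (2, 1, 25);
""")

# Stack both tables with 0 for the missing measure, then collapse
# rows that share the same (col1, col2) key.
merged = conn.execute("""
    SELECT col1, col2,
           SUM(measure1) AS measure1,
           SUM(measure2) AS measure2
    FROM (
        SELECT col1, col2, measure1, 0 AS measure2 FROM tab1
        UNION ALL
        SELECT col1, col2, 0 AS measure1, measure2 FROM tab2
    ) AS u
    GROUP BY col1, col2
    ORDER BY col1, col2
""").fetchall()

print(merged)
```

Because the derived table exposes a single col1 and a single col2, the outer query needs no table prefixes and the ambiguity error cannot occur.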

inserting into a table records from another table with a condition

Let's say I have two tables t1 and t2.
t1 has two integer cols: col1 (primary key) and col2
t2 has two cols: col1 (a foreign key to t1.col1) and col2
I want to do the following
Retrieve only the records where t1.col2 is unique OR if t1.col2 is duplicate only those if t2.col2 is not null.
Insert the above records into another summary table, let's say t3
This is what I tried:
insert into t3 (col1,col2)
select col1, col2
from t1
where t.col1 in (select A.col1 from t1 as A
group by 1
having count(*) > 1
union
select col1, col2
from t1, t2
where t.col1 in (select A.col1 from t1 as A
group by 1
having count(*) > 1
and t2.col2 is not null;
While the 'union qry' works on its own, the insert is not happening.
Any ideas or any other efficient way to achieve this please
You can select the records you want using:
select t1.*
from (select t1.*, count(*) over (partition by col2) as cnt
from t1
) t1
where cnt = 1 or
exists (select 1 from t2 where t2.col1 = t1.col1 and t2.col2 is not null);
The rest is just an insert.
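A minimal sketch of the complete insert, run through Python's sqlite3 module with made-up rows. It assumes the requirement as stated in the question (keep rows whose t1.col2 is unique, and keep duplicated ones only when a matching t2 row has a non-null col2), that t3 has the same two columns, and that the bundled SQLite is 3.25 or newer so that window functions are available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (col1 INTEGER PRIMARY KEY, col2 INTEGER);
    CREATE TABLE t2 (col1 INTEGER REFERENCES t1(col1), col2 TEXT);
    CREATE TABLE t3 (col1 INTEGER, col2 INTEGER);

    -- Hypothetical data: col2 = 20 is duplicated, 10 and 30 are unique.
    INSERT INTO t1 VALUES (1, 10), (2, 20), (3, 20), (4, 30);
    INSERT INTO t2 VALUES (2, 'x'), (3, NULL);
""")

# Count how often each t1.col2 occurs, then keep unique rows plus
# duplicated rows that have a non-null t2.col2.
conn.execute("""
    INSERT INTO t3 (col1, col2)
    SELECT s.col1, s.col2
    FROM (SELECT t1.*, COUNT(*) OVER (PARTITION BY col2) AS cnt
          FROM t1) s
    WHERE s.cnt = 1
       OR EXISTS (SELECT 1
                  FROM t2
                  WHERE t2.col1 = s.col1 AND t2.col2 IS NOT NULL)
""")

summary = conn.execute("SELECT col1, col2 FROM t3 ORDER BY col1").fetchall()
print(summary)
```

Row 3 is left out because its col2 value is duplicated and its t2.col2 is NULL.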

Compare a two column with the same table to remove duplicate

I have a sample table with two columns. I need to compare column 1 and column 2 across the rows of the same table and remove duplicates where one row's (column 1, column 2) equals another row's (column 2, column 1).
I tried a self join with a CASE condition, but it's not working.
If I understand correctly, you can run a simple select like this if you have all reversed pairs in the table:
select col1, col2
from t
where col1 < col2;
If you have some singletons, then:
select col1, col2
from t
where col1 < col2 or
(col1 > col2 and
not exists (select 1
from t t2
where t2.col1 = t.col2 and
t2.col2 = t.col1
)
);
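To see how the second query treats reversed pairs and singletons, here is a small sketch using Python's sqlite3 module. The table name t matches the answer; the rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (col1 INTEGER, col2 INTEGER);
    -- (1,2)/(2,1) is a reversed pair; (3,4) and (6,5) are singletons.
    INSERT INTO t VALUES (1, 2), (2, 1), (3, 4), (6, 5);
""")

# Keep the "ascending" member of every pair, and keep a "descending"
# row only when its mirror image is absent from the table.
kept = conn.execute("""
    SELECT col1, col2
    FROM t
    WHERE col1 < col2
       OR (col1 > col2
           AND NOT EXISTS (SELECT 1
                           FROM t t2
                           WHERE t2.col1 = t.col2
                             AND t2.col2 = t.col1))
    ORDER BY col1
""").fetchall()

print(kept)
```

Note that neither branch of the WHERE clause matches a row with col1 = col2, so such rows would be dropped by this query as written.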
You can use the except operator.
"EXCEPT returns distinct rows from the left input query that aren't output by the right input query."
SELECT C1, C2 FROM table
Except
SELECT C2, C1 FROM table
Example with your given data set : dbfiddle
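One thing worth checking with the EXCEPT approach is what happens to a pair that exists in both orders. The sketch below uses Python's sqlite3 module with a hypothetical table named pairs (the word table itself is reserved) and invented rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE pairs (c1 INTEGER, c2 INTEGER);
    INSERT INTO pairs VALUES (1, 2), (2, 1), (3, 4);
""")

# Rows of the table minus the same rows with their columns swapped.
remaining = conn.execute("""
    SELECT c1, c2 FROM pairs
    EXCEPT
    SELECT c2, c1 FROM pairs
    ORDER BY c1
""").fetchall()

print(remaining)
```

With this data, both (1,2) and (2,1) appear in the swapped set, so EXCEPT removes the pair entirely and only (3,4) remains. If one row of each reversed pair should survive, one of the other answers may be a better fit.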
I am posting the answer based on oracle database and also the columns are string/varchar:
delete from table where rowid in (
select rowid from table
where column1 || column2 =column2 || column1 )
Feel free to provide more input and we can tweak the answer.
Okay. There might be a simpler way of doing this but this might work as well. {table} is to be replaced with your table name.
;with orderedtable as (select t1.col1, t1.col2, ROW_NUMBER() OVER(ORDER BY t1.col1, t1.col2 ASC) AS rownum
from (select distinct t2.col1, t2.col2 from {table} t2) as t1)
select f1.col1, f1.col2
from orderedtable f1
left join orderedtable f2 on f1.col1 = f2.col2 and f1.col2 = f2.col1 and f1.rownum < f2.rownum
where f2.rownum is null
The SQL below will get the reversed col1 and col2 rows:
select
distinct t2.col1,t1.col2
from
table t1
join
table t2 on t1.col1 = t2.col2 and t1.col2 = t2.col1
Once we have these reversed rows, we can exclude them with a left join. The complete SQL is:
select
t.col1,t.col2
from
table t
left join
(
select
distinct t2.col1,t1.col2
from
table t1
join
table t2 on t1.col1 = t2.col2 and t1.col2 = t2.col1
) tmp on t.col1 = tmp.col1 and t.col2 = tmp.col2
where
tmp.col1 is null
Is it clear?

How to achieve the below requirements in sql

I have a table like the one below.
The result set should contain 4 and 5, since for those rows the counts of (c2) and (c3) are both greater than 1.
You can use exists :
select t.*
from table t
where exists (select 1 from table t1 where t1.col2 = t.col2 and t1.col1 <> t.col1);
JOIN a subquery that returns the duplicated col2 values:
select t1.*
from tablename t1
join (select col2 from tablename
group by col2 having count(*) > 1) t2
on t1.col2 = t2.col2
I would use window functions:
select t.*
from (select t.*,
count(*) over (partition by c2) as c2_cnt,
count(*) over (partition by c3) as c3_cnt
from t
) t
where c2_cnt > 1 and c3_cnt > 1;
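The question's table is not shown, so the sketch below invents one with columns c1, c2 and c3 in which only rows 4 and 5 share their c2 and c3 values. It runs the window-function query through Python's sqlite3 module (SQLite 3.25 or newer assumed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t (c1 INTEGER, c2 TEXT, c3 TEXT);
    -- Hypothetical rows: c2 and c3 repeat only for c1 = 4 and c1 = 5.
    INSERT INTO t VALUES
        (1, 'a', 'x'), (2, 'b', 'y'), (3, 'c', 'z'),
        (4, 'd', 'w'), (5, 'd', 'w');
""")

# Attach per-value occurrence counts to every row, then filter on them.
dupes = conn.execute("""
    SELECT c1
    FROM (SELECT t.*,
                 COUNT(*) OVER (PARTITION BY c2) AS c2_cnt,
                 COUNT(*) OVER (PARTITION BY c3) AS c3_cnt
          FROM t) x
    WHERE c2_cnt > 1 AND c3_cnt > 1
    ORDER BY c1
""").fetchall()

print(dupes)
```

The two counts are computed independently, so a row qualifies when its c2 repeats somewhere and its c3 repeats somewhere, not necessarily in the same other row.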
Since you changed your requirement, I changed my query:
select t1.* from table_name t1 where
exists( select 1 from table_name t2 where t2.c2=t1.c2
and t2.c3=t1.c3
group by t2.c2,t2.c3
having count(*)>1
)

Remove inner query in SQL

We have a SQL query which is not written according to our SQL guidelines. We have to change the query, but if we change the logic and remove the inner query, it takes too much time to execute. Below is the query:
select col1,
col2,
case
when col1 <> '' then(select top 1
col1
from table1 as BP
where bp.col1 = FD.col1 order by BP.col1)
when col2 <> '' then(select top 1
BP.col2
from table1 as BP
where BP.col2 = FD.col2 order by BP.col2)
else ''
end
from table2 FD
The above query is used to insert data into a temp table. table1 has almost 100 million rows. Is there any way to remove the inline query while keeping good performance? We have already created indexes on table1. Any thoughts?
Try this
;WITH CTE
AS
(
SELECT
RN = ROW_NUMBER() OVER(ORDER BY COALESCE(T2.col1,T2.col2)),
T2.col1,
T2.col2,
T1Val = COALESCE(T2.col1,T2.col2,'')
FROM table2 T2
LEFT JOIN table1 T1
ON
(
(
ISNULL(T2.col1,'')<>'' AND T1.col1 = T2.col1
)
OR
(
ISNULL(T2.col2,'')<>'' AND T1.col2 = T2.col2
)
)
)
SELECT
*
FROM CTE
WHERE RN = 1
Here is my modest help:
You can prepare and materialize your subquery1 and subquery2 in advance (GROUP BY col1 or col2). This will reduce the size of your table1.
Split your main query (from table2) into 3 different queries:
1 with SELECT .. FROM table2 WHERE col1 <> ''
1 with SELECT .. FROM table2 WHERE col1 = '' AND col2 <> ''
1 with SELECT .. FROM table2 WHERE col1 = '' AND col2 = ''
Use an INNER JOIN with the table created in the first point.
(If you use SSIS, you can run these in parallel and use your inner join table in a Lookup.)
If your col1 or col2 uses NVARCHAR(MAX) or a large size, you should look at a hash function (MD5, for example) and compare hashes instead.
Be sure to have all your indexes.
Finally, if it is still not performant, you can try:
OUTER APPLY (SELECT TOP 1 .. )
Another idea would be:
SELECT col1, col2, col1 AS yourNewCol
FROM table2 T2
WHERE EXISTS( SELECT 1 FROM table1 T1 WHERE T1.col1 = T2.col1)
UNION ALL
SELECT col1, col2, col2 AS yourNewCol
FROM table2 T2
WHERE
NOT EXISTS( SELECT 1 FROM table1 T1 WHERE T1.col1 = T2.col1)
AND EXISTS( SELECT 1 FROM table1 T1 WHERE T1.col2 = T2.col2)
UNION ALL
...
I don't have a clean solution for you, but some ideas.
Let me know if it helps you.
Regards,
Arnaud
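The first suggestion above (materialize the distinct keys of table1, then join) can be sketched as follows. This is only an illustration in SQLite through Python's sqlite3 module, not SQL Server, with invented rows. The helper tables k1 and k2 and the output column name matched are hypothetical. It relies on the observation that the original correlated subqueries can only return the looked-up value itself, or NULL when table1 has no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (col1 TEXT, col2 TEXT);
    CREATE TABLE table2 (col1 TEXT, col2 TEXT);
    INSERT INTO table1 VALUES ('a', 'p'), ('a', 'q'), ('b', 'p');
    INSERT INTO table2 VALUES ('a', 'zz'), ('', 'p'), ('', ''), ('c', 'p');

    -- Hypothetical helper tables: one row per distinct key of table1,
    -- so the joins below cannot multiply table2 rows.
    CREATE TEMP TABLE k1 AS SELECT DISTINCT col1 FROM table1;
    CREATE TEMP TABLE k2 AS SELECT DISTINCT col2 FROM table1;
""")

# Same CASE logic as the original query, with the two correlated
# subqueries replaced by left joins on the pre-aggregated keys.
result = conn.execute("""
    SELECT fd.col1, fd.col2,
           CASE WHEN fd.col1 <> '' THEN k1.col1
                WHEN fd.col2 <> '' THEN k2.col2
                ELSE ''
           END AS matched
    FROM table2 fd
    LEFT JOIN k1 ON k1.col1 = fd.col1
    LEFT JOIN k2 ON k2.col2 = fd.col2
    ORDER BY fd.rowid
""").fetchall()

print(result)
```

Whether this beats the correlated subqueries on a 100-million-row table depends on the engine and the indexes, so it would need to be measured against the real data.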