How to get non-matching rows in an SQL query

I have 2 tables and I want to fetch the rows that are not equal. How do I write the query?
For example, table a contains 10 rows and table b contains 10 rows.
5 rows are equal in a and b.
I want to get the rows that are not equal (not in table b).
How do I fetch the values from table a that are not in table b?
The result should be 5 records.

To take rows in A but not in B:
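select * from A minus select * from B
(MINUS is Oracle's name for this operator; the SQL standard, SQL Server, PostgreSQL and SQLite call it EXCEPT.)
To take rows that are in A or in B but not in both (the symmetric difference):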
(select * from A union select * from B) minus (select * from A intersect select * from B)
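A minimal sketch of both queries, using Python's built-in sqlite3 module as a stand-in database (an assumption; the question names no engine). The tables `a` and `b` and their data are hypothetical, chosen to match the question: 10 rows each, 5 in common. SQLite spells MINUS as EXCEPT and does not accept parenthesised operands in a compound SELECT, so the symmetric difference wraps each side in a subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE b (id INTEGER, name TEXT)")
# Hypothetical data: a holds ids 1-10, b holds ids 6-15, so ids 6-10 are in both
conn.executemany("INSERT INTO a VALUES (?, ?)", [(i, f"row{i}") for i in range(1, 11)])
conn.executemany("INSERT INTO b VALUES (?, ?)", [(i, f"row{i}") for i in range(6, 16)])

# Rows in a but not in b (EXCEPT is SQLite's spelling of MINUS)
a_minus_b = conn.execute(
    "SELECT * FROM a EXCEPT SELECT * FROM b ORDER BY 1"
).fetchall()

# Rows in a or b but not in both; each side is a subquery because SQLite
# rejects "(SELECT ...) EXCEPT (SELECT ...)"
sym_diff = conn.execute("""
    SELECT * FROM (SELECT * FROM a UNION SELECT * FROM b)
    EXCEPT
    SELECT * FROM (SELECT * FROM a INTERSECT SELECT * FROM b)
    ORDER BY 1
""").fetchall()

print(a_minus_b)
print(sym_diff)
```

The first query returns the 5 rows with ids 1 to 5; the second adds the 5 rows that only `b` has (ids 11 to 15).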

This is a well-known problem. A more efficient solution reads each table only once (unlike the "symmetric difference" solution, which reads each table twice and does some additional work):
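-- the GROUP BY must apply to the combined rows, so the UNION ALL goes in a subquery
select max(source) as source, col1, col2, ...
from (
  select 'A' as source, col1, col2, ...
  from table_A
  union all
  select 'B' as source, col1, col2, ...
  from table_B
) t
group by col1, col2, ...
having count(*) = 1
;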
If a row is present in both tables, then the count will be 2.
This assumes there are no duplicate rows in either table; if there may be duplicate rows, the HAVING condition can be modified, for example:
having count(case when source = 'A' then 1 end) = 0
or count(case when source = 'B' then 1 end) = 0
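A minimal sketch of the single-pass query in Python's sqlite3 module (a stand-in engine; the table contents are hypothetical). `(3, 'z')` is stored twice in `table_a` to show the difference between the two HAVING conditions. The output column is aliased `origin` here only to keep it distinct from the inner `source` column that the HAVING clause inspects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE TABLE table_b (col1 INTEGER, col2 TEXT)")
# (3, 'z') is duplicated in table_a to show why the HAVING clause matters
conn.executemany("INSERT INTO table_a VALUES (?, ?)", [(1, "x"), (2, "y"), (3, "z"), (3, "z")])
conn.executemany("INSERT INTO table_b VALUES (?, ?)", [(2, "y"), (4, "w")])

QUERY = """
    SELECT MAX(source) AS origin, col1, col2
    FROM (
        SELECT 'A' AS source, col1, col2 FROM table_a
        UNION ALL
        SELECT 'B' AS source, col1, col2 FROM table_b
    ) AS t
    GROUP BY col1, col2
    HAVING {having}
    ORDER BY col1
"""

# Assumes no duplicates: a row present in both tables has a count of 2
no_dups = conn.execute(QUERY.format(having="COUNT(*) = 1")).fetchall()

# Duplicate-tolerant: keep groups with no rows from one of the two sources
with_dups = conn.execute(QUERY.format(having=(
    "COUNT(CASE WHEN source = 'A' THEN 1 END) = 0 "
    "OR COUNT(CASE WHEN source = 'B' THEN 1 END) = 0"
))).fetchall()

print(no_dups)    # (3, 'z') is lost: it appears twice, so its count is 2
print(with_dups)
```

With `COUNT(*) = 1`, the duplicated row `(3, 'z')` is wrongly dropped; the CASE-based condition keeps it.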

Use EXCEPT;
the syntax is the same as for INTERSECT:
https://www.tutorialspoint.com/sql/sql-intersect-clause.htm

Related

Looking for performance improvements in the SQL

Scenario:
There are 2 columns in the table, with data as given in the sample below.
The table can have multiple rows for the same value of column 'a'.
In the example, considering column 'a', there are three rows for '1' and one row for '2'.
Sample table 't1':
|a|b |
|1|1.1|
|1|1.2|
|1|2.2|
|2|3.1|
Expected query output:
|a|b |
|1|1.2|
|2|3.1|
Requirement:
If there is only one row for a given value of column 'a', return that row.
If there are multiple rows for the same value of column 'a' and FLOOR(b) = a for all of them, get MIN(a) and MAX(b).
If there are multiple rows for the same value of column 'a' and one of them has FLOOR(b) > a, ignore that row; from the remaining rows, get MIN(a) and MAX(b).
Query I used:
select distinct min(a) over(partition by table1.a) as a,
min(b) over(partition by table1.a) as b
from (
SELECT distinct Min(table2.a) OVER (PARTITION BY table2.a) AS a,
Max(table2.b) OVER (PARTITION BY table2.a) AS b
FROM t1 table2
union
SELECT distinct Min(table3.a) OVER (PARTITION BY table3.a) AS a,
Max(table3.b) OVER (PARTITION BY table3.a) AS b
FROM t1 table3
where table3.a = FLOOR(table3.b)
) table1;
This query works and I get the desired output. I'm looking for suggestions to improve it by removing the UNION and the extra SELECT from the script above.
Note: t1 is not a table but a procedure call in my case, and it returns additional columns. It would help if the extra call to the procedure could be avoided.
This is how I would get the data you need.
select t1.a, max(t1.b)
from (select a, b, count(1) over(partition by t1.a) cnt from t1) t1
where t1.a = floor(t1.b) or cnt = 1
group by t1.a, cnt;
It has only one procedure call, so it might run significantly faster.
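A minimal sketch that runs the single-call query above against the sample data, using Python's sqlite3 module as a stand-in (an assumption: the original calls a procedure, modelled here as a plain table `t1`). Window functions need SQLite 3.25 or later, and SQLite's built-in FLOOR() depends on compile options, so the sketch registers Python's `math.floor` under that name.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
# Register floor() ourselves; the built-in is not available in every SQLite build
conn.create_function("floor", 1, math.floor)
conn.execute("CREATE TABLE t1 (a INTEGER, b REAL)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, 1.1), (1, 1.2), (1, 2.2), (2, 3.1)])

rows = conn.execute("""
    SELECT t1.a, MAX(t1.b)
    FROM (SELECT a, b, COUNT(1) OVER (PARTITION BY a) AS cnt FROM t1) AS t1
    -- keep rows where FLOOR(b) = a, or the only row for that a
    WHERE t1.a = floor(t1.b) OR cnt = 1
    GROUP BY t1.a, cnt
    ORDER BY t1.a
""").fetchall()
print(rows)
```

The row `(1, 2.2)` is filtered out because FLOOR(2.2) is 2, so the result matches the expected output: `(1, 1.2)` and `(2, 3.1)`.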
Please also note that UNION not only appends two data sets but also removes duplicates. Removing duplicates requires extra comparisons between the data sets and can therefore lead to performance issues.
In most cases it is better to use UNION ALL, which doesn't check for duplicates, when duplicates can't occur or don't matter.
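A small sketch of the behavioural difference, again with Python's sqlite3 module as a stand-in engine and the sample table from the question. Reading the same rows twice, UNION collapses the copies while UNION ALL keeps every one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (a INTEGER, b REAL)")
conn.executemany("INSERT INTO t1 VALUES (?, ?)", [(1, 1.1), (1, 1.2), (1, 2.2), (2, 3.1)])

# The same four rows read twice
union_rows = conn.execute("SELECT a, b FROM t1 UNION SELECT a, b FROM t1").fetchall()
union_all_rows = conn.execute("SELECT a, b FROM t1 UNION ALL SELECT a, b FROM t1").fetchall()
print(len(union_rows), len(union_all_rows))  # 4 8
```

The duplicate check is the extra work UNION does; UNION ALL simply concatenates.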

SQL - Snowflake Minus Operator

Hi, I am running a query to check for any changes in a table between two dates:
SELECT * FROM TABLE_A where run_time = current_date()
MINUS
SELECT * FROM TABLE_A where run_time = current_date()-1
The first select statement (where run_time = current_date()) returns 3,357,210 records.
The second select statement (where run_time = current_date()-1) returns 0 records.
Using the MINUS operator, I was expecting to see 3,357,210 records (3,357,210 - 0) but instead I get 2,026,434
Any thoughts on why? Thanks
https://docs.snowflake.com/en/sql-reference/operators-query.html#minus-except
Removes rows from one query’s result set which appear in another query’s result set, with duplicate elimination.
Thus, you only have 2,026,434 distinct rows in your first query. The missing million-and-a-bit are the duplicates, which have been eliminated.
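A minimal reproduction of the effect, using Python's sqlite3 module in place of Snowflake (an assumption; SQLite's EXCEPT also eliminates duplicates). The table and dates are hypothetical, and three of today's rows are exact duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (id INTEGER, val TEXT, run_time TEXT)")
today, yesterday = "2024-01-02", "2024-01-01"  # hypothetical dates
# Three of today's five rows are identical
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)", [
    (1, "x", today), (1, "x", today), (1, "x", today),
    (2, "y", today), (3, "z", today),
])

total = conn.execute(
    "SELECT COUNT(*) FROM table_a WHERE run_time = ?", (today,)
).fetchone()[0]

# Same shape as the question; the second SELECT matches no rows
diff = conn.execute("""
    SELECT * FROM table_a WHERE run_time = ?
    EXCEPT
    SELECT * FROM table_a WHERE run_time = ?
""", (today, yesterday)).fetchall()
print(total, len(diff))  # 5 3
```

Even though nothing is subtracted, the result shrinks from 5 rows to 3 because the operator returns distinct rows.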
This query:
SELECT * FROM TABLE_A where run_time = current_date()
MINUS
SELECT * FROM TABLE_A where run_time = current_date()-1
Will always return all distinct rows from TABLE_A for the current date. Why? Because run_time is one of the columns, and its value differs between the two queries. MINUS compares all the columns. Note that this is true even if the second query returns rows, because the values on those rows are different.
If your total is different from the total number of rows, then you have duplicates in the table.
Here are two ways to get the new records. Let me assume that identical records are identified by col1/col2:
select col1, col2
from table_a
where run_time in (current_date(), current_date() -1)
group by col1, col2
having min(run_time) = current_date();
That is, the earliest run_time for that col1/col2 pair is the current date.
Or:
select col1, col2
from table_a a
where a.run_time = current_date() and
not exists (select 1
from table_a a2
where a2.run_time = current_date() - 1 and
a2.col1 = a.col1 and a2.col2 = a.col2
);
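A minimal sketch checking that both approaches agree, using Python's sqlite3 module as a stand-in for Snowflake. SQLite has no `current_date() - 1` arithmetic on strings, so the two dates are hypothetical ISO strings passed as parameters; ISO dates compare correctly as text, which is what `MIN(run_time)` relies on here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (col1 INTEGER, col2 TEXT, run_time TEXT)")
today, yesterday = "2024-01-02", "2024-01-01"  # hypothetical dates
conn.executemany("INSERT INTO table_a VALUES (?, ?, ?)", [
    (1, "x", yesterday), (1, "x", today),  # existed yesterday: not new
    (2, "y", today),                       # new today
    (3, "z", yesterday),                   # gone today: not returned
])

# Approach 1: the earliest occurrence in the two-day window is today
by_group = conn.execute("""
    SELECT col1, col2
    FROM table_a
    WHERE run_time IN (?, ?)
    GROUP BY col1, col2
    HAVING MIN(run_time) = ?
    ORDER BY col1
""", (today, yesterday, today)).fetchall()

# Approach 2: today's rows with no matching row yesterday
by_not_exists = conn.execute("""
    SELECT col1, col2
    FROM table_a AS a
    WHERE a.run_time = ?
      AND NOT EXISTS (SELECT 1 FROM table_a AS a2
                      WHERE a2.run_time = ?
                        AND a2.col1 = a.col1 AND a2.col2 = a.col2)
    ORDER BY col1
""", (today, yesterday)).fetchall()
print(by_group, by_not_exists)  # [(2, 'y')] [(2, 'y')]
```

One difference worth noting: the GROUP BY version returns each new col1/col2 pair once, while NOT EXISTS returns every matching row, including duplicates.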

Two equal tables (different column numbers) have different numbers of rows

AWS Redshift DB
I have two tables A and B
select col1, col2 from A
except
select col1, col2 from B
returns an empty result; likewise
select col1, col2 from B
except
select col1, col2 from A
returns an empty result.
But
select count(*) from A
returns, for example, 100, while
select count(*) from B
returns 200.
How can that be?
Because each table's distinct data set is contained in the other, EXCEPT (which compares distinct rows) returns nothing in either direction. A different count means that you have duplicate rows. This might make it clearer:
Distinct(A) is a subset of B
Distinct(B) is a subset of A
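A minimal reproduction, using Python's sqlite3 module in place of Redshift (an assumption; both engines apply EXCEPT to distinct rows). The data is hypothetical: `b` stores every row of `a` twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE TABLE b (col1 INTEGER, col2 TEXT)")
conn.executemany("INSERT INTO a VALUES (?, ?)", [(1, "x"), (2, "y")])
# b holds the same distinct rows, each stored twice
conn.executemany("INSERT INTO b VALUES (?, ?)", [(1, "x"), (1, "x"), (2, "y"), (2, "y")])

a_minus_b = conn.execute("SELECT col1, col2 FROM a EXCEPT SELECT col1, col2 FROM b").fetchall()
b_minus_a = conn.execute("SELECT col1, col2 FROM b EXCEPT SELECT col1, col2 FROM a").fetchall()
count_a = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
count_b = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]
distinct_b = conn.execute(
    "SELECT COUNT(*) FROM (SELECT DISTINCT col1, col2 FROM b)"
).fetchone()[0]
print(a_minus_b, b_minus_a, count_a, count_b, distinct_b)  # [] [] 2 4 2
```

Both EXCEPT queries are empty, yet the counts differ; counting `DISTINCT` rows of `b` brings it back in line with `a`.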

Select 10 records after id=somevalue, (id>somevalue) and select first 10 records if id=somevalue doesn't exist

I have an SQL query which selects 10 records after ID = somevalue, but I want to select the first 10 records if that record doesn't exist. The query has the following structure:
SELECT * FROM TABLE WHERE ID > x ORDER BY METRIC LIMIT 10
Note that ID here is a varchar field, and the rows are sorted by another field.
This comes close to what you want:
SELECT *
FROM TABLE
ORDER BY (CASE WHEN ID > X THEN 1 ELSE 0 END) DESC,
METRIC
LIMIT 10
It will always return 10 records (assuming you have at least 10 records in the table). It will put the ones with id > x first. If there are not enough of those, then it will fill in with other records.
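A minimal sketch of the ORDER BY CASE approach, using Python's sqlite3 module as a stand-in database. The table name `items`, its columns, and the data (varchar ids `id01` to `id20` with `metric` equal to the number) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT, metric INTEGER)")
# Hypothetical data: varchar ids, metric is the sort key
conn.executemany("INSERT INTO items VALUES (?, ?)",
                 [(f"id{n:02d}", n) for n in range(1, 21)])

def next_rows(x, limit=10):
    # Rows with id > x sort first; the rest fill in if there are not enough of them
    return conn.execute("""
        SELECT id FROM items
        ORDER BY CASE WHEN id > ? THEN 1 ELSE 0 END DESC, metric
        LIMIT ?
    """, (x, limit)).fetchall()

print([r[0] for r in next_rows("id15")])
print([r[0] for r in next_rows("zzz")])
```

For `id15`, only five ids are greater, so `id01` to `id05` fill the remaining slots. For `zzz` no id is greater, so the query falls back to the first 10 rows by metric. As the answer says, the query never checks whether `x` itself exists.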
This will also work:
SELECT TOP 10 col1, col2
FROM #yourtable
WHERE col1 > @ID
UNION ALL
SELECT TOP 10 col1, col2
FROM #yourtable
WHERE NOT EXISTS (SELECT * FROM #yourtable WHERE col1 = @ID)
However, this assumes you have an ID that you can query on using greater than/less than to retrieve the desired "next ten" records. Also, you would probably need to add an ORDER BY clause to make sure the intended records are returned.
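A minimal sketch of the UNION ALL approach, translated to Python's sqlite3 module (an assumption: SQLite has no TOP, so each branch becomes a subquery with `ORDER BY ... LIMIT 10`, which also adds the ordering mentioned above). The table name `yourtable` and its data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (col1 TEXT, col2 INTEGER)")
conn.executemany("INSERT INTO yourtable VALUES (?, ?)",
                 [(f"id{n:02d}", n) for n in range(1, 21)])

def next_ten(x):
    # Branch 1: the next 10 ids after x
    # Branch 2: the first 10 ids, but only when x does not exist
    return conn.execute("""
        SELECT * FROM (SELECT col1, col2 FROM yourtable
                       WHERE col1 > ? ORDER BY col1 LIMIT 10)
        UNION ALL
        SELECT * FROM (SELECT col1, col2 FROM yourtable
                       WHERE NOT EXISTS (SELECT * FROM yourtable WHERE col1 = ?)
                       ORDER BY col1 LIMIT 10)
    """, (x, x)).fetchall()

print(sorted(r[0] for r in next_ten("id05")))  # existing id: id06 .. id15
print(sorted(r[0] for r in next_ten("zz")))    # missing id: id01 .. id10
```

If the missing ID falls inside the range of existing IDs (for example `id05a`), both branches return rows, so the result can contain up to 20 records.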

How to fix Indeterminate TOP 1 result from SQL select union

I am selecting an int value from two different tables, as shown below:
select col1 from tablea
union
select col1 from tableb
The requirement is if result is found in first query, use that; otherwise, look in second table.
select top 1 * from (select col1 from tablea union select col1 from tableb) as a
The problem is that the highest numerical value in the result set is returned as TOP 1, not the first result found.
I don't care about the numerical order of the values; I just want precedence: if the first SELECT finds a value, don't bother running the second query.
Without TOP 1 I get 3 and 6 back. With TOP 1 I get 6, and the same result if I put the other SELECT first.
Help!
When you UNION these two SELECTs, the order of the result IS NOT DEFINED; you can think of it as a RANDOM order. So you have to define the order explicitly to get the right result. For example:
select top 1 C1 from
(select col1 as C1, 1 as c2 from tablea
union
select col1 as C1, 2 as c2 from tableb) as a
ORDER BY C2
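A minimal sketch of the priority-column idea, using Python's sqlite3 module as a stand-in for SQL Server (an assumption: SQLite uses `LIMIT 1` where SQL Server uses `TOP 1`). The single-row tables mirror the values 3 and 6 from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablea (col1 INTEGER)")
conn.execute("CREATE TABLE tableb (col1 INTEGER)")
conn.execute("INSERT INTO tablea VALUES (3)")
conn.execute("INSERT INTO tableb VALUES (6)")

QUERY = """
    SELECT C1 FROM (
        SELECT col1 AS C1, 1 AS C2 FROM tablea
        UNION
        SELECT col1 AS C1, 2 AS C2 FROM tableb
    ) AS a
    ORDER BY C2   -- C2 encodes which table wins
    LIMIT 1
"""
first = conn.execute(QUERY).fetchone()[0]     # tablea has priority
conn.execute("DELETE FROM tablea")
fallback = conn.execute(QUERY).fetchone()[0]  # falls back to tableb
print(first, fallback)  # 3 6
```

Note that both SELECTs still run; the extra column only makes the choice between their results deterministic.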