Two equal tables (different column numbers) have different number of rows - sql

AWS Redshift DB
I have two tables A and B
select col1, col2 from A
except
select col1, col2 from B
returns empty, the same
select col1, col2 from B
except
select col1, col2 from A
returns empty
but
select count(*) from A
returns for example 100, but
select count(*) from B
returns 200
how can that be ?

Because each table's distinct set of (col1, col2) values is contained in the other's. EXCEPT removes duplicates before it compares, so a different count means that at least one table has duplicate rows. This might make it clearer:
Distinct(A) is a subset of B
Distinct(B) is a subset of A
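A minimal sketch of this situation, using Python's built-in sqlite3 module instead of Redshift. The table contents are made up: B simply holds every row of A twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE TABLE b (col1 INTEGER, col2 TEXT)")

# Hypothetical data: B contains each row of A twice.
rows = [(1, "x"), (2, "y")]
conn.executemany("INSERT INTO a VALUES (?, ?)", rows)
conn.executemany("INSERT INTO b VALUES (?, ?)", rows + rows)

# EXCEPT compares distinct rows, so both differences are empty.
a_minus_b = conn.execute(
    "SELECT col1, col2 FROM a EXCEPT SELECT col1, col2 FROM b"
).fetchall()
b_minus_a = conn.execute(
    "SELECT col1, col2 FROM b EXCEPT SELECT col1, col2 FROM a"
).fetchall()

# COUNT(*) counts every row, duplicates included.
count_a = conn.execute("SELECT COUNT(*) FROM a").fetchone()[0]
count_b = conn.execute("SELECT COUNT(*) FROM b").fetchone()[0]

print(a_minus_b, b_minus_a, count_a, count_b)
```

Both EXCEPT queries come back empty even though the row counts differ.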

Related

SQL query to remove duplicates from a table with 139 columns and load all columns to another table

I need to remove the duplicates from a table with 139 columns based on 2 columns and load the unique rows with 139 columns into another table.
eg :
col1 col2 col3 .....col139
a b .............
b c .............
a b .............
o/p:
col1 col2 col3 .....col139
a b .............
b c .............
need a SQL query for DB2?
If the "other table" does not exist yet you can create it like this
CREATE TABLE othertable LIKE originaltable
And then insert the requested rows with this statement:
INSERT INTO othertable
SELECT col1,...,coln
FROM (SELECT
t.*,
ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col1) AS num
FROM t) t
WHERE num = 1
There are numerous tools out there that generate queries and column lists - so if you do not want to write it by hand you could generate it with these tools or use another SQL statement to select it from the Db2 catalog table (syscat.columns).
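The ROW_NUMBER() approach can be tried on a small table first. This is only a sketch: it uses Python's sqlite3 module rather than Db2, assumes the bundled SQLite is 3.25 or newer (needed for window functions), and uses a made-up three-column table in place of the 139-column one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 TEXT, col2 TEXT, col3 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("a", "b", 1), ("b", "c", 2), ("a", "b", 3)],
)

# SQLite has no CREATE TABLE ... LIKE, so copy the structure with an empty SELECT.
conn.execute("CREATE TABLE othertable AS SELECT * FROM t WHERE 0")

# Keep the first row of each (col1, col2) group; ordering by col3 makes the
# choice of survivor deterministic in this example.
conn.execute(
    """
    INSERT INTO othertable
    SELECT col1, col2, col3
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY col1, col2 ORDER BY col3) AS num
          FROM t) AS ranked
    WHERE num = 1
    """
)

deduped = conn.execute(
    "SELECT col1, col2, col3 FROM othertable ORDER BY col1, col2"
).fetchall()
print(deduped)
```

Which row of a duplicate group survives depends on the ORDER BY inside the window, so it is worth choosing that ordering deliberately.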
You might be better just deleting the duplicates in place. This can be done without specifying a column list.
DELETE FROM
( SELECT
ROW_NUMBER() OVER (PARTITION BY col1, col2) AS DUP
FROM t
)
WHERE
DUP > 1
You can use row_number():
select t.*
from (select t.*,
row_number() over (partition by a, b order by a) as seqnum
from t
) t
where seqnum = 1;
If you don't want seqnum in the result set, though, you need to list out all the columns.
To find duplicate values in col1 or any column, you can run the following query:
SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1;
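And if you want to delete those rows using the value of col1, you can run the following query. Note that it deletes every row whose col1 value occurs more than once, including the copy you may want to keep: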
DELETE FROM your_table WHERE col1 IN (SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1);
You can use the same approach to delete duplicate rows from the table using col2 values.
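Before running a DELETE like this on real data, it helps to check what it removes. A small sketch with Python's sqlite3 module and made-up rows (the your_table name is taken from the answer above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (col1 TEXT, col2 TEXT)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("a", "b")],
)

# Values of col1 that occur more than once.
dupes = conn.execute(
    "SELECT col1 FROM your_table GROUP BY col1 HAVING COUNT(*) > 1"
).fetchall()

# The DELETE removes every row carrying one of those values.
conn.execute(
    """
    DELETE FROM your_table
    WHERE col1 IN (SELECT col1 FROM your_table
                   GROUP BY col1 HAVING COUNT(*) > 1)
    """
)
remaining = conn.execute("SELECT col1, col2 FROM your_table").fetchall()
print(dupes, remaining)
```

In this run no 'a' row is left at all, so this form is not suitable when one copy of each duplicate must survive.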

Select (show) only different columns from almost similar rows

I have a table with many columns (50+). To make decisions, I analyze the data that varies between rows.
My current query:
SELECT maincol, count(maincol) FROM table where (conditions) group by maincol having count(maincol) > 1
then:
SELECT * FROM table where (conditions) and maincol = (previous result)
The second query displays all the rows in full, and I have to compare them one by one:
col1, col2, col3, col4, col5, col6, manycolumns..., colN
5 7 1 13 341 9 123
5 7 2 13 341 5 123
I want to get:
col3, col6
1 9
2 5
because it's difficult searching manually column by column.
- Any number of columns could differ
- I don't have access to credentials, so I can't use a programming language to process the results.
- Working on DB2
This will be a little tedious but worth it. This assumes that col1 through coln are all of the same type. If not, cast each to character in the select clause.
The result set will identify the maincol values that occur more than once that also have one or more columns with differing values. The columns that differ will be named.
Select maincol, colname, count(distinct colvalue)
From (
Select maincol, 'column1' as colname, col1 as colvalue
from table
Union
Select maincol, 'column2' as colname, col2 as colvalue
from table
Union
Select maincol, 'column3' as colname, col3 as colvalue
from table
-- Repeat this pattern for the remaining columns
) as t
Group by maincol, colname
Having count(distinct colvalue) > 1
You could even join the result set from above with the original table to show the entire row including the name of the columns that differ:
Select b.colname, a.*
From table a, (
-- include entire query from above
) as b
Where a.maincol = b.maincol
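The unpivot-and-count idea can be checked on a tiny table. This is a sketch using Python's sqlite3 module rather than Db2; the table name tbl, the three data columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl (maincol INTEGER, col1 INTEGER, col2 INTEGER, col3 INTEGER)"
)
conn.executemany(
    "INSERT INTO tbl VALUES (?, ?, ?, ?)",
    [(5, 7, 1, 9), (5, 7, 2, 5), (6, 1, 1, 1)],
)

# One SELECT per column turns each row into (maincol, colname, colvalue) triples;
# columns with more than one distinct value per maincol are the ones that differ.
query = """
SELECT maincol, colname, COUNT(DISTINCT colvalue) AS variants
FROM (
    SELECT maincol, 'col1' AS colname, col1 AS colvalue FROM tbl
    UNION
    SELECT maincol, 'col2', col2 FROM tbl
    UNION
    SELECT maincol, 'col3', col3 FROM tbl
) AS unpivoted
GROUP BY maincol, colname
HAVING COUNT(DISTINCT colvalue) > 1
ORDER BY maincol, colname
"""
differing = conn.execute(query).fetchall()
print(differing)
```

For maincol 5 only col2 and col3 are reported, because col1 holds the same value in both rows.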

Counting matching rows of two same tables and counting rows of the table

I have the same table structure, called "table1", under two different schemas, "schema1" and "schema2". "table1" contains the columns "col1, col2, col3". Initially I wanted to see whether there are records with the same col1 and col2 values in schema1.table1 and schema2.table1. But I mistyped schema2.table1 as schema1.table1, and now I am confused by the query result.
SELECT COUNT(*) FROM schema1.table1 AS s1t, schema1.table1 AS s2t
WHERE s1t.col1 = s2t.col1 AND s1t.col2 = s2t.col2;
I got
count
-------
530
(1 row)
However, SELECT COUNT(*) FROM schema1.table1; shows that there are 17815 rows.
Why would the first query show there are only 530 satisfied records? Shouldn't it be 17815 as well?
You can use a FULL OUTER JOIN to also see the unmatched rows, including rows with NULL in col1 or col2. That way at least 17815 rows are returned:
SELECT COUNT(*)
FROM schema1.table1 AS s1t
FULL OUTER JOIN schema1.table1 AS s2t
ON s1t.col1 = s2t.col1 AND s1t.col2 = s2t.col2
In your case, only the rows that match on both columns (col1 and col2) are returned.
You are joining the table to itself. That is really strange.
In any case, your join is going to filter out any rows where col1 or col2 are NULL.
In addition, the self-join might multiply the number of rows if there are duplicates (with respect to the two columns) in the table.
It is really unclear why you would be doing this, but the above explains the results you are seeing.
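Both effects can be seen on toy data. The sketch below uses Python's sqlite3 module; the helper name self_join_count and the sample rows are made up for illustration.

```python
import sqlite3

def self_join_count(rows):
    """Load rows into table1 and count the pairs produced by the self-join."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (col1 INTEGER, col2 TEXT)")
    conn.executemany("INSERT INTO table1 VALUES (?, ?)", rows)
    return conn.execute(
        """
        SELECT COUNT(*)
        FROM table1 AS s1t, table1 AS s2t
        WHERE s1t.col1 = s2t.col1 AND s1t.col2 = s2t.col2
        """
    ).fetchone()[0]

# Four rows, three of them with a NULL key: NULL = NULL is not true,
# so only the fully populated row matches itself.
with_nulls = self_join_count([(1, "x"), (None, "y"), (2, None), (None, None)])

# Four rows, three of them identical: the identical rows pair up 3 * 3 times.
with_dupes = self_join_count([(1, "x"), (1, "x"), (1, "x"), (2, "y")])

print(with_nulls, with_dupes)
```

The same four-row table size yields fewer rows than the table in one case and more in the other, which is why the self-join count says little about the number of rows.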
If you want to compare the results in the two schemas allowing for duplicates and missing values, I recommend union all/group by:
select col1, col2, sum(cnt1) as cnt1, sum(cnt2) as cnt2
from ((select col1, col2, count(*) as cnt1, 0 as cnt2
from schema1.table1
group by col1, col2
) union all
(select col1, col2, 0 as cnt1, count(*) as cnt2
from schema2.table1
group by col1, col2
)
) t12
group by col1, col2
having sum(cnt1) <> sum(cnt2);
This returns pairs where the counts are not the same in the two tables. It even works for NULL values. If you ran this on the same table, no rows would be returned.
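A runnable sketch of the union all/group by comparison, using Python's sqlite3 module. The two schemas are simulated with the hypothetical table names s1_table1 and s2_table1, and the derived tables are written without parentheses because SQLite does not accept them around the members of a UNION ALL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE s1_table1 (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE TABLE s2_table1 (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO s1_table1 VALUES (?, ?)", [(1, "x"), (1, "x"), (None, "n")]
)
conn.executemany(
    "INSERT INTO s2_table1 VALUES (?, ?)", [(1, "x"), (None, "n"), (2, "y")]
)

query = """
SELECT col1, col2, SUM(cnt1) AS cnt1, SUM(cnt2) AS cnt2
FROM (
    SELECT col1, col2, COUNT(*) AS cnt1, 0 AS cnt2
    FROM s1_table1 GROUP BY col1, col2
    UNION ALL
    SELECT col1, col2, 0 AS cnt1, COUNT(*) AS cnt2
    FROM s2_table1 GROUP BY col1, col2
) AS t12
GROUP BY col1, col2
HAVING SUM(cnt1) <> SUM(cnt2)
ORDER BY col1, col2
"""
mismatches = conn.execute(query).fetchall()
print(mismatches)
```

The (NULL, 'n') pair appears once on each side and is not reported, since GROUP BY puts NULL keys into the same group.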

Exclude rows that have same value in two different columns

I have a table which has 2 columns that sometimes have the same values. I want to know how to exclude the rows where the value of column1 is equal to a value in column2.
EXAMPLE:
COL1 | COL2
1 -------- 7
2 -------- 8
3 -------- 2
4 -------- 5
5 -------- 9
Here I would exclude rows 2 and 5.
Thanks
select *
from table
where col1 not in (
select col2
from table
)
Something like this should work :
SELECT *
FROM yourtable
WHERE COL1 NOT IN (SELECT COL2
FROM yourtable)
I tend to avoid using IN for long lists of values, as it performs poorly on some database systems. The following selects all values from col1 that are not present in col2:
SELECT t1.col1
FROM
yourtable t1
LEFT JOIN
yourtable t2
ON
t1.col1 = t2.col2
WHERE
t2.col2 IS NULL
Why does it work? Normally the join operator links together rows that have the same value. A left join also keeps the rows that have no match, and those are the ones we want. The left join takes the table on the left (t1) as the reference table and associates rows from the table on the right (after the word JOIN, in this case t2) with it. If the col1 value has a matching value in col2, the row is fully populated with values from both. If the value from col1 has no matching value in col2, the col2 cell on the resulting row is blank/null. Because we want only the values that aren't matched, we say "where col2 is null".
The other trick with getting to grips with this is in understanding that the same table can appear twice in a query. We give it a different alias each time we use it so we can tell them apart. You could conceive it as virtually making a copy of the table, before it links them together
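The two approaches can be compared on the example data from the question. This is a sketch with Python's sqlite3 module; the extra row with a NULL in col2 is an added assumption, included because NOT IN and the left join behave differently once the subquery can return NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE yourtable (col1 INTEGER, col2 INTEGER)")
conn.executemany(
    "INSERT INTO yourtable VALUES (?, ?)",
    [(1, 7), (2, 8), (3, 2), (4, 5), (5, 9)],
)

NOT_IN = """
SELECT col1 FROM yourtable
WHERE col1 NOT IN (SELECT col2 FROM yourtable)
ORDER BY col1
"""
ANTI_JOIN = """
SELECT t1.col1
FROM yourtable t1
LEFT JOIN yourtable t2 ON t1.col1 = t2.col2
WHERE t2.col2 IS NULL
ORDER BY t1.col1
"""

def col1_values(sql):
    return [row[0] for row in conn.execute(sql)]

# With no NULLs in col2, both queries drop the rows whose col1 is 2 or 5.
not_in_before = col1_values(NOT_IN)
anti_join_before = col1_values(ANTI_JOIN)

# Hypothetical extra row: a NULL in col2 makes every NOT IN test unknown.
conn.execute("INSERT INTO yourtable VALUES (6, NULL)")
not_in_after = col1_values(NOT_IN)
anti_join_after = col1_values(ANTI_JOIN)

print(not_in_before, anti_join_before, not_in_after, anti_join_after)
```

If col2 is nullable, the left join (or a NOT IN with an added `WHERE col2 IS NOT NULL` in the subquery) is the safer choice.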
Use EXCEPT together with a subquery, as shown below.
Read up on EXCEPT here: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/set-operators-except-and-intersect-transact-sql
SELECT *
FROM TEST
EXCEPT
SELECT *
FROM TEST
WHERE COL1 IN (
SELECT COL2
FROM TEST
)
not sure, but maybe...
SELECT t1.*
FROM my_table AS t1
LEFT JOIN my_table AS t2
ON t2.col_b = t1.col_a
WHERE t2.col_b IS NULL

How to get not equal rows in SQL query

I have 2 tables and I want not equal rows to be fetched. How to write a query?
For example, table a contain 10 rows, table b contain 10 rows.
Equal rows in a and b is 5.
I want to take a not equal rows (not in b table)
How to fetch a table value which is not equal to b table ?
Result should be 5 record
To take rows in A but not in B (MINUS is the Oracle spelling; most other databases call it EXCEPT):
select * from A minus select * from B
To take rows that are in A or in B but not in both:
(select * from A union select * from B) minus (select * from A intersect select * from B)
This problem has been solved long ago. The optimal solution only reads each table once (unlike the "symmetric difference" solution which reads each table twice and does some additional work).
select max(source) as source, col1, col2, ...
from (
select 'A' as source, col1, col2, ...
from table_A
union all
select 'B' as source, col1, col2, ...
from table_B
)
group by col1, col2, ...
having count(*) = 1
;
If a row is present in both tables, then the count will be 2.
This assumes there are no duplicate rows in either table; if there may be duplicate rows, the HAVING condition can be modified, for example:
having count(case when source = 'A' then 1 end) = 0
or count(case when source = 'B' then 1 end) = 0
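A runnable sketch of this single-pass comparison, using Python's sqlite3 module and made-up two-column tables. The UNION ALL is wrapped in a derived table so that the GROUP BY applies to the rows of both tables, and MAX(source) reports which table a lone row came from; both choices are assumptions about how the query is meant to be assembled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_a (col1 INTEGER, col2 TEXT)")
conn.execute("CREATE TABLE table_b (col1 INTEGER, col2 TEXT)")
conn.executemany(
    "INSERT INTO table_a VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")]
)
conn.executemany(
    "INSERT INTO table_b VALUES (?, ?)", [(2, "b"), (3, "c"), (4, "d")]
)

# Rows present in both tables form groups of two; rows present in only one
# table form groups of one, and those are the ones kept.
query = """
SELECT MAX(source) AS source, col1, col2
FROM (
    SELECT 'A' AS source, col1, col2 FROM table_a
    UNION ALL
    SELECT 'B' AS source, col1, col2 FROM table_b
) AS combined
GROUP BY col1, col2
HAVING COUNT(*) = 1
ORDER BY col1
"""
only_in_one = conn.execute(query).fetchall()
print(only_in_one)
```

As the answer notes, this form assumes neither table contains duplicate rows of its own.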
Use EXCEPT; the syntax is similar to INTERSECT:
https://www.tutorialspoint.com/sql/sql-intersect-clause.htm