Row count discrepancy between Intersect and Except queries - sql

I'm getting some strange behaviour using intersect and except. Tb1 has the least rows out of the two tables, and the difference in row count between tb1 and the intersect query results is 143 (intersect = 9782, tb1 = 9925).
But when I run the same query with except, it returns 24 lines. My understanding is that it should have returned 143 rows, being the rows that didn't match in the intersect query. Could someone help me understand why this might be?
There is a possibility that both datasets have multiple duplicate rows (being subset data). Could this be the cause of the difference?
SELECT
amount
,date
FROM tb1
INTERSECT
SELECT
amount
,date
FROM tb2

As you're probably already aware, the difference between UNION and UNION ALL is that the former returns a unique result, while the latter doesn't.
The same can be said about INTERSECT versus INTERSECT ALL.
And also about EXCEPT versus EXCEPT ALL.
So when there are dups, then the totals can be different from what you expect.
Here's a simplified demo to illustrate.
create table TableA (
col1 int not null,
col2 varchar(8)
);
create table TableB (
col1 int not null,
col2 varchar(8)
);
insert into TableA (Col1, Col2) values
(1,'A') -- only A
, (3,'AB') -- 1 in both
, (4,'AAB'), (4,'AAB') -- 2 in A, 1 in B
, (5,'ABB') -- 1 in A, 2 in B
, (6,'AABB'), (6,'AABB') -- 2 in both
, (7, NULL); -- 1 NULL in both
8 rows affected
insert into TableB (Col1, Col2) values
(2,'B') -- only B
, (3,'AB') -- 1 in both
, (4,'AAB') -- 2 in A, 1 in B
, (5,'ABB'), (5,'ABB') -- 1 in A, 2 in B
, (6,'AABB'), (6,'AABB') -- 2 in both
, (7, null); -- 1 NULL in both
8 rows affected
select Col1, Col2 from TableA
intersect
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
3 | AB
4 | AAB
5 | ABB
6 | AABB
7 | null
select Col1, Col2 from TableA
intersect all
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
3 | AB
4 | AAB
5 | ABB
6 | AABB
6 | AABB
7 | null
select Col1, Col2 from TableA
except
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
1 | A
select Col1, Col2 from TableA
except all
select Col1, Col2 from TableB
order by Col1, Col2
col1 | col2
---: | :---
1 | A
4 | AAB
Demo on db<>fiddle here

Related

sql selecting unique rows based on a specific column

I have an table like this :
Col1 Col2 Col3 Col4
asasa 1 d 44
asasa 2 sd 34
asasa 3 f 3
dssd 4 d 2
sdsdsd 5 sd 11
dssd 1 dd 34
xxxsdsds2 d 3
erewer 3 sd 3
I am trying to filter out something like this based on Col1
Col1 Col2 Col3 Col4
asasa 1 d 44
dssd 4 d 2
sdsdsd 5 sd 11
xxxsdsds2 d 3
erewer 3 sd 3
I am trying to get the all unique rows based on the values in Col1. If I have duplicates in Col1, the first row should be taken.
I tried SELECT Col1 FROM tblname GROUP BY Col1 and got unique Col1 but extending it using * is giving me error.
You should be able to achieve your goal using something like the following:
WITH CTE AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Col1 ORDER BY Col2) AS rn FROM MyTable
)
SELECT * FROM CTE WHERE rn = 1
What it does is it creates a CTE (Common Table Expression) that adds a ROW_NUMBER on Col1, ordered by the data in row2.
In the outer select, we then only grab the rows from the CTE where the row number generated is 1.
Try this
;WITH CTE(
SELECT *,
ROW_NUMBER() OVER(PARTITIAN BY Col1 ORDER BY(SELECT NULL))RN
FROM tblname
)
SELECT Col1, Col2, Col3, Col4 FROM CTE;
Depending on the flavor of SQL that you have are using, what may help you are window functions.
In SQL Server, this can be accomplished with the FIRST_VALUE window function like so:
DROP TABLE IF EXISTS #vals;
CREATE TABLE #vals (COL1 VARCHAR(10), COL2 INT, COL3 VARCHAR(5), COL4 INT);
INSERT INTO #vals (COL1, COL2, COL3, COL4)
VALUES ('asasa', 1, 'd', 44),
('asasa', 2, 'sd', 34),
('asasa', 3, 'f', 3),
('dssd' , 4, 'd', 2),
('sdsdsd', 5, 'sd', 11),
('dssd', 1, 'dd', 34),
('xxxsdsds', 2, 'd', 3),
('erewer', 3, 'sd', 3);
SELECT *
FROM #vals
SELECT DISTINCT COL1,
FIRST_VALUE(COL2) OVER (PARTITION BY COL1 ORDER BY Col1) AS Col2,
FIRST_VALUE(COL3) OVER (PARTITION BY COL1 ORDER BY Col1) AS Col3,
FIRST_VALUE(COL4) OVER (PARTITION BY COL1 ORDER BY Col1) AS Col4
FROM #vals AS v1
This returns:
|COL1 | Col2 | Col3 | Col4|
|-----------|-----------|-----------|-------|
|asasa | 1 | d | 44 |
|dssd | 4 | d | 2 |
|erewer | 3 | sd | 3 |
|sdsdsd | 5 | sd | 11 |
|xxxsdsds | 2 | d | 3 |
which may then be ORDERed in whatever way is needed.
Select DISTINCT , should do the trick. Here is a good reference https://www.w3schools.com/sql/sql_distinct.asp

T-SQL sequential updating with two columns

I have a table created by:
CREATE TABLE table1
(
id INT,
multiplier INT,
col1 DECIMAL(10,5)
)
INSERT INTO table1
VALUES (1, 2, 1.53), (2, 3, NULL), (3, 2, NULL),
(4, 2, NULL), (5, 3, NULL), (6, 1, NULL)
Which results in:
id multiplier col1
-----------------------
1 2 1.53000
2 3 NULL
3 2 NULL
4 2 NULL
5 3 NULL
6 1 NULL
I want to add a column col2 which is defined as multiplier * col1, however the next value of col1 then updates to take the previous calculated value of col2.
The resulting table should look like:
id multiplier col1 col2
---------------------------------------
1 2 1.53000 3.06000
2 3 3.06000 9.18000
3 2 9.18000 18.36000
4 2 18.36000 36.72000
5 3 36.72000 110.16000
6 1 110.16000 110.16000
Is this possible using T-SQL? I've tried a few different things such as joining id to id - 1 and have played around with a sequential update using UPDATE and setting variables but I can't get it to work.
A recursive CTE might be the best approach. Assuming your ids have no gaps:
with cte as (
select id, multiplier, convert(float, col1) as col1, convert(float, col1 * multiplier) as col2
from table1
where id = 1
union all
select t1.id, t1.multiplier, cte.col2 as col1, cte.col2 * t1.multiplier
from cte join
table1 t1
on t1.id = cte.id + 1
)
select *
from cte;
Here is a db<>fiddle.
Note that I converted the destination type to float, which is convenient for this sort of operation. You can convert back to decimal if you prefer that.
Basically, this would require an aggregate/window function that computes the product of column values. Such set function does not exists in SQL though. We can work around this with arithmetics:
select
id,
multiplier,
coalesce(min(col1) over() * exp(sum(log(multiplier)) over(order by id rows between unbounded preceding and 1 preceding)), col1) col1,
min(col1) over() * exp(sum(log(multiplier)) over(order by id)) col2
from table1
Demo on DB Fiddle:
id | multiplier | col1 | col2
-: | ---------: | -----: | -----:
1 | 2 | 1.53 | 3.06
2 | 3 | 3.06 | 9.18
3 | 2 | 9.18 | 18.36
4 | 2 | 18.36 | 36.72
5 | 3 | 36.72 | 110.16
6 | 1 | 110.16 | 110.16
This will fail if there are negative multipliers.
If you wanted an update statement:
with cte as (
select col1, col2,
coalesce(min(col1) over() * exp(sum(log(multiplier)) over(order by id rows between unbounded preceding and 1 preceding)), col1) col1_new,
min(col1) over() * exp(sum(log(multiplier)) over(order by id)) col2_new
from table1
)
update cte set col1 = col1_new, col2 = col2_new

Query different number of records for each category in SQL

I have a table that looks like the following:
col1 | col2 | col3 | col4
A | 1 | 2 | 4
A | 2 | 5 | 3
A | 5 | 1 | 6
B | 3 | 1 | 2
B | 4 | 4 | 4
I have another table where the records are unique and looks like the following:
col1 | col2
A | 2
B | 1
I want to query Table 1 in such a way that I filter out only n number of records for each category in Table 1 based on the value the categories have in Table 2.
Based on Table 2 I need to extract 2 records for A and 1 record for B. I need the resulting queried table to look like the following:
col1 | col2 | col3 | col4
A | 2 | 5 | 3
A | 1 | 2 | 4
B | 3 | 1 | 2
The choice of the records are made based on col4 sorted in ascending order. I am currently tring to do this on BigQuery.
You can use row_number() and join:
select t1.col1, t1.col2, t1.col3, t1.col4
from (select t1.*, row_number() over (partition by col1 order by col4) as seqnum
from table1 t1
) t1 join
table2 t2
on t2.col1 = t1.col1 and t1.seqnum <= t2.col2
order by t1.col1, t1.col4;
Below is for BigQuery Standard SQL
#standardSQL
SELECT t.*
FROM (
SELECT ARRAY_AGG(t1 ORDER BY t1.col4) arr, MIN(t2.col2) cnt
FROM table1 t1 JOIN table2 t2 ON t1.col1 = t2.col1
GROUP BY t1.col1
), UNNEST(arr) t WITH OFFSET num
WHERE num < cnt
you can test / play with it using dummy data from your question as below
#standardSQL
WITH `table1` AS (
SELECT 'A' col1, 1 col2, 2 col3, 4 col4 UNION ALL
SELECT 'A', 2, 5, 3 UNION ALL
SELECT 'A', 5, 1, 6 UNION ALL
SELECT 'B', 3, 1, 2 UNION ALL
SELECT 'B', 4, 4, 4
), `table2` AS (
SELECT 'A' col1, 2 col2 UNION ALL
SELECT 'B', 1
)
SELECT t.*
FROM (
SELECT ARRAY_AGG(t1 ORDER BY t1.col4) arr, MIN(t2.col2) cnt
FROM table1 t1 JOIN table2 t2 ON t1.col1 = t2.col1
GROUP BY t1.col1
), UNNEST(arr) t WITH OFFSET num
WHERE num < cnt
with output as
Row col1 col2 col3 col4
1 A 2 5 3
2 A 1 2 4
3 B 3 1 2

Get column name where values differ between two rows

I have a table with lots of columns. Sometimes I need to find differences between two rows. I can do it just by scrolling through screen but it is dull. I'm looking for a query that will do this for me, something like
SELECT columns_for_id_1 != columns_for_id_2
FROM xyz
WHERE id in (1,2)
Table:
id col1 col2 col3 col4
1 qqq www eee rrr
2 qqq www XXX rrr
Result:
"Different columns: id, col3"
Is there a simple way to do this?
UPDATE
Another example as wanted:
What I have (table has more than 50 column, not only 7):
Id| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
==============================================
1 | aaa | bbb | ccc | ddd | eee | fff |
----------------------------------------------
2 | aaa | XXX | ccc | YYY | eee | fff |
Query:
SELECT *
FROM table
WHERE Id = 1 OR Id = 2
AND "columns value differs"
Query result: "Id, Col2, Col4"
OR something like:
Id|Col2 |Col4 |
===============
1 |bbb |ddd |
---------------
2 |XXX |YYY |
Right now I have to scroll through more than 50 columns to see if rows are the same, it's not efficient and prone to mistakes. I don't want any long query like
SELECT (COMPARE Id1.Col1 with Id2.Col1 if different then print "Col1 differs", COMPARE Id1.Col2 with Id2.Col2...) because I will compare the rows myself faster ;)
Something like this:
SELECT col, MIN(VAL) AS val1, MAX(val) AS val2
FROM (
SELECT id, val, col
FROM (
SELECT id, [col1], [col2], [col3], [col4]
FROM mytable
WHERE id IN (1,2)) AS src
UNPIVOT (
val FOR col IN ([col1], [col2], [col3], [col4])) AS unpvt ) AS t
GROUP BY col
HAVING MIN(val) <> MAX(val)
Output:
col val1 val2
================
col3 eee XXX
Try this simple query, may be help you
SELECT (CASE WHEN a.col1 <> b.col1 THEN 'Different Col1'
WHEN a.col2 <> b.col2 THEN 'Different Col2'
...
ELSE 'No Different' END) --You can add only required columns here
FROM xyz AS a
INNER JOIN xyz AS b ON b.id = 1 --First Record
WHERE a.id = 2 --Second record to compare
If you are on SQL Server 2012 then you can also use LEAD/LAG windowed funuction to do this. MSDN Reference is here - https://msdn.microsoft.com/en-us/library/hh213125.aspx
select
id,
col1,
col2,
col3,
col4,
stuff(diff_cols,len(diff_cols-1),1,'') diff_cols
from
(
SELECT
id,
col1,
col2,
col3,
col4,
concat
(
'Different columns:',
CASE
WHEN LEAD(id, 1,0) OVER (ORDER BY id) <> id THEN 'id,'
WHEN LEAD(col1, 1,0) OVER (ORDER BY id) <> col1 THEN 'col1,'
WHEN LEAD(col2, 1,0) OVER (ORDER BY id) <> col2 THEN 'col2,'
WHEN LEAD(col3, 1,0) OVER (ORDER BY id) <> col3 THEN 'col3,'
WHEN LEAD(col4, 1,0) OVER (ORDER BY id) <> col4 THEN 'col4,'
) diff_cols
FROM xyz
) tmp

Grouping multiple rows from a table into column

I have two table as below.
Table 1
+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 |
+------+------+------+------+
| 1 | 1.5 | 1.5 | 2.5 |
| 1 | 2.5 | 3.5 | 1.5 |
+------+------+------+------+
Table 2
+------+--------+
| Col1 | Col2 |
+------+--------+
| 1 | 12345 |
| 1 | 678910 |
+------+--------+
I want the result as below.
+------+------+------+------+-------+--------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+------+------+------+------+-------+--------+
| 1 | 4 | 5 | 4 | 12345 | 678910 |
+------+------+------+------+-------+--------+
Here Col2, Col3 and Col4 is the aggregate of value from Col2,3,4 in Table 1. And rows from Table 2 are transposed to Columns in the result.
I use Oracle 11G and tried the PIVOT option. But I couldn't aggregate values from Column 2,3,4 in Table 1.
Is there any function available in Oracle which provides direct solution without any dirty work around?
Thanks in advance.
Since you will always have only 2 records in second table simple grouping and join will do.
Since I dont have tables I am using CTEs and Inline views
with cte1 as (
select 1 as col1 , 1.5 as col2 , 1.5 as col3, 2.5 as col4 from dual
union all
select 1 , 2.5 , 3.5 , 1.5 fom dual
) ,
cte2 as (
select 1 as col1 , 12345 as col2 fom dual
union all
select 1,678910 fom dual )
select* from(
(select col1,sum(col2) as col2 , sum(col3) as col3,sum(col4) as col4
from cte1 group by col1) as x
inner join
(select col1 ,min(col2) as col5 ,max(col2) as col from cte2
group by col1
) as y
on x.col1=y.col1)
with
mytab1 as (select col1, col2, col3, col4, 0 col5, 0 col6 from tab1),
mytab2 as
(
select
col1, 0 col2, 0 col3, 0 col4, "1_COL2" col5, "2_COL2" col6
from
(
select
row_number() over (partition by col1 order by rowid) rn, col1, col2
from
tab2
)
pivot
(
max(col2) col2
for rn in (1, 2)
)
)
select
col1,
sum(col2) col2,
sum(col3) col3,
sum(col4) col4,
sum(col5) col5,
sum(col6) col6
from
(
select * from mytab1 union all select * from mytab2
)
group by
col1
Hello You can use the below query
with t1 (col1,col2,col3,col4)
as
(
select 1,1.5,1.5,2.5 from dual
union
select 1,2.5,3.5,1.5 from dual
),
t2 (col1,col2)
as
(
select 1,12345 from dual
union
select 1,678910 from dual
)
select * from
(
select col1
,max(decode(col2,12345,12345)) as co5
,max(decode(col2,678910,678910)) as col6
from t2
group by col1
) a
inner join
(
select col1,sum(col2) as col2,sum(col3) as col3,sum(col4) as col4
from t1
group by col1
) b
on a.col1=b.col1
Pivot only the second table. You can then do GROUP BY on the nested UNION ALL between table1 (col5 and col6 are null for subsequent group by) and pivoted table2 (col2, col3, col4 are null for subsequent group by).