Compare two columns in SQL - sql

I'm new to SQL and have very basic queries in GCP.
Let's consider this table below:
Name  B            C
----  -----------  ----
Arun  1234-5678    1234
Tara  6789 - 7654  6789
Arun  4567         4324
Here, I want to compare columns B and C. If they match, a new column same should be 1 and a new column different should be 0; if they don't match, same should be 0 and different should be 1 (both columns have to be created).
Here's the catch:
if column B has 1234-5678 and column C has 1234, then the columns should count as matching, considering only the number before the "-" in B.
The output should be :
Name  B            C     same  different
----  -----------  ----  ----  ---------
Arun  1234-5678    1234  1     0
Tara  6789 - 7654  6789  1     0
Arun  4567         4324  0     1
Also, I want to count the 1s in the same and different columns for each value of Name.
So far I've tried this:
SELECT
name,
b,
c ,
if(b = c, 1, 0) as same,
if (b!=c,1,0) as different,
count(same),
count(different)
From Table

Using MySQL, here is a possible solution. SUBSTRING_INDEX is MySQL-specific, so other engines (including SQL Server) need a different way to take the part of B before the "-".
Step 1) Setup table
CREATE TABLE Users (
Name varchar(50),
B varchar(50),
C varchar(50)
);
INSERT INTO Users
VALUES
('Arun', '1234-5678', '1234'),
('Tara', '6789-7654', '6789'),
('Arun', '4567', '4324');
Step 2) same & different columns
SELECT
Name, B, C,
CASE WHEN SUBSTRING_INDEX(B, "-", 1) = C THEN 1 ELSE 0 END as same,
CASE WHEN SUBSTRING_INDEX(B, "-", 1) <> C THEN 1 ELSE 0 END as different
FROM
Users
Step 3) Aggregate per name to get total_same & total_different for each user
SELECT
Name,
SUM(CASE WHEN SUBSTRING_INDEX(B, "-", 1) = C THEN 1 ELSE 0 END) as total_same,
SUM(CASE WHEN SUBSTRING_INDEX(B, "-", 1) <> C THEN 1 ELSE 0 END) as total_different
FROM
Users
GROUP BY Name
Reference: SQL Fiddle
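The question's sample data has spaces around the hyphen ("6789 - 7654"), while the setup above inserts "6789-7654". A minimal sketch of the same aggregation that also trims the prefix, run through Python's sqlite3 module so it can be executed as-is. SQLite is only a stand-in engine here (an assumption, the asker is on GCP), and because SQLite has no SUBSTRING_INDEX the prefix is taken with instr/substr instead:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (Name TEXT, B TEXT, C TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [
        ("Arun", "1234-5678", "1234"),
        ("Tara", "6789 - 7654", "6789"),  # spaces around the hyphen, as in the question
        ("Arun", "4567", "4324"),
    ],
)

# prefix = the part of B before the first '-', trimmed; all of B if there is no '-'.
query = """
WITH split AS (
    SELECT Name, C,
           TRIM(CASE WHEN instr(B, '-') > 0
                     THEN substr(B, 1, instr(B, '-') - 1)
                     ELSE B END) AS prefix
    FROM Users
)
SELECT Name,
       SUM(CASE WHEN prefix = C THEN 1 ELSE 0 END)  AS total_same,
       SUM(CASE WHEN prefix <> C THEN 1 ELSE 0 END) AS total_different
FROM split
GROUP BY Name
ORDER BY Name
"""
totals = conn.execute(query).fetchall()
print(totals)  # [('Arun', 1, 1), ('Tara', 1, 0)]
```

Computing the prefix once in a CTE keeps the same and different expressions from repeating the string handling.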

For the first step, you will need to SUBSTR the column b.
We start at position 1 and we want 4 characters (only works if there's only 4 characters before the '-').
WITH table2 AS (
  SELECT name, b, c,
         CASE WHEN SUBSTR(b, 1, 4) = c THEN 1 ELSE 0 END AS same,
         CASE WHEN SUBSTR(b, 1, 4) != c THEN 1 ELSE 0 END AS different
  FROM Table1
  GROUP BY name, b, c
)
The WITH clause is useful when you have a complex query and want to define a temporary named result set to use afterwards.
Here, table2 holds the same and different flags for each row.
After the WITH clause comes the second step, the count of same/different per name:
Select table1.name,count(table2.same+table2.different) as total from table1
join table2 on (table2.name = table1.name and table2.b = table1.b)
group by table1.name;
The output gives you the total per name. The names are grouped, so in your example you will only have 2 rows: one for Arun with a total of 2 (same + different) and one for Tara with a total of 1.
So here's the entire code:
WITH table2 AS (
  SELECT name, b, c,
         CASE WHEN SUBSTR(b, 1, 4) = c THEN 1 ELSE 0 END AS same,
         CASE WHEN SUBSTR(b, 1, 4) != c THEN 1 ELSE 0 END AS different
  FROM Table1
  GROUP BY name, b, c
)
-- b and c are left out of the select list because the result is grouped by name only
SELECT table1.name, COUNT(table2.same + table2.different) AS total
FROM table1
JOIN table2 ON (table2.name = table1.name AND table2.b = table1.b)
GROUP BY table1.name;

Related

SQL - Get per column count of differences when comparing two tables

I have 2 similar tables as shown below with minor difference between some cells
Table A
Roll_ID  FirstName  LastName  Age
-------  ---------  --------  ---
1        AAA        XXX       31
2        BBB        YYY       32
3        CCC        ZZZ       33
Table B
Roll_ID  FirstName  LastName  Age
-------  ---------  --------  ---
1        AAA        XXX       35
2        PPP        YYY       36
3        QQQ        WWW       37
I would like to get an output that shows the count of different records on a per-column level.
For example the output of the query for the above scenario should be
Output
Roll_ID  FirstName  LastName  Age
-------  ---------  --------  ---
0        2          1         3
For this question we can assume that there will always be one column which will have non-null unique values (or one column which may be primary key). In above example Roll_ID is such a column.
My question is: What would be the most efficient way to get such an output? Is there anything to keep in mind when running such query for tables that may have millions of records from point of view of efficiency?
First you have to join the tables
SELECT *
FROM table1
JOIN table2 on table1.ROLL_ID = table2.ROLL_ID
Now just add the counts
SELECT
SUM(CASE WHEN table1.FirstName <> table2.FirstName THEN 1 ELSE 0 END) as FirstNameDiff,
SUM(CASE WHEN table1.LastName <> table2.LastName THEN 1 ELSE 0 END) as LastNameDiff,
SUM(CASE WHEN table1.Age <> table2.Age THEN 1 ELSE 0 END) as AgeDiff
FROM table1
JOIN table2 on table1.ROLL_ID = table2.ROLL_ID
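As a quick check, the join-and-sum query can be run against the question's sample rows. This is a sketch using Python's sqlite3 module as a stand-in engine (an assumption; the question does not name one), with the two tables called table1 and table2 as in the query:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("table1", "table2"):
    conn.execute(
        f"CREATE TABLE {name} ("
        "Roll_ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Age INTEGER)"
    )
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, "AAA", "XXX", 31), (2, "BBB", "YYY", 32), (3, "CCC", "ZZZ", 33)],
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [(1, "AAA", "XXX", 35), (2, "PPP", "YYY", 36), (3, "QQQ", "WWW", 37)],
)

# One pass over the joined rows; each SUM counts the mismatches in one column.
diff_counts = conn.execute("""
    SELECT
      SUM(CASE WHEN table1.FirstName <> table2.FirstName THEN 1 ELSE 0 END) AS FirstNameDiff,
      SUM(CASE WHEN table1.LastName  <> table2.LastName  THEN 1 ELSE 0 END) AS LastNameDiff,
      SUM(CASE WHEN table1.Age       <> table2.Age       THEN 1 ELSE 0 END) AS AgeDiff
    FROM table1
    JOIN table2 ON table1.Roll_ID = table2.Roll_ID
""").fetchone()
print(diff_counts)  # (2, 1, 3)
```

The counts match the expected output in the question (2, 1, 3). Roll_ID is the join key, so its difference count is 0 by construction.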
If an id not existing in both tables is considered "different" then you would need something like this
SELECT
SUM(CASE WHEN COALESCE(table1.FirstName,'x') <> COALESCE(table2.FirstName,'y') THEN 1 ELSE 0 END) as FirstNameDiff,
SUM(CASE WHEN COALESCE(table1.LastName,'x') <> COALESCE(table2.LastName,'y') THEN 1 ELSE 0 END) as LastNameDiff,
SUM(CASE WHEN COALESCE(table1.Age,-1) <> COALESCE(table2.Age,-2) THEN 1 ELSE 0 END) as AgeDiff
FROM ( SELECT table1.Roll_id FROM table1
UNION
SELECT table2.Roll_id FROM table2
) base
LEFT JOIN table1 on table1.ROLL_ID = base.ROLL_ID
LEFT JOIN table2 on table2.ROLL_ID = base.ROLL_ID
Here we get all the roll_ids and then left join back to the tables. This is much better than a cross join if the roll_id column is indexed.
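The COALESCE sentinels ('x', 'y', -1, -2) only work as long as those values never occur in the real data. A hedged alternative sketch uses a NULL-safe comparison instead. SQLite spells it IS NOT; the SQL standard spells it IS DISTINCT FROM, which several other engines support. The extra row with Roll_ID 4, present only in table1, is made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for name in ("table1", "table2"):
    conn.execute(
        f"CREATE TABLE {name} ("
        "Roll_ID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT, Age INTEGER)"
    )
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [(1, "AAA", "XXX", 31), (2, "BBB", "YYY", 32), (3, "CCC", "ZZZ", 33),
     (4, "DDD", "VVV", 34)],  # hypothetical row that exists only in table1
)
conn.executemany(
    "INSERT INTO table2 VALUES (?, ?, ?, ?)",
    [(1, "AAA", "XXX", 35), (2, "PPP", "YYY", 36), (3, "QQQ", "WWW", 37)],
)

# "x IS NOT y" is 1 when the values differ or exactly one is NULL, and 0 otherwise,
# so a row missing from one table counts as different in every column.
diff_counts = conn.execute("""
    SELECT
      SUM(t1.FirstName IS NOT t2.FirstName) AS FirstNameDiff,
      SUM(t1.LastName  IS NOT t2.LastName)  AS LastNameDiff,
      SUM(t1.Age       IS NOT t2.Age)       AS AgeDiff
    FROM (SELECT Roll_ID FROM table1
          UNION
          SELECT Roll_ID FROM table2) base
    LEFT JOIN table1 t1 ON t1.Roll_ID = base.Roll_ID
    LEFT JOIN table2 t2 ON t2.Roll_ID = base.Roll_ID
""").fetchone()
print(diff_counts)  # (3, 2, 4)
```

Each count is one higher than in the inner-join version because the unmatched row is now counted as a difference in every column.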
SELECT SUM(IIF(ISNULL(A.FirstName, '') <> ISNULL(B.FirstName, ''), 1, 0)) AS FirstNameRecordDiff,
SUM(IIF(ISNULL(A.LastName, '') <> ISNULL(B.LastName, ''), 1, 0)) AS LastNameRecordDiff,
SUM(IIF(ISNULL(A.Age, 0) <> ISNULL(B.Age, 0), 1, 0)) AS AgeRecordDiff
FROM A
FULL OUTER JOIN B
ON B.Roll_ID = A.Roll_ID;
This query intentionally treats NULLs as equal to each other (and to the empty string or 0), assuming that a lack of data would mean the same thing to the end user.
As written, it would only work on SQL Server. To use it for MySQL or Oracle, the query would vary.

SQL using CASE in SELECT with GROUP BY. Need CASE-value but get row-value

So basically there is 1 question and 1 problem:
1. question - when I have like 100 columns in a table(and no key or uindex is set) and I want to join or subselect that table with itself, do I really have to write out every column name?
2. problem - the example below shows the 1. question and my actual SQL-statement problem
Example:
A.FIELD1,
(SELECT CASE WHEN B.FIELD2 = 1 THEN B.FIELD3 ELSE null FROM TABLE B WHERE A.* = B.*) AS CASEFIELD1
(SELECT CASE WHEN B.FIELD2 = 2 THEN B.FIELD4 ELSE null FROM TABLE B WHERE A.* = B.*) AS CASEFIELD2
FROM TABLE A
GROUP BY A.FIELD1
The story is: if I don't put the CASE into its own select statement then I have to put the actual rowname into the GROUP BY and the GROUP BY doesn't group the NULL-value from the CASE but the actual value from the row. And because of that I would have to either join or subselect with all columns, since there is no key and no uindex, or somehow find another solution.
DBServer is DB2.
So now to describing it just with words and no SQL:
I have "order items" which can be divided into "ZD" and "EK" (1 = ZD, 2 = EK) and can be grouped by "distributor". Even though "order items" can have one of two different "departements"(ZD, EK), the fields/rows for "ZD" and "EK" are always both filled. I need the grouping to consider the "departement" and only if the designated "departement" (ZD or EK) is changing, then I want a new group to be created.
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING
FROM TABLE
GROUP BY
ZD,
EK,
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
This worked with the CASE expressions in the SELECT and plain ZD, EK in the GROUP BY. The only problem: even when EK was not the designated DEPARTEMENT, a change in EK still opened a new group, because the GROUP BY used the real EK column value and not the NULL from the CASE, as explained above.
And here ladies and gentleman is the solution to the problem:
SELECT
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END) AS ZD,
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END) AS EK,
TABLE.DISTRIBUTOR,
sum(TABLE.SOMETHING) AS SOMETHING
FROM TABLE
GROUP BY
(CASE WHEN TABLE.DEPARTEMENT = 1 THEN TABLE.ZD ELSE null END),
(CASE WHEN TABLE.DEPARTEMENT = 2 THEN TABLE.EK ELSE null END),
TABLE.DISTRIBUTOR,
TABLE.DEPARTEMENT
@t-clausen.dk: Thank you!
@others: ...
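The difference between grouping by the raw columns and grouping by the CASE expressions can be seen on a few rows. This is a sketch only: it uses Python's sqlite3 module rather than DB2, and the table name order_items and its four rows are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_items ("
    "departement INTEGER, zd TEXT, ek TEXT, distributor TEXT, something INTEGER)"
)
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Z1", "E1", "D1", 10),
        (1, "Z1", "E2", "D1", 5),   # EK changes, but departement 1 is driven by ZD
        (2, "Z1", "E1", "D1", 7),
        (2, "Z2", "E1", "D1", 3),   # ZD changes, but departement 2 is driven by EK
    ],
)

# Grouping by the raw columns: every change in zd or ek opens a new group.
by_raw = conn.execute("""
    SELECT o.zd, o.ek, o.distributor, SUM(o.something)
    FROM order_items o
    GROUP BY o.zd, o.ek, o.distributor, o.departement
""").fetchall()

# Grouping by the CASE expressions: only the designated column can split a group.
by_case = conn.execute("""
    SELECT CASE WHEN o.departement = 1 THEN o.zd END AS zd_grp,
           CASE WHEN o.departement = 2 THEN o.ek END AS ek_grp,
           o.distributor,
           SUM(o.something) AS something
    FROM order_items o
    GROUP BY CASE WHEN o.departement = 1 THEN o.zd END,
             CASE WHEN o.departement = 2 THEN o.ek END,
             o.distributor,
             o.departement
    ORDER BY o.departement
""").fetchall()

print(len(by_raw))  # 4
print(by_case)      # [('Z1', None, 'D1', 15), (None, 'E1', 'D1', 10)]
```

With the raw columns every row ends up in its own group; with the CASE expressions the rows collapse to one group per departement here.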
Actually there is a wildcard equality test.
I am not sure why you would group by field1, that would seem impossible in your example. I tried to fit it into your question:
SELECT FIELD1,
CASE WHEN FIELD2 = 1 THEN FIELD3 END AS CASEFIELD1,
CASE WHEN FIELD2 = 2 THEN FIELD4 END AS CASEFIELD2
FROM
(
SELECT * FROM A
INTERSECT
SELECT * FROM B
) C
UNION -- results in a distinct
SELECT
C.FIELD1,
null,
null
FROM
(
SELECT * FROM A
EXCEPT
SELECT * FROM B
) C
This will fail for data types that are not comparable.
No, there's no wildcard equality test. You'd have to list every field you want tested individually. If you don't want to test each individual field, you could use a hack such as concatenating all the fields, e.g.
WHERE (a.foo + a.bar + a.baz) = (b.foo + b.bar + b.baz)
but either way, you're listing all of the fields.
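One caveat with the concatenation hack is that different rows can concatenate to the same string. A small sketch of that collision, using Python's sqlite3 module (where string concatenation is written ||); the two-column tables a and b and their values are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (foo TEXT, bar TEXT)")
conn.execute("CREATE TABLE b (foo TEXT, bar TEXT)")
conn.execute("INSERT INTO a VALUES ('ab', 'c')")
conn.execute("INSERT INTO b VALUES ('a', 'bc')")  # a different row, same concatenation 'abc'

# Comparing concatenated strings reports a match that is not really there.
concat_matches = conn.execute("""
    SELECT COUNT(*) FROM a JOIN b
    ON (a.foo || a.bar) = (b.foo || b.bar)
""").fetchone()[0]

# Comparing column by column does not.
column_matches = conn.execute("""
    SELECT COUNT(*) FROM a JOIN b
    ON a.foo = b.foo AND a.bar = b.bar
""").fetchone()[0]

print(concat_matches, column_matches)  # 1 0
```

Putting a delimiter between the fields that cannot occur in the data reduces the risk, but listing the columns individually avoids the problem entirely.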
I might tend to solve it something like this
WITH q AS
(SELECT
    DEPARTEMENT
  , (CASE WHEN DEPARTEMENT = 1 THEN ZD
          WHEN DEPARTEMENT = 2 THEN EK
          ELSE null
     END) AS GRP
  , DISTRIBUTOR
  , SOMETHING
 FROM mytable
)
SELECT
    DEPARTEMENT
  , GRP
  , DISTRIBUTOR
  , sum(SOMETHING) AS SumTHING
FROM q
GROUP BY
    DEPARTEMENT
  , GRP
  , DISTRIBUTOR
If you need to find all rows in TableA that match in TableB, how about INTERSECT or INTERSECT DISTINCT?
select * from A
INTERSECT DISTINCT
select * from B
However, if you only want rows from A where the entire row matches the values in a row from B, then why does your sample code take some values from A and others from B? If the row matches on all columns, then that would seem pointless. (Perhaps your question could be explained a bit more fully?)

Return different rows for each column in a row

I have data which is presented in multiple rows and columns with 0 or 1 values. What I'm trying to do is create a unique row for each 1, but there are sometimes multiple 1's in a row. For ex:
A B C D
1 0 1 1
0 0 0 1
1 1 0 0
I would like six rows returned, all in one column, like so:
RETURN
A
C
D
D
A
B
Thanks in advance!
You can do this with a union all statement:
select val
from ((select 'A' as val from t where A = 1) union all
(select 'B' from t where B = 1) union all
(select 'C' from t where C = 1) union all
(select 'D' from t where D = 1)
) t
As a note: I hope you have other columns that you can include in the output. SQL tables are, by definition, not ordered. So, you really have no idea in your example of the original source for any given value.
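To illustrate the note about ordering: if the table has a key, it can be carried through the UNION ALL so that each returned value can be traced back to its source row. A sketch with Python's sqlite3 module; the id column is hypothetical and is not part of the question's data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, A INTEGER, B INTEGER, C INTEGER, D INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?, ?)",
    [(1, 1, 0, 1, 1), (2, 0, 0, 0, 1), (3, 1, 1, 0, 0)],
)

# One branch per column; each branch emits the column's name for rows where it is 1.
unpivoted = conn.execute("""
    SELECT id, val
    FROM (SELECT id, 'A' AS val FROM t WHERE A = 1
          UNION ALL
          SELECT id, 'B' FROM t WHERE B = 1
          UNION ALL
          SELECT id, 'C' FROM t WHERE C = 1
          UNION ALL
          SELECT id, 'D' FROM t WHERE D = 1) u
    ORDER BY id, val
""").fetchall()
print(unpivoted)
# [(1, 'A'), (1, 'C'), (1, 'D'), (2, 'D'), (3, 'A'), (3, 'B')]
```

The explicit ORDER BY is what makes the row order deterministic; without it the six rows could come back in any order.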

SQL (TSQL) - Select values in a column where another column is not null?

I will keep this simple- I would like to know if there is a good way to select all the values in a column when it never has a null in another column. For example.
A B
----- -----
1 7
2 7
NULL 7
4 9
1 9
2 9
From the above set I would just want 9 from B and not 7 because 7 has a NULL in A. Obviously I could wrap this as a subquery and USE the IN clause etc. but this is already part of a pretty unique set and am looking to keep this efficient.
I should note that for my purposes this would only be a one-way comparison... I would only be returning values in B and examining A.
I imagine there is an easy way to do this that I am missing, but being in the thick of things I don't see it right now.
You can do something like this:
select *
from t
where t.b not in (select b from t where a is null);
If you want only distinct b values, then you can do:
select b
from t
group by b
having sum(case when a is null then 1 else 0 end) = 0;
And, finally, you could use window functions:
select a, b
from (select t.*,
sum(case when a is null then 1 else 0 end) over (partition by b) as NullCnt
from t
) t
where NullCnt = 0;
The query below outputs only column B. The records are grouped by B, and the CASE expression counts the rows in each group where A is NULL. The HAVING clause keeps only the groups where that count is 0.
SELECT B
FROM TableName
GROUP BY B
HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
If you want all the columns of the matching rows, you can join back to the table.
SELECT a.*
FROM TableName a
INNER JOIN
(
SELECT B
FROM TableName
GROUP BY B
HAVING SUM(CASE WHEN A IS NULL THEN 1 ELSE 0 END) = 0
) b ON a.b = b.b
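A slightly shorter variant of the same HAVING test relies on the fact that COUNT(A) skips NULLs while COUNT(*) does not, so the two are equal only for groups without a NULL in A. A sketch run with Python's sqlite3 module on the question's data (the table name t is an assumption):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (A INTEGER, B INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 7), (2, 7), (None, 7), (4, 9), (1, 9), (2, 9)],
)

# COUNT(*) counts every row in the group; COUNT(A) counts only rows where A is not NULL.
clean_groups = conn.execute("""
    SELECT B
    FROM t
    GROUP BY B
    HAVING COUNT(*) = COUNT(A)
""").fetchall()
print(clean_groups)  # [(9,)]
```

This returns 9 and leaves out 7, because the group for 7 has three rows but only two non-NULL values of A.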

Return records that all match and all records where at least one doesn't match

Given a table of exam results, where 1 == PASS and 0 == FAIL
ID Name Test Result
--------------------
1 John MATH 1
2 John ENGL 1
3 Mary MATH 1
4 Mary PSYC 0
EDIT: assume that the name is unique.
I need to get all records for people who
1) passed all tests
2) failed at least one test
So, the 1st query should return John and all his records, and the 2nd query should return Mary and all her records (including the ones with PASS).
I'm trying to do a LEFT OUTER JOIN with itself and compare counts, but don't seem to get a working query.
SELECT * FROM Results R1
LEFT OUTER JOIN Results R2 on R1.ID=R2.ID and R2.Result=1
WHERE ??? count of rows from R1 is compared to count of non-null rows from R2
This is a "poster-child" exercise for the EXISTS clause:
At least one failed result:
select * from Results r
where exists (select * from Results rr where rr.Name=r.Name AND Result=0)
All passed:
select * from Results r
where not exists (select * from Results rr where rr.Name=r.Name AND Result=0)
See how these queries work on your data set at sqlfiddle.com.
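For a self-contained check of the two EXISTS queries, here is a sketch that runs them on the question's four rows using Python's sqlite3 module (SQLite as the engine is an assumption; the question does not name one):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Results (ID INTEGER PRIMARY KEY, Name TEXT, Test TEXT, Result INTEGER)")
conn.executemany(
    "INSERT INTO Results VALUES (?, ?, ?, ?)",
    [(1, "John", "MATH", 1), (2, "John", "ENGL", 1),
     (3, "Mary", "MATH", 1), (4, "Mary", "PSYC", 0)],
)

# People with at least one failed test: all of their rows, including the passes.
some_failed = conn.execute("""
    SELECT r.ID, r.Name, r.Test, r.Result
    FROM Results r
    WHERE EXISTS (SELECT 1 FROM Results rr WHERE rr.Name = r.Name AND rr.Result = 0)
    ORDER BY r.ID
""").fetchall()

# People with no failed test at all.
all_passed = conn.execute("""
    SELECT r.ID, r.Name, r.Test, r.Result
    FROM Results r
    WHERE NOT EXISTS (SELECT 1 FROM Results rr WHERE rr.Name = r.Name AND rr.Result = 0)
    ORDER BY r.ID
""").fetchall()

print(all_passed)   # John's two rows
print(some_failed)  # Mary's two rows, including her MATH pass
```

The correlated subquery is evaluated per person, so Mary's passing MATH row is still returned by the first query.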
All passed
SELECT Name FROM Results R1
GROUP BY NAME
HAVING SUM(RESULT) = COUNT(RESULT)
Some failed
SELECT Name FROM Results R1
GROUP BY NAME
HAVING SUM(RESULT) < COUNT(RESULT)
Hope it helps
Edit
All passed
SELECT Name FROM Results R1
GROUP BY NAME
HAVING SUM(1-RESULT) = 0
Some failed
SELECT Name FROM Results R1
GROUP BY NAME
HAVING SUM(1-RESULT) > 0
(This might run faster)
One way
Select Name,
Case failCount When 0 then 'X' Else '' End PassedAll,
Case failCount When 0 then '' Else 'X' End FailedOneOrMore
From (Select name,
Sum(Case Result when 0 Then 1 Else 0 End) failCount
From Results R
Group By Name) Z
to get all the records, just join to this
Select zz.Name, zz.PassedAll, zz.FailedOneOrMore,
r.Test, r.Result
From (Select Name,
Case failCount When 0 then 'X' Else '' End PassedAll,
Case failCount When 0 then '' Else 'X' End FailedOneOrMore
From (Select name,
Sum(Case Result when 0 Then 1 Else 0 End) failCount
From Results R
Group By Name) Z) ZZ
Left Join Results r On r.Name = zz.Name
This query uses a subquery to return all records (pass & fail) for people who have passed at least one of the Tests:
select * from Results where Name in (select Name from Results where Result = '1' group by Name);
Results exclude those who failed to pass any of the tests.