How to avoid DISTINCT in a query that joins multiple tables? - sql

I want to avoid using DISTINCT and produce the same result for queries that join multiple tables.
Without DISTINCT, it produces the same row multiple times.
I have already looked up ways to avoid DISTINCT, but nothing seems to work for me, apparently because my query is more complicated and joins multiple tables at once.
SELECT DISTINCT C.COL3, B.COL1, A.COL2, A.COL4, B.COL5 FROM C
INNER JOIN B
ON B.COL1 = C.COL1
INNER JOIN A
ON B.COL2 = A.COL2
ORDER BY C.COL3 ASC;
I know I have to use GROUP BY somehow, but I just can't wrap my head around it...

You can just group by all the columns (without any aggregation):
SELECT
C.COL3, B.COL1, A.COL2, A.COL4, B.COL5
FROM C
JOIN B ON B.COL1 = C.COL1
JOIN A ON B.COL2 = A.COL2
GROUP BY C.COL3, B.COL1, A.COL2, A.COL4, B.COL5 -- group by all selected columns
ORDER BY C.COL3 ASC
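As a quick check of the idea above, here is a minimal sketch using Python's built-in sqlite3 module. The table contents are made-up sample data (an assumption, not taken from the question); only the column names follow the query above. It compares the plain join, the DISTINCT version, and the GROUP BY version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE C (COL1 INTEGER, COL3 TEXT);
    CREATE TABLE B (COL1 INTEGER, COL2 INTEGER, COL5 INTEGER);
    CREATE TABLE A (COL2 INTEGER, COL4 TEXT);
    -- hypothetical sample data with duplicated rows in C and A
    INSERT INTO C VALUES (1, 'x'), (1, 'x'), (2, 'y');
    INSERT INTO B VALUES (1, 10, 5), (2, 20, 7);
    INSERT INTO A VALUES (10, 'p'), (10, 'p'), (20, 'q');
""")

COLS = "C.COL3, B.COL1, A.COL2, A.COL4, B.COL5"
JOIN = """
    FROM C
    JOIN B ON B.COL1 = C.COL1
    JOIN A ON B.COL2 = A.COL2
"""

# Plain join: duplicated source rows multiply the output rows.
plain = conn.execute(f"SELECT {COLS} {JOIN} ORDER BY C.COL3").fetchall()
# DISTINCT and GROUP BY over all selected columns both collapse them.
distinct = conn.execute(f"SELECT DISTINCT {COLS} {JOIN} ORDER BY C.COL3").fetchall()
grouped = conn.execute(
    f"SELECT {COLS} {JOIN} GROUP BY {COLS} ORDER BY C.COL3"
).fetchall()

print(len(plain), distinct, grouped)
```

With this sample data the plain join returns five rows, while the DISTINCT and GROUP BY versions return the same two rows.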
If you then wanted to aggregate over the de-duped rows of the above query, use it as a subquery. For example, to SUM(B.COL5) of the de-duped rows:
SELECT
COL3, COL1, COL2, COL4, SUM(COL5)
FROM (
SELECT
C.COL3, B.COL1, A.COL2, A.COL4, B.COL5
FROM C
JOIN B ON B.COL1 = C.COL1
JOIN A ON B.COL2 = A.COL2
GROUP BY C.COL3, B.COL1, A.COL2, A.COL4, B.COL5
) deduped
GROUP BY COL3, COL1, COL2, COL4
ORDER BY COL3 ASC
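To illustrate why the de-duplication has to happen before the aggregation, here is a small sqlite3 sketch. The sample rows are hypothetical (not from the question); the query shapes follow the ones above. Note that de-duplicating also collapses source rows that are genuinely separate but happen to carry identical values, so this is only appropriate when such rows really are unwanted duplicates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE C (COL1 INTEGER, COL3 TEXT);
    CREATE TABLE B (COL1 INTEGER, COL2 INTEGER, COL5 INTEGER);
    CREATE TABLE A (COL2 INTEGER, COL4 TEXT);
    -- hypothetical sample data: A holds the same row twice
    INSERT INTO C VALUES (1, 'x');
    INSERT INTO B VALUES (1, 10, 5), (1, 10, 7);
    INSERT INTO A VALUES (10, 'p'), (10, 'p');
""")

JOIN = """
    FROM C
    JOIN B ON B.COL1 = C.COL1
    JOIN A ON B.COL2 = A.COL2
"""

# Summing directly over the join counts every duplicated row.
raw_sum = conn.execute(f"""
    SELECT C.COL3, B.COL1, A.COL2, A.COL4, SUM(B.COL5)
    {JOIN}
    GROUP BY C.COL3, B.COL1, A.COL2, A.COL4
""").fetchall()

# De-duplicate in a subquery first, then sum the remaining rows.
deduped_sum = conn.execute(f"""
    SELECT COL3, COL1, COL2, COL4, SUM(COL5)
    FROM (
        SELECT C.COL3 AS COL3, B.COL1 AS COL1, A.COL2 AS COL2,
               A.COL4 AS COL4, B.COL5 AS COL5
        {JOIN}
        GROUP BY C.COL3, B.COL1, A.COL2, A.COL4, B.COL5
    ) deduped
    GROUP BY COL3, COL1, COL2, COL4
""").fetchall()

print(raw_sum, deduped_sum)
```

Here the direct sum is 24 (each B row is counted twice because of the duplicate in A), while the de-duplicated sum is 12.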

Are you getting duplicate rows of the same data if you do not use DISTINCT? If so, this query worked for me when I was joining multiple ASP.NET tables to show the user info plus the roles they are assigned within the site. Hopefully this helps.
SELECT AspNetUsers.Id, AspNetRoles.Name as SiteRole,
AspNetRoles.ID as RoleID, AspNetUsers.UserName,
AspNetUsers.Email FROM AspNetUserRoles INNER JOIN
AspNetUsers ON AspNetUserRoles.UserId = AspNetUsers.Id INNER JOIN
AspNetRoles ON AspNetUserRoles.RoleId = AspNetRoles.Id

You can use row_number() with partition by [the column you want to be distinct]:
select *
from (select c.col3, b.col1, a.col2, a.col4, b.col5
, row_number() over (partition by c.col1 order by c.col3) as rn
from c
inner join b on b.col1 = c.col1
inner join a on a.col2 = b.col2) t1
where t1.rn = 1
order by t1.col3
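One thing worth checking with this approach is which columns go into PARTITION BY. The sketch below uses Python's sqlite3 module (window functions need SQLite 3.25 or newer, which is an assumption about the local build) and hypothetical sample rows. The helper `first_per_partition` is made up for this illustration. It suggests that partitioning by a single column keeps one row per value of that column, whereas partitioning by all selected columns behaves like DISTINCT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE c (col1 INTEGER, col3 TEXT);
    CREATE TABLE b (col1 INTEGER, col2 INTEGER, col5 INTEGER);
    CREATE TABLE a (col2 INTEGER, col4 TEXT);
    -- hypothetical sample data: one duplicate in a, two different b rows
    INSERT INTO c VALUES (1, 'x');
    INSERT INTO b VALUES (1, 10, 5), (1, 20, 6);
    INSERT INTO a VALUES (10, 'p'), (10, 'p'), (20, 'q');
""")


def first_per_partition(partition_cols):
    """Keep the first row of each partition (hypothetical helper)."""
    sql = f"""
        SELECT col3, col1, col2, col4, col5
        FROM (SELECT c.col3 AS col3, b.col1 AS col1, a.col2 AS col2,
                     a.col4 AS col4, b.col5 AS col5,
                     ROW_NUMBER() OVER (PARTITION BY {partition_cols}
                                        ORDER BY c.col3) AS rn
              FROM c
              INNER JOIN b ON b.col1 = c.col1
              INNER JOIN a ON a.col2 = b.col2) t1
        WHERE t1.rn = 1
        ORDER BY col3, col2
    """
    return conn.execute(sql).fetchall()


one_column = first_per_partition("c.col1")
all_columns = first_per_partition("c.col3, b.col1, a.col2, a.col4, b.col5")
distinct = conn.execute("""
    SELECT DISTINCT c.col3, b.col1, a.col2, a.col4, b.col5
    FROM c
    INNER JOIN b ON b.col1 = c.col1
    INNER JOIN a ON a.col2 = b.col2
    ORDER BY c.col3, a.col2
""").fetchall()

print(len(one_column), all_columns, distinct)
```

With this data, partitioning by c.col1 alone returns a single row (and drops a row that DISTINCT would keep), while partitioning by every selected column returns the same two rows as DISTINCT.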

SELECT COL3, COL1, SUM(COL5)
FROM
(
SELECT DISTINCT C.COL3, B.COL1, A.COL2, A.COL4, B.COL5 FROM C
INNER JOIN B
ON B.COL1 = C.COL1
INNER JOIN A
ON B.COL2 = A.COL2
) X
GROUP BY COL3, COL1
ORDER BY COL3, COL1

Related

Show Rows That Are Different Between Two Tables - MS Access

I have been working on trying to convert the following SQL-Server code to achieve a similar result in MS Access.
WITH TableA(Col1, Col2, Col3)
AS (SELECT 'Dog',1,1 UNION ALL
SELECT 'Cat',27,86 UNION ALL
SELECT 'Cat',128,92),
TableB(Col1, Col2, Col3)
AS (SELECT 'Dog',1,1 UNION ALL
SELECT 'Cat',27,105 UNION ALL
SELECT 'Lizard',83,NULL)
SELECT CA.*
FROM TableA A
FULL OUTER JOIN TableB B
ON A.Col1 = B.Col1
AND A.Col2 = B.Col2
/*Unpivot the joined rows*/
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
SELECT 'TableB' AS what, B.*) AS CA
/*Exclude identical rows*/
WHERE EXISTS (SELECT A.*
EXCEPT
SELECT B.*)
/*Discard NULL extended row*/
AND CA.Col1 IS NOT NULL
ORDER BY CA.Col1, CA.Col2
Gives
what Col1 Col2 Col3
------ ------ ----------- -----------
TableA Cat 27 86
TableB Cat 27 105
TableA Cat 128 92
TableB Lizard 83 NULL
So far I have been able to replicate the FULL OUTER JOIN using the following code, but I have been unable to replicate unpivoting the joined rows (CROSS APPLY).
(SELECT *
FROM TableA AA
INNER JOIN TableB BB ON AA.Col1 = BB.Col1
UNION ALL
SELECT *
FROM TableA AA
LEFT JOIN TableB BB ON AA.Col1 = BB.Col1
WHERE BB.[IP Number] IS NULL
UNION ALL
SELECT *
FROM TableA AA
RIGHT JOIN TableB BB ON AA.Col1 = BB.Col1
WHERE AA.Col1 IS NULL
)
I could use some help achieving the same result in a MS-Access query.
From what I can gather, you have two tables that have unique rows. You want to return rows that are present in one table but not the other.
I would suggest aggregation and HAVING for this -- in either database:
SELECT col1, col2, col3
FROM ((SELECT col1, col2, col3 FROM TableA) UNION ALL
(SELECT col1, col2, col3 FROM TableB)
) as ab
GROUP BY col1, col2, col3
HAVING COUNT(*) = 1;
Or alternatively, two NOT EXISTS clauses:
SELECT a.*
FROM TableA as a
WHERE NOT EXISTS (SELECT 1
FROM TableB as b
WHERE (a.col1 = b.col1 OR a.col1 IS NULL AND b.col1 IS NULL) AND
(a.col2 = b.col2 OR a.col2 IS NULL AND b.col2 IS NULL) AND
(a.col3 = b.col3 OR a.col3 IS NULL AND b.col3 IS NULL)
)
UNION ALL
SELECT b.*
FROM TableB as b
WHERE NOT EXISTS (SELECT 1
FROM TableA as a
WHERE (a.col1 = b.col1 OR a.col1 IS NULL AND b.col1 IS NULL) AND
(a.col2 = b.col2 OR a.col2 IS NULL AND b.col2 IS NULL) AND
(a.col3 = b.col3 OR a.col3 IS NULL AND b.col3 IS NULL)
);
Here is a db<>fiddle that uses SQL Server, but the syntax should be basically the same in MS Access.
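Both approaches can be tried against the sample data from the question. The following is a minimal sketch using Python's sqlite3 module rather than SQL Server or MS Access, so it only illustrates the logic. The parentheses around the UNION ALL branches are dropped because SQLite does not accept them, and the variable `null_safe_match` is just a helper for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TableA (col1 TEXT, col2 INTEGER, col3 INTEGER);
    CREATE TABLE TableB (col1 TEXT, col2 INTEGER, col3 INTEGER);
    INSERT INTO TableA VALUES ('Dog', 1, 1), ('Cat', 27, 86), ('Cat', 128, 92);
    INSERT INTO TableB VALUES ('Dog', 1, 1), ('Cat', 27, 105), ('Lizard', 83, NULL);
""")

# Approach 1: stack both tables and keep rows that occur exactly once.
having_rows = conn.execute("""
    SELECT col1, col2, col3
    FROM (SELECT col1, col2, col3 FROM TableA
          UNION ALL
          SELECT col1, col2, col3 FROM TableB) AS ab
    GROUP BY col1, col2, col3
    HAVING COUNT(*) = 1
""").fetchall()

# Approach 2: two NOT EXISTS checks with NULL-safe column comparisons.
null_safe_match = " AND ".join(
    f"(a.{col} = b.{col} OR (a.{col} IS NULL AND b.{col} IS NULL))"
    for col in ("col1", "col2", "col3")
)
not_exists_rows = conn.execute(f"""
    SELECT a.* FROM TableA AS a
    WHERE NOT EXISTS (SELECT 1 FROM TableB AS b WHERE {null_safe_match})
    UNION ALL
    SELECT b.* FROM TableB AS b
    WHERE NOT EXISTS (SELECT 1 FROM TableA AS a WHERE {null_safe_match})
""").fetchall()

print(sorted(having_rows, key=repr))
print(sorted(not_exists_rows, key=repr))
```

Both queries return the same four rows as the expected output in the question (without the `what` column): the two Cat 27 rows, Cat 128, and Lizard 83 with a NULL in col3.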

How to combine CASE, INNER JOIN, and GROUP BY

I'm dealing with an issue: I need to join two tables, group by their IDs, and use a CASE expression to compare values from those two tables. I have been trying to use a temp table and then SELECT from it.
The purpose is to test whether values in CORE correspond to values in MART.
Ideally I want a single query in which I can see the column CORE_X_MART and filter on it with a WHERE clause.
GROUP BY is essential because otherwise I get duplicate IDs in the temporary table.
My code:
drop table if exists #tNDWH_4034
select a.ID, b.ID, a.col2 as MART_Value, b.col2 as CORE_Value,
case when a.col2 = b.col2 then 'Match' else 'Mismatch' end as CORE_X_MART
into #tNDWH_4034
from tab1 as a
inner join tab2 as b on a.ID = b.ID
where a.CurrentFlag = 1
group by a.ID, b.ID;
select * from #tNDWH_4034
where CORE_X_MART = 'Mismatch';
I'm using SQL Server.
You don't need a temp table. You can use a derived table instead, which keeps everything in a single query.
SELECT * FROM
(select a.ID, b.ID, a.col2 as MART_Value, b.col2 as CORE_Value,
case when a.col2 = b.col2 then 'Match' else 'Mismatch' end as CORE_X_MART
from tab1 as a
inner join tab2 as b
on a.ID = b.ID
where a.CurrentFlag = 1
group by a.ID, b.ID) as t
WHERE t.CORE_X_MART = 'Mismatch'
Maybe add the selected columns to the GROUP BY:
select a.ID, b.ID, a.col2 as MART_Value, b.col2 as CORE_Value,
case when a.col2 = b.col2 then 'Match' else 'Mismatch' end as CORE_X_MART
into #tNDWH_4034
from tab1 as a
inner join tab2 as b on (a.ID = b.ID)
where a.CurrentFlag = 1
group by a.ID, b.ID, a.col2, b.col2
,case when a.col2 = b.col2 then 'Match' else 'Mismatch' end --this line is probably not required
You don't need the temp table, the GROUP BY, or the CASE either. You are looking for mismatches only, so just use the not-equal operator <> to filter your results.
select distinct a.ID, a.col2 as MART_Value, b.col2 as CORE_Value
from tab1 as a
inner join tab2 as b on a.ID = b.ID
where a.CurrentFlag = 1
and a.col2 <> b.col2
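A point to keep in mind when choosing between the CASE version and the <> filter is how NULLs are treated. The sketch below uses Python's sqlite3 module with hypothetical sample rows (the table and column names come from the question; the data does not). Under standard SQL three-valued logic, `a.col2 <> b.col2` is not true when either side is NULL, whereas the CASE expression falls through to 'Mismatch'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tab1 (ID INTEGER, col2 TEXT, CurrentFlag INTEGER);
    CREATE TABLE tab2 (ID INTEGER, col2 TEXT);
    -- hypothetical sample data: ID 4 has a NULL on the tab1 side
    INSERT INTO tab1 VALUES (1, 'a', 1), (2, 'b', 1), (3, 'c', 0), (4, NULL, 1);
    INSERT INTO tab2 VALUES (1, 'a'), (2, 'x'), (3, 'z'), (4, 'd');
""")

# Filter with <> directly: rows where either value is NULL are dropped.
not_equal = conn.execute("""
    SELECT DISTINCT a.ID, a.col2 AS MART_Value, b.col2 AS CORE_Value
    FROM tab1 AS a
    INNER JOIN tab2 AS b ON a.ID = b.ID
    WHERE a.CurrentFlag = 1
      AND a.col2 <> b.col2
    ORDER BY a.ID
""").fetchall()

# CASE in a derived table: a NULL comparison lands in the ELSE branch.
case_based = conn.execute("""
    SELECT ID, MART_Value, CORE_Value
    FROM (SELECT a.ID AS ID, a.col2 AS MART_Value, b.col2 AS CORE_Value,
                 CASE WHEN a.col2 = b.col2 THEN 'Match'
                      ELSE 'Mismatch' END AS CORE_X_MART
          FROM tab1 AS a
          INNER JOIN tab2 AS b ON a.ID = b.ID
          WHERE a.CurrentFlag = 1) AS t
    WHERE t.CORE_X_MART = 'Mismatch'
    ORDER BY ID
""").fetchall()

print(not_equal)
print(case_based)
```

With this data the <> filter reports only ID 2, while the CASE version also reports ID 4. Which behavior is wanted depends on whether a NULL on one side should count as a mismatch.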

Teradata wildcard join with modifying table instead of conditions

I have two tables, tablea and tableb, that I need to join.
The columns b.col2 and b.col3 can contain the value '%', which should act as a wildcard: in that case b.col2 should join to any value of a.col2, and b.col3 to any value of a.col3.
One solution would look like this:
select a.*, b.col4, b.col5
from tablea a
left join (select col1, col2, col3, col4, col5 from tableb) b
on b.col1=a.col1 and
(b.col2 = a.col2 or b.col2 = '%') and
(b.col3 = a.col3 or b.col3 = '%')
qualify 1 = row_number() over (partition by a.id order by (case when b.col2 = '%' then 2 else 1 end), (case when b.col3 = '%' then 2 else 1 end))
My problem is that, because of later use in a different application, I can only use simple join conditions like:
b.col1 = a.col1 and
b.col2 = a.col2 and
b.col3 = a.col3
My question is whether there is a way to achieve the same result as the first solution using only 'simple' join conditions (a.col2 = b.col2), changing only how tableb is selected.

Teradata wildcard in join definition

I have joined tables like below:
select a.*, b.col4, b.col5 from table a
inner join table b
on a.col2=b.col2
and a.col3=b.col3
The columns b.col2 and b.col3 can contain the value '*', which should act as a wildcard: in that case b.col2 should join to any value of a.col2, and b.col3 to any value of a.col3.
Would you please help me define it?
It sounds like you have a default. One method is multiple comparison:
select a.*,
coalesce(b.col4, bdef3.col4, bdef2.col4, bdef.col4) as col4,
coalesce(b.col5, bdef3.col5, bdef2.col5, bdef.col5) as col5
from tablea a left join
tableb b
on b.col2 = a.col2 and b.col3 = a.col3 left join
tableb bdef3
on bdef3.col2 = a.col2 and bdef3.col3 = '*' left join
tableb bdef2
on bdef2.col2 = '*' and bdef2.col3 = a.col3 left join
tableb bdef
on bdef.col2 = '*' and bdef.col3 = '*';
You may want a where clause if you want to guarantee some match:
where (b.col2 is not null or bdef3.col2 is not null or bdef2.col2 is not null or bdef.col2 is not null)
I think the above is more efficient, but you can express this more succinctly as:
select a.*, b.col4, b.col5
from tablea a left join
tableb b
on (b.col2 = a.col2 or b.col2 = '*') and
(b.col3 = a.col3 or b.col3 = '*')
qualify 1 = row_number() over (partition by a.id order by (case when b.col2 = '*' then 2 else 1 end), (case when b.col3 = '*' then 2 else 1 end))
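The wildcard join with "most specific match wins" ordering can be tried outside Teradata as well. The sketch below uses Python's sqlite3 module with hypothetical sample rows. SQLite has no QUALIFY clause, so the row_number() filter is moved into an outer query (window functions need SQLite 3.25 or newer, which is an assumption about the local build).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER, col2 TEXT, col3 TEXT);
    CREATE TABLE tableb (col2 TEXT, col3 TEXT, col4 TEXT, col5 INTEGER);
    -- hypothetical sample data
    INSERT INTO tablea VALUES (1, 'x', 'y'), (2, 'x', 'z'), (3, 'q', 'r');
    INSERT INTO tableb VALUES ('x', 'y', 'exact', 1),
                              ('x', '*', 'col3 wildcard', 2),
                              ('*', '*', 'both wildcards', 3);
""")

rows = conn.execute("""
    SELECT id, col4, col5
    FROM (SELECT a.id AS id, b.col4 AS col4, b.col5 AS col5,
                 -- exact values sort before wildcards, col2 first
                 ROW_NUMBER() OVER (
                     PARTITION BY a.id
                     ORDER BY CASE WHEN b.col2 = '*' THEN 2 ELSE 1 END,
                              CASE WHEN b.col3 = '*' THEN 2 ELSE 1 END
                 ) AS rn
          FROM tablea AS a
          LEFT JOIN tableb AS b
            ON (b.col2 = a.col2 OR b.col2 = '*')
           AND (b.col3 = a.col3 OR b.col3 = '*')) AS ranked
    WHERE rn = 1
    ORDER BY id
""").fetchall()

print(rows)
```

Each row of tablea picks the most specific matching row of tableb: id 1 gets the exact match, id 2 falls back to the row with a wildcard in col3, and id 3 falls back to the row with wildcards in both columns.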

Dynamic SQL in WHERE clause

SELECT A.COL1, A.COL2, A.COL3, A.COL4
FROM TABLE A, TABLE B
WHERE B.COL1 = A.COL1
AND B.COL2 = A.COL2
AND B.COL3 = A.COL3
AND B.COL4 = A.COL4
Now I want to tune the SQL query so that whenever any of the columns in Table B has the value 'ALL', the corresponding condition is left out of the WHERE clause.
That is, when the column holds a specific value it must match between both tables; when the value is 'ALL', that condition is excluded from the WHERE clause.
In other words,
I need B.COL1 = A.COL1 (when B.COL1 <> 'ALL'),
otherwise no B.COL1 = A.COL1 condition in the WHERE clause (when B.COL1 = 'ALL').
Use OR wisely:
SELECT A.COL1, A.COL2, A.COL3, A.COL4
FROM TABLE A, TABLE B
WHERE (B.COL1 = A.COL1 OR B.COL1='ALL')
AND (B.COL2 = A.COL2 OR B.COL2='ALL')
...
I would also suggest learning JOIN syntax.
Hi, you can use a CASE expression to put a condition in the WHERE clause:
SELECT A.COL1, A.COL2, A.COL3, A.COL4
FROM TABLE A, TABLE B
WHERE B.COL1 =
CASE
WHEN B.COL1 <> 'ALL' THEN A.COL1
ELSE B.COL1 -- 'ALL' compares B.COL1 with itself, so the condition always passes
END
AND B.COL2 = A.COL2
AND B.COL3 = A.COL3
AND B.COL4 = A.COL4
You can achieve this with just IN:
SELECT A.COL1, A.COL2, A.COL3, A.COL4
FROM TABLE A, TABLE B
WHERE B.COL1 IN (A.COL1, 'ALL')
AND B.COL2 IN (A.COL2, 'ALL')
AND B.COL3 IN (A.COL3, 'ALL')
AND B.COL4 IN (A.COL4, 'ALL')
What is actually going on may be clearer in a more verbose version using AND/OR, but the logic is exactly the same:
SELECT A.COL1, A.COL2, A.COL3, A.COL4
FROM TABLE A, TABLE B
WHERE (B.COL1 = A.COL1 OR B.COL1 = 'ALL')
AND (B.COL2 = A.COL2 OR B.COL2 = 'ALL')
AND (B.COL3 = A.COL3 OR B.COL3 = 'ALL')
AND (B.COL4 = A.COL4 OR B.COL4 = 'ALL')
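The equivalence of the two forms can be checked on a small example. This is a minimal sketch using Python's sqlite3 module; the tables are simply named A and B, and the rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (COL1 TEXT, COL2 TEXT, COL3 TEXT, COL4 TEXT);
    CREATE TABLE B (COL1 TEXT, COL2 TEXT, COL3 TEXT, COL4 TEXT);
    -- hypothetical sample data
    INSERT INTO A VALUES ('1', '2', '3', '4'), ('5', '6', '7', '8');
    INSERT INTO B VALUES ('1', '2', '3', '4'),
                         ('ALL', '6', 'ALL', '8'),
                         ('9', 'ALL', 'ALL', 'ALL');
""")

# Compact form: each B column must equal the A column or be 'ALL'.
in_rows = conn.execute("""
    SELECT A.COL1, A.COL2, A.COL3, A.COL4
    FROM A, B
    WHERE B.COL1 IN (A.COL1, 'ALL')
      AND B.COL2 IN (A.COL2, 'ALL')
      AND B.COL3 IN (A.COL3, 'ALL')
      AND B.COL4 IN (A.COL4, 'ALL')
    ORDER BY A.COL1
""").fetchall()

# Verbose form: the same conditions spelled out with OR.
or_rows = conn.execute("""
    SELECT A.COL1, A.COL2, A.COL3, A.COL4
    FROM A, B
    WHERE (B.COL1 = A.COL1 OR B.COL1 = 'ALL')
      AND (B.COL2 = A.COL2 OR B.COL2 = 'ALL')
      AND (B.COL3 = A.COL3 OR B.COL3 = 'ALL')
      AND (B.COL4 = A.COL4 OR B.COL4 = 'ALL')
    ORDER BY A.COL1
""").fetchall()

print(in_rows, or_rows)
```

Both queries return the same two rows here: the first A row matches the fully specified B row, the second matches the B row with 'ALL' in COL1 and COL3, and the third B row matches nothing because its COL1 is a specific value that no A row has.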
Simple solution:
SELECT DISTINCT A.COL1, A.COL2, A.COL3, A.COL4
FROM TABLE A
INNER JOIN TABLE B ON
(B.COL1 = A.COL1 AND B.COL2 = A.COL2 AND B.COL3 = A.COL3 AND B.COL4 = A.COL4)
OR
('ALL' IN (B.COL1, B.COL2, B.COL3, B.COL4))
However, if you work with large tables, that complex filter can slow execution down considerably, so I suggest a different syntax for complex JOINs:
SELECT DISTINCT *
FROM (
SELECT A.COL1, A.COL2, A.COL3, A.COL4
FROM A
INNER JOIN B
ON (B.COL1 = A.COL1 AND B.COL2 = A.COL2 AND B.COL3 = A.COL3 AND B.COL4 = A.COL4)
) J1
UNION -- ALL ?
SELECT DISTINCT *
FROM (
SELECT A.COL1, A.COL2, A.COL3, A.COL4
FROM A
INNER JOIN B ON ('ALL' IN (B.COL1, B.COL2, B.COL3, B.COL4))
) J2
This one should be much faster than the previous one.
Also, mind row duplicates: with that syntax, each row of table A is added to the result once for every row in table B that contains 'ALL'.
I have added the DISTINCT clause to the SELECT to avoid duplicates (the UNION operator also removes them), so if you need duplicates, remove DISTINCT and use UNION ALL instead of UNION.