Removing duplicate records from JOIN in MS Access - sql

My co-worker asked me for help with a query in MS Access that joins three tables. I have confirmed that the order and inner/outer status of the JOIN is what my co-worker wants. (They have three tables, A, B, and C; they want all records from table B plus the matching records from A and C.)
The (sanitized) query is:
SELECT B.ID, B.Date from (A RIGHT JOIN B on A.ID = B.ID) LEFT JOIN C on B.ID = C.ID
GROUP BY B.ID, B.Date
This returns the correct number of rows (about 16000). However, when I change the select and group clauses to
SELECT B.ID, B.Date, A.Time ...
GROUP BY B.ID, B.Date, A.Time
then the query returns duplicate records (the record count is about 19000). How do I improve the query to eliminate the duplicates?
This Stack Overflow answer helped me figure out the GROUP BY clause for table B. I had tried the clause as just GROUP BY B.ID, but got an error message that I hadn't done any aggregation with B.Date.

Is it actually producing duplicate records, or is it now returning multiple records from the same date that have different times? If so, you will need to assess if these are actually duplicate records for your report purpose. If they are, you will want to aggregate the time with something like min(a.time) or max(a.time) in the select clause (to get the earliest or latest instance only) and leave it out of the group by.
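The difference between grouping by the time and aggregating it can be checked on a toy dataset. The sketch below is only an illustration: it uses Python's built-in sqlite3 module rather than MS Access, the rows are invented, and `B LEFT JOIN A` stands in for the equivalent `A RIGHT JOIN B` because older SQLite versions lack RIGHT JOIN.

```python
import sqlite3

# Hypothetical miniature versions of tables A and B; the values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, Time TEXT);
    CREATE TABLE B (ID INTEGER, Date TEXT);
    INSERT INTO B VALUES (1, '2020-01-01'), (2, '2020-01-02');
    -- ID 1 has two different times in A, ID 2 has no match at all.
    INSERT INTO A VALUES (1, '08:00'), (1, '17:30');
""")

# Grouping by A.Time keeps one row per distinct time, so ID 1 appears twice.
grouped_by_time = conn.execute("""
    SELECT B.ID, B.Date, A.Time
    FROM B LEFT JOIN A ON A.ID = B.ID
    GROUP BY B.ID, B.Date, A.Time
    ORDER BY B.ID, A.Time
""").fetchall()

# Aggregating the time instead collapses each (ID, Date) pair to one row.
earliest_time = conn.execute("""
    SELECT B.ID, B.Date, MIN(A.Time) AS FirstTime
    FROM B LEFT JOIN A ON A.ID = B.ID
    GROUP BY B.ID, B.Date
    ORDER BY B.ID
""").fetchall()

print(len(grouped_by_time), "rows when grouping by time")
print(len(earliest_time), "rows when aggregating time")
```

Whether MIN or MAX is the right aggregate depends on which of the several times the report should show.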

Related

Distinct IDs from one table for inner join SQL

I'm trying to take the distinct IDs that appear in table a, filter table b for only these distinct IDs from table a, and present the remaining columns from b. I've tried:
SELECT * FROM
(
SELECT DISTINCT
a.ID,
a.test_group,
b.ch_name,
b.donation_amt
FROM table_a a
INNER JOIN table_b b
ON a.ID=b.ID
ORDER by a.ID;
) t
This doesn't seem to work. This query worked:
SELECT DISTINCT a.ID, a.test_group, b.ch_name, b.donation_amt
FROM table_a a
inner join table_b b
on a.ID = b.ID
order by a.ID
But I'm not entirely sure this is the correct way to go about it. Is this second query only going to take unique combinations of a.ID and a.test_group, or does it know to take only distinct values of a.ID, which is what I want?
Your first and second queries are essentially the same (except that you cannot use ; inside a subquery). Once the ; is removed, both produce the same result.
Even the second query, which you think is giving you the desired output, does not produce what you actually want.
DISTINCT applies to the entire column list of the select clause, not to a single column.
In your case, if the same a.ID has different a.test_group values, the result will contain multiple records with the same a.ID and different a.test_group.
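A small experiment makes the behaviour of DISTINCT visible. This is a minimal sketch using Python's sqlite3 module with invented rows; the second query (filtering table_b with an IN subquery) is one possible way to express "only IDs that appear in table_a", offered as an assumption about what the asker wants rather than as the answer's own suggestion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (ID INTEGER, test_group TEXT);
    CREATE TABLE table_b (ID INTEGER, ch_name TEXT, donation_amt REAL);
    -- ID 1 sits in two test groups; ID 2 has an exact duplicate row.
    INSERT INTO table_a VALUES
        (1, 'control'), (1, 'treatment'), (2, 'control'), (2, 'control');
    INSERT INTO table_b VALUES (1, 'Charity X', 10.0), (2, 'Charity Y', 25.0);
""")

# DISTINCT removes only rows that match in every selected column.
distinct_rows = conn.execute("""
    SELECT DISTINCT a.ID, a.test_group, b.ch_name, b.donation_amt
    FROM table_a a
    INNER JOIN table_b b ON a.ID = b.ID
    ORDER BY a.ID, a.test_group
""").fetchall()

# Filtering table_b by the set of IDs in table_a never multiplies its rows.
filtered_b = conn.execute("""
    SELECT b.ID, b.ch_name, b.donation_amt
    FROM table_b b
    WHERE b.ID IN (SELECT DISTINCT ID FROM table_a)
    ORDER BY b.ID
""").fetchall()

print(distinct_rows)
print(filtered_b)
```

ID 1 still appears twice in the first result because its two rows differ in test_group; the second query returns each table_b row once but cannot show test_group.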

Drop duplicated rows in postgresql

I am querying some data from the database, and my code looks as below
select
a.id,
a.party,
a.date,
a.name,
a.revenue,
b.company,
c.cost
from a
left join b on a.id = b.id
left join c on a.id = c.id
where a.party = 'cat' and a.date > '2000-01-01'
The query returns a table, but it contains duplicated rows. Is there any way I can remove all duplicated rows (meaning the entire row is the same: if row 1 = row 2, remove row 1)?
I put select distinct at the top, but then it took forever to run. I'm not sure whether some fundamental logic is wrong in this code.
If one row in a is related to two rows in b because b.id is not unique, and both of these rows have the same company, your query result will contain duplicate rows. There is nothing wrong with that.
Removing duplicate rows with DISTINCT is expensive for big result sets, because the whole set has to be sorted or hashed.
Ideas to improve performance:
increase work_mem, that makes sorting faster
perhaps you don't need all the result rows, then adding a LIMIT clause will make the query faster
A common way to deduplicate, often with better performance, is to number the rows with row_number() over (partition by ... order by ...) and keep only the rows where the row number is 1.
SELECT a_id, b_id, name, ...
FROM (
  SELECT a.id AS a_id, b.id AS b_id, b.name AS name,
         row_number() OVER (PARTITION BY a.id, b.id, b.name ORDER BY create_date DESC) AS rownum
  FROM a
  JOIN b ON a.id = b.id
) rs
WHERE rs.rownum = 1
Note that you can partition by and order by any key columns (whatever you want to be unique). I used create_date to pick the latest row.
Also note that a huge number of rows can still hamper performance, but this tends to be better than DISTINCT. Please also check whether you can deduplicate each table before joining.
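The row_number() pattern can be tried out end to end on a tiny dataset. The sketch below runs it through Python's sqlite3 module instead of PostgreSQL (it assumes the bundled SQLite is 3.25 or newer, which is when window functions were added); the table contents and the create_date column on b are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER);
    CREATE TABLE b (id INTEGER, name TEXT, create_date TEXT);
    INSERT INTO a VALUES (1), (2);
    -- id 1 matches two rows in b that differ only in create_date.
    INSERT INTO b VALUES
        (1, 'acme', '2021-01-01'),
        (1, 'acme', '2021-06-01'),
        (2, 'zeta', '2021-03-01');
""")

latest_rows = conn.execute("""
    SELECT a_id, name, create_date
    FROM (
        SELECT a.id AS a_id, b.name AS name, b.create_date AS create_date,
               ROW_NUMBER() OVER (
                   PARTITION BY a.id, b.name      -- what counts as "the same row"
                   ORDER BY b.create_date DESC    -- newest row gets number 1
               ) AS rownum
        FROM a
        JOIN b ON a.id = b.id
    ) rs
    WHERE rs.rownum = 1
    ORDER BY a_id
""").fetchall()

print(latest_rows)
```

Unlike DISTINCT, this lets you choose which of the near-duplicate rows survives, here the one with the latest create_date.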

SQL filter LEFT TABLE before left join

I have read a number of posts from SO and I understand the differences between filtering in the where clause and on clause. But most of those examples are filtering on the RIGHT table (when using left join). If I have a query such as below:
select * from tableA A left join tableB B on A.ID = B.ID and A.ID = 20
The return values are not what I expected. I would have thought it first filters the left table and fetches only rows with ID = 20 and then do a left join with tableB.
Of course, this should be technically the same as doing:
select * from tableA A left join tableB B on A.ID = B.ID where A.ID = 20
But I thought the performance would be better if you could filter the table before doing a join. Can someone enlighten me on how this SQL is processed and help me understand this thoroughly.
A left join follows a simple rule. It keeps all the rows in the first table. The values of the second table's columns depend on the on clause. If there is no match, then the second table's columns are NULL, whether the condition that failed refers to the first or the second table.
So, for this query:
select *
from tableA A left join
tableB B
on A.ID = B.ID and A.ID = 20;
All the rows in A are in the result set, regardless of whether or not there is a match. When the id is not 20, then the rows and columns are still taken from A. However, the condition is false so the columns in B are NULL. This is a simple rule. It does not depend on whether the conditions are on the first table or the second table.
For this query:
select *
from tableA A left join
tableB B
on A.ID = B.ID
where A.ID = 20;
The from clause keeps all the rows in A. But then the where clause has its effect: it filters the rows so that only those with id 20 are in the result set.
When using a left join:
Filter conditions on the first table go in the where clause.
Filter conditions on subsequent tables go in the on clause.
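The two queries discussed above can be compared directly on a two-row example. This is a minimal sketch using Python's sqlite3 module; the label and detail columns and all values are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tableA (ID INTEGER, label TEXT);
    CREATE TABLE tableB (ID INTEGER, detail TEXT);
    INSERT INTO tableA VALUES (10, 'ten'), (20, 'twenty');
    INSERT INTO tableB VALUES (10, 'b-ten'), (20, 'b-twenty');
""")

# Condition on the first table placed in ON: every row of A is kept,
# but B only matches where A.ID = 20.
on_filter = conn.execute("""
    SELECT A.ID, B.detail
    FROM tableA A LEFT JOIN tableB B
      ON A.ID = B.ID AND A.ID = 20
    ORDER BY A.ID
""").fetchall()

# Same condition placed in WHERE: rows of A are filtered after the join.
where_filter = conn.execute("""
    SELECT A.ID, B.detail
    FROM tableA A LEFT JOIN tableB B
      ON A.ID = B.ID
    WHERE A.ID = 20
    ORDER BY A.ID
""").fetchall()

print(on_filter)     # ID 10 survives with a NULL detail
print(where_filter)  # only ID 20 remains
```

The row for ID 10 has a matching row in tableB, yet its detail comes back as NULL in the first query because the ON condition as a whole is false for it.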
Where you have from tablea, you could instead put a subquery, such as from (select x.* from tablea X where x.value=20) TA
Then refer to TA wherever you previously referred to tablea.
The query optimizer will likely do this for you anyway.
Oracle has a way to show the query plan: put EXPLAIN PLAN FOR before the SQL statement, then display the plan (for example with DBMS_XPLAN.DISPLAY). Look at the plan for both versions and see what it does.
In your first SQL statement, A.ID=20 technically does not join anything. Joins connect two separate tables, with the ON clause matching up columns that act as keys.
The WHERE clause filters the data, reducing the result to only those rows where the given value is found in that column.

Oracle SQL: Selecting all, plus an extra column with a complex query

I have these tables setup:
NOMINATIONS: A table of award nominations
NOMINATION_NOMINEES: A table of users with a FK on NOMINATIONS.ID
One Nomination can be referenced by many nominees via the ID field.
SELECT a.*, COUNT(SELECT all records from NOMINATION_NOMINEES with this ID) AS "b"
FROM NOMINATIONS a
LEFT JOIN NOMINATION_NOMINEES b on a.ID = b.ID
The results would look like:
ID | NOMINATION_DESCRIPTION | ... | NUMBER_NOMINEES
Where NUMBER_NOMINEES is the number of rows in the NOMINATION_NOMINEES table with the current row's ID.
This is a tricky one, we are feeding this into a larger system so I'm hoping to get this in one query with a bunch of subqueries. Implementing subqueries into this has twisted my mind. Anyone have an idea of where to head with this?
I'm sure the above way is not close to a decent approach to this one, but I can't quite wrap my mind around this one.
It can be done with a single correlated sub-query in SELECT clause.
SELECT a.*,
( SELECT COUNT(b.ID) FROM NOMINATION_NOMINEES b WHERE a.ID= b.ID )
FROM NOMINATIONS a
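You should be able to use count as an analytic function, although the join then returns one row per nominee rather than one row per nomination, so add DISTINCT if you need a single row per nomination: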
select a.*,
count(b.id) over (partition by b.id)
from nominations a
left join nomination_nominees b on a.id = b.id
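Both suggestions can be compared side by side on a toy schema. The sketch below uses Python's sqlite3 module rather than Oracle (window functions need SQLite 3.25 or newer); the NOMINEE column and the sample rows are hypothetical, and the window is partitioned by a.ID here so that nominations without nominees each get their own partition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE NOMINATIONS (ID INTEGER PRIMARY KEY, NOMINATION_DESCRIPTION TEXT);
    CREATE TABLE NOMINATION_NOMINEES (ID INTEGER, NOMINEE TEXT);
    INSERT INTO NOMINATIONS VALUES (1, 'Best design'), (2, 'Best support');
    -- Nomination 1 has two nominees, nomination 2 has none.
    INSERT INTO NOMINATION_NOMINEES VALUES (1, 'alice'), (1, 'bob');
""")

# Correlated subquery: exactly one output row per nomination.
subquery_counts = conn.execute("""
    SELECT a.*,
           (SELECT COUNT(b.ID)
            FROM NOMINATION_NOMINEES b
            WHERE a.ID = b.ID) AS NUMBER_NOMINEES
    FROM NOMINATIONS a
    ORDER BY a.ID
""").fetchall()

# Analytic COUNT over a join: one output row per joined row.
window_counts = conn.execute("""
    SELECT a.ID, COUNT(b.ID) OVER (PARTITION BY a.ID) AS NUMBER_NOMINEES
    FROM NOMINATIONS a
    LEFT JOIN NOMINATION_NOMINEES b ON a.ID = b.ID
    ORDER BY a.ID
""").fetchall()

print(subquery_counts)
print(window_counts)
```

The counts agree, but the analytic version repeats nomination 1 once per nominee, which matters if the result feeds a system expecting one row per nomination.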

multi-table query when there is no record in one table

I have two tables, A and B:
A:
id, name, address //the id is unique
B
id, contact, email
One person may have more than one contact and email, or none at all (which means no record in table B).
Now I want to count how many records in B exist for each id, including ids with 0 records.
The result should look like:
id, name, contact_email_total_count
How can I do that? The only part I cannot figure out is how to get a count of 0 when there is no record in table B.
For that case you will want to use a LEFT JOIN, then add an aggregate and a GROUP BY:
select a.id,
a.name,
count(b.id) as contact_email_total_count
from tablea a
left join tableb b
on a.id = b.id
group by a.id, a.name
See SQL Fiddle with Demo
If you need help learning join syntax here is a great visual explanation of joins.
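The LEFT JOIN with COUNT(b.id) approach above can be verified on a couple of rows. This is a minimal sketch using Python's sqlite3 module with invented people and contacts; it also runs a COUNT(*) variant to show why counting a column from table B matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tablea (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
    CREATE TABLE tableb (id INTEGER, contact TEXT, email TEXT);
    INSERT INTO tablea VALUES (1, 'Ann', '1 Main St'), (2, 'Ben', '2 Oak Ave');
    -- Ann has two contact records, Ben has none.
    INSERT INTO tableb VALUES
        (1, '555-0100', 'ann@example.com'),
        (1, '555-0101', 'ann.work@example.com');
""")

# COUNT(b.id) ignores the NULLs produced for unmatched rows, giving 0 for Ben.
totals = conn.execute("""
    SELECT a.id, a.name, COUNT(b.id) AS contact_email_total_count
    FROM tablea a
    LEFT JOIN tableb b ON a.id = b.id
    GROUP BY a.id, a.name
    ORDER BY a.id
""").fetchall()

# COUNT(*) counts the NULL-extended row as well, so Ben would show 1.
star_totals = conn.execute("""
    SELECT a.id, a.name, COUNT(*) AS row_count
    FROM tablea a
    LEFT JOIN tableb b ON a.id = b.id
    GROUP BY a.id, a.name
    ORDER BY a.id
""").fetchall()

print(totals)
print(star_totals)
```

The zero comes from COUNT skipping NULL values in b.id, not from the join itself, which is why COUNT(*) would not work here.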
Based on your comment the typical order of execution is as follows:
FROM
ON
JOIN
WHERE
GROUP BY
HAVING
SELECT
ORDER BY
Need to do a left join to maintain the records in table A regardless of B:
PostgreSQL: left outer join syntax
Need to aggregate the count of records in B:
PostgreSQL GROUP BY different from MySQL?