Left Join with Distinct Clause

Below is my insert query.
INSERT INTO /*+ APPEND*/ TEMP_CUSTPARAM(CUSTNO, RATING)
SELECT DISTINCT Q.CUSTNO, NVL(((NVL(P.RATING,0) * '10.0')/100),0) AS RATING
FROM TB_ACCOUNTS Q LEFT JOIN TB_CUSTPARAM P
ON P.TEXT_PARAM IN (SELECT DISTINCT PRDCD FROM TB_ACCOUNTS)
AND P.TABLENAME='TB_ACCOUNTS' AND P.COLUMNNAME='PRDCD';
In the previous version of the query the join condition was P.TEXT_PARAM=Q.PRDCD, but the insert into TEMP_CUSTPARAM failed due to a violation of the unique constraint on CUSTNO.
The insert query is now taking ages to complete. I would like to know how to use DISTINCT with a LEFT JOIN.
Thanks.

SELECT T1.Col1, T2.Col2 FROM Table1 T1
LEFT JOIN
(SELECT DISTINCT Id, Col2 FROM Table2
) T2 ON T2.Id = T1.Id

Your ON clause has no condition that relates each TB_ACCOUNTS row to its TB_CUSTPARAM rows (the ones stored for TB_ACCOUNTS/PRDCD), so the two tables are effectively cross joined. I guess you want:
INSERT /*+ APPEND */ INTO TEMP_CUSTPARAM(CUSTNO, RATING)
SELECT DISTINCT
Q.CUSTNO,
NVL(P.RATING, 0) * 0.1 AS RATING
FROM TB_ACCOUNTS Q
LEFT JOIN TB_CUSTPARAM P ON P.TEXT_PARAM = Q.PRDCD
AND P.TABLENAME = 'TB_ACCOUNTS'
AND P.COLUMNNAME = 'PRDCD';

If the query is taking ages to complete, first check the execution plan. If you see a Cartesian join on two non-trivial tables, the query should probably be revisited.
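In Oracle, one way to look at the plan is EXPLAIN PLAN together with DBMS_XPLAN.DISPLAY. A minimal sketch, applied to the SELECT from the question (slightly simplified); a Cartesian-style step in the output, for example MERGE JOIN CARTESIAN, would point to the missing join condition:

```sql
-- Store the plan of the question's SELECT in the plan table.
EXPLAIN PLAN FOR
SELECT DISTINCT Q.CUSTNO, NVL(P.RATING, 0) * 0.1 AS RATING
FROM TB_ACCOUNTS Q
LEFT JOIN TB_CUSTPARAM P
  ON P.TEXT_PARAM IN (SELECT DISTINCT PRDCD FROM TB_ACCOUNTS)
 AND P.TABLENAME = 'TB_ACCOUNTS'
 AND P.COLUMNNAME = 'PRDCD';

-- Display the plan that was just stored.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```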
Then ask yourself what you expect the query to return.
Do you expect one record per CUSTNO? Or can a customer have more than one rating?
One rating per customer would make sense from a business point of view. To get a unique customer list with ratings:
1) First get a unique CUSTNO. Note that in general this is not done with a DISTINCT clause; if there are several rows per customer, use a filter predicate instead, e.g. one that selects the most recent row.
2) Then join to the rating table.
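The two steps could be sketched as follows. This is only an illustration: OPEN_DATE is a hypothetical column used to decide which account row is "the most recent", and the sketch assumes TB_CUSTPARAM holds at most one row per product code for TB_ACCOUNTS/PRDCD.

```sql
-- OPEN_DATE is a hypothetical column; replace it with whatever defines "most recent".
SELECT A.CUSTNO,
       NVL(P.RATING, 0) * 0.1 AS RATING
FROM (
    -- Step 1: number the rows of each customer, newest first.
    SELECT CUSTNO, PRDCD,
           ROW_NUMBER() OVER (PARTITION BY CUSTNO ORDER BY OPEN_DATE DESC) AS RN
    FROM TB_ACCOUNTS
) A
-- Step 2: join the single remaining row per customer to the rating table.
LEFT JOIN TB_CUSTPARAM P
       ON P.TEXT_PARAM = A.PRDCD
      AND P.TABLENAME = 'TB_ACCOUNTS'
      AND P.COLUMNNAME = 'PRDCD'
WHERE A.RN = 1;  -- keep exactly one row per CUSTNO
```

Because only one row per CUSTNO survives step 1, no DISTINCT is needed and the unique constraint on CUSTNO should no longer be violated under the stated assumption.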

Related

Count records only from left side of a LEFT JOIN

I'm building an Access query with a LEFT JOIN that, among other things, counts the number of unique sampleIDs present in the left table of the JOIN, and counts the aggregate number of specimens (bugs) present in the right table of the JOIN, both for a given group of samples (TripID). Here's the pertinent chunk of SQL code:
SELECT DISTINCT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t2.C1 + t2.C2)
AS Bugs FROM tbl_Sample AS t1
LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
GROUP BY t1.TripID
The trouble I'm having is that COUNT(t1.SampleID) is not giving me my desired result. My desired result is the number of unique SampleIDs present in t1 for a given TripID (let's say 7). Instead, what I get seems to be the number of rows in t2 for which the SampleID is contained within the given TripID group (let's say 77). How can I change this SQL query to get the desired number (7, not 77)?

Just aggregate t2 first, then join t1 to the aggregated result like this:
SELECT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t3.Bugs) as Bugs
FROM tbl_Sample AS t1
LEFT Join (
SELECT t2.SampleID, SUM(t2.C1 + t2.C2) as Bugs
FROM tbl_Bugs as t2
GROUP BY SampleID) AS t3 ON t1.SampleID = t3.SampleID
GROUP BY t1.TripID
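A small worked example of why aggregating first helps. The data is made up, and the script uses generic SQL (multi-row VALUES as accepted by e.g. SQLite or PostgreSQL) rather than Access-specific syntax:

```sql
-- Made-up sample data: trip 10 has two samples, trip 20 has one.
CREATE TABLE tbl_Sample (SampleID INT PRIMARY KEY, TripID INT);
CREATE TABLE tbl_Bugs   (SampleID INT, C1 INT, C2 INT);

INSERT INTO tbl_Sample VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO tbl_Bugs   VALUES (1, 1, 2), (1, 3, 4), (2, 5, 6);

-- Direct join: sample 1 matches two bug rows, so it is counted twice.
SELECT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t2.C1 + t2.C2) AS Bugs
FROM tbl_Sample AS t1
LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
GROUP BY t1.TripID
ORDER BY t1.TripID;
-- 10 | 3 | 21
-- 20 | 1 | NULL

-- Aggregate first: one row per SampleID reaches the join, so the count is right.
SELECT t1.TripID, COUNT(t1.SampleID) AS Samples, SUM(t3.Bugs) AS Bugs
FROM tbl_Sample AS t1
LEFT JOIN (
    SELECT SampleID, SUM(C1 + C2) AS Bugs
    FROM tbl_Bugs
    GROUP BY SampleID
) AS t3 ON t1.SampleID = t3.SampleID
GROUP BY t1.TripID
ORDER BY t1.TripID;
-- 10 | 2 | 21
-- 20 | 1 | NULL
```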

This is a tricky query, because you have different hierarchies. Here is one method:
select s.tripid, count(*) as numsamples,
(select sum(b.c1 + b.c2)
from tbl_bugs b join
tbl_sample s2
on s2.sampleid = b.sampleid
where s2.tripid = s.tripid
) as numbugs
from tbl_sample s
group by s.tripid

You included a DISTINCT with a Group By. This is removing duplicates twice, which is unnecessarily complex. You can get rid of the DISTINCT.
I would have the count separate from what is going on in the group by.
SELECT dT.TripID
,(SELECT COUNT(DISTINCT(S.SampleID))
FROM tbl_Sample AS S
WHERE S.TripID = dT.TripID
) AS [Samples]
,dT.Bugs
FROM (
SELECT t1.TripID
,SUM(t2.C1 + t2.C2) AS Bugs
FROM tbl_Sample AS t1
LEFT JOIN tbl_Bugs AS t2 ON t1.SampleID = t2.SampleID
GROUP BY t1.TripID
) AS dT

Count rows after joining three tables in PostgreSQL

Suppose I have three tables in PostgreSQL:
table1 - id1, a_id, updated_by_id
table2 - id2, a_id, updated_by_id
Users - id, display_name
Suppose I am using the following query:
select count(t1.id1) from table1 t1
left join table2 t2 on (t1.a_id=t2.a_id)
full outer join users u1 on (t1.updated_by_id=u1.id)
full outer join users u2 on (t2.updated_by_id=u2.id)
where u1.id=100;
I get 50 as count.
Whereas with:
select count(t1.id1) from table1 t1
left join table2 t2 on (t1.a_id=t2.a_id)
full outer join users u1 on (t1.updated_by_id=u1.id)
full outer join users u2 on (t2.updated_by_id=u2.id)
where u2.id=100;
I get only 25 as count.
What is my mistake in the second query? What can I do to get the same count?
My requirement is that there is a single user table, referenced by multiple tables. I want to take the complete list of users and get the count of ids from different tables.
However, only the table I joined on directly returns the proper count; the rest of them don't. Can anybody suggest a way to modify my second query to get the proper count?

To simplify your logic, aggregate first, join later.
Guessing missing details, this query would give you the exact count, how many times each user was referenced in table1 and table2 respectively for all users:
SELECT *
FROM users u
LEFT JOIN (
SELECT updated_by_id AS id, count(*) AS t1_ct
FROM table1
GROUP BY 1
) t1 USING (id)
LEFT JOIN (
SELECT updated_by_id AS id, count(*) AS t2_ct
FROM table2
GROUP BY 1
) t2 USING (id);
In particular, avoid multiple 1-n relationships multiplying each other when joined together:
Two SQL LEFT JOINS produce incorrect result
To retrieve a single or few users only, LATERAL joins will be faster (Postgres 9.3+):
SELECT *
FROM users u
LEFT JOIN LATERAL (
SELECT count(*) AS t1_ct
FROM table1
WHERE updated_by_id = u.id
) t1 ON true
LEFT JOIN LATERAL (
SELECT count(*) AS t2_ct
FROM table2
WHERE updated_by_id = u.id
) t2 ON true
WHERE u.id = 100;
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Explain perceived difference
The particular mismatch you report is due to the specifics of a FULL OUTER JOIN:
First, an inner join is performed. Then, for each row in T1 that does
not satisfy the join condition with any row in T2, a joined row is
added with null values in columns of T2. Also, for each row of T2 that
does not satisfy the join condition with any row in T1, a joined row
with null values in the columns of T1 is added.
So you get NULL values appended on the respective other side for missing matches. count() does not count NULL values. So you can get a different result depending on whether you filter on u1.id=100 or u2.id=100.
This is just to explain the difference; you don't need a FULL JOIN here. Use the presented alternatives instead.
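The NULL behaviour of count() can be checked in isolation. A minimal PostgreSQL sketch with made-up inline values:

```sql
-- count(*) counts rows; count(x) skips rows where x is NULL.
SELECT count(*) AS all_rows,
       count(x) AS non_null_x
FROM (VALUES (1), (NULL), (3)) AS v(x);
-- all_rows = 3, non_null_x = 2
```

In the question, count(t1.id1) behaves like count(x) here: rows that the FULL OUTER JOIN padded with NULLs on the table1 side do not contribute to the result.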

Counts for distinct values in different tables where columns are common to separate tables

I have no idea if that title conveys what I want it to.
I have two tables containing phone records (one for each account) and I'd like to get call counts for the numbers that are common to each account. In other words:
Table 1
Number ...
8675309
8675309
8675310
8675310
8675312
Table 2
Number ...
8675309
8675309
8675309
8675310
8675311
Querying with something like:
SELECT DISTINCT table1.number, COUNT(table1.number), COUNT(table2.number) FROM table1, table2 WHERE table1.number = table2.number GROUP BY table1.number
would hopefully produce:
8675309|2|3
8675310|2|1
Instead, it currently produces something like:
8675309|6|6
8675310|2|2
It appears to be multiplying the count from each table. Presumably, this is because I'm not joining the tables the way I should for this goal. Or because by the time I ask for COUNT(table1.number) the tables have already been joined in some multiplicative way. Should I not be doing a JOIN and instead something that would read like: "where table2.number CONTAINS(table1.number)"?
Any tips?

One way is with subqueries:
SELECT t1.number, t1.table1Count, t2.table2Count
from (select number, count(*) table1Count
from table1
group by number) t1
inner join (select number, count(*) table2Count
from table2
group by number) t2
on t2.number = t1.number
This assumes that you only want to list numbers that appear in both tables. If you want to list all numbers that appear in one table and optionally the other, you'd use a left or right outer join; if you wanted all numbers that appeared in either or both tables, you'd use a full outer join.
Another and potentially more efficient way requires the presence of a single column that uniquely identifies each row in each table:
SELECT
t1.number
,count(distinct t1.PrimaryKeyValue) table1Count
,count(distinct t2.PrimaryKeyValue) table2Count
from table1 t1
inner join table2 t2
on t2.number = t1.number
group by t1.number
This makes the same assumptions as before, and can also be adjusted via outer joins.
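A worked example of the COUNT(DISTINCT ...) variant, using the numbers from the question. The id column is a hypothetical primary key added for this illustration; the script is generic SQL as accepted by e.g. SQLite or PostgreSQL:

```sql
-- id is a hypothetical primary key; the numbers are the ones from the question.
CREATE TABLE table1 (id INT PRIMARY KEY, number INT);
CREATE TABLE table2 (id INT PRIMARY KEY, number INT);

INSERT INTO table1 VALUES (1, 8675309), (2, 8675309), (3, 8675310), (4, 8675310), (5, 8675312);
INSERT INTO table2 VALUES (1, 8675309), (2, 8675309), (3, 8675309), (4, 8675310), (5, 8675311);

-- The join still multiplies rows (2 x 3 = 6 for 8675309),
-- but counting distinct keys per side undoes the multiplication.
SELECT t1.number,
       COUNT(DISTINCT t1.id) AS table1Count,
       COUNT(DISTINCT t2.id) AS table2Count
FROM table1 t1
INNER JOIN table2 t2 ON t2.number = t1.number
GROUP BY t1.number
ORDER BY t1.number;
-- 8675309 | 2 | 3
-- 8675310 | 2 | 1
```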

One way is to use a couple of derived tables to compute your counts separately and then join them to produce your final summary:
select t1.number, t1.count1, t2.count2
from (select number, count(number) as count1 from table1 group by number) as t1
join (select number, count(number) as count2 from table2 group by number) as t2
on t1.number = t2.number
There are probably other ways but that should work and it is the first thing that came to mind.
You're getting your "multiplicative" effect pretty much for the reasons you suspect. If you have this:
table1(id,x)    table2(id,x)
------------    ------------
1, a            4, a
2, a            5, a
3, b            6, b
Then joining them on x will give you this:
1,a, 4,a
1,a, 5,a
2,a, 4,a
2,a, 5,a
...
Usually you could use a GROUP BY to sort out the duplicates but you can't do that because it would mess up your per-table counts.

Try this:
select tab1.number,tab1.num1,tab2.num2
from
(SELECT number, COUNT(number) as num1 from table1 group by number) as tab1
left join
(SELECT number, COUNT(number) as num2 from table2 group by number) as tab2
on tab1.number = tab2.number
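With the left join, numbers that occur only in table1 are listed with NULL in num2. If a 0 is preferred, a sketch of the same query with COALESCE:

```sql
SELECT tab1.number,
       tab1.num1,
       COALESCE(tab2.num2, 0) AS num2  -- 0 instead of NULL when table2 lacks the number
FROM (SELECT number, COUNT(number) AS num1 FROM table1 GROUP BY number) AS tab1
LEFT JOIN
     (SELECT number, COUNT(number) AS num2 FROM table2 GROUP BY number) AS tab2
  ON tab1.number = tab2.number;
```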

SQL query to limit number of rows having distinct values

Is there a way in SQL to use a query that is equivalent to the following:
select * from table1, table2 where some_join_condition
and some_other_condition and count(distinct(table1.id)) < some_number;
Let us say table1 is an employee table. Then a join will cause data about a single employee to be spread across multiple rows. I want to limit the number of distinct employees returned to some number. A condition on row number or something similar will not be sufficient in this case.
So what is the best way to get the output intended by the above query?

select *
from (select * from employee where rownum < some_number and some_id_filter), table2
where some_join_condition and some_other_condition;

This will work for nearly all DBs
SELECT *
FROM table1 t1
INNER JOIN table2 t2
ON some_join_condition
AND some_other_condition
INNER JOIN (
SELECT id
FROM table1
GROUP BY id
HAVING
count(id) > someNumber
) x on x.id = t1.id
Some DBs have special syntax to make this a little bit easier.

I may not fully understand what you're trying to accomplish, but let's say you want one row per employee, each join produces multiple rows per employee, and grouping by employee name and other fields still doesn't get it down to a single row. Then you can try ranking and partitioning, and select the rank you prefer within each employee partition.
See example : http://msdn.microsoft.com/en-us/library/ms176102.aspx
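A ranking function can also address the original question of limiting the number of distinct employees rather than the number of rows. A sketch under stated assumptions: employee and assignment are hypothetical tables, and 10 stands in for some_number.

```sql
-- employee and assignment are hypothetical tables; 10 stands in for some_number.
SELECT *
FROM (
    SELECT e.id, e.name, a.project,
           -- Every joined row of the same employee gets the same rank.
           DENSE_RANK() OVER (ORDER BY e.id) AS emp_rank
    FROM employee e
    JOIN assignment a ON a.employee_id = e.id
) ranked
WHERE emp_rank <= 10;  -- all joined rows of the first 10 distinct employees
```

Unlike a plain row limit, this never cuts an employee's rows off halfway, because the filter is on the employee's rank and not on the row position.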

Getting distinct rows from a left outer join

I am building an application which dynamically generates sql to search for rows of a particular Table (this is the main domain class, like an Employee).
There are three tables Table1, Table2 and Table1Table2Map.
Table1 has a many to many relationship with Table2, and is mapped through Table1Table2Map table. But since Table1 is my main table the relationship is virtually like a one to many.
My app generates SQL which basically gives a result set containing rows from all these tables. The select clause and joins don't change, whereas the where clause is generated based on user interaction. In any case I don't want duplicate rows of Table1 in my result set, as it is the main table for result display. Right now the query that is getting generated is like this:
select distinct Table1.Id as Id, Table1.Name, Table2.Description from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id)
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
For simplicity I have excluded the where clause. The problem is that when there are multiple rows in Table2 for a row in Table1, the result set has duplicate rows of Table1 even though I have said distinct on Table1.Id, because it has to select all the matching rows in Table2.
To elaborate more, consider that for a row in Table1 with Id = 1 there are two rows in Table1Table2Map (1, 1) and (1, 2) mapping Table1 to two rows in Table2 with ids 1, 2. The above mentioned query returns duplicate rows for this case. Now I want the query to return Table1 row with Id 1 only once. This is because there is only one row in Table2 that is like an active value for the corresponding entry in Table1 (this information is in Mapping table).
Is there a way I can avoid getting duplicate rows of Table1?
I think there is some basic problem in the way I am trying to solve the problem, but I am not able to find out what it is. Thanks in advance.

Try:
left outer join (select distinct YOUR_COLUMNS_HERE ...) SUBQUERY_ALIAS on ...
In other words, don't join directly against the table, join against a sub-query that limits the rows you join against.

You can use GROUP BY on Table1.Id, and that will get rid of the extra rows. You wouldn't need to worry about any mechanics on the join side.
I came up with this solution in a huge query, and it didn't affect the query time much.
NOTE: I'm answering this question 3 years after it was asked, but I believe this may help someone.

You can re-write your left joins to be outer applies, so that you can use a top 1 and an order by as follows:
select Table1.Id as Id, Table1.Name, t2.Description
from Table1
outer apply (
select top 1 *
from Table1Table2Map
where (Table1Table2Map.Table1Id = Table1.Id) and Table1Table2Map.IsActive = 1
order by somethingCol
) t1t2
outer apply (
select top 1 *
from Table2
where (Table2.Id = t1t2.Table2Id)
) t2;
Note that an outer apply without a "top" or an "order by" is exactly equivalent to a left outer join, it just gives you a little more control. (cross apply is equivalent to an inner join).
You can also do something similar using the row_number() function:
select * from (
select distinct Table1.Id as Id, Table1.Name, Table2.Description,
rowNum = row_number() over ( partition by table1.id order by something )
from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id)
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)
) x
where rowNum = 1;
Most of this doesn't apply if the IsActive flag can narrow down your other tables to one row, but they might come in useful for you.

To elaborate on one point: you said that there is only one "active" row in Table2 per row in Table1. Is that row not marked as active such that you could put it in the where clause? Or is there some magic in the dynamic conditions supplied by the user that determines what's active and what isn't?
If you don't need to select anything from Table2, the solution is relatively simple in that you can use EXISTS, but since you've put Table2.Description in the select clause I'll assume that's not the case.
Basically what separates the relevant rows in Table2 from the irrelevant ones? Is it an active flag or a dynamic condition? The first row? That's really how you should be removing duplicates.
DISTINCT clauses tend to be overused. That may not be the case here but it sounds like it's possible that you're trying to hack out the results you want with DISTINCT rather than solving the real problem, which is a fairly common problem.
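For the case where nothing from Table2 has to be selected, the EXISTS approach mentioned above could look like the sketch below. The LIKE filter is a hypothetical stand-in for whatever condition the user interface generates.

```sql
-- Returns each Table1 row at most once, however many Table2 rows match.
SELECT Table1.Id, Table1.Name
FROM Table1
WHERE EXISTS (
    SELECT 1
    FROM Table1Table2Map
    JOIN Table2 ON Table2.Id = Table1Table2Map.Table2Id
    WHERE Table1Table2Map.Table1Id = Table1.Id
      AND Table2.Description LIKE '%search term%'  -- hypothetical user-supplied filter
);
```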

You have to include the activity condition in your join (and there is no need for DISTINCT):
select Table1.Id as Id, Table1.Name, Table2.Description from Table1
left outer join Table1Table2Map on (Table1Table2Map.Table1Id = Table1.Id) and Table1Table2Map.IsActive = 1
left outer join Table2 on (Table2.Id = Table1Table2Map.Table2Id)

If you want to display multiple rows from Table2, you will have duplicate data from Table1 displayed. If you wanted to, you could use an aggregate function (e.g. MAX, MIN) on Table2; this would eliminate the duplicate rows from Table1, but would also hide some of the data from Table2.
See also my answer on question #70161 for additional explanation
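A minimal sketch of the aggregate approach described above, applied to the query from the question. MAX is an arbitrary choice here; it simply picks one Description per Table1 row.

```sql
-- One row per Table1 row; MAX keeps a single Description and hides the others.
SELECT Table1.Id AS Id, Table1.Name, MAX(Table2.Description) AS Description
FROM Table1
LEFT OUTER JOIN Table1Table2Map ON Table1Table2Map.Table1Id = Table1.Id
LEFT OUTER JOIN Table2 ON Table2.Id = Table1Table2Map.Table2Id
GROUP BY Table1.Id, Table1.Name;
```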
