Joined query producing more results compared to solo query - sql

I am performing the following query which has an inner join against another table.
select count(myTable.name)
from sch2.sample_detail as myTable
inner join sch1.otherTable as otherTable on myTable.name = otherTable.name
where otherTable.is_valid = 1
and myTable.name IS NOT NULL;
This produces a count of 4912304.
The following is a query just on a single table (my table).
SELECT COUNT(myTable.name)
from sch2.sample_detail as myTable
where myTable.name IS NOT NULL;
This produces a count of 2864654.
But how is this possible? Both queries have the clause where myTable.name IS NOT NULL.
Shouldn't the second query return the same count, or an even higher one, since it doesn't have the otherTable.is_valid = 1 clause?
Why does the inner join produce a higher count?
Please advise if there is something I should amend in the first query, thanks.

An inner, left, or cross join can duplicate rows. sch1.otherTable.name is not unique, and this causes row duplication: for each row in the left table, all matching rows from the right table are selected. This is normal join behavior.
To list the duplicated names, use the query below, then decide how to remove the duplicated rows: filter, DISTINCT, filter by row_number, etc.
select count(*) cnt,
name
from sch1.otherTable
group by name
having count(*)>1
order by cnt desc;
If you only need an existence check (and do not need to select columns from otherTable), use LEFT SEMI JOIN.
A subquery with DISTINCT can also be used to de-duplicate name before the join and filter:
select count(myTable.name)
from sch2.sample_detail as myTable
LEFT SEMI JOIN (select distinct name from sch1.otherTable otherTable where otherTable.is_valid = 1 ) as otherTable on myTable.name = otherTable.name
where myTable.name IS NOT NULL;
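The row multiplication described above can be reproduced on a toy dataset. This is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in engine, the table names sample_detail and other_table and all rows are made up, and EXISTS replaces LEFT SEMI JOIN because SQLite has no semi join syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sample_detail (name TEXT);
    CREATE TABLE other_table (name TEXT, is_valid INTEGER);
    -- three rows on the left, one of them with a NULL name
    INSERT INTO sample_detail VALUES ('a'), ('b'), (NULL);
    -- 'a' appears three times on the right; 'b' appears once but is not valid
    INSERT INTO other_table VALUES ('a', 1), ('a', 1), ('a', 1), ('b', 0);
""")


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Count on the single table.
solo_count = scalar(
    "SELECT COUNT(name) FROM sample_detail WHERE name IS NOT NULL")

# Inner join: the single 'a' row is repeated once per matching right-hand row.
joined_count = scalar("""
    SELECT COUNT(s.name)
    FROM sample_detail AS s
    INNER JOIN other_table AS o ON s.name = o.name
    WHERE o.is_valid = 1 AND s.name IS NOT NULL""")

# Existence check: each left-hand row is counted at most once.
exists_count = scalar("""
    SELECT COUNT(s.name)
    FROM sample_detail AS s
    WHERE s.name IS NOT NULL
      AND EXISTS (SELECT 1 FROM other_table AS o
                  WHERE o.name = s.name AND o.is_valid = 1)""")

print(solo_count, joined_count, exists_count)  # prints: 2 3 1
```

The joined count (3) exceeds the single-table count (2) even though the join also filters out 'b', which is the same effect as in the question.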

Related

PostgreSQL LEFT OUTER JOIN Conditionals not working

This LEFT OUTER JOIN with several conditionals is not working, it's probably something obvious. It is returning the result of all distinct sid and not performing conditionals at all.
SELECT
count(distinct student_status.sid)
FROM studentcoursedb.student_status
LEFT OUTER JOIN studentcoursedb.student_status AS t0
ON t0.sid = student_status.sid
AND t0.term < student_status.term
AND student_status.major LIKE 'ABC%';
The result, 32684 is the count of total distinct sids, the same value returned by this query:
select count(distinct sid)
from studentcoursedb.student_status;
The two queries
SELECT
count(distinct student_status.sid)
FROM studentcoursedb.student_status
LEFT OUTER JOIN studentcoursedb.student_status AS t0
ON t0.sid = student_status.sid
AND t0.term < student_status.term
AND student_status.major LIKE 'ABC%';
select count(distinct sid)
from studentcoursedb.student_status;
return the same count, as expected, because:
You are left joining the table to itself (LEFT JOIN and LEFT OUTER JOIN are the same). A left join never removes rows from the left table, so every sid of the main table is still present in the result.
If you want only the matching subset, use an INNER JOIN (or another join type).
You are counting a column from the left table that might have duplicate rows as a result of the LEFT JOIN, but certainly no filtered rows.
A LEFT OUTER JOIN keeps all rows in the first table along with matching rows in the second. Hence, it does not filter the first table. You are counting a column from the first table. So, the LEFT OUTER JOIN does not affect the distinct count.
If you want to filter rows, then use INNER JOIN instead. I would also move the conditions to the WHERE clause:
SELECT count(distinct ss.sid)
FROM studentcoursedb.student_status ss INNER JOIN
studentcoursedb.student_status ss2
ON ss2.sid = ss.sid
WHERE ss2.term < ss.term AND ss.major LIKE 'ABC%';
I should note that I don't think you need a self join. Have you considered:
select dense_rank() over (order by ss.term)
from studentcoursedb.student_status ss
where ss.major like 'ABC%';
Much simpler and should have better performance.
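To see the difference between conditions in the ON clause of a LEFT JOIN and an INNER JOIN with a WHERE filter, here is a small sketch. It assumes SQLite via Python's sqlite3 as a stand-in for PostgreSQL, and the five rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student_status (sid INTEGER, term INTEGER, major TEXT);
    INSERT INTO student_status VALUES
        (1, 1, 'ABC1'), (1, 2, 'ABC1'),   -- ABC major with an earlier term
        (2, 1, 'XYZ1'), (2, 2, 'XYZ1'),   -- different major
        (3, 1, 'ABC2');                   -- ABC major, only one term
""")

# All conditions live in ON: unmatched left rows are kept and padded with
# NULLs, so every sid survives.
left_join_count = conn.execute("""
    SELECT COUNT(DISTINCT ss.sid)
    FROM student_status AS ss
    LEFT OUTER JOIN student_status AS t0
      ON t0.sid = ss.sid
     AND t0.term < ss.term
     AND ss.major LIKE 'ABC%'
""").fetchone()[0]

# INNER JOIN plus WHERE: rows without a match are dropped.
inner_join_count = conn.execute("""
    SELECT COUNT(DISTINCT ss.sid)
    FROM student_status AS ss
    INNER JOIN student_status AS ss2 ON ss2.sid = ss.sid
    WHERE ss2.term < ss.term AND ss.major LIKE 'ABC%'
""").fetchone()[0]

print(left_join_count, inner_join_count)  # prints: 3 1
```

Only sid 1 has an ABC major and an earlier term, so the inner join counts 1, while the left join still counts all 3 students.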

Different way of writing this SQL query with partition

Hi I have the below query in Teradata. I have a row number partition and from that I want rows with rn=1. Teradata doesn't let me use the row number as a filter in the same query. I know that I can put the below into a subquery with a where rn=1 and it gives me what I need. But the below snippet needs to go into a larger query and I want to simplify it if possible.
Is there a different way of doing this so I get a table with 2 columns - one row per customer with the corresponding fc_id for the latest eff_to_dt?
select cust_grp_id, fc_id, row_number() over (partition by cust_grp_id order by eff_to_dt desc) as rn
from table1
Have you considered using the QUALIFY clause in your query?
SELECT cust_grp_id
, fc_id
FROM table1
QUALIFY ROW_NUMBER()
OVER (PARTITION BY cust_grp_id
ORDER BY eff_to_dt desc)
= 1;
Calculate MAX eff_to_dt for each cust_grp_id and then join result to main table.
SELECT T1.cust_grp_id,
T1.fc_id,
T1.eff_to_dt
FROM Table1 AS T1
JOIN
(SELECT cust_grp_id,
MAX(eff_to_dt) AS max_eff_to_dt
FROM Table1
GROUP BY cust_grp_id) AS T2 ON T2.cust_grp_id = T1.cust_grp_id
AND T2.max_eff_to_dt = T1.eff_to_dt
You can use a pair of JOINs to accomplish the same thing:
INNER JOIN My_Table T1 ON <some criteria>
LEFT OUTER JOIN My_Table T2 ON <some criteria> AND T2.eff_to_dt > T1.eff_to_dt
WHERE
T2.my_id IS NULL
You'll need to sort out the specific criteria for your larger query, but this is effectively JOINing all of the rows (T1), but then excluding any where a later row exists. In the WHERE clause you eliminate these by checking for a NULL value in a column that is NOT NULL (in this case I just assumed some ID value). The only way that would happen is if the LEFT OUTER JOIN on T2 failed to find a match - i.e. no rows later than the one that you want exist.
Also, whether or not the JOIN to T1 is LEFT OUTER or INNER is up to your specific requirements.
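The "exclude any row for which a later row exists" idea can be tried on a toy version of table1. This is a sketch under assumptions: SQLite via Python's sqlite3 stands in for Teradata, the three rows are invented, the self join is written out in full instead of being embedded in a larger query, and dates are stored as ISO strings so that string comparison matches date order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table1 (cust_grp_id INTEGER NOT NULL,
                         fc_id       TEXT    NOT NULL,
                         eff_to_dt   TEXT    NOT NULL);
    INSERT INTO table1 VALUES
        (1, 'fc_old',  '2020-01-31'),
        (1, 'fc_new',  '2021-06-30'),
        (2, 'fc_only', '2019-12-31');
""")

# t2 looks for a later row of the same customer group. Where none exists,
# the t2 columns come back NULL, and those are the rows we keep.
latest_rows = conn.execute("""
    SELECT t1.cust_grp_id, t1.fc_id
    FROM table1 AS t1
    LEFT OUTER JOIN table1 AS t2
      ON t2.cust_grp_id = t1.cust_grp_id
     AND t2.eff_to_dt > t1.eff_to_dt
    WHERE t2.cust_grp_id IS NULL
    ORDER BY t1.cust_grp_id
""").fetchall()

print(latest_rows)  # prints: [(1, 'fc_new'), (2, 'fc_only')]
```

Note that if two rows of a group share the same latest eff_to_dt, this pattern returns both of them, whereas ROW_NUMBER() = 1 picks exactly one.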

Count rows after joining three tables in PostgreSQL

Suppose I have three tables in PostgreSQL:
table1 - id1, a_id, updated_by_id
table2 - id2, a_id, updated_by_id
Users - id, display_name
Suppose I am using the following query:
select count(t1.id1) from table1 t1
left join table2 t2 on (t1.a_id=t2.a_id)
full outer join users u1 on (t1.updated_by_id=u1.id)
full outer join users u2 on (t2.updated_by_id=u2.id)
where u1.id=100;
I get 50 as count.
Whereas with:
select count(t1.id1) from table1 t1
left join table2 t2 on (t1.a_id=t2.a_id)
full outer join users u1 on (t1.updated_by_id=u1.id)
full outer join users u2 on (t2.updated_by_id=u2.id)
where u2.id=100;
I get only 25 as count.
What is my mistake in the second query? What can I do to get the same count?
My requirement is that there is a single user table, referenced by multiple tables. I want to take the complete list of users and get the count of ids from different tables.
Only the first table I joined returns the proper count; the others don't. Can anybody suggest a way to modify my second query to get the proper count?
To simplify your logic, aggregate first, join later.
Guessing missing details, this query would give you the exact count, how many times each user was referenced in table1 and table2 respectively for all users:
SELECT *
FROM users u
LEFT JOIN (
SELECT updated_by_id AS id, count(*) AS t1_ct
FROM table1
GROUP BY 1
) t1 USING (id)
LEFT JOIN (
SELECT updated_by_id AS id, count(*) AS t2_ct
FROM table2
GROUP BY 1
) t2 USING (id);
In particular, avoid multiple 1-n relationships multiplying each other when joined together:
Two SQL LEFT JOINS produce incorrect result
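A minimal sketch of that multiplication effect, assuming SQLite via Python's sqlite3 as a stand-in for PostgreSQL and a made-up user 100 who is referenced by 2 rows in table1 and 3 rows in table2:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, display_name TEXT);
    CREATE TABLE table1 (id1 INTEGER, updated_by_id INTEGER);
    CREATE TABLE table2 (id2 INTEGER, updated_by_id INTEGER);
    INSERT INTO users  VALUES (100, 'alice');
    INSERT INTO table1 VALUES (1, 100), (2, 100);            -- 2 references
    INSERT INTO table2 VALUES (1, 100), (2, 100), (3, 100);  -- 3 references
""")

# Join first, count later: the two 1-n joins multiply (2 x 3 = 6 rows),
# so both counts are inflated.
naive = conn.execute("""
    SELECT COUNT(t1.id1), COUNT(t2.id2)
    FROM users AS u
    LEFT JOIN table1 AS t1 ON t1.updated_by_id = u.id
    LEFT JOIN table2 AS t2 ON t2.updated_by_id = u.id
    WHERE u.id = 100
""").fetchone()

# Aggregate first, join later: each subquery yields one row per user.
aggregated = conn.execute("""
    SELECT t1.t1_ct, t2.t2_ct
    FROM users AS u
    LEFT JOIN (SELECT updated_by_id AS id, COUNT(*) AS t1_ct
               FROM table1 GROUP BY updated_by_id) AS t1 ON t1.id = u.id
    LEFT JOIN (SELECT updated_by_id AS id, COUNT(*) AS t2_ct
               FROM table2 GROUP BY updated_by_id) AS t2 ON t2.id = u.id
    WHERE u.id = 100
""").fetchone()

print(naive, aggregated)  # prints: (6, 6) (2, 3)
```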
To retrieve a single or few users only, LATERAL joins will be faster (Postgres 9.3+):
SELECT *
FROM users u
LEFT JOIN LATERAL (
SELECT count(*) AS t1_ct
FROM table1
WHERE updated_by_id = u.id
) t1 ON true
LEFT JOIN LATERAL (
SELECT count(*) AS t2_ct
FROM table2
WHERE updated_by_id = u.id
) t2 ON true
WHERE u.id = 100;
What is the difference between LATERAL JOIN and a subquery in PostgreSQL?
Explain perceived difference
The particular mismatch you report is due to the specifics of a FULL OUTER JOIN:
First, an inner join is performed. Then, for each row in T1 that does
not satisfy the join condition with any row in T2, a joined row is
added with null values in columns of T2. Also, for each row of T2 that
does not satisfy the join condition with any row in T1, a joined row
with null values in the columns of T1 is added.
So you get NULL values appended on the respective other side for missing matches. count() does not count NULL values. So you can get a different result depending on whether you filter on u1.id=100 or u2.id=100.
This is just to explain, you don't need a FULL JOIN here. Use the presented alternatives instead.

NOT IN converted to LEFT JOIN giving different result

Please help with the queries below.
select * from processed_h where c_type not in (select convert(int,n_index) from index_m where n_index <>'0') -- 902 rows
select * from processed_h where c_type not in (2001,2002,2003) -- 902 rows
select convert(int,n_index) from index_m where n_index <>'0' -- 2001,2002,2003
I tried to convert the NOT IN to a LEFT JOIN as below, but it returns 40,000+ rows. What am I doing wrong?
select A.* from processed_h A LEFT JOIN index_m B on A.c_type <> convert(int,B.n_index) and B.n_index <>'0' -- 40,000+ rows
A LEFT JOIN returns ALL rows from the "left-hand" table regardless of whether the condition matches or not, which is why you are getting the "extra" rows.
An INNER JOIN might give you the same number of rows, but if there are multiple matches in the "right-hand" table then you'll still get more rows than you expect.
If NOT IN gives you the expected results then I'd stick with that. You probably aren't going to see significant improvements with a join. The only reason I would change to an INNER JOIN is if I needed columns from the joined table in my output.
For the equivalent of a NOT IN using a left join, you need to link the tables as though the results in the linked table should be IN the resultset, then select only those records where the outer joined table did not return a record - like so:
select A.* from processed_h A
LEFT JOIN index_m B on A.c_type = convert(int,B.n_index) and B.n_index <>'0'
WHERE B.n_index IS NULL
However, you might get better performance using a NOT EXISTS query instead:
select A.* from processed_h A
where not exists
(select 1 from index_m B where B.n_index <>'0' and A.c_type = convert(int,B.n_index) )
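The three variants can be compared side by side on a small dataset. This sketch makes several assumptions: SQLite via Python's sqlite3 stands in for the original engine, CAST(... AS INTEGER) replaces convert(int, ...), an id column is added to processed_h for readability, and neither c_type nor n_index contains NULLs (with a NULL in the subquery, NOT IN would return no rows at all).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed_h (id INTEGER, c_type INTEGER);
    CREATE TABLE index_m (n_index TEXT);
    INSERT INTO processed_h VALUES (1, 2001), (2, 2002), (3, 5000), (4, 6000);
    INSERT INTO index_m VALUES ('0'), ('2001'), ('2002'), ('2003');
""")


def ids(sql):
    """Return the sorted list of ids produced by a query."""
    return sorted(row[0] for row in conn.execute(sql))


not_in = ids("""
    SELECT id FROM processed_h
    WHERE c_type NOT IN (SELECT CAST(n_index AS INTEGER)
                         FROM index_m WHERE n_index <> '0')""")

# Joining on <> pairs every left row with every right row it differs from.
wrong_join_rows = conn.execute("""
    SELECT COUNT(*) FROM processed_h AS a
    LEFT JOIN index_m AS b
      ON a.c_type <> CAST(b.n_index AS INTEGER) AND b.n_index <> '0'
""").fetchone()[0]

# Anti-join: join on equality, then keep only the rows that found no match.
anti_join = ids("""
    SELECT a.id FROM processed_h AS a
    LEFT JOIN index_m AS b
      ON a.c_type = CAST(b.n_index AS INTEGER) AND b.n_index <> '0'
    WHERE b.n_index IS NULL""")

not_exists = ids("""
    SELECT a.id FROM processed_h AS a
    WHERE NOT EXISTS (SELECT 1 FROM index_m AS b
                      WHERE b.n_index <> '0'
                        AND a.c_type = CAST(b.n_index AS INTEGER))""")

print(not_in, anti_join, not_exists, wrong_join_rows)
# prints: [3, 4] [3, 4] [3, 4] 10
```

Four input rows turn into 10 with the <> join, which mirrors the jump from 902 to 40,000+ rows in the question.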

Outer Join with Where returning Nulls

Hi, I have 2 tables. I want to list:
all records in table1 which are present in table2
all records in table2 which are not present in table1, with a where condition
Null rows will be returned for the table1 columns in the second case, but I am unable to get the query working correctly. It is only returning null rows.
SELECT
A.CLMSRNO,A.CLMPLANO,A.GENCURRCODE,A.CLMNETLOSSAMT,
A.CLMLOSSAMT,A.CLMCLAIMPRCLLOSSSHARE
FROM
PAKRE.CLMCLMENTRY A
RIGHT OUTER JOIN (
SELECT
B.CLMSRNO,B.UWADVICETYPE,B.UWADVICENO,B.UWADVPREMCURRCODE,
B.GENSUBBUSICLASS,B.UWADVICENET,B.UWADVICEKIND,B.UWADVYEAR,
B.UWADVQTR,B.ISMANUAL,B.UWCLMNOREFNO
FROM
PAKRE.UWADVICE B
WHERE
B.ISMANUAL=1
) r
ON a.CLMSRNO=r.CLMSRNO
ORDER BY
A.CLMSRNO DESC;
Which OS are you using?
Table aliases are case sensitive on some platforms, which is why your join condition ON a.CLMSRNO=r.CLMSRNO fails.
Try with A.CLMSRNO=r.CLMSRNO and see if that works
I'm not understanding your first attempt, but here's basically what you need, I think:
SELECT *
FROM TABLE1
INNER JOIN TABLE2
ON joincondition
UNION ALL
SELECT *
FROM TABLE2
LEFT JOIN TABLE1
ON joincondition
AND TABLE1.wherecondition
WHERE TABLE1.somejoincolumn IS NULL
I think you may want to remove the subquery and put its columns into the main query e.g.
SELECT A.CLMSRNO, A.CLMPLANO, A.GENCURRCODE, A.CLMNETLOSSAMT,
A.CLMLOSSAMT, A.CLMCLAIMPRCLLOSSSHARE,
B.CLMSRNO, B.UWADVICETYPE, B.UWADVICENO, B.UWADVPREMCURRCODE,
B.GENSUBBUSICLASS, B.UWADVICENET, B.UWADVICEKIND, B.UWADVYEAR,
B.UWADVQTR, B.ISMANUAL, B.UWCLMNOREFNO
FROM PAKRE.CLMCLMENTRY A
RIGHT OUTER JOIN PAKRE.UWADVICE B
ON A.CLMSRNO = B.CLMSRNO
WHERE B.ISMANUAL = 1
ORDER BY A.CLMSRNO DESC;