SQL Server double left join counts are different - sql

Code:
Select a.x,
a.y,
b.p,
c.i
from table1 a left join table2 b on a.z=b.z
left join table3 on a.z=c.z;
When I am using the above code I am not getting the correct counts:
Table1 has 30 records.
After first left join I get 30 records but after 2nd left join I am getting 33 records.
I am having hard time figuring out why I am getting different counts. According to my understanding I should be getting 30 counts even after the 2nd left join.
Can anyone help me understand this difference?
I am using sql server 2012

There are multiple rows in table3 with the same z value.
You can find them by doing:
select z, count(*)
from table3
group by z
having count(*) >= 2
order by count(*) desc;
If you want at most one match, then outer apply can be useful:
Select a.x, a.y, b.p, c.i
from table1 a outer apply
(select top 1 b.*
from table2 b
where a.z = b.z
) b outer apply
(select top 1 c.*
from table3 c
where a.z = c.z
) c;
Of course, top 1 should be used with order by, but I don't know which row you want. And, this is probably a stop-gap; you should figure out why there are duplicates.

In your table table3 contain more then 1 row per 1 row in table1. Check one value which is occured more times in both tables.
You can use group by with max function to make one to one row.

Related

Counting using JOIN in Bigquery

I have two sets A and B. I want to display counts of A as well as counts on A (intersection) B using condition X.
Code I am using
SELECT COUNT(A) as total, COUNT(IF (condition_X)) as chg
FROM A
FULL OUTER JOIN B
ON JOIN KEY Y
I am able to get the intersection but not the count of A in total.
Perhaps you just want a cross join?
select *
from (select count(*) as cnt_a from a) a cross join
(select count(*) as cnt_b
from a join
b
on y
where condition
) b
Simply left join the two
Select count(A.id=B.id),
count(A.id)
from A left join B on A.id=B.id
where condition='x'

Query Optimization : using (Union instead of OR) and (exists instead of null)

i have a Query optimisation issue.
for the context, this query has always been running instantly
but today it took way more time. (3h+)
so i tried to fix it.
The query is Like -->
Select someCols from A
inner join B left join C
Where A.date = Today
And (A.col In ( Select Z.colseekedinA from tab Z) --A.col is the same column for
-- than below
OR
A.col In ( Select X.colseekedinA from tab X)
)
-- PART 1 ---
Select someCols from A
inner join B left join C -- takes 1 second 150 lines
Where A.date = Today
-- Part 2 ---
Select Z.colseekedinA from tab Z
OR -- Union -- takes 1 seconds 180 lines
Select X.colseekedinA from tab X
When i join now the two parts with the In, the query becomes incredibly long.
so i optimized it using union instead or OR and exists instead of in
but it still takes 3 minutes
i want to get it done again down to 5 seconds.
do you see some query issue ?
thank you
Using Union and Exists
Select someCols
from A
inner join B on a.col = b.col
left join C on b.col = c.col
Where A.date = Today
and exists(
Select Z.colseekedinA from tab Z where Z.colseekedinA = A.col
Union
Select X.colseekedinA from tab X where x.colseekedinA = A.col )
Also, if possible change below join to Left join.
inner join B on a.col = b.col
The exists approach may give spurious results as you will get rows that do not match either condition just if 1 row does match. This might be avoided by using exists within a correlated subquery but it isn't something I have experimented with enough to recommend.
For speed I'd go for a cross apply and specify the parent table within the cross apply expression (correlated subquery to create a derived table). That way the join condition is specified before the data is returned, if the columns in question have indexes on them (i.e. they are primary keys) then the optimiser can work out an efficient plan for this.
Union all is used within the cross apply expression as this prevents a distinct sort within the derived table which is generally heavier in terms of cost than bringing the data itself back (union has to identify all rows anyway including duplications).
Finally if this is still slow then potentially you might want to add an index to the date column in table a. This overcomes the lack of sargability inherent in a date column and means the optimiser can leverage the index rather than scanning all of the rows in the result set and testing whether or not the date equals today.
Select someCols from A
inner join B left join C
cross apply (Select Z.colseekedinA from tab Z where a.col=z.colseekedinA
union all
Select X.colseekedinA from tab X where a.col=x.colseekedina) d
Where A.date = Today
You code is confused but for the first part
You could try using a select UNION for the inner subquery ( these with OR )
and avoid the IN clause using a inner JOIN
Select someCols from A
inner join B
left join C
INNER JOIN (
Select Z.colseekedinA from tab Z
UNION
Select X.colseekedinA from tab X
) t on A.col = t.colseekedinA
Where A.date = Today

Query left join without all the right rows from B table

I have 2 tables, A and B.
I need all columns from A + 1 column from B in my select.
Unfortunately, B has multiples rows(all identicals) for 1 row in A
on the join condition.
I tried but I can't isolate one row in A for one row in B with left join for example while keeping my select.
How can I do this query ? Query in ORACLE SQL
Thanks in advance.
This is a good use for outer apply. The structure of the query looks like this:
select a.*, b.col
from a outer apply
(select top 1 b.col
from b
where b.? = a.?
) b;
Normally, you would only use top 1 with order by. In this case, it doesn't seem to make a difference which row you choose.
You can group by on all columns from A, and then use an aggregate (like max or min) to pick any of the identical B values:
select a.*
, b.min_col1
from TableA a
left join
(
select a_id
, min(col1) as min_col1
from TableB
group by
a_id
) b
on b.a_id = a.id

BigQuery Full outer join producing "left join" results

I have 2 tables, both of which contain distinct id values. Some of the id values might occur in both tables and some are unique to each table. Table1 has 10,910 rows and Table2 has 11,304 rows
When running a left join query:
SELECT COUNT(DISTINCT a.id)
FROM table1 a
JOIN table2 b on a.id = b.id
I get a total of 10,896 rows or 10,896 ids shared across both tables.
However, when I run a FULL OUTER JOIN on the 2 tables like this:
SELECT COUNT(DISTINCT a.id)
FROM table1 a
FULL OUTER JOIN EACH table2 b on a.id = b.id
I get total of 10,896 rows, but I was expecting all 10,910 rows from table1.
I am wondering if there is an issue with my query syntax.
As you are using EACH - it looks like you are running your queries in Legacy SQL mode.
In BigQuery Legacy SQL - COUNT(DISTINCT) function is probabilistic - gives statistical approximation and is not guaranteed to be exact.
You can use EXACT_COUNT_DISTINCT() function instead - this one gives you exact number but a little more expensive on back-end
Even better option - just use Standard SQL
For your specific query you will only need to remove EACH keyword and it should work as a charm
#standardSQL
SELECT COUNT(DISTINCT a.id)
FROM table1 a
JOIN table2 b on a.id = b.id
and
#standardSQL
SELECT COUNT(DISTINCT a.id)
FROM table1 a
FULL OUTER JOIN table2 b on a.id = b.id
I added the original query as a subquery and counted ids and produced the expected results. Still a little strange, but it works.
SELECT EXACT_COUNT_DISTINCT(a.id)
FROM
(SELECT a.id AS a.id,
b.id AS b.id
FROM table1 a FULL OUTER JOIN EACH table2 b on a.id = b.id))
It is because you count in both case the number of non-null lines for table a by using a count(distinct a.id).
Use a count(*) and it should works.
You will have to add coalesce... BigQuery, unlike traditional SQL does not recognize fields unless used explicitly
SELECT COUNT(DISTINCT coalesce(a.id,b.id))
FROM table1 a
FULL OUTER JOIN EACH table2 b on a.id = b.id
This query will now take full effect of full outer join :)

How to select records from a Table that has a certain number of rows in a related table in SQL Server?

Not quite sure how to ask this, but I have 2 tables that are related in a 1 to many relationship, I need to select all records in the "1" table that have less than three records in the "many' table.
select b.foreignkey,count(b.foreignkey) as bidcount
from b
where b.foreignkey in (select a.id from a) and bidcount< 3
group by b.foreignkey
this doesn't work at all I know but I am at a loss how to do this.
I need to in the end select all the records from the "a" table based on this criteria. Sorry if that is confusing!
Just using your code, not tested:
SELECT
b.foreignkey,
count(b.foreignkey) as bidcount
FROM
b
WHERE
b.foreignkey IN (SELECT a.id FROM a)
GROUP BY
b.foreignkey
HAVING
count(b.foreignkey) < 3
Try this:
SELECT t1.id,COUNT(t2.parentId)
FROM table1 as t1
INNER JOIN table2 as t2
ON t1.id = t2.parentId
GROUP BY t1.id
HAVING COUNT(t2.parentId) < 3
You didn't mention which version of SQL Server you're using - if you're on SQL Server 2005 or newer, you could use this CTE (Common Table Expression):
;WITH ChildRows AS
(
SELECT A.Id, COUNT(b.Id) AS 'BCount'
FROM
dbo.TableA A
INNER JOIN
dbo.TableB B ON B.TableAId = A.Id
)
SELECT A.*, R.BCount
FROM dbo.TableA A
INNER JOIN ChildRows R ON A.Id = R.Id
The inner SELECT lists the Id columns from TableA and the count of the child rows associated with those (using the INNER JOIN to TableB) - and the outer SELECT just builds on top of that result set and shows all fields from table A (and the count from the B table)
if you want to return all fields of your (1) table in one query, I suggest you consider using CROSS APPLY:
SELECT t1.* FROM table_1 t1
CROSS APPLY (SELECT COUNT(*) cnt FROM Table_Many t2 WHERE t2.fk = t1.pk) a
where a.cnt < 3
in some particular cases, based on your indices and db structure, this query may run 4 times faster than the GROUP BY method
you have posted this question in sql server, I have a answer in oracle database system (don't know whether it will run in sql server as well or not)
this is as follow-
select [desired column list] from
(select b.*, count(*) over (partition by b.foreignkey) c_1
from b
where b.foreignkey in (select a.id from a) )
where c_1 < 3 ;
i hope it should work on sql server as well...
if not please let me update ..