Having trouble understanding which table count(*) is actually counting when joining two tables together - sql

If I have the following code, which table's rows is count(*) counting?
SELECT COUNT(*)
FROM region r
JOIN sales_rep s
ON r.id = s.region_id
Thanks a lot!!!

It is counting the rows from your join query as a whole. It is the same as:
SELECT COUNT(*)
FROM
(SELECT * FROM region r
JOIN sales_rep s
ON r.id = s.region_id) AS joined

It counts all the rows of the table that results from the join. The number depends on the relationship between region and sales_rep. Assuming that in your domain a region has zero or more sales reps and every sales rep belongs to exactly one region, the count will always equal the number of rows in sales_rep. For example:
regions: (id)    sales_rep: (id, reg_id)
A                1   A
B                2   A
C                3   B
                 4   C
count(*) = 4
On the other hand, if a sales_rep can have no region, then the ones with no region are excluded:
regions: (id)    sales_rep: (id, reg_id)
A                1   A
B                2   null
C                3   B
                 4   C
count(*) = 3
Or, if a sales_rep can belong to more than one region (one row per region), every rep is still counted as in the first case, but once per region, so the count is higher:
regions: (id)    sales_rep: (id, reg_id)
A                1   A
B                1   B
C                2   A
                 3   B
                 4   C
count(*) = 5
I'd definitely recommend running the query with SELECT * first so you can see for yourself what's happening, and then changing it to COUNT(*). Additionally, please refer to this article to better understand joins: https://www.w3schools.com/sql/sql_join.asp
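To check the three cases above for yourself, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. SQLite is an assumption (the question does not name a DBMS), and the helper name count_joined_rows is made up for illustration.

```python
import sqlite3


def count_joined_rows(reps):
    """Return COUNT(*) of region JOIN sales_rep for the given (id, region_id) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE region (id TEXT)")
    conn.execute("CREATE TABLE sales_rep (id INTEGER, region_id TEXT)")
    conn.executemany("INSERT INTO region VALUES (?)", [("A",), ("B",), ("C",)])
    conn.executemany("INSERT INTO sales_rep VALUES (?, ?)", reps)
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM region r JOIN sales_rep s ON r.id = s.region_id"
    ).fetchone()
    conn.close()
    return count


# Case 1: every rep has exactly one region.
one_region_each = count_joined_rows([(1, "A"), (2, "A"), (3, "B"), (4, "C")])
# Case 2: rep 2 has no region (NULL never matches in an inner join).
one_without_region = count_joined_rows([(1, "A"), (2, None), (3, "B"), (4, "C")])
# Case 3: rep 1 is listed under two regions.
rep_in_two_regions = count_joined_rows(
    [(1, "A"), (1, "B"), (2, "A"), (3, "B"), (4, "C")]
)

print(one_region_each, one_without_region, rep_in_two_regions)  # 4 3 5
```

The count is a property of the joined result, not of either input table, which is why it changes with the data even though region always has three rows.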

Related

SQL - Count rows based on matching columns and value range

Please see below query using T-SQL with SSMS. There are three tables: B, G and L.
B has a column Bname
G has 2 columns Gname, Gross
L has 2 columns Bname, Gname
The Gross column is an INT with values ranging between 80 and 100.
Each row of table L pairs a Bname from table B with a Gname from table G. I would like to count each such row as one item, but only if the Gross for the corresponding Gname row in table G is between 80 and 100.
My current query reads:
SELECT l.bname, (SELECT COUNT(*) FROM g WHERE g.gross BETWEEN 80 AND 90) AS Good
FROM l
INNER JOIN b
ON b.bname=l.bname
INNER JOIN g
ON g.gname=l.gname
GROUP BY l.bname;
The result is nearly there, but it counts all Table G:Gname rows between 80 and 100, ignoring whether the Bname and Gname are on the same row in Table L.
Thanks in advance for looking.
I suspect that you want:
SELECT l.bname,
(SELECT COUNT(*)
FROM b INNER JOIN
g
ON g.gname = l.gname
WHERE b.bname = l.bname AND g.gross BETWEEN 80 AND 90
) AS Good
FROM l ;
The outer aggregation is not needed if l.bname is unique.
This would more commonly be calculated using conditional aggregation:
SELECT l.bname,
SUM(CASE WHEN g.gross BETWEEN 80 AND 90 THEN 1 ELSE 0 END) AS Good
FROM l INNER JOIN
b
ON b.bname = l.bname INNER JOIN
g
ON g.gname = l.gname
GROUP BY l.bname;
No subquery is needed.
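A small sketch of why the original query over-counts and what conditional aggregation returns instead. It uses Python's sqlite3 module rather than T-SQL, and the rows in b, g and l are invented sample data, not taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE b (bname TEXT);
    CREATE TABLE g (gname TEXT, gross INTEGER);
    CREATE TABLE l (bname TEXT, gname TEXT);
    -- Hypothetical sample data for illustration only.
    INSERT INTO b VALUES ('b1'), ('b2');
    INSERT INTO g VALUES ('g1', 85), ('g2', 95), ('g3', 88), ('g4', 82);
    INSERT INTO l VALUES ('b1', 'g1'), ('b1', 'g2'), ('b1', 'g3'),
                         ('b2', 'g2'), ('b2', 'g4');
""")

# The question's query: the subquery is not tied to the outer row,
# so every bname gets the total number of matching rows in g.
uncorrelated = conn.execute("""
    SELECT l.bname,
           (SELECT COUNT(*) FROM g WHERE g.gross BETWEEN 80 AND 90) AS good
    FROM l
    INNER JOIN b ON b.bname = l.bname
    INNER JOIN g ON g.gname = l.gname
    GROUP BY l.bname
    ORDER BY l.bname
""").fetchall()

# Conditional aggregation: only rows joined to this bname are counted.
conditional = conn.execute("""
    SELECT l.bname,
           SUM(CASE WHEN g.gross BETWEEN 80 AND 90 THEN 1 ELSE 0 END) AS good
    FROM l
    INNER JOIN b ON b.bname = l.bname
    INNER JOIN g ON g.gname = l.gname
    GROUP BY l.bname
    ORDER BY l.bname
""").fetchall()

print(uncorrelated)  # [('b1', 3), ('b2', 3)]
print(conditional)   # [('b1', 2), ('b2', 1)]
```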

How to assign zero(0) when the avg of a particular field is null in PostgreSQL

I have two tables:
Table user_ratings
id home_info_id ratings
1 1 3.5
2 2 3.5
3 1 4
4 1 5
5 1 2
6 2 1
7 2 4
Table home_info:
id home_name
1 my_home
2 ur_home
3 his_home
As you can see, 'my_home' and 'ur_home' have ratings but 'his_home' is not rated yet. I am calculating the average rating of all homes, but I only get averages for the two rated homes; 'his_home' does not appear in the result of my query below. I also want the homes which are not rated yet to appear in the result. Here is my query:
select u.home_info_id
, avg(u.ratings)
, h.home_name
from user_ratings u
, home_info h
where h.id = u.home_info_id
group by u.home_info_id
, h.home_name;
The output is something like this:
home_info_id ratings home_name
1 4.83 my_home
2 2.83 ur_home
But I want something like this:
home_info_id ratings home_name
1 4.83 my_home
2 2.83 ur_home
3 0 his_home
You can use COALESCE with LEFT JOIN (instead of implicit INNER JOIN):
select h.id
, coalesce(avg(u.ratings), 0)
, h.home_name
from home_info h
left join user_ratings u on h.id = u.home_info_id
group by h.id
, h.home_name
When scanning the whole table or most of it, it is cheaper to aggregate before you join:
SELECT h.id, h.home_name
, COALESCE(u.avg_rating, 0) AS avg_rating
FROM home_info h
LEFT JOIN (
SELECT home_info_id AS id, avg(ratings) AS avg_rating
FROM user_ratings
GROUP BY 1
) u USING (id);
Test with EXPLAIN ANALYZE.
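For a quick check of the COALESCE plus LEFT JOIN approach, here is a sketch that loads the sample rows from the question into an in-memory SQLite database via Python's sqlite3 module. SQLite stands in for PostgreSQL here, which is an assumption; the query itself uses only syntax both accept. With this sample data, the average for my_home works out to 3.625.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE home_info (id INTEGER, home_name TEXT);
    CREATE TABLE user_ratings (id INTEGER, home_info_id INTEGER, ratings REAL);
    INSERT INTO home_info VALUES (1, 'my_home'), (2, 'ur_home'), (3, 'his_home');
    INSERT INTO user_ratings VALUES
        (1, 1, 3.5), (2, 2, 3.5), (3, 1, 4), (4, 1, 5),
        (5, 1, 2), (6, 2, 1), (7, 2, 4);
""")

# LEFT JOIN keeps homes without ratings; AVG over no rows is NULL,
# and COALESCE turns that NULL into 0.
rows = conn.execute("""
    SELECT h.id, COALESCE(AVG(u.ratings), 0) AS avg_rating, h.home_name
    FROM home_info h
    LEFT JOIN user_ratings u ON h.id = u.home_info_id
    GROUP BY h.id, h.home_name
    ORDER BY h.id
""").fetchall()

for home_id, avg_rating, home_name in rows:
    print(home_id, round(avg_rating, 2), home_name)
```

The unrated home now appears as the third row with an average of 0 instead of being dropped by the inner join.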

SQL joining two tables with common row

I have 2 tables in sybase
Account_table
Id account_code
1 A
2 B
3 C
Associate_table
id account_code
1 A
1 B
1 C
2 A
2 B
3 A
3 C
I have this sql query
SELECT * FROM account_table account, associate_table assoc
WHERE account.account_code = assoc.account_code
This query will return 7 rows. What I want is to return the rows from associate_table that is only common to the 3 accounts like this:
account id account_code Assoc Id
1 A 1
2 B 1
3 C 1
Can anyone help what kind of join should I do?
SELECT b.id account_id, a.account_code account_code, a.id assoc_id
FROM associate_table a,
account_table b
WHERE a.account_code = b.account_code
AND a.id IN (SELECT a.id
FROM associate_table a,
account_table b
WHERE a.account_code = b.account_code
GROUP BY a.id
HAVING Count(*) = (SELECT Count(*)
FROM account_table));
NOTE: this query works only if the Id and account_code columns in account_table hold unique values. Also, associate_table should contain only unique combinations of (id, account_code), i.e., it should not contain (1, A) or any other pair twice.
Try this
SELECT AC.ID,AC.account_code,ASS.ID
FROM account_table AC INNER JOIN associate_table AS ASS ON AC.account_code = ASS.account_code
An answer has already been accepted, but I'll post a simpler one:
SELECT *
FROM account_table AS account,
associate_table AS assoc
WHERE account.account_code = assoc.account_code
AND (
SELECT
COUNT(*)
FROM associate_table assoc_2
WHERE assoc_2.id = assoc.id
) = 3
Here 3 is the number of codes in the account table. If that number is going to be dynamic (changing over time),
you can use (SELECT COUNT(*) FROM account_table) instead of the exact number. Since that subquery is uncorrelated, the database engine will most likely evaluate it only once, so it should cost few extra resources.
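The "associate linked to every account" idea can be tried out with the sample rows from the question. This is only a sketch: it runs on SQLite through Python's sqlite3 module rather than Sybase, uses explicit JOIN syntax, and relies on the uniqueness assumptions mentioned in the note above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account_table (id INTEGER, account_code TEXT);
    CREATE TABLE associate_table (id INTEGER, account_code TEXT);
    INSERT INTO account_table VALUES (1, 'A'), (2, 'B'), (3, 'C');
    INSERT INTO associate_table VALUES
        (1, 'A'), (1, 'B'), (1, 'C'), (2, 'A'), (2, 'B'), (3, 'A'), (3, 'C');
""")

# Keep only associates whose number of matching accounts equals
# the total number of accounts.
rows = conn.execute("""
    SELECT account.id, account.account_code, assoc.id
    FROM account_table AS account
    JOIN associate_table AS assoc
      ON account.account_code = assoc.account_code
    WHERE assoc.id IN (
        SELECT a.id
        FROM associate_table AS a
        JOIN account_table AS b ON a.account_code = b.account_code
        GROUP BY a.id
        HAVING COUNT(*) = (SELECT COUNT(*) FROM account_table)
    )
    ORDER BY account.id
""").fetchall()

print(rows)  # [(1, 'A', 1), (2, 'B', 1), (3, 'C', 1)]
```

Only associate 1 is linked to all three account codes, so only its three rows survive.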

I need a SQL query for comparing column values against rows in the same table

I have a table called BB_BOATBKG which holds passengers travel details with columns Z_ID, BK_KEY and PAXSUM where:
Z_ID = BookingNumber* LegNumber
BK_KEY = BookingNumber
PAXSUM = Total number passengers travelled in each leg for a particular booking
For Example:
Z_ID BK_KEY PAXSUM
001234*01 001234 2
001234*02 001234 3
001287*01 001287 5
001287*02 001287 5
002323*01 002323 7
002323*02 002323 6
I would like to get a list of all booking numbers (BK_KEY) from BB_BOATBKG where the total number of passengers (PAXSUM) differs between the legs of the same booking.
For example, for booking number A, A*Leg01 might have 2 passengers while A*Leg02 has 3 passengers.
Depending on your RDBMS, there might be several options available. A solution that should work for most is:
SELECT A.Z_ID, A.BK_KEY, A.PAXSUM
FROM BB_BOATBKG A
JOIN (
SELECT BK_KEY
FROM BB_BOATBKG
GROUP BY BK_KEY
HAVING COUNT( DISTINCT PAXSUM ) > 1
) B
ON A.BK_KEY = B.BK_KEY
If your DBMS supports OLAP functions, have a look at RANK() OVER (...).
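As a quick sanity check of the GROUP BY / HAVING COUNT(DISTINCT ...) idea, here is a sketch that loads the example rows from the question into an in-memory SQLite database using Python's sqlite3 module. SQLite is an assumption, since the question does not say which RDBMS is in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE BB_BOATBKG (Z_ID TEXT, BK_KEY TEXT, PAXSUM INTEGER);
    INSERT INTO BB_BOATBKG VALUES
        ('001234*01', '001234', 2),
        ('001234*02', '001234', 3),
        ('001287*01', '001287', 5),
        ('001287*02', '001287', 5),
        ('002323*01', '002323', 7),
        ('002323*02', '002323', 6);
""")

# A booking with more than one distinct PAXSUM has legs that disagree.
mismatched = conn.execute("""
    SELECT BK_KEY
    FROM BB_BOATBKG
    GROUP BY BK_KEY
    HAVING COUNT(DISTINCT PAXSUM) > 1
    ORDER BY BK_KEY
""").fetchall()

print(mismatched)  # [('001234',), ('002323',)]
```

Booking 001287 is left out because both of its legs carry 5 passengers.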
It's a little counterintuitive, but you could join the table to itself on {BK_KEY, PAXSUM} and pull out only the records whose joined result is null.
I think this does it:
SELECT
a.BK_KEY
FROM
BB_BOATBKG a
LEFT OUTER JOIN BB_BOATBKG b ON a.BK_KEY = b.BK_KEY AND a.PAXSUM = b.PAXSUM
WHERE
b.Z_ID IS NULL
GROUP BY
a.BK_KEY
Edit: I think the query above misses anything beyond the trivial case. I think you can do it with some really nasty subselecting though, a la:
SELECT
b.BK_KEY
FROM
(
SELECT
a.BK_KEY,
Count = COUNT(*)
FROM
(
SELECT
a.BK_KEY,
a.PAXSUM
FROM
BB_BOATBKG a
GROUP BY
a.BK_KEY,
a.PAXSUM
HAVING
COUNT(*) = 1
) a
GROUP BY
a.BK_KEY
) b
INNER JOIN
(
SELECT
c.BK_KEY,
Count = COUNT(*)
FROM
BB_BOATBKG c
GROUP BY
c.BK_KEY
) c ON b.BK_KEY = c.BK_KEY AND b.Count = c.Count

Joining 3 Tables Using Newest Rows

I have 3 tables in my database: children, families, statuslog
Every time a child is checked in or out of the database, it is recorded in the statuslog. I did this a long time ago, but I can't seem to figure out how to do it anymore. I want to create a new view that joins all 3 tables, but I only want the newest entry from statuslog (by using the highest id).
For example, statuslog looks like this:
childID researcher status id
1 Dr. A out 1
1 Dr. A in 2
1 Dr. B out 3
1 Dr. B in 4
2 Dr. C out 5
2 Dr. C in 6
3 Dr. B out 7
3 Dr. B in 8
This is what I want to do:
SELECT *
FROM children, families, statuslog
WHERE children.familyID = families.familyID AND children.childID = statuslog.childID
Obviously, this will return the children+families tuples coupled with every single log entry, but I can't remember how to only combine it with the newest log entry.
Any help would be appreciated!
An aggregate query with max(id) retrieves the last ID for each childID. This is then joined back to statuslog to retrieve the other columns.
SELECT *
FROM children
INNER JOIN families
ON children.familyID = families.familyID
INNER JOIN
(
SELECT statuslog.childID, researcher, status
FROM statuslog
INNER JOIN
(
SELECT childID, max(ID) ID
FROM statuslog
GROUP BY childID
) lastSL
ON statuslog.childID = lastSL.childid
AND statuslog.ID = lastSL.ID
) sl
ON children.childID = sl.childID
This seems to be the typical greatest-n-per-group problem, in which the highest id is interpreted as the newest. This query should do the trick:
select * from (
select s1.* from statusLog s1
left join statusLog s2
on s1.childId = s2.childId and s1.id < s2.id
where s2.id is null
) final
join children c on c.childId = final.childId
join families f on f.familyId = c.familyId
Correct any syntactical errors.
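Both approaches (the MAX(id) subquery and the self left join) can be compared on the statuslog rows from the question. This sketch uses Python's sqlite3 module with an in-memory SQLite database, which is an assumption about the DBMS; the children and families tables are left out because the question does not show their columns beyond the join keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE statuslog (childID INTEGER, researcher TEXT, status TEXT, id INTEGER);
    INSERT INTO statuslog VALUES
        (1, 'Dr. A', 'out', 1), (1, 'Dr. A', 'in', 2),
        (1, 'Dr. B', 'out', 3), (1, 'Dr. B', 'in', 4),
        (2, 'Dr. C', 'out', 5), (2, 'Dr. C', 'in', 6),
        (3, 'Dr. B', 'out', 7), (3, 'Dr. B', 'in', 8);
""")

# Approach 1: find MAX(id) per child, then join back for the other columns.
latest_by_max = conn.execute("""
    SELECT s.childID, s.researcher, s.status, s.id
    FROM statuslog s
    JOIN (SELECT childID, MAX(id) AS id
          FROM statuslog
          GROUP BY childID) last
      ON s.childID = last.childID AND s.id = last.id
    ORDER BY s.childID
""").fetchall()

# Approach 2: keep rows for which no newer row exists for the same child.
latest_by_anti_join = conn.execute("""
    SELECT s1.childID, s1.researcher, s1.status, s1.id
    FROM statuslog s1
    LEFT JOIN statuslog s2
      ON s1.childID = s2.childID AND s1.id < s2.id
    WHERE s2.id IS NULL
    ORDER BY s1.childID
""").fetchall()

print(latest_by_max)
# [(1, 'Dr. B', 'in', 4), (2, 'Dr. C', 'in', 6), (3, 'Dr. B', 'in', 8)]
print(latest_by_max == latest_by_anti_join)  # True
```

Either result can then be joined to children and families on childID and familyID as in the answers above.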