Counting rows in multiple tables - sql

I have a MySQL database that tracks hockey stats. In one query, I'd like to get the number of goals and assists scored by each player, as well as the number of games they've played in. I'm using Zend Framework, and the query that I've built is this:
SELECT `p`.*,
`pxt`.`jersey_number`,
count(pxg.player_x_game_id) AS `games`,
count(goals.scoring_id) AS `goals`,
count(assists.scoring_id) AS `assists`
FROM `players` AS `p`
INNER JOIN `players_x_teams` AS `pxt` ON p.player_id = pxt.player_id
INNER JOIN `teams_x_seasons` AS `txs` ON pxt.team_id = txs.team_id
INNER JOIN `seasons` AS `s` ON txs.season_id = s.season_id
INNER JOIN `games` AS `g` ON g.season_id = s.season_id
INNER JOIN `players_x_games` AS `pxg` ON pxg.game_id = g.game_id
AND pxg.player_id = p.player_id
LEFT JOIN `scoring` AS `goals` ON goals.game_id = g.game_id
AND goals.scorer_id = p.player_id
LEFT JOIN `scoring` AS `assists` ON assists.game_id = g.game_id
AND (assists.assist1_id = p.player_id OR assists.assist2_id = p.player_id)
WHERE (pxt.team_id = 1)
AND (txs.season_id = '23')
AND (pxt.date_added <= s.end_date OR pxt.date_added is null)
AND (pxt.date_removed >= s.start_date OR pxt.date_removed is null)
GROUP BY `p`.`player_id`
This query returns data, but my counts are off:
+-----------+---------------+-------+-------+---------+
| player_id | jersey_number | games | goals | assists |
+-----------+---------------+-------+-------+---------+
| 2 | 3 | 7 | 1 | 3 |
| 3 | 19 | 6 | 1 | 0 |
| 8 | 8 | 7 | 3 | 2 |
| 9 | 11 | 13 | 10 | 8 |
| 11 | 96 | 6 | 1 | 3 |
| 12 | 14 | 6 | 0 | 3 |
| 13 | 7 | 6 | 0 | 1 |
| 115 | 39 | 9 | 6 | 2 |
| 142 | 68 | 6 | 0 | 1 |
| 143 | 30 | 6 | 0 | 0 |
| 150 | 41 | 11 | 11 | 5 |
| 185 | 17 | 6 | 6 | 3 |
| 225 | 97 | 4 | 1 | 3 |
+-----------+---------------+-------+-------+---------+
In this dataset the most games that should be present are 6, but as you can see I'm getting extras. If I adjust my query to remove the goals and assists fields my games count comes out correct. In fact if I only select one of my counted rows I always get the correct counts, but once I add a second or third count my numbers start to get skewed. What am I doing wrong?

Each of your joins can match several rows, and every extra match multiplies the rows carried into the next join, so each count sees duplicates. You'll need to add DISTINCT inside each count. Try this:
SELECT `p`.*,
`pxt`.`jersey_number`,
count(distinct pxg.player_x_game_id) AS `games`,
count(distinct goals.scoring_id) AS `goals`,
count(distinct assists.scoring_id) AS `assists`
FROM `players` AS `p`
INNER JOIN `players_x_teams` AS `pxt` ON p.player_id = pxt.player_id
INNER JOIN `teams_x_seasons` AS `txs` ON pxt.team_id = txs.team_id
INNER JOIN `seasons` AS `s` ON txs.season_id = s.season_id
INNER JOIN `games` AS `g` ON g.season_id = s.season_id
INNER JOIN `players_x_games` AS `pxg` ON pxg.game_id = g.game_id
AND pxg.player_id = p.player_id
LEFT JOIN `scoring` AS `goals` ON goals.game_id = g.game_id
AND goals.scorer_id = p.player_id
LEFT JOIN `scoring` AS `assists` ON assists.game_id = g.game_id
AND (assists.assist1_id = p.player_id OR assists.assist2_id = p.player_id)
WHERE (pxt.team_id = 1)
AND (txs.season_id = '23')
AND (pxt.date_added <= s.end_date OR pxt.date_added is null)
AND (pxt.date_removed >= s.start_date OR pxt.date_removed is null)
GROUP BY `p`.`player_id`
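To see the row multiplication in isolation, here is a minimal sketch using Python's sqlite3 module rather than MySQL. The two tables are cut down to the columns the joins need, and the rows are invented for illustration: one player who played two games, with two goals and two assists in the first game.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE players_x_games (player_x_game_id INTEGER PRIMARY KEY,
                              player_id INT, game_id INT);
CREATE TABLE scoring (scoring_id INTEGER PRIMARY KEY, game_id INT,
                      scorer_id INT, assist1_id INT, assist2_id INT);
-- hypothetical data: player 1 appears in games 1 and 2
INSERT INTO players_x_games VALUES (1, 1, 1), (2, 1, 2);
-- in game 1, player 1 scores twice (rows 1, 2) and assists twice (rows 3, 4)
INSERT INTO scoring VALUES
  (1, 1, 1, 2, NULL),
  (2, 1, 1, NULL, NULL),
  (3, 1, 2, 1, NULL),
  (4, 1, 3, 2, 1);
""")

# {d} is replaced by either nothing or the DISTINCT keyword
QUERY = """
SELECT count({d} pxg.player_x_game_id) AS games,
       count({d} goals.scoring_id)     AS goals,
       count({d} assists.scoring_id)   AS assists
FROM players_x_games AS pxg
LEFT JOIN scoring AS goals
       ON goals.game_id = pxg.game_id AND goals.scorer_id = pxg.player_id
LEFT JOIN scoring AS assists
       ON assists.game_id = pxg.game_id
      AND (assists.assist1_id = pxg.player_id OR assists.assist2_id = pxg.player_id)
WHERE pxg.player_id = 1
"""

plain = conn.execute(QUERY.format(d="")).fetchone()
distinct = conn.execute(QUERY.format(d="DISTINCT")).fetchone()
print(plain)     # (5, 4, 4): game 1 expands to 2 goals x 2 assists = 4 rows
print(distinct)  # (2, 2, 2)
```

Game 1 contributes 2 x 2 = 4 joined rows and game 2 contributes one, so the plain counts see 5 game rows and 4 of each scoring row, while DISTINCT collapses them back to the real totals.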

Maybe you need count(DISTINCT pxg.player_x_game_id)...? Looks like there might be duplicates in that humungous megajoin (which I admit I haven't actually taken time to fully reproduce!-)...

Related

Can someone help me figure out if I'm making a mistake in my query?

I'm trying to create a query that returns the names of all people in my database that have less than half of the money of the person with the most money.
This is my query:
select P1.name
from Persons P1 left join
AccountOf A1 on A1.person_id = P1.id left join
BankAccounts B1 on B1.id = A1.account_id
group by name
having SUM(B1.balance) < MAX((select SUM(B1.balance) as b
from AccountOf A1 left join
BankAccounts B1 on B1.id = A1.account_id
group by A1.person_id
order by b desc
LIMIT 1)) * 0.5
This is the result:
+-------+
| name |
+-------+
| Evert |
+-------+
I have the following tables in the database:
+---------+--------+--+
| Persons | | |
+---------+--------+--+
| id | name | |
| 11 | Evert | |
| 12 | Xavi | |
| 13 | Ludwig | |
| 14 | Ziggy | |
+---------+--------+--+
+--------------+---------+
| BankAccounts | |
+--------------+---------+
| id | balance |
| 11 | 525000 |
| 12 | 750000 |
| 13 | 1900000 |
| 14 | 1600000 |
+--------------+---------+
+-----------+-----------+------------+
| AccountOf | | |
+-----------+-----------+------------+
| id | person_id | account_id |
| 301 | 11 | 12 |
| 302 | 13 | 12 |
| 303 | 13 | 14 |
| 304 | 14 | 11 |
| 305 | 14 | 13 |
+-----------+-----------+------------+
What am I missing here? I should get two entries in the result (Evert, Xavi)
I wouldn't approach the logic this way (I would use window functions). But your final having has two levels of aggregation. That shouldn't work. You want:
having SUM(B1.balance) < (select 0.5 * SUM(B1.balance) as b
from AccountOf A1 join
BankAccounts B1 on B1.id = A1.account_id
group by A1.person_id
order by b desc
limit 1
)
I also moved the 0.5 into the subquery and changed the left join to a join -- the tables need to match to get balances.
I would recommend window functions, if your - undisclosed! - database supports them.
You can join and aggregate just once, and then use a window max() to get the top balance. All that is left to do then is to filter in an outer query:
select *
from (
select p.id, p.name, coalesce(sum(balance), 0) balance,
max(sum(balance)) over() max_balance
from persons p
left join accountof ao on ao.person_id = p.id
left join bankaccounts ba on ba.id = ao.account_id
group by p.id, p.name
) t
where balance < max_balance * 0.5
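The aggregate-once idea can be checked against the sample tables from the question. The sketch below makes two assumptions: it runs on SQLite through Python's sqlite3 module, and it swaps the window max() for a CTE plus a scalar subquery so that it also works on SQLite builds without window-function support. The COALESCE is what keeps Xavi, who has no accounts, in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Persons (id INT, name TEXT);
CREATE TABLE BankAccounts (id INT, balance INT);
CREATE TABLE AccountOf (id INT, person_id INT, account_id INT);
INSERT INTO Persons VALUES (11, 'Evert'), (12, 'Xavi'), (13, 'Ludwig'), (14, 'Ziggy');
INSERT INTO BankAccounts VALUES (11, 525000), (12, 750000), (13, 1900000), (14, 1600000);
INSERT INTO AccountOf VALUES (301, 11, 12), (302, 13, 12), (303, 13, 14),
                             (304, 14, 11), (305, 14, 13);
""")

rows = conn.execute("""
WITH totals AS (
    -- one row per person; people without accounts get a balance of 0
    SELECT p.id, p.name, COALESCE(SUM(ba.balance), 0) AS balance
    FROM Persons p
    LEFT JOIN AccountOf ao ON ao.person_id = p.id
    LEFT JOIN BankAccounts ba ON ba.id = ao.account_id
    GROUP BY p.id, p.name
)
SELECT name, balance
FROM totals
WHERE balance < 0.5 * (SELECT MAX(balance) FROM totals)
ORDER BY id
""").fetchall()

names = [name for name, _ in rows]
print(rows)  # [('Evert', 750000), ('Xavi', 0)]
```

The largest total is Ziggy's 2425000, so the threshold is 1212500, and only Evert and Xavi fall below it.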

Get right table data on LEFT JOIN

I have a problem with my SQL join query.
I need to get the rows of the left table that are either not referenced in the right table by me (user 1), or referenced in the right table with status equal to 0 and user equal to 1.
I also need the status field of the right table.
Here are my two tables:
Table left
ID | title
1 | Title 1
2 | Title 2
3 | Title 3
4 | Title 4
5 | Title 5
6 | Title 6
Table right
ID | status | user | left_id
1 | 0 | 1 | 1
2 | 0 | 50 | 1
3 | 1 | 1 | 2
4 | 0 | 50 | 2
5 | 0 | 1 | 3
6 | 1 | 50 | 3
7 | 0 | 50 | 4
8 | 1 | 50 | 5
My goal is to get this result :
left.ID | left.title | right.status | right.user
1 | Title 1 | 0 | 1
3 | Title 3 | 0 | 1
4 | Title 4 | NULL | NULL
5 | Title 5 | NULL | NULL
6 | Title 6 | NULL | NULL
Here is my query so far:
SELECT l.id, l.title, r.user, r.status
FROM left as l
LEFT JOIN right as r ON l.id = r.left_id
WHERE r.left_id IS NULL or (r.user = 1 AND r.status = 0)
With this query I get the left-table rows with ID 1, 3 and 6, but I also need IDs 4 and 5.
Those rows aren't returned because another user (50) has a reference to them, even though I (user 1) don't.
If someone can help me add rows 4 and 5 to my result, I would be happy.
Thanks
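A small change to the query should be sufficient: move the user filter into the join condition, so that rows referenced only by other users still come back with NULLs.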
SELECT l.id, l.title, r.user, r.status
FROM left as l
LEFT JOIN right as r ON l.id = r.left_id and r.user = 1
WHERE r.left_id IS NULL or r.status = 0
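The difference between filtering in WHERE and filtering in ON can be checked with the sample data. This is a sketch on SQLite via Python's sqlite3 module. The table names left_t and right_t are stand-ins chosen here, because LEFT and RIGHT are SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE left_t (id INT, title TEXT);
CREATE TABLE right_t (id INT, status INT, user INT, left_id INT);
INSERT INTO left_t VALUES (1, 'Title 1'), (2, 'Title 2'), (3, 'Title 3'),
                          (4, 'Title 4'), (5, 'Title 5'), (6, 'Title 6');
INSERT INTO right_t VALUES (1, 0, 1, 1), (2, 0, 50, 1), (3, 1, 1, 2), (4, 0, 50, 2),
                           (5, 0, 1, 3), (6, 1, 50, 3), (7, 0, 50, 4), (8, 1, 50, 5);
""")

# Original query: the user filter lives in WHERE, so rows that matched
# only user 50 are joined first and then thrown away.
where_ids = [row[0] for row in conn.execute("""
SELECT l.id
FROM left_t AS l
LEFT JOIN right_t AS r ON l.id = r.left_id
WHERE r.left_id IS NULL OR (r.user = 1 AND r.status = 0)
ORDER BY l.id
""")]

# Fixed query: the user filter is part of the join condition, so a left row
# with no user-1 match survives with NULLs in the right-hand columns.
on_rows = conn.execute("""
SELECT l.id, l.title, r.status, r.user
FROM left_t AS l
LEFT JOIN right_t AS r ON l.id = r.left_id AND r.user = 1
WHERE r.left_id IS NULL OR r.status = 0
ORDER BY l.id
""").fetchall()

print(where_ids)  # [1, 3, 6]
print(on_rows)
```

Row 2 is still excluded by the fixed query, because user 1 does reference it, but with status 1.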
-- rows of tableLeft that user 1 has no reference to
(select t1.id, t1.title, null as status, null as user
from tableLeft t1
where t1.id not in
(select tt2.left_id from tableRight tt2 where tt2.user = 1)
)
union
(select t1.id,t1.title,t2.status,t2.user
from tableLeft t1
left join tableRight t2 on t1.id=t2.left_id
where t2.status=0 and t2.user=1
)

Difficult query in either Linq To Sql or SQL

I've been working on this for 2 days and I can't figure it out. I'm hoping someone smarter than me will give this a go.
Let's say I have the following tables:
Rating:
Id | Name
1 | A
2 | B
3 | C
4 | D
5 | E
Inspection:
Id | Date (dd/mm/yyyy)
1 | 01/04/2012
2 | 04/04/2012
3 | 28/03/2012
4 | 04/04/2012
Observation:
Id | InspectionId | RatingId
1 | 2 | 3
2 | 1 | 2
3 | 4 | 3
4 | 2 | 1
5 | 3 | 3
6 | 1 | 2
I want the query to return:
RatingName | Date(dd/mm/yyyy) | ObservationCount
A | 01/04/2012 | 0
B | 01/04/2012 | 1
C | 01/04/2012 | 1
D | 01/04/2012 | 0
E | 01/04/2012 | 0
A | 04/04/2012 | 1
B | 04/04/2012 | 0
C | 04/04/2012 | 2
D | 04/04/2012 | 0
E | 04/04/2012 | 0
A | 28/03/2012 | 0
B | 28/03/2012 | 0
C | 28/03/2012 | 1
D | 28/03/2012 | 0
E | 28/03/2012 | 0
So I need the number of Observations for each rating for each date. And yes I need to have the records which return 0 Observations because I'm using this data in a stacked chart and without them it throws an error. I've managed to get the above table but without the records that return 0 Observations with the following Linq To Sql query, but from this point I get stuck.
MyDataContext DB = new MyDataContext();
var data =
(from r in DB.Ratings
join o in DB.Observations on r.Id equals o.RatingId into ro
from observation in ro.DefaultIfEmpty()
join i in DB.Inspections on observation.InspectionId equals i.Id into roi
from q in roi.DefaultIfEmpty()
group q by new { Name = r.Name, Date = q.Date } into grouped
select
new
{
RatingName = grouped.Key.Name,
Date = grouped.Key.Date,
ObservationCount = grouped.Count(x => x != null)
}).OrderBy(x => x.Date);
I would appreciate an answer in either Linq To Sql or just plain old SQL, thanks!
You should try a CROSS JOIN on the two reference tables, and then OUTER JOIN back to your 'data' table - and then check whether the Observation is null... then group and sum!
SELECT
[Name],
[Date],
SUM([Exists])
FROM
(
SELECT
name,
[Date],
CASE WHEN o.Id IS NULL THEN 0
ELSE 1
END as [Exists]
FROM
Rating r CROSS JOIN
Inspection i
LEFT OUTER JOIN Observation o
ON o.RatingId = r.Id AND o.InspectionId = i.Id
) as [source]
GROUP BY [Name], [Date]
ORDER BY
[Date],
name
Translating to LINQ would be a similar two-step process - get the inner result set (checking whether observation is NULL), before grouping and summing.
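The CROSS JOIN plus LEFT JOIN approach can be tried on the sample rows with a short sketch. It assumes SQLite via Python's sqlite3 module instead of SQL Server, flattens the inner query into a single SELECT, and renames the date column to insp_date (a name chosen here) to avoid quoting.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Rating (Id INT, Name TEXT);
CREATE TABLE Inspection (Id INT, insp_date TEXT);
CREATE TABLE Observation (Id INT, InspectionId INT, RatingId INT);
INSERT INTO Rating VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'D'), (5, 'E');
INSERT INTO Inspection VALUES (1, '01/04/2012'), (2, '04/04/2012'),
                              (3, '28/03/2012'), (4, '04/04/2012');
INSERT INTO Observation VALUES (1, 2, 3), (2, 1, 2), (3, 4, 3),
                               (4, 2, 1), (5, 3, 3), (6, 1, 2);
""")

rows = conn.execute("""
SELECT r.Name, i.insp_date,
       SUM(CASE WHEN o.Id IS NULL THEN 0 ELSE 1 END) AS ObservationCount
FROM Rating r
CROSS JOIN Inspection i              -- every rating paired with every inspection
LEFT JOIN Observation o              -- keeps the pairs that have no observation
       ON o.RatingId = r.Id AND o.InspectionId = i.Id
GROUP BY r.Name, i.insp_date
ORDER BY i.insp_date, r.Name
""").fetchall()

# Index the result by (rating, date) for easy lookup
counts = {(name, day): n for name, day, n in rows}
print(len(rows))                  # 15: 5 ratings x 3 distinct dates
print(counts[("C", "04/04/2012")])  # 2
print(counts[("D", "28/03/2012")])  # 0
```

With the rows exactly as listed in the question, both observations of inspection 1 have RatingId 2, so this query reports B as 2 on 01/04/2012.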

Order the join returning multiple rows

In PostgreSQL 8.4 I have a join always returning 3 rows (they represent 3 players playing the game round rid) -
# SELECT r.rid, c2.pos, c2.money, c2.last_ip, c2.quit,
u.id, u.first_name
FROM pref_rounds r
JOIN pref_cards c1 USING (rid)
JOIN pref_cards c2 USING (rid)
JOIN pref_users u ON u.id = c2.id
WHERE c1.id = 'OK336197153288';
rid | pos | money | last_ip | quit | id | first_name
--------+-----+-------+-----------------+------+-----------------------+------------
165684 | 0 | 14 | 77.91.175.242 | f | OK336197153288 | Елена
165684 | 1 | -2 | 195.177.124.218 | f | OK3982469933 | Константин
165684 | 2 | -14 | 92.243.183.44 | f | MR438331292705069453 | Дмитрий
165711 | 2 | 10 | 77.91.175.242 | f | OK336197153288 | Елена
165711 | 0 | -2 | 195.177.124.218 | f | OK3982469933 | Константин
165711 | 1 | -6 | 92.243.183.44 | f | MR438331292705069453 | Дмитрий
165764 | 1 | 13 | 77.91.175.242 | f | OK336197153288 | Елена
165764 | 2 | -17 | 195.177.124.218 | f | OK3982469933 | Константин
165764 | 0 | 3 | 92.243.183.44 | f | MR438331292705069453 | Дмитрий
Unfortunately they are not sorted by the pos (the 2nd column which is a position at a playing table).
Is there please a way in SQL to sort the above sets of 3 rows so that they always have the pos: 0 1 2 then again 0 1 2?
Or do I have to do it in my PHP script?
Use the following SQL query. It will give you the required result.
SELECT r.rid, c2.pos, c2.money, c2.last_ip, c2.quit,
u.id, u.first_name
FROM pref_rounds r
JOIN pref_cards c1 USING (rid)
JOIN pref_cards c2 USING (rid)
JOIN pref_users u ON u.id = c2.id
WHERE c1.id = 'OK336197153288'
ORDER BY r.rid ASC, c2.pos ASC;
Maybe something like this:
SELECT r.rid, c2.pos, c2.money, c2.last_ip, c2.quit,
u.id, u.first_name
FROM pref_rounds r
JOIN pref_cards c1 USING (rid)
JOIN pref_cards c2 USING (rid)
JOIN pref_users u ON u.id = c2.id
WHERE c1.id = 'OK336197153288'
ORDER BY r.rid,c2.pos;
You need to order first by r.rid and then by c2.pos to get the rows ordered the way you want.
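A reduced sketch of the two-column ORDER BY, using SQLite through Python's sqlite3 module instead of PostgreSQL. Only pref_cards is modelled, and the player ids 'A', 'B' and 'C' are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE pref_cards (rid INT, id TEXT, pos INT);
-- hypothetical rows, deliberately inserted out of order
INSERT INTO pref_cards VALUES
  (165711, 'A', 2), (165711, 'B', 0), (165711, 'C', 1),
  (165684, 'A', 0), (165684, 'C', 2), (165684, 'B', 1);
""")

rows = conn.execute("""
SELECT rid, c2.pos, c2.id
FROM pref_cards c1
JOIN pref_cards c2 USING (rid)   -- all players of every round that 'A' played
WHERE c1.id = 'A'
ORDER BY rid, c2.pos             -- round first, then seat within the round
""").fetchall()

positions = [pos for _, pos, _ in rows]
print(positions)  # [0, 1, 2, 0, 1, 2]
```

Without the ORDER BY clause the database is free to return the rows in any order, so sorting in SQL is the reliable option.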

Left Join on Associative Table

I have three tables
Prospect -- holds prospect information
id
name
projectID
Sample data for Prospect
id | name | projectID
1 | p1 | 1
2 | p2 | 1
3 | p3 | 1
4 | p4 | 2
5 | p5 | 2
6 | p6 | 2
Conjoint -- holds conjoint information
id
title
projectID
Sample data
id | title | projectID
1 | color | 1
2 | size | 1
3 | qual | 1
4 | color | 2
5 | price | 2
6 | weight | 2
There is an associative table that holds the conjoint values for the prospects:
ConjointProspect
id
prospectID
conjointID
value
Sample Data
id | prospectID | conjointID | value
1 | 1 | 1 | 20
2 | 1 | 2 | 30
3 | 1 | 3 | 50
4 | 2 | 1 | 10
5 | 2 | 3 | 40
There are one or more prospects and one or more conjoints in their respective tables. A prospect may or may not have a value for each conjoint.
I'd like to have an SQL statement that will extract all conjoint values for each prospect of a given project, displaying NULL where the ConjointProspect table has no value for a given conjoint and prospect.
Something along the lines of this for projectID = 1
prospectID | conjoint ID | value
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
2 | 1 | 10
2 | 2 | NULL
2 | 3 | 40
3 | 1 | NULL
3 | 2 | NULL
3 | 3 | NULL
I've tried using an inner join on the prospect and conjoint tables and then a left join on ConjointProspect, but somewhere I'm getting a cartesian product for prospect/conjoint pairs that doesn't make any sense (to me):
SELECT p.id, p.name, c.id, c.title, cp.value
FROM prospect p
INNER JOIN conjoint c ON p.projectID = c.projectid
LEFT JOIN conjointProspect cp ON cp.prospectID = p.id
WHERE p.projectID = 2
ORDER BY p.id, c.id
prospectID | conjoint ID | value
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
1 | 1 | 20
1 | 2 | 30
1 | 3 | 50
2 | 1 | 10
2 | 2 | 40
2 | 1 | 10
2 | 2 | 40
2 | 1 | 10
2 | 2 | 40
3 | 1 | NULL
3 | 2 | NULL
3 | 3 | NULL
Guidance is very much appreciated!!
This should work for you. First build, as a derived table in the FROM clause, the Cartesian product of all prospects and conjoints within that project. Then left join that to ConjointProspect on both the prospect and the conjoint. You can obviously change or drop columns from the result, but everything is there, joined the way you want and giving exactly the results you expect:
SELECT
PJ.*,
CJP.Value
FROM
( SELECT
P.ID ProspectID,
P.Name,
P.ProjectID,
CJ.Title,
CJ.ID ConJointID
FROM
Prospect P,
ConJoint CJ
where
P.ProjectID = 1
AND P.ProjectID = CJ.ProjectID
ORDER BY
1, 4
) PJ
LEFT JOIN conjointProspect cjp
ON PJ.ProspectID = cjp.prospectID
AND PJ.ConjointID = cjp.conjointid
ORDER BY
PJ.ProspectID,
PJ.ConJointID
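The same shape can be verified against the sample data with a short sketch. It assumes SQLite via Python's sqlite3 module and writes the prospect/conjoint pairing as an explicit JOIN rather than a derived table. The key point is that the LEFT JOIN matches on both prospectID and conjointID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Prospect (id INT, name TEXT, projectID INT);
CREATE TABLE Conjoint (id INT, title TEXT, projectID INT);
CREATE TABLE ConjointProspect (id INT, prospectID INT, conjointID INT, value INT);
INSERT INTO Prospect VALUES (1, 'p1', 1), (2, 'p2', 1), (3, 'p3', 1),
                            (4, 'p4', 2), (5, 'p5', 2), (6, 'p6', 2);
INSERT INTO Conjoint VALUES (1, 'color', 1), (2, 'size', 1), (3, 'qual', 1),
                            (4, 'color', 2), (5, 'price', 2), (6, 'weight', 2);
INSERT INTO ConjointProspect VALUES (1, 1, 1, 20), (2, 1, 2, 30), (3, 1, 3, 50),
                                    (4, 2, 1, 10), (5, 2, 3, 40);
""")

rows = conn.execute("""
SELECT p.id AS prospectID, c.id AS conjointID, cp.value
FROM Prospect p
JOIN Conjoint c                      -- every prospect x conjoint pair in the project
  ON c.projectID = p.projectID
LEFT JOIN ConjointProspect cp        -- match on BOTH keys, not just the prospect
  ON cp.prospectID = p.id AND cp.conjointID = c.id
WHERE p.projectID = 1
ORDER BY p.id, c.id
""").fetchall()

for row in rows:
    print(row)
```

Joining ConjointProspect on prospectID alone, as in the question, attaches every stored value of a prospect to every conjoint, which is where the repeated rows come from.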
Your cartesian product is a result of joining by project ID: in your sample data there are 3 prospects with a project ID of 1 and 3 conjoints with a project ID of 1. Joining on project ID should then result in 9 rows of data, which is what you're getting. It looks like you really need to join via the ConjointProspect table, as that is what holds the mapping between prospects and conjoints.
What if you try something like:
SELECT p.id, p.name, c.id, c.title, cp.value
FROM prospect p
LEFT JOIN conjointProspect cp ON cp.prospectID = p.id
RIGHT JOIN conjoint c ON cp.conjointID = c.id
WHERE p.projectID = 2
ORDER BY p.id, c.id
Not sure if that will work, but it seems like conjointprospects needs to be at the center of your join in order to correctly map prospects to conjoints.