SQL JOIN with 2 aggregates returning incorrect results


I am trying to join 3 different tables to get how many home runs each player has hit in their career along with how many awards they have received. However, I'm getting incorrect results:
Peoples: PlayerId
Battings: PlayerId, HomeRuns
AwardsPlayers: PlayerId, AwardName
Current Attempt
SELECT TOP 25 Peoples.PlayerId, SUM(Battings.HomeRuns) as HomeRuns, COUNT(AwardsPlayers.PlayerId) as AwardCount
FROM Peoples
JOIN Battings ON Battings.PlayerId = Peoples.PlayerId
JOIN AwardsPlayers ON AwardsPlayers.PlayerId = Battings.PlayerId
GROUP BY Peoples.PlayerId
ORDER BY SUM(HomeRuns) desc
Result
PlayerID HomeRuns AwardCount
bondsba01 35814 1034
ruthba01 23562 726
rodrial01 21576 682
mayswi01 21120 736
willite01 20319 741
griffke02 18270 667
schmimi01 18084 594
musiast01 16150 748
pujolal01 14559 414
dimagjo01 12996 468
ripkeca01 12499 609
gehrilo01 12325 425
aaronha01 12080 368
foxxji01 11748 462
ramirma02 10545 399
benchjo01 10114 442
sosasa01 9744 304
ortizda01 9738 360
piazzmi01 9394 396
winfida01 9300 460
rodriiv01 9019 667
robinfr02 8790 330
dawsoan01 8760 420
robinbr01 8576 736
hornsro01 8127 648
I am pretty confident the problem is my second join. Do I need some sort of subquery, or should this work? Barry Bonds definitely does not have 35,814 home runs, nor does he have 1,034 awards.
If I just do a single join, I get the correct output:
SELECT TOP 25 Peoples.PlayerId, SUM(Battings.HomeRuns) as HomeRuns
FROM Peoples
JOIN Battings ON Battings.PlayerId = Peoples.PlayerId
GROUP BY Peoples.PlayerId
ORDER BY SUM(HomeRuns) desc
bondsba01 762
aaronha01 755
ruthba01 714
rodrial01 696
mayswi01 660
pujolal01 633
griffke02 630
thomeji01 612
sosasa01 609
robinfr02 586
mcgwima01 583
killeha01 573
palmera01 569
jacksre01 563
ramirma02 555
schmimi01 548
ortizda01 541
mantlmi01 536
foxxji01 534
mccovwi01 521
thomafr04 521
willite01 521
bankser01 512
matheed01 512
ottme01 511
What am I doing wrong? I'm sure it's how I'm joining my second table (AwardsPlayers)

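I think you have two independent dimensions: batting rows and award rows both hang off the same player but have nothing to do with each other. Joining both before aggregating pairs every batting row with every award row, so each total is multiplied by the row count of the other table. Bonds's 762 home runs times his 47 award rows is exactly the 35,814 you are seeing. The best approach is to aggregate before joining: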
SELECT TOP 25 p.PlayerId, b.HomeRuns, ap.cnt
FROM Peoples p LEFT JOIN
(SELECT b.PlayerId, SUM(b.HomeRuns) as HomeRuns
FROM Battings b
GROUP BY b.PlayerId
) b
ON b.PlayerId = p.PlayerId LEFT JOIN
(SELECT ap.PlayerId, COUNT(*) as cnt
FROM AwardsPlayers ap
GROUP BY ap.PlayerId
) ap
ON ap.PlayerId = p.PlayerId
ORDER BY b.HomeRuns desc;
Result
bondsba01 762 47
aaronha01 755 16
ruthba01 714 33
rodrial01 696 31
mayswi01 660 32
pujolal01 633 23
griffke02 630 29
thomeji01 612 6
sosasa01 609 16
robinfr02 586 15
mcgwima01 583 9
killeha01 573 8
palmera01 569 8
jacksre01 563 13
ramirma02 555 19
schmimi01 548 33
ortizda01 541 18
mantlmi01 536 15
foxxji01 534 22
mccovwi01 521 10
thomafr04 521 10
willite01 521 39
bankser01 512 10
matheed01 512 4
ottme01 511 11
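The row multiplication described above can be reproduced on a tiny data set. The following is a minimal sketch using Python's built-in sqlite3 module, with made-up rows for one hypothetical player `p1` (two batting rows, three award rows). Table and column names follow the question; SQLite is an assumption made only to keep the example self-contained, and `TOP 25` is dropped because SQLite does not support it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Peoples (PlayerId TEXT);
CREATE TABLE Battings (PlayerId TEXT, HomeRuns INTEGER);
CREATE TABLE AwardsPlayers (PlayerId TEXT, AwardName TEXT);
INSERT INTO Peoples VALUES ('p1');
-- hypothetical data: 2 batting rows (7 home runs in total) and 3 awards
INSERT INTO Battings VALUES ('p1', 3), ('p1', 4);
INSERT INTO AwardsPlayers VALUES ('p1', 'A'), ('p1', 'B'), ('p1', 'C');
""")

# Joining both tables first yields 2 x 3 = 6 rows per player,
# so both aggregates are inflated.
naive = conn.execute("""
SELECT p.PlayerId, SUM(b.HomeRuns), COUNT(ap.PlayerId)
FROM Peoples p
JOIN Battings b ON b.PlayerId = p.PlayerId
JOIN AwardsPlayers ap ON ap.PlayerId = b.PlayerId
GROUP BY p.PlayerId
""").fetchone()

# Aggregating each table on its own before joining keeps one row per player.
fixed = conn.execute("""
SELECT p.PlayerId, b.HomeRuns, ap.cnt
FROM Peoples p
LEFT JOIN (SELECT PlayerId, SUM(HomeRuns) AS HomeRuns
           FROM Battings GROUP BY PlayerId) b ON b.PlayerId = p.PlayerId
LEFT JOIN (SELECT PlayerId, COUNT(*) AS cnt
           FROM AwardsPlayers GROUP BY PlayerId) ap ON ap.PlayerId = p.PlayerId
""").fetchone()

print(naive)  # ('p1', 21, 6)
print(fixed)  # ('p1', 7, 3)
```

The naive total of 21 is the real total (7) times the number of award rows (3), the same pattern as 762 x 47 = 35,814 in the question.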

Related

how to select a value based on multiple criteria

I'm trying to select some values based on some proprietary data, and I just changed the variables to reference house prices.
I am trying to get the total offers for houses where they were sold at the bid or at the ask price, with offers under 15 and offers * sale price less than 5,000,000.
I then want to get the total number of offers for each neighborhood on each day, but instead I'm getting the total offers across each neighborhood (n1 + n2 + n3 + n4 + n5) across all dates and the total offers in the dataset across all dates.
My current query is this:
SELECT DISTINCT(neighborhood),
DATE(date_of_sale),
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`
WHERE ((offers * accepted_sale_price < 5000000)
AND (offers < 15)
AND (house_bid = sale_price OR
house_ask = sale_price))) as bid_ask_off,
(SELECT SUM(offers)
FROM `big_query.a_table_name.houseprices`) as
total_offers,
FROM `big_query.a_table_name.houseprices`
GROUP BY neighborhood, DATE(date_of_sale) LIMIT 100
Which I am expecting a result like, with date being repeated throughout as d1, d2, d3, etc.:
but am instead receiving
I'm aware that there are some inherent problems with what I'm trying to select / group, but I'm not sure what to google or what tutorials to look at in order to perform this operation.
It's querying quite a bit of data, and I want to keep costs down, as I've already racked up a smallish bill on queries.
Any help or advice would be greatly appreciated, and I hope I've provided enough information.
Here is a sample dataframe.
neighborhood date_of_sale offers accepted_sale_price house_bid house_ask
bronx 4/1/2022 3 323 320 323
manhattan 4/1/2022 4 244 230 244
manhattan 4/1/2022 8 856 856 900
queens 4/1/2022 15 110 110 135
brooklyn 4/2/2022 12 115 100 115
manhattan 4/2/2022 9 255 255 275
bronx 4/2/2022 6 330 300 330
queens 4/2/2022 10 405 395 405
brooklyn 4/2/2022 4 254 254 265
staten_island 4/3/2022 2 442 430 442
staten_island 4/3/2022 13 195 195 225
bronx 4/3/2022 4 650 650 690
manhattan 4/3/2022 2 286 266 286
manhattan 4/3/2022 6 356 356 400
staten_island 4/4/2022 4 361 361 401
staten_island 4/4/2022 5 348 348 399
bronx 4/4/2022 8 397 340 397
manhattan 4/4/2022 9 333 333 394
manhattan 4/4/2022 11 392 325 392

I think that this is what you need.
Because we group by neighborhood, DISTINCT is not needed.
total_offers comes from SUM(offers) taken directly from the table, while the filtered total (bids) comes from a sub-query that is grouped by neighborhood and joined back on neighborhood.
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = sale_price OR
house_ask = sale_price)
GROUP BY neighborhood) s
ON h.neighborhood = s.neighborhood
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;
Or the following, which changes the initial query more but may be closer to what you need, because the sub-query is grouped by date as well as by neighborhood:
SELECT
h.neighborhood,
DATE(h.date_of_sale) AS date_,
s.bids AS bid_ask_off,
SUM(h.offers) AS total_offers,
FROM
`big_query.a_table_name.houseprices` h
LEFT JOIN
(SELECT
date_of_sale dos,
neighborhood,
SUM(offers) AS bids
FROM
`big_query.a_table_name.houseprices`
WHERE offers * accepted_sale_price < 5000000
AND offers < 15
AND (house_bid = sale_price OR
house_ask = sale_price)
GROUP BY
neighborhood,
date_of_sale) s
ON h.neighborhood = s.neighborhood
AND h.date_of_sale = s.dos
GROUP BY
h.neighborhood,
DATE(date_of_sale),
s.bids
LIMIT 100;
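As an alternative to the joined sub-query (a different technique from the answer above, named here plainly as conditional aggregation), both totals can be computed in one pass over the table, which may matter when scanned data is billed. The sketch below uses Python's built-in sqlite3 module and the four 4/1/2022 rows from the sample data. It assumes that `sale_price` in the question refers to the `accepted_sale_price` column of the sample, and that dates are stored in ISO format; the local table name `houseprices` is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE houseprices (
    neighborhood TEXT, date_of_sale TEXT, offers INTEGER,
    accepted_sale_price INTEGER, house_bid INTEGER, house_ask INTEGER)""")
conn.executemany("INSERT INTO houseprices VALUES (?, ?, ?, ?, ?, ?)", [
    ("bronx",     "2022-04-01", 3,  323, 320, 323),
    ("manhattan", "2022-04-01", 4,  244, 230, 244),
    ("manhattan", "2022-04-01", 8,  856, 856, 900),
    ("queens",    "2022-04-01", 15, 110, 110, 135),  # 15 offers, fails offers < 15
])

# One scan: the CASE expression adds offers only for rows passing the filter,
# while SUM(offers) counts every row in the neighborhood/day group.
rows = conn.execute("""
SELECT neighborhood,
       DATE(date_of_sale) AS sale_date,
       SUM(CASE WHEN offers * accepted_sale_price < 5000000
                 AND offers < 15
                 AND (house_bid = accepted_sale_price
                      OR house_ask = accepted_sale_price)
                THEN offers ELSE 0 END) AS bid_ask_off,
       SUM(offers) AS total_offers
FROM houseprices
GROUP BY neighborhood, DATE(date_of_sale)
ORDER BY neighborhood, sale_date
""").fetchall()

for row in rows:
    print(row)
# ('bronx', '2022-04-01', 3, 3)
# ('manhattan', '2022-04-01', 12, 12)
# ('queens', '2022-04-01', 0, 15)
```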

Getting the last 50 rows for each group in group by

I have this query, but LIMIT only restricts how many result rows are shown (for example the top 5 players) instead of limiting the number of rows that go into each group.
I only want the last 50 rows for each player to be summed in the group.
SELECT playerid, SUM(gamesplayed) AS totalgames, SUM(playtimes) AS playtimeTotal, SUM(Kills) AS totalkills
FROM plugin_game
WHERE gamesplayed=1
GROUP BY playerid
ORDER BY totalkills DESC
LIMIT 50
playerid totalgames playtimeTotal totalkills
797749 8 3076 678
53854 8 5982 635
24398 8 3277 575
464657 4 1325 387
65748 4 3390 368
651532 4 3219 354
287378 6 3893 350
753808 4 2565 323
731631 4 1733 256
665338 4 1971 255
569648 2 2041 244
56488 4 2636 157
006985 3 785 93
58640 1 432 72
If I change the LIMIT to 5, it only shows
playerid totalgames playtimeTotal totalkills
797749 8 3076 678
53854 8 5982 635
24398 8 3277 575
464657 4 1325 387
65748 4 3390 368
So if we use 5 games as an example, I only want to get the SUM over the past 5 games of each player in the group.

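This should work in PostgreSQL. Number each player's rows from newest to oldest in a subquery, keep the first 50, and only then aggregate. A window alias such as the row number cannot be filtered in the same query level's WHERE clause, which is why the subquery is needed. The table must have a column that defines what "last" means; the name game_id below is an assumption, so substitute your own id or timestamp column.
SELECT playerid,
SUM(gamesplayed) AS totalgames,
SUM(playtimes) AS playtimetotal,
SUM(kills) AS totalkills
FROM (
-- rn = 1 is each player's most recent game (game_id is an assumed column)
SELECT g.*,
ROW_NUMBER() OVER (PARTITION BY playerid ORDER BY game_id DESC) AS rn
FROM plugin_game g
WHERE gamesplayed = 1
) t
WHERE rn <= 50
GROUP BY playerid
ORDER BY totalkills DESC;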
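The "last N rows per group" idea can be checked on a small data set. This is a minimal sketch using Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The `game_id` column is an assumption standing in for whatever column orders the games, the rows are made up, and N is set to 2 instead of 50 to keep the data short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE plugin_game (
    game_id INTEGER PRIMARY KEY,   -- assumed ordering column
    playerid INTEGER, gamesplayed INTEGER, playtimes INTEGER, kills INTEGER)""")
conn.executemany(
    "INSERT INTO plugin_game (playerid, gamesplayed, playtimes, kills) "
    "VALUES (?, ?, ?, ?)",
    [
        (1, 1, 100, 10),  # oldest game of player 1, must be ignored when N = 2
        (1, 1, 200, 20),
        (1, 1, 300, 30),
        (2, 1, 50, 5),
        (2, 1, 60, 6),
    ],
)

N = 2  # the question uses 50

# Rank each player's games from newest to oldest, keep the newest N,
# then aggregate per player.
rows = conn.execute("""
SELECT playerid,
       SUM(gamesplayed) AS totalgames,
       SUM(playtimes)   AS playtimetotal,
       SUM(kills)       AS totalkills
FROM (SELECT g.*,
             ROW_NUMBER() OVER (PARTITION BY playerid
                                ORDER BY game_id DESC) AS rn
      FROM plugin_game g
      WHERE gamesplayed = 1) AS t
WHERE rn <= ?
GROUP BY playerid
ORDER BY totalkills DESC
""", (N,)).fetchall()

print(rows)  # [(1, 2, 500, 50), (2, 2, 110, 11)]
```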

SQL iterative // recursive cte with conditions (subtract from previous rows)

I have this query calculating how many products I have to produce to serve my pending orders and the components I need to produce them.
select
l.codart as SKU, --final product
e.codartc as Component, --piece of final product
e.unicompo, --Components needed for each SKU
l1.SKU_pending - s.SKU_STOCK as "SKU to produce",
s2.C_STOCK as "Component stock",
s2.C_STOCK - sum((l1.SKU_pending - s.SKU_STOCK) * e.unicompo)
over (partition by e.codartc order by l.codart) as "Component stock after producing"
from linepedi l --table with sales orders
left join escandallo e on e.codartp = l.codart --table with SKU components
inner join (select l1.codart, sum(l1.unidades - l1.uniservida - l1.unianulada) as "SKU_pending" --pending sales. I called it from a subquery so I don't have to repeat the calculation each time I need it
from linepedi l1
where (l1.unidades - l1.uniservida - l1.unianulada) > 0
group by l1.codart) l1 on l1.CODART = l.codart
left join (select s.codart, sum(s.unidades) as "SKU_STOCK"
from __STOCKALMART s
group by s.codart) s on s.codart = l.codart
left join (select s.codart, sum(s.unidades) as "C_STOCK"
from __STOCKALMART s
group by s.codart) s2 on s2.codart = e.codartc
where l1.SKU_pending - s.SKU_STOCK > 0
group by l.codart, e.codartc, e.unicompo, l1.SKU_pending, s.SKU_STOCK, s2.C_STOCK
order by l.codart
The query returns the following table:
SKU     Component  unicompo  SKU to produce  Component stock  Component stock after producing
20611   286        1         50              2021             1971
20611   329        1         50              2759             2709
20611   ARTZD031   1         50              643              593
220178  ARTZD027   1         384             477              93
220178  SICBB005   1         384             845              461
220178  265        1         384             894              510
220185  265        1         200             894              310
220185  SICBB005   1         200             845              261
220185  ARTZD028   1         200             71               -129
220192  ARTZD029   1         200             364              164
220192  SICBB005   1         200             845              61
220192  265        1         200             894              110
When "Component stock after producing" would drop below 0, I don't want to subtract the full "SKU to produce" but only the minimum component stock available for that SKU, while "saving" the resulting value for the next time the same component is needed. I think I would need some kind of iteration with conditionals.
This is what I'd like to accomplish:
SKU     Component  unicompo  SKU to produce  Component stock  Component stock after producing
20611   286        1         50              2021             1971
20611   329        1         50              2759             2709
20611   ARTZD031   1         50              643              593
220178  ARTZD027   1         384             477              93
220178  SICBB005   1         384             845              461
220178  265        1         384             894              510
220185  265        1         200             894              439
220185  SICBB005   1         200             845              390
220185  ARTZD028   1         200             71               0
220192  ARTZD029   1         200             364              164
220192  SICBB005   1         200             845              190
220192  265        1         200             894              239
I've been reading some articles and I feel like this might be done with a recursive CTE, but I don't really know how, since I didn't find any example similar to mine.
How can I achieve this? Any help will be appreciated. Thank you very much.
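The rule described in the question can be written down procedurally before attempting it in SQL. The following is only a sketch in Python, not a recursive CTE: it assumes SKUs are processed in the order they appear in the table and that production of a SKU is capped by its scarcest component. The function name `allocate` and the tuple layout are hypothetical. On the rows from the question it reproduces the desired table.

```python
# (SKU, component, unicompo, SKU to produce, component stock) from the question
rows = [
    ("20611",  "286",      1, 50,  2021),
    ("20611",  "329",      1, 50,  2759),
    ("20611",  "ARTZD031", 1, 50,  643),
    ("220178", "ARTZD027", 1, 384, 477),
    ("220178", "SICBB005", 1, 384, 845),
    ("220178", "265",      1, 384, 894),
    ("220185", "265",      1, 200, 894),
    ("220185", "SICBB005", 1, 200, 845),
    ("220185", "ARTZD028", 1, 200, 71),
    ("220192", "ARTZD029", 1, 200, 364),
    ("220192", "SICBB005", 1, 200, 845),
    ("220192", "265",      1, 200, 894),
]


def allocate(rows):
    """Return {(sku, component): stock left after producing that SKU}."""
    stock = {}   # running stock per component, carried from SKU to SKU
    by_sku = {}  # sku -> (units wanted, [(component, units per SKU), ...])
    for sku, comp, per_unit, wanted, comp_stock in rows:
        stock.setdefault(comp, comp_stock)
        by_sku.setdefault(sku, (wanted, []))[1].append((comp, per_unit))

    remaining = {}
    for sku, (wanted, comps) in by_sku.items():  # insertion order = table order
        # Production is capped by the component that runs out first.
        producible = min(wanted, min(stock[c] // per for c, per in comps))
        for comp, per_unit in comps:
            stock[comp] -= producible * per_unit
            remaining[(sku, comp)] = stock[comp]
    return remaining


remaining = allocate(rows)
print(remaining[("220185", "ARTZD028")])  # 0
print(remaining[("220185", "265")])       # 439
print(remaining[("220192", "265")])       # 239
```

For SKU 220185 only 71 units can be produced because ARTZD028 has 71 in stock, so 71 (not 200) is subtracted from components 265 and SICBB005, and the reduced stock carries over to SKU 220192.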

Get nearest date column value from another table in SQL Server

I have two tables A and B,
Table A
PstngDate WorkingDayOutput
12/1/2020 221
12/3/2020 327
12/4/2020 509
12/5/2020 418
12/7/2020 390
12/8/2020 431
12/9/2020 244
12/10/2020 246
12/11/2020 314
12/12/2020 301
12/14/2020 411
12/15/2020 530
12/16/2020 554
12/17/2020 300
12/18/2020 375
12/23/2020 402
12/24/2020 302
12/25/2020 269
12/26/2020 382
12/28/2020 608
Table B
PstngDate HolidayOutput isWorkingDay
12/2/2020 20 0
12/6/2020 24 0
12/13/2020 31 0
12/19/2020 82 0
12/22/2020 507 0
12/27/2020 537 0
Expected output:
PstngDate WorkingDayOutput HolidayOutput
12/1/2020 221 20
12/3/2020 327
12/4/2020 509
12/5/2020 418 24
12/7/2020 390
12/8/2020 431
12/9/2020 244
12/10/2020 246
12/11/2020 314
12/12/2020 301 31
12/14/2020 411
12/15/2020 530
12/16/2020 554
12/17/2020 300
12/18/2020 375 589
12/23/2020 402
12/24/2020 302
12/25/2020 269
12/26/2020 382 537
12/28/2020 608
I want to join Table B to Table A on the nearest earlier date. As the expected output shows, the HolidayOutput value in the 12/18 row is the sum of the 12/19 and 12/22 rows of Table B.

I want to join TableB to TableA with nearest lesser date column
This sounds like a lateral join, which SQL Server writes as OUTER APPLY:
select a.*, coalesce(b.holidayquantity, 0) as holidayquantity
from a
outer apply (
select top (1) b.*
from b
where b.pstng_date >= a.pstng_date
order by b.pstng_date
) b

You can use a left join with ROW_NUMBER as follows:
Select pstng_date, workingDayQuantity,
HolidayQuantity,
workingDayQuantity + HolidayQuantity as total
From
(Select a.*, b.HolidayQuantity,
Row_number() over (partition by a.pstng_date order by b.pstng_date) as rn
From tablea a left join tableb b On b.pstng_date > a.pstng_date) t
Where rn=1
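The expected output in the question attaches each holiday row to the nearest earlier working day and sums holidays that share one. One way to express that, sketched here with Python's built-in sqlite3 module rather than SQL Server, is to map every Table B row to the latest Table A date before it, sum per mapped date, and left join the sums back to Table A. The sketch uses a subset of the question's rows, assumes dates are stored in ISO format so that text comparison orders them correctly, and the alias `PrevWorkingDay` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE TableA (PstngDate TEXT, WorkingDayOutput INTEGER);
CREATE TABLE TableB (PstngDate TEXT, HolidayOutput INTEGER);
INSERT INTO TableA VALUES
    ('2020-12-17', 300), ('2020-12-18', 375), ('2020-12-23', 402),
    ('2020-12-26', 382), ('2020-12-28', 608);
INSERT INTO TableB VALUES
    ('2020-12-19', 82), ('2020-12-22', 507), ('2020-12-27', 537);
""")

rows = conn.execute("""
SELECT a.PstngDate, a.WorkingDayOutput, h.HolidayOutput
FROM TableA a
LEFT JOIN (
    -- sum the holidays that map to the same working day
    SELECT PrevWorkingDay, SUM(HolidayOutput) AS HolidayOutput
    FROM (
        -- latest working day strictly before each holiday
        SELECT (SELECT MAX(a2.PstngDate)
                FROM TableA a2
                WHERE a2.PstngDate < b.PstngDate) AS PrevWorkingDay,
               b.HolidayOutput
        FROM TableB b
    ) AS m
    GROUP BY PrevWorkingDay
) AS h ON h.PrevWorkingDay = a.PstngDate
ORDER BY a.PstngDate
""").fetchall()

for row in rows:
    print(row)
# ('2020-12-17', 300, None)
# ('2020-12-18', 375, 589)
# ('2020-12-23', 402, None)
# ('2020-12-26', 382, 537)
# ('2020-12-28', 608, None)
```

The 589 on 12/18 is 82 + 507, matching the expected output in the question. In SQL Server the inner correlated subquery could be replaced by CROSS APPLY with TOP (1) ordered by date descending.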

SQL Query: How to pull counts of two columns from respective tables

Given two tables:
1st Table Name: FACETS_Business_NPI_Provider
Buss_ID NPI Bussiness_Desc
11 222 Eleven 222
12 223 Twelve 223
13 224 Thirteen 224
14 225 Fourteen 225
11 226 Eleven 226
12 227 Tweleve 227
12 228 Tweleve 228
2nd Table : FACETS_PROVIDERs_Practitioners
NPI PRAC_NO PROV_NAME PRAC_NAME
222 943 P222 PR943
222 942 P222 PR942
223 931 P223 PR931
224 932 P224 PR932
224 933 P224 PR933
226 950 P226 PR950
227 951 P227 PR951
228 952 P228 PR952
228 953 P228 PR953
With the query below I get the following results, whereas I expect PROVIDER_COUNT to be the number of providers in table FACETS_Business_NPI_Provider (i.e. 3 instead of 4 for Buss_ID 12, 2 instead of 3 for Buss_ID 11, etc.).
SELECT BP.Buss_ID,
COUNT(BP.NPI) PROVIDER_COUNT,
COUNT(PP.PRAC_NO) PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP
ON PP.NPI=BP.NPI
group by BP.Buss_ID
Buss_ID PROVIDER_COUNT PRACTITIONER_COUNT
11 3 3
12 4 4
13 2 2
14 1 0

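If I understood it correctly, you might want to add DISTINCT inside the provider count, i.e. COUNT(DISTINCT BP.NPI). The join repeats each provider row once per matching practitioner, so a plain COUNT(BP.NPI) counts the same provider several times.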
Here is an SQL Fiddle, which we can probably use to discuss further.
http://sqlfiddle.com/#!2/d9a0e6/3
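The effect of DISTINCT here can be checked against the data from the question. The following is a minimal sketch using Python's built-in sqlite3 module; only the columns needed for the counts are created, and the join uses NPI on both sides (the question's `PP.NOI` is assumed to be a typo).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE FACETS_Business_NPI_Provider (Buss_ID INTEGER, NPI INTEGER);
CREATE TABLE FACETS_PROVIDERs_Practitioners (NPI INTEGER, PRAC_NO INTEGER);
INSERT INTO FACETS_Business_NPI_Provider VALUES
    (11, 222), (12, 223), (13, 224), (14, 225), (11, 226), (12, 227), (12, 228);
INSERT INTO FACETS_PROVIDERs_Practitioners VALUES
    (222, 943), (222, 942), (223, 931), (224, 932), (224, 933),
    (226, 950), (227, 951), (228, 952), (228, 953);
""")

# COUNT(DISTINCT BP.NPI) counts each provider once even though the join
# repeats it for every practitioner; COUNT(PP.PRAC_NO) skips the NULLs
# produced by the LEFT JOIN for providers without practitioners.
rows = conn.execute("""
SELECT BP.Buss_ID,
       COUNT(DISTINCT BP.NPI) AS PROVIDER_COUNT,
       COUNT(PP.PRAC_NO)      AS PRACTITIONER_COUNT
FROM FACETS_Business_NPI_Provider BP
LEFT JOIN FACETS_PROVIDERs_Practitioners PP ON PP.NPI = BP.NPI
GROUP BY BP.Buss_ID
ORDER BY BP.Buss_ID
""").fetchall()

print(rows)  # [(11, 2, 3), (12, 3, 4), (13, 1, 2), (14, 1, 0)]
```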
