How can I join the SUMS from 2 different tables into 1 - sql

I have 2 tables
Table 1 = LOG
Site Year Quarter SF Seats
------ ------ --------- ------ -------
NYC 2019 Q1 1000 34
NYC 2019 Q1 1289 98
CHI 2019 Q1 976 17
NYC 2019 Q2 3985 986
Table 2 = Headcount
Site Year Quarter HC
------ ------ --------- -------
NYC 2019 Q1 63
NYC 2019 Q1 34
CHI 2019 Q1 73
NYC 2019 Q2 23
I need to be able to join these tables together and display the sum of SF, Seats, and HC for each distinct Site, Quarter, and Year
For example the output should be:
Site Year Quarter HC SF Seats
------ ------ --------- ------- ------ -------
NYC 2019 Q1 97 2289 132
NYC 2019 Q2 23 3985 986
CHI 2019 Q1 73 976 17
Here is my SQL Query:
SELECT DISTINCT SITE,
YEAR,
QUARTER,
SEATS,
SF,
HC
FROM
(SELECT DISTINCT site SITE,
YEAR YEAR,
quarter QUARTER,
sum(SEATS) SEATS,
sum(SF) SF
FROM Headcount
GROUP BY SITE,
YEAR,
QUARTER) A
CROSS JOIN
(SELECT DISTINCT sum(HC) HC
FROM Headcount
GROUP BY site,
YEAR,
quarter, HC) C
But I am getting this error message "Column HC contains an aggregation function, which is not allowed in GROUP BY"
Any idea what I'm doing wrong and why this query isn't working?

The reason for the error is that the last subquery has HC in the GROUP BY clause while also aggregating it with sum(HC). That is not allowed: a column is either grouped or aggregated, not both.
However, a cross join combines every row from the first subquery with every row from the second, which is surely not what you need.
Also, DISTINCT is not needed when you use GROUP BY, since grouping already returns one row per distinct combination of the grouped columns.
I would suggest using union all:
SELECT SITE,
YEAR,
QUARTER,
SUM(HC) AS HC,
SUM(SEATS) AS SEATS,
SUM(SF) AS SF
FROM (
SELECT SITE,
YEAR,
QUARTER,
HC,
null AS SEATS,
null AS SF
FROM Headcount
UNION ALL
SELECT SITE,
YEAR,
QUARTER,
null,
SEATS,
SF
FROM Log
) AS base
GROUP BY SITE,
YEAR,
QUARTER
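
To check the UNION ALL approach against the sample data from the question, here is a minimal sketch. It assumes SQLite through Python's built-in sqlite3 module, which is only a stand-in for whatever database you are actually using; the column types are guesses.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
# SQLite and the column types are assumptions made for this illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Log (Site TEXT, Year INTEGER, Quarter TEXT, SF INTEGER, Seats INTEGER);
CREATE TABLE Headcount (Site TEXT, Year INTEGER, Quarter TEXT, HC INTEGER);
INSERT INTO Log VALUES
  ('NYC', 2019, 'Q1', 1000, 34),
  ('NYC', 2019, 'Q1', 1289, 98),
  ('CHI', 2019, 'Q1', 976, 17),
  ('NYC', 2019, 'Q2', 3985, 986);
INSERT INTO Headcount VALUES
  ('NYC', 2019, 'Q1', 63),
  ('NYC', 2019, 'Q1', 34),
  ('CHI', 2019, 'Q1', 73),
  ('NYC', 2019, 'Q2', 23);
""")

# Stack both tables into one shape (NULL where a column does not apply),
# then aggregate once. SUM() skips the NULL padding.
union_rows = conn.execute("""
SELECT Site, Year, Quarter,
       SUM(HC) AS HC, SUM(SF) AS SF, SUM(Seats) AS Seats
FROM (
    SELECT Site, Year, Quarter, HC, NULL AS SF, NULL AS Seats FROM Headcount
    UNION ALL
    SELECT Site, Year, Quarter, NULL, SF, Seats FROM Log
) AS base
GROUP BY Site, Year, Quarter
ORDER BY Site, Year, Quarter
""").fetchall()

for row in union_rows:
    print(row)
# ('CHI', 2019, 'Q1', 73, 976, 17)
# ('NYC', 2019, 'Q1', 97, 2289, 132)
# ('NYC', 2019, 'Q2', 23, 3985, 986)
```

One property of this approach is that a Site/Year/Quarter combination that exists in only one of the two tables still shows up in the result, with NULL for the sums that come from the other table.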

With an N-M relationship between the two tables, you need to do the aggregation in subqueries and then join the results together:
SELECT h.*, l.SF, l.Seats
FROM
(
SELECT site, year, quarter, SUM(SF) SF, SUM(Seats) Seats
FROM LOG
GROUP BY site, year, quarter
) l
INNER JOIN (
SELECT site, year, quarter, SUM(HC) HC
FROM Headcount
GROUP BY site, year, quarter
) h
ON h.site = l.site AND h.year = l.year AND h.quarter = l.quarter
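
To see why the aggregation has to happen before the join, here is a small sketch comparing a join on the raw rows with the aggregate-then-join query. It assumes SQLite via Python's sqlite3 module purely for illustration, with guessed column types.

```python
import sqlite3

# Sample data from the question in an in-memory SQLite database (an assumption;
# the asker's actual database is not stated).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Log (Site TEXT, Year INTEGER, Quarter TEXT, SF INTEGER, Seats INTEGER);
CREATE TABLE Headcount (Site TEXT, Year INTEGER, Quarter TEXT, HC INTEGER);
INSERT INTO Log VALUES
  ('NYC', 2019, 'Q1', 1000, 34), ('NYC', 2019, 'Q1', 1289, 98),
  ('CHI', 2019, 'Q1', 976, 17),  ('NYC', 2019, 'Q2', 3985, 986);
INSERT INTO Headcount VALUES
  ('NYC', 2019, 'Q1', 63), ('NYC', 2019, 'Q1', 34),
  ('CHI', 2019, 'Q1', 73), ('NYC', 2019, 'Q2', 23);
""")

# Joining the raw rows first: each of the 2 NYC/Q1 Log rows pairs with each of
# the 2 NYC/Q1 Headcount rows, so every NYC/Q1 value is summed twice.
naive_rows = conn.execute("""
SELECT l.Site, l.Year, l.Quarter,
       SUM(h.HC) AS HC, SUM(l.SF) AS SF, SUM(l.Seats) AS Seats
FROM Log l
JOIN Headcount h
  ON h.Site = l.Site AND h.Year = l.Year AND h.Quarter = l.Quarter
GROUP BY l.Site, l.Year, l.Quarter
ORDER BY l.Site, l.Year, l.Quarter
""").fetchall()

# Aggregating each table to one row per Site/Year/Quarter first makes the
# join 1-to-1, so nothing is double counted.
joined_rows = conn.execute("""
SELECT h.Site, h.Year, h.Quarter, h.HC, l.SF, l.Seats
FROM (SELECT Site, Year, Quarter, SUM(SF) AS SF, SUM(Seats) AS Seats
      FROM Log GROUP BY Site, Year, Quarter) l
JOIN (SELECT Site, Year, Quarter, SUM(HC) AS HC
      FROM Headcount GROUP BY Site, Year, Quarter) h
  ON h.Site = l.Site AND h.Year = l.Year AND h.Quarter = l.Quarter
ORDER BY h.Site, h.Year, h.Quarter
""").fetchall()

print(naive_rows[1])   # ('NYC', 2019, 'Q1', 194, 4578, 264), doubled
print(joined_rows[1])  # ('NYC', 2019, 'Q1', 97, 2289, 132)
```

Keep in mind that the INNER JOIN only returns Site/Year/Quarter combinations present in both tables; if one side can be missing, an outer join would be needed instead.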

Related

Find the Age and Name of the Youngest Player for Each Race

Table "participant":
ptcpt_id | ptcpt_name | brt_dt
1 | Ana Perez | 2001-10-10
2 | John Sy | 1999-04-03
3 | Judy Ann | 2001-10-10
Table "race":
race_id | race_name | race_date
1 | Vroom Vroom | 2023-01-01
2 | Fast & Furious | 2022-01-01
Table "individual_race_record":
irr_id | ptcpt_id | race_id | run_time
1 | 1 | 1 | 00:59:13
2 | 1 | 2 | 01:19:14
3 | 2 | 1 | 00:48:05
4 | 2 | 2 | 01:01:17
5 | 3 | 2 | 01:31:18
I want to select the name and age of the youngest participant for each race event, as well as the name and year of each race event.
This is what I have so far:
SELECT
r.race_name,
EXTRACT(YEAR FROM r.race_date) AS year,
COALESCE(CAST(min.age AS varchar), 'N/A')
FROM(
SELECT
race_id,
EXTRACT(YEAR FROM MIN(AGE(brt_dt))) AS age
FROM(
SELECT p.ptcpt_id, p.brt_dt, irr.race_id
FROM participant p
INNER JOIN individual_race_record irr
ON p.ptcpt_id = irr.ptcpt_id
) sub
GROUP BY race_id
) min
RIGHT JOIN race r ON r.race_id=min.race_id
ORDER BY year DESC
which resulted in the following table:
race_name | year | age
Vroom Vroom | 2023 | 21
Fast & Furious | 2022 | 21
But what I want is this:
race_name | year | age | ptcpt_name
Vroom Vroom | 2023 | 21 | Ana Perez
Fast & Furious | 2022 | 21 | Ana Perez
Fast & Furious | 2022 | 21 | Judy Ann
The problem is that I can't join it with the participant table. I still need another column for the name of the youngest participant, and if a race has multiple youngest participants, I'd like to show all of them. When I try to select ptcpt_id in the 'min' subquery, I get an error saying that ptcpt_id must also be included in the GROUP BY clause. But I don't want the result grouped by participant.
I'd appreciate any help and leads on this issue. Thank you.
You can use FETCH FIRST n ROWS WITH TIES (available in PostgreSQL 13 and later) to retrieve every row that ties with the first row on the ORDER BY expression. If we order by a DENSE_RANK that ranks each person within their race by age, then everyone with rank 1, that is, the minimum age in their race, ties for first place. All of them are returned, even when a race has more than one youngest participant.
SELECT r.race_name,
EXTRACT(YEAR FROM r.race_date) AS "year",
DATE_PART('year', r.race_date) - DATE_PART('year', p.brt_dt) AS age,
p.ptcpt_name
FROM participant p
INNER JOIN individual_race_record irr ON p.ptcpt_id = irr.ptcpt_id
INNER JOIN race r ON r.race_id = irr.race_id
ORDER BY DENSE_RANK() OVER(
PARTITION BY race_name
ORDER BY DATE_PART('year', r.race_date) - DATE_PART('year', p.brt_dt))
FETCH FIRST 1 ROWS WITH TIES
Output:
race_name | year | age | ptcpt_name
Fast & Furious | 2022 | 21 | Ana Perez
Fast & Furious | 2022 | 21 | Judy Ann
Vroom Vroom | 2023 | 22 | Ana Perez
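
If WITH TIES is not available in your database, a different technique gives the same rows for this sample data: join each race's entrants against the latest birth date per race. The sketch below swaps in that technique and assumes SQLite via Python's sqlite3 module (dates stored as ISO text, year extracted with strftime), so it is not the PostgreSQL query above. Note that it ties on exact birth date, whereas the DENSE_RANK version ties on the year difference.

```python
import sqlite3

# Sample data from the question; SQLite and TEXT dates are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE participant (ptcpt_id INTEGER, ptcpt_name TEXT, brt_dt TEXT);
CREATE TABLE race (race_id INTEGER, race_name TEXT, race_date TEXT);
CREATE TABLE individual_race_record (
  irr_id INTEGER, ptcpt_id INTEGER, race_id INTEGER, run_time TEXT);
INSERT INTO participant VALUES
  (1, 'Ana Perez', '2001-10-10'), (2, 'John Sy', '1999-04-03'),
  (3, 'Judy Ann', '2001-10-10');
INSERT INTO race VALUES
  (1, 'Vroom Vroom', '2023-01-01'), (2, 'Fast & Furious', '2022-01-01');
INSERT INTO individual_race_record VALUES
  (1, 1, 1, '00:59:13'), (2, 1, 2, '01:19:14'), (3, 2, 1, '00:48:05'),
  (4, 2, 2, '01:01:17'), (5, 3, 2, '01:31:18');
""")

# Subquery y finds the latest birth date (the youngest entrant) per race.
# Joining back on that date keeps every participant who shares it.
youngest_rows = conn.execute("""
SELECT r.race_name,
       CAST(strftime('%Y', r.race_date) AS INTEGER) AS year,
       CAST(strftime('%Y', r.race_date) AS INTEGER)
         - CAST(strftime('%Y', p.brt_dt) AS INTEGER) AS age,
       p.ptcpt_name
FROM race r
JOIN individual_race_record irr ON irr.race_id = r.race_id
JOIN participant p ON p.ptcpt_id = irr.ptcpt_id
JOIN (
    SELECT irr2.race_id, MAX(p2.brt_dt) AS latest_brt_dt
    FROM individual_race_record irr2
    JOIN participant p2 ON p2.ptcpt_id = irr2.ptcpt_id
    GROUP BY irr2.race_id
) y ON y.race_id = r.race_id AND y.latest_brt_dt = p.brt_dt
ORDER BY r.race_name, p.ptcpt_name
""").fetchall()

for row in youngest_rows:
    print(row)
# ('Fast & Furious', 2022, 21, 'Ana Perez')
# ('Fast & Furious', 2022, 21, 'Judy Ann')
# ('Vroom Vroom', 2023, 22, 'Ana Perez')
```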

Group by based on field length

I want to count, for each year, the IDs that are 4, 5, or 6 bytes long.
ID | year | name | location | geo | new_loc | addr 1 | addr 2 | addr 3 | addr 4
12345 | 2019 | bob | UK | UK-4 | basic | dat1 | dat11 | dat13 | dat123
19804 | 2004 | sam | US | US-1 | advanced | dat2 | dat21 | dat23 | dat233
19 | 2000 | lister | EU | EU | basic | dat3 | dat31 | dat33 | dat333
190838 | 2004 | harold | US | US-3 | basic | dat4 | dat41 | dat53 | dat533
11804 | 2019 | beanie | SK | UK-2 | advanced | NULL | NULL | NULL | NULL
Output
ID | year | name | location | new location | num_of_ids_each_year
12345 | 2019 | bob | UK | basic | 2
11804 | 2019 | beanie | SK | advanced | 2
19804 | 2004 | sam | US | advanced | 2
190838 | 2004 | harold | US | basic | 2
What I tried:
select ID, year, name, location, [new location], count(year)
from table1
group by ID, year, name, location, [new location], count(year);
Could someone advise on how to include only those IDs that are 4, 5, or 6 bytes long?
You can use COUNT() with Partition by Year to get the results without using GROUP BY.
SELECT ID, [year], [name], [location], [new location]
, COUNT(1) OVER (PARTITION BY year) AS num_of_ids_each_year
FROM table1
WHERE LEN(ID) IN (4,5,6)
Thanks @Squirrel, I finally found a way.
select id, Year, name, location, [new location],
count(id) over (partition by year) as num_of_ids_each_year
from table1 where len(id) in (4,5,6);
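
Here is a quick way to try the windowed count on the sample rows. This is only a sketch under a few assumptions: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), uses SQLite's length() in place of SQL Server's LEN(), and keeps only some of the columns under the name new_loc.

```python
import sqlite3

# Subset of the sample table; SQLite and the column types are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID INTEGER, year INTEGER, name TEXT, location TEXT, new_loc TEXT);
INSERT INTO table1 VALUES
  (12345, 2019, 'bob', 'UK', 'basic'),
  (19804, 2004, 'sam', 'US', 'advanced'),
  (19, 2000, 'lister', 'EU', 'basic'),
  (190838, 2004, 'harold', 'US', 'basic'),
  (11804, 2019, 'beanie', 'SK', 'advanced');
""")

# WHERE runs before the window function, so the per-year count only sees
# IDs whose length is 4, 5 or 6 (ID 19 is filtered out first).
counted_rows = conn.execute("""
SELECT ID, year, name, location, new_loc,
       COUNT(*) OVER (PARTITION BY year) AS num_of_ids_each_year
FROM table1
WHERE length(ID) IN (4, 5, 6)
ORDER BY year, ID
""").fetchall()

for row in counted_rows:
    print(row)
# (19804, 2004, 'sam', 'US', 'advanced', 2)
# (190838, 2004, 'harold', 'US', 'basic', 2)
# (11804, 2019, 'beanie', 'SK', 'advanced', 2)
# (12345, 2019, 'bob', 'UK', 'basic', 2)
```

Because the window function does not collapse rows, each ID keeps its own row and simply carries the count for its year.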
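Please try filtering on the ID length in a HAVING clause. Note that LEN() is a scalar function rather than an aggregate, so the same condition also works in WHERE.
e.g.
select ID,
year,
name,
location,
[new location],
len(ID) as id_length
from table1
group by ID, year, name, location, [new location]
having len(ID) in (4, 5, 6)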

Is there a way to select sum on one column based on other DISTINCT column, while grouping by third column(date) only

I have three columns
year | money | id
2020 | 100 | 01
2020 | 100 | 01
2019 | 50 | 02
2018 | 50 | 03
2020 | 40 | 04
results should be
Year | Money | total people
2020 | 240 | 4
Since the first two ids are the same, I tried it as below:
select year, sum(money), Count( Distinct id) from table
group by year
But the result shows 4 people, which is correct, while the sum is wrong because it counts all of the money, including the duplicate row.
You can remove the duplicate rows first and then aggregate:
select max(year), sum(money), count(*)
from (select distinct year, money, id
from t
) t;
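
As a sanity check on the sample rows, this sketch compares a plain aggregate with the deduplicate-then-aggregate query. It assumes SQLite via Python's sqlite3 module and guessed column types, just to have something runnable.

```python
import sqlite3

# Sample rows from the question; the first two rows are exact duplicates.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (year INTEGER, money INTEGER, id TEXT);
INSERT INTO t VALUES
  (2020, 100, '01'), (2020, 100, '01'),
  (2019, 50, '02'), (2018, 50, '03'), (2020, 40, '04');
""")

# Plain aggregate: the duplicate row's money is counted twice.
naive_row = conn.execute(
    "SELECT SUM(money), COUNT(DISTINCT id) FROM t"
).fetchone()

# DISTINCT collapses the duplicate row before the sums are taken.
dedup_row = conn.execute("""
SELECT MAX(year), SUM(money), COUNT(*)
FROM (SELECT DISTINCT year, money, id FROM t) AS d
""").fetchone()

print(naive_row)  # (340, 4)
print(dedup_row)  # (2020, 240, 4)
```

This only works because the duplicated rows are identical in every selected column; two different payments of the same amount by the same id in the same year would also be collapsed into one.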
You can use SUM() over the rows for the year you want, with COUNT(DISTINCT id) in a scalar subquery over the whole table.
For example:
select
year,
sum(money) as money,
(select count(distinct id) from t) as total_people
from t
where year = 2020
group by year;
Result:
YEAR MONEY TOTAL_PEOPLE
----- ------ ------------
2020 240 4
Not the most performant, but if you wish to avoid a derived table, you can do
select distinct
max(year) over (),
sum(money) over (),
count(*) over ()
from t
group by year, money, id;
And if you want this grouped by year, you can define the partitions in the OVER clause, e.g. OVER (PARTITION BY year).

SQL: Filter (rows) with maximum value within groups (columns)

I need to filter for the rows with the maximum version within each month and location, using SQL.
For example, in the table below there are versions 1 and 2 for June & NYC, and I want only the row for version 2, with revenue 11. Likewise, for January & NYC I want only the row with revenue 15.
Month Location Version Revenue
June NYC 1 10
June NYC 2 11
June LA 3 12
January NYC 1 13
January NYC 2 14
January NYC 3 15
January LA 1 16
January LA 2 17
Result:
Month Location Version Revenue
June NYC 2 11
June LA 3 12
January NYC 3 15
January LA 2 17
Edit to change name of column to Revenue to remove confusion. I do not need the max value of revenue, only revenue that goes with max version of that month and that location.
You can also use joins as an alternative to correlated subqueries, e.g.:
select t1.* from YourTable t1 inner join
(
select t2.month, t2.location, max(t2.version) as mv
from YourTable t2
group by t2.month, t2.location
) q on t1.month = q.month and t1.location = q.location and t1.version = q.mv
Change YourTable to the name of your table.
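
For what it's worth, here is the join approach run against the sample table. The sketch assumes SQLite via Python's sqlite3 module and guessed column types; the table name YourTable is the placeholder from the query above.

```python
import sqlite3

# Sample rows from the question in an in-memory SQLite database (an assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE YourTable (month TEXT, location TEXT, version INTEGER, revenue INTEGER);
INSERT INTO YourTable VALUES
  ('June', 'NYC', 1, 10), ('June', 'NYC', 2, 11), ('June', 'LA', 3, 12),
  ('January', 'NYC', 1, 13), ('January', 'NYC', 2, 14), ('January', 'NYC', 3, 15),
  ('January', 'LA', 1, 16), ('January', 'LA', 2, 17);
""")

# Subquery q holds the highest version per (month, location); the join keeps
# only the rows whose version equals that maximum.
latest_rows = conn.execute("""
SELECT t1.month, t1.location, t1.version, t1.revenue
FROM YourTable t1
JOIN (
    SELECT month, location, MAX(version) AS mv
    FROM YourTable
    GROUP BY month, location
) q ON t1.month = q.month AND t1.location = q.location AND t1.version = q.mv
ORDER BY t1.month, t1.location
""").fetchall()

for row in latest_rows:
    print(row)
# ('January', 'LA', 2, 17)
# ('January', 'NYC', 3, 15)
# ('June', 'LA', 3, 12)
# ('June', 'NYC', 2, 11)
```

If two rows in the same month and location could share the maximum version, this join returns both of them.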
A typical method is filtering using a correlated subquery:
select t.*
from t
where t.version = (select max(t2.version)
from t t2
where t2.month = t.month and t2.location = t.location
);
Another alternative that minimizes subqueries is to use the row_number() window function. (You don't mention which database server you're using, but most of them support it.)
SELECT month, location, version, revenue
FROM (SELECT month, location, version, revenue
, row_number() OVER (PARTITION BY month, location ORDER BY version DESC) AS rn
FROM your_table) ranked
WHERE rn = 1;

sql group by find all combinations of two columns distinct values

I have following table
ORDID EMPID ITEMCOST TIME
-------------------------------------
10023 B2690 675 1992
10024 C3467 8078 1992
10025 B2690 15481 1992
10026 C5621 22884 1992
10027 B2109 30287 1992
10030 B3297 52496 1993
10031 C3467 59899 1993
10032 F5621 67302 1993
10033 G3467 74705 1993
and so on many rows.....
I am trying to find the empids that purchased some item in each and every year.
In other words, I want to find the empids that appear in every year in that table.
BTW, I am using Oracle 11g Express.
Thanks in advance.
You can do this with a having clause where you compare the number of distinct years for each empid to the number of distinct years in the data:
select empid
from followingtable
group by empid
having count(distinct time) = (select count(distinct time) from followingtable);
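
Here is that HAVING comparison run on the rows shown in the question. It is a sketch assuming SQLite via Python's sqlite3 module rather than Oracle, with guessed column types and only the nine sample rows.

```python
import sqlite3

# The nine sample rows from the question; SQLite stands in for Oracle here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE followingtable (ordid INTEGER, empid TEXT, itemcost INTEGER, time INTEGER);
INSERT INTO followingtable VALUES
  (10023, 'B2690', 675, 1992), (10024, 'C3467', 8078, 1992),
  (10025, 'B2690', 15481, 1992), (10026, 'C5621', 22884, 1992),
  (10027, 'B2109', 30287, 1992), (10030, 'B3297', 52496, 1993),
  (10031, 'C3467', 59899, 1993), (10032, 'F5621', 67302, 1993),
  (10033, 'G3467', 74705, 1993);
""")

# An empid qualifies when its number of distinct years equals the number of
# distinct years in the whole table (2 for this sample: 1992 and 1993).
every_year_rows = conn.execute("""
SELECT empid
FROM followingtable
GROUP BY empid
HAVING COUNT(DISTINCT time) = (SELECT COUNT(DISTINCT time) FROM followingtable)
ORDER BY empid
""").fetchall()

print(every_year_rows)  # [('C3467',)]
```

COUNT(DISTINCT time) matters here: B2690 has two orders, but both are in 1992, so it counts as one year and does not qualify.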
The query below will also work.
SELECT TAB.EMPID FROM
(
SELECT A.EMPID, COUNT(DISTINCT A.TIME) YEARCOUNT FROM MY_TABLE A GROUP BY EMPID
) TAB
WHERE TAB.YEARCOUNT = (SELECT COUNT(DISTINCT B.TIME) FROM MY_TABLE B)