Get count per year of data with begin and end dates - sql

I have a set of data that lists each employee ever employed in a certain type of department at many cities, and it lists each employee's begin and end date.
For example:
name city_id start_date end_date
-----------------------------------------
Joe Public 54 3-19-1994 9-1-2002
Suzi Que 54 10-1-1995 9-1-2005
What I want is each city's employee count for each year in a particular period. For example, if this was all the data for city 54, then I'd show this as the query results if I wanted to show city 54's employee count for the years 1990-2005:
city_id year employee_count
-----------------------------
54 1990 0
54 1991 0
54 1992 0
54 1993 0
54 1994 1
54 1995 2
54 1996 2
54 1997 2
54 1998 2
54 1999 2
54 2000 2
54 2001 2
54 2002 2
54 2003 1
54 2004 1
54 2005 1
(Note that I will have many cities, so the primary key here would be city and year unless I want to have a separate id column.)
Is there an efficient SQL query to do this? All I can think of is a series of UNIONed queries, with one query for each year I wanted to get numbers for.
My dataset has a few hundred cities and 178,000 employee records. I need to find a few decades' worth of this yearly data for each city on my dataset.

Replace <city_id> with your parameter (54 in the example above):
select
<city_id>, c.y, count(t.city_id)
from generate_series(1990, 2005) as c(y)
left outer join Table1 as t on
c.y between extract(year from t.start_date) and extract(year from t.end_date) and
t.city_id = <city_id>
group by c.y
order by c.y
sql fiddle demo
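For readers without PostgreSQL at hand, the same idea (generate the years, then left join the employees whose employment spans each year) can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name employees, the ISO date strings, and the recursive CTE standing in for generate_series are assumptions, not part of the original answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (name TEXT, city_id INTEGER, start_date TEXT, end_date TEXT);
INSERT INTO employees VALUES
    ('Joe Public', 54, '1994-03-19', '2002-09-01'),
    ('Suzi Que',   54, '1995-10-01', '2005-09-01');
""")

# The recursive CTE plays the role of generate_series(first_year, last_year).
query = """
WITH RECURSIVE years(y) AS (
    SELECT :first_year
    UNION ALL
    SELECT y + 1 FROM years WHERE y < :last_year
)
SELECT :city_id AS city_id, years.y AS year, COUNT(e.city_id) AS employee_count
FROM years
LEFT JOIN employees AS e
       ON e.city_id = :city_id
      AND years.y BETWEEN CAST(strftime('%Y', e.start_date) AS INTEGER)
                      AND CAST(strftime('%Y', e.end_date) AS INTEGER)
GROUP BY years.y
ORDER BY years.y
"""
rows = conn.execute(
    query, {"city_id": 54, "first_year": 1990, "last_year": 2005}
).fetchall()

for city_id, year, employee_count in rows:
    print(city_id, year, employee_count)
```

COUNT(e.city_id) counts only matched rows, so years with no employees come out as 0 rather than 1.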

Related

How do I include duplicates? Tried HAVING

I am trying to learn SQL, and I am doing a project based on a provided database about past Superbowls. I wrote the below code to try to return a "yes" or "no" (to show I know how to use CASE) in a new column for teams that beat their opponents by more than 14 points. It worked, in a sense, but only returned each winning team once, AKA removed duplicates for teams that have won multiple times, but I want it to return all duplicates to show all games, HELP! I tried a HAVING clause, but I didn't really know what to put...
Display which teams have beaten their opponents by >= 14 points.
I have tried the query below:
SELECT Winner, Winner_Pts, Loser, Loser_Pts,Date,
CASE
WHEN (AVG(Winner_Pts-Loser_Pts) >= 14) THEN "yes"
ELSE "no"
END as "won_by_more_than_14"
FROM superbowls
GROUP BY Winner
ORDER BY Winner_Pts DESC
In your scenario, there is no need to aggregate your data in order to find teams that have beaten their opponents by >= 14 points.
If you remove the AVG function and the GROUP BY clause, the query returns one row per game, so teams that have won the Super Bowl more than once appear once for each win. Otherwise, your CASE expression is correct.
SELECT Winner,
Winner_Pts,
Loser,
Loser_Pts,
Date,
CASE
WHEN (Winner_Pts-Loser_Pts) >= 14 THEN "Yes"
ELSE "No"
END AS "won_by_more_than_14"
FROM superbowls
ORDER BY Winner_Pts DESC
You can even use the CASE expression in a WHERE clause to SELECT only the rows for teams that won by 14 or more points.
WHERE (CASE
WHEN (Winner_Pts-Loser_Pts) >= 14 THEN "Yes"
ELSE "No"
END) = "Yes"
Input Data:
ID  Number  Winner      Winner_Pts  Loser     Loser_Pts  Date
---------------------------------------------------------------------------
1   LVI     Rams        23          Bengals   20         2022-02-13 00:00:00
2   LV      Buccaneers  31          Chiefs    9          2021-02-07 00:00:00
3   LVI     Chiefs      31          49ers     20         2022-02-02 00:00:00
4   LII     Eagles      41          Patriots  33         2018-02-04 00:00:00
5   50      Broncos     24          Panthers  10         2016-02-07 00:00:00
6   XLVIII  Seahawks    43          Denver    8          2014-02-02 00:00:00
7   XXXIV   Rams        23          Titans    16         2000-01-30 00:00:00
8   LIII    Patriots    13          Rams      3          2019-02-03 00:00:00
9   LI      Patriots    34          Falcons   28         2017-02-05 00:00:00
Output Data:
Winner      Winner_Pts  Loser     Loser_Pts  Date                 won_by_more_than_14
--------------------------------------------------------------------------------------
Seahawks    43          Denver    8          2014-02-02 00:00:00  Yes
Eagles      41          Patriots  33         2018-02-04 00:00:00  No
Patriots    34          Falcons   28         2017-02-05 00:00:00  No
Buccaneers  31          Chiefs    9          2021-02-07 00:00:00  Yes
Chiefs      31          49ers     20         2022-02-02 00:00:00  No
Broncos     24          Panthers  10         2016-02-07 00:00:00  Yes
Rams        23          Bengals   20         2022-02-13 00:00:00  No
Rams        23          Titans    16         2000-01-30 00:00:00  No
Patriots    13          Rams      3          2019-02-03 00:00:00  No
See Fiddle here.
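The per-row CASE and the optional filter described above can be tried locally with a small sqlite3 sketch. It uses the input rows from this answer (without the ID, Number and Date columns), single-quoted string literals, and a plain comparison in the WHERE clause, which should be equivalent to wrapping the same condition in a CASE. The column alias won_by_14_or_more is an illustrative name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE superbowls "
    "(Winner TEXT, Winner_Pts INTEGER, Loser TEXT, Loser_Pts INTEGER)"
)
conn.executemany(
    "INSERT INTO superbowls VALUES (?, ?, ?, ?)",
    [
        ("Rams", 23, "Bengals", 20),
        ("Buccaneers", 31, "Chiefs", 9),
        ("Chiefs", 31, "49ers", 20),
        ("Eagles", 41, "Patriots", 33),
        ("Broncos", 24, "Panthers", 10),
        ("Seahawks", 43, "Denver", 8),
        ("Rams", 23, "Titans", 16),
        ("Patriots", 13, "Rams", 3),
        ("Patriots", 34, "Falcons", 28),
    ],
)

# No GROUP BY: one output row per game, so repeat winners are kept.
per_game = conn.execute("""
    SELECT Winner, Winner_Pts, Loser, Loser_Pts,
           CASE WHEN Winner_Pts - Loser_Pts >= 14 THEN 'Yes' ELSE 'No' END
               AS won_by_14_or_more
    FROM superbowls
    ORDER BY Winner_Pts DESC
""").fetchall()

# Filtering on the margin directly instead of comparing a CASE result to 'Yes'.
big_wins = conn.execute("""
    SELECT Winner, Winner_Pts - Loser_Pts AS margin
    FROM superbowls
    WHERE Winner_Pts - Loser_Pts >= 14
    ORDER BY margin DESC
""").fetchall()

print(len(per_game), big_wins)
```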
Details:
Removing the AVG() function will get the results you want, but that doesn't mean AVG() isn't useful, especially for sports data. If you did want to aggregate your data, please see the following:
The AVG() function is used to find the average of values over records from a table. AVG() belongs to a class of functions known as aggregate functions. An aggregate function returns a single computed result over multiple rows:
Aggregate Function  Example Use Case
-------------------------------------------------------------
SUM()               Find the sum of points by team.
COUNT()             Find the number of bowls by each team.
MAX()               Find the highest point value by each team.
MIN()               Find the lowest point value by each team.
AVG()               Find the average points by team.
The SQL GROUP BY clause is used to group rows together. In most cases, a GROUP BY clause has one or more aggregate functions that calculate one or more metrics for the group.
Let's take this example here, I'm simply returning all Winners and their points:
SELECT
Winner,
Winner_Pts AS 'points'
FROM superbowls
ORDER BY Winner_Pts DESC
Winner      points
------------------
Seahawks    43
Eagles      41
Patriots    34
Buccaneers  31
Chiefs      31
Broncos     24
Rams        23
Rams        23
Patriots    13
Now let's aggregate it by Winner to find the average points:
SELECT
Winner,
ROUND(AVG(Winner_Pts)) AS 'avg_points'
FROM superbowls
GROUP BY Winner
ORDER BY ROUND(AVG(Winner_Pts)) DESC
Winner      avg_points
----------------------
Seahawks    43
Eagles      41
Buccaneers  31
Chiefs      31
Broncos     24
Patriots    24
Rams        23
As you can see between the two queries above, the Rams and Patriots only have a single row (GROUPED BY Winner), and the average is:
(23+23)/2 = 23 (Rams)
(13+34)/2 = 23.5 - Rounded to 24 (Patriots)
(41)/1 = 41 (Eagles)
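To see the listed aggregate functions side by side, here is a minimal sqlite3 sketch that computes all five per winner, using the winners and points from the input data above. The dictionary by_team is just a convenience for inspecting the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE superbowls (Winner TEXT, Winner_Pts INTEGER)")
conn.executemany(
    "INSERT INTO superbowls VALUES (?, ?)",
    [
        ("Rams", 23), ("Buccaneers", 31), ("Chiefs", 31),
        ("Eagles", 41), ("Broncos", 24), ("Seahawks", 43),
        ("Rams", 23), ("Patriots", 13), ("Patriots", 34),
    ],
)

# GROUP BY collapses the rows of each winner into a single row of metrics.
stats = conn.execute("""
    SELECT Winner,
           COUNT(*)        AS wins,
           SUM(Winner_Pts) AS total_points,
           MIN(Winner_Pts) AS lowest,
           MAX(Winner_Pts) AS highest,
           AVG(Winner_Pts) AS average
    FROM superbowls
    GROUP BY Winner
    ORDER BY Winner
""").fetchall()

by_team = {row[0]: row[1:] for row in stats}
print(by_team["Patriots"])
```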
Source.
If you want to filter on aggregated values, use HAVING. It differs from the WHERE clause because WHERE is applied before GROUP BY, which means that you can only use WHERE on “raw” rows and not on aggregated values. You need to use HAVING on aggregated metrics.
The primary use of the HAVING operation is to filter aggregated data.
You can use it when you summarize your data with GROUP BY into new
metrics, and you want to select the results based on these new values.
Example:
Find teams with average winning points greater than 40:
SELECT
Winner,
ROUND(AVG(Winner_Pts)) AS 'avg_points'
FROM superbowls
GROUP BY Winner
HAVING ROUND(AVG(Winner_Pts)) > 40
ORDER BY ROUND(AVG(Winner_Pts)) DESC
Winner    avg_points
--------------------
Seahawks  43
Eagles    41
Source.
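The difference in evaluation order between WHERE and HAVING can be made visible with a small sqlite3 sketch on the same winners. The thresholds (20 points, 2 wins) are arbitrary values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE superbowls (Winner TEXT, Winner_Pts INTEGER)")
conn.executemany(
    "INSERT INTO superbowls VALUES (?, ?)",
    [
        ("Rams", 23), ("Buccaneers", 31), ("Chiefs", 31),
        ("Eagles", 41), ("Broncos", 24), ("Seahawks", 43),
        ("Rams", 23), ("Patriots", 13), ("Patriots", 34),
    ],
)

# HAVING only: every game is grouped first, then groups are filtered.
repeat_winners = conn.execute("""
    SELECT Winner, COUNT(*) AS wins
    FROM superbowls
    GROUP BY Winner
    HAVING COUNT(*) >= 2
    ORDER BY Winner
""").fetchall()

# WHERE + HAVING: the 13-point Patriots win is dropped before grouping,
# so the Patriots no longer reach two wins.
repeat_winners_20_plus = conn.execute("""
    SELECT Winner, COUNT(*) AS wins
    FROM superbowls
    WHERE Winner_Pts >= 20
    GROUP BY Winner
    HAVING COUNT(*) >= 2
    ORDER BY Winner
""").fetchall()

print(repeat_winners, repeat_winners_20_plus)
```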

How do I retrieve the most recent record in different years with the date in a different table

I'm working with a database that isn't structured that well and need to retrieve the row with the latest month used in specific years. The main data is stored in the member table, which lists one row per member month. The date for the member month is not stored there directly; it is connected through a foreign key, Date_Key, to a Date table. The Year and Month columns can be derived from the Date_Key specified in each table. Each row in the Date table represents one new month of a year, and each of these rows has a unique sequential Date_Key.
I am using Microsoft SQL Server Management Studio as the environment.
Member Table
MemberKey  Member_ID  Date_Key
------------------------------
100        1234       89
101        1234       96
102        1234       97
103        1236       96
104        1236       97
Date Table
Date_Key  Year  Month
---------------------
89        2020  10
90        2020  11
91        2020  12
92        2021  1
93        2021  2
94        2021  3
95        2021  4
96        2021  5
97        2021  6
Looking for the following results:
Member_ID  Year  Month
----------------------
1234       2020  10
1234       2021  6
1236       2021  6
2020/11 is NOT a date. It is a year/month pair. But it seems like a simple aggregate - select year, max(month) group by year. You join and include member ID so you include that column in the GROUP BY clause to get one row per member per year.
select mbr.Member_ID, dts.Year, max(dts.Month) as Month
from dbo.Members as mbr
inner join dbo.Dates as dts on mbr.Date_Key = dts.Date_Key
group by mbr.Member_ID, dts.Year
order by mbr.Member_ID, dts.Year
;
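As a quick check of this approach against the sample data in the question, here is a sqlite3 sketch. The table names Members and Dates follow the answer's query (without the dbo schema prefix, which SQLite does not have); everything else is taken from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dates (Date_Key INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER);
CREATE TABLE Members (
    MemberKey INTEGER PRIMARY KEY,
    Member_ID INTEGER,
    Date_Key  INTEGER REFERENCES Dates(Date_Key)
);
INSERT INTO Dates VALUES
    (89, 2020, 10), (90, 2020, 11), (91, 2020, 12),
    (92, 2021, 1),  (93, 2021, 2),  (94, 2021, 3),
    (95, 2021, 4),  (96, 2021, 5),  (97, 2021, 6);
INSERT INTO Members VALUES
    (100, 1234, 89), (101, 1234, 96), (102, 1234, 97),
    (103, 1236, 96), (104, 1236, 97);
""")

# One row per member and year, keeping the latest month the member has a row for.
latest = conn.execute("""
    SELECT mbr.Member_ID, dts.Year, MAX(dts.Month) AS Month
    FROM Members AS mbr
    INNER JOIN Dates AS dts ON mbr.Date_Key = dts.Date_Key
    GROUP BY mbr.Member_ID, dts.Year
    ORDER BY mbr.Member_ID, dts.Year
""").fetchall()

print(latest)
```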

Adding rows in a table from data that is not in a column

I'm trying to create a table to add all Medals won by the participant countries in the Olympics.
I scraped the data from Wikipedia and have something similar to this:
Year  Country_Name  Host_city    Host_Country   Gold  Silver  Bronze
--------------------------------------------------------------------
1986  146           Los Angeles  United States  41    32      30
1986  67            Los Angeles  United States  12    12      12
And so on
I double-checked the data for some years, and it seems very accurate. The Country_Name has an ID because I have a Country_ID table that I created and updated the names with the ID:
Country_ID  Country_Name
------------------------
1986        1
1986        2
So far so good. Now I want to create a new table where I'll have all countries in a specific year and the total medals for that country. I managed to easily do that for countries that participated in an edition, here's an example for the 1896 edition:
INSERT INTO Cumultative_Medals_by_Year(Country_ID, Year, Culmutative_Gold, Culmutative_Silver, Culmutative_Bronze, Total_Medals)
SELECT a.Country_Name, a.Year, SUM(a.Gold) As Cumultative_Gold, SUM(a.Silver) As Cumultative_Silver, SUM(a.Bronze) As Cumultative_Bronze, SUM(a.Gold) + SUM(a.Silver) + SUM(a.Bronze) AS Total_Medals
FROM Country_Medals a
Where a.Year >= 1896 AND Year < 1900
Group By a.Country_Name, a.Year
And I'll have this table:
Country_ID  Year  Cumultative_Gold  Cumultative_Silver  Cumultative_Bronze  Total_Medals
----------------------------------------------------------------------------------------
6           1986  2                 0                   0                   5
7           1986  2                 1                   2                   5
35          1986  1                 2                   3                   6
46          1986  5                 4                   2                   11
49          1986  6                 5                   2                   13
51          1986  2                 3                   2                   7
52          1986  10                18                  19                  47
58          1986  2                 1                   3                   6
85          1986  1                 0                   1                   2
131         1986  1                 2                   0                   3
146         1986  11                7                   2                   20
To add the other editions I just have to edit the dates, "Where a.Year >= 1900 AND Year < 1904", for example.
INSERT INTO Cumultative_Medals_by_Year(Country_ID, Year, Culmutative_Gold, Culmutative_Silver, Culmutative_Bronze, Total_Medals)
SELECT a.Country_Name, a.Year, SUM(a.Gold) As Cumultative_Gold, SUM(a.Silver) As Cumultative_Silver, SUM(a.Bronze) As Cumultative_Bronze, SUM(a.Gold) + SUM(a.Silver) + SUM(a.Bronze) AS Total_Medals
FROM Country_Medals a
Where a.Year >= 1900 AND Year < 1904
Group By a.Country_Name, a.Year
And the table will grow.
But I'd like to also add all the other countries for the year 1896. This way I'll have a full record of all countries. So for example, you see that Country 1 has no medals in the 1896 Olympic edition, but I'd like to also add it there, even if the sum becomes NULL (where I'll update with a 0).
Why do I want that? I'd like to do an Animated Bar Chart Race, and with the data I have, some countries go "away" from the race. For example, the US didn't participate in the 1980 Olympics, so for a brief moment the bar for the US in the chart goes away, just to return in 1984 (when it participated again). Another example is the Soviet Union: even though it does not participate anymore, it's the participant with the second most medals won (only behind the US), but as the country has no participation after 1988, the bar just goes away after that year. Keeping a record of medals for all countries in all editions would prevent that from happening.
I'm pretty sure there are lots of countries that have won medals that were not around in 1896. But if you want a row for every country and every year, then generate the rows you want using cross join. Then join in the available information:
select c.Country_Name, y.Year,
SUM(cm.Gold) As Cumulative_Gold,
SUM(cm.Silver) As Cumulative_Silver,
SUM(cm.Bronze) As Cumulative_Bronze,
COALESCE(SUM(cm.Gold), 0) + COALESCE(SUM(cm.Silver), 0) + COALESCE(SUM(cm.Bronze), 0) AS Total_Medals
from (select distinct year from Country_Medals) y cross join
(select distinct country_name from country_medals) c left join
country_medals cm
on cm.year = y.year and
cm.country_name = c.country_name
group By c.Country_Name, y.Year
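A minimal sqlite3 sketch of this cross join pattern follows. The three medal rows are made-up values for illustration only (they are not taken from the scraped data), and the sketch wraps every sum in COALESCE so that missing country/year combinations show 0 instead of NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country_Medals (
    Year INTEGER, Country_Name INTEGER, Gold INTEGER, Silver INTEGER, Bronze INTEGER
);
-- Hypothetical sample rows: country 1 has no entry for 1896.
INSERT INTO Country_Medals VALUES
    (1896, 146, 11, 7, 2),
    (1900, 146, 19, 14, 14),
    (1900, 1,   1,  0, 0);
""")

# Build every (year, country) pair first, then attach whatever medals exist.
rows = conn.execute("""
    SELECT c.Country_Name, y.Year,
           COALESCE(SUM(cm.Gold), 0)   AS Gold,
           COALESCE(SUM(cm.Silver), 0) AS Silver,
           COALESCE(SUM(cm.Bronze), 0) AS Bronze,
           COALESCE(SUM(cm.Gold), 0) + COALESCE(SUM(cm.Silver), 0)
               + COALESCE(SUM(cm.Bronze), 0) AS Total_Medals
    FROM (SELECT DISTINCT Year FROM Country_Medals) AS y
    CROSS JOIN (SELECT DISTINCT Country_Name FROM Country_Medals) AS c
    LEFT JOIN Country_Medals AS cm
           ON cm.Year = y.Year AND cm.Country_Name = c.Country_Name
    GROUP BY c.Country_Name, y.Year
    ORDER BY y.Year, c.Country_Name
""").fetchall()

for row in rows:
    print(row)
```

The row for country 1 in 1896 exists only because of the cross join; a plain GROUP BY on Country_Medals would skip it.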

Replace Id of one column by a name from another table while using the count statement?

I am trying to get the count of patients by province for my school project. I have managed to get the count and the Id of the province in a table, but since I am using the count statement it will not let me use a join to show the ProvinceName instead of the Id (it says it's not numerical).
Here is the schema of the two tables I am talking about
The content of the Province table is as follows:
ProvinceId  ProvinceName               ProvinceShortName
--------------------------------------------------------
1           Terre-Neuve-et-Labrador    NL
2           Île-du-Prince-Édouard      PE
3           Nouvelle-Écosse            NS
4           Nouveau-Brunswick          NB
5           Québec                     QC
6           Ontario                    ON
7           Manitoba                   MB
8           Saskatchewan               SK
9           Alberta                    AB
10          Colombie-Britannique       BC
11          Yukon                      YT
12          Territoires du Nord-Ouest  NT
13          Nunavut                    NU
And here is some sample data from the Patient table (don't worry, it's fake data!):
SS  FirstName  LastName  InsuranceNumber  InsuranceProvince  DateOfBirth  Sex  PhoneNumber
-------------------------------------------------------------------------------------------
2   Doris      Patel     PATD778276       5                  1977-08-02   F    514-754-6488
3   Judith     Doe       DOEJ7712917      5                  1977-12-09   F    418-267-2263
4   Rosemary   Barrett   BARR05122566     6                  2005-12-25   F    905-638-5062
5   Cody       Kennedy   KENC047167       10                 2004-07-01   M    604-833-7712
I managed to get the patient count by province using the following statement:
select count(SS),InsuranceProvince
from Patient
full JOIN Province ON Patient.InsuranceProvince = Province.ProvinceId
group by InsuranceProvince
which gives me the following table:
PatientCount  InsuranceProvince
-------------------------------
13            1
33            2
54            3
4             4
608           5
1778          6
25            7
209           8
547           9
649           10
6             11
35            12
24            13
How can I replace the id's with the correct ProvinceShortName to get the following final result?
ProvinceName  PatientCount
--------------------------
NL            13
PE            33
NS            54
NB            4
QC            608
ON            1778
MB            25
SK            209
AB            547
BC            649
YT            6
NT            35
NU            24
Thanks in advance!
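So you can actually just specify that in the select. Note that it's best practise to include the thing you group by in the select, but since your question is so specific then... (Engines that enforce standard grouping rules, such as SQL Server, will reject this query unless ProvinceShortName is also listed in the GROUP BY.)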
SELECT ProvinceShortName, COUNT(SS) AS PatientsInProvince
FROM Patient
JOIN Province ON Patient.InsuranceProvince=Province.ProvinceId
GROUP BY InsuranceProvince;
I would suggest:
select pr.ProvinceShortName, count(*)
from Patient p join
Province pr
on p.InsuranceProvince = pr.ProvinceId
group by pr.ProvinceShortName
order by min(pr.ProvinceId);
Notes:
The key is including the columns you want in the select and group by.
You seem to want the results in province number order, so I included an order by.
There is no need to count the non-NULL values of SS. You might as well use count(*).
Table aliases make the query easier to write and to read.
I assume that you need to show the patient count by province. Counting Patient.SS rather than a constant keeps provinces without any patients at 0 instead of 1:
SELECT
Province.ProvinceShortName AS [ProvinceName]
,COUNT(Patient.SS) as [PatientCount]
FROM Patient
RIGHT JOIN Province ON Patient.InsuranceProvince = Province.ProvinceId
GROUP BY ProvinceShortName
Just alter your query to:
select ProvinceShortName As ProvinceName,count(InsuranceProvince) As PatientCount
from Patient
full JOIN Province ON Patient.InsuranceProvince = Province.ProvinceId
group by ProvinceShortName
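The answers above differ mainly in the join type and in what is counted. A small sqlite3 sketch using the four sample patients from the question shows the effect; Nunavut is included as a province with no sample patients, and the LEFT JOIN from Province is used here as a stand-in for the RIGHT JOIN from Patient, an assumption made so the sketch runs on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Province (ProvinceId INTEGER PRIMARY KEY, ProvinceShortName TEXT);
CREATE TABLE Patient (SS INTEGER PRIMARY KEY, FirstName TEXT, InsuranceProvince INTEGER);
INSERT INTO Province VALUES (5, 'QC'), (6, 'ON'), (10, 'BC'), (13, 'NU');
INSERT INTO Patient VALUES
    (2, 'Doris', 5), (3, 'Judith', 5), (4, 'Rosemary', 6), (5, 'Cody', 10);
""")

# Inner join: only provinces that have at least one patient appear.
with_patients = conn.execute("""
    SELECT pr.ProvinceShortName, COUNT(*) AS PatientCount
    FROM Patient AS p
    JOIN Province AS pr ON p.InsuranceProvince = pr.ProvinceId
    GROUP BY pr.ProvinceShortName
    ORDER BY MIN(pr.ProvinceId)
""").fetchall()

# Outer join from Province: COUNT(p.SS) ignores the NULLs of unmatched
# provinces, so they are reported with a count of 0.
all_provinces = conn.execute("""
    SELECT pr.ProvinceShortName, COUNT(p.SS) AS PatientCount
    FROM Province AS pr
    LEFT JOIN Patient AS p ON p.InsuranceProvince = pr.ProvinceId
    GROUP BY pr.ProvinceId, pr.ProvinceShortName
    ORDER BY pr.ProvinceId
""").fetchall()

print(with_patients, all_provinces)
```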

MS Access selecting by year intervals

I have a table, where every row has its own date (year of purchase), I should select the purchases grouped into year intervals.
Example:
Zetor 1993
Zetor 1993
JOHN DEERE 2001
JOHN DEERE 2001
JOHN DEERE 2001
This means I have 2 Zetor purchases in 1993 and 3 JOHN DEERE purchases in 2001. I need to select the count of the purchases grouped into these year intervals:
<=1959
1960-1969
1970-1979
1980-1989
1990-1994
1995-1999
2000-2004
2004-2009
2010-2013
I have no idea how should I do this.
The result should look like this on the example above:
<=1959
1960-1969 0
1970-1979 0
1980-1989 0
1990-1994 2
1995-1999 0
2000-2004 3
2004-2009 0
2010-2013 0
Create a table with the intervals:
tblRanges([RangeName],[Begins],[Ends])
Populate it with your intervals.
Use GROUP BY with your table tblPurchases([Item],[YearOfDeal]):
SELECT tblRanges.RangeName, Count(tblPurchases.YearOfDeal)
FROM tblRanges INNER JOIN tblPurchases ON (tblRanges.Begins <= tblPurchases.YearOfDeal) AND (tblRanges.Ends >= tblPurchases.YearOfDeal)
GROUP BY tblRanges.RangeName;
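The range-table idea can be tried outside Access with a sqlite3 sketch. Two things here are assumptions rather than part of the answer: the second-to-last range is taken to start at 2005 so that the ranges do not overlap, and a LEFT JOIN is used in place of the INNER JOIN so that empty ranges are listed with a count of 0, as in the desired result. Whether Access accepts the same outer join with these conditions has not been checked here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblRanges (RangeName TEXT, Begins INTEGER, Ends INTEGER);
CREATE TABLE tblPurchases (Item TEXT, YearOfDeal INTEGER);
INSERT INTO tblRanges VALUES
    ('<=1959',    0,    1959),
    ('1960-1969', 1960, 1969),
    ('1970-1979', 1970, 1979),
    ('1980-1989', 1980, 1989),
    ('1990-1994', 1990, 1994),
    ('1995-1999', 1995, 1999),
    ('2000-2004', 2000, 2004),
    ('2005-2009', 2005, 2009),  -- assumed start year, see note above
    ('2010-2013', 2010, 2013);
INSERT INTO tblPurchases VALUES
    ('Zetor', 1993), ('Zetor', 1993),
    ('JOHN DEERE', 2001), ('JOHN DEERE', 2001), ('JOHN DEERE', 2001);
""")

# Each purchase matches the one range whose bounds contain its year;
# ranges without purchases survive the LEFT JOIN with a count of 0.
counts = conn.execute("""
    SELECT r.RangeName, COUNT(p.YearOfDeal) AS Purchases
    FROM tblRanges AS r
    LEFT JOIN tblPurchases AS p
           ON r.Begins <= p.YearOfDeal AND r.Ends >= p.YearOfDeal
    GROUP BY r.RangeName, r.Begins
    ORDER BY r.Begins
""").fetchall()

for name, purchases in counts:
    print(name, purchases)
```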
You may wish to consider Partition for future use:
SELECT Partition([Year],1960,2014,10) AS [Group], Count(Stock.Year) AS CountOfYear
FROM Stock
GROUP BY Partition([Year],1960,2014,10)
Input:
Tractor Year
Zetor 1993
Zetor 1993
JOHN DEERE 2001
JOHN DEERE 2001
JOHN DEERE 2001
Pre 59 1945
1960 1960
Result:
Group CountOfYear
:1959 1
1960:1969 1
1990:1999 2
2000:2009 3
Reference: http://office.microsoft.com/en-ie/access-help/partition-function-HA001228892.aspx
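To make the bucketing in the Partition example easier to follow, here is a rough Python emulation applied to the same input years. The helper partition_label is hypothetical and only mimics the lower:upper labels shown in the result above; Access also pads its labels with spaces, which this sketch ignores.

```python
from collections import Counter


def partition_label(number, start, stop, interval):
    """Return a 'lower:upper' label similar to Partition(number, start, stop, interval)."""
    if number < start:
        return f":{start - 1}"   # everything below the first range
    if number > stop:
        return f"{stop + 1}:"    # everything above the last range
    lower = start + ((number - start) // interval) * interval
    upper = min(lower + interval - 1, stop)  # the last range is cut off at stop
    return f"{lower}:{upper}"


years = [1993, 1993, 2001, 2001, 2001, 1945, 1960]
counts = Counter(partition_label(year, 1960, 2014, 10) for year in years)

for label, count in sorted(counts.items()):
    print(label, count)
```

Note that only non-empty groups appear, just as in the Access result: a GROUP BY on the computed label cannot produce rows for ranges that contain no data.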