How do I include duplicates? I tried HAVING (SQL)

I am learning SQL and doing a project based on a provided database of past Super Bowls. I wrote the code below to return "yes" or "no" (to show I know how to use CASE) in a new column for teams that beat their opponents by 14 or more points. It worked, in a sense, but it returned each winning team only once, i.e. it removed the duplicates for teams that have won multiple times. I want it to return all the duplicates so that every game is shown. HELP! I tried a HAVING clause, but I didn't really know what to put...
Task: display which teams have beaten their opponents by >= 14 points.
I have tried the query below:
SELECT Winner, Winner_Pts, Loser, Loser_Pts,Date,
CASE
WHEN (AVG(Winner_Pts-Loser_Pts) >= 14) THEN "yes"
ELSE "no"
END as "won_by_more_than_14"
FROM superbowls
GROUP BY Winner
ORDER BY Winner_Pts DESC

In your scenario, there is no need to aggregate your data in order to find teams that have beaten their opponents by >= 14 points.
If you remove the AVG function and the GROUP BY clause, the query returns one row per game, so teams that have won the Super Bowl more than once appear once for each win. Otherwise, your CASE expression is correct.
SELECT Winner,
       Winner_Pts,
       Loser,
       Loser_Pts,
       Date,
       -- no GROUP BY: evaluated once per game, so repeat winners keep every row
       CASE
           WHEN (Winner_Pts - Loser_Pts) >= 14 THEN 'Yes'
           ELSE 'No'
       END AS won_by_more_than_14
FROM superbowls
ORDER BY Winner_Pts DESC
You can even put the CASE expression in a WHERE clause to SELECT only the rows for teams that won by 14 or more points (a plain WHERE (Winner_Pts - Loser_Pts) >= 14 does the same thing more simply):
WHERE (CASE
           WHEN (Winner_Pts - Loser_Pts) >= 14 THEN 'Yes'
           ELSE 'No'
       END) = 'Yes'
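A quick way to check the difference is to run both forms against a copy of the sample rows. The sketch below is an illustration under assumptions, not the original fiddle: it uses Python's built-in sqlite3 module with an in-memory table holding the nine games from the Input Data (ID and Number dropped, time of day dropped from Date). It flags every game with CASE, filters with a plain WHERE on the margin, and shows that GROUP BY Winner leaves only seven rows.

```python
import sqlite3

# The nine sample games (ID/Number omitted; Date without the time part).
GAMES = [
    ("Rams", 23, "Bengals", 20, "2022-02-13"),
    ("Buccaneers", 31, "Chiefs", 9, "2021-02-07"),
    ("Chiefs", 31, "49ers", 20, "2022-02-02"),
    ("Eagles", 41, "Patriots", 33, "2018-02-04"),
    ("Broncos", 24, "Panthers", 10, "2016-02-07"),
    ("Seahawks", 43, "Denver", 8, "2014-02-02"),
    ("Rams", 23, "Titans", 16, "2000-01-30"),
    ("Patriots", 13, "Rams", 3, "2019-02-03"),
    ("Patriots", 34, "Falcons", 28, "2017-02-05"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE superbowls "
    "(Winner TEXT, Winner_Pts INTEGER, Loser TEXT, Loser_Pts INTEGER, Date TEXT)"
)
conn.executemany("INSERT INTO superbowls VALUES (?, ?, ?, ?, ?)", GAMES)

# No GROUP BY: the CASE runs once per game, so repeat winners keep every row.
flagged = conn.execute(
    """
    SELECT Winner, Loser,
           CASE WHEN (Winner_Pts - Loser_Pts) >= 14 THEN 'Yes' ELSE 'No' END
             AS won_by_more_than_14
    FROM superbowls
    ORDER BY Winner_Pts DESC
    """
).fetchall()

# Filtering on the margin directly keeps only the 'Yes' games.
big_wins = conn.execute(
    "SELECT Winner, Loser FROM superbowls "
    "WHERE (Winner_Pts - Loser_Pts) >= 14 ORDER BY Winner_Pts DESC"
).fetchall()

# GROUP BY Winner, as in the original query, collapses the rows per team.
team_count = conn.execute(
    "SELECT COUNT(*) FROM (SELECT Winner FROM superbowls GROUP BY Winner)"
).fetchone()[0]

print(len(flagged), team_count)  # 9 7
print(big_wins)
```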
Input Data:
ID  Number  Winner      Winner_Pts  Loser     Loser_Pts  Date
1   LVI     Rams        23          Bengals   20         2022-02-13 00:00:00
2   LV      Buccaneers  31          Chiefs    9          2021-02-07 00:00:00
3   LVI     Chiefs      31          49ers     20         2022-02-02 00:00:00
4   LII     Eagles      41          Patriots  33         2018-02-04 00:00:00
5   50      Broncos     24          Panthers  10         2016-02-07 00:00:00
6   XLVIII  Seahawks    43          Denver    8          2014-02-02 00:00:00
7   XXXIV   Rams        23          Titans    16         2000-01-30 00:00:00
8   LIII    Patriots    13          Rams      3          2019-02-03 00:00:00
9   LI      Patriots    34          Falcons   28         2017-02-05 00:00:00
Output Data:
Winner      Winner_Pts  Loser     Loser_Pts  Date                 won_by_more_than_14
Seahawks    43          Denver    8          2014-02-02 00:00:00  Yes
Eagles      41          Patriots  33         2018-02-04 00:00:00  No
Patriots    34          Falcons   28         2017-02-05 00:00:00  No
Buccaneers  31          Chiefs    9          2021-02-07 00:00:00  Yes
Chiefs      31          49ers     20         2022-02-02 00:00:00  No
Broncos     24          Panthers  10         2016-02-07 00:00:00  Yes
Rams        23          Bengals   20         2022-02-13 00:00:00  No
Rams        23          Titans    16         2000-01-30 00:00:00  No
Patriots    13          Rams      3          2019-02-03 00:00:00  No
See Fiddle here.
Details:
Removing the AVG() function will get the results you want, but that doesn't mean AVG() isn't useful, especially for sports data. If you did want to aggregate your data, please see the following:
The AVG() function is used to find the average of values over records from a table. AVG() belongs to a class of functions known as aggregate functions. An aggregate function returns a single computed result over multiple rows:
Aggregate Function  Example Use Case
SUM()               Find the sum of points by team.
COUNT()             Find the number of bowls by each team.
MAX()               Find the highest point value by each team.
MIN()               Find the lowest point value by each team.
AVG()               Find the average points by team.
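To see all five functions side by side, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory table holding only five of the sample wins (the Rams, Patriots and Eagles rows); the column aliases are made up for illustration. Each aggregate collapses a team's games into a single value, and SQLite's AVG() always returns a floating-point number.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE superbowls (Winner TEXT, Winner_Pts INTEGER)")
conn.executemany(
    "INSERT INTO superbowls VALUES (?, ?)",
    [("Rams", 23), ("Rams", 23), ("Patriots", 13), ("Patriots", 34), ("Eagles", 41)],
)

# One output row per team; every aggregate summarizes that team's rows.
stats = conn.execute(
    """
    SELECT Winner,
           SUM(Winner_Pts) AS total_pts,
           COUNT(*)        AS bowls_won,
           MAX(Winner_Pts) AS best,
           MIN(Winner_Pts) AS worst,
           AVG(Winner_Pts) AS avg_pts
    FROM superbowls
    GROUP BY Winner
    ORDER BY Winner
    """
).fetchall()

for row in stats:
    print(row)  # e.g. ('Patriots', 47, 2, 34, 13, 23.5)
```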
The SQL GROUP BY clause groups rows that share the same values in the listed columns into a single row. In most cases, a GROUP BY query uses one or more aggregate functions to calculate metrics for each group.
Take this example, where I'm simply returning all winners and their points:
SELECT
Winner,
Winner_Pts AS 'points'
FROM superbowls
ORDER BY Winner_Pts DESC
Winner      points
Seahawks    43
Eagles      41
Patriots    34
Buccaneers  31
Chiefs      31
Broncos     24
Rams        23
Rams        23
Patriots    13
Now let's aggregate it by Winner to find the average points:
SELECT
Winner,
ROUND(AVG(Winner_Pts)) AS 'avg_points'
FROM superbowls
GROUP BY Winner
ORDER BY ROUND(AVG(Winner_Pts)) DESC
Winner      avg_points
Seahawks    43
Eagles      41
Buccaneers  31
Chiefs      31
Broncos     24
Patriots    24
Rams        23
Comparing the two queries above, the Rams and Patriots now have only a single row each (grouped by Winner), and their averages are:
(23+23)/2 = 23 (Rams)
(13+34)/2 = 23.5, rounded to 24 (Patriots)
(41)/1 = 41 (Eagles)
Source.
If you want to filter grouped data, use HAVING. It differs from WHERE because GROUP BY runs after WHERE, which means you can use WHERE only on “raw” rows, not on aggregated values. To filter on aggregated metrics, you need HAVING.
The primary use of the HAVING operation is to filter aggregated data. You can use it when you summarize your data with GROUP BY into new metrics, and you want to select the results based on these new values.
Example:
Find teams with average winning points greater than 40:
SELECT
Winner,
ROUND(AVG(Winner_Pts)) AS 'avg_points'
FROM superbowls
GROUP BY Winner
HAVING ROUND(AVG(Winner_Pts)) > 40
ORDER BY ROUND(AVG(Winner_Pts)) DESC
Winner      avg_points
Seahawks    43
Eagles      41
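The order of evaluation is easier to see with both clauses in one place. A minimal sketch with Python's built-in sqlite3 module, assuming the winners, points and dates from the sample data: the first query repeats the HAVING example, and the second uses WHERE to drop games before 2015 before grouping, then HAVING to keep only teams with more than one remaining win. SQLite's ROUND() returns a float, so 43 comes back as 43.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE superbowls (Winner TEXT, Winner_Pts INTEGER, Date TEXT)")
conn.executemany(
    "INSERT INTO superbowls VALUES (?, ?, ?)",
    [
        ("Rams", 23, "2022-02-13"), ("Buccaneers", 31, "2021-02-07"),
        ("Chiefs", 31, "2022-02-02"), ("Eagles", 41, "2018-02-04"),
        ("Broncos", 24, "2016-02-07"), ("Seahawks", 43, "2014-02-02"),
        ("Rams", 23, "2000-01-30"), ("Patriots", 13, "2019-02-03"),
        ("Patriots", 34, "2017-02-05"),
    ],
)

# HAVING filters whole groups after GROUP BY has computed each average.
over_40 = conn.execute(
    """
    SELECT Winner, ROUND(AVG(Winner_Pts)) AS avg_points
    FROM superbowls
    GROUP BY Winner
    HAVING ROUND(AVG(Winner_Pts)) > 40
    ORDER BY avg_points DESC
    """
).fetchall()

# WHERE filters raw rows first: only games from 2015 onward reach GROUP BY,
# so the Rams are left with one game and drop out of HAVING COUNT(*) > 1.
since_2015 = conn.execute(
    """
    SELECT Winner, ROUND(AVG(Winner_Pts)) AS avg_points
    FROM superbowls
    WHERE Date >= '2015-01-01'
    GROUP BY Winner
    HAVING COUNT(*) > 1
    """
).fetchall()

print(over_40)     # [('Seahawks', 43.0), ('Eagles', 41.0)]
print(since_2015)  # [('Patriots', 24.0)]
```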
Source.

Related

Count number of records by month over the last five years where record date > selected month

I need to show the number of valid inspectors we have by month over the last five years. Inspectors are considered valid when the expiration date on their certification has not yet passed, recorded as the month-end date. The SQL below is the query that counts valid inspectors for January 2017:
SELECT Count(*) AS RecordCount
FROM dbo_Insp_Type
WHERE dbo_Insp_Type.CERT_EXP_DTE >= #2/1/2017#;
Rather than designing 60 queries, one for each month, and compiling the results in a final table (or, err, query), are there other methods I can use that require less manual input?
From this sample:
Id  CERT_EXP_DTE
1   2022-01-15
2   2022-01-23
3   2022-02-01
4   2022-02-03
5   2022-05-01
6   2022-06-06
7   2022-06-07
8   2022-07-21
9   2022-02-20
10  2021-11-05
11  2021-12-01
12  2021-12-24
this single query:
SELECT
Format([CERT_EXP_DTE],"yyyy/mm") AS YearMonth,
Count(*) AS AllInspectors,
Sum(Abs([CERT_EXP_DTE] >= DateSerial(Year([CERT_EXP_DTE]), Month([CERT_EXP_DTE]), 2))) AS ValidInspectors
FROM
dbo_Insp_Type
GROUP BY
Format([CERT_EXP_DTE],"yyyy/mm");
will return:
YearMonth  AllInspectors  ValidInspectors
2021-11    1              1
2021-12    2              1
2022-01    2              2
2022-02    3              2
2022-05    1              0
2022-06    2              2
2022-07    1              1
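For readers without Access, here is a minimal sketch of the same grouping in SQLite through Python's built-in sqlite3 module. It is a translation, not the answer's Access SQL: strftime('%Y-%m', ...) stands in for Format(..., "yyyy/mm"), date(x, 'start of month', '+1 day') stands in for DateSerial(Year, Month, 2), and SUM(CASE ...) replaces Sum(Abs(...)) (Access evaluates True as -1, hence the Abs). The dates are the twelve CERT_EXP_DTE values from the sample above, stored as ISO strings.

```python
import sqlite3

CERT_EXP_DATES = [
    "2022-01-15", "2022-01-23", "2022-02-01", "2022-02-03", "2022-05-01",
    "2022-06-06", "2022-06-07", "2022-07-21", "2022-02-20", "2021-11-05",
    "2021-12-01", "2021-12-24",
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbo_Insp_Type (Id INTEGER PRIMARY KEY, CERT_EXP_DTE TEXT)")
conn.executemany(
    "INSERT INTO dbo_Insp_Type (CERT_EXP_DTE) VALUES (?)",
    [(d,) for d in CERT_EXP_DATES],
)

monthly = conn.execute(
    """
    SELECT strftime('%Y-%m', CERT_EXP_DTE) AS YearMonth,
           COUNT(*) AS AllInspectors,
           -- 1 when the date falls on or after the 2nd of its own month
           SUM(CASE WHEN CERT_EXP_DTE >= date(CERT_EXP_DTE, 'start of month', '+1 day')
                    THEN 1 ELSE 0 END) AS ValidInspectors
    FROM dbo_Insp_Type
    GROUP BY YearMonth
    ORDER BY YearMonth
    """
).fetchall()

for row in monthly:
    print(row)
```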
ID  Cert_Iss_Dte  Cert_Exp_Dte
1   1/15/2020     1/15/2022
2   1/23/2020     1/23/2022
3   2/1/2020      2/1/2022
4   2/3/2020      2/3/2022
5   5/1/2020      5/1/2022
6   6/6/2020      6/6/2022
7   6/7/2020      6/7/2022
8   7/21/2020     7/21/2022
9   2/20/2020     2/20/2022
10  11/5/2021     11/5/2023
11  12/1/2021     12/1/2023
12  12/24/2021    12/24/2023
A UNION query could calculate a record for each of 50 months but since you want 60, UNION is out.
Or a query with 60 calculated fields that use IIf() and Count(), referencing a textbox on a form for the start date:
SELECT Count(IIf(CERT_EXP_DTE>=Forms!formname!tbxDate,1,Null)) AS Dt1,
Count(IIf(CERT_EXP_DTE>=DateAdd("m",1,Forms!formname!tbxDate),1,Null)) AS Dt2,
...
FROM dbo_Insp_Type
Using the data above, the following is the output for Feb and Mar 2022. I did a test with Cert_Iss_Dte included in the criteria, and it did not make a difference for this sample data.
Dt1  Dt2
10   7
Or a report with 60 textboxes, each calling a DCount() expression with the same criteria as used in the query.
Or a VBA procedure that writes data to a 'temp' table.
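Another option, not used in the answers above, is to generate the list of monthly cutoff dates with a recursive CTE and count against each one, so a single query covers any number of months. A minimal sketch in SQLite through Python's built-in sqlite3 module, assuming ISO date strings, the twelve Cert_Exp_Dte values from the second sample table, and a hypothetical helper `valid_counts(conn, first_cutoff, months)`; like the question's #2/1/2017#, a certificate counts when it expires on or after the cutoff. Access SQL does not support WITH clauses, so there the cutoff dates would have to come from a small table instead.

```python
import sqlite3

CERT_EXP_DATES = [
    "2022-01-15", "2022-01-23", "2022-02-01", "2022-02-03", "2022-05-01",
    "2022-06-06", "2022-06-07", "2022-07-21", "2022-02-20", "2023-11-05",
    "2023-12-01", "2023-12-24",
]


def valid_counts(conn, first_cutoff, months):
    """Hypothetical helper: count rows with CERT_EXP_DTE >= each monthly cutoff.

    Cutoffs start at `first_cutoff` (use the 1st of a month, e.g. '2022-02-01')
    and advance one month at a time, `months` times.
    """
    return conn.execute(
        """
        WITH RECURSIVE cutoffs(d, n) AS (
            SELECT date(?), 1
            UNION ALL
            SELECT date(d, '+1 month'), n + 1 FROM cutoffs WHERE n < ?
        )
        SELECT c.d AS Cutoff,
               COUNT(t.CERT_EXP_DTE) AS ValidInspectors  -- 0 when nothing matches
        FROM cutoffs AS c
        LEFT JOIN dbo_Insp_Type AS t ON t.CERT_EXP_DTE >= c.d
        GROUP BY c.d
        ORDER BY c.d
        """,
        (first_cutoff, months),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dbo_Insp_Type (CERT_EXP_DTE TEXT)")
conn.executemany("INSERT INTO dbo_Insp_Type VALUES (?)", [(d,) for d in CERT_EXP_DATES])

print(valid_counts(conn, "2022-02-01", 2))
five_years = valid_counts(conn, "2017-02-01", 60)  # one row per month
```

The same call with 60 months replaces the 60 hand-written queries; only the first cutoff changes when the reporting window moves.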

How can I group and get an MS Access query to show only rows with a maximum value in a specified field for a consecutive number of times?

I have a large access table that I need to pull specific data from with a query.
I need to get a list of all the IDs that meet a specific criterion, i.e. 3 months in a row with a cage number less than 50.
The SQL code I'm currently working with is below, but it only gives me which months of the past 3 had a cage number below 50.
SELECT [AbBehWeeklyMonitor Database].AnimalID, [AbBehWeeklyMonitor Database].Date, [AbBehWeeklyMonitor Database].Cage
FROM [AbBehWeeklyMonitor Database]
WHERE ((([AbBehWeeklyMonitor Database].Date)>=DateAdd("m",-3,Date())) AND (([AbBehWeeklyMonitor Database].Cage)<50))
ORDER BY [AbBehWeeklyMonitor Database].AnimalID DESC;
I would need it to look at the past 3 months for each ID, and only output if all 3 met the specific criteria, but I'm not sure where to go from here.
Any help would be appreciated.
Data Sample:
Date       AnimalID  Cage
6/28/2022  12345     50
5/19/2021  12345     32
3/20/2008  12345     75
5/20/2022  23569     4
8/20/2022  23569     4
5/20/2022  44444     71
8/1/2012   44444     4
4/1/2022   78986     30
1/20/2022  78986     1
9/14/2022  65659     59
8/10/2022  65659     48
7/14/2022  65659     30
6/14/2022  65659     12
8/14/2022  91111     51
7/14/2022  91111     5
6/14/2022  91111     90
8/14/2022  88888     4
7/14/2022  88888     5
6/14/2022  88888     15
Consider:
Query1:
SELECT AnimalID, Count(*) AS Cnt
FROM Table1
WHERE (((Cage)<50) AND (([Date]) Between #6/1/2022# And #8/31/2022#))
GROUP BY AnimalID
HAVING (((Count(*))=3));
Query2:
SELECT Table1.*
FROM Query1 INNER JOIN Table1 ON Query1.AnimalID = Table1.AnimalID
WHERE ((([Date]) Between #6/1/2022# And #8/31/2022#));
Output:
Date AnimalID Cage
6/14/2022 65659 12
7/14/2022 65659 30
8/10/2022 65659 48
6/14/2022 88888 15
7/14/2022 88888 5
8/14/2022 88888 4
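The two-step approach can be checked outside Access. A minimal sketch with Python's built-in sqlite3 module, assuming ISO date strings, the June to August 2022 window used above, and the sample data with the 6/14/2022 row belonging to animal 65659 (as the output implies); Query1 is folded in as a subquery rather than a saved query. Note that COUNT(*) = 3 checks for three qualifying rows in the window, not three distinct months, so it relies on one reading per animal per month.

```python
import sqlite3

ROWS = [
    ("2022-06-28", 12345, 50), ("2021-05-19", 12345, 32), ("2008-03-20", 12345, 75),
    ("2022-05-20", 23569, 4), ("2022-08-20", 23569, 4),
    ("2022-05-20", 44444, 71), ("2012-08-01", 44444, 4),
    ("2022-04-01", 78986, 30), ("2022-01-20", 78986, 1),
    ("2022-09-14", 65659, 59), ("2022-08-10", 65659, 48),
    ("2022-07-14", 65659, 30), ("2022-06-14", 65659, 12),
    ("2022-08-14", 91111, 51), ("2022-07-14", 91111, 5), ("2022-06-14", 91111, 90),
    ("2022-08-14", 88888, 4), ("2022-07-14", 88888, 5), ("2022-06-14", 88888, 15),
]

conn = sqlite3.connect(":memory:")
# "Date" is quoted as an identifier; the answer notes it is a reserved word in Access.
conn.execute('CREATE TABLE Table1 ("Date" TEXT, AnimalID INTEGER, Cage INTEGER)')
conn.executemany("INSERT INTO Table1 VALUES (?, ?, ?)", ROWS)

qualifying = conn.execute(
    """
    SELECT t."Date", t.AnimalID, t.Cage
    FROM Table1 AS t
    JOIN (
        -- Query1: animals with three rows under 50 inside the window
        SELECT AnimalID
        FROM Table1
        WHERE Cage < 50 AND "Date" BETWEEN '2022-06-01' AND '2022-08-31'
        GROUP BY AnimalID
        HAVING COUNT(*) = 3
    ) AS q ON q.AnimalID = t.AnimalID
    -- Query2: every row for those animals inside the same window
    WHERE t."Date" BETWEEN '2022-06-01' AND '2022-08-31'
    ORDER BY t.AnimalID, t."Date"
    """
).fetchall()

for row in qualifying:
    print(row)
```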
Date is a reserved word; you really should not use reserved words as names.

Google BigQuery select max for each day

I have a problem with my BigQuery SELECT: I need to get the max value of the column (students) for each day.
SELECT EXTRACT(DATE FROM timestamp) as date, ARRAY_LENGTH(student_ids) as students from analytics.daily_active_students_count order by timestamp desc
Row  Date        Students
1    2022-05-16  72
2    2022-05-16  33
3    2022-05-16  12
4    2022-05-15  10
5    2022-05-15  84
6    2022-05-15  8
7    2022-05-14  92
8    2022-05-14  105
9    2022-05-14  12
The query should remove duplicate rows for each day and keep only the row with the maximum number of students.
I want my output to look like this:
Row  Date        Students
1    2022-05-16  72
2    2022-05-15  84
3    2022-05-14  105
The problem was that my backend used a different time zone. I solved it by converting the timestamp before extracting the date:
DATE(DATETIME(TIMESTAMP, "Europe/Zagreb")) as date
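The self-answer fixes the time zone but does not show the per-day aggregation itself. The usual way to get one row per day is GROUP BY on the date with MAX(). A minimal sketch with Python's built-in sqlite3 module, assuming the (date, students) pairs from the question have already been computed (in BigQuery they would come from the DATE(...) and ARRAY_LENGTH(...) expressions); the table name `daily` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily (date TEXT, students INTEGER)")
conn.executemany(
    "INSERT INTO daily VALUES (?, ?)",
    [
        ("2022-05-16", 72), ("2022-05-16", 33), ("2022-05-16", 12),
        ("2022-05-15", 10), ("2022-05-15", 84), ("2022-05-15", 8),
        ("2022-05-14", 92), ("2022-05-14", 105), ("2022-05-14", 12),
    ],
)

# One row per day, keeping the largest student count seen that day.
per_day = conn.execute(
    "SELECT date, MAX(students) AS students FROM daily GROUP BY date ORDER BY date DESC"
).fetchall()
print(per_day)  # [('2022-05-16', 72), ('2022-05-15', 84), ('2022-05-14', 105)]
```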

Calculate a 3-day rolling average in SQL on data stored in Google BigQuery

My data is stored in a Google BigQuery database. This is what my table looks like:
IP Age Sex Province Epid_ID
19/05/2020 43 Female Bagmati KTM-20-00206
18/05/2020 33 Male Province1 KTM-20-00205
18/05/2020 30 Male Province1 KTM-20-00204
18/05/2020 32 Male Province1 KTM-20-00203
18/05/2020 63 Male Province1 KTM-20-00202
17/05/2020 33 Male Province2 KTM-20-00201
17/05/2020 23 Male Province2 KTM-20-00200
16/05/2020 22 Male Province2 KTM-20-00199
16/05/2020 23 Male Province2 KTM-20-00198
Here, Epid_ID is my unique ID. I want to calculate a 3-day rolling average for each date. The following is my expected output:
Date Count_Epid_ID 3_days_rolling_avg
16/05/2020 2 0
17/05/2020 2 0
18/05/2020 4 2.66
19/05/2020 1 2.33
Explanation: the value is 0 for the first 2 days, and the rolling average starts on the 3rd day. For 18/05/2020, 2.66 = (2+2+4)/3; for 19/05/2020, 2.33 = (2+4+1)/3.
I tried to use the following question, but I was not successful.
This is the query I wrote, which gives me only the count of Epid_IDs and not the rolling average:
SELECT
IP,
COUNT(*) AS num
FROM
`interim-data.casedata.Interim Reloaded`
GROUP BY
IP
You can use window functions -- assuming you have data on every day:
SELECT IP, COUNT(*) AS num,
AVG(COUNT(*)) OVER (ORDER BY IP ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS rolling_avg_3d
FROM `interim-data.casedata.Interim Reloaded`
GROUP BY IP;
It seems strange that a column called IP has a date value, but that seems to be how your data is modelled.
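A minimal sketch of the window-function approach in SQLite (window functions need SQLite 3.25 or later) through Python's built-in sqlite3 module. It assumes the dates are stored as ISO strings so that ORDER BY sorts them chronologically (dd/mm/yyyy strings would not), counts cases per day in a CTE first (equivalent to the answer's AVG(COUNT(*)) OVER form), and adds a CASE that returns 0 until three days are in the frame, as the expected output asks. The table and column names are hypothetical. As in the answer, ROWS counts rows rather than calendar days, so missing dates would widen the window.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cases (ip_date TEXT, epid_id TEXT)")
conn.executemany(
    "INSERT INTO cases VALUES (?, ?)",
    [
        ("2020-05-19", "KTM-20-00206"),
        ("2020-05-18", "KTM-20-00205"), ("2020-05-18", "KTM-20-00204"),
        ("2020-05-18", "KTM-20-00203"), ("2020-05-18", "KTM-20-00202"),
        ("2020-05-17", "KTM-20-00201"), ("2020-05-17", "KTM-20-00200"),
        ("2020-05-16", "KTM-20-00199"), ("2020-05-16", "KTM-20-00198"),
    ],
)

rolling = conn.execute(
    """
    WITH daily AS (
        SELECT ip_date, COUNT(*) AS num FROM cases GROUP BY ip_date
    )
    SELECT ip_date,
           num,
           CASE
             -- fewer than 3 rows in the frame: report 0, as in the expected output
             WHEN COUNT(*) OVER (ORDER BY ip_date
                                 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) < 3 THEN 0
             ELSE AVG(num) OVER (ORDER BY ip_date
                                 ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
           END AS rolling_avg_3d
    FROM daily
    ORDER BY ip_date
    """
).fetchall()

for row in rolling:
    print(row)
```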

Select only one row per user and date

I want to select only one row per user and date,
so if the data looks like this:
ID User Date
25 3597 2014-09-04 13:37:12.953
26 2100 2014-09-04 13:37:29.820
27 3597 2014-09-04 13:38:12.953
28 2100 2014-09-04 13:38:29.820
29 3597 2014-09-05 13:40:12.953
30 2100 2014-09-05 13:40:29.820
The result should be 4
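If all you need is the count, in SQL you can use COUNT(DISTINCT ...). Because the Date column includes a time of day, count only its date part; otherwise every row is distinct and the count is 6. The multi-column form below is MySQL syntax:
SELECT COUNT(DISTINCT User, DATE(Date)) FROM MyTable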
In LINQ, you can use GroupBy on the user and the date part, followed by Count:
int cnt = src.Items.GroupBy(item => new { item.User, item.Date.Date }).Count();
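A minimal sketch with Python's built-in sqlite3 module, assuming the six sample rows. It strips the time with SQLite's date() before de-duplicating, which is what makes the answer 4 rather than 6, and it counts through a DISTINCT subquery, a form that also works in databases without multi-column COUNT(DISTINCT ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE MyTable (ID INTEGER, "User" INTEGER, "Date" TEXT)')
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?)",
    [
        (25, 3597, "2014-09-04 13:37:12.953"),
        (26, 2100, "2014-09-04 13:37:29.820"),
        (27, 3597, "2014-09-04 13:38:12.953"),
        (28, 2100, "2014-09-04 13:38:29.820"),
        (29, 3597, "2014-09-05 13:40:12.953"),
        (30, 2100, "2014-09-05 13:40:29.820"),
    ],
)

# date() drops the time of day, so rows from the same user and day collapse.
per_user_day = conn.execute(
    'SELECT COUNT(*) FROM (SELECT DISTINCT "User", date("Date") FROM MyTable)'
).fetchone()[0]

# Without stripping the time, every timestamp is distinct.
raw_pairs = conn.execute(
    'SELECT COUNT(*) FROM (SELECT DISTINCT "User", "Date" FROM MyTable)'
).fetchone()[0]

print(per_user_day, raw_pairs)  # 4 6
```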