Select Only one row per user and date - sql

I want to select only one row per user and date.
So if the data looks like this:
ID User Date
25 3597 2014-09-04 13:37:12.953
26 2100 2014-09-04 13:37:29.820
27 3597 2014-09-04 13:38:12.953
28 2100 2014-09-04 13:38:29.820
29 3597 2014-09-05 13:40:12.953
30 2100 2014-09-05 13:40:29.820
then the result should be 4.

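If all you need is the count, you can use COUNT(DISTINCT ...) in SQL. The multi-column form below is accepted by MySQL but not by every database, and because Date also holds a time of day you need its date part for rows from the same day to collapse:
SELECT COUNT(DISTINCT User, DATE(Date)) FROM MyTable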
In LINQ, you can use GroupBy followed by Count (assuming Date is a DateTime, .Date drops the time of day):
int cnt = src.Items.GroupBy(item => new { item.User, Day = item.Date.Date }).Count();
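As a quick check of the counting logic, here is a minimal sketch in Python that uses the standard-library sqlite3 module as a stand-in database. The table name MyTable comes from the answer; the choice of SQLite and the date() call that strips the time of day are assumptions. SQLite's COUNT(DISTINCT ...) takes a single expression, so the sketch counts the rows of a SELECT DISTINCT subquery instead.

```python
import sqlite3

# Sample rows from the question: (ID, User, Date).
rows = [
    (25, 3597, "2014-09-04 13:37:12.953"),
    (26, 2100, "2014-09-04 13:37:29.820"),
    (27, 3597, "2014-09-04 13:38:12.953"),
    (28, 2100, "2014-09-04 13:38:29.820"),
    (29, 3597, "2014-09-05 13:40:12.953"),
    (30, 2100, "2014-09-05 13:40:29.820"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE MyTable (ID INTEGER, "User" INTEGER, "Date" TEXT)')
conn.executemany("INSERT INTO MyTable VALUES (?, ?, ?)", rows)

# date() keeps only the calendar day, so two rows from the same user on the
# same day collapse into one (user, day) pair before counting.
query = """
    SELECT COUNT(*)
    FROM (SELECT DISTINCT "User", date("Date") AS Day FROM MyTable) AS pairs
"""
count = conn.execute(query).fetchone()[0]
print(count)  # 4 for the sample data
```

Without the date() call every timestamp is unique and the count would be 6.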

Related

How do I include duplicates? Tried HAVING

I am trying to learn SQL, and I am doing a project based on a provided database about past Super Bowls. I wrote the code below to return a "yes" or "no" in a new column (to show I know how to use CASE) for teams that beat their opponents by more than 14 points. It worked, in a sense, but it returned each winning team only once; that is, it removed the duplicates for teams that have won multiple times. I want it to return all of those rows so that every game is shown. I tried a HAVING clause, but I didn't really know what to put in it.
The goal: display which teams have beaten their opponents by >= 14 points.
I have tried the query below:
SELECT Winner, Winner_Pts, Loser, Loser_Pts,Date,
CASE
WHEN (AVG(Winner_Pts-Loser_Pts) >= 14) THEN "yes"
ELSE "no"
END as "won_by_more_than_14"
FROM superbowls
GROUP BY Winner
ORDER BY Winner_Pts DESC
In your scenario, there is no need to aggregate your data in order to find teams that have beaten their opponents by >= 14 points.
If you remove the AVG function and the GROUP BY clause, the query returns one row per game, so teams that have won the Super Bowl more than once appear once per win. Otherwise, your CASE expression is correct.
SELECT Winner,
Winner_Pts,
Loser,
Loser_Pts,
Date,
CASE
WHEN (Winner_Pts-Loser_Pts) >= 14 THEN "Yes"
ELSE "No"
END AS "won_by_more_than_14"
FROM superbowls
ORDER BY Winner_Pts DESC
You can even put the CASE expression in a WHERE clause to select only the rows for teams that won by 14 points or more.
WHERE (CASE
WHEN (Winner_Pts-Loser_Pts) >= 14 THEN "Yes"
ELSE "No"
END) = "Yes"
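To see the difference between the per-game query and the grouped one, here is a minimal sketch in Python with the standard-library sqlite3 module standing in for the real database (an assumption, as is trimming the table to four columns). It runs the CASE expression without GROUP BY and compares the row count with a query grouped by Winner.

```python
import sqlite3

# Rows from the answer's input data: (Winner, Winner_Pts, Loser, Loser_Pts).
games = [
    ("Rams", 23, "Bengals", 20),
    ("Buccaneers", 31, "Chiefs", 9),
    ("Chiefs", 31, "49ers", 20),
    ("Eagles", 41, "Patriots", 33),
    ("Broncos", 24, "Panthers", 10),
    ("Seahawks", 43, "Denver", 8),
    ("Rams", 23, "Titans", 16),
    ("Patriots", 13, "Rams", 3),
    ("Patriots", 34, "Falcons", 28),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE superbowls "
    "(Winner TEXT, Winner_Pts INTEGER, Loser TEXT, Loser_Pts INTEGER)"
)
conn.executemany("INSERT INTO superbowls VALUES (?, ?, ?, ?)", games)

# No GROUP BY: the CASE expression is evaluated once per game.
per_game = conn.execute(
    """
    SELECT Winner,
           CASE WHEN Winner_Pts - Loser_Pts >= 14 THEN 'Yes' ELSE 'No' END
    FROM superbowls
    ORDER BY Winner_Pts DESC
    """
).fetchall()

# GROUP BY Winner: one row per team, so repeat winners collapse.
per_team = conn.execute(
    "SELECT Winner, COUNT(*) FROM superbowls GROUP BY Winner"
).fetchall()

big_wins = [winner for winner, flag in per_game if flag == "Yes"]
print(len(per_game), len(per_team))  # 9 rows per game, 7 rows per team
print(big_wins)
```

The Rams and Patriots each won twice, which is why the grouped query has two fewer rows.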
Input Data:
ID  Number  Winner      Winner_Pts  Loser     Loser_Pts  Date
----------------------------------------------------------------------------
1   LVI     Rams        23          Bengals   20         2022-02-13 00:00:00
2   LV      Buccaneers  31          Chiefs    9          2021-02-07 00:00:00
3   LVI     Chiefs      31          49ers     20         2022-02-02 00:00:00
4   LII     Eagles      41          Patriots  33         2018-02-04 00:00:00
5   50      Broncos     24          Panthers  10         2016-02-07 00:00:00
6   XLVIII  Seahawks    43          Denver    8          2014-02-02 00:00:00
7   XXXIV   Rams        23          Titans    16         2000-01-30 00:00:00
8   LIII    Patriots    13          Rams      3          2019-02-03 00:00:00
9   LI      Patriots    34          Falcons   28         2017-02-05 00:00:00
Output Data:
Winner      Winner_Pts  Loser     Loser_Pts  Date                 won_by_more_than_14
-------------------------------------------------------------------------------------
Seahawks    43          Denver    8          2014-02-02 00:00:00  Yes
Eagles      41          Patriots  33         2018-02-04 00:00:00  No
Patriots    34          Falcons   28         2017-02-05 00:00:00  No
Buccaneers  31          Chiefs    9          2021-02-07 00:00:00  Yes
Chiefs      31          49ers     20         2022-02-02 00:00:00  No
Broncos     24          Panthers  10         2016-02-07 00:00:00  Yes
Rams        23          Bengals   20         2022-02-13 00:00:00  No
Rams        23          Titans    16         2000-01-30 00:00:00  No
Patriots    13          Rams      3          2019-02-03 00:00:00  No
Details:
Removing the AVG() function will get the results you want, but that doesn't mean AVG() isn't useful, especially for sports data. If you did want to aggregate your data, please see the following:
The AVG() function is used to find the average of values over records from a table. AVG() belongs to a class of functions known as aggregate functions. An aggregate function returns a single computed result over multiple rows:
Aggregate Function  Example Use Case
----------------------------------------------------------
SUM()               Find the sum of points by team.
COUNT()             Find the number of bowls by each team.
MAX()               Find the highest point value by each team.
MIN()               Find the lowest point value by each team.
AVG()               Find the average points by team.
The SQL GROUP BY clause is used to group rows together. In most cases, a GROUP BY clause has one or more aggregate functions that calculate one or more metrics for the group.
Take this example, which simply returns all winners and their points:
SELECT
Winner,
Winner_Pts AS 'points'
FROM superbowls
ORDER BY Winner_Pts DESC
Winner      points
------------------
Seahawks    43
Eagles      41
Patriots    34
Buccaneers  31
Chiefs      31
Broncos     24
Rams        23
Rams        23
Patriots    13
Now let's aggregate it by Winner to find the average points:
SELECT
Winner,
ROUND(AVG(Winner_Pts)) AS 'avg_points'
FROM superbowls
GROUP BY Winner
ORDER BY ROUND(AVG(Winner_Pts)) DESC
Winner      avg_points
----------------------
Seahawks    43
Eagles      41
Buccaneers  31
Chiefs      31
Broncos     24
Patriots    24
Rams        23
Comparing the two queries above, the Rams and Patriots each collapse to a single row once the data is grouped by Winner, and the averages are:
(23+23)/2 = 23 (Rams)
(13+34)/2 = 23.5 - Rounded to 24 (Patriots)
(41)/1 = 41 (Eagles)
Source.
If you want to filter on an aggregated value, use HAVING. It differs from WHERE because WHERE is applied before GROUP BY, so WHERE can only test "raw" row values and not aggregated ones. Conditions on aggregated metrics belong in HAVING.
The primary use of the HAVING operation is to filter aggregated data.
You can use it when you summarize your data with GROUP BY into new
metrics, and you want to select the results based on these new values.
Example:
Find teams with average winning points greater than 40:
SELECT
Winner,
ROUND(AVG(Winner_Pts)) AS 'avg_points'
FROM superbowls
GROUP BY Winner
HAVING ROUND(AVG(Winner_Pts)) > 40
ORDER BY ROUND(AVG(Winner_Pts)) DESC
Winner      avg_points
----------------------
Seahawks    43
Eagles      41
Source.
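The order of WHERE and HAVING can be checked with a small sketch in Python, again using the standard-library sqlite3 module as a stand-in database (an assumption; the table is reduced to the two columns the queries need). The first query filters groups with HAVING; the second filters rows with WHERE before grouping, which changes the average itself.

```python
import sqlite3

# (Winner, Winner_Pts) pairs from the input data above.
wins = [
    ("Rams", 23), ("Buccaneers", 31), ("Chiefs", 31),
    ("Eagles", 41), ("Broncos", 24), ("Seahawks", 43),
    ("Rams", 23), ("Patriots", 13), ("Patriots", 34),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE superbowls (Winner TEXT, Winner_Pts INTEGER)")
conn.executemany("INSERT INTO superbowls VALUES (?, ?)", wins)

# HAVING filters whole groups after the averages have been computed.
having_rows = conn.execute(
    """
    SELECT Winner, AVG(Winner_Pts) AS avg_points
    FROM superbowls
    GROUP BY Winner
    HAVING AVG(Winner_Pts) > 40
    ORDER BY avg_points DESC
    """
).fetchall()

# WHERE filters individual rows before grouping, so the Patriots' 13-point
# win is dropped before their average is taken.
where_rows = dict(
    conn.execute(
        """
        SELECT Winner, AVG(Winner_Pts)
        FROM superbowls
        WHERE Winner_Pts > 30
        GROUP BY Winner
        """
    ).fetchall()
)

print(having_rows)             # [('Seahawks', 43.0), ('Eagles', 41.0)]
print(where_rows["Patriots"])  # 34.0, versus 23.5 over both Patriots wins
```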

SELECTING records based on unique date and counting how many records on that date

I have a table that I'm going to simplify. Here's what it looks like:
tid session pos dateOn
-----------------------------------------------
1 23 0 12/24/2020 1:00:00
2 23 1 12/24/2020 1:01:23
3 12 0 12/24/2020 1:02:43
4 23 2 12/24/2020 1:04:01
5 23 3 12/24/2020 1:04:12
6 45 0 12/26/2020 4:23:15
This table tells me that there were 2 unique sessions on 12/24/2020 and 1 on 12/26/2020.
How do I write my SQL statement so that I get a result like this:
date recordCount
----------------------------
12/24/2020 2
12/26/2020 1
You should simply be able to convert to a date and aggregate:
select convert(date, dateon), count(distinct session)
from t
group by convert(date, dateon)
order by convert(date, dateon);
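A minimal sketch of the same idea in Python, using the standard-library sqlite3 module as a stand-in for SQL Server (an assumption). SQLite has no convert(date, ...), so date() plays that role here, and the timestamps are rewritten in ISO 8601 form so that date() can parse them.

```python
import sqlite3

# Rows from the question: (tid, session, pos, dateOn), with the timestamps
# rewritten as ISO 8601 text (an assumption made for SQLite's benefit).
rows = [
    (1, 23, 0, "2020-12-24 01:00:00"),
    (2, 23, 1, "2020-12-24 01:01:23"),
    (3, 12, 0, "2020-12-24 01:02:43"),
    (4, 23, 2, "2020-12-24 01:04:01"),
    (5, 23, 3, "2020-12-24 01:04:12"),
    (6, 45, 0, "2020-12-26 04:23:15"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (tid INTEGER, session INTEGER, pos INTEGER, dateOn TEXT)"
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", rows)

# Group by calendar day and count each session once per day.
result = conn.execute(
    """
    SELECT date(dateOn) AS day, COUNT(DISTINCT session) AS recordCount
    FROM t
    GROUP BY date(dateOn)
    ORDER BY day
    """
).fetchall()
print(result)  # [('2020-12-24', 2), ('2020-12-26', 1)]
```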

SQL how to count but only count one instance if two columns match?

Wondering how to select from a table:
FIELDID personID purchaseID dateofPurchase
--------------------------------------------------
2 13 147 2014-03-21 00:00:00
3 15 165 2015-03-23 00:00:00
4 13 456 2018-03-24 00:00:00
5 1 133 2018-03-21 00:00:00
6 23 123 2013-03-22 00:00:00
7 25 456 2013-03-21 00:00:00
8 25 456 2013-03-23 00:00:00
9 22 456 2013-03-28 00:00:00
10 25 589 2013-03-21 00:00:00
11 82 147 1991-10-22 00:00:00
12 82 453 2003-03-22 00:00:00
I'd like to get a result table of two columns: weekday and the number of purchases of each weekday, but only count the distinct days of purchases if done by the same person on the same day - for example since personID 25 purchased two things on 2013-03-21, that should only count as one 'thursday' instead of 2.
Basically, if the personID and the dateofPurchase are the same for more than one row, only count it once is what I want.
Here is what I have currently: It does everything correctly except it will count the above scenario under the thursday twice, when I would only want to add one:
SELECT v.wkday as day, COUNT(*) as 'absences'
FROM dbo.AttendanceRecord pr CROSS APPLY
(VALUES (CASE WHEN DATEPART(WEEKDAY, date) IN (1, 7)
THEN 'Weekend'
ELSE DATENAME(WEEKDAY, date)
END)
) v(wkday)
GROUP BY v.wkday;
to clarify:
If an item is purchased for at least one purchaseID on a specific day, it will be counted as purchased for that day and does not need to be counted again for each new purchaseID on that day.
I think you want to count distinct persons, so that would be:
COUNT(DISTINCT personid) as absences
Note that single quotes are not appropriate around column aliases. If you need to escape them, use square brackets.
EDIT:
If you want to count distinct person-days, then you can use:
COUNT(DISTINCT CONCAT(personid, ':', dateofpurchase)) as absences
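The effect of counting distinct person-days can be sketched in Python with the standard-library sqlite3 module as a stand-in database (an assumption). The table name purchases is hypothetical, only a subset of the question's rows is loaded, and SQLite's strftime('%w', ...) replaces DATEPART/DATENAME, returning '0' for Sunday through '6' for Saturday.

```python
import sqlite3

# A subset of the question's rows:
# (FIELDID, personID, purchaseID, dateofPurchase).
rows = [
    (6, 23, 123, "2013-03-22 00:00:00"),
    (7, 25, 456, "2013-03-21 00:00:00"),
    (8, 25, 456, "2013-03-23 00:00:00"),
    (9, 22, 456, "2013-03-28 00:00:00"),
    (10, 25, 589, "2013-03-21 00:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases "  # hypothetical table name
    "(FIELDID INTEGER, personID INTEGER, purchaseID INTEGER, dateofPurchase TEXT)"
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)

# Compare a plain row count with a count of distinct person-day keys.
query = """
    SELECT strftime('%w', dateofPurchase) AS wkday,
           COUNT(*) AS all_rows,
           COUNT(DISTINCT personID || ':' || date(dateofPurchase)) AS person_days
    FROM purchases
    GROUP BY wkday
    ORDER BY wkday
"""
counts = {
    wkday: (all_rows, person_days)
    for wkday, all_rows, person_days in conn.execute(query)
}
print(counts["4"])  # Thursday: (3, 2)
```

Person 25's two purchases on 2013-03-21 produce the same key, so Thursday counts 2 person-days from 3 rows.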

How to calculate a running total that is a distinct sum of values

Consider this dataset:
id site_id type_id value date
------- ------- ------- ------- -------------------
1 1 1 50 2017-08-09 06:49:47
2 1 2 48 2017-08-10 08:19:49
3 1 1 52 2017-08-11 06:15:00
4 1 1 45 2017-08-12 10:39:47
5 1 2 40 2017-08-14 10:33:00
6 2 1 30 2017-08-09 07:25:32
7 2 2 32 2017-08-12 04:11:05
8 3 1 80 2017-08-09 19:55:12
9 3 2 75 2017-08-13 02:54:47
10 2 1 25 2017-08-15 10:00:05
I would like to construct a query that returns a running total for each date by type. I can get close with a window function, but I only want the latest value for each site to be summed for the running total (a simple window function will not work because it sums all values up to a date--not just the last values for each site). So I guess it could be better described as a running distinct total?
The result I'm looking for would be like this:
type_id date sum
------- ------------------- -------
1 2017-08-09 06:49:47 50
1 2017-08-09 07:25:32 80
1 2017-08-09 19:55:12 160
1 2017-08-11 06:15:00 162
1 2017-08-12 10:39:47 155
1 2017-08-15 10:00:05 150
2 2017-08-10 08:19:49 48
2 2017-08-12 04:11:05 80
2 2017-08-13 02:54:47 155
2 2017-08-14 10:33:00 147
The key here is that the sum is not a running sum. It should only be the sum of the most recent values for each site, by type, at each date. I think I can help explain it by walking through the result set I've provided above. For my explanation, I'll walk through the original data chronologically and try to explain the expected result.
The first row of the result starts us off, at 2017-08-09 06:49:47, where chronologically, there is only one record of type 1 and it is 50, so that is our sum for 2017-08-09 06:49:47.
The second row of the result is at 2017-08-09 07:25:32, at this point in time we have 2 unique sites with values for type_id = 1. They have values of 50 and 30, so the sum is 80.
The third row of the result occurs at 2017-08-09 19:55:12, where now we have 3 sites with values for type_id = 1. 50 + 30 + 80 = 160.
The fourth row is where it gets interesting. At 2017-08-11 06:15:00 there are 4 records with a type_id = 1, but 2 of them are for the same site. I'm only interested in the most recent value for each site so the values I'd like to sum are: 30 + 80 + 52 resulting in 162.
The 5th row is similar to the 4th, since the value for site_id:1, type_id:1 has changed again and is now 45. As a result, the latest values for type_id:1 at 2017-08-12 10:39:47 are now: 30 + 80 + 45 = 155.
Reviewing the 6th row is also interesting when we consider that at 2017-08-15 10:00:05, site 2 has a new value for type_id 1, which gives us: 80 + 45 + 25 = 150 for 2017-08-15 10:00:05.
You can get a cumulative total (running total) by including an ORDER BY clause in your window definition.
select
type_id,
date,
sum(value) over (partition by type_id order by date) as sum
from your_table;
The ORDER BY works because the default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
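A plain running sum counts every earlier value for a site, not just its latest one. One way around that is to sum each row's change from that site's previous value:
SELECT type_id,
       date,
       SUM(delta) OVER (PARTITION BY type_id ORDER BY date) AS sum
FROM (SELECT type_id,
             date,
             value - LAG(value, 1, 0) OVER (PARTITION BY type_id, site_id ORDER BY date) AS delta
      FROM your_table) AS d
ORDER BY type_id,
         date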
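One way to sum only each site's latest value is to turn every row into a change from that site's previous value (via LAG) and take a running sum of those changes. Below is a minimal sketch in Python with the standard-library sqlite3 module as a stand-in database (an assumption; window functions need SQLite 3.25 or later), checked against the question's data.

```python
import sqlite3

# Rows from the question: (id, site_id, type_id, value, date).
rows = [
    (1, 1, 1, 50, "2017-08-09 06:49:47"),
    (2, 1, 2, 48, "2017-08-10 08:19:49"),
    (3, 1, 1, 52, "2017-08-11 06:15:00"),
    (4, 1, 1, 45, "2017-08-12 10:39:47"),
    (5, 1, 2, 40, "2017-08-14 10:33:00"),
    (6, 2, 1, 30, "2017-08-09 07:25:32"),
    (7, 2, 2, 32, "2017-08-12 04:11:05"),
    (8, 3, 1, 80, "2017-08-09 19:55:12"),
    (9, 3, 2, 75, "2017-08-13 02:54:47"),
    (10, 2, 1, 25, "2017-08-15 10:00:05"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table "
    "(id INTEGER, site_id INTEGER, type_id INTEGER, value INTEGER, date TEXT)"
)
conn.executemany("INSERT INTO your_table VALUES (?, ?, ?, ?, ?)", rows)

# Inner query: change relative to the same site's previous value (0 if none).
# Outer query: running sum of those changes within each type.
query = """
    SELECT type_id, date,
           SUM(delta) OVER (PARTITION BY type_id ORDER BY date) AS total
    FROM (
        SELECT type_id, date,
               value - LAG(value, 1, 0) OVER (
                   PARTITION BY type_id, site_id ORDER BY date
               ) AS delta
        FROM your_table
    ) AS d
    ORDER BY type_id, date
"""
totals = [total for _type_id, _date, total in conn.execute(query)]
print(totals)  # [50, 80, 160, 162, 155, 150, 48, 80, 155, 147]
```

The printed totals match the sum column of the expected result, including the last type 1 row, where site 2's new value of 25 replaces its earlier 30.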

Get MAX value of a group for each month TSQL

I have a TSQL query which gives the max value of the trend data for the given result set. I would like to get this value for each month instead (I have a datetime column in the result set). If I select data for three months, it has to give these values for each month. Right now it looks for the overall max value and gives the same result for all the months.
Below is the expression I use to get the trend result. It's part of a select statement with a number of other columns.
select col1, col2, sampledatecollected, (Select MAX(AvailMemSlope) FROM SlopeCTE) * MetricID + (Select MAX (AvailMemIntercept) FROM InterceptCTE) AS AvailMemTrend
I think I might need to do something like this, but given the expression I'm a little confused as to how to get the desired results:
select name, max(value)
from tbl1
group by name
CPUTrend is the data I get from the expression I specified in the first query.
sample data:
Date AVGCPU MAXCPU CPUTrend
8/22/2016 20 40 44
8/23/2016 20 40 44
8/24/2016 20 40 44
8/25/2016 20 40 44
9/22/2016 20 50 44
9/23/2016 20 50 44
9/24/2016 20 50 44
Expected result:
Date AVGCPU MAXCPU CPUTrend
8/22/2016 20 40 32
8/23/2016 20 40 32
8/24/2016 20 40 32
8/25/2016 20 40 32
9/22/2016 20 50 44
9/23/2016 20 50 44
9/24/2016 20 50 44
Right now all I get is 44, as it's the maximum value.
I think you just want a subquery:
with t as (
select col1, col2, sampledatecollected,
(Select MAX(AvailMemSlope) FROM SlopeCTE) * MetricID +
(Select MAX (AvailMemIntercept) FROM InterceptCTE) AS AvailMemTrend
from ?? -- the SQL in your question is incomplete
)
select year(sampledatecollected), month(sampledatecollected),
max(AvailMemTrend)
from t
group by year(sampledatecollected), month(sampledatecollected)
order by year(sampledatecollected), month(sampledatecollected);
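The grouped query returns one row per month. If instead every daily row should carry its month's maximum, as in the expected result, a window function is one option; in T-SQL that would be MAX(AvailMemTrend) OVER (PARTITION BY YEAR(sampledatecollected), MONTH(sampledatecollected)). Below is a minimal sketch in Python with the standard-library sqlite3 module as a stand-in database (an assumption). The per-row trend values are hypothetical, since the question does not show them, and strftime('%Y-%m', ...) stands in for YEAR()/MONTH().

```python
import sqlite3

# Hypothetical per-row trend values (not from the question), chosen so that
# the monthly maxima are 32 for August and 44 for September.
samples = [
    ("2016-08-22", 30),
    ("2016-08-23", 32),
    ("2016-08-24", 31),
    ("2016-08-25", 29),
    ("2016-09-22", 41),
    ("2016-09-23", 44),
    ("2016-09-24", 43),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (sampledatecollected TEXT, AvailMemTrend INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", samples)

# The window keeps every row and attaches the maximum of its year-month.
query = """
    SELECT sampledatecollected,
           MAX(AvailMemTrend) OVER (
               PARTITION BY strftime('%Y-%m', sampledatecollected)
           ) AS monthly_max
    FROM t
    ORDER BY sampledatecollected
"""
monthly = conn.execute(query).fetchall()
for day, monthly_max in monthly:
    print(day, monthly_max)
```

Each August row prints 32 and each September row prints 44.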