Get MAX count but keep the repeated calculated value if highest - sql

I have the following table (I am using SQL Server 2008):
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
For each bay number, I need to find the day with the highest number of fixes, and if that highest count occurs on more than one day, all of those days should appear in the result set.
So I would like to see the following result set:
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I managed to get the counts for each day, but I am struggling to get the maximum while keeping every day that ties for the highest count. Can someone help, please?

Use window functions.
Get the number of fixes for each day per bayno, and also find the min fixdatetime for each day per bayno.
Then use dense_rank to rank each bayno's rows by number of fixes, so that tied highest counts share rank 1.
Finally get the highest ranked rows.
select distinct bayno, minfixdatetime, no_of_fixes
from (
    select bayno, minfixdatetime, no_of_fixes,
           dense_rank() over (partition by bayno order by no_of_fixes desc) rnk
    from (
        select t.*,
               count(*) over (partition by bayno, cast(fixdatetime as date)) no_of_fixes,
               min(fixdatetime) over (partition by bayno, cast(fixdatetime as date)) minfixdatetime
        from tablename t
    ) x
) y
where rnk = 1
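To check the dense_rank approach end to end, here is a minimal sketch that runs the same query shape against the question's sample data with Python's built-in sqlite3 module. Assumptions: the SQLite build supports window functions (3.25 or later), the dd/mm/yyyy dates are rewritten as ISO-8601 text, SQLite's date() stands in for CAST(fixdatetime AS date), and the table name fixes is hypothetical.

```python
import sqlite3

# The question's rows, with dd/mm/yyyy dates rewritten as ISO-8601 text
# so that SQLite's date() can extract the calendar day.
ROWS = [
    (1, "2015-05-04 16:15:00", "tyre change"),
    (1, "2015-05-12 00:15:00", "oil change"),
    (1, "2015-05-12 08:15:00", "engine tuning"),
    (1, "2016-05-04 08:11:00", "car tuning"),
    (2, "2015-05-13 19:30:00", "puncture"),
    (2, "2015-05-14 08:00:00", "light repair"),
    (2, "2015-05-15 10:30:00", "super op"),
    (2, "2015-05-20 12:30:00", "wiper change"),
    (2, "2016-05-12 09:30:00", "denting"),
    (2, "2016-05-12 10:30:00", "wiper repair"),
    (2, "2016-06-12 10:30:00", "exhaust repair"),
    (4, "2016-05-12 05:30:00", "stereo unlock"),
    (4, "2016-05-17 15:05:00", "door handle repair"),
]

# Per-day count and earliest time as window functions, then DENSE_RANK
# per bay so that every day tied for the top count gets rank 1.
QUERY = """
SELECT DISTINCT bayno, minfixdatetime, no_of_fixes
FROM (
    SELECT bayno, minfixdatetime, no_of_fixes,
           DENSE_RANK() OVER (PARTITION BY bayno ORDER BY no_of_fixes DESC) AS rnk
    FROM (
        SELECT t.*,
               COUNT(*) OVER (PARTITION BY bayno, date(fixdatetime)) AS no_of_fixes,
               MIN(fixdatetime) OVER (PARTITION BY bayno, date(fixdatetime)) AS minfixdatetime
        FROM fixes t
    ) x
) y
WHERE rnk = 1
ORDER BY bayno, minfixdatetime
"""


def busiest_days():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fixes (bayno INTEGER, fixdatetime TEXT, fixtype TEXT)")
    con.executemany("INSERT INTO fixes VALUES (?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in busiest_days():
    print(row)
```

The DISTINCT is needed because the inner query keeps one row per fix, so a winning day with two fixes would otherwise appear twice with identical values.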

You are looking for rank() or dense_rank(). I would write the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime as date) as thedate,
             count(*) as numFixes,
             rank() over (partition by bayno order by count(*) desc) as seqnum
      from t
      group by bayno, cast(fixdatetime as date)
     ) b
where seqnum = 1;
Note that this returns the date itself, which has no time component, rather than the first FixDateTime of that day.
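The GROUP BY plus RANK() pattern can be tried out with a small sketch, assuming Python's built-in sqlite3 module with SQLite 3.25 or later (for window functions), ISO-8601 text dates, and SQLite's date() in place of CAST(... AS date). It uses a subset of the question's rows and ranks each bay's days by count (partition by bayno), which is what the expected output in the question calls for.

```python
import sqlite3

# Subset of the question's rows (bays 1 and 4), dates as ISO-8601 text.
ROWS = [
    (1, "2015-05-04 16:15:00"),
    (1, "2015-05-12 00:15:00"),
    (1, "2015-05-12 08:15:00"),
    (4, "2016-05-12 05:30:00"),
    (4, "2016-05-17 15:05:00"),
]

# GROUP BY collapses each (bay, day) into one row; RANK() then orders those
# groups by their count within each bay, so tied days share rank 1.
QUERY = """
SELECT bayno, thedate, numFixes
FROM (SELECT bayno, date(fixdatetime) AS thedate,
             COUNT(*) AS numFixes,
             RANK() OVER (PARTITION BY bayno ORDER BY COUNT(*) DESC) AS seqnum
      FROM t
      GROUP BY bayno, date(fixdatetime)
     ) b
WHERE seqnum = 1
ORDER BY bayno, thedate
"""


def top_days():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (bayno INTEGER, fixdatetime TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(top_days())
```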

Related

redshift cumulative count records via SQL

I've been struggling to find an answer for this question. I found a question that seems similar to what I'm looking for, but when I tried its approach it didn't work.
Because no new user_id appears after 02-20 until 02-27, the cumulative count stays the same on those dates. Then on 02-27 there is a user_id (6) that hasn't appeared on any previous date.
Here's my input:
date user_id
2020-02-20 1
2020-02-20 2
2020-02-20 3
2020-02-20 4
2020-02-20 4
2020-02-20 5
2020-02-21 1
2020-02-22 2
2020-02-23 3
2020-02-24 4
2020-02-25 4
2020-02-27 6
Output table:
date daily_cumulative_count
2020-02-20 5
2020-02-21 5
2020-02-22 5
2020-02-23 5
2020-02-24 5
2020-02-25 5
2020-02-27 6
This is what I tried, but the result is not quite what I want:
select
stat_date,count(DISTINCT user_id),
sum(count(DISTINCT user_id)) over (order by stat_date rows unbounded preceding) as cumulative_signups
from data_engineer_interview
group by stat_date
order by stat_date
It returns this instead:
date,count,cumulative_sum
2022-02-20,5,5
2022-02-21,1,6
2022-02-22,1,7
2022-02-23,1,8
2022-02-24,1,9
2022-02-25,1,10
2022-02-27,1,11
The naive way to do this task is to compare each row with all previous rows to see whether its user_id has already appeared. Since you are using Redshift, I'll assume your table could be very large, so attacking the problem this way will bog down in some form of loop join.
You want to think about the problem differently to avoid this looping issue. If you derive a dataset of each user_id and the first date it appears, you can then just do a cumulative sum ordered by that date, like this:
select first_date,
       sum(count(*)) over (order by first_date rows unbounded preceding) as cumulative_count
from (select user_id, min("date") as first_date
      from data_engineer_interview
      group by user_id
     ) f
group by first_date
order by first_date;
This is untested and won't produce the full list of dates in your example output, only the dates on which new ids show up. If that is an issue, it is simple to add in the additional dates with no change in count.
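Here is a runnable sketch of the first-date idea, including the step that adds back the dates with no new ids, using Python's built-in sqlite3 module rather than Redshift. Assumptions: SQLite 3.25 or later (for window functions), the date column is named stat_date as in the question's own attempt, and the table name events is hypothetical. The all_dates CTE left-joins every input date to the per-day count of new users, so the running total simply carries forward on dates without new ids.

```python
import sqlite3

# The question's input rows: (stat_date, user_id).
ROWS = [
    ("2020-02-20", 1), ("2020-02-20", 2), ("2020-02-20", 3),
    ("2020-02-20", 4), ("2020-02-20", 4), ("2020-02-20", 5),
    ("2020-02-21", 1), ("2020-02-22", 2), ("2020-02-23", 3),
    ("2020-02-24", 4), ("2020-02-25", 4), ("2020-02-27", 6),
]

QUERY = """
WITH first_seen AS (          -- one row per user: the first date it appears
    SELECT user_id, MIN(stat_date) AS first_date
    FROM events
    GROUP BY user_id
),
new_per_day AS (              -- number of users seen for the first time each day
    SELECT first_date, COUNT(*) AS new_users
    FROM first_seen
    GROUP BY first_date
),
all_dates AS (                -- every date present in the input
    SELECT DISTINCT stat_date FROM events
)
SELECT d.stat_date,
       SUM(COALESCE(n.new_users, 0))
           OVER (ORDER BY d.stat_date ROWS UNBOUNDED PRECEDING) AS daily_cumulative_count
FROM all_dates d
LEFT JOIN new_per_day n ON n.first_date = d.stat_date
ORDER BY d.stat_date
"""


def cumulative_counts():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (stat_date TEXT, user_id INTEGER)")
    con.executemany("INSERT INTO events VALUES (?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in cumulative_counts():
    print(row)
```

Each user is touched once in first_seen, so the work grows with the number of rows rather than with pairs of rows, which is the point of avoiding the row-by-row comparison.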
We can do this via a correlated subquery followed by aggregation:
WITH cte AS (
    SELECT
        date,
        CASE WHEN EXISTS (
            SELECT 1
            FROM data_engineer_interview d2
            WHERE d2.date < d1.date AND
                  d2.user_id = d1.user_id
        ) THEN 0 ELSE 1 END AS flag  -- 1 only on the first date a user_id appears
    FROM (SELECT DISTINCT date, user_id FROM data_engineer_interview) d1
)
SELECT date,
       SUM(SUM(flag)) OVER (ORDER BY date ROWS UNBOUNDED PRECEDING) AS daily_cumulative_count
FROM cte
GROUP BY date
ORDER BY date;
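This sketch checks the correlated-subquery version with Python's built-in sqlite3 module. Assumptions: SQLite 3.25 or later (for window functions), a hypothetical events table, and a stat_date column in place of date. Each distinct (date, user) pair is flagged 1 only when the user has no earlier date, and the per-day flag totals are then summed cumulatively.

```python
import sqlite3

# The question's input rows: (stat_date, user_id).
ROWS = [
    ("2020-02-20", 1), ("2020-02-20", 2), ("2020-02-20", 3),
    ("2020-02-20", 4), ("2020-02-20", 4), ("2020-02-20", 5),
    ("2020-02-21", 1), ("2020-02-22", 2), ("2020-02-23", 3),
    ("2020-02-24", 4), ("2020-02-25", 4), ("2020-02-27", 6),
]

QUERY = """
WITH cte AS (
    SELECT d1.stat_date,
           CASE WHEN EXISTS (
                    SELECT 1
                    FROM events d2
                    WHERE d2.stat_date < d1.stat_date
                      AND d2.user_id = d1.user_id
                ) THEN 0 ELSE 1 END AS flag  -- 1 only on the first date a user appears
    FROM (SELECT DISTINCT stat_date, user_id FROM events) d1
)
SELECT stat_date,
       SUM(SUM(flag)) OVER (ORDER BY stat_date ROWS UNBOUNDED PRECEDING)
           AS daily_cumulative_count
FROM cte
GROUP BY stat_date
ORDER BY stat_date
"""


def cumulative_counts_exists():
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE events (stat_date TEXT, user_id INTEGER)")
    con.executemany("INSERT INTO events VALUES (?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(cumulative_counts_exists())
```

Unlike the first-date approach, the EXISTS check probes earlier rows for every distinct pair, so on a very large table it relies on the engine turning the correlation into an efficient join.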

How to filter out multiple downtime events in SQL Server?

I need to write a query that filters out duplicates of the same downtime event. These records are created at exactly the same time with multiple different TimeStealers, which I don't need. Also, when a downtime event has multiple TimeStealers, I need the TimeStealer to be NULL instead.
Example table:
Id | TimeStealer | Start | End | Is_Downtime | Downtime_Event
1 | Machine 1 | 2022-01-01 01:00:00 | 2022-01-01 01:01:00 | 1 | Malfunction
2 | Machine 2 | 2022-01-01 01:00:00 | 2022-01-01 01:01:00 | 1 | Malfunction
3 | NULL | 2022-01-01 00:01:00 | 2022-01-01 00:59:59 | 0 | Operating
What I need the query to return:
Id | TimeStealer | Start | End | Is_Downtime | Downtime_Event
1 | NULL | 2022-01-01 01:00:00 | 2022-01-01 01:01:00 | 1 | Malfunction
2 | NULL | 2022-01-01 00:01:00 | 2022-01-01 00:59:59 | 0 | Operating
This looks like a top-1-row-per-group problem, with the added logic of making a column NULL when a group has multiple rows. You can achieve that by also using a windowed COUNT, and then a CASE expression in the outer SELECT that returns TimeStealer only when there was one event:
WITH CTE AS (
    SELECT V.Id,
           V.TimeStealer,
           V.Start,
           V.[End],
           V.Is_Downtime,
           V.Downtime_Event,
           -- RN = 1 picks one row per downtime event
           ROW_NUMBER() OVER (PARTITION BY V.Start, V.[End], V.Is_Downtime, V.Downtime_Event ORDER BY V.Id) AS RN,
           -- Events = number of TimeStealer rows recorded for that event
           COUNT(V.Id) OVER (PARTITION BY V.Start, V.[End], V.Is_Downtime, V.Downtime_Event) AS Events
    -- sample data; replace this VALUES list with your table
    FROM (VALUES (1, 'Machine 1', CONVERT(datetime2(0), '2022-01-01 01:00:00'), CONVERT(datetime2(0), '2022-01-01 01:01:00'), 1, 'Malfunction'),
                 (2, 'Machine 2', CONVERT(datetime2(0), '2022-01-01 01:00:00'), CONVERT(datetime2(0), '2022-01-01 01:01:00'), 1, 'Malfunction'),
                 (3, NULL, CONVERT(datetime2(0), '2022-01-01 00:01:00'), CONVERT(datetime2(0), '2022-01-01 00:59:59'), 0, 'Operating')
         ) V (Id, TimeStealer, [Start], [End], Is_Downtime, Downtime_Event)
)
SELECT ROW_NUMBER() OVER (ORDER BY C.Id) AS Id,
       CASE WHEN C.Events = 1 THEN C.TimeStealer END AS TimeStealer,
       C.Start,
       C.[End],
       C.Is_Downtime,
       C.Downtime_Event
FROM CTE C
WHERE C.RN = 1;
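To see the ROW_NUMBER plus windowed COUNT pattern produce the requested output, here is a sketch that runs it with Python's built-in sqlite3 module instead of SQL Server. Assumptions: SQLite 3.25 or later, a hypothetical downtime table in place of the VALUES list, quoted "Start"/"End" identifiers (END is a keyword), and a real NULL (None) for the row with no TimeStealer.

```python
import sqlite3

ROWS = [
    (1, "Machine 1", "2022-01-01 01:00:00", "2022-01-01 01:01:00", 1, "Malfunction"),
    (2, "Machine 2", "2022-01-01 01:00:00", "2022-01-01 01:01:00", 1, "Malfunction"),
    (3, None, "2022-01-01 00:01:00", "2022-01-01 00:59:59", 0, "Operating"),
]

QUERY = """
WITH cte AS (
    SELECT Id, TimeStealer, "Start", "End", Is_Downtime, Downtime_Event,
           -- rn = 1 keeps one row per downtime event
           ROW_NUMBER() OVER (PARTITION BY "Start", "End", Is_Downtime, Downtime_Event
                              ORDER BY Id) AS rn,
           -- events = how many TimeStealer rows the event has
           COUNT(Id) OVER (PARTITION BY "Start", "End", Is_Downtime, Downtime_Event) AS events
    FROM downtime
)
SELECT ROW_NUMBER() OVER (ORDER BY c.Id) AS Id,
       CASE WHEN c.events = 1 THEN c.TimeStealer END AS TimeStealer,
       c."Start", c."End", c.Is_Downtime, c.Downtime_Event
FROM cte c
WHERE c.rn = 1
ORDER BY 1
"""


def dedupe_downtime():
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE downtime (Id INTEGER, TimeStealer TEXT, "Start" TEXT, '
        '"End" TEXT, Is_Downtime INTEGER, Downtime_Event TEXT)'
    )
    con.executemany("INSERT INTO downtime VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in dedupe_downtime():
    print(row)
```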

How to retrieve other columns when performing an aggregate function?

I've been trying to retrieve other columns from a table on which I'm performing an aggregate function to get the minimum ask per date. This is an example of the data:
id resource date quality ask ask_volume
1 1 2020-06-08 10:50 0 6.9 5102
2 1 2020-06-08 10:50 1 6.8 2943
3 1 2020-06-08 10:50 2 6.9 25338
4 1 2020-06-08 10:50 3 7.0 69720
5 1 2020-06-08 10:50 4 7.0 9778
6 1 2020-06-08 10:50 5 7.0 297435
7 1 2020-06-08 10:40 0 6.6 611
8 1 2020-06-08 10:40 1 6.6 4331
9 1 2020-06-08 10:40 2 6.7 1000
10 1 2020-06-08 10:40 3 7.0 69720
11 1 2020-06-08 10:40 4 7.0 9778
12 1 2020-06-08 10:40 5 7.0 297435
...
This is the desired result I'm trying to get, so I can perform a weighted average on it:
date ask ask_volume
2020-06-08 10:50 6.8 2943
2020-06-08 10:40 6.6 4331
...
At 10:40, quality 0 and quality 1 have the same ask, but quality 1 should be chosen because its ask_volume is higher.
I have tried the classic:
SELECT date, min(ask) FROM table GROUP BY date;
But adding ask_volume to the column list will force me to add it to the GROUP BY as well, messing up the result.
The problems are:
How can I get the corresponding ask_volume of the minimum ask displayed in the result?
And, if there are two records with the same ask value on the same date, how can I get ask_volume to show the one with the highest value?
I use PostgreSQL, but SQL from a different database will help me get the idea as well.
In standard SQL, you would use window functions:
select *
from (
    select t.*, row_number() over (partition by date order by ask, ask_volume desc) rn
    from mytable t
) t
where rn = 1
In Postgres, this is better suited to distinct on:
select distinct on (date) *
from mytable
order by date, ask, ask_volume desc
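The row_number() version from this answer can be checked against the question's sample rows with a sketch using Python's built-in sqlite3 module. Assumptions: SQLite 3.25 or later (for window functions; SQLite has no DISTINCT ON), the table name mytable, and a quoted "date" column holding the question's text timestamps.

```python
import sqlite3

# (id, resource, date, quality, ask, ask_volume) from the question.
ROWS = [
    (1, 1, "2020-06-08 10:50", 0, 6.9, 5102),
    (2, 1, "2020-06-08 10:50", 1, 6.8, 2943),
    (3, 1, "2020-06-08 10:50", 2, 6.9, 25338),
    (4, 1, "2020-06-08 10:50", 3, 7.0, 69720),
    (5, 1, "2020-06-08 10:50", 4, 7.0, 9778),
    (6, 1, "2020-06-08 10:50", 5, 7.0, 297435),
    (7, 1, "2020-06-08 10:40", 0, 6.6, 611),
    (8, 1, "2020-06-08 10:40", 1, 6.6, 4331),
    (9, 1, "2020-06-08 10:40", 2, 6.7, 1000),
    (10, 1, "2020-06-08 10:40", 3, 7.0, 69720),
    (11, 1, "2020-06-08 10:40", 4, 7.0, 9778),
    (12, 1, "2020-06-08 10:40", 5, 7.0, 297435),
]

# Lowest ask first; among equal asks, the highest ask_volume wins rn = 1.
QUERY = """
SELECT "date", ask, ask_volume
FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (PARTITION BY "date" ORDER BY ask, ask_volume DESC) AS rn
    FROM mytable t
) t
WHERE rn = 1
ORDER BY "date" DESC
"""


def min_ask_rows():
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE mytable (id INTEGER, resource INTEGER, "date" TEXT, '
        "quality INTEGER, ask REAL, ask_volume INTEGER)"
    )
    con.executemany("INSERT INTO mytable VALUES (?, ?, ?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(min_ask_rows())
```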
You can do what you want with distinct on:
select distinct on (date) t.*
from t
order by date, ask, ask_volume desc;
I find your date column confusing. It has a time component, so the name is misleading.
Other answers are simpler and better, but here is an alternative that gets around your aggregation problem. You could use a subquery to keep only the rows with the minimum ask per date, and then take the max ask_volume among those rows.
select date, ask, max(ask_volume) as ask_volume
from t
where (date, ask) in (select date, min(ask)
                      from t
                      group by date)
group by date, ask;
DISTINCT ON has already been suggested, but in imperfect ways. (The currently accepted answer is incorrect.) This is how you do it:
SELECT DISTINCT ON (date) *
FROM tbl
ORDER BY date, ask, ask_volume DESC NULLS LAST;
Most importantly, the leading expressions in ORDER BY must match the expressions in DISTINCT ON. In other words, for the simple case, date must be the first ORDER BY expression.
As long as null values have not been ruled out (with a NOT NULL constraint), you must add NULLS LAST; otherwise null values sort first in descending order.
Detailed explanation:
Select first row in each GROUP BY group?

Query to find rows with nearest date in future

I'm trying to display a result set based on a min date value and today's date, but can't seem to make it work. It's essentially a date-sensitive price list.
Example Data
ID Title Value ExpireDate
1 Fred 10 2019-03-01
2 Barney 15 2019-03-01
3 Fred2 20 2019-06-01
4 Barney2 25 2019-06-01
5 Fred3 30 2019-07-01
6 Barney3 55 2019-07-01
Required Results:
Display records based on minimum date > GetDate()
3 Fred2 20 2019-06-01
4 Barney2 25 2019-06-01
Any assistance would be great - thank you.
Use a WHERE clause to keep only future rows and ROW_NUMBER() to find the first row per group:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY Title ORDER BY ExpireDate) AS rn
FROM t
WHERE ExpireDate >= CAST(CURRENT_TIMESTAMP AS DATE)
) AS x
WHERE rn = 1
Based on your revised question, you can simply do this:
SELECT TOP 1 WITH TIES *
FROM t
WHERE ExpireDate >= CAST(CURRENT_TIMESTAMP AS DATE)
ORDER BY ExpireDate
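SQLite has no TOP ... WITH TIES, so this sketch swaps in RANK() = 1, which likewise keeps every row tied on the earliest qualifying ExpireDate. It uses Python's built-in sqlite3 module (assuming SQLite 3.25 or later), and the today argument is a hypothetical fixed stand-in for CURRENT_TIMESTAMP so that the result is reproducible.

```python
import sqlite3

ROWS = [
    (1, "Fred", 10, "2019-03-01"),
    (2, "Barney", 15, "2019-03-01"),
    (3, "Fred2", 20, "2019-06-01"),
    (4, "Barney2", 25, "2019-06-01"),
    (5, "Fred3", 30, "2019-07-01"),
    (6, "Barney3", 55, "2019-07-01"),
]

# Filter to dates on or after :today, then keep every row ranked first
# by ExpireDate (ties share rank 1, like TOP 1 WITH TIES).
QUERY = """
SELECT ID, Title, Value, ExpireDate
FROM (SELECT t.*, RANK() OVER (ORDER BY ExpireDate) AS rnk
      FROM t
      WHERE ExpireDate >= :today)
WHERE rnk = 1
ORDER BY ID
"""


def next_price_list(today):
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (ID INTEGER, Title TEXT, Value INTEGER, ExpireDate TEXT)")
    con.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY, {"today": today}).fetchall()
    con.close()
    return rows


print(next_price_list("2019-04-01"))
```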

Getting a row with two group by constraints

I have a table:
TIMESTAMP ID Name
5/30/2016 11:45 1 Ben
5/30/2016 11:45 2 Ben
5/30/2016 23:15 2 Ben
5/30/2016 7:30 1 Peter
5/30/2016 6:05 1 Peter
5/30/2016 14:40 2 May
5/30/2016 1:05 1 May
Now, I need to get the MIN timestamp for each distinct Name.
Then, if there is more than one row with that MIN timestamp, choose the one with the MAX ID.
So the result should be
TIMESTAMP ID Name
5/30/2016 11:45 2 Ben
5/30/2016 6:05 1 Peter
5/30/2016 1:05 1 May
I tried using the query below:
SELECT MIN(TIMESTAMP),NAME FROM TBLSAMPLE WHERE TIMESTAMP BETWEEN TO_DATE('5/30/2016', 'MM/DD/YYYY' ) AND TO_DATE('5/30/2016', 'MM/DD/YYYY' ) + 1
GROUP BY NAME
and that gives me the minimum time. But once I add MAX(ID), the result returns an entry that does not match any of the rows.
Your help is really appreciated.
You can do this with row_number():
select t.*
from (select t.*,
row_number() over (partition by name order by timestamp asc, id desc) as seqnum
from tblsample t
) t
where seqnum = 1;
Your question doesn't specify a condition on the dates, but if you want to add a WHERE clause, add it to the subquery.
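A minimal sketch of the row_number() answer on the question's data, using Python's built-in sqlite3 module. Assumptions: SQLite 3.25 or later, and the timestamps rewritten as zero-padded ISO-8601 text (for example "2016-05-30 07:30") because text like "5/30/2016 7:30" does not sort chronologically.

```python
import sqlite3

ROWS = [
    ("2016-05-30 11:45", 1, "Ben"),
    ("2016-05-30 11:45", 2, "Ben"),
    ("2016-05-30 23:15", 2, "Ben"),
    ("2016-05-30 07:30", 1, "Peter"),
    ("2016-05-30 06:05", 1, "Peter"),
    ("2016-05-30 14:40", 2, "May"),
    ("2016-05-30 01:05", 1, "May"),
]

# Earliest timestamp first; among ties on the timestamp, the highest id first.
QUERY = """
SELECT "timestamp", id, name
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY name
                                ORDER BY "timestamp" ASC, id DESC) AS seqnum
      FROM tblsample t) t
WHERE seqnum = 1
ORDER BY name
"""


def first_per_name():
    con = sqlite3.connect(":memory:")
    con.execute('CREATE TABLE tblsample ("timestamp" TEXT, id INTEGER, name TEXT)')
    con.executemany("INSERT INTO tblsample VALUES (?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in first_per_name():
    print(row)
```

Because the whole row is picked by a single ORDER BY, the id always comes from the same row as the timestamp, which is what separate MIN(TIMESTAMP) and MAX(ID) aggregates cannot guarantee.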