Is there a method to write a SQL query that returns cumulative results based on the count of another column? - sql

I have a query where I am counting the total number of new users signed up to a particular service each day since the service started.
So far I have:
SELECT DISTINCT CONVERT(DATE, Account_Created) AS Date_Created,
COUNT(ID) OVER (PARTITION BY CONVERT(DATE, Account_Created)) AS New_Users
FROM My_Table.dbo.NewAccts
ORDER BY Date_Created
This returns:
Date_Created | New_Users
--------------------------
2020-01-01 1
2020-01-03 3
2020-01-04 2
2020-01-06 5
2020-01-07 9
What I would like is to return a third column with a cumulative total for each day starting from the beginning until the present. So the first day there was only one new user. On January 3rd, three new users signed up for a total of four since the beginning--so on and so forth.
Date_Created | New_Users | Cumulative_Tot
------------------------------------------
2020-01-01 1 1
2020-01-03 3 4
2020-01-04 2 6
2020-01-06 5 11
2020-01-07 9 20
My thought process was to involve the ROW_NUMBER() function so that I can separate and add each consecutive row together, though I am not sure if that is correct. My feeling is that I am probably thinking about this too hard and the logic is simply just escaping me at the moment. Thank you for any help.

As a starter: I would recommend aggregation rather than DISTINCT and a window count. This makes the intent clearer, and is likely more efficient.
Then, you can make use of a window sum to compute the cumulative count.
SELECT
CONVERT(DATE, Account_Created) AS Date_Created,
COUNT(*) AS New_Users
SUM(COUNT(*)) OVER(ORDER BY CONVERT(DATE, Account_Created)) Cumulative_New_Users
FROM My_Table.dbo.NewAccts
GROUP BY CONVERT(DATE, Account_Created)
ORDER BY Date_Created

Related

Counting from monday to friday consecutively bigquery sql

I'm trying to find a result with two conditions in google bigquery.
Employees who worked from Monday through Friday consecutively will get an additional pay of 8 hour amount of wage.
Condition above is valid for workers who worked more than 15 hours (15 hrs <) per week.
id
date
hours
abc123
2020-01-05
12
abc123
2020-01-06
5
abc123
2020-01-07
14
abc123
2020-01-08
7
abc123
2020-01-09
6
abc123
2020-01-10
12
Thanks in advance.
Assuming you have one row per employee per day, then you can handle this by using window functions. The focus will be on Mondays, but the idea is to count the hours and days for a given row and the four days following.
So, to get the Mondays where a given id matches the conditions and is eligible for a bonus:
select id, date
from (select t.*,
count(*) over (partition by id
order by unix_date(date)
range between current row and 4 following
) as day_count,
sum(hours) over (partition by id
order by unix_date(date)
range between current row and 4 following
) as hours_count
from t
) t
where extract(dayofweek from date) = 2 and
day_count = 5 and
hours_count >= 15;

PostgreSQL: how do build a table with an ever-increasing daily total?

Assuming this data:
ID DATE
44 2019-12-31
45 2020-01-01
46 2020-01-01
47 2020-01-02
48 2020-01-03
48 2020-01-03
48 2020-01-03
How do I make a query that returns something like this?
TOTAL DATE
2 2020-01-01
3 2020-01-02
6 2020-01-03
I want all entries after a certain data, but with a counter that starts with the number of entries on the first day, then for every day, it adds the number of entries on that day. I want to plot them on a chart that shows the speed of the growth.
Is this possible? I'm on PostgreSQL 10.
Thanks!
You can use aggregation and window functions:
select date,
count(*) as count_on_date,
sum(count(*)) over (order by min(date)) as running_count
from t
where date >= '2020-01-01'
group by date
order by date;
If you wanted a count going back in time, then you would use:
select greatest(date, '2020-01-01'::date) as date
count(*) as count_on_date,
sum(count(*)) over (order by min(date)) as running_count
from t
group by greatest(date, '2020-01-01'::date)
order by greatest(date, '2020-01-01'::date);

How to retrieve other columns when performing an aggregate function?

I've been trying to retrieve other columns from a table in which I'm performing an aggregate function to get the minimum number by date, this is an example of the data:
id resource date quality ask ask_volume
1 1 2020-06-08 10:50 0 6.9 5102
2 1 2020-06-08 10:50 1 6.8 2943
3 1 2020-06-08 10:50 2 6.9 25338
4 1 2020-06-08 10:50 3 7.0 69720
5 1 2020-06-08 10:50 4 7.0 9778
6 1 2020-06-08 10:50 5 7.0 297435
7 1 2020-06-08 10:40 0 6.6 611
8 1 2020-06-08 10:40 1 6.6 4331
9 1 2020-06-08 10:40 2 6.7 1000
10 1 2020-06-08 10:40 3 7.0 69720
11 1 2020-06-08 10:40 4 7.0 9778
12 1 2020-06-08 10:40 5 7.0 297435
...
This is the desired result I'm trying to get, so I can perform a weighted average on it:
date ask ask_volume
2020-06-08 10:50 6.8 2943
2020-06-08 10:40 6.6 4331
...
Though both quality 0 and quality 1 have the same ask, quality 1 shall be chosen because its ask_volume is higher.
I have tried the classic:
SELECT date, min(ask) FROM table GROUP BY date;
But adding ask_volume to the column list will force me to add it to the GROUP BY as well, messing up the result.
The problems are:
How can I get the corresponding ask_volume of the minimum ask displayed in the result?
And, if there are two records with the same ask value on the same date, how can I get ask_volume to show the one with the highest value?
I use PostgreSQL, but SQL from a different database will help me get the idea as well.
In standard SQL, you would use window functions:
select *
from (
select t.*, row_number() over(partition by date order by ask, ask_volume desc) rn
from mytable
) t
where rn = 1
In Postgres this is better suited for distinct on:
select distinct on (date) *
from mytable
order by ask, ask_volume desc
You can do what you want with distinct on:
select distinct on (date) t.*
from (select t.*,
order by date, ask, ask_volume desc;
I find your date column confusing. It has a time component, so the name is misleading.
Other answers are simpler and better, but here is an alternative to get around your aggregation problem. You could use a subquery to only include max ask_volume per date per ask before you get the min ask per date.
select date, min(ask), max(ask_volume)
from t
where (date, ask_volume) in (select date, max(ask_volume)
from t
group by date, ask)
group by date;
DISTINCT ON has already been suggested, but in imperfect ways. (The currently accepted answer is incorrect.) That's how you do it:
SELECT DISTINCT ON (date) *
FROM tbl
ORDER BY date, ask, ask_volume DESC NULLS LAST;
Most importantly, leading expressions in ORDER BY must be in the set of expressions in DISTINCT ON. In other words for the simple case, date must be the first ORDER BY expression.
While null values have not been ruled out (with a NOT NULL constraint), you must add NULLS LAST or get null values first in descending order.
Detailed explanation:
Select first row in each GROUP BY group?

Sum of item count in an SQL query based on DATE_TRUNC

I've got a table which contains event status data, similar to this:
ID Time Status
------ -------------------------- ------
357920 2019-12-25 09:31:38.854764 1
362247 2020-01-02 09:31:42.498483 1
362248 2020-01-02 09:31:46.166916 1
362249 2020-01-02 09:31:47.430933 1
362300 2020-01-03 09:31:46.932333 1
362301 2020-01-03 09:31:47.231288 1
I'd like to construct a query which returns the number of successful events each day, so:
Time Count
-------------------------- -----
2019-12-25 00:00:00.000000 1
2020-01-02 00:00:00.000000 3
2020-01-03 00:00:00.000000 2
I've stumbled across this SO answer to a similar question, but the answer there is for all the data returned by the query, whereas I need the sum grouped by date range.
Also, I cannot use BETWEEN to select a specific date range, since this query is for a Grafana dashboard, and the date range is determined by the dashboard's UI. I'm using Postgres for the SQL dialect, in case that matters.
You need to remove the time from time component. In most databases, you can do this by converting to a date:
select cast(time as date) as dte,
sum(case when status = 1 then 1 else 0 end) as num_successful
from t
group by cast(time as date)
order by dte;
This assumes that 1 means "successful".
The cast() does not work in all databases. Other alternatives are things like trunc(time), date_trunc('day', time), date_trunc(time, day) -- and no doubt many others.
In Postgres, I would phrase this as:
select date_trunc('day', time) as dte,
count(*) filter (where status = 1) as num_successful
from t
group by dte
order by dte;
How about like this:
SELECT date(Time), sum(status)
FROM table
GROUP BY date(Time)
ORDER BY min(Time)

Get MAX count but keep the repeated calculated value if highest

I have the following table, I am using SQL Server 2008
BayNo FixDateTime FixType
1 04/05/2015 16:15:00 tyre change
1 12/05/2015 00:15:00 oil change
1 12/05/2015 08:15:00 engine tuning
1 04/05/2016 08:11:00 car tuning
2 13/05/2015 19:30:00 puncture
2 14/05/2015 08:00:00 light repair
2 15/05/2015 10:30:00 super op
2 20/05/2015 12:30:00 wiper change
2 12/05/2016 09:30:00 denting
2 12/05/2016 10:30:00 wiper repair
2 12/06/2016 10:30:00 exhaust repair
4 12/05/2016 05:30:00 stereo unlock
4 17/05/2016 15:05:00 door handle repair
on any given day need do find the highest number of fixes made on a given bay number, and if that calculated number is repeated then it should also appear in the resultset
so would like to see the result set as follows
BayNo FixDateTime noOfFixes
1 12/05/2015 00:15:00 2
2 12/05/2016 09:30:00 2
4 12/05/2016 05:30:00 1
4 17/05/2016 15:05:00 1
I manage to get the counts of each but struggling to get the max and keep the highest calculated repeated value. can someone help please
Use window functions.
Get the count for each day by bayno and also find the min fixdatetime for each day per bayno.
Then use dense_rank to compute the highest ranked row for each bayno based on the number of fixes.
Finally get the highest ranked rows.
select distinct bayno,minfixdatetime,no_of_fixes
from (
select bayno,minfixdatetime,no_of_fixes
,dense_rank() over(partition by bayno order by no_of_fixes desc) rnk
from (
select t.*,
count(*) over(partition by bayno,cast(fixdatetime as date)) no_of_fixes,
min(fixdatetime) over(partition by bayno,cast(fixdatetime as date)) minfixdatetime
from tablename t
) x
) y
where rnk = 1
Sample Demo
You are looking for rank() or dense_rank(). I would right the query like this:
select bayno, thedate, numFixes
from (select bayno, cast(fixdatetime) as date) as thedate,
count(*) as numFixes,
rank() over (partition by cast(fixdatetime as date) order by count(*) desc) as seqnum
from t
group by bayno, cast(fixdatetime as date)
) b
where seqnum = 1;
Note that this returns the date in question. The date does not have a time component.